# Part 6: Performance and Reliability
This document details the performance optimization strategies and reliability safeguards of the Ops module.
---
## 6.1 Performance Optimization
### 6.1.1 Redis Caching Strategy
**Hot Data Caching**
```java
@Service
public class CleanerStatusService {

    @Resource
    private CleanerStatusMapper cleanerStatusMapper;

    @Cacheable(value = "cleaner:status", key = "#userId", unless = "#result == null")
    public CleanerStatusDO getStatus(Long userId) {
        return cleanerStatusMapper.selectByUserId(userId);
    }

    @CacheEvict(value = "cleaner:status", key = "#userId")
    public void updateStatus(Long userId, CleanerStatusEnum status) {
        cleanerStatusMapper.updateStatus(userId, status);
    }
}
```
**Cache Update Strategy**
- **Cache Aside**: update the database first, then delete the cache
- **TTL**: 5-30 minutes, to bound how long stale data can live
- **Null-value caching**: cache empty results to prevent cache penetration
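The Cache Aside and null-value caching points above can be sketched in plain Java. In this sketch a `ConcurrentHashMap` stands in for Redis (TTL omitted for brevity), and `CacheAsideSketch`, `NULL_SENTINEL`, and the backing `database` map are all illustrative names, not part of the actual service:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Cache-aside read path with null-value caching: a sentinel marks
// "looked up, but no data", so repeated misses never reach the database.
public class CacheAsideSketch {
    private static final Object NULL_SENTINEL = new Object();
    private final Map<Long, Object> cache = new ConcurrentHashMap<>();
    private final Map<Long, String> database; // hypothetical backing store

    public CacheAsideSketch(Map<Long, String> database) {
        this.database = database;
    }

    public Optional<String> getStatus(Long userId) {
        Object cached = cache.get(userId);
        if (cached != null) {
            // Hit: either real data or a cached null
            return cached == NULL_SENTINEL ? Optional.empty() : Optional.of((String) cached);
        }
        String fromDb = database.get(userId);
        // Cache the empty result too, so the next miss skips the database
        cache.put(userId, fromDb == null ? NULL_SENTINEL : fromDb);
        return Optional.ofNullable(fromDb);
    }

    public void updateStatus(Long userId, String status) {
        // Cache Aside: write the database first, then invalidate the cache
        database.put(userId, status);
        cache.remove(userId);
    }
}
```

Caching the sentinel on a miss means repeated lookups for a nonexistent user hit the cache instead of the database, which is exactly what blocks cache penetration.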
### 6.1.2 Database Optimization
**Index Design**
```sql
-- Work-order table indexes
CREATE INDEX idx_status ON ops_order(status);
CREATE INDEX idx_assignee_id ON ops_order(assignee_id);
CREATE INDEX idx_priority_status ON ops_order(priority, status);
CREATE INDEX idx_create_time ON ops_order(create_time);

-- Queue table indexes
CREATE UNIQUE INDEX uk_user_order ON ops_order_queue(user_id, ops_order_id);
CREATE INDEX idx_status ON ops_order_queue(status);
CREATE INDEX idx_priority ON ops_order_queue(priority);
```
**Query Optimization**
```java
// ❌ N+1 queries
List<OpsOrderDO> orders = orderMapper.selectList(wrapper);
for (OpsOrderDO order : orders) {
    UserDO user = userMapper.selectById(order.getAssigneeId()); // N extra queries
}

// ✅ Batch query
List<OpsOrderDO> orders = orderMapper.selectList(wrapper);
Set<Long> userIds = orders.stream().map(OpsOrderDO::getAssigneeId).collect(Collectors.toSet());
List<UserDO> users = userMapper.selectBatchIds(userIds); // single query
Map<Long, UserDO> userMap = users.stream().collect(Collectors.toMap(UserDO::getId, u -> u));
```
### 6.1.3 Asynchronous Processing
**Asynchronous Event Publishing**
```java
@Service
public class OrderEventPublisherImpl {

    @Resource
    private ApplicationEventPublisher applicationEventPublisher;

    @Async("eventExecutor")
    public void publishStateChanged(OrderStateChangedEvent event) {
        applicationEventPublisher.publishEvent(event);
    }
}

@Configuration
@EnableAsync // required for @Async to take effect
public class AsyncConfig {

    @Bean("eventExecutor")
    public Executor eventExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(5);
        executor.setMaxPoolSize(10);
        executor.setQueueCapacity(100);
        executor.setThreadNamePrefix("event-");
        executor.initialize();
        return executor;
    }
}
```
### 6.1.4 Batch Operation Optimization
```java
// Batch enqueue
public void batchEnqueue(List<OrderQueueDTO> queueList) {
    // Batch insert into MySQL
    orderQueueMapper.insertBatch(queueList);
    // Batch write to Redis asynchronously
    CompletableFuture.runAsync(() -> {
        redisQueueService.batchEnqueue(queueList);
    });
}
```
---
## 6.2 Concurrency Control
### 6.2.1 Distributed Locks
**Redis Distributed Lock**
```java
@Service
public class OrderQueueService {

    @Resource
    private StringRedisTemplate redisTemplate;

    // Release the lock only if we still own it (compare token, then delete),
    // so we never delete a lock that expired and was re-acquired by another caller
    private static final DefaultRedisScript<Long> UNLOCK_SCRIPT = new DefaultRedisScript<>(
        "if redis.call('get', KEYS[1]) == ARGV[1] then return redis.call('del', KEYS[1]) else return 0 end",
        Long.class);

    public void enqueue(Long orderId, Long userId) {
        String lockKey = "queue:lock:" + userId;
        String token = UUID.randomUUID().toString();
        // Acquire the distributed lock with a unique owner token and a TTL
        Boolean locked = redisTemplate.opsForValue()
            .setIfAbsent(lockKey, token, 10, TimeUnit.SECONDS);
        if (Boolean.TRUE.equals(locked)) {
            try {
                // Perform the enqueue while holding the lock
                doEnqueue(orderId, userId);
            } finally {
                // Release the lock atomically, checking ownership first
                redisTemplate.execute(UNLOCK_SCRIPT, Collections.singletonList(lockKey), token);
            }
        } else {
            throw new ServiceException("System busy, please try again later");
        }
    }
}
```
### 6.2.2 Optimistic Locking
```java
@TableName("ops_order")
public class OpsOrderDO {
    private Long id;
    private String status;
    @Version // optimistic-lock column
    private Integer version;
}

// The version number is checked automatically on update
orderMapper.updateById(order); // if the version does not match, the update affects 0 rows
```
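When the version check fails, a common pattern is to re-read the row and retry a bounded number of times. The sketch below simulates the version-check semantics in memory; `VersionedOrder`, `compareAndSetStatus`, and `updateWithRetry` are illustrative names, not part of the actual mapper API:

```java
import java.util.concurrent.atomic.AtomicReference;

// In-memory stand-in for "UPDATE ... SET status=?, version=version+1 WHERE version=?"
public class OptimisticRetrySketch {
    public record VersionedOrder(String status, int version) {}

    private final AtomicReference<VersionedOrder> row =
            new AtomicReference<>(new VersionedOrder("PENDING", 0));

    // Succeeds only if the stored version still equals expectedVersion
    public boolean compareAndSetStatus(int expectedVersion, String newStatus) {
        VersionedOrder current = row.get();
        if (current.version() != expectedVersion) {
            return false; // version mismatch: another writer got there first
        }
        return row.compareAndSet(current, new VersionedOrder(newStatus, expectedVersion + 1));
    }

    // Re-read the row before each attempt; give up after maxRetries conflicts
    public boolean updateWithRetry(String newStatus, int maxRetries) {
        for (int i = 0; i < maxRetries; i++) {
            VersionedOrder current = row.get();
            if (compareAndSetStatus(current.version(), newStatus)) {
                return true;
            }
        }
        return false;
    }

    public VersionedOrder current() { return row.get(); }
}
```

Bounding the retries matters: under heavy contention an unbounded loop would just convert the optimistic lock back into a spin lock.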
---
## 6.3 Reliability Safeguards
### 6.3.1 Redis + MySQL Dual-Write Strategy
**Write Path**
```java
@Transactional(rollbackFor = Exception.class)
public Long enqueue(OrderQueueDTO dto) {
    // 1. Write MySQL first, so the data is durable
    Long queueId = orderQueueMapper.insert(dto);
    // 2. Write Redis asynchronously; a failure must not block the main flow
    //    (note: runAsync starts before the transaction commits, so the
    //    scheduled sync job below is the safety net either way)
    CompletableFuture.runAsync(() -> {
        try {
            redisQueueService.enqueue(dto);
        } catch (Exception e) {
            log.error("Redis queue write failed; relying on the scheduled sync job to repair", e);
        }
    });
    return queueId;
}
```
### 6.3.2 Scheduled Sync Job
```java
@Scheduled(cron = "0 */5 * * * ?") // run every 5 minutes
public void syncMySQLToRedis() {
    // Query rows changed within the last hour
    List<OrderQueueDTO> changedTasks =
        orderQueueMapper.selectChangedAfter(DateUtils.addHours(new Date(), -1));
    // Push them to Redis
    redisQueueService.batchEnqueue(changedTasks);
    log.info("Scheduled sync finished, {} records synced", changedTasks.size());
}
```
### 6.3.3 Failure Recovery
**Redis Outage Fallback**
```java
public List<OrderQueueDTO> getTasksByUserId(Long userId) {
    try {
        // 1. Try Redis first
        List<OrderQueueDTO> redisTasks = redisQueueService.getTasksByUserId(userId);
        if (redisTasks != null && !redisTasks.isEmpty()) {
            return redisTasks;
        }
    } catch (Exception e) {
        log.error("Redis query failed, falling back to MySQL", e);
    }
    // 2. Redis unavailable or empty: read from MySQL
    return orderQueueMapper.selectListByUserId(userId);
}
```
---
## 6.4 Monitoring and Alerting
### 6.4.1 Key Metrics
**System Metrics**
- QPS (requests per second)
- Response time (P50, P95, P99)
- Error rate
- Thread-pool utilization

**Business Metrics**
- Work-order creation rate
- Dispatch success rate
- Average response time
- Queue backlog size
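As a reference for the response-time metrics above: P50/P95/P99 are just percentiles over a window of latency samples. A minimal nearest-rank computation (in production a metrics library such as Micrometer would track these) might look like:

```java
import java.util.Arrays;

// Nearest-rank percentile over a window of latency samples (milliseconds)
public class LatencyPercentiles {
    // p is the percentile in (0, 100]; returns the sample at rank ceil(p/100 * n)
    public static long percentile(long[] samplesMs, double p) {
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }
}
```

For 100 samples of 1..100 ms this yields 50 for P50, 95 for P95, and 99 for P99, matching the intuition that P99 is the latency 99% of requests stay under.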
### 6.4.2 Alert Rules
| Alert | Trigger Condition | Level | Notification |
|-------|------------------|-------|--------------|
| P0 order timeout | Unaccepted for > 3 minutes | P0 | DingTalk + SMS |
| Queue backlog | Backlog in one region > 10 | P1 | DingTalk |
| Dispatch failure rate | Failure rate > 5% | P1 | DingTalk |
| API error rate | Error rate > 1% | P2 | Email |
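The trigger conditions in the table reduce to simple threshold checks. A minimal sketch with the thresholds copied from the table; the class, methods, and `Level` enum are illustrative, not the actual alerting component:

```java
// Threshold checks mirroring the alert-rule table
public class AlertRules {
    public enum Level { P0, P1, P2, NONE }

    // P0 order unaccepted for more than 3 minutes
    public static Level p0TimeoutAlert(long minutesUnaccepted) {
        return minutesUnaccepted > 3 ? Level.P0 : Level.NONE;
    }

    // Backlog in one region exceeds 10 orders
    public static Level backlogAlert(int backlogCount) {
        return backlogCount > 10 ? Level.P1 : Level.NONE;
    }

    // Dispatch failure rate above 5%
    public static Level dispatchFailureAlert(double failureRate) {
        return failureRate > 0.05 ? Level.P1 : Level.NONE;
    }

    // API error rate above 1%
    public static Level apiErrorAlert(double errorRate) {
        return errorRate > 0.01 ? Level.P2 : Level.NONE;
    }
}
```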
---
## 6.5 Disaster Recovery
### 6.5.1 Service Degradation
```java
@Service
public class DispatchEngineService {

    @Resource
    private DispatchEngine dispatchEngine;

    public DispatchResult dispatch(DispatchContext context) {
        try {
            // Normal automatic dispatch
            return dispatchEngine.dispatch(context);
        } catch (Exception e) {
            log.error("Automatic dispatch failed, degrading to manual dispatch", e);
            // Fallback: create an order pending manual assignment
            return createPendingOrder(context);
        }
    }
}
```
### 6.5.2 Data Backup
**MySQL Backup**
```bash
# Daily full backup
mysqldump -h localhost -u root -p aiot_ops > backup_$(date +%Y%m%d).sql
```
**Redis Persistence**
```bash
# redis.conf
save 900 1            # BGSAVE if at least 1 key changed within 900 s
save 300 10           # ... at least 10 keys within 300 s
save 60 10000         # ... at least 10000 keys within 60 s
appendonly yes        # enable AOF
appendfsync everysec  # fsync the AOF once per second
```
---
**Next**: [Part 7: Extensibility Design](./part7-扩展性设计.md)