Fix: P0+P1 production stability and performance optimizations (6 items)

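P0 stability fixes:
- Add lazy cleanup to the alert deduplication dict to prevent unbounded memory growth in long-running processes
- On Redis disconnect, explicitly call close() before setting the connection to None, preventing file descriptor leaks
- Move the screenshot message ACK to the success path; failed messages stay in the pending list and are retried automatically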
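
The lazy cleanup in the first item could look roughly like the sketch below. It is not the code from this commit: the class name `AlertDeduplicator`, the parameters `cooldown_s` and `cleanup_interval_s`, and the key format are all hypothetical. The idea is that pruning piggybacks on normal calls, so no background thread is needed and the dict stays bounded by the number of keys seen within one cooldown window.

```python
import time
from typing import Dict, Optional


class AlertDeduplicator:
    """Suppress repeated alerts per key and prune stale entries lazily.

    Hypothetical sketch: names and defaults are assumptions, not the
    project's actual implementation.
    """

    def __init__(self, cooldown_s: float = 60.0, cleanup_interval_s: float = 300.0) -> None:
        self.cooldown_s = cooldown_s
        self.cleanup_interval_s = cleanup_interval_s
        self._last_fired: Dict[str, float] = {}
        self._last_cleanup = 0.0

    def should_fire(self, key: str, now: Optional[float] = None) -> bool:
        """Return True if an alert for `key` should be sent at time `now`."""
        now = time.monotonic() if now is None else now
        self._maybe_cleanup(now)
        last = self._last_fired.get(key)
        if last is not None and now - last < self.cooldown_s:
            return False  # still inside the cooldown window: treat as duplicate
        self._last_fired[key] = now
        return True

    def _maybe_cleanup(self, now: float) -> None:
        # Lazy: only scan the dict once per cleanup interval.
        if now - self._last_cleanup < self.cleanup_interval_s:
            return
        expired = [k for k, t in self._last_fired.items() if now - t >= self.cooldown_s]
        for k in expired:
            del self._last_fired[k]
        self._last_cleanup = now

    def __len__(self) -> int:
        return len(self._last_fired)
```

Passing `now` explicitly is only there to make the behaviour easy to test; production callers would omit it and rely on `time.monotonic()`.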

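P1 performance optimizations:
- GPU NMS: run under torch.no_grad() and explicitly release temporary tensors to reduce GPU memory fragmentation
- Store screenshots in Redis as raw bytes, removing the Base64 encode/decode overhead (backward compatible with the legacy format)
- Replace the N+1 ROI configuration queries with a single JOIN query via get_all_bindings()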
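
For the raw-bytes item, one way a reader could stay compatible with the legacy format is sketched below. It rests on assumptions the commit message does not confirm: that screenshots are JPEG or PNG, and that legacy values are plain Base64 of the same image bytes. The function name `decode_screenshot` is hypothetical. Base64 inflates a payload by about one third, which is the overhead the raw format avoids.

```python
import base64
import binascii

# Magic prefixes for the image formats this sketch assumes.
JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def _looks_like_image(data: bytes) -> bool:
    return data.startswith(JPEG_MAGIC) or data.startswith(PNG_MAGIC)


def decode_screenshot(payload: bytes) -> bytes:
    """Return raw image bytes from either raw storage or legacy Base64.

    Hypothetical helper: the detection rule is an assumption, not the
    project's actual compatibility logic.
    """
    if _looks_like_image(payload):
        return payload  # new format: already raw bytes
    try:
        decoded = base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError):
        raise ValueError("payload is neither a known image format nor valid Base64")
    if not _looks_like_image(decoded):
        raise ValueError("Base64 payload did not decode to a known image format")
    return decoded
```

Checking the magic bytes first means new-format values never pay for a decode attempt, and a failed check on the decoded result catches values that merely happen to be valid Base64.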

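The N+1 to JOIN change can be illustrated with the standard-library `sqlite3` module. Only the name `get_all_bindings()` comes from the commit message; the table names (`roi`, `roi_binding`), column names, and sample values below are hypothetical, and the real project may use a different database and return type.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, List


def get_all_bindings(conn: sqlite3.Connection) -> Dict[str, List[str]]:
    """Fetch every ROI with its bound algorithms in one JOIN query.

    Replaces the N+1 pattern of one query for the ROI list plus one
    query per ROI. Schema is assumed for illustration only.
    """
    rows = conn.execute(
        """
        SELECT r.roi_id, b.algorithm
        FROM roi AS r
        LEFT JOIN roi_binding AS b ON b.roi_id = r.roi_id
        ORDER BY r.roi_id, b.algorithm
        """
    ).fetchall()
    bindings: Dict[str, List[str]] = defaultdict(list)
    for roi_id, algorithm in rows:
        bucket = bindings[roi_id]  # creates an empty list for unbound ROIs
        if algorithm is not None:
            bucket.append(algorithm)
    return dict(bindings)


# Small in-memory database to exercise the query.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE roi (roi_id TEXT PRIMARY KEY);
    CREATE TABLE roi_binding (roi_id TEXT, algorithm TEXT);
    INSERT INTO roi VALUES ('roi_a'), ('roi_b'), ('roi_c');
    INSERT INTO roi_binding VALUES
        ('roi_a', 'intrusion'), ('roi_a', 'leave_post'), ('roi_b', 'intrusion');
    """
)
all_bindings = get_all_bindings(conn)
```

The `LEFT JOIN` keeps ROIs that have no bindings, so callers see the same set of ROIs they would have iterated over in the per-ROI version.
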
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 14:05:57 +08:00
parent a9a5457583
commit 5a0265de52
8 changed files with 593 additions and 41 deletions


@@ -78,22 +78,24 @@ class NMSProcessor:
         max_output_size: int
     ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
         """GPU-accelerated NMS"""
-        boxes_t = torch.from_numpy(boxes).cuda()
-        scores_t = torch.from_numpy(scores).cuda()
-        keep = torch_nms(boxes_t, scores_t, iou_threshold=self.nms_threshold)
-        keep_np = keep.cpu().numpy()
-        if len(keep_np) > max_output_size:
-            top_k = np.argsort(scores[keep_np])[::-1][:max_output_size]
-            keep_np = keep_np[top_k]
-        return (
-            keep_np.astype(np.int32),
-            scores[keep_np],
-            class_ids[keep_np] if class_ids is not None else np.array([])
-        )
+        with torch.no_grad():
+            boxes_t = torch.from_numpy(boxes).cuda()
+            scores_t = torch.from_numpy(scores).cuda()
+            keep = torch_nms(boxes_t, scores_t, iou_threshold=self.nms_threshold)
+            keep_np = keep.cpu().numpy()
+            del boxes_t, scores_t, keep
+        if len(keep_np) > max_output_size:
+            top_k = np.argsort(scores[keep_np])[::-1][:max_output_size]
+            keep_np = keep_np[top_k]
+        return (
+            keep_np.astype(np.int32),
+            scores[keep_np],
+            class_ids[keep_np] if class_ids is not None else np.array([])
+        )
     def _process_cpu(
         self,