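Log Center Integration Guide
Overview
Log Center is a centralized platform for collecting error logs and repairing errors automatically with AI. It exposes a REST API that each project integrates with.
Once integrated, a project reports three categories of errors: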
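| Type | source value | Description | Trigger |
|---|---|---|---|
| Runtime errors | runtime | Exceptions raised while the application runs (Python/JS/Dart) | Global exception capture in code reports them automatically |
| CI/CD errors | cicd | Pipeline failures such as build, test, or lint | Reported when a Gitea Actions step fails |
| K8s deployment errors | deployment | Abnormal Pod states (CrashLoopBackOff, OOMKilled, etc.) | Reported by a K8s CronJob that scans on a schedule |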
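Full integration flow:
- Register project information: call the API to submit project metadata (name, repository URL, local path)
- Integrate runtime error reporting: add global exception capture to the application code
- Integrate CI/CD error reporting: add failure-report steps to the Gitea Actions pipeline
- Integrate K8s deployment error reporting: add a project mapping to the K8s Pod health monitor
Important: Step 1 must be completed first; otherwise the Repair Agent cannot locate the code repository or the local path.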
Service addresses
| Environment | API address | Dashboard |
|---|---|---|
| Local development | http://localhost:8002 | http://localhost:8003 |
| Production | https://qiyuan-log-center-api.airlabs.art | https://qiyuan-log-center.airlabs.art |
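Step 1: Register project information
A project must register its information the first time it integrates with Log Center. The Repair Agent cannot work without it.
How to register
First report an initialization log (this triggers automatic project creation), then call the PUT endpoint to fill in the metadata: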
# 1. Report an initialization log to trigger automatic project creation
curl -X POST "${LOG_CENTER_URL}/api/v1/logs/report" \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "your_project_id",
    "environment": "production",
    "level": "WARNING",
    "error": {
      "type": "ProjectInit",
      "message": "Project registered to Log Center",
      "stack_trace": ["Project initialization"]
    },
    "repo_url": "https://gitea.airlabs.art/team/your_project.git"
  }'
# 2. Fill in the project metadata
curl -X PUT "${LOG_CENTER_URL}/api/v1/projects/your_project_id" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Project display name",
    "repo_url": "https://gitea.airlabs.art/team/your_project.git",
    "local_path": "/absolute/path/to/project",
    "description": "Project description"
  }'
Registration examples by language
Python
import requests
import os

LOG_CENTER_URL = os.getenv("LOG_CENTER_URL", "http://localhost:8002")


def register_project():
    """Call once on first integration to register the project with Log Center."""
    project_id = "your_project_id"
    # 1. Report an initialization log to trigger project creation
    requests.post(f"{LOG_CENTER_URL}/api/v1/logs/report", json={
        "project_id": project_id,
        "environment": os.getenv("ENVIRONMENT", "production"),
        "level": "WARNING",
        "error": {
            "type": "ProjectInit",
            "message": "Project registered to Log Center",
            "stack_trace": ["Project initialization"],
        },
        "repo_url": "https://gitea.airlabs.art/team/your_project.git",
    }, timeout=5)
    # 2. Fill in the project metadata
    requests.put(f"{LOG_CENTER_URL}/api/v1/projects/{project_id}", json={
        "name": "Project display name",
        "repo_url": "https://gitea.airlabs.art/team/your_project.git",
        "local_path": "/absolute/path/to/project",
        "description": "Project description",
    }, timeout=5)
JavaScript / TypeScript
const LOG_CENTER_URL = import.meta.env.VITE_LOG_CENTER_URL || 'http://localhost:8002';

async function registerProject() {
  const projectId = 'your_project_id';
  // 1. Report an initialization log to trigger project creation
  await fetch(`${LOG_CENTER_URL}/api/v1/logs/report`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      project_id: projectId,
      environment: import.meta.env.MODE,
      level: 'WARNING',
      error: {
        type: 'ProjectInit',
        message: 'Project registered to Log Center',
        stack_trace: ['Project initialization'],
      },
      repo_url: 'https://gitea.airlabs.art/team/your_project.git',
    }),
  });
  // 2. Fill in the project metadata
  await fetch(`${LOG_CENTER_URL}/api/v1/projects/${projectId}`, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      name: 'Project display name',
      repo_url: 'https://gitea.airlabs.art/team/your_project.git',
      local_path: '/absolute/path/to/project',
      description: 'Project description',
    }),
  });
}
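Project metadata fields
| Field | Type | Required | Description |
|---|---|---|---|
| project_id | string | ✅ | Unique project identifier, e.g. rtc_backend, rtc_web |
| name | string | ✅ | Project display name |
| repo_url | string | ✅ | Git repository URL (the Repair Agent clones from and pushes to it) |
| local_path | string | ✅ | Absolute local path of the project (the Repair Agent runs its fixes in this directory) |
| description | string | ❌ | Project description |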
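Step 2: Integrate runtime error reporting
source: "runtime" (the default; may be omitted)
Add global exception capture to the application code so that unhandled runtime exceptions are reported to Log Center automatically.
Report format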
{
  "project_id": "rtc_backend",
  "environment": "production",
  "level": "ERROR",
  "error": {
    "type": "ValueError",
    "message": "invalid literal for int() with base 10: 'abc'",
    "file_path": "apps/users/views.py",
    "line_number": 42,
    "stack_trace": [
      "Traceback (most recent call last):",
      "  File \"apps/users/views.py\", line 42, in get_user",
      "ValueError: invalid literal for int() with base 10: 'abc'"
    ]
  },
  "context": {
    "url": "/api/users/123",
    "method": "GET",
    "user_id": "u_12345"
  }
}
Runtime field reference
| Field | Type | Required | Description |
|---|---|---|---|
| project_id | string | ✅ | Project identifier |
| environment | string | ✅ | Environment: development, staging, production |
| level | string | ✅ | Log level: ERROR, WARNING, CRITICAL |
| source | string | ❌ | Defaults to runtime; no need to send it |
| timestamp | string | ❌ | ISO 8601 format; server time is used when omitted |
| version | string | ❌ | Application version |
| commit_hash | string | ❌ | Git commit hash |
| error.type | string | ✅ | Exception type, e.g. ValueError, TypeError |
| error.message | string | ✅ | Error message |
| error.file_path | string | ✅ | Path of the file where the error occurred |
| error.line_number | int | ✅ | Line number where the error occurred |
| error.stack_trace | array | ✅ | Stack trace (array or string) |
| context | object | ❌ | Additional context |
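As a quick client-side sanity check, the required fields in the table above can be verified before a report is sent. The following is a minimal sketch; the helper name missing_required_fields is hypothetical, and it says nothing about how the Log Center server itself validates payloads.

```python
from typing import Any

# Required fields as listed in the table above; dotted names are nested under "error".
REQUIRED_FIELDS = (
    "project_id",
    "environment",
    "level",
    "error.type",
    "error.message",
    "error.file_path",
    "error.line_number",
    "error.stack_trace",
)


def missing_required_fields(payload: dict[str, Any]) -> list[str]:
    """Return the dotted names of required fields that are absent from a runtime payload."""
    missing = []
    for dotted in REQUIRED_FIELDS:
        node: Any = payload
        for key in dotted.split("."):
            if not isinstance(node, dict) or key not in node:
                missing.append(dotted)
                break
            node = node[key]
    return missing
```

An empty result means every required key is present; the check does not inspect value types.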
Python (Django / FastAPI)
import requests
import traceback
import os

LOG_CENTER_URL = os.getenv("LOG_CENTER_URL", "http://localhost:8002")


def report_error(exc, context=None):
    """Report a runtime error to Log Center."""
    tb = traceback.extract_tb(exc.__traceback__)
    last_frame = tb[-1] if tb else None
    payload = {
        "project_id": "rtc_backend",
        "environment": os.getenv("ENVIRONMENT", "development"),
        "level": "ERROR",
        "error": {
            "type": type(exc).__name__,
            "message": str(exc),
            "file_path": last_frame.filename if last_frame else "unknown",
            "line_number": last_frame.lineno if last_frame else 0,
            # Three-argument form also works on Python versions before 3.10
            "stack_trace": traceback.format_exception(type(exc), exc, exc.__traceback__),
        },
        "context": context or {},
    }
    try:
        requests.post(
            f"{LOG_CENTER_URL}/api/v1/logs/report",
            json=payload,
            timeout=3,
        )
    except Exception:
        pass  # Fail silently so the main business logic is unaffected
Django integration point: modify custom_exception_handler in utils/exceptions.py:
def custom_exception_handler(exc, context):
    # Report to Log Center
    report_error(exc, {
        "view": str(context.get("view")),
        "request_path": context.get("request").path if context.get("request") else None,
    })
    # ... existing logic unchanged ...
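FastAPI integration point: add a global exception handler:
from fastapi import Request
from fastapi.concurrency import run_in_threadpool
from fastapi.responses import JSONResponse


# `app` is your existing FastAPI instance.
@app.exception_handler(Exception)
async def global_exception_handler(request: Request, exc: Exception):
    # report_error is synchronous, so it cannot be awaited directly;
    # run it in a worker thread to keep the event loop free.
    await run_in_threadpool(report_error, exc, {
        "url": str(request.url),
        "method": request.method,
    })
    return JSONResponse(status_code=500, content={"detail": "Internal Server Error"})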
JavaScript / TypeScript (React / Vue)
const LOG_CENTER_URL = import.meta.env.VITE_LOG_CENTER_URL || 'http://localhost:8002';

export function reportError(error: Error, context?: Record<string, unknown>) {
  const stackLines = error.stack?.split('\n') || [];
  const match = stackLines[1]?.match(/at\s+.*\s+\((.+):(\d+):\d+\)/);
  const payload = {
    project_id: 'rtc_web',
    environment: import.meta.env.MODE,
    level: 'ERROR',
    error: {
      type: error.name,
      message: error.message,
      file_path: match?.[1] || 'unknown',
      line_number: parseInt(match?.[2] || '0', 10),
      stack_trace: stackLines,
    },
    context: {
      url: window.location.href,
      userAgent: navigator.userAgent,
      ...context,
    },
  };
  const blob = new Blob([JSON.stringify(payload)], { type: 'application/json' });
  if (navigator.sendBeacon) {
    navigator.sendBeacon(`${LOG_CENTER_URL}/api/v1/logs/report`, blob);
  } else {
    fetch(`${LOG_CENTER_URL}/api/v1/logs/report`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
      keepalive: true,
    }).catch(() => {});
  }
}
Global error capture: in the main.tsx / main.ts entry file:
// JS runtime exceptions
window.onerror = (_message, source, lineno, colno, error) => {
  if (error) reportError(error, { source, lineno, colno });
};
// Unhandled Promise rejections
window.onunhandledrejection = (event: PromiseRejectionEvent) => {
  const error = event.reason instanceof Error
    ? event.reason
    : new Error(String(event.reason));
  reportError(error, { type: 'unhandledrejection' });
};
Axios interceptor: in api.ts / request.ts (reports only 5xx server errors):
import type { AxiosError } from 'axios';

// `api` is your existing Axios instance.
api.interceptors.response.use(
  (response) => response,
  (error: AxiosError) => {
    if (error.response && error.response.status >= 500) {
      reportError(error, {
        api_url: error.config?.url,
        method: error.config?.method,
        status: error.response.status,
      });
    }
    return Promise.reject(error);
  },
);
Flutter (Dart)
import 'dart:convert';
import 'package:http/http.dart' as http;

const logCenterUrl = String.fromEnvironment(
  'LOG_CENTER_URL',
  defaultValue: 'http://localhost:8002',
);

Future<void> reportError(dynamic error, StackTrace stackTrace, {Map<String, dynamic>? context}) async {
  final stackLines = stackTrace.toString().split('\n');
  final match = RegExp(r'#0\s+.*\((.+):(\d+):\d+\)').firstMatch(stackLines.first);
  final payload = {
    'project_id': 'airhub_app',
    'environment': const String.fromEnvironment('ENVIRONMENT', defaultValue: 'development'),
    'level': 'ERROR',
    'error': {
      'type': error.runtimeType.toString(),
      'message': error.toString(),
      'file_path': match?.group(1) ?? 'unknown',
      'line_number': int.tryParse(match?.group(2) ?? '0') ?? 0,
      'stack_trace': stackLines.take(20).toList(),
    },
    'context': context ?? {},
  };
  try {
    await http.post(
      Uri.parse('$logCenterUrl/api/v1/logs/report'),
      headers: {'Content-Type': 'application/json'},
      body: jsonEncode(payload),
    ).timeout(const Duration(seconds: 3));
  } catch (_) {
    // Fail silently
  }
}
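Global capture: in main.dart:
import 'dart:async'; // runZonedGuarded
import 'package:flutter/material.dart';

void main() {
  // Errors caught by the Flutter framework
  FlutterError.onError = (details) {
    reportError(details.exception, details.stack ?? StackTrace.current);
  };
  // Uncaught asynchronous errors
  runZonedGuarded(() {
    runApp(const MyApp());
  }, (error, stack) {
    reportError(error, stack);
  });
}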
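Step 3: Integrate CI/CD error reporting
source: "cicd"
In the Gitea Actions pipeline, add a failure report for each key step so that build, test, and lint failures are reported to Log Center automatically.
Report format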
{
  "project_id": "rtc_backend",
  "environment": "cicd",
  "level": "ERROR",
  "source": "cicd",
  "commit_hash": "abc1234",
  "error": {
    "type": "DockerBuildError",
    "message": "Docker build failed",
    "file_path": null,
    "line_number": null,
    "stack_trace": ["Build step failed. Check CI logs for details."]
  },
  "context": {
    "workflow_name": "Build and Deploy",
    "job_name": "build",
    "step_name": "Build Docker Image",
    "run_id": "123",
    "branch": "main",
    "repository": "team/rtc_backend",
    "run_url": "https://gitea.airlabs.art/team/rtc_backend/actions/runs/123"
  }
}
CI/CD-specific fields
| Field | Description |
|---|---|
| source | Must be set to "cicd" |
| environment | Set to "cicd" |
| error.type | Recommended values: DockerBuildError, NpmBuildError, TestFailure, LintError, CIBuildError |
| error.file_path | May be null |
| error.line_number | May be null |
| context.workflow_name | Workflow name |
| context.job_name | Job name |
| context.step_name | Name of the failed step |
| context.run_id | Run ID |
| context.run_url | Link to the CI run details |
| context.branch | Branch name |
Gitea Actions integration
Give each key step an id, then add conditional report steps at the end:
name: Build and Deploy
on:
  push:
    branches: [main]
env:
  LOG_CENTER_URL: https://qiyuan-log-center-api.airlabs.art
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Key steps: add an id
      - name: Build Docker Image
        id: build
        run: docker build -t myapp:latest .
      - name: Run Tests
        id: test
        run: docker run myapp:latest python -m pytest
      - name: Deploy
        id: deploy
        run: kubectl apply -f k8s/
      # ===== Failure report steps (placed after all key steps) =====
      - name: Report Build Failure
        if: failure() && steps.build.outcome == 'failure'
        run: |
          curl -s -X POST "${LOG_CENTER_URL}/api/v1/logs/report" \
            -H "Content-Type: application/json" \
            -d '{
              "project_id": "'"${GITHUB_REPOSITORY##*/}"'",
              "environment": "cicd",
              "level": "ERROR",
              "source": "cicd",
              "commit_hash": "'"$GITHUB_SHA"'",
              "error": {
                "type": "DockerBuildError",
                "message": "Docker build failed",
                "file_path": null,
                "line_number": null,
                "stack_trace": ["Docker build step failed. Check CI logs."]
              },
              "context": {
                "workflow_name": "'"$GITHUB_WORKFLOW"'",
                "job_name": "'"$GITHUB_JOB"'",
                "step_name": "Build Docker Image",
                "run_id": "'"$GITHUB_RUN_ID"'",
                "branch": "'"$GITHUB_REF_NAME"'",
                "repository": "'"$GITHUB_REPOSITORY"'",
                "run_url": "'"$GITHUB_SERVER_URL"'/'"$GITHUB_REPOSITORY"'/actions/runs/'"$GITHUB_RUN_ID"'"
              }
            }' --connect-timeout 5 --max-time 10 || true
      - name: Report Test Failure
        if: failure() && steps.test.outcome == 'failure'
        run: |
          curl -s -X POST "${LOG_CENTER_URL}/api/v1/logs/report" \
            -H "Content-Type: application/json" \
            -d '{
              "project_id": "'"${GITHUB_REPOSITORY##*/}"'",
              "environment": "cicd",
              "level": "ERROR",
              "source": "cicd",
              "commit_hash": "'"$GITHUB_SHA"'",
              "error": {
                "type": "TestFailure",
                "message": "Tests failed in CI pipeline",
                "file_path": null,
                "line_number": null,
                "stack_trace": ["Test step failed. Check CI logs."]
              },
              "context": {
                "workflow_name": "'"$GITHUB_WORKFLOW"'",
                "job_name": "'"$GITHUB_JOB"'",
                "step_name": "Run Tests",
                "run_id": "'"$GITHUB_RUN_ID"'",
                "branch": "'"$GITHUB_REF_NAME"'",
                "repository": "'"$GITHUB_REPOSITORY"'",
                "run_url": "'"$GITHUB_SERVER_URL"'/'"$GITHUB_REPOSITORY"'/actions/runs/'"$GITHUB_RUN_ID"'"
              }
            }' --connect-timeout 5 --max-time 10 || true
Using the report-cicd-error.sh script
The project provides a general-purpose reporting script, scripts/report-cicd-error.sh (requires jq), that can be called from CI steps:
# Usage: ./scripts/report-cicd-error.sh <project_id> <step_name> <error_message_or_file>
./scripts/report-cicd-error.sh rtc_backend "Build Docker Image" "Docker build failed: exit code 1"
./scripts/report-cicd-error.sh rtc_backend "Run Tests" /tmp/test-output.log
The script automatically:
- Infers error_type from the step name (DockerBuildError / NpmBuildError / TestFailure / LintError)
- Reads the Gitea Actions environment variables to fill in context
- Reads the last 100 lines as stack_trace when the argument is a file path
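The three behaviors listed above can be sketched in Python. This is an illustration only, not the script's actual logic: the keyword rules in _STEP_RULES, the CIBuildError fallback, the generated error message, and all function names are assumptions, since the script source is not shown here.

```python
import os
from typing import Mapping

# Hypothetical keyword rules; the real script's matching logic is not shown in this guide.
_STEP_RULES = (
    ("docker", "DockerBuildError"),
    ("npm", "NpmBuildError"),
    ("test", "TestFailure"),
    ("lint", "LintError"),
)


def infer_error_type(step_name: str) -> str:
    """Guess error.type from the step name (assumed fallback: CIBuildError)."""
    lowered = step_name.lower()
    for keyword, error_type in _STEP_RULES:
        if keyword in lowered:
            return error_type
    return "CIBuildError"


def build_stack_trace(message_or_file: str, max_lines: int = 100) -> list[str]:
    """Return the last max_lines lines of a file, or the message itself if it is not a file."""
    if os.path.isfile(message_or_file):
        with open(message_or_file, encoding="utf-8", errors="replace") as handle:
            return handle.read().splitlines()[-max_lines:]
    return [message_or_file]


def build_cicd_payload(
    project_id: str,
    step_name: str,
    message_or_file: str,
    env: Mapping[str, str] = os.environ,
) -> dict:
    """Assemble a cicd report from the Actions environment variables used in this guide."""
    return {
        "project_id": project_id,
        "environment": "cicd",
        "level": "ERROR",
        "source": "cicd",
        "commit_hash": env.get("GITHUB_SHA", ""),
        "error": {
            "type": infer_error_type(step_name),
            "message": f"{step_name} failed",  # assumed wording
            "file_path": None,
            "line_number": None,
            "stack_trace": build_stack_trace(message_or_file),
        },
        "context": {
            "workflow_name": env.get("GITHUB_WORKFLOW", ""),
            "job_name": env.get("GITHUB_JOB", ""),
            "step_name": step_name,
            "run_id": env.get("GITHUB_RUN_ID", ""),
            "branch": env.get("GITHUB_REF_NAME", ""),
            "repository": env.get("GITHUB_REPOSITORY", ""),
        },
    }
```

Passing env explicitly keeps the builder easy to exercise outside a CI runner.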
Step 4: Integrate K8s deployment error reporting
source: "deployment"
A K8s Pod health-monitor CronJob periodically scans the cluster for abnormal Pods and reports them to Log Center.
Report format
{
  "project_id": "rtc_backend",
  "environment": "production",
  "level": "CRITICAL",
  "source": "deployment",
  "error": {
    "type": "CrashLoopBackOff",
    "message": "CrashLoopBackOff: back-off restarting failed container (pod: rtc-backend-xxx, container: api)",
    "file_path": null,
    "line_number": null,
    "stack_trace": ["...container logs before the crash (last 50 lines)..."]
  },
  "context": {
    "namespace": "default",
    "pod_name": "rtc-backend-xxx-yyy",
    "container_name": "api",
    "deployment_name": "rtc-backend",
    "restart_count": 5,
    "node_name": "node-1"
  }
}
Deployment-specific fields
| Field | Description |
|---|---|
| source | Must be set to "deployment" |
| level | "CRITICAL" is recommended; Pod failures are usually severe |
| error.type | Taken from the K8s status: CrashLoopBackOff, OOMKilled, ImagePullBackOff, ErrImagePull, etc. |
| error.file_path | May be null |
| error.line_number | May be null |
| error.stack_trace | Container log output before the crash |
| context.namespace | K8s namespace |
| context.pod_name | Pod name |
| context.deployment_name | Deployment name (used for fingerprint deduplication) |
| context.restart_count | Restart count |
| context.node_name | Node name |
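Monitored abnormal states
| State | Description |
|---|---|
| CrashLoopBackOff | The container keeps crashing and restarting |
| OOMKilled | Killed for exceeding its memory limit |
| ImagePullBackOff / ErrImagePull | Image pull failed |
| CreateContainerConfigError | Container configuration error |
| RunContainerError | Container failed to start |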
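Integration: add a Pod label mapping
The K8s Monitor CronJob is already running in the cluster and scans every 5 minutes. To onboard a new project, add a mapping to the APP_TO_PROJECT dictionary in k8s-monitor/monitor.py: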
# k8s-monitor/monitor.py
APP_TO_PROJECT = {
    "rtc-backend": "rtc_backend",  # Pod app label -> project_id
    "rtc-backend-dev": "rtc_backend",
    "rtc-web": "rtc_web",
    "rtc-web-dev": "rtc_web",
    "log-center-api": "log_center_api",
    "log-center-web": "log_center_web",
    # Add mappings for new projects here
    "your-app": "your_project_id",
}
Make sure your K8s Deployment carries the app label:
metadata:
  labels:
    app: your-app  # must match a key in APP_TO_PROJECT
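To show how the app label ties a Pod to a project, here is a minimal sketch of the lookup. The real monitor.py is not reproduced in this guide, so the function names, the simplified pod dictionary shape, the message wording, and the choice to skip unmapped Pods are all assumptions.

```python
from typing import Any, Optional

# Trimmed copy of the mapping shown above: Pod "app" label -> project_id.
APP_TO_PROJECT = {
    "rtc-backend": "rtc_backend",
    "rtc-web": "rtc_web",
}


def resolve_project_id(pod_labels: dict[str, str]) -> Optional[str]:
    """Return the project_id for a Pod's labels, or None when the app is unmapped."""
    return APP_TO_PROJECT.get(pod_labels.get("app", ""))


def build_deployment_payload(
    pod: dict[str, Any], environment: str = "production"
) -> Optional[dict[str, Any]]:
    """Build a deployment report from a simplified Pod description (hypothetical shape)."""
    project_id = resolve_project_id(pod.get("labels", {}))
    if project_id is None:
        return None  # assumption: Pods without a mapping are skipped
    return {
        "project_id": project_id,
        "environment": environment,
        "level": "CRITICAL",
        "source": "deployment",
        "error": {
            "type": pod["reason"],
            "message": f"{pod['reason']} (pod: {pod['name']})",
            "file_path": None,
            "line_number": None,
            "stack_trace": pod.get("log_tail", []),
        },
        "context": {
            "namespace": pod["namespace"],
            "pod_name": pod["name"],
            "restart_count": pod.get("restart_count", 0),
        },
    }
```

Under this sketch, a Pod whose app label has no entry in the dictionary never produces a report, which is why the label and the dictionary key have to match exactly.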
CronJob deployment configuration
If the Monitor is not yet deployed in the cluster, use the following configuration:
# k8s/monitor-cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pod-health-monitor
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: pod-monitor
          containers:
            - name: monitor
              image: your-registry/k8s-pod-monitor:latest
              env:
                - name: LOG_CENTER_URL
                  value: "https://qiyuan-log-center-api.airlabs.art"
                - name: MONITOR_NAMESPACE
                  value: "default"
          restartPolicy: OnFailure
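Error deduplication
Log Center deduplicates errors by fingerprint. Each of the three sources uses its own fingerprint strategy:
| Source | Fingerprint composition |
|---|---|
| runtime | MD5(project_id \| error_type \| file_path \| line_number) |
| cicd | MD5(project_id \| cicd \| error_type \| job_name \| step_name) |
| deployment | MD5(project_id \| deployment \| error_type \| namespace \| deployment_name) |
Errors with the same fingerprint are recorded only once. If an error that was already fixed appears again, it is reopened automatically (regression detection).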
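The fingerprint composition can be illustrated with a short Python sketch. It follows the field lists given for the three sources, but the exact separator, string encoding, and handling of null values on the server are assumptions here, so the digests it produces are not guaranteed to match the ones Log Center stores.

```python
import hashlib
from typing import Any, Iterable


def _md5_of(parts: Iterable[Any]) -> str:
    """Join the parts with '|' (assumed separator) and return the MD5 hex digest."""
    joined = "|".join("" if part is None else str(part) for part in parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def fingerprint(payload: dict[str, Any]) -> str:
    """Compute a deduplication fingerprint from a report payload, by source."""
    source = payload.get("source", "runtime")
    error = payload.get("error", {})
    context = payload.get("context", {})
    project_id = payload.get("project_id")
    if source == "cicd":
        parts = [project_id, "cicd", error.get("type"),
                 context.get("job_name"), context.get("step_name")]
    elif source == "deployment":
        parts = [project_id, "deployment", error.get("type"),
                 context.get("namespace"), context.get("deployment_name")]
    else:
        parts = [project_id, error.get("type"),
                 error.get("file_path"), error.get("line_number")]
    return _md5_of(parts)
```

Because error.message is not part of any fingerprint, two runtime errors of the same type at the same file and line collapse into one record even when their messages differ.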
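Error status flow
NEW → VERIFYING → PENDING_FIX → FIXING → FIXED → VERIFIED → DEPLOYED
          ↓                        ↓
   CANNOT_REPRODUCE            FIX_FAILED
| Status | Description |
|---|---|
| NEW | Newly reported error |
| VERIFYING | Reproduction is being verified |
| CANNOT_REPRODUCE | Could not be reproduced |
| PENDING_FIX | Waiting for a fix |
| FIXING | The AI Agent is working on a fix |
| FIXED | Fixed, awaiting verification |
| VERIFIED | Fix verified |
| DEPLOYED | Deployed to production |
| FIX_FAILED | Fix failed |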
API reference
Report an error log
POST /api/v1/logs/report
Response:
// New error
{"message": "Log reported", "id": 123}
// Duplicate error (deduplicated)
{"message": "Log deduplicated", "id": 123, "status": "NEW"}
// Regression (a fixed error has appeared again)
{"message": "Regression detected, reopened", "id": 123}
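A caller that wants to tell the three outcomes apart can branch on the message field. The sketch below relies only on the three message strings in the response examples; the function name and the returned labels are hypothetical, and any other message is treated as unknown.

```python
from typing import Any


def classify_report_response(body: dict[str, Any]) -> str:
    """Map a /api/v1/logs/report response body to 'new', 'duplicate', 'regression', or 'unknown'."""
    message = body.get("message", "")
    if message.startswith("Regression detected"):
        return "regression"
    if message == "Log deduplicated":
        return "duplicate"
    if message == "Log reported":
        return "new"
    return "unknown"
```

Matching on message text is brittle if the wording changes, so callers may prefer to treat "unknown" the same as "new" rather than fail.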
Project management API
| Method | Path | Description |
|---|---|---|
| GET | /api/v1/projects | List projects |
| GET | /api/v1/projects/{project_id} | Get project details |
| PUT | /api/v1/projects/{project_id} | Edit project configuration |
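Best practices
- Set a timeout: give report requests a 3-second timeout so they cannot hold up the main business logic
- Fail silently: a failed report must not affect the user experience, so every catch block swallows the error
- Report asynchronously: send reports asynchronously so the main flow is never blocked
- Add context: include useful context where possible (user ID, request URL, etc.)
- Separate environments: set the environment field correctly to distinguish development from production
- Use || true in CI/CD: a failing report step must not block the pipeline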
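The timeout, silent-failure, and asynchronous-reporting practices can be combined in one small helper. This is a sketch under stated assumptions: it uses only the standard library, the names _post_json and report_in_background are hypothetical, and a daemon thread is just one of several reasonable ways to keep the report off the main flow.

```python
import json
import threading
import urllib.request
from typing import Any, Callable

LOG_CENTER_URL = "http://localhost:8002"  # local development address from this guide


def _post_json(payload: dict[str, Any], timeout: float = 3.0) -> None:
    """POST the payload to the report endpoint with a 3-second timeout."""
    request = urllib.request.Request(
        f"{LOG_CENTER_URL}/api/v1/logs/report",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout):
        pass  # the response body is not needed


def report_in_background(
    payload: dict[str, Any],
    send: Callable[[dict[str, Any]], None] = _post_json,
) -> threading.Thread:
    """Send the report on a daemon thread and swallow every failure."""

    def worker() -> None:
        try:
            send(payload)
        except Exception:
            pass  # fail silently: reporting must never break the caller

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The send parameter exists so that the transport can be swapped out, for example with a stub in unit tests; the returned thread can be joined when a caller does want to wait.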
Environment variable configuration
Python projects
# .env
LOG_CENTER_URL=http://localhost:8002
ENVIRONMENT=development
JavaScript projects
# .env
VITE_LOG_CENTER_URL=http://localhost:8002
Flutter projects
# Passed in at build time (both defines in one command)
flutter run --dart-define=LOG_CENTER_URL=http://localhost:8002 --dart-define=ENVIRONMENT=development
Gitea Actions
env:
  LOG_CENTER_URL: https://qiyuan-log-center-api.airlabs.art