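Log Center Integration Guide
Overview
Log Center is a centralized platform for error log collection and AI-driven automatic repair. It exposes a REST API that any project can integrate with.
Once integrated, three categories of error reports are covered:
| Type | source value | Description | Trigger |
|---|---|---|---|
| Runtime errors | runtime | Exceptions raised while the application runs (Python/JS/Dart) | A global exception handler in the code reports them automatically |
| CI/CD errors | cicd | Pipeline failures in build, test, lint, and similar stages | Reported when a Gitea Actions step fails |
| K8s deployment errors | deployment | Abnormal Pod states (CrashLoopBackOff, OOMKilled, etc.) | Reported by a K8s CronJob that scans on a schedule |
Full integration flow:
- Register project information: call the API to submit project metadata (name, repository URL, local path)
- Integrate runtime error reporting: add global exception capture to the application code
- Integrate CI/CD error reporting: add a failure-reporting step to the Gitea Actions pipeline
- Integrate K8s deployment error reporting: add the project mapping to the K8s Pod health monitor
Important: complete step 1 first; otherwise the Repair Agent cannot locate the code repository or the local path.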
Service addresses
| Environment | API URL | Dashboard |
|---|---|---|
| Local development | http://localhost:8002 | http://localhost:8003 |
| Production | https://qiyuan-log-center-api.airlabs.art | https://qiyuan-log-center.airlabs.art |
Step 1: Register project information
When integrating with Log Center for the first time, you must register the project information first. The Repair Agent depends on it to work correctly.
Registration method
First report an initialization log (which triggers automatic project creation), then call the PUT endpoint to fill in the metadata:
# 1. Report an initialization log to trigger automatic project creation
curl -X POST "${LOG_CENTER_URL}/api/v1/logs/report" \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "your_project_id",
    "environment": "production",
    "level": "WARNING",
    "error": {
      "type": "ProjectInit",
      "message": "Project registered to Log Center",
      "stack_trace": ["Project initialization"]
    },
    "repo_url": "https://gitea.airlabs.art/team/your_project.git"
  }'
# 2. Fill in the project metadata
curl -X PUT "${LOG_CENTER_URL}/api/v1/projects/your_project_id" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Project display name",
    "repo_url": "https://gitea.airlabs.art/team/your_project.git",
    "local_path": "/absolute/path/to/project",
    "description": "Project description"
  }'
Registration examples by language
Python
import os

import requests

LOG_CENTER_URL = os.getenv("LOG_CENTER_URL", "http://localhost:8002")


def register_project():
    """Call once on first integration to register the project with Log Center."""
    project_id = "your_project_id"
    # 1. Report an initialization log to trigger project creation
    requests.post(f"{LOG_CENTER_URL}/api/v1/logs/report", json={
        "project_id": project_id,
        "environment": os.getenv("ENVIRONMENT", "production"),
        "level": "WARNING",
        "error": {
            "type": "ProjectInit",
            "message": "Project registered to Log Center",
            "stack_trace": ["Project initialization"],
        },
        "repo_url": "https://gitea.airlabs.art/team/your_project.git",
    }, timeout=5)
    # 2. Fill in the project metadata
    requests.put(f"{LOG_CENTER_URL}/api/v1/projects/{project_id}", json={
        "name": "Project display name",
        "repo_url": "https://gitea.airlabs.art/team/your_project.git",
        "local_path": "/absolute/path/to/project",
        "description": "Project description",
    }, timeout=5)
JavaScript / TypeScript
const LOG_CENTER_URL = import.meta.env.VITE_LOG_CENTER_URL || 'http://localhost:8002';

async function registerProject() {
  const projectId = 'your_project_id';
  // 1. Report an initialization log to trigger project creation
  await fetch(`${LOG_CENTER_URL}/api/v1/logs/report`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      project_id: projectId,
      environment: import.meta.env.MODE,
      level: 'WARNING',
      error: {
        type: 'ProjectInit',
        message: 'Project registered to Log Center',
        stack_trace: ['Project initialization'],
      },
      repo_url: 'https://gitea.airlabs.art/team/your_project.git',
    }),
  });
  // 2. Fill in the project metadata
  await fetch(`${LOG_CENTER_URL}/api/v1/projects/${projectId}`, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      name: 'Project display name',
      repo_url: 'https://gitea.airlabs.art/team/your_project.git',
      local_path: '/absolute/path/to/project',
      description: 'Project description',
    }),
  });
}
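Project metadata fields
| Field | Type | Required | Description |
|---|---|---|---|
| project_id | string | ✅ | Unique project identifier, e.g. rtc_backend, rtc_web |
| name | string | ✅ | Project display name |
| repo_url | string | ✅ | Git repository URL (the Repair Agent clones from and pushes to it) |
| local_path | string | ✅ | Absolute local path of the project (the Repair Agent runs repairs in this directory) |
| description | string | ❌ | Project description |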
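Before calling the PUT endpoint it can help to check the metadata locally. The following is a minimal sketch, not part of Log Center: the function name validate_project_meta and the error strings are hypothetical, and the rules are only those stated in the table above (four required fields, local_path absolute).

```python
import posixpath

# Required fields as listed in the metadata table (description is optional)
REQUIRED_FIELDS = ("project_id", "name", "repo_url", "local_path")


def validate_project_meta(meta: dict) -> list:
    """Return a list of problems; an empty list means the metadata looks complete."""
    problems = [
        f"missing required field: {field}"
        for field in REQUIRED_FIELDS
        if not meta.get(field)
    ]
    local_path = meta.get("local_path")
    # The Repair Agent works in this directory, so a relative path is rejected here
    if local_path and not posixpath.isabs(local_path):
        problems.append("local_path must be an absolute path")
    return problems
```

A caller could refuse to send the request when the returned list is non-empty.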
Step 2: Integrate runtime error reporting
source: "runtime" (the default; may be omitted)
Add global exception capture to the application code so that unhandled runtime exceptions are reported to Log Center automatically.
Report format
{
  "project_id": "rtc_backend",
  "environment": "production",
  "level": "ERROR",
  "error": {
    "type": "ValueError",
    "message": "invalid literal for int() with base 10: 'abc'",
    "file_path": "apps/users/views.py",
    "line_number": 42,
    "stack_trace": [
      "Traceback (most recent call last):",
      "  File \"apps/users/views.py\", line 42, in get_user",
      "ValueError: invalid literal for int() with base 10: 'abc'"
    ]
  },
  "context": {
    "url": "/api/users/123",
    "method": "GET",
    "user_id": "u_12345"
  }
}
Runtime field reference
| Field | Type | Required | Description |
|---|---|---|---|
| project_id | string | ✅ | Project identifier |
| environment | string | ✅ | Environment: development, staging, production |
| level | string | ✅ | Log level: ERROR, WARNING, CRITICAL |
| source | string | ❌ | Defaults to runtime; no need to send it |
| timestamp | string | ❌ | ISO 8601 format; server time is used when omitted |
| version | string | ❌ | Application version |
| commit_hash | string | ❌ | Git commit hash |
| error.type | string | ✅ | Exception type, e.g. ValueError, TypeError |
| error.message | string | ✅ | Error message |
| error.file_path | string | ✅ | Path of the file where the error occurred |
| error.line_number | int | ✅ | Line number where the error occurred |
| error.stack_trace | array | ✅ | Stack trace (array or string) |
| context | object | ❌ | Additional context |
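The required fields in the table can be checked on the client before a report is sent. This is a rough sketch under that assumption only: check_runtime_payload and its messages are hypothetical names, and the server's actual validation is not described in this guide.

```python
ALLOWED_LEVELS = {"ERROR", "WARNING", "CRITICAL"}
REQUIRED_TOP = ("project_id", "environment", "level")
REQUIRED_ERROR = ("type", "message", "file_path", "line_number", "stack_trace")


def check_runtime_payload(payload: dict) -> list:
    """Return a list of problems found in a runtime report payload."""
    problems = []
    for key in REQUIRED_TOP:
        if key not in payload:
            problems.append(f"missing {key}")
    # Only the three levels listed in the field table are accepted by this check
    if "level" in payload and payload["level"] not in ALLOWED_LEVELS:
        problems.append(f"unexpected level: {payload['level']}")
    error = payload.get("error") or {}
    for key in REQUIRED_ERROR:
        if key not in error:
            problems.append(f"missing error.{key}")
    return problems
```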
Python (Django / FastAPI)
import os
import traceback

import requests

LOG_CENTER_URL = os.getenv("LOG_CENTER_URL", "http://localhost:8002")


def report_error(exc, context=None):
    """Report a runtime error to Log Center."""
    tb = traceback.extract_tb(exc.__traceback__)
    last_frame = tb[-1] if tb else None
    payload = {
        "project_id": "rtc_backend",
        "environment": os.getenv("ENVIRONMENT", "development"),
        "level": "ERROR",
        "error": {
            "type": type(exc).__name__,
            "message": str(exc),
            "file_path": last_frame.filename if last_frame else "unknown",
            "line_number": last_frame.lineno if last_frame else 0,
            # The three-argument form also works on Python versions before 3.10
            "stack_trace": traceback.format_exception(type(exc), exc, exc.__traceback__),
        },
        "context": context or {},
    }
    try:
        requests.post(
            f"{LOG_CENTER_URL}/api/v1/logs/report",
            json=payload,
            timeout=3,
        )
    except Exception:
        pass  # Fail silently so the main business logic is not affected
Django integration point (custom_exception_handler in utils/exceptions.py):
def custom_exception_handler(exc, context):
    # Report to Log Center
    report_error(exc, {
        "view": str(context.get("view")),
        "request_path": context.get("request").path if context.get("request") else None,
    })
    # ... existing logic unchanged ...
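FastAPI integration point (add a global exception handler):
from fastapi import Request
from fastapi.concurrency import run_in_threadpool
from fastapi.responses import JSONResponse


# `app` is your existing FastAPI instance
@app.exception_handler(Exception)
async def global_exception_handler(request: Request, exc: Exception):
    # report_error is synchronous, so run it in a worker thread rather than awaiting it directly
    await run_in_threadpool(report_error, exc, {
        "url": str(request.url),
        "method": request.method,
    })
    return JSONResponse(status_code=500, content={"detail": "Internal Server Error"})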
JavaScript / TypeScript (React / Vue)
const LOG_CENTER_URL = import.meta.env.VITE_LOG_CENTER_URL || 'http://localhost:8002';

export function reportError(error: Error, context?: Record<string, unknown>) {
  const stackLines = error.stack?.split('\n') || [];
  // Parse "at fn (file:line:col)" from the first stack frame
  const match = stackLines[1]?.match(/at\s+.*\s+\((.+):(\d+):\d+\)/);
  const payload = {
    project_id: 'rtc_web',
    environment: import.meta.env.MODE,
    level: 'ERROR',
    error: {
      type: error.name,
      message: error.message,
      file_path: match?.[1] || 'unknown',
      line_number: parseInt(match?.[2] || '0', 10),
      stack_trace: stackLines,
    },
    context: {
      url: window.location.href,
      userAgent: navigator.userAgent,
      ...context,
    },
  };
  const blob = new Blob([JSON.stringify(payload)], { type: 'application/json' });
  if (navigator.sendBeacon) {
    navigator.sendBeacon(`${LOG_CENTER_URL}/api/v1/logs/report`, blob);
  } else {
    fetch(`${LOG_CENTER_URL}/api/v1/logs/report`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
      keepalive: true,
    }).catch(() => {});
  }
}
Global error capture, in the main.tsx / main.ts entry file:
// JS runtime exceptions
window.onerror = (_message, source, lineno, colno, error) => {
  if (error) reportError(error, { source, lineno, colno });
};
// Unhandled Promise rejections
window.onunhandledrejection = (event: PromiseRejectionEvent) => {
  const error = event.reason instanceof Error
    ? event.reason
    : new Error(String(event.reason));
  reportError(error, { type: 'unhandledrejection' });
};
Axios interceptor, in api.ts / request.ts (reports only 5xx server errors):
api.interceptors.response.use(
  (response) => response,
  (error: AxiosError) => {
    if (error.response && error.response.status >= 500) {
      reportError(error, {
        api_url: error.config?.url,
        method: error.config?.method,
        status: error.response.status,
      });
    }
    return Promise.reject(error);
  },
);
Flutter (Dart)
import 'dart:convert';
import 'package:http/http.dart' as http;

const logCenterUrl = String.fromEnvironment(
  'LOG_CENTER_URL',
  defaultValue: 'http://localhost:8002',
);

Future<void> reportError(dynamic error, StackTrace stackTrace, {Map<String, dynamic>? context}) async {
  final stackLines = stackTrace.toString().split('\n');
  // Parse "(file:line:col)" from the top stack frame
  final match = RegExp(r'#0\s+.*\((.+):(\d+):\d+\)').firstMatch(stackLines.first);
  final payload = {
    'project_id': 'airhub_app',
    'environment': const String.fromEnvironment('ENVIRONMENT', defaultValue: 'development'),
    'level': 'ERROR',
    'error': {
      'type': error.runtimeType.toString(),
      'message': error.toString(),
      'file_path': match?.group(1) ?? 'unknown',
      'line_number': int.tryParse(match?.group(2) ?? '0') ?? 0,
      'stack_trace': stackLines.take(20).toList(),
    },
    'context': context ?? {},
  };
  try {
    await http.post(
      Uri.parse('$logCenterUrl/api/v1/logs/report'),
      headers: {'Content-Type': 'application/json'},
      body: jsonEncode(payload),
    ).timeout(const Duration(seconds: 3));
  } catch (_) {
    // Fail silently
  }
}
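Global capture, in main.dart:
import 'dart:async'; // runZonedGuarded
import 'package:flutter/material.dart';

void main() {
  // Errors caught by the Flutter framework
  FlutterError.onError = (details) {
    reportError(details.exception, details.stack ?? StackTrace.current);
  };
  // Uncaught asynchronous errors outside the framework
  runZonedGuarded(() {
    runApp(const MyApp());
  }, (error, stack) {
    reportError(error, stack);
  });
}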
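Step 3: Integrate CI/CD error reporting
source: "cicd"
Add failure reporting to the Gitea Actions pipeline so that when a build, test, or deploy step fails, the actual error log is captured and reported to Log Center.
Key points
- Capture logs with tee: the output of build and deploy steps must be captured with 2>&1 | tee /tmp/xxx.log; otherwise the reported stack_trace is empty
- Use github.run_number: URLs must use ${{ github.run_number }} (the per-repository sequence number), not github.run_id (a global ID that links to the wrong page)
- Use the ${{ }} template syntax: it is more reliable than the $GITHUB_* environment variables
- Use a single combined reporting step: one if: failure() step determines which stage failed and collects the matching log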
Report format
{
  "project_id": "rtc_backend",
  "environment": "main",
  "level": "ERROR",
  "source": "cicd",
  "commit_hash": "abc1234def5678",
  "repo_url": "https://gitea.airlabs.art/zyc/rtc_backend.git",
  "error": {
    "type": "CICDFailure",
    "message": "[build] Build and Deploy failed on branch main",
    "stack_trace": ["...last 50 lines of the actual build log..."]
  },
  "context": {
    "job_name": "build-and-deploy",
    "step_name": "build",
    "workflow": "Build and Deploy",
    "run_id": "24",
    "branch": "main",
    "actor": "zyc",
    "commit": "abc1234def5678",
    "run_url": "https://gitea.airlabs.art/zyc/rtc_backend/actions/runs/24"
  }
}
CI/CD-specific fields
| Field | Description |
|---|---|
| source | Must be set to "cicd" |
| environment | Use the branch name ${{ github.ref_name }}, e.g. main, dev |
| repo_url | Repository URL, so the Repair Agent can associate the error with the repository |
| error.type | CICDFailure (generic) is recommended, or DockerBuildError / TestFailure / DeployError |
| error.stack_trace | The actual error log (captured with tee); do not hard-code placeholder text |
| context.run_id | Must use ${{ github.run_number }} (not github.run_id) |
| context.run_url | Built as https://gitea.airlabs.art/${{ github.repository }}/actions/runs/${{ github.run_number }} |
| context.step_name | Name of the failed step |
| context.actor | User who triggered the run |
| context.commit | Full commit hash |
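As an illustration of how these fields fit together, here is a minimal Python sketch that assembles a CI/CD payload from values a pipeline would supply. The helper names build_cicd_payload and encode are hypothetical, and only a subset of the context fields is filled in. Serializing with json.dumps lets the library handle quotes, backslashes, and newlines in the captured log.

```python
import json

GITEA_BASE = "https://gitea.airlabs.art"


def build_cicd_payload(project_id, repository, run_number, branch, sha,
                       step_name, log_text, max_lines=50):
    """Assemble a CI/CD report; `repository` is "owner/name" as in github.repository."""
    tail = log_text.splitlines()[-max_lines:]  # keep only the end of the log
    return {
        "project_id": project_id,
        "environment": branch,
        "level": "ERROR",
        "source": "cicd",
        "commit_hash": sha,
        "repo_url": f"{GITEA_BASE}/{repository}.git",
        "error": {
            "type": "CICDFailure",
            "message": f"[{step_name}] pipeline failed on branch {branch}",
            "stack_trace": tail or ["No captured output."],
        },
        "context": {
            "step_name": step_name,
            "run_id": str(run_number),  # run_number, not run_id
            "branch": branch,
            "commit": sha,
            "run_url": f"{GITEA_BASE}/{repository}/actions/runs/{run_number}",
        },
    }


def encode(payload: dict) -> str:
    """Serialize the payload as the JSON request body."""
    return json.dumps(payload, ensure_ascii=False)
```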
Gitea Actions integration (recommended)
The following is a complete example. The key points: build steps capture their logs with tee, and a single combined reporting step at the end determines which stage failed.
name: Build and Deploy
on:
  push:
    branches: [main]
jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # ===== Build step: capture logs with tee =====
      - name: Build Docker Image
        id: build
        run: |
          set -o pipefail
          docker buildx build \
            --push \
            --provenance=false \
            --tag your-registry/your-app:latest \
            . 2>&1 | tee /tmp/build.log
      - name: Deploy
        id: deploy
        run: |
          set -o pipefail
          {
            kubectl apply -f k8s/deployment.yaml
            kubectl rollout restart deployment/your-app
          } 2>&1 | tee /tmp/deploy.log
      # ===== Failure report (single combined step) =====
      - name: Report failure to Log Center
        if: failure()
        run: |
          # Determine which step failed and collect its log.
          # The sed pipeline escapes backslashes and double quotes, then joins
          # lines with \n so the log fits inside a JSON string.
          BUILD_LOG=""
          DEPLOY_LOG=""
          FAILED_STEP="unknown"
          if [[ "${{ steps.build.outcome }}" == "failure" ]]; then
            FAILED_STEP="build"
            if [ -f /tmp/build.log ]; then
              BUILD_LOG=$(tail -50 /tmp/build.log | sed 's/\\/\\\\/g; s/"/\\"/g' | sed ':a;N;$!ba;s/\n/\\n/g')
            fi
          elif [[ "${{ steps.deploy.outcome }}" == "failure" ]]; then
            FAILED_STEP="deploy"
            if [ -f /tmp/deploy.log ]; then
              DEPLOY_LOG=$(tail -50 /tmp/deploy.log | sed 's/\\/\\\\/g; s/"/\\"/g' | sed ':a;N;$!ba;s/\n/\\n/g')
            fi
          fi
          ERROR_LOG="${BUILD_LOG}${DEPLOY_LOG}"
          if [ -z "$ERROR_LOG" ]; then
            ERROR_LOG="No captured output. Check Gitea Actions UI for details."
          fi
          # Choose the source and error type
          if [[ "$FAILED_STEP" == "deploy" ]]; then
            SOURCE="deployment"
            ERROR_TYPE="DeployError"
          else
            SOURCE="cicd"
            ERROR_TYPE="DockerBuildError"
          fi
          curl -s -X POST "https://qiyuan-log-center-api.airlabs.art/api/v1/logs/report" \
            -H "Content-Type: application/json" \
            -d "{
              \"project_id\": \"your_project_id\",
              \"environment\": \"${{ github.ref_name }}\",
              \"level\": \"ERROR\",
              \"source\": \"${SOURCE}\",
              \"commit_hash\": \"${{ github.sha }}\",
              \"repo_url\": \"https://gitea.airlabs.art/zyc/your_project.git\",
              \"error\": {
                \"type\": \"${ERROR_TYPE}\",
                \"message\": \"[${FAILED_STEP}] Build and Deploy failed on branch ${{ github.ref_name }}\",
                \"stack_trace\": [\"${ERROR_LOG}\"]
              },
              \"context\": {
                \"job_name\": \"build-and-deploy\",
                \"step_name\": \"${FAILED_STEP}\",
                \"workflow\": \"${{ github.workflow }}\",
                \"run_id\": \"${{ github.run_number }}\",
                \"branch\": \"${{ github.ref_name }}\",
                \"actor\": \"${{ github.actor }}\",
                \"commit\": \"${{ github.sha }}\",
                \"run_url\": \"https://gitea.airlabs.art/${{ github.repository }}/actions/runs/${{ github.run_number }}\"
              }
            }" || true
Using the report-cicd-error.sh script
The project provides a generic reporting script, scripts/report-cicd-error.sh (requires jq), which can be called from CI steps:
# Usage: ./scripts/report-cicd-error.sh <project_id> <step_name> <error_message_or_file>
./scripts/report-cicd-error.sh rtc_backend "Build Docker Image" "Docker build failed: exit code 1"
./scripts/report-cicd-error.sh rtc_backend "Run Tests" /tmp/test-output.log
The script automatically:
- Infers error_type from the step name (DockerBuildError / NpmBuildError / TestFailure / LintError)
- Reads the Gitea Actions environment variables to fill in context
- Reads the last 100 lines as stack_trace when a file path is passed
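The script itself is not shown in this guide, so the following Python sketch only mirrors the behaviour described above. The keyword rules in infer_error_type are assumptions (the real script may match step names differently), and both function names are hypothetical. The fallback CICDFailure is the generic type recommended earlier.

```python
import os


def infer_error_type(step_name: str) -> str:
    """Guess an error_type from the step name (keyword rules are assumed)."""
    name = step_name.lower()
    if "docker" in name:
        return "DockerBuildError"
    if "npm" in name:
        return "NpmBuildError"
    if "test" in name:
        return "TestFailure"
    if "lint" in name:
        return "LintError"
    return "CICDFailure"  # generic fallback


def read_stack_trace(message_or_file: str, max_lines: int = 100) -> list:
    """Return the last max_lines lines of a log file, or the message itself."""
    if os.path.isfile(message_or_file):
        with open(message_or_file, encoding="utf-8", errors="replace") as fh:
            return fh.read().splitlines()[-max_lines:]
    return [message_or_file]
```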
Step 4: Integrate K8s deployment error reporting
source: "deployment"
A K8s Pod health monitoring CronJob scans the cluster on a schedule and reports abnormal Pods to Log Center.
Report format
{
  "project_id": "rtc_backend",
  "environment": "production",
  "level": "CRITICAL",
  "source": "deployment",
  "error": {
    "type": "CrashLoopBackOff",
    "message": "CrashLoopBackOff: back-off restarting failed container (pod: rtc-backend-xxx, container: api)",
    "file_path": null,
    "line_number": null,
    "stack_trace": ["...container log before the crash (last 50 lines)..."]
  },
  "context": {
    "namespace": "default",
    "pod_name": "rtc-backend-xxx-yyy",
    "container_name": "api",
    "deployment_name": "rtc-backend",
    "restart_count": 5,
    "node_name": "node-1"
  }
}
Deployment-specific fields
| Field | Description |
|---|---|
| source | Must be set to "deployment" |
| level | "CRITICAL" is recommended, since Pod failures are usually severe |
| error.type | Taken from the K8s state: CrashLoopBackOff, OOMKilled, ImagePullBackOff, ErrImagePull, etc. |
| error.file_path | May be null |
| error.line_number | May be null |
| error.stack_trace | Container log output before the crash |
| context.namespace | K8s namespace |
| context.pod_name | Pod name |
| context.deployment_name | Deployment name (used for fingerprint deduplication) |
| context.restart_count | Restart count |
| context.node_name | Node name |
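Monitored abnormal states
| State | Description |
|---|---|
| CrashLoopBackOff | The container repeatedly crashes and restarts |
| OOMKilled | The container was killed for exceeding its memory limit |
| ImagePullBackOff / ErrImagePull | Image pull failed |
| CreateContainerConfigError | Container configuration error |
| RunContainerError | Container failed to start |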
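To illustrate how such states could be detected, the sketch below inspects a Pod object in the JSON shape returned by the Kubernetes API (status.containerStatuses with state and lastState). It is not the monitor's actual code: find_abnormal_containers is a hypothetical name, and the choice to report the first matching reason is an assumption.

```python
MONITORED_REASONS = {
    "CrashLoopBackOff",
    "OOMKilled",
    "ImagePullBackOff",
    "ErrImagePull",
    "CreateContainerConfigError",
    "RunContainerError",
}


def find_abnormal_containers(pod: dict) -> list:
    """Return one finding per container whose current or last state is monitored."""
    findings = []
    for cs in (pod.get("status") or {}).get("containerStatuses") or []:
        reasons = []
        # Check the current state first, then the previous one
        for state_key in ("state", "lastState"):
            state = cs.get(state_key) or {}
            for phase in ("waiting", "terminated"):
                reason = (state.get(phase) or {}).get("reason")
                if reason in MONITORED_REASONS:
                    reasons.append(reason)
        if reasons:
            findings.append({
                "container_name": cs.get("name"),
                "error_type": reasons[0],
                "restart_count": cs.get("restartCount", 0),
            })
    return findings
```

Each finding carries the values that map onto error.type, context.container_name, and context.restart_count in the report format.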
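Integration method: automatic mapping
The K8s Monitor CronJob is already running in the cluster and scans every 5 minutes. On startup, the Monitor fetches the project list from the Log Center API (GET /api/v1/projects) and generates the app label -> project_id mapping automatically.
Mapping rule: underscores in project_id are replaced with hyphens to form the app label, and a variant with the -dev suffix is generated as well.
| project_id | Generated app labels |
|---|---|
| rtc_backend | rtc-backend, rtc-backend-dev |
| rtc_web | rtc-web, rtc-web-dev |
| log_center_api | log-center-api, log-center-api-dev |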
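The mapping rule is small enough to sketch directly. The function name build_app_to_project is hypothetical, and the input is assumed to be a plain list of project_id strings already extracted from the project list response.

```python
def build_app_to_project(project_ids) -> dict:
    """Map each generated app label (and its -dev variant) back to its project_id."""
    mapping = {}
    for project_id in project_ids:
        label = project_id.replace("_", "-")  # underscores become hyphens
        mapping[label] = project_id
        mapping[f"{label}-dev"] = project_id  # development variant
    return mapping
```

For example, build_app_to_project(["rtc_backend"]) yields entries for rtc-backend and rtc-backend-dev, both pointing at rtc_backend.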
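Adding a new project to K8s monitoring takes only two steps:
- Complete the project registration in step 1 (make sure the project appears in the Log Center project list)
- Set the app label of the K8s Deployment to the hyphenated form of project_id
Make sure your K8s Deployment has an app label:
metadata:
  labels:
    app: your-app  # must match a generated app label, i.e. the hyphenated project_id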
CronJob deployment configuration
If the Monitor is not yet deployed in the cluster, use the following configuration:
# k8s/monitor-cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pod-health-monitor
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: pod-monitor
          containers:
            - name: monitor
              image: your-registry/k8s-pod-monitor:latest
              env:
                - name: LOG_CENTER_URL
                  value: "https://qiyuan-log-center-api.airlabs.art"
                - name: MONITOR_NAMESPACE
                  value: "default"
          restartPolicy: OnFailure
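Error deduplication
Log Center deduplicates errors by fingerprint. Each of the three sources uses a different fingerprint strategy:
| Source | Fingerprint components |
|---|---|
| runtime | MD5(project_id \| error_type \| file_path \| line_number) |
| cicd | MD5(project_id \| cicd \| error_type \| job_name \| step_name) |
| deployment | MD5(project_id \| deployment \| error_type \| namespace \| deployment_name) |
Errors with the same fingerprint are recorded only once. A fixed error that appears again is reopened automatically (regression detection).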
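The fingerprint components can be combined as in the sketch below. The component lists come from the table; the "|" separator, UTF-8 encoding, and hex output are assumptions, so the digests will not necessarily match what the server computes. All function names are hypothetical.

```python
import hashlib


def _md5(*fields) -> str:
    """Join the fields with "|" (assumed separator) and return the MD5 hex digest."""
    raw = "|".join("" if f is None else str(f) for f in fields)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def runtime_fingerprint(project_id, error_type, file_path, line_number):
    return _md5(project_id, error_type, file_path, line_number)


def cicd_fingerprint(project_id, error_type, job_name, step_name):
    return _md5(project_id, "cicd", error_type, job_name, step_name)


def deployment_fingerprint(project_id, error_type, namespace, deployment_name):
    return _md5(project_id, "deployment", error_type, namespace, deployment_name)
```

Because the Pod name is not part of the deployment fingerprint, restarts of the same Deployment that produce new Pod names still collapse into one error.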
Error status flow
NEW → VERIFYING → PENDING_FIX → FIXING → FIXED → VERIFIED → DEPLOYED
          ↓                       ↓
  CANNOT_REPRODUCE            FIX_FAILED
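| Status | Description |
|---|---|
| NEW | Newly reported error |
| VERIFYING | Reproduction is being verified |
| CANNOT_REPRODUCE | Cannot be reproduced |
| PENDING_FIX | Waiting for a fix |
| FIXING | The AI Agent is working on a fix |
| FIXED | Fixed, pending verification |
| VERIFIED | Fix verified |
| DEPLOYED | Deployed to production |
| FIX_FAILED | Fix failed |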
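The status flow can be written down as a small transition table. This sketch assumes that CANNOT_REPRODUCE branches off VERIFYING and FIX_FAILED branches off FIXING, which is the natural reading of the diagram but is not stated explicitly. It also leaves out the regression reopening path, whose target status is not documented here. The names ErrorStatus and can_transition are hypothetical.

```python
from enum import Enum


class ErrorStatus(str, Enum):
    NEW = "NEW"
    VERIFYING = "VERIFYING"
    CANNOT_REPRODUCE = "CANNOT_REPRODUCE"
    PENDING_FIX = "PENDING_FIX"
    FIXING = "FIXING"
    FIXED = "FIXED"
    VERIFIED = "VERIFIED"
    DEPLOYED = "DEPLOYED"
    FIX_FAILED = "FIX_FAILED"


# Allowed next statuses; branch points are assumed from the diagram
TRANSITIONS = {
    ErrorStatus.NEW: {ErrorStatus.VERIFYING},
    ErrorStatus.VERIFYING: {ErrorStatus.PENDING_FIX, ErrorStatus.CANNOT_REPRODUCE},
    ErrorStatus.PENDING_FIX: {ErrorStatus.FIXING},
    ErrorStatus.FIXING: {ErrorStatus.FIXED, ErrorStatus.FIX_FAILED},
    ErrorStatus.FIXED: {ErrorStatus.VERIFIED},
    ErrorStatus.VERIFIED: {ErrorStatus.DEPLOYED},
}


def can_transition(current: ErrorStatus, target: ErrorStatus) -> bool:
    """Return True when target is a documented next status of current."""
    return target in TRANSITIONS.get(current, set())
```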
API reference
Report an error log
POST /api/v1/logs/report
Response:
// New error
{"message": "Log reported", "id": 123}
// Duplicate error (deduplicated)
{"message": "Log deduplicated", "id": 123, "status": "NEW"}
// Regression (a fixed error appeared again)
{"message": "Regression detected, reopened", "id": 123}
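A client that wants to react differently to the three outcomes can branch on the message field. This is a small sketch based only on the three sample responses above; classify_report_response and its return labels are hypothetical, and matching on message text is an assumption that would break if the server wording changes.

```python
def classify_report_response(body: dict) -> str:
    """Classify a report response as "new", "duplicate", "regression", or "unknown"."""
    message = body.get("message", "")
    if message.startswith("Regression detected"):
        return "regression"
    if message == "Log deduplicated":
        return "duplicate"
    if message == "Log reported":
        return "new"
    return "unknown"
```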
Project management API
| Method | Path | Description |
|---|---|---|
| GET | /api/v1/projects | List projects |
| GET | /api/v1/projects/{project_id} | Get project details |
| PUT | /api/v1/projects/{project_id} | Edit project configuration |
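Best practices
- Set a timeout: give report requests a 3-second timeout so they do not affect the main business logic
- Fail silently: a failed report should not affect the user experience, so every catch block swallows the error
- Report asynchronously: send reports without blocking the main flow
- Add context: include useful context wherever possible (user ID, request URL, etc.)
- Distinguish environments: set the environment field correctly to separate development from production
- Use || true in CI/CD: a failing report step should not block the pipeline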
Environment variable configuration
Python projects
# .env
LOG_CENTER_URL=http://localhost:8002
ENVIRONMENT=development
JavaScript projects
# .env
VITE_LOG_CENTER_URL=http://localhost:8002
Flutter projects
# Passed at compile time; both values go in the same command
flutter run \
  --dart-define=LOG_CENTER_URL=http://localhost:8002 \
  --dart-define=ENVIRONMENT=development
Gitea Actions
env:
  LOG_CENTER_URL: https://qiyuan-log-center-api.airlabs.art