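# Log Center Integration Guide

## Overview

Log Center is a centralized platform for error log collection and AI-driven automatic repair. It exposes a REST API that any project can integrate with.

Once integrated, three types of error reports are covered:

| Type | `source` value | Description | Trigger |
|------|----------------|-------------|---------|
| Runtime errors | `runtime` | Exceptions raised while the application is running (Python/JS/Dart) | A global exception handler in the code reports automatically |
| CI/CD errors | `cicd` | Pipeline failures such as build, test, or lint | Reported when a Gitea Actions step fails |
| K8s deployment errors | `deployment` | Abnormal Pod states (CrashLoopBackOff, OOMKilled, etc.) | Reported by a K8s CronJob that scans periodically |

**Full integration flow:**

1. **Register project information**: call the API to submit project metadata (name, repository URL, local path)
2. **Integrate runtime error reporting**: add global exception capture to the application code
3. **Integrate CI/CD error reporting**: add a failure-reporting step to the Gitea Actions pipeline
4. **Integrate K8s deployment error reporting**: add a project mapping to the K8s Pod health monitor

> **Important**: Step 1 must be completed first; otherwise the Repair Agent cannot locate the code repository or the local path.

---

## Service URLs

| Environment | API URL | Dashboard |
|-------------|---------|-----------|
| Local development | `http://localhost:8002` | `http://localhost:8003` |
| Production | `https://qiyuan-log-center-api.airlabs.art` | `https://qiyuan-log-center.airlabs.art` |

---

## Step 1: Register project information

When integrating with Log Center for the first time, **you must register the project information first**. The Repair Agent depends on it to work.

### How to register

First report an initialization log (which triggers automatic project creation), then call the PUT endpoint to fill in the metadata:

```bash
# 1. Report an initialization log to trigger automatic project creation
curl -X POST "${LOG_CENTER_URL}/api/v1/logs/report" \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "your_project_id",
    "environment": "production",
    "level": "WARNING",
    "error": {
      "type": "ProjectInit",
      "message": "Project registered to Log Center",
      "stack_trace": ["Project initialization"]
    },
    "repo_url": "https://gitea.airlabs.art/team/your_project.git"
  }'

# 2. Fill in the project metadata
curl -X PUT "${LOG_CENTER_URL}/api/v1/projects/your_project_id" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Project display name",
    "repo_url": "https://gitea.airlabs.art/team/your_project.git",
    "local_path": "/absolute/path/to/project",
    "description": "Project description"
  }'
```

### Registration examples by language

#### Python

```python
import requests
import os

LOG_CENTER_URL = os.getenv("LOG_CENTER_URL", "http://localhost:8002")


def register_project():
    """Call on first integration to register the project with Log Center."""
    project_id = "your_project_id"

    # 1.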
    # Report an initialization log to trigger project creation
    requests.post(f"{LOG_CENTER_URL}/api/v1/logs/report", json={
        "project_id": project_id,
        "environment": os.getenv("ENVIRONMENT", "production"),
        "level": "WARNING",
        "error": {
            "type": "ProjectInit",
            "message": "Project registered to Log Center",
            "stack_trace": ["Project initialization"],
        },
        "repo_url": "https://gitea.airlabs.art/team/your_project.git",
    }, timeout=5)

    # 2. Fill in the project metadata
    requests.put(f"{LOG_CENTER_URL}/api/v1/projects/{project_id}", json={
        "name": "Project display name",
        "repo_url": "https://gitea.airlabs.art/team/your_project.git",
        "local_path": "/absolute/path/to/project",
        "description": "Project description",
    }, timeout=5)
```

#### JavaScript / TypeScript

```typescript
const LOG_CENTER_URL = import.meta.env.VITE_LOG_CENTER_URL || 'http://localhost:8002';

async function registerProject() {
  const projectId = 'your_project_id';

  // 1. Report an initialization log to trigger project creation
  await fetch(`${LOG_CENTER_URL}/api/v1/logs/report`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      project_id: projectId,
      environment: import.meta.env.MODE,
      level: 'WARNING',
      error: {
        type: 'ProjectInit',
        message: 'Project registered to Log Center',
        stack_trace: ['Project initialization'],
      },
      repo_url: 'https://gitea.airlabs.art/team/your_project.git',
    }),
  });

  // 2.
  // Fill in the project metadata
  await fetch(`${LOG_CENTER_URL}/api/v1/projects/${projectId}`, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      name: 'Project display name',
      repo_url: 'https://gitea.airlabs.art/team/your_project.git',
      local_path: '/absolute/path/to/project',
      description: 'Project description',
    }),
  });
}
```

### Project metadata fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `project_id` | string | ✅ | Unique project identifier, e.g. `rtc_backend`, `rtc_web` |
| `name` | string | ✅ | Project display name |
| `repo_url` | string | ✅ | Git repository URL (used by the Repair Agent to clone and push code) |
| `local_path` | string | ✅ | Absolute local path of the project (the Repair Agent runs its fixes in this directory) |
| `description` | string | ❌ | Project description |

---

## Step 2: Integrate runtime error reporting

> `source: "runtime"` (the default; may be omitted)

Add global exception capture to the application code so that unhandled runtime exceptions are reported to Log Center automatically.

### Report format

```json
{
  "project_id": "rtc_backend",
  "environment": "production",
  "level": "ERROR",
  "error": {
    "type": "ValueError",
    "message": "invalid literal for int() with base 10: 'abc'",
    "file_path": "apps/users/views.py",
    "line_number": 42,
    "stack_trace": [
      "Traceback (most recent call last):",
      "  File \"apps/users/views.py\", line 42, in get_user",
      "ValueError: invalid literal for int() with base 10: 'abc'"
    ]
  },
  "context": {
    "url": "/api/users/123",
    "method": "GET",
    "user_id": "u_12345"
  }
}
```

### Runtime field reference

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `project_id` | string | ✅ | Project identifier |
| `environment` | string | ✅ | Environment: `development`, `staging`, `production` |
| `level` | string | ✅ | Log level: `ERROR`, `WARNING`, `CRITICAL` |
| `source` | string | ❌ | Defaults to `runtime`; no need to send it |
| `timestamp` | string | ❌ | ISO 8601 format; the server time is used if omitted |
| `version` | string | ❌ | Application version |
| `commit_hash` | string | ❌ | Git commit hash |
| `error.type` | string | ✅ | Exception type, e.g. `ValueError`, `TypeError` |
| `error.message` | string | ✅ | Error message |
| `error.file_path` | string | ✅ | Path of the file where the error occurred |
| `error.line_number` | int | ✅ | Line number where the error occurred |
| `error.stack_trace` | array | ✅ | Stack trace (array or string) |
| `context` | object | ❌ | Additional context information |

### Python (Django / FastAPI)

```python
import requests
import traceback
import os

LOG_CENTER_URL = os.getenv("LOG_CENTER_URL", "http://localhost:8002")


def report_error(exc, context=None):
    """Report a runtime error to Log Center."""
    tb = traceback.extract_tb(exc.__traceback__)
    last_frame = tb[-1] if tb else None

    payload = {
        "project_id": "rtc_backend",
        "environment": os.getenv("ENVIRONMENT", "development"),
        "level": "ERROR",
        "error": {
            "type": type(exc).__name__,
            "message": str(exc),
            "file_path": last_frame.filename if last_frame else "unknown",
            "line_number": last_frame.lineno if last_frame else 0,
            # Three-argument form: the single-argument form only works on Python 3.10+
            "stack_trace": traceback.format_exception(type(exc), exc, exc.__traceback__),
        },
        "context": context or {},
    }

    try:
        requests.post(
            f"{LOG_CENTER_URL}/api/v1/logs/report",
            json=payload,
            timeout=3,
        )
    except Exception:
        pass  # Fail silently so the main business logic is not affected
```

**Where to integrate in Django**: modify `custom_exception_handler` in `utils/exceptions.py`:

```python
def custom_exception_handler(exc, context):
    # Report to Log Center
    report_error(exc, {
        "view": str(context.get("view")),
        "request_path": context.get("request").path if context.get("request") else None,
    })

    # ... existing logic unchanged ...
```

**Where to integrate in FastAPI**: add a global exception handler:

```python
from fastapi import Request
from fastapi.concurrency import run_in_threadpool
from fastapi.responses import JSONResponse


@app.exception_handler(Exception)
async def global_exception_handler(request: Request, exc: Exception):
    # report_error is synchronous (it uses requests), so it cannot be awaited
    # directly; run it in a worker thread to avoid blocking the event loop.
    await run_in_threadpool(report_error, exc, {
        "url": str(request.url),
        "method": request.method,
    })
    return JSONResponse(status_code=500, content={"detail": "Internal Server Error"})
```

### JavaScript / TypeScript (React / Vue)

```typescript
const LOG_CENTER_URL = import.meta.env.VITE_LOG_CENTER_URL || 'http://localhost:8002';

export function reportError(error: Error, context?: Record<string, unknown>) {
  const stackLines = error.stack?.split('\n') || [];
  // Extract file path and line number from the first stack frame
  const match = stackLines[1]?.match(/at\s+.*\s+\((.+):(\d+):\d+\)/);

  const payload = {
    project_id: 'rtc_web',
    environment: import.meta.env.MODE,
    level: 'ERROR',
    error: {
      type: error.name,
      message: error.message,
      file_path: match?.[1] || 'unknown',
      line_number: parseInt(match?.[2] || '0', 10),
      stack_trace: stackLines,
    },
    context: {
      url: window.location.href,
      userAgent: navigator.userAgent,
      ...context,
    },
  };

  // Prefer sendBeacon so the report survives page unload; fall back to fetch
  const blob = new Blob([JSON.stringify(payload)], { type: 'application/json' });
  if (navigator.sendBeacon) {
    navigator.sendBeacon(`${LOG_CENTER_URL}/api/v1/logs/report`, blob);
  } else {
    fetch(`${LOG_CENTER_URL}/api/v1/logs/report`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
      keepalive: true,
    }).catch(() => {});
  }
}
```

**Global error capture**: in the `main.tsx` / `main.ts` entry file:

```typescript
// JS runtime exceptions
window.onerror = (_message, source, lineno, colno, error) => {
  if (error) reportError(error, { source, lineno, colno });
};

// Unhandled Promise rejections
window.onunhandledrejection = (event: PromiseRejectionEvent) => {
  const error = event.reason instanceof Error ?
    event.reason : new Error(String(event.reason));
  reportError(error, { type: 'unhandledrejection' });
};
```

**Axios interceptor**: in `api.ts` / `request.ts` (reports only 5xx server-side errors):

```typescript
import type { AxiosError } from 'axios';

// `api` is the axios instance already defined in this file
api.interceptors.response.use(
  (response) => response,
  (error: AxiosError) => {
    if (error.response && error.response.status >= 500) {
      reportError(error, {
        api_url: error.config?.url,
        method: error.config?.method,
        status: error.response.status,
      });
    }
    return Promise.reject(error);
  },
);
```

### Flutter (Dart)

```dart
import 'dart:convert';
import 'package:http/http.dart' as http;

const logCenterUrl = String.fromEnvironment(
  'LOG_CENTER_URL',
  defaultValue: 'http://localhost:8002',
);

Future<void> reportError(dynamic error, StackTrace stackTrace, {Map<String, dynamic>? context}) async {
  final stackLines = stackTrace.toString().split('\n');
  // Extract file path and line number from the top stack frame
  final match = RegExp(r'#0\s+.*\((.+):(\d+):\d+\)').firstMatch(stackLines.first);

  final payload = {
    'project_id': 'airhub_app',
    'environment': const String.fromEnvironment('ENVIRONMENT', defaultValue: 'development'),
    'level': 'ERROR',
    'error': {
      'type': error.runtimeType.toString(),
      'message': error.toString(),
      'file_path': match?.group(1) ?? 'unknown',
      'line_number': int.tryParse(match?.group(2) ?? '0') ?? 0,
      'stack_trace': stackLines.take(20).toList(),
    },
    'context': context ?? {},
  };

  try {
    await http.post(
      Uri.parse('$logCenterUrl/api/v1/logs/report'),
      headers: {'Content-Type': 'application/json'},
      body: jsonEncode(payload),
    ).timeout(const Duration(seconds: 3));
  } catch (_) {
    // Fail silently
  }
}
```

**Global capture**: in `main.dart`:

```dart
import 'dart:async';

import 'package:flutter/material.dart';

void main() {
  // Errors raised inside the Flutter framework
  FlutterError.onError = (details) {
    reportError(details.exception, details.stack ?? StackTrace.current);
  };

  // Uncaught asynchronous errors outside the framework
  runZonedGuarded(() {
    runApp(const MyApp());
  }, (error, stack) {
    reportError(error, stack);
  });
}
```

---

## Step 3: Integrate CI/CD error reporting

> `source: "cicd"`

Add failure reporting to the Gitea Actions pipeline so that when a build, test, or deployment fails, the actual error log is captured and reported to Log Center automatically.

### Key points

1. **Capture logs with `tee`**: the output of build and deploy steps must be captured with `2>&1 | tee /tmp/xxx.log`; otherwise the reported `stack_trace` is empty
2.
   **Use `github.run_number`**: the URL must use `${{ github.run_number }}` (a per-repository sequence number). **Do not use `github.run_id`** (a global ID, which makes the link open the wrong page)
3. **Use the `${{ }}` template syntax**: it is more reliable than the `$GITHUB_*` environment variables
4. **A single consolidated reporting step**: one `if: failure()` step works out which stage failed and collects the matching log

### Report format

```json
{
  "project_id": "rtc_backend",
  "environment": "main",
  "level": "ERROR",
  "source": "cicd",
  "commit_hash": "abc1234def5678",
  "repo_url": "https://gitea.airlabs.art/zyc/rtc_backend.git",
  "error": {
    "type": "CICDFailure",
    "message": "[build] Build and Deploy failed on branch main",
    "stack_trace": ["...last 50 lines of the actual build log..."]
  },
  "context": {
    "job_name": "build-and-deploy",
    "step_name": "build",
    "workflow": "Build and Deploy",
    "run_id": "24",
    "branch": "main",
    "actor": "zyc",
    "commit": "abc1234def5678",
    "run_url": "https://gitea.airlabs.art/zyc/rtc_backend/actions/runs/24"
  }
}
```

### CI/CD-specific fields

| Field | Description |
|-------|-------------|
| `source` | **Must** be set to `"cicd"` |
| `environment` | Use the branch name `${{ github.ref_name }}`, e.g. `main`, `dev` |
| `repo_url` | Repository URL, so the Repair Agent can associate the error with the repo |
| `error.type` | `CICDFailure` (generic) is recommended, or `DockerBuildError` / `TestFailure` / `DeployError` |
| `error.stack_trace` | **The actual error log** (captured with `tee`); do not hard-code placeholder text |
| `context.run_id` | **Must use `${{ github.run_number }}`** (not `github.run_id`) |
| `context.run_url` | Built as: `https://gitea.airlabs.art/${{ github.repository }}/actions/runs/${{ github.run_number }}` |
| `context.step_name` | Name of the failed step |
| `context.actor` | User who triggered the run |
| `context.commit` | Full commit hash |

### Gitea Actions integration (recommended)

The following is a complete example. The key points: the build steps capture their logs with `tee`, and one consolidated reporting step at the end works out which stage failed.

```yaml
name: Build and Deploy

on:
  push:
    branches: [main]

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      # ===== Build step: capture logs with tee =====
      - name: Build Docker Image
        id: build
        run: |
          set -o pipefail
          docker buildx build \
            --push \
            --provenance=false \
            --tag your-registry/your-app:latest \
            . \
            2>&1 | tee /tmp/build.log

      - name: Deploy
        id: deploy
        run: |
          set -o pipefail
          {
            kubectl apply -f k8s/deployment.yaml
            kubectl rollout restart deployment/your-app
          } 2>&1 | tee /tmp/deploy.log

      # ===== Failure report (single consolidated step) =====
      - name: Report failure to Log Center
        if: failure()
        run: |
          # Work out which step failed and collect the matching log
          BUILD_LOG=""
          DEPLOY_LOG=""
          FAILED_STEP="unknown"

          # Escape backslashes and double quotes, then join lines with a literal \n,
          # so the log can be embedded in the JSON body below
          if [[ "${{ steps.build.outcome }}" == "failure" ]]; then
            FAILED_STEP="build"
            if [ -f /tmp/build.log ]; then
              BUILD_LOG=$(tail -50 /tmp/build.log | sed 's/\\/\\\\/g; s/"/\\"/g' | sed ':a;N;$!ba;s/\n/\\n/g')
            fi
          elif [[ "${{ steps.deploy.outcome }}" == "failure" ]]; then
            FAILED_STEP="deploy"
            if [ -f /tmp/deploy.log ]; then
              DEPLOY_LOG=$(tail -50 /tmp/deploy.log | sed 's/\\/\\\\/g; s/"/\\"/g' | sed ':a;N;$!ba;s/\n/\\n/g')
            fi
          fi

          ERROR_LOG="${BUILD_LOG}${DEPLOY_LOG}"
          if [ -z "$ERROR_LOG" ]; then
            ERROR_LOG="No captured output. Check Gitea Actions UI for details."
          fi

          # Choose the source
          if [[ "$FAILED_STEP" == "deploy" ]]; then
            SOURCE="deployment"
            ERROR_TYPE="DeployError"
          else
            SOURCE="cicd"
            ERROR_TYPE="DockerBuildError"
          fi

          curl -s -X POST "https://qiyuan-log-center-api.airlabs.art/api/v1/logs/report" \
            -H "Content-Type: application/json" \
            -d "{
              \"project_id\": \"your_project_id\",
              \"environment\": \"${{ github.ref_name }}\",
              \"level\": \"ERROR\",
              \"source\": \"${SOURCE}\",
              \"commit_hash\": \"${{ github.sha }}\",
              \"repo_url\": \"https://gitea.airlabs.art/zyc/your_project.git\",
              \"error\": {
                \"type\": \"${ERROR_TYPE}\",
                \"message\": \"[${FAILED_STEP}] Build and Deploy failed on branch ${{ github.ref_name }}\",
                \"stack_trace\": [\"${ERROR_LOG}\"]
              },
              \"context\": {
                \"job_name\": \"build-and-deploy\",
                \"step_name\": \"${FAILED_STEP}\",
                \"workflow\": \"${{ github.workflow }}\",
                \"run_id\": \"${{ github.run_number }}\",
                \"branch\": \"${{ github.ref_name }}\",
                \"actor\": \"${{ github.actor }}\",
                \"commit\": \"${{ github.sha }}\",
                \"run_url\": \"https://gitea.airlabs.art/${{ github.repository }}/actions/runs/${{ github.run_number }}\"
              }
            }" || true
```

### Using the report-cicd-error.sh script

The project provides a generic reporting script, `scripts/report-cicd-error.sh` (requires `jq`), that can be used in CI steps:

```bash
# Usage: ./scripts/report-cicd-error.sh
./scripts/report-cicd-error.sh rtc_backend "Build Docker Image" "Docker build failed: exit code 1"
./scripts/report-cicd-error.sh rtc_backend "Run Tests" /tmp/test-output.log
```

The script automatically:

- Infers `error_type` from the step name (DockerBuildError / NpmBuildError / TestFailure / LintError)
- Reads the Gitea Actions environment variables to populate `context`
- Reads the last 100 lines as `stack_trace` when a file path is passed

---

## Step 4: Integrate K8s deployment error reporting

> `source: "deployment"`

A K8s Pod health-monitoring CronJob periodically scans the cluster for abnormal Pods and reports them to Log Center.

### Report format

```json
{
  "project_id": "rtc_backend",
  "environment": "production",
  "level": "CRITICAL",
  "source": "deployment",
  "error": {
    "type": "CrashLoopBackOff",
    "message": "CrashLoopBackOff: back-off restarting failed container (pod: rtc-backend-xxx, container: api)",
    "file_path": null,
    "line_number": null,
    "stack_trace": ["...container logs before the crash (last 50 lines)..."]
  },
  "context": {
    "namespace": "default",
    "pod_name": "rtc-backend-xxx-yyy",
    "container_name": "api",
    "deployment_name": "rtc-backend",
    "restart_count": 5,
    "node_name": "node-1"
  }
}
```

### Deployment-specific fields

| Field | Description |
|-------|-------------|
| `source` | **Must** be set to `"deployment"` |
| `level` | `"CRITICAL"` is recommended; abnormal Pods are usually serious |
| `error.type` | Taken from the K8s status: `CrashLoopBackOff`, `OOMKilled`, `ImagePullBackOff`, `ErrImagePull`, etc. |
| `error.file_path` | May be `null` |
| `error.line_number` | May be `null` |
| `error.stack_trace` | Container log output before the crash |
| `context.namespace` | K8s namespace |
| `context.pod_name` | Pod name |
| `context.deployment_name` | Deployment name (used for fingerprint deduplication) |
| `context.restart_count` | Restart count |
| `context.node_name` | Node name |

### Monitored abnormal states

| State | Description |
|-------|-------------|
| `CrashLoopBackOff` | The container crashes and restarts repeatedly |
| `OOMKilled` | Killed for exceeding its memory limit |
| `ImagePullBackOff` / `ErrImagePull` | Image pull failed |
| `CreateContainerConfigError` | Container configuration error |
| `RunContainerError` | Container failed to start |

### Integration: automatic mapping

The K8s Monitor CronJob is already running in the cluster and scans every 5 minutes. On startup, the Monitor pulls the project list from the Log Center API (`GET /api/v1/projects`) and generates the app label -> project_id mapping automatically.
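
As a rough illustration, such a label map could be derived from a list of project IDs as in the sketch below. It assumes each label is the `project_id` with underscores replaced by hyphens, plus a `-dev` variant; `build_app_label_map` is a hypothetical helper, not the Monitor's actual code.

```python
def build_app_label_map(project_ids):
    """Map every derived app label to its project_id (illustrative only)."""
    mapping = {}
    for project_id in project_ids:
        # Assumed rule: underscores become hyphens
        label = project_id.replace("_", "-")
        mapping[label] = project_id
        # Assumed rule: a "-dev" variant points at the same project
        mapping[f"{label}-dev"] = project_id
    return mapping
```

With this sketch, a Pod labelled `app: rtc-backend-dev` would resolve to the project `rtc_backend`.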

**Mapping rule**: underscores in `project_id` are replaced with hyphens to form the app label, and a variant with the `-dev` suffix is generated as well.

| project_id | Generated app labels |
|---|---|
| `rtc_backend` | `rtc-backend`, `rtc-backend-dev` |
| `rtc_web` | `rtc-web`, `rtc-web-dev` |
| `log_center_api` | `log-center-api`, `log-center-api-dev` |

**A new project needs only two steps to join K8s monitoring**:

1. Complete project registration in Step 1 (make sure the project appears in the Log Center project list)
2. Use the hyphenated form of `project_id` as the `app` label of the K8s Deployment

Make sure your K8s Deployment has an `app` label:

```yaml
metadata:
  labels:
    app: your-app  # Must match a key in APP_TO_PROJECT
```

### CronJob deployment configuration

If the Monitor is not yet deployed in the cluster, use the following configuration:

```yaml
# k8s/monitor-cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pod-health-monitor
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: pod-monitor
          containers:
            - name: monitor
              image: your-registry/k8s-pod-monitor:latest
              env:
                - name: LOG_CENTER_URL
                  value: "https://qiyuan-log-center-api.airlabs.art"
                - name: MONITOR_NAMESPACE
                  value: "default"
          restartPolicy: OnFailure
```

---

## Error deduplication

Log Center deduplicates errors with a **fingerprint**. Each of the three sources uses its own fingerprint strategy:

| Source | Fingerprint composition |
|--------|-------------------------|
| `runtime` | `MD5(project_id \| error_type \| file_path \| line_number)` |
| `cicd` | `MD5(project_id \| cicd \| error_type \| job_name \| step_name)` |
| `deployment` | `MD5(project_id \| deployment \| error_type \| namespace \| deployment_name)` |

Errors with the same fingerprint are recorded only once. If an error that was already fixed appears again, it is reopened automatically (regression detection).

---

## Error status flow

```
NEW → VERIFYING → PENDING_FIX → FIXING → FIXED → VERIFIED → DEPLOYED
          ↓                        ↓
   CANNOT_REPRODUCE           FIX_FAILED
```

| Status | Description |
|--------|-------------|
| `NEW` | Newly reported error |
| `VERIFYING` | Reproduction is being verified |
| `CANNOT_REPRODUCE` | Could not be reproduced |
| `PENDING_FIX` | Waiting to be fixed |
| `FIXING` | The AI Agent is working on a fix |
| `FIXED` | Fixed, pending verification |
| `VERIFIED` | Fix verified |
| `DEPLOYED` | Deployed to production |
| `FIX_FAILED` | Fix failed |

---

## API reference

### Report an error log

**POST** `/api/v1/logs/report`

**Response:**

```json
// New error
{"message": "Log reported", "id": 123}

// Duplicate error (deduplicated)
{"message": "Log deduplicated", "id": 123, "status": "NEW"}

// Regression (a fixed error appeared again)
{"message": "Regression detected, reopened", "id": 123}
```

### Project management API

| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/projects` | List projects |
| GET | `/api/v1/projects/{project_id}` | Get project details |
| PUT | `/api/v1/projects/{project_id}` | Edit project configuration |

---

## Best practices

1. **Set a timeout**: give report requests a 3-second timeout so they do not affect the main business logic
2. **Fail silently**: a failed report must not affect the user experience; every catch block swallows the error
3. **Report asynchronously**: send reports asynchronously so they do not block the main flow
4. **Add context**: include as much useful context as possible (user ID, request URL, etc.)
5. **Distinguish environments**: set the `environment` field correctly to separate development from production
6. **Use `|| true` in CI/CD**: a failing report step must not block the pipeline

---

## Environment variable configuration

### Python projects

```bash
# .env
LOG_CENTER_URL=http://localhost:8002
ENVIRONMENT=development
```

### JavaScript projects

```bash
# .env
VITE_LOG_CENTER_URL=http://localhost:8002
```

### Flutter projects

```bash
# Passed in at compile time (both defines go on the same command)
flutter run \
  --dart-define=LOG_CENTER_URL=http://localhost:8002 \
  --dart-define=ENVIRONMENT=development
```

### Gitea Actions

```yaml
env:
  LOG_CENTER_URL: https://qiyuan-log-center-api.airlabs.art
```

---

## Full API documentation

Visit: [http://localhost:8002/docs](http://localhost:8002/docs)
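
To make the deduplication fingerprints described earlier more concrete, here is a minimal Python sketch. It assumes the fields are joined with a `|` separator and hashed as UTF-8 text; the function names and the exact serialization are hypothetical, and the server-side implementation may differ.

```python
import hashlib


def _md5_of(parts):
    """Join the parts with '|' and return the hex MD5 digest (assumed serialization)."""
    joined = "|".join(str(part) for part in parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def runtime_fingerprint(project_id, error_type, file_path, line_number):
    # runtime: project_id | error_type | file_path | line_number
    return _md5_of([project_id, error_type, file_path, line_number])


def cicd_fingerprint(project_id, error_type, job_name, step_name):
    # cicd: project_id | cicd | error_type | job_name | step_name
    return _md5_of([project_id, "cicd", error_type, job_name, step_name])


def deployment_fingerprint(project_id, error_type, namespace, deployment_name):
    # deployment: project_id | deployment | error_type | namespace | deployment_name
    return _md5_of([project_id, "deployment", error_type, namespace, deployment_name])
```

Under this scheme, two runtime reports from the same file and line collapse into one record, while a different line number yields a new fingerprint.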