log-center/docs/integration_guide.md
zyc 89f19e9776
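docs: update the CI/CD integration guide to use log capture + run_number
- Key points: capture logs with tee, github.run_number, template syntax, a single consolidated report
- Complete Gitea Actions integration example (including log collection and failure-detection logic)
- Field reference notes the correct way to build run_url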

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 10:18:22 +08:00


Log Center Integration Guide

Overview

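Log Center is a centralized platform for collecting error logs and repairing them automatically with AI. It exposes a REST API that each project integrates with.

Once integrated, a project reports three kinds of errors:

| Type | source | Description | Trigger |
|---|---|---|---|
| Runtime errors | runtime | Exceptions raised while the application runs | A global exception handler in the Python/JS/Dart code reports them automatically |
| CI/CD errors | cicd | Pipeline failures in build, test, lint, etc. | Reported when a Gitea Actions step fails |
| K8s deployment errors | deployment | Abnormal Pod states (CrashLoopBackOff, OOMKilled, etc.) | A K8s CronJob scans periodically and reports them |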

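Full integration workflow:

  1. Register project information: call the API to submit project metadata (name, repository URL, local path)
  2. Integrate runtime error reporting: add a global exception handler to the application code
  3. Integrate CI/CD error reporting: add a failure-reporting step to the Gitea Actions pipeline
  4. Integrate K8s deployment error reporting: add a project mapping to the K8s Pod health monitor

Important: step 1 must be completed first; otherwise the Repair Agent cannot locate the code repository or the local path.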


Service Addresses

| Environment | API URL | Dashboard |
|---|---|---|
| Local development | http://localhost:8002 | http://localhost:8003 |
| Production | https://qiyuan-log-center-api.airlabs.art | https://qiyuan-log-center.airlabs.art |

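Step 1: Register Project Information

Before a project can use Log Center, its project information must be registered. The Repair Agent depends on this information to work.

How to Register

First report an initialization log (this creates the project automatically), then call the PUT endpoint to fill in the remaining metadata: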

# 1. Report an initialization log to trigger automatic project creation
curl -X POST "${LOG_CENTER_URL}/api/v1/logs/report" \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "your_project_id",
    "environment": "production",
    "level": "WARNING",
    "error": {
      "type": "ProjectInit",
      "message": "Project registered to Log Center",
      "stack_trace": ["Project initialization"]
    },
    "repo_url": "https://gitea.airlabs.art/team/your_project.git"
  }'

# 2. Fill in the project metadata
curl -X PUT "${LOG_CENTER_URL}/api/v1/projects/your_project_id" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Project display name",
    "repo_url": "https://gitea.airlabs.art/team/your_project.git",
    "local_path": "/absolute/path/to/project",
    "description": "Project description"
  }'

Registration Examples by Language

Python

import requests
import os

LOG_CENTER_URL = os.getenv("LOG_CENTER_URL", "http://localhost:8002")

def register_project():
    """Call once on first integration to register the project with Log Center."""
    project_id = "your_project_id"

    # 1. Report an initialization log to trigger project creation
    requests.post(f"{LOG_CENTER_URL}/api/v1/logs/report", json={
        "project_id": project_id,
        "environment": os.getenv("ENVIRONMENT", "production"),
        "level": "WARNING",
        "error": {
            "type": "ProjectInit",
            "message": "Project registered to Log Center",
            "stack_trace": ["Project initialization"],
        },
        "repo_url": "https://gitea.airlabs.art/team/your_project.git",
    }, timeout=5)

    # 2. Fill in the project metadata
    requests.put(f"{LOG_CENTER_URL}/api/v1/projects/{project_id}", json={
        "name": "Project display name",
        "repo_url": "https://gitea.airlabs.art/team/your_project.git",
        "local_path": "/absolute/path/to/project",
        "description": "Project description",
    }, timeout=5)
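Registration only needs to happen once, so a startup hook may want to skip it when the project already exists. The sketch below makes several assumptions:

- GET /api/v1/projects/{project_id} (from the project management API) returns HTTP 200 for a known project and 404 for an unknown one. The guide does not state these status codes.
- ensure_registered is a hypothetical helper.
- `session` can be any object with requests-style get/post/put methods, such as requests.Session().

```python
from typing import Any


def ensure_registered(session: Any, base_url: str, project_id: str, meta: dict) -> bool:
    """Register the project only if Log Center does not know it yet.

    Returns True if registration requests were sent, False if the project
    already existed. Assumes GET answers 200 (known) or 404 (unknown).
    """
    resp = session.get(f"{base_url}/api/v1/projects/{project_id}", timeout=5)
    if resp.status_code == 200:
        return False  # already registered, nothing to do
    if resp.status_code != 404:
        raise RuntimeError(f"unexpected status {resp.status_code} from Log Center")

    # 1. Initialization log: creates the project record
    session.post(f"{base_url}/api/v1/logs/report", json={
        "project_id": project_id,
        "environment": "production",
        "level": "WARNING",
        "error": {
            "type": "ProjectInit",
            "message": "Project registered to Log Center",
            "stack_trace": ["Project initialization"],
        },
        "repo_url": meta["repo_url"],
    }, timeout=5)
    # 2. Fill in the metadata (name, repo_url, local_path, description)
    session.put(f"{base_url}/api/v1/projects/{project_id}", json=meta, timeout=5)
    return True
```

Typical use: `ensure_registered(requests.Session(), LOG_CENTER_URL, "your_project_id", {...})`, where the dict holds the same metadata fields as the PUT body above.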

JavaScript / TypeScript

const LOG_CENTER_URL = import.meta.env.VITE_LOG_CENTER_URL || 'http://localhost:8002';

async function registerProject() {
  const projectId = 'your_project_id';

  // 1. Report an initialization log to trigger project creation
  await fetch(`${LOG_CENTER_URL}/api/v1/logs/report`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      project_id: projectId,
      environment: import.meta.env.MODE,
      level: 'WARNING',
      error: {
        type: 'ProjectInit',
        message: 'Project registered to Log Center',
        stack_trace: ['Project initialization'],
      },
      repo_url: 'https://gitea.airlabs.art/team/your_project.git',
    }),
  });

  // 2. Fill in the project metadata
  await fetch(`${LOG_CENTER_URL}/api/v1/projects/${projectId}`, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      name: 'Project display name',
      repo_url: 'https://gitea.airlabs.art/team/your_project.git',
      local_path: '/absolute/path/to/project',
      description: 'Project description',
    }),
  });
}

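Project Metadata Fields

| Field | Type | Required | Description |
|---|---|---|---|
| project_id | string | | Unique project identifier, e.g. rtc_backend, rtc_web |
| name | string | | Project display name |
| repo_url | string | | Git repository URL (the Repair Agent clones from and pushes to it) |
| local_path | string | | Absolute path of the local project (the Repair Agent runs fixes in this directory) |
| description | string | | Project description |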
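The Repair Agent clones from repo_url and runs fixes inside local_path. A malformed value in either field therefore only surfaces later, as a failed repair. A small pre-flight check can catch this earlier. This is a hedged sketch: the rules below are assumptions read from the field descriptions, not server-side validation that the guide documents. They are:

- project_id is non-empty.
- repo_url is an http(s) URL ending in .git.
- local_path is absolute.

validate_project_meta is a hypothetical name.

```python
import os
from urllib.parse import urlparse


def validate_project_meta(project_id: str, meta: dict) -> list[str]:
    """Return a list of problems found in project metadata (empty if none)."""
    problems = []
    if not project_id:
        problems.append("project_id is empty")
    # Assumption: the Repair Agent clones over http(s) from a URL ending in .git
    repo = urlparse(meta.get("repo_url", ""))
    if repo.scheme not in ("http", "https") or not repo.path.endswith(".git"):
        problems.append("repo_url should be an http(s) Git URL ending in .git")
    # local_path is documented as an absolute path
    if not os.path.isabs(meta.get("local_path", "")):
        problems.append("local_path must be an absolute path")
    return problems
```

Running it before the PUT call turns a silent misconfiguration into an immediate, readable error.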

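Step 2: Integrate Runtime Error Reporting

source: "runtime" (the default; may be omitted)

Integrate a global exception handler into the application code so that unhandled runtime exceptions are reported to Log Center automatically.

Report Format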

{
  "project_id": "rtc_backend",
  "environment": "production",
  "level": "ERROR",
  "error": {
    "type": "ValueError",
    "message": "invalid literal for int() with base 10: 'abc'",
    "file_path": "apps/users/views.py",
    "line_number": 42,
    "stack_trace": [
      "Traceback (most recent call last):",
      "  File \"apps/users/views.py\", line 42, in get_user",
      "ValueError: invalid literal for int() with base 10: 'abc'"
    ]
  },
  "context": {
    "url": "/api/users/123",
    "method": "GET",
    "user_id": "u_12345"
  }
}

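Runtime Field Reference

| Field | Type | Required | Description |
|---|---|---|---|
| project_id | string | | Project identifier |
| environment | string | | Environment: development, staging, production |
| level | string | | Log level: ERROR, WARNING, CRITICAL |
| source | string | | Defaults to runtime; no need to send it |
| timestamp | string | | ISO 8601 format; the server time is used if omitted |
| version | string | | Application version |
| commit_hash | string | | Git commit hash |
| error.type | string | | Exception type, e.g. ValueError, TypeError |
| error.message | string | | Error message |
| error.file_path | string | | Path of the file where the error occurred |
| error.line_number | int | | Line number where the error occurred |
| error.stack_trace | array | | Stack trace (array or string) |
| context | object | | Additional context |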
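The field table can be turned into a payload builder that applies its rules before anything is sent. The builder does three things:

- It checks level against ERROR, WARNING, and CRITICAL, and environment against development, staging, and production.
- It adds an ISO 8601 timestamp.
- It leaves source out so the server default of runtime applies.

This is a minimal sketch. The allowed-value sets mirror the table. Rejecting other values on the client side is an assumption, because the guide does not say how the server treats them. build_runtime_payload is a hypothetical name.

```python
from datetime import datetime, timezone
from typing import Optional

LEVELS = {"ERROR", "WARNING", "CRITICAL"}
ENVIRONMENTS = {"development", "staging", "production"}


def build_runtime_payload(
    project_id: str,
    environment: str,
    level: str,
    error_type: str,
    message: str,
    stack_trace: list[str],
    file_path: Optional[str] = None,
    line_number: Optional[int] = None,
    context: Optional[dict] = None,
    version: Optional[str] = None,
) -> dict:
    """Build a runtime report body following the Runtime field table."""
    if level not in LEVELS:
        raise ValueError(f"level must be one of {sorted(LEVELS)}")
    if environment not in ENVIRONMENTS:
        raise ValueError(f"environment must be one of {sorted(ENVIRONMENTS)}")
    error = {"type": error_type, "message": message, "stack_trace": stack_trace}
    if file_path is not None:
        error["file_path"] = file_path
    if line_number is not None:
        error["line_number"] = line_number
    payload = {
        "project_id": project_id,
        "environment": environment,
        "level": level,
        # "source" is omitted on purpose: the server defaults it to "runtime".
        # ISO 8601 timestamp in UTC; drop it to let the server use its own clock.
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "error": error,
        "context": context or {},
    }
    if version:
        payload["version"] = version
    return payload
```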

Python (Django / FastAPI)

import requests
import traceback
import os

LOG_CENTER_URL = os.getenv("LOG_CENTER_URL", "http://localhost:8002")

def report_error(exc, context=None):
    """Report a runtime error to Log Center."""
    tb = traceback.extract_tb(exc.__traceback__)
    last_frame = tb[-1] if tb else None

    payload = {
        "project_id": "rtc_backend",
        "environment": os.getenv("ENVIRONMENT", "development"),
        "level": "ERROR",
        "error": {
            "type": type(exc).__name__,
            "message": str(exc),
            "file_path": last_frame.filename if last_frame else "unknown",
            "line_number": last_frame.lineno if last_frame else 0,
            # Three-argument form: also works on Python < 3.10
            "stack_trace": traceback.format_exception(type(exc), exc, exc.__traceback__)
        },
        "context": context or {}
    }

    try:
        requests.post(
            f"{LOG_CENTER_URL}/api/v1/logs/report",
            json=payload,
            timeout=3
        )
    except Exception:
        pass  # Fail silently so the main business logic is unaffected

Django integration point: update custom_exception_handler in utils/exceptions.py:

def custom_exception_handler(exc, context):
    # Report to Log Center
    report_error(exc, {
        "view": str(context.get("view")),
        "request_path": context.get("request").path if context.get("request") else None,
    })
    # ... existing logic unchanged ...

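FastAPI integration point: add a global exception handler:

from fastapi import Request
from fastapi.concurrency import run_in_threadpool
from fastapi.responses import JSONResponse

@app.exception_handler(Exception)
async def global_exception_handler(request: Request, exc: Exception):
    # report_error is synchronous (it uses requests), so it cannot be awaited directly;
    # run it in the thread pool so the blocking HTTP call does not stall the event loop
    await run_in_threadpool(report_error, exc, {
        "url": str(request.url),
        "method": request.method,
    })
    return JSONResponse(status_code=500, content={"detail": "Internal Server Error"})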
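Scripts, workers, and other Python processes that are not behind Django or FastAPI have no framework handler to hook into. For those, one option is to wrap sys.excepthook so that uncaught exceptions are reported before Python prints its usual traceback. This is a sketch under three assumptions:

- It pairs with the report_error function above.
- install_log_center_hook is a hypothetical name.
- The reporter is passed in, so the hook can be exercised without a network.

```python
import sys
from typing import Callable


def install_log_center_hook(report: Callable[[BaseException, dict], None]) -> None:
    """Report uncaught exceptions, then delegate to the previous excepthook."""
    previous_hook = sys.excepthook

    def hook(exc_type, exc, tb):
        # Ctrl+C (KeyboardInterrupt) is not an application error; do not report it
        if not issubclass(exc_type, KeyboardInterrupt):
            try:
                report(exc, {"hook": "sys.excepthook", "argv": sys.argv})
            except Exception:
                pass  # never let reporting hide the original traceback
        previous_hook(exc_type, exc, tb)

    sys.excepthook = hook


# Usage: call install_log_center_hook(report_error) early in the entry point.
```

Chaining to the previous hook keeps the normal traceback output. It also keeps any hook another library installed earlier.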

JavaScript / TypeScript (React / Vue)

const LOG_CENTER_URL = import.meta.env.VITE_LOG_CENTER_URL || 'http://localhost:8002';

export function reportError(error: Error, context?: Record<string, unknown>) {
  const stackLines = error.stack?.split('\n') || [];
  const match = stackLines[1]?.match(/at\s+.*\s+\((.+):(\d+):\d+\)/);

  const payload = {
    project_id: 'rtc_web',
    environment: import.meta.env.MODE,
    level: 'ERROR',
    error: {
      type: error.name,
      message: error.message,
      file_path: match?.[1] || 'unknown',
      line_number: parseInt(match?.[2] || '0', 10),
      stack_trace: stackLines,
    },
    context: {
      url: window.location.href,
      userAgent: navigator.userAgent,
      ...context,
    },
  };

  const url = `${LOG_CENTER_URL}/api/v1/logs/report`;
  const blob = new Blob([JSON.stringify(payload)], { type: 'application/json' });
  // sendBeacon returns false when the browser cannot queue the request; fall back to fetch then
  if (navigator.sendBeacon && navigator.sendBeacon(url, blob)) {
    return;
  }
  fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
    keepalive: true,
  }).catch(() => {}); // fail silently
}
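The TypeScript code applies its regex only to stackLines[1], and the regex only recognizes V8-style frames such as `at handler (https://host/assets/index.js:10:5)`. Two other formats fall through to file_path 'unknown':

- anonymous V8 frames (`at https://host/a.js:7:3`)
- Firefox/Safari frames (`handler@https://host/b.js:3:9`)

The same matching, extended to those two formats and scanning every line, is sketched below. It is written in Python so it can be unit-tested outside the browser, and the patterns port directly to TypeScript regex literals. Which browsers you need to cover is an assumption; first_frame is a hypothetical name.

```python
import re

# "at fn (url:line:col)": V8 (Chrome, Edge), same pattern as the TypeScript regex
V8_NAMED = re.compile(r"at\s+.*\s+\((.+):(\d+):\d+\)")
# "at url:line:col": V8 frame without a function name
V8_ANON = re.compile(r"at\s+(\S+):(\d+):\d+$")
# "fn@url:line:col": Firefox / Safari (fn may be empty)
GECKO = re.compile(r"^[^@\s]*@(.+):(\d+):\d+$")


def first_frame(stack: str) -> tuple[str, int]:
    """Return (file_path, line_number) of the first recognizable frame."""
    for line in stack.splitlines():
        line = line.strip()
        for pattern in (V8_NAMED, V8_ANON, GECKO):
            m = pattern.search(line)
            if m:
                return m.group(1), int(m.group(2))
    return "unknown", 0
```

Scanning all lines instead of only index 1 matters for Firefox, whose stack has no leading "Error: message" line.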

Global error capture: in the entry file main.tsx / main.ts:

// JS runtime exceptions
window.onerror = (_message, source, lineno, colno, error) => {
  if (error) reportError(error, { source, lineno, colno });
};

// Unhandled Promise rejections
window.onunhandledrejection = (event: PromiseRejectionEvent) => {
  const error = event.reason instanceof Error
    ? event.reason
    : new Error(String(event.reason));
  reportError(error, { type: 'unhandledrejection' });
};

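Axios interceptor: in api.ts / request.ts (reports only 5xx server errors):

import type { AxiosError } from 'axios';

// `api` is your existing axios instance; reportError is the function defined above
api.interceptors.response.use(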
  (response) => response,
  (error: AxiosError) => {
    if (error.response && error.response.status >= 500) {
      reportError(error, {
        api_url: error.config?.url,
        method: error.config?.method,
        status: error.response.status,
      });
    }
    return Promise.reject(error);
  },
);

Flutter (Dart)

import 'dart:convert';
import 'package:http/http.dart' as http;

const logCenterUrl = String.fromEnvironment(
  'LOG_CENTER_URL',
  defaultValue: 'http://localhost:8002',
);

Future<void> reportError(dynamic error, StackTrace stackTrace, {Map<String, dynamic>? context}) async {
  final stackLines = stackTrace.toString().split('\n');
  final match = RegExp(r'#0\s+.*\((.+):(\d+):\d+\)').firstMatch(stackLines.first);

  final payload = {
    'project_id': 'airhub_app',
    'environment': const String.fromEnvironment('ENVIRONMENT', defaultValue: 'development'),
    'level': 'ERROR',
    'error': {
      'type': error.runtimeType.toString(),
      'message': error.toString(),
      'file_path': match?.group(1) ?? 'unknown',
      'line_number': int.tryParse(match?.group(2) ?? '0') ?? 0,
      'stack_trace': stackLines.take(20).toList(),
    },
    'context': context ?? {},
  };

  try {
    await http.post(
      Uri.parse('$logCenterUrl/api/v1/logs/report'),
      headers: {'Content-Type': 'application/json'},
      body: jsonEncode(payload),
    ).timeout(const Duration(seconds: 3));
  } catch (_) {
    // Fail silently
  }
}

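Global capture: in main.dart:

import 'dart:async';

import 'package:flutter/material.dart';
// Also import the file that defines reportError (path depends on your project).

void main() {
  FlutterError.onError = (details) {
    // Keep Flutter's default console output, then report
    FlutterError.presentError(details);
    reportError(details.exception, details.stack ?? StackTrace.current);
  };

  runZonedGuarded(() {
    runApp(const MyApp());
  }, (error, stack) {
    reportError(error, stack);
  });
}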

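Step 3: Integrate CI/CD Error Reporting

source: "cicd"

Add failure reporting to the Gitea Actions pipeline: when a build, test, or deployment fails, the actual error log is captured and reported to Log Center automatically.

Key Points

  1. Capture logs with tee: pipe the output of the build and deploy steps through 2>&1 | tee /tmp/xxx.log; otherwise the reported stack_trace is empty
  2. github.run_number: URLs must use ${{ github.run_number }} (the per-repository sequence number), not github.run_id (a global ID), which leads to the wrong page
  3. ${{ }} template syntax: more reliable than the $GITHUB_* environment variables
  4. A single consolidated reporting step: one if: failure() step determines which stage failed and collects the matching log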
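Point 1 depends on `set -o pipefail`, which the workflow below sets explicitly. Without it, a bash pipeline's exit status is that of its last command (tee). A failing build piped into tee then looks successful, and if: failure() never fires. The sketch below makes this visible by running a failing command through tee under bash, with and without pipefail. It assumes bash is on PATH; pipeline_status is a hypothetical helper.

```python
import subprocess


def pipeline_status(command: str, pipefail: bool) -> int:
    """Run `{ command; } 2>&1 | tee` under bash and return the pipeline's exit status."""
    prefix = "set -o pipefail; " if pipefail else ""
    script = f"{prefix}{{ {command}; }} 2>&1 | tee /dev/null"
    return subprocess.run(["bash", "-c", script], capture_output=True).returncode


# A "build" that prints something and then fails with exit code 3:
#   pipeline_status("echo building; exit 3", pipefail=False) -> 0 (failure hidden by tee)
#   pipeline_status("echo building; exit 3", pipefail=True)  -> 3 (step fails as it should)
```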

Report Format

{
  "project_id": "rtc_backend",
  "environment": "main",
  "level": "ERROR",
  "source": "cicd",
  "commit_hash": "abc1234def5678",
  "repo_url": "https://gitea.airlabs.art/zyc/rtc_backend.git",
  "error": {
    "type": "CICDFailure",
    "message": "[build] Build and Deploy failed on branch main",
    "stack_trace": ["...last 50 lines of the actual build log..."]
  },
  "context": {
    "job_name": "build-and-deploy",
    "step_name": "build",
    "workflow": "Build and Deploy",
    "run_id": "24",
    "branch": "main",
    "actor": "zyc",
    "commit": "abc1234def5678",
    "run_url": "https://gitea.airlabs.art/zyc/rtc_backend/actions/runs/24"
  }
}

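CI/CD-Specific Fields

| Field | Description |
|---|---|
| source | Must be "cicd" (the workflow example below reports deploy-stage failures as "deployment" instead) |
| environment | Use the branch name ${{ github.ref_name }}, e.g. main, dev |
| repo_url | Repository URL, so the Repair Agent can link the error to the repository |
| error.type | CICDFailure (generic) is recommended, or DockerBuildError / TestFailure / DeployError |
| error.stack_trace | The actual error log (captured with tee); do not hard-code placeholder text |
| context.run_id | Must use ${{ github.run_number }} (not github.run_id) |
| context.run_url | Built as https://gitea.airlabs.art/${{ github.repository }}/actions/runs/${{ github.run_number }} |
| context.step_name | Name of the failed step |
| context.actor | User who triggered the run |
| context.commit | Full commit hash |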
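When the report is sent from a script rather than inline in YAML, the same fields can be assembled from environment variables. Following point 3, the workflow fills those variables from ${{ }} expressions in the step's env: block. This is a minimal sketch with these assumptions:

- The variable names CI_REPOSITORY, CI_RUN_NUMBER, CI_REF_NAME, CI_SHA, CI_ACTOR, and CI_WORKFLOW are hypothetical. Each would be mapped in env:, for example `CI_RUN_NUMBER: ${{ github.run_number }}`.
- The clone URL follows the https://gitea.airlabs.art/&lt;owner&gt;/&lt;repo&gt;.git pattern seen in the examples.

job_name is included because the cicd fingerprint uses it.

```python
import os
from typing import Mapping

GITEA_BASE = "https://gitea.airlabs.art"


def build_cicd_payload(project_id: str, job_name: str, step_name: str,
                       error_type: str, log_lines: list[str],
                       env: Mapping[str, str] = os.environ) -> dict:
    """Assemble a CI/CD failure report from variables set in the step's env: block."""
    repository = env["CI_REPOSITORY"]   # from ${{ github.repository }}, e.g. "zyc/rtc_backend"
    run_number = env["CI_RUN_NUMBER"]   # from ${{ github.run_number }}, NOT github.run_id
    branch = env["CI_REF_NAME"]         # from ${{ github.ref_name }}
    sha = env["CI_SHA"]                 # from ${{ github.sha }}
    workflow = env.get("CI_WORKFLOW", "")
    return {
        "project_id": project_id,
        "environment": branch,          # the branch name doubles as the environment
        "level": "ERROR",
        "source": "cicd",
        "commit_hash": sha,
        # Assumes the clone URL pattern https://gitea.airlabs.art/<owner>/<repo>.git
        "repo_url": f"{GITEA_BASE}/{repository}.git",
        "error": {
            "type": error_type,
            "message": f"[{step_name}] {workflow} failed on branch {branch}",
            "stack_trace": log_lines,   # the real captured log, not placeholder text
        },
        "context": {
            "job_name": job_name,
            "step_name": step_name,
            "workflow": workflow,
            "run_id": run_number,
            "branch": branch,
            "actor": env.get("CI_ACTOR", ""),
            "commit": sha,
            "run_url": f"{GITEA_BASE}/{repository}/actions/runs/{run_number}",
        },
    }
```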

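Gitea Actions Integration (Recommended)

The complete example below captures the build logs with tee. A single consolidated reporting step at the end determines which stage failed.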

name: Build and Deploy

on:
  push:
    branches: [main]

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      # ===== Build steps: capture logs with tee =====

      - name: Build Docker Image
        id: build
        run: |
          set -o pipefail
          docker buildx build \
            --push \
            --provenance=false \
            --tag your-registry/your-app:latest \
            . 2>&1 | tee /tmp/build.log

      - name: Deploy
        id: deploy
        run: |
          set -o pipefail
          {
            kubectl apply -f k8s/deployment.yaml
            kubectl rollout restart deployment/your-app
          } 2>&1 | tee /tmp/deploy.log

      # ===== Failure reporting (single consolidated step) =====

      - name: Report failure to Log Center
        if: failure()
        run: |
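          # Determine which step failed and collect its log. For JSON safety,
          # escape backslashes first, then double quotes and tabs, drop carriage
          # returns, and finally join the lines with a literal \n.
          BUILD_LOG=""
          DEPLOY_LOG=""
          FAILED_STEP="unknown"

          if [[ "${{ steps.build.outcome }}" == "failure" ]]; then
            FAILED_STEP="build"
            if [ -f /tmp/build.log ]; then
              BUILD_LOG=$(tail -50 /tmp/build.log | sed 's/\\/\\\\/g; s/"/\\"/g; s/\t/\\t/g; s/\r//g' | sed ':a;N;$!ba;s/\n/\\n/g')
            fi
          elif [[ "${{ steps.deploy.outcome }}" == "failure" ]]; then
            FAILED_STEP="deploy"
            if [ -f /tmp/deploy.log ]; then
              DEPLOY_LOG=$(tail -50 /tmp/deploy.log | sed 's/\\/\\\\/g; s/"/\\"/g; s/\t/\\t/g; s/\r//g' | sed ':a;N;$!ba;s/\n/\\n/g')
            fi
          fi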

          ERROR_LOG="${BUILD_LOG}${DEPLOY_LOG}"
          if [ -z "$ERROR_LOG" ]; then
            ERROR_LOG="No captured output. Check Gitea Actions UI for details."
          fi

          # Decide which source to report under
          if [[ "$FAILED_STEP" == "deploy" ]]; then
            SOURCE="deployment"
            ERROR_TYPE="DeployError"
          else
            SOURCE="cicd"
            ERROR_TYPE="DockerBuildError"
          fi

          curl -s -X POST "https://qiyuan-log-center-api.airlabs.art/api/v1/logs/report" \
            -H "Content-Type: application/json" \
            -d "{
              \"project_id\": \"your_project_id\",
              \"environment\": \"${{ github.ref_name }}\",
              \"level\": \"ERROR\",
              \"source\": \"${SOURCE}\",
              \"commit_hash\": \"${{ github.sha }}\",
              \"repo_url\": \"https://gitea.airlabs.art/zyc/your_project.git\",
              \"error\": {
                \"type\": \"${ERROR_TYPE}\",
                \"message\": \"[${FAILED_STEP}] Build and Deploy failed on branch ${{ github.ref_name }}\",
                \"stack_trace\": [\"${ERROR_LOG}\"]
              },
              \"context\": {
                \"job_name\": \"build-and-deploy\",
                \"step_name\": \"${FAILED_STEP}\",
                \"workflow\": \"${{ github.workflow }}\",
                \"run_id\": \"${{ github.run_number }}\",
                \"branch\": \"${{ github.ref_name }}\",
                \"actor\": \"${{ github.actor }}\",
                \"commit\": \"${{ github.sha }}\",
                \"run_url\": \"https://gitea.airlabs.art/${{ github.repository }}/actions/runs/${{ github.run_number }}\"
              }
            }" || true
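Escaping a log for JSON with sed is fragile. Control characters, such as the ANSI color codes many build tools emit, are not escaped and make the request body invalid. If python3 is available on the runner, the standard library can do the work instead: it turns the tail of the log into a JSON array and handles all escaping. That python3 is installed is an assumption; the workflow above does not require it. log_tail_json is a hypothetical helper. Its ANSI regex covers common CSI sequences (colors, erase-line) only.

```python
import json
import re
from collections import deque
from pathlib import Path

# CSI escape sequences such as "\x1b[31m" (color) or "\x1b[2K" (erase line)
ANSI_CSI = re.compile(r"\x1b\[[0-9;?]*[A-Za-z]")


def log_tail_json(path: str, max_lines: int = 50) -> str:
    """Return the last max_lines lines of a log file as a JSON array string."""
    p = Path(path)
    if not p.is_file():
        return json.dumps(["No captured output. Check Gitea Actions UI for details."])
    with p.open(encoding="utf-8", errors="replace") as f:
        # deque with maxlen keeps only the last max_lines lines, like `tail -n`
        tail = deque(f, maxlen=max_lines)
    lines = [ANSI_CSI.sub("", line.rstrip("\r\n")) for line in tail]
    return json.dumps(lines, ensure_ascii=False)
```

The returned string can be used directly as the value of stack_trace. The field then holds one array element per log line rather than a single joined string.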

Using the report-cicd-error.sh Script

The project ships a generic reporting script, scripts/report-cicd-error.sh (requires jq), that can be called from a CI step:

# Usage: ./scripts/report-cicd-error.sh <project_id> <step_name> <error_message_or_file>
./scripts/report-cicd-error.sh rtc_backend "Build Docker Image" "Docker build failed: exit code 1"
./scripts/report-cicd-error.sh rtc_backend "Run Tests" /tmp/test-output.log

The script automatically:

  • infers error_type from the step name (DockerBuildError / NpmBuildError / TestFailure / LintError)
  • reads Gitea Actions environment variables to fill in context
  • if given a file path, uses the file's last 100 lines as stack_trace
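This guide does not reproduce the script's exact inference rules. Below is a hypothetical keyword mapping in the same spirit. The keywords and their order are assumptions, not the script's code. CICDFailure serves as the generic fallback, matching the recommendation in the CI/CD field table.

```python
# Hypothetical keyword -> error_type rules, checked in order (first match wins)
RULES = [
    (("docker", "image"), "DockerBuildError"),
    (("npm", "yarn", "pnpm"), "NpmBuildError"),
    (("test", "pytest", "jest"), "TestFailure"),
    (("lint", "eslint", "flake8", "ruff"), "LintError"),
]


def infer_error_type(step_name: str) -> str:
    """Guess an error_type from a CI step name such as "Build Docker Image"."""
    name = step_name.lower()
    for keywords, error_type in RULES:
        if any(k in name for k in keywords):
            return error_type
    return "CICDFailure"
```

Rule order matters: a step named "Build Docker Image for tests" matches the Docker rule first.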

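Step 4: Integrate K8s Deployment Error Reporting

source: "deployment"

A K8s Pod health-monitoring CronJob periodically scans the cluster for abnormal Pods and reports them to Log Center.

Report Format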

{
  "project_id": "rtc_backend",
  "environment": "production",
  "level": "CRITICAL",
  "source": "deployment",
  "error": {
    "type": "CrashLoopBackOff",
    "message": "CrashLoopBackOff: back-off restarting failed container (pod: rtc-backend-xxx, container: api)",
    "file_path": null,
    "line_number": null,
    "stack_trace": ["...container logs before the crash (last 50 lines)..."]
  },
  "context": {
    "namespace": "default",
    "pod_name": "rtc-backend-xxx-yyy",
    "container_name": "api",
    "deployment_name": "rtc-backend",
    "restart_count": 5,
    "node_name": "node-1"
  }
}

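Deployment-Specific Fields

| Field | Description |
|---|---|
| source | Must be "deployment" |
| level | "CRITICAL" is recommended (Pod failures are usually serious) |
| error.type | Taken from the K8s state: CrashLoopBackOff, OOMKilled, ImagePullBackOff, ErrImagePull |
| error.file_path | May be null |
| error.line_number | May be null |
| error.stack_trace | Container log output before the crash |
| context.namespace | K8s namespace |
| context.pod_name | Pod name |
| context.deployment_name | Deployment name (used for fingerprint deduplication) |
| context.restart_count | Restart count |
| context.node_name | Node name |

Monitored Abnormal States

| State | Description |
|---|---|
| CrashLoopBackOff | The container crashes and restarts repeatedly |
| OOMKilled | Killed after running out of memory |
| ImagePullBackOff / ErrImagePull | Failed to pull the image |
| CreateContainerConfigError | Container configuration error |
| RunContainerError | The container failed to start |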
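Kubernetes exposes these states per container in the Pod status:

- CrashLoopBackOff, ImagePullBackOff, ErrImagePull, CreateContainerConfigError, and RunContainerError appear as status.containerStatuses[].state.waiting.reason.
- OOMKilled appears as terminated.reason under state or lastState.

The sketch below shows how a monitor could turn one Pod object into deployment reports. The Pod is a plain dict, for example parsed from `kubectl get pod -o json`. This is not the actual k8s-monitor/monitor.py code, and several details are assumptions:

- pod_reports and abnormal_reason are hypothetical names.
- environment defaults to production.
- stack_trace is left empty; a real monitor would attach container logs there.
- deployment_name is derived by stripping the last two name segments. This only holds for the usual Deployment Pod naming.

```python
from typing import Optional

WAITING_REASONS = {
    "CrashLoopBackOff", "ImagePullBackOff", "ErrImagePull",
    "CreateContainerConfigError", "RunContainerError",
}


def abnormal_reason(cs: dict) -> Optional[str]:
    """Return the abnormal state of one containerStatuses entry, or None."""
    for key in ("state", "lastState"):
        terminated = cs.get(key, {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            return "OOMKilled"  # more specific than the CrashLoopBackOff it usually causes
    waiting = cs.get("state", {}).get("waiting") or {}
    reason = waiting.get("reason")
    return reason if reason in WAITING_REASONS else None


def pod_reports(pod: dict, app_to_project: dict, environment: str = "production") -> list[dict]:
    """Build deployment-source reports for the abnormal containers of one Pod."""
    meta = pod.get("metadata", {})
    project_id = app_to_project.get(meta.get("labels", {}).get("app"))
    if project_id is None:
        return []  # Pod's app label is not mapped to a project
    pod_name = meta.get("name", "")
    reports = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        reason = abnormal_reason(cs)
        if reason is None:
            continue
        reports.append({
            "project_id": project_id,
            "environment": environment,
            "level": "CRITICAL",
            "source": "deployment",
            "error": {
                "type": reason,
                "message": f"{reason} (pod: {pod_name}, container: {cs.get('name')})",
                "file_path": None,
                "line_number": None,
                "stack_trace": [],  # a real monitor would attach the container's last log lines
            },
            "context": {
                "namespace": meta.get("namespace", "default"),
                "pod_name": pod_name,
                "container_name": cs.get("name"),
                # Assumes "<deployment>-<replicaset-hash>-<pod-suffix>" Pod naming
                "deployment_name": pod_name.rsplit("-", 2)[0],
                "restart_count": cs.get("restartCount", 0),
                "node_name": pod.get("spec", {}).get("nodeName"),
            },
        })
    return reports
```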

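Integration: Add a Pod Label Mapping

The K8s Monitor CronJob already runs in the cluster and scans every 5 minutes. To onboard a new project, you only need to add a mapping to the APP_TO_PROJECT dict in k8s-monitor/monitor.py: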

# k8s-monitor/monitor.py
APP_TO_PROJECT = {
    "rtc-backend": "rtc_backend",       # Pod app label -> project_id
    "rtc-backend-dev": "rtc_backend",
    "rtc-web": "rtc_web",
    "rtc-web-dev": "rtc_web",
    "log-center-api": "log_center_api",
    "log-center-web": "log_center_web",
    # Add mappings for new projects here
    "your-app": "your_project_id",
}

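Make sure the Pods of your K8s Deployment carry an app label. The monitor reads Pod labels, so set the label in the Pod template; labels on the Deployment's own metadata are not copied to its Pods:

spec:
  template:
    metadata:
      labels:
        app: your-app  # must match a key in APP_TO_PROJECT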

CronJob Deployment Configuration

If the Monitor is not yet deployed in the cluster, use the following configuration:

# k8s/monitor-cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pod-health-monitor
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: pod-monitor
          containers:
            - name: monitor
              image: your-registry/k8s-pod-monitor:latest
              env:
                - name: LOG_CENTER_URL
                  value: "https://qiyuan-log-center-api.airlabs.art"
                - name: MONITOR_NAMESPACE
                  value: "default"
          restartPolicy: OnFailure

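Error Deduplication

Log Center deduplicates errors by fingerprint, with a different fingerprint strategy for each of the three sources:

| Source | Fingerprint composition |
|---|---|
| runtime | MD5(project_id \| error_type \| file_path \| line_number) |
| cicd | MD5(project_id \| cicd \| error_type \| job_name \| step_name) |
| deployment | MD5(project_id \| deployment \| error_type \| namespace \| deployment_name) |

An error with a given fingerprint is recorded only once. If an already-fixed error occurs again, it is reopened automatically (regression detection).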
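The fingerprint table can be read as the sketch below. Some details are assumptions: that the server joins the parts with a literal "|", how it renders missing values, and whether it normalizes the fields first. The hashes computed here may therefore differ from the fingerprints Log Center stores. The point of the sketch is which fields participate for each source. In particular, error.message, pod_name, and restart_count do not.

```python
import hashlib


def fingerprint(payload: dict) -> str:
    """Compute a dedup key following the per-source fingerprint table."""
    source = payload.get("source", "runtime")
    err = payload.get("error", {})
    ctx = payload.get("context", {})
    if source == "cicd":
        parts = [payload["project_id"], "cicd", err.get("type"),
                 ctx.get("job_name"), ctx.get("step_name")]
    elif source == "deployment":
        parts = [payload["project_id"], "deployment", err.get("type"),
                 ctx.get("namespace"), ctx.get("deployment_name")]
    else:  # runtime (the default source)
        parts = [payload["project_id"], err.get("type"),
                 err.get("file_path"), err.get("line_number")]
    # Assumption: parts joined with "|", None rendered as an empty string
    raw = "|".join("" if p is None else str(p) for p in parts)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()
```

One consequence is that two runtime errors at the same file and line collapse into one record even when their messages differ.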


Error Status Lifecycle

NEW → VERIFYING → PENDING_FIX → FIXING → FIXED → VERIFIED → DEPLOYED
         ↓                         ↓
   CANNOT_REPRODUCE           FIX_FAILED
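
| Status | Description |
|---|---|
| NEW | Newly reported error |
| VERIFYING | Reproduction is being verified |
| CANNOT_REPRODUCE | Could not be reproduced |
| PENDING_FIX | Waiting for a fix |
| FIXING | The AI Agent is working on a fix |
| FIXED | Fixed, awaiting verification |
| VERIFIED | Fix verified |
| DEPLOYED | Deployed to production |
| FIX_FAILED | Fix failed |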
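The lifecycle diagram can be encoded as a transition table, for example to validate status changes in a script or dashboard. Only the arrows actually drawn are included. The regression reopening described under deduplication is left out, because the guide does not name the state a reopened error returns to. Status, ALLOWED, and can_transition are hypothetical names.

```python
from enum import Enum


class Status(str, Enum):
    NEW = "NEW"
    VERIFYING = "VERIFYING"
    CANNOT_REPRODUCE = "CANNOT_REPRODUCE"
    PENDING_FIX = "PENDING_FIX"
    FIXING = "FIXING"
    FIXED = "FIXED"
    VERIFIED = "VERIFIED"
    DEPLOYED = "DEPLOYED"
    FIX_FAILED = "FIX_FAILED"


# Arrows from the diagram: the main path plus the two failure branches
ALLOWED = {
    Status.NEW: {Status.VERIFYING},
    Status.VERIFYING: {Status.PENDING_FIX, Status.CANNOT_REPRODUCE},
    Status.PENDING_FIX: {Status.FIXING},
    Status.FIXING: {Status.FIXED, Status.FIX_FAILED},
    Status.FIXED: {Status.VERIFIED},
    Status.VERIFIED: {Status.DEPLOYED},
}


def can_transition(current: str, target: str) -> bool:
    """True if the diagram has an arrow from current to target."""
    return Status(target) in ALLOWED.get(Status(current), set())
```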

API Reference

Report an Error Log

POST /api/v1/logs/report

Response:

// New error
{"message": "Log reported", "id": 123}

// Duplicate error (deduplicated)
{"message": "Log deduplicated", "id": 123, "status": "NEW"}

// Regression (a previously fixed error occurred again)
{"message": "Regression detected, reopened", "id": 123}
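A caller may want to react differently to these three outcomes, for example by logging regressions more loudly. One way is to branch on the message field. This sketch assumes the message strings are stable exactly as shown; if the API offers a dedicated field for the outcome, prefer that. classify_report_response is a hypothetical helper.

```python
from typing import Literal

Outcome = Literal["new", "deduplicated", "regression", "unknown"]


def classify_report_response(body: dict) -> Outcome:
    """Map a /api/v1/logs/report response body to an outcome."""
    message = body.get("message", "")
    if message == "Log reported":
        return "new"
    if message == "Log deduplicated":
        return "deduplicated"
    if message.startswith("Regression detected"):
        return "regression"
    return "unknown"
```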

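Project Management API

| Method | Path | Description |
|---|---|---|
| GET | /api/v1/projects | List projects |
| GET | /api/v1/projects/{project_id} | Get project details |
| PUT | /api/v1/projects/{project_id} | Edit project configuration |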

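Best Practices

  1. Set a timeout: give report requests a 3-second timeout so they cannot hold up the main business flow
  2. Fail silently: a failed report must not affect the user experience; handle it silently in every catch block
  3. Report asynchronously: send reports asynchronously so they do not block the main flow
  4. Add context: include useful context wherever possible (user ID, request URL, etc.)
  5. Separate environments: set the environment field correctly to distinguish development from production
  6. Use || true in CI/CD: a failed reporting step must not break the pipeline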
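Practices 1 to 3 can be combined in one small Python wrapper that uses only the standard library:

- Reports go to a single background worker thread.
- Each send is expected to carry its own 3-second timeout.
- Any failure is swallowed.

This is a sketch with these assumptions:

- BackgroundReporter is a hypothetical name.
- The send function is injected. In production it could wrap requests.post with timeout=3.
- A bounded queue drops new reports rather than growing without limit while Log Center is unreachable.

```python
import queue
import threading
from typing import Callable


class BackgroundReporter:
    """Send reports on a daemon thread; never block or raise in the caller."""

    def __init__(self, send: Callable[[dict], None], max_pending: int = 100):
        self._send = send
        self._queue: "queue.Queue[dict]" = queue.Queue(maxsize=max_pending)
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def report(self, payload: dict) -> bool:
        """Enqueue a payload; returns False (and drops it) if the queue is full."""
        try:
            self._queue.put_nowait(payload)
            return True
        except queue.Full:
            return False

    def _run(self) -> None:
        while True:
            payload = self._queue.get()
            try:
                self._send(payload)  # e.g. requests.post(url, json=payload, timeout=3)
            except Exception:
                pass  # fail silently: reporting must not affect the application
            finally:
                self._queue.task_done()

    def flush(self) -> None:
        """Block until every queued report has been attempted."""
        self._queue.join()
```

The worker is a daemon thread, so reports still queued when the process exits are lost. Call flush() during shutdown if that matters.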

Environment Variable Configuration

Python projects

# .env
LOG_CENTER_URL=http://localhost:8002
ENVIRONMENT=development

JavaScript projects

# .env
VITE_LOG_CENTER_URL=http://localhost:8002

Flutter projects

# Pass at build time; put every --dart-define flag in the same command
flutter run --dart-define=LOG_CENTER_URL=http://localhost:8002 --dart-define=ENVIRONMENT=development

Gitea Actions

env:
  LOG_CENTER_URL: https://qiyuan-log-center-api.airlabs.art

Full API Documentation

Visit: http://localhost:8002/docs