Phase 01 JPEG Dump diagnostics + YVYU fix + summary of the core-problem analysis

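Core changes:
- face_tracker.cc: fix the byte order YUYV→YVYU (byte[1]=V, byte[3]=U); the JPEG Dump diagnostic tool showed that with FORMAT_CTRL00=0x61 the OV3660 actually outputs YVYU
- face_tracker.cc: at startup, print one JPEG frame to the serial port as base64 for visual inspection
- config.h: XCLK 20MHz→10MHz, giving the jumper-wire signal integrity a 2x margin
- scripts/auto_capture_jpeg.py: automatic serial frame-capture tool (DTR/RTS reset + base64 decoding)
- scripts/extract_jpeg_from_log.py: offline JPEG extraction from logs
- Coglet项目分析与开发指南.md: new section "6.6" summarizing the main Phase 01 problem (recognizable to humans ≠ detectable by the model), the three layers of YUV→RGB color cast, esp-dl model sensitivity to input distribution, latency analysis, a comparison of three options, and the breakthrough point for option B
- docs/: add 2 OV3660-related CSDN reference articles

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>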
2026-04-22 11:01:02 +08:00 · 2026-04-22 11:01:02 +08:00 · f1c2bfce93
commit f1c2bfce93
parent fc07d3806d
8 changed files with 6384 additions and 6 deletions
--- a/Coglet项目分析与开发指南.md
+++ b/Coglet项目分析与开发指南.md
@ -422,7 +422,8 @@ ESP32 sends status strings to the RP2040 over UART:

 ### 4.9 Servo Selection Notes (Important)

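-> **Field note (pitfall)**: With MG90S 360° continuous-rotation servos, the ear servo did not stop after reaching its target angle. It kept stalling, the gears made a harsh grinding noise and the servo got very hot, with a risk of burning out. Switching to 180° standard servos fixed the problem.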
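+> **Field note (pitfall)**: With MG90S 360° continuous-rotation servos, the ear servo did not stop after reaching its target angle. It kept stalling, the gears made a harsh grinding noise and the servo got very hot, with a risk of burning out. Switching to 180° standard servos fixed the problem.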

 #### Why 180° Standard Servos Are Required

@ -669,6 +670,145 @@ The MicroPython firmware is flashed the same way as for the camera version (see 4.6), but **R

 ---

+## 6.6 Phase 01 Core Problem and Solution Analysis (2026-04-21 ~ 04-22)
+
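+> Based on extensive experiments with the JPEG Dump diagnostic tool (9 iterations), this section summarizes the current main problem, its root causes, and the choice of approach.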
+
+### 6.6.1 Main Problem: Recognizable to Humans ≠ Detectable by the Model
+
+**Diagnostic tool**: JPEG Dump code added to face_tracker.cc prints one JPEG frame as base64 on every boot; a Python script on the Mac captures it and saves it as a `.jpg` file for visual inspection.
+
+**Results**:
+
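+| Observation | What it shows |
+|------|------|
+| After manually peeling off the lens protective film, the JPEG clearly shows **a face wearing glasses, hands, and the background** | ✅ Camera hardware + jumper wires + DVP path work correctly |
+| The image has an **overall purple-green cast** (RGB565 mode) or a **green cast** (YUV422 mode) | 🟡 YUV→RGB color-matrix error in software, not a hardware problem |
+| With the same camera input, esp-dl `HumanFaceDetect` **outputs a fixed box regardless of pix_type** | ❌ Deeper integration problem |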
+
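+**Core problem**:
+
+> The image is recognizable to a human (who brings context: "this green shape is a face") but not to a lightweight CNN, which only sees the distribution of pixel values. The official esp-dl models were trained on **standard face datasets with normal color**, and our color-shifted frames match no pattern in the training set → the model falls back to a default anchor → the box stays constant.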
+
+### 6.6.2 Three Layers of YUV→RGB Color Cast
+
+#### Layer 1: **The YUV→RGB matrix does not fully match BT.601**
+
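+- The OV3660 outputs **limited-range** YUV: Y ∈ [16, 235], U/V ∈ [16, 240] (midpoint 128)
+- The hand-written conversion assumes **full-range** Y ∈ [0, 255] (JFIF standard):
+  ```cpp
+  int r = Y + 1.402 * (V - 128);  // wrong for limited-range input: no black-level offset or gain, so blacks come out gray and contrast drops
+  ```
+- The correct form (BT.601) is:
+  ```cpp
+  int y_scaled = 1.164 * (Y - 16);  // subtract the black level and stretch to full range
+  int r = y_scaled + 1.596 * (V - 128);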
+  ```
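A minimal integer sketch of both conversions, assuming 8-bit samples and the usual ×256 fixed-point approximations of the BT.601 coefficients. The helper names are illustrative only and are not part of xiaozhi or esp-dl.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct Rgb { uint8_t r, g, b; };

static uint8_t clamp_u8(int x) { return static_cast<uint8_t>(std::clamp(x, 0, 255)); }

// Full-range (JFIF) YUV -> RGB. Coefficients scaled by 256:
// 1.402 -> 359, 0.344 -> 88, 0.714 -> 183, 1.772 -> 454.
Rgb yuv_to_rgb_full(int y, int u, int v) {
    const int d = u - 128, e = v - 128;
    return {clamp_u8(y + (359 * e) / 256),
            clamp_u8(y - (88 * d + 183 * e) / 256),
            clamp_u8(y + (454 * d) / 256)};
}

// Limited-range BT.601 YUV -> RGB (Y in [16,235], U/V in [16,240]).
// Coefficients scaled by 256: 1.164 -> 298, 1.596 -> 409, 0.392 -> 100,
// 0.813 -> 208, 2.017 -> 516. The +128 rounds to nearest before >> 8;
// negative intermediates are clamped to 0 afterwards.
Rgb yuv_to_rgb_bt601_limited(int y, int u, int v) {
    assert(y >= 0 && y <= 255 && u >= 0 && u <= 255 && v >= 0 && v <= 255);
    const int c = 298 * (y - 16), d = u - 128, e = v - 128;
    return {clamp_u8((c + 409 * e + 128) >> 8),
            clamp_u8((c - 100 * d - 208 * e + 128) >> 8),
            clamp_u8((c + 516 * d + 128) >> 8)};
}
```

For example, `yuv_to_rgb_bt601_limited(16, 128, 128)` gives (0, 0, 0) and `yuv_to_rgb_bt601_limited(235, 128, 128)` gives (255, 255, 255), while `yuv_to_rgb_full(16, 128, 128)` gives (16, 16, 16): limited-range black pushed through the full-range formula comes out as dark gray.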
+
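+#### Layer 2: **OV3660 AWB (auto white balance) is disabled or slow to respond**
+
+- In the default register sequence, AWB may be off or slow to converge
+- This leaves a **global offset** in U/V: the whole image leans green/purple
+- Grove Vision AI V2 has hardware auto white balance in its built-in ISP; **the raw YUV buffer we read gets no such extra stage**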
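One way to measure the global U/V offset described above is to average the chroma bytes of a frame. A minimal sketch, assuming a packed YUYV buffer and a scene that is roughly neutral on average (the gray-world assumption); the function name is hypothetical. For YVYU data the two results swap roles.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct ChromaOffset { double u; double v; };

// Mean of the U and V bytes minus the neutral value 128, over a packed YUYV
// buffer (byte order Y0 U Y1 V). Under the gray-world assumption both values
// should be close to 0; a large, persistent offset points at a global
// white-balance error. For YVYU data, .u holds the V offset and vice versa.
ChromaOffset mean_chroma_offset_yuyv(const uint8_t* buf, std::size_t len) {
    assert(buf != nullptr && len >= 4 && len % 4 == 0);
    uint64_t sum_u = 0, sum_v = 0;
    for (std::size_t i = 0; i < len; i += 4) {
        sum_u += buf[i + 1];
        sum_v += buf[i + 3];
    }
    const double n = static_cast<double>(len / 4);  // number of macropixels
    return {sum_u / n - 128.0, sum_v / n - 128.0};
}
```

For example, a frame whose U bytes average 140 and whose V bytes average 120 yields offsets of +12 and -8.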
+
+#### Layer 3: **What OV3660 FORMAT_CTRL00 = 0x61 actually means**
+
+- `bit[7:4] = 0x6` = RGB565
+- `bit[3:0] = 0x1` = byte-swapped sequence
+- With the Kconfig RGB565 mode, the sensor may actually output a **YVYU sequence** (Y-V-Y-U) rather than standard YUYV, so U and V get swapped when decoded
+- Fix direction: switch to YUV422 mode in Kconfig (FORMAT_CTRL00=0x30, standard YUYV)
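The byte-order question can be isolated in one small function. A sketch with illustrative names, assuming each 4-byte macropixel holds two luma samples that share one U and one V. Decoding YVYU data as YUYV swaps the two chroma channels, which distorts every hue.

```cpp
#include <cassert>
#include <cstdint>

enum class Yuv422Order { YUYV, YVYU };

// One 4:2:2 macropixel: two luma samples sharing a single U and V.
struct Macropixel { int y0; int u; int y1; int v; };

// Unpack 4 bytes according to the sensor's byte order.
// YUYV: Y0 U Y1 V
// YVYU: Y0 V Y1 U   (byte[1] = V, byte[3] = U)
Macropixel unpack_macropixel(const uint8_t* p, Yuv422Order order) {
    assert(p != nullptr);
    if (order == Yuv422Order::YUYV) {
        return {p[0], p[1], p[2], p[3]};
    }
    return {p[0], p[3], p[2], p[1]};
}
```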
+
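+### 6.6.3 esp-dl Models Are Sensitive to Input Distribution
+
+Even with perfectly corrected color, lightweight models (MSR_S8_V1 is only 60KB, ESPDET_PICO_224_224_FACE about 500KB) are **extremely sensitive** to deviations in the RGB distribution. Specifically:
+
+| Requirement | Explanation |
+|------|------|
+| RGB channel means close to the training set | ImageNet-style datasets have RGB means of about (124, 116, 104) |
+| Exact normalization range | ESPDET uses `(pixel-0)/255`, which requires pixel ∈ [0, 255] (full range) |
+| No severe color cast | A green cast produces "abnormal activations" in the first convolution layers, and everything downstream falls back |
+| No edge artifacts | Letterbox padding must not contrast too strongly with the image content |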
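A quick check of the first requirement is to compute per-channel means of the RGB888 buffer handed to the model and compare them with a reference. A sketch, assuming interleaved R, G, B bytes; the reference triple is the ImageNet-style mean from the table, and any pass/fail threshold on the result would be an assumption to tune. Since the means also depend on scene content, this is a coarse sanity check, not a guarantee of model behavior.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Per-channel mean of an interleaved RGB888 buffer (R, G, B, R, G, B, ...).
std::array<double, 3> rgb888_channel_means(const uint8_t* rgb, std::size_t pixels) {
    assert(rgb != nullptr && pixels > 0);
    std::array<uint64_t, 3> sum{0, 0, 0};
    for (std::size_t i = 0; i < pixels; ++i) {
        for (std::size_t c = 0; c < 3; ++c) sum[c] += rgb[i * 3 + c];
    }
    const double n = static_cast<double>(pixels);
    return {sum[0] / n, sum[1] / n, sum[2] / n};
}

// Largest absolute per-channel difference between measured and reference means.
double max_abs_deviation(const std::array<double, 3>& mean, const std::array<double, 3>& ref) {
    double worst = 0.0;
    for (std::size_t c = 0; c < 3; ++c) worst = std::max(worst, std::fabs(mean[c] - ref[c]));
    return worst;
}
```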
+
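+> **Why Grove Vision AI V2 is expected to just work**: Grove uses the Himax WiseEye HX6538, a dedicated AI vision processor with a built-in ISP plus a **dedicated face detection model trained for its own sensor**, so the chain from hardware to model is closed end to end. esp-dl is a general-purpose framework, and ensuring data quality is up to the user.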
+
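+### 6.6.4 Limits of the Sensor's Hardware JPEG Mode
+
+The OV3660 supports hardware JPEG encoding (`CAMERA_OV3660_DVP_JPEG_1280X720_12FPS`), but it failed in testing:
+
+- `Esp32Camera::Capture()` does not negotiate a JPEG pix_fmt by default and reports `no supported pixel format found`
+- With `CONFIG_XIAOZHI_CAMERA_ALLOW_JPEG_INPUT=y` it negotiates, but `bytesused=0`: the DMA captured no JPEG frame
+- Presumably the sensor's hardware JPEG needs special DVP frame-sync handling that xiaozhi's V4L2 mmap path does not support
+
+Conclusion: the hardware JPEG path was not made to work in this project, so **software JPEG encoding/decoding is required**.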
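When chasing the `bytesused=0` symptom, it helps to tell an empty or truncated frame from a real one before handing it to a decoder. A hypothetical helper (not part of xiaozhi, esp32-camera or esp-dl), assuming the frame is a plain JPEG stream that may carry zero padding after the EOI marker. It only checks the start and end markers, not the JPEG structure in between.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rough completeness check for a buffer that should hold one JPEG:
// it must start with the SOI marker (FF D8) and, after any trailing zero
// padding, end with the EOI marker (FF D9). A zero-length buffer fails.
bool looks_like_complete_jpeg(const uint8_t* buf, std::size_t len) {
    if (buf == nullptr || len < 4) return false;          // e.g. bytesused == 0
    if (buf[0] != 0xFF || buf[1] != 0xD8) return false;   // SOI
    while (len > 2 && buf[len - 1] == 0x00) --len;        // drop zero padding
    return buf[len - 2] == 0xFF && buf[len - 1] == 0xD9;  // EOI
}
```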
+
+### 6.6.5 Latency Analysis: The JPEG Relay Path
+
+Estimated latency along the `xiaozhi Capture() → JPEG → esp-dl sw_decode_jpeg → RGB888 → HumanFaceDetect` path:
+
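+| Stage | Time | Notes |
+|------|------|------|
+| Camera captures one frame | ~40ms | Frame interval at 24 FPS |
+| xiaozhi Capture() software JPEG encode | 50-80ms | 240×240 YUV→RGB→JPEG |
+| esp-dl sw_decode_jpeg decode | 30-50ms | JPEG → RGB888 |
+| HumanFaceDetect model inference | 150ms | ESPDET_PICO_224 |
+| UART send coordinates | ~1ms | Short coordinate packet @ 115200 baud (8N1 ≈ 11.5 bytes/ms; 240 bytes would take ~21ms) |
+| **ESP32-side total** | **~270-320ms** | |
+| RP2040 UART RX + parse | 2ms | |
+| Servo PWM + physical movement | 20-80ms | Mechanical response time |
+| **End-to-end total (face moves → eye moves)** | **~290-400ms** | |
+
+**Comparison**:
+- **Threshold for perceived "smooth following"**: < 500ms
+- **Grove Vision AI V2**: ~100-150ms (dedicated hardware)
+- **JPEG relay approach**: ~300-400ms ✅ acceptable
+- **Duration of a human blink**: ~400ms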
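The stage estimates in the latency table can be summed as ranges at compile time. A sketch with illustrative names that also computes UART time for a given payload, assuming 8N1 framing (10 bits on the wire per byte), and the frame interval for a given FPS. Note that config.h halves XCLK to 10MHz, which its comment says brings the sensor to about 12 fps, i.e. roughly 83ms per frame instead of ~40ms.

```cpp
#include <cassert>

// A latency estimate in milliseconds, as a [lo, hi] range.
struct Range { int lo_ms; int hi_ms; };

constexpr Range operator+(Range a, Range b) { return {a.lo_ms + b.lo_ms, a.hi_ms + b.hi_ms}; }

// Per-stage estimates copied from the latency table.
constexpr Range kCapture{40, 40};      // one frame interval at 24 FPS
constexpr Range kJpegEncode{50, 80};   // xiaozhi Capture() software JPEG encode
constexpr Range kJpegDecode{30, 50};   // esp-dl sw_decode_jpeg
constexpr Range kInference{150, 150};  // HumanFaceDetect, ESPDET_PICO_224
constexpr Range kUartTx{1, 1};         // short coordinate packet
constexpr Range kRp2040Parse{2, 2};
constexpr Range kServo{20, 80};

constexpr Range kEsp32Side = kCapture + kJpegEncode + kJpegDecode + kInference + kUartTx;
constexpr Range kEndToEnd = kEsp32Side + kRp2040Parse + kServo;
static_assert(kEsp32Side.lo_ms == 271 && kEsp32Side.hi_ms == 321, "ESP32-side sum");
static_assert(kEndToEnd.lo_ms == 293 && kEndToEnd.hi_ms == 403, "end-to-end sum");

// Transmission time of `bytes` at `baud` with 8N1 framing (10 bits per byte).
constexpr double uart_ms(int bytes, int baud) { return bytes * 10 * 1000.0 / baud; }

// Integer frame interval for a given frame rate.
constexpr int frame_interval_ms(int fps) { return 1000 / fps; }
```

With these numbers the ESP32 side spans 271-321ms and end to end spans 293-403ms. `uart_ms(240, 115200)` is about 20.8ms, so the ~1ms UART figure only holds for a packet of roughly a dozen bytes.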
+
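+### 6.6.6 Three Possible Paths
+
+| Option | Estimated effort | Success rate | Notes |
+|------|---------|-------|------|
+| **A. Keep digging into esp-dl (fix the color matrix, enable AWB, fork the preprocessing)** | 8-10 hours | ⭐⭐ (20-30%) | Involves OV3660 register tuning + debugging model internals |
+| **B. JPEG relay path (full xiaozhi Capture + esp-dl sw_decode_jpeg)** | 2-3 hours | ⭐⭐⭐⭐ (70-80%) | **Recommended**. `take_photo` already shows that Capture() produces correct color |
+| **C. Fall back to Grove Vision AI V2 (original project design)** | 2 hours + ¥200 | ⭐⭐⭐⭐⭐ (100%) | Official turnkey solution, the safe choice |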
+
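+### 6.6.7 Key Breakthrough for Option B
+
+**Key finding**: photos taken by xiaozhi's `self.camera.take_photo` MCP tool **are clearly recognized by the cloud AI**, which suggests that `Capture()` applies correct color processing internally (white balance, color matrix, standard JPEG encoding).
+
+**The path not yet tried**:
+```
+Esp32Camera::Capture()
+    ↓ full internal pipeline (correctly colored JPEG)
+JPEG buffer
+    ↓ esp-dl sw_decode_jpeg (the path used by the official esp-dl examples)
+Standard RGB888 image (color should match the training set)
+    ↓
+HumanFaceDetect → a real box
+```
+
+**Previous failed path**: I kept bypassing `Capture()` and used `CaptureForDetection()` to grab the raw YUV buffer directly from the V4L2 mmap, which skips xiaozhi's color correction.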
+
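+### 6.6.8 Simplest Way to Validate Option B
+
+**No code changes needed**:
+
+1. Flash the current YUV422-mode firmware
+2. Ask xiaozhi "**Take a photo for me**" or "**What do you see?**"
+3. The cloud AI returns a description of the image
+
+- If the AI can **clearly see a face / objects in the room** → `Capture()` color is fine → **option B is 80%+ likely to work**
+- If the AI says it can't see clearly or the description is garbled → `Capture()` has a color cast too → consider option C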
+
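+### 6.6.9 Current Code State Snapshot (2026-04-22)
+
+- `main/face_tracker.cc`: manual YUYV→RGB888 conversion + pix_type=RGB888 (failed path, code kept)
+- `main/face_tracker.cc`: JPEG Dump diagnostic code (captures one YUYV JPEG on every boot)
+- `sdkconfig`: `CAMERA_OV3660_DVP_YUV422_240X240_24FPS=y` (image leans green but its structure is clear)
+- Firmware builds and flashes fine, and face_tracker starts normally
+- Symptom: the box is constant at `[233, 158, 94, 239]`; the eyeballs are stuck at their limit and don't follow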
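To catch the constant-box symptom automatically instead of by eye, a small run-length check on the detector output would do. A sketch: the class name and threshold are hypothetical, and the four integers are compared as an opaque tuple because the box field order is not stated here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Flags the "constant box" failure mode: the detector keeps returning the
// identical box regardless of scene content.
class StuckBoxDetector {
public:
    explicit StuckBoxDetector(std::size_t threshold) : threshold_(threshold) {
        assert(threshold > 1);
    }

    // Feed one detection result; returns true once the same box has been
    // seen `threshold` times in a row.
    bool feed(const std::array<int, 4>& box) {
        if (count_ > 0 && box == last_) {
            ++count_;
        } else {
            last_ = box;
            count_ = 1;
        }
        return count_ >= threshold_;
    }

private:
    std::array<int, 4> last_{};
    std::size_t count_ = 0;
    std::size_t threshold_;
};
```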
+
+---
+
 ## 7. References

 | Resource | URL |
--- a/docs/ESP32-S3-CAM：接ov3660摄像头-CSDN博客.html
+++ b/docs/ESP32-S3-CAM：接ov3660摄像头-CSDN博客.html
--- a/适配版）_esp32+ov3660-CSDN博客.html
+++ b/适配版）_esp32+ov3660-CSDN博客.html
--- a/main/boards/bread-compact-wifi-s3cam/config.h
+++ b/main/boards/bread-compact-wifi-s3cam/config.h
@ -51,7 +51,10 @@
 #define CAMERA_PIN_SIOD GPIO_NUM_48 // checked for CogNog V1.0 - original NUM_4
 #define CAMERA_PIN_PWDN GPIO_NUM_NC
 #define CAMERA_PIN_RESET GPIO_NUM_NC
-#define XCLK_FREQ_HZ 20000000
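+// [2026-04-21 option B] The original 20MHz caused DVP data-line bit misalignment over the jumper wires
+// (colored mosaic / tearing in the image). Dropping to 10MHz gives the jumper wires a 2x signal-integrity margin.
+// Cost: sensor frame rate halves from 24fps to ~12fps (still enough for face tracking)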
+#define XCLK_FREQ_HZ 10000000

 #define DISPLAY_BACKLIGHT_PIN GPIO_NUM_NC // checked
 #define DISPLAY_MOSI_PIN      GPIO_NUM_NC // checked - original NUM_20
--- a/main/face_tracker.cc
+++ b/main/face_tracker.cc
@ -12,6 +12,8 @@
 #include "dl_detect_define.hpp"
 #include "board.h"
 #include "esp32_camera.h"
+#include "display/lvgl_display/jpg/image_to_jpeg.h"
+#include <linux/videodev2.h>

 #include <esp_heap_caps.h>
 #include <esp_log.h>
@ -20,6 +22,7 @@
 #include <freertos/task.h>
 #include <list>
 #include <new>
+#include <cstring>

 static const char* TAG = "FaceTracker";
 static TaskHandle_t s_handle = nullptr;
@ -31,14 +34,17 @@ static float s_last_fps = 0.0f;
 // Once T07 is done, this weak symbol is overridden by the real implementation; no change to this file is needed
 extern "C" __attribute__((weak)) void uart_send_face(int x_offset, int y_offset);

-// YUYV → RGB888 manual conversion (each 4 bytes of YUYV produce 2 pixels = 6 bytes of RGB)
-// Formula (BT.601): R = Y + 1.402*(V-128); G = Y - 0.344*(U-128) - 0.714*(V-128); B = Y + 1.772*(U-128)
+// YVYU → RGB888 manual conversion (with FORMAT_CTRL00=0x61 the OV3660 actually outputs a Y V Y U sequence)
+// Each 4 bytes of YVYU produce 2 pixels = 6 bytes of RGB888
+// Formula (BT.601 JFIF): R = Y + 1.402*(V-128); G = Y - 0.344*(U-128) - 0.714*(V-128); B = Y + 1.772*(U-128)
+// [2026-04-21 fix] Reading the data as YUYV (Y U Y V) caused a green/purple cast; the JPEG dump test confirmed
+//   the sensor actually outputs a YVYU sequence: byte[1]=V, byte[3]=U (order reversed)
 static inline void yuyv_to_rgb888_line(const uint8_t* yuyv, uint8_t* rgb, int pixels) {
    for (int i = 0; i < pixels; i += 2) {
        int y1 = yuyv[0];
-        int u  = yuyv[1] - 128;
+        int v  = yuyv[1] - 128;  // fix: byte[1] = V (previously mistaken for U)
        int y2 = yuyv[2];
-        int v  = yuyv[3] - 128;
+        int u  = yuyv[3] - 128;  // fix: byte[3] = U (previously mistaken for V)
        yuyv += 4;
        // Pixel 1
        int r1 = y1 + (359 * v) / 256;
@ -100,6 +106,43 @@ static void face_tracker_task(void* arg) {
                 (unsigned)info.total_allocated_bytes);
    }

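+    // [2026-04-21 diagnosis] Multi-format JPEG dump tests confirm: the sensor actually outputs packed YUYV
+    //   - frame_YUYV.jpg is clear (a face with glasses + background are visible); only the color leans green/purple
+    //   - frame_RGB565.jpg / UYVY / YUV422P are all colored mosaic
+    //   - Cause of the cast: with FORMAT_CTRL00=0x61, bit[3:0]=1 means a YVYU sequence in YUV mode
+    //     (actual byte order Y V Y U, not standard YUYV's Y U Y V)
+    //     → yuyv_to_rgb888_line must read YVYU: byte[1]=V, byte[3]=U
+    // Keep the JPEG dump for photo verification (confirm the camera works before running face detection)
+    // [2026-04-22] Sensor switched to hardware JPEG mode (CONFIG_CAMERA_OV3660_DVP_JPEG_1280X720_12FPS)
+    // The sensor performs the full YUV→RGB→JPEG color processing internally and outputs a standard JPEG byte stream,
+    // so image_to_jpeg re-encoding is no longer needed; f.data is passed through as-is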
+    {
+        vTaskDelay(pdMS_TO_TICKS(2000));  // JPEG mode runs at 1280x720; the sensor needs longer for exposure to settle
+        auto* cam = dynamic_cast<Esp32Camera*>(Board::GetInstance().GetCamera());
+        Esp32Camera::FrameRef f;
+        if (cam && cam->CaptureForDetection(&f) && f.data && f.len > 0) {
+            const uint8_t* jpg = (const uint8_t*)f.data;
+            size_t jpg_len = f.len;
+            ESP_LOGI(TAG, "===JPEG_DUMP_BEGIN fmt=SENSOR_JPEG size=%u w=%u h=%u===",
+                     (unsigned)jpg_len, f.width, f.height);
+            static const char b64[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
+            char line[128]; size_t lp = 0;
+            for (size_t i = 0; i < jpg_len; i += 3) {
+                uint32_t v = ((uint32_t)jpg[i] << 16);
+                if (i + 1 < jpg_len) v |= ((uint32_t)jpg[i+1] << 8);
+                if (i + 2 < jpg_len) v |= jpg[i+2];
+                line[lp++] = b64[(v >> 18) & 0x3F];
+                line[lp++] = b64[(v >> 12) & 0x3F];
+                line[lp++] = (i + 1 < jpg_len) ? b64[(v >> 6) & 0x3F] : '=';
+                line[lp++] = (i + 2 < jpg_len) ? b64[v & 0x3F] : '=';
+                if (lp >= 72) { line[lp] = 0; printf("%s\n", line); lp = 0; }
+            }
+            if (lp > 0) { line[lp] = 0; printf("%s\n", line); }
+            ESP_LOGI(TAG, "===JPEG_DUMP_END===");
+            cam->ReleaseDetectionFrame(f);
+        }
+    }
+
    // 按 Kconfig 配置的 FPS 计算节拍
    const TickType_t period = pdMS_TO_TICKS(1000 / CONFIG_XIAOZHI_FACE_TRACKING_FPS);
    TickType_t last_wake = xTaskGetTickCount();
--- a/scripts/auto_capture_jpeg.py
+++ b/scripts/auto_capture_jpeg.py
@ -0,0 +1,107 @@
+#!/usr/bin/env python3
+# auto_capture_jpeg.py
+# Connect to the ESP32 serial port, trigger a reset, wait for several JPEG dumps, then save and open all images
+# Supports the new format: ===JPEG_DUMP_BEGIN fmt=<NAME> size=<N>===
+
+import serial
+import base64
+import re
+import sys
+import time
+import subprocess
+from pathlib import Path
+
+PORT = "/dev/cu.usbmodem834401"
+BAUD = 115200
+OUT_DIR = Path("/Users/rdzleo/Desktop/CogletESP-camera-version/scripts")
+TIMEOUT_SEC = 90
+MAX_FRAMES = 4
+
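+# Also tolerate trailing fields after size=, e.g. " w=<W> h=<H>" as printed by face_tracker.cc
+BEGIN_RE = re.compile(r"===JPEG_DUMP_BEGIN\s+(?:fmt=(\S+)\s+)?size=(\d+)\b.*?===")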
+END_RE = re.compile(r"===JPEG_DUMP_END===")
+B64_RE = re.compile(r"^[A-Za-z0-9+/=]+$")
+
+
+def main():
+    print(f"[·] Opening serial port {PORT} @ {BAUD}")
+    ser = serial.Serial(PORT, BAUD, timeout=1)
+
+    print("[·] Resetting ESP32 …")
+    ser.dtr = False
+    ser.rts = True
+    time.sleep(0.1)
+    ser.rts = False
+    time.sleep(0.1)
+    ser.reset_input_buffer()
+
+    print(f"[·] Waiting for JPEG_DUMP markers (up to {TIMEOUT_SEC}s, expecting {MAX_FRAMES} frames) …")
+    start = time.time()
+    in_dump = False
+    expected_size = 0
+    current_fmt = None
+    b64_buf = []
+    saved_files = []
+
+    while time.time() - start < TIMEOUT_SEC and len(saved_files) < MAX_FRAMES:
+        raw = ser.readline()
+        if not raw:
+            continue
+        try:
+            line = raw.decode("utf-8", errors="replace").rstrip("\r\n")
+        except Exception:
+            continue
+
+        if any(k in line for k in ["FaceTracker", "Camera", "panic", "Guru", "ov3660", "Compile time"]):
+            print(f"   {line}")
+
+        if not in_dump:
+            m = BEGIN_RE.search(line)
+            if m:
+                in_dump = True
+                current_fmt = m.group(1) or "unknown"
+                expected_size = int(m.group(2))
+                b64_buf = []
+                print(f"[+] JPEG_BEGIN fmt={current_fmt} size={expected_size}")
+            continue
+
+        if END_RE.search(line):
+            in_dump = False
+            b64_str = "".join(b64_buf)
+            try:
+                data = base64.b64decode(b64_str)
+            except Exception as e:
+                print(f"[!] base64 decode failed: {e}")
+                continue
+
+            if len(data) != expected_size:
+                print(f"[!] Byte count mismatch: got={len(data)} expected={expected_size}")
+
+            out_path = OUT_DIR / f"frame_{current_fmt}.jpg"
+            out_path.write_bytes(data)
+            print(f"[✓] Saved {out_path.name} ({len(data)/1024:.1f} KB)")
+            saved_files.append(out_path)
+            continue
+
+        stripped = line.strip()
+        if B64_RE.match(stripped):
+            b64_buf.append(stripped)
+
+    ser.close()
+
+    if not saved_files:
+        print("[!] No JPEG frames captured")
+        return 1
+
+    print(f"\n[✓] Saved {len(saved_files)} frame(s) in total")
+    for p in saved_files:
+        print(f"    - {p}")
+    # Open the output directory in Finder so the images can be compared side by side
+    subprocess.run(["open", str(OUT_DIR)])
+    # Also open each saved JPEG directly
+    for p in saved_files:
+        subprocess.run(["open", str(p)])
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
--- a/scripts/extract_jpeg_from_log.py
+++ b/scripts/extract_jpeg_from_log.py
@ -0,0 +1,88 @@
+#!/usr/bin/env python3
+# extract_jpeg_from_log.py
+# Extract base64-encoded JPEG images from an ESP32 serial log and save them as .jpg files
+#
+# Usage:
+#   1) Start the monitor and redirect its output to a file:
+#        idf.py -p /dev/cu.usbmodem834401 monitor > /tmp/esp32.log
+#      or extract directly from an existing log:
+#        python3 extract_jpeg_from_log.py /tmp/esp32.log
+#   2) After booting, the device prints "===JPEG_DUMP_BEGIN===" .... "===JPEG_DUMP_END==="
+#   3) Running this script writes frame_001.jpg etc. to the current directory
+#
+# Reading live from stdin is also supported:
+#        idf.py monitor | python3 extract_jpeg_from_log.py -
+
+import sys
+import re
+import base64
+from pathlib import Path
+
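+# Accept an optional fmt= field and trailing fields such as " w=<W> h=<H>" (as printed by face_tracker.cc)
+BEGIN_RE = re.compile(r"===JPEG_DUMP_BEGIN\s+(?:fmt=\S+\s+)?size=(\d+)\b.*?===")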
+END_RE = re.compile(r"===JPEG_DUMP_END===")
+# Match lines that are pure base64 (no ordinary text)
+B64_RE = re.compile(r"^[A-Za-z0-9+/=]+$")
+
+
+def extract(lines, out_dir=Path(".")):
+    frame_idx = 0
+    in_dump = False
+    b64_buf = []
+    expected_size = 0
+
+    for raw in lines:
+        line = raw.rstrip("\r\n")
+        # ESP log timestamp prefixes may add noise, so only act around the begin/end markers
+        if not in_dump:
+            m = BEGIN_RE.search(line)
+            if m:
+                in_dump = True
+                expected_size = int(m.group(1))
+                b64_buf = []
+                print(f"[+] JPEG_BEGIN size={expected_size}")
+            continue
+
+        if END_RE.search(line):
+            in_dump = False
+            b64_str = "".join(b64_buf)
+            try:
+                data = base64.b64decode(b64_str)
+            except Exception as e:
+                print(f"[!] base64 decode failed: {e}")
+                continue
+            if len(data) != expected_size:
+                print(
+                    f"[!] size mismatch: got {len(data)} expected {expected_size} "
+                    f"(the monitor may have dropped bytes; saving anyway)"
+                )
+            frame_idx += 1
+            out_path = out_dir / f"frame_{frame_idx:03d}.jpg"
+            out_path.write_bytes(data)
+            print(f"[✓] saved {out_path} ({len(data)} bytes)")
+            # macOS: open the image automatically
+            if sys.platform == "darwin":
+                import subprocess
+                subprocess.run(["open", str(out_path)])
+            continue
+
+        # Inside a dump block: collect only lines that look like base64
+        stripped = line.strip()
+        if B64_RE.match(stripped):
+            b64_buf.append(stripped)
+        # Other lines (e.g. "I (xxx) TAG:" log lines) are ignored
+
+
+def main():
+    if len(sys.argv) < 2:
+        print("Usage: extract_jpeg_from_log.py <logfile | ->")
+        sys.exit(1)
+    src = sys.argv[1]
+    if src == "-":
+        extract(sys.stdin)
+    else:
+        with open(src, "r", errors="replace") as f:
+            extract(f)
+
+
+if __name__ == "__main__":
+    main()
--- a/设备运行日志.txt
+++ b/设备运行日志.txt
@ -0,0 +1,149 @@
+rdzleo@RdzleodeMac-Studio CogletESP-camera-version % export IDF_PATH='/Users/rdzleo/esp/esp-idf/v5.4.2/esp-idf'
+rdzleo@RdzleodeMac-Studio CogletESP-camera-version % '/Users/rdzleo/.espressif/python_env/idf5.4_py3.13_env/bin/python3' '/Users/rdzleo/esp/esp-idf/v5.4.2/esp-idf/tools/idf_monitor.py' -p /dev/tty.usbmodem834401 -b 115200 --toolchain-prefix xtensa-esp32s3-elf- --make ''/Users/rdzleo/.espressif/python_env/idf5.4_py3.13_env/bin/python3' '/Users/rdzleo/esp/esp-idf/v5.4.2/esp-idf/tools/idf.py'' --target esp32s3 '/Users/rdzleo/Desktop/CogletESP-camera-version/build/xiaozhi.elf'
+--- Warning: Serial ports accessed as /dev/tty.* will hang gdb if launched.
+--- Using /dev/cu.usbmodem834401 instead...
+--- esp-idf-monitor 1.8.0 on /dev/cu.usbmodem834401 115200
+--- Quit: Ctrl+] | Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H
+ESP-ROM:esp32s3-20210327
+Build:Mar 27 2021
+rst:0x15 (USB_UART_CHIP_RESET),boot:0x8 (SPI_FAST_FLASH_BOOT)
+Saved PC:0x40384d8e
+--- 0x40384d8e: esp_cpu_wait_for_intr at /Users/rdzleo/esp/esp-idf/components/esp_hw_support/cpu.c:64
+SPIWP:0xee
+mode:DIO, clock div:1
+load:0x3fce2820,len:0x56c
+load:0x403c8700,len:0x4
+load:0x403c8704,len:0xc30
+load:0x403cb700,len:0x2e2c
+entry 0x403c890c
+I (37) octal_psram: vendor id    : 0x0d (AP)
+I (37) octal_psram: dev id       : 0x02 (generation 3)
+I (37) octal_psram: density      : 0x03 (64 Mbit)
+I (39) octal_psram: good-die     : 0x01 (Pass)
+I (43) octal_psram: Latency      : 0x01 (Fixed)
+I (47) octal_psram: VCC          : 0x01 (3V)
+I (51) octal_psram: SRF          : 0x01 (Fast Refresh)
+I (56) octal_psram: BurstType    : 0x01 (Hybrid Wrap)
+I (61) octal_psram: BurstLen     : 0x01 (32 Byte)
+I (65) octal_psram: Readlatency  : 0x02 (10 cycles@Fixed)
+I (71) octal_psram: DriveStrength: 0x00 (1/1)
+I (75) MSPI Timing: PSRAM timing tuning index: 4
+I (79) esp_psram: Found 8MB PSRAM device
+I (83) esp_psram: Speed: 80MHz
+I (86) cpu_start: Multicore app
+I (100) cpu_start: Pro cpu start user code
+I (100) cpu_start: cpu freq: 240000000 Hz
+I (100) app_init: Application information:
+I (100) app_init: Project name:     xiaozhi
+I (104) app_init: App version:      2.0.5
+I (108) app_init: Compile time:     Apr 20 2026 18:05:09
+I (113) app_init: ELF file SHA256:  cd6d6438e...
+I (117) app_init: ESP-IDF:          v5.4.2-390-g0f6b683441-dirty
+I (123) efuse_init: Min chip rev:     v0.0
+I (127) efuse_init: Max chip rev:     v0.99 
+I (131) efuse_init: Chip rev:         v0.2
+I (135) heap_init: Initializing. RAM available for dynamic allocation:
+I (141) heap_init: At 3FCAFCE8 len 00039A28 (230 KiB): RAM
+I (146) heap_init: At 3FCE9710 len 00005724 (21 KiB): RAM
+I (151) heap_init: At 3FCF0000 len 00008000 (32 KiB): DRAM
+I (156) heap_init: At 600FE000 len 00001FD8 (7 KiB): RTCRAM
+I (162) esp_psram: Adding pool of 8192K of PSRAM memory to heap allocator
+I (169) spi_flash: detected chip: generic
+I (172) spi_flash: flash io: qio
+I (175) sleep_gpio: Configure to isolate all GPIO pins in sleep state
+I (181) sleep_gpio: Enable automatic switching of GPIO sleep configuration
+I (188) main_task: Started on CPU0
+I (198) esp_psram: Reserving pool of 64K of internal memory for DMA/internal allocations
+I (198) main_task: Calling app_main()
+I (198) uart: ESP_INTR_FLAG_IRAM flag not set while CONFIG_UART_ISR_IN_IRAM is enabled, flag updated
+I (238) Board: UUID=fcb5789b-4c1b-41b1-9271-4e4b23b27178 SKU=bread-compact-wifi-s3cam
+I (238) gpio: GPIO[0]| InputEn: 1| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
+I (238) button: IoT Button Version: 4.1.6
+I (238) gpio: GPIO[39]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
+I (248) gpio: GPIO[40]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
+I (258) gpio: GPIO[41]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
+I (268) gpio: GPIO[42]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
+I (298) ov3660: Detected Camera sensor PID=0x3660
+I (428) Esp32Camera: Camera init success
+E (428) CAM: camera ptr: 0x3fcca0b4
+I (428) Application: STATE: starting
+I (428) NoAudioCodec: Simplex channels created
+I (428) AudioCodec: Set input enable to true
+I (428) AudioCodec: Set output enable to true
+I (438) AudioCodec: Audio codec started
+I (438) pp: pp rom version: e7ae62f
+I (438) net80211: net80211 rom version: e7ae62f
+I (458) wifi:wifi driver task: 3fcdcaf4, prio:23, stack:6144, core=0
+I (458) wifi:wifi firmware version: 3263cda
+I (458) wifi:wifi certification version: v7.0
+I (458) wifi:config NVS flash: disabled
+I (468) wifi:config nano formatting: enabled
+I (468) wifi:Init data frame dynamic rx buffer num: 6
+I (468) wifi:Init dynamic rx mgmt buffer num: 5
+I (478) wifi:Init management short buffer num: 32
+I (478) wifi:Init dynamic tx buffer num: 32
+I (488) wifi:Init static tx FG buffer num: 2
+I (488) wifi:Init static rx buffer size: 1600
+I (498) wifi:Init static rx buffer num: 3
+I (498) wifi:Init dynamic rx buffer num: 6
+I (498) wifi_init: rx ba win: 3
+I (508) wifi_init: accept mbox: 6
+I (508) wifi_init: tcpip mbox: 16
+I (508) wifi_init: udp mbox: 6
+I (508) wifi_init: tcp mbox: 6
+I (518) wifi_init: tcp tx win: 5760
+I (518) wifi_init: tcp rx win: 5760
+I (518) wifi_init: tcp mss: 1440
+I (528) phy_init: phy_version 701,f4f1da3a,Mar  3 2025,15:50:10
+I (568) phy_init: Saving new calibration data due to checksum failure or outdated calibration data, mode(0)
+I (618) wifi:mode : sta (20:6e:f1:b9:9a:28)
+I (618) wifi:enable tsf
+I (3028) WifiStation: Found AP: airhub, BSSID: 70:2a:d7:85:bc:eb, RSSI: -35, Channel: 1, Authmode: 3
+W (3028) wifi:Password length matches WPA2 standards, authmode threshold changes from OPEN to WPA2
+I (3128) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1, snd_ch_cfg:0x0
+I (3128) wifi:state: init -> auth (0xb0)
+I (3128) wifi:state: auth -> assoc (0x0)
+I (3138) wifi:state: assoc -> run (0x10)
+I (3168) wifi:connected with airhub, aid = 1, channel 1, BW20, bssid = 70:2a:d7:85:bc:eb
+I (3168) wifi:security: WPA2-PSK, phy: bgn, rssi: -34
+I (3168) wifi:pm start, type: 1
+
+I (3178) wifi:dp: 1, bi: 102400, li: 3, scale listen interval from 307200 us to 307200 us
+I (3188) wifi:set rx beacon pti, rx_bcn_pti: 0, bcn_timeout: 25000, mt_pti: 0, mt_time: 10000
+I (3198) wifi:<ba-add>idx:0 (ifx:0, 70:2a:d7:85:bc:eb), tid:0, ssn:0, winSize:64
+I (3228) wifi:AP's beacon interval = 102400 us, DTIM period = 1
+I (5758) esp_netif_handlers: sta ip: 192.168.124.53, mask: 255.255.255.0, gw: 192.168.124.1
+I (5758) WifiStation: Got IP: 192.168.124.53
+I (5758) Assets: The storage free size is 20224 KB
+I (5758) Assets: The partition size is 6016 KB
+I (5828) Assets: The checksum calculation time is 67 ms
+create static modelsI (5828) MODEL_LOADER: Successfully load srmodels
+I (5838) Assets: Refreshing display theme...
+W (5838) Display: SetEmotion: microchip_ai
+I (5838) Application: STATE: activating
+W (5838) Display: SetStatus: Checking for new version...
+I (5848) Ota: Current version: 2.0.5
+I (6488) esp-x509-crt-bundle: Certificate validated
+I (6918) HttpClient: Established new connection to api.tenclass.net:443
+E (7238) Dynamic Impl: mbedtls_ssl_fetch_input error=29312
+I (7238) HttpClient: HTTP connection closed
+I (7238) Ota: Current is the latest version
+I (7238) Ota: Running partition: ota_0
+W (7248) Display: SetStatus: Logging in to server...
+I (7248) MCP: Add tool: self.get_device_status
+I (7248) MCP: Add tool: self.audio_speaker.set_volume
+I (7258) MCP: Add tool: self.camera.take_photo
+I (7258) MCP: Add tool: self.get_system_info [user]
+I (7268) MCP: Add tool: self.reboot [user]
+I (7268) MCP: Add tool: self.upgrade_firmware [user]
+I (7278) MCP: Add tool: self.assets.set_download_url [user]
+I (7278) MQTT: Connecting to endpoint mqtt.xiaozhi.me
+I (7388) esp-x509-crt-bundle: Certificate validated
+I (8048) MQTT: Connected to endpoint
+I (15438) AudioCodec: Set input enable to false
+I (15438) AudioCodec: Set output enable to false
+I (15438) SystemInfo: free sram: 155443 minimal sram: 152567
+I (25438) SystemInfo: free sram: 155407 minimal sram: 152567
+I (35438) SystemInfo: free sram: 155443 minimal sram: 152567