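The N16R8 module cannot run hardware ADC loopback capture: the 32-bit STEREO codec, Volcano RTC, and 80 MHz PSRAM cannot coexist (see the lessons from the exploration in commit fb4b607). This change switches to a software loopback reference instead. The codec stays at the baseline 1ch 16-bit configuration (RTC link 100% stable), the DAC output PCM is copied in software to serve as the AEC reference signal, and processing uses the low-level synchronous API from esp_aec.h (no background task, no contention with RTC scheduling).

Verified on device:
- AI speaking: mic=187 ref=8929 clean=30 → 84% echo cancellation
- User speaking: mic=456 ref=8 clean=456 → passthrough, 100% preserved
- Server-side ASR recognizes the user's speech and the AI responds normally (📝 USER: and 📝 AI: subtitles are complete)
- No WiFi pm_coex panic; idle countdown is stable

Main changes:
1. main/CMakeLists.txt (4 lines)
   - Add esp-sr to REQUIRES (brings in the low-level synchronous API from esp_aec.h)
2. main/application.h (23 lines)
   - Members: aec_handle_ / aec_chunk_size_ / ref_ring_buf_ / ref_ring_capacity_ / ref_ring_write_idx_ / ref_ring_filled_ / aec_ref_delay_samples_ / ref_ring_mutex_
   - Function declarations: InitAec / DeinitAec / AppendRefSamples / GetDelayedRef / ApplyAEC
3. main/application.cc (242 lines)
   - Include esp_aec.h and esp_heap_caps.h
   - InitAec: lazy initialization (not called from the Application constructor; triggered the first time ReadAudio takes the AEC path), so that internal SRAM is not consumed at boot where it would affect WiFi startup; ref_ring_buf is allocated with 200 ms of capacity, preferring PSRAM
   - DeinitAec: releases aec_handle / ref_ring_buf / ref_ring_mutex on destruction
   - AppendRefSamples: pushes DAC PCM into the ref ring buffer (guarded by the mutex)
   - GetDelayedRef: reads the delayed reference from the ref ring buffer (for alignment with the mic)
   - ApplyAEC: processes in chunk_size units, adds reference silence detection (passthrough when RMS<50), and prints an RMS diagnostic log (mic/ref/clean) every 2 seconds
   - OnAudioOutput: both branches (player_pipeline_write / codec->OutputData) get an AppendRefSamples hook that copies PCM into the ref ring buffer
   - ReadAudio: the recorder_pipeline path adds lazy InitAec + ApplyAEC; target_samples is max(caller_samples, chunk_size) to keep the baseline 20 ms PCM frame size
   - The destructor calls DeinitAec

Four major pitfalls hit during implementation (details in ~/.claude/projects/.../memory/project_software_aec_implementation.md):
a) portMUX (spinlock) disables interrupts and conflicts with the WiFi pm_coex module → IllegalInstruction panic
   Fix: use a SemaphoreHandle_t (FreeRTOS mutex, 2 ms timeout) instead, which does not disable interrupts
b) After the AI goes silent, the AEC filter stays in echo mode and wrongly suppresses the user's speech → ASR fails to recognize it
   Fix: ApplyAEC adds reference silence detection; when ref RMS<50 it passes the mic signal through without calling aec_process
c) chunk_size (256, 16 ms) ≠ caller_samples (320, 20 ms) changes the upstream PCM frame size → server-side ASR does not recognize non-standard frames
   Fix: target_samples = max(samples, aec_chunk_size_), which keeps the baseline 20 ms frame
d) aec_create consumes internal SRAM (~30-50 KB) and affects WiFi RX buffer allocation → panic and reboot
   Fix: lazy init; the instance is created only when ReadAudio first needs it

Resource usage (measured):
- Flash: +59 KB (esp-sr libaec.a)
- Internal SRAM: +35-50 KB (aec_handle_t working buffers)
- PSRAM: +10-15 KB (ref_ring_buf 200 ms + temporary buffers)
- Core 1 CPU: +6-12% (chunk=256, one aec_process call every 16 ms)
- Overall: moderate; does not affect RTC, WiFi, or other features

Root cause of the AI "talking to itself" (an important correction to earlier assumptions):
- "AI noise reduction OFF" in the Volcano console refers to NS, not AEC; server-side AEC is ON by default and is not shown in the UI
- The baseline does not talk to itself because cloud-side AEC is acting as the safety net
- The real cause is often abnormal upstream PCM data (for example, channel_mask misalignment during the beep phase) that triggers a false positive in the server-side VAD, not excessive echo
- Device-side software AEC reduces cloud load and acts as a fallback in extreme cases; it is not strictly required but has clear engineering value

Tuning guide: aec_ref_delay_samples_ is currently 800 (50 ms). Adjust it within 30-80 ms according to the distance between the mic and the speaker, and watch the RMS log: the value that gives the smallest clean level while the AI is speaking is the best one.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>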
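
The delayed-reference ring buffer and the silent-reference passthrough described in the commit notes can be illustrated with a minimal, host-side sketch. This is not the code from main/application.cc: the names RefRing, Rms, and ShouldRunAec are hypothetical, std::mutex stands in for the FreeRTOS mutex, and the esp_aec.h calls are left out entirely. Only the RMS threshold of 50 is taken from the notes.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the ref ring buffer; not the project's class.
class RefRing {
public:
    explicit RefRing(size_t capacity) : buf_(capacity, 0) {
        assert(capacity > 0);
    }

    // Push DAC PCM into the ring (would be called from the audio output path).
    void Append(const int16_t* pcm, size_t n) {
        std::lock_guard<std::mutex> lock(mu_);
        for (size_t i = 0; i < n; ++i) {
            buf_[write_idx_] = pcm[i];
            write_idx_ = (write_idx_ + 1) % buf_.size();
        }
        filled_ = std::min(buf_.size(), filled_ + n);
    }

    // Copy the n samples that end `delay` samples before the newest one.
    // Positions that were never written come back as silence (0).
    void GetDelayed(int16_t* out, size_t n, size_t delay) {
        std::lock_guard<std::mutex> lock(mu_);
        const size_t cap = buf_.size();
        for (size_t i = 0; i < n; ++i) {
            const size_t back = delay + (n - i);  // distance behind write_idx_
            if (back > filled_) {
                out[i] = 0;
                continue;
            }
            out[i] = buf_[(write_idx_ + cap - back) % cap];
        }
    }

private:
    std::vector<int16_t> buf_;
    size_t write_idx_ = 0;  // next slot to be written
    size_t filled_ = 0;     // number of valid samples, at most capacity
    std::mutex mu_;
};

// Root-mean-square level of a PCM chunk.
inline double Rms(const int16_t* pcm, size_t n) {
    if (n == 0) return 0.0;
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sum += static_cast<double>(pcm[i]) * pcm[i];
    }
    return std::sqrt(sum / static_cast<double>(n));
}

// True when the reference is loud enough that echo cancellation should run;
// false means the mic chunk should be passed through untouched.
inline bool ShouldRunAec(const int16_t* ref, size_t n, double threshold = 50.0) {
    return Rms(ref, n) >= threshold;
}
```

In a capture loop, one would fetch a chunk of delayed reference with GetDelayed, call ShouldRunAec on it, and only hand the mic and reference chunks to the echo canceller when it returns true. Returning silence for not-yet-written positions means the gate naturally chooses passthrough until enough playback has been buffered.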
XiaoZhi AI Chatbot
Introduction
👉 Build your AI chat companion with ESP32+SenseVoice+Qwen72B!【bilibili】
👉 Equipping XiaoZhi with DeepSeek's smart brain【bilibili】
👉 Build your own AI companion, a beginner's guide【bilibili】
Project Purpose
This is an open-source project released under the MIT license, allowing anyone to use it freely, including for commercial purposes.
Through this project, we aim to help more people get started with AI hardware development and understand how to implement rapidly evolving large language models in actual hardware devices. Whether you're a student interested in AI or a developer exploring new technologies, this project offers valuable learning experiences.
Everyone is welcome to participate in the project's development and improvement. If you have any ideas or suggestions, please feel free to raise an Issue or join the chat group.
Learning & Discussion QQ Group: 376893254
Implemented Features
- Wi-Fi / ML307 Cat.1 4G
- BOOT button wake-up and interruption, supporting both click and long-press triggers
- Offline voice wake-up (ESP-SR)
- Streaming voice dialogue (WebSocket or UDP protocol)
- Support for 5 languages: Mandarin, Cantonese, English, Japanese, Korean (SenseVoice)
- Voiceprint recognition to identify who is calling the AI's name (3D Speaker)
- Large model TTS (Volcano Engine or CosyVoice)
- Large Language Models (Qwen, DeepSeek, Doubao)
- Configurable prompts and voice tones (custom characters)
- Short-term memory, self-summarizing after each conversation round
- OLED / LCD display showing signal strength or conversation content
- Support for LCD image expressions
- Multi-language support (Chinese, English)
Hardware Section
Breadboard DIY Practice
See the Feishu document tutorial:
👉 XiaoZhi AI Chatbot Encyclopedia
Breadboard demonstration:
Supported Open Source Hardware
- LiChuang ESP32-S3 Development Board
- Espressif ESP32-S3-BOX3
- M5Stack CoreS3
- AtomS3R + Echo Base
- AtomMatrix + Echo Base
- Magic Button 2.4
- Waveshare ESP32-S3-Touch-AMOLED-1.8
- LILYGO T-Circle-S3
- XiaGe Mini C3
- Moji XiaoZhi AI Derivative Version
- CuiCan AI pendant
- WMnologo-Xingzhi-1.54TFT
- SenseCAP Watcher
Firmware Section
Flashing Without Development Environment
For beginners, it's recommended to first use the firmware that can be flashed without setting up a development environment.
The firmware connects to the official xiaozhi.me server by default. Currently, personal users can register an account to use the Qwen real-time model for free.
👉 Flash Firmware Guide (No IDF Environment)
Development Environment
- Cursor or VSCode
- Install ESP-IDF plugin, select SDK version 5.3 or above
- Linux is preferred over Windows for faster compilation and fewer driver issues
- Use Google C++ code style, ensure compliance when submitting code
Developer Documentation
- Board Customization Guide - Learn how to create custom board adaptations for XiaoZhi
- IoT Control Module - Understand how to control IoT devices through AI voice commands
AI Agent Configuration
If you already have a XiaoZhi AI chatbot device, you can configure it through the xiaozhi.me console.
👉 Backend Operation Tutorial (Old Interface)
Technical Principles and Private Deployment
👉 Detailed WebSocket Communication Protocol Documentation
For server deployment on a personal computer, refer to xiaozhi-esp32-server, another MIT-licensed project.
