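Rdzleo fe8173752d docs: add Kapi project business overview and refactoring decision analysis report
Based on in-depth code analysis (58 tool calls), this adds a 600-line decision analysis document:

- §1~7 Project business overview: architecture / RTC / audio pipeline / existing features / resources / differences from Baji / existing plans
- §8 Three-option decision matrix:
  * Option A (port Baji Phase 6 soft exit): strongly recommended, 1-2 days of work
  * Option B-1 (enable ESP-SR AFE): recommended, but first confirm the ES7210 REF routing on the Moji board
  * Option B-2 (G711A→Opus): ⚠️ deferred
  * Option B-3 (Jitter buffer): ⚠️ deferred
  * Option C (ESP-ADF refactor): strongly not recommended, 3-4 weeks of work with very high risk
- §9 Recommended 4-step execution roadmap
- §10 Key code references + Baji cross-reference list
- §11 One-sentence summary
- Appendix: Kapi is the better test bed for AFE experiments (SRAM headroom 80-130KB vs 44KB on Baji)

Key conclusions:
- Kapi = Baji's screenless lightweight base, with looser resources (30-50KB SRAM + 300-500KB PSRAM)
- The complete audio_processing AFE code is already written but not enabled in sdkconfig
- Phase 6 soft exit is 100% portable (the base class shares the same source; subtitle prompts must be replaced with LED/voice prompts)
- Future AFE experiments should be made to work on Kapi first, then re-evaluated for Baji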

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 14:37:29 +08:00
2026-01-20 16:55:17 +08:00

XiaoZhi AI Chatbot

(中文 | English | 日本語)

Introduction

👉 Build your AI chat companion with ESP32+SenseVoice+Qwen72B!【bilibili】

👉 Equipping XiaoZhi with DeepSeek's smart brain【bilibili】

👉 Build your own AI companion, a beginner's guide【bilibili】

Project Purpose

This is an open-source project released under the MIT license, allowing anyone to use it freely, including for commercial purposes.

Through this project, we aim to help more people get started with AI hardware development and understand how to implement rapidly evolving large language models in actual hardware devices. Whether you're a student interested in AI or a developer exploring new technologies, this project offers valuable learning experiences.

Everyone is welcome to participate in the project's development and improvement. If you have any ideas or suggestions, please feel free to raise an Issue or join the chat group.

Learning & Discussion QQ Group: 376893254

Implemented Features

  • Wi-Fi / ML307 Cat.1 4G
  • BOOT button wake-up and interruption, supporting both click and long-press triggers
  • Offline voice wake-up via ESP-SR
  • Streaming voice dialogue (WebSocket or UDP protocol)
  • Speech recognition in 5 languages (Mandarin, Cantonese, English, Japanese, Korean) via SenseVoice
  • Voiceprint recognition to identify who is calling the AI's name, via 3D Speaker
  • Large model TTS (Volcano Engine or CosyVoice)
  • Large Language Models (Qwen, DeepSeek, Doubao)
  • Configurable prompts and voice tones (custom characters)
  • Short-term memory, self-summarizing after each conversation round
  • OLED / LCD display showing signal strength or conversation content
  • Support for LCD image expressions
  • Multi-language support (Chinese, English)
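
As an illustration of the click versus long-press distinction mentioned for the BOOT button, here is a minimal sketch in C++. It is not taken from the firmware: the names `PressKind` and `ClassifyPress` and both thresholds (20 ms debounce, 700 ms long press) are assumptions chosen for the example. It classifies a press from two millisecond timestamps.

```cpp
#include <cstdint>

// Hypothetical types and thresholds for illustration only; the real firmware
// may use different names, values, and a different button driver.
enum class PressKind { kNone, kClick, kLongPress };

constexpr uint32_t kDebounceMs = 20;    // assumed: shorter holds are treated as noise
constexpr uint32_t kLongPressMs = 700;  // assumed: holds at least this long are long presses

// Classifies a press from the tick (in ms) at which the button went down and
// the tick at which it was released. Unsigned subtraction keeps the result
// correct even if the millisecond counter wrapped between the two events.
constexpr PressKind ClassifyPress(uint32_t pressed_at_ms, uint32_t released_at_ms) {
  const uint32_t held_ms = released_at_ms - pressed_at_ms;
  if (held_ms < kDebounceMs) return PressKind::kNone;
  return held_ms >= kLongPressMs ? PressKind::kLongPress : PressKind::kClick;
}
```

In a design like this, a click could be mapped to wake-up or interruption and a long press to a hold-to-talk style trigger, but that mapping is also an assumption rather than documented behavior.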

Hardware Section

Breadboard DIY Practice

See the Feishu document tutorial:

👉 XiaoZhi AI Chatbot Encyclopedia

Breadboard demonstration:

Breadboard Demo

Supported Open Source Hardware

Firmware Section

Flashing Without Development Environment

Beginners should start with the prebuilt firmware, which can be flashed without setting up a development environment.

The firmware connects to the official xiaozhi.me server by default. Currently, individual users can register an account to use the Qwen real-time model for free.

👉 Flash Firmware Guide (No IDF Environment)

Development Environment

  • Cursor or VSCode
  • Install the ESP-IDF plugin and select SDK version 5.3 or above
  • Linux is preferred over Windows for faster compilation and fewer driver issues
  • Follow the Google C++ code style and make sure submitted code complies with it

Developer Documentation

AI Agent Configuration

If you already have a XiaoZhi AI chatbot device, you can configure it through the xiaozhi.me console.

👉 Backend Operation Tutorial (Old Interface)

Technical Principles and Private Deployment

👉 Detailed WebSocket Communication Protocol Documentation

To deploy the server on a personal computer, see xiaozhi-esp32-server, a separate MIT-licensed project.

Star History

Star History Chart
Description
Kapi_Rtc_toy: 卡皮吧啦 (Kapibala) project, Volcano Engine RTC version
Readme MIT 24 MiB
Languages
C++ 66.2%
C 27.3%
Python 5.9%
CMake 0.6%