VOLC RTC Voice Interaction Example
- Example difficulty:

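Example Overview
This example connects to the Doubao volcano rtc cloud service and carries out voice interaction. It is suitable for smart speakers, smart toys, voice-controlled devices, and similar products. It is a fairly comprehensive example built on the high-level, easy-to-use interfaces provided by ADF. When building a project, prefer these high-level ADF interfaces, which make project setup quick and simple.
Environment Setup
Hardware Requirements
This example currently supports development boards based on esp32s3 and esp32.
The ESP32-S3-Korvo-2 v3 development board is used by default.
Preparation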
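- About Doubao
- See the Volcano Engine open platform to activate the official version of the Volcano Engine service. You need to obtain the appid, userid, roomid, and a temporary token. First enable RTC_TOKEN_REQUEST, fill the four parameters into the corresponding options under menuconfig -> Example Configuration, and then start the agent in api-explorer. The device can then connect to Volcano Engine for voice interaction.
- If you use the coze solution, you need to register the relevant account. The example enables a coze test account by default, and each session with this test account is limited to 5 minutes.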
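- About Wi-Fi configuration
- In menuconfig, fill in the SSID and PASSWORD, then build the firmware and flash it to the development board.
- About codec formats
- In menuconfig, you can select the PCMA, OPUS, or AAC codec format. OPUS is the default, and OPUS encoding defaults to 32 kbps.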
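To get a feel for what the default 32 kbps OPUS bitrate means per audio frame, the sketch below estimates the average encoded payload size. The 20 ms frame duration is an assumption (a common Opus frame length, not a value stated by this example), and `opus_payload_bytes` is a hypothetical helper name used only for illustration.

```python
def opus_payload_bytes(bitrate_bps: int, frame_ms: int) -> float:
    """Average encoded payload size (bytes) of one frame at a constant bitrate.

    bits per frame = bitrate_bps * frame_ms / 1000, then divide by 8 for bytes.
    """
    return bitrate_bps * frame_ms / 8000


# Assumed 20 ms frames at the example's default bitrate of 32 kbps.
print(opus_payload_bytes(32_000, 20))
```

Packet headers added by the transport are not included, so the real on-air size per frame will be somewhat larger.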
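- About working modes
Two modes are currently supported: normal mode and wake-up mode.
- Normal mode: no wake word is required, and the user talks to the device directly.
- Wake-up conversation mode: the user must first wake the device with a wake word, after which voice interaction is possible. The default wake word is Hi 乐鑫 (Hi Lexin). You can choose a different wake word under menuconfig -> ESP Speech Recognition → use wakenet → Select wake words.
If you use wake-up mode, it is recommended to enable 120 MHz flash and 120 MHz PSRAM: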
- CONFIG_IDF_EXPERIMENTAL_FEATURES=y
- CONFIG_ESPTOOLPY_FLASHFREQ_120M=y
- CONFIG_SPIRAM_SPEED_120M=y
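One quick way to confirm that these three options reached the generated configuration is to scan the project's sdkconfig file. The helper below is a hypothetical sketch (`missing_options` is not part of ESP-IDF or ADF); it only does an exact line match on the three settings listed above.

```python
REQUIRED = (
    "CONFIG_IDF_EXPERIMENTAL_FEATURES=y",
    "CONFIG_ESPTOOLPY_FLASHFREQ_120M=y",
    "CONFIG_SPIRAM_SPEED_120M=y",
)


def missing_options(sdkconfig_text: str) -> list[str]:
    """Return the required options that do not appear as a line in sdkconfig."""
    lines = {line.strip() for line in sdkconfig_text.splitlines()}
    return [opt for opt in REQUIRED if opt not in lines]


# Usage (assumes an sdkconfig file in the current project directory):
#   with open("sdkconfig", encoding="utf-8") as f:
#       print(missing_options(f.read()))
```

An empty list means all three options are set.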
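Usage Instructions
Configuration
Build and Flash
Default IDF Branch
This example supports the IDF release/v5.4 branch and later. By default, it uses ADF's built-in branch $ADF_PATH/esp-idf.
Build and Flash
Build the firmware and flash it to the development board first, then run the monitor tool to view the serial output (replace PORT with your port name):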
idf.py -p PORT flash monitor
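To exit the monitor, press Ctrl-].
For the complete steps to configure and build a project with ESP-IDF, see the ESP-IDF Programming Guide.
How to Use the Example
Features and Usage
- When the example runs, it first tries to connect to the Wi-Fi network. After it obtains an IP address, it joins the RTC room. The log prints volc_rtc: join room success, and the device plays a greeting sent by the server, such as "你好呀,很高兴认识你,哈哈哈" ("Hello, nice to meet you, hahaha"). You can then talk to the agent.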
I (25) boot: ESP-IDF v5.5-dev-972-gfa41fafd27-dirty 2nd stage bootloader
I (25) boot: compile time Jan 10 2025 10:39:14
I (25) boot: Multicore bootloader
I (27) boot: chip revision: v0.2
I (30) boot: efuse block revision: v1.3
I (34) qio_mode: Enabling default flash chip QIO
I (38) boot.esp32s3: Boot SPI Speed : 80MHz
I (42) boot.esp32s3: SPI Mode : QIO
I (46) boot.esp32s3: SPI Flash Size : 16MB
I (49) boot: Enabling RNG early entropy source...
I (54) boot: Partition Table:
I (56) boot: ## Label Usage Type ST Offset Length
I (63) boot: 0 nvs WiFi data 01 02 00009000 00004000
I (69) boot: 1 phy_init RF data 01 01 0000d000 00001000
I (76) boot: 2 factory factory app 00 00 00010000 00300000
I (82) boot: 3 model Unknown data 01 82 00310000 0040e000
I (89) boot: 4 spiffs_data Unknown data 01 82 0071e000 00010000
I (95) boot: End of partition table
I (99) esp_image: segment 0: paddr=00010020 vaddr=3c180020 size=5db38h (383800) map
I (163) esp_image: segment 1: paddr=0006db60 vaddr=3fca2600 size=024b8h ( 9400) load
I (165) esp_image: segment 2: paddr=00070020 vaddr=42000020 size=178fc4h (1544132) map
I (396) esp_image: segment 3: paddr=001e8fec vaddr=3fca4ab8 size=05db8h ( 23992) load
I (401) esp_image: segment 4: paddr=001eedac vaddr=40378000 size=1a520h (107808) load
I (421) esp_image: segment 5: paddr=002092d4 vaddr=600fe000 size=0001ch ( 28) load
I (433) boot: Loaded app from partition at offset 0x10000
I (433) boot: Disabling RNG early entropy source...
W (443) flash HPM: HPM mode is optional feature that depends on flash model. Read Docs First!
W (443) flash HPM: HPM mode with DC adjustment is disabled. Some flash models may not be supported. Read Docs First!
W (452) flash HPM: High performance mode of this flash model hasn't been supported.
I (460) MSPI Timing: Flash timing tuning index: 2
I (466) octal_psram: vendor id : 0x0d (AP)
I (471) octal_psram: dev id : 0x02 (generation 3)
I (476) octal_psram: density : 0x03 (64 Mbit)
I (482) octal_psram: good-die : 0x01 (Pass)
I (487) octal_psram: Latency : 0x01 (Fixed)
I (492) octal_psram: VCC : 0x01 (3V)
I (497) octal_psram: SRF : 0x01 (Fast Refresh)
I (503) octal_psram: BurstType : 0x01 (Hybrid Wrap)
I (509) octal_psram: BurstLen : 0x01 (32 Byte)
I (515) octal_psram: Readlatency : 0x02 (10 cycles@Fixed)
I (521) octal_psram: DriveStrength: 0x00 (1/1)
I (531) MSPI Timing: PSRAM timing tuning index: 2
I (532) esp_psram: Found 8MB PSRAM device
I (536) esp_psram: Speed: 120MHz
I (557) mmu_psram: Read only data copied and mapped to SPIRAM
I (641) mmu_psram: Instructions copied and mapped to SPIRAM
I (641) cpu_start: Multicore app
I (821) esp_psram: SPI SRAM memory test OK
I (830) cpu_start: Pro cpu start user code
I (830) cpu_start: cpu freq: 240000000 Hz
I (830) app_init: Application information:
I (833) app_init: Project name: volc_rtc
I (838) app_init: App version: v2.7-48-g01596882-dirty
I (844) app_init: Compile time: Jan 9 2025 22:09:38
I (850) app_init: ELF file SHA256: 75b7595ca...
I (856) app_init: ESP-IDF: v5.5-dev-972-gfa41fafd27-dirty
I (863) efuse_init: Min chip rev: v0.0
I (867) efuse_init: Max chip rev: v0.99
I (872) efuse_init: Chip rev: v0.2
I (877) heap_init: Initializing. RAM available for dynamic allocation:
I (884) heap_init: At 3FCAEF20 len 0003A7F0 (233 KiB): RAM
I (890) heap_init: At 3FCE9710 len 00005724 (21 KiB): RAM
I (897) heap_init: At 600FE01C len 00001FCC (7 KiB): RTCRAM
I (903) esp_psram: Adding pool of 6258K of PSRAM memory to heap allocator
I (911) spi_flash: detected chip: gd
I (914) spi_flash: flash io: qio
W (919) ADC: legacy driver is deprecated, please migrate to `esp_adc/adc_oneshot.h`
I (927) sleep_gpio: Configure to isolate all GPIO pins in sleep state
I (934) sleep_gpio: Enable automatic switching of GPIO sleep configuration
I (942) main_task: Started on CPU0
I (954) esp_psram: Reserving pool of 32K of internal memory for DMA/internal allocations
I (955) main_task: Calling app_main()
I (961) main: Initialize board peripherals
I (965) PERIPH_SPIFFS: Partition size: total: 52961, used: 0
I (970) AUDIO_THREAD: The esp_periph task allocate stack on internal memory
W (978) i2c_bus_v2: I2C master handle is NULL, will create new one
I (989) DRV8311: ES8311 in Slave mode
I (999) gpio: GPIO[48]| InputEn: 0| OutputEn: 1| OpenDrain: 0| Pullup: 0| Pulldown: 0| Intr:0
I (1006) ES7210: ES7210 in Slave mode
I (1015) ES7210: Enable ES7210_INPUT_MIC1
I (1018) ES7210: Enable ES7210_INPUT_MIC2
I (1020) ES7210: Enable ES7210_INPUT_MIC3
W (1024) ES7210: Enable TDM mode. ES7210_SDP_INTERFACE2_REG12: 2
I (1028) ES7210: config fmt 60
I (1030) AUDIO_HAL: Codec mode is 3, Ctrl:1
I (1040) pp: pp rom version: e7ae62f
I (1040) net80211: net80211 rom version: e7ae62f
I (1044) wifi:wifi driver task: 3fcbee58, prio:23, stack:6656, core=0
I (1050) wifi:wifi firmware version: 34d97ea27
I (1053) wifi:wifi certification version: v7.0
I (1057) wifi:config NVS flash: enabled
I (1061) wifi:config nano formatting: disabled
I (1065) wifi:Init data frame dynamic rx buffer num: 32
I (1070) wifi:Init static rx mgmt buffer num: 8
I (1074) wifi:Init management short buffer num: 32
I (1079) wifi:Init static tx buffer num: 16
I (1083) wifi:Init tx cache buffer num: 32
I (1086) wifi:Init static tx FG buffer num: 2
I (1090) wifi:Init static rx buffer size: 1600
I (1095) wifi:Init static rx buffer num: 16
I (1098) wifi:Init dynamic rx buffer num: 32
I (1103) wifi_init: rx ba win: 16
I (1106) wifi_init: accept mbox: 6
I (1110) wifi_init: tcpip mbox: 32
I (1114) wifi_init: udp mbox: 32
I (1118) wifi_init: tcp mbox: 6
I (1122) wifi_init: tcp tx win: 65535
I (1126) wifi_init: tcp rx win: 65535
I (1131) wifi_init: tcp mss: 1440
I (1135) wifi_init: WiFi/LWIP prefer SPIRAM
I (1140) wifi_init: WiFi IRAM OP enabled
I (1144) wifi_init: WiFi RX IRAM OP enabled
W (1149) wifi:Password length matches WPA2 standards, authmode threshold changes from OPEN to WPA2
I (1158) wifi:Set ps type: 1, coexist: 0
I (1162) phy_init: phy_version 680,a6008b2,Jun 4 2024,16:41:10
I (1227) wifi:mode : sta (74:4d:bd:9d:b6:30)
I (1227) wifi:enable tsf
W (1227) PERIPH_WIFI: WiFi Event cb, Unhandle event_base:WIFI_EVENT, event_id:43
I (2520) wifi:new:<10,0>, old:<1,0>, ap:<255,255>, sta:<10,0>, prof:1, snd_ch_cfg:0x0
I (2521) wifi:state: init -> auth (0xb0)
W (2521) PERIPH_WIFI: WiFi Event cb, Unhandle event_base:WIFI_EVENT, event_id:43
I (2525) wifi:state: auth -> assoc (0x0)
I (2541) wifi:state: assoc -> run (0x10)
I (2761) wifi:connected with ESP-Audio, aid = 3, channel 10, BW20, bssid = fc:2f:ef:ab:db:70
I (2762) wifi:security: WPA2-PSK, phy: bgn, rssi: -39
I (2763) wifi:pm start, type: 1
I (2766) wifi:set rx beacon pti, rx_bcn_pti: 0, bcn_timeout: 25000, mt_pti: 0, mt_time: 10000
W (2775) PERIPH_WIFI: WiFi Event cb, Unhandle event_base:WIFI_EVENT, event_id:4
I (2792) wifi:AP's beacon interval = 204800 us, DTIM period = 1
I (2792) wifi:<ba-add>idx:0 (ifx:0, fc:2f:ef:ab:db:70), tid:0, ssn:3, winSize:64
I (3796) esp_netif_handlers: sta ip: 192.168.1.100, mask: 255.255.255.0, gw: 192.168.1.1
I (3796) PERIPH_WIFI: Got ip:192.168.1.100
I (3798) AUDIO_THREAD: The monitor_task task allocate stack on external memory
I (3807) audio pipeline: Create audio pipeline for audio player
I (3813) audio pipeline: Create audio player audio stream
I (3820) audio pipeline: Register all elements to playback pipeline
I (3826) audio pipeline: Link playback element together raw-->audio_decoder-->i2s_stream-->[codec_chip]
E (3836) gpio: gpio_install_isr_service(502): GPIO isr service already installed
E (3844) DISPATCHER: exe first list: 0x0
I (3849) DISPATCHER: dispatcher_event_task is running...
I (3951) wifi:<ba-add>idx:1 (ifx:0, fc:2f:ef:ab:db:70), tid:1, ssn:0, winSize:64
I (4807) AUDIO_SYS: | Task | Run Time | Per | Prio | HWM | State | CoreId | Stack
I (4808) AUDIO_SYS: | monitor_task | 675 | 0% | 23 | 4316 | Running | 0 | Extr
I (4817) AUDIO_SYS: | main | 254004 |12% | 1 | 1324 | Ready | 0 | Intr
I (4828) AUDIO_SYS: | IDLE1 | 994708 |49% | 0 | 700 | Ready | 1 | Intr
I (4838) AUDIO_SYS: | IDLE0 | 726577 |36% | 0 | 692 | Ready | 0 | Intr
I (4848) AUDIO_SYS: | tiT | 8424 | 0% | 18 | 1792 | Blocked | 7fffffff | Intr
I (4859) AUDIO_SYS: | esp_periph | 1842 | 0% | 5 | 1592 | Blocked | 0 | Intr
I (4869) AUDIO_SYS: | ipc1 | 0 | 0% | 24 | 540 | Suspended | 1 | Intr
I (4879) AUDIO_SYS: | ipc0 | 0 | 0% | 1 | 436 | Suspended | 0 | Intr
I (4890) AUDIO_SYS: | wifi | 7636 | 0% | 23 | 3452 | Blocked | 0 | Intr
I (4900) AUDIO_SYS: | esp_timer | 309 | 0% | 22 | 3092 | Suspended | 0 | Intr
I (4911) AUDIO_SYS: | sys_evt | 46 | 0% | 20 | 416 | Blocked | 0 | Intr
I (4921) AUDIO_SYS: | Tmr Svc | 0 | 0% | 1 | 3364 | Blocked | 7fffffff | Intr
I (4931) AUDIO_SYS: | audio_player_st | Created
I (4937) AUDIO_SYS: | esp_dispatcher | Created
I (4942) main: Func:monitor_task, Line:29, MEM Total:6377072 Bytes, Inter:135047 Bytes, Dram:135047 Bytes, Dram largest free:94208Bytes
1970-01-01 00:00:04.213 [E] VolcEngineRTCLite.c:105 ****************** HELLO BOOKA (671221aa298a540183df32d9)(1.56.001.58)(6059fcf26792a8820bc81f13662979d531e5504d) ********************
1970-01-01 00:00:04.227 [E] Cache.c:270 operation returned status code: 0x00000009
1970-01-01 00:00:04.242 [E] ThreadPool.c:92 coreid 1 set 1 stack_size 8192 priority 5
I (5075) audio pipeline: Create audio pipeline for recording
I (5078) audio pipeline: Create player audio stream
I (5085) audio pipeline: Register all player elements to audio pipeline
I (5091) audio pipeline: Link all player elements to audio pipeline
I (5099) audio pipeline: Create audio pipeline for playback
I (5105) audio pipeline: Create playback audio stream
W (5110) I2S_STREAM_IDF5.x: I2S(2) already startup
I (5116) audio pipeline: Create opus decoder
I (5121) audio pipeline: Register all elements to playback pipeline
I (5128) audio pipeline: Link playback element together raw-->audio_decoder-->i2s_stream-->[codec_chip]
1970-01-01 00:00:05.039 [E] RoomImplX.c:167 operation returned status code: 0x52000057
1970-01-01 00:00:05.459 [E] Cache.c:309 operation returned status code: 0x00000009
1970-01-01 00:00:05.465 [E] RoomImplX.c:167 operation returned status code: 0x52000057
1970-01-01 00:00:05.467 [E] LiteHttp.c:641 ID 340052878 E_LOGIC : NO need keepAlive
1970-01-01 00:00:05.475 [E] RoomImplX.c:167 operation returned status code: 0x52000057
1970-01-01 00:00:05.583 [E] RoomImplX.c:167 operation returned status code: 0x52000057
1970-01-01 00:00:06.042 [E] rx_net_delay_manager.c:1130
I (6880) volc_rtc: join channel success ************8105400500486185 elapsed 239 ms now 239 ms
I (6880) volc_rtc: join room success
I (6880) volc_rtc: remote user joined *****
1970-01-01 00:00:06.063 [E] EngineImplX.c:103 callback pEngineImplX->eventHandler.on_user_joined used too many times 12
I (6895) MODEL_LOADER: The storage free size is 24576 KB
I (6911) MODEL_LOADER: The partition size is 4152 KB
I (6917) MODEL_LOADER: Successfully load srmodels
I (6922) RECORDER_SR: The first wakenet model: wn9_hilexin
I (6929) AFE_SR: afe interface for speech recognition
I (6935) AFE_SR: AFE version: SR_V220727
I (6939) AFE_SR: Initial auido front-end, total channel: 2, mic num: 1, ref num: 1
I (6951) AFE_SR: aec_init: 1, se_init: 0, vad_init: 0(min speech:64, min noise:256)
I (6956) AFE_SR: wakenet_init: 0
I (7017) AFE_SR: afe mode: 0, (Jan 2 2025 19:06:11)
W (7017) RECORDER_SR: Multinet is not enabled in SDKCONFIG
I (7018) AUDIO_RECORDER: RECORDER_CMD_TRIGGER_START
I (7018) main_task: Returned from app_main()
1970-01-01 00:00:06.212 [E] rx_net_audio_jitterbuffer.c:1266
1970-01-01 00:00:06.219 [E] rx_net_audio_jitterbuffer.c:1190
1970-01-01 00:00:06.219 [E] rx_net_delay_manager.c:1130
1970-01-01 00:00:06.219 [E] rx_net_delay_manager.c:1130
1970-01-01 00:00:06.224 [E] Counter.c:90 AudioRecevied fps 0
1970-01-01 00:00:07.007 [E] Counter.c:90 AudioRecevied fps 39
1970-01-01 00:00:08.012 [E] Counter.c:90 AudioRecevied fps 49
1970-01-01 00:00:09.163 [E] Counter.c:90 AudioRecevied fps 44
1970-01-01 00:00:10.014 [E] Counter.c:90 AudioRecevied fps 56
1970-01-01 00:00:11.016 [E] Counter.c:90 AudioRecevied fps 50
1970-01-01 00:00:12.021 [E] Counter.c:90 AudioRecevied fps 46
1970-01-01 00:00:13.001 [E] Counter.c:90 AudioRecevied fps 53
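When a captured log is long, it can help to check it mechanically for the two signals visible in the output above: the volc_rtc: join room success line and the per-second AudioRecevied fps counters (spelled as the SDK prints them). The sketch below is a hypothetical helper, not part of the example. It assumes the monitor output has been saved as text, and it makes no claim about what frame rate counts as healthy; in the sample log the counter settles between 39 and 56.

```python
import re

JOIN_MARKER = "volc_rtc: join room success"
# The misspelling "Recevied" matches the SDK's actual log output.
FPS_PATTERN = re.compile(r"AudioRecevied fps (\d+)")


def summarize_log(log_text: str) -> dict:
    """Report whether the room was joined and summarize received-audio fps."""
    fps = [int(value) for value in FPS_PATTERN.findall(log_text)]
    nonzero = [value for value in fps if value > 0]
    return {
        "joined": JOIN_MARKER in log_text,
        "fps_samples": fps,
        # Average over seconds in which audio actually arrived.
        "avg_fps": sum(nonzero) / len(nonzero) if nonzero else 0.0,
    }
```

If `joined` is true but `fps_samples` is empty or all zero, the device reached the room without receiving audio, which is the situation covered in the troubleshooting section.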
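Troubleshooting
No sound after joining the room
- Check the token validity: the token is the key credential for connecting to the service, and its validity directly determines whether voice interaction works. You can verify the token through the web client.
- Confirm that the agent is started: if the agent has not been started, voice interaction and audio output will not work.
- Review common integration issues: for more detailed troubleshooting methods and solutions, see Common Integration Issues.
The device responds to its own playback
- Different hardware uses different microphone and speaker models, and the distance between the microphone and the speaker also varies. These factors can make the gain of the captured data too high or too low, which significantly degrades acoustic echo cancellation (AEC).
- Confirm that AEC is working: enable ENABLE_RECORDER_DEBUG in menuconfig, then inspect the recorded data. Check two things:
- Whether the loopback reference signal is saturated
- Whether the microphone recording is saturated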
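The two saturation checks above can be approximated offline by counting how many samples in each channel sit near full scale. The sketch below rests on several assumptions that this README does not confirm: that the recorded data is available as raw 16-bit signed little-endian interleaved PCM, that it has two channels (microphone and reference, matching the "mic num: 1, ref num: 1" line in the log), and that a magnitude of 32000 or more counts as clipped. `clipping_ratio` is a hypothetical helper name.

```python
import array
import sys


def clipping_ratio(pcm: bytes, channels: int = 2, threshold: int = 32000) -> list[float]:
    """Per channel, return the fraction of samples with magnitude >= threshold."""
    samples = array.array("h")  # 16-bit signed samples
    usable = len(pcm) - len(pcm) % (2 * channels)  # drop any partial frame
    samples.frombytes(pcm[:usable])
    if sys.byteorder == "big":
        samples.byteswap()  # input is assumed little-endian
    ratios = []
    for ch in range(channels):
        chan = samples[ch::channels]  # de-interleave one channel
        clipped = sum(1 for s in chan if abs(s) >= threshold)
        ratios.append(clipped / len(chan) if chan else 0.0)
    return ratios
```

A ratio well above zero on either channel suggests that the corresponding gain is too high; which index is the microphone and which is the reference depends on the actual channel order of the recording.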
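Technical Support
Use the links below to get technical support:
- For technical support, see the esp32.com forum
- For bug reports and feature requests, create a GitHub issue
We will reply as soon as possible.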