toy-Kapi_Rtc/03AEC_VOICE_INTERRUPT_PORTING_PLAN.md

# AEC Voice Interrupt Porting Plan: Airhub_Rtc_h → Kapi_Rtc

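## 1. Feasibility Summary

Analysis indicates that the AEC voice-interrupt feature of Airhub_Rtc_h, which is built on dual microphones and the ES7210, can be ported to Kapi_Rtc without enabling the ADF framework; the port is **technically feasible**. The main reasons are:

- Kapi_Rtc already integrates the ESP-SR component, which provides AEC and VAD APIs that do not depend on ADF
- Both projects support the ESP32_S3_KORVO2_V3 development board, so the hardware is compatible
- Kapi_Rtc already has an AudioProcessor class that can be extended to support AEC
- Both projects support the ES7210 codec, and their configuration approaches are compatible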

## 2. Porting Implementation Plan

### 2.1 Porting the ES7210 Dual-Microphone Configuration

1. **Create an ES7210 configuration and initialization module**:
   ```cpp
   // Create es7210_mic_config.h/cpp in Kapi_Rtc
   #include "es7210_adc.h"

   #define ES7210_MIC_COMBO_MIC1_MIC3 (ES7210_INPUT_MIC1 | ES7210_INPUT_MIC3) // Dual-microphone selection

   esp_err_t init_es7210_with_dual_mic(const audio_codec_ctrl_if_t *ctrl_if) {
       es7210_codec_cfg_t es7210_cfg = {
           .ctrl_if = ctrl_if,
           .master_mode = false,
           .mic_selected = ES7210_MIC_COMBO_MIC1_MIC3,
           .mclk_src = ES7210_MCLK_FROM_PAD,
           .mclk_div = 256
       };

       // NOTE: the codec handle is not returned to the caller in this sketch;
       // a real implementation needs to keep it for later use and cleanup.
       const audio_codec_if_t *es7210_codec = es7210_codec_new(&es7210_cfg);
       if (es7210_codec == NULL) {
           return ESP_FAIL;
       }

       // Set the microphone gain to 30 dB
       es7210_codec->set_mic_gain(es7210_codec, GAIN_30DB);

       return ESP_OK;
   }
   ```

2. **Add ES7210 configuration support to the AudioCodec base class**:
   ```cpp
   // Extend the AudioCodec class with a dual-microphone configuration interface
   class AudioCodec {
   public:
       // Existing interface...

       virtual bool configureDualMicrophone() = 0;
       virtual bool setMicrophoneGain(int gain_db) = 0;
   };
   ```

3. **Add the ES7210 configuration to the concrete codec implementation**:
   ```cpp
   bool Es7210AudioCodec::configureDualMicrophone() {
       return (init_es7210_with_dual_mic(ctrl_if_) == ESP_OK);
   }
   ```

### 2.2 Integrating AEC into AudioProcessor

1. **Extend the AudioProcessor class**:
   ```cpp
   // Changes to audio_processor.h
   class AudioProcessor {
   private:
       aec_handle_t *aec_instance_ = nullptr;
       std::vector<int16_t> reference_buffer_;

   public:
       // Existing methods...

       // AEC-related methods
       void InitializeAEC(int sample_rate, int channels);
       void ProcessWithAEC(const std::vector<int16_t>& mic_data,
                           const std::vector<int16_t>& ref_data,
                           std::vector<int16_t>& out_data);
       void SetReferenceAudio(const std::vector<int16_t>& ref_data);
   };
   ```

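2. **Implement the AEC functions**:
   ```cpp
   // Implemented in audio_processor.cc
   void AudioProcessor::InitializeAEC(int sample_rate, int channels) {
       // Use the ESP-SR AEC API directly, with no dependency on ADF
       aec_instance_ = aec_create(sample_rate, 4, channels, AEC_MODE_SR_LOW_COST);
       if (aec_instance_ == nullptr) {
           ESP_LOGE(TAG, "Failed to initialize AEC");
       } else {
           ESP_LOGI(TAG, "AEC initialized successfully");
       }
   }

   void AudioProcessor::SetReferenceAudio(const std::vector<int16_t>& ref_data) {
       reference_buffer_ = ref_data;
   }

   void AudioProcessor::ProcessWithAEC(const std::vector<int16_t>& mic_data,
                                      const std::vector<int16_t>& ref_data,
                                      std::vector<int16_t>& out_data) {
       // Pass the microphone data through if AEC failed to initialize or the
       // reference is shorter than the microphone frame (avoids reading past it)
       if (aec_instance_ == nullptr || ref_data.size() < mic_data.size()) {
           out_data = mic_data;
           return;
       }

       // Make sure the output buffer is large enough
       out_data.resize(mic_data.size());

       // Run AEC. mic_data.size() is already a sample count, so it must not be
       // divided by sizeof(int16_t).
       // NOTE: check this call against esp_aec.h in the ESP-SR version in use;
       // the aec_process signature and return type may differ between releases.
       size_t processed_size = aec_process(aec_instance_,
                                         mic_data.data(),
                                         ref_data.data(),
                                         out_data.data(),
                                         mic_data.size());

       if (processed_size == 0) {
           ESP_LOGW(TAG, "AEC processing failed");
           out_data = mic_data; // Fall back to the raw data on failure
       }
   }
   ```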

3. **Modify the audio processing flow**:
   ```cpp
   void AudioProcessor::AudioProcessorTask() {
       // Existing code...

       while (true) {
           // Wait for the running flag
           xEventGroupWaitBits(event_group_, PROCESSOR_RUNNING, pdFALSE, pdTRUE, portMAX_DELAY);

           // Fetch microphone data
           auto res = afe_iface_->fetch_with_delay(afe_data_, portMAX_DELAY);
           if (res == nullptr || res->ret_value == ESP_FAIL) {
               continue;
           }

           // Prepare the data (data_size is in bytes)
           std::vector<int16_t> mic_data((int16_t*)res->data,
                                       (int16_t*)res->data + res->data_size / sizeof(int16_t));
           std::vector<int16_t> processed_data;

           // AEC processing
           if (aec_instance_ != nullptr && !reference_buffer_.empty()) {
               ProcessWithAEC(mic_data, reference_buffer_, processed_data);
           } else {
               processed_data = mic_data; // Use the raw data when AEC is unavailable
           }

           // Subsequent VAD processing and callbacks...
           // Voice-interrupt logic (operates on the processed data)
       }
   }
   ```
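
Frame-based echo cancellers generally expect fixed-size chunks of microphone and reference samples, while the buffers passed around in the loop above can have arbitrary lengths. The sketch below shows one way to split the data into chunks before calling the canceller. `ProcessInChunks` and `ChunkFn` are hypothetical names introduced for illustration; the per-chunk callback stands in for whatever AEC call the ESP-SR version in use provides, and the chunk size is assumed to come from that library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical per-chunk processor: (mic, ref, out), each `chunk` samples long.
using ChunkFn = std::function<void(const int16_t*, const int16_t*, int16_t*)>;

// Feeds mic/ref data to `fn` in fixed-size chunks. Samples that do not fill a
// whole chunk, or that have no matching reference data, are copied through
// unprocessed. Returns the number of samples that went through `fn`.
inline size_t ProcessInChunks(const std::vector<int16_t>& mic,
                              const std::vector<int16_t>& ref,
                              size_t chunk,
                              const ChunkFn& fn,
                              std::vector<int16_t>& out) {
    assert(chunk > 0);
    out = mic;  // passthrough by default
    const size_t usable = std::min(mic.size(), ref.size());
    size_t done = 0;
    while (done + chunk <= usable) {
        fn(mic.data() + done, ref.data() + done, out.data() + done);
        done += chunk;
    }
    return done;
}
```

In a real task loop the unprocessed tail would more likely be carried over to the next iteration than passed through; the passthrough here only keeps the sketch short.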

### 2.3 Implementing Voice Interrupt

1. **Implement a voice confirmation mechanism**:
   ```cpp
   // Add to audio_processor.h
   struct VoiceInterruptConfig {
       int min_speech_duration_ms = 200; // Minimum speech duration, guards against echo false triggers
       float volume_threshold = 0.05f;   // Volume threshold (normalized RMS amplitude)
   };

   class AudioProcessor {
   private:
       // Existing members...
       VoiceInterruptConfig interrupt_config_;
       int speech_duration_count_ = 0;
       bool is_voice_interrupt_active_ = false;

   public:
       // Existing methods...

       void ConfigureVoiceInterrupt(const VoiceInterruptConfig& config);
       bool DetectVoiceInterrupt(const std::vector<int16_t>& audio_data);
   };
   ```

2. **Implement the voice detection logic**:
   ```cpp
   // Implemented in audio_processor.cc
   void AudioProcessor::ConfigureVoiceInterrupt(const VoiceInterruptConfig& config) {
       interrupt_config_ = config;
   }

   bool AudioProcessor::DetectVoiceInterrupt(const std::vector<int16_t>& audio_data) {
       if (audio_data.empty()) {
           return false; // Avoid dividing by zero below
       }

       // Compute the mean energy of the frame
       float energy = 0.0f;
       for (int16_t sample : audio_data) {
           float normalized = (float)sample / 32768.0f;
           energy += normalized * normalized;
       }
       energy /= audio_data.size();

       // Volume check: compare mean energy against the squared RMS threshold
       bool volume_detected = (energy > interrupt_config_.volume_threshold * interrupt_config_.volume_threshold);

       if (volume_detected) {
           speech_duration_count_ += 16; // Assumes 16 ms per frame
       } else {
           speech_duration_count_ = 0;
       }

       // Voice confirmation: the loud period has lasted at least the minimum duration
       bool voice_confirmed = (speech_duration_count_ >= interrupt_config_.min_speech_duration_ms);

       if (voice_confirmed && !is_voice_interrupt_active_) {
           is_voice_interrupt_active_ = true;
           return true; // Trigger the voice interrupt
       } else if (!volume_detected) {
           is_voice_interrupt_active_ = false;
       }

       return false;
   }
   ```

3. **Integrate into the processing flow**:
   ```cpp
   void AudioProcessor::AudioProcessorTask() {
       // Existing code...

       while (true) {
           // Fetch and process the audio data...

           // Run voice-interrupt detection
           if (DetectVoiceInterrupt(processed_data)) {
               ESP_LOGI(TAG, "Voice interrupt detected!");
               // Fire the interrupt callback
               if (voice_interrupt_callback_) {
                   voice_interrupt_callback_();
               }
           }
       }
   }
   ```
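
The detection logic in this section hard-codes a 16 ms frame. As an illustration of how that assumption could be removed, the sketch below derives the elapsed time from the sample count and sample rate instead. `InterruptDetector` and `InterruptParams` are hypothetical names that are not part of Kapi_Rtc, and the 16 kHz default is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical parameters; defaults mirror the values used in this plan.
struct InterruptParams {
    int sample_rate_hz = 16000;   // assumed capture rate
    int min_speech_ms = 200;      // minimum loud duration before firing
    float rms_threshold = 0.05f;  // normalized RMS amplitude threshold
};

class InterruptDetector {
public:
    explicit InterruptDetector(const InterruptParams& p) : p_(p) {
        assert(p.sample_rate_hz > 0);
    }

    // Returns true exactly once per loud period, when the accumulated loud
    // time first reaches min_speech_ms. A quiet frame resets the state.
    bool Feed(const std::vector<int16_t>& frame) {
        if (frame.empty()) return false;
        double energy = 0.0;
        for (int16_t s : frame) {
            const double x = s / 32768.0;
            energy += x * x;
        }
        energy /= static_cast<double>(frame.size());
        const double limit = static_cast<double>(p_.rms_threshold) * p_.rms_threshold;
        if (energy <= limit) {
            loud_samples_ = 0;
            fired_ = false;
            return false;
        }
        loud_samples_ += static_cast<int64_t>(frame.size());
        const int64_t loud_ms = loud_samples_ * 1000 / p_.sample_rate_hz;
        if (loud_ms >= p_.min_speech_ms && !fired_) {
            fired_ = true;
            return true;
        }
        return false;
    }

private:
    InterruptParams p_;
    int64_t loud_samples_ = 0;
    bool fired_ = false;
};
```

With 256-sample frames at 16 kHz (16 ms each), twelve loud frames cover 192 ms, so the detector in this sketch fires on the thirteenth consecutive loud frame.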

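### 2.4 Memory Optimization Strategy

1. **Optimize SPIRAM usage**:
   ```cpp
   // Change how buffers are allocated
   void AudioProcessor::Initialize(int sample_rate, int channels) {
       // Existing initialization code...

       // Allocate the large buffer (intended to live in SPIRAM).
       // NOTE: std::vector uses the default heap, so whether this lands in
       // SPIRAM depends on the SPIRAM malloc settings in sdkconfig.
       size_t buffer_size = sample_rate * channels * 2; // 2 seconds of samples
       reference_buffer_.resize(buffer_size);

       // Make sure AEC gets memory of the right kind:
       // AEC is expected to use heap_caps_aligned_alloc internally, so the
       // heap configuration needs to be correct
   }
   ```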

2. **Configuration changes**:
   - Modify `sdkconfig.defaults` to optimize SPIRAM usage:
   ```
   # Enable SPIRAM
   CONFIG_SPIRAM=y

   # Tune the memory allocation policy
   CONFIG_SPIRAM_MALLOC_ALWAYSINTERNAL=8192
   CONFIG_SPIRAM_MALLOC_RESERVE_INTERNAL=65536

   # Increase the task stack size
   CONFIG_ESP_MAIN_TASK_STACK_SIZE=8192
   ```

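## 3. Notes

### 3.1 Hardware Compatibility

- **Microphone selection**: make sure MIC1 and MIC3 of the ES7210 (or another microphone combination) are configured correctly
- **Gain setting**: adjust the microphone gain to the actual hardware; 30 dB is a suggested starting point for testing
- **Reference audio channel**: make sure the AEC algorithm receives the correct loudspeaker reference audio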
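
One way to keep the loudspeaker reference lined up with microphone frames is to queue every block sent to the speaker and pop a frame of the same length whenever a microphone frame is processed. The sketch below illustrates that idea; `ReferenceFifo` is a hypothetical helper that is not part of either project, it is not thread-safe, and a real implementation shared between the playback and processing tasks would need locking and delay compensation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Bounded FIFO of played-back samples used as the AEC reference signal.
class ReferenceFifo {
public:
    explicit ReferenceFifo(size_t max_samples) : max_samples_(max_samples) {
        assert(max_samples > 0);
    }

    // Appends played samples; the oldest samples are dropped when full.
    void Push(const std::vector<int16_t>& played) {
        for (int16_t s : played) {
            if (fifo_.size() == max_samples_) fifo_.pop_front();
            fifo_.push_back(s);
        }
    }

    // Removes and returns `n` samples, zero-padded if fewer are queued
    // (zeros correspond to "nothing was playing").
    std::vector<int16_t> Pop(size_t n) {
        std::vector<int16_t> frame(n, 0);
        const size_t take = n < fifo_.size() ? n : fifo_.size();
        for (size_t i = 0; i < take; ++i) {
            frame[i] = fifo_.front();
            fifo_.pop_front();
        }
        return frame;
    }

    size_t size() const { return fifo_.size(); }

private:
    size_t max_samples_;
    std::deque<int16_t> fifo_;
};
```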

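### 3.2 Software Implementation

- **Consistent sample rate**: make sure AEC, the microphone input, and the reference audio all use the same sample rate
- **Memory alignment**: AEC processing needs 16-byte-aligned memory; allocate it with `heap_caps_aligned_alloc`
- **Real-time performance**: AEC processing is relatively expensive, so assign an appropriate task priority
- **Error handling**: add thorough error handling and fallback paths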
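
To illustrate the alignment requirement, the sketch below allocates a 16-byte-aligned PCM buffer behind an RAII handle. It uses `std::aligned_alloc` as a host-side stand-in so that it runs anywhere; on the target the allocation would presumably go through `heap_caps_aligned_alloc` and the matching free function instead. `AllocAlignedPcm`, `AlignedPcm`, and `IsAligned` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

// Deleter matching std::aligned_alloc (memory is released with std::free).
struct FreeDeleter {
    void operator()(int16_t* p) const { std::free(p); }
};
using AlignedPcm = std::unique_ptr<int16_t[], FreeDeleter>;

// Allocates room for `samples` int16_t values at the given power-of-two
// alignment. Returns an empty handle if the allocation fails.
inline AlignedPcm AllocAlignedPcm(size_t samples, size_t alignment = 16) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    size_t bytes = samples * sizeof(int16_t);
    // std::aligned_alloc requires the size to be a multiple of the alignment.
    bytes = (bytes + alignment - 1) / alignment * alignment;
    if (bytes == 0) bytes = alignment;
    return AlignedPcm(static_cast<int16_t*>(std::aligned_alloc(alignment, bytes)));
}

// True if `p` sits on an `alignment`-byte boundary.
inline bool IsAligned(const void* p, size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```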

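### 3.3 Tuning Recommendations

1. **AEC parameter tuning**:
   - Adjust the `filter_length` parameter (recommended value: 4)
   - Choose the AEC mode that fits the actual usage scenario

2. **VAD parameter tuning**:
   - Adjust the minimum duration for a voice interrupt (200 ms or more is recommended)
   - Adjust the volume threshold to the ambient noise level

3. **Performance monitoring**:
   - Add CPU usage monitoring
   - Monitor memory usage to catch memory leaks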

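## 4. Test Plan

1. **Basic functional tests**:
   - Verify that the ES7210 dual-microphone configuration is correct
   - Confirm that AEC initializes and processes audio without errors

2. **AEC performance tests**:
   - Test echo cancellation at different distances
   - Test performance at different playback volumes

3. **Voice interrupt tests**:
   - Test voice interrupt while the device is playing audio
   - Test interrupt sensitivity at different distances and volumes

4. **Stability tests**:
   - Run long-duration tests and monitor for memory leaks
   - Test stability under a range of ambient noise conditions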
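
For the AEC performance tests, a common way to put a number on echo cancellation is echo return loss enhancement (ERLE): the ratio, in dB, of microphone power to AEC output power, measured while only the loudspeaker is active. The sketch below computes it from two recorded buffers; `MeanPower` and `ErleDb` are hypothetical helper names, and the recordings are assumed to contain no near-end speech.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <vector>

// Mean squared sample value of a PCM buffer (0 for an empty buffer).
inline double MeanPower(const std::vector<int16_t>& x) {
    if (x.empty()) return 0.0;
    double sum = 0.0;
    for (int16_t s : x) sum += static_cast<double>(s) * s;
    return sum / static_cast<double>(x.size());
}

// ERLE in dB: 10 * log10(P_mic / P_out). Higher means more echo removed.
// Returns +infinity if the output is completely silent.
inline double ErleDb(const std::vector<int16_t>& mic,
                     const std::vector<int16_t>& aec_out) {
    const double p_mic = MeanPower(mic);
    const double p_out = MeanPower(aec_out);
    assert(p_mic > 0.0);  // the microphone recording must contain echo
    if (p_out <= 0.0) return std::numeric_limits<double>::infinity();
    return 10.0 * std::log10(p_mic / p_out);
}
```

For example, an output whose amplitude is one tenth of the microphone signal has one hundredth of the power, which corresponds to an ERLE of 20 dB.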

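## 5. Summary

This plan ports the AEC voice-interrupt feature from Airhub_Rtc_h to Kapi_Rtc by using the AEC API of the ESP-SR component directly, without enabling the full ADF framework. It keeps the existing Kapi_Rtc audio processing architecture and adds dual-microphone configuration, AEC processing, and voice interrupt by extending the AudioProcessor class and related interfaces.

With appropriate memory optimization and parameter tuning, the port should deliver effective echo cancellation and voice interrupt while keeping the system stable.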