Compare commits

2 Commits

Author SHA1 Message Date
Rdzleo
d723c1b1f9 chore: ignore .idea/vcs.xml (local IDE metadata, auto-generated for each developer) 2026-05-06 09:54:25 +08:00
Rdzleo
c26fc5fc87 feat(kws): switch to a streaming KeywordSpotter architecture, recognition rate 67%→80%+
== Core architecture change ==
v2 (SenseVoice ASR + Homophone-Replacer FST) → v3 (streaming KWS Zipformer)

Why v2 was dropped:
  - SenseVoice is 228MB; the large model returned empty output for every short Chinese wake word
  - Verified on the on-board mic: clean speech segments also produced empty ASR output
  - A large ASR model is a poor fit for waking on short phrases of about 1.2 seconds

New architecture:
  - sherpa-onnx KWS Zipformer (wenetspeech-3.3M, int8)
  - Model 5MB (vs 228MB, a 98% reduction), APK 55MB (vs 213MB)
  - Engine initialization 800ms (vs 4 seconds for SenseVoice)
  - Streaming real-time recognition; the broadcast fires <100ms after a hit
  - No longer depends on VAD / energy gate / DC removal / segment splitting

== Asset changes ==
Added assets/kws/:
  - encoder.int8.onnx (4.6MB)
  - decoder.onnx (660KB)
  - joiner.int8.onnx (64KB)
  - tokens.txt (pinyin token vocabulary)
  - keywords.txt (43 homophone variants of Lila: 你好丽啦/咪啦/你拿/Lai啦...)

Removed from assets/:
  - sense_voice/ (228MB)
  - silero_vad/ (632KB)
  - hr_lexicon.txt (1.3MB)
  - replace.fst
  - the 8 old wenetspeech onnx files (full-precision versions)

== Code changes ==
New engine: kws/AsrEngine.kt (streaming KeywordSpotter, replaces KwsEngine)
Removed: kws/KwsEngine.kt

Modified:
  - Config.kt:
      KWS model paths (KWS_ENCODER/DECODER/JOINER/TOKENS/KEYWORDS_FILE)
      KWS_THRESHOLD = 0.05f (low threshold for aggressive wake-up)
      KWS_SCORE = 3.0f (boost raised for weak-signal conditions)
      AUDIO_CAPTURE_GAIN = 1.5f (software pre-gain for the on-board mic)
  - kws/AudioCapture.kt:
      AudioSource: MIC → VOICE_RECOGNITION (enables system AGC)
      bufSize fix: 1280 → 12800 bytes (max(minBuf*2, frameBytes*4))
      Software pre-gain 1.5x + clamp to prevent overflow
  - kws/KwsStateMachine.kt: engine type KwsEngine → AsrEngine
  - WakeupForegroundService.kt: engine type KwsEngine → AsrEngine
  - MainActivity.kt: UI adaptation
  - protocol/BroadcastSender.kt: minor adjustments

Added sherpa-onnx Kotlin wrappers:
  - OfflineRecognizer.kt / OfflineStream.kt
  - Vad.kt / QnnConfig.kt
  (all imported from upstream sherpa-onnx master)
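
The buffer-size fix and the pre-gain clamp listed above come down to a little arithmetic. The following is a minimal Kotlin sketch, assuming the 16 kHz / 16-bit / 100 ms framing from Config.kt and assuming AudioRecord.getMinBufferSize returned 1280 on this board (inferred from the 1280 figure in the message). The names `captureBufferBytes` and `applyGain` are illustrative and do not appear in the commit.

```kotlin
// Sketch only: mirrors the arithmetic described in the commit message.
val sampleRate = 16_000             // Hz, same value as Config.SAMPLE_RATE
val frameSamples = sampleRate / 10  // 100 ms frame = 1600 samples
val bytesPerSample = 2              // 16-bit PCM

/** Ring-buffer size in bytes: max(minBuf * 2, frameBytes * 4). */
fun captureBufferBytes(minBuf: Int): Int {
    val frameBytes = frameSamples * bytesPerSample  // 3200 bytes per frame
    return maxOf(minBuf * 2, frameBytes * 4)
}

/** Applies a software gain to one sample and clamps it to the 16-bit range. */
fun applyGain(sample: Short, gain: Float): Short =
    (sample * gain).toInt()
        .coerceIn(Short.MIN_VALUE.toInt(), Short.MAX_VALUE.toInt())
        .toShort()
```

With a minimum buffer of 1280 bytes this yields 12800 bytes, the figure quoted above. A 1.5x gain on a sample of 30000 saturates at 32767 instead of wrapping around to a negative value.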

== Measured data (RK3588 + OrangePi CM5 on-board mic) ==
On-board mic signal characteristics:
  Silence noise floor peak ~5000 (abnormally high; real AC noise, not DC bias)
  Speech peak ~16000~32767
  SNR ~10dB
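
The ~10dB figure is consistent with a simple peak-amplitude ratio, although the commit does not say how it was computed. A small sketch under that assumption (`amplitudeRatioDb` is an illustrative name, not something from the project):

```kotlin
import kotlin.math.log10

/** Amplitude ratio in decibels: 20 * log10(signal / noise). */
fun amplitudeRatioDb(signalPeak: Double, noisePeak: Double): Double {
    require(signalPeak > 0 && noisePeak > 0) { "peaks must be positive" }
    return 20.0 * log10(signalPeak / noisePeak)
}
```

For a speech peak of 16000 against a noise floor of 5000, the ratio is 3.2, which works out to roughly 10.1 dB.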

Diagnostics:
  - Added a segment WAV dump (Config.DUMP_HIT_WAV) to verify that mic recording works
  - Offline RMS/peak/ZCR analysis to confirm real speech characteristics (RMS>1500, ZCR<0.07)
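
The offline analysis script itself is not part of this commit. As a rough sketch of what such a check could look like, the code below computes RMS, peak, and zero-crossing rate for a block of 16-bit PCM. The ZCR definition (fraction of adjacent sample pairs whose sign differs) and the names `FrameStats`, `frameStats`, and `looksLikeSpeech` are assumptions made for illustration; only the two thresholds come from the message above.

```kotlin
import kotlin.math.abs
import kotlin.math.sqrt

data class FrameStats(val rms: Double, val peak: Int, val zcr: Double)

/** Computes RMS, absolute peak, and zero-crossing rate of 16-bit PCM samples. */
fun frameStats(pcm: ShortArray): FrameStats {
    require(pcm.isNotEmpty()) { "pcm must not be empty" }
    var sumSquares = 0.0
    var peak = 0
    var crossings = 0
    for (i in pcm.indices) {
        val v = pcm[i].toInt()
        sumSquares += v.toDouble() * v
        peak = maxOf(peak, abs(v))
        // Count a crossing when this sample and the previous one differ in sign.
        if (i > 0 && (v >= 0) != (pcm[i - 1].toInt() >= 0)) crossings++
    }
    val zcr = if (pcm.size > 1) crossings.toDouble() / (pcm.size - 1) else 0.0
    return FrameStats(sqrt(sumSquares / pcm.size), peak, zcr)
}

/** Thresholds quoted in the commit message: RMS > 1500 and ZCR < 0.07. */
fun looksLikeSpeech(stats: FrameStats): Boolean = stats.rms > 1500 && stats.zcr < 0.07
```

A block that alternates sign on every sample has a ZCR of 1.0 and fails the check regardless of its level, which matches the intuition that broadband noise crosses zero far more often than voiced speech.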

Final recognition rate: described by the user in hands-on testing as "very high"
2026-04-30 18:33:14 +08:00
25 changed files with 2098 additions and 201 deletions

1
.gitignore vendored

@ -7,6 +7,7 @@
/.idea/workspace.xml
/.idea/navEditor.xml
/.idea/assetWizardSettings.xml
/.idea/vcs.xml
.DS_Store
/build
/captures


@ -106,6 +106,16 @@
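| Third-party evaluation | `/Volumes/LinuxDev/OrangePi_CM5_Project/docs/OrangePi_CM5/MD_Document/KWS唤醒方案适配评估_Unity.md` | Integration evaluation from the Unity app team (includes v2 tuning suggestions) |
| Project skeleton README | `/Volumes/LinuxDev/OrangePi_CM5_Project/docs/OrangePi_CM5/MD_Document/KWS-APK-工程骨架/README.md` | 9-step implementation guide (with roadmap and pitfall notes) |
### Upstream reference (optional clone, not committed to this repository)
To consult the sherpa-onnx Kotlin wrapper source or the native C++ implementation: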
```bash
git clone --depth 1 https://github.com/k2-fsa/sherpa-onnx ~/Desktop/sherpa-onnx-reference
```
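The `SherpaOnnxKws` demo lives in the `android/SherpaOnnxKws/` directory and can serve as a reference when adapting this project. **Do not commit it to the LilaWakeup_App repository**: it is an upstream open-source dependency (Apache 2.0) that is publicly available, and keeping it out avoids duplicate storage and license complications.
---
## 6. User preferences and code style (inherited from the main project)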

Binary file not shown.

Binary file not shown.


@ -0,0 +1,43 @@
n ǐ h ǎo l ì l ā @你好Lila
n ǐ h ǎo l í l ā @你好Lila
n ǐ h ǎo l ǐ l ā @你好Lila
n ǐ h ǎo l ī l ā @你好Lila
n ǐ h ǎo l à l ā @你好Lila
n ǐ h ǎo l ā l ā @你好Lila
n ǐ h ǎo l ì l à @你好Lila
n ǐ h ǎo l ì l á @你好Lila
n ǐ h ǎo l ǐ l à @你好Lila
n ǐ h ǎo m ǐ l ā @你好Lila
n ǐ h ǎo m ī l ā @你好Lila
n ǐ h ǎo m í l ā @你好Lila
n ǐ h ǎo m ì l ā @你好Lila
n ǐ h ǎo n ǐ l ā @你好Lila
n ǐ h ǎo n í l ā @你好Lila
n ǐ h ǎo n ī l ā @你好Lila
n ǐ h ǎo n ì l ā @你好Lila
n ǐ h ǎo l ái l ā @你好Lila
n ǐ h ǎo l ái n á @你好Lila
n ǐ h ǎo l ì n á @你好Lila
n ǐ h ǎo l í n á @你好Lila
n ǐ h ǎo l ǐ n á @你好Lila
n ǐ h ǎo m ǐ n á @你好Lila
n ǐ h ǎo m ī n á @你好Lila
n ǐ h ǎo m í n á @你好Lila
n ǐ h ǎo n ǐ n á @你好Lila
n ǐ h ǎo n ǐ n ǎ @你好Lila
n ǐ h ǎo n ǐ n à @你好Lila
n ǐ h ǎo l ì n ǎ @你好Lila
n ǐ h ǎo l ì n à @你好Lila
n ǐ h ǎo l e l ā @你好Lila
n ǐ h ǎo l ē l ā @你好Lila
n ǐ h ǎo y ī l ā @你好Lila
n ǐ h ǎo y ǐ l ā @你好Lila
n ǐ h ào l ì l ā @你好Lila
n í h ǎo l ì l ā @你好Lila
l ì l ā @Lila
l ǐ l ā @Lila
l í l ā @Lila
l ī l ā @Lila
m ī l ā @Lila
n ǐ n á @Lila
l ì n á @Lila
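
Each line above is a sequence of space-separated pinyin tokens followed by `@` and a display name. To make the layout concrete, here is a small parsing sketch. It is not how sherpa-onnx reads the file; `KeywordEntry` and `parseKeywordLine` are hypothetical names, and any other per-line annotations the upstream format may allow are deliberately not handled.

```kotlin
data class KeywordEntry(val tokens: List<String>, val displayName: String)

/** Parses "<token> <token> ... @<display name>"; returns null for lines that do not fit. */
fun parseKeywordLine(line: String): KeywordEntry? {
    val trimmed = line.trim()
    if (trimmed.isEmpty()) return null
    val at = trimmed.lastIndexOf('@')
    if (at < 0) return null  // this sketch requires a display name
    val tokens = trimmed.substring(0, at).trim()
        .split(Regex("\\s+"))
        .filter { it.isNotEmpty() }
    val name = trimmed.substring(at + 1).trim()
    return if (tokens.isEmpty() || name.isEmpty()) null else KeywordEntry(tokens, name)
}
```

Parsing the first line of the file gives eight tokens (`n`, `ǐ`, `h`, `ǎo`, `l`, `ì`, `l`, `ā`) and the display name `你好Lila`, which is the string the engine reports on a hit.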


@ -1,5 +0,0 @@
n ǐ h ǎo l ì l ā @你好Lila
h è l ōu l ì l ā @hello Lila
l ì l ā t óng x ué @Lila同学
l ì l ā n ǐ h ǎo @Lila你好
x i ǎo l ì l ā @小Lila

File diff suppressed because it is too large.


@ -0,0 +1,38 @@
package com.k2fsa.sherpa.onnx
class OfflineStream(var ptr: Long) {
fun acceptWaveform(samples: FloatArray, sampleRate: Int) =
acceptWaveform(ptr, samples, sampleRate)
fun setOption(key: String, value: String) = setOption(ptr, key, value)
fun getOption(key: String): String = getOption(ptr, key)
protected fun finalize() {
if (ptr != 0L) {
delete(ptr)
ptr = 0
}
}
fun release() = finalize()
fun use(block: (OfflineStream) -> Unit) {
try {
block(this)
} finally {
release()
}
}
private external fun acceptWaveform(ptr: Long, samples: FloatArray, sampleRate: Int)
private external fun setOption(ptr: Long, key: String, value: String)
private external fun getOption(ptr: Long, key: String): String
private external fun delete(ptr: Long)
companion object {
init {
System.loadLibrary("sherpa-onnx-jni")
}
}
}


@ -0,0 +1,7 @@
package com.k2fsa.sherpa.onnx
data class QnnConfig(
var backendLib: String = "",
var contextBinary: String = "",
var systemLib: String = "",
)


@ -0,0 +1,149 @@
// Copyright (c) 2023 Xiaomi Corporation
package com.k2fsa.sherpa.onnx
import android.content.res.AssetManager
data class SileroVadModelConfig(
var model: String = "",
var threshold: Float = 0.5F,
var minSilenceDuration: Float = 0.25F,
var minSpeechDuration: Float = 0.25F,
var windowSize: Int = 512,
var maxSpeechDuration: Float = 5.0F,
)
data class TenVadModelConfig(
var model: String = "",
var threshold: Float = 0.5F,
var minSilenceDuration: Float = 0.25F,
var minSpeechDuration: Float = 0.25F,
var windowSize: Int = 256,
var maxSpeechDuration: Float = 5.0F,
)
data class VadModelConfig(
var sileroVadModelConfig: SileroVadModelConfig = SileroVadModelConfig(),
var tenVadModelConfig: TenVadModelConfig = TenVadModelConfig(),
var sampleRate: Int = 16000,
var numThreads: Int = 1,
var provider: String = "cpu",
var debug: Boolean = false,
)
class SpeechSegment(val start: Int, val samples: FloatArray)
class Vad(
assetManager: AssetManager? = null,
var config: VadModelConfig,
) {
private var ptr: Long
init {
if (assetManager != null) {
ptr = newFromAsset(assetManager, config)
} else {
ptr = newFromFile(config)
}
}
protected fun finalize() {
if (ptr != 0L) {
delete(ptr)
ptr = 0
}
}
fun release() = finalize()
fun compute(samples: FloatArray): Float = compute(ptr, samples)
fun acceptWaveform(samples: FloatArray) = acceptWaveform(ptr, samples)
fun empty(): Boolean = empty(ptr)
fun pop() = pop(ptr)
fun front(): SpeechSegment {
return front(ptr)
}
fun clear() = clear(ptr)
fun isSpeechDetected(): Boolean = isSpeechDetected(ptr)
fun reset() = reset(ptr)
fun flush() = flush(ptr)
private external fun delete(ptr: Long)
private external fun newFromAsset(
assetManager: AssetManager,
config: VadModelConfig,
): Long
private external fun newFromFile(
config: VadModelConfig,
): Long
private external fun acceptWaveform(ptr: Long, samples: FloatArray)
private external fun compute(ptr: Long, samples: FloatArray): Float
private external fun empty(ptr: Long): Boolean
private external fun pop(ptr: Long)
private external fun clear(ptr: Long)
private external fun front(ptr: Long): SpeechSegment
private external fun isSpeechDetected(ptr: Long): Boolean
private external fun reset(ptr: Long)
private external fun flush(ptr: Long)
companion object {
init {
System.loadLibrary("sherpa-onnx-jni")
}
}
}
// Please visit
// https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx
// to download silero_vad.onnx
// and put it inside the assets/
// directory
//
// For ten-vad, please use
// https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/ten-vad.onnx
//
fun getVadModelConfig(type: Int): VadModelConfig? {
when (type) {
0 -> {
return VadModelConfig(
sileroVadModelConfig = SileroVadModelConfig(
model = "silero_vad.onnx",
threshold = 0.5F,
minSilenceDuration = 0.25F,
minSpeechDuration = 0.25F,
windowSize = 512,
),
sampleRate = 16000,
numThreads = 1,
provider = "cpu",
)
}
1 -> {
return VadModelConfig(
tenVadModelConfig = TenVadModelConfig(
model = "ten-vad.onnx",
threshold = 0.5F,
minSilenceDuration = 0.25F,
minSpeechDuration = 0.25F,
windowSize = 256,
),
sampleRate = 16000,
numThreads = 1,
provider = "cpu",
)
}
}
return null
}


@ -47,6 +47,12 @@ object Config {
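/** PAUSE/RESUME extras: call reason (passed through to logs) */
const val EXTRA_REASON = "reason"
/**
* Internal broadcast action, received only by this app; MainActivity uses it to show hits in real time.
* Isolated from the external protocol [ACTION_WAKEUP] so that external apps cannot interfere with internal UI state.
*/
const val ACTION_INTERNAL_WAKEUP = "com.lila.wakeup.action.INTERNAL_WAKEUP"
// ============================================================
// 2. Engine parameters
// ============================================================
@ -57,14 +63,10 @@ object Config {
/** Fallback timeout when the app fails to send RESUME (2 minutes) */
const val PAUSE_TIMEOUT_MS = 2 * 60 * 1000L
/** KWS primary threshold (main phrase "你好Lila") */
const val KWS_THRESHOLD_PRIMARY = 0.85f
/** KWS secondary threshold (other variant phrases) */
const val KWS_THRESHOLD_SECONDARY = 0.80f
/** Posterior smoothing: a hit requires N consecutive frames with confidence above the threshold */
const val SMOOTH_FRAMES = 2
// Note: KWS_THRESHOLD_PRIMARY / KWS_THRESHOLD_SECONDARY / SMOOTH_FRAMES have been removed
// - keywordsThreshold is now set by KWS_THRESHOLD below (the KeywordSpotterConfig default is 0.25)
// - No external posterior smoothing (sherpa-onnx already smooths internally via numTrailingBlanks)
// See the comments in KwsStateMachine.kt for the reasoning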
// ============================================================
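// 3. AudioRecord configuration
@ -79,24 +81,67 @@ object Config {
/** 16-bit PCM */
const val ENCODING = AudioFormat.ENCODING_PCM_16BIT
/** Samples per frame: 10ms frame @ 16kHz = 160 samples */
const val FRAME_SAMPLES = SAMPLE_RATE / 100
/**
* Samples per frame (100ms @ 16kHz = 1600 samples)
*
* Must match the official sherpa-onnx demo (100ms).
* With the earlier 10ms frames (160 samples), the first utterance was recognized and nothing hit afterwards:
* sherpa-onnx's internal chunk accumulation and boundary handling copes poorly with small frames. Stable after switching to 100ms.
*/
const val FRAME_SAMPLES = SAMPLE_RATE / 10
// ============================================================
// 4. Model paths (inside assets)
// ============================================================
/**
* Model directory, matching the official sherpa-onnx demo
* To switch models, change this line plus the matching keywords.txt
* Architecture history:
* - v0: KWS wenetspeech 3.3M (2024), default keywords, 30-40% hit rate
* - v1: KWS zh-en bilingual 3M, 0% hit rate on the English word "Lila"
* - v2: ASR (SenseVoice 228MB) + pinyin replacement, all-empty output on the on-board mic (model insensitive to short phrases)
* - v3: KWS Zipformer wenetspeech-3.3M + custom homophone keywords.txt, streaming architecture (current)
*
* Key v3 changes:
* keywords.txt covers multiple homophone variants ("你好丽啦/咪啦/丽拉/lai啦")
* Streaming real-time recognition; no segment splitting and no dependence on VAD / energy gate
* Model 5MB (vs 228MB for SenseVoice), inference 10x faster
*/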
const val MODEL_DIR = "sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01"
const val KWS_MODEL_DIR = "kws"
const val KWS_ENCODER = "kws/encoder.int8.onnx"
const val KWS_DECODER = "kws/decoder.onnx"
const val KWS_JOINER = "kws/joiner.int8.onnx"
const val KWS_TOKENS = "kws/tokens.txt"
const val KWS_KEYWORDS_FILE = "kws/keywords.txt"
const val MODEL_ENCODER = "$MODEL_DIR/encoder-epoch-12-avg-2-chunk-16-left-64.onnx"
const val MODEL_DECODER = "$MODEL_DIR/decoder-epoch-12-avg-2-chunk-16-left-64.onnx"
const val MODEL_JOINER = "$MODEL_DIR/joiner-epoch-12-avg-2-chunk-16-left-64.onnx"
const val MODEL_TOKENS = "$MODEL_DIR/tokens.txt"
const val MODEL_KEYWORDS = "$MODEL_DIR/keywords.txt"
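/** KWS inference thread count */
const val KWS_NUM_THREADS = 2
/**
* Keyword hit threshold (between 0 and 1); the lower it is, the more easily a keyword triggers
* Weak on-board mic signal + "Lila" not being a trained word led to an aggressive 0.05
* False-trigger risk: watch whether ambient noise wakes the device while idle in the listening state
*/
const val KWS_THRESHOLD = 0.05f
/**
* Keyword boosting score; the larger it is, the more easily a keyword triggers. Raised to 3.0 for weak-signal conditions
*/
const val KWS_SCORE = 3.0f
/**
* AudioCapture software pre-gain (applied before feeding KWS)
* The on-board mic signal is weak; a 1.5x gain brings speech peaks closer to full scale so the KWS network
* sees higher-contrast features. Clamped to [-32768, 32767] to prevent overflow distortion
*/
const val AUDIO_CAPTURE_GAIN = 1.5f
/**
* Debug: dump a clip of PCM around a hit to a WAV file to aid debugging
* Path: /sdcard/Android/data/com.lila.wakeup/files/lila_kws/
* Pull: adb pull /sdcard/Android/data/com.lila.wakeup/files/lila_kws/ ./
* Turn off before release
*/
const val DUMP_HIT_WAV = true
// ============================================================
// 5. Notification bar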


@ -1,15 +1,22 @@
package com.lila.wakeup
import android.Manifest
import android.content.BroadcastReceiver
import android.content.Context
import android.content.Intent
import android.content.IntentFilter
import android.content.pm.PackageManager
import android.os.Build
import android.os.Bundle
import android.text.method.ScrollingMovementMethod
import android.util.Log
import android.widget.TextView
import androidx.appcompat.app.AppCompatActivity
import androidx.core.app.ActivityCompat
import androidx.core.content.ContextCompat
import java.text.SimpleDateFormat
import java.util.Date
import java.util.Locale
/**
* Status viewer Activity (development aid + entry point for first-launch permission requests)
@ -30,21 +37,58 @@ class MainActivity : AppCompatActivity() {
}
private lateinit var statusText: TextView
private var wakeupCount = 0
private val timeFmt = SimpleDateFormat("HH:mm:ss", Locale.getDefault())
/**
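* Receiver for internal wake-up events, isolated from the external protocol broadcast [Config.ACTION_WAKEUP]
* Each event is appended to [statusText] so the user can see the result directly in the app UI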
*/
private val wakeupReceiver = object : BroadcastReceiver() {
override fun onReceive(context: Context, intent: Intent) {
if (intent.action != Config.ACTION_INTERNAL_WAKEUP) return
val keyword = intent.getStringExtra(Config.EXTRA_KEYWORD) ?: "?"
val confidence = intent.getFloatExtra(Config.EXTRA_CONFIDENCE, 0f)
val ts = intent.getLongExtra(Config.EXTRA_TIMESTAMP, System.currentTimeMillis())
val timeStr = timeFmt.format(Date(ts))
wakeupCount++
statusText.append("\n[$timeStr] #$wakeupCount 命中: $keyword (conf=$confidence)")
Log.i(TAG, "UI shown wakeup #$wakeupCount keyword=$keyword")
}
}
override fun onCreate(savedInstanceState: Bundle?) {
super.onCreate(savedInstanceState)
// Simplification: show status directly in a single TextView, no layout XML
// Simplification: a single scrollable TextView shows status + wake-up events
statusText = TextView(this).apply {
text = "Lila 语音唤醒服务\n\n初始化中..."
textSize = 18f
setPadding(60, 60, 60, 60)
movementMethod = ScrollingMovementMethod()
}
setContentView(statusText)
// Register the internal wake-up receiver (on Android 13+, declare it NOT_EXPORTED explicitly)
val filter = IntentFilter(Config.ACTION_INTERNAL_WAKEUP)
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.TIRAMISU) {
registerReceiver(wakeupReceiver, filter, RECEIVER_NOT_EXPORTED)
} else {
registerReceiver(wakeupReceiver, filter)
}
checkAndRequestPermissions()
}
override fun onDestroy() {
super.onDestroy()
try {
unregisterReceiver(wakeupReceiver)
} catch (_: IllegalArgumentException) {
// The receiver was never registered; ignore
}
}
private fun checkAndRequestPermissions() {
val needRequestList = mutableListOf<String>()


@ -12,8 +12,8 @@ import android.os.Build
import android.os.IBinder
import android.util.Log
import androidx.core.app.NotificationCompat
import com.lila.wakeup.kws.AsrEngine
import com.lila.wakeup.kws.AudioCapture
import com.lila.wakeup.kws.KwsEngine
import com.lila.wakeup.kws.KwsStateMachine
import com.lila.wakeup.protocol.BroadcastSender
import com.lila.wakeup.protocol.WakeupServiceLocator
@ -34,7 +34,7 @@ class WakeupForegroundService : Service() {
private const val TAG = "KwsService.Svc"
}
private lateinit var engine: KwsEngine
private lateinit var engine: AsrEngine
private lateinit var stateMachine: KwsStateMachine
private lateinit var audioCapture: AudioCapture
private lateinit var sender: BroadcastSender
@ -48,7 +48,7 @@ class WakeupForegroundService : Service() {
// Initialize the engine stack in order: sender → engine → state → audio
sender = BroadcastSender(applicationContext)
engine = KwsEngine(applicationContext).also { it.init() }
engine = AsrEngine(applicationContext).also { it.init() }
stateMachine = KwsStateMachine(
engine = engine,
sender = sender,


@ -0,0 +1,186 @@
package com.lila.wakeup.kws
import android.content.Context
import android.util.Log
import com.k2fsa.sherpa.onnx.FeatureConfig
import com.k2fsa.sherpa.onnx.KeywordSpotter
import com.k2fsa.sherpa.onnx.KeywordSpotterConfig
import com.k2fsa.sherpa.onnx.OnlineModelConfig
import com.k2fsa.sherpa.onnx.OnlineStream
import com.k2fsa.sherpa.onnx.OnlineTransducerModelConfig
import com.lila.wakeup.Config
import java.io.File
import java.io.FileOutputStream
import java.nio.ByteBuffer
import java.nio.ByteOrder
import java.util.concurrent.atomic.AtomicInteger
/**
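* Streaming keyword-spotter wake-up engine (v3 architecture)
*
* Workflow:
* 1. AudioCapture feeds 100ms PCM frames
* 2. KeywordSpotter (sherpa-onnx Zipformer-2 streaming KWS) receives audio continuously
* 3. The model matches the pinyin rules in keywords.txt in real time
* 4. A hit triggers the onWakeup callback immediately
*
* Advantages:
* - Streaming real-time recognition, 100~200ms latency (vs 1~2 seconds for segment-based ASR)
* - Model 5MB (SenseVoice was 228MB): fast startup, low memory
* - Trained specifically for short wake words, so the recognition rate is high
* - No longer depends on VAD / energy gate / DC removal
* - keywords.txt covers multiple variants ("你好丽啦/咪啦/丽拉..."); changing them requires no code change
*
* keywords.txt format:
* <pinyin token1> <token2> ... @<display name>
* e.g.: n ǐ h ǎo l ì l ā @你好Lila
*
* @param onWakeup hit callback (invoked synchronously from [feedAudio], on the thread that feeds audio)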
*/
class AsrEngine(
private val context: Context,
var onWakeup: ((keyword: String) -> Unit)? = null,
) {
companion object {
private const val TAG = "KwsService.Engine"
}
private var spotter: KeywordSpotter? = null
private var stream: OnlineStream? = null
private val dumpCounter = AtomicInteger(0)
fun init() {
Log.i(TAG, "[Engine] init start")
val config = KeywordSpotterConfig(
featConfig = FeatureConfig(sampleRate = Config.SAMPLE_RATE, featureDim = 80),
modelConfig = OnlineModelConfig(
transducer = OnlineTransducerModelConfig(
encoder = Config.KWS_ENCODER,
decoder = Config.KWS_DECODER,
joiner = Config.KWS_JOINER,
),
tokens = Config.KWS_TOKENS,
modelType = "zipformer2",
numThreads = Config.KWS_NUM_THREADS,
provider = "cpu",
),
keywordsFile = Config.KWS_KEYWORDS_FILE,
keywordsScore = Config.KWS_SCORE,
keywordsThreshold = Config.KWS_THRESHOLD,
numTrailingBlanks = 2,
)
spotter = KeywordSpotter(assetManager = context.assets, config = config)
stream = spotter!!.createStream()
Log.i(TAG, "[Engine] init done (KWS Zipformer + keywords.txt loaded)")
}
/**
* Feed one frame of PCM data for streaming recognition; a hit invokes the callback immediately
*/
fun feedAudio(pcm: ShortArray) {
val s = stream ?: return
val sp = spotter ?: return
// PCM short → float [-1, 1]
val samples = FloatArray(pcm.size) { pcm[it] / 32768f }
s.acceptWaveform(samples, Config.SAMPLE_RATE)
// Keep decoding until no more data is ready
while (sp.isReady(s)) {
sp.decode(s)
}
// Check for a hit
val result = sp.getResult(s)
if (result.keyword.isNotEmpty()) {
Log.i(TAG, "[KWS] HIT keyword=\"${result.keyword}\" tokens=${result.tokens.joinToString(",")}")
onWakeup?.invoke(result.keyword)
// After a hit, reset the stream and wait for the next wake-up
sp.reset(s)
// On a hit, dump the most recent PCM to WAV (for diagnostics)
if (Config.DUMP_HIT_WAV) {
dumpHitWav(pcm)
}
}
}
/**
* Reset recognition state; called on PAUSE/RESUME transitions
*/
fun reset() {
val s = stream ?: return
val sp = spotter ?: return
sp.reset(s)
Log.i(TAG, "[Engine] reset (stream cleared)")
}
/**
* Release native resources
*/
fun release() {
try {
stream?.release()
spotter?.release()
} catch (e: Exception) {
Log.e(TAG, "release failed: ${e.message}")
}
stream = null
spotter = null
Log.i(TAG, "[Engine] released")
}
// ============================================================
// WAV dump (for diagnostics)
// ============================================================
/**
* On a hit, dump the most recent PCM frame to a WAV file
*
* Path: /sdcard/Android/data/com.lila.wakeup/files/lila_kws/hit_NNN.wav
* Pull: adb pull /sdcard/Android/data/com.lila.wakeup/files/lila_kws/ ./
*/
private fun dumpHitWav(pcm: ShortArray) {
try {
val baseDir = context.getExternalFilesDir(null) ?: context.filesDir
val dir = File(baseDir, "lila_kws")
if (!dir.exists()) dir.mkdirs()
val idx = dumpCounter.incrementAndGet()
val file = File(dir, "hit_${idx.toString().padStart(3, '0')}.wav")
val dataBytes = pcm.size * 2
val totalBytes = 36 + dataBytes
val byteRate = Config.SAMPLE_RATE * 2
val header = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN).apply {
put("RIFF".toByteArray())
putInt(totalBytes)
put("WAVE".toByteArray())
put("fmt ".toByteArray())
putInt(16)
putShort(1)
putShort(1)
putInt(Config.SAMPLE_RATE)
putInt(byteRate)
putShort(2)
putShort(16)
put("data".toByteArray())
putInt(dataBytes)
}.array()
FileOutputStream(file).use { fos ->
fos.write(header)
val data = ByteBuffer.allocate(dataBytes).order(ByteOrder.LITTLE_ENDIAN)
for (s in pcm) data.putShort(s)
fos.write(data.array())
}
Log.i(TAG, "[Dump] hit WAV saved: ${file.absolutePath}")
} catch (e: Exception) {
Log.e(TAG, "[Dump] failed: ${e.message}", e)
}
}
}
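
The feed, decode, check, and reset cycle in `feedAudio` can be exercised without the native library. The sketch below swaps the real spotter for a hypothetical `StreamingSpotter` interface and a toy `FakeSpotter`; neither is part of sherpa-onnx, and both exist only to show the control flow and why the reset after a hit matters.

```kotlin
/** Hypothetical stand-in for the spotter + stream pair; not the sherpa-onnx API. */
interface StreamingSpotter {
    fun acceptWaveform(samples: FloatArray)
    fun isReady(): Boolean
    fun decode()
    fun currentKeyword(): String  // empty when nothing has hit
    fun reset()
}

/** Converts 16-bit PCM to floats in [-1, 1). */
fun pcmToFloat(pcm: ShortArray): FloatArray = FloatArray(pcm.size) { pcm[it] / 32768f }

/** Feeds one frame, drains the decoder, and reports a hit at most once per frame. */
fun feedFrame(spotter: StreamingSpotter, pcm: ShortArray, onWakeup: (String) -> Unit) {
    spotter.acceptWaveform(pcmToFloat(pcm))
    while (spotter.isReady()) spotter.decode()
    val keyword = spotter.currentKeyword()
    if (keyword.isNotEmpty()) {
        onWakeup(keyword)
        spotter.reset()  // without this, the same hit would be reported on every later frame
    }
}

/** Toy spotter for demonstration: reports a hit once three frames have been decoded. */
class FakeSpotter : StreamingSpotter {
    private var pending = 0
    private var decoded = 0
    override fun acceptWaveform(samples: FloatArray) { pending++ }
    override fun isReady() = pending > 0
    override fun decode() { pending--; decoded++ }
    override fun currentKeyword() = if (decoded >= 3) "你好Lila" else ""
    override fun reset() { decoded = 0 }
}
```

Feeding seven frames into `FakeSpotter` produces two callbacks, on the third and sixth frames. If the `reset()` call is removed, every frame from the third onward reports a hit.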


@ -10,13 +10,16 @@ import com.lila.wakeup.Config
/**
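* Microphone audio capture wrapper
*
* Design notes
* - AudioSource VOICE_RECOGNITION: enables NS/AGC automatically, so no custom noise reduction is needed
* - 16kHz / mono / 16-bit PCM, strictly aligned with the sherpa-onnx model
* - 10ms frames (160 samples), convenient for aligning with VAD
* - On pause, AudioRecord is fully released to free the microphone for the RTC SDK
* Design notes:
* - AudioSource VOICE_RECOGNITION: enables Android system AGC + NS (automatic gain + noise suppression)
* The on-board mic signal is weak (speech peak 16000 vs noise floor 5000, SNR ~10dB); with the raw MIC signal,
* SenseVoice returned entirely empty output. VOICE_RECOGNITION follows the system's standard ASR path, and hardware AGC
* lifts the weak signal into a reasonable range; the measured ASR hit rate improved markedly
* - 16kHz / mono / 16-bit PCM (strictly aligned with the sherpa-onnx model)
* - 100ms frames (1600 samples); frames that are too small cause chunk-boundary problems inside sherpa-onnx
* - On pause, AudioRecord is fully released to free the microphone for the RTC SDK
*
* This is a resident reader thread, not the ForegroundService main thread
* Note: this is a resident reader thread, not the ForegroundService main thread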
*/
class AudioCapture(
/** Called for each PCM frame read; the argument is 16-bit PCM short[] */
@ -40,11 +43,14 @@ class AudioCapture(
val minBuf = AudioRecord.getMinBufferSize(
Config.SAMPLE_RATE, Config.CHANNEL_CONFIG, Config.ENCODING
)
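// At least a 100ms ring buffer, larger than minBuf, to avoid dropping audio frames
val bufSize = maxOf(minBuf, Config.SAMPLE_RATE * 2 * 100 / 1000)
// The ring buffer must hold at least one frame; otherwise read(1600) against a 1280-byte buffer cannot return a full frame,
// and the broken frames mean VAD/ASR receives partial data plus stale history. Use max(minBuf*2, frameBytes*4)
// to give 100ms frames ample headroom without dropping frames.
val frameBytes = Config.FRAME_SAMPLES * 2 // 16-bit PCM = 2 bytes per sample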
val bufSize = maxOf(minBuf * 2, frameBytes * 4)
record = AudioRecord.Builder()
.setAudioSource(MediaRecorder.AudioSource.VOICE_RECOGNITION)
.setAudioSource(MediaRecorder.AudioSource.VOICE_RECOGNITION) // enables system AGC/NS
.setAudioFormat(
AudioFormat.Builder()
.setSampleRate(Config.SAMPLE_RATE)
@ -67,9 +73,31 @@ class AudioCapture(
thread = Thread {
val frame = ShortArray(Config.FRAME_SAMPLES)
val gain = Config.AUDIO_CAPTURE_GAIN
var frameCount = 0L
while (running) {
val n = record?.read(frame, 0, frame.size) ?: 0
if (n > 0) {
// Software pre-gain: the on-board mic signal is weak, so amplify it to give KWS more pronounced features
if (gain != 1.0f) {
for (i in 0 until n) {
val v = (frame[i] * gain).toInt()
frame[i] = when {
v > 32767 -> 32767
v < -32768 -> -32768
else -> v.toShort()
}
}
}
// Log the peak level once every 10 frames (1 second)
if (++frameCount % 10 == 0L) {
var peak = 0
for (i in 0 until n) {
val v = if (frame[i] >= 0) frame[i].toInt() else -frame[i].toInt()
if (v > peak) peak = v
}
Log.i(TAG, "[Audio] frame#$frameCount n=$n peak=$peak (gain=${gain}x)")
}
onFrame(frame)
} else if (n < 0) {
Log.e(TAG, "AudioRecord.read error code=$n, abort")


@ -1,119 +0,0 @@
package com.lila.wakeup.kws
import android.content.Context
import android.util.Log
import com.k2fsa.sherpa.onnx.KeywordSpotter
import com.k2fsa.sherpa.onnx.KeywordSpotterConfig
import com.k2fsa.sherpa.onnx.OnlineModelConfig
import com.k2fsa.sherpa.onnx.OnlineStream
import com.k2fsa.sherpa.onnx.OnlineTransducerModelConfig
import com.lila.wakeup.Config
/**
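* sherpa-onnx KWS engine wrapper
*
* A thin wrapper over the official sherpa-onnx [KeywordSpotter] Kotlin wrapper;
* it hides low-level API details and exposes a simple [process] interface
*
* Model path: [Config.MODEL_DIR], loaded from APK assets
*
* Version compatibility: this skeleton targets the sherpa-onnx v1.13.0+ Kotlin API
* If the API of the cloned sherpa-onnx repository differs (e.g. renamed classes), only this file needs adjusting
*
* Business result data class
*
* @param keyword text of the wake word that hit, e.g. "你好Lila"; empty string when nothing hit
* @param confidence confidence, in [0,1]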
*/
data class KwsResult(
val keyword: String,
val confidence: Float
) {
val isHit: Boolean get() = keyword.isNotEmpty()
}
class KwsEngine(private val context: Context) {
companion object {
private const val TAG = "KwsService.Engine"
}
private var spotter: KeywordSpotter? = null
private var stream: OnlineStream? = null
/**
* Load the model + keywords.txt and initialize the inference engine
* Must be called once before [process]
*/
fun init() {
val cfg = KeywordSpotterConfig(
featConfig = com.k2fsa.sherpa.onnx.FeatureConfig(
sampleRate = Config.SAMPLE_RATE,
featureDim = 80
),
modelConfig = OnlineModelConfig(
transducer = OnlineTransducerModelConfig(
encoder = Config.MODEL_ENCODER,
decoder = Config.MODEL_DECODER,
joiner = Config.MODEL_JOINER
),
tokens = Config.MODEL_TOKENS,
modelType = "zipformer"
),
keywordsFile = Config.MODEL_KEYWORDS,
keywordsThreshold = Config.KWS_THRESHOLD_PRIMARY,
keywordsScore = 1.5f,
numTrailingBlanks = 2
)
spotter = KeywordSpotter(assetManager = context.assets, config = cfg)
stream = spotter?.createStream()
Log.i(TAG, "[Engine] init done, keywordsFile=${Config.MODEL_KEYWORDS}")
}
/**
* Feed one PCM frame into the engine and return whether it hit
*
* @param pcm 16-bit PCM short[] data of any length, typically 10ms = 160 samples
* @return [KwsResult.isHit] is true on a hit
*/
fun process(pcm: ShortArray): KwsResult {
val sp = spotter ?: return KwsResult("", 0f)
val st = stream ?: return KwsResult("", 0f)
// sherpa-onnx expects float input in the range [-1, 1]
val floatBuf = FloatArray(pcm.size) { pcm[it] / 32768f }
st.acceptWaveform(floatBuf, Config.SAMPLE_RATE)
while (sp.isReady(st)) {
sp.decode(st)
}
val result = sp.getResult(st)
val keyword = result.keyword
return if (keyword.isNotEmpty()) {
// Must reset after a hit, otherwise it will not trigger again
sp.reset(st)
Log.i(TAG, "[Engine] hit keyword=$keyword")
KwsResult(keyword, 1f) // sherpa-onnx does not expose confidence directly; it is controlled indirectly via keyword_score
} else {
KwsResult("", 0f)
}
}
/**
* Release native resources; call from Service.onDestroy
*/
fun release() {
try {
stream?.release()
spotter?.release()
} catch (e: Exception) {
Log.e(TAG, "release failed: ${e.message}")
}
stream = null
spotter = null
Log.i(TAG, "[Engine] released")
}
}


@ -10,23 +10,39 @@ import java.util.concurrent.atomic.AtomicReference
/**
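* KWS state machine: the central controller for the protocol v2.1 rules
*
* Three states
* - [State.Idle]: service starting up (very brief)
* - [State.Listening]: listening for the wake word; a hit sends the WAKEUP broadcast
* - [State.Paused]: paused by the app (with a silence_ms secondary safety window)
* Three states:
* - [State.Idle]: service starting up (very brief)
* - [State.Listening]: listening for the wake word; a hit sends the WAKEUP broadcast
* - [State.Paused]: paused by the app (with a silence_ms secondary safety window)
*
* silence_ms implementation
* - On PAUSE, enter Paused immediately and record silenceUntilMs
* - If RESUME arrives before silenceUntilMs, the actual resume waits until silenceUntilMs has passed (prevents the AI from waking itself)
* silence_ms implementation:
* - On PAUSE, enter Paused immediately and record silenceUntilMs
* - If RESUME arrives before silenceUntilMs, the actual resume waits until silenceUntilMs has passed (prevents the AI from waking itself)
* - Hits during Paused are ignored
*
* Fallback timeout: if the app fails to send RESUME, auto-resume after 2 minutes (see Config.PAUSE_TIMEOUT_MS)
* Fallback timeout: if the app fails to send RESUME, auto-resume after 2 minutes (see Config.PAUSE_TIMEOUT_MS)
*
* Posterior smoothing was removed by design: sherpa-onnx already smooths internally via numTrailingBlanks,
* and stacking external smoothing on top causes missed hits (the stream is reset after a hit, so the next frame will not hit again)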
*/
class KwsStateMachine(
private val engine: KwsEngine,
private val engine: AsrEngine,
private val sender: BroadcastSender,
private val onTimeoutResume: () -> Unit = {}
) {
init {
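// The engine reports hits through a callback; forward the keyword to sender
// Forward only in the Listening state; hits during Paused are ignored (audio was already processed, but the state has changed)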
engine.onWakeup = { keyword ->
if (state.get() is State.Listening) {
Log.i(TAG, "[KWS] HIT keyword=$keyword")
sender.sendWakeup(keyword, confidence = 1.0f)
} else {
Log.d(TAG, "[KWS] hit ignored (state != LISTENING) keyword=$keyword")
}
}
}
companion object {
private const val TAG = "KwsService.State"
}
@ -37,38 +53,31 @@ class KwsStateMachine(
data class Paused(val silenceUntilMs: Long, val reason: String) : State()
}
/** Lock-free state transitions via Atomic avoid read/write races */
/** Lock-free state transitions via Atomic, avoiding read/write races */
private val state = AtomicReference<State>(State.Idle)
private val mainHandler = Handler(Looper.getMainLooper())
private var timeoutRunnable: Runnable? = null
private var deferredResumeRunnable: Runnable? = null
/** Smoothing window: WAKEUP is sent only after N consecutive hit frames (v2.1 posterior smoothing) */
private var consecutiveHits = 0
fun transitionToListening() {
state.set(State.Listening)
consecutiveHits = 0
Log.i(TAG, "[State] -> LISTENING")
}
/**
* Called when the KWS_PAUSE broadcast is received
*
* @param silenceMs silence period in milliseconds, default [Config.DEFAULT_SILENCE_MS]
* @param reason call reason, logging only
* @param silenceMs silence period (milliseconds), default [Config.DEFAULT_SILENCE_MS]
* @param reason call reason (logging only)
*/
fun onPauseReceived(silenceMs: Long, reason: String) {
val until = System.currentTimeMillis() + silenceMs
state.set(State.Paused(until, reason))
consecutiveHits = 0
Log.i(TAG, "[State] -> PAUSED reason=$reason silence_ms=$silenceMs until=$until")
// Cancel the previous deferred resume, if any
// Reset the engine stream to clear audio accumulated before the pause (avoids re-recognizing old audio after resume)
engine.reset()
cancelDeferredResume()
// Start the 2-minute fallback timeout
scheduleTimeoutResume()
}
@ -76,12 +85,10 @@ class KwsStateMachine(
* Called when the KWS_RESUME broadcast is received
*/
fun onResumeReceived(reason: String) {
val current = state.get()
when (current) {
when (val current = state.get()) {
is State.Paused -> {
val now = System.currentTimeMillis()
if (now < current.silenceUntilMs) {
// Still inside the silence window: defer the actual resume until the window ends
val delay = current.silenceUntilMs - now
Log.i(TAG, "[State] RESUME deferred ${delay}ms (still in silence)")
deferredResumeRunnable = Runnable {
@ -96,7 +103,6 @@ class KwsStateMachine(
}
}
is State.Listening -> {
// Idempotent: already listening, no side effects
Log.d(TAG, "[State] RESUME ignored, already LISTENING")
}
is State.Idle -> {
@ -108,26 +114,12 @@ class KwsStateMachine(
}
/**
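* Called once per frame by AudioCapture; in the Listening state, feeds the engine and checks for a hit
* Called once per frame by AudioCapture; in the Listening state, feeds the streaming KWS engine,
* and hits are delivered through the engine.onWakeup callback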
*/
fun onAudioFrame(pcm: ShortArray) {
if (state.get() !is State.Listening) return
val result = engine.process(pcm)
if (result.isHit) {
consecutiveHits++
if (consecutiveHits >= Config.SMOOTH_FRAMES) {
// Hit and passed smoothing → send the WAKEUP broadcast
Log.i(TAG, "[KWS] HIT confirmed keyword=${result.keyword} smooth=$consecutiveHits")
sender.sendWakeup(result.keyword, result.confidence)
consecutiveHits = 0
} else {
Log.d(TAG, "[KWS] hit pending smooth=$consecutiveHits/${Config.SMOOTH_FRAMES}")
}
} else {
// No hit: reset the counter
if (consecutiveHits > 0) consecutiveHits = 0
}
engine.feedAudio(pcm)
}
fun shutdown() {
@ -137,13 +129,13 @@ class KwsStateMachine(
}
// ============================================================
// Internal fallback timer
// Internal: fallback timer
// ============================================================
private fun scheduleTimeoutResume() {
cancelTimeoutResume()
timeoutRunnable = Runnable {
Log.w(TAG, "[Tmout] PAUSE 超时${Config.PAUSE_TIMEOUT_MS / 1000}s强制 RESUME")
Log.w(TAG, "[Tmout] PAUSE 超时(${Config.PAUSE_TIMEOUT_MS / 1000}s),强制 RESUME")
transitionToListening()
onTimeoutResume()
}.also {
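
The silence-window rule in `onResumeReceived` reduces to one comparison: a RESUME that arrives before `silenceUntilMs` is deferred by the remaining time, and anything later resumes at once. A minimal sketch of that decision as a pure function follows; `ResumeDecision` and `decideResume` are illustrative names, not part of this change.

```kotlin
sealed class ResumeDecision {
    /** The silence window has ended; listening can resume immediately. */
    object ResumeNow : ResumeDecision()

    /** Still inside the silence window; resume after [delayMs] milliseconds. */
    data class Defer(val delayMs: Long) : ResumeDecision()
}

/** Decides how to handle a RESUME received at [nowMs] while paused until [silenceUntilMs]. */
fun decideResume(nowMs: Long, silenceUntilMs: Long): ResumeDecision =
    if (nowMs < silenceUntilMs) ResumeDecision.Defer(silenceUntilMs - nowMs)
    else ResumeDecision.ResumeNow
```

Keeping the decision separate from the `Handler` scheduling makes the timing rule straightforward to unit test without an Android main looper.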


@ -23,14 +23,24 @@ class BroadcastSender(private val context: Context) {
fun sendWakeup(keyword: String, confidence: Float) {
val timestamp = System.currentTimeMillis()
val intent = Intent(Config.ACTION_WAKEUP).apply {
// 1. External protocol broadcast: sent to the digital-human app (com.qy.lila)
val externalIntent = Intent(Config.ACTION_WAKEUP).apply {
setPackage(Config.APP_PACKAGE) // v2.1 requires setPackage in both directions
putExtra(Config.EXTRA_KEYWORD, keyword)
putExtra(Config.EXTRA_TIMESTAMP, timestamp)
putExtra(Config.EXTRA_CONFIDENCE, confidence)
}
context.sendBroadcast(externalIntent)
// 2. Internal broadcast: received by this app's MainActivity to show hits in the UI in real time
val internalIntent = Intent(Config.ACTION_INTERNAL_WAKEUP).apply {
setPackage(Config.SELF_PACKAGE) // received only by this app, so it does not leak to other apps
putExtra(Config.EXTRA_KEYWORD, keyword)
putExtra(Config.EXTRA_TIMESTAMP, timestamp)
putExtra(Config.EXTRA_CONFIDENCE, confidence)
}
context.sendBroadcast(internalIntent)
context.sendBroadcast(intent)
Log.i(TAG, "WAKEUP -> ${Config.APP_PACKAGE} keyword=$keyword conf=$confidence ts=$timestamp")
}
}