diff --git a/API相关/语音合成大模型-单向流式websocket-V3-支持复刻混音mix.md b/API相关/语音合成大模型-单向流式websocket-V3-支持复刻混音mix.md
new file mode 100644
index 0000000..fd679ef
--- /dev/null
+++ b/API相关/语音合成大模型-单向流式websocket-V3-支持复刻混音mix.md
@@ -0,0 +1,1205 @@
+
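+# 1 API overview
+The unidirectional streaming API converts text to speech. It supports multiple languages and dialects and streams its output over the WebSocket protocol.
+
+## 1.1 Best practices
+Reuse connections where possible; this reduces latency by about 70 ms.
+Compared with the v1 unidirectional streaming API, latency should in theory improve by a few tens of milliseconds. The gain differs from voice to voice, so rely on your own test results.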
+
+# 2 API reference
+
+## 2.1 Request
+
+### Request path
+`wss://openspeech.bytedance.com/api/v3/tts/unidirectional/stream`
+
+### Connection and authentication
+
+#### Request Headers
+
+| | | | | \
+|Key |Description |Required |Example value |
+|---|---|---|---|
+| | | | | \
+|X-Api-App-Id |\
+| |The APP ID obtained from the Volcengine console; see [Console FAQ Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F) |Yes |\
+| | | |your-app-id |\
+| | | | |
+| | | | | \
+|X-Api-Access-Key |\
+| |The Access Token obtained from the Volcengine console; see [Console FAQ Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F) |Yes |\
+| | | |your-access-key |\
+| | | | |
+| | | | | \
+|X-Api-Resource-Id |\
+| |Resource ID of the service being called |\
+| | |\
+| |* Doubao speech synthesis model 1.0: |\
+| | * seed-tts-1.0 or volc.service_type.10029 (character-billed edition) |\
+| | * seed-tts-1.0-concurr or volc.service_type.10048 (concurrency edition) |\
+| |* Doubao speech synthesis model 2.0: |\
+| | * seed-tts-2.0 (character-billed edition) |\
+| |* Voice cloning: |\
+| | * seed-icl-1.0 (Voice Cloning 1.0, character-billed edition) |\
+| | * seed-icl-1.0-concurr (Voice Cloning 1.0, concurrency edition) |\
+| | * seed-icl-2.0 (Voice Cloning 2.0, character-billed edition) |\
+| | |\
+| |**Note:** |\
+| | |\
+| |* The resource IDs for "Doubao speech synthesis model 1.0" apply only to ["Doubao speech synthesis model 1.0" voices](https://www.volcengine.com/docs/6561/1257544) |\
+| |* The resource IDs for "Doubao speech synthesis model 2.0" apply only to ["Doubao speech synthesis model 2.0" voices](https://www.volcengine.com/docs/6561/1257544) |Yes |\
+| | | |* Doubao speech synthesis model 1.0: |\
+| | | | * seed-tts-1.0 |\
+| | | | * seed-tts-1.0-concurr |\
+| | | |* Doubao speech synthesis model 2.0: |\
+| | | | * seed-tts-2.0 |\
+| | | |* Voice cloning: |\
+| | | | * seed-icl-1.0 (Voice Cloning 1.0, character-billed edition) |\
+| | | | * seed-icl-1.0-concurr (Voice Cloning 1.0, concurrency edition) |\
+| | | | * seed-icl-2.0 (Voice Cloning 2.0, character-billed edition) |
+| | | | | \
+|X-Api-Request-Id |Client-side request ID, a random UUID string |No |67ee89ba-7050-4c04-a3d7-ac61a63499b3 |
+| | | | | \
+|X-Control-Require-Usage-Tokens-Return |Controls whether the usage consumed by the request is returned. When this header is present, the SessionFinished event (152) carries usage data |No |* Set to * to return all currently supported usage data. |\
+| | | |* Can also be set to specific usage markers such as text_words; separate multiple markers with commas |\
+| | | |* Currently supported usage data |\
+| | | | * text_words: the number of billed characters |
+
+
+#### Response Headers
+
+| | | | \
+|Key |Description |Example value |
+|---|---|---|
+| | | | \
+|X-Tt-Logid |The logid returned by the server. Capture and log it to make troubleshooting easier |2025041513355271DF5CF1A0AE0508E78C |
+
+
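+### WebSocket binary protocol
+WebSocket messages are transmitted using a binary protocol.
+Each message consists of three parts: a variable-length header of at least 4 bytes, the payload size, and the payload, where
+
+* the header describes the message type, serialization method, compression format, and related information;
+* the payload size is the length of the payload;
+* the payload is the actual content, which differs by message type.
+
+Note: all integer fields in the protocol are encoded in **big-endian** byte order.
+
+##### Binary frame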
+
+| | | | | \
+|Byte |Left 4-bit |Right 4-bit |Description |
+|---|---|---|---|
+| | | | | \
+|0 - Left half |Protocol version | |Only v1 exists for now; always 0b0001 |
+| | | | | \
+|0 - Right half | |Header size (4x) |Only 4 bytes for now; always 0b0001 |
+| | | | | \
+|1 - Left half |Message type | |Always 0b0001 |
+| | | | | \
+|1 - Right half | |Message type specific flags |0b0000 for sendText |\
+| | | |0b0100 for finishConnection |
+| | | | | \
+|2 - Left half |Serialization method | |0b0000: Raw (no special serialization, mainly for binary audio data); 0b0001: JSON (mainly for text messages) |
+| | | | | \
+|2 - Right half | |Compression method |0b0000: no compression; 0b0001: gzip |
+| | || | \
+|3 |Reserved | |Left empty (0b0000 0000) |
+| | || | \
+|[4 ~ 7] |[Optional field,like event number,...] | |Present or absent depending on the message type specific flags |
+| | || | \
+|... |Payload | |May be audio data, text data, or mixed audio and text data |
+
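+The request-side layout in the table above can be sketched in a few lines of Python. This is a minimal illustration rather than an official client: the function name `build_send_text_frame` and the constant names are hypothetical, and the sketch covers only the SendText case (JSON payload, no compression, no event number).
+
+```python
+import json
+import struct
+
+# Field values taken from the binary frame table; the names are illustrative.
+PROTOCOL_VERSION = 0b0001      # v1
+HEADER_SIZE = 0b0001           # header length in units of 4 bytes
+FULL_CLIENT_REQUEST = 0b0001   # message type for client requests
+FLAG_NO_EVENT = 0b0000         # sendText carries no event number
+SERIALIZATION_JSON = 0b0001
+COMPRESSION_NONE = 0b0000
+
+
+def build_send_text_frame(payload: dict) -> bytes:
+    """Return header (4 bytes) + big-endian payload size (4 bytes) + JSON payload."""
+    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
+    header = bytes([
+        (PROTOCOL_VERSION << 4) | HEADER_SIZE,
+        (FULL_CLIENT_REQUEST << 4) | FLAG_NO_EVENT,
+        (SERIALIZATION_JSON << 4) | COMPRESSION_NONE,
+        0x00,  # reserved
+    ])
+    return header + struct.pack(">I", len(body)) + body
+
+
+frame = build_send_text_frame({
+    "user": {"uid": "12345"},
+    "req_params": {"text": "hello", "speaker": "zh_female_shuangkuaisisi_moon_bigtts"},
+})
+print(frame[:4].hex())  # 11101000
+```
+
+`struct.pack(">I", ...)` is what enforces the big-endian requirement for the payload size field.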
+
+###### Payload request parameters
+
+| | | | | | \
+|Field |Description |Required |Type |Default |
+|---|---|---|---|---|
+| | | | | | \
+|user |User information | | | |
+| | | | | | \
+|user.uid |User ID | | | |
+| | | | | | \
+|event |Event of the request | | | |
+| | | | | | \
+|namespace |Request method | |string |BidirectionalTTS |
+| | | | | | \
+|req_params.text |Input text | |string | |
+| | | | | | \
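+|req_params.model |\
+| |Model version. Passing `seed-tts-1.1` gives better audio quality and lower latency than the default version; omit the field to get the default behavior. |\
+| |Note: in voice-cloning scenarios the 1.1 model amplifies the characteristics of the training-audio prompt, so it is more demanding of the prompt. High-quality training audio yields better audio quality. |\
+| | |\
+| |The following values take effect only for Voice Cloning 2.0 voices, that is, voices whose ID starts with `saturn_`. The parameter takes one of two values: |\
+| | |\
+| |* `seed-tts-2.0-expressive`: more expressive and supports the QA and CoT capabilities, but results may vary from run to run. |\
+| |* `seed-tts-2.0-standard`: more stable in expressiveness, but does not support QA or CoT. A request that uses QA or CoT with this model is rejected. |\
+| |* If `model` is omitted, `seed-tts-2.0-expressive` is used by default. | |string |\
+| | | | | |
+| | | | | | \
+|req_params.ssml |* When the text is in SSML format, assign it to `ssml`; it then takes precedence over `text`. At least one of `ssml` and `text` must be non-empty |\
+| |* Not yet supported by ["Doubao speech synthesis model 2.0" voices](https://www.volcengine.com/docs/6561/1257544) |\
+| |* Not yet supported by Doubao voice cloning model 2.0 (ICL 2.0) voices | |string | |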
+| | | | | | \
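+|req_params.speaker |Voice to use; see the [voice list](https://www.volcengine.com/docs/6561/1257544) |√ |string | |
+| | | | | | \
+|req_params.audio_params |Audio parameters; specifying them saves audio decoding time on the service side |√ |object | |
+| | | | | | \
+|req_params.audio_params.format |Audio encoding format: mp3/ogg_opus/pcm. Passing wav does not cause an error, but in streaming scenarios the wav header is returned multiple times, so pcm is recommended in that case. | |string |mp3 |
+| | | | | | \
+|req_params.audio_params.sample_rate |Audio sample rate. Allowed values: [8000,16000,22050,24000,32000,44100,48000] | |number |24000 |
+| | | | | | \
+|req_params.audio_params.bit_rate |Audio bit rate, for example 16000 or 32000. |\
+| |By default bit_rate is limited to 64k~160k; after passing disable_default_bit_rate as true it can be set below 64k |\
+| |Go example: `additions = fmt.Sprintf("{\"disable_default_bit_rate\":true}")` |\
+| |Note: bit_rate applies only to the MP3 format. For wav, as for pcm, bit rate (bps) = sample rate × bit depth × number of channels |\
+| |Large-model TTS currently only allows the sample rate to be changed, so for wav the bit rate can only be changed through the sample rate | |number | |
+| | | | | | \
+|req_params.audio_params.emotion |Sets the emotion of the voice. Example: "emotion": "angry" |\
+| |Note: only some voices currently support emotions, and the supported emotions differ between voices. |\
+| |See: [Large-model speech synthesis API, voice list, multi-emotion voices](https://www.volcengine.com/docs/6561/1257544) | |string | |
+| | | | | | \
+|req_params.audio_params.emotion_scale |After setting emotion, use emotion_scale to further tune the emotion intensity. Range 1~5; defaults to 4 when not set. |\
+| |Note: in theory a larger value makes the emotion more pronounced. The scale is nonlinear, however, so beyond some value the increase may be barely noticeable; for example, 3 and 5 may sound similar. | |number |4 |
+| | | | | | \
+|req_params.audio_params.speech_rate |Speech rate, range [-50,100]; 100 means 2.0x speed and -50 means 0.5x speed | |number |0 |
+| | | | | | \
+|req_params.audio_params.loudness_rate |Volume, range [-50,100]; 100 means 2.0x volume and -50 means 0.5x volume (not yet supported for mix voices) | |number |0 |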
+| | | | | | \
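+|req_params.audio_params.enable_timestamp |\
+|([TTS1.0 only](https://www.volcengine.com/docs/6561/1257544)) |Set "enable_timestamp": true to return character/word timestamps per sentence (default false; pass true to enable) |\
+| |When enabled, the existing `event=TTSSentenceEnd` event additionally carries the timestamp information for that clause. |\
+| | |\
+| |* The next sentence's audio is returned only after the current clause's timestamps have been returned. |\
+| |* Synthesis with multiple clauses returns `TTSSentenceStart` and `TTSSentenceEnd` multiple times. With subtitles enabled, the subtitles are returned with `TTSSentenceEnd`. |\
+| |* Timestamps are at character/word granularity, where the characters/words are the TN (text-normalized) form. See the examples below. |\
+| |* Chinese and English are supported; other languages and dialects are not yet supported. |\
+| | |\
+| |Note: this field applies only to ["Doubao speech synthesis model 1.0" voices](https://www.volcengine.com/docs/6561/1257544) | |bool |false |
+| | | | | | \
+|req_params.audio_params.enable_subtitle |Set "enable_subtitle": true to return character/word timestamps per sentence (default false; pass true to enable) |\
+| |When enabled, an additional event `event=TTSSubtitle` is returned, containing the subtitle information. |\
+| | |\
+| |* Subtitles for a sentence are not returned immediately after its audio is synthesized. Synthesis is not blocked by subtitle recognition; a sentence's subtitles are returned as soon as recognition completes. By the time a clause's subtitles arrive, audio frames of the next sentence may already have been returned to the caller. |\
+| |* Synthesis with multiple clauses returns `TTSSentenceStart` and `TTSSentenceEnd` only once. With subtitles enabled, `TTSSubtitle` is returned multiple times. |\
+| |* Timestamps are at character/word granularity, where the characters/words are the original text. See the examples below. |\
+| |* Chinese and English are supported; other languages and dialects are not yet supported; |\
+| |* LaTeX formulas are not supported |\
+| | * When req_params.additions.enable_latex_tn is true, subtitle recognition is disabled and no subtitles are returned; |\
+| |* SSML is not supported |\
+| | * When req_params.ssml is non-empty, subtitle recognition is disabled and no subtitles are returned; |\
+| | |\
+| |Note: this parameter takes effect only for TTS2.0 and ICL2.0. | |bool |false |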
+| | | | | | \
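+|req_params.additions |User-defined parameters | |jsonstring | |
+| | | | | | \
+|req_params.additions.silence_duration |Adds silence at the end of the sentence, range 0~30000 ms. (Note: the silence is added at the end of the whole input text, not at the end of every sentence) | |number |0 |
+| | | | | | \
+|req_params.additions.enable_language_detector |Automatically detect the language | |bool |false |
+| | | | | | \
+|req_params.additions.disable_markdown_filter |Whether to parse and filter Markdown. |\
+| |When true, Markdown syntax is parsed and filtered; for example, `**你好**` is read as "你好". |\
+| |When false, nothing is parsed or filtered; for example, `**你好**` is read as "星星‘你好’星星" (the asterisks are read aloud) | |bool |false |
+| | | | | | \
+|req_params.additions.disable_emoji_filter |When enabled, emoji in the text are kept rather than filtered out. Default false. Recommended for use together with the timestamp parameters. |\
+| |Go example: `additions = fmt.Sprintf("{\"disable_emoji_filter\":true}")` | |bool |false |
+| | | | | | \
+|req_params.additions.mute_cut_remain_ms |Must be used together with mute_cut_threshold, where: |\
+| |"mute_cut_threshold": "400", // silence threshold (audio below this volume is treated as silence) |\
+| |"mute_cut_remain_ms": "50", // length of silence to keep |\
+| |Note: both the parameter names and their values are strings |\
+| |Go example: `additions = fmt.Sprintf("{\"mute_cut_threshold\":\"400\", \"mute_cut_remain_ms\": \"1\"}")` |\
+| |Important: |\
+| | |\
+| |* Because of how the MP3 format works, up to 100 ms of leading silence always remains and cannot be removed. With WAV, leading silence can be removed entirely. Choose according to your business needs | |string | |
+| | | | | | \
+|req_params.additions.enable_latex_tn |Whether LaTeX formulas can be read aloud; requires disable_markdown_filter to be set to true | |bool |false |
+| | | | | | \
+|req_params.additions.latex_parser |Whether to use the LID capability to read LaTeX formulas; gives better results than latex_tn; |\
+| |the value "v2" enables LID formula parsing, and "" disables it; |\
+| |disable_markdown_filter must also be set to true; | |string | |
+| | | | | | \
+|req_params.additions.max_length_to_filter_parenthesis |Whether to filter out text inside parentheses: 0 disables filtering, 100 enables it | |int |100 |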
+| | | | | | \
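+|req_params.additions.explicit_language (explicit language) |Read only text in the specified language |\
+| |**Premium voices and Voice Cloning ICL1.0:** |\
+| | |\
+| |* Not set: normal mixed Chinese and English |\
+| |* `crosslingual`: enables the multilingual front end (covers `zh/en/ja/es-mx/id/pt-br`) |\
+| |* `zh-cn`: mainly Chinese, mixed Chinese and English supported |\
+| |* `en`: English only |\
+| |* `ja`: Japanese only |\
+| |* `es-mx`: Mexican Spanish only |\
+| |* `id`: Indonesian only |\
+| |* `pt-br`: Brazilian Portuguese only |\
+| | |\
+| |**DIT voice cloning:** |\
+| |When the voice was trained with model_type=2 (DiT standard edition), specifying an explicit language is recommended. Currently supported: |\
+| | |\
+| |* Not set: enables the multilingual front end `zh,en,ja,es-mx,id,pt-br,de,fr` |\
+| |* `zh,en,ja,es-mx,id,pt-br,de,fr`: enables the multilingual front end |\
+| |* `zh-cn`: mainly Chinese, mixed Chinese and English supported |\
+| |* `en`: English only |\
+| |* `ja`: Japanese only |\
+| |* `es-mx`: Mexican Spanish only |\
+| |* `id`: Indonesian only |\
+| |* `pt-br`: Brazilian Portuguese only |\
+| |* `de`: German only |\
+| |* `fr`: French only |\
+| | |\
+| |When the voice was trained with model_type=3 (DiT restoration edition), an explicit language must be specified. Currently supported: |\
+| | |\
+| |* Not set: normal mixed Chinese and English |\
+| |* `zh-cn`: mainly Chinese, mixed Chinese and English supported |\
+| |* `en`: English only |\
+| | |\
+| |**Voice Cloning ICL2.0:** |\
+| |When the voice was trained with model_type=4 |\
+| | |\
+| |* Not set: normal mixed Chinese and English |\
+| |* `zh-cn`: mainly Chinese, mixed Chinese and English supported |\
+| |* `en`: English only |\
+| | |\
+| |Go example: `additions = fmt.Sprintf("{\"explicit_language\": \"zh\"}")` | |string | |
+| | | | | | \
+|req_params.additions.context_language (reference language) |Gives the model a reference language |\
+| | |\
+| |* Not set: Western European languages are treated as English |\
+| |* id: Western European languages are treated as Indonesian |\
+| |* es: Western European languages are treated as Mexican Spanish |\
+| |* pt: Western European languages are treated as Brazilian Portuguese | |string | |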
+| | | | | | \
+|req_params.additions.unsupported_char_ratio_thresh |Default: 0.3, maximum: 1.0 |\
+| |An error is returned if the proportion of text that cannot be synthesized exceeds this ratio. | |float |0.3 |
+| | | | | | \
+|req_params.additions.aigc_watermark |Default: false |\
+| |Whether to append an audio rhythm marker at the end of the synthesized audio | |bool |false |
+| | | | | | \
+|req_params.additions.aigc_metadata (metadata watermark) |Embeds metadata in the header of the synthesized audio as an implicit marker; supports mp3/wav/ogg_opus | |object | |
+| | | | | | \
+|req_params.additions.aigc_metadata.enable |Whether to enable the implicit watermark | |bool |false |
+| | | | | | \
+|req_params.additions.aigc_metadata.content_producer |Name or code of the synthesis service provider | |string |"" |
+| | | | | | \
+|req_params.additions.aigc_metadata.produce_id |Content production ID | |string |"" |
+| | | | | | \
+|req_params.additions.aigc_metadata.content_propagator |Name or code of the content propagation service provider | |string |"" |
+| | | | | | \
+|req_params.additions.aigc_metadata.propagate_id |Content propagation ID | |string |"" |
+| | | | | | \
+|req_params.additions.cache_config (cache parameters) |Enables caching. When the same text is synthesized again, the service reads the cache and returns the audio from the previous synthesis of that text, which noticeably speeds up synthesis of repeated text. Cached data is kept for 1 hour. |\
+| |(Data returned from the cache does not include timestamps) |\
+| |Go example: `additions = fmt.Sprintf("{\"disable_default_bit_rate\":true, \"cache_config\": {\"text_type\": 1,\"use_cache\": true}}")` | |object | |
+| | | | | | \
+|req_params.additions.cache_config.text_type (cache parameters) |Used together with use_cache; pass 1 to enable caching | |int |1 |
+| | | | | | \
+|req_params.additions.cache_config.use_cache (cache parameters) |Used together with text_type; pass true to enable caching | |bool |true |
+| | | | | | \
+|req_params.additions.post_process |Post-processing configuration |\
+| |Go example: `additions = fmt.Sprintf("{\"post_process\":{\"pitch\":12}}")` | |object | |
+| | | | | | \
+|req_params.additions.post_process.pitch |Pitch, range [-12,12] | |int |\
+| | | | |0 |
+| | | | | | \
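+|req_params.additions.context_texts |\
+|([TTS2.0 only](https://www.volcengine.com/docs/6561/1257544)) |Auxiliary information for synthesis, used for conversational synthesis so that the model conveys emotion better; |\
+| |feel free to experiment. Common examples include: |\
+| | |\
+| |1. Adjusting the speech rate |\
+| | 1. For example: context_texts: ["你可以说慢一点吗?"] |\
+| |2. Adjusting emotion or tone |\
+| | 1. For example: context_texts=["你可以用特别特别痛心的语气说话吗?"] |\
+| | 2. For example: context_texts=["嗯,你的语气再欢乐一点"] |\
+| |3. Adjusting volume |\
+| | 1. For example: context_texts=["你嗓门再小点。"] |\
+| |4. Adjusting the vocal feel |\
+| | 1. For example: context_texts=["你能用骄傲的语气来说话吗?"] |\
+| | |\
+| |Note: |\
+| | |\
+| |1. This field applies only to ["Doubao speech synthesis model 2.0" voices](https://www.volcengine.com/docs/6561/1257544) |\
+| |2. Only the first value in the string list currently takes effect |\
+| |3. The text in this field is not billed | |string list |null |
+| | | | | | \
+|req_params.additions.section_id |\
+|([TTS2.0 only](https://www.volcengine.com/docs/6561/1257544)) |The session ID (session_id) of another synthesis, used to assist the current synthesis by providing more context; |\
+| |for the value, see session_id in the interaction flow |\
+| |Example: |\
+| | |\
+| |1. section_id="bf5b5771-31cd-4f7a-b30c-f4ddcbf2f9da" |\
+| | |\
+| |Note: |\
+| | |\
+| |1. This field applies only to ["Doubao speech synthesis model 2.0" voices](https://www.volcengine.com/docs/6561/1257544) |\
+| |2. Validity of a historical session_id: |\
+| | 1. At most 30 turns |\
+| | 2. At most 10 minutes | |string |"" |
+| | | | | | \
+|req_params.additions.use_tag_parser |Whether to enable CoT parsing. CoT can assist the current synthesis by adjusting speech rate, emotion, and so on. |\
+| |Note: |\
+| | |\
+| |1. Supported voices: only voices cloned with Voice Cloning 2.0 |\
+| |2. Text length: the text of a single sentence should preferably be shorter than 64 characters (CoT tags included) |\
+| |3. CoT takes effect within a single sentence |\
+| | |\
+| |Example: |\
+| |Single and multiple groups of CoT tags are supported: `工作占据了生活的绝大部分,只有去做自己认为伟大的工作,才能获得满足感。不管生活再苦再累,都绝不放弃寻找。` | |bool |false |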
+| | | | | | \
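+|req_params.mix_speaker |Voice mixing parameters |\
+| |Note: |\
+| | |\
+| |1. This field applies only to ["Doubao speech synthesis model 1.0" voices](https://www.volcengine.com/docs/6561/1257544) | |object | |
+| | | | | | \
+|req_params.mix_speaker.speakers |List of voices to mix and their mix factors |\
+| | |\
+| |1. At most 3 voices can be mixed |\
+| |2. The mix factors must sum to 1 |\
+| |3. For cloned voices, use the speaker ID starting with icl_ returned by the query API, not the one starting with S_ |\
+| |4. Mixing two voices with very different styles (for example male and female) in equal 0.5/0.5 proportions can occasionally cause abrupt jumps; avoid this where possible |\
+| | |\
+| |Note: when using the mix capability, req_params.speaker = custom_mix_bigtts | |list |null |
+| | | | | | \
+|req_params.mix_speaker.speakers[i].source_speaker |Name of the source voice (large-model and small-model voices and Voice Cloning 2.0 voices are supported) | |string |"" |
+| | | | | | \
+|req_params.mix_speaker.speakers[i].mix_factor |Mix factor of the source voice | |float |0 |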
+
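+
+Two of the numeric rules in the table are easy to check by hand. The sketch below is an illustration only: `speed_multiplier` assumes the mapping between the documented points (-50 is 0.5x, 0 is the default, 100 is 2.0x) is linear, which the table does not state explicitly, and `pcm_bit_rate` applies the bit rate formula given for wav/pcm with an assumed 16-bit mono signal. Both function names are hypothetical.
+
+```python
+def speed_multiplier(speech_rate: int) -> float:
+    """Map speech_rate in [-50, 100] to a playback multiplier.
+
+    Assumption: the mapping is linear, consistent with the documented
+    endpoints (-50 -> 0.5x, 100 -> 2.0x) and the default 0 -> 1.0x.
+    """
+    if not -50 <= speech_rate <= 100:
+        raise ValueError("speech_rate must be within [-50, 100]")
+    return 1.0 + speech_rate / 100.0
+
+
+def pcm_bit_rate(sample_rate: int, bit_depth: int = 16, channels: int = 1) -> int:
+    """bit rate (bps) = sample rate x bit depth x number of channels."""
+    return sample_rate * bit_depth * channels
+
+
+print(speed_multiplier(100))   # 2.0
+print(speed_multiplier(-50))   # 0.5
+print(pcm_bit_rate(24000))     # 384000
+```
+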
+Single-voice request example:
+```JSON
+{
+    "user": {
+        "uid": "12345"
+    },
+    "req_params": {
+        "text": "明朝开国皇帝朱元璋也称这本书为,万物之根",
+        "speaker": "zh_female_shuangkuaisisi_moon_bigtts",
+        "audio_params": {
+            "format": "mp3",
+            "sample_rate": 24000
+        }
+    }
+}
+```
+
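+The parameter table lists `req_params.additions` with type `jsonstring`, and the Go snippets build that string by hand. A hedged Python sketch of the same idea follows; it serializes the additions with `json.dumps` so that quoting is handled automatically. The helper name `build_request` is hypothetical, and the additions used here are only the ones documented above.
+
+```python
+import json
+from typing import Optional
+
+
+def build_request(text: str, speaker: str, additions: Optional[dict] = None) -> dict:
+    """Build a request payload; additions is embedded as a JSON string."""
+    req_params = {
+        "text": text,
+        "speaker": speaker,
+        "audio_params": {"format": "mp3", "sample_rate": 24000},
+    }
+    if additions:
+        # The table gives the type of additions as jsonstring,
+        # so the object is serialized before it is attached.
+        req_params["additions"] = json.dumps(additions, ensure_ascii=False)
+    return {"user": {"uid": "12345"}, "req_params": req_params}
+
+
+request = build_request(
+    "hello",
+    "zh_female_shuangkuaisisi_moon_bigtts",
+    # mute_cut_* values are strings, as the table requires.
+    {"mute_cut_threshold": "400", "mute_cut_remain_ms": "50"},
+)
+print(request["req_params"]["additions"])
+```
+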
+Mix request example:
+```JSON
+{
+    "user": {
+        "uid": "12345"
+    },
+    "req_params": {
+        "text": "明朝开国皇帝朱元璋也称这本书为万物之根",
+        "speaker": "custom_mix_bigtts",
+        "audio_params": {
+            "format": "mp3",
+            "sample_rate": 24000
+        },
+        "mix_speaker": {
+            "speakers": [{
+                "source_speaker": "zh_male_bvlazysheep",
+                "mix_factor": 0.3
+            }, {
+                "source_speaker": "BV120_streaming",
+                "mix_factor": 0.3
+            }, {
+                "source_speaker": "zh_male_ahu_conversation_wvae_bigtts",
+                "mix_factor": 0.4
+            }]
+        }
+    }
+}
+```
+
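+The mix constraints from the parameter table (at most 3 voices, factors summing to 1, `speaker` set to `custom_mix_bigtts`) can be checked on the client before sending. The following is a minimal sketch, not part of the API; the function name `validate_mix_request` and the tolerance used for the floating-point sum are assumptions.
+
+```python
+import math
+
+MIX_SPEAKER_ID = "custom_mix_bigtts"
+MAX_MIX_SPEAKERS = 3
+
+
+def validate_mix_request(req_params: dict) -> None:
+    """Raise ValueError if req_params violates the documented mix rules."""
+    if req_params.get("speaker") != MIX_SPEAKER_ID:
+        raise ValueError(f"speaker must be {MIX_SPEAKER_ID} when mixing")
+    speakers = req_params["mix_speaker"]["speakers"]
+    if not 1 <= len(speakers) <= MAX_MIX_SPEAKERS:
+        raise ValueError("between 1 and 3 source voices are allowed")
+    total = sum(item["mix_factor"] for item in speakers)
+    # Compare with a small tolerance because the factors are floats.
+    if not math.isclose(total, 1.0, abs_tol=1e-6):
+        raise ValueError(f"mix factors must sum to 1, got {total}")
+
+
+validate_mix_request({
+    "speaker": "custom_mix_bigtts",
+    "mix_speaker": {"speakers": [
+        {"source_speaker": "zh_male_bvlazysheep", "mix_factor": 0.3},
+        {"source_speaker": "BV120_streaming", "mix_factor": 0.3},
+        {"source_speaker": "zh_male_ahu_conversation_wvae_bigtts", "mix_factor": 0.4},
+    ]},
+})
+```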
+
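+## 2.2 Response
+
+### Connection response
+During connection setup, check the status code and body of the HTTP response
+
+* Connection succeeded: the status code is 200
+* Connection failed: the status code is not 200, and the body explains the cause of the error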
+
+
+### WebSocket transport response
+
+#### Binary frame - normal response frame
+
+| | | | | \
+|Byte |Left 4-bit |Right 4-bit |Description |
+|---|---|---|---|
+| | | | | \
+|0 - Left half |Protocol version | |Only v1 exists for now; always 0b0001 |
+| | | | | \
+|0 - Right half | |Header size (4x) |Only 4 bytes for now; always 0b0001 |
+| | | | | \
+|1 - Left half |Message type | |Audio frames: 0b1011 |\
+| | | |Other frames: 0b1001 |
+| | | | | \
+|1 - Right half | |Message type specific flags |Always 0b0100 |
+| | | | | \
+|2 - Left half |Serialization method | |0b0000: Raw (no special serialization, mainly for binary audio data); 0b0001: JSON (mainly for text messages) |
+| | | | | \
+|2 - Right half | |Compression method |0b0000: no compression; 0b0001: gzip |
+| | || | \
+|3 |Reserved | |Left empty (0b0000 0000) |
+| | || | \
+|[4 ~ 7] |[Optional field,like event number,...] |\
+| | | |Present or absent depending on the message type specific flags |
+| | || | \
+|... |Payload | |May be audio data, text data, or mixed audio and text data |
+
+
+##### Payload response parameters
+
+| | | | \
+|Field |Description |Type |
+|---|---|---|
+| | | | \
+|data |The returned binary data packet |[]byte |
+| | | | \
+|event |The returned event type |number |
+| | | | \
+|res_params.text |The sentence produced by text segmentation |string |
+
+
+#### Binary frame - error response frame
+
+| | | | | \
+|Byte |Left 4-bit |Right 4-bit |Description |
+|---|---|---|---|
+| | | | | \
+|0 - Left half |Protocol version | |Only v1 exists for now; always 0b0001 |
+| | | | | \
+|0 - Right half | |Header size (4x) |Only 4 bytes for now; always 0b0001 |
+| | | | | \
+|1 |Message type |Message type specific flags |0b11110000 |
+| | | | | \
+|2 - Left half |Serialization method | |0b0000: Raw (no special serialization, mainly for binary audio data); 0b0001: JSON (mainly for text messages) |
+| | | | | \
+|2 - Right half | |Compression method |0b0000: no compression; 0b0001: gzip |
+| | || | \
+|3 |Reserved | |Left empty (0b0000 0000) |
+| | || | \
+|[4 ~ 7] |Error code | |Error code |
+| | || | \
+|... |Payload | |Error message object |
+
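+The header fields shared by all frames, and the error code that follows the header in an error frame, can be decoded as in the sketch below. It is an illustration under stated assumptions: `FrameHeader`, `parse_header`, and `parse_error_code` are hypothetical names, and the bytes after the error code are deliberately left undecoded because the table does not spell out how the error payload is framed.
+
+```python
+import struct
+from typing import NamedTuple
+
+ERROR_MESSAGE_TYPE = 0b1111  # left half of byte 1 in an error frame
+
+
+class FrameHeader(NamedTuple):
+    version: int
+    header_bytes: int   # header size field multiplied by 4
+    message_type: int
+    flags: int
+    serialization: int
+    compression: int
+
+
+def parse_header(frame: bytes) -> FrameHeader:
+    """Split the first three header bytes into their 4-bit fields."""
+    if len(frame) < 4:
+        raise ValueError("a frame needs at least a 4-byte header")
+    b0, b1, b2 = frame[0], frame[1], frame[2]
+    return FrameHeader(b0 >> 4, (b0 & 0x0F) * 4, b1 >> 4, b1 & 0x0F, b2 >> 4, b2 & 0x0F)
+
+
+def parse_error_code(frame: bytes) -> int:
+    """Return the big-endian error code that follows the header."""
+    header = parse_header(frame)
+    if header.message_type != ERROR_MESSAGE_TYPE:
+        raise ValueError("not an error frame")
+    (code,) = struct.unpack_from(">I", frame, header.header_bytes)
+    return code
+
+
+sample = bytes([0x11, 0xF0, 0x10, 0x00]) + struct.pack(">I", 45000000) + b"{}"
+print(parse_error_code(sample))  # 45000000
+```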
+
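+## 2.3 Event definitions
+While text is being sent for synthesis, the client does not need to send any upstream event frames. The event types are:
+
+| | | | | \
+|Event code |Meaning |Event category |Direction: upstream/downstream |
+|---|---|---|---|
+| | | | | \
+|152 |SessionFinished: the session has ended |\
+| |Indicates that one complete speech synthesis has finished |Session |Downstream |
+| | | | | \
+|350 |TTSSentenceStart: start of a returned sentence |Data |Downstream |
+| | | | | \
+|351 |TTSSentenceEnd: end of a returned sentence |Data |Downstream |
+| | | | | \
+|352 |TTSResponse: audio content of a returned sentence |Data |Downstream |
+
+When closing the connection, the client must send an upstream event frame. The event types are:
+
+| | | | | \
+|Event code |Meaning |Event category |Direction: upstream/downstream |
+|---|---|---|---|
+| | | | | \
+|2 |FinishConnection: end the connection |Connect |Upstream |
+| | | | | \
+|52 |ConnectionFinished: the connection was ended successfully |Connect |Downstream |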
+
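+A client typically dispatches on these event codes. The sketch below is a hedged illustration: it assumes the frames have already been decoded into `(event_code, payload)` pairs by some other code, and the names `Event` and `collect_audio` are hypothetical. It concatenates the audio from TTSResponse events and stops at SessionFinished.
+
+```python
+from enum import IntEnum
+from typing import Iterable, Tuple
+
+
+class Event(IntEnum):
+    FINISH_CONNECTION = 2
+    CONNECTION_FINISHED = 52
+    SESSION_FINISHED = 152
+    TTS_SENTENCE_START = 350
+    TTS_SENTENCE_END = 351
+    TTS_RESPONSE = 352
+
+
+def collect_audio(events: Iterable[Tuple[int, bytes]]) -> bytes:
+    """Concatenate TTSResponse payloads until SessionFinished arrives."""
+    audio = bytearray()
+    for code, payload in events:
+        if code == Event.TTS_RESPONSE:
+            audio.extend(payload)
+        elif code == Event.SESSION_FINISHED:
+            break
+        # Other events (sentence start/end, unknown codes) are ignored here.
+    return bytes(audio)
+
+
+stream = [(350, b"{}"), (352, b"ab"), (352, b"cd"), (351, b"{}"), (152, b"{}")]
+print(collect_audio(stream))  # b'abcd'
+```
+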
+
+
+## 2.4 Frame examples by type
+
+### SendText
+
+#### Request
+
+| | | | || \
+|Byte |Left 4-bit |Right 4-bit |Description | |
+|---|---|---|---|---|
+| | | | | | \
+|0 |0001 |0001 |v1 |4-byte header |
+| | | | | | \
+|1 |0001 |0000 |Full-client request |with no event number |
+| | | | | | \
+|2 |0001 |0000 |JSON |no compression |
+| | | | | | \
+|3 |0000 |0000 | | |
+| | || || \
+|4 ~ 7 |uint32(...) | |len(payload_json) | |
+| | || || \
+|8 ~ ... |\
+| |{...} |\
+| | | |text |\
+| | | | | |
+
+payload
+```JSON
+{
+    "user": {
+        "uid": "12345"
+    },
+    "req_params": {
+        "text": "明朝开国皇帝朱元璋也称这本书为,万物之根",
+        "speaker": "zh_female_shuangkuaisisi_moon_bigtts",
+        "audio_params": {
+            "format": "mp3",
+            "sample_rate": 24000
+        }
+    }
+}
+```
+
+
+#### Response
+
+##### TTSSentenceStart
+
+| | | | || \
+|Byte |Left 4-bit |Right 4-bit |Description | |
+|---|---|---|---|---|
+| | | | | | \
+|0 |0001 |0001 |v1 |4-byte header |
+| | | | | | \
+|1 |1001 |0100 |Full-server response |with event number |
+| | | | | | \
+|2 |0001 |0000 |JSON |no compression |
+| | | | | | \
+|3 |0000 |0000 | | |
+| | || || \
+|4 ~ 7 |TTSSentenceStart | |event type | |
+| | || || \
+|8 ~ 11 |uint32(12) | |len(session_id) | |
+| | || || \
+|12 ~ 23 |nxckjoejnkegf | |session_id | |
+| | || || \
+|24 ~ 27 |uint32( ...) | |len(text_binary) | |
+| | || || \
+|28 ~ ... |\
+| |{...} | |text_binary | |
+
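+The layout in the TTSSentenceStart example (header, event number, session ID length, session ID, payload length, payload) can be read back as in the sketch below. This is an assumption-laden illustration rather than a reference decoder: the function name `parse_event_frame` is hypothetical, the event number is read as an unsigned big-endian integer, and compressed payloads are not handled.
+
+```python
+import struct
+from typing import Tuple
+
+FLAG_WITH_EVENT = 0b0100
+
+
+def parse_event_frame(frame: bytes) -> Tuple[int, str, bytes]:
+    """Return (event, session_id, payload) from a frame that carries an event number."""
+    header_bytes = (frame[0] & 0x0F) * 4
+    if frame[1] & 0x0F != FLAG_WITH_EVENT:
+        raise ValueError("frame does not carry an event number")
+    offset = header_bytes
+    (event,) = struct.unpack_from(">I", frame, offset)
+    offset += 4
+    (sid_len,) = struct.unpack_from(">I", frame, offset)
+    offset += 4
+    session_id = frame[offset:offset + sid_len].decode("utf-8")
+    offset += sid_len
+    (payload_len,) = struct.unpack_from(">I", frame, offset)
+    offset += 4
+    return event, session_id, frame[offset:offset + payload_len]
+
+
+# Hand-built sample frame: Full-server response, JSON, event 350.
+sample_frame = (
+    bytes([0x11, 0x94, 0x10, 0x00])
+    + struct.pack(">I", 350)
+    + struct.pack(">I", 12) + b"abcdefghijkl"
+    + struct.pack(">I", 2) + b"{}"
+)
+print(parse_event_frame(sample_frame))  # (350, 'abcdefghijkl', b'{}')
+```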
+
+##### TTSResponse
+
+| | | | || \
+|Byte |Left 4-bit |Right 4-bit |Description | |
+|---|---|---|---|---|
+| | | | | | \
+|0 |0001 |0001 |v1 |4-byte header |
+| | | | | | \
+|1 |1011 |0100 |Audio-only response |with event number |
+| | | | | | \
+|2 |0001 |0000 |JSON |no compression |
+| | | | | | \
+|3 |0000 |0000 | | |
+| | || | | \
+|4 ~ 7 |TTSResponse | |event type | |
+| | || | | \
+|8 ~ 11 |uint32(12) | |len(session_id) | |
+| | || | | \
+|12 ~ 23 |nxckjoejnkegf | |session_id | |
+| | || | | \
+|24 ~ 27 |uint32( ...) | |len(audio_binary) | |
+| | || | | \
+|28 ~ ... |{...} |\
+| | | |audio_binary |\
+| | | | | |
+
+
+##### TTSSentenceEnd
+
+| | | | || \
+|Byte |Left 4-bit |Right 4-bit |Description | |
+|---|---|---|---|---|
+| | | | | | \
+|0 |0001 |0001 |v1 |4-byte header |
+| | | | | | \
+|1 |1001 |0100 |Full-server response |with event number |
+| | | | | | \
+|2 |0001 |0000 |JSON |no compression |
+| | | | | | \
+|3 |0000 |0000 | | |
+| | || || \
+|4 ~ 7 |TTSSentenceEnd | |event type | |
+| | || || \
+|8 ~ 11 |uint32(12) | |len(session_id) | |
+| | || || \
+|12 ~ 23 |nxckjoejnkegf | |session_id | |
+| | || || \
+|24 ~ 27 |uint32( ...) | |len(payload) | |
+| | || || \
+|28 ~ ... |{...} |\
+| | | |payload |\
+| | | | | |
+
+
+##### SessionFinished
+
+| | | | || \
+|Byte |Left 4-bit |Right 4-bit |Description | |
+|---|---|---|---|---|
+| | | | | | \
+|0 |0001 |0001 |v1 |4-byte header |
+| | | | | | \
+|1 |1001 |0100 |Full-server response |with event number |
+| | | | | | \
+|2 |0001 |0000 |JSON |no compression |
+| | | | | | \
+|3 |0000 |0000 | | |
+| | || | | \
+|4 ~ 7 |SessionFinished | |event type | |
+| | || || \
+|8 ~ 11 |uint32(12) | |len(session_id) | |
+| | || || \
+|12 ~ 23 |nxckjoejnkegf | |session_id | |
+| | || || \
+|24 ~ 27 |uint32( ...) | |len(response_meta_json) | |
+| | || || \
+|28 ~ ... |{ |\
+| | "status_code": 20000000, |\
+| | "message": "ok", |\
+| |"usage": { |\
+| | "text_words":4 |\
+| | } |\
+| |} |\
+| | | |response_meta_json |\
+| | | | |\
+| | | |* Contains only the status_code and message fields |\
+| | | |* usage is present only when the request header X-Control-Require-Usage-Tokens-Return is sent | |
+
+
+#### FinishConnection
+
+##### Request
+
+| | | | || \
+|Byte |Left 4-bit |Right 4-bit |Description | |
+|---|---|---|---|---|
+| | | | | | \
+|0 |0001 |0001 |v1 |4-byte header |
+| | | | | | \
+|1 |0001 |0100 |Full-client request |with event number |
+| | | | | | \
+|2 |0001 |0000 |JSON |no compression |
+| | | | | | \
+|3 |0000 |0000 | | |
+| | || || \
+|4-7 |uint32(...) | |len(payload_json) | |
+| | || || \
+|8 ~ ... |\
+| |{...} |\
+| | | |payload_json |\
+| | | |Reserved for extension; currently an empty JSON object | |
+
+
+##### Response
+
+| | | | || \
+|Byte |Left 4-bit |Right 4-bit |Description | |
+|---|---|---|---|---|
+| | | | | | \
+|0 |0001 |0001 |v1 |4-byte header |
+| | | | | | \
+|1 |1001 |0100 |Full-server response |with event number |
+| | | | | | \
+|2 |0001 |0000 |JSON |no compression |
+| | | | | | \
+|3 |0000 |0000 | | |
+| | || || \
+|4 ~ 7 |ConnectionFinished | |event type | |
+| | || || \
+|8 ~ 11 |uint32(7) | |len() | |
+| | || || \
+|12 ~ 15 |uint32(58) | |len() | |
+| | || || \
+|28 ~ ... |{ |\
+| | "status_code": 20000000, |\
+| | "message": "ok" |\
+| |} | |response_meta_json |\
+| | | | |\
+| | | |* Contains only the status_code and message fields |\
+| | | | |\
+| | | | | |
+
+
+## 2.5 Timestamp sentence format
+
+| | | | \
+| |\
+| |\
+|# |**TTS1.0** |\
+| |**ICL1.0** |**TTS2.0** |\
+| | |**ICL2.0** |
+|---|---|---|
+| | | | \
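+|Event flow differences |Synthesis with multiple clauses returns `TTSSentenceStart` and `TTSSentenceEnd` multiple times. With subtitles enabled, the subtitles are returned with `TTSSentenceEnd`. |Synthesis with multiple clauses returns `TTSSentenceStart` and `TTSSentenceEnd` only once. |\
+| | |With subtitles enabled, `TTSSubtitle` is returned multiple times. |
+| | | | \
+|Return timing |The next sentence's audio is returned only after the current clause's timestamps have been returned. |\
+| | |Subtitles for a sentence are not returned immediately after its audio is synthesized. |\
+| | |Synthesis is not blocked by subtitle recognition; a sentence's subtitles are returned as soon as recognition completes. |\
+| | |By the time a clause's subtitles arrive, audio frames of the next sentence may already have been returned to the caller. |
+| | | | \
+|Sentence format |\
+| |Subtitle timing is aligned to the TN (text-normalized) form |\
+| |:::tip |\
+| |1. The text field holds the original text |\
+| |2. The word fields inside words hold the TN form |\
+| |::: |\
+| |First sentence: |\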
+| |```JSON |\
+| |{ |\
+| | "phonemes": [ |\
+| | ], |\
+| | "text": "2019年1月8日,软件2.0版本于格萨拉彝族乡应时而生。发布会当日,一场瑞雪将天地映衬得纯净无瑕。", |\
+| | "words": [ |\
+| | { |\
+| | "confidence": 0.8766515, |\
+| | "endTime": 0.295, |\
+| | "startTime": 0.155, |\
+| | "word": "二" |\
+| | }, |\
+| | { |\
+| | "confidence": 0.95224416, |\
+| | "endTime": 0.425, |\
+| | "startTime": 0.295, |\
+| | "word": "零" |\
+| | }, |\
+| | { |\
+| | "confidence": 0.9108828, |\
+| | "endTime": 0.575, |\
+| | "startTime": 0.425, |\
+| | "word": "一" |\
+| | }, |\
+| | { |\
+| | "confidence": 0.9609025, |\
+| | "endTime": 0.755, |\
+| | "startTime": 0.575, |\
+| | "word": "九" |\
+| | }, |\
+| | { |\
+| | "confidence": 0.96244556, |\
+| | "endTime": 1.005, |\
+| | "startTime": 0.755, |\
+| | "word": "年" |\
+| | }, |\
+| | { |\
+| | "confidence": 0.85796577, |\
+| | "endTime": 1.155, |\
+| | "startTime": 1.005, |\
+| | "word": "一" |\
+| | }, |\
+| | { |\
+| | "confidence": 0.8460129, |\
+| | "endTime": 1.275, |\
+| | "startTime": 1.155, |\
+| | "word": "月" |\
+| | }, |\
+| | { |\
+| | "confidence": 0.90833753, |\
+| | "endTime": 1.505, |\
+| | "startTime": 1.275, |\
+| | "word": "八" |\
+| | }, |\
+| | { |\
+| | "confidence": 0.9403977, |\
+| | "endTime": 1.935, |\
+| | "startTime": 1.505, |\
+| | "word": "日," |\
+| | }, |\
+| | |\
+| | ... |\
+| | |\
+| | { |\
+| | "confidence": 0.9415791, |\
+| | "endTime": 10.505, |\
+| | "startTime": 10.355, |\
+| | "word": "无" |\
+| | }, |\
+| | { |\
+| | "confidence": 0.903162, |\
+| | "endTime": 10.895, // end time of the first sentence |\
+| | "startTime": 10.505, |\
+| | "word": "瑕。" |\
+| | } |\
+| | ] |\
+| |} |\
+| |``` |\
+| | |\
+| |Second sentence: |\
+| |```JSON |\
+| |{ |\
+| | "phonemes": [ |\
+| | |\
+| | ], |\
+| | "text": "这仿佛一则自然寓言:我们致力于在不断的版本迭代中,为您带来如雪后初霁般清晰、焕然一新的体验。", |\
+| | "words": [ |\
+| | { |\
+| | "confidence": 0.8970245, |\
+| | "endTime": 11.6953745, |\
+| | "startTime": 11.535375, // start time of the second sentence, relative to the whole session |\
+| | "word": "这" |\
+| | }, |\
+| | { |\
+| | "confidence": 0.86508185, |\
+| | "endTime": 11.875375, |\
+| | "startTime": 11.6953745, |\
+| | "word": "仿" |\
+| | }, |\
+| | { |\
+| | "confidence": 0.73354065, |\
+| | "endTime": 12.095375, |\
+| | "startTime": 11.875375, |\
+| | "word": "佛" |\
+| | }, |\
+| | { |\
+| | "confidence": 0.8525295, |\
+| | "endTime": 12.275374, |\
+| | "startTime": 12.095375, |\
+| | "word": "一" |\
+| | }... |\
+| | ] |\
+| |} |\
+| |``` |\
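+| | |Subtitle timing is aligned to the original text |\
+| | |:::tip |\
+| | |1. The text field holds the original text |\
+| | |2. The word fields inside words hold the original text |\
+| | |::: |\
+| | |First sentence: |\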
+| | |```JSON |\
+| | |{ |\
+| | | "phonemes": [ |\
+| | | ], |\
+| | | "text": "2019年1月8日,软件2.0版本于格萨拉彝族乡应时而生。", |\
+| | | "words": [ |\
+| | | { |\
+| | | "confidence": 0.11120544, |\
+| | | "endTime": 0.615, |\
+| | | "startTime": 0.585, |\
+| | | "word": "2019" |\
+| | | }, |\
+| | | { |\
+| | | "confidence": 0.8413397, |\
+| | | "endTime": 0.845, |\
+| | | "startTime": 0.615, |\
+| | | "word": "年" |\
+| | | }, |\
+| | | { |\
+| | | "confidence": 0.2413961, |\
+| | | "endTime": 0.875, |\
+| | | "startTime": 0.845, |\
+| | | "word": "1" |\
+| | | }, |\
+| | | { |\
+| | | "confidence": 0.8487973, |\
+| | | "endTime": 1.055, |\
+| | | "startTime": 0.875, |\
+| | | "word": "月" |\
+| | | }, |\
+| | | { |\
+| | | "confidence": 0.509697, |\
+| | | "endTime": 1.225, |\
+| | | "startTime": 1.165, |\
+| | | "word": "8" |\
+| | | }, |\
+| | | { |\
+| | | "confidence": 0.9516253, |\
+| | | "endTime": 1.485, |\
+| | | "startTime": 1.225, |\
+| | | "word": "日," |\
+| | | }, |\
+| | | |\
+| | | ... |\
+| | | |\
+| | | { |\
+| | | "confidence": 0.6933777, |\
+| | | "endTime": 5.435, |\
+| | | "startTime": 5.325, |\
+| | | "word": "而" |\
+| | | }, |\
+| | | { |\
+| | | "confidence": 0.921702, |\
+| | | "endTime": 5.695, // end time of the first sentence |\
+| | | "startTime": 5.435, |\
+| | | "word": "生。" |\
+| | | } |\
+| | | ] |\
+| | |} |\
+| | |``` |\
+| | | |\
+| | | |\
+| | |Second sentence: |\
+| | |```JSON |\
+| | |{ |\
+| | | "phonemes": [ |\
+| | | |\
+| | | ], |\
+| | | "text": "发布会当日,一场瑞雪将天地映衬得纯净无瑕。", |\
+| | | "words": [ |\
+| | | { |\
+| | | "confidence": 0.7016578, |\
+| | | "endTime": 6.3550415, |\
+| | | "startTime": 6.2150416, // start time of the second sentence, relative to the whole session |\
+| | | "word": "发" |\
+| | | }, |\
+| | | { |\
+| | | "confidence": 0.6800497, |\
+| | | "endTime": 6.4450417, |\
+| | | "startTime": 6.3550415, |\
+| | | "word": "布" |\
+| | | }, |\
+| | | |\
+| | | ... |\
+| | | |\
+| | | { |\
+| | | "confidence": 0.8818264, |\
+| | | "endTime": 10.145041, |\
+| | | "startTime": 9.945042, |\
+| | | "word": "净" |\
+| | | }, |\
+| | | { |\
+| | | "confidence": 0.87248623, |\
+| | | "endTime": 10.285042, |\
+| | | "startTime": 10.145041, |\
+| | | "word": "无" |\
+| | | }, |\
+| | | { |\
+| | | "confidence": 0.8069703, |\
+| | | "endTime": 10.505041, |\
+| | | "startTime": 10.285042, |\
+| | | "word": "瑕。" |\
+| | | } |\
+| | | ] |\
+| | |} |\
+| | |``` |\
+| | | |\
+| | | |
+| | | | \
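+|Languages |Chinese and English; minor languages and dialects are not supported |Chinese and English; minor languages and dialects are not supported |
+| | | | \
+|latex |With enable_latex_tn=true, subtitles are returned |With enable_latex_tn=true, no subtitles are returned and the API does not report an error |
+| | | | \
+|ssml |With a non-empty req_params.ssml, subtitles are returned |With a non-empty req_params.ssml, no subtitles are returned and the API does not report an error |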
+
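+Because `startTime` and `endTime` are expressed in seconds relative to the whole session, a sentence's span can be read straight from its first and last word. The sketch below shows one way to do that and to format the values as SRT-style timestamps; both helper names are hypothetical, and the sample data is abridged from the first sentence above.
+
+```python
+from typing import Tuple
+
+
+def sentence_span(sentence: dict) -> Tuple[float, float]:
+    """Return (start, end) in seconds, taken from the first and last word."""
+    words = sentence["words"]
+    return words[0]["startTime"], words[-1]["endTime"]
+
+
+def to_srt_time(seconds: float) -> str:
+    """Format seconds as HH:MM:SS,mmm."""
+    total_ms = round(seconds * 1000)
+    hours, rem = divmod(total_ms, 3_600_000)
+    minutes, rem = divmod(rem, 60_000)
+    secs, millis = divmod(rem, 1000)
+    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"
+
+
+sentence = {
+    "text": "2019年1月8日,",
+    "words": [
+        {"word": "二", "startTime": 0.155, "endTime": 0.295},
+        {"word": "日,", "startTime": 1.505, "endTime": 1.935},
+    ],
+}
+start, end = sentence_span(sentence)
+print(to_srt_time(start), "-->", to_srt_time(end))  # 00:00:00,155 --> 00:00:01,935
+```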
+
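+# 3 Error codes
+
+| | | | \
+|Code |Message |Description |
+|---|---|---|
+| | | | \
+|20000000 |ok |Success status code indicating that audio synthesis has finished |
+| | | | \
+|45000000 |\
+| |speaker permission denied: get resource id: access denied |Voice authorization failed, usually because the voice specified in speaker is not authorized or is incorrect |\
+| | | |
+|^^| | | \
+| |quota exceeded for types: concurrency |Concurrency rate limiting, usually because the number of concurrent requests exceeds the limit |
+| | | | \
+|55000000 |Various server-side errors |General server-side error |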
+
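+A client can branch on these codes when it receives a status object. The following sketch is only one possible classification: the category labels and the substring checks on `message` are assumptions based on the two sample messages in the table, not a documented contract.
+
+```python
+def classify_status(status_code: int, message: str = "") -> str:
+    """Map a status code (and optionally its message) to a coarse category."""
+    if status_code == 20000000:
+        return "ok"
+    if status_code == 45000000:
+        # The table lists two sample messages under this code.
+        if "quota exceeded" in message:
+            return "rate_limited"
+        if "permission denied" in message:
+            return "permission_denied"
+        return "client_error"
+    if status_code == 55000000:
+        return "server_error"
+    return "unknown"
+
+
+print(classify_status(45000000, "quota exceeded for types: concurrency"))  # rate_limited
+```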
+
+# 4 Usage examples
+
+```mixin-react
+return (
+
+### Prerequisites
+
+* Before calling the API, obtain the following:
+ * \`\`: the APP ID obtained from the console; see [Console FAQ Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F).
+ * \`\`: the Access Token obtained from the console; see [Console FAQ Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F).
+ * \`\`: the ID of the voice you intend to use; see the [large-model voice list](https://www.volcengine.com/docs/6561/1257544).
+
+
+### Python environment
+
+* Python: version 3.9 or later.
+* Pip: version 25.1.1 or later. You can upgrade it with the following command.
+
+\`\`\`Bash
+python3 -m pip install --upgrade pip
+\`\`\`
+
+
+### Download the sample code
+
+
+### Extract the package and install dependencies
+\`\`\`Bash
+mkdir -p volcengine_unidirectional_stream_demo
+tar xvzf volcengine_unidirectional_stream_demo.tar.gz -C ./volcengine_unidirectional_stream_demo
+cd volcengine_unidirectional_stream_demo
+python3 -m venv .venv
+source .venv/bin/activate
+python3 -m pip install --upgrade pip
+pip3 install -e .
+\`\`\`
+
+
+### Run the example
+> Replace \`\` with your APP ID.
+> Replace \`\` with your Access Token.
+> Replace \`\` with the ID of the voice you intend to use, for example \`zh_female_cancan_mars_bigtts\`.
+
+\`\`\`Bash
+python3 examples/volcengine/unidirectional_stream.py --appid --access_token --voice_type --text "你好,我是火山引擎的语音合成服务。这是一个美好的旅程。"
+\`\`\`
+
+`}>
+
+### Prerequisites
+
+* Before calling the API, obtain the following:
+ * \`\`: the APP ID obtained from the console; see [Console FAQ Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F).
+ * \`\`: the Access Token obtained from the console; see [Console FAQ Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F).
+ * \`\`: the ID of the voice you intend to use; see the [large-model voice list](https://www.volcengine.com/docs/6561/1257544).
+
+
+### Java environment
+
+* Java: version 21 or later.
+* Maven: version 3.9.10 or later.
+
+
+### Download the sample code
+
+
+### Extract the package and install dependencies
+\`\`\`Bash
+mkdir -p volcengine_unidirectional_stream_demo
+tar xvzf volcengine_unidirectional_stream_demo.tar.gz -C ./volcengine_unidirectional_stream_demo
+cd volcengine_unidirectional_stream_demo
+\`\`\`
+
+
+### Run the example
+> Replace \`\` with your APP ID.
+> Replace \`\` with your Access Token.
+> Replace \`\` with the ID of the voice you intend to use, for example \`zh_female_cancan_mars_bigtts\`.
+
+\`\`\`Bash
+mvn compile exec:java -Dexec.mainClass=com.speech.volcengine.UnidirectionalStream -DappId= -DaccessToken= -Dvoice= -Dtext="**你好**,我是豆包语音助手,很高兴认识你。这是一个愉快的旅程。"
+\`\`\`
+
+`}>
+
+### Prerequisites
+
+* Before calling the API, obtain the following:
+ * \`\`: the APP ID obtained from the console; see [Console FAQ Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F).
+ * \`\`: the Access Token obtained from the console; see [Console FAQ Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F).
+ * \`\`: the ID of the voice you intend to use; see the [large-model voice list](https://www.volcengine.com/docs/6561/1257544).
+
+
+### Go environment
+
+* Go: version 1.21.0 or later.
+
+
+### Download the sample code
+
+
+### Extract the package and install dependencies
+\`\`\`Bash
+mkdir -p volcengine_unidirectional_stream_demo
+tar xvzf volcengine_unidirectional_stream_demo.tar.gz -C ./volcengine_unidirectional_stream_demo
+cd volcengine_unidirectional_stream_demo
+\`\`\`
+
+
+### Run the example
+> Replace \`\` with your APP ID.
+> Replace \`\` with your Access Token.
+> Replace \`\` with the ID of the voice you intend to use, for example \`zh_female_cancan_mars_bigtts\`.
+
+\`\`\`Bash
+go run volcengine/unidirectional_stream/main.go --appid --access_token --voice_type --text "**你好**,我是火山引擎的语音合成服务。"
+\`\`\`
+
+`}>
+
+### Prerequisites
+
+* Before calling, obtain the following information:
+ * \`<appid>\`: the APP ID obtained from the console; see [Console FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F).
+ * \`<access_token>\`: the Access Token obtained from the console; see [Console FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F).
+ * \`<voice_type>\`: the ID of the voice you want to use; see the [large-model voice list](https://www.volcengine.com/docs/6561/1257544).
+
+
+### C# environment
+
+* .NET 9.0.
+
+
+### Download the sample code
+
+
+### Unpack the archive and install dependencies
+\`\`\`Bash
+mkdir -p volcengine_unidirectional_stream_demo
+tar xvzf volcengine_unidirectional_stream_demo.tar.gz -C ./volcengine_unidirectional_stream_demo
+cd volcengine_unidirectional_stream_demo
+\`\`\`
+
+
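+### Run the example
+> Replace \`<appid>\` with your APP ID.
+> Replace \`<access_token>\` with your Access Token.
+> Replace \`<voice_type>\` with the ID of the voice you want to use, for example \`zh_female_cancan_mars_bigtts\`.
+
+\`\`\`Bash
+dotnet run --project Volcengine/UnidirectionalStream/Volcengine.Speech.UnidirectionalStream.csproj -- --appid <appid> --access_token <access_token> --voice_type <voice_type> --text "**你好**,这是一个测试文本。我们正在测试文本转语音功能。"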
+\`\`\`
+
+`}>
+
+### Prerequisites
+
+* Before calling, obtain the following information:
+ * \`<appid>\`: the APP ID obtained from the console; see [Console FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F).
+ * \`<access_token>\`: the Access Token obtained from the console; see [Console FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F).
+ * \`<voice_type>\`: the ID of the voice you want to use; see the [large-model voice list](https://www.volcengine.com/docs/6561/1257544).
+
+
+### Node.js environment
+
+* Node.js: v24.0 or later.
+
+
+### Download the sample code
+
+
+### Unpack the archive and install dependencies
+\`\`\`Bash
+mkdir -p volcengine_unidirectional_stream_demo
+tar xvzf volcengine_unidirectional_stream_demo.tar.gz -C ./volcengine_unidirectional_stream_demo
+cd volcengine_unidirectional_stream_demo
+npm install
+npm install -g typescript
+npm install -g ts-node
+\`\`\`
+
+
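+### Run the example
+> Replace \`<appid>\` with your APP ID.
+> Replace \`<access_token>\` with your Access Token.
+> Replace \`<voice_type>\` with the ID of the voice you want to use, for example \`zh_female_cancan_mars_bigtts\`.
+
+\`\`\`Bash
+npx ts-node src/volcengine/unidirectional_stream.ts --appid <appid> --access_token <access_token> --voice_type <voice_type> --text "**你好**,我是火山引擎的语音合成服务。"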
+\`\`\`
+
+`}>);
+ ```
+
+
diff --git a/API相关/语音合成大模型音色列表.md b/API相关/语音合成大模型音色列表.md
new file mode 100644
index 0000000..e69de29
diff --git a/API相关/豆包大模型语音合成API.md b/API相关/豆包大模型语音合成API.md
new file mode 100644
index 0000000..3797797
--- /dev/null
+++ b/API相关/豆包大模型语音合成API.md
@@ -0,0 +1,627 @@
+
+
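+# Websocket
+> Call this interface with the appid and access_token obtained in the account application step.
+> The text is sent in a single request, and the backend returns audio data while it synthesizes.
+
+
+## 1. Interface description
+> V1:
+> **wss://openspeech.bytedance.com/api/v1/tts/ws_binary (V1 unidirectional streaming)**
+> **https://openspeech.bytedance.com/api/v1/tts (V1 HTTP, non-streaming)**
+> V3:
+> **wss://openspeech.bytedance.com/api/v3/tts/unidirectional/stream (V3 WSS unidirectional streaming)**
+> [V3 websocket unidirectional streaming documentation](https://www.volcengine.com/docs/6561/1719100)
+> **wss://openspeech.bytedance.com/api/v3/tts/bidirection (V3 WSS bidirectional streaming)**
+> [V3 websocket bidirectional streaming documentation](https://www.volcengine.com/docs/6561/1329505)
+> **https://openspeech.bytedance.com/api/v3/tts/unidirectional (V3 HTTP unidirectional streaming)**
+> [V3 HTTP unidirectional streaming documentation](https://www.volcengine.com/docs/6561/1598757)
+
+:::warning
+For all large-model voices, the V3 interfaces are recommended because they give lower latency.
+:::
+
+## 2. Authentication
+Authentication uses a Bearer Token: add `"Authorization": "Bearer; {token}"` to the request header, and put the corresponding appid in the request JSON.
+:::warning
+Separate Bearer and the token with a semicolon (;), and do not keep the {} when substituting the token.
+:::
+For AppID/Token/Cluster and related information, see [Console FAQ-Q1](/docs/6561/196768#q1:哪里可以获取到以下参数appid,cluster,token,authorization-type,secret-key-?)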
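+
+As a minimal sketch of the header format described above, the helper below builds the `Authorization` header in Python. The function name `build_auth_header` is hypothetical and not part of any SDK; the check for leftover braces only mirrors the warning about not keeping the `{}`.
+
```python
def build_auth_header(token: str) -> dict:
    """Return the Authorization header in the "Bearer; {token}" form."""
    token = token.strip()
    # Guard against pasting the placeholder braces along with the token.
    if not token or "{" in token or "}" in token:
        raise ValueError("pass the raw token without the {} placeholder braces")
    return {"Authorization": f"Bearer; {token}"}


headers = build_auth_header("your-access-token")
print(headers["Authorization"])  # Bearer; your-access-token
```
+
+The resulting dict can be passed as extra headers to whichever websocket or HTTP client you use.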
+
+## 3. Request method
+
+### 3.1 Binary protocol
+
+#### Message format
+
+All fields are stored in [big-endian](https://zh.wikipedia.org/wiki/%E5%AD%97%E8%8A%82%E5%BA%8F#%E5%A4%A7%E7%AB%AF%E5%BA%8F) byte order.
+**Field descriptions**
+
+| | | | \
+|Field (size in bits) |Description |Values |
+|---|---|---|
+| | | | \
+|Protocol version (4) |Different protocol versions may be used in the future, so this field lets the client and server agree on the version. |`0b0001` - version 1 (currently the only version) |
+| | | | \
+|Header size (4) |The actual header size is `header size value x 4` bytes. |\
+| |The special value `0b1111` means the header is 60 bytes (15 x 4) or larger, in which case a header extension field is present. |`0b0001` - header size = 4 (1 x 4) |\
+| | |`0b0010` - header size = 8 (2 x 4) |\
+| | |`0b1010` - header size = 40 (10 x 4) |\
+| | |`0b1110` - header size = 56 (14 x 4) |\
+| | |`0b1111` - header size is 60 or larger; the actual size is defined in the header extension |
+| | | | \
+|Message type (4) |Defines the message type. |`0b0001` - full client request. |\
+| | |~~`0b1001` - full server response (deprecated).~~ |\
+| | |`0b1011` - Audio-only server response (ACK). |\
+| | |`0b1111` - Error message from server (for example, a wrong message type or an unsupported serialization method) |
+| | | | \
+|Message type specific flags (4) |The meaning of the flags depends on the message type. |\
+| |See the message type sections for details. | |
+| | | | \
+|Message serialization method (4) |Defines how the payload is serialized. |\
+| |Note: it is only meaningful for certain message types (for example, the Audio-only server response `0b1011` needs no serialization). |`0b0000` - no serialization (raw bytes) |\
+| | |`0b0001` - JSON |\
+| | |`0b1111` - custom type, defined in the header extension |
+| | | | \
+|Message compression (4) |Defines the compression method of the payload. |\
+| |The payload size field is not compressed (if present, depending on the message type), and the payload size refers to the size of the payload after compression. |\
+| |The header is not compressed. |`0b0000` - no compression |\
+| | |`0b0001` - gzip |\
+| | |`0b1111` - custom compression method, defined in the header extension |
+| | | | \
+|Reserved (8) |Reserved field that also acts as padding (so the whole header is 4 bytes). |`0x00` - currently always 0 |
+
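+The 4-byte header from the table can be sketched in Python as follows. This is an illustration only: the helper name `build_header` is hypothetical, and it assumes the 4-bit fields are packed in table order with the first field of each pair in the high nibble.
+
```python
def build_header(message_type: int, flags: int = 0b0000,
                 serialization: int = 0b0001, compression: int = 0b0001,
                 version: int = 0b0001, header_size: int = 0b0001) -> bytes:
    """Pack the six 4-bit fields and the reserved byte into a 4-byte header."""
    fields = (version, header_size, message_type, flags, serialization, compression)
    if any(not 0 <= field <= 0b1111 for field in fields):
        raise ValueError("each header field must fit in 4 bits")
    return bytes([
        (version << 4) | header_size,        # protocol version + header size
        (message_type << 4) | flags,         # message type + type-specific flags
        (serialization << 4) | compression,  # serialization + compression
        0x00,                                # reserved
    ])


# Full client request, JSON payload, gzip compression.
header = build_header(message_type=0b0001)
print(header.hex())  # 11101100
```
+
+With these defaults, a header size value of `0b0001` corresponds to a 4-byte header without a header extension.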
+
+#### Message type details
+All TTS websocket requests currently use the full client request format, for both "query" and "submit".
+
+#### Full client request
+
+* Header size is `0b0001` (4 bytes, no header extension).
+* Message type is `0b0001`.
+* Message type specific flags are fixed to `0b0000`.
+* Message serialization method is `0b0001` (JSON). For the fields, see the table above.
+* If the payload is gzip-compressed, the payload size is the size after compression.
+
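+A full client request frame can be sketched in Python as below. The header bytes follow the field values listed above; the assumption (not stated explicitly in this section) is that the payload size is a 4-byte unsigned big-endian integer placed directly after the header. The function name is hypothetical.
+
```python
import gzip
import json
import struct


def build_full_client_request(request: dict) -> bytes:
    """Frame a JSON request as: 4-byte header + payload size + gzip payload."""
    payload = gzip.compress(json.dumps(request).encode("utf-8"))
    # version 1 / header size 1, full client request / flags 0,
    # JSON / gzip, reserved byte.
    header = bytes([0x11, 0x10, 0x11, 0x00])
    # Assumption: size field is uint32, big-endian, and counts compressed bytes.
    return header + struct.pack(">I", len(payload)) + payload


frame = build_full_client_request({"request": {"operation": "submit"}})
print(len(frame), frame[:4].hex())
```
+
+The size written into the frame is the length of the compressed payload, matching the last bullet above.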
+
+#### Audio-only server response
+
+* Header size should be `0b0001`.
+* Message type is `0b1011`.
+* Possible values of the message type specific flags:
+ * `0b0000` - no sequence number.
+ * `0b0001` - sequence number > 0.
+ * `0b0010` or `0b0011` - sequence number < 0, which marks the last message from the server; the client should then concatenate all audio segments (if there is more than one).
+* Message serialization method is `0b0000` (raw bytes).
+
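+The flag handling above can be sketched as a small Python parser. It is a sketch under assumptions: when the flags are non-zero, the header is taken to be followed by a signed 4-byte sequence number and an unsigned 4-byte payload size, both big-endian. That layout is not spelled out in this section, and `parse_audio_response` is a hypothetical name.
+
```python
import struct


def parse_audio_response(frame: bytes):
    """Return (sequence, is_last, audio) for an audio-only server response."""
    header_size = (frame[0] & 0x0F) * 4
    message_type, flags = frame[1] >> 4, frame[1] & 0x0F
    if message_type != 0b1011:
        raise ValueError(f"not an audio-only response: {message_type:#06b}")
    if flags == 0b0000:
        # No sequence number: treated here as an ACK without audio.
        return None, False, b""
    # Assumed layout after the header: int32 sequence, uint32 payload size.
    sequence, size = struct.unpack(">iI", frame[header_size:header_size + 8])
    audio = frame[header_size + 8:header_size + 8 + size]
    is_last = flags in (0b0010, 0b0011)
    return sequence, is_last, audio


sample = bytes([0x11, 0xB3, 0x00, 0x00]) + struct.pack(">iI", -2, 3) + b"abc"
print(parse_audio_response(sample))
```
+
+A client would append each returned `audio` chunk to a buffer and stop reading once `is_last` is true.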
+
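+## 4. Notes
+
+* Set a new reqid for every synthesis and make sure it is unique (generating it with UUID v4 is recommended).
+* In the websocket demo, a single connection supports only one synthesis; if you need several syntheses on one connection, you have to implement that yourself. After creating a websocket connection, send the packets serially, in order. Once one synthesis has finished, a new synthesis request can be sent.
+* operation must be set to submit for streaming output.
+* Voices of ["Doubao speech synthesis model 2.0"](https://www.volcengine.com/docs/6561/1257544), such as "zh_female_vv_uranus_bigtts", are not supported; use the V3 interfaces for them.
+* After a successful websocket handshake, the following response headers are returned:
+
+
+| | | | \
+|Key |Description |Example value |
+|---|---|---|
+| | | | \
+|X-Tt-Logid |The logid returned by the server; retrieving and logging it is recommended to make troubleshooting easier |202407261553070FACFE6D19421815D605 |
+
+
+## 5. Call examples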
+
+```mixin-react
+return (
+
+### Prerequisites
+
+* Before calling, obtain the following information:
+ * \`<appid>\`: the APP ID obtained from the console; see [Console FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F).
+ * \`<access_token>\`: the Access Token obtained from the console; see [Console FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F).
+ * \`<voice_type>\`: the ID of the voice you want to use; see the [large-model voice list](https://www.volcengine.com/docs/6561/1257544).
+
+
+### Python environment
+
+* Python: version 3.9 or later.
+* pip: version 25.1.1 or later. You can upgrade it with the following command.
+
+\`\`\`Bash
+python3 -m pip install --upgrade pip
+\`\`\`
+
+
+### Download the sample code
+
+
+### Unpack the archive and install dependencies
+\`\`\`Bash
+mkdir -p volcengine_binary_demo
+tar xvzf volcengine_binary_demo.tar.gz -C ./volcengine_binary_demo
+cd volcengine_binary_demo
+python3 -m venv .venv
+source .venv/bin/activate
+python3 -m pip install --upgrade pip
+pip3 install -e .
+\`\`\`
+
+
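+### Run the example
+> Replace \`<appid>\` with your APP ID.
+> Replace \`<access_token>\` with your Access Token.
+> Replace \`<voice_type>\` with the ID of the voice you want to use, for example \`zh_female_cancan_mars_bigtts\`.
+
+\`\`\`Bash
+python3 examples/volcengine/binary.py --appid <appid> --access_token <access_token> --voice_type <voice_type> --text "你好,我是火山引擎的语音合成服务。这是一个美好的旅程。"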
+\`\`\`
+
+`}>
+
+### Prerequisites
+
+* Before calling, obtain the following information:
+ * \`<appid>\`: the APP ID obtained from the console; see [Console FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F).
+ * \`<access_token>\`: the Access Token obtained from the console; see [Console FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F).
+ * \`<voice_type>\`: the ID of the voice you want to use; see the [large-model voice list](https://www.volcengine.com/docs/6561/1257544).
+
+
+### Java environment
+
+* Java: version 21 or later.
+* Maven: version 3.9.10 or later.
+
+
+### Download the sample code
+
+
+### Unpack the archive and install dependencies
+\`\`\`Bash
+mkdir -p volcengine_binary_demo
+tar xvzf volcengine_binary_demo.tar.gz -C ./volcengine_binary_demo
+cd volcengine_binary_demo
+\`\`\`
+
+
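+### Run the example
+> Replace \`<appid>\` with your APP ID.
+> Replace \`<access_token>\` with your Access Token.
+> Replace \`<voice_type>\` with the ID of the voice you want to use, for example \`zh_female_cancan_mars_bigtts\`.
+
+\`\`\`Bash
+mvn compile exec:java -Dexec.mainClass=com.speech.volcengine.Binary -DappId=<appid> -DaccessToken=<access_token> -Dvoice=<voice_type> -Dtext="**你好**,我是豆包语音助手,很高兴认识你。这是一个愉快的旅程。"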
+\`\`\`
+
+`}>
+
+### Prerequisites
+
+* Before calling, obtain the following information:
+ * \`<appid>\`: the APP ID obtained from the console; see [Console FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F).
+ * \`<access_token>\`: the Access Token obtained from the console; see [Console FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F).
+ * \`<voice_type>\`: the ID of the voice you want to use; see the [large-model voice list](https://www.volcengine.com/docs/6561/1257544).
+
+
+### Go environment
+
+* Go: version 1.21.0 or later.
+
+
+### Download the sample code
+
+
+### Unpack the archive and install dependencies
+\`\`\`Bash
+mkdir -p volcengine_binary_demo
+tar xvzf volcengine_binary_demo.tar.gz -C ./volcengine_binary_demo
+cd volcengine_binary_demo
+\`\`\`
+
+
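+### Run the example
+> Replace \`<appid>\` with your APP ID.
+> Replace \`<access_token>\` with your Access Token.
+> Replace \`<voice_type>\` with the ID of the voice you want to use, for example \`zh_female_cancan_mars_bigtts\`.
+
+\`\`\`Bash
+go run volcengine/binary/main.go --appid <appid> --access_token <access_token> --voice_type <voice_type> --text "**你好**,我是火山引擎的语音合成服务。"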
+\`\`\`
+
+`}>
+
+### Prerequisites
+
+* Before calling, obtain the following information:
+ * \`<appid>\`: the APP ID obtained from the console; see [Console FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F).
+ * \`<access_token>\`: the Access Token obtained from the console; see [Console FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F).
+ * \`<voice_type>\`: the ID of the voice you want to use; see the [large-model voice list](https://www.volcengine.com/docs/6561/1257544).
+
+
+### C# environment
+
+* .NET 9.0.
+
+
+### Download the sample code
+
+
+### Unpack the archive and install dependencies
+\`\`\`Bash
+mkdir -p volcengine_binary_demo
+tar xvzf volcengine_binary_demo.tar.gz -C ./volcengine_binary_demo
+cd volcengine_binary_demo
+\`\`\`
+
+
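+### Run the example
+> Replace \`<appid>\` with your APP ID.
+> Replace \`<access_token>\` with your Access Token.
+> Replace \`<voice_type>\` with the ID of the voice you want to use, for example \`zh_female_cancan_mars_bigtts\`.
+
+\`\`\`Bash
+dotnet run --project Volcengine/Binary/Volcengine.Speech.Binary.csproj -- --appid <appid> --access_token <access_token> --voice_type <voice_type> --text "**你好**,这是一个测试文本。我们正在测试文本转语音功能。"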
+\`\`\`
+
+`}>
+
+### Prerequisites
+
+* Before calling, obtain the following information:
+ * \`<appid>\`: the APP ID obtained from the console; see [Console FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F).
+ * \`<access_token>\`: the Access Token obtained from the console; see [Console FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F).
+ * \`<voice_type>\`: the ID of the voice you want to use; see the [large-model voice list](https://www.volcengine.com/docs/6561/1257544).
+
+
+### Node.js environment
+
+* Node.js: v24.0 or later.
+
+
+### Download the sample code
+
+
+### Unpack the archive and install dependencies
+\`\`\`Bash
+mkdir -p volcengine_binary_demo
+tar xvzf volcengine_binary_demo.tar.gz -C ./volcengine_binary_demo
+cd volcengine_binary_demo
+npm install
+npm install -g typescript
+npm install -g ts-node
+\`\`\`
+
+
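+### Run the example
+> Replace \`<appid>\` with your APP ID.
+> Replace \`<access_token>\` with your Access Token.
+> Replace \`<voice_type>\` with the ID of the voice you want to use, for example \`zh_female_cancan_mars_bigtts\`.
+
+\`\`\`Bash
+npx ts-node src/volcengine/binary.ts --appid <appid> --access_token <access_token> --voice_type <voice_type> --text "**你好**,我是火山引擎的语音合成服务。"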
+\`\`\`
+
+`}>);
+ ```
+
+
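+# HTTP
+> Call this interface with the appid and access_token obtained in the account application step.
+> All audio data is returned at once, after the whole text has been synthesized.
+
+
+## 1. Interface description
+The endpoint is **https://openspeech.bytedance.com/api/v1/tts**
+
+## 2. Authentication
+Authentication uses a Bearer Token.
+Add "Authorization":"Bearer;${token}" to the request header.
+:::warning
+Separate Bearer and the token with a semicolon (;), and do not keep the ${} when substituting the token.
+:::
+For AppID/Token/Cluster and related information, see [Console FAQ-Q1](/docs/6561/196768#q1:哪里可以获取到以下参数appid,cluster,token,authorization-type,secret-key-?)
+
+## 3. Notes
+
+* Send the request with HTTP POST; the result is returned as JSON and has to be parsed.
+* Because JSON cannot carry binary audio directly, the audio is base64-encoded; decoding the base64 string yields the binary audio.
+* Set a new reqid for every synthesis and make sure it is unique (generating it with a UUID/GUID is recommended).
+* Voices of ["Doubao speech synthesis model 2.0"](https://www.volcengine.com/docs/6561/1257544), such as "zh_female_vv_uranus_bigtts", are not supported; use the V3 interfaces for them.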
+
+# Parameter list
+> Websocket and HTTP calls use the same parameters
+
+
+## Request parameters
+
+| | | | | | | \
+|Field |Meaning |Level |Format |Required |Notes |
+|---|---|---|---|---|---|
+| | | | | | | \
+|app |Application settings |1 |dict |✓ | |
+| | | | | | | \
+|appid |Application ID |2 |string |✓ |Must be applied for |
+| | | | | | | \
+|token |Application token |2 |string |✓ |A fake token with no actual authentication role; any non-empty string can be passed |
+| | | | | | | \
+|cluster |Business cluster |2 |string |✓ |volcano_tts |
+| | | | | | | \
+|user |User settings |1 |dict |✓ | |
+| | | | | | | \
+|uid |User ID |2 |string |✓ |Any non-empty string can be passed; the value can be traced in the server-side logs |
+| | | | | | | \
+|audio |Audio settings |1 |dict |✓ | |
+| | | | | | | \
+|voice_type |Voice type |2 |string |✓ | |
+| | | | | | | \
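+|emotion |Voice emotion |2 |string | |Sets the emotion of the voice. Example: "emotion": "angry" |\
+| | | | | |Note: only some voices currently support emotions, and the supported emotions differ between voices. |\
+| | | | | |See: [Large-model speech synthesis API - voice list - multi-emotion voices](https://www.volcengine.com/docs/6561/1257544) |
+| | | | | | | \
+|enable_emotion |Enable voice emotion |2 |bool | |Controls whether the voice emotion can be set; enable_emotion must be set to true for it to take effect |\
+| | | | | |Example: "enable_emotion": True |
+| | | | | | | \
+|emotion_scale |Emotion intensity |2 |float | |After setting an emotion with emotion, emotion_scale can be used to adjust its intensity. Range 1~5; the default is 4 when not set. |\
+| | | | | |Note: in theory, a larger value gives a more pronounced emotion. The values 1~5 scale non-linearly, however, so beyond a certain value the increase may not be noticeable; for example, 3 and 5 may sound similar. |
+| | | | | | | \
+|encoding |Audio encoding format |2 |string | |wav / pcm / ogg_opus / mp3; the default is pcm |\
+| | | | | |Note: wav does not support streaming |
+| | | | | | | \
+|speed_ratio |Speech rate |2 |float | |[0.1,2]; the default is 1, and one decimal place is usually enough |
+| | | | | | | \
+|rate |Audio sample rate |2 |int | |The default is 24000; 8000 and 16000 are also available |
+| | | | | | | \
+|bitrate |Bitrate |2 |int | |Unit: kb/s; the default is 160 kb/s |\
+| | | | | |**Note:** |\
+| | | | | |bitrate applies only to the MP3 format; for wav, as for pcm, bitrate (bps) = sample rate × bit depth × number of channels |\
+| | | | | |Large-model TTS currently only lets you change the sample rate, so for wav the bitrate can only be changed by changing the sample rate |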
+| | | | | | | \
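+|explicit_language |Explicit language |2 |string | |Reads only text in the specified language |\
+| | | | | |Premium voices and ICL voice cloning: |\
+| | | | | | |\
+| | | | | |* Not set: normal mixed Chinese and English |\
+| | | | | |* `crosslingual` enables the multilingual front end (covers `zh/en/ja/es-mx/id/pt-br`) |\
+| | | | | |* `zh-cn` mainly Chinese, with mixed Chinese and English supported |\
+| | | | | |* `en` English only |\
+| | | | | |* `ja` Japanese only |\
+| | | | | |* `es-mx` Mexican Spanish only |\
+| | | | | |* `id` Indonesian only |\
+| | | | | |* `pt-br` Brazilian Portuguese only |\
+| | | | | | |\
+| | | | | |DIT voice cloning: |\
+| | | | | |When the voice was trained with model_type=2 (the DIT standard effect), specifying an explicit language is recommended. Currently supported: |\
+| | | | | | |\
+| | | | | |* Not set: enables the multilingual front end `zh,en,ja,es-mx,id,pt-br,de,fr` |\
+| | | | | |* `zh,en,ja,es-mx,id,pt-br,de,fr` enables the multilingual front end |\
+| | | | | |* `zh-cn` mainly Chinese, with mixed Chinese and English supported |\
+| | | | | |* `en` English only |\
+| | | | | |* `ja` Japanese only |\
+| | | | | |* `es-mx` Mexican Spanish only |\
+| | | | | |* `id` Indonesian only |\
+| | | | | |* `pt-br` Brazilian Portuguese only |\
+| | | | | |* `de` German only |\
+| | | | | |* `fr` French only |\
+| | | | | | |\
+| | | | | |When the voice was trained with model_type=3 (the DIT restoration effect), an explicit language must be specified. Currently supported: |\
+| | | | | | |\
+| | | | | |* Not set: normal mixed Chinese and English |\
+| | | | | |* `zh-cn` mainly Chinese, with mixed Chinese and English supported |\
+| | | | | |* `en` English only |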
+| | | | | | | \
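+|context_language |Reference language |2 |string | |Language given to the model as a reference |\
+| | | | | | |\
+| | | | | |* Not set: Western European languages use English |\
+| | | | | |* id: Western European languages use Indonesian |\
+| | | | | |* es: Western European languages use Mexican Spanish |\
+| | | | | |* pt: Western European languages use Brazilian Portuguese |
+| | | | | | | \
+|loudness_ratio |Volume adjustment |2 |float | |[0.5,2]; the default is 1, and one decimal place is usually enough. 0.5 means 0.5x the original volume, 2 means 2x the original volume |
+| | | | | | | \
+|request |Request settings |1 |dict |✓ | |
+| | | | | | | \
+|reqid |Request ID |2 |string |✓ |Must be unique for every call; using a UUID is recommended |
+| | | | | | | \
+|text |Text |2 |string |✓ |Text to synthesize. The length limit is 1024 bytes (UTF-8 encoded); fewer than 300 characters is recommended, since longer text increases the chance of bad cases or errors |
+| | | | | | | \
+|model |Model version |\
+| | |2 |\
+| | | |string |No |Model version. Passing `seed-tts-1.1` improves audio quality over the default version and gives better latency; omit it for the default effect. |\
+| | | | | |Note: with the 1.1 model, voice cloning amplifies the characteristics of the training audio prompt, so the requirements on the prompt are higher; high-quality training audio gives better audio quality. |
+| | | | | | | \
+|text_type |Text type |2 |string | |Must be specified when using SSML; the value is "ssml" |
+| | | | | | | \
+|silence_duration |Trailing silence |2 |float | |Adds silence at the end of the sentence, range 0~30000 ms. (Note: the added silence applies to the end of the last sentence of the input text, not to the end of every sentence.) To use this parameter, enable_trailing_silence_audio = true must first be set under request |
+| | | | | | | \
+|with_timestamp |Timestamp settings |2 |int |\
+| | | |string | |Pass 1 to enable. Timestamps are returned for the text after TN (text normalization); for example, for 2025 the post-TN text is “两千零二十五” or “二零二五”, depending on the semantics. |\
+| | | | | |Note: consecutive punctuation marks or spaces in the original text are still processed, but this does not affect the continuity of the timestamps (large-model scenarios only). |\
+| | | | | |Additional notes (how timestamps differ between small and large models): |\
+| | | | | |1. Small models generate timestamps from the front-end model and then synthesize the audio. The text before and after TN is mapped while the timestamps are processed, so small models can return timestamps for the original pre-TN text, keeping the Arabic numerals and special symbols of the original. |\
+| | | | | |2. Large models synthesize the audio after understanding the semantics of the input text, then align the post-TN text against the synthesized audio to produce timestamps. Without the post-TN text the timestamps could not be aligned with the audio, so the timestamps returned by large models correspond to the post-TN text. |
+| | | | | | | \
+|operation |Operation |2 |string |✓ |query (non-streaming; HTTP supports only query) / submit (streaming) |
+| | | | | | | \
+|extra_param |Additional parameters |2 |jsonstring | | |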
+| | | | | | | \
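+|disable_markdown_filter | |3 |bool | |Whether to enable markdown parsing and filtering. |\
+| | | | | |When true, markdown syntax is parsed and filtered; for example, **你好** is read as “你好”. |\
+| | | | | |When false, nothing is parsed or filtered; for example, **你好** is read as “星星‘你好’星星” (the asterisks are read aloud). |\
+| | | | | |Example: "disable_markdown_filter": True |
+| | | | | | | \
+|enable_latex_tn | |3 |bool | |Whether LaTeX formulas can be read aloud; requires disable_markdown_filter to be set to true |\
+| | | | | |Example: "enable_latex_tn": True |
+| | | | | | | \
+|mute_cut_remain_ms |Leading-silence parameter |3 |string | |This parameter must be used together with mute_cut_threshold: |\
+| | | | | |"mute_cut_threshold": "400", // silence threshold (audio quieter than this value is treated as silence) |\
+| | | | | |"mute_cut_remain_ms": "50", // length of silence to keep |\
+| | | | | |Note: both the parameter names and the values are strings |\
+| | | | | |Python example: |\
+| | | | | |```Python |\
+| | | | | |"extra_param":("{\"mute_cut_threshold\":\"400\", \"mute_cut_remain_ms\": \"0\"}") |\
+| | | | | |``` |\
+| | | | | | |\
+| | | | | |Important: |\
+| | | | | | |\
+| | | | | |* Because of how the MP3 format works, up to 100 ms of silence always remains at the start and cannot be removed; leading silence in WAV audio can be removed completely. Choose according to your own business needs |
+| | | | | | | \
+|disable_emoji_filter |Keep emoji unfiltered |3 |bool | |When enabled, emoji in the text are not filtered out. The default is False; using it together with the timestamp parameter is recommended. |\
+| | | | | |Python example: `"extra_param": json.dumps({"disable_emoji_filter": True})` |
+| | | | | | | \
+|unsupported_char_ratio_thresh |Threshold for the share of unsupported-language text |3 |float | |Default: 0.3, maximum: 1.0 |\
+| | | | | |An error is returned if the detected share of text that cannot be synthesized exceeds this ratio. |\
+| | | | | |Python example: `"extra_param": json.dumps({"unsupported_char_ratio_thresh": 0.3})` |
+| | | | | | | \
+|aigc_watermark |Whether to add an audio rhythm marker at the end of the synthesized audio |3 |bool | |Default: false |\
+| | | | | |Python example: `"extra_param": json.dumps({"aigc_watermark": True})` |
+| | | | | | | \
+|cache_config |Cache settings |3 |dict | |Enables caching. When enabled, synthesizing the same text again returns the previously synthesized audio straight from the cache, which noticeably speeds up repeated synthesis of the same text. Cached data is kept for 1 hour. |\
+| | | | | |(Data returned from the cache does not include timestamps) |\
+| | | | | |Python example: `"extra_param": json.dumps({"cache_config": {"text_type": 1,"use_cache": True}})` |
+| | | | | | | \
+|text_type |Cache settings |4 |int | |Used together with use_cache; pass 1 to enable caching |
+| | | | | | | \
+|use_cache |Cache settings |4 |bool | |Used together with text_type; pass true to enable caching |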
+
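+Because `extra_param` has the format jsonstring, the nested options have to be serialized before they go into the request. The sketch below illustrates this with `json.dumps`; the particular combination of options is only an example and is not a recommended configuration.
+
```python
import json

# Level-3 options live inside extra_param, which is itself a JSON string.
extra_param = json.dumps({
    "mute_cut_threshold": "400",   # string values, as noted in the table
    "mute_cut_remain_ms": "50",
    "disable_emoji_filter": True,
    "cache_config": {"text_type": 1, "use_cache": True},
})

request_section = {
    "reqid": "replace-with-a-unique-id",
    "text": "字节跳动语音合成",
    "operation": "query",
    "extra_param": extra_param,
}

# The outer request is serialized once more when it is sent.
print(json.dumps(request_section, ensure_ascii=False))
```
+
+Serializing in two steps keeps `extra_param` a string inside the outer JSON document, rather than a nested object.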
+
+
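+The wav/pcm bitrate formula from the `bitrate` row can be worked through for the three selectable sample rates. The bit depth of 16 and the single channel below are assumptions made for illustration; the table does not state them.
+
```python
def pcm_bitrate_bps(sample_rate: int, bit_depth: int = 16, channels: int = 1) -> int:
    """bitrate (bps) = sample rate x bit depth x number of channels."""
    return sample_rate * bit_depth * channels


for rate in (8000, 16000, 24000):
    # Assumed: 16-bit samples, mono.
    print(rate, pcm_bitrate_bps(rate))
```
+
+Under these assumptions the default sample rate of 24000 gives 384000 bps, and only changing `rate` changes the result.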
+
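+Remarks:
+
+1. Character-level timestamps are supported (not for the ssml text type)
+2. SSML is supported; see [SSML markup language - Doubao Speech - Volcengine (volcengine.com)](https://www.volcengine.com/docs/6561/1330194)
+3. Pitch adjustment is not supported yet
+4. Large-model voices support mixed Chinese and English
+5. The large-model interfaces other than bidirectional streaming support LaTeX formulas
+6. After a successful websocket/http handshake, the following response headers are returned
+
+
+| | | | \
+|Key |Description |Example value |
+|---|---|---|
+| | | | \
+|X-Tt-Logid |The logid returned by the server; retrieving and logging it is recommended to make troubleshooting easier. Use the default format and do not define a custom one |202407261553070FACFE6D19421815D605 |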
+
+Request example:
+```json
+{
+    "app": {
+        "appid": "appid123",
+        "token": "access_token",
+        "cluster": "volcano_tts"
+    },
+    "user": {
+        "uid": "uid123"
+    },
+    "audio": {
+        "voice_type": "zh_male_M392_conversation_wvae_bigtts",
+        "encoding": "mp3",
+        "speed_ratio": 1.0
+    },
+    "request": {
+        "reqid": "uuid",
+        "text": "字节跳动语音合成",
+        "operation": "query"
+    }
+}
+```
+
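+A request body like the example above can be assembled in Python as sketched below. The helper name `build_request` and its defaults are hypothetical; the field names, the UUID-based reqid and the 1024-byte text limit come from the request parameter table.
+
```python
import json
import uuid


def build_request(appid: str, voice_type: str, text: str,
                  operation: str = "query") -> dict:
    """Assemble the request JSON with a fresh, unique reqid."""
    if len(text.encode("utf-8")) > 1024:
        raise ValueError("text exceeds the 1024-byte UTF-8 limit")
    return {
        "app": {"appid": appid, "token": "access_token", "cluster": "volcano_tts"},
        "user": {"uid": "uid123"},
        "audio": {"voice_type": voice_type, "encoding": "mp3", "speed_ratio": 1.0},
        "request": {
            "reqid": str(uuid.uuid4()),  # must differ on every call
            "text": text,
            "operation": operation,
        },
    }


body = build_request("appid123", "zh_male_M392_conversation_wvae_bigtts", "字节跳动语音合成")
print(json.dumps(body, ensure_ascii=False, indent=2))
```
+
+Calling `build_request` again produces a new reqid, which avoids reusing the ID of a request that has already completed or failed.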
+
+## Response parameters
+
+| | | | | | \
+|Field |Meaning |Level |Format |Notes |
+|---|---|---|---|---|
+| | | | | | \
+|reqid |Request ID |1 |string |Request ID, identical to the reqid passed in the request |
+| | | | | | \
+|code |Status code |1 |int |Error code; see the return code descriptions below |
+| | | | | | \
+|message |Status message |1 |string |Error message |
+| | | | | | \
+|sequence |Audio segment number |1 |int |A negative value means synthesis has finished |
+| | | | | | \
+|data |Synthesized audio |1 |string |The returned audio data, base64-encoded |
+| | | | | | \
+|addition |Additional information |1 |dict |Parent node of the additional information |
+| | | | | | \
+|duration |Audio duration |2 |string |Length of the returned audio, in ms |
+
+Response example:
+```json
+{
+    "reqid": "reqid",
+    "code": 3000,
+    "operation": "query",
+    "message": "Success",
+    "sequence": -1,
+    "data": "base64 encoded binary data",
+    "addition": {
+        "duration": "1960"
+    }
+}
+```
+
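+Handling such a response can be sketched in Python as follows: check `code`, then base64-decode `data`. The function name `extract_audio` and the sample response are illustrative only; real responses come from the service.
+
```python
import base64
import json


def extract_audio(response_text: str) -> bytes:
    """Return the decoded audio bytes, or raise if the call did not succeed."""
    response = json.loads(response_text)
    if response["code"] != 3000:
        raise RuntimeError(f'{response["code"]}: {response.get("message")}')
    return base64.b64decode(response["data"])


# Synthetic response used only to exercise the function.
sample_response = json.dumps({
    "reqid": "reqid",
    "code": 3000,
    "message": "Success",
    "sequence": -1,
    "data": base64.b64encode(b"fake-audio-bytes").decode("ascii"),
    "addition": {"duration": "1960"},
})
audio = extract_audio(sample_response)
print(len(audio))
```
+
+The decoded bytes can be written to a file whose extension matches the requested `encoding`.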
+
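+## Notes
+
+* A single websocket connection supports only one synthesis; to synthesize several times, establish several connections
+* Set a new reqid for every synthesis and make sure it is unique (generating it with UUID v4 is recommended)
+* operation must be set to submit for streaming output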
+
+
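+# Return codes
+
+| | | | | \
+|Code |Description |Example |Suggested action |
+|---|---|---|---|
+| | | | | \
+|3000 |Request succeeded |Normal synthesis |Process normally |
+| | | | | \
+|3001 |Invalid request |Some parameter values are illegal, for example a wrong operation setting |Check parameters |
+| | | | | \
+|3003 |Concurrency limit exceeded |The concurrency threshold configured online was exceeded |Retry; when using the SDK, switch to offline |
+| | | | | \
+|3005 |Backend service busy |High load on the backend servers |Retry; when using the SDK, switch to offline |
+| | | | | \
+|3006 |Service interrupted |The same reqid was sent again after the request had completed or failed |Check parameters |
+| | | | | \
+|3010 |Text too long |A single request exceeded the configured text length threshold |Check parameters |
+| | | | | \
+|3011 |Invalid text |Wrong parameters, empty text, text that does not match the language, or text containing only punctuation |Check parameters |
+| | | | | \
+|3030 |Processing timeout |A single request exceeded the maximum service time limit |Retry or check the text |
+| | | | | \
+|3031 |Processing error |An exception occurred in the backend |Retry; when using the SDK, switch to offline |
+| | | | | \
+|3032 |Timed out waiting for audio |Backend network problem |Retry; when using the SDK, switch to offline |
+| | | | | \
+|3040 |Backend link connection error |Backend network problem |Retry |
+| | | | | \
+|3050 |Voice does not exist |Check the voice_type code in use |Check parameters |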
+
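+The suggested actions in the table can be turned into a small dispatch helper, sketched below. The grouping into "retry" and "check parameters" follows the last column; code 3030 is listed under retry here although the table also suggests checking the text. The names are hypothetical.
+
```python
RETRYABLE = {3003, 3005, 3030, 3031, 3032, 3040}
CHECK_PARAMETERS = {3001, 3006, 3010, 3011, 3050}


def suggested_action(code: int) -> str:
    """Map a return code to the action suggested in the table."""
    if code == 3000:
        return "ok"
    if code in RETRYABLE:
        return "retry"
    if code in CHECK_PARAMETERS:
        return "check parameters"
    return "unknown"


for code in (3000, 3005, 3011, 9999):
    print(code, suggested_action(code))
```
+
+A caller would typically retry with a new reqid, since reusing the reqid of a finished request maps to 3006.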
+
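+# Common error responses
+
+1. Error returned:
+ "message": "quota exceeded for types: xxxxxxxxx_lifetime"
+ **Cause: the trial quota has been used up; upgrade to the paid version to continue**
+2. Error returned:
+ "message": "quota exceeded for types: concurrency"
+ **Cause: the concurrency limit was exceeded; reduce concurrent calls or purchase more concurrency**
+3. Error returned:
+ "message": "Fail to feed text, reason Init Engine Instance failed"
+ **Cause: a wrong voice_type or cluster was passed**
+4. Error returned:
+ "message": "illegal input text!"
+ **Cause: the text passed in is invalid and contains nothing that can be synthesized, for example only punctuation or emoji, or Japanese text sent to a Chinese voice, and so on. Multilingual voices also need language to specify the corresponding language**
+5. Error returned:
+ "message": "authenticate request: load grant: requested grant not found"
+ **Cause: authentication failed. Check that appid and token are set correctly. The correct authentication format is**
+ **headers["Authorization"] = "Bearer;${token}"**
+6. Error returned:
+ "message": "extract request resource id: get resource id: access denied"
+ **Cause: speech synthesis has been upgraded to the paid version but the current voice is not licensed; purchase the voice in the console before calling it. Voices marked as free, except BV001_streaming and BV002_streaming, still require an order in the console (at a price of 0 yuan)**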
+
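+As a rough illustration, the messages listed above can be matched by substring to surface the likely cause in logs. The table `KNOWN_CAUSES` and the function `diagnose` are hypothetical helpers, and the substrings are taken from the messages quoted above.
+
```python
KNOWN_CAUSES = [
    ("_lifetime", "trial quota used up; upgrade to the paid version"),
    ("quota exceeded for types: concurrency", "concurrency limit exceeded"),
    ("Init Engine Instance failed", "wrong voice_type or cluster"),
    ("illegal input text", "no valid text to synthesize"),
    ("requested grant not found", "authentication failed; check appid and token"),
    ("access denied", "voice not licensed; purchase it in the console"),
]


def diagnose(message: str):
    """Return the likely cause for a known error message, else None."""
    for needle, cause in KNOWN_CAUSES:
        if needle in message:
            return cause
    return None


print(diagnose("quota exceeded for types: concurrency"))
```
+
+Unknown messages return `None`, so the caller can fall back to logging the raw message together with the X-Tt-Logid header.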
+
diff --git a/Capybara audio/勇敢的小裁缝_1770727373.mp3 b/Capybara audio/勇敢的小裁缝_1770727373.mp3
new file mode 100644
index 0000000..cd8ffb9
Binary files /dev/null and b/Capybara audio/勇敢的小裁缝_1770727373.mp3 differ
diff --git a/Capybara audio/卡皮巴拉的奇幻漂流_1770727390.mp3 b/Capybara audio/卡皮巴拉的奇幻漂流_1770727390.mp3
new file mode 100644
index 0000000..c33e659
Binary files /dev/null and b/Capybara audio/卡皮巴拉的奇幻漂流_1770727390.mp3 differ
diff --git a/Capybara audio/小红帽与大灰狼_1770723087.mp3 b/Capybara audio/小红帽与大灰狼_1770723087.mp3
new file mode 100644
index 0000000..6debfbd
Binary files /dev/null and b/Capybara audio/小红帽与大灰狼_1770723087.mp3 differ
diff --git a/Capybara audio/杰克与魔豆_1770727355.mp3 b/Capybara audio/杰克与魔豆_1770727355.mp3
new file mode 100644
index 0000000..0274f5b
Binary files /dev/null and b/Capybara audio/杰克与魔豆_1770727355.mp3 differ
diff --git a/Capybara audio/海盗找朋友_1770718270.mp3 b/Capybara audio/海盗找朋友_1770718270.mp3
new file mode 100644
index 0000000..1d56aef
Binary files /dev/null and b/Capybara audio/海盗找朋友_1770718270.mp3 differ
diff --git a/Capybara audio/糖果屋历险记_1770721395.mp3 b/Capybara audio/糖果屋历险记_1770721395.mp3
new file mode 100644
index 0000000..1152f11
Binary files /dev/null and b/Capybara audio/糖果屋历险记_1770721395.mp3 differ
diff --git a/Capybara music/lyrics/书房咔咔茶_1770634690.txt b/Capybara music/lyrics/书房咔咔茶_1770634690.txt
new file mode 100644
index 0000000..3baa1ef
--- /dev/null
+++ b/Capybara music/lyrics/书房咔咔茶_1770634690.txt
@@ -0,0 +1,17 @@
+在书房角落 沏上一杯茶
+窗外微风轻拂 摇曳着树梢
+咔咔坐在椅上 沉浸在思考
+书页轻轻翻动 世界变得渺小
+咔咔咔咔 书房里的我
+静享时光 悠然自得
+茶香飘散 心灵得到慰藉
+咔咔咔咔 享受这刻
+阳光透过窗帘 柔和又温暖
+每个字每个句 都是心灵的食粮
+咔咔轻轻点头 感受着文字的力量
+在这安静的角落 找到了自我方向
+咔咔咔咔 书房里的我
+静享时光 悠然自得
+茶香飘散 心灵得到慰藉
+咔咔咔咔 享受这刻
+(茶杯轻放的声音...)
\ No newline at end of file
diff --git a/Capybara music/lyrics/书房咔咔茶_1770637242.txt b/Capybara music/lyrics/书房咔咔茶_1770637242.txt
new file mode 100644
index 0000000..c3b5f20
--- /dev/null
+++ b/Capybara music/lyrics/书房咔咔茶_1770637242.txt
@@ -0,0 +1,17 @@
+在书房角落里,我找到了安静
+一杯茶香飘来,思绪开始飞腾
+书页轻轻翻动,知识在心间
+咔咔我在这里,享受这宁静
+咔咔咔咔,独自享受
+书中的世界,如此美妙
+咔咔咔咔,心无旁骛
+沉浸在知识的海洋,自在飞翔
+窗外微风轻拂,阳光洒满书桌
+咔咔我在这里,与文字共舞
+每个字每个句,都像是音符
+奏出心灵的乐章,如此动听
+咔咔咔咔,独自享受
+书中的世界,如此美妙
+咔咔咔咔,心无旁骛
+沉浸在知识的海洋,自在飞翔
+(翻书声...风铃声...咔咔的呼吸声...)
\ No newline at end of file
diff --git a/Capybara music/lyrics/夜深了窗外下着小雨盖着被子准备入睡_1770627405.txt b/Capybara music/lyrics/夜深了窗外下着小雨盖着被子准备入睡_1770627405.txt
new file mode 100644
index 0000000..7c9342a
--- /dev/null
+++ b/Capybara music/lyrics/夜深了窗外下着小雨盖着被子准备入睡_1770627405.txt
@@ -0,0 +1,8 @@
+[verse]
+窗外细雨轻敲窗,
+被窝里温暖如常。
+[chorus]
+咔咔咔咔,梦乡近了,
+小雨伴我入眠床。
+[outro]
+(雨声和咔咔的呼吸声...)
\ No newline at end of file
diff --git a/Capybara music/lyrics/慵懒的午后泡在温泉里听水声发呆什么都不想_1770627905.txt b/Capybara music/lyrics/慵懒的午后泡在温泉里听水声发呆什么都不想_1770627905.txt
new file mode 100644
index 0000000..d090562
--- /dev/null
+++ b/Capybara music/lyrics/慵懒的午后泡在温泉里听水声发呆什么都不想_1770627905.txt
@@ -0,0 +1 @@
+[Inst]
\ No newline at end of file
diff --git a/Capybara music/lyrics/洗脑咔咔舞_1770631313.txt b/Capybara music/lyrics/洗脑咔咔舞_1770631313.txt
new file mode 100644
index 0000000..8b0c054
--- /dev/null
+++ b/Capybara music/lyrics/洗脑咔咔舞_1770631313.txt
@@ -0,0 +1,20 @@
+咔咔咔咔来跳舞,魔性旋律不停步
+跟着节奏摇摆身,洗脑神曲不放手
+重复的旋律像魔法,让人听了就上瘾
+咔咔咔咔的魔力,谁也挡不住
+洗脑咔咔舞,洗脑咔咔舞
+魔性的旋律,让人停不下来
+洗脑咔咔舞,洗脑咔咔舞
+跟着咔咔一起跳,快乐无边
+每个节拍都精准,咔咔的舞步最迷人
+不管走到哪里去,都能听到这魔音
+咔咔的舞蹈最独特,让人看了就想学
+洗脑神曲的魅力,就是让人忘不掉
+洗脑咔咔舞,洗脑咔咔舞
+魔性的旋律,让人停不下来
+洗脑咔咔舞,洗脑咔咔舞
+跟着咔咔一起跳,快乐无边
+咔咔咔咔,魔性洗脑舞
+重复的节奏,快乐的旋律
+洗脑咔咔舞,洗脑咔咔舞
+让快乐无限循环,直到永远
\ No newline at end of file
diff --git a/Capybara music/lyrics/温泉发呆曲_1770628235.txt b/Capybara music/lyrics/温泉发呆曲_1770628235.txt
new file mode 100644
index 0000000..deb4c0e
--- /dev/null
+++ b/Capybara music/lyrics/温泉发呆曲_1770628235.txt
@@ -0,0 +1,26 @@
+[verse 1]\n
+ 懒懒的午后阳光暖,\n
+ 温泉里我泡得欢。\n
+ 水声潺潺耳边响,\n
+ 什么都不想干。\n
+ \n
+ [chorus]\n
+ 咔咔咔咔,悠然自得,\n
+ 水波轻摇,心情舒畅。\n
+ 咔咔咔咔,享受此刻,\n
+ 懒懒午后,最是惬意。\n
+ \n
+ [verse 2]\n
+ 看着云朵慢慢飘,\n
+ 心思像水一样柔。\n
+ 闭上眼,世界都静了,\n
+ 只有我和这温泉。\n
+ \n
+ [chorus]\n
+ 咔咔咔咔,悠然自得,\n
+ 水波轻摇,心情舒畅。\n
+ 咔咔咔咔,享受此刻,\n
+ 懒懒午后,最是惬意。\n
+ \n
+ [outro]\n
+ (水声渐渐远去...)
\ No newline at end of file
diff --git a/Capybara music/lyrics/温泉发呆曲_1770630396.txt b/Capybara music/lyrics/温泉发呆曲_1770630396.txt
new file mode 100644
index 0000000..931cb55
--- /dev/null
+++ b/Capybara music/lyrics/温泉发呆曲_1770630396.txt
@@ -0,0 +1,21 @@
+慵懒午后阳光暖,温泉里我发呆
+
+水声潺潺耳边响,思绪飘向云外
+
+咔咔咔咔,泡在温泉
+
+心无杂念,享受此刻安宁
+
+什么都不想去做,只想静静享受
+
+水波轻抚我的背,世界变得温柔
+
+咔咔咔咔,泡在温泉
+
+心无杂念,享受此刻安宁
+
+(水花声...)
+
+咔咔的午后,慵懒又自在
+
+温泉里的世界,只有我和水声
\ No newline at end of file
diff --git a/Capybara music/lyrics/温泉发呆曲_1770630635.txt b/Capybara music/lyrics/温泉发呆曲_1770630635.txt
new file mode 100644
index 0000000..2e913e1
--- /dev/null
+++ b/Capybara music/lyrics/温泉发呆曲_1770630635.txt
@@ -0,0 +1,33 @@
+懒懒的午后阳光暖,
+
+温泉里我泡得欢。
+
+水声潺潺耳边响,
+
+什么都不想干。
+
+咔咔咔咔,发呆好时光,
+
+懒懒的我,享受这阳光。
+
+咔咔咔咔,让思绪飘扬,
+
+在温泉里,找到我的天堂。
+
+想法像泡泡一样浮上来,
+
+又慢慢沉下去,消失在水里。
+
+时间仿佛静止,我自在如鱼,
+
+在这温暖的怀抱里。
+
+咔咔咔咔,发呆好时光,
+
+懒懒的我,享受这阳光。
+
+咔咔咔咔,让思绪飘扬,
+
+在温泉里,找到我的天堂。
+
+(水声渐渐远去...)
\ No newline at end of file
diff --git a/Capybara music/lyrics/温泉发呆曲_1770639509.txt b/Capybara music/lyrics/温泉发呆曲_1770639509.txt
new file mode 100644
index 0000000..f1457e9
--- /dev/null
+++ b/Capybara music/lyrics/温泉发呆曲_1770639509.txt
@@ -0,0 +1,33 @@
+懒懒的午后阳光暖,
+
+温泉里我泡得欢。
+
+水声潺潺耳边响,
+
+什么都不想干。
+
+咔咔咔咔,发呆真好,
+
+懒懒的我,享受这秒。
+
+水波轻摇,心也飘,
+
+咔咔世界,别来无恙。
+
+想着云卷云又舒,
+
+温泉里的我多舒服。
+
+时间慢慢流,不急不徐,
+
+咔咔的梦,轻轻浮。
+
+咔咔咔咔,发呆真好,
+
+懒懒的我,享受这秒。
+
+水波轻摇,心也飘,
+
+咔咔世界,别来无恙。
+
+(水声渐渐远去...)
\ No newline at end of file
diff --git a/Capybara music/lyrics/温泉里的咔咔_1770730481.txt b/Capybara music/lyrics/温泉里的咔咔_1770730481.txt
new file mode 100644
index 0000000..7919ee9
--- /dev/null
+++ b/Capybara music/lyrics/温泉里的咔咔_1770730481.txt
@@ -0,0 +1,37 @@
+懒懒的午后阳光暖,
+
+温泉里我泡得欢。
+
+水声潺潺耳边响,
+
+什么都不想干。
+
+咔咔咔咔,悠然自得,
+
+水波荡漾心情悦。
+
+咔咔咔咔,闭上眼,
+
+享受这刻的宁静。
+
+想象自己是条鱼,
+
+在水里自由游来游去。
+
+没有烦恼没有压力,
+
+只有我和这温泉池。
+
+咔咔咔咔,悠然自得,
+
+水波荡漾心情悦。
+
+咔咔咔咔,闭上眼,
+
+享受这刻的宁静。
+
+(水花声...)
+
+咔咔,慵懒午后,
+
+水中世界最逍遥。
\ No newline at end of file
diff --git a/Capybara music/lyrics/草地上的咔咔_1770628910.txt b/Capybara music/lyrics/草地上的咔咔_1770628910.txt
new file mode 100644
index 0000000..b365ebd
--- /dev/null
+++ b/Capybara music/lyrics/草地上的咔咔_1770628910.txt
@@ -0,0 +1,26 @@
+[verse 1]\n"
+ "阳光洒满草地绿\n"
+ "咔咔奔跑心情舒畅\n"
+ "风儿轻拂过脸庞\n"
+ "快乐就像泡泡糖\n"
+ "\n"
+ "[chorus]\n"
+ "咔咔咔咔 快乐无边\n"
+ "草地上的我自由自在\n"
+ "阳光下的影子拉得好长\n"
+ "咔咔咔咔 快乐无边\n"
+ "\n"
+ "[verse 2]\n"
+ "蝴蝶飞舞花儿笑\n"
+ "咔咔摇摆尾巴摇\n"
+ "每一步都跳着舞\n"
+ "生活就像一首歌\n"
+ "\n"
+ "[chorus]\n"
+ "咔咔咔咔 快乐无边\n"
+ "草地上的我自由自在\n"
+ "阳光下的影子拉得好长\n"
+ "咔咔咔咔 快乐无边\n"
+ "\n"
+ "[outro]\n"
+ "(草地上咔咔的笑声...)
\ No newline at end of file
diff --git a/Capybara music/lyrics/草地上的咔咔_1770629673.txt b/Capybara music/lyrics/草地上的咔咔_1770629673.txt
new file mode 100644
index 0000000..76226a8
--- /dev/null
+++ b/Capybara music/lyrics/草地上的咔咔_1770629673.txt
@@ -0,0 +1,17 @@
+阳光洒满地 草香扑鼻来
+咔咔在草地上 跑得飞快
+风儿轻轻吹 摇曳着花海
+心情像彩虹 七彩斑斓开
+咔咔咔咔 快乐无边
+草地上的我 自由自在
+阳光下的梦 美好无限
+咔咔咔咔 快乐无边
+蝴蝶在飞舞 蜜蜂在歌唱
+咔咔跟着它们 一起欢唱
+天空蓝得像画 没有一丝阴霾
+咔咔的心里 只有满满的爱
+咔咔咔咔 快乐无边
+草地上的我 自由自在
+阳光下的梦 美好无限
+咔咔咔咔 快乐无边
+(草地上咔咔的笑声...)
\ No newline at end of file
diff --git a/Capybara music/lyrics/草地上的咔咔_1770640911.txt b/Capybara music/lyrics/草地上的咔咔_1770640911.txt
new file mode 100644
index 0000000..b069c3f
--- /dev/null
+++ b/Capybara music/lyrics/草地上的咔咔_1770640911.txt
@@ -0,0 +1,19 @@
+阳光洒满地 绿草如茵间
+咔咔跑起来 心情像飞燕
+风儿轻拂过 花香满径边
+快乐如此简单 每一步都新鲜
+咔咔咔咔 快乐咔咔
+草地上的我 自由自在
+阳光下的舞 轻松又欢快
+咔咔咔咔 快乐咔咔
+无忧无虑的我 最爱这蓝天
+蝴蝶翩翩起 蜜蜂忙采蜜
+咔咔我最棒 每个瞬间都美丽
+朋友在旁边 笑声传千里
+这世界多美好 有你有我有草地
+咔咔咔咔 快乐咔咔
+草地上的我 自由自在
+阳光下的舞 轻松又欢快
+咔咔咔咔 快乐咔咔
+无忧无虑的我 最爱这蓝天
+(草地上咔咔的笑声...)
\ No newline at end of file
diff --git a/Capybara music/lyrics/阳光灿烂的日子在草地上奔跑撒欢心情超级好_1770626906.txt b/Capybara music/lyrics/阳光灿烂的日子在草地上奔跑撒欢心情超级好_1770626906.txt
new file mode 100644
index 0000000..0b50271
--- /dev/null
+++ b/Capybara music/lyrics/阳光灿烂的日子在草地上奔跑撒欢心情超级好_1770626906.txt
@@ -0,0 +1,8 @@
+[verse]
+阳光洒满草地,我跑得飞快
+心情像彩虹,七彩斑斓真美
+[chorus]
+咔咔咔咔,快乐无边
+在阳光下,自由自在
+[outro]
+(风吹草低见水豚)
\ No newline at end of file
diff --git a/Capybara music/lyrics/阳光灿烂的日子在草地上奔跑撒欢心情超级好_1770639287.txt b/Capybara music/lyrics/阳光灿烂的日子在草地上奔跑撒欢心情超级好_1770639287.txt
new file mode 100644
index 0000000..d090562
--- /dev/null
+++ b/Capybara music/lyrics/阳光灿烂的日子在草地上奔跑撒欢心情超级好_1770639287.txt
@@ -0,0 +1 @@
+[Inst]
\ No newline at end of file
diff --git a/Capybara music/书房咔咔茶_1770634690.mp3 b/Capybara music/书房咔咔茶_1770634690.mp3
new file mode 100644
index 0000000..835700b
Binary files /dev/null and b/Capybara music/书房咔咔茶_1770634690.mp3 differ
diff --git a/Capybara music/书房咔咔茶_1770637242.mp3 b/Capybara music/书房咔咔茶_1770637242.mp3
new file mode 100644
index 0000000..d5912b7
Binary files /dev/null and b/Capybara music/书房咔咔茶_1770637242.mp3 differ
diff --git a/Capybara music/夜深了窗外下着小雨盖着被子准备入睡_1770627405.mp3 b/Capybara music/夜深了窗外下着小雨盖着被子准备入睡_1770627405.mp3
new file mode 100644
index 0000000..38c6ca1
Binary files /dev/null and b/Capybara music/夜深了窗外下着小雨盖着被子准备入睡_1770627405.mp3 differ
diff --git a/Capybara music/惊喜咔咔派_1770642290.mp3 b/Capybara music/惊喜咔咔派_1770642290.mp3
new file mode 100644
index 0000000..47b7f4a
Binary files /dev/null and b/Capybara music/惊喜咔咔派_1770642290.mp3 differ
diff --git a/Capybara music/慵懒的午后泡在温泉里听水声发呆什么都不想_1770627905.mp3 b/Capybara music/慵懒的午后泡在温泉里听水声发呆什么都不想_1770627905.mp3
new file mode 100644
index 0000000..f2747e2
Binary files /dev/null and b/Capybara music/慵懒的午后泡在温泉里听水声发呆什么都不想_1770627905.mp3 differ
diff --git a/Capybara music/洗脑咔咔舞_1770631313.mp3 b/Capybara music/洗脑咔咔舞_1770631313.mp3
new file mode 100644
index 0000000..8c51742
Binary files /dev/null and b/Capybara music/洗脑咔咔舞_1770631313.mp3 differ
diff --git a/Capybara music/温泉发呆曲_1770628235.mp3 b/Capybara music/温泉发呆曲_1770628235.mp3
new file mode 100644
index 0000000..d88634b
Binary files /dev/null and b/Capybara music/温泉发呆曲_1770628235.mp3 differ
diff --git a/Capybara music/温泉发呆曲_1770630396.mp3 b/Capybara music/温泉发呆曲_1770630396.mp3
new file mode 100644
index 0000000..5023065
Binary files /dev/null and b/Capybara music/温泉发呆曲_1770630396.mp3 differ
diff --git a/Capybara music/温泉发呆曲_1770630635.mp3 b/Capybara music/温泉发呆曲_1770630635.mp3
new file mode 100644
index 0000000..a31ecc8
Binary files /dev/null and b/Capybara music/温泉发呆曲_1770630635.mp3 differ
diff --git a/Capybara music/温泉发呆曲_1770639509.mp3 b/Capybara music/温泉发呆曲_1770639509.mp3
new file mode 100644
index 0000000..2f77a83
Binary files /dev/null and b/Capybara music/温泉发呆曲_1770639509.mp3 differ
diff --git a/Capybara music/温泉里的咔咔_1770730481.mp3 b/Capybara music/温泉里的咔咔_1770730481.mp3
new file mode 100644
index 0000000..d07f22a
Binary files /dev/null and b/Capybara music/温泉里的咔咔_1770730481.mp3 differ
diff --git a/Capybara music/草地上的咔咔_1770628910.mp3 b/Capybara music/草地上的咔咔_1770628910.mp3
new file mode 100644
index 0000000..ebbde2c
Binary files /dev/null and b/Capybara music/草地上的咔咔_1770628910.mp3 differ
diff --git a/Capybara music/草地上的咔咔_1770629673.mp3 b/Capybara music/草地上的咔咔_1770629673.mp3
new file mode 100644
index 0000000..44f4ec4
Binary files /dev/null and b/Capybara music/草地上的咔咔_1770629673.mp3 differ
diff --git a/Capybara music/草地上的咔咔_1770640911.mp3 b/Capybara music/草地上的咔咔_1770640911.mp3
new file mode 100644
index 0000000..2cab36e
Binary files /dev/null and b/Capybara music/草地上的咔咔_1770640911.mp3 differ
diff --git a/Capybara music/阳光灿烂的日子在草地上奔跑撒欢心情超级好_1770626906.mp3 b/Capybara music/阳光灿烂的日子在草地上奔跑撒欢心情超级好_1770626906.mp3
new file mode 100644
index 0000000..8f48317
Binary files /dev/null and b/Capybara music/阳光灿烂的日子在草地上奔跑撒欢心情超级好_1770626906.mp3 differ
diff --git a/Capybara music/阳光灿烂的日子在草地上奔跑撒欢心情超级好_1770639287.mp3 b/Capybara music/阳光灿烂的日子在草地上奔跑撒欢心情超级好_1770639287.mp3
new file mode 100644
index 0000000..b16cbba
Binary files /dev/null and b/Capybara music/阳光灿烂的日子在草地上奔跑撒欢心情超级好_1770639287.mp3 differ
diff --git a/Capybara stories/海盗找朋友_1770647563.txt b/Capybara stories/海盗找朋友_1770647563.txt
new file mode 100644
index 0000000..123215a
--- /dev/null
+++ b/Capybara stories/海盗找朋友_1770647563.txt
@@ -0,0 +1,11 @@
+# 海盗找朋友
+
+在蓝色的大海上,有一艘小小的海盗船,船上只有一个小海盗。他戴着歪歪的海盗帽,举着塑料做的小钩子手,每天对着海浪喊:“谁来和我玩呀?”
+
+这天,小海盗的船被海浪冲到了一座彩虹岛。岛上的沙滩上,躺着一个会发光的贝壳。小海盗刚捡起贝壳,贝壳突然“叮咚”响了一声,跳出一只圆滚滚的小海豚!
+
+“哇!你是我的宝藏吗?”小海盗举着贝壳问。小海豚摇摇头,用尾巴拍了拍海水:“我带你去找真正的宝藏!”它驮着小海盗游向海底,那里有一个藏着星星的洞穴。
+
+洞穴里,小海豚拿出了一个会唱歌的海螺:“这是友谊海螺,对着它喊朋友的名字,就会有惊喜哦!”小海盗对着海螺喊:“我的朋友!”突然,从海螺里钻出一群小螃蟹,举着彩色的小旗子,还有一只会吹泡泡的章鱼!
+
+原来,小海豚早就听说小海盗很孤单,特意用友谊海螺召集了伙伴们。现在,小海盗的船上每天都飘着笑声,他再也不是孤单的小海盗啦!
\ No newline at end of file
diff --git a/airhub_app/lib/pages/music_creation_page.dart b/airhub_app/lib/pages/music_creation_page.dart
index 5145f3b..ac90f91 100644
--- a/airhub_app/lib/pages/music_creation_page.dart
+++ b/airhub_app/lib/pages/music_creation_page.dart
@@ -417,9 +417,18 @@ class _MusicCreationPageState extends State
// Actually play or pause audio
try {
if (_isPlaying) {
+ // Show now-playing bubble immediately (before await)
+ _playStickyText = '正在播放: ${_playlist[_currentTrackIndex].title}';
+ setState(() {
+ _speechText = _playStickyText;
+ _speechVisible = true;
+ });
await _audioPlayer.play();
} else {
await _audioPlayer.pause();
+ // Hide bubble on pause
+ _playStickyText = null;
+ setState(() => _speechVisible = false);
}
} catch (e) {
debugPrint('Playback error: $e');
@@ -428,6 +437,7 @@ class _MusicCreationPageState extends State
// Revert UI state on error
setState(() {
_isPlaying = false;
+ _playStickyText = null;
_vinylSpinController.stop();
_tonearmController.reverse();
});
@@ -474,7 +484,8 @@ class _MusicCreationPageState extends State
}
}
- _showSpeech('正在播放: ${_playlist[index].title}');
+ _playStickyText = '正在播放: ${_playlist[index].title}';
+ _showSpeech(_playStickyText!, duration: 0);
}
// ── Mood Selection ──
@@ -646,6 +657,7 @@ class _MusicCreationPageState extends State
// ── Speech Bubble ──
String? _genStickyText; // Persistent text during generation
+ String? _playStickyText; // Persistent text during playback
void _showSpeech(String text, {int duration = 3000}) {
// If this is a generation-related message (duration == 0), save it as sticky
@@ -667,6 +679,12 @@ class _MusicCreationPageState extends State
_speechText = _genStickyText;
_speechVisible = true;
});
+ } else if (_isPlaying && _playStickyText != null) {
+ // If playing, restore the now-playing message
+ setState(() {
+ _speechText = _playStickyText;
+ _speechVisible = true;
+ });
} else {
setState(() => _speechVisible = false);
}
@@ -800,7 +818,9 @@ class _MusicCreationPageState extends State
child: _buildVinylWrapper(),
),
// Speech bubble — positioned top-right
- if (_speechVisible && _speechText != null)
+ // Always show during playback; otherwise use _speechVisible
+ if ((_speechVisible && _speechText != null) ||
+ (_isPlaying && _playStickyText != null))
Positioned(
top: 0,
right: -24, // HTML: right: -24px
@@ -1067,12 +1087,18 @@ class _MusicCreationPageState extends State
Widget _buildSpeechBubble() {
// HTML: .capy-speech-bubble with clip-path iMessage-style tail at bottom-left
const tailH = 8.0;
+ // During playback, always show the playing text even if _speechVisible is false
+ final bool showBubble = _speechVisible || (_isPlaying && _playStickyText != null);
+ final String bubbleText = (_isPlaying && _playStickyText != null && !_speechVisible)
+ ? _playStickyText!
+ : (_speechText ?? '');
+
return AnimatedOpacity(
duration: const Duration(milliseconds: 200),
- opacity: _speechVisible ? 1.0 : 0.0,
+ opacity: showBubble ? 1.0 : 0.0,
child: AnimatedScale(
duration: const Duration(milliseconds: 350),
- scale: _speechVisible ? 1.0 : 0.7,
+ scale: showBubble ? 1.0 : 0.7,
curve: const Cubic(0.34, 1.56, 0.64, 1.0),
alignment: Alignment.bottomLeft,
child: Column(
@@ -1098,7 +1124,7 @@ class _MusicCreationPageState extends State
],
),
child: Text(
- _speechText ?? '',
+ bubbleText,
style: GoogleFonts.dmSans(
fontSize: 12.5,
fontWeight: FontWeight.w500,
@@ -1485,6 +1511,7 @@ class _MusicCreationPageState extends State
builder: (ctx) => _PlaylistModalContent(
tracks: _playlist,
currentIndex: _currentTrackIndex,
+ isPlaying: _isPlaying,
onSelect: (index) {
Navigator.pop(ctx);
_playTrack(index);
@@ -1921,17 +1948,53 @@ class _InputModalContent extends StatelessWidget {
}
/// Playlist Modal — HTML: .playlist-container
-class _PlaylistModalContent extends StatelessWidget {
+class _PlaylistModalContent extends StatefulWidget {
final List<_Track> tracks;
final int currentIndex;
+ final bool isPlaying;
final ValueChanged onSelect;
const _PlaylistModalContent({
required this.tracks,
required this.currentIndex,
+ required this.isPlaying,
required this.onSelect,
});
+ @override
+ State<_PlaylistModalContent> createState() => _PlaylistModalContentState();
+}
+
+class _PlaylistModalContentState extends State<_PlaylistModalContent>
+ with SingleTickerProviderStateMixin {
+ late AnimationController _waveController;
+
+ @override
+ void initState() {
+ super.initState();
+ _waveController = AnimationController(
+ vsync: this,
+ duration: const Duration(milliseconds: 800),
+ );
+ if (widget.isPlaying) _waveController.repeat(reverse: true);
+ }
+
+ @override
+ void didUpdateWidget(covariant _PlaylistModalContent oldWidget) {
+ super.didUpdateWidget(oldWidget);
+ if (widget.isPlaying && !_waveController.isAnimating) {
+ _waveController.repeat(reverse: true);
+ } else if (!widget.isPlaying && _waveController.isAnimating) {
+ _waveController.stop();
+ }
+ }
+
+ @override
+ void dispose() {
+ _waveController.dispose();
+ super.dispose();
+ }
+
@override
Widget build(BuildContext context) {
final screenWidth = MediaQuery.of(context).size.width;
@@ -2015,23 +2078,39 @@ class _PlaylistModalContent extends StatelessWidget {
mainAxisSpacing: 8,
childAspectRatio: 0.75,
),
- itemCount: tracks.length,
+ itemCount: widget.tracks.length,
itemBuilder: (context, index) {
- final track = tracks[index];
- final isPlaying = index == currentIndex;
+ final track = widget.tracks[index];
+ final isCurrent = index == widget.currentIndex;
+ final isPlaying = isCurrent && widget.isPlaying;
// HTML: .record-slot { background: rgba(0,0,0,0.03); border-radius: 12px;
// padding: 10px 4px; border: 1px solid rgba(0,0,0,0.02); }
return GestureDetector(
- onTap: () => onSelect(index),
+ onTap: () => widget.onSelect(index),
child: Container(
padding:
const EdgeInsets.symmetric(horizontal: 4, vertical: 10),
decoration: BoxDecoration(
- color: Colors.black.withOpacity(0.03),
+ // Current track: warm golden background; others: subtle grey
+ color: isCurrent
+ ? const Color(0xFFFDF3E3)
+ : Colors.black.withOpacity(0.03),
borderRadius: BorderRadius.circular(12),
border: Border.all(
- color: Colors.black.withOpacity(0.02)),
+ color: isCurrent
+ ? const Color(0xFFECCFA8).withOpacity(0.6)
+ : Colors.black.withOpacity(0.02),
+ width: isCurrent ? 1.5 : 1.0),
+ boxShadow: isCurrent
+ ? [
+ BoxShadow(
+ color: const Color(0xFFECCFA8).withOpacity(0.25),
+ blurRadius: 8,
+ offset: const Offset(0, 2),
+ ),
+ ]
+ : null,
),
child: Column(
children: [
@@ -2043,10 +2122,8 @@ class _PlaylistModalContent extends StatelessWidget {
decoration: BoxDecoration(
shape: BoxShape.circle,
color: const Color(0xFF18181B),
- // HTML: .record-item.playing .record-cover-wrapper
- // { box-shadow: 0 0 0 2px #ECCFA8, ... }
boxShadow: [
- if (isPlaying)
+ if (isCurrent)
const BoxShadow(
color: Color(0xFFECCFA8),
spreadRadius: 2,
@@ -2096,23 +2173,57 @@ class _PlaylistModalContent extends StatelessWidget {
),
),
),
+ // Sound wave overlay for playing track
+ if (isPlaying)
+ Center(
+ child: AnimatedBuilder(
+ animation: _waveController,
+ builder: (context, child) {
+ return CustomPaint(
+ painter: _MiniWavePainter(
+ progress: _waveController.value,
+ ),
+ size: const Size(28, 20),
+ );
+ },
+ ),
+ ),
],
),
),
),
),
const SizedBox(height: 8),
- // HTML: .record-title { font-size: 12px; font-weight: 500; }
- Text(
- track.title,
- style: GoogleFonts.dmSans(
- fontSize: 12,
- fontWeight: FontWeight.w500,
- color: const Color(0xFF374151),
- ),
- textAlign: TextAlign.center,
- maxLines: 1,
- overflow: TextOverflow.ellipsis,
+ // Title with playing indicator
+ Row(
+ mainAxisAlignment: MainAxisAlignment.center,
+ mainAxisSize: MainAxisSize.min,
+ children: [
+ if (isCurrent)
+ Padding(
+ padding: const EdgeInsets.only(right: 3),
+ child: Icon(
+ isPlaying ? Icons.volume_up_rounded : Icons.volume_off_rounded,
+ size: 12,
+ color: const Color(0xFFECCFA8),
+ ),
+ ),
+ Flexible(
+ child: Text(
+ track.title,
+ style: GoogleFonts.dmSans(
+ fontSize: 12,
+ fontWeight: isCurrent ? FontWeight.w600 : FontWeight.w500,
+ color: isCurrent
+ ? const Color(0xFFB8860B)
+ : const Color(0xFF374151),
+ ),
+ textAlign: TextAlign.center,
+ maxLines: 1,
+ overflow: TextOverflow.ellipsis,
+ ),
+ ),
+ ],
),
],
),
@@ -2127,3 +2238,39 @@ class _PlaylistModalContent extends StatelessWidget {
}
}
+/// Mini sound wave painter for playlist playing indicator
+class _MiniWavePainter extends CustomPainter {
+ final double progress;
+
+ _MiniWavePainter({required this.progress});
+
+ @override
+ void paint(Canvas canvas, Size size) {
+ final paint = Paint()
+ ..color = const Color(0xFFECCFA8)
+ ..strokeWidth = 2.5
+ ..strokeCap = StrokeCap.round;
+
+ const barCount = 4;
+ final barWidth = size.width / (barCount * 2 - 1);
+ final centerY = size.height / 2;
+
+ for (int i = 0; i < barCount; i++) {
+ // Each bar has a different phase offset for wave effect
+ final phase = (progress + i * 0.25) % 1.0;
+ final height = size.height * (0.3 + 0.7 * (0.5 + 0.5 * sin(phase * 3.14159 * 2)));
+ final x = i * barWidth * 2 + barWidth / 2;
+
+ canvas.drawLine(
+ Offset(x, centerY - height / 2),
+ Offset(x, centerY + height / 2),
+ paint,
+ );
+ }
+ }
+
+ @override
+ bool shouldRepaint(covariant _MiniWavePainter oldDelegate) =>
+ oldDelegate.progress != progress;
+}
+
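Reviewer sketch (not part of the patch): the bar-height formula in `_MiniWavePainter` can be checked in isolation as below. The function name `waveBarHeights` and its parameters are hypothetical and exist only for this illustration. Note that the painter calls `sin` unqualified, so it assumes `dart:math` is already imported in `music_creation_page.dart`; this diff does not show that import.

```dart
import 'dart:math' as math;

/// Returns the height of each bar for an animation progress in [0, 1].
/// Mirrors the painter's formula: each bar is shifted by a quarter cycle,
/// and heights stay between 30% and 100% of maxHeight.
List<double> waveBarHeights(double progress, double maxHeight,
    {int barCount = 4}) {
  return List.generate(barCount, (i) {
    final phase = (progress + i * 0.25) % 1.0;
    final level = 0.5 + 0.5 * math.sin(phase * 2 * math.pi); // 0.0 .. 1.0
    return maxHeight * (0.3 + 0.7 * level);
  });
}

void main() {
  // At progress 0 the four bars sit at phases 0, 0.25, 0.5 and 0.75.
  final heights = waveBarHeights(0.0, 20.0);
  print(heights.map((h) => h.toStringAsFixed(1)).toList());
  // prints [13.0, 20.0, 13.0, 6.0]
}
```

Using `math.pi` instead of the literal `3.14159` avoids a small rounding error in the phase, although the visual difference is unlikely to be noticeable.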
diff --git a/airhub_app/lib/pages/story_detail_page.dart b/airhub_app/lib/pages/story_detail_page.dart
index dfe34d4..fa96b33 100644
--- a/airhub_app/lib/pages/story_detail_page.dart
+++ b/airhub_app/lib/pages/story_detail_page.dart
@@ -1,9 +1,12 @@
+import 'dart:async';
import 'dart:ui' as ui;
import 'package:flutter/material.dart';
-import 'package:flutter_svg/flutter_svg.dart';
+import 'package:just_audio/just_audio.dart';
import '../theme/design_tokens.dart';
import '../widgets/gradient_button.dart';
+import '../widgets/pill_progress_button.dart';
+import '../services/tts_service.dart';
import 'story_loading_page.dart';
enum StoryMode { generated, read }
@@ -30,6 +33,14 @@ class _StoryDetailPageState extends State
bool _hasGeneratedVideo = false;
bool _isLoadingVideo = false;
+ // TTS — uses global TTSService singleton
+ final TTSService _ttsService = TTSService.instance;
+ final AudioPlayer _audioPlayer = AudioPlayer();
+ StreamSubscription? _positionSub;
+ StreamSubscription? _playerStateSub;
+ Duration _audioDuration = Duration.zero;
+ Duration _audioPosition = Duration.zero;
+
// Genie Suck Animation
bool _isSaving = false;
AnimationController? _genieController;
@@ -41,9 +52,9 @@ class _StoryDetailPageState extends State
'content': """
在遥远的银河系边缘,有一个被星云包裹的神秘茶馆。今天,这里迎来了两位特殊的客人:刚执行完火星探测任务的宇航员波波,和正在追捕暗影怪兽的忍者小次郎。
-“这儿的重力好像有点不对劲?”波波飘在半空中,试图抓住飞来飞去的茶杯。小次郎则冷静地倒挂在天花板上,手里紧握着一枚手里剑——其实那是用来切月饼的。
+"这儿的重力好像有点不对劲?"波波飘在半空中,试图抓住飞来飞去的茶杯。小次郎则冷静地倒挂在天花板上,手里紧握着一枚手里剑——其实那是用来切月饼的。
-突然,桌上的魔法茶壶“噗”地一声喷出了七彩烟雾,一只会说话的卡皮巴拉钻了出来:“别打架,别打架,喝了这杯‘银河气泡茶’,我们都是好朋友!”
+突然,桌上的魔法茶壶"噗"地一声喷出了七彩烟雾,一只会说话的卡皮巴拉钻了出来:"别打架,别打架,喝了这杯'银河气泡茶',我们都是好朋友!"
于是,宇宙中最奇怪的组合诞生了。他们决定,下一站,去黑洞边缘钓星星。
""",
@@ -54,7 +65,6 @@ class _StoryDetailPageState extends State
Map _initStory() {
final source = widget.story ?? _defaultStory;
final result = Map.from(source);
- // 兜底:如果没有 content 就用默认故事内容
result['content'] ??= _defaultStory['content'];
result['title'] ??= _defaultStory['title'];
return result;
@@ -64,18 +74,171 @@ class _StoryDetailPageState extends State
void initState() {
super.initState();
_currentStory = _initStory();
+
+ // Subscribe to TTSService changes
+ _ttsService.addListener(_onTTSChanged);
+
+ // Listen to audio player state
+ _playerStateSub = _audioPlayer.playerStateStream.listen((state) {
+ if (!mounted) return;
+ if (state.processingState == ProcessingState.completed) {
+ setState(() {
+ _isPlaying = false;
+ _audioPosition = Duration.zero;
+ });
+ }
+ });
+
+ // Listen to playback position for ring progress
+ _positionSub = _audioPlayer.positionStream.listen((pos) {
+ if (!mounted) return;
+ setState(() => _audioPosition = pos);
+ });
+
+ // Listen to duration changes
+ _audioPlayer.durationStream.listen((dur) {
+ if (!mounted || dur == null) return;
+ setState(() => _audioDuration = dur);
+ });
+
+ // Check if audio already exists (via TTSService)
+ final title = _currentStory['title'] as String? ?? '';
+ _ttsService.checkExistingAudio(title);
+ }
+
+ void _onTTSChanged() {
+ if (!mounted) return;
+
+ // Auto-play when generation completes
+ if (_ttsService.justCompleted &&
+ _ttsService.hasAudioFor(_currentStory['title'] ?? '')) {
+ // Delay slightly to let the completion flash play
+ Future.delayed(const Duration(milliseconds: 1500), () {
+ if (mounted) {
+ _ttsService.clearJustCompleted();
+ final route = ModalRoute.of(context);
+ if (route != null && route.isCurrent) {
+ _playAudio();
+ }
+ }
+ });
+ }
+
+ setState(() {});
}
@override
void dispose() {
+ _ttsService.removeListener(_onTTSChanged);
+ _positionSub?.cancel();
+ _playerStateSub?.cancel();
+ _audioPlayer.dispose();
_genieController?.dispose();
super.dispose();
}
- /// Trigger Genie Suck animation matching HTML:
- /// CSS: animation: genieSuck 0.8s cubic-bezier(0.6, -0.28, 0.735, 0.045) forwards
- /// Phase 1 (0→15%): card scales up to 1.05 (tension)
- /// Phase 2 (15%→100%): card shrinks to 0.05, moves toward bottom, blurs & fades
+ // ── TTS button logic ──
+
+ bool _audioLoaded = false; // Track if audio URL is loaded in player
+ String? _loadedUrl; // Which URL is currently loaded
+
+ TTSButtonState get _ttsState {
+ final title = _currentStory['title'] as String? ?? '';
+
+ if (_ttsService.error != null &&
+ !_ttsService.isGenerating &&
+ _ttsService.audioUrl == null) {
+ return TTSButtonState.error;
+ }
+ if (_ttsService.isGeneratingFor(title)) {
+ return TTSButtonState.generating;
+ }
+ if (_ttsService.justCompleted && _ttsService.hasAudioFor(title)) {
+ return TTSButtonState.completed;
+ }
+ if (_isPlaying) {
+ return TTSButtonState.playing;
+ }
+ if (_ttsService.hasAudioFor(title) && !_audioLoaded) {
+ return TTSButtonState.ready; // audio ready, not yet played -> show "播放" (Play)
+ }
+ if (_audioLoaded) {
+ return TTSButtonState.paused; // was playing, now paused -> show "继续" (Resume)
+ }
+ return TTSButtonState.idle;
+ }
+
+ double get _ttsProgress {
+ final state = _ttsState;
+ switch (state) {
+ case TTSButtonState.generating:
+ return _ttsService.progress;
+ case TTSButtonState.ready:
+ return 0.0;
+ case TTSButtonState.completed:
+ return 1.0;
+ case TTSButtonState.playing:
+ case TTSButtonState.paused:
+ if (_audioDuration.inMilliseconds > 0) {
+ return (_audioPosition.inMilliseconds / _audioDuration.inMilliseconds)
+ .clamp(0.0, 1.0);
+ }
+ return 0.0;
+ default:
+ return 0.0;
+ }
+ }
+
+ void _handleTTSTap() {
+ final state = _ttsState;
+ switch (state) {
+ case TTSButtonState.idle:
+ case TTSButtonState.error:
+ final title = _currentStory['title'] as String? ?? '';
+ final content = _currentStory['content'] as String? ?? '';
+ _ttsService.generate(title: title, content: content);
+ break;
+ case TTSButtonState.generating:
+ break;
+ case TTSButtonState.ready:
+ case TTSButtonState.completed:
+ case TTSButtonState.paused:
+ _playAudio();
+ break;
+ case TTSButtonState.playing:
+ _audioPlayer.pause();
+ setState(() => _isPlaying = false);
+ break;
+ }
+ }
+
+ Future<void> _playAudio() async {
+ final title = _currentStory['title'] as String? ?? '';
+ final url = _ttsService.hasAudioFor(title) ? _ttsService.audioUrl : null;
+ if (url == null) return;
+
+ try {
+ // If already loaded the same URL, seek to saved position and resume
+ if (_audioLoaded && _loadedUrl == url) {
+ await _audioPlayer.seek(_audioPosition);
+ _audioPlayer.play();
+ } else {
+ // Load new URL and play from start
+ await _audioPlayer.setUrl(url);
+ _audioLoaded = true;
+ _loadedUrl = url;
+ _audioPlayer.play();
+ }
+ if (mounted) {
+ setState(() => _isPlaying = true);
+ }
+ } catch (e) {
+ debugPrint('Audio play error: $e');
+ }
+ }
+
+ // ── Genie Suck Animation ──
+
void _triggerGenieSuck() {
if (_isSaving) return;
@@ -84,7 +247,6 @@ class _StoryDetailPageState extends State
duration: const Duration(milliseconds: 800),
);
- // Calculate how far the card should travel downward (toward the save button)
final screenHeight = MediaQuery.of(context).size.height;
_targetDY = screenHeight * 0.35;
@@ -94,23 +256,20 @@ class _StoryDetailPageState extends State
}
});
- setState(() {
- _isSaving = true;
- });
+ setState(() => _isSaving = true);
_genieController!.forward();
}
+ // ── Build ──
+
@override
Widget build(BuildContext context) {
return Scaffold(
- backgroundColor: AppColors.storyBackground, // #FDF9F3
+ backgroundColor: AppColors.storyBackground,
body: SafeArea(
child: Column(
children: [
- // Header + Content Card — animated together during genie suck
Expanded(child: _buildAnimatedBody()),
-
- // Footer
_buildFooter(),
],
),
@@ -118,7 +277,6 @@ class _StoryDetailPageState extends State
);
}
- /// Wraps header + content card in genie suck animation
Widget _buildAnimatedBody() {
Widget body = Column(
children: [
@@ -132,7 +290,7 @@ class _StoryDetailPageState extends State
return AnimatedBuilder(
animation: _genieController!,
builder: (context, child) {
- final t = _genieController!.value; // linear 0→1
+ final t = _genieController!.value;
double scale;
double translateY;
@@ -140,14 +298,12 @@ class _StoryDetailPageState extends State
double blur;
if (t <= 0.15) {
- // Phase 1: tension — whole area scales up slightly
final p = t / 0.15;
scale = 1.0 + 0.05 * Curves.easeOut.transform(p);
translateY = 0;
opacity = 1.0;
blur = 0;
} else {
- // Phase 2: suck — shrinks, moves down, fades and blurs
final p = ((t - 0.15) / 0.85).clamp(0.0, 1.0);
final curved =
const Cubic(0.6, -0.28, 0.735, 0.045).transform(p);
@@ -209,7 +365,7 @@ class _StoryDetailPageState extends State
),
),
Text(
- _currentStory['title'],
+ _currentStory['title'] ?? '',
style: const TextStyle(
fontSize: 17,
fontWeight: FontWeight.w600,
@@ -227,9 +383,9 @@ class _StoryDetailPageState extends State
child: Row(
mainAxisAlignment: MainAxisAlignment.center,
children: [
- _buildTabBtn('📄 故事', 'text'),
+ _buildTabBtn('故事', 'text'),
const SizedBox(width: 8),
- _buildTabBtn('🎬 绘本', 'video'),
+ _buildTabBtn('绘本', 'video'),
],
),
);
@@ -238,11 +394,7 @@ class _StoryDetailPageState extends State
Widget _buildTabBtn(String label, String key) {
bool isActive = _activeTab == key;
return GestureDetector(
- onTap: () {
- setState(() {
- _activeTab = key;
- });
- },
+ onTap: () => setState(() => _activeTab = key),
child: Container(
padding: const EdgeInsets.symmetric(horizontal: 16, vertical: 8),
decoration: BoxDecoration(
@@ -271,7 +423,6 @@ class _StoryDetailPageState extends State
}
Widget _buildContentCard() {
- // HTML: .story-paper
bool isVideoMode = _activeTab == 'video';
return Container(
@@ -292,11 +443,11 @@ class _StoryDetailPageState extends State
_currentStory['content']
.toString()
.replaceAll(RegExp(r'\n+'), '\n\n')
- .trim(), // Simple paragraph spacing
+ .trim(),
style: const TextStyle(
- fontSize: 16, // HTML: 16px
- height: 2.0, // HTML: line-height 2.0
- color: AppColors.storyText, // #374151
+ fontSize: 16,
+ height: 2.0,
+ color: AppColors.storyText,
),
textAlign: TextAlign.justify,
),
@@ -313,7 +464,7 @@ class _StoryDetailPageState extends State
width: 40,
height: 40,
child: CircularProgressIndicator(
- color: Color(0xFFF43F5E), // HTML: #F43F5E
+ color: Color(0xFFF43F5E),
strokeWidth: 3,
),
),
@@ -339,15 +490,14 @@ class _StoryDetailPageState extends State
alignment: Alignment.center,
children: [
AspectRatio(
- aspectRatio: 16 / 9, // Assume landscape video
+ aspectRatio: 16 / 9,
child: Container(
color: Colors.black,
child: const Center(
child: Icon(Icons.videocam, color: Colors.white54, size: 48),
- ), // Placeholder for Video Player
+ ),
),
),
- // Play Button Overlay
Container(
width: 48,
height: 48,
@@ -372,7 +522,6 @@ class _StoryDetailPageState extends State
child: _activeTab == 'text' ? _buildTextFooter() : _buildVideoFooter(),
);
- // Fade out footer during genie suck animation
if (_isSaving) {
return IgnorePointer(
child: AnimatedOpacity(
@@ -387,12 +536,9 @@ class _StoryDetailPageState extends State
}
void _handleRewrite() async {
- // 跳到 loading 页重新生成
final result = await Navigator.of(context).push(
MaterialPageRoute(builder: (context) => const StoryLoadingPage()),
);
-
- // loading 完成后返回结果
if (mounted && result == 'saved') {
Navigator.of(context).pop('saved');
}
@@ -403,7 +549,6 @@ class _StoryDetailPageState extends State
// Generator Mode: Rewrite + Save
return Row(
children: [
- // Rewrite (Secondary)
Expanded(
child: GestureDetector(
onTap: _handleRewrite,
@@ -415,19 +560,25 @@ class _StoryDetailPageState extends State
color: Colors.white.withOpacity(0.8),
),
alignment: Alignment.center,
- child: const Text(
- '↻ 重写',
- style: TextStyle(
- fontSize: 16,
- fontWeight: FontWeight.w600,
- color: Color(0xFF4B5563),
- ),
+ child: const Row(
+ mainAxisAlignment: MainAxisAlignment.center,
+ children: [
+ Icon(Icons.refresh_rounded, size: 18, color: Color(0xFF4B5563)),
+ SizedBox(width: 4),
+ Text(
+ '重写',
+ style: TextStyle(
+ fontSize: 16,
+ fontWeight: FontWeight.w600,
+ color: Color(0xFF4B5563),
+ ),
+ ),
+ ],
),
),
),
),
const SizedBox(width: 16),
- // Save (Primary) - Returns 'saved' to trigger add book animation
Expanded(
child: GradientButton(
text: '保存故事',
@@ -441,41 +592,14 @@ class _StoryDetailPageState extends State
],
);
} else {
- // Read Mode: TTS + Make Picture Book
+ // Read Mode: TTS pill button + Make Picture Book
return Row(
children: [
- // TTS
Expanded(
- child: GestureDetector(
- onTap: () => setState(() => _isPlaying = !_isPlaying),
- child: Container(
- height: 48,
- decoration: BoxDecoration(
- border: Border.all(color: const Color(0xFFE5E7EB)),
- borderRadius: BorderRadius.circular(24),
- color: Colors.white.withOpacity(0.8),
- ),
- alignment: Alignment.center,
- child: Row(
- mainAxisAlignment: MainAxisAlignment.center,
- children: [
- Icon(
- _isPlaying ? Icons.pause : Icons.headphones,
- size: 20,
- color: const Color(0xFF4B5563),
- ),
- const SizedBox(width: 6),
- Text(
- _isPlaying ? '暂停' : '朗读',
- style: const TextStyle(
- fontSize: 16,
- fontWeight: FontWeight.w600,
- color: Color(0xFF4B5563),
- ),
- ),
- ],
- ),
- ),
+ child: PillProgressButton(
+ state: _ttsState,
+ progress: _ttsProgress,
+ onTap: _handleTTSTap,
),
),
const SizedBox(width: 16),
@@ -500,7 +624,7 @@ class _StoryDetailPageState extends State
children: [
Expanded(
child: GradientButton(
- text: '↻ 重新生成',
+ text: '重新生成',
onPressed: _startVideoGeneration,
gradient: const LinearGradient(
colors: AppColors.btnCapybaraGradient,
@@ -517,7 +641,6 @@ class _StoryDetailPageState extends State
_isLoadingVideo = true;
_activeTab = 'video';
});
- // Mock delay
Future.delayed(const Duration(seconds: 2), () {
if (mounted) {
setState(() {
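Reviewer sketch (not part of the patch): the `_ttsState` getter in `story_detail_page.dart` applies a fixed priority order to several flags. A pure function with the same order, as below, would make that order unit-testable without a widget. The names `TtsState` and `resolveTtsState` and all parameter names are hypothetical; only the order of the checks is taken from the getter.

```dart
enum TtsState { idle, ready, generating, completed, playing, paused, error }

/// Same priority order as the `_ttsState` getter:
/// error > generating > completed > playing > ready > paused > idle.
TtsState resolveTtsState({
  bool hasError = false,
  bool isGenerating = false, // service is busy with any story
  bool generatingThis = false, // service is busy with this story
  bool justCompleted = false,
  bool hasAudio = false, // audio exists for this story
  bool hasAnyAudioUrl = false, // service holds some audio URL
  bool isPlaying = false,
  bool audioLoaded = false, // URL already loaded into the player
}) {
  if (hasError && !isGenerating && !hasAnyAudioUrl) return TtsState.error;
  if (generatingThis) return TtsState.generating;
  if (justCompleted && hasAudio) return TtsState.completed;
  if (isPlaying) return TtsState.playing;
  if (hasAudio && !audioLoaded) return TtsState.ready;
  if (audioLoaded) return TtsState.paused;
  return TtsState.idle;
}

void main() {
  print(resolveTtsState()); // TtsState.idle
  print(resolveTtsState(hasAudio: true, hasAnyAudioUrl: true)); // TtsState.ready
  print(resolveTtsState(
      hasAudio: true, hasAnyAudioUrl: true, audioLoaded: true)); // TtsState.paused
}
```

One consequence of this order is that `paused` is reported whenever audio has been loaded, including after playback has run to completion, because the getter does not distinguish the two cases.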
diff --git a/airhub_app/lib/services/tts_service.dart b/airhub_app/lib/services/tts_service.dart
new file mode 100644
index 0000000..5c6458a
--- /dev/null
+++ b/airhub_app/lib/services/tts_service.dart
@@ -0,0 +1,190 @@
+import 'dart:convert';
+import 'package:flutter/foundation.dart';
+import 'package:http/http.dart' as http;
+
+/// Singleton service that manages TTS generation in the background.
+/// Survives page navigation — when user leaves and comes back,
+/// generation continues and result is available.
+class TTSService extends ChangeNotifier {
+ TTSService._();
+ static final TTSService instance = TTSService._();
+
+ static const String _kServerBase = 'http://localhost:3000';
+
+ // ── Current task state ──
+ bool _isGenerating = false;
+ double _progress = 0.0; // 0.0 ~ 1.0
+ String _statusMessage = '';
+ String? _currentStoryTitle; // Which story is being generated
+
+ // ── Result ──
+ String? _audioUrl;
+ String? _completedStoryTitle; // Which story the audio belongs to
+ bool _justCompleted = false; // Flash animation trigger
+
+ // ── Error ──
+ String? _error;
+
+ // ── Getters ──
+ bool get isGenerating => _isGenerating;
+ double get progress => _progress;
+ String get statusMessage => _statusMessage;
+ String? get currentStoryTitle => _currentStoryTitle;
+ String? get audioUrl => _audioUrl;
+ String? get completedStoryTitle => _completedStoryTitle;
+ bool get justCompleted => _justCompleted;
+ String? get error => _error;
+
+ /// Check if audio is ready for a specific story.
+ bool hasAudioFor(String title) {
+ return _completedStoryTitle == title && _audioUrl != null;
+ }
+
+ /// Check if currently generating for a specific story.
+ bool isGeneratingFor(String title) {
+ return _isGenerating && _currentStoryTitle == title;
+ }
+
+ /// Clear the "just completed" flag (after flash animation plays).
+ void clearJustCompleted() {
+ _justCompleted = false;
+ notifyListeners();
+ }
+
+ /// Set audio URL directly (e.g. from pre-check).
+ void setExistingAudio(String title, String url) {
+ _completedStoryTitle = title;
+ _audioUrl = url;
+ _justCompleted = false;
+ notifyListeners();
+ }
+
+ /// Check server for existing audio file.
+ Future<void> checkExistingAudio(String title) async {
+ if (title.isEmpty) return;
+ try {
+ final resp = await http.get(
+ Uri.parse(
+ '$_kServerBase/api/tts_check?title=${Uri.encodeComponent(title)}',
+ ),
+ );
+ if (resp.statusCode == 200) {
+ final data = jsonDecode(resp.body);
+ if (data['exists'] == true && data['audio_url'] != null) {
+ _completedStoryTitle = title;
+ _audioUrl = '$_kServerBase/${data['audio_url']}';
+ notifyListeners();
+ }
+ }
+ } catch (_) {}
+ }
+
+ /// Start TTS generation. Safe to call even if page navigates away.
+ Future<void> generate({
+ required String title,
+ required String content,
+ }) async {
+ if (_isGenerating) return;
+
+ _isGenerating = true;
+ _progress = 0.0;
+ _statusMessage = '正在连接...';
+ _currentStoryTitle = title;
+ _audioUrl = null;
+ _completedStoryTitle = null;
+ _justCompleted = false;
+ _error = null;
+ notifyListeners();
+
+ try {
+ final client = http.Client();
+ final request = http.Request(
+ 'POST',
+ Uri.parse('$_kServerBase/api/create_tts'),
+ );
+ request.headers['Content-Type'] = 'application/json';
+ request.body = jsonEncode({'title': title, 'content': content});
+
+ final streamed = await client.send(request);
+
+ await for (final chunk in streamed.stream.transform(utf8.decoder)) {
+ for (final line in chunk.split('\n')) {
+ if (!line.startsWith('data: ')) continue;
+ try {
+ final data = jsonDecode(line.substring(6));
+ final stage = data['stage'] as String? ?? '';
+ final message = data['message'] as String? ?? '';
+
+ switch (stage) {
+ case 'connecting':
+ _updateProgress(0.10, '正在连接...');
+ break;
+ case 'generating':
+ _updateProgress(0.30, '语音生成中...');
+ break;
+ case 'saving':
+ _updateProgress(0.88, '正在保存...');
+ break;
+ case 'done':
+ if (data['audio_url'] != null) {
+ _audioUrl = '$_kServerBase/${data['audio_url']}';
+ _completedStoryTitle = title;
+ _justCompleted = true;
+ _updateProgress(1.0, '生成完成');
+ }
+ break;
+ case 'error':
+ throw Exception(message);
+ default:
+ // Progress slowly increases during generation
+ if (_progress < 0.85) {
+ _updateProgress(_progress + 0.02, message);
+ }
+ }
+ } catch (e) {
+ if (e is Exception &&
+ e.toString().contains('语音合成失败')) {
+ rethrow;
+ }
+ }
+ }
+ }
+
+ client.close();
+
+ _isGenerating = false;
+ if (_audioUrl == null) {
+ _error = '未获取到音频';
+ _statusMessage = '生成失败';
+ }
+ notifyListeners();
+ } catch (e) {
+ debugPrint('TTS generation error: $e');
+ _isGenerating = false;
+ _progress = 0.0;
+ _error = e.toString();
+ _statusMessage = '生成失败';
+ _justCompleted = false;
+ notifyListeners();
+ }
+ }
+
+ void _updateProgress(double progress, String message) {
+ _progress = progress.clamp(0.0, 1.0);
+ _statusMessage = message;
+ notifyListeners();
+ }
+
+ /// Reset all state (e.g. when switching stories).
+ void reset() {
+ if (_isGenerating) return; // Don't reset during generation
+ _progress = 0.0;
+ _statusMessage = '';
+ _currentStoryTitle = null;
+ _audioUrl = null;
+ _completedStoryTitle = null;
+ _justCompleted = false;
+ _error = null;
+ notifyListeners();
+ }
+}
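Reviewer sketch (not part of the patch): `generate` splits each decoded network chunk on `\n`, so a `data: ` line that happens to straddle two chunks would fail `jsonDecode` and be silently skipped by the inner `catch`. A possible alternative, assuming the server emits newline-terminated `data: {json}` lines, is to let `LineSplitter` buffer partial lines. The function name `parseSseEvents` and the sample payloads are hypothetical.

```dart
import 'dart:async';
import 'dart:convert';

/// Decodes an SSE-style byte stream into JSON payloads, one per `data: ` line.
/// LineSplitter keeps partial lines buffered, so an event split across two
/// network chunks is still parsed as a single line.
Stream<Map<String, dynamic>> parseSseEvents(Stream<List<int>> bytes) async* {
  final lines = utf8.decoder.bind(bytes).transform(const LineSplitter());
  await for (final line in lines) {
    if (!line.startsWith('data: ')) continue;
    final decoded = jsonDecode(line.substring(6));
    if (decoded is Map<String, dynamic>) yield decoded;
  }
}

Future<void> main() async {
  // Hypothetical payload, deliberately split in the middle of a JSON object.
  final chunks = <List<int>>[
    utf8.encode('data: {"stage":"gener'),
    utf8.encode('ating"}\n\ndata: {"stage":"done","audio_url":"a.mp3"}\n'),
  ];
  final events = await parseSseEvents(Stream.fromIterable(chunks)).toList();
  print(events.map((e) => e['stage']).toList()); // prints [generating, done]
}
```

In the service, `streamed.stream` could be passed to such a helper in place of the manual `chunk.split('\n')` loop; malformed JSON would then surface as a `FormatException` instead of being dropped, which the caller may or may not want.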
diff --git a/airhub_app/lib/widgets/pill_progress_button.dart b/airhub_app/lib/widgets/pill_progress_button.dart
new file mode 100644
index 0000000..f76b51c
--- /dev/null
+++ b/airhub_app/lib/widgets/pill_progress_button.dart
@@ -0,0 +1,335 @@
+import 'dart:math' as math;
+import 'package:flutter/material.dart';
+
+enum TTSButtonState {
+ idle,
+ ready,
+ generating,
+ completed,
+ playing,
+ paused,
+ error,
+}
+
+class PillProgressButton extends StatefulWidget {
+ final TTSButtonState state;
+ final double progress;
+ final VoidCallback? onTap;
+ final double height;
+
+ const PillProgressButton({
+ super.key,
+ required this.state,
+ this.progress = 0.0,
+ this.onTap,
+ this.height = 48,
+ });
+
+ @override
+ State<PillProgressButton> createState() => _PillProgressButtonState();
+}
+
+class _PillProgressButtonState extends State<PillProgressButton>
+ with TickerProviderStateMixin {
+ late AnimationController _progressCtrl;
+ double _displayProgress = 0.0;
+
+ late AnimationController _glowCtrl;
+ late Animation<double> _glowAnim;
+
+ late AnimationController _waveCtrl;
+
+ bool _wasCompleted = false;
+
+ @override
+ void initState() {
+ super.initState();
+
+ _progressCtrl = AnimationController(
+ vsync: this,
+ duration: const Duration(milliseconds: 500),
+ );
+ _progressCtrl.addListener(() => setState(() {}));
+
+ _glowCtrl = AnimationController(
+ vsync: this,
+ duration: const Duration(milliseconds: 1000),
+ );
+ _glowAnim = TweenSequence<double>([
+ TweenSequenceItem(tween: Tween(begin: 0.0, end: 1.0), weight: 35),
+ TweenSequenceItem(tween: Tween(begin: 1.0, end: 0.0), weight: 65),
+ ]).animate(CurvedAnimation(parent: _glowCtrl, curve: Curves.easeOut));
+ _glowCtrl.addListener(() => setState(() {}));
+
+ _waveCtrl = AnimationController(
+ vsync: this,
+ duration: const Duration(milliseconds: 800),
+ );
+
+ _syncAnimations();
+ }
+
+ @override
+ void didUpdateWidget(PillProgressButton oldWidget) {
+ super.didUpdateWidget(oldWidget);
+
+ if (widget.progress != oldWidget.progress) {
+ if (oldWidget.state == TTSButtonState.completed &&
+ (widget.state == TTSButtonState.playing || widget.state == TTSButtonState.ready)) {
+ _displayProgress = 0.0;
+ } else {
+ _animateProgressTo(widget.progress);
+ }
+ }
+
+ if (widget.state == TTSButtonState.completed && !_wasCompleted) {
+ _wasCompleted = true;
+ _glowCtrl.forward(from: 0);
+ } else if (widget.state != TTSButtonState.completed) {
+ _wasCompleted = false;
+ }
+
+ _syncAnimations();
+ }
+
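+ VoidCallback? _progressListener;
+
+ void _animateProgressTo(double target) {
+ final from = _displayProgress;
+ // Drop the listener from the previous call so listeners do not pile up.
+ final previous = _progressListener;
+ if (previous != null) _progressCtrl.removeListener(previous);
+ void update() {
+ final t = Curves.easeInOut.transform(_progressCtrl.value);
+ _displayProgress = from + (target - from) * t;
+ }
+ _progressListener = update;
+ _progressCtrl.reset();
+ _progressCtrl.addListener(update);
+ _progressCtrl.forward();
+ }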
+
+ void _syncAnimations() {
+ if (widget.state == TTSButtonState.generating) {
+ if (!_waveCtrl.isAnimating) _waveCtrl.repeat();
+ } else {
+ if (_waveCtrl.isAnimating) {
+ _waveCtrl.stop();
+ _waveCtrl.value = 0;
+ }
+ }
+ }
+
+ @override
+ void dispose() {
+ _progressCtrl.dispose();
+ _glowCtrl.dispose();
+ _waveCtrl.dispose();
+ super.dispose();
+ }
+
+ bool get _showBorder =>
+ widget.state == TTSButtonState.generating ||
+ widget.state == TTSButtonState.completed ||
+ widget.state == TTSButtonState.playing ||
+ widget.state == TTSButtonState.paused;
+
+ @override
+ Widget build(BuildContext context) {
+ const borderColor = Color(0xFFE5E7EB);
+ const progressColor = Color(0xFFECCFA8);
+ const bgColor = Color(0xCCFFFFFF);
+
+ return GestureDetector(
+ onTap: widget.state == TTSButtonState.generating ? null : widget.onTap,
+ child: Container(
+ height: widget.height,
+ decoration: BoxDecoration(
+ borderRadius: BorderRadius.circular(widget.height / 2),
+ boxShadow: _glowAnim.value > 0
+ ? [
+ BoxShadow(
+ color: progressColor.withOpacity(0.5 * _glowAnim.value),
+ blurRadius: 16 * _glowAnim.value,
+ spreadRadius: 2 * _glowAnim.value,
+ ),
+ ]
+ : null,
+ ),
+ child: CustomPaint(
+ painter: PillBorderPainter(
+ progress: _showBorder ? _displayProgress.clamp(0.0, 1.0) : 0.0,
+ borderColor: borderColor,
+ progressColor: progressColor,
+ radius: widget.height / 2,
+ stroke: _showBorder ? 2.5 : 1.0,
+ bg: bgColor,
+ ),
+ child: Center(child: _buildContent()),
+ ),
+ ),
+ );
+ }
+
+ Widget _buildContent() {
+ switch (widget.state) {
+ case TTSButtonState.idle:
+ return _label(Icons.headphones_rounded, '\u6717\u8bfb');
+ case TTSButtonState.generating:
+ return Row(
+ mainAxisAlignment: MainAxisAlignment.center,
+ children: [
+ AnimatedBuilder(
+ animation: _waveCtrl,
+ builder: (context, _) => CustomPaint(
+ size: const Size(20, 18),
+ painter: WavePainter(t: _waveCtrl.value, color: const Color(0xFFC99672)),
+ ),
+ ),
+ const SizedBox(width: 6),
+ const Text('\u751f\u6210\u4e2d',
+ style: TextStyle(fontSize: 15, fontWeight: FontWeight.w600, color: Color(0xFF4B5563))),
+ ],
+ );
+ case TTSButtonState.ready:
+ return _label(Icons.play_arrow_rounded, '\u64ad\u653e');
+ case TTSButtonState.completed:
+ return _label(Icons.play_arrow_rounded, '\u64ad\u653e');
+ case TTSButtonState.playing:
+ return _label(Icons.pause_rounded, '\u6682\u505c');
+ case TTSButtonState.paused:
+ return _label(Icons.play_arrow_rounded, '\u7ee7\u7eed');
+ case TTSButtonState.error:
+ return _label(Icons.refresh_rounded, '\u91cd\u8bd5', isError: true);
+ }
+ }
+
+ Widget _label(IconData icon, String text, {bool isError = false}) {
+ final c = isError ? const Color(0xFFEF4444) : const Color(0xFF4B5563);
+ return Row(
+ mainAxisAlignment: MainAxisAlignment.center,
+ mainAxisSize: MainAxisSize.min,
+ children: [
+ Icon(icon, size: 20, color: c),
+ const SizedBox(width: 4),
+ Text(text, style: TextStyle(fontSize: 16, fontWeight: FontWeight.w600, color: c)),
+ ],
+ );
+ }
+}
+
+class PillBorderPainter extends CustomPainter {
+ final double progress;
+ final Color borderColor;
+ final Color progressColor;
+ final double radius;
+ final double stroke;
+ final Color bg;
+
+ PillBorderPainter({
+ required this.progress,
+ required this.borderColor,
+ required this.progressColor,
+ required this.radius,
+ required this.stroke,
+ required this.bg,
+ });
+
+ @override
+ void paint(Canvas canvas, Size size) {
+ final r = radius.clamp(0.0, size.height / 2);
+ final rrect = RRect.fromRectAndRadius(
+ Rect.fromLTWH(0, 0, size.width, size.height),
+ Radius.circular(r),
+ );
+
+ canvas.drawRRect(rrect, Paint()
+ ..color = bg
+ ..style = PaintingStyle.fill);
+ canvas.drawRRect(rrect, Paint()
+ ..color = borderColor
+ ..style = PaintingStyle.stroke
+ ..strokeWidth = stroke);
+
+ if (progress <= 0.001) return;
+
+ final straightH = size.width - 2 * r;
+ final halfTop = straightH / 2;
+ final arcLen = math.pi * r;
+ final totalLen = halfTop + arcLen + straightH + arcLen + halfTop;
+ final target = totalLen * progress;
+
+ final path = Path();
+ double done = 0;
+ final cx = size.width / 2;
+
+ path.moveTo(cx, 0);
+ var seg = math.min(halfTop, target - done);
+ path.lineTo(cx + seg, 0);
+ done += seg;
+ if (done >= target) { _drawPath(canvas, path); return; }
+
+ seg = math.min(arcLen, target - done);
+ _traceArc(path, size.width - r, r, r, -math.pi / 2, seg / r);
+ done += seg;
+ if (done >= target) { _drawPath(canvas, path); return; }
+
+ seg = math.min(straightH, target - done);
+ path.lineTo(size.width - r - seg, size.height);
+ done += seg;
+ if (done >= target) { _drawPath(canvas, path); return; }
+
+ seg = math.min(arcLen, target - done);
+ _traceArc(path, r, r, r, math.pi / 2, seg / r);
+ done += seg;
+ if (done >= target) { _drawPath(canvas, path); return; }
+
+ seg = math.min(halfTop, target - done);
+ path.lineTo(r + seg, 0);
+ _drawPath(canvas, path);
+ }
+
+ void _drawPath(Canvas canvas, Path path) {
+ canvas.drawPath(path, Paint()
+ ..color = progressColor
+ ..style = PaintingStyle.stroke
+ ..strokeWidth = stroke
+ ..strokeCap = StrokeCap.round);
+ }
+
+ void _traceArc(Path p, double cx, double cy, double r, double start, double sweep) {
+ const n = 24;
+ final step = sweep / n;
+ for (int i = 0; i <= n; i++) {
+ final a = start + step * i;
+ p.lineTo(cx + r * math.cos(a), cy + r * math.sin(a));
+ }
+ }
+
+ @override
+ bool shouldRepaint(PillBorderPainter old) => old.progress != progress || old.stroke != stroke;
+}
+
+class WavePainter extends CustomPainter {
+ final double t;
+ final Color color;
+ WavePainter({required this.t, required this.color});
+
+ @override
+ void paint(Canvas canvas, Size size) {
+ final paint = Paint()
+ ..color = color
+ ..style = PaintingStyle.fill;
+ final bw = size.width * 0.2;
+ final gap = size.width * 0.1;
+ final tw = 3 * bw + 2 * gap;
+ final sx = (size.width - tw) / 2;
+ for (int i = 0; i < 3; i++) {
+ final phase = t * 2 * math.pi + i * math.pi * 0.7;
+ final hr = 0.3 + 0.7 * ((math.sin(phase) + 1) / 2);
+ final bh = size.height * hr;
+ final x = sx + i * (bw + gap);
+ final y = (size.height - bh) / 2;
+ canvas.drawRRect(
+ RRect.fromRectAndRadius(Rect.fromLTWH(x, y, bw, bh), Radius.circular(bw / 2)),
+ paint,
+ );
+ }
+ }
+
+ @override
+ bool shouldRepaint(WavePainter old) => old.t != t;
+}
\ No newline at end of file
diff --git a/prompts/music_director.md b/prompts/music_director.md
index d06828f..6c63a22 100644
--- a/prompts/music_director.md
+++ b/prompts/music_director.md
@@ -24,8 +24,7 @@
1. **song_title** (歌曲名称)
- 使用**中文**,简短有趣,3-8个字。
- - 体现咔咔的可爱风格。
- - 示例:"温泉咔咔乐"、"草地蹦蹦跳"、"雨夜安眠曲"
+ - 根据用户描述的场景自由发挥,不要套用固定模板。
2. **style** (风格描述)
- 使用**英文**描述音乐风格、乐器、节奏、情绪。
diff --git a/server.py b/server.py
index 0280680..45b7489 100644
--- a/server.py
+++ b/server.py
@@ -2,10 +2,14 @@ import os
import re
import sys
import time
+import uuid
+import struct
+import asyncio
import uvicorn
import requests
import json
-from fastapi import FastAPI, HTTPException
+import websockets
+from fastapi import FastAPI, HTTPException, Query
from fastapi.responses import StreamingResponse
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
@@ -20,11 +24,15 @@ if sys.platform == "win32":
load_dotenv()
MINIMAX_API_KEY = os.getenv("MINIMAX_API_KEY")
VOLCENGINE_API_KEY = os.getenv("VOLCENGINE_API_KEY")
+TTS_APP_ID = os.getenv("TTS_APP_ID")
+TTS_ACCESS_TOKEN = os.getenv("TTS_ACCESS_TOKEN")
if not MINIMAX_API_KEY:
print("Warning: MINIMAX_API_KEY not found in .env")
if not VOLCENGINE_API_KEY:
print("Warning: VOLCENGINE_API_KEY not found in .env")
+if not TTS_APP_ID or not TTS_ACCESS_TOKEN:
+ print("Warning: TTS_APP_ID or TTS_ACCESS_TOKEN not found in .env")
# Initialize FastAPI
app = FastAPI()
@@ -606,14 +614,244 @@ def get_playlist():
return {"playlist": playlist}
-# ── Static file serving for generated music ──
+# ═══════════════════════════════════════════════════════════════════
+# ── TTS: Doubao speech synthesis over the WebSocket V1 binary protocol ──
+# ═══════════════════════════════════════════════════════════════════
+
+TTS_WS_URL = "wss://openspeech.bytedance.com/api/v1/tts/ws_binary"
+TTS_CLUSTER = "volcano_tts"
+TTS_SPEAKER = "ICL_zh_female_keainvsheng_tob"
+
+_audio_dir = os.path.join(os.path.dirname(__file__) or ".", "Capybara audio")
+os.makedirs(_audio_dir, exist_ok=True)
+
+
+def _build_tts_v1_request(payload_json: dict) -> bytes:
+ """Build a V1 full-client-request binary frame.
+ Header: 0x11 0x10 0x10 0x00 (v1, 4-byte header, full-client-request, JSON, no compression)
+ Then 4-byte big-endian payload length, then JSON payload bytes.
+ """
+ payload_bytes = json.dumps(payload_json, ensure_ascii=False).encode("utf-8")
+ header = bytes([0x11, 0x10, 0x10, 0x00])
+ length = struct.pack(">I", len(payload_bytes))
+ return header + length + payload_bytes
+
+
+def _parse_tts_v1_response(data: bytes):
+ """Parse a V1 TTS response binary frame.
+ Returns (audio_bytes_or_none, is_last, is_error, error_msg).
+ """
+ if len(data) < 4:
+ return None, False, True, "Frame too short"
+
+ byte1 = data[1]
+ msg_type = (byte1 >> 4) & 0x0F
+ msg_flags = byte1 & 0x0F
+
+ # Error frame: msg_type = 0xF
+ if msg_type == 0x0F:
+ offset = 4
+ error_code = 0
+ if len(data) >= offset + 4:
+ error_code = struct.unpack(">I", data[offset:offset + 4])[0]
+ offset += 4
+ if len(data) >= offset + 4:
+ msg_len = struct.unpack(">I", data[offset:offset + 4])[0]
+ offset += 4
+ error_msg = data[offset:offset + msg_len].decode("utf-8", errors="replace")
+ else:
+ error_msg = f"error code {error_code}"
+ print(f"[TTS Error] code={error_code}, msg={error_msg}", flush=True)
+ return None, False, True, error_msg
+
+ # Audio-only response: msg_type = 0xB
+ if msg_type == 0x0B:
+ # flags: 0b0000=no seq, 0b0001=seq>0, 0b0010/0b0011=last (seq<0)
+ is_last = (msg_flags & 0x02) != 0 # bit 1 set = last message
+ offset = 4
+
+ # If flags != 0, there's a 4-byte sequence number
+ if msg_flags != 0:
+ offset += 4 # skip sequence number
+
+ if len(data) < offset + 4:
+ return None, is_last, False, ""
+
+ payload_size = struct.unpack(">I", data[offset:offset + 4])[0]
+ offset += 4
+ audio_data = data[offset:offset + payload_size]
+ return audio_data, is_last, False, ""
+
+ # Server response with JSON (msg_type = 0x9): usually contains metadata
+ if msg_type == 0x09:
+ offset = 4
+ if len(data) >= offset + 4:
+ payload_size = struct.unpack(">I", data[offset:offset + 4])[0]
+ offset += 4
+ json_str = data[offset:offset + payload_size].decode("utf-8", errors="replace")
+ print(f"[TTS] Server JSON: {json_str[:200]}", flush=True)
+ return None, False, False, ""
+
+ return None, False, False, ""
+
+
+async def tts_synthesize(text: str) -> bytes:
+ """Connect to Doubao TTS V1 WebSocket and synthesize text to MP3 bytes."""
+ headers = {
+ "Authorization": f"Bearer;{TTS_ACCESS_TOKEN}",
+ }
+
+ payload = {
+ "app": {
+ "appid": TTS_APP_ID,
+ "token": "placeholder",
+ "cluster": TTS_CLUSTER,
+ },
+ "user": {
+ "uid": "airhub_user",
+ },
+ "audio": {
+ "voice_type": TTS_SPEAKER,
+ "encoding": "mp3",
+ "speed_ratio": 1.0,
+ "rate": 24000,
+ },
+ "request": {
+ "reqid": str(uuid.uuid4()),
+ "text": text,
+ "operation": "submit", # streaming mode
+ },
+ }
+
+ audio_buffer = bytearray()
+ request_frame = _build_tts_v1_request(payload)
+
+ print(f"[TTS] Connecting to V1 WebSocket... text length={len(text)}", flush=True)
+
+ async with websockets.connect(
+ TTS_WS_URL,
+ extra_headers=headers,
+ max_size=10 * 1024 * 1024, # 10MB max frame
+ ping_interval=None,
+ ) as ws:
+ # Send request
+ await ws.send(request_frame)
+ print("[TTS] Request sent, waiting for audio...", flush=True)
+
+ # Receive audio chunks
+ chunk_count = 0
+ async for message in ws:
+ if isinstance(message, bytes):
+ audio_data, is_last, is_error, error_msg = _parse_tts_v1_response(message)
+
+ if is_error:
+ raise RuntimeError(f"TTS error: {error_msg}")
+
+ if audio_data and len(audio_data) > 0:
+ audio_buffer.extend(audio_data)
+ chunk_count += 1
+
+ if is_last:
+ print(f"[TTS] Last frame received. chunks={chunk_count}, "
+ f"audio size={len(audio_buffer)} bytes", flush=True)
+ break
+
+ return bytes(audio_buffer)
+
+
+class TTSRequest(BaseModel):
+ title: str
+ content: str
+
+
+@app.get("/api/tts_check")
+def tts_check(title: str = Query(...)):
+ """Check if audio already exists for a story title."""
+ for f in os.listdir(_audio_dir):
+ if f.lower().endswith(".mp3"):
+ # Match by title prefix (before timestamp)
+ name = f[:-4] # strip .mp3
+ name_without_ts = re.sub(r'_\d{10,}$', '', name)
+ if name_without_ts == title or name == title:
+ return {
+ "exists": True,
+ "audio_url": f"Capybara audio/{f}",
+ }
+ return {"exists": False, "audio_url": None}
+
+
+@app.post("/api/create_tts")
+def create_tts(req: TTSRequest):
+ """Generate TTS audio for a story. Returns SSE stream with progress."""
+
+ def event_stream():
+ import asyncio
+
+ yield sse_event({"stage": "connecting", "progress": 10,
+ "message": "正在连接语音合成服务..."})
+
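+ # Check if audio already exists. Files are saved under the sanitized,
+ # truncated title, so compare against that form as well as the raw title.
+ cached_title = re.sub(r'[<>:"/\\|?*]', '', req.title)[:50]
+ for f in os.listdir(_audio_dir):
+ if f.lower().endswith(".mp3"):
+ name = f[:-4]
+ name_without_ts = re.sub(r'_\d{10,}$', '', name)
+ if name_without_ts in (req.title, cached_title):
+ yield sse_event({"stage": "done", "progress": 100,
+ "message": "语音已存在",
+ "audio_url": f"Capybara audio/{f}"})
+ return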
+
+ yield sse_event({"stage": "generating", "progress": 30,
+ "message": "AI 正在朗读故事..."})
+
+ try:
+ # Run the async TTS call on a fresh event loop; asyncio.run also closes
+ # the loop when tts_synthesize raises.
+ audio_bytes = asyncio.run(tts_synthesize(req.content))
+
+ if not audio_bytes or len(audio_bytes) < 100:
+ yield sse_event({"stage": "error", "progress": 0,
+ "message": "语音合成返回了空音频"})
+ return
+
+ yield sse_event({"stage": "saving", "progress": 80,
+ "message": "正在保存音频..."})
+
+ # Save audio file
+ timestamp = int(time.time())
+ safe_title = re.sub(r'[<>:"/\\|?*]', '', req.title)[:50]
+ filename = f"{safe_title}_{timestamp}.mp3"
+ filepath = os.path.join(_audio_dir, filename)
+
+ with open(filepath, "wb") as f:
+ f.write(audio_bytes)
+
+ print(f"[TTS Saved] {filepath} ({len(audio_bytes)} bytes)", flush=True)
+
+ yield sse_event({"stage": "done", "progress": 100,
+ "message": "语音生成完成!",
+ "audio_url": f"Capybara audio/{filename}"})
+
+ except Exception as e:
+ print(f"[TTS Error] {e}", flush=True)
+ yield sse_event({"stage": "error", "progress": 0,
+ "message": f"语音合成失败: {str(e)}"})
+
+ return StreamingResponse(event_stream(), media_type="text/event-stream")
+
+
+# ── Static file serving ──
from fastapi.staticfiles import StaticFiles
-# Create music directory if it doesn't exist
+# Music directory
_music_dir = os.path.join(os.path.dirname(__file__) or ".", "Capybara music")
os.makedirs(_music_dir, exist_ok=True)
app.mount("/Capybara music", StaticFiles(directory=_music_dir), name="music_files")
+# Audio directory (TTS generated)
+app.mount("/Capybara audio", StaticFiles(directory=_audio_dir), name="audio_files")
+
if __name__ == "__main__":
print("[Server] Music Server running on http://localhost:3000")
diff --git a/阶段总结/session_progress.md b/阶段总结/session_progress.md
index 852908d..ae4fc88 100644
--- a/阶段总结/session_progress.md
+++ b/阶段总结/session_progress.md
@@ -3,7 +3,7 @@
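> **Purpose**: Update this file before each conversation ends or after finishing a stage.
> At the start of a new conversation, the AI reads this file first to restore context.
>
-> **Last updated**: 2026-02-09 (eighth conversation)
+> **Last updated**: 2026-02-10 (ninth conversation)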
---
@@ -155,9 +155,47 @@
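- **Cover distinction**: preset stories show a cover image; AI-generated stories show a pale-purple gradient "no cover yet" placeholder
- **Garbled-title filtering**: the API layer automatically skips abnormal files whose titles contain no Chinese characters
-### In progress
-- TTS speech synthesis to be integrated later (once the user has enabled the Volcengine speech service)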
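+### Work completed in the ninth conversation (2026-02-10)
+
+#### End-to-end TTS speech synthesis integration (finished in the previous conversation, recorded here)
+- **Backend**: `server.py` adds the `/api/create_tts` and `/api/tts_check` endpoints; `/api/create_tts` calls the Doubao TTS V1 API over a streaming WebSocket
+- **Voice**: cute female voice (`ICL_zh_female_keainvsheng_tob`)
+- **Frontend widget**: `PillProgressButton` (pill-shaped progress button) replaces the old RingProgressButton
+ - 7 states: idle / ready / generating / completed / playing / paused / error
+ - Progress ring animation + sound-wave animation + glow effect
+- **TTSService singleton**: keeps running in the background, so switching pages does not interrupt generation
+- **Audio storage**: generated TTS audio is saved to the `Capybara audio/` directory
+- **Pause/resume fix**: explicitly seek to the paused position before calling play, fixing the bug where playback restarted from the beginning on Web
+- **Button state fix**: added a `ready` state so that audio which has never been played shows "播放" (Play) rather than "继续" (Continue)
+- **Autoplay control**: autoplay only while the user is still on the story page; no autoplay after leaving the page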
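
The V1 binary frame that `server.py` sends can be illustrated with a minimal sketch. It assumes the layout described in the `_build_tts_v1_request` docstring (4-byte header, 4-byte big-endian payload length, JSON payload). `build_frame` and `describe_header` are hypothetical helper names, and reading the low nibble of byte 0 as the header size in 4-byte units is an assumption based on that docstring, not something verified against the service.

```python
import json
import struct


def build_frame(payload: dict) -> bytes:
    """Full-client-request frame: 4-byte header, 4-byte big-endian length, JSON body."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return bytes([0x11, 0x10, 0x10, 0x00]) + struct.pack(">I", len(body)) + body


def describe_header(frame: bytes) -> dict:
    """Split the first three header bytes into their 4-bit fields."""
    return {
        "version": frame[0] >> 4,
        "header_words": frame[0] & 0x0F,  # assumed: header size in 4-byte units
        "msg_type": frame[1] >> 4,        # 0x1 = full client request
        "msg_flags": frame[1] & 0x0F,
        "serialization": frame[2] >> 4,   # 0x1 = JSON
        "compression": frame[2] & 0x0F,   # 0x0 = none
    }


frame = build_frame({"request": {"text": "你好"}})
fields = describe_header(frame)
body_len = struct.unpack(">I", frame[4:8])[0]
```

Decoding the nibbles this way makes it easier to check a captured frame against the header comment before suspecting the payload itself.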
+
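+#### Music director prompt optimization
+- **De-duplicated song titles**: removed the fixed examples ("温泉咔咔乐" etc.) in favor of "improvise from the scene; do not reuse a fixed template"
+- **Effect**: the AI now generates a different title each time for similar scenes, so the record shelf no longer fills up with identically named songs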
+
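+#### Playback state visualization on the record shelf
+- **Card highlight**: the card of the currently playing song gets a warm-gold background + gold border + shadow
+- **Title marker**: the playing song's title is prefixed with a small speaker icon and rendered in bold gold text
+- **Sound-wave animation**: a bouncing sound-wave CustomPaint animation is overlaid on the center of the playing record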
+
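+#### Bubble persistently shows the current song title
+- During playback the bubble always shows "正在播放: xxx" (Now playing: xxx) instead of disappearing after 3 seconds
+- Tapping the play button directly (rather than selecting a song from the record shelf) also shows the title
+- The bubble hides automatically on pause and updates automatically on track change
+- Uses the `_playStickyText` mechanism, so the playback info is restored even after other temporary messages pop up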
+
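+#### Survey of AI music generation platforms
+- Compared MiniMax Music 2.5 (currently used), Mureka (昆仑万维), 天谱乐, and ACE-Step
+- Found that Mureka has a China-site API (platform.mureka.cn); in quality evaluations it surpasses Suno V4
+- The Muse AI App used by the user's friend is built on the Mureka model
+- The MiniMax text model (abab6.5s-chat) is relatively expensive; switching to Doubao is an option
+- Lyrics generation costs very little (about 0.005 yuan per call); the main cost is music generation (1 yuan per song)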
+
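+### In progress / to do
- Story cover approach undecided (paid generation or free generation)
+- Considering switching music generation from MiniMax to Mureka (the user is evaluating it)
+- Considering switching the lyrics-generation LLM from MiniMax abab6.5s-chat to Doubao (cheaper)
+- Long song title fallback issue: when the LLM returns an empty song_title, the user's raw input is used as the title; can be improved later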
---