diff --git a/API相关/语音合成大模型-单向流式websocket-V3-支持复刻混音mix.md b/API相关/语音合成大模型-单向流式websocket-V3-支持复刻混音mix.md new file mode 100644 index 0000000..fd679ef --- /dev/null +++ b/API相关/语音合成大模型-单向流式websocket-V3-支持复刻混音mix.md @@ -0,0 +1,1205 @@ + +# 1 接口功能 +单向流式API为用户提供文本转语音的能力,支持多语种、多方言,同时支持websocket协议流式输出。 + +## 1.1 最佳实践 +推荐使用链接复用,可降低耗时约70ms左右。 +对比v1单向流式接口,不同的音色优化程度不同,以具体测试结果为准,理论上相对会有几十ms的提升。 + +# 2 接口说明 + +## 2.1 请求Request + +### 请求路径 +`wss://openspeech.bytedance.com/api/v3/tts/unidirectional/stream` + +### 建连&鉴权 + +#### Request Headers + +| | | | | \ +|Key |说明 |是否必须 |Value示例 | +|---|---|---|---| +| | | | | \ +|X-Api-App-Id |\ +| |使用火山引擎控制台获取的APP ID,可参考 [控制台使用FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F) |是 |\ +| | | |your-app-id |\ +| | | | | +| | | | | \ +|X-Api-Access-Key |\ +| |使用火山引擎控制台获取的Access Token,可参考 [控制台使用FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F) |是 |\ +| | | |your-access-key |\ +| | | | | +| | | | | \ +|X-Api-Resource-Id |\ +| |表示调用服务的资源信息 ID |\ +| | |\ +| |* 豆包语音合成模型1.0: |\ +| | * seed-tts-1.0 或者 volc.service_type.10029(字符版) |\ +| | * seed-tts-1.0-concurr 或者 volc.service_type.10048(并发版) |\ +| |* 豆包语音合成模型2.0: |\ +| | * seed-tts-2.0 (字符版) |\ +| |* 声音复刻: |\ +| | * seed-icl-1.0(声音复刻1.0字符版) |\ +| | * seed-icl-1.0-concurr(声音复刻1.0并发版) |\ +| | * seed-icl-2.0 (声音复刻2.0字符版) |\ +| | |\ +| |**注意:** |\ +| | |\ +| |* "豆包语音合成模型1.0"的资源信息ID仅适用于["豆包语音合成模型1.0"的音色](https://www.volcengine.com/docs/6561/1257544) |\ +| |* "豆包语音合成模型2.0"的资源信息ID仅适用于["豆包语音合成模型2.0"的音色](https://www.volcengine.com/docs/6561/1257544) |是 |\ +| | | |* 豆包语音合成模型1.0: |\ +| | | | * seed-tts-1.0 |\ +| | 
| | * seed-tts-1.0-concurr |\ +| | | |* 豆包语音合成模型2.0: |\ +| | | | * seed-tts-2.0 |\ +| | | |* 声音复刻: |\ +| | | | * seed-icl-1.0(声音复刻1.0字符版) |\ +| | | | * seed-icl-1.0-concurr(声音复刻1.0并发版) |\ +| | | | * seed-icl-2.0 (声音复刻2.0字符版) | +| | | | | \ +|X-Api-Request-Id |标识客户端请求ID,uuid随机字符串 |否 |67ee89ba-7050-4c04-a3d7-ac61a63499b3 | +| | | | | \ +|X-Control-Require-Usage-Tokens-Return |请求消耗的用量返回控制标记。当携带此字段,在SessionFinish事件(152)中会携带用量数据 |否 |* 设置为*,表示返回已支持的用量数据。 |\ +| | | |* 也设置为具体的用量数据标记,如text_words;多个用逗号分隔 |\ +| | | |* 当前已支持的用量数据 |\ +| | | | * text_words,表示计费字符数 | + + +#### Response Headers + +| | | | \ +|Key |说明 |Value示例 | +|---|---|---| +| | | | \ +|X-Tt-Logid |服务端返回的 logid,建议用户获取和打印方便定位问题 |2025041513355271DF5CF1A0AE0508E78C | + + +### WebSocket 二进制协议 +WebSocket 使用二进制协议传输数据。 +协议的组成由至少 4 个字节的可变 header、payload size 和 payload 三部分组成,其中 + +* header 描述消息类型、序列化方式以及压缩格式等信息; +* payload size 是 payload 的长度; +* payload 是具体负载内容,依据消息类型不同 payload 内容不同; + +需注意:协议中整数类型的字段都使用**大端**表示。 + +##### 二进制帧 + +| | | | | \ +|Byte |Left 4-bit |Right 4-bit |说明 | +|---|---|---|---| +| | | | | \ +|0 - Left half |Protocol version | |目前只有v1,始终填0b0001 | +| | | | | \ +|0 - Right half | |Header size (4x) |目前只有4字节,始终填0b0001 | +| | | | | \ +|1 - Left half |Message type | |固定为0b001 | +| | | | | \ +|1 - Right half | |Message type specific flags |在sendText时,为0 |\ +| | | |在finishConnection时,为0b100 | +| | | | | \ +|2 - Left half |Serialization method | |0b0000:Raw(无特殊序列化方式,主要针对二进制音频数据)0b0001:JSON(主要针对文本类型消息) | +| | | | | \ +|2 - Right half | |Compression method |0b0000:无压缩0b0001:gzip | +| | || | \ +|3 |Reserved | |留空(0b0000 0000) | +| | || | \ +|[4 ~ 7] |[Optional field,like event number,...] | |取决于Message type specific flags,可能有、也可能没有 | +| | || | \ +|... 
|Payload | |可能是音频数据、文本数据、音频文本混合数据 | + + +###### payload请求参数 + +| | | | | | \ +|字段 |描述 |是否必须 |类型 |默认值 | +|---|---|---|---|---| +| | | | | | \ +|user |用户信息 | | | | +| | | | | | \ +|user.uid |用户uid | | | | +| | | | | | \ +|event |请求的事件 | | | | +| | | | | | \ +|namespace |请求方法 | |string |BidirectionalTTS | +| | | | | | \ +|req_params.text |输入文本 | |string | | +| | | | | | \ +|req_params.model |\ +| |模型版本,传`seed-tts-1.1`较默认版本音质有提升,并且延时更优,不传为默认效果。 |\ +| |注:若使用1.1模型效果,在复刻场景中会放大训练音频prompt特质,因此对prompt的要求更高,使用高质量的训练音频,可以获得更优的音质效果。 |\ +| | |\ +| |以下参数仅针对声音复刻2.0的音色生效,即音色ID的前缀为`saturn_`的音色。音色的取值为以下两种: |\ +| | |\ +| |* `seed-tts-2.0-expressive`:表现力较强,支持QA和Cot能力,不过可能存在抽卡的情况。 |\ +| |* `seed-tts-2.0-standard`:表现力上更加稳定,但是不支持QA和Cot能力。如果此时使用QA或Cot能力,则拒绝请求。 |\ +| |* 如果不传model参数,默认使用`seed-tts-2.0-expressive`模型。 | |string |\ +| | | | | | +| | | | | | \ +|req_params.ssml |* 当文本格式是ssml时,需要将文本赋值为ssml,此时文本处理的优先级高于text。ssml和text字段,至少有一个不为空 |\ +| |* ["豆包语音合成模型2.0"的音色](https://www.volcengine.com/docs/6561/1257544) 暂不支持 |\ +| |* 豆包声音复刻模型2.0(icl 2.0)的音色暂不支持 | |string | | +| | | | | | \ +|req_params.speaker |发音人,具体见[发音人列表](https://www.volcengine.com/docs/6561/1257544) |√ |string | | +| | | | | | \ +|req_params.audio_params |音频参数,便于服务节省音频解码耗时 |√ |object | | +| | | | | | \ +|req_params.audio_params.format |音频编码格式,mp3/ogg_opus/pcm。接口传入wav并不会报错,在流式场景下传入wav会多次返回wav header,这种场景建议使用pcm。 | |string |mp3 | +| | | | | | \ +|req_params.audio_params.sample_rate |音频采样率,可选值 [8000,16000,22050,24000,32000,44100,48000] | |number |24000 | +| | | | | | \ +|req_params.audio_params.bit_rate |音频比特率,可传16000、32000等。 |\ +| |bit_rate默认设置范围为64k~160k,传了disable_default_bit_rate为true后可以设置到64k以下 |\ +| |GoLang示例:additions = fmt.Sprintf("{"disable_default_bit_rate":true}") |\ +| |注:bit_rate只针对MP3格式,wav计算比特率跟pcm一样是 比特率 (bps) = 采样率 × 位深度 × 声道数 |\ +| |目前大模型TTS只能改采样率,所以对于wav格式来说只能通过改采样率来变更音频的比特率 | |number | | +| | | | | | \ +|req_params.audio_params.emotion |设置音色的情感。示例:"emotion": "angry" |\ +| |注:当前仅部分音色支持设置情感,且不同音色支持的情感范围存在不同。 |\ +| 
|详见:[大模型语音合成API-音色列表-多情感音色](https://www.volcengine.com/docs/6561/1257544) | |string | | +| | | | | | \ +|req_params.audio_params.emotion_scale |调用emotion设置情感参数后可使用emotion_scale进一步设置情绪值,范围1~5,不设置时默认值为4。 |\ +| |注:理论上情绪值越大,情感越明显。但情绪值1~5实际为非线性增长,可能存在超过某个值后,情绪增加不明显,例如设置3和5时情绪值可能接近。 | |number |4 | +| | | | | | \ +|req_params.audio_params.speech_rate |语速,取值范围[-50,100],100代表2.0倍速,-50代表0.5倍数 | |number |0 | +| | | | | | \ +|req_params.audio_params.loudness_rate |音量,取值范围[-50,100],100代表2.0倍音量,-50代表0.5倍音量(mix音色暂不支持) | |number |0 | +| | | | | | \ +|req_params.audio_params.enable_timestamp |\ +|([仅TTS1.0支持](https://www.volcengine.com/docs/6561/1257544)) |设置 "enable_timestamp": true 返回句级别字的时间戳(默认为 false,参数传入 true 即表示启用) |\ +| |开启后,在原有返回的事件`event=TTSSentenceEnd`中,新增该子句的时间戳信息。 |\ +| | |\ +| |* 一个子句的时间戳返回之后才会开始返回下一句音频。 |\ +| |* 合成有多个子句会多次返回`TTSSentenceStart`和`TTSSentenceEnd`。开启字幕后字幕跟随`TTSSentenceEnd`返回。 |\ +| |* 字/词粒度的时间戳,其中字/词是tn。具体可以看下面的例子。 |\ +| |* 支持中、英,其他语种、方言暂时不支持。 |\ +| | |\ +| |注:该字段仅适用于["豆包语音合成模型1.0"的音色](https://www.volcengine.com/docs/6561/1257544) | |bool |false | +| | | | | | \ +|req_params.audio_params.enable_subtitle |设置 "enable_subtitle": true 返回句级别字的时间戳(默认为 false,参数传入 true 即表示启用) |\ +| |开启后,新增返回事件`event=TTSSubtitle`,包含字幕信息。 |\ +| | |\ +| |* 在一句音频合成之后,不会立即返回该句的字幕。合成进度不会被字幕识别阻塞,当一句的字幕识别完成后立即返回。可能一个子句的字幕返回的时候,已经返回下一句的音频帧给调用方了。 |\ +| |* 合成有多个子句,仅返回一次`TTSSentenceStart`和`TTSSentenceEnd`。开启字幕后会多次返回`TTSSubtitle`。 |\ +| |* 字/词粒度的时间戳,其中字/词是原文。具体可以看下面的例子。 |\ +| |* 支持中、英,其他语种、方言暂时不支持; |\ +| |* latex公式不支持 |\ +| | * req_params.additions.enable_latex_tn为true时,不开启字幕识别功能,即不返回字幕; |\ +| |* ssml不支持 |\ +| | * req_params.ssml 不传时,不开启字幕识别功能,即不返回字幕; |\ +| | |\ +| |注:该参数只在TTS2.0、ICL2.0生效。 | |bool |false | +| | | | | | \ +|req_params.additions |用户自定义参数 | |jsonstring | | +| | | | | | \ +|req_params.additions.silence_duration |设置该参数可在句尾增加静音时长,范围0~30000ms。(注:增加的句尾静音主要针对传入文本最后的句尾,而非每句话的句尾) | |number |0 | +| | | | | | \ +|req_params.additions.enable_language_detector |自动识别语种 | |bool |false | +| | | 
| | | \ +|req_params.additions.disable_markdown_filter |是否开启markdown解析过滤, |\ +| |为true时,解析并过滤markdown语法,例如,`**你好**`,会读为“你好”, |\ +| |为false时,不解析不过滤,例如,`**你好**`,会读为“星星‘你好’星星” | |bool |false | +| | | | | | \ +|req_params.additions.disable_emoji_filter |开启emoji表情在文本中不过滤显示,默认为false,建议搭配时间戳参数一起使用。 |\ +| |GoLang示例:`additions = fmt.Sprintf("{"disable_emoji_filter":true}")` | |bool |false | +| | | | | | \ +|req_params.additions.mute_cut_remain_ms |该参数需配合mute_cut_threshold参数一起使用,其中: |\ +| |"mute_cut_threshold": "400", // 静音判断的阈值(音量小于该值时判定为静音) |\ +| |"mute_cut_remain_ms": "50", // 需要保留的静音长度 |\ +| |注:参数和value都为string格式 |\ +| |Golang示例:`additions = fmt.Sprintf("{"mute_cut_threshold":"400", "mute_cut_remain_ms": "1"}")` |\ +| |特别提醒: |\ +| | |\ +| |* 因MP3格式的特殊性,句首始终会存在100ms内的静音无法消除,WAV格式的音频句首静音可全部消除,建议依照自身业务需求综合判断选择 | |string | | +| | | | | | \ +|req_params.additions.enable_latex_tn |是否可以播报latex公式,需将disable_markdown_filter设为true | |bool |false | +| | | | | | \ +|req_params.additions.latex_parser |是否使用lid 能力播报latex公式,相较于latex_tn 效果更好; |\ +| |值为“v2”时支持lid能力解析公式,值为“”时不支持lid; |\ +| |需同时将disable_markdown_filter设为true; | |string | | +| | | | | | \ +|req_params.additions.max_length_to_filter_parenthesis |是否过滤括号内的部分,0为不过滤,100为过滤 | |int |100 | +| | | | | | \ +|req_params.additions.explicit_language(明确语种) |仅读指定语种的文本 |\ +| |**精品音色和 声音复刻 ICL1.0场景:** |\ +| | |\ +| |* 不给定参数,正常中英混 |\ +| |* `crosslingual` 启用多语种前端(包含`zh/en/ja/es-ms/id/pt-br`) |\ +| |* `zh-cn` 中文为主,支持中英混 |\ +| |* `en` 仅英文 |\ +| |* `ja` 仅日文 |\ +| |* `es-mx` 仅墨西 |\ +| |* `id` 仅印尼 |\ +| |* `pt-br` 仅巴葡 |\ +| | |\ +| |**DIT 声音复刻场景:** |\ +| |当音色是使用model_type=2训练的,即采用dit标准版效果时,建议指定明确语种,目前支持: |\ +| | |\ +| |* 不给定参数,启用多语种前端`zh,en,ja,es-mx,id,pt-br,de,fr` |\ +| |* `zh,en,ja,es-mx,id,pt-br,de,fr` 启用多语种前端 |\ +| |* `zh-cn` 中文为主,支持中英混 |\ +| |* `en` 仅英文 |\ +| |* `ja` 仅日文 |\ +| |* `es-mx` 仅墨西 |\ +| |* `id` 仅印尼 |\ +| |* `pt-br` 仅巴葡 |\ +| |* `de` 仅德语 |\ +| |* `fr` 仅法语 |\ +| | |\ +| |当音色是使用model_type=3训练的,即采用dit还原版效果时,必须指定明确语种,目前支持: |\ +| | |\ +| |* 
不给定参数,正常中英混 |\ +| |* `zh-cn` 中文为主,支持中英混 |\ +| |* `en` 仅英文 |\ +| | |\ +| |**声音复刻 ICL2.0场景:** |\ +| |当音色是使用model_type=4训练的 |\ +| | |\ +| |* 不给定参数,正常中英混 |\ +| |* `zh-cn` 中文为主,支持中英混 |\ +| |* `en` 仅英文 |\ +| | |\ +| |GoLang示例:`additions = fmt.Sprintf("{"explicit_language": "zh"}")` | |string | | +| | | | | | \ +|req_params.additions.context_language(参考语种) |给模型提供参考的语种 |\ +| | |\ +| |* 不给定 西欧语种采用英语 |\ +| |* id 西欧语种采用印尼 |\ +| |* es 西欧语种采用墨西 |\ +| |* pt 西欧语种采用巴葡 | |string | | +| | | | | | \ +|req_params.additions.unsupported_char_ratio_thresh |默认: 0.3,最大值: 1.0 |\ +| |检测出不支持合成的文本超过设置的比例,则会返回错误。 | |float |0.3 | +| | | | | | \ +|req_params.additions.aigc_watermark |默认:false |\ +| |是否在合成结尾增加音频节奏标识 | |bool |false | +| | | | | | \ +|req_params.additions.aigc_metadata (meta 水印) |在合成音频 header加入元数据隐式表示,支持 mp3/wav/ogg_opus | |object | | +| | | | | | \ +|req_params.additions.aigc_metadata.enable |是否启用隐式水印 | |bool |false | +| | | | | | \ +|req_params.additions.aigc_metadata.content_producer |合成服务提供者的名称或编码 | |string |"" | +| | | | | | \ +|req_params.additions.aigc_metadata.produce_id |内容制作编号 | |string |"" | +| | | | | | \ +|req_params.additions.aigc_metadata.content_propagator |内容传播服务提供者的名称或编码 | |string |"" | +| | | | | | \ +|req_params.additions.aigc_metadata.propagate_id |内容传播编号 | |string |"" | +| | | | | | \ +|req_params.additions.cache_config(缓存相关参数) |开启缓存,开启后合成相同文本时,服务会直接读取缓存返回上一次合成该文本的音频,可明显加快相同文本的合成速率,缓存数据保留时间1小时。 |\ +| |(通过缓存返回的数据不会附带时间戳) |\ +| |Golang示例:`additions = fmt.Sprintf("{"disable_default_bit_rate":true, "cache_config": {"text_type": 1,"use_cache": true}}")` | |object | | +| | | | | | \ +|req_params.additions.cache_config.text_type(缓存相关参数) |和use_cache参数一起使用,需要开启缓存时传1 | |int |1 | +| | | | | | \ +|req_params.additions.cache_config.use_cache(缓存相关参数) |和text_type参数一起使用,需要开启缓存时传true | |bool |true | +| | | | | | \ +|req_params.additions.post_process |后处理配置 |\ +| |Golang示例:`additions = fmt.Sprintf("{"post_process":{"pitch":12}}")` | |object | | +| | | | | | \ 
+|req_params.additions.post_process.pitch |音调取值范围是[-12,12] | |int |\ +| | | | |0 | +| | | | | | \ +|req_params.additions.context_texts |\ +|([仅TTS2.0支持](https://www.volcengine.com/docs/6561/1257544)) |语音合成的辅助信息,用于模型对话式合成,能更好的体现语音情感; |\ +| |可以探索,比如常见示例有以下几种: |\ +| | |\ +| |1. 语速调整 |\ +| | 1. 比如:context_texts: ["你可以说慢一点吗?"] |\ +| |2. 情绪/语气调整 |\ +| | 1. 比如:context_texts=["你可以用特别特别痛心的语气说话吗?"] |\ +| | 2. 比如:context_texts=["嗯,你的语气再欢乐一点"] |\ +| |3. 音量调整 |\ +| | 1. 比如:context_texts=["你嗓门再小点。"] |\ +| |4. 音感调整 |\ +| | 1. 比如:context_texts=["你能用骄傲的语气来说话吗?"] |\ +| | |\ +| |注意: |\ +| | |\ +| |1. 该字段仅适用于["豆包语音合成模型2.0"的音色](https://www.volcengine.com/docs/6561/1257544) |\ +| |2. 当前字符串列表只第一个值有效 |\ +| |3. 该字段文本不参与计费 | |string list |null | +| | | | | | \ +|req_params.additions.section_id |\ +|([仅TTS2.0支持](https://www.volcengine.com/docs/6561/1257544)) |其他合成语音的会话id(session_id),用于辅助当前语音合成,提供更多的上下文信息; |\ +| |取值,参见接口交互中的session_id |\ +| |示例: |\ +| | |\ +| |1. section_id="bf5b5771-31cd-4f7a-b30c-f4ddcbf2f9da" |\ +| | |\ +| |注意: |\ +| | |\ +| |1. 该字段仅适用于["豆包语音合成模型2.0"的音色](https://www.volcengine.com/docs/6561/1257544) |\ +| |2. 历史上下文的session_id 有效期: |\ +| | 1. 最长30轮 |\ +| | 2. 最长10分钟 | |string |"" | +| | | | | | \ +|req_params.additions.use_tag_parser |是否开启cot解析能力。cot能力可以辅助当前语音合成,对语速、情感等进行调整。 |\ +| |注意: |\ +| | |\ +| |1. 音色支持范围:仅限声音复刻2.0复刻的音色 |\ +| |2. 文本长度:单句的text字符长度最好小于64(cot标签也计算在内) |\ +| |3. cot能力生效的范围是单句 |\ +| | |\ +| |示例: |\ +| |支持单组和多组cot标签:`工作占据了生活的绝大部分,只有去做自己认为伟大的工作,才能获得满足感。不管生活再苦再累,都绝不放弃寻找。` | |bool |false | +| | | | | | \ +|req_params.mix_speaker |混音参数结构 |\ +| |注意: |\ +| | |\ +| |1. 该字段仅适用于["豆包语音合成模型1.0"的音色](https://www.volcengine.com/docs/6561/1257544) | |object | | +| | | | | | \ +|req_params.mix_speaker.speakers |混音音色名以及影响因子列表 |\ +| | |\ +| |1. 最多支持3个音色混音 |\ +| |2. 混音影响因子和必须=1 |\ +| |3. 使用复刻音色时,需要使用查询接口获取的icl_的speakerid,而非S_开头的speakerid |\ +| |4. 
音色风格差异较大的两个音色(如男女混),以0.5-0.5同等比例混合时,可能出现偶发跳变,建议尽量避免 |\
+| | |\
+| |注意:使用Mix能力时,req_params.speaker = custom_mix_bigtts | |list |null |
+| | | | | | \
+|req_params.mix_speaker.speakers[i].source_speaker |混音源音色名(支持大小模型音色和复刻2.0音色) | |string |"" |
+| | | | | | \
+|req_params.mix_speaker.speakers[i].mix_factor |混音源音色名影响因子 | |float |0 |
+
+单音色请求参数示例:
+```JSON
+{
+    "user": {
+        "uid": "12345"
+    },
+    "req_params": {
+        "text": "明朝开国皇帝朱元璋也称这本书为,万物之根",
+        "speaker": "zh_female_shuangkuaisisi_moon_bigtts",
+        "audio_params": {
+            "format": "mp3",
+            "sample_rate": 24000
+        }
+    }
+}
+```
+
+mix请求参数示例:
+```JSON
+{
+    "user": {
+        "uid": "12345"
+    },
+    "req_params": {
+        "text": "明朝开国皇帝朱元璋也称这本书为万物之根",
+        "speaker": "custom_mix_bigtts",
+        "audio_params": {
+            "format": "mp3",
+            "sample_rate": 24000
+        },
+        "mix_speaker": {
+            "speakers": [{
+                "source_speaker": "zh_male_bvlazysheep",
+                "mix_factor": 0.3
+            }, {
+                "source_speaker": "BV120_streaming",
+                "mix_factor": 0.3
+            }, {
+                "source_speaker": "zh_male_ahu_conversation_wvae_bigtts",
+                "mix_factor": 0.4
+            }]
+        }
+    }
+}
+```
+
+
+## 2.2 响应Response
+
+### 建连响应
+主要关注建连阶段 HTTP Response 的状态码和 Body
+
+* 建连成功:状态码为 200
+* 建连失败:状态码不为 200,Body 中提供错误原因说明
+
+
+### WebSocket 传输响应
+
+#### 二进制帧 - 正常响应帧
+
+| | | | | \
+|Byte |Left 4-bit |Right 4-bit |说明 |
+|---|---|---|---|
+| | | | | \
+|0 - Left half |Protocol version | |目前只有v1,始终填0b0001 |
+| | | | | \
+|0 - Right half | |Header size (4x) |目前只有4字节,始终填0b0001 |
+| | | | | \
+|1 - Left half |Message type | |音频帧返回:0b1011 |\
+| | | |其他帧返回:0b1001 |
+| | | | | \
+|1 - Right half | |Message type specific flags |固定为0b0100 |
+| | | | | \
+|2 - Left half |Serialization method | |0b0000:Raw(无特殊序列化方式,主要针对二进制音频数据)0b0001:JSON(主要针对文本类型消息) |
+| | | | | \
+|2 - Right half | |Compression method |0b0000:无压缩0b0001:gzip |
+| | || | \
+|3 |Reserved | |留空(0b0000 0000) |
+| | || | \
+|[4 ~ 7] |[Optional field,like event number,...] |\
+| | | |取决于Message type specific flags,可能有、也可能没有 |
+| | || | \
+|...
|Payload | |可能是音频数据、文本数据、音频文本混合数据 | + + +##### payload响应参数 + +| | | | \ +|字段 |描述 |类型 | +|---|---|---| +| | | | \ +|data |返回的二进制数据包 |[]byte | +| | | | \ +|event |返回的事件类型 |number | +| | | | \ +|res_params.text |经文本分句后的句子 |string | + + +#### 二进制帧 - 错误响应帧 + +| | | | | \ +|Byte |Left 4-bit |Right 4-bit |说明 | +|---|---|---|---| +| | | | | \ +|0 - Left half |Protocol version | |目前只有v1,始终填0b0001 | +| | | | | \ +|0 - Right half | |Header size (4x) |目前只有4字节,始终填0b0001 | +| | | | | \ +|1 |Message type |Message type specific flags |0b11110000 | +| | | | | \ +|2 - Left half |Serialization method | |0b0000:Raw(无特殊序列化方式,主要针对二进制音频数据)0b0001:JSON(主要针对文本类型消息) | +| | | | | \ +|2 - Right half | |Compression method |0b0000:无压缩0b0001:gzip | +| | || | \ +|3 |Reserved | |留空(0b0000 0000) | +| | || | \ +|[4 ~ 7] |Error code | |错误码 | +| | || | \ +|... |Payload | |错误消息对象 | + + +## 2.3 event定义 +在发送文本转TTS阶段,不需要客户端发送上行的event帧。event类型如下: + +| | | | | \ +|Event code |含义 |事件类型 |应用阶段:上行/下行 | +|---|---|---|---| +| | | | | \ +|152 |SessionFinished,会话已结束(上行&下行) |\ +| |标识语音一个完整的语音合成完成 |Session 类 |下行 | +| | | | | \ +|350 |TTSSentenceStart,TTS 返回句内容开始 |数据类 |下行 | +| | | | | \ +|351 |TTSSentenceEnd,TTS 返回句内容结束 |数据类 |下行 | +| | | | | \ +|352 |TTSResponse,TTS 返回句的音频内容 |数据类 |下行 | + +在关闭连接阶段,需要客户端传递上行event帧去关闭连接。event类型如下: + +| | | | | \ +|Event code |含义 |事件类型 |应用阶段:上行/下行 | +|---|---|---|---| +| | | | | \ +|2 |FinishConnection,结束连接 |Connect 类 |上行 | +| | | | | \ +|52 |ConnectionFinished 结束连接成功 |Connect 类 |下行 | + +交互示例: +![Image](https://p9-arcosite.byteimg.com/tos-cn-i-goo7wpa0wc/a9005d7ddd564ad79ad6dda9699a4a65~tplv-goo7wpa0wc-image.image =419x) + +## 2.4 不同类型帧举例说明 + +### SendText + +#### 请求Request + +| | | | || \ +|Byte |Left 4-bit |Right 4-bit |说明 | | +|---|---|---|---|---| +| | | | | | \ +|0 |0001 |0001 |v1 |4-byte header | +| | | | | | \ +|1 |0001 |0000 |Full-client request |with no event number | +| | | | | | \ +|2 |0001 |0000 |JSON |no compression | +| | | | | | \ +|3 |0000 |0000 | | | +| | || || \ +|4 ~ 7 
|uint32(...) | |len(payload_json) | |
+| | || || \
+|8 ~ ... |\
+| |{...} |\
+| | | |文本 |\
+| | | | | |
+
+payload
+```JSON
+{
+    "user": {
+        "uid": "12345"
+    },
+    "req_params": {
+        "text": "明朝开国皇帝朱元璋也称这本书为,万物之根",
+        "speaker": "zh_female_shuangkuaisisi_moon_bigtts",
+        "audio_params": {
+            "format": "mp3",
+            "sample_rate": 24000
+        }
+    }
+}
+```
+
+
+#### 响应Response
+
+##### TTSSentenceStart
+
+| | | | || \
+|Byte |Left 4-bit |Right 4-bit |说明 | |
+|---|---|---|---|---|
+| | | | | | \
+|0 |0001 |0001 |v1 |4-byte header |
+| | | | | | \
+|1 |1001 |0100 |Full-client request |with event number |
+| | | | | | \
+|2 |0001 |0000 |JSON |no compression |
+| | | | | | \
+|3 |0000 |0000 | | |
+| | || || \
+|4 ~ 7 |TTSSentenceStart | |event type | |
+| | || || \
+|8 ~ 11 |uint32(12) | |len() | |
+| | || || \
+|12 ~ 23 |nxckjoejnkegf | |session_id | |
+| | || || \
+|24 ~ 27 |uint32( ...) | |len(text_binary) | |
+| | || || \
+|28 ~ ... |\
+| |{...} | |text_binary | |
+
+
+##### TTSResponse
+
+| | | | || \
+|Byte |Left 4-bit |Right 4-bit |说明 | |
+|---|---|---|---|---|
+| | | | | | \
+|0 |0001 |0001 |v1 |4-byte header |
+| | | | | | \
+|1 |1011 |0100 |Audio-only response |with event number |
+| | | | | | \
+|2 |0001 |0000 |JSON |no compression |
+| | | | | | \
+|3 |0000 |0000 | | |
+| | || | | \
+|4 ~ 7 |TTSResponse | |event type | |
+| | || | | \
+|8 ~ 11 |uint32(12) | |len() | |
+| | || | | \
+|12 ~ 23 |nxckjoejnkegf | |session_id | |
+| | || | | \
+|24 ~ 27 |uint32( ...) | |len(audio_binary) | |
+| | || | | \
+|28 ~ ...
|{...} |\ +| | | |audio_binary |\ +| | | | | | + + +##### TTSSentenceEnd + +| | | | || \ +|Byte |Left 4-bit |Right 4-bit |说明 | | +|---|---|---|---|---| +| | | | | | \ +|0 |0001 |0001 |v1 |4-byte header | +| | | | | | \ +|1 |1001 |0100 |Full-client request |with event number | +| | | | | | \ +|2 |0001 |0000 |JSON |no compression | +| | | | | | \ +|3 |0000 |0000 | | | +| | || || \ +|4 ~ 7 |TTSSentenceEnd | |event type | | +| | || || \ +|8 ~ 11 |uint32(12) | |len() | | +| | || || \ +|12 ~ 23 |nxckjoejnkegf | |session_id | | +| | || || \ +|24 ~ 27 |uint32( ...) | |len(payload) | | +| | || || \ +|28 ~ ... |{...} |\ +| | | |payload |\ +| | | | | | + + +##### SessionFinished + +| | | | || \ +|Byte |Left 4-bit |Right 4-bit |说明 | | +|---|---|---|---|---| +| | | | | | \ +|0 |0001 |0001 |v1 |4-byte header | +| | | | | | \ +|1 |1001 |0100 |Full-client request |with event number | +| | | | | | \ +|2 |0001 |0000 |JSON |no compression | +| | | | | | \ +|3 |0000 |0000 | | | +| | || | | \ +|4 ~ 7 |SessionFinished | |event type | | +| | || || \ +|8 ~ 11 |uint32(12) | |len() | | +| | || || \ +|12 ~ 23 |nxckjoejnkegf | |session_id | | +| | || || \ +|24 ~ 27 |uint32( ...) | |len(response_meta_json) | | +| | || || \ +|28 ~ ... |{ |\ +| | "status_code": 20000000, |\ +| | "message": "ok", |\ +| |"usage": { |\ +| | "text_words":4 |\ +| | } |\ +| |} |\ +| | | |response_meta_json |\ +| | | | |\ +| | | |* 仅含status_code和message字段 |\ +| | | |* usage仅当header中携带X-Control-Require-Usage-Tokens-Return存在 | | + + +#### FinishConnection + +##### 请求request + +| | | | || \ +|Byte |Left 4-bit |Right 4-bit |说明 | | +|---|---|---|---|---| +| | | | | | \ +|0 |0001 |0001 |v1 |4-byte header | +| | | | | | \ +|1 |0001 |0100 |Full-client request |with event number | +| | | | | | \ +|2 |0001 |0000 |JSON |no compression | +| | | | | | \ +|3 |0000 |0000 | | | +| | || || \ +|4-7 |uint32(...) | |len(payload_json) | | +| | || || \ +|8 ~ ... 
|\ +| |{...} |\ +| | | |payload_json |\ +| | | |扩展保留,暂留空JSON | | + + +##### 响应response + +| | | | || \ +|Byte |Left 4-bit |Right 4-bit |说明 | | +|---|---|---|---|---| +| | | | | | \ +|0 |0001 |0001 |v1 |4-byte header | +| | | | | | \ +|1 |1001 |0100 |Full-client request |with event number | +| | | | | | \ +|2 |0001 |0000 |JSON |no compression | +| | | | | | \ +|3 |0000 |0000 | | | +| | || || \ +|4 ~ 7 |ConnectionFinished | |event type | | +| | || || \ +|8 ~ 11 |uint32(7) | |len() | | +| | || || \ +|12 ~ 15 |uint32(58) | |len() | | +| | || || \ +|28 ~ ... |{ |\ +| | "status_code": 20000000, |\ +| | "message": "ok" |\ +| |} | |response_meta_json |\ +| | | | |\ +| | | |* 仅含status_code和message字段 |\ +| | | | |\ +| | | | | | + + +## 2.5 时间戳句子格式说明 + +| | | | \ +| |\ +| |\ +|# |**TTS1.0** |\ +| |**ICL1.0** |**TTS2.0** |\ +| | |**ICL2.0** | +|---|---|---| +| | | | \ +|事件交互区别 |合成有多个子句会多次返回`TTSSentenceStart`和`TTSSentenceEnd`。开启字幕后字幕跟随`TTSSentenceEnd`返回。 |合成有多个子句,仅返回一次`TTSSentenceStart`和`TTSSentenceEnd`。 |\ +| | |开启字幕后会多次返回`TTSSubtitle`。 | +| | | | \ +|返回时机 |一个子句的时间戳返回之后才会开始返回下一句音频。 |\ +| | |在一句音频合成之后,不会立即返回该句的字幕。 |\ +| | |合成进度不会被字幕识别阻塞,当一句的字幕识别完成后立即返回。 |\ +| | |可能一个子句的字幕返回的时候,已经返回下一句的音频帧给调用方了。 | +| | | | \ +|句子返回格式 |\ +| |字幕信息是基于tn打轴 |\ +| |:::tip |\ +| |1. text字段对应于:原文 |\ +| |2. 
words内文本字段对应于:tn |\ +| |::: |\ +| |第一句: |\ +| |```JSON |\ +| |{ |\ +| | "phonemes": [ |\ +| | ], |\ +| | "text": "2019年1月8日,软件2.0版本于格萨拉彝族乡应时而生。发布会当日,一场瑞雪将天地映衬得纯净无瑕。", |\ +| | "words": [ |\ +| | { |\ +| | "confidence": 0.8766515, |\ +| | "endTime": 0.295, |\ +| | "startTime": 0.155, |\ +| | "word": "二" |\ +| | }, |\ +| | { |\ +| | "confidence": 0.95224416, |\ +| | "endTime": 0.425, |\ +| | "startTime": 0.295, |\ +| | "word": "零" |\ +| | }, |\ +| | { |\ +| | "confidence": 0.9108828, |\ +| | "endTime": 0.575, |\ +| | "startTime": 0.425, |\ +| | "word": "一" |\ +| | }, |\ +| | { |\ +| | "confidence": 0.9609025, |\ +| | "endTime": 0.755, |\ +| | "startTime": 0.575, |\ +| | "word": "九" |\ +| | }, |\ +| | { |\ +| | "confidence": 0.96244556, |\ +| | "endTime": 1.005, |\ +| | "startTime": 0.755, |\ +| | "word": "年" |\ +| | }, |\ +| | { |\ +| | "confidence": 0.85796577, |\ +| | "endTime": 1.155, |\ +| | "startTime": 1.005, |\ +| | "word": "一" |\ +| | }, |\ +| | { |\ +| | "confidence": 0.8460129, |\ +| | "endTime": 1.275, |\ +| | "startTime": 1.155, |\ +| | "word": "月" |\ +| | }, |\ +| | { |\ +| | "confidence": 0.90833753, |\ +| | "endTime": 1.505, |\ +| | "startTime": 1.275, |\ +| | "word": "八" |\ +| | }, |\ +| | { |\ +| | "confidence": 0.9403977, |\ +| | "endTime": 1.935, |\ +| | "startTime": 1.505, |\ +| | "word": "日," |\ +| | }, |\ +| | |\ +| | ... 
|\ +| | |\ +| | { |\ +| | "confidence": 0.9415791, |\ +| | "endTime": 10.505, |\ +| | "startTime": 10.355, |\ +| | "word": "无" |\ +| | }, |\ +| | { |\ +| | "confidence": 0.903162, |\ +| | "endTime": 10.895, // 第一句结束时间 |\ +| | "startTime": 10.505, |\ +| | "word": "瑕。" |\ +| | } |\ +| | ] |\ +| |} |\ +| |``` |\ +| | |\ +| |第二句: |\ +| |```JSON |\ +| |{ |\ +| | "phonemes": [ |\ +| | |\ +| | ], |\ +| | "text": "这仿佛一则自然寓言:我们致力于在不断的版本迭代中,为您带来如雪后初霁般清晰、焕然一新的体验。", |\ +| | "words": [ |\ +| | { |\ +| | "confidence": 0.8970245, |\ +| | "endTime": 11.6953745, |\ +| | "startTime": 11.535375, // 第二句开始时间,是相对整个session的位置 |\ +| | "word": "这" |\ +| | }, |\ +| | { |\ +| | "confidence": 0.86508185, |\ +| | "endTime": 11.875375, |\ +| | "startTime": 11.6953745, |\ +| | "word": "仿" |\ +| | }, |\ +| | { |\ +| | "confidence": 0.73354065, |\ +| | "endTime": 12.095375, |\ +| | "startTime": 11.875375, |\ +| | "word": "佛" |\ +| | }, |\ +| | { |\ +| | "confidence": 0.8525295, |\ +| | "endTime": 12.275374, |\ +| | "startTime": 12.095375, |\ +| | "word": "一" |\ +| | }... |\ +| | ] |\ +| |} |\ +| |``` |\ +| | |字幕信息是基于原文打轴 |\ +| | |:::tip |\ +| | |1. text字段对应于:原文 |\ +| | |2. 
words内文本字段对应于:原文 |\ +| | |::: |\ +| | |第一句: |\ +| | |```JSON |\ +| | |{ |\ +| | | "phonemes": [ |\ +| | | ], |\ +| | | "text": "2019年1月8日,软件2.0版本于格萨拉彝族乡应时而生。", |\ +| | | "words": [ |\ +| | | { |\ +| | | "confidence": 0.11120544, |\ +| | | "endTime": 0.615, |\ +| | | "startTime": 0.585, |\ +| | | "word": "2019" |\ +| | | }, |\ +| | | { |\ +| | | "confidence": 0.8413397, |\ +| | | "endTime": 0.845, |\ +| | | "startTime": 0.615, |\ +| | | "word": "年" |\ +| | | }, |\ +| | | { |\ +| | | "confidence": 0.2413961, |\ +| | | "endTime": 0.875, |\ +| | | "startTime": 0.845, |\ +| | | "word": "1" |\ +| | | }, |\ +| | | { |\ +| | | "confidence": 0.8487973, |\ +| | | "endTime": 1.055, |\ +| | | "startTime": 0.875, |\ +| | | "word": "月" |\ +| | | }, |\ +| | | { |\ +| | | "confidence": 0.509697, |\ +| | | "endTime": 1.225, |\ +| | | "startTime": 1.165, |\ +| | | "word": "8" |\ +| | | }, |\ +| | | { |\ +| | | "confidence": 0.9516253, |\ +| | | "endTime": 1.485, |\ +| | | "startTime": 1.225, |\ +| | | "word": "日," |\ +| | | }, |\ +| | | |\ +| | | ... |\ +| | | |\ +| | | { |\ +| | | "confidence": 0.6933777, |\ +| | | "endTime": 5.435, |\ +| | | "startTime": 5.325, |\ +| | | "word": "而" |\ +| | | }, |\ +| | | { |\ +| | | "confidence": 0.921702, |\ +| | | "endTime": 5.695, // 第一句结束时间 |\ +| | | "startTime": 5.435, |\ +| | | "word": "生。" |\ +| | | } |\ +| | | ] |\ +| | |} |\ +| | |``` |\ +| | | |\ +| | | |\ +| | |第二句: |\ +| | |```JSON |\ +| | |{ |\ +| | | "phonemes": [ |\ +| | | |\ +| | | ], |\ +| | | "text": "发布会当日,一场瑞雪将天地映衬得纯净无瑕。", |\ +| | | "words": [ |\ +| | | { |\ +| | | "confidence": 0.7016578, |\ +| | | "endTime": 6.3550415, |\ +| | | "startTime": 6.2150416, // 第二句开始时间,是相对整个session的位置 |\ +| | | "word": "发" |\ +| | | }, |\ +| | | { |\ +| | | "confidence": 0.6800497, |\ +| | | "endTime": 6.4450417, |\ +| | | "startTime": 6.3550415, |\ +| | | "word": "布" |\ +| | | }, |\ +| | | |\ +| | | ... 
|\ +| | | |\ +| | | { |\ +| | | "confidence": 0.8818264, |\ +| | | "endTime": 10.145041, |\ +| | | "startTime": 9.945042, |\ +| | | "word": "净" |\ +| | | }, |\ +| | | { |\ +| | | "confidence": 0.87248623, |\ +| | | "endTime": 10.285042, |\ +| | | "startTime": 10.145041, |\ +| | | "word": "无" |\ +| | | }, |\ +| | | { |\ +| | | "confidence": 0.8069703, |\ +| | | "endTime": 10.505041, |\ +| | | "startTime": 10.285042, |\ +| | | "word": "瑕。" |\ +| | | } |\ +| | | ] |\ +| | |} |\ +| | |``` |\ +| | | |\ +| | | | +| | | | \ +|语种 |中、英,不支持小语种、方言 |中、英,不支持小语种、方言 | +| | | | \ +|latex |enable_latex_tn=true,有字幕返回 |enable_latex_tn=true,无字幕返回,接口不报错 | +| | | | \ +|ssml |req_params.ssml不为空,有字幕返回 |req_params.ssml不为空,无字幕返回,接口不报错 | + + +# 3 错误码 + +| | | | \ +|Code |Message |说明 | +|---|---|---| +| | | | \ +|20000000 |ok |音频合成结束的成功状态码 | +| | | | \ +|45000000 |\ +| |speaker permission denied: get resource id: access denied |音色鉴权失败,一般是speaker指定音色未授权或者错误导致 |\ +| | | | +|^^| | | \ +| |quota exceeded for types: concurrency |并发限流,一般是请求并发数超过限制 | +| | | | \ +|55000000 |服务端一些error |服务端通用错误 | + + +# 4 调用示例 + +```mixin-react +return ( + +### 前提条件 + +* 调用之前,您需要获取以下信息: + * \`\`:使用控制台获取的APP ID,可参考 [控制台使用FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F)。 + * \`\`:使用控制台获取的Access Token,可参考 [控制台使用FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F)。 + * \`\`:您预期使用的音色ID,可参考 [大模型音色列表](https://www.volcengine.com/docs/6561/1257544)。 + + +### Python环境 + +* Python:3.9版本及以上。 +* Pip:25.1.1版本及以上。您可以使用下面命令安装。 + +\`\`\`Bash +python3 -m pip install --upgrade pip +\`\`\` + + +### 下载代码示例 + + +### 解压缩代码包,安装依赖 +\`\`\`Bash 
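+# 提示:以下命令为示例,假设示例包 volcengine_unidirectional_stream_demo.tar.gz 已下载至当前目录
+# 注意:source .venv/bin/activate 仅在当前 shell 会话内生效,新开终端需重新执行激活命令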
+mkdir -p volcengine_unidirectional_stream_demo +tar xvzf volcengine_unidirectional_stream_demo.tar.gz -C ./volcengine_unidirectional_stream_demo +cd volcengine_unidirectional_stream_demo +python3 -m venv .venv +source .venv/bin/activate +python3 -m pip install --upgrade pip +pip3 install -e . +\`\`\` + + +### 发起调用 +> \`\`替换为您的APP ID。 +> \`\`替换为您的Access Token。 +> \`\`替换为您预期使用的音色ID,例如\`zh_female_cancan_mars_bigtts\`。 + +\`\`\`Bash +python3 examples/volcengine/unidirectional_stream.py --appid --access_token --voice_type --text "你好,我是火山引擎的语音合成服务。这是一个美好的旅程。" +\`\`\` + +`}> + +### 前提条件 + +* 调用之前,您需要获取以下信息: + * \`\`:使用控制台获取的APP ID,可参考 [控制台使用FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F)。 + * \`\`:使用控制台获取的Access Token,可参考 [控制台使用FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F)。 + * \`\`:您预期使用的音色ID,可参考 [大模型音色列表](https://www.volcengine.com/docs/6561/1257544)。 + + +### Java环境 + +* Java:21版本及以上。 +* Maven:3.9.10版本及以上。 + + +### 下载代码示例 + + +### 解压缩代码包,安装依赖 +\`\`\`Bash +mkdir -p volcengine_unidirectional_stream_demo +tar xvzf volcengine_unidirectional_stream_demo.tar.gz -C ./volcengine_unidirectional_stream_demo +cd volcengine_unidirectional_stream_demo +\`\`\` + + +### 发起调用 +> \`\`替换为您的APP ID。 +> \`\`替换为您的Access Token。 +> \`\`替换为您预期使用的音色ID,例如\`zh_female_cancan_mars_bigtts\`。 + +\`\`\`Bash +mvn compile exec:java -Dexec.mainClass=com.speech.volcengine.UnidirectionalStream -DappId= -DaccessToken= -Dvoice= -Dtext="**你好**,我是豆包语音助手,很高兴认识你。这是一个愉快的旅程。" +\`\`\` + +`}> + +### 前提条件 + +* 调用之前,您需要获取以下信息: + * \`\`:使用控制台获取的APP ID,可参考 
[控制台使用FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F)。 + * \`\`:使用控制台获取的Access Token,可参考 [控制台使用FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F)。 + * \`\`:您预期使用的音色ID,可参考 [大模型音色列表](https://www.volcengine.com/docs/6561/1257544)。 + + +### Go环境 + +* Go:1.21.0版本及以上。 + + +### 下载代码示例 + + +### 解压缩代码包,安装依赖 +\`\`\`Bash +mkdir -p volcengine_unidirectional_stream_demo +tar xvzf volcengine_unidirectional_stream_demo.tar.gz -C ./volcengine_unidirectional_stream_demo +cd volcengine_unidirectional_stream_demo +\`\`\` + + +### 发起调用 +> \`\`替换为您的APP ID。 +> \`\`替换为您的Access Token。 +> \`\`替换为您预期使用的音色ID,例如\`zh_female_cancan_mars_bigtts\`。 + +\`\`\`Bash +go run volcengine/unidirectional_stream/main.go --appid --access_token --voice_type --text "**你好**,我是火山引擎的语音合成服务。" +\`\`\` + +`}> + +### 前提条件 + +* 调用之前,您需要获取以下信息: + * \`\`:使用控制台获取的APP ID,可参考 [控制台使用FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F)。 + * \`\`:使用控制台获取的Access Token,可参考 [控制台使用FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F)。 + * \`\`:您预期使用的音色ID,可参考 [大模型音色列表](https://www.volcengine.com/docs/6561/1257544)。 + + +### C#环境 + +* .Net 9.0版本。 + + +### 下载代码示例 + + +### 解压缩代码包,安装依赖 +\`\`\`Bash +mkdir -p 
volcengine_unidirectional_stream_demo +tar xvzf volcengine_unidirectional_stream_demo.tar.gz -C ./volcengine_unidirectional_stream_demo +cd volcengine_unidirectional_stream_demo +\`\`\` + + +### 发起调用 +> \`\`替换为您的APP ID。 +> \`\`替换为您的Access Token。 +> \`\`替换为您预期使用的音色ID,例如\`zh_female_cancan_mars_bigtts\`。 + +\`\`\`Bash +dotnet run --project Volcengine/UnidirectionalStream/Volcengine.Speech.UnidirectionalStream.csproj -- --appid --access_token --voice_type --text "**你好**,这是一个测试文本。我们正在测试文本转语音功能。" +\`\`\` + +`}> + +### 前提条件 + +* 调用之前,您需要获取以下信息: + * \`\`:使用控制台获取的APP ID,可参考 [控制台使用FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F)。 + * \`\`:使用控制台获取的Access Token,可参考 [控制台使用FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F)。 + * \`\`:您预期使用的音色ID,可参考 [大模型音色列表](https://www.volcengine.com/docs/6561/1257544)。 + + +### node环境 + +* node:v24.0版本及以上。 + + +### 下载代码示例 + + +### 解压缩代码包,安装依赖 +\`\`\`Bash +mkdir -p volcengine_unidirectional_stream_demo +tar xvzf volcengine_unidirectional_stream_demo.tar.gz -C ./volcengine_unidirectional_stream_demo +cd volcengine_unidirectional_stream_demo +npm install +npm install -g typescript +npm install -g ts-node +\`\`\` + + +### 发起调用 +> \`\`替换为您的APP ID。 +> \`\`替换为您的Access Token。 +> \`\`替换为您预期使用的音色ID,例如\`\`。 + +\`\`\`Bash +npx ts-node src/volcengine/unidirectional_stream.ts --appid --access_token --voice_type --text "**你好**,我是火山引擎的语音合成服务。" +\`\`\` + +`}>); + ``` + + diff --git a/API相关/语音合成大模型音色列表.md b/API相关/语音合成大模型音色列表.md new file mode 100644 index 0000000..e69de29 diff --git a/API相关/豆包大模型语音合成API.md b/API相关/豆包大模型语音合成API.md new file mode 100644 index 
0000000..3797797 --- /dev/null +++ b/API相关/豆包大模型语音合成API.md @@ -0,0 +1,627 @@ + + +# Websocket +> 使用账号申请部分申请到的 appid&access_token 进行调用 +> 文本一次性送入,后端边合成边返回音频数据 + + +## 1. 接口说明 +> V1: +> **wss://openspeech.bytedance.com/api/v1/tts/ws_binary (V1 单向流式)** +> **https://openspeech.bytedance.com/api/v1/tts (V1 http非流式)** +> V3: +> **wss://openspeech.bytedance.com/api/v3/tts/unidirectional/stream (V3 wss单向流式)** +> [V3 websocket单向流式文档](https://www.volcengine.com/docs/6561/1719100) +> **wss://openspeech.bytedance.com/api/v3/tts/bidirection (V3 wss双向流式)** +> [V3 websocket双向流式文档](https://www.volcengine.com/docs/6561/1329505) +> **https://openspeech.bytedance.com/api/v3/tts/unidirectional (V3 http单向流式)** +> [V3 http单向流式文档](https://www.volcengine.com/docs/6561/1598757) + +:::warning +大模型音色都推荐接入V3接口,时延上的表现会更好 +::: + +## 2. 身份认证 +认证方式使用 Bearer Token,在请求的 header 中加上`"Authorization": "Bearer; {token}"`,并在请求的 json 中填入对应的 appid。 +:::warning +Bearer 和 token 使用分号 ; 分隔,替换时请勿保留{} +::: +AppID/Token/Cluster 等信息可参考 [控制台使用FAQ-Q1](/docs/6561/196768#q1:哪里可以获取到以下参数appid,cluster,token,authorization-type,secret-key-?) + +## 3. 请求方式 + +### 3.1 二进制协议 + +#### 报文格式(Message format) +![Image](https://lf3-volc-editor.volccdn.com/obj/volcfe/sop-public/upload_cc1c1cdd61bf29f5bde066dc693dcb2b.png =1816x) +所有字段以 [Big Endian(大端序)](https://zh.wikipedia.org/wiki/%E5%AD%97%E8%8A%82%E5%BA%8F#%E5%A4%A7%E7%AB%AF%E5%BA%8F) 的方式存储。 +**字段描述** + +| | | | \ +|字段 Field (大小, 单位 bit) |描述 Description |值 Values | +|---|---|---| +| | | | \ +|协议版本(Protocol version) (4) |可能会在将来使用不同的协议版本,所以这个字段是为了让客户端和服务器在版本上保持一致。 |`0b0001` - 版本 1 (目前只有版本 1) | +| | | | \ +|报头大小(Header size) (4) |header 实际大小是 `header size value x 4` bytes. 
|\ +| |这里有个特殊值 `0b1111` 表示 header 大小大于或等于 60(15 x 4 bytes),也就是会存在 header extension 字段。 |`0b0001` - 报头大小 = 4 (1 x 4) |\ +| | |`0b0010` - 报头大小 = 8 (2 x 4) |\ +| | |`0b1010` - 报头大小 = 40 (10 x 4) |\ +| | |`0b1110` - 报头大小 = 56 (14 x 4) |\ +| | |`0b1111` - 报头大小为 60 或更大; 实际大小在 header extension 中定义 | +| | | | \ +|消息类型(Message type) (4) |定义消息类型。 |`0b0001` - full client request. |\ +| | |`~~0b1001~~` ~~- full server response(弃用).~~ |\ +| | |`0b1011` - Audio-only server response (ACK). |\ +| | |`0b1111` - Error message from server (例如错误的消息类型,不支持的序列化方法等等) | +| | | | \ +|Message type specific flags (4) |flags 含义取决于消息类型。 |\ +| |具体内容请看消息类型小节. | | +| | | | \ +|序列化方法(Message serialization method) (4) |定义序列化 payload 的方法。 |\ +| |注意:它只对某些特定的消息类型有意义 (例如 Audio-only server response `0b1011` 就不需要序列化). |`0b0000` - 无序列化 (raw bytes) |\ +| | |`0b0001` - JSON |\ +| | |`0b1111` - 自定义类型, 在 header extension 中定义 | +| | | | \ +|压缩方法(Message Compression) (4) |定义 payload 的压缩方法。 |\ +| |Payload size 字段不压缩(如果有的话,取决于消息类型),而且 Payload size 指的是 payload 压缩后的大小。 |\ +| |Header 不压缩。 |`0b0000` - 无压缩 |\ +| | |`0b0001` - gzip |\ +| | |`0b1111` - 自定义压缩方法, 在 header extension 中定义 | +| | | | \ +|保留字段(Reserved) (8) |保留字段,同时作为边界 (使整个报头大小为 4 个字节). |`0x00` - 目前只有 0 | + + +#### 消息类型详细说明 +目前所有 TTS websocket 请求都使用 full client request 格式,无论"query"还是"submit"。 + +#### Full client request + +* Header size为`b0001`(即 4B,没有 header extension)。 +* Message type为`b0001`. +* Message type specific flags 固定为`b0000`. +* Message serialization method为`b0001`JSON。字段参考上方表格。 +* 如果使用 gzip 压缩 payload,则 payload size 为压缩后的大小。 + + +#### Audio-only server response + +* Header size 应该为`b0001`. +* Message type为`b1011`. +* Message type specific flags 可能的值有: + * `b0000` - 没有 sequence number. + * `b0001` - sequence number > 0. + * `b0010`or`b0011` - sequence number < 0,表示来自服务器的最后一条消息,此时客户端应合并所有音频片段(如果有多条)。 +* Message serialization method为`b0000`(raw bytes). 
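上述二进制报文格式可以用下面的示意代码打包(仅为演示用草图,非官方实现;假设 payload 使用 JSON 序列化并经 gzip 压缩,函数名 `pack_full_client_request` 为自拟):

```python
import gzip
import json
import struct

def pack_full_client_request(payload: dict) -> bytes:
    """按上文描述的二进制协议打包一条 full client request(示意实现)。"""
    header = bytes([
        (0b0001 << 4) | 0b0001,  # 协议版本 1,报头大小 1 x 4 = 4 字节
        (0b0001 << 4) | 0b0000,  # 消息类型 full client request,flags 固定 0b0000
        (0b0001 << 4) | 0b0001,  # 序列化方法 JSON,压缩方法 gzip
        0x00,                    # 保留字段
    ])
    body = gzip.compress(json.dumps(payload).encode("utf-8"))
    # payload size 为压缩后的大小,以 4 字节大端序写入,随后是压缩后的 payload
    return header + struct.pack(">I", len(body)) + body

frame = pack_full_client_request({"request": {"text": "字节跳动语音合成"}})
# 首字节 0x11:高 4 位为协议版本 1,低 4 位为报头大小 1
assert frame[0] == 0x11
```

解析服务端返回的 Audio-only server response 时思路类似:先读取前 4 字节报头判断消息类型与 flags,再按大端序读取 payload size,取出对应长度的音频数据。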
+ + +## 4.注意事项 + +* 每次合成时reqid这个参数需要重新设置,且要保证唯一性(建议使用uuid.V4生成) +* websocket demo中单条链接仅支持单次合成,若需要合成多次,需自行实现。每次创建websocket连接后,按顺序串行发送每一包。一次合成结束后,可以发送新的合成请求。 +* operation需要设置为submit才是流式返回 +* 在 websocket 握手成功后,会返回这些 Response header +* 不支持["豆包语音合成模型2.0"的音色](https://www.volcengine.com/docs/6561/1257544),比如:"zh_female_vv_uranus_bigtts",如需使用推荐使用v3 接口 + + +| | | | \ +|Key |说明 |Value 示例 | +|---|---|---| +| | | | \ +|X-Tt-Logid |服务端返回的 logid,建议用户获取和打印方便定位问题 |202407261553070FACFE6D19421815D605 | + + +## 5.调用示例 + +```mixin-react +return ( + +### 前提条件 + +* 调用之前,您需要获取以下信息: + * \`\`:使用控制台获取的APP ID,可参考 [控制台使用FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F)。 + * \`\`:使用控制台获取的Access Token,可参考 [控制台使用FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F)。 + * \`\`:您预期使用的音色ID,可参考 [大模型音色列表](https://www.volcengine.com/docs/6561/1257544)。 + + +### Python环境 + +* Python:3.9版本及以上。 +* Pip:25.1.1版本及以上。您可以使用下面命令安装。 + +\`\`\`Bash +python3 -m pip install --upgrade pip +\`\`\` + + +### 下载代码示例 + + +### 解压缩代码包,安装依赖 +\`\`\`Bash +mkdir -p volcengine_binary_demo +tar xvzf volcengine_binary_demo.tar.gz -C ./volcengine_binary_demo +cd volcengine_binary_demo +python3 -m venv .venv +source .venv/bin/activate +python3 -m pip install --upgrade pip +pip3 install -e . 
+\`\`\` + + +### 发起调用 +> \`\`替换为您的APP ID。 +> \`\`替换为您的Access Token。 +> \`\`替换为您预期使用的音色ID,例如\`zh_female_cancan_mars_bigtts\`。 + +\`\`\`Bash +python3 examples/volcengine/binary.py --appid --access_token --voice_type --text "你好,我是火山引擎的语音合成服务。这是一个美好的旅程。" +\`\`\` + +`}> + +### 前提条件 + +* 调用之前,您需要获取以下信息: + * \`\`:使用控制台获取的APP ID,可参考 [控制台使用FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F)。 + * \`\`:使用控制台获取的Access Token,可参考 [控制台使用FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F)。 + * \`\`:您预期使用的音色ID,可参考 [大模型音色列表](https://www.volcengine.com/docs/6561/1257544)。 + + +### Java环境 + +* Java:21版本及以上。 +* Maven:3.9.10版本及以上。 + + +### 下载代码示例 + + +### 解压缩代码包,安装依赖 +\`\`\`Bash +mkdir -p volcengine_binary_demo +tar xvzf volcengine_binary_demo.tar.gz -C ./volcengine_binary_demo +cd volcengine_binary_demo +\`\`\` + + +### 发起调用 +> \`\`替换为您的APP ID。 +> \`\`替换为您的Access Token。 +> \`\`替换为您预期使用的音色ID,例如\`zh_female_cancan_mars_bigtts\`。 + +\`\`\`Bash +mvn compile exec:java -Dexec.mainClass=com.speech.volcengine.Binary -DappId= -DaccessToken= -Dvoice= -Dtext="**你好**,我是豆包语音助手,很高兴认识你。这是一个愉快的旅程。" +\`\`\` + +`}> + +### 前提条件 + +* 调用之前,您需要获取以下信息: + * \`\`:使用控制台获取的APP ID,可参考 [控制台使用FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F)。 + * \`\`:使用控制台获取的Access Token,可参考 
[控制台使用FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F)。 + * \`\`:您预期使用的音色ID,可参考 [大模型音色列表](https://www.volcengine.com/docs/6561/1257544)。 + + +### Go环境 + +* Go:1.21.0版本及以上。 + + +### 下载代码示例 + + +### 解压缩代码包,安装依赖 +\`\`\`Bash +mkdir -p volcengine_binary_demo +tar xvzf volcengine_binary_demo.tar.gz -C ./volcengine_binary_demo +cd volcengine_binary_demo +\`\`\` + + +### 发起调用 +> \`\`替换为您的APP ID。 +> \`\`替换为您的Access Token。 +> \`\`替换为您预期使用的音色ID,例如\`zh_female_cancan_mars_bigtts\`。 + +\`\`\`Bash +go run volcengine/binary/main.go --appid --access_token --voice_type --text "**你好**,我是火山引擎的语音合成服务。" +\`\`\` + +`}> + +### 前提条件 + +* 调用之前,您需要获取以下信息: + * \`\`:使用控制台获取的APP ID,可参考 [控制台使用FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F)。 + * \`\`:使用控制台获取的Access Token,可参考 [控制台使用FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F)。 + * \`\`:您预期使用的音色ID,可参考 [大模型音色列表](https://www.volcengine.com/docs/6561/1257544)。 + + +### C#环境 + +* .Net 9.0版本。 + + +### 下载代码示例 + + +### 解压缩代码包,安装依赖 +\`\`\`Bash +mkdir -p volcengine_binary_demo +tar xvzf volcengine_binary_demo.tar.gz -C ./volcengine_binary_demo +cd volcengine_binary_demo +\`\`\` + + +### 发起调用 +> \`\`替换为您的APP ID。 +> \`\`替换为您的Access Token。 +> \`\`替换为您预期使用的音色ID,例如\`zh_female_cancan_mars_bigtts\`。 + +\`\`\`Bash +dotnet run --project Volcengine/Binary/Volcengine.Speech.Binary.csproj -- --appid --access_token --voice_type --text 
"**你好**,这是一个测试文本。我们正在测试文本转语音功能。" +\`\`\` + +`}> + +### 前提条件 + +* 调用之前,您需要获取以下信息: + * \`\`:使用控制台获取的APP ID,可参考 [控制台使用FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F)。 + * \`\`:使用控制台获取的Access Token,可参考 [控制台使用FAQ-Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F)。 + * \`\`:您预期使用的音色ID,可参考 [大模型音色列表](https://www.volcengine.com/docs/6561/1257544)。 + + +### node环境 + +* node:v24.0版本及以上。 + + +### 下载代码示例 + + +### 解压缩代码包,安装依赖 +\`\`\`Bash +mkdir -p volcengine_binary_demo +tar xvzf volcengine_binary_demo.tar.gz -C ./volcengine_binary_demo +cd volcengine_binary_demo +npm install +npm install -g typescript +npm install -g ts-node +\`\`\` + + +### 发起调用 +> \`\`替换为您的APP ID。 +> \`\`替换为您的Access Token。 +> \`\`替换为您预期使用的音色ID,例如\`\`。 + +\`\`\`Bash +npx ts-node src/volcengine/binary.ts --appid --access_token --voice_type --text "**你好**,我是火山引擎的语音合成服务。" +\`\`\` + +`}>); + ``` + + +# HTTP +> 使用账号申请部分申请到的 appid&access_token 进行调用 +> 文本全部合成完毕之后,一次性返回全部的音频数据 + + +## 1. 接口说明 +接口地址为 **https://openspeech.bytedance.com/api/v1/tts** + +## 2. 身份认证 +认证方式采用 Bearer Token. +1)需要在请求的 Header 中填入"Authorization":"Bearer;${token}" +:::warning +Bearer 和 token 使用分号 ; 分隔,替换时请勿保留${} +::: +AppID/Token/Cluster 等信息可参考 [控制台使用FAQ-Q1](/docs/6561/196768#q1:哪里可以获取到以下参数appid,cluster,token,authorization-type,secret-key-?) + +## 3. 
注意事项 + +* 使用 HTTP Post 方式进行请求,返回的结果为 JSON 格式,需要进行解析 +* 因 json 格式无法直接携带二进制音频,音频经 base64 编码。使用 base64 解码后,即为二进制音频 +* 每次合成时 reqid 这个参数需要重新设置,且要保证唯一性(建议使用 UUID/GUID 等生成) +* 不支持["豆包语音合成模型2.0"的音色](https://www.volcengine.com/docs/6561/1257544),比如:"zh_female_vv_uranus_bigtts",如需使用推荐使用v3 接口 + + +# 参数列表 +> Websocket 与 Http 调用参数相同 + + +## 请求参数 + +| | | | | | | \ +|字段 |含义 |层级 |格式 |必需 |备注 | +|---|---|---|---|---|---| +| | | | | | | \ +|app |应用相关配置 |1 |dict |✓ | | +| | | | | | | \ +|appid |应用标识 |2 |string |✓ |需要申请 | +| | | | | | | \ +|token |应用令牌 |2 |string |✓ |无实际鉴权作用的Fake token,可传任意非空字符串 | +| | | | | | | \ +|cluster |业务集群 |2 |string |✓ |volcano_tts | +| | | | | | | \ +|user |用户相关配置 |1 |dict |✓ | | +| | | | | | | \ +|uid |用户标识 |2 |string |✓ |可传任意非空字符串,传入值可以通过服务端日志追溯 | +| | | | | | | \ +|audio |音频相关配置 |1 |dict |✓ | | +| | | | | | | \ +|voice_type |音色类型 |2 |string |✓ | | +| | | | | | | \ +|emotion |音色情感 |2 |string | |设置音色的情感。示例:"emotion": "angry" |\ +| | | | | |注:当前仅部分音色支持设置情感,且不同音色支持的情感范围存在不同。 |\ +| | | | | |详见:[大模型语音合成API-音色列表-多情感音色](https://www.volcengine.com/docs/6561/1257544) | +| | | | | | | \ +|enable_emotion |开启音色情感 |2 |bool | |是否可以设置音色情感,需将enable_emotion设为true |\ +| | | | | |示例:"enable_emotion": True | +| | | | | | | \ +|emotion_scale |情绪值设置 |2 |float | |调用emotion设置情感参数后可使用emotion_scale进一步设置情绪值,范围1~5,不设置时默认值为4。 |\ +| | | | | |注:理论上情绪值越大,情感越明显。但情绪值1~5实际为非线性增长,可能存在超过某个值后,情绪增加不明显,例如设置3和5时情绪值可能接近。 | +| | | | | | | \ +|encoding |音频编码格式 |2 |string | |wav / pcm / ogg_opus / mp3,默认为 pcm |\ +| | | | | |注意:wav 不支持流式 | +| | | | | | | \ +|speed_ratio |语速 |2 |float | |[0.1,2],默认为 1,通常保留一位小数即可 | +| | | | | | | \ +|rate |音频采样率 |2 |int | |默认为 24000,可选8000,16000 | +| | | | | | | \ +|bitrate |比特率 |2 |int | |单位 kb/s,默认160 kb/s |\ +| | | | | |**注:** |\ +| | | | | |bitrate只针对MP3格式,wav计算比特率跟pcm一样是 比特率 (bps) = 采样率 × 位深度 × 声道数 |\ +| | | | | |目前大模型TTS只能改采样率,所以对于wav格式来说只能通过改采样率来变更音频的比特率 | +| | | | | | | \ +|explicit_language |明确语种 |2 |string | |仅读指定语种的文本 |\ +| | | | | |精品音色和 ICL 声音复刻场景: |\ +| | | 
| | | |\ +| | | | |* 不给定参数,正常中英混 |\ +| | | | |* `crosslingual` 启用多语种前端(包含`zh/en/ja/es-mx/id/pt-br`) |\ +| | | | |* `zh-cn` 中文为主,支持中英混 |\ +| | | | |* `en` 仅英文 |\ +| | | | |* `ja` 仅日文 |\ +| | | | |* `es-mx` 仅墨西 |\ +| | | | |* `id` 仅印尼 |\ +| | | | |* `pt-br` 仅巴葡 |\ +| | | | | |\ +| | | | |DIT 声音复刻场景: |\ +| | | | |当音色是使用model_type=2训练的,即采用dit标准版效果时,建议指定明确语种,目前支持: |\ +| | | | | |\ +| | | | |* 不给定参数,启用多语种前端`zh,en,ja,es-mx,id,pt-br,de,fr` |\ +| | | | |* `zh,en,ja,es-mx,id,pt-br,de,fr` 启用多语种前端 |\ +| | | | |* `zh-cn` 中文为主,支持中英混 |\ +| | | | |* `en` 仅英文 |\ +| | | | |* `ja` 仅日文 |\ +| | | | |* `es-mx` 仅墨西 |\ +| | | | |* `id` 仅印尼 |\ +| | | | |* `pt-br` 仅巴葡 |\ +| | | | |* `de` 仅德语 |\ +| | | | |* `fr` 仅法语 |\ +| | | | | |\ +| | | | |当音色是使用model_type=3训练的,即采用dit还原版效果时,必须指定明确语种,目前支持: |\ +| | | | | |\ +| | | | |* 不给定参数,正常中英混 |\ +| | | | |* `zh-cn` 中文为主,支持中英混 |\ +| | | | |* `en` 仅英文 | +| | | | | | | \ +|context_language |参考语种 |2 |string | |给模型提供参考的语种 |\ +| | | | | | |\ +| | | | | |* 不给定 西欧语种采用英语 |\ +| | | | | |* id 西欧语种采用印尼 |\ +| | | | | |* es 西欧语种采用墨西 |\ +| | | | | |* pt 西欧语种采用巴葡 | +| | | | | | | \ +|loudness_ratio |音量调节 |2 |float | |[0.5,2],默认为1,通常保留一位小数即可。0.5代表原音量0.5倍,2代表原音量2倍 | +| | | | | | | \ +|request |请求相关配置 |1 |dict |✓ | | +| | | | | | | \ +|reqid |请求标识 |2 |string |✓ |需要保证每次调用传入值唯一,建议使用 UUID | +| | | | | | | \ +|text |文本 |2 |string |✓ |合成语音的文本,长度限制 1024 字节(UTF-8 编码)建议小于300字符,超出容易增加badcase出现概率或报错 | +| | | | | | | \ +|model |模型版本 |\ +| | |2 |\ +| | | |string |否 |模型版本,传`seed-tts-1.1`较默认版本音质有提升,并且延时更优,不传为默认效果。 |\ +| | | | | |注:若使用1.1模型效果,在复刻场景中会放大训练音频prompt特质,因此对prompt的要求更高,使用高质量的训练音频,可以获得更优的音质效果。 | +| | | | | | | \ +|text_type |文本类型 |2 |string | |使用 ssml 时需要指定,值为"ssml" | +| | | | | | | \ +|silence_duration |句尾静音 |2 |float | |设置该参数可在句尾增加静音时长,范围0~30000ms。(注:增加的句尾静音主要针对传入文本最后的句尾,而非每句话的句尾)若启用该参数,必须在request下首先设置enable_trailing_silence_audio = true | +| | | | | | | \ +|with_timestamp |时间戳相关 |2 |int |\ +| | | |string | 
|传入1时表示启用,将返回TN后文本的时间戳,例如:2025。根据语义,TN后文本为“两千零二十五”或“二零二五”。 |\ +| | | | | |注:原文本中的多个标点连用或者空格仍会被处理,但不影响时间戳的连贯性(仅限大模型场景使用)。 |\ +| | | | | |附加说明(小模型和大模型时间戳原理差异): |\ +| | | | | |1. 小模型依据前端模型生成时间戳,然后合成音频。在处理时间戳时,TN前后文本进行了映射,所以小模型可返回TN前原文本的时间戳,即保留原文中的阿拉伯数字或者特殊符号等。 |\ +| | | | | |2. 大模型在对传入文本语义理解后合成音频,再针对合成音频进行TN后打轴以输出时间戳。若不采用TN后文本,输出的时间戳将与合成音频无法对齐,所以大模型返回的时间戳对应TN后的文本。 | +| | | | | | | \ +|operation |操作 |2 |string |✓ |query(非流式,http 只能 query) / submit(流式) | +| | | | | | | \ +|extra_param |附加参数 |2 |jsonstring | | | +| | | | | | | \ +|disable_markdown_filter | |3 |bool | |是否开启markdown解析过滤, |\ +| | | | | |为true时,解析并过滤markdown语法,例如,**你好**,会读为“你好”, |\ +| | | | | |为false时,不解析不过滤,例如,**你好**,会读为“星星‘你好’星星” |\ +| | | | | |示例:"disable_markdown_filter": True | +| | | | | | | \ +|enable_latex_tn | |3 |bool | |是否可以播报latex公式,需将disable_markdown_filter设为true |\ +| | | | | |示例:"enable_latex_tn": True | +| | | | | | | \ +|mute_cut_remain_ms |句首静音参数 |3 |string | |该参数需配合mute_cut_threshold参数一起使用,其中: |\ +| | | | | |"mute_cut_threshold": "400", // 静音判断的阈值(音量小于该值时判定为静音) |\ +| | | | | |"mute_cut_remain_ms": "50", // 需要保留的静音长度 |\ +| | | | | |注:参数和value都为string格式 |\ +| | | | | |以python为示例: |\ +| | | | | |```Python |\ +| | | | | |"extra_param":("{\"mute_cut_threshold\":\"400\", \"mute_cut_remain_ms\": \"0\"}") |\ +| | | | | |``` |\ +| | | | | | |\ +| | | | | |特别提醒: |\ +| | | | | | |\ +| | | | | |* 因MP3格式的特殊性,句首始终会存在100ms内的静音无法消除,WAV格式的音频句首静音可全部消除,建议依照自身业务需求综合判断选择 | +| | | | | | | \ +|disable_emoji_filter |emoji不过滤显示 |3 |bool | |开启emoji表情在文本中不过滤显示,默认为False,建议搭配时间戳参数一起使用。 |\ +| | | | | |Python示例:`"extra_param": json.dumps({"disable_emoji_filter": True})` | +| | | | | | | \ +|unsupported_char_ratio_thresh |不支持语种占比阈值 |3 |float | |默认: 0.3,最大值: 1.0 |\ +| | | | | |检测出不支持合成的文本超过设置的比例,则会返回错误。 |\ +| | | | | |Python示例:`"extra_param": json.dumps({"`unsupported_char_ratio_thresh`": 0.3})` | +| | | | | | | \ +|aigc_watermark |是否在合成结尾增加音频节奏标识 |3 |bool | |默认: false |\ +| | | | | |Python示例:`"extra_param": 
json.dumps({"aigc_watermark": True})` | +| | | | | | | \ +|cache_config |缓存相关参数 |3 |dict | |开启缓存,开启后合成相同文本时,服务会直接读取缓存返回上一次合成该文本的音频,可明显加快相同文本的合成速率,缓存数据保留时间1小时。 |\ +| | | | | |(通过缓存返回的数据不会附带时间戳) |\ +| | | | | |Python示例:`"extra_param": json.dumps({"cache_config": {"text_type": 1,"use_cache": True}})` | +| | | | | | | \ +|text_type |缓存相关参数 |4 |int | |和use_cache参数一起使用,需要开启缓存时传1 | +| | | | | | | \ +|use_cache |缓存相关参数 |4 |bool | |和text_type参数一起使用,需要开启缓存时传true | + + + + +备注: + +1. 已支持字级别时间戳能力(ssml文本类型不支持) +2. ssml 能力已支持,详见 [SSML 标记语言--豆包语音-火山引擎 (volcengine.com)](https://www.volcengine.com/docs/6561/1330194) +3. 暂时不支持音高调节 +4. 大模型音色语种支持中英混 +5. 大模型非双向流式已支持latex公式 +6. 在 websocket/http 握手成功后,会返回这些 Response header + + +| | | | \ +|Key |说明 |Value 示例 | +|---|---|---| +| | | | \ +|X-Tt-Logid |服务端返回的 logid,建议用户获取和打印方便定位问题,使用默认格式即可,不要自定义格式 |202407261553070FACFE6D19421815D605 | + +请求示例: +```go +{ + "app": { + "appid": "appid123", + "token": "access_token", + "cluster": "volcano_tts", + }, + "user": { + "uid": "uid123" + }, + "audio": { + "voice_type": "zh_male_M392_conversation_wvae_bigtts", + "encoding": "mp3", + "speed_ratio": 1.0, + }, + "request": { + "reqid": "uuid", + "text": "字节跳动语音合成", + "operation": "query", + } +} +``` + + +## 返回参数 + +| | | | | | \ +|字段 |含义 |层级 |格式 |备注 | +|---|---|---|---|---| +| | | | | | \ +|reqid |请求 ID |1 |string |请求 ID,与传入的参数中 reqid 一致 | +| | | | | | \ +|code |请求状态码 |1 |int |错误码,参考下方说明 | +| | | | | | \ +|message |请求状态信息 |1 |string |错误信息 | +| | | | | | \ +|sequence |音频段序号 |1 |int |负数表示合成完毕 | +| | | | | | \ +|data |合成音频 |1 |string |返回的音频数据,base64 编码 | +| | | | | | \ +|addition |额外信息 |1 |string |额外信息父节点 | +| | | | | | \ +|duration |音频时长 |2 |string |返回音频的长度,单位 ms | + +响应示例 +```go +{ + "reqid": "reqid", + "code": 3000, + "operation": "query", + "message": "Success", + "sequence": -1, + "data": "base64 encoded binary data", + "addition": { + "duration": "1960", + } +} +``` + + +## 注意事项 + +* websocket 单条链接仅支持单次合成,若需要合成多次,则需要多次建立链接 +* 每次合成时 reqid 
这个参数需要重新设置,且要保证唯一性(建议使用 uuid.V4 生成) +* operation 需要设置为 submit + + +# 返回码说明 + +| | | | | \ +|错误码 |错误描述 |举例 |建议行为 | +|---|---|---|---| +| | | | | \ +|3000 |请求正确 |正常合成 |正常处理 | +| | | | | \ +|3001 |无效的请求 |一些参数的值非法,比如 operation 配置错误 |检查参数 | +| | | | | \ +|3003 |并发超限 |超过在线设置的并发阈值 |重试;使用 sdk 的情况下切换离线 | +| | | | | \ +|3005 |后端服务忙 |后端服务器负载高 |重试;使用 sdk 的情况下切换离线 | +| | | | | \ +|3006 |服务中断 |请求已完成/失败之后,相同 reqid 再次请求 |检查参数 | +| | | | | \ +|3010 |文本长度超限 |单次请求超过设置的文本长度阈值 |检查参数 | +| | | | | \ +|3011 |无效文本 |参数有误或者文本为空、文本与语种不匹配、文本只含标点 |检查参数 | +| | | | | \ +|3030 |处理超时 |单次请求超过服务最长时间限制 |重试或检查文本 | +| | | | | \ +|3031 |处理错误 |后端出现异常 |重试;使用 sdk 的情况下切换离线 | +| | | | | \ +|3032 |等待获取音频超时 |后端网络异常 |重试;使用 sdk 的情况下切换离线 | +| | | | | \ +|3040 |后端链路连接错误 |后端网络异常 |重试 | +| | | | | \ +|3050 |音色不存在 |检查使用的 voice_type 代号 |检查参数 | + + +# 常见错误返回说明 + +1. 错误返回: + "message": "quota exceeded for types: xxxxxxxxx_lifetime" + **错误原因:试用版用量用完了,需要开通正式版才能继续使用** +2. 错误返回: + "message": "quota exceeded for types: concurrency" + **错误原因:并发超过了限定值,需要减少并发调用情况或者增购并发** +3. 错误返回: + "message": "Fail to feed text, reason Init Engine Instance failed" + **错误原因:voice_type / cluster 传递错误** +4. 错误返回: + "message": "illegal input text!" + **错误原因:传入的 text 无效,没有可合成的有效文本。比如全部是标点符号或者 emoji 表情,或者使用中文音色时,传递日语,以此类推。多语种音色,也需要使用 language 指定对应的语种** +5. 错误返回: + "message": "authenticate request: load grant: requested grant not found" + **错误原因:鉴权失败,需要检查 appid&token 的值是否设置正确,同时,鉴权的正确格式为** + **headers["Authorization"] = "Bearer;${token}"** +6. 
错误返回: + "message": "extract request resource id: get resource id: access denied" + **错误原因:语音合成已开通正式版且未拥有当前音色授权,需要在控制台购买该音色才能调用。标注免费的音色除 BV001_streaming 及 BV002_streaming 外,需要在控制台进行下单(支付 0 元)** + + diff --git a/Capybara audio/勇敢的小裁缝_1770727373.mp3 b/Capybara audio/勇敢的小裁缝_1770727373.mp3 new file mode 100644 index 0000000..cd8ffb9 Binary files /dev/null and b/Capybara audio/勇敢的小裁缝_1770727373.mp3 differ diff --git a/Capybara audio/卡皮巴拉的奇幻漂流_1770727390.mp3 b/Capybara audio/卡皮巴拉的奇幻漂流_1770727390.mp3 new file mode 100644 index 0000000..c33e659 Binary files /dev/null and b/Capybara audio/卡皮巴拉的奇幻漂流_1770727390.mp3 differ diff --git a/Capybara audio/小红帽与大灰狼_1770723087.mp3 b/Capybara audio/小红帽与大灰狼_1770723087.mp3 new file mode 100644 index 0000000..6debfbd Binary files /dev/null and b/Capybara audio/小红帽与大灰狼_1770723087.mp3 differ diff --git a/Capybara audio/杰克与魔豆_1770727355.mp3 b/Capybara audio/杰克与魔豆_1770727355.mp3 new file mode 100644 index 0000000..0274f5b Binary files /dev/null and b/Capybara audio/杰克与魔豆_1770727355.mp3 differ diff --git a/Capybara audio/海盗找朋友_1770718270.mp3 b/Capybara audio/海盗找朋友_1770718270.mp3 new file mode 100644 index 0000000..1d56aef Binary files /dev/null and b/Capybara audio/海盗找朋友_1770718270.mp3 differ diff --git a/Capybara audio/糖果屋历险记_1770721395.mp3 b/Capybara audio/糖果屋历险记_1770721395.mp3 new file mode 100644 index 0000000..1152f11 Binary files /dev/null and b/Capybara audio/糖果屋历险记_1770721395.mp3 differ diff --git a/Capybara music/lyrics/书房咔咔茶_1770634690.txt b/Capybara music/lyrics/书房咔咔茶_1770634690.txt new file mode 100644 index 0000000..3baa1ef --- /dev/null +++ b/Capybara music/lyrics/书房咔咔茶_1770634690.txt @@ -0,0 +1,17 @@ +在书房角落 沏上一杯茶 +窗外微风轻拂 摇曳着树梢 +咔咔坐在椅上 沉浸在思考 +书页轻轻翻动 世界变得渺小 +咔咔咔咔 书房里的我 +静享时光 悠然自得 +茶香飘散 心灵得到慰藉 +咔咔咔咔 享受这刻 +阳光透过窗帘 柔和又温暖 +每个字每个句 都是心灵的食粮 +咔咔轻轻点头 感受着文字的力量 +在这安静的角落 找到了自我方向 +咔咔咔咔 书房里的我 +静享时光 悠然自得 +茶香飘散 心灵得到慰藉 +咔咔咔咔 享受这刻 +(茶杯轻放的声音...) 
\ No newline at end of file diff --git a/Capybara music/lyrics/书房咔咔茶_1770637242.txt b/Capybara music/lyrics/书房咔咔茶_1770637242.txt new file mode 100644 index 0000000..c3b5f20 --- /dev/null +++ b/Capybara music/lyrics/书房咔咔茶_1770637242.txt @@ -0,0 +1,17 @@ +在书房角落里,我找到了安静 +一杯茶香飘来,思绪开始飞腾 +书页轻轻翻动,知识在心间 +咔咔我在这里,享受这宁静 +咔咔咔咔,独自享受 +书中的世界,如此美妙 +咔咔咔咔,心无旁骛 +沉浸在知识的海洋,自在飞翔 +窗外微风轻拂,阳光洒满书桌 +咔咔我在这里,与文字共舞 +每个字每个句,都像是音符 +奏出心灵的乐章,如此动听 +咔咔咔咔,独自享受 +书中的世界,如此美妙 +咔咔咔咔,心无旁骛 +沉浸在知识的海洋,自在飞翔 +(翻书声...风铃声...咔咔的呼吸声...) \ No newline at end of file diff --git a/Capybara music/lyrics/夜深了窗外下着小雨盖着被子准备入睡_1770627405.txt b/Capybara music/lyrics/夜深了窗外下着小雨盖着被子准备入睡_1770627405.txt new file mode 100644 index 0000000..7c9342a --- /dev/null +++ b/Capybara music/lyrics/夜深了窗外下着小雨盖着被子准备入睡_1770627405.txt @@ -0,0 +1,8 @@ +[verse] +窗外细雨轻敲窗, +被窝里温暖如常。 +[chorus] +咔咔咔咔,梦乡近了, +小雨伴我入眠床。 +[outro] +(雨声和咔咔的呼吸声...) \ No newline at end of file diff --git a/Capybara music/lyrics/慵懒的午后泡在温泉里听水声发呆什么都不想_1770627905.txt b/Capybara music/lyrics/慵懒的午后泡在温泉里听水声发呆什么都不想_1770627905.txt new file mode 100644 index 0000000..d090562 --- /dev/null +++ b/Capybara music/lyrics/慵懒的午后泡在温泉里听水声发呆什么都不想_1770627905.txt @@ -0,0 +1 @@ +[Inst] \ No newline at end of file diff --git a/Capybara music/lyrics/洗脑咔咔舞_1770631313.txt b/Capybara music/lyrics/洗脑咔咔舞_1770631313.txt new file mode 100644 index 0000000..8b0c054 --- /dev/null +++ b/Capybara music/lyrics/洗脑咔咔舞_1770631313.txt @@ -0,0 +1,20 @@ +咔咔咔咔来跳舞,魔性旋律不停步 +跟着节奏摇摆身,洗脑神曲不放手 +重复的旋律像魔法,让人听了就上瘾 +咔咔咔咔的魔力,谁也挡不住 +洗脑咔咔舞,洗脑咔咔舞 +魔性的旋律,让人停不下来 +洗脑咔咔舞,洗脑咔咔舞 +跟着咔咔一起跳,快乐无边 +每个节拍都精准,咔咔的舞步最迷人 +不管走到哪里去,都能听到这魔音 +咔咔的舞蹈最独特,让人看了就想学 +洗脑神曲的魅力,就是让人忘不掉 +洗脑咔咔舞,洗脑咔咔舞 +魔性的旋律,让人停不下来 +洗脑咔咔舞,洗脑咔咔舞 +跟着咔咔一起跳,快乐无边 +咔咔咔咔,魔性洗脑舞 +重复的节奏,快乐的旋律 +洗脑咔咔舞,洗脑咔咔舞 +让快乐无限循环,直到永远 \ No newline at end of file diff --git a/Capybara music/lyrics/温泉发呆曲_1770628235.txt b/Capybara music/lyrics/温泉发呆曲_1770628235.txt new file mode 100644 index 0000000..deb4c0e --- /dev/null +++ b/Capybara music/lyrics/温泉发呆曲_1770628235.txt @@ -0,0 +1,26 @@ +[verse 1]\n + 懒懒的午后阳光暖,\n + 
温泉里我泡得欢。\n + 水声潺潺耳边响,\n + 什么都不想干。\n + \n + [chorus]\n + 咔咔咔咔,悠然自得,\n + 水波轻摇,心情舒畅。\n + 咔咔咔咔,享受此刻,\n + 懒懒午后,最是惬意。\n + \n + [verse 2]\n + 看着云朵慢慢飘,\n + 心思像水一样柔。\n + 闭上眼,世界都静了,\n + 只有我和这温泉。\n + \n + [chorus]\n + 咔咔咔咔,悠然自得,\n + 水波轻摇,心情舒畅。\n + 咔咔咔咔,享受此刻,\n + 懒懒午后,最是惬意。\n + \n + [outro]\n + (水声渐渐远去...) \ No newline at end of file diff --git a/Capybara music/lyrics/温泉发呆曲_1770630396.txt b/Capybara music/lyrics/温泉发呆曲_1770630396.txt new file mode 100644 index 0000000..931cb55 --- /dev/null +++ b/Capybara music/lyrics/温泉发呆曲_1770630396.txt @@ -0,0 +1,21 @@ +慵懒午后阳光暖,温泉里我发呆 + +水声潺潺耳边响,思绪飘向云外 + +咔咔咔咔,泡在温泉 + +心无杂念,享受此刻安宁 + +什么都不想去做,只想静静享受 + +水波轻抚我的背,世界变得温柔 + +咔咔咔咔,泡在温泉 + +心无杂念,享受此刻安宁 + +(水花声...) + +咔咔的午后,慵懒又自在 + +温泉里的世界,只有我和水声 \ No newline at end of file diff --git a/Capybara music/lyrics/温泉发呆曲_1770630635.txt b/Capybara music/lyrics/温泉发呆曲_1770630635.txt new file mode 100644 index 0000000..2e913e1 --- /dev/null +++ b/Capybara music/lyrics/温泉发呆曲_1770630635.txt @@ -0,0 +1,33 @@ +懒懒的午后阳光暖, + +温泉里我泡得欢。 + +水声潺潺耳边响, + +什么都不想干。 + +咔咔咔咔,发呆好时光, + +懒懒的我,享受这阳光。 + +咔咔咔咔,让思绪飘扬, + +在温泉里,找到我的天堂。 + +想法像泡泡一样浮上来, + +又慢慢沉下去,消失在水里。 + +时间仿佛静止,我自在如鱼, + +在这温暖的怀抱里。 + +咔咔咔咔,发呆好时光, + +懒懒的我,享受这阳光。 + +咔咔咔咔,让思绪飘扬, + +在温泉里,找到我的天堂。 + +(水声渐渐远去...) \ No newline at end of file diff --git a/Capybara music/lyrics/温泉发呆曲_1770639509.txt b/Capybara music/lyrics/温泉发呆曲_1770639509.txt new file mode 100644 index 0000000..f1457e9 --- /dev/null +++ b/Capybara music/lyrics/温泉发呆曲_1770639509.txt @@ -0,0 +1,33 @@ +懒懒的午后阳光暖, + +温泉里我泡得欢。 + +水声潺潺耳边响, + +什么都不想干。 + +咔咔咔咔,发呆真好, + +懒懒的我,享受这秒。 + +水波轻摇,心也飘, + +咔咔世界,别来无恙。 + +想着云卷云又舒, + +温泉里的我多舒服。 + +时间慢慢流,不急不徐, + +咔咔的梦,轻轻浮。 + +咔咔咔咔,发呆真好, + +懒懒的我,享受这秒。 + +水波轻摇,心也飘, + +咔咔世界,别来无恙。 + +(水声渐渐远去...) 
\ No newline at end of file diff --git a/Capybara music/lyrics/温泉里的咔咔_1770730481.txt b/Capybara music/lyrics/温泉里的咔咔_1770730481.txt new file mode 100644 index 0000000..7919ee9 --- /dev/null +++ b/Capybara music/lyrics/温泉里的咔咔_1770730481.txt @@ -0,0 +1,37 @@ +懒懒的午后阳光暖, + +温泉里我泡得欢。 + +水声潺潺耳边响, + +什么都不想干。 + +咔咔咔咔,悠然自得, + +水波荡漾心情悦。 + +咔咔咔咔,闭上眼, + +享受这刻的宁静。 + +想象自己是条鱼, + +在水里自由游来游去。 + +没有烦恼没有压力, + +只有我和这温泉池。 + +咔咔咔咔,悠然自得, + +水波荡漾心情悦。 + +咔咔咔咔,闭上眼, + +享受这刻的宁静。 + +(水花声...) + +咔咔,慵懒午后, + +水中世界最逍遥。 \ No newline at end of file diff --git a/Capybara music/lyrics/草地上的咔咔_1770628910.txt b/Capybara music/lyrics/草地上的咔咔_1770628910.txt new file mode 100644 index 0000000..b365ebd --- /dev/null +++ b/Capybara music/lyrics/草地上的咔咔_1770628910.txt @@ -0,0 +1,26 @@ +[verse 1]\n" + "阳光洒满草地绿\n" + "咔咔奔跑心情舒畅\n" + "风儿轻拂过脸庞\n" + "快乐就像泡泡糖\n" + "\n" + "[chorus]\n" + "咔咔咔咔 快乐无边\n" + "草地上的我自由自在\n" + "阳光下的影子拉得好长\n" + "咔咔咔咔 快乐无边\n" + "\n" + "[verse 2]\n" + "蝴蝶飞舞花儿笑\n" + "咔咔摇摆尾巴摇\n" + "每一步都跳着舞\n" + "生活就像一首歌\n" + "\n" + "[chorus]\n" + "咔咔咔咔 快乐无边\n" + "草地上的我自由自在\n" + "阳光下的影子拉得好长\n" + "咔咔咔咔 快乐无边\n" + "\n" + "[outro]\n" + "(草地上咔咔的笑声...) \ No newline at end of file diff --git a/Capybara music/lyrics/草地上的咔咔_1770629673.txt b/Capybara music/lyrics/草地上的咔咔_1770629673.txt new file mode 100644 index 0000000..76226a8 --- /dev/null +++ b/Capybara music/lyrics/草地上的咔咔_1770629673.txt @@ -0,0 +1,17 @@ +阳光洒满地 草香扑鼻来 +咔咔在草地上 跑得飞快 +风儿轻轻吹 摇曳着花海 +心情像彩虹 七彩斑斓开 +咔咔咔咔 快乐无边 +草地上的我 自由自在 +阳光下的梦 美好无限 +咔咔咔咔 快乐无边 +蝴蝶在飞舞 蜜蜂在歌唱 +咔咔跟着它们 一起欢唱 +天空蓝得像画 没有一丝阴霾 +咔咔的心里 只有满满的爱 +咔咔咔咔 快乐无边 +草地上的我 自由自在 +阳光下的梦 美好无限 +咔咔咔咔 快乐无边 +(草地上咔咔的笑声...) 
\ No newline at end of file diff --git a/Capybara music/lyrics/草地上的咔咔_1770640911.txt b/Capybara music/lyrics/草地上的咔咔_1770640911.txt new file mode 100644 index 0000000..b069c3f --- /dev/null +++ b/Capybara music/lyrics/草地上的咔咔_1770640911.txt @@ -0,0 +1,19 @@ +阳光洒满地 绿草如茵间 +咔咔跑起来 心情像飞燕 +风儿轻拂过 花香满径边 +快乐如此简单 每一步都新鲜 +咔咔咔咔 快乐咔咔 +草地上的我 自由自在 +阳光下的舞 轻松又欢快 +咔咔咔咔 快乐咔咔 +无忧无虑的我 最爱这蓝天 +蝴蝶翩翩起 蜜蜂忙采蜜 +咔咔我最棒 每个瞬间都美丽 +朋友在旁边 笑声传千里 +这世界多美好 有你有我有草地 +咔咔咔咔 快乐咔咔 +草地上的我 自由自在 +阳光下的舞 轻松又欢快 +咔咔咔咔 快乐咔咔 +无忧无虑的我 最爱这蓝天 +(草地上咔咔的笑声...) \ No newline at end of file diff --git a/Capybara music/lyrics/阳光灿烂的日子在草地上奔跑撒欢心情超级好_1770626906.txt b/Capybara music/lyrics/阳光灿烂的日子在草地上奔跑撒欢心情超级好_1770626906.txt new file mode 100644 index 0000000..0b50271 --- /dev/null +++ b/Capybara music/lyrics/阳光灿烂的日子在草地上奔跑撒欢心情超级好_1770626906.txt @@ -0,0 +1,8 @@ +[verse] +阳光洒满草地,我跑得飞快 +心情像彩虹,七彩斑斓真美 +[chorus] +咔咔咔咔,快乐无边 +在阳光下,自由自在 +[outro] +(风吹草低见水豚) \ No newline at end of file diff --git a/Capybara music/lyrics/阳光灿烂的日子在草地上奔跑撒欢心情超级好_1770639287.txt b/Capybara music/lyrics/阳光灿烂的日子在草地上奔跑撒欢心情超级好_1770639287.txt new file mode 100644 index 0000000..d090562 --- /dev/null +++ b/Capybara music/lyrics/阳光灿烂的日子在草地上奔跑撒欢心情超级好_1770639287.txt @@ -0,0 +1 @@ +[Inst] \ No newline at end of file diff --git a/Capybara music/书房咔咔茶_1770634690.mp3 b/Capybara music/书房咔咔茶_1770634690.mp3 new file mode 100644 index 0000000..835700b Binary files /dev/null and b/Capybara music/书房咔咔茶_1770634690.mp3 differ diff --git a/Capybara music/书房咔咔茶_1770637242.mp3 b/Capybara music/书房咔咔茶_1770637242.mp3 new file mode 100644 index 0000000..d5912b7 Binary files /dev/null and b/Capybara music/书房咔咔茶_1770637242.mp3 differ diff --git a/Capybara music/夜深了窗外下着小雨盖着被子准备入睡_1770627405.mp3 b/Capybara music/夜深了窗外下着小雨盖着被子准备入睡_1770627405.mp3 new file mode 100644 index 0000000..38c6ca1 Binary files /dev/null and b/Capybara music/夜深了窗外下着小雨盖着被子准备入睡_1770627405.mp3 differ diff --git a/Capybara music/惊喜咔咔派_1770642290.mp3 b/Capybara music/惊喜咔咔派_1770642290.mp3 new file mode 100644 index 0000000..47b7f4a 
Binary files /dev/null and b/Capybara music/惊喜咔咔派_1770642290.mp3 differ diff --git a/Capybara music/慵懒的午后泡在温泉里听水声发呆什么都不想_1770627905.mp3 b/Capybara music/慵懒的午后泡在温泉里听水声发呆什么都不想_1770627905.mp3 new file mode 100644 index 0000000..f2747e2 Binary files /dev/null and b/Capybara music/慵懒的午后泡在温泉里听水声发呆什么都不想_1770627905.mp3 differ diff --git a/Capybara music/洗脑咔咔舞_1770631313.mp3 b/Capybara music/洗脑咔咔舞_1770631313.mp3 new file mode 100644 index 0000000..8c51742 Binary files /dev/null and b/Capybara music/洗脑咔咔舞_1770631313.mp3 differ diff --git a/Capybara music/温泉发呆曲_1770628235.mp3 b/Capybara music/温泉发呆曲_1770628235.mp3 new file mode 100644 index 0000000..d88634b Binary files /dev/null and b/Capybara music/温泉发呆曲_1770628235.mp3 differ diff --git a/Capybara music/温泉发呆曲_1770630396.mp3 b/Capybara music/温泉发呆曲_1770630396.mp3 new file mode 100644 index 0000000..5023065 Binary files /dev/null and b/Capybara music/温泉发呆曲_1770630396.mp3 differ diff --git a/Capybara music/温泉发呆曲_1770630635.mp3 b/Capybara music/温泉发呆曲_1770630635.mp3 new file mode 100644 index 0000000..a31ecc8 Binary files /dev/null and b/Capybara music/温泉发呆曲_1770630635.mp3 differ diff --git a/Capybara music/温泉发呆曲_1770639509.mp3 b/Capybara music/温泉发呆曲_1770639509.mp3 new file mode 100644 index 0000000..2f77a83 Binary files /dev/null and b/Capybara music/温泉发呆曲_1770639509.mp3 differ diff --git a/Capybara music/温泉里的咔咔_1770730481.mp3 b/Capybara music/温泉里的咔咔_1770730481.mp3 new file mode 100644 index 0000000..d07f22a Binary files /dev/null and b/Capybara music/温泉里的咔咔_1770730481.mp3 differ diff --git a/Capybara music/草地上的咔咔_1770628910.mp3 b/Capybara music/草地上的咔咔_1770628910.mp3 new file mode 100644 index 0000000..ebbde2c Binary files /dev/null and b/Capybara music/草地上的咔咔_1770628910.mp3 differ diff --git a/Capybara music/草地上的咔咔_1770629673.mp3 b/Capybara music/草地上的咔咔_1770629673.mp3 new file mode 100644 index 0000000..44f4ec4 Binary files /dev/null and b/Capybara music/草地上的咔咔_1770629673.mp3 differ diff --git a/Capybara 
music/草地上的咔咔_1770640911.mp3 b/Capybara music/草地上的咔咔_1770640911.mp3 new file mode 100644 index 0000000..2cab36e Binary files /dev/null and b/Capybara music/草地上的咔咔_1770640911.mp3 differ diff --git a/Capybara music/阳光灿烂的日子在草地上奔跑撒欢心情超级好_1770626906.mp3 b/Capybara music/阳光灿烂的日子在草地上奔跑撒欢心情超级好_1770626906.mp3 new file mode 100644 index 0000000..8f48317 Binary files /dev/null and b/Capybara music/阳光灿烂的日子在草地上奔跑撒欢心情超级好_1770626906.mp3 differ diff --git a/Capybara music/阳光灿烂的日子在草地上奔跑撒欢心情超级好_1770639287.mp3 b/Capybara music/阳光灿烂的日子在草地上奔跑撒欢心情超级好_1770639287.mp3 new file mode 100644 index 0000000..b16cbba Binary files /dev/null and b/Capybara music/阳光灿烂的日子在草地上奔跑撒欢心情超级好_1770639287.mp3 differ diff --git a/Capybara stories/海盗找朋友_1770647563.txt b/Capybara stories/海盗找朋友_1770647563.txt new file mode 100644 index 0000000..123215a --- /dev/null +++ b/Capybara stories/海盗找朋友_1770647563.txt @@ -0,0 +1,11 @@ +# 海盗找朋友 + +在蓝色的大海上,有一艘小小的海盗船,船上只有一个小海盗。他戴着歪歪的海盗帽,举着塑料做的小钩子手,每天对着海浪喊:“谁来和我玩呀?” + +这天,小海盗的船被海浪冲到了一座彩虹岛。岛上的沙滩上,躺着一个会发光的贝壳。小海盗刚捡起贝壳,贝壳突然“叮咚”响了一声,跳出一只圆滚滚的小海豚! + +“哇!你是我的宝藏吗?”小海盗举着贝壳问。小海豚摇摇头,用尾巴拍了拍海水:“我带你去找真正的宝藏!”它驮着小海盗游向海底,那里有一个藏着星星的洞穴。 + +洞穴里,小海豚拿出了一个会唱歌的海螺:“这是友谊海螺,对着它喊朋友的名字,就会有惊喜哦!”小海盗对着海螺喊:“我的朋友!”突然,从海螺里钻出一群小螃蟹,举着彩色的小旗子,还有一只会吹泡泡的章鱼! + +原来,小海豚早就听说小海盗很孤单,特意用友谊海螺召集了伙伴们。现在,小海盗的船上每天都飘着笑声,他再也不是孤单的小海盗啦! 
\ No newline at end of file diff --git a/airhub_app/lib/pages/music_creation_page.dart b/airhub_app/lib/pages/music_creation_page.dart index 5145f3b..ac90f91 100644 --- a/airhub_app/lib/pages/music_creation_page.dart +++ b/airhub_app/lib/pages/music_creation_page.dart @@ -417,9 +417,18 @@ class _MusicCreationPageState extends State // Actually play or pause audio try { if (_isPlaying) { + // Show now-playing bubble immediately (before await) + _playStickyText = '正在播放: ${_playlist[_currentTrackIndex].title}'; + setState(() { + _speechText = _playStickyText; + _speechVisible = true; + }); await _audioPlayer.play(); } else { await _audioPlayer.pause(); + // Hide bubble on pause + _playStickyText = null; + setState(() => _speechVisible = false); } } catch (e) { debugPrint('Playback error: $e'); @@ -428,6 +437,7 @@ class _MusicCreationPageState extends State // Revert UI state on error setState(() { _isPlaying = false; + _playStickyText = null; _vinylSpinController.stop(); _tonearmController.reverse(); }); @@ -474,7 +484,8 @@ class _MusicCreationPageState extends State } } - _showSpeech('正在播放: ${_playlist[index].title}'); + _playStickyText = '正在播放: ${_playlist[index].title}'; + _showSpeech(_playStickyText!, duration: 0); } // ── Mood Selection ── @@ -646,6 +657,7 @@ class _MusicCreationPageState extends State // ── Speech Bubble ── String? _genStickyText; // Persistent text during generation + String? 
_playStickyText; // Persistent text during playback void _showSpeech(String text, {int duration = 3000}) { // If this is a generation-related message (duration == 0), save it as sticky @@ -667,6 +679,12 @@ class _MusicCreationPageState extends State _speechText = _genStickyText; _speechVisible = true; }); + } else if (_isPlaying && _playStickyText != null) { + // If playing, restore the now-playing message + setState(() { + _speechText = _playStickyText; + _speechVisible = true; + }); } else { setState(() => _speechVisible = false); } @@ -800,7 +818,9 @@ class _MusicCreationPageState extends State child: _buildVinylWrapper(), ), // Speech bubble — positioned top-right - if (_speechVisible && _speechText != null) + // Always show during playback; otherwise use _speechVisible + if ((_speechVisible && _speechText != null) || + (_isPlaying && _playStickyText != null)) Positioned( top: 0, right: -24, // HTML: right: -24px @@ -1067,12 +1087,18 @@ class _MusicCreationPageState extends State Widget _buildSpeechBubble() { // HTML: .capy-speech-bubble with clip-path iMessage-style tail at bottom-left const tailH = 8.0; + // During playback, always show the playing text even if _speechVisible is false + final bool showBubble = _speechVisible || (_isPlaying && _playStickyText != null); + final String bubbleText = (_isPlaying && _playStickyText != null && !_speechVisible) + ? _playStickyText! + : (_speechText ?? ''); + return AnimatedOpacity( duration: const Duration(milliseconds: 200), - opacity: _speechVisible ? 1.0 : 0.0, + opacity: showBubble ? 1.0 : 0.0, child: AnimatedScale( duration: const Duration(milliseconds: 350), - scale: _speechVisible ? 1.0 : 0.7, + scale: showBubble ? 1.0 : 0.7, curve: const Cubic(0.34, 1.56, 0.64, 1.0), alignment: Alignment.bottomLeft, child: Column( @@ -1098,7 +1124,7 @@ class _MusicCreationPageState extends State ], ), child: Text( - _speechText ?? 
'', + bubbleText, style: GoogleFonts.dmSans( fontSize: 12.5, fontWeight: FontWeight.w500, @@ -1485,6 +1511,7 @@ class _MusicCreationPageState extends State builder: (ctx) => _PlaylistModalContent( tracks: _playlist, currentIndex: _currentTrackIndex, + isPlaying: _isPlaying, onSelect: (index) { Navigator.pop(ctx); _playTrack(index); @@ -1921,17 +1948,53 @@ class _InputModalContent extends StatelessWidget { } /// Playlist Modal — HTML: .playlist-container -class _PlaylistModalContent extends StatelessWidget { +class _PlaylistModalContent extends StatefulWidget { final List<_Track> tracks; final int currentIndex; + final bool isPlaying; final ValueChanged onSelect; const _PlaylistModalContent({ required this.tracks, required this.currentIndex, + required this.isPlaying, required this.onSelect, }); + @override + State<_PlaylistModalContent> createState() => _PlaylistModalContentState(); +} + +class _PlaylistModalContentState extends State<_PlaylistModalContent> + with SingleTickerProviderStateMixin { + late AnimationController _waveController; + + @override + void initState() { + super.initState(); + _waveController = AnimationController( + vsync: this, + duration: const Duration(milliseconds: 800), + ); + if (widget.isPlaying) _waveController.repeat(reverse: true); + } + + @override + void didUpdateWidget(covariant _PlaylistModalContent oldWidget) { + super.didUpdateWidget(oldWidget); + if (widget.isPlaying && !_waveController.isAnimating) { + _waveController.repeat(reverse: true); + } else if (!widget.isPlaying && _waveController.isAnimating) { + _waveController.stop(); + } + } + + @override + void dispose() { + _waveController.dispose(); + super.dispose(); + } + @override Widget build(BuildContext context) { final screenWidth = MediaQuery.of(context).size.width; @@ -2015,23 +2078,39 @@ class _PlaylistModalContent extends StatelessWidget { mainAxisSpacing: 8, childAspectRatio: 0.75, ), - itemCount: tracks.length, + itemCount: widget.tracks.length, itemBuilder: 
(context, index) { - final track = tracks[index]; - final isPlaying = index == currentIndex; + final track = widget.tracks[index]; + final isCurrent = index == widget.currentIndex; + final isPlaying = isCurrent && widget.isPlaying; // HTML: .record-slot { background: rgba(0,0,0,0.03); border-radius: 12px; // padding: 10px 4px; border: 1px solid rgba(0,0,0,0.02); } return GestureDetector( - onTap: () => onSelect(index), + onTap: () => widget.onSelect(index), child: Container( padding: const EdgeInsets.symmetric(horizontal: 4, vertical: 10), decoration: BoxDecoration( - color: Colors.black.withOpacity(0.03), + // Current track: warm golden background; others: subtle grey + color: isCurrent + ? const Color(0xFFFDF3E3) + : Colors.black.withOpacity(0.03), borderRadius: BorderRadius.circular(12), border: Border.all( - color: Colors.black.withOpacity(0.02)), + color: isCurrent + ? const Color(0xFFECCFA8).withOpacity(0.6) + : Colors.black.withOpacity(0.02), + width: isCurrent ? 1.5 : 1.0), + boxShadow: isCurrent + ? [ + BoxShadow( + color: const Color(0xFFECCFA8).withOpacity(0.25), + blurRadius: 8, + offset: const Offset(0, 2), + ), + ] + : null, ), child: Column( children: [ @@ -2043,10 +2122,8 @@ class _PlaylistModalContent extends StatelessWidget { decoration: BoxDecoration( shape: BoxShape.circle, color: const Color(0xFF18181B), - // HTML: .record-item.playing .record-cover-wrapper - // { box-shadow: 0 0 0 2px #ECCFA8, ... 
} boxShadow: [ - if (isPlaying) + if (isCurrent) const BoxShadow( color: Color(0xFFECCFA8), spreadRadius: 2, @@ -2096,23 +2173,57 @@ class _PlaylistModalContent extends StatelessWidget { ), ), ), + // Sound wave overlay for playing track + if (isPlaying) + Center( + child: AnimatedBuilder( + animation: _waveController, + builder: (context, child) { + return CustomPaint( + painter: _MiniWavePainter( + progress: _waveController.value, + ), + size: const Size(28, 20), + ); + }, + ), + ), ], ), ), ), ), const SizedBox(height: 8), - // HTML: .record-title { font-size: 12px; font-weight: 500; } - Text( - track.title, - style: GoogleFonts.dmSans( - fontSize: 12, - fontWeight: FontWeight.w500, - color: const Color(0xFF374151), - ), - textAlign: TextAlign.center, - maxLines: 1, - overflow: TextOverflow.ellipsis, + // Title with playing indicator + Row( + mainAxisAlignment: MainAxisAlignment.center, + mainAxisSize: MainAxisSize.min, + children: [ + if (isCurrent) + Padding( + padding: const EdgeInsets.only(right: 3), + child: Icon( + isPlaying ? Icons.volume_up_rounded : Icons.volume_off_rounded, + size: 12, + color: const Color(0xFFECCFA8), + ), + ), + Flexible( + child: Text( + track.title, + style: GoogleFonts.dmSans( + fontSize: 12, + fontWeight: isCurrent ? FontWeight.w600 : FontWeight.w500, + color: isCurrent + ? 
const Color(0xFFB8860B) + : const Color(0xFF374151), + ), + textAlign: TextAlign.center, + maxLines: 1, + overflow: TextOverflow.ellipsis, + ), + ), + ], ), ], ), @@ -2127,3 +2238,39 @@ class _PlaylistModalContent extends StatelessWidget { } } +/// Mini sound wave painter for playlist playing indicator +class _MiniWavePainter extends CustomPainter { + final double progress; + + _MiniWavePainter({required this.progress}); + + @override + void paint(Canvas canvas, Size size) { + final paint = Paint() + ..color = const Color(0xFFECCFA8) + ..strokeWidth = 2.5 + ..strokeCap = StrokeCap.round; + + const barCount = 4; + final barWidth = size.width / (barCount * 2 - 1); + final centerY = size.height / 2; + + for (int i = 0; i < barCount; i++) { + // Each bar has a different phase offset for wave effect + final phase = (progress + i * 0.25) % 1.0; + final height = size.height * (0.3 + 0.7 * (0.5 + 0.5 * sin(phase * 3.14159 * 2))); + final x = i * barWidth * 2 + barWidth / 2; + + canvas.drawLine( + Offset(x, centerY - height / 2), + Offset(x, centerY + height / 2), + paint, + ); + } + } + + @override + bool shouldRepaint(covariant _MiniWavePainter oldDelegate) => + oldDelegate.progress != progress; +} + diff --git a/airhub_app/lib/pages/story_detail_page.dart b/airhub_app/lib/pages/story_detail_page.dart index dfe34d4..fa96b33 100644 --- a/airhub_app/lib/pages/story_detail_page.dart +++ b/airhub_app/lib/pages/story_detail_page.dart @@ -1,9 +1,12 @@ +import 'dart:async'; import 'dart:ui' as ui; import 'package:flutter/material.dart'; -import 'package:flutter_svg/flutter_svg.dart'; +import 'package:just_audio/just_audio.dart'; import '../theme/design_tokens.dart'; import '../widgets/gradient_button.dart'; +import '../widgets/pill_progress_button.dart'; +import '../services/tts_service.dart'; import 'story_loading_page.dart'; enum StoryMode { generated, read } @@ -30,6 +33,14 @@ class _StoryDetailPageState extends State bool _hasGeneratedVideo = false; bool _isLoadingVideo = 
false; + // TTS — uses global TTSService singleton + final TTSService _ttsService = TTSService.instance; + final AudioPlayer _audioPlayer = AudioPlayer(); + StreamSubscription? _positionSub; + StreamSubscription? _playerStateSub; + Duration _audioDuration = Duration.zero; + Duration _audioPosition = Duration.zero; + // Genie Suck Animation bool _isSaving = false; AnimationController? _genieController; @@ -41,9 +52,9 @@ class _StoryDetailPageState extends State 'content': """ 在遥远的银河系边缘,有一个被星云包裹的神秘茶馆。今天,这里迎来了两位特殊的客人:刚执行完火星探测任务的宇航员波波,和正在追捕暗影怪兽的忍者小次郎。 -“这儿的重力好像有点不对劲?”波波飘在半空中,试图抓住飞来飞去的茶杯。小次郎则冷静地倒挂在天花板上,手里紧握着一枚手里剑——其实那是用来切月饼的。 +"这儿的重力好像有点不对劲?"波波飘在半空中,试图抓住飞来飞去的茶杯。小次郎则冷静地倒挂在天花板上,手里紧握着一枚手里剑——其实那是用来切月饼的。 -突然,桌上的魔法茶壶“噗”地一声喷出了七彩烟雾,一只会说话的卡皮巴拉钻了出来:“别打架,别打架,喝了这杯‘银河气泡茶’,我们都是好朋友!” +突然,桌上的魔法茶壶"噗"地一声喷出了七彩烟雾,一只会说话的卡皮巴拉钻了出来:"别打架,别打架,喝了这杯'银河气泡茶',我们都是好朋友!" 于是,宇宙中最奇怪的组合诞生了。他们决定,下一站,去黑洞边缘钓星星。 """, @@ -54,7 +65,6 @@ class _StoryDetailPageState extends State Map _initStory() { final source = widget.story ?? _defaultStory; final result = Map.from(source); - // 兜底:如果没有 content 就用默认故事内容 result['content'] ??= _defaultStory['content']; result['title'] ??= _defaultStory['title']; return result; @@ -64,18 +74,171 @@ class _StoryDetailPageState extends State void initState() { super.initState(); _currentStory = _initStory(); + + // Subscribe to TTSService changes + _ttsService.addListener(_onTTSChanged); + + // Listen to audio player state + _playerStateSub = _audioPlayer.playerStateStream.listen((state) { + if (!mounted) return; + if (state.processingState == ProcessingState.completed) { + setState(() { + _isPlaying = false; + _audioPosition = Duration.zero; + }); + } + }); + + // Listen to playback position for ring progress + _positionSub = _audioPlayer.positionStream.listen((pos) { + if (!mounted) return; + setState(() => _audioPosition = pos); + }); + + // Listen to duration changes + _audioPlayer.durationStream.listen((dur) { + if (!mounted || dur == null) return; + setState(() => 
_audioDuration = dur); + }); + + // Check if audio already exists (via TTSService) + final title = _currentStory['title'] as String? ?? ''; + _ttsService.checkExistingAudio(title); + } + + void _onTTSChanged() { + if (!mounted) return; + + // Auto-play when generation completes + if (_ttsService.justCompleted && + _ttsService.hasAudioFor(_currentStory['title'] ?? '')) { + // Delay slightly to let the completion flash play + Future.delayed(const Duration(milliseconds: 1500), () { + if (mounted) { + _ttsService.clearJustCompleted(); + final route = ModalRoute.of(context); + if (route != null && route.isCurrent) { + _playAudio(); + } + } + }); + } + + setState(() {}); } @override void dispose() { + _ttsService.removeListener(_onTTSChanged); + _positionSub?.cancel(); + _playerStateSub?.cancel(); + _audioPlayer.dispose(); _genieController?.dispose(); super.dispose(); } - /// Trigger Genie Suck animation matching HTML: - /// CSS: animation: genieSuck 0.8s cubic-bezier(0.6, -0.28, 0.735, 0.045) forwards - /// Phase 1 (0→15%): card scales up to 1.05 (tension) - /// Phase 2 (15%→100%): card shrinks to 0.05, moves toward bottom, blurs & fades + // ── TTS button logic ── + + bool _audioLoaded = false; // Track if audio URL is loaded in player + String? _loadedUrl; // Which URL is currently loaded + + TTSButtonState get _ttsState { + final title = _currentStory['title'] as String? ?? 
''; + + if (_ttsService.error != null && + !_ttsService.isGenerating && + _ttsService.audioUrl == null) { + return TTSButtonState.error; + } + if (_ttsService.isGeneratingFor(title)) { + return TTSButtonState.generating; + } + if (_ttsService.justCompleted && _ttsService.hasAudioFor(title)) { + return TTSButtonState.completed; + } + if (_isPlaying) { + return TTSButtonState.playing; + } + if (_ttsService.hasAudioFor(title) && !_audioLoaded) { + return TTSButtonState.ready; // audio ready, not yet played -> show "播放" + } + if (_audioLoaded) { + return TTSButtonState.paused; // was playing, now paused -> show "继续" + } + return TTSButtonState.idle; + } + + double get _ttsProgress { + final state = _ttsState; + switch (state) { + case TTSButtonState.generating: + return _ttsService.progress; + case TTSButtonState.ready: + return 0.0; + case TTSButtonState.completed: + return 1.0; + case TTSButtonState.playing: + case TTSButtonState.paused: + if (_audioDuration.inMilliseconds > 0) { + return (_audioPosition.inMilliseconds / _audioDuration.inMilliseconds) + .clamp(0.0, 1.0); + } + return 0.0; + default: + return 0.0; + } + } + + void _handleTTSTap() { + final state = _ttsState; + switch (state) { + case TTSButtonState.idle: + case TTSButtonState.error: + final title = _currentStory['title'] as String? ?? ''; + final content = _currentStory['content'] as String? ?? ''; + _ttsService.generate(title: title, content: content); + break; + case TTSButtonState.generating: + break; + case TTSButtonState.ready: + case TTSButtonState.completed: + case TTSButtonState.paused: + _playAudio(); + break; + case TTSButtonState.playing: + _audioPlayer.pause(); + setState(() => _isPlaying = false); + break; + } + } + + Future<void> _playAudio() async { + final title = _currentStory['title'] as String? ?? ''; + final url = _ttsService.hasAudioFor(title) ?
_ttsService.audioUrl : null; + if (url == null) return; + + try { + // If already loaded the same URL, seek to saved position and resume + if (_audioLoaded && _loadedUrl == url) { + await _audioPlayer.seek(_audioPosition); + _audioPlayer.play(); + } else { + // Load new URL and play from start + await _audioPlayer.setUrl(url); + _audioLoaded = true; + _loadedUrl = url; + _audioPlayer.play(); + } + if (mounted) { + setState(() => _isPlaying = true); + } + } catch (e) { + debugPrint('Audio play error: $e'); + } + } + + // ── Genie Suck Animation ── + void _triggerGenieSuck() { if (_isSaving) return; @@ -84,7 +247,6 @@ class _StoryDetailPageState extends State duration: const Duration(milliseconds: 800), ); - // Calculate how far the card should travel downward (toward the save button) final screenHeight = MediaQuery.of(context).size.height; _targetDY = screenHeight * 0.35; @@ -94,23 +256,20 @@ class _StoryDetailPageState extends State } }); - setState(() { - _isSaving = true; - }); + setState(() => _isSaving = true); _genieController!.forward(); } + // ── Build ── + @override Widget build(BuildContext context) { return Scaffold( - backgroundColor: AppColors.storyBackground, // #FDF9F3 + backgroundColor: AppColors.storyBackground, body: SafeArea( child: Column( children: [ - // Header + Content Card — animated together during genie suck Expanded(child: _buildAnimatedBody()), - - // Footer _buildFooter(), ], ), @@ -118,7 +277,6 @@ class _StoryDetailPageState extends State ); } - /// Wraps header + content card in genie suck animation Widget _buildAnimatedBody() { Widget body = Column( children: [ @@ -132,7 +290,7 @@ class _StoryDetailPageState extends State return AnimatedBuilder( animation: _genieController!, builder: (context, child) { - final t = _genieController!.value; // linear 0→1 + final t = _genieController!.value; double scale; double translateY; @@ -140,14 +298,12 @@ class _StoryDetailPageState extends State double blur; if (t <= 0.15) { - // Phase 1: 
tension — whole area scales up slightly final p = t / 0.15; scale = 1.0 + 0.05 * Curves.easeOut.transform(p); translateY = 0; opacity = 1.0; blur = 0; } else { - // Phase 2: suck — shrinks, moves down, fades and blurs final p = ((t - 0.15) / 0.85).clamp(0.0, 1.0); final curved = const Cubic(0.6, -0.28, 0.735, 0.045).transform(p); @@ -209,7 +365,7 @@ class _StoryDetailPageState extends State ), ), Text( - _currentStory['title'], + _currentStory['title'] ?? '', style: const TextStyle( fontSize: 17, fontWeight: FontWeight.w600, @@ -227,9 +383,9 @@ class _StoryDetailPageState extends State child: Row( mainAxisAlignment: MainAxisAlignment.center, children: [ - _buildTabBtn('📄 故事', 'text'), + _buildTabBtn('故事', 'text'), const SizedBox(width: 8), - _buildTabBtn('🎬 绘本', 'video'), + _buildTabBtn('绘本', 'video'), ], ), ); @@ -238,11 +394,7 @@ class _StoryDetailPageState extends State Widget _buildTabBtn(String label, String key) { bool isActive = _activeTab == key; return GestureDetector( - onTap: () { - setState(() { - _activeTab = key; - }); - }, + onTap: () => setState(() => _activeTab = key), child: Container( padding: const EdgeInsets.symmetric(horizontal: 16, vertical: 8), decoration: BoxDecoration( @@ -271,7 +423,6 @@ class _StoryDetailPageState extends State } Widget _buildContentCard() { - // HTML: .story-paper bool isVideoMode = _activeTab == 'video'; return Container( @@ -292,11 +443,11 @@ class _StoryDetailPageState extends State _currentStory['content'] .toString() .replaceAll(RegExp(r'\n+'), '\n\n') - .trim(), // Simple paragraph spacing + .trim(), style: const TextStyle( - fontSize: 16, // HTML: 16px - height: 2.0, // HTML: line-height 2.0 - color: AppColors.storyText, // #374151 + fontSize: 16, + height: 2.0, + color: AppColors.storyText, ), textAlign: TextAlign.justify, ), @@ -313,7 +464,7 @@ class _StoryDetailPageState extends State width: 40, height: 40, child: CircularProgressIndicator( - color: Color(0xFFF43F5E), // HTML: #F43F5E + color: 
Color(0xFFF43F5E), strokeWidth: 3, ), ), @@ -339,15 +490,14 @@ class _StoryDetailPageState extends State alignment: Alignment.center, children: [ AspectRatio( - aspectRatio: 16 / 9, // Assume landscape video + aspectRatio: 16 / 9, child: Container( color: Colors.black, child: const Center( child: Icon(Icons.videocam, color: Colors.white54, size: 48), - ), // Placeholder for Video Player + ), ), ), - // Play Button Overlay Container( width: 48, height: 48, @@ -372,7 +522,6 @@ class _StoryDetailPageState extends State child: _activeTab == 'text' ? _buildTextFooter() : _buildVideoFooter(), ); - // Fade out footer during genie suck animation if (_isSaving) { return IgnorePointer( child: AnimatedOpacity( @@ -387,12 +536,9 @@ class _StoryDetailPageState extends State } void _handleRewrite() async { - // 跳到 loading 页重新生成 final result = await Navigator.of(context).push( MaterialPageRoute(builder: (context) => const StoryLoadingPage()), ); - - // loading 完成后返回结果 if (mounted && result == 'saved') { Navigator.of(context).pop('saved'); } @@ -403,7 +549,6 @@ class _StoryDetailPageState extends State // Generator Mode: Rewrite + Save return Row( children: [ - // Rewrite (Secondary) Expanded( child: GestureDetector( onTap: _handleRewrite, @@ -415,19 +560,25 @@ class _StoryDetailPageState extends State color: Colors.white.withOpacity(0.8), ), alignment: Alignment.center, - child: const Text( - '↻ 重写', - style: TextStyle( - fontSize: 16, - fontWeight: FontWeight.w600, - color: Color(0xFF4B5563), - ), + child: const Row( + mainAxisAlignment: MainAxisAlignment.center, + children: [ + Icon(Icons.refresh_rounded, size: 18, color: Color(0xFF4B5563)), + SizedBox(width: 4), + Text( + '重写', + style: TextStyle( + fontSize: 16, + fontWeight: FontWeight.w600, + color: Color(0xFF4B5563), + ), + ), + ], ), ), ), ), const SizedBox(width: 16), - // Save (Primary) - Returns 'saved' to trigger add book animation Expanded( child: GradientButton( text: '保存故事', @@ -441,41 +592,14 @@ class 
_StoryDetailPageState extends State ], ); } else { - // Read Mode: TTS + Make Picture Book + // Read Mode: TTS pill button + Make Picture Book return Row( children: [ - // TTS Expanded( - child: GestureDetector( - onTap: () => setState(() => _isPlaying = !_isPlaying), - child: Container( - height: 48, - decoration: BoxDecoration( - border: Border.all(color: const Color(0xFFE5E7EB)), - borderRadius: BorderRadius.circular(24), - color: Colors.white.withOpacity(0.8), - ), - alignment: Alignment.center, - child: Row( - mainAxisAlignment: MainAxisAlignment.center, - children: [ - Icon( - _isPlaying ? Icons.pause : Icons.headphones, - size: 20, - color: const Color(0xFF4B5563), - ), - const SizedBox(width: 6), - Text( - _isPlaying ? '暂停' : '朗读', - style: const TextStyle( - fontSize: 16, - fontWeight: FontWeight.w600, - color: Color(0xFF4B5563), - ), - ), - ], - ), - ), + child: PillProgressButton( + state: _ttsState, + progress: _ttsProgress, + onTap: _handleTTSTap, ), ), const SizedBox(width: 16), @@ -500,7 +624,7 @@ class _StoryDetailPageState extends State children: [ Expanded( child: GradientButton( - text: '↻ 重新生成', + text: '重新生成', onPressed: _startVideoGeneration, gradient: const LinearGradient( colors: AppColors.btnCapybaraGradient, @@ -517,7 +641,6 @@ class _StoryDetailPageState extends State _isLoadingVideo = true; _activeTab = 'video'; }); - // Mock delay Future.delayed(const Duration(seconds: 2), () { if (mounted) { setState(() { diff --git a/airhub_app/lib/services/tts_service.dart b/airhub_app/lib/services/tts_service.dart new file mode 100644 index 0000000..5c6458a --- /dev/null +++ b/airhub_app/lib/services/tts_service.dart @@ -0,0 +1,190 @@ +import 'dart:convert'; +import 'package:flutter/foundation.dart'; +import 'package:http/http.dart' as http; + +/// Singleton service that manages TTS generation in the background. +/// Survives page navigation — when user leaves and comes back, +/// generation continues and result is available. 
+class TTSService extends ChangeNotifier { + TTSService._(); + static final TTSService instance = TTSService._(); + + static const String _kServerBase = 'http://localhost:3000'; + + // ── Current task state ── + bool _isGenerating = false; + double _progress = 0.0; // 0.0 ~ 1.0 + String _statusMessage = ''; + String? _currentStoryTitle; // Which story is being generated + + // ── Result ── + String? _audioUrl; + String? _completedStoryTitle; // Which story the audio belongs to + bool _justCompleted = false; // Flash animation trigger + + // ── Error ── + String? _error; + + // ── Getters ── + bool get isGenerating => _isGenerating; + double get progress => _progress; + String get statusMessage => _statusMessage; + String? get currentStoryTitle => _currentStoryTitle; + String? get audioUrl => _audioUrl; + String? get completedStoryTitle => _completedStoryTitle; + bool get justCompleted => _justCompleted; + String? get error => _error; + + /// Check if audio is ready for a specific story. + bool hasAudioFor(String title) { + return _completedStoryTitle == title && _audioUrl != null; + } + + /// Check if currently generating for a specific story. + bool isGeneratingFor(String title) { + return _isGenerating && _currentStoryTitle == title; + } + + /// Clear the "just completed" flag (after flash animation plays). + void clearJustCompleted() { + _justCompleted = false; + notifyListeners(); + } + + /// Set audio URL directly (e.g. from pre-check). + void setExistingAudio(String title, String url) { + _completedStoryTitle = title; + _audioUrl = url; + _justCompleted = false; + notifyListeners(); + } + + /// Check server for existing audio file. 
+ Future<void> checkExistingAudio(String title) async { + if (title.isEmpty) return; + try { + final resp = await http.get( + Uri.parse( + '$_kServerBase/api/tts_check?title=${Uri.encodeComponent(title)}', + ), + ); + if (resp.statusCode == 200) { + final data = jsonDecode(resp.body); + if (data['exists'] == true && data['audio_url'] != null) { + _completedStoryTitle = title; + _audioUrl = '$_kServerBase/${data['audio_url']}'; + notifyListeners(); + } + } + } catch (_) {} + } + + /// Start TTS generation. Safe to call even if page navigates away. + Future<void> generate({ + required String title, + required String content, + }) async { + if (_isGenerating) return; + + _isGenerating = true; + _progress = 0.0; + _statusMessage = '正在连接...'; + _currentStoryTitle = title; + _audioUrl = null; + _completedStoryTitle = null; + _justCompleted = false; + _error = null; + notifyListeners(); + + try { + final client = http.Client(); + final request = http.Request( + 'POST', + Uri.parse('$_kServerBase/api/create_tts'), + ); + request.headers['Content-Type'] = 'application/json'; + request.body = jsonEncode({'title': title, 'content': content}); + + final streamed = await client.send(request); + + await for (final chunk in streamed.stream.transform(utf8.decoder)) { + for (final line in chunk.split('\n')) { + if (!line.startsWith('data: ')) continue; + try { + final data = jsonDecode(line.substring(6)); + final stage = data['stage'] as String? ?? ''; + final message = data['message'] as String? ??
''; + + switch (stage) { + case 'connecting': + _updateProgress(0.10, '正在连接...'); + break; + case 'generating': + _updateProgress(0.30, '语音生成中...'); + break; + case 'saving': + _updateProgress(0.88, '正在保存...'); + break; + case 'done': + if (data['audio_url'] != null) { + _audioUrl = '$_kServerBase/${data['audio_url']}'; + _completedStoryTitle = title; + _justCompleted = true; + _updateProgress(1.0, '生成完成'); + } + break; + case 'error': + throw Exception(message); + default: + // Progress slowly increases during generation + if (_progress < 0.85) { + _updateProgress(_progress + 0.02, message); + } + } + } catch (e) { + if (e is Exception && + e.toString().contains('语音合成失败')) { + rethrow; + } + } + } + } + + client.close(); + + _isGenerating = false; + if (_audioUrl == null) { + _error = '未获取到音频'; + _statusMessage = '生成失败'; + } + notifyListeners(); + } catch (e) { + debugPrint('TTS generation error: $e'); + _isGenerating = false; + _progress = 0.0; + _error = e.toString(); + _statusMessage = '生成失败'; + _justCompleted = false; + notifyListeners(); + } + } + + void _updateProgress(double progress, String message) { + _progress = progress.clamp(0.0, 1.0); + _statusMessage = message; + notifyListeners(); + } + + /// Reset all state (e.g. when switching stories). 
+ void reset() { + if (_isGenerating) return; // Don't reset during generation + _progress = 0.0; + _statusMessage = ''; + _currentStoryTitle = null; + _audioUrl = null; + _completedStoryTitle = null; + _justCompleted = false; + _error = null; + notifyListeners(); + } +} diff --git a/airhub_app/lib/widgets/pill_progress_button.dart b/airhub_app/lib/widgets/pill_progress_button.dart new file mode 100644 index 0000000..f76b51c --- /dev/null +++ b/airhub_app/lib/widgets/pill_progress_button.dart @@ -0,0 +1,335 @@ +import 'dart:math' as math; +import 'package:flutter/material.dart'; + +enum TTSButtonState { + idle, + ready, + generating, + completed, + playing, + paused, + error, +} + +class PillProgressButton extends StatefulWidget { + final TTSButtonState state; + final double progress; + final VoidCallback? onTap; + final double height; + + const PillProgressButton({ + super.key, + required this.state, + this.progress = 0.0, + this.onTap, + this.height = 48, + }); + + @override + State<PillProgressButton> createState() => _PillProgressButtonState(); +} + +class _PillProgressButtonState extends State<PillProgressButton> + with TickerProviderStateMixin { + late AnimationController _progressCtrl; + double _displayProgress = 0.0; + + late AnimationController _glowCtrl; + late Animation<double> _glowAnim; + + late AnimationController _waveCtrl; + + bool _wasCompleted = false; + + @override + void initState() { + super.initState(); + + _progressCtrl = AnimationController( + vsync: this, + duration: const Duration(milliseconds: 500), + ); + _progressCtrl.addListener(() => setState(() {})); + + _glowCtrl = AnimationController( + vsync: this, + duration: const Duration(milliseconds: 1000), + ); + _glowAnim = TweenSequence<double>([ + TweenSequenceItem(tween: Tween(begin: 0.0, end: 1.0), weight: 35), + TweenSequenceItem(tween: Tween(begin: 1.0, end: 0.0), weight: 65), + ]).animate(CurvedAnimation(parent: _glowCtrl, curve: Curves.easeOut)); + _glowCtrl.addListener(() => setState(() {})); + + _waveCtrl = AnimationController(
vsync: this, + duration: const Duration(milliseconds: 800), + ); + + _syncAnimations(); + } + + @override + void didUpdateWidget(PillProgressButton oldWidget) { + super.didUpdateWidget(oldWidget); + + if (widget.progress != oldWidget.progress) { + if (oldWidget.state == TTSButtonState.completed && + (widget.state == TTSButtonState.playing || widget.state == TTSButtonState.ready)) { + _displayProgress = 0.0; + } else { + _animateProgressTo(widget.progress); + } + } + + if (widget.state == TTSButtonState.completed && !_wasCompleted) { + _wasCompleted = true; + _glowCtrl.forward(from: 0); + } else if (widget.state != TTSButtonState.completed) { + _wasCompleted = false; + } + + _syncAnimations(); + } + + void _animateProgressTo(double target) { + final from = _displayProgress; + _progressCtrl.reset(); + _progressCtrl.addListener(() { + final t = Curves.easeInOut.transform(_progressCtrl.value); + _displayProgress = from + (target - from) * t; + }); + _progressCtrl.forward(); + } + + void _syncAnimations() { + if (widget.state == TTSButtonState.generating) { + if (!_waveCtrl.isAnimating) _waveCtrl.repeat(); + } else { + if (_waveCtrl.isAnimating) { + _waveCtrl.stop(); + _waveCtrl.value = 0; + } + } + } + + @override + void dispose() { + _progressCtrl.dispose(); + _glowCtrl.dispose(); + _waveCtrl.dispose(); + super.dispose(); + } + + bool get _showBorder => + widget.state == TTSButtonState.generating || + widget.state == TTSButtonState.completed || + widget.state == TTSButtonState.playing || + widget.state == TTSButtonState.paused; + + @override + Widget build(BuildContext context) { + const borderColor = Color(0xFFE5E7EB); + const progressColor = Color(0xFFECCFA8); + const bgColor = Color(0xCCFFFFFF); + + return GestureDetector( + onTap: widget.state == TTSButtonState.generating ? null : widget.onTap, + child: Container( + height: widget.height, + decoration: BoxDecoration( + borderRadius: BorderRadius.circular(widget.height / 2), + boxShadow: _glowAnim.value > 0 + ? 
[ + BoxShadow( + color: progressColor.withOpacity(0.5 * _glowAnim.value), + blurRadius: 16 * _glowAnim.value, + spreadRadius: 2 * _glowAnim.value, + ), + ] + : null, + ), + child: CustomPaint( + painter: PillBorderPainter( + progress: _showBorder ? _displayProgress.clamp(0.0, 1.0) : 0.0, + borderColor: borderColor, + progressColor: progressColor, + radius: widget.height / 2, + stroke: _showBorder ? 2.5 : 1.0, + bg: bgColor, + ), + child: Center(child: _buildContent()), + ), + ), + ); + } + + Widget _buildContent() { + switch (widget.state) { + case TTSButtonState.idle: + return _label(Icons.headphones_rounded, '\u6717\u8bfb'); + case TTSButtonState.generating: + return Row( + mainAxisAlignment: MainAxisAlignment.center, + children: [ + AnimatedBuilder( + animation: _waveCtrl, + builder: (context, _) => CustomPaint( + size: const Size(20, 18), + painter: WavePainter(t: _waveCtrl.value, color: const Color(0xFFC99672)), + ), + ), + const SizedBox(width: 6), + const Text('\u751f\u6210\u4e2d', + style: TextStyle(fontSize: 15, fontWeight: FontWeight.w600, color: Color(0xFF4B5563))), + ], + ); + case TTSButtonState.ready: + return _label(Icons.play_arrow_rounded, '\u64ad\u653e'); + case TTSButtonState.completed: + return _label(Icons.play_arrow_rounded, '\u64ad\u653e'); + case TTSButtonState.playing: + return _label(Icons.pause_rounded, '\u6682\u505c'); + case TTSButtonState.paused: + return _label(Icons.play_arrow_rounded, '\u7ee7\u7eed'); + case TTSButtonState.error: + return _label(Icons.refresh_rounded, '\u91cd\u8bd5', isError: true); + } + } + + Widget _label(IconData icon, String text, {bool isError = false}) { + final c = isError ? 
const Color(0xFFEF4444) : const Color(0xFF4B5563); + return Row( + mainAxisAlignment: MainAxisAlignment.center, + mainAxisSize: MainAxisSize.min, + children: [ + Icon(icon, size: 20, color: c), + const SizedBox(width: 4), + Text(text, style: TextStyle(fontSize: 16, fontWeight: FontWeight.w600, color: c)), + ], + ); + } +} + +class PillBorderPainter extends CustomPainter { + final double progress; + final Color borderColor; + final Color progressColor; + final double radius; + final double stroke; + final Color bg; + + PillBorderPainter({ + required this.progress, + required this.borderColor, + required this.progressColor, + required this.radius, + required this.stroke, + required this.bg, + }); + + @override + void paint(Canvas canvas, Size size) { + final r = radius.clamp(0.0, size.height / 2); + final rrect = RRect.fromRectAndRadius( + Rect.fromLTWH(0, 0, size.width, size.height), + Radius.circular(r), + ); + + canvas.drawRRect(rrect, Paint() + ..color = bg + ..style = PaintingStyle.fill); + canvas.drawRRect(rrect, Paint() + ..color = borderColor + ..style = PaintingStyle.stroke + ..strokeWidth = stroke); + + if (progress <= 0.001) return; + + final straightH = size.width - 2 * r; + final halfTop = straightH / 2; + final arcLen = math.pi * r; + final totalLen = halfTop + arcLen + straightH + arcLen + halfTop; + final target = totalLen * progress; + + final path = Path(); + double done = 0; + final cx = size.width / 2; + + path.moveTo(cx, 0); + var seg = math.min(halfTop, target - done); + path.lineTo(cx + seg, 0); + done += seg; + if (done >= target) { _drawPath(canvas, path); return; } + + seg = math.min(arcLen, target - done); + _traceArc(path, size.width - r, r, r, -math.pi / 2, seg / r); + done += seg; + if (done >= target) { _drawPath(canvas, path); return; } + + seg = math.min(straightH, target - done); + path.lineTo(size.width - r - seg, size.height); + done += seg; + if (done >= target) { _drawPath(canvas, path); return; } + + seg = math.min(arcLen, 
target - done); + _traceArc(path, r, r, r, math.pi / 2, seg / r); + done += seg; + if (done >= target) { _drawPath(canvas, path); return; } + + seg = math.min(halfTop, target - done); + path.lineTo(r + seg, 0); + _drawPath(canvas, path); + } + + void _drawPath(Canvas canvas, Path path) { + canvas.drawPath(path, Paint() + ..color = progressColor + ..style = PaintingStyle.stroke + ..strokeWidth = stroke + ..strokeCap = StrokeCap.round); + } + + void _traceArc(Path p, double cx, double cy, double r, double start, double sweep) { + const n = 24; + final step = sweep / n; + for (int i = 0; i <= n; i++) { + final a = start + step * i; + p.lineTo(cx + r * math.cos(a), cy + r * math.sin(a)); + } + } + + @override + bool shouldRepaint(PillBorderPainter old) => old.progress != progress || old.stroke != stroke; +} + +class WavePainter extends CustomPainter { + final double t; + final Color color; + WavePainter({required this.t, required this.color}); + + @override + void paint(Canvas canvas, Size size) { + final paint = Paint() + ..color = color + ..style = PaintingStyle.fill; + final bw = size.width * 0.2; + final gap = size.width * 0.1; + final tw = 3 * bw + 2 * gap; + final sx = (size.width - tw) / 2; + for (int i = 0; i < 3; i++) { + final phase = t * 2 * math.pi + i * math.pi * 0.7; + final hr = 0.3 + 0.7 * ((math.sin(phase) + 1) / 2); + final bh = size.height * hr; + final x = sx + i * (bw + gap); + final y = (size.height - bh) / 2; + canvas.drawRRect( + RRect.fromRectAndRadius(Rect.fromLTWH(x, y, bw, bh), Radius.circular(bw / 2)), + paint, + ); + } + } + + @override + bool shouldRepaint(WavePainter old) => old.t != t; +} \ No newline at end of file diff --git a/prompts/music_director.md b/prompts/music_director.md index d06828f..6c63a22 100644 --- a/prompts/music_director.md +++ b/prompts/music_director.md @@ -24,8 +24,7 @@ 1. **song_title** (歌曲名称) - 使用**中文**,简短有趣,3-8个字。 - - 体现咔咔的可爱风格。 - - 示例:"温泉咔咔乐"、"草地蹦蹦跳"、"雨夜安眠曲" + - 根据用户描述的场景自由发挥,不要套用固定模板。 2. 
**style** (风格描述)
    - 使用**英文**描述音乐风格、乐器、节奏、情绪。
diff --git a/server.py b/server.py
index 0280680..45b7489 100644
--- a/server.py
+++ b/server.py
@@ -2,10 +2,14 @@ import os
 import re
 import sys
 import time
+import uuid
+import struct
+import asyncio
 import uvicorn
 import requests
 import json
-from fastapi import FastAPI, HTTPException
+import websockets
+from fastapi import FastAPI, HTTPException, Query
 from fastapi.responses import StreamingResponse
 from fastapi.middleware.cors import CORSMiddleware
 from pydantic import BaseModel
@@ -20,11 +24,15 @@ if sys.platform == "win32":
 load_dotenv()
 MINIMAX_API_KEY = os.getenv("MINIMAX_API_KEY")
 VOLCENGINE_API_KEY = os.getenv("VOLCENGINE_API_KEY")
+TTS_APP_ID = os.getenv("TTS_APP_ID")
+TTS_ACCESS_TOKEN = os.getenv("TTS_ACCESS_TOKEN")
 if not MINIMAX_API_KEY:
     print("Warning: MINIMAX_API_KEY not found in .env")
 if not VOLCENGINE_API_KEY:
     print("Warning: VOLCENGINE_API_KEY not found in .env")
+if not TTS_APP_ID or not TTS_ACCESS_TOKEN:
+    print("Warning: TTS_APP_ID or TTS_ACCESS_TOKEN not found in .env")

 # Initialize FastAPI
 app = FastAPI()
@@ -606,14 +614,244 @@ def get_playlist():
     return {"playlist": playlist}


-# ── Static file serving for generated music ──
+# ═══════════════════════════════════════════════════════════════════
+# ── TTS: 豆包语音合成 WebSocket V1 二进制协议 ──
+# ═══════════════════════════════════════════════════════════════════
+
+TTS_WS_URL = "wss://openspeech.bytedance.com/api/v1/tts/ws_binary"
+TTS_CLUSTER = "volcano_tts"
+TTS_SPEAKER = "ICL_zh_female_keainvsheng_tob"
+
+_audio_dir = os.path.join(os.path.dirname(__file__) or ".", "Capybara audio")
+os.makedirs(_audio_dir, exist_ok=True)
+
+
+def _build_tts_v1_request(payload_json: dict) -> bytes:
+    """Build a V1 full-client-request binary frame.
+    Header: 0x11 0x10 0x10 0x00 (v1, 4-byte header, full-client-request, JSON, no compression)
+    Then 4-byte big-endian payload length, then JSON payload bytes.
+ """ + payload_bytes = json.dumps(payload_json, ensure_ascii=False).encode("utf-8") + header = bytes([0x11, 0x10, 0x10, 0x00]) + length = struct.pack(">I", len(payload_bytes)) + return header + length + payload_bytes + + +def _parse_tts_v1_response(data: bytes): + """Parse a V1 TTS response binary frame. + Returns (audio_bytes_or_none, is_last, is_error, error_msg). + """ + if len(data) < 4: + return None, False, True, "Frame too short" + + byte1 = data[1] + msg_type = (byte1 >> 4) & 0x0F + msg_flags = byte1 & 0x0F + + # Error frame: msg_type = 0xF + if msg_type == 0x0F: + offset = 4 + error_code = 0 + if len(data) >= offset + 4: + error_code = struct.unpack(">I", data[offset:offset + 4])[0] + offset += 4 + if len(data) >= offset + 4: + msg_len = struct.unpack(">I", data[offset:offset + 4])[0] + offset += 4 + error_msg = data[offset:offset + msg_len].decode("utf-8", errors="replace") + else: + error_msg = f"error code {error_code}" + print(f"[TTS Error] code={error_code}, msg={error_msg}", flush=True) + return None, False, True, error_msg + + # Audio-only response: msg_type = 0xB + if msg_type == 0x0B: + # flags: 0b0000=no seq, 0b0001=seq>0, 0b0010/0b0011=last (seq<0) + is_last = (msg_flags & 0x02) != 0 # bit 1 set = last message + offset = 4 + + # If flags != 0, there's a 4-byte sequence number + if msg_flags != 0: + offset += 4 # skip sequence number + + if len(data) < offset + 4: + return None, is_last, False, "" + + payload_size = struct.unpack(">I", data[offset:offset + 4])[0] + offset += 4 + audio_data = data[offset:offset + payload_size] + return audio_data, is_last, False, "" + + # Server response with JSON (msg_type = 0x9): usually contains metadata + if msg_type == 0x09: + offset = 4 + if len(data) >= offset + 4: + payload_size = struct.unpack(">I", data[offset:offset + 4])[0] + offset += 4 + json_str = data[offset:offset + payload_size].decode("utf-8", errors="replace") + print(f"[TTS] Server JSON: {json_str[:200]}", flush=True) + return None, False, 
False, "" + + return None, False, False, "" + + +async def tts_synthesize(text: str) -> bytes: + """Connect to Doubao TTS V1 WebSocket and synthesize text to MP3 bytes.""" + headers = { + "Authorization": f"Bearer;{TTS_ACCESS_TOKEN}", + } + + payload = { + "app": { + "appid": TTS_APP_ID, + "token": "placeholder", + "cluster": TTS_CLUSTER, + }, + "user": { + "uid": "airhub_user", + }, + "audio": { + "voice_type": TTS_SPEAKER, + "encoding": "mp3", + "speed_ratio": 1.0, + "rate": 24000, + }, + "request": { + "reqid": str(uuid.uuid4()), + "text": text, + "operation": "submit", # streaming mode + }, + } + + audio_buffer = bytearray() + request_frame = _build_tts_v1_request(payload) + + print(f"[TTS] Connecting to V1 WebSocket... text length={len(text)}", flush=True) + + async with websockets.connect( + TTS_WS_URL, + extra_headers=headers, + max_size=10 * 1024 * 1024, # 10MB max frame + ping_interval=None, + ) as ws: + # Send request + await ws.send(request_frame) + print("[TTS] Request sent, waiting for audio...", flush=True) + + # Receive audio chunks + chunk_count = 0 + async for message in ws: + if isinstance(message, bytes): + audio_data, is_last, is_error, error_msg = _parse_tts_v1_response(message) + + if is_error: + raise RuntimeError(f"TTS error: {error_msg}") + + if audio_data and len(audio_data) > 0: + audio_buffer.extend(audio_data) + chunk_count += 1 + + if is_last: + print(f"[TTS] Last frame received. 
chunks={chunk_count}, "
+                          f"audio size={len(audio_buffer)} bytes", flush=True)
+                    break
+
+    return bytes(audio_buffer)
+
+
+class TTSRequest(BaseModel):
+    title: str
+    content: str
+
+
+@app.get("/api/tts_check")
+def tts_check(title: str = Query(...)):
+    """Check if audio already exists for a story title."""
+    for f in os.listdir(_audio_dir):
+        if f.lower().endswith(".mp3"):
+            # Match by title prefix (before timestamp)
+            name = f[:-4]  # strip .mp3
+            name_without_ts = re.sub(r'_\d{10,}$', '', name)
+            if name_without_ts == title or name == title:
+                return {
+                    "exists": True,
+                    "audio_url": f"Capybara audio/{f}",
+                }
+    return {"exists": False, "audio_url": None}
+
+
+@app.post("/api/create_tts")
+def create_tts(req: TTSRequest):
+    """Generate TTS audio for a story. Returns SSE stream with progress."""
+
+    def event_stream():
+        yield sse_event({"stage": "connecting", "progress": 10,
+                         "message": "正在连接语音合成服务..."})
+
+        # Check if audio already exists
+        for f in os.listdir(_audio_dir):
+            if f.lower().endswith(".mp3"):
+                name = f[:-4]
+                name_without_ts = re.sub(r'_\d{10,}$', '', name)
+                if name_without_ts == req.title:
+                    yield sse_event({"stage": "done", "progress": 100,
+                                     "message": "语音已存在",
+                                     "audio_url": f"Capybara audio/{f}"})
+                    return
+
+        yield sse_event({"stage": "generating", "progress": 30,
+                         "message": "AI 正在朗读故事..."})
+
+        try:
+            # asyncio is imported at module level; asyncio.run creates and
+            # tears down its own event loop, even if tts_synthesize raises
+            audio_bytes = asyncio.run(tts_synthesize(req.content))
+
+            if not audio_bytes or len(audio_bytes) < 100:
+                yield sse_event({"stage": "error", "progress": 0,
+                                 "message": "语音合成返回了空音频"})
+                return
+
+            yield sse_event({"stage": "saving", "progress": 80,
+                             "message": "正在保存音频..."})
+
+            # Save audio file
+            timestamp = int(time.time())
+            safe_title = re.sub(r'[<>:"/\\|?*]', '', req.title)[:50]
+            filename = f"{safe_title}_{timestamp}.mp3"
+            filepath = os.path.join(_audio_dir, filename)
+
+            with open(filepath, "wb") as f:
f.write(audio_bytes)
+
+            print(f"[TTS Saved] {filepath} ({len(audio_bytes)} bytes)", flush=True)
+
+            yield sse_event({"stage": "done", "progress": 100,
+                             "message": "语音生成完成!",
+                             "audio_url": f"Capybara audio/{filename}"})
+
+        except Exception as e:
+            print(f"[TTS Error] {e}", flush=True)
+            yield sse_event({"stage": "error", "progress": 0,
+                             "message": f"语音合成失败: {str(e)}"})
+
+    return StreamingResponse(event_stream(), media_type="text/event-stream")
+
+
+# ── Static file serving ──
 from fastapi.staticfiles import StaticFiles

-# Create music directory if it doesn't exist
+# Music directory
 _music_dir = os.path.join(os.path.dirname(__file__) or ".", "Capybara music")
 os.makedirs(_music_dir, exist_ok=True)
 app.mount("/Capybara music", StaticFiles(directory=_music_dir), name="music_files")

+# Audio directory (TTS generated)
+app.mount("/Capybara audio", StaticFiles(directory=_audio_dir), name="audio_files")
+
 if __name__ == "__main__":
     print("[Server] Music Server running on http://localhost:3000")
diff --git a/阶段总结/session_progress.md b/阶段总结/session_progress.md
index 852908d..ae4fc88 100644
--- a/阶段总结/session_progress.md
+++ b/阶段总结/session_progress.md
@@ -3,7 +3,7 @@
 > **用途**:每次对话结束前 / 做完一个阶段后更新此文件。
 > 新对话开始时,AI 先读此文件恢复上下文。
 >
-> **最后更新**:2026-02-09 (第八次对话)
+> **最后更新**:2026-02-10 (第九次对话)

 ---

@@ -155,9 +155,47 @@
 - **封面区分**:预设故事显示封面图,AI 生成的故事显示淡紫渐变"暂无封面"占位
 - **乱码过滤**:API 层自动跳过无中文标题的异常文件

-### 正在做的
-- TTS 语音合成待后续接入(用户去开通火山语音服务后再做)
+### 第九次对话完成的工作(2026-02-10)
+
+#### TTS 语音合成全链路接入(上次对话完成,此处补记)
+- **后端**:`server.py` 新增 `/api/tts_check` 与 `/api/create_tts` 接口,WebSocket 流式调用豆包 TTS V1 API
+- **音色**:可爱女生(`ICL_zh_female_keainvsheng_tob`)
+- **前端组件**:`PillProgressButton`(药丸形进度按钮)替代旧 RingProgressButton
+  - 7 种状态:idle / ready / generating / completed / playing / paused / error
+  - 进度环动画 + 音波动效 + 发光效果
+- **TTSService 单例**:后台持续运行,切页面不中断生成
+- **音频保存**:生成的 TTS 音频保存到 `Capybara audio/` 目录
+- **暂停/续播修复**:显式 seek 到暂停位置再 play,解决 Web 端从头播放的 bug
+- **按钮状态修复**:新增 `ready` 状态,未播放过的音频显示"播放"而非"继续"
+-
**自动播放控制**:仅在用户停留在故事页时自动播放,切出页面不自动播 + +#### 音乐总监 Prompt 优化 +- **歌名去重复**:移除固定示例("温泉咔咔乐"等),改为"根据场景自由发挥,不要套用固定模板" +- **效果**:AI 每次为相似场景生成不同歌名,唱片架不再出现一堆同名歌曲 + +#### 唱片架播放状态可视化 +- **卡片高亮**:当前播放的歌曲整张卡片变暖金色底 + 金色边框 + 阴影 +- **标题标识**:播放中的歌曲标题前加小喇叭图标 + 金色加粗文字 +- **音波动效**:播放中的唱片中心叠加跳动音波 CustomPaint 动画 + +#### 气泡持续显示当前歌名 +- 播放期间气泡始终显示"正在播放: xxx",不再 3 秒后消失 +- 直接点播放按钮(非从唱片架选歌)也会显示歌名 +- 暂停时气泡自动隐藏,切歌时自动更新 +- 使用 `_playStickyText` 机制,即使其他临时消息弹出后也会恢复播放信息 + +#### 调研 AI 音乐生成平台 +- 对比了 MiniMax Music 2.5(现用)、Mureka(昆仑万维)、天谱乐、ACE-Step +- 发现 Mureka 有中国站 API(platform.mureka.cn),质量评测超越 Suno V4 +- 用户的朋友用的 Muse AI App 底层就是 Mureka 模型 +- MiniMax 文本模型(abab6.5s-chat)价格偏高,可考虑切豆包 +- 歌词生成费用极低(每次约 0.005 元),主要成本在音乐生成(1 元/首) + +### 正在做的 / 待办 - 故事封面方案待定(付费生成 or 免费生成) +- 考虑将音乐生成从 MiniMax 切换到 Mureka(用户在评估中) +- 考虑将歌词生成的 LLM 从 MiniMax abab6.5s-chat 切到豆包(更便宜) +- 长歌名 fallback 问题:LLM 返回空 song_title 时用了用户输入原文当歌名,后续可优化 ---
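
---

The V1 binary framing that `server.py`'s `_build_tts_v1_request` docstring describes (4-byte header `0x11 0x10 0x10 0x00`, then a 4-byte big-endian payload length, then UTF-8 JSON) can be sanity-checked offline with a minimal round-trip sketch. `build_request` and `parse_request` below are illustrative helpers written for this note, not part of the server code:

```python
import json
import struct

# v1 protocol, 4-byte header, full-client-request, JSON payload, no compression
HEADER = bytes([0x11, 0x10, 0x10, 0x00])

def build_request(payload: dict) -> bytes:
    """Mirror of the server's frame builder: header + length prefix + JSON."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return HEADER + struct.pack(">I", len(body)) + body

def parse_request(frame: bytes) -> dict:
    """Inverse helper: validate the header fields, then decode the payload."""
    version = frame[0] >> 4            # protocol version (expect 1)
    header_words = frame[0] & 0x0F     # header size in 4-byte units (expect 1)
    msg_type = (frame[1] >> 4) & 0x0F  # 0x1 = full client request
    assert (version, header_words, msg_type) == (1, 1, 0x1), "unexpected header"
    offset = header_words * 4
    (length,) = struct.unpack(">I", frame[offset:offset + 4])
    return json.loads(frame[offset + 4:offset + 4 + length].decode("utf-8"))

if __name__ == "__main__":
    req = {"request": {"text": "你好", "operation": "submit"}}
    frame = build_request(req)
    assert frame[:4] == HEADER
    assert parse_request(frame) == req
```

A mismatch between `struct.pack(">I", ...)` on the client and the server's length field is the most common cause of an immediate error frame, so a round-trip check like this is a cheap first debugging step.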