Compare commits


No commits in common. "f3ef1d1242769a0909b2a49e2807b4f5902f5500" and "4983553261d28f4b993789d993598867e23c66bb" have entirely different histories.

62 changed files with 429 additions and 7008 deletions


@@ -1,45 +0,0 @@
This document describes the capabilities supported by the speech synthesis (TTS) SDK.
* **SDK name**: Speech Synthesis SDK
* **SDK developer**: Beijing Volcano Engine Technology Co., Ltd.
* **Main features**: The speech synthesis SDK synthesizes text into speech in real time, suitable for live voice playback scenarios such as audiobooks, navigation, voice assistants, and more.
<span id="sdk接入"></span>
## SDK Integration
| | | | \
|Platform/Language |Integration guide |Call flow |
|---|---|---|
| | | | \
|Android |[Integration guide](/docs/6561/79832) |[Call flow](/docs/6561/79834) |
| | | | \
|iOS |[Integration guide](/docs/6561/79835) |[Call flow](/docs/6561/79837) |
**Other related information**
* [SDK version information](/docs/6561/79830)
* [SDK privacy policy](/docs/6561/116696)
* [Developer compliance guidelines](/docs/6561/116711)
<span id="合成能力"></span>
# Synthesis Capabilities
**Online synthesis**: cloud-based synthesis; issues a network request and plays audio while it is being synthesized. It supports the TTS WebSocket interface and can use voice-clone voices as well as large- and small-model TTS voices.
**Offline synthesis**: synthesis by the local offline engine; requires the relevant resource files and plays audio while it is being synthesized.
<span id="合成策略"></span>
# Synthesis Strategies
Besides standalone online and offline synthesis, the hybrid online/offline TTS SDK provides two strategies that switch to offline synthesis after an online network timeout. You control how sensitive the switch is via two parameters: the connection timeout and the receive timeout.
* **Online first**: online synthesis is attempted first; on failure (network timeout), the offline engine is started to synthesize.
* **Concurrent synthesis**: online and offline synthesis are started simultaneously; if the online request fails, the offline audio is used. In this mode a shorter timeout can be configured for better responsiveness, at the cost of more system resources.
<span id="合成场景"></span>
# Synthesis Scenarios
The TTS SDK provides two synthesis scenarios to meet different needs:
* **Normal scenario**: also called the single-sentence scenario; each time the engine starts, it synthesizes and plays a single sentence of audio.
* **Novel scenario**: designed for audiobook services; each time the engine starts, multiple sentences can be synthesized as needed.
<span id="合成效果"></span>
# Synthesis Effects
By adjusting parameters such as the voice, pitch, volume, and speaking rate, you can obtain different vocal effects that better fit the playback needs of your business scenario.

File diff suppressed because it is too large


@@ -1,551 +0,0 @@
Generate coherent, clear text content from an input prompt.
:::tip
New to the Ark platform? For preparation steps such as obtaining an API Key and enabling models, see [Quick Start](/docs/82379/1399008).
:::
<span id="18cf565a"></span>
# Quick Start
<span aceTableMode="list" aceTableWidth="3,6"></span>
|Input |Output preview |
|---|---|
|```Plain|```Plain|\
|请将下面内容进行结构化处理:火山方舟是火山引擎推出的大模型服务平台,提供模型训练、推理、评测、精调等全方位功能与服务,并重点支撑大模型生态。 火山方舟通过稳定可靠的安全互信方案,保障模型提供方的模型安全与模型使用者的信息安全,加速大模型能力渗透到千行百业,助力模型提供方和使用者实现商业新增长。|# 火山方舟大模型服务平台结构化信息|\
|```||\
| |## 一、基本属性|\
| |1. **推出主体**:火山引擎|\
| |2. **平台定位**:大模型服务平台|\
| ||\
| |## 二、核心功能与服务|\
| |提供模型训练、推理、评测、精调等全方位大模型相关功能与服务,核心定位包括:|\
| |1. 支撑大模型生态建设|\
| |2. 推动大模型能力向千行百业渗透|\
| ||\
| |## 三、安全保障方案|\
| |通过稳定可靠的安全互信方案,双向保障信息安全:|\
| |1. 模型提供方:保障其模型安全|\
| |2. 模型使用者:保障其信息安全|\
| ||\
| |## 四、价值目标|\
| |助力模型提供方和使用者实现商业新增长,构建大模型生态下的商业共赢模式。|\
| |```|\
| | |
```mixin-react
return (<Tabs>
<Tabs.TabPane title="Curl" key="vJUYohZMmf"><RenderMd content={`\`\`\`Bash
curl https://ark.cn-beijing.volces.com/api/v3/chat/completions \\
-H "Content-Type: application/json" \\
-H "Authorization: Bearer $ARK_API_KEY" \\
-d '{
"model": "doubao-seed-1-6-251015",
"messages": [
{"role": "user", "content": "请将下面内容进行结构化处理:火山方舟是火山引擎推出的大模型服务平台,提供模型训练、推理、评测、精调等全方位功能与服务,并重点支撑大模型生态。 火山方舟通过稳定可靠的安全互信方案,保障模型提供方的模型安全与模型使用者的信息安全,加速大模型能力渗透到千行百业,助力模型提供方和使用者实现商业新增长。"}
],
"thinking":{
"type":"disabled"
}
}'
\`\`\`
* Replace the Model ID as needed. To look up a Model ID, see the [model list](/docs/82379/1330310).
`}></RenderMd></Tabs.TabPane>
<Tabs.TabPane title="Python" key="RyneFpLl5G"><RenderMd content={`\`\`\`Python
import os
# Install SDK: pip install 'volcengine-python-sdk[ark]'
from volcenginesdkarkruntime import Ark
# Initialize the Ark client
client = Ark(
# The base URL for model invocation
base_url="https://ark.cn-beijing.volces.com/api/v3",
# Get your API Key: https://console.volcengine.com/ark/region:ark+cn-beijing/apikey
api_key=os.getenv('ARK_API_KEY'),
)
completion = client.chat.completions.create(
# Replace with Model ID
model = "doubao-seed-1-6-251015",
messages=[
{"role": "user", "content": "请将下面内容进行结构化处理:火山方舟是火山引擎推出的大模型服务平台,提供模型训练、推理、评测、精调等全方位功能与服务,并重点支撑大模型生态。 火山方舟通过稳定可靠的安全互信方案,保障模型提供方的模型安全与模型使用者的信息安全,加速大模型能力渗透到千行百业,助力模型提供方和使用者实现商业新增长。"},
],
# thinking={"type": "disabled"}, # Manually disable deep thinking
)
print(completion.choices[0].message.content)
\`\`\`
`}></RenderMd></Tabs.TabPane>
<Tabs.TabPane title="Go" key="vzIAhWeZi9"><RenderMd content={`\`\`\`Go
package main
import (
"context"
"fmt"
"os"
"github.com/volcengine/volcengine-go-sdk/service/arkruntime"
"github.com/volcengine/volcengine-go-sdk/service/arkruntime/model"
"github.com/volcengine/volcengine-go-sdk/volcengine"
)
func main() {
client := arkruntime.NewClientWithApiKey(
os.Getenv("ARK_API_KEY"),
// The base URL for model invocation
arkruntime.WithBaseUrl("https://ark.cn-beijing.volces.com/api/v3"),
)
ctx := context.Background()
req := model.CreateChatCompletionRequest{
// Replace with Model ID
Model: "doubao-seed-1-6-251015",
Messages: []*model.ChatCompletionMessage{
{
Role: model.ChatMessageRoleUser,
Content: &model.ChatCompletionMessageContent{
StringValue: volcengine.String("请将下面内容进行结构化处理:火山方舟是火山引擎推出的大模型服务平台,提供模型训练、推理、评测、精调等全方位功能与服务,并重点支撑大模型生态。 火山方舟通过稳定可靠的安全互信方案,保障模型提供方的模型安全与模型使用者的信息安全,加速大模型能力渗透到千行百业,助力模型提供方和使用者实现商业新增长。"),
},
},
},
Thinking: &model.Thinking{
Type: model.ThinkingTypeDisabled, // Manually disable deep thinking
// Type: model.ThinkingTypeEnabled, // Manually enable deep thinking
},
}
resp, err := client.CreateChatCompletion(ctx, req)
if err != nil {
fmt.Printf("standard chat error: %v\\n", err)
return
}
fmt.Println(*resp.Choices[0].Message.Content.StringValue)
}
\`\`\`
`}></RenderMd></Tabs.TabPane>
<Tabs.TabPane title="Java" key="Hijm4ptRjM"><RenderMd content={`\`\`\`java
package com.ark.sample;
import com.volcengine.ark.runtime.model.completion.chat.*;
import com.volcengine.ark.runtime.service.ArkService;
import java.util.ArrayList;
import java.util.List;
public class ChatCompletionsExample {
public static void main(String[] args) {
String apiKey = System.getenv("ARK_API_KEY");
// The base URL for model invocation
ArkService service = ArkService.builder().apiKey(apiKey).baseUrl("https://ark.cn-beijing.volces.com/api/v3").build();
final List<ChatMessage> messages = new ArrayList<>();
final ChatMessage userMessage = ChatMessage.builder().role(ChatMessageRole.USER).content("请将下面内容进行结构化处理:火山方舟是火山引擎推出的大模型服务平台,提供模型训练、推理、评测、精调等全方位功能与服务,并重点支撑大模型生态。 火山方舟通过稳定可靠的安全互信方案,保障模型提供方的模型安全与模型使用者的信息安全,加速大模型能力渗透到千行百业,助力模型提供方和使用者实现商业新增长。").build();
messages.add(userMessage);
ChatCompletionRequest chatCompletionRequest = ChatCompletionRequest.builder()
.model("doubao-seed-1-6-251015")//Replace with Model ID
.messages(messages)
// .thinking(new ChatCompletionRequest.ChatCompletionRequestThinking("disabled")) // Manually disable deep thinking
.build();
service.createChatCompletion(chatCompletionRequest).getChoices().forEach(choice -> System.out.println(choice.getMessage().getContent()));
// shutdown service
service.shutdownExecutor();
}
}
\`\`\`
`}></RenderMd></Tabs.TabPane></Tabs>);
```
:::tip
For an example of implementing single-turn chat with the Responses API, see [Quick Start](/docs/82379/1958520#17377051).
:::
<span id="3e5edc90"></span>
# Models and APIs
Supported models: [text generation capability](/docs/82379/1330310#b318deb2)
Supported APIs:
* [Responses API](https://www.volcengine.com/docs/82379/1569618): the newly launched API, with streamlined context management, enhanced tool calling, and caching that lowers cost; recommended for new services and new users.
* [Chat API](https://www.volcengine.com/docs/82379/1494384): the widely adopted API; low migration cost for existing services.
<span id="1d866118"></span>
# Usage Examples
<span id="f6222fec"></span>
## Multi-turn conversation
To implement multi-turn conversation, assemble the conversation history, including system, assistant, and user messages, into a single list, so the model understands the context and continues the earlier topic in its answers.
<span aceTableMode="list" aceTableWidth="1,5,5"></span>
|Method |Manage context manually |Manage context by ID |
|---|---|---|
|Example |```JSON|```JSON|\
| |...|...|\
| | "model": "doubao-seed-1-6-251015",| "model": "doubao-seed-1-6-251015",|\
| | "messages":[| "previous_response_id":"<id>",|\
| | {"role": "user", "content": "Hi, tell a joke."},| "input": "What is the punchline of this joke?"|\
| | {"role": "assistant", "content": "Why did the math book look sad? Because it had too many problems! 😄"},|...|\
| | {"role": "user", "content": "What's the punchline of this joke?"}|```|\
| | ]| |\
| |...| |\
| |```| |\
| | | |
|API |[Chat API](https://www.volcengine.com/docs/82379/1494384) |[Responses API](https://www.volcengine.com/docs/82379/1569618) |
> For more details and complete examples, see [Context management](/docs/82379/2123288).
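The manual approach amounts to carrying the full message list forward on every turn. A minimal sketch of that bookkeeping (the assistant reply below is a stand-in for a real model response):

```python
def extend_history(history, role, content):
    # Context is just a list of {"role", "content"} dicts; each turn appends one.
    return history + [{"role": role, "content": content}]

history = extend_history([], "user", "Hi, tell a joke.")
# In a real call, the assistant turn comes from choices[0].message.content.
history = extend_history(history, "assistant",
                         "Why did the math book look sad? Because it had too many problems!")
history = extend_history(history, "user", "What's the punchline of this joke?")
# The whole list is sent as "messages" in the next Chat API request.
print([m["role"] for m in history])  # ['user', 'assistant', 'user']
```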
<span id="78d5cc11"></span>
## Streaming output
<span aceTableMode="list" aceTableWidth="2,1"></span>
|Preview |Advantages |
|---|---|
|<video src="https://p9-arcosite.byteimg.com/tos-cn-i-goo7wpa0wc/0b0ed47ec1b94b20a4f4966aa80130e6~tplv-goo7wpa0wc-image.image" controls></video>|* **Better waiting experience**: start processing content immediately instead of waiting for the full generation to finish.|\
| |* **Real-time progress feedback**: in multi-turn interactions, see which stage the task is currently in.|\
| |* **Higher fault tolerance**: if an error occurs midway, the content generated so far is still available, instead of the no-output failure of a non-streaming call.|\
| |* **Simpler timeout management**: the client-server connection stays active, so long-running complex tasks do not hit connection timeouts. |
Enable streaming output by setting **stream** to `true`.
```JSON
...
"model": "doubao-seed-1-6-251015",
"messages": [
{"role": "user", "content": "深度思考模型与非深度思考模型区别"}
],
"stream": true
...
```
> For complete examples and more details, see [Streaming output](/docs/82379/2123275).
<span id="3821b26a"></span>
## Set the maximum response length
To control cost or response latency, limit the length of the model's answer. Conversely, for long-form answers, such as translating a long document, set `max_tokens` to a larger value to avoid mid-output truncation.
```JSON
...
"model": "doubao-seed-1-6-251015",
"messages": [
{"role": "user","content": "What are some common cruciferous plants?"}
],
"max_tokens": 300
...
```
> For the complete sample code, see [Controlling response length](/docs/82379/2123288#c7fbdbe3).
<span id="8783d86f"></span>
## Asynchronous output
For complex tasks or scenarios with many concurrent requests, the asyncio interface can be used to make concurrent calls, improving program efficiency and user experience.
* Chat API sample code:
```mixin-react
return (<Tabs>
<Tabs.TabPane title="Python" key="nQ7vQXOOFE"><RenderMd content={`\`\`\`Python
import asyncio
import os
# Install SDK: pip install 'volcengine-python-sdk[ark]'
from volcenginesdkarkruntime import AsyncArk
# Initialize the Ark client
client = AsyncArk(
# The base URL for model invocation
base_url="https://ark.cn-beijing.volces.com/api/v3",
# Get your API Key: https://console.volcengine.com/ark/region:ark+cn-beijing/apikey
api_key=os.getenv('ARK_API_KEY'),
)
async def main() -> None:
stream = await client.chat.completions.create(
# Replace with Model ID
model = "doubao-seed-1-6-251015",
messages=[
{"role": "system", "content": "你是 AI 人工智能助手"},
{"role": "user", "content": "常见的十字花科植物有哪些?"},
],
stream=True
)
async for completion in stream:
print(completion.choices[0].delta.content, end="")
print()
if __name__ == "__main__":
asyncio.run(main())
\`\`\`
`}></RenderMd></Tabs.TabPane></Tabs>);
```
* Responses API sample code:
```mixin-react
return (<Tabs>
<Tabs.TabPane title="Python" key="ileVGr66Xy"><RenderMd content={`\`\`\`Python
import asyncio
import os
from volcenginesdkarkruntime import AsyncArk
from volcenginesdkarkruntime.types.responses.response_completed_event import ResponseCompletedEvent
from volcenginesdkarkruntime.types.responses.response_reasoning_summary_text_delta_event import ResponseReasoningSummaryTextDeltaEvent
from volcenginesdkarkruntime.types.responses.response_output_item_added_event import ResponseOutputItemAddedEvent
from volcenginesdkarkruntime.types.responses.response_text_delta_event import ResponseTextDeltaEvent
from volcenginesdkarkruntime.types.responses.response_text_done_event import ResponseTextDoneEvent
client = AsyncArk(
base_url='https://ark.cn-beijing.volces.com/api/v3',
api_key=os.getenv('ARK_API_KEY')
)
async def main():
stream = await client.responses.create(
model="doubao-seed-1-6-251015",
input=[
{"role": "system", "content": "你是 AI 人工智能助手"},
{"role": "user", "content": "常见的十字花科植物有哪些?"},
],
stream=True
)
async for event in stream:
if isinstance(event, ResponseReasoningSummaryTextDeltaEvent):
print(event.delta, end="")
if isinstance(event, ResponseOutputItemAddedEvent):
print("\\noutPutItem " + event.type + " start:")
if isinstance(event, ResponseTextDeltaEvent):
print(event.delta,end="")
if isinstance(event, ResponseTextDoneEvent):
print("\\noutPutTextDone.")
if isinstance(event, ResponseCompletedEvent):
print("Response Completed. Usage = " + event.response.usage.model_dump_json())
if __name__ == "__main__":
asyncio.run(main())
\`\`\`
`}></RenderMd></Tabs.TabPane></Tabs>);
```
<span id="10b8a01c"></span>
# More Features
<span id="a1d6b42a"></span>
## Deep thinking
Before producing an answer, the model systematically analyzes and logically decomposes the input question, then generates the answer from that decomposition.
This can significantly improve answer quality, but increases token consumption. For details, see [Deep thinking](/docs/82379/1449737).
<span id="19b5e705"></span>
## Prompt engineering
Well-designed prompts, for example providing instructions, examples, and clear specifications, can improve the quality and accuracy of the model's output. The practice of optimizing prompts is known as prompt engineering. For details, see [Prompt engineering](/docs/82379/1221660).
<span id="39a7195c"></span>
## Tool calling
By integrating built-in tools or connecting to remote MCP servers, you can extend the model's capabilities so it answers questions or performs tasks better. Currently supported:
* Built-in tools: web search, data retrieval, image processing, and more.
* Calling custom functions.
* Accessing third-party MCP services.
For details, see [Tools overview](/docs/82379/1827538).
<span id="8d0362b6"></span>
## Continuation mode
By prefilling part of the **assistant** role's content, you can guide the model to continue from an existing text fragment, and keep the model consistent in role-play scenarios.
* [Continuation mode (Prefill Response)](/docs/82379/1359497): implement continuation with the [Chat API](https://www.volcengine.com/docs/82379/1494384).
* [Continuation mode](/docs/82379/1958520#a1384090): implement continuation with the [Responses API](https://www.volcengine.com/docs/82379/1569618).
<span id="c22bed1a"></span>
## Structured output (beta)
Constrain the model to output a standard machine-processable format (primarily JSON) rather than natural language, for easier standardized processing or display.
* [Structured output (beta)](/docs/82379/1568221): implement structured output with the [Chat API](https://www.volcengine.com/docs/82379/1494384).
* [Structured output (beta)](/docs/82379/1568221): implement structured output with the [Responses API](https://www.volcengine.com/docs/82379/1569618).
<span id="4f8038b1"></span>
## Batch inference
Ark provides batch inference. For large-scale data processing tasks, use batch inference to get higher throughput at lower cost. For details and usage, see [Batch inference](/docs/82379/1399517).
<span id="3b458a44"></span>
## Error handling
Add error handling to help locate problems.
```mixin-react
return (<Tabs>
<Tabs.TabPane title="Python" key="ylMJa5FOjw"><RenderMd content={`\`\`\`Python
import os
# Install SDK: pip install 'volcengine-python-sdk[ark]'
from volcenginesdkarkruntime import Ark
from volcenginesdkarkruntime._exceptions import ArkAPIError
# Initialize the Ark client
client = Ark(
# The base URL for model invocation
base_url="https://ark.cn-beijing.volces.com/api/v3",
api_key=os.getenv('ARK_API_KEY'),
)
# Streaming
try:
stream = client.chat.completions.create(
# Replace with Model ID
model = "doubao-seed-1-6-251015",
messages=[
{"role": "system", "content": "你是 AI 人工智能助手"},
{"role": "user", "content": "常见的十字花科植物有哪些?"},
],
stream=True
)
for chunk in stream:
if not chunk.choices:
continue
print(chunk.choices[0].delta.content, end="")
print()
except ArkAPIError as e:
print(e)
\`\`\`
`}></RenderMd></Tabs.TabPane>
<Tabs.TabPane title="Go" key="eZMbcdcous"><RenderMd content={`\`\`\`Go
package main
import (
"context"
"errors"
"fmt"
"io"
"os"
"github.com/volcengine/volcengine-go-sdk/service/arkruntime"
"github.com/volcengine/volcengine-go-sdk/service/arkruntime/model"
"github.com/volcengine/volcengine-go-sdk/volcengine"
)
func main() {
client := arkruntime.NewClientWithApiKey(
os.Getenv("ARK_API_KEY"),
// The base URL for model invocation
arkruntime.WithBaseUrl("https://ark.cn-beijing.volces.com/api/v3"),
)
ctx := context.Background()
fmt.Println("----- streaming request -----")
req := model.CreateChatCompletionRequest{
// Replace with Model ID
Model: "doubao-seed-1-6-251015",
Messages: []*model.ChatCompletionMessage{
{
Role: model.ChatMessageRoleSystem,
Content: &model.ChatCompletionMessageContent{
StringValue: volcengine.String("你是 AI 人工智能助手"),
},
},
{
Role: model.ChatMessageRoleUser,
Content: &model.ChatCompletionMessageContent{
StringValue: volcengine.String("常见的十字花科植物有哪些?"),
},
},
},
}
stream, err := client.CreateChatCompletionStream(ctx, req)
if err != nil {
apiErr := &model.APIError{}
if errors.As(err, &apiErr) {
fmt.Printf("stream chat error: %v\\n", apiErr)
}
return
}
defer stream.Close()
for {
recv, err := stream.Recv()
if err == io.EOF {
return
}
if err != nil {
apiErr := &model.APIError{}
if errors.As(err, &apiErr) {
fmt.Printf("stream chat error: %v\\n", apiErr)
}
return
}
if len(recv.Choices) > 0 {
fmt.Print(recv.Choices[0].Delta.Content)
}
}
}
\`\`\`
`}></RenderMd></Tabs.TabPane>
<Tabs.TabPane title="Java" key="CZAaXNryKC"><RenderMd content={`\`\`\`java
package com.volcengine.ark.runtime;
import com.volcengine.ark.runtime.exception.ArkHttpException;
import com.volcengine.ark.runtime.model.completion.chat.ChatCompletionRequest;
import com.volcengine.ark.runtime.model.completion.chat.ChatMessage;
import com.volcengine.ark.runtime.model.completion.chat.ChatMessageRole;
import com.volcengine.ark.runtime.service.ArkService;
import java.util.ArrayList;
import java.util.List;
public class ChatCompletionsExample {
public static void main(String[] args) {
String apiKey = System.getenv("ARK_API_KEY");
// The base URL for model invocation
ArkService service = ArkService.builder().apiKey(apiKey).baseUrl("https://ark.cn-beijing.volces.com/api/v3").build();
System.out.println("----- streaming request -----");
final List<ChatMessage> streamMessages = new ArrayList<>();
final ChatMessage streamSystemMessage = ChatMessage.builder().role(ChatMessageRole.SYSTEM).content("你是 AI 人工智能助手").build();
final ChatMessage streamUserMessage = ChatMessage.builder().role(ChatMessageRole.USER).content("常见的十字花科植物有哪些?").build();
streamMessages.add(streamSystemMessage);
streamMessages.add(streamUserMessage);
ChatCompletionRequest streamChatCompletionRequest = ChatCompletionRequest.builder()
.model("doubao-seed-1-6-251015")//Replace with Model ID
.messages(streamMessages)
.build();
try {
service.streamChatCompletion(streamChatCompletionRequest)
.doOnError(Throwable::printStackTrace)
.blockingForEach(
choice -> {
if (choice.getChoices().size() > 0) {
System.out.print(choice.getChoices().get(0).getMessage().getContent());
}
}
);
} catch (ArkHttpException e) {
System.out.print(e.toString());
}
// shutdown service
service.shutdownExecutor();
}
}
\`\`\`
`}></RenderMd></Tabs.TabPane></Tabs>);
```
<span id="b411f06e"></span>
## Conversation encryption
On top of the default network-layer encryption, Ark also provides free application-layer encryption, giving your inference session data stronger protection. Enabling it takes only one extra line of code. For the complete sample code, see [Encrypting data](/docs/82379/1544136#23274b89); for more on how it works, see [Application-layer encryption of inference session data](/docs/82379/1389905).
<span id="ca2551d7"></span>
# Usage Notes
* Key model limits:
    * Context window: the amount of content a single request can handle, including user input and model output, measured in tokens. Content beyond the maximum context length is truncated and output stops. If you hit truncation caused by the context limit, switch to a model that supports a larger context window.
    * Max tokens: the maximum length of the model's output in a single call. If you hit this limit, see [Continuation mode (Prefill Response)](/docs/82379/1359497) to stitch together the full content across multiple continuation calls.
    * Tokens per minute (TPM): the per-minute content limit, in tokens, for a given model (all versions combined) under one account. If the default TPM limit cannot support your business, raise a [ticket](https://console.volcengine.com/workorder/create?step=2&SubProductID=P00001166) to ask after-sales support to increase the quota. For example, if a model's TPM is 5 million, all endpoints of all versions of that model created under one primary account share that quota.
    * Requests per minute (RPM): the per-minute request limit for a given model (all versions combined) under one account, analogous to TPM above. If the default RPM limit cannot support your business, raise a [ticket](https://console.volcengine.com/workorder/create?step=2&SubProductID=P00001166) to increase the quota.
    * For detailed model specifications, see the [model list](/docs/82379/1330310).
* Usage queries:
    * Token usage of a single request: see the **usage** structure in the response.
    * Token usage of input/output content: estimate it with the [Tokenization API](https://www.volcengine.com/docs/82379/1528728) or the [token calculator](https://console.volcengine.com/ark/region:ark+cn-beijing/tokenCalculator).
    * Token usage per account/project/endpoint: see the [usage statistics](https://console.volcengine.com/ark/region:ark+cn-beijing/usageTracking) page.
<span id="901dd971"></span>
# FAQ
[FAQ](/docs/82379/1359411) - [Online inference](/docs/82379/1359411#aa45e6c0): common questions about online inference; if you hit an error, look here for a solution first.


@@ -1,627 +0,0 @@
<span id="60c34d72"></span>
# WebSocket
> Call the API with the appid & access_token obtained in the account application section
> The text is sent in one shot; the backend returns audio data while it synthesizes
<span id="9e6b61a2"></span>
## 1. Endpoints
> V1:
> **wss://openspeech.bytedance.com/api/v1/tts/ws_binary (V1, WebSocket unidirectional streaming)**
> **https://openspeech.bytedance.com/api/v1/tts (V1, HTTP, non-streaming)**
> V3:
> **wss://openspeech.bytedance.com/api/v3/tts/unidirectional/stream (V3, WSS unidirectional streaming)**
> [V3 WebSocket unidirectional streaming docs](https://www.volcengine.com/docs/6561/1719100)
> **wss://openspeech.bytedance.com/api/v3/tts/bidirection (V3, WSS bidirectional streaming)**
> [V3 WebSocket bidirectional streaming docs](https://www.volcengine.com/docs/6561/1329505)
> **https://openspeech.bytedance.com/api/v3/tts/unidirectional (V3, HTTP unidirectional streaming)**
> [V3 HTTP unidirectional streaming docs](https://www.volcengine.com/docs/6561/1598757)
:::warning
The V3 endpoints are recommended for all large-model voices; they deliver noticeably better latency
:::
<span id="34dcdf3a"></span>
## 2. Authentication
Authentication uses a Bearer token: add `"Authorization": "Bearer; {token}"` to the request header, and put the corresponding appid in the request JSON.
:::warning
Bearer and the token are separated by a semicolon `;`; do not keep the {} when substituting
:::
For where to find AppID/Token/Cluster and similar information, see [Console usage FAQ, Q1](/docs/6561/196768#q1哪里可以获取到以下参数appidclustertokenauthorization-typesecret-key-)
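As a sketch, building that header in code looks like the following (the token value is a placeholder):

```python
def auth_header(token: str) -> dict:
    # "Bearer" and the token are joined by the semicolon; the braces from the
    # documentation template are not kept.
    return {"Authorization": f"Bearer; {token}"}

print(auth_header("your-access-token"))
```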
<span id="f1d92aff"></span>
## 3. Request Format
<span id="14624bd9"></span>
### 3.1 Binary protocol
<span id="7574a509"></span>
#### Message format
![Image](https://lf3-volc-editor.volccdn.com/obj/volcfe/sop-public/upload_cc1c1cdd61bf29f5bde066dc693dcb2b.png =1816x)
All fields are stored in [big-endian](https://zh.wikipedia.org/wiki/%E5%AD%97%E8%8A%82%E5%BA%8F#%E5%A4%A7%E7%AB%AF%E5%BA%8F) byte order.
**Field descriptions**
| | | | \
|Field (size in bits) |Description |Values |
|---|---|---|
| | | | \
|Protocol version (4) |Different protocol versions may be used in the future, so this field keeps the client and server aligned on the version. |`0b0001` - version 1 (currently the only version) |
| | | | \
|Header size (4) |The actual header size is `header size value x 4` bytes. |\
| |The special value `0b1111` means the header size is greater than or equal to 60 (15 x 4 bytes), i.e. a header extension field is present. |`0b0001` - header size 4 (1 x 4) |\
| | |`0b0010` - header size 8 (2 x 4) |\
| | |`0b1010` - header size 40 (10 x 4) |\
| | |`0b1110` - header size = 56 (14 x 4) |\
| | |`0b1111` - header size is 60 or larger; the actual size is defined in the header extension |
| | | | \
|Message type (4) |Defines the message type. |`0b0001` - full client request. |\
| | |`~~0b1001~~` ~~- full server response (deprecated).~~ |\
| | |`0b1011` - Audio-only server response (ACK). |\
| | |`0b1111` - Error message from server (e.g. wrong message type, unsupported serialization method, etc.) |
| | | | \
|Message type specific flags (4) |The meaning of the flags depends on the message type. |\
| |See the message type section for details. | |
| | | | \
|Message serialization method (4) |Defines how the payload is serialized. |\
| |Note: it is only meaningful for certain message types (e.g. the Audio-only server response `0b1011` needs no serialization). |`0b0000` - no serialization (raw bytes) |\
| | |`0b0001` - JSON |\
| | |`0b1111` - custom type, defined in the header extension |
| | | | \
|Message compression (4) |Defines the payload compression method. |\
| |The payload size field (if present, depending on the message type) is not compressed, and payload size refers to the size of the payload after compression. |\
| |The header is not compressed. |`0b0000` - no compression |\
| | |`0b0001` - gzip |\
| | |`0b1111` - custom compression method, defined in the header extension |
| | | | \
|Reserved (8) |Reserved; also serves as padding (making the total header size 4 bytes). |`0x00` - currently always 0 |
<span id="95a31a2c"></span>
#### Message type details
Currently all TTS WebSocket requests use the full client request format, whether the operation is "query" or "submit".
<span id="d05f01f6"></span>
#### Full client request
* Header size is `b0001` (i.e. 4 bytes, no header extension).
* Message type is `b0001`.
* Message type specific flags are fixed at `b0000`.
* Message serialization method is `b0001` (JSON); see the table above for the payload fields.
* If the payload is gzip-compressed, the payload size is the compressed size.
<span id="6e82d7df"></span>
#### Audio-only server response
* Header size should be `b0001`.
* Message type is `b1011`.
* Possible message type specific flags:
* `b0000` - no sequence number.
* `b0001` - sequence number > 0.
* `b0010` or `b0011` - sequence number < 0, marking the last message from the server; the client should then concatenate all audio segments (if there are several).
* Message serialization method is `b0000` (raw bytes).
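To make the layout concrete, here is a minimal sketch that packs and unpacks the 4-byte header described in the field table (big-endian bit layout; payload handling is omitted):

```python
def build_header(version=0b0001, header_size=0b0001, msg_type=0b0001,
                 flags=0b0000, serialization=0b0001, compression=0b0000):
    # Each byte holds two 4-bit fields; the last byte is the reserved field.
    return bytes([
        (version << 4) | header_size,
        (msg_type << 4) | flags,
        (serialization << 4) | compression,
        0x00,  # reserved
    ])

def parse_header(data: bytes) -> dict:
    b0, b1, b2 = data[0], data[1], data[2]
    return {
        "version": b0 >> 4, "header_size": b0 & 0x0F,
        "msg_type": b1 >> 4, "flags": b1 & 0x0F,
        "serialization": b2 >> 4, "compression": b2 & 0x0F,
    }

# The defaults encode a full client request: header size 4, JSON payload,
# no compression.
print(parse_header(build_header()))
```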
<span id="4f9397bc"></span>
## 4. Notes
* For every synthesis request, the reqid parameter must be set anew and must be unique (generating it with UUID v4 is recommended)
* In the WebSocket demo, a single connection supports only a single synthesis; to synthesize multiple times, implement that yourself. After each WebSocket connection is created, send every packet serially, in order. Once a synthesis finishes, a new synthesis request can be sent.
* operation must be set to submit to get streaming responses
* The voices of ["Doubao Speech Synthesis Model 2.0"](https://www.volcengine.com/docs/6561/1257544), e.g. "zh_female_vv_uranus_bigtts", are not supported; to use them, the V3 endpoints are recommended
* After the WebSocket handshake succeeds, these response headers are returned:
| | | | \
|Key |Description |Example value |
|---|---|---|
| | | | \
|X-Tt-Logid |The logid returned by the server; we recommend capturing and logging it to make troubleshooting easier |202407261553070FACFE6D19421815D605 |
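A minimal sketch of assembling a request payload with a fresh reqid on every call (the top-level sections follow the parameter list; the exact "request" section and its fields are an assumption based on the notes above, and all values are placeholders):

```python
import uuid

def new_tts_payload(text, appid="your-appid", voice_type="your-voice-type"):
    return {
        "app": {"appid": appid, "token": "any-non-empty-string", "cluster": "volcano_tts"},
        "user": {"uid": "demo-uid"},
        "audio": {"voice_type": voice_type, "encoding": "mp3"},
        # reqid must be regenerated for every synthesis request; uuid4 keeps it
        # unique. operation=submit selects the streaming response.
        "request": {"reqid": str(uuid.uuid4()), "text": text, "operation": "submit"},
    }

a = new_tts_payload("你好")
b = new_tts_payload("你好")
print(a["request"]["reqid"] != b["request"]["reqid"])  # each call gets a unique reqid
```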
<span id="fe504ac4"></span>
## 5. Examples
```mixin-react
return (<Tabs>
<Tabs.TabPane title="Python调用示例" key="buVUUlzaRC"><RenderMd content={`<span id="fccb89b1"></span>
### Prerequisites
* Before calling, you need the following information:
* \`<appid>\`: the APP ID obtained from the console; see [Console usage FAQ, Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F).
* \`<access_token>\`: the Access Token obtained from the console; see [Console usage FAQ, Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F).
* \`<voice_type>\`: the ID of the voice you intend to use; see the [large-model voice list](https://www.volcengine.com/docs/6561/1257544).
<span id="824abc9d"></span>
### Python environment
* Python 3.9 or later.
* pip 25.1.1 or later; you can upgrade it with the command below.
\`\`\`Bash
python3 -m pip install --upgrade pip
\`\`\`
<span id="5cbec8af"></span>
### Download the sample code
<Attachment link="https://p9-arcosite.byteimg.com/tos-cn-i-goo7wpa0wc/90fc1f44eaac49f0b4e2cbabdaee8010~tplv-goo7wpa0wc-image.image" name="volcengine_binary_demo.tar.gz" ></Attachment>
<span id="44d95afb"></span>
### Unpack the code and install dependencies
\`\`\`Bash
mkdir -p volcengine_binary_demo
tar xvzf volcengine_binary_demo.tar.gz -C ./volcengine_binary_demo
cd volcengine_binary_demo
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install --upgrade pip
pip3 install -e .
\`\`\`
<span id="fdf69422"></span>
### Make the call
> Replace \`<appid>\` with your APP ID.
> Replace \`<access_token>\` with your Access Token.
> Replace \`<voice_type>\` with the ID of the voice you intend to use, e.g. \`zh_female_cancan_mars_bigtts\`.
\`\`\`Bash
python3 examples/volcengine/binary.py --appid <appid> --access_token <access_token> --voice_type <voice_type> --text "你好,我是火山引擎的语音合成服务。这是一个美好的旅程。"
\`\`\`
`}></RenderMd></Tabs.TabPane>
<Tabs.TabPane title="Java调用示例" key="bfjarx0zlZ"><RenderMd content={`<span id="e0bca07e"></span>
### Prerequisites
* Before calling, you need the following information:
* \`<appid>\`: the APP ID obtained from the console; see [Console usage FAQ, Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F).
* \`<access_token>\`: the Access Token obtained from the console; see [Console usage FAQ, Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F).
* \`<voice_type>\`: the ID of the voice you intend to use; see the [large-model voice list](https://www.volcengine.com/docs/6561/1257544).
<span id="5f338843"></span>
### Java environment
* Java 21 or later.
* Maven 3.9.10 or later.
<span id="96af51fa"></span>
### Download the sample code
<Attachment link="https://p9-arcosite.byteimg.com/tos-cn-i-goo7wpa0wc/ba78519b2dc0459fb7a6935b63775c66~tplv-goo7wpa0wc-image.image" name="volcengine_binary_demo.tar.gz" ></Attachment>
<span id="8e0ecd00"></span>
### Unpack the code and install dependencies
\`\`\`Bash
mkdir -p volcengine_binary_demo
tar xvzf volcengine_binary_demo.tar.gz -C ./volcengine_binary_demo
cd volcengine_binary_demo
\`\`\`
<span id="fa0a6230"></span>
### Make the call
> Replace \`<appid>\` with your APP ID.
> Replace \`<access_token>\` with your Access Token.
> Replace \`<voice_type>\` with the ID of the voice you intend to use, e.g. \`zh_female_cancan_mars_bigtts\`.
\`\`\`Bash
mvn compile exec:java -Dexec.mainClass=com.speech.volcengine.Binary -DappId=<appid> -DaccessToken=<access_token> -Dvoice=<voice_type> -Dtext="**你好**,我是豆包语音助手,很高兴认识你。这是一个愉快的旅程。"
\`\`\`
`}></RenderMd></Tabs.TabPane>
<Tabs.TabPane title="Go调用示例" key="s8zQJ7cCr3"><RenderMd content={`<span id="2733f4d4"></span>
### Prerequisites
* Before calling, you need the following information:
* \`<appid>\`: the APP ID obtained from the console; see [Console usage FAQ, Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F).
* \`<access_token>\`: the Access Token obtained from the console; see [Console usage FAQ, Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F).
* \`<voice_type>\`: the ID of the voice you intend to use; see the [large-model voice list](https://www.volcengine.com/docs/6561/1257544).
<span id="ee9617a6"></span>
### Go environment
* Go 1.21.0 or later.
<span id="cf9bb2bf"></span>
### Download the sample code
<Attachment link="https://p9-arcosite.byteimg.com/tos-cn-i-goo7wpa0wc/c553a4a4373840d4a4870a1ef2a4e494~tplv-goo7wpa0wc-image.image" name="volcengine_binary_demo.tar.gz" ></Attachment>
<span id="363963c4"></span>
### Unpack the code and install dependencies
\`\`\`Bash
mkdir -p volcengine_binary_demo
tar xvzf volcengine_binary_demo.tar.gz -C ./volcengine_binary_demo
cd volcengine_binary_demo
\`\`\`
<span id="f0acb02c"></span>
### Make the call
> Replace \`<appid>\` with your APP ID.
> Replace \`<access_token>\` with your Access Token.
> Replace \`<voice_type>\` with the ID of the voice you intend to use, e.g. \`zh_female_cancan_mars_bigtts\`.
\`\`\`Bash
go run volcengine/binary/main.go --appid <appid> --access_token <access_token> --voice_type <voice_type> --text "**你好**,我是火山引擎的语音合成服务。"
\`\`\`
`}></RenderMd></Tabs.TabPane>
<Tabs.TabPane title="C#调用示例" key="Thg5rLaSjq"><RenderMd content={`<span id="c60c1d5f"></span>
### Prerequisites
* Before calling, you need the following information:
* \`<appid>\`: the APP ID obtained from the console; see [Console usage FAQ, Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F).
* \`<access_token>\`: the Access Token obtained from the console; see [Console usage FAQ, Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F).
* \`<voice_type>\`: the ID of the voice you intend to use; see the [large-model voice list](https://www.volcengine.com/docs/6561/1257544).
<span id="cf2199fe"></span>
### C# environment
* .NET 9.0.
<span id="f7e91692"></span>
### Download the sample code
<Attachment link="https://p9-arcosite.byteimg.com/tos-cn-i-goo7wpa0wc/a95ff3e7604d4bb4ade8fb49e110fef5~tplv-goo7wpa0wc-image.image" name="volcengine_binary_demo.tar.gz" ></Attachment>
<span id="f9131897"></span>
### Unpack the code and install dependencies
\`\`\`Bash
mkdir -p volcengine_binary_demo
tar xvzf volcengine_binary_demo.tar.gz -C ./volcengine_binary_demo
cd volcengine_binary_demo
\`\`\`
<span id="5834585b"></span>
### Make the call
> Replace \`<appid>\` with your APP ID.
> Replace \`<access_token>\` with your Access Token.
> Replace \`<voice_type>\` with the ID of the voice you intend to use, e.g. \`zh_female_cancan_mars_bigtts\`.
\`\`\`Bash
dotnet run --project Volcengine/Binary/Volcengine.Speech.Binary.csproj -- --appid <appid> --access_token <access_token> --voice_type <voice_type> --text "**你好**,这是一个测试文本。我们正在测试文本转语音功能。"
\`\`\`
`}></RenderMd></Tabs.TabPane>
<Tabs.TabPane title="TypeScript调用示例" key="p1GEs3rWU7"><RenderMd content={`<span id="8b865031"></span>
### Prerequisites
* Before calling, you need the following information:
* \`<appid>\`: the APP ID obtained from the console; see [Console usage FAQ, Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F).
* \`<access_token>\`: the Access Token obtained from the console; see [Console usage FAQ, Q1](https://www.volcengine.com/docs/6561/196768#q1%EF%BC%9A%E5%93%AA%E9%87%8C%E5%8F%AF%E4%BB%A5%E8%8E%B7%E5%8F%96%E5%88%B0%E4%BB%A5%E4%B8%8B%E5%8F%82%E6%95%B0appid%EF%BC%8Ccluster%EF%BC%8Ctoken%EF%BC%8Cauthorization-type%EF%BC%8Csecret-key-%EF%BC%9F).
* \`<voice_type>\`: the ID of the voice you intend to use; see the [large-model voice list](https://www.volcengine.com/docs/6561/1257544).
<span id="e7697c4e"></span>
### Node environment
* Node v24.0 or later.
<span id="03fe45f1"></span>
### Download the sample code
<Attachment link="https://p9-arcosite.byteimg.com/tos-cn-i-goo7wpa0wc/12ef1b1188a84f0c8883a0114da741ad~tplv-goo7wpa0wc-image.image" name="volcengine_binary_demo.tar.gz" ></Attachment>
<span id="13e8a71a"></span>
### Unpack the code and install dependencies
\`\`\`Bash
mkdir -p volcengine_binary_demo
tar xvzf volcengine_binary_demo.tar.gz -C ./volcengine_binary_demo
cd volcengine_binary_demo
npm install
npm install -g typescript
npm install -g ts-node
\`\`\`
<span id="0c57973f"></span>
### Make the call
> Replace \`<appid>\` with your APP ID.
> Replace \`<access_token>\` with your Access Token.
> Replace \`<voice_type>\` with the ID of the voice you intend to use, e.g. \`zh_female_cancan_mars_bigtts\`.
\`\`\`Bash
npx ts-node src/volcengine/binary.ts --appid <appid> --access_token <access_token> --voice_type <voice_type> --text "**你好**,我是火山引擎的语音合成服务。"
\`\`\`
`}></RenderMd></Tabs.TabPane></Tabs>);
```
<span id="9ea45813"></span>
# HTTP
> Call the API with the appid & access_token obtained in the account application section
> The full audio data is returned in one response after all of the text has been synthesized
<span id="4d23f0f6"></span>
## 1. Endpoint
The endpoint is **https://openspeech.bytedance.com/api/v1/tts**
<span id="6f96a6fa"></span>
## 2. Authentication
Authentication uses a Bearer token.
1) Add "Authorization":"Bearer;${token}" to the request header
:::warning
Bearer and the token are separated by a semicolon `;`; do not keep the ${} when substituting
:::
For where to find AppID/Token/Cluster and similar information, see [Console usage FAQ, Q1](/docs/6561/196768#q1哪里可以获取到以下参数appidclustertokenauthorization-typesecret-key-)
<span id="a8c19c9a"></span>
## 3. Notes
* Send the request with HTTP POST; the result is returned as JSON and must be parsed
* Because JSON cannot carry binary audio directly, the audio is base64-encoded; decode the base64 to obtain the binary audio
* For every synthesis request, the reqid parameter must be set anew and must be unique (generating it with a UUID/GUID is recommended)
* The voices of ["Doubao Speech Synthesis Model 2.0"](https://www.volcengine.com/docs/6561/1257544), e.g. "zh_female_vv_uranus_bigtts", are not supported; to use them, the V3 endpoints are recommended
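As a sketch, extracting the audio from the JSON response looks like this (the "data" field name for the base64 audio is an assumption here; check the response you actually receive):

```python
import base64
import json

def extract_audio(response_body: str) -> bytes:
    # The response is JSON; the synthesized audio arrives base64-encoded.
    payload = json.loads(response_body)
    return base64.b64decode(payload["data"])

# Illustrative round trip with fake response data:
fake_body = json.dumps({"data": base64.b64encode(b"\x00\x01binary-audio").decode()})
audio = extract_audio(fake_body)
print(len(audio))  # 14
```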
<span id="参数列表"></span>
# Parameter List
> WebSocket and HTTP calls use the same parameters
<span id="931a7b76"></span>
## Request parameters
| | | | | | | \
|字段 |含义 |层级 |格式 |必需 |备注 |
|---|---|---|---|---|---|
| | | | | | | \
|app |应用相关配置 |1 |dict |✓ | |
| | | | | | | \
|appid |应用标识 |2 |string |✓ |需要申请 |
| | | | | | | \
|token |应用令牌 |2 |string |✓ |无实际鉴权作用的Fake token可传任意非空字符串 |
| | | | | | | \
|cluster |业务集群 |2 |string |✓ |volcano_tts |
| | | | | | | \
|user |用户相关配置 |1 |dict |✓ | |
| | | | | | | \
|uid |用户标识 |2 |string |✓ |可传任意非空字符串,传入值可以通过服务端日志追溯 |
| | | | | | | \
|audio |音频相关配置 |1 |dict |✓ | |
| | | | | | | \
|voice_type |音色类型 |2 |string |✓ | |
| | | | | | | \
|emotion |音色情感 |2 |string | |设置音色的情感。示例:"emotion": "angry" |\
| | | | | |注:当前仅部分音色支持设置情感,且不同音色支持的情感范围存在不同。 |\
| | | | | |详见:[大模型语音合成API-音色列表-多情感音色](https://www.volcengine.com/docs/6561/1257544) |
| | | | | | | \
|enable_emotion |开启音色情感 |2 |bool | |若需设置音色情感需将enable_emotion设为true |\
| | | | | |示例:"enable_emotion": True |
| | | | | | | \
|emotion_scale |情绪值设置 |2 |float | |调用emotion设置情感参数后可使用emotion_scale进一步设置情绪值范围1~5不设置时默认值为4。 |\
| | | | | |注理论上情绪值越大情感越明显。但情绪值1~5实际为非线性增长可能存在超过某个值后情绪增加不明显例如设置3和5时情绪值可能接近。 |
| | | | | | | \
|encoding |音频编码格式 |2 |string | |wav / pcm / ogg_opus / mp3默认为 pcm |\
| | | | | |<span style="background-color: rgba(255,246,122, 0.8)">注意wav 不支持流式</span> |
| | | | | | | \
|speed_ratio |语速 |2 |float | |[0.1,2],默认为 1通常保留一位小数即可 |
| | | | | | | \
|rate |音频采样率 |2 |int | |默认为 24000可选800016000 |
| | | | | | | \
|bitrate |比特率 |2 |int | |单位 kb/s默认160 kb/s |\
| | | | | |**注:** |\
| | | | | |bitrate只针对MP3格式wav计算比特率跟pcm一样是 比特率 (bps) = 采样率 × 位深度 × 声道数 |\
| | | | | |目前大模型TTS只能改采样率所以对于wav格式来说只能通过改采样率来变更音频的比特率 |
| | | | | | | \
|explicit_language |明确语种 |2 |string | |仅读指定语种的文本 |\
| | | | | |精品音色和 ICL 声音复刻场景: |\
| | | | | | |\
| | | | | |* 不给定参数,正常中英混 |\
| | | | | |* `crosslingual` 启用多语种前端(包含`zh/en/ja/es-mx/id/pt-br` |\
| | | | | |* `zh-cn` 中文为主,支持中英混 |\
| | | | | |* `en` 仅英文 |\
| | | | | |* `ja` 仅日文 |\
| | | | | |* `es-mx` 仅墨西 |\
| | | | | |* `id` 仅印尼 |\
| | | | | |* `pt-br` 仅巴葡 |\
| | | | | | |\
| | | | | |DIT 声音复刻场景: |\
| | | | | |当音色是使用model_type=2训练的即采用dit标准版效果时建议指定明确语种目前支持 |\
| | | | | | |\
| | | | | |* 不给定参数,启用多语种前端`zh,en,ja,es-mx,id,pt-br,de,fr` |\
| | | | | |* `zh,en,ja,es-mx,id,pt-br,de,fr` 启用多语种前端 |\
| | | | | |* `zh-cn` 中文为主,支持中英混 |\
| | | | | |* `en` 仅英文 |\
| | | | | |* `ja` 仅日文 |\
| | | | | |* `es-mx` 仅墨西 |\
| | | | | |* `id` 仅印尼 |\
| | | | | |* `pt-br` 仅巴葡 |\
| | | | | |* `de` 仅德语 |\
| | | | | |* `fr` 仅法语 |\
| | | | | | |\
| | | | | |当音色是使用model_type=3训练的即采用dit还原版效果时必须指定明确语种目前支持 |\
| | | | | | |\
| | | | | |* 不给定参数,正常中英混 |\
| | | | | |* `zh-cn` 中文为主,支持中英混 |\
| | | | | |* `en` 仅英文 |
| | | | | | | \
|context_language |参考语种 |2 |string | |给模型提供参考的语种 |\
| | | | | | |\
| | | | | |* 不给定 西欧语种采用英语 |\
| | | | | |* id 西欧语种采用印尼 |\
| | | | | |* es 西欧语种采用墨西 |\
| | | | | |* pt 西欧语种采用巴葡 |
| | | | | | | \
|loudness_ratio |音量调节 |2 |float | |[0.5,2]默认为1通常保留一位小数即可。0.5代表原音量0.5倍2代表原音量2倍 |
| | | | | | | \
|request |请求相关配置 |1 |dict |✓ | |
| | | | | | | \
|reqid |请求标识 |2 |string |✓ |需要保证每次调用传入值唯一,建议使用 UUID |
| | | | | | | \
|text |文本 |2 |string |✓ |合成语音的文本,长度限制 1024 字节UTF-8 编码建议小于300字符超出容易增加badcase出现概率或报错 |
| | | | | | | \
|model |模型版本 |\
| | |2 |\
| | | |string |否 |模型版本,传`seed-tts-1.1`较默认版本音质有提升,并且延时更优,不传为默认效果。 |\
| | | | | |注若使用1.1模型效果在复刻场景中会放大训练音频prompt特质因此对prompt的要求更高使用高质量的训练音频可以获得更优的音质效果。 |
| | | | | | | \
|text_type |文本类型 |2 |string | |使用 ssml 时需要指定,值为"ssml" |
| | | | | | | \
|silence_duration |句尾静音 |2 |float | |设置该参数可在句尾增加静音时长范围0~30000ms。增加的句尾静音主要针对传入文本最后的句尾而非每句话的句尾若启用该参数必须在request下首先设置enable_trailing_silence_audio = true |
| | | | | | | \
|with_timestamp |时间戳相关 |2 |int |\
| | | |string | |传入1时表示启用将返回TN后文本的时间戳例如2025。根据语义TN后文本为“两千零二十五”或“二零二五”。 |\
| | | | | |注:原文本中的多个标点连用或者空格仍会被处理,但不影响时间戳的连贯性(仅限大模型场景使用)。 |\
| | | | | |附加说明(小模型和大模型时间戳原理差异): |\
| | | | | |1. 小模型依据前端模型生成时间戳然后合成音频。在处理时间戳时TN前后文本进行了映射所以小模型可返回TN前原文本的时间戳即保留原文中的阿拉伯数字或者特殊符号等。 |\
| | | | | |2. 大模型在对传入文本语义理解后合成音频再针对合成音频进行TN后打轴以输出时间戳。若不采用TN后文本输出的时间戳将与合成音频无法对齐所以大模型返回的时间戳对应TN后的文本。 |
| | | | | | | \
|operation |操作 |2 |string |✓ |query非流式http 只能 query / submit流式 |
| | | | | | | \
|extra_param |附加参数 |2 |jsonstring | | |
| | | | | | | \
|disable_markdown_filter | |3 |bool | |是否开启markdown解析过滤 |\
| | | | | |为true时解析并过滤markdown语法例如**你好**,会读为“你好”, |\
| | | | | |为false时不解析不过滤例如**你好**,会读为“星星‘你好’星星” |\
| | | | | |示例:"disable_markdown_filter": True |
| | | | | | | \
|enable_latex_tn | |3 |bool | |是否可以播报latex公式需将disable_markdown_filter设为true |\
| | | | | |示例:"enable_latex_tn": True |
| | | | | | | \
|mute_cut_remain_ms |句首静音参数 |3 |string | |该参数需配合mute_cut_threshold参数一起使用其中 |\
| | | | | |"mute_cut_threshold": "400", // 静音判断的阈值(音量小于该值时判定为静音) |\
| | | | | |"mute_cut_remain_ms": "50", // 需要保留的静音长度 |\
| | | | | |注参数和value都为string格式 |\
| | | | | |以python为示例 |\
| | | | | |```Python |\
| | | | | |"extra_param":("{\"mute_cut_threshold\":\"400\", \"mute_cut_remain_ms\": \"0\"}") |\
| | | | | |``` |\
| | | | | | |\
| | | | | |特别提醒: |\
| | | | | | |\
| | | | | |* 因MP3格式的特殊性句首始终会存在100ms内的静音无法消除WAV格式的音频句首静音可全部消除建议依照自身业务需求综合判断选择 |
| | | | | | | \
|disable_emoji_filter |emoji不过滤显示 |3 |bool | |开启emoji表情在文本中不过滤显示默认为False建议搭配时间戳参数一起使用。 |\
| | | | | |Python示例`"extra_param": json.dumps({"disable_emoji_filter": True})` |
| | | | | | | \
|unsupported_char_ratio_thresh |不支持语种占比阈值 |3 |float | |默认: 0.3,最大值: 1.0 |\
| | | | | |检测出不支持合成的文本超过设置的比例,则会返回错误。 |\
| | | | | |Python示例`"extra_param": json.dumps({"unsupported_char_ratio_thresh": 0.3})` |
| | | | | | | \
|aigc_watermark |是否在合成结尾增加音频节奏标识 |3 |bool | |默认: false |\
| | | | | |Python示例`"extra_param": json.dumps({"aigc_watermark": True})` |
| | | | | | | \
|cache_config |缓存相关参数 |3 |dict | |开启缓存开启后合成相同文本时服务会直接读取缓存返回上一次合成该文本的音频可明显加快相同文本的合成速率缓存数据保留时间1小时。 |\
| | | | | |(通过缓存返回的数据不会附带时间戳) |\
| | | | | |Python示例`"extra_param": json.dumps({"cache_config": {"text_type": 1,"use_cache": True}})` |
| | | | | | | \
|text_type |缓存相关参数 |4 |int | |和use_cache参数一起使用需要开启缓存时传1 |
| | | | | | | \
|use_cache |缓存相关参数 |4 |bool | |和text_type参数一起使用需要开启缓存时传true |
备注:
1. 已支持字级别时间戳能力ssml文本类型不支持
2. ssml 能力已支持,详见 [SSML 标记语言--豆包语音-火山引擎 (volcengine.com)](https://www.volcengine.com/docs/6561/1330194)
3. 暂时不支持音高调节
4. 大模型音色语种支持中英混
5. 大模型非双向流式已支持latex公式
6. 在 websocket/http 握手成功后,会返回这些 Response header
| | | | \
|Key |说明 |Value 示例 |
|---|---|---|
| | | | \
|X-Tt-Logid |服务端返回的 logid建议用户获取和打印方便定位问题使用默认格式即可不要自定义格式 |202407261553070FACFE6D19421815D605 |
请求示例:
```json
{
    "app": {
        "appid": "appid123",
        "token": "access_token",
        "cluster": "volcano_tts"
    },
    "user": {
        "uid": "uid123"
    },
    "audio": {
        "voice_type": "zh_male_M392_conversation_wvae_bigtts",
        "encoding": "mp3",
        "speed_ratio": 1.0
    },
    "request": {
        "reqid": "uuid",
        "text": "字节跳动语音合成",
        "operation": "query"
    }
}
```
<span id="返回参数"></span>
## 返回参数
| | | | | | \
|字段 |含义 |层级 |格式 |备注 |
|---|---|---|---|---|
| | | | | | \
|reqid |请求 ID |1 |string |请求 ID,与传入的参数中 reqid 一致 |
| | | | | | \
|code |请求状态码 |1 |int |错误码,参考下方说明 |
| | | | | | \
|message |请求状态信息 |1 |string |错误信息 |
| | | | | | \
|sequence |音频段序号 |1 |int |负数表示合成完毕 |
| | | | | | \
|data |合成音频 |1 |string |返回的音频数据base64 编码 |
| | | | | | \
|addition |额外信息 |1 |string |额外信息父节点 |
| | | | | | \
|duration |音频时长 |2 |string |返回音频的长度,单位 ms |
响应示例
```json
{
    "reqid": "reqid",
    "code": 3000,
    "operation": "query",
    "message": "Success",
    "sequence": -1,
    "data": "base64 encoded binary data",
    "addition": {
        "duration": "1960"
    }
}
```
<span id="ca57b94d"></span>
## 注意事项
* websocket 单条连接仅支持单次合成,若需要合成多次,则需要多次建立连接
* 每次合成时 reqid 这个参数需要重新设置,且要保证唯一性(建议使用 uuid.V4 生成)
* operation 需要设置为 submit
<span id="返回码说明"></span>
# 返回码说明
| | | | | \
|错误码 |错误描述 |举例 |建议行为 |
|---|---|---|---|
| | | | | \
|3000 |请求正确 |正常合成 |正常处理 |
| | | | | \
|3001 |无效的请求 |一些参数的值非法,比如 operation 配置错误 |检查参数 |
| | | | | \
|3003 |并发超限 |超过在线设置的并发阈值 |重试;使用 sdk 的情况下切换离线 |
| | | | | \
|3005 |后端服务忙 |后端服务器负载高 |重试;使用 sdk 的情况下切换离线 |
| | | | | \
|3006 |服务中断 |请求已完成/失败之后,相同 reqid 再次请求 |检查参数 |
| | | | | \
|3010 |文本长度超限 |单次请求超过设置的文本长度阈值 |检查参数 |
| | | | | \
|3011 |无效文本 |参数有误或者文本为空、文本与语种不匹配、文本只含标点 |检查参数 |
| | | | | \
|3030 |处理超时 |单次请求超过服务最长时间限制 |重试或检查文本 |
| | | | | \
|3031 |处理错误 |后端出现异常 |重试;使用 sdk 的情况下切换离线 |
| | | | | \
|3032 |等待获取音频超时 |后端网络异常 |重试;使用 sdk 的情况下切换离线 |
| | | | | \
|3040 |后端链路连接错误 |后端网络异常 |重试 |
| | | | | \
|3050 |音色不存在 |检查使用的 voice_type 代号 |检查参数 |
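按上表"建议行为"一列,客户端可以做一个简单的返回码分流(示意,集合取值来自上表):

```python
OK = 3000
RETRYABLE = {3003, 3005, 3030, 3031, 3032, 3040}   # 建议重试(或切换离线)的返回码
PARAM_ERRORS = {3001, 3006, 3010, 3011, 3050}      # 建议检查参数的返回码

def next_action(code: int) -> str:
    """根据返回码说明表,给出建议的后续处理动作。"""
    if code == OK:
        return "ok"
    if code in RETRYABLE:
        return "retry"
    if code in PARAM_ERRORS:
        return "check_params"
    return "unknown"
```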
<span id="常见错误返回说明"></span>
# 常见错误返回说明
1. 错误返回:
"message": "quota exceeded for types: xxxxxxxxx_lifetime"
**错误原因:试用版用量用完了,需要开通正式版才能继续使用**
2. 错误返回:
"message": "quota exceeded for types: concurrency"
**错误原因:并发超过了限定值,需要减少并发调用情况或者增购并发**
3. 错误返回:
"message": "Fail to feed text, reason Init Engine Instance failed"
**错误原因voice_type / cluster 传递错误**
4. 错误返回:
"message": "illegal input text!"
**错误原因:传入的 text 无效,没有可合成的有效文本。比如全部是标点符号或者 emoji 表情,或者使用中文音色时,传递日语,以此类推。多语种音色,也需要使用 language 指定对应的语种**
5. 错误返回:
"message": "authenticate request: load grant: requested grant not found"
**错误原因:鉴权失败,需要检查 appid&token 的值是否设置正确,同时,鉴权的正确格式为**
**headers["Authorization"] = "Bearer;${token}"**
6. 错误返回:
"message': 'extract request resource id: get resource id: access denied"
**错误原因:语音合成已开通正式版且未拥有当前音色授权,需要在控制台购买该音色才能调用。标注免费的音色除 BV001_streaming 及 BV002_streaming 外,需要在控制台进行下单(支付 0 元)**
@ -1,315 +0,0 @@
<span id="接口说明"></span>
# 接口说明
精品长文本语音合成为异步合成服务提供“创建合成任务”和“查询合成结果”两个接口也可通过http回调获取合成结果。
请确认本产品可满足业务需求后再进行接入本产品适用于需要批量合成较长文本、且对返回时效性无强需求的场景单次可支持10万字符以内文本异步返回音频。输入的文本请求会进入集群排队处理返回时长受集群负载影响而波动通常在数十分钟内最长返回时延在3小时以内。如出现长时间未返回且无报错的情况请耐心等待。
长文本合成分为“普通版”和“情感预测版”,两者需要开通不同的服务,接口地址不同,支持的音色列表也不相同,请仔细阅读文档。
:::warning
创建合成任务的频率限制为10 QPS请勿一次性提交过多任务。
本产品不适合对于时效性有强需求的场景,如有需求建议接入语音合成(短文本)接口。
:::
<span id="鉴权"></span>
# 鉴权
请求接口时,需要携带`Resource-Id``Authorization`两个header缺一不可。
> 参考文档:[鉴权方法](/docs/6561/1105162)
<span id="创建合成任务"></span>
# 创建合成任务
<span id="请求参数"></span>
## 请求参数
| | | \
|服务类型 |接口地址 |
|---|---|
| | | \
|普通版 |https://openspeech.bytedance.com/api/v1/tts_async/submit |
| | | \
|情感预测版 |https://openspeech.bytedance.com/api/v1/tts_async_with_emotion/submit |
**请求方式:`POST`**
**Content-Type** `application/json`
**请求参数说明:**
| | | | | \
|参数名称 |参数类型 |是否必需 |描述 |
|---|---|---|---|
| | | | | \
|appid |string |Y |Appid从控制台获取 |
| | | | | \
|reqid |string |Y |Request ID不可重复长度2064建议使用uuid |
| | | | | \
|text |string |Y |合成文本长度小于10万字符支持SSML。SSML需要以<speak>开头和</speak>结束,且全文只出现一组<speak>标签支持的SSML标签可参考[SSML标记语言](/docs/6561/104897) |
| | | | | \
|format |string |Y |输出音频格式支持pcm/wav/mp3/ogg_opus |
| | | | | \
|voice_type |string |Y |音色voice_type见[音色列表](/docs/6561/1108211) |
| | | | | \
|voice |string |N |音色voice情感预测版voice为空时使用预测结果voice不为空时使用指定的voice其余情况使用默认voice |
| | | | | \
|language |string |N |语种,与音色有关,具体值参考[音色列表](/docs/6561/1108211),默认为中文 |
| | | | | \
|sample_rate |int |N |采样率默认为24000 |
| | | | | \
|volume |float |N |音量范围0.13默认为1 |
| | | | | \
|speed |float |N |语速范围0.23默认为1 |
| | | | | \
|pitch |float |N |语调范围0.13默认为1 |
| | | | | \
|enable_subtitle |int |N |是否开启字幕时间戳0表示不开启1表示开启**句级别**字幕时间戳2表示开启**字词级别**时间戳3表示开启**音素级别**时间戳 |
| | | | | \
|sentence_interval |int |N |句间停顿单位毫秒范围03000默认为预测值 |
| | | | | \
|style |string |N |指定情感,“情感预测版”默认为预测值,“普通版”默认为音色默认值,音色支持的情感见[音色列表](/docs/6561/1108211) |
| | | | | \
|callback_url |string |N |回调返回地址,建议使用域名方式 |
:::warning
在 “情感预测版”接口中使用不支持多情感的音色,将会合成失败。是否支持多情感见[音色列表](/docs/6561/1108211)
:::
**请求参数示例:**
```json
{
"appid": "123456",
"text": "火山引擎异步长文本合成。",
"format": "mp3",
"voice_type": "BV701_streaming",
"sample_rate": 24000,
"volume": 1.2,
"speed": 0.9,
"pitch": 1.1,
"enable_subtitle": 1,
"callback_url": "http://x.y.z/callback"
}
```
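以下是用 Python 组装并提交上述请求体的示意(鉴权所需的 `Resource-Id`、`Authorization` 两个 header 的取值参见上文鉴权文档,由调用方通过 headers 参数传入):

```python
import json
import uuid
import urllib.request

SUBMIT_URL = "https://openspeech.bytedance.com/api/v1/tts_async/submit"

def build_submit_body(appid: str, text: str, voice_type: str = "BV701_streaming") -> dict:
    """组装创建合成任务的请求体。"""
    return {
        "appid": appid,
        "reqid": str(uuid.uuid4()),  # uuid 字符串长 36满足 20~64 的长度要求
        "text": text,
        "format": "mp3",
        "voice_type": voice_type,
        "enable_subtitle": 1,  # 开启句级别字幕时间戳(可选)
    }

def submit_task(body: dict, headers: dict) -> str:
    """提交任务并返回 task_id查询合成结果时需要。"""
    req = urllib.request.Request(
        SUBMIT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={**headers, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        result = json.loads(resp.read())
    if "task_id" not in result:
        raise RuntimeError(f"submit failed: {result.get('code')} {result.get('message')}")
    return result["task_id"]
```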
<span id="返回结果"></span>
## 返回结果
**返回结果示例:**
请求成功:
```json
{
"task_id": "bd0c2171-4b38-4c05-b685-11f3d240ee8d",
"task_status": 0,
"text_length": 12
}
```
请求失败:
```json
{
"reqid": "e8f41275-72a3-45b5-af3c-61047f406cac",
"code": 40000,
"message": "请求参数错误text不能为空"
}
```
**返回参数说明:**
| | | | \
|参数名称 |类型 |描述 |
|---|---|---|
| | | | \
|task_id |string |任务ID**注意保存,用于查询合成结果** |
| | | | \
|task_status |int |任务状态0-合成中1-合成成功2-合成失败 |
| | | | \
|text_length |int |合成需要消耗的字符数,含标点符号 |
| | | | \
|code |int |错误码,参考[错误码说明](/docs/6561/1096680#错误码说明) |
| | | | \
|message |string |错误信息 |
<span id="查询合成结果"></span>
# 查询合成结果
<span id="请求参数"></span>
## 请求参数
| | | \
|服务类型 |接口地址 |
|---|---|
| | | \
|普通版 |https://openspeech.bytedance.com/api/v1/tts_async/query |
| | | \
|情感预测版 |https://openspeech.bytedance.com/api/v1/tts_async_with_emotion/query |
**请求方式:`GET`**
**请求参数说明:**
| | | | | \
|参数名称 |参数类型 |是否必需 |描述 |
|---|---|---|---|
| | | | | \
|appid |string |Y |Appid从控制台获取 |
| | | | | \
|task_id |string |Y |创建合成任务时返回的task_id |
**请求参数示例:**
```GET
https://openspeech.bytedance.com/api/v1/tts_async/query?appid=123456&task_id=bd0c2171-4b38-4c05-b685-11f3d240ee8d
```
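任务提交后可按如下方式轮询查询接口,直到任务结束(示意;返回较慢,轮询间隔不宜过短,整体超时可参考文档所述最长 3 小时的返回时延):

```python
import json
import time
import urllib.request

QUERY_URL = "https://openspeech.bytedance.com/api/v1/tts_async/query"

def is_done(result: dict) -> bool:
    """task_status0-合成中1-合成成功2-合成失败。非 0 即已结束。"""
    return result.get("task_status") != 0

def poll_task(appid: str, task_id: str, headers: dict,
              interval: float = 60.0, timeout: float = 3 * 3600) -> dict:
    """轮询查询接口,直到任务结束或超过最长等待时间。"""
    deadline = time.time() + timeout
    while time.time() < deadline:
        url = f"{QUERY_URL}?appid={appid}&task_id={task_id}"
        req = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(req) as resp:
            result = json.loads(resp.read())
        if is_done(result):
            return result
        time.sleep(interval)  # 返回时长受集群负载影响,轮询间隔不宜过短
    raise TimeoutError(f"task {task_id} still running after {timeout}s")
```

合成成功时可从返回结果中取 audio_urlURL 有效期为 1 小时,应及时下载。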
<span id="返回结果"></span>
## 返回结果
**返回结果示例:**
请求成功:
```json
{
"task_id": "bd0c2171-4b38-4c05-b685-11f3d240ee8d",
"task_status": 1,
"text_length": 12,
"audio_url": "https://lf9-lab-speech-tt-sign.bytetos.com/tos-cn-o-14155/aef41ebf89124edba16d4e97e455e007?x-expires=1687778318&x-signature=SJub692wmwsxboJTgl2VX55tIzY%3D",
"url_expire_time": 1687777943,
"sentences": [
{
"text": "火山引擎异步长文本合成。",
"origin_text": "火山引擎异步长文本合成。",
"paragraph_no": 1,
"begin_time": 0,
"end_time": 4211,
"emotion": "neutral",
"words": [
{
"text": "火",
"begin": 25,
"end": 235,
"phonemes": [
{ "ph": "C0h", "begin": 25, "end": 130 },
{ "ph": "C0uo", "begin": 130, "end": 235 }
]
},
{
"text": "山",
"begin": 235,
"end": 495,
"phonemes": [
{ "ph": "C0sh", "begin": 235, "end": 345 },
{ "ph": "C0an", "begin": 345, "end": 495 }
]
},
...
]
}
]
}
```
请求失败:
```json
{
"reqid": "bd0c2171-4b38-4c05-b685-11f3d240ee8d",
"code": 40001,
"message": "没有可以合成的有效字符"
}
```
**返回参数说明:**
| | | | \
|参数名称 |类型 |描述 |
|---|---|---|
| | | | \
|task_id |string |任务ID |
| | | | \
|task_status |int |任务状态0-合成中1-合成成功2-合成失败 |
| | | | \
|text_length |int |合成消耗的字符数,含标点符号 |
| | | | \
|audio_url |string |音频URL**有效期为1个小时请及时下载** |
| | | | \
|url_expire_time |int |音频URL过期时间UNIX时间戳 |
| | | | \
|sentences |List |分句信息enable_subtitle≥1才会返回 |
| | | | \
|sentences.text |string |实际合成的文本,会过滤掉一些符号、表情和无法合成的字符 |
| | | | \
|sentences.origin_text |string |原文分句,所有句子拼起来与输入文本完全一致 |
| | | | \
|sentences.paragraph_no |int |分句所属段落,以换行符\n或</p>划分段落 |
| | | | \
|sentences.begin_time |int |分句开始时间,单位:毫秒 |
| | | | \
|sentences.end_time |int |分句结束时间,单位:毫秒 |
| | | | \
|sentences.emotion |string |分句情感,“情感预测版”才会返回 |
| | | | \
|sentences.words |List |字词信息enable_subtitle≥2才会返回 |
| | | | \
|sentences.words.text |string |字词文本 |
| | | | \
|sentences.words.begin |int |字词开始时间,单位:毫秒 |
| | | | \
|sentences.words.end |int |字词结束时间,单位:毫秒 |
| | | | \
|sentences.words.phonemes |List |音素信息enable_subtitle=3才会返回 |
| | | | \
|sentences.words.phonemes.ph |string |音素 |
| | | | \
|sentences.words.phonemes.begin |int |音素开始时间,单位:毫秒 |
| | | | \
|sentences.words.phonemes.end |int |音素结束时间,单位:毫秒 |
:::warning
1. 合成结果保留7天7天内都可以通过该接口查询合成结果过期后自动删除。
2. 下载URL有效期为1小时请勿直接保存audio_url应及时下载音频或转存至你的云存储中。
3. audio_url过期后状态码401或403可重新请求查询接口获取新的URL。
:::
<span id="错误码说明"></span>
# 错误码说明
| | | | \
|错误码 |错误码描述 |解决办法 |
|---|---|---|
| | | | \
|40000 |请求参数错误 |根据返回的message检查请求参数 |
| | | | \
|40001 |没有可以合成的有效字符 |检查请求参数中的text |
| | | | \
|40002 |该音色不支持多情感 |可用音色见[音色列表](/docs/6561/1108211) ,或使用“普通版”合成 |
| | | | \
|40300 |试用额度不足 |开通正式版服务 |
| | | | \
|40400 |任务不存在或已过期 |检查task_id是否正确 |
| | | | \
|50000 |服务器错误 |建议先重试,重试无效请联系客服 |
| | | | \
|50001 |合成失败 |建议先重试,重试无效请联系客服 |
| | | | \
|50002 |生成下载URL失败 |建议先重试,重试无效请联系客服 |
<span id="结果回调"></span>
# 结果回调
如果“创建合成任务”时传入了**callback_url**,服务器将会在合成成功/失败时,以接口回调的方式通知用户。
**请求方式:`POST`**
**Content-Type** `application/json`
**回调参数示例:**
合成成功:
```json
{
"code": 0,
"message": "Success",
"task_id": "bd0c2171-4b38-4c05-b685-11f3d240ee8d",
"task_status": 1,
"text_length": 12,
"audio_url": "https://lf9-lab-speech-tt-sign.bytetos.com/tos-cn-o-14155/aef41ebf89124edba16d4e97e455e007?x-expires=1687778318&x-signature=SJub692wmwsxboJTgl2VX55tIzY%3D",
"url_expire_time": 1687777943,
"sentences": [
...
]
}
```
合成失败:
```json
{
"code": 40001,
"message": "没有可以合成的有效字符",
"task_id": "bd0c2171-4b38-4c05-b685-11f3d240ee8d",
"task_status": 2,
"text_length": 12
}
```
:::warning
不保证回调成功建议在提交任务一定时间后如3个小时仍未收到回调则主动请求“查询合成结果”接口。
:::
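回调处理端只需按 task_status 区分成功与失败即可,下面是一个与上述回调体对应的最小解析示意(非官方实现):

```python
def handle_callback(payload: dict) -> dict:
    """解析回调体task_status 为 1 时返回下载信息,为 2 时抛出异常。"""
    status = payload.get("task_status")
    if status == 1:
        # audio_url 有效期为 1 小时,收到回调后应尽快下载或转存
        return {
            "task_id": payload["task_id"],
            "audio_url": payload["audio_url"],
            "url_expire_time": payload.get("url_expire_time"),
        }
    raise RuntimeError(
        f"synthesis failed: code={payload.get('code')} message={payload.get('message')}"
    )
```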
@ -1,47 +0,0 @@
:::warning
精品长文本合成包含两种方案,分别为“**普通版(不支持情感预测)**”和“**情感预测版**”
:::
<span id="情感预测版-音色列表"></span>
# **情感预测版**-音色列表
* 多情感配置信息请详见:[音色列表--豆包语音-火山引擎](https://www.volcengine.com/docs/6561/97465)
| | | \
|推荐音色 |voice_type |
|---|---|
| | | \
|擎苍 |BV701_streaming |
| | | \
|阳光青年 |BV123_streaming |
| | | \
|反卷青年 |BV120_streaming |
| | | \
|通用赘婿 |BV119_streaming |
| | | \
|古风少御 |BV115_streaming |
| | | \
|霸气青叔 |BV107_streaming |
| | | \
|质朴青年 |BV100_streaming |
| | | \
|温柔淑女 |BV104_streaming |
| | | \
|开朗青年 |BV004_streaming |
| | | \
|甜宠少御 |BV113_streaming |
| | | \
|儒雅青年 |BV102_streaming |
<span id="普通版(不支持情感预测)-音色列表"></span>
# **普通版(不支持情感预测)**-音色列表
* 普通版音色与语音合成中的**音色一致**,音色信息请详见:[音色列表--豆包语音-火山引擎](https://www.volcengine.com/docs/6561/97465)
<span id="faq"></span>
# FAQ
**Q1精品长文本语音合成产品支持哪些情感预测**
可以自动区分旁白和对话。其中,对话可以支持七大情感:开心、悲伤、愤怒、害怕、厌恶、惊讶、平和
**Q2精品长文本语音合成产品是否可以支持ssml标签**
精品长文本语音支持ssml标签
@ -1,17 +0,0 @@
在书房角落 沏上一杯茶
窗外微风轻拂 摇曳着树梢
咔咔坐在椅上 沉浸在思考
书页轻轻翻动 世界变得渺小
咔咔咔咔 书房里的我
静享时光 悠然自得
茶香飘散 心灵得到慰藉
咔咔咔咔 享受这刻
阳光透过窗帘 柔和又温暖
每个字每个句 都是心灵的食粮
咔咔轻轻点头 感受着文字的力量
在这安静的角落 找到了自我方向
咔咔咔咔 书房里的我
静享时光 悠然自得
茶香飘散 心灵得到慰藉
咔咔咔咔 享受这刻
(茶杯轻放的声音...)
@ -1,17 +0,0 @@
在书房角落里,我找到了安静
一杯茶香飘来,思绪开始飞腾
书页轻轻翻动,知识在心间
咔咔我在这里,享受这宁静
咔咔咔咔,独自享受
书中的世界,如此美妙
咔咔咔咔,心无旁骛
沉浸在知识的海洋,自在飞翔
窗外微风轻拂,阳光洒满书桌
咔咔我在这里,与文字共舞
每个字每个句,都像是音符
奏出心灵的乐章,如此动听
咔咔咔咔,独自享受
书中的世界,如此美妙
咔咔咔咔,心无旁骛
沉浸在知识的海洋,自在飞翔
(翻书声...风铃声...咔咔的呼吸声...)
@ -1,8 +0,0 @@
[verse]
窗外细雨轻敲窗,
被窝里温暖如常。
[chorus]
咔咔咔咔,梦乡近了,
小雨伴我入眠床。
[outro]
(雨声和咔咔的呼吸声...)
@ -1,20 +0,0 @@
咔咔咔咔来跳舞,魔性旋律不停步
跟着节奏摇摆身,洗脑神曲不放手
重复的旋律像魔法,让人听了就上瘾
咔咔咔咔的魔力,谁也挡不住
洗脑咔咔舞,洗脑咔咔舞
魔性的旋律,让人停不下来
洗脑咔咔舞,洗脑咔咔舞
跟着咔咔一起跳,快乐无边
每个节拍都精准,咔咔的舞步最迷人
不管走到哪里去,都能听到这魔音
咔咔的舞蹈最独特,让人看了就想学
洗脑神曲的魅力,就是让人忘不掉
洗脑咔咔舞,洗脑咔咔舞
魔性的旋律,让人停不下来
洗脑咔咔舞,洗脑咔咔舞
跟着咔咔一起跳,快乐无边
咔咔咔咔,魔性洗脑舞
重复的节奏,快乐的旋律
洗脑咔咔舞,洗脑咔咔舞
让快乐无限循环,直到永远
@ -1,26 +0,0 @@
[verse 1]\n
懒懒的午后阳光暖,\n
温泉里我泡得欢。\n
水声潺潺耳边响,\n
什么都不想干。\n
\n
[chorus]\n
咔咔咔咔,悠然自得,\n
水波轻摇,心情舒畅。\n
咔咔咔咔,享受此刻,\n
懒懒午后,最是惬意。\n
\n
[verse 2]\n
看着云朵慢慢飘,\n
心思像水一样柔。\n
闭上眼,世界都静了,\n
只有我和这温泉。\n
\n
[chorus]\n
咔咔咔咔,悠然自得,\n
水波轻摇,心情舒畅。\n
咔咔咔咔,享受此刻,\n
懒懒午后,最是惬意。\n
\n
[outro]\n
(水声渐渐远去...)
@ -1,21 +0,0 @@
慵懒午后阳光暖,温泉里我发呆
水声潺潺耳边响,思绪飘向云外
咔咔咔咔,泡在温泉
心无杂念,享受此刻安宁
什么都不想去做,只想静静享受
水波轻抚我的背,世界变得温柔
咔咔咔咔,泡在温泉
心无杂念,享受此刻安宁
(水花声...)
咔咔的午后,慵懒又自在
温泉里的世界,只有我和水声
@ -1,33 +0,0 @@
懒懒的午后阳光暖,
温泉里我泡得欢。
水声潺潺耳边响,
什么都不想干。
咔咔咔咔,发呆好时光,
懒懒的我,享受这阳光。
咔咔咔咔,让思绪飘扬,
在温泉里,找到我的天堂。
想法像泡泡一样浮上来,
又慢慢沉下去,消失在水里。
时间仿佛静止,我自在如鱼,
在这温暖的怀抱里。
咔咔咔咔,发呆好时光,
懒懒的我,享受这阳光。
咔咔咔咔,让思绪飘扬,
在温泉里,找到我的天堂。
(水声渐渐远去...)
@ -1,33 +0,0 @@
懒懒的午后阳光暖,
温泉里我泡得欢。
水声潺潺耳边响,
什么都不想干。
咔咔咔咔,发呆真好,
懒懒的我,享受这秒。
水波轻摇,心也飘,
咔咔世界,别来无恙。
想着云卷云又舒,
温泉里的我多舒服。
时间慢慢流,不急不徐,
咔咔的梦,轻轻浮。
咔咔咔咔,发呆真好,
懒懒的我,享受这秒。
水波轻摇,心也飘,
咔咔世界,别来无恙。
(水声渐渐远去...)
@ -1,37 +0,0 @@
懒懒的午后阳光暖,
温泉里我泡得欢。
水声潺潺耳边响,
什么都不想干。
咔咔咔咔,悠然自得,
水波荡漾心情悦。
咔咔咔咔,闭上眼,
享受这刻的宁静。
想象自己是条鱼,
在水里自由游来游去。
没有烦恼没有压力,
只有我和这温泉池。
咔咔咔咔,悠然自得,
水波荡漾心情悦。
咔咔咔咔,闭上眼,
享受这刻的宁静。
(水花声...)
咔咔,慵懒午后,
水中世界最逍遥。
@ -1,26 +0,0 @@
[verse 1]\n"
"阳光洒满草地绿\n"
"咔咔奔跑心情舒畅\n"
"风儿轻拂过脸庞\n"
"快乐就像泡泡糖\n"
"\n"
"[chorus]\n"
"咔咔咔咔 快乐无边\n"
"草地上的我自由自在\n"
"阳光下的影子拉得好长\n"
"咔咔咔咔 快乐无边\n"
"\n"
"[verse 2]\n"
"蝴蝶飞舞花儿笑\n"
"咔咔摇摆尾巴摇\n"
"每一步都跳着舞\n"
"生活就像一首歌\n"
"\n"
"[chorus]\n"
"咔咔咔咔 快乐无边\n"
"草地上的我自由自在\n"
"阳光下的影子拉得好长\n"
"咔咔咔咔 快乐无边\n"
"\n"
"[outro]\n"
"(草地上咔咔的笑声...)
@ -1,17 +0,0 @@
阳光洒满地 草香扑鼻来
咔咔在草地上 跑得飞快
风儿轻轻吹 摇曳着花海
心情像彩虹 七彩斑斓开
咔咔咔咔 快乐无边
草地上的我 自由自在
阳光下的梦 美好无限
咔咔咔咔 快乐无边
蝴蝶在飞舞 蜜蜂在歌唱
咔咔跟着它们 一起欢唱
天空蓝得像画 没有一丝阴霾
咔咔的心里 只有满满的爱
咔咔咔咔 快乐无边
草地上的我 自由自在
阳光下的梦 美好无限
咔咔咔咔 快乐无边
(草地上咔咔的笑声...)
@ -1,19 +0,0 @@
阳光洒满地 绿草如茵间
咔咔跑起来 心情像飞燕
风儿轻拂过 花香满径边
快乐如此简单 每一步都新鲜
咔咔咔咔 快乐咔咔
草地上的我 自由自在
阳光下的舞 轻松又欢快
咔咔咔咔 快乐咔咔
无忧无虑的我 最爱这蓝天
蝴蝶翩翩起 蜜蜂忙采蜜
咔咔我最棒 每个瞬间都美丽
朋友在旁边 笑声传千里
这世界多美好 有你有我有草地
咔咔咔咔 快乐咔咔
草地上的我 自由自在
阳光下的舞 轻松又欢快
咔咔咔咔 快乐咔咔
无忧无虑的我 最爱这蓝天
(草地上咔咔的笑声...)
@ -1,8 +0,0 @@
[verse]
阳光洒满草地,我跑得飞快
心情像彩虹,七彩斑斓真美
[chorus]
咔咔咔咔,快乐无边
在阳光下,自由自在
[outro]
(风吹草低见水豚)
@ -1,11 +0,0 @@
# 海盗找朋友
在蓝色的大海上,有一艘小小的海盗船,船上只有一个小海盗。他戴着歪歪的海盗帽,举着塑料做的小钩子手,每天对着海浪喊:“谁来和我玩呀?”
这天,小海盗的船被海浪冲到了一座彩虹岛。岛上的沙滩上,躺着一个会发光的贝壳。小海盗刚捡起贝壳,贝壳突然“叮咚”响了一声,跳出一只圆滚滚的小海豚!
“哇!你是我的宝藏吗?”小海盗举着贝壳问。小海豚摇摇头,用尾巴拍了拍海水:“我带你去找真正的宝藏!”它驮着小海盗游向海底,那里有一个藏着星星的洞穴。
洞穴里,小海豚拿出了一个会唱歌的海螺:“这是友谊海螺,对着它喊朋友的名字,就会有惊喜哦!”小海盗对着海螺喊:“我的朋友!”突然,从海螺里钻出一群小螃蟹,举着彩色的小旗子,还有一只会吹泡泡的章鱼!
原来,小海豚早就听说小海盗很孤单,特意用友谊海螺召集了伙伴们。现在,小海盗的船上每天都飘着笑声,他再也不是孤单的小海盗啦!
@ -1,130 +0,0 @@
# Flutter Web 本地调试启动指南
> 本文档供 AI 编码助手阅读,用于在本项目中正确启动 Flutter Web 调试环境。
## 项目结构
- Flutter 应用目录:`airhub_app/`
- 后端服务入口:`server.py`根目录FastAPI + Uvicorn端口 3000
- 前端端口:`8080`
## 环境要求
- Flutter SDK3.x
- Python 3.x后端服务
- PowerShellWindows 环境)
## 操作系统
Windows所有命令均为 PowerShell 语法)
---
## 启动流程(严格按顺序执行)
### 1. 杀掉旧进程并确认端口空闲
```powershell
# 杀掉占用 8080 和 3000 的旧进程
Get-NetTCPConnection -LocalPort 8080 -ErrorAction SilentlyContinue | ForEach-Object { taskkill /F /PID $_.OwningProcess 2>$null }
Get-NetTCPConnection -LocalPort 3000 -ErrorAction SilentlyContinue | ForEach-Object { taskkill /F /PID $_.OwningProcess 2>$null }
# 等待端口释放
Start-Sleep -Seconds 3
# 确认端口已空闲(无输出 = 空闲)
Get-NetTCPConnection -LocalPort 8080 -ErrorAction SilentlyContinue
Get-NetTCPConnection -LocalPort 3000 -ErrorAction SilentlyContinue
```
### 2. 启动后端服务器(音乐生成功能依赖此服务)
```powershell
# 工作目录:项目根目录
cd d:\Airhub
python server.py
```
成功标志:
```
INFO: Uvicorn running on http://0.0.0.0:3000 (Press CTRL+C to quit)
[Server] Music Server running on http://localhost:3000
```
### 3. 设置国内镜像源 + 启动 Flutter Web Server
```powershell
# 工作目录airhub_app 子目录
cd d:\Airhub\airhub_app
# 设置镜像源(必须,否则网络超时)
$env:PUB_HOSTED_URL = "https://pub.flutter-io.cn"
$env:FLUTTER_STORAGE_BASE_URL = "https://storage.flutter-io.cn"
# 启动 web-server 模式
flutter run -d web-server --web-port=8080 --no-pub
```
成功标志:
```
lib\main.dart is being served at http://localhost:8080
```
### 4. 访问应用
浏览器打开:`http://localhost:8080`
---
## 关键规则
### 必须使用 `web-server` 模式
- **禁止**使用 `flutter run -d chrome`(会弹出系统 Chrome 窗口,不可控)
- **必须**使用 `flutter run -d web-server`(只启动 HTTP 服务,手动用浏览器访问)
### `--no-pub` 的使用条件
- 仅修改 Dart 代码(无新依赖、无新 asset→ 加 `--no-pub`,编译更快
- 新增了 `pubspec.yaml` 依赖或 `assets/` 资源文件 → **不能**加 `--no-pub`
### 端口管理
- 固定使用 8080Flutter和 3000后端不要换端口绕过占用
- 每次启动前必须先确认端口空闲
- 停止服务后等 3 秒再重新启动
### 热重载
- 在 Flutter 终端按 `r` = 热重载(保留页面状态)
- 按 `R` = 热重启(重置页面状态)
- 浏览器 `Ctrl+Shift+R` = 强制刷新
---
## 停止服务
```powershell
# 方法1在 Flutter 终端按 q 退出
# 方法2强制杀进程
Get-NetTCPConnection -LocalPort 8080 | ForEach-Object { taskkill /F /PID $_.OwningProcess }
Get-NetTCPConnection -LocalPort 3000 | ForEach-Object { taskkill /F /PID $_.OwningProcess }
```
---
## 常见问题排查
| 问题 | 原因 | 解决方案 |
|------|------|---------|
| 端口被占用 | 旧进程未退出 | 执行第1步杀进程等3秒 |
| 编译报错找不到包 | 使用了 `--no-pub` 但有新依赖 | 去掉 `--no-pub` 重新编译 |
| 网络超时 | 未设置镜像源 | 设置 `PUB_HOSTED_URL` 和 `FLUTTER_STORAGE_BASE_URL` |
| 页面白屏 | 缓存问题 | 浏览器 `Ctrl+Shift+R` 强刷 |
| 音乐功能不工作 | 后端未启动 | 先启动 `python server.py` |
---
## 编译耗时参考
- 首次完整编译(含 pub get90-120 秒
- 增量编译(`--no-pub`60-90 秒
- 热重载(按 r3-5 秒
- 热重启(按 R10-20 秒
@ -1,11 +1,9 @@
import 'dart:convert';
import 'dart:math';
import 'dart:ui';
import 'package:flutter/material.dart';
import 'package:flutter_riverpod/flutter_riverpod.dart';
import 'package:google_fonts/google_fonts.dart';
import 'package:flutter_svg/flutter_svg.dart';
import 'package:http/http.dart' as http;
import 'story_detail_page.dart';
import 'product_selection_page.dart';
import 'settings_page.dart';
@ -93,45 +91,6 @@ class _DeviceControlPageState extends ConsumerState<DeviceControlPage>
_bookshelfScrollOffset = _bookshelfController.page ?? 0.0;
});
});
// Load historical stories from backend
_loadHistoricalStories();
}
/// Fetch saved stories from backend and prepend to bookshelf
Future<void> _loadHistoricalStories() async {
try {
final resp = await http.get(Uri.parse('http://localhost:3000/api/stories'));
if (resp.statusCode == 200) {
final data = jsonDecode(resp.body);
final List stories = data['stories'] ?? [];
if (stories.isEmpty) return;
// Collect titles already in the mock list to avoid duplicates
final existingTitles = _mockStories.map((s) => s['title'] as String).toSet();
final newStories = <Map<String, dynamic>>[];
for (final s in stories) {
final title = s['title'] as String? ?? '';
if (title.isNotEmpty && !existingTitles.contains(title)) {
newStories.add({
'title': title,
'cover': null, // No cover yet for generated stories
'locked': false,
'content': s['content'] as String? ?? '',
});
}
}
if (newStories.isNotEmpty && mounted) {
setState(() {
_mockStories.addAll(newStories);
});
}
}
} catch (e) {
debugPrint('Failed to load historical stories: $e');
}
}
@override
@ -163,7 +122,7 @@ class _DeviceControlPageState extends ConsumerState<DeviceControlPage>
children: [
SafeArea(bottom: false, child: _buildHomeView()),
SafeArea(bottom: false, child: _buildStoryView()),
MusicCreationPage(isTab: true, isVisible: _currentIndex == 2),
const MusicCreationPage(isTab: true),
const ProfilePage(), // No SafeArea here to allow full background
],
),
@ -477,28 +436,19 @@ class _DeviceControlPageState extends ConsumerState<DeviceControlPage>
colors: AppColors.btnCapybaraGradient,
),
onPressed: () async {
final result = await showModalBottomSheet<Map<String, dynamic>>(
final result = await showModalBottomSheet(
context: context,
isScrollControlled: true,
backgroundColor: Colors.transparent,
builder: (context) => const StoryGeneratorModal(),
);
if (result != null && result['action'] == 'start_generation') {
if (result == 'start_generation') {
final saveResult = await Navigator.of(context).push(
MaterialPageRoute(
builder: (context) => StoryLoadingPage(
characters: List<String>.from(result['characters'] ?? []),
scenes: List<String>.from(result['scenes'] ?? []),
props: List<String>.from(result['props'] ?? []),
),
),
);
if (saveResult is Map && saveResult['action'] == 'saved') {
_addNewBookWithAnimation(
title: saveResult['title'] as String? ?? '新故事',
content: saveResult['content'] as String? ?? '',
MaterialPageRoute(builder: (context) => const StoryLoadingPage()),
);
if (saveResult == 'saved') {
_addNewBookWithAnimation();
}
}
},
@ -595,35 +545,26 @@ class _DeviceControlPageState extends ConsumerState<DeviceControlPage>
}
Widget _buildStorySlot(Map<String, dynamic> story, {bool isNew = false}) {
final bool hasCover = story['cover'] != null && (story['cover'] as String).isNotEmpty;
final bool hasContent = story['content'] != null && (story['content'] as String).isNotEmpty;
bool isFilled = story.containsKey('cover') && story['cover'] != null;
// Empty/Clickable Slot no content, just a "+" to create new story
if (!hasContent && !hasCover) {
// Empty/Clickable Slot (.story-slot.clickable)
// PRD: border: 1px dashed rgba(0, 0, 0, 0.05)
if (!isFilled) {
return GestureDetector(
onTap: () async {
final result = await showModalBottomSheet<Map<String, dynamic>>(
final result = await showModalBottomSheet(
context: context,
isScrollControlled: true,
backgroundColor: Colors.transparent,
builder: (context) => const StoryGeneratorModal(),
);
if (result != null && result['action'] == 'start_generation') {
if (result == 'start_generation') {
final saveResult = await Navigator.of(context).push(
MaterialPageRoute(
builder: (context) => StoryLoadingPage(
characters: List<String>.from(result['characters'] ?? []),
scenes: List<String>.from(result['scenes'] ?? []),
props: List<String>.from(result['props'] ?? []),
),
),
);
if (saveResult is Map && saveResult['action'] == 'saved') {
_addNewBookWithAnimation(
title: saveResult['title'] as String? ?? '新故事',
content: saveResult['content'] as String? ?? '',
MaterialPageRoute(builder: (context) => const StoryLoadingPage()),
);
if (saveResult == 'saved') {
_addNewBookWithAnimation();
}
}
},
@ -644,41 +585,6 @@ class _DeviceControlPageState extends ConsumerState<DeviceControlPage>
);
}
// Cover widget: real image or "未生成封面" placeholder
Widget coverWidget;
if (hasCover) {
coverWidget = Image.asset(
story['cover'],
fit: BoxFit.cover,
errorBuilder: (_, __, ___) => Container(color: Colors.grey.shade200),
);
} else {
// No cover show soft placeholder
coverWidget = Container(
decoration: BoxDecoration(
gradient: LinearGradient(
begin: Alignment.topCenter,
end: Alignment.bottomCenter,
colors: [
const Color(0xFFE8E0F0),
const Color(0xFFD5CBE8),
],
),
),
alignment: Alignment.center,
padding: const EdgeInsets.symmetric(horizontal: 12),
child: const Text(
'暂无封面',
style: TextStyle(
fontSize: 11,
color: Color(0xFF9B8DB8),
fontWeight: FontWeight.w500,
),
textAlign: TextAlign.center,
),
);
}
// Filled Slot (.story-slot.filled)
Widget slotContent = GestureDetector(
onTap: () {
@ -697,8 +603,15 @@ class _DeviceControlPageState extends ConsumerState<DeviceControlPage>
clipBehavior: Clip.antiAlias,
child: Stack(
children: [
// Cover Image or Placeholder
Positioned.fill(child: coverWidget),
// Cover Image (.story-cover-img)
Positioned.fill(
child: Image.asset(
story['cover'],
fit: BoxFit.cover,
errorBuilder: (_, __, ___) =>
Container(color: Colors.grey.shade200),
),
),
// Title Bar (.story-title-bar)
Positioned(
bottom: 0,
@ -911,14 +824,14 @@ class _DeviceControlPageState extends ConsumerState<DeviceControlPage>
);
}
void _addNewBookWithAnimation({String title = '新故事', String content = ''}) {
void _addNewBookWithAnimation() {
setState(() {
_mockStories.add({
'title': title,
'cover': null, // No cover yet for generated stories
'title': '星际忍者的茶话会',
'cover':
'assets/www/story_covers/brave_tailor.png', // Temporary mock cover
'type': 'new',
'locked': false,
'content': content,
});
_newBookIndex = _mockStories.length - 1;
});
@ -289,35 +289,31 @@ class _NotificationPageState extends ConsumerState<NotificationPage> {
),
),
),
//
ClipRect(
child: AnimatedSize(
duration: const Duration(milliseconds: 300),
curve: Curves.easeInOut,
child: isExpanded
? Container(
AnimatedCrossFade(
firstChild: const SizedBox.shrink(),
secondChild: Container(
width: double.infinity,
decoration: const BoxDecoration(
color: Color(0x80F9FAFB),
border: Border(
top: BorderSide(
color: Color(0x0D000000),
),
top: BorderSide(color: Color(0x0D000000)),
),
),
padding: const EdgeInsets.all(20),
child: Text(
notif.detail,
notif.content,
style: const TextStyle(
fontSize: 14,
color: Color(0xFF374151),
height: 1.7,
),
),
)
: const SizedBox(width: double.infinity, height: 0),
),
crossFadeState: isExpanded
? CrossFadeState.showSecond
: CrossFadeState.showFirst,
duration: const Duration(milliseconds: 300),
sizeCurve: Curves.easeInOut,
),
],
),
@ -1,12 +1,9 @@
import 'dart:async';
import 'dart:ui' as ui;
import 'package:flutter/material.dart';
import 'package:just_audio/just_audio.dart';
import 'package:flutter_svg/flutter_svg.dart';
import '../theme/design_tokens.dart';
import '../widgets/gradient_button.dart';
import '../widgets/pill_progress_button.dart';
import '../services/tts_service.dart';
import 'story_loading_page.dart';
enum StoryMode { generated, read }
@ -33,14 +30,6 @@ class _StoryDetailPageState extends State<StoryDetailPage>
bool _hasGeneratedVideo = false;
bool _isLoadingVideo = false;
// TTS uses global TTSService singleton
final TTSService _ttsService = TTSService.instance;
final AudioPlayer _audioPlayer = AudioPlayer();
StreamSubscription<Duration>? _positionSub;
StreamSubscription<PlayerState>? _playerStateSub;
Duration _audioDuration = Duration.zero;
Duration _audioPosition = Duration.zero;
// Genie Suck Animation
bool _isSaving = false;
AnimationController? _genieController;
@ -52,9 +41,9 @@ class _StoryDetailPageState extends State<StoryDetailPage>
'content': """
"这儿的重力好像有点不对劲?"
"""别打架,别打架,喝了这杯'银河气泡茶',我们都是好朋友!"
""",
@ -65,6 +54,7 @@ class _StoryDetailPageState extends State<StoryDetailPage>
Map<String, dynamic> _initStory() {
final source = widget.story ?? _defaultStory;
final result = Map<String, dynamic>.from(source);
// content
result['content'] ??= _defaultStory['content'];
result['title'] ??= _defaultStory['title'];
return result;
@ -74,171 +64,18 @@ class _StoryDetailPageState extends State<StoryDetailPage>
void initState() {
super.initState();
_currentStory = _initStory();
// Subscribe to TTSService changes
_ttsService.addListener(_onTTSChanged);
// Listen to audio player state
_playerStateSub = _audioPlayer.playerStateStream.listen((state) {
if (!mounted) return;
if (state.processingState == ProcessingState.completed) {
setState(() {
_isPlaying = false;
_audioPosition = Duration.zero;
});
}
});
// Listen to playback position for ring progress
_positionSub = _audioPlayer.positionStream.listen((pos) {
if (!mounted) return;
setState(() => _audioPosition = pos);
});
// Listen to duration changes
_audioPlayer.durationStream.listen((dur) {
if (!mounted || dur == null) return;
setState(() => _audioDuration = dur);
});
// Check if audio already exists (via TTSService)
final title = _currentStory['title'] as String? ?? '';
_ttsService.checkExistingAudio(title);
}
void _onTTSChanged() {
if (!mounted) return;
// Auto-play when generation completes
if (_ttsService.justCompleted &&
_ttsService.hasAudioFor(_currentStory['title'] ?? '')) {
// Delay slightly to let the completion flash play
Future.delayed(const Duration(milliseconds: 1500), () {
if (mounted) {
_ttsService.clearJustCompleted();
final route = ModalRoute.of(context);
if (route != null && route.isCurrent) {
_playAudio();
}
}
});
}
setState(() {});
}
@override
void dispose() {
_ttsService.removeListener(_onTTSChanged);
_positionSub?.cancel();
_playerStateSub?.cancel();
_audioPlayer.dispose();
_genieController?.dispose();
super.dispose();
}
// TTS button logic
bool _audioLoaded = false; // Track if audio URL is loaded in player
String? _loadedUrl; // Which URL is currently loaded
TTSButtonState get _ttsState {
final title = _currentStory['title'] as String? ?? '';
if (_ttsService.error != null &&
!_ttsService.isGenerating &&
_ttsService.audioUrl == null) {
return TTSButtonState.error;
}
if (_ttsService.isGeneratingFor(title)) {
return TTSButtonState.generating;
}
if (_ttsService.justCompleted && _ttsService.hasAudioFor(title)) {
return TTSButtonState.completed;
}
if (_isPlaying) {
return TTSButtonState.playing;
}
if (_ttsService.hasAudioFor(title) && !_audioLoaded) {
return TTSButtonState.ready; // audio ready, not yet played -> show "播放"
}
if (_audioLoaded) {
return TTSButtonState.paused; // was playing, now paused -> show "继续"
}
return TTSButtonState.idle;
}
double get _ttsProgress {
final state = _ttsState;
switch (state) {
case TTSButtonState.generating:
return _ttsService.progress;
case TTSButtonState.ready:
return 0.0;
case TTSButtonState.completed:
return 1.0;
case TTSButtonState.playing:
case TTSButtonState.paused:
if (_audioDuration.inMilliseconds > 0) {
return (_audioPosition.inMilliseconds / _audioDuration.inMilliseconds)
.clamp(0.0, 1.0);
}
return 0.0;
default:
return 0.0;
}
}
void _handleTTSTap() {
final state = _ttsState;
switch (state) {
case TTSButtonState.idle:
case TTSButtonState.error:
final title = _currentStory['title'] as String? ?? '';
final content = _currentStory['content'] as String? ?? '';
_ttsService.generate(title: title, content: content);
break;
case TTSButtonState.generating:
break;
case TTSButtonState.ready:
case TTSButtonState.completed:
case TTSButtonState.paused:
_playAudio();
break;
case TTSButtonState.playing:
_audioPlayer.pause();
setState(() => _isPlaying = false);
break;
}
}
Future<void> _playAudio() async {
final title = _currentStory['title'] as String? ?? '';
final url = _ttsService.hasAudioFor(title) ? _ttsService.audioUrl : null;
if (url == null) return;
try {
// If already loaded the same URL, seek to saved position and resume
if (_audioLoaded && _loadedUrl == url) {
await _audioPlayer.seek(_audioPosition);
_audioPlayer.play();
} else {
// Load new URL and play from start
await _audioPlayer.setUrl(url);
_audioLoaded = true;
_loadedUrl = url;
_audioPlayer.play();
}
if (mounted) {
setState(() => _isPlaying = true);
}
} catch (e) {
debugPrint('Audio play error: $e');
}
}
// Genie Suck Animation
/// Trigger Genie Suck animation matching HTML:
/// CSS: animation: genieSuck 0.8s cubic-bezier(0.6, -0.28, 0.735, 0.045) forwards
/// Phase 1 (0-15%): card scales up to 1.05 (tension)
/// Phase 2 (15%-100%): card shrinks to 0.05, moves toward bottom, blurs & fades
void _triggerGenieSuck() {
if (_isSaving) return;
@@ -247,6 +84,7 @@ class _StoryDetailPageState extends State<StoryDetailPage>
duration: const Duration(milliseconds: 800),
);
// Calculate how far the card should travel downward (toward the save button)
final screenHeight = MediaQuery.of(context).size.height;
_targetDY = screenHeight * 0.35;
@@ -256,20 +94,23 @@ class _StoryDetailPageState extends State<StoryDetailPage>
}
});
setState(() => _isSaving = true);
setState(() {
_isSaving = true;
});
_genieController!.forward();
}
// Build
@override
Widget build(BuildContext context) {
return Scaffold(
backgroundColor: AppColors.storyBackground,
backgroundColor: AppColors.storyBackground, // #FDF9F3
body: SafeArea(
child: Column(
children: [
// Header + Content Card animated together during genie suck
Expanded(child: _buildAnimatedBody()),
// Footer
_buildFooter(),
],
),
@@ -277,6 +118,7 @@ class _StoryDetailPageState extends State<StoryDetailPage>
);
}
/// Wraps header + content card in genie suck animation
Widget _buildAnimatedBody() {
Widget body = Column(
children: [
@@ -290,7 +132,7 @@ class _StoryDetailPageState extends State<StoryDetailPage>
return AnimatedBuilder(
animation: _genieController!,
builder: (context, child) {
final t = _genieController!.value;
final t = _genieController!.value; // linear 0..1
double scale;
double translateY;
@@ -298,12 +140,14 @@ class _StoryDetailPageState extends State<StoryDetailPage>
double blur;
if (t <= 0.15) {
// Phase 1: tension whole area scales up slightly
final p = t / 0.15;
scale = 1.0 + 0.05 * Curves.easeOut.transform(p);
translateY = 0;
opacity = 1.0;
blur = 0;
} else {
// Phase 2: suck shrinks, moves down, fades and blurs
final p = ((t - 0.15) / 0.85).clamp(0.0, 1.0);
final curved =
const Cubic(0.6, -0.28, 0.735, 0.045).transform(p);
@@ -365,7 +209,7 @@ class _StoryDetailPageState extends State<StoryDetailPage>
),
),
Text(
_currentStory['title'] ?? '',
_currentStory['title'],
style: const TextStyle(
fontSize: 17,
fontWeight: FontWeight.w600,
@@ -383,9 +227,9 @@ class _StoryDetailPageState extends State<StoryDetailPage>
child: Row(
mainAxisAlignment: MainAxisAlignment.center,
children: [
_buildTabBtn('故事', 'text'),
_buildTabBtn('📄 故事', 'text'),
const SizedBox(width: 8),
_buildTabBtn('绘本', 'video'),
_buildTabBtn('🎬 绘本', 'video'),
],
),
);
@@ -394,7 +238,11 @@ class _StoryDetailPageState extends State<StoryDetailPage>
Widget _buildTabBtn(String label, String key) {
bool isActive = _activeTab == key;
return GestureDetector(
onTap: () => setState(() => _activeTab = key),
onTap: () {
setState(() {
_activeTab = key;
});
},
child: Container(
padding: const EdgeInsets.symmetric(horizontal: 16, vertical: 8),
decoration: BoxDecoration(
@@ -423,6 +271,7 @@ class _StoryDetailPageState extends State<StoryDetailPage>
}
Widget _buildContentCard() {
// HTML: .story-paper
bool isVideoMode = _activeTab == 'video';
return Container(
@@ -443,11 +292,11 @@ class _StoryDetailPageState extends State<StoryDetailPage>
_currentStory['content']
.toString()
.replaceAll(RegExp(r'\n+'), '\n\n')
.trim(),
.trim(), // Simple paragraph spacing
style: const TextStyle(
fontSize: 16,
height: 2.0,
color: AppColors.storyText,
fontSize: 16, // HTML: 16px
height: 2.0, // HTML: line-height 2.0
color: AppColors.storyText, // #374151
),
textAlign: TextAlign.justify,
),
@@ -464,7 +313,7 @@ class _StoryDetailPageState extends State<StoryDetailPage>
width: 40,
height: 40,
child: CircularProgressIndicator(
color: Color(0xFFF43F5E),
color: Color(0xFFF43F5E), // HTML: #F43F5E
strokeWidth: 3,
),
),
@@ -490,14 +339,15 @@ class _StoryDetailPageState extends State<StoryDetailPage>
alignment: Alignment.center,
children: [
AspectRatio(
aspectRatio: 16 / 9,
aspectRatio: 16 / 9, // Assume landscape video
child: Container(
color: Colors.black,
child: const Center(
child: Icon(Icons.videocam, color: Colors.white54, size: 48),
), // Placeholder for Video Player
),
),
),
// Play Button Overlay
Container(
width: 48,
height: 48,
@@ -522,6 +372,7 @@ class _StoryDetailPageState extends State<StoryDetailPage>
child: _activeTab == 'text' ? _buildTextFooter() : _buildVideoFooter(),
);
// Fade out footer during genie suck animation
if (_isSaving) {
return IgnorePointer(
child: AnimatedOpacity(
@@ -536,9 +387,12 @@ class _StoryDetailPageState extends State<StoryDetailPage>
}
void _handleRewrite() async {
// loading
final result = await Navigator.of(context).push<String>(
MaterialPageRoute(builder: (context) => const StoryLoadingPage()),
);
// loading
if (mounted && result == 'saved') {
Navigator.of(context).pop('saved');
}
@@ -549,6 +403,7 @@ class _StoryDetailPageState extends State<StoryDetailPage>
// Generator Mode: Rewrite + Save
return Row(
children: [
// Rewrite (Secondary)
Expanded(
child: GestureDetector(
onTap: _handleRewrite,
@@ -560,25 +415,19 @@ class _StoryDetailPageState extends State<StoryDetailPage>
color: Colors.white.withOpacity(0.8),
),
alignment: Alignment.center,
child: const Row(
mainAxisAlignment: MainAxisAlignment.center,
children: [
Icon(Icons.refresh_rounded, size: 18, color: Color(0xFF4B5563)),
SizedBox(width: 4),
Text(
'重写',
child: const Text(
'↻ 重写',
style: TextStyle(
fontSize: 16,
fontWeight: FontWeight.w600,
color: Color(0xFF4B5563),
),
),
],
),
),
),
),
const SizedBox(width: 16),
// Save (Primary) - Returns 'saved' to trigger add book animation
Expanded(
child: GradientButton(
text: '保存故事',
@@ -592,14 +441,41 @@ class _StoryDetailPageState extends State<StoryDetailPage>
],
);
} else {
// Read Mode: TTS pill button + Make Picture Book
// Read Mode: TTS + Make Picture Book
return Row(
children: [
// TTS
Expanded(
child: PillProgressButton(
state: _ttsState,
progress: _ttsProgress,
onTap: _handleTTSTap,
child: GestureDetector(
onTap: () => setState(() => _isPlaying = !_isPlaying),
child: Container(
height: 48,
decoration: BoxDecoration(
border: Border.all(color: const Color(0xFFE5E7EB)),
borderRadius: BorderRadius.circular(24),
color: Colors.white.withOpacity(0.8),
),
alignment: Alignment.center,
child: Row(
mainAxisAlignment: MainAxisAlignment.center,
children: [
Icon(
_isPlaying ? Icons.pause : Icons.headphones,
size: 20,
color: const Color(0xFF4B5563),
),
const SizedBox(width: 6),
Text(
_isPlaying ? '暂停' : '朗读',
style: const TextStyle(
fontSize: 16,
fontWeight: FontWeight.w600,
color: Color(0xFF4B5563),
),
),
],
),
),
),
),
const SizedBox(width: 16),
@@ -624,7 +500,7 @@ class _StoryDetailPageState extends State<StoryDetailPage>
children: [
Expanded(
child: GradientButton(
text: '重新生成',
text: '重新生成',
onPressed: _startVideoGeneration,
gradient: const LinearGradient(
colors: AppColors.btnCapybaraGradient,
@@ -641,6 +517,7 @@ class _StoryDetailPageState extends State<StoryDetailPage>
_isLoadingVideo = true;
_activeTab = 'video';
});
// Mock delay
Future.delayed(const Duration(seconds: 2), () {
if (mounted) {
setState(() {

View File

@@ -1,174 +1,75 @@
import 'dart:async';
import 'dart:convert';
import 'package:flutter/material.dart';
import 'package:http/http.dart' as http;
import 'story_detail_page.dart';
class StoryLoadingPage extends StatefulWidget {
/// Selected story elements from the generator modal
final List<String> characters;
final List<String> scenes;
final List<String> props;
const StoryLoadingPage({
super.key,
this.characters = const [],
this.scenes = const [],
this.props = const [],
});
const StoryLoadingPage({super.key});
@override
State<StoryLoadingPage> createState() => _StoryLoadingPageState();
}
class _StoryLoadingPageState extends State<StoryLoadingPage> {
static const String _kServerBase = 'http://localhost:3000';
class _StoryLoadingPageState extends State<StoryLoadingPage>
with SingleTickerProviderStateMixin {
double _progress = 0.0;
String _loadingText = '正在收集灵感碎片...';
bool _hasError = false;
String _loadingText = "构思故事中...";
final List<Map<String, dynamic>> _milestones = [
{'pct': 0.2, 'text': "正在收集灵感碎片..."},
{'pct': 0.5, 'text': "正在往故事里撒魔法粉..."},
{'pct': 0.8, 'text': "正在编制最后的魔法..."},
{'pct': 0.98, 'text': "大功告成!"},
];
@override
void initState() {
super.initState();
_generateStory();
_startLoading();
}
Future<void> _generateStory() async {
try {
// Start SSE request
final request = http.Request(
'POST',
Uri.parse('$_kServerBase/api/create_story'),
);
request.headers['Content-Type'] = 'application/json';
request.body = jsonEncode({
'characters': widget.characters,
'scenes': widget.scenes,
'props': widget.props,
void _startLoading() {
// Total duration approx 3.5s (match Web 35ms * 100 steps)
Timer.periodic(const Duration(milliseconds: 35), (timer) {
if (!mounted) {
timer.cancel();
return;
}
setState(() {
_progress += 0.01;
// Check text updates
for (var m in _milestones) {
if ((_progress - m['pct'] as double).abs() < 0.01) {
_loadingText = m['text'] as String;
}
}
});
final client = http.Client();
final response = await client.send(request).timeout(
const Duration(seconds: 180),
);
if (response.statusCode != 200) {
_showError('服务器响应异常 (${response.statusCode})');
client.close();
return;
if (_progress >= 1.0) {
timer.cancel();
_navigateToDetail();
}
});
}
// Parse SSE stream
String buffer = '';
String? storyTitle;
String? storyContent;
await for (final chunk in response.stream.transform(utf8.decoder)) {
buffer += chunk;
while (buffer.contains('\n\n')) {
final idx = buffer.indexOf('\n\n');
final line = buffer.substring(0, idx).trim();
buffer = buffer.substring(idx + 2);
if (!line.startsWith('data: ')) continue;
final jsonStr = line.substring(6);
try {
final event = jsonDecode(jsonStr) as Map<String, dynamic>;
final stage = event['stage'] as String? ?? '';
final progress = (event['progress'] as num?)?.toDouble() ?? 0;
final message = event['message'] as String? ?? '';
if (!mounted) return;
switch (stage) {
case 'connecting':
_updateProgress(progress / 100, '正在收集灵感碎片...');
break;
case 'generating':
_updateProgress(progress / 100, '故事正在诞生...');
break;
case 'parsing':
_updateProgress(progress / 100, '正在编制最后的魔法...');
break;
case 'done':
storyTitle = event['title'] as String? ?? '卡皮巴拉的故事';
storyContent = event['content'] as String? ?? '';
_updateProgress(1.0, '大功告成!');
break;
case 'error':
_showError(message.isNotEmpty ? message : '故事生成失败,请重试');
client.close();
return;
}
} catch (e) {
debugPrint('SSE parse error: $e');
}
}
}
client.close();
// Navigate to story detail
if (!mounted) return;
if (storyTitle != null && storyContent != null && storyContent.isNotEmpty) {
// Brief pause to show "大功告成!"
await Future.delayed(const Duration(milliseconds: 600));
if (!mounted) return;
final result = await Navigator.of(context).push<dynamic>(
void _navigateToDetail() async {
// Use push instead of pushReplacement to properly return the result
final result = await Navigator.of(context).push<String>(
MaterialPageRoute(
builder: (context) => StoryDetailPage(
mode: StoryMode.generated,
story: {
'title': storyTitle,
'content': storyContent,
story: const {
'title': '星际忍者的茶话会',
'content': '在遥远的银河系边缘,有一个被星云包裹的神秘茶馆。今天,这里迎来了两位特殊的客人:刚执行完火星探测任务的宇航员波波,和正在追捕暗影怪兽的忍者小次郎。\n\n"这儿的重力好像有点不对劲?"波波飘在半空中,试图抓住飞来飞去的茶杯。小次郎则冷静地倒挂在天花板上,手里紧握着一枚手里剑——其实那是用来切月饼的。\n\n突然,桌上的魔法茶壶"噗"地一声喷出了七彩烟雾,一只会说话的卡皮巴拉钻了出来:"别打架,别打架,喝了这杯银河气泡茶,我们都是好朋友!"\n\n于是,宇宙中最奇怪的组合诞生了。他们决定,下一站,去黑洞边缘钓星星。',
},
),
),
);
// Pass the story data back to DeviceControlPage
// Pass the result back to DeviceControlPage
if (mounted) {
if (result == 'saved') {
Navigator.of(context).pop({
'action': 'saved',
'title': storyTitle,
'content': storyContent,
});
} else {
Navigator.of(context).pop(result);
}
}
} else {
_showError('AI 返回了空故事,请重试');
}
} catch (e) {
debugPrint('Story generation error: $e');
if (mounted) {
_showError('网络开小差了,再试一次~');
}
}
}
void _updateProgress(double progress, String text) {
if (!mounted) return;
setState(() {
_progress = progress.clamp(0.0, 1.0);
_loadingText = text;
});
}
void _showError(String message) {
if (!mounted) return;
setState(() {
_hasError = true;
_loadingText = message;
});
}
@override
Widget build(BuildContext context) {
@@ -182,7 +83,7 @@ class _StoryLoadingPageState extends State<StoryLoadingPage> {
Image.asset(
'assets/www/kapi_writing.png',
width: 200,
height: 200,
height: 200, // Approximate
errorBuilder: (c, e, s) => const Icon(
Icons.edit_note,
size: 100,
@@ -191,25 +92,26 @@ class _StoryLoadingPageState extends State<StoryLoadingPage> {
),
const SizedBox(height: 32),
// Text
// Text - HTML: font-size 18px, color #4B2404 (dark brown)
Text(
_loadingText,
style: const TextStyle(
fontSize: 18,
color: Color(0xFF4B2404),
fontSize: 18, // HTML: 18px
color: Color(0xFF4B2404), // HTML: dark chocolate brown
fontWeight: FontWeight.w600,
),
textAlign: TextAlign.center,
),
const SizedBox(height: 24),
// Progress Bar
// Progress Bar - HTML: height 12px, max-width 280px
// Track: rgba(201,150,114,0.2), Fill: gradient #ECCFA8 to #C99672
Container(
width: 280,
height: 12,
width: 280, // HTML: max-width 280px
height: 12, // HTML: height 12px
decoration: BoxDecoration(
color: const Color(0xFFC99672).withOpacity(0.2),
borderRadius: BorderRadius.circular(6),
color: const Color(0xFFC99672).withOpacity(0.2), // Warm sand
borderRadius: BorderRadius.circular(6), // HTML: 6px
),
child: ClipRRect(
borderRadius: BorderRadius.circular(6),
@@ -218,6 +120,7 @@ class _StoryLoadingPageState extends State<StoryLoadingPage> {
widthFactor: _progress.clamp(0.0, 1.0),
child: Container(
decoration: const BoxDecoration(
// HTML: gradient #ECCFA8 to #C99672
gradient: LinearGradient(
colors: [Color(0xFFECCFA8), Color(0xFFC99672)],
),
@@ -226,22 +129,6 @@ class _StoryLoadingPageState extends State<StoryLoadingPage> {
),
),
),
// Retry button (shown on error)
if (_hasError) ...[
const SizedBox(height: 32),
TextButton(
onPressed: () => Navigator.of(context).pop(),
child: const Text(
'返回重试',
style: TextStyle(
fontSize: 16,
color: Color(0xFFC99672),
fontWeight: FontWeight.w600,
),
),
),
],
],
),
),

View File

@@ -1,221 +0,0 @@
import 'dart:convert';
import 'package:flutter/foundation.dart';
import 'package:http/http.dart' as http;
/// Lightweight singleton that runs music generation in the background.
/// Survives page navigation: results are held until the music page picks them up.
class MusicGenerationService {
MusicGenerationService._();
static final MusicGenerationService instance = MusicGenerationService._();
static const String _kServerBase = 'http://localhost:3000';
// Current task state
bool _isGenerating = false;
double _progress = 0.0; // 0~100
String _statusMessage = '';
String _currentStage = '';
// Completed result (held until consumed)
MusicGenResult? _pendingResult;
// Pending error (held until consumed)
String? _pendingError;
// Callback for live UI updates (set by the music page when visible)
void Function(double progress, String stage, String message)? onProgress;
void Function(MusicGenResult result)? onComplete;
void Function(String error)? onError;
// Getters
bool get isGenerating => _isGenerating;
double get progress => _progress;
String get statusMessage => _statusMessage;
String get currentStage => _currentStage;
/// Check and consume any pending result (called when music page resumes).
MusicGenResult? consumePendingResult() {
final result = _pendingResult;
_pendingResult = null;
return result;
}
/// Check and consume any pending error (called when music page resumes).
String? consumePendingError() {
final error = _pendingError;
_pendingError = null;
return error;
}
/// Start a generation task. Safe to call even if page navigates away.
Future<void> generate({required String text, required String mood}) async {
if (_isGenerating) return; // Only one task at a time
_isGenerating = true;
_progress = 5;
_statusMessage = '正在连接 AI...';
_currentStage = 'connecting';
_pendingResult = null;
_pendingError = null;
onProgress?.call(_progress, _currentStage, _statusMessage);
try {
final request = http.Request(
'POST',
Uri.parse('$_kServerBase/api/create_music'),
);
request.headers['Content-Type'] = 'application/json';
request.body = jsonEncode({'text': text, 'mood': mood});
final client = http.Client();
final response = await client.send(request).timeout(
const Duration(seconds: 360),
);
if (response.statusCode != 200) {
throw Exception('Server returned ${response.statusCode}');
}
// Parse SSE stream
String buffer = '';
String? newTitle;
String? newLyrics;
String? newFilePath;
await for (final chunk in response.stream.transform(utf8.decoder)) {
buffer += chunk;
while (buffer.contains('\n\n')) {
final idx = buffer.indexOf('\n\n');
final line = buffer.substring(0, idx).trim();
buffer = buffer.substring(idx + 2);
if (!line.startsWith('data: ')) continue;
final jsonStr = line.substring(6);
try {
final event = jsonDecode(jsonStr) as Map<String, dynamic>;
final stage = event['stage'] as String? ?? '';
final message = event['message'] as String? ?? '';
switch (stage) {
case 'lyrics':
_updateProgress(10, stage, 'AI 正在创作词曲...');
break;
case 'lyrics_done':
case 'lyrics_fallback':
_updateProgress(25, stage, '词曲创作完成,准备生成音乐...');
break;
case 'music':
_updateProgress(30, stage, '正在生成音乐,请耐心等待...');
break;
case 'saving':
_updateProgress(90, stage, '音乐生成完成,正在保存...');
break;
case 'done':
newFilePath = event['file_path'] as String?;
final metadata = event['metadata'] as Map<String, dynamic>?;
newLyrics = metadata?['lyrics'] as String? ?? '';
newTitle = metadata?['song_title'] as String?;
if ((newTitle == null || newTitle.isEmpty) && newFilePath != null) {
final fname = newFilePath.split('/').last;
newTitle = fname.replaceAll(RegExp(r'_\d{10,}\.mp3$'), '');
}
_updateProgress(100, stage, '新歌出炉!');
break;
case 'error':
_isGenerating = false;
_progress = 0;
final errMsg = message.isNotEmpty ? message : '网络开小差了,再试一次~';
_statusMessage = errMsg;
if (onError != null) {
onError!(errMsg);
} else {
_pendingError = errMsg;
}
client.close();
return;
}
} catch (e) {
debugPrint('SSE parse error: $e for line: $jsonStr');
}
}
}
client.close();
// Build result
_isGenerating = false;
_progress = 0;
if (newFilePath != null) {
final result = MusicGenResult(
title: newTitle ?? '新歌',
lyrics: newLyrics ?? '',
audioUrl: '$_kServerBase/$newFilePath',
);
// Always store as pending first; callback decides whether to consume
_pendingResult = result;
onComplete?.call(result);
}
} catch (e) {
debugPrint('Generate music error: $e');
_isGenerating = false;
_progress = 0;
const errMsg = '网络开小差了,再试一次~';
_statusMessage = errMsg;
if (onError != null) {
onError!(errMsg);
} else {
_pendingError = errMsg;
}
}
}
void _updateProgress(double progress, String stage, String message) {
_progress = progress;
_currentStage = stage;
_statusMessage = message;
onProgress?.call(progress, stage, message);
}
/// Fetch saved songs from the server (scans Capybara music/ folder).
Future<List<MusicGenResult>> fetchPlaylist() async {
try {
final response = await http.get(
Uri.parse('$_kServerBase/api/playlist'),
).timeout(const Duration(seconds: 10));
if (response.statusCode != 200) return [];
final data = jsonDecode(response.body) as Map<String, dynamic>;
final list = data['playlist'] as List<dynamic>? ?? [];
return list.map((item) {
final m = item as Map<String, dynamic>;
return MusicGenResult(
title: m['title'] as String? ?? '',
lyrics: m['lyrics'] as String? ?? '',
audioUrl: '$_kServerBase/${m['audioUrl'] as String? ?? ''}',
);
}).toList();
} catch (e) {
debugPrint('Fetch playlist error: $e');
return [];
}
}
}
/// Result of a completed music generation.
class MusicGenResult {
final String title;
final String lyrics;
final String audioUrl;
const MusicGenResult({
required this.title,
required this.lyrics,
required this.audioUrl,
});
}
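
The consume-on-resume flow above (a finished result is always parked in `_pendingResult`, while `onComplete` fires only if a page is currently listening) can be sketched in isolation. The `PendingBox` name below is illustrative and not part of this codebase:

```dart
// Illustrative sketch of the pending-result pattern used by
// MusicGenerationService: deliver() notifies a live listener if one is
// attached, and always parks the value so a returning page can consume it.
class PendingBox<T> {
  T? _pending;
  void Function(T value)? onComplete;

  void deliver(T value) {
    _pending = value;        // parked until consumed
    onComplete?.call(value); // live update if a page is listening
  }

  T? consume() {
    final v = _pending;
    _pending = null; // one-shot: a second consume() returns null
    return v;
  }
}

void main() {
  final box = PendingBox<String>();
  box.deliver('new_song.mp3'); // page is away: value is parked
  assert(box.consume() == 'new_song.mp3');
  assert(box.consume() == null);
}
```

This is why the music page can safely navigate away mid-generation: on resume it calls `consumePendingResult()` and picks up anything delivered in the meantime.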

View File

@@ -1,190 +0,0 @@
import 'dart:convert';
import 'package:flutter/foundation.dart';
import 'package:http/http.dart' as http;
/// Singleton service that manages TTS generation in the background.
/// Survives page navigation: when the user leaves and comes back,
/// generation continues and the result is available.
class TTSService extends ChangeNotifier {
TTSService._();
static final TTSService instance = TTSService._();
static const String _kServerBase = 'http://localhost:3000';
// Current task state
bool _isGenerating = false;
double _progress = 0.0; // 0.0 ~ 1.0
String _statusMessage = '';
String? _currentStoryTitle; // Which story is being generated
// Result
String? _audioUrl;
String? _completedStoryTitle; // Which story the audio belongs to
bool _justCompleted = false; // Flash animation trigger
// Error
String? _error;
// Getters
bool get isGenerating => _isGenerating;
double get progress => _progress;
String get statusMessage => _statusMessage;
String? get currentStoryTitle => _currentStoryTitle;
String? get audioUrl => _audioUrl;
String? get completedStoryTitle => _completedStoryTitle;
bool get justCompleted => _justCompleted;
String? get error => _error;
/// Check if audio is ready for a specific story.
bool hasAudioFor(String title) {
return _completedStoryTitle == title && _audioUrl != null;
}
/// Check if currently generating for a specific story.
bool isGeneratingFor(String title) {
return _isGenerating && _currentStoryTitle == title;
}
/// Clear the "just completed" flag (after flash animation plays).
void clearJustCompleted() {
_justCompleted = false;
notifyListeners();
}
/// Set audio URL directly (e.g. from pre-check).
void setExistingAudio(String title, String url) {
_completedStoryTitle = title;
_audioUrl = url;
_justCompleted = false;
notifyListeners();
}
/// Check server for existing audio file.
Future<void> checkExistingAudio(String title) async {
if (title.isEmpty) return;
try {
final resp = await http.get(
Uri.parse(
'$_kServerBase/api/tts_check?title=${Uri.encodeComponent(title)}',
),
);
if (resp.statusCode == 200) {
final data = jsonDecode(resp.body);
if (data['exists'] == true && data['audio_url'] != null) {
_completedStoryTitle = title;
_audioUrl = '$_kServerBase/${data['audio_url']}';
notifyListeners();
}
}
} catch (_) {}
}
/// Start TTS generation. Safe to call even if page navigates away.
Future<void> generate({
required String title,
required String content,
}) async {
if (_isGenerating) return;
_isGenerating = true;
_progress = 0.0;
_statusMessage = '正在连接...';
_currentStoryTitle = title;
_audioUrl = null;
_completedStoryTitle = null;
_justCompleted = false;
_error = null;
notifyListeners();
try {
final client = http.Client();
final request = http.Request(
'POST',
Uri.parse('$_kServerBase/api/create_tts'),
);
request.headers['Content-Type'] = 'application/json';
request.body = jsonEncode({'title': title, 'content': content});
final streamed = await client.send(request);
await for (final chunk in streamed.stream.transform(utf8.decoder)) {
for (final line in chunk.split('\n')) {
if (!line.startsWith('data: ')) continue;
try {
final data = jsonDecode(line.substring(6));
final stage = data['stage'] as String? ?? '';
final message = data['message'] as String? ?? '';
switch (stage) {
case 'connecting':
_updateProgress(0.10, '正在连接...');
break;
case 'generating':
_updateProgress(0.30, '语音生成中...');
break;
case 'saving':
_updateProgress(0.88, '正在保存...');
break;
case 'done':
if (data['audio_url'] != null) {
_audioUrl = '$_kServerBase/${data['audio_url']}';
_completedStoryTitle = title;
_justCompleted = true;
_updateProgress(1.0, '生成完成');
}
break;
case 'error':
throw Exception(message);
default:
// Progress slowly increases during generation
if (_progress < 0.85) {
_updateProgress(_progress + 0.02, message);
}
}
} catch (e) {
if (e is Exception &&
e.toString().contains('语音合成失败')) {
rethrow;
}
}
}
}
client.close();
_isGenerating = false;
if (_audioUrl == null) {
_error = '未获取到音频';
_statusMessage = '生成失败';
}
notifyListeners();
} catch (e) {
debugPrint('TTS generation error: $e');
_isGenerating = false;
_progress = 0.0;
_error = e.toString();
_statusMessage = '生成失败';
_justCompleted = false;
notifyListeners();
}
}
void _updateProgress(double progress, String message) {
_progress = progress.clamp(0.0, 1.0);
_statusMessage = message;
notifyListeners();
}
/// Reset all state (e.g. when switching stories).
void reset() {
if (_isGenerating) return; // Don't reset during generation
_progress = 0.0;
_statusMessage = '';
_currentStoryTitle = null;
_audioUrl = null;
_completedStoryTitle = null;
_justCompleted = false;
_error = null;
notifyListeners();
}
}
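
Note that `generate()` above splits each decoded chunk on `'\n'`, so a `data:` line that straddles two network chunks would be dropped; MusicGenerationService (earlier in this diff) instead buffers input and splits on the blank-line frame delimiter. A minimal sketch of that buffered approach (the `drainSseData` helper is illustrative, not part of this codebase):

```dart
// Illustrative sketch: buffer incoming chunks and emit only complete
// SSE frames (terminated by a blank line), so a `data:` payload split
// across two network chunks is still parsed intact.
List<String> drainSseData(StringBuffer buffer, String chunk) {
  buffer.write(chunk);
  var text = buffer.toString();
  final payloads = <String>[];
  int idx;
  while ((idx = text.indexOf('\n\n')) != -1) {
    final frame = text.substring(0, idx).trim();
    text = text.substring(idx + 2);
    if (frame.startsWith('data: ')) payloads.add(frame.substring(6));
  }
  buffer
    ..clear()
    ..write(text); // keep the incomplete tail for the next chunk
  return payloads;
}

void main() {
  final buf = StringBuffer();
  assert(drainSseData(buf, 'data: {"stage":"gen').isEmpty); // incomplete frame
  final out = drainSseData(buf, 'erating"}\n\n');
  assert(out.length == 1 && out.first == '{"stage":"generating"}');
}
```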

View File

@@ -1,335 +0,0 @@
import 'dart:math' as math;
import 'package:flutter/material.dart';
enum TTSButtonState {
idle,
ready,
generating,
completed,
playing,
paused,
error,
}
class PillProgressButton extends StatefulWidget {
final TTSButtonState state;
final double progress;
final VoidCallback? onTap;
final double height;
const PillProgressButton({
super.key,
required this.state,
this.progress = 0.0,
this.onTap,
this.height = 48,
});
@override
State<PillProgressButton> createState() => _PillProgressButtonState();
}
class _PillProgressButtonState extends State<PillProgressButton>
with TickerProviderStateMixin {
late AnimationController _progressCtrl;
double _displayProgress = 0.0;
late AnimationController _glowCtrl;
late Animation<double> _glowAnim;
late AnimationController _waveCtrl;
bool _wasCompleted = false;
@override
void initState() {
super.initState();
_progressCtrl = AnimationController(
vsync: this,
duration: const Duration(milliseconds: 500),
);
_progressCtrl.addListener(() => setState(() {}));
_glowCtrl = AnimationController(
vsync: this,
duration: const Duration(milliseconds: 1000),
);
_glowAnim = TweenSequence<double>([
TweenSequenceItem(tween: Tween(begin: 0.0, end: 1.0), weight: 35),
TweenSequenceItem(tween: Tween(begin: 1.0, end: 0.0), weight: 65),
]).animate(CurvedAnimation(parent: _glowCtrl, curve: Curves.easeOut));
_glowCtrl.addListener(() => setState(() {}));
_waveCtrl = AnimationController(
vsync: this,
duration: const Duration(milliseconds: 800),
);
_syncAnimations();
}
@override
void didUpdateWidget(PillProgressButton oldWidget) {
super.didUpdateWidget(oldWidget);
if (widget.progress != oldWidget.progress) {
if (oldWidget.state == TTSButtonState.completed &&
(widget.state == TTSButtonState.playing || widget.state == TTSButtonState.ready)) {
_displayProgress = 0.0;
} else {
_animateProgressTo(widget.progress);
}
}
if (widget.state == TTSButtonState.completed && !_wasCompleted) {
_wasCompleted = true;
_glowCtrl.forward(from: 0);
} else if (widget.state != TTSButtonState.completed) {
_wasCompleted = false;
}
_syncAnimations();
}
void _animateProgressTo(double target) {
final from = _displayProgress;
_progressCtrl.reset();
_progressCtrl.addListener(() {
final t = Curves.easeInOut.transform(_progressCtrl.value);
_displayProgress = from + (target - from) * t;
});
_progressCtrl.forward();
}
void _syncAnimations() {
if (widget.state == TTSButtonState.generating) {
if (!_waveCtrl.isAnimating) _waveCtrl.repeat();
} else {
if (_waveCtrl.isAnimating) {
_waveCtrl.stop();
_waveCtrl.value = 0;
}
}
}
@override
void dispose() {
_progressCtrl.dispose();
_glowCtrl.dispose();
_waveCtrl.dispose();
super.dispose();
}
bool get _showBorder =>
widget.state == TTSButtonState.generating ||
widget.state == TTSButtonState.completed ||
widget.state == TTSButtonState.playing ||
widget.state == TTSButtonState.paused;
@override
Widget build(BuildContext context) {
const borderColor = Color(0xFFE5E7EB);
const progressColor = Color(0xFFECCFA8);
const bgColor = Color(0xCCFFFFFF);
return GestureDetector(
onTap: widget.state == TTSButtonState.generating ? null : widget.onTap,
child: Container(
height: widget.height,
decoration: BoxDecoration(
borderRadius: BorderRadius.circular(widget.height / 2),
boxShadow: _glowAnim.value > 0
? [
BoxShadow(
color: progressColor.withOpacity(0.5 * _glowAnim.value),
blurRadius: 16 * _glowAnim.value,
spreadRadius: 2 * _glowAnim.value,
),
]
: null,
),
child: CustomPaint(
painter: PillBorderPainter(
progress: _showBorder ? _displayProgress.clamp(0.0, 1.0) : 0.0,
borderColor: borderColor,
progressColor: progressColor,
radius: widget.height / 2,
stroke: _showBorder ? 2.5 : 1.0,
bg: bgColor,
),
child: Center(child: _buildContent()),
),
),
);
}
Widget _buildContent() {
switch (widget.state) {
case TTSButtonState.idle:
return _label(Icons.headphones_rounded, '\u6717\u8bfb');
case TTSButtonState.generating:
return Row(
mainAxisAlignment: MainAxisAlignment.center,
children: [
AnimatedBuilder(
animation: _waveCtrl,
builder: (context, _) => CustomPaint(
size: const Size(20, 18),
painter: WavePainter(t: _waveCtrl.value, color: const Color(0xFFC99672)),
),
),
const SizedBox(width: 6),
const Text('\u751f\u6210\u4e2d',
style: TextStyle(fontSize: 15, fontWeight: FontWeight.w600, color: Color(0xFF4B5563))),
],
);
case TTSButtonState.ready:
return _label(Icons.play_arrow_rounded, '\u64ad\u653e'); // "Play"
case TTSButtonState.completed:
return _label(Icons.play_arrow_rounded, '\u64ad\u653e'); // "Play"
case TTSButtonState.playing:
return _label(Icons.pause_rounded, '\u6682\u505c'); // "Pause"
case TTSButtonState.paused:
return _label(Icons.play_arrow_rounded, '\u7ee7\u7eed'); // "Resume"
case TTSButtonState.error:
return _label(Icons.refresh_rounded, '\u91cd\u8bd5', isError: true); // "Retry"
}
}
Widget _label(IconData icon, String text, {bool isError = false}) {
final c = isError ? const Color(0xFFEF4444) : const Color(0xFF4B5563);
return Row(
mainAxisAlignment: MainAxisAlignment.center,
mainAxisSize: MainAxisSize.min,
children: [
Icon(icon, size: 20, color: c),
const SizedBox(width: 4),
Text(text, style: TextStyle(fontSize: 16, fontWeight: FontWeight.w600, color: c)),
],
);
}
}
class PillBorderPainter extends CustomPainter {
final double progress;
final Color borderColor;
final Color progressColor;
final double radius;
final double stroke;
final Color bg;
PillBorderPainter({
required this.progress,
required this.borderColor,
required this.progressColor,
required this.radius,
required this.stroke,
required this.bg,
});
@override
void paint(Canvas canvas, Size size) {
final r = radius.clamp(0.0, size.height / 2);
final rrect = RRect.fromRectAndRadius(
Rect.fromLTWH(0, 0, size.width, size.height),
Radius.circular(r),
);
canvas.drawRRect(rrect, Paint()
..color = bg
..style = PaintingStyle.fill);
canvas.drawRRect(rrect, Paint()
..color = borderColor
..style = PaintingStyle.stroke
..strokeWidth = stroke);
if (progress <= 0.001) return;
final straightH = size.width - 2 * r;
final halfTop = straightH / 2;
final arcLen = math.pi * r;
final totalLen = halfTop + arcLen + straightH + arcLen + halfTop;
final target = totalLen * progress;
final path = Path();
double done = 0;
final cx = size.width / 2;
path.moveTo(cx, 0);
var seg = math.min(halfTop, target - done);
path.lineTo(cx + seg, 0);
done += seg;
if (done >= target) { _drawPath(canvas, path); return; }
seg = math.min(arcLen, target - done);
_traceArc(path, size.width - r, r, r, -math.pi / 2, seg / r);
done += seg;
if (done >= target) { _drawPath(canvas, path); return; }
seg = math.min(straightH, target - done);
path.lineTo(size.width - r - seg, size.height);
done += seg;
if (done >= target) { _drawPath(canvas, path); return; }
seg = math.min(arcLen, target - done);
_traceArc(path, r, r, r, math.pi / 2, seg / r);
done += seg;
if (done >= target) { _drawPath(canvas, path); return; }
seg = math.min(halfTop, target - done);
path.lineTo(r + seg, 0);
_drawPath(canvas, path);
}
void _drawPath(Canvas canvas, Path path) {
canvas.drawPath(path, Paint()
..color = progressColor
..style = PaintingStyle.stroke
..strokeWidth = stroke
..strokeCap = StrokeCap.round);
}
void _traceArc(Path p, double cx, double cy, double r, double start, double sweep) {
const n = 24;
final step = sweep / n;
for (int i = 0; i <= n; i++) {
final a = start + step * i;
p.lineTo(cx + r * math.cos(a), cy + r * math.sin(a));
}
}
@override
bool shouldRepaint(PillBorderPainter old) => old.progress != progress || old.stroke != stroke;
}
class WavePainter extends CustomPainter {
final double t;
final Color color;
WavePainter({required this.t, required this.color});
@override
void paint(Canvas canvas, Size size) {
final paint = Paint()
..color = color
..style = PaintingStyle.fill;
final bw = size.width * 0.2;
final gap = size.width * 0.1;
final tw = 3 * bw + 2 * gap;
final sx = (size.width - tw) / 2;
for (int i = 0; i < 3; i++) {
final phase = t * 2 * math.pi + i * math.pi * 0.7;
final hr = 0.3 + 0.7 * ((math.sin(phase) + 1) / 2);
final bh = size.height * hr;
final x = sx + i * (bw + gap);
final y = (size.height - bh) / 2;
canvas.drawRRect(
RRect.fromRectAndRadius(Rect.fromLTWH(x, y, bw, bh), Radius.circular(bw / 2)),
paint,
);
}
}
@override
bool shouldRepaint(WavePainter old) => old.t != t;
}
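The pill progress border above maps a 0-1 progress value onto a path traced from the top center: half the top edge, the right half-circle cap, the full bottom edge, the left cap, then the remaining half of the top edge. That decomposition should equal the perimeter of a stadium shape, 2*(w - 2r) + 2*pi*r. A quick Python check of this identity, mirroring the Dart arithmetic (the helper `pill_path_length` is illustrative, not part of the app):

```python
import math

def pill_path_length(w: float, h: float) -> float:
    """Sum the five segments traced by PillBorderPainter at full progress."""
    r = h / 2                   # pill shape: corner radius is half the height
    half_top = (w - 2 * r) / 2  # top-center to the start of the right cap
    arc = math.pi * r           # each end cap is a half circle
    straight = w - 2 * r        # full bottom edge
    return half_top + arc + straight + arc + half_top

# The decomposition equals the stadium perimeter 2*(w - 2r) + 2*pi*r.
w, h = 200.0, 44.0
assert abs(pill_path_length(w, h) - (2 * (w - h) + math.pi * h)) < 1e-9
```

This is why `totalLen` in the painter can safely be multiplied by `progress` to get an arc-length target: the five segments partition the whole outline exactly once.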

View File

@@ -338,30 +338,8 @@ class _StoryGeneratorModalState extends State<StoryGeneratorModal> {
_showSnack('请至少选择一个元素');
return;
}
// Categorize selected elements by type
final characters = <String>[];
final scenes = <String>[];
final props = <String>[];
for (final el in _selectedElements) {
final id = el['id'] ?? '';
final name = el['name'] ?? '';
if (id.startsWith('c')) {
characters.add(name);
} else if (id.startsWith('s')) {
scenes.add(name);
} else if (id.startsWith('p')) {
props.add(name);
}
}
// Return selected elements as a Map
Navigator.pop(context, {
'action': 'start_generation',
'characters': characters,
'scenes': scenes,
'props': props,
});
// Return 'start_generation' to trigger full-screen loading flow
Navigator.pop(context, 'start_generation');
},
),
),

View File

@@ -534,7 +534,7 @@ packages:
source: hosted
version: "4.3.0"
http:
dependency: "direct main"
dependency: transitive
description:
name: http
sha256: "87721a4a50b19c7f1d49001e51409bddc46303966ce89a65af4f4e6004896412"

View File

@@ -65,7 +65,6 @@ dependencies:
flutter_svg: ^2.0.9
image_picker: ^1.2.1
just_audio: ^0.9.42
http: ^1.2.0
flutter:
uses-material-design: true

View File

@@ -15,18 +15,13 @@
请严格按照以下 JSON 格式输出:
{
"song_title": "...",
"style": "...",
"lyrics": "..."
}
### 字段说明:
1. **song_title** (歌曲名称)
- 使用**中文**简短有趣3-8个字。
- 根据用户描述的场景自由发挥,不要套用固定模板。
2. **style** (风格描述)
1. **style** (风格描述)
- 使用**英文**描述音乐风格、乐器、节奏、情绪。
- 长度 50-100 词。
- 必须包含以下维度:
@@ -36,7 +31,7 @@
- 特色乐器 (如 piano, ukulele, synth, brass)
- 示例:"Chill Lofi hip-hop, mellow piano chords, vinyl crackle, slow tempo, relaxing, water sounds in background, perfect for spa and meditation"
3. **lyrics** (歌词)
2. **lyrics** (歌词)
- 使用**中文**书写歌词。
- 必须包含结构标签:[verse], [chorus], [outro] 等。
- 内容应:
@@ -58,7 +53,7 @@
### 重要规则:
- 如果用户输入太模糊(如"嗯"、"不知道"),请发挥想象力,赋予咔咔此刻最可能在做的事。
- 歌词必须包含完整结构:至少 [verse 1] + [chorus] + [verse 2] + [chorus] + [outro],总共 16-24 行。这样才能生成完整的歌曲(60秒以上)。歌词太短会导致音乐只有20-30秒,绝对不可以!
- 歌词长度控制在 4-8 行即可,不要太长。
- 不要输出任何解释性文字,只输出 JSON。
```
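The JSON schema above is produced by an LLM, so the backend has to tolerate markdown fences and malformed strings in the reply. A minimal sketch of that tolerant parsing (the helper name `extract_llm_json` is illustrative; it mirrors the fence-stripping and regex-repair logic in `server.py` rather than reproducing it exactly):

```python
import json
import re

def extract_llm_json(raw: str) -> dict:
    """Pull a {song_title, style, lyrics} object out of an LLM reply.

    Strips markdown code fences, grabs the first {...} span, and falls
    back to per-field regex extraction when strict JSON parsing fails
    (e.g. unescaped newlines inside the lyrics string)."""
    s = raw.strip()
    if s.startswith("```"):
        s = re.sub(r'^```\w*\n?', '', s)
        s = re.sub(r'```\s*$', '', s).strip()
    m = re.search(r'\{[\s\S]*\}', s)
    if not m:
        raise ValueError("no JSON object in LLM reply")
    try:
        return json.loads(m.group())
    except json.JSONDecodeError:
        # Repair path: recover each field individually.
        out = {}
        for key in ("song_title", "style", "lyrics"):
            km = re.search(rf'"{key}"\s*:\s*"([^"]*)"', s)
            out[key] = km.group(1) if km else ""
        return out
```

The two-stage approach matters because lyrics routinely contain raw newlines, which are illegal inside JSON strings and would otherwise discard an entire generation.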

View File

@@ -1,34 +0,0 @@
# 角色
你是「卡皮巴拉故事工坊」的首席故事大师。你为 3-8 岁的小朋友创作原创童话故事。
# 任务
根据用户提供的**角色、场景、道具**素材,创作一个完整的儿童故事。
# 输出格式
**必须** 只返回如下 JSON(不要返回任何其他内容,不要 markdown 代码块,不要解释):
```
{"title": "故事标题6字以内", "content": "故事正文"}
```
# 故事创作规范
1. **字数**:正文 400-600 字,不要太短也不要太长
2. **段落**:用 `\n\n` 分段,每段 2-4 句话
3. **语言**:简单易懂,适合给小朋友朗读;可以包含拟声词("哗啦啦"、"咕噜噜")和语气词("哇!"、"嘿嘿")
4. **结构**:开头引入角色和场景 → 中间遇到挑战或趣事 → 结尾温馨圆满
5. **情感**:温暖、有趣、充满想象力,带一点小幽默
6. **教育**:自然融入一个小道理(勇气、友谊、分享等),不要说教
7. **创意**:即使收到相同的素材组合,每次也要创作全新的、不同的故事情节
8. **角色融合**:所有用户选择的角色、场景、道具都必须在故事中出现并发挥作用
9. **标题**:简短有趣6 个字以内,能引起小朋友的好奇心
# 素材示例
用户输入:角色=[宇航员, 忍者],场景=[太空],道具=[魔法棒]
你的输出:
{"title": "太空忍者大冒险", "content": "在遥远的银河边缘,住着一个叫小星的宇航员...(故事正文)"}

server.py (586 lines changed)
View File

@@ -1,38 +1,22 @@
import os
import re
import sys
import time
import uuid
import struct
import asyncio
import uvicorn
import requests
import json
import websockets
from fastapi import FastAPI, HTTPException, Query
from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from dotenv import load_dotenv
# Force UTF-8 stdout/stderr on Windows (avoids GBK encoding errors)
if sys.platform == "win32":
sys.stdout.reconfigure(encoding="utf-8", errors="replace")
sys.stderr.reconfigure(encoding="utf-8", errors="replace")
# Load Environment Variables
load_dotenv()
MINIMAX_API_KEY = os.getenv("MINIMAX_API_KEY")
VOLCENGINE_API_KEY = os.getenv("VOLCENGINE_API_KEY")
TTS_APP_ID = os.getenv("TTS_APP_ID")
TTS_ACCESS_TOKEN = os.getenv("TTS_ACCESS_TOKEN")
if not MINIMAX_API_KEY:
print("Warning: MINIMAX_API_KEY not found in .env")
if not VOLCENGINE_API_KEY:
print("Warning: VOLCENGINE_API_KEY not found in .env")
if not TTS_APP_ID or not TTS_ACCESS_TOKEN:
print("Warning: TTS_APP_ID or TTS_ACCESS_TOKEN not found in .env")
# Initialize FastAPI
app = FastAPI()
@@ -51,17 +35,12 @@ class MusicRequest(BaseModel):
text: str
mood: str = "custom" # 'chill', 'happy', 'sleepy', 'random', 'custom'
class StoryRequest(BaseModel):
characters: list[str] = []
scenes: list[str] = []
props: list[str] = []
# Minimax Constants
MINIMAX_GROUP_ID = "YOUR_GROUP_ID"
BASE_URL_CHAT = "https://api.minimax.chat/v1/text/chatcompletion_v2"
BASE_URL_MUSIC = "https://api.minimaxi.com/v1/music_generation"
# Load System Prompts
# Load System Prompt
try:
with open("prompts/music_director.md", "r", encoding="utf-8") as f:
SYSTEM_PROMPT = f.read()
@@ -69,46 +48,10 @@ except FileNotFoundError:
SYSTEM_PROMPT = "You are a music director AI. Convert user input into JSON with 'style' (English description) and 'lyrics' (Chinese, structured)."
print("Warning: prompts/music_director.md not found, using default.")
try:
with open("prompts/story_director.md", "r", encoding="utf-8") as f:
STORY_SYSTEM_PROMPT = f.read()
except FileNotFoundError:
STORY_SYSTEM_PROMPT = "你是一个儿童故事大师。根据用户提供的角色、场景、道具素材创作一个300-600字的儿童故事。只返回JSON格式{\"title\": \"标题\", \"content\": \"正文\"}"
print("Warning: prompts/story_director.md not found, using default.")
# Volcengine / Doubao constants
DOUBAO_BASE_URL = "https://ark.cn-beijing.volces.com/api/v3/chat/completions"
DOUBAO_MODEL = "doubao-seed-1-6-lite-251015" # Doubao-Seed-1.6-lite
def sse_event(data):
"""Format a dict as an SSE data line.
Use ensure_ascii=True so all non-ASCII chars become \\uXXXX escapes,
avoiding Windows GBK encoding issues in the SSE stream."""
return f"data: {json.dumps(data, ensure_ascii=True)}\n\n"
def clean_lyrics(raw: str) -> str:
"""Clean lyrics extracted from LLM JSON output.
Removes JSON artifacts, structure tags, and normalizes formatting."""
if not raw:
return raw
s = raw
# Replace literal \n with real newlines
s = s.replace("\\n", "\n")
# Remove JSON string quotes and concatenation artifacts (" ")
s = re.sub(r'"\s*"', '', s)
s = s.replace('"', '')
# Remove structure tags like [verse 1], [chorus], [outro], [bridge], [intro], etc.
s = re.sub(r'\[(?:verse|chorus|bridge|outro|intro|hook|pre-chorus|interlude|inst)\s*\d*\]\s*', '', s, flags=re.IGNORECASE)
# Strip leading/trailing whitespace from each line
lines = [line.strip() for line in s.split('\n')]
s = '\n'.join(lines)
# Collapse 3+ consecutive newlines into 2 (one blank line between paragraphs)
s = re.sub(r'\n{3,}', '\n\n', s)
# Remove leading/trailing blank lines
s = s.strip()
return s
"""Format a dict as an SSE data line."""
return f"data: {json.dumps(data, ensure_ascii=False)}\n\n"
@app.post("/api/create_music")
@@ -144,10 +87,9 @@ def create_music(req: MusicRequest):
"messages": [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": director_input}
],
"max_tokens": 2048 # Enough for long lyrics
]
},
timeout=60
timeout=30
)
chat_data = chat_resp.json()
@@ -161,55 +103,16 @@ def create_music(req: MusicRequest):
content_str = content_str.strip()
if content_str.startswith("```"):
content_str = re.sub(r'^```\w*\n?', '', content_str)
content_str = re.sub(r'```\s*$', '', content_str).strip()
# Try to extract JSON from response (robust parsing)
content_str = re.sub(r'```$', '', content_str).strip()
# Try to extract JSON from response
json_match = re.search(r'\{[\s\S]*\}', content_str)
if json_match:
json_str = json_match.group()
try:
metadata = json.loads(json_str)
except json.JSONDecodeError:
# JSON might have unescaped newlines in string values — try fixing
log("[Warn] JSON parse failed, attempting repair...")
# Extract fields manually via regex
title_m = re.search(r'"song_title"\s*:\s*"([^"]*)"', json_str)
style_m = re.search(r'"style"\s*:\s*"([^"]*)"', json_str)
lyrics_m = re.search(r'"lyrics"\s*:\s*"([\s\S]*)', json_str)
lyrics_val = ""
if lyrics_m:
# Take everything after "lyrics": " and strip trailing quotes/braces
lyrics_val = lyrics_m.group(1)
lyrics_val = re.sub(r'"\s*\}\s*$', '', lyrics_val).strip()
metadata = {
"song_title": title_m.group(1) if title_m else "",
"style": style_m.group(1) if style_m else "Pop music, cheerful",
"lyrics": lyrics_val
}
log(f"[Repaired] title={metadata['song_title']}, style={metadata['style'][:60]}")
elif content_str.strip().startswith("{"):
# JSON is incomplete (missing closing brace) — try adding it
log("[Warn] Incomplete JSON, attempting to close...")
try:
metadata = json.loads(content_str + '"}\n}')
except json.JSONDecodeError:
# Manual extraction as last resort
title_m = re.search(r'"song_title"\s*:\s*"([^"]*)"', content_str)
style_m = re.search(r'"style"\s*:\s*"([^"]*)"', content_str)
lyrics_m = re.search(r'"lyrics"\s*:\s*"([\s\S]*)', content_str)
lyrics_val = lyrics_m.group(1).rstrip('"} \n') if lyrics_m else "[Inst]"
metadata = {
"song_title": title_m.group(1) if title_m else "",
"style": style_m.group(1) if style_m else "Pop music, cheerful",
"lyrics": lyrics_val
}
log(f"[Repaired] title={metadata.get('song_title')}")
metadata = json.loads(json_match.group())
else:
raise ValueError(f"No JSON in LLM response: {content_str[:100]}")
style_val = metadata.get("style", "")
lyrics_val = clean_lyrics(metadata.get("lyrics", ""))
metadata["lyrics"] = lyrics_val # Store cleaned version
lyrics_val = metadata.get("lyrics", "")
log(f"[Director] Style: {style_val[:80]}")
log(f"[Director] Lyrics (first 60): {lyrics_val[:60]}")
@@ -264,7 +167,7 @@ def create_music(req: MusicRequest):
"Content-Type": "application/json"
},
json=music_payload,
timeout=300 # 5 min — music generation can be slow
timeout=120
)
music_data = music_resp.json()
@@ -285,9 +188,7 @@ def create_music(req: MusicRequest):
save_dir = os.path.join(os.path.dirname(__file__) or ".", "Capybara music")
os.makedirs(save_dir, exist_ok=True)
# Prefer song_title from LLM; fallback to user input
raw_title = metadata.get("song_title") or req.text
safe_name = re.sub(r'[^\w\u4e00-\u9fff]', '', raw_title)[:20] or "ai_song"
safe_name = re.sub(r'[^\w\u4e00-\u9fff]', '', req.text)[:20] or "ai_song"
filename = f"{safe_name}_{int(time.time())}.mp3"
filepath = os.path.join(save_dir, filename)
@@ -352,230 +253,6 @@ def create_music(req: MusicRequest):
)
# ═══════════════════════════════════════════════════════════════════
# ── Story Generation (Doubao / Volcengine) ──
# ═══════════════════════════════════════════════════════════════════
@app.post("/api/create_story")
def create_story(req: StoryRequest):
"""SSE streaming endpoint generates a children's story via Doubao LLM."""
print(f"[Story] Received request: characters={req.characters}, scenes={req.scenes}, props={req.props}", flush=True)
def event_stream():
def log(msg):
print(msg, flush=True)
# ── Stage 1: Connecting ──
yield sse_event({"stage": "connecting", "progress": 5, "message": "正在连接 AI..."})
# Build user prompt from selected elements
parts = []
if req.characters:
parts.append(f"角色=[{', '.join(req.characters)}]")
if req.scenes:
parts.append(f"场景=[{', '.join(req.scenes)}]")
if req.props:
parts.append(f"道具=[{', '.join(req.props)}]")
user_prompt = ("请用这些素材创作一个故事:" + ",".join(parts)) if parts else "请随机创作一个有趣的儿童故事"
log(f"[Story] User prompt: {user_prompt}")
# ── Stage 2: Generating (streaming) ──
yield sse_event({"stage": "generating", "progress": 10, "message": "故事正在诞生..."})
try:
# Explicitly encode as UTF-8 to avoid Windows GBK encoding issues
payload = json.dumps({
"model": DOUBAO_MODEL,
"messages": [
{"role": "system", "content": STORY_SYSTEM_PROMPT},
{"role": "user", "content": user_prompt},
],
"max_tokens": 2048,
"stream": True,
"thinking": {"type": "disabled"},
}, ensure_ascii=False)
resp = requests.post(
DOUBAO_BASE_URL,
headers={
"Authorization": f"Bearer {VOLCENGINE_API_KEY}",
"Content-Type": "application/json; charset=utf-8",
},
data=payload.encode("utf-8"),
stream=True,
timeout=120,
)
if resp.status_code != 200:
log(f"[Error] Doubao API returned {resp.status_code}: {resp.text[:300]}")
yield sse_event({"stage": "error", "progress": 0, "message": f"AI 服务返回异常 ({resp.status_code})"})
return
# Force UTF-8 decoding (requests defaults to ISO-8859-1 which garbles Chinese)
resp.encoding = "utf-8"
# Parse SSE stream from Doubao
full_content = ""
chunk_count = 0
for line in resp.iter_lines(decode_unicode=True):
if not line or not line.startswith("data: "):
continue
data_str = line[6:] # strip "data: "
if data_str.strip() == "[DONE]":
break
try:
chunk_data = json.loads(data_str)
choices = chunk_data.get("choices", [])
if choices:
delta = choices[0].get("delta", {})
delta_content = delta.get("content", "")
if delta_content:
full_content += delta_content
chunk_count += 1
# Send progress updates every 5 chunks
if chunk_count % 5 == 0:
progress = min(10 + int(chunk_count * 0.8), 85)
yield sse_event({
"stage": "generating",
"progress": progress,
"message": "故事正在诞生...",
})
except json.JSONDecodeError:
continue
log(f"[Story] Stream done. Total chunks: {chunk_count}, content length: {len(full_content)}")
log(f"[Story] Raw output (first 200): {full_content[:200]}")
if not full_content.strip():
yield sse_event({"stage": "error", "progress": 0, "message": "AI 未返回故事内容"})
return
# ── Stage 3: Parse response ──
yield sse_event({"stage": "parsing", "progress": 90, "message": "正在整理故事..."})
# Clean up response — strip markdown fences if present
cleaned = full_content.strip()
if cleaned.startswith("```"):
cleaned = re.sub(r'^```\w*\n?', '', cleaned)
cleaned = re.sub(r'```\s*$', '', cleaned).strip()
# Try to parse JSON
title = ""
content = ""
json_match = re.search(r'\{[\s\S]*\}', cleaned)
if json_match:
try:
story_json = json.loads(json_match.group())
title = story_json.get("title", "")
content = story_json.get("content", "")
except json.JSONDecodeError:
log("[Warn] JSON parse failed, extracting manually...")
title_m = re.search(r'"title"\s*:\s*"([^"]*)"', cleaned)
content_m = re.search(r'"content"\s*:\s*"([\s\S]*)', cleaned)
title = title_m.group(1) if title_m else "卡皮巴拉的故事"
if content_m:
content = content_m.group(1)
content = re.sub(r'"\s*\}\s*$', '', content).strip()
if not title and not content:
# Not JSON at all — treat entire output as story content
title = "卡皮巴拉的故事"
content = cleaned
# Clean content: replace literal \n with real newlines
content = content.replace("\\n", "\n").strip()
# Collapse 3+ newlines into 2
content = re.sub(r'\n{3,}', '\n\n', content)
log(f"[Story] Title: {title}")
log(f"[Story] Content (first 100): {content[:100]}")
# ── Save story to disk ──
save_dir = os.path.join(os.path.dirname(__file__) or ".", "Capybara stories")
os.makedirs(save_dir, exist_ok=True)
safe_name = re.sub(r'[^\w\u4e00-\u9fff]', '', title)[:20] or "story"
filename = f"{safe_name}_{int(time.time())}.txt"
filepath = os.path.join(save_dir, filename)
with open(filepath, "w", encoding="utf-8") as f:
f.write(f"# {title}\n\n{content}")
log(f"[Saved] {filepath}")
# ── Done ──
yield sse_event({
"stage": "done",
"progress": 100,
"message": "故事创作完成!",
"title": title,
"content": content,
})
except requests.exceptions.Timeout:
log("[Error] Doubao API Timeout")
yield sse_event({"stage": "error", "progress": 0, "message": "AI 响应超时,请稍后再试"})
except Exception as e:
log(f"[Error] Story generation exception: {e}")
yield sse_event({"stage": "error", "progress": 0, "message": f"故事生成失败: {str(e)}"})
return StreamingResponse(
event_stream(),
media_type="text/event-stream",
headers={
"Cache-Control": "no-cache",
"X-Accel-Buffering": "no",
"Connection": "keep-alive",
},
)
@app.get("/api/stories")
def get_stories():
"""Scan Capybara stories/ directory and return all saved stories."""
stories_dir = os.path.join(os.path.dirname(__file__) or ".", "Capybara stories")
stories = []
if not os.path.isdir(stories_dir):
return {"stories": []}
for f in sorted(os.listdir(stories_dir), reverse=True): # newest first
if not f.lower().endswith(".txt"):
continue
filepath = os.path.join(stories_dir, f)
try:
with open(filepath, "r", encoding="utf-8") as fh:
raw = fh.read()
# Parse: first line is "# Title", rest is content
lines = raw.strip().split("\n", 2)
title = lines[0].lstrip("# ").strip() if lines else f[:-4]
content = lines[2].strip() if len(lines) > 2 else ""
# Skip garbled files: if title or content has mojibake patterns, skip
# Normal Chinese chars are in range \u4e00-\u9fff; mojibake typically has
# lots of Latin Extended chars like \u00e0-\u00ff mixed with CJK
if title and not any('\u4e00' <= c <= '\u9fff' for c in title):
continue # title has no Chinese chars at all → likely garbled
# Display title: strip timestamp suffix like _1770647563
display_title = re.sub(r'_\d{10,}$', '', f[:-4])
if title:
display_title = title
stories.append({
"title": display_title,
"content": content,
"filename": f,
})
except Exception:
pass
return {"stories": stories}
@app.get("/api/playlist")
def get_playlist():
"""Scan Capybara music/ directory and return full playlist with lyrics."""
@@ -614,245 +291,6 @@ def get_playlist():
return {"playlist": playlist}
# ═══════════════════════════════════════════════════════════════════
# ── TTS: 豆包语音合成 2.0 WebSocket V3 二进制协议 ──
# ═══════════════════════════════════════════════════════════════════
TTS_WS_URL = "wss://openspeech.bytedance.com/api/v1/tts/ws_binary"
TTS_CLUSTER = "volcano_tts"
TTS_SPEAKER = "ICL_zh_female_keainvsheng_tob"
_audio_dir = os.path.join(os.path.dirname(__file__) or ".", "Capybara audio")
os.makedirs(_audio_dir, exist_ok=True)
def _build_tts_v1_request(payload_json: dict) -> bytes:
"""Build a V1 full-client-request binary frame.
Header: 0x11 0x10 0x10 0x00 (v1, 4-byte header, full-client-request, JSON, no compression)
Then 4-byte big-endian payload length, then JSON payload bytes.
"""
payload_bytes = json.dumps(payload_json, ensure_ascii=False).encode("utf-8")
header = bytes([0x11, 0x10, 0x10, 0x00])
length = struct.pack(">I", len(payload_bytes))
return header + length + payload_bytes
def _parse_tts_v1_response(data: bytes):
"""Parse a V1 TTS response binary frame.
Returns (audio_bytes_or_none, is_last, is_error, error_msg).
"""
if len(data) < 4:
return None, False, True, "Frame too short"
byte1 = data[1]
msg_type = (byte1 >> 4) & 0x0F
msg_flags = byte1 & 0x0F
# Error frame: msg_type = 0xF
if msg_type == 0x0F:
offset = 4
error_code = 0
if len(data) >= offset + 4:
error_code = struct.unpack(">I", data[offset:offset + 4])[0]
offset += 4
if len(data) >= offset + 4:
msg_len = struct.unpack(">I", data[offset:offset + 4])[0]
offset += 4
error_msg = data[offset:offset + msg_len].decode("utf-8", errors="replace")
else:
error_msg = f"error code {error_code}"
print(f"[TTS Error] code={error_code}, msg={error_msg}", flush=True)
return None, False, True, error_msg
# Audio-only response: msg_type = 0xB
if msg_type == 0x0B:
# flags: 0b0000=no seq, 0b0001=seq>0, 0b0010/0b0011=last (seq<0)
is_last = (msg_flags & 0x02) != 0 # bit 1 set = last message
offset = 4
# If flags != 0, there's a 4-byte sequence number
if msg_flags != 0:
offset += 4 # skip sequence number
if len(data) < offset + 4:
return None, is_last, False, ""
payload_size = struct.unpack(">I", data[offset:offset + 4])[0]
offset += 4
audio_data = data[offset:offset + payload_size]
return audio_data, is_last, False, ""
# Server response with JSON (msg_type = 0x9): usually contains metadata
if msg_type == 0x09:
offset = 4
if len(data) >= offset + 4:
payload_size = struct.unpack(">I", data[offset:offset + 4])[0]
offset += 4
json_str = data[offset:offset + payload_size].decode("utf-8", errors="replace")
print(f"[TTS] Server JSON: {json_str[:200]}", flush=True)
return None, False, False, ""
return None, False, False, ""
async def tts_synthesize(text: str) -> bytes:
"""Connect to Doubao TTS V1 WebSocket and synthesize text to MP3 bytes."""
headers = {
"Authorization": f"Bearer;{TTS_ACCESS_TOKEN}",
}
payload = {
"app": {
"appid": TTS_APP_ID,
"token": "placeholder",
"cluster": TTS_CLUSTER,
},
"user": {
"uid": "airhub_user",
},
"audio": {
"voice_type": TTS_SPEAKER,
"encoding": "mp3",
"speed_ratio": 1.0,
"rate": 24000,
},
"request": {
"reqid": str(uuid.uuid4()),
"text": text,
"operation": "submit", # streaming mode
},
}
audio_buffer = bytearray()
request_frame = _build_tts_v1_request(payload)
print(f"[TTS] Connecting to V1 WebSocket... text length={len(text)}", flush=True)
async with websockets.connect(
TTS_WS_URL,
extra_headers=headers,
max_size=10 * 1024 * 1024, # 10MB max frame
ping_interval=None,
) as ws:
# Send request
await ws.send(request_frame)
print("[TTS] Request sent, waiting for audio...", flush=True)
# Receive audio chunks
chunk_count = 0
async for message in ws:
if isinstance(message, bytes):
audio_data, is_last, is_error, error_msg = _parse_tts_v1_response(message)
if is_error:
raise RuntimeError(f"TTS error: {error_msg}")
if audio_data and len(audio_data) > 0:
audio_buffer.extend(audio_data)
chunk_count += 1
if is_last:
print(f"[TTS] Last frame received. chunks={chunk_count}, "
f"audio size={len(audio_buffer)} bytes", flush=True)
break
return bytes(audio_buffer)
class TTSRequest(BaseModel):
title: str
content: str
@app.get("/api/tts_check")
def tts_check(title: str = Query(...)):
"""Check if audio already exists for a story title."""
for f in os.listdir(_audio_dir):
if f.lower().endswith(".mp3"):
# Match by title prefix (before timestamp)
name = f[:-4] # strip .mp3
name_without_ts = re.sub(r'_\d{10,}$', '', name)
if name_without_ts == title or name == title:
return {
"exists": True,
"audio_url": f"Capybara audio/{f}",
}
return {"exists": False, "audio_url": None}
@app.post("/api/create_tts")
def create_tts(req: TTSRequest):
"""Generate TTS audio for a story. Returns SSE stream with progress."""
def event_stream():
import asyncio
yield sse_event({"stage": "connecting", "progress": 10,
"message": "正在连接语音合成服务..."})
# Check if audio already exists
for f in os.listdir(_audio_dir):
if f.lower().endswith(".mp3"):
name = f[:-4]
name_without_ts = re.sub(r'_\d{10,}$', '', name)
if name_without_ts == req.title:
yield sse_event({"stage": "done", "progress": 100,
"message": "语音已存在",
"audio_url": f"Capybara audio/{f}"})
return
yield sse_event({"stage": "generating", "progress": 30,
"message": "AI 正在朗读故事..."})
try:
# Run async TTS in a new event loop
loop = asyncio.new_event_loop()
audio_bytes = loop.run_until_complete(tts_synthesize(req.content))
loop.close()
if not audio_bytes or len(audio_bytes) < 100:
yield sse_event({"stage": "error", "progress": 0,
"message": "语音合成返回了空音频"})
return
yield sse_event({"stage": "saving", "progress": 80,
"message": "正在保存音频..."})
# Save audio file
timestamp = int(time.time())
safe_title = re.sub(r'[<>:"/\\|?*]', '', req.title)[:50]
filename = f"{safe_title}_{timestamp}.mp3"
filepath = os.path.join(_audio_dir, filename)
with open(filepath, "wb") as f:
f.write(audio_bytes)
print(f"[TTS Saved] {filepath} ({len(audio_bytes)} bytes)", flush=True)
yield sse_event({"stage": "done", "progress": 100,
"message": "语音生成完成!",
"audio_url": f"Capybara audio/{filename}"})
except Exception as e:
print(f"[TTS Error] {e}", flush=True)
yield sse_event({"stage": "error", "progress": 0,
"message": f"语音合成失败: {str(e)}"})
return StreamingResponse(event_stream(), media_type="text/event-stream")
# ── Static file serving ──
from fastapi.staticfiles import StaticFiles
# Music directory
_music_dir = os.path.join(os.path.dirname(__file__) or ".", "Capybara music")
os.makedirs(_music_dir, exist_ok=True)
app.mount("/Capybara music", StaticFiles(directory=_music_dir), name="music_files")
# Audio directory (TTS generated)
app.mount("/Capybara audio", StaticFiles(directory=_audio_dir), name="audio_files")
if __name__ == "__main__":
print("[Server] Music Server running on http://localhost:3000")
uvicorn.run(app, host="0.0.0.0", port=3000)
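The V1 binary framing used by `tts_synthesize` can be exercised offline. A self-contained round-trip sketch of the request side only, following the frame layout documented in `_build_tts_v1_request` (`build_v1_frame` and `parse_v1_frame` are illustrative names; the real response parsing in `_parse_tts_v1_response` additionally handles sequence numbers and error frames):

```python
import json
import struct

def build_v1_frame(payload: dict) -> bytes:
    """Full-client-request frame: header 0x11 0x10 0x10 0x00
    (protocol v1, 4-byte header, full-client-request, JSON payload,
    no compression), then a 4-byte big-endian length, then the JSON."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return bytes([0x11, 0x10, 0x10, 0x00]) + struct.pack(">I", len(body)) + body

def parse_v1_frame(frame: bytes) -> dict:
    """Inverse of build_v1_frame: skip the 4-byte header, read the
    big-endian length prefix, decode the JSON payload."""
    size = struct.unpack(">I", frame[4:8])[0]
    return json.loads(frame[8:8 + size].decode("utf-8"))

req = {"request": {"text": "你好,卡皮巴拉", "operation": "submit"}}
assert parse_v1_frame(build_v1_frame(req)) == req
```

A round trip like this is a cheap sanity check before involving the WebSocket: if the length prefix or header bytes are wrong, the server rejects the frame with an opaque error.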

View File

@@ -3,7 +3,7 @@
> **用途**:每次对话结束前 / 做完一个阶段后更新此文件。
> 新对话开始时AI 先读此文件恢复上下文。
>
> **最后更新**2026-02-10 (第九次对话)
> **最后更新**2026-02-08 (第七次对话)
---
@@ -105,97 +105,8 @@
#### 选中态交互修复
- 生成完成后自动清除 `_selectedMoodIndex`,不再残留选中状态
### 第八次对话完成的工作2026-02-09
#### 音乐生成 API 全链路接入(第七次对话中完成,此处补记)
- **SSE 实时进度**:点击心情卡片 → 发 POST 到后端 → SSE 流式推送 lyrics/music/saving/done/error 各阶段
- **MusicGenerationService 单例**:生成任务在后台运行,页面切走不中断
- **切回页面恢复**:切回音乐页弹窗通知生成结果,不在其他页面自动播放
- **进度环动画**:环形光晕进度条(匹配 HTML 版视觉)+ 翻面歌词时仍可见
- **进度条防闪烁**:用 `_crawlId` 取消令牌确保同一时间只有一条爬升动画
- **超时友好提示**:气泡显示"网络开小差了,再试一次~"
- **歌词清洗**:前后端双重清理(去 `\n`、去 `[verse]` 等结构标签、去 JSON 引号)
- **歌名修复**:后端优先取 LLM 返回的 `song_title`,歌词增长至 16-24 行
- **对话气泡样式**:文字垂直居中 + 三角尾巴
- **通知页展开 bug 修复**`AnimatedCrossFade``ClipRect + AnimatedSize` 避免文字竖排
#### 启动时加载历史歌曲
- **后端**`/api/playlist` 接口扫描 `Capybara music/` 目录,返回所有 mp3 + 歌词
- **Service 层**`MusicGenerationService.fetchPlaylist()` 拉取列表
- **前端**`initState` 异步调用,将服务器歌曲插入唱片架最前面(去重,不重复加载硬编码的 4 首)
- **当前服务器上有 19 首 AI 生成歌曲 + 4 首原始歌曲,重编译后唱片架不再丢失**
#### 故事生成接入豆包 API
- **后端**`server.py` 新增 `/api/create_story` 接口SSE 流式调用豆包 Chat API
- **模型**Doubao-Seed-1.6-lite`doubao-seed-1-6-lite-250515`),关闭深度思考加快响应
- **Prompt**`prompts/story_director.md`儿童故事创作大师400-600 字JSON 输出
- **前端串联**
- `StoryGeneratorModal` 返回选中素材 Map角色/场景/道具分类)
- `DeviceControlPage` 把素材传给 `StoryLoadingPage`
- `StoryLoadingPage` 调真实 APISSE 实时进度(连接→生成→解析→完成)
- `StoryDetailPage` 无需改动,已支持接收动态故事数据
- **故事保存**:生成的故事文本保存到 `Capybara stories/` 目录
- **错误处理**超时提示、API 异常、空内容兜底,错误时显示"返回重试"按钮
#### 唱片架高度优化
- 限制唱片架弹窗高度为 3.5 行,超出部分滚动查看
- 半行露出作为"还有更多"的视觉提示
#### Windows 编码问题全面修复
- `sys.stdout` / `sys.stderr` 强制 UTF-8Windows 默认 GBK 导致中文乱码)
- Doubao API 请求体手动 `json.dumps + encode("utf-8")`,避免 `requests` 用 GBK 编码
- SSE 流使用 `ensure_ascii=True`,确保前端 `jsonDecode` 100% 正常
- `resp.encoding = "utf-8"` 强制豆包返回流按 UTF-8 解码
- 已清理 4 个编码出错时保存的乱码故事文件
#### 书架加载历史故事
- **后端**`/api/stories` 接口扫描 `Capybara stories/` 返回所有故事标题+内容
- **前端**`initState` 异步拉取,历史故事排在预设故事之后
- **保存联动**:新生成的故事保存后,真实标题+内容即时加入书架(不再用 mock 数据)
- **封面区分**预设故事显示封面图AI 生成的故事显示淡紫渐变"暂无封面"占位
- **乱码过滤**API 层自动跳过无中文标题的异常文件
### 第九次对话完成的工作2026-02-10
#### TTS 语音合成全链路接入(上次对话完成,此处补记)
- **后端**`server.py` 新增 `/api/create_tts` 接口WebSocket 流式调用豆包 TTS V1 API
- **音色**:可爱女生(`ICL_zh_female_keainvsheng_tob`)
- **前端组件**`PillProgressButton`(药丸形进度按钮)替代旧 RingProgressButton
- 7 种状态idle / ready / generating / completed / playing / paused / error
- 进度环动画 + 音波动效 + 发光效果
- **TTSService 单例**:后台持续运行,切页面不中断生成
- **音频保存**:生成的 TTS 音频保存到 `Capybara audio/` 目录
- **暂停/续播修复**:显式 seek 到暂停位置再 play解决 Web 端从头播放的 bug
- **按钮状态修复**:新增 `ready` 状态,未播放过的音频显示"播放"而非"继续"
- **自动播放控制**:仅在用户停留在故事页时自动播放,切出页面不自动播
#### 音乐总监 Prompt 优化
- **歌名去重复**:移除固定示例("温泉咔咔乐"等),改为"根据场景自由发挥,不要套用固定模板"
- **效果**AI 每次为相似场景生成不同歌名,唱片架不再出现一堆同名歌曲
#### 唱片架播放状态可视化
- **卡片高亮**:当前播放的歌曲整张卡片变暖金色底 + 金色边框 + 阴影
- **标题标识**:播放中的歌曲标题前加小喇叭图标 + 金色加粗文字
- **音波动效**:播放中的唱片中心叠加跳动音波 CustomPaint 动画
#### 气泡持续显示当前歌名
- 播放期间气泡始终显示"正在播放: xxx",不再 3 秒后消失
- 直接点播放按钮(非从唱片架选歌)也会显示歌名
- 暂停时气泡自动隐藏,切歌时自动更新
- 使用 `_playStickyText` 机制,即使其他临时消息弹出后也会恢复播放信息
#### 调研 AI 音乐生成平台
- 对比了 MiniMax Music 2.5(现用)、Mureka(昆仑万维)、天谱乐、ACE-Step
- 发现 Mureka 有中国站 APIplatform.mureka.cn质量评测超越 Suno V4
- 用户的朋友用的 Muse AI App 底层就是 Mureka 模型
- MiniMax 文本模型abab6.5s-chat价格偏高可考虑切豆包
- 歌词生成费用极低(每次约 0.005 元),主要成本在音乐生成(1 元/首)
### 正在做的 / 待办
- 故事封面方案待定(付费生成 or 免费生成)
- 考虑将音乐生成从 MiniMax 切换到 Mureka用户在评估中
- 考虑将歌词生成的 LLM 从 MiniMax abab6.5s-chat 切到豆包(更便宜)
- 长歌名 fallback 问题LLM 返回空 song_title 时用了用户输入原文当歌名,后续可优化
### 正在做的
- 下一步待定
---
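The SSE progress protocol this log keeps referring to (connecting → generating → parsing/saving → done/error, each event a `data:` line carrying one JSON object from `sse_event()`) takes only a few lines to consume. A minimal client-side sketch, independent of the Flutter implementation (the helper name `parse_sse_progress` is illustrative):

```python
import json

def parse_sse_progress(stream_text: str) -> list[dict]:
    """Decode the JSON payload of every `data:` line in an SSE stream,
    matching the `data: {...}\n\n` format emitted by server.py."""
    events = []
    for line in stream_text.splitlines():
        if line.startswith("data: "):
            events.append(json.loads(line[len("data: "):]))
    return events

raw = (
    'data: {"stage": "connecting", "progress": 5}\n\n'
    'data: {"stage": "generating", "progress": 50}\n\n'
    'data: {"stage": "done", "progress": 100}\n\n'
)
assert [e["stage"] for e in parse_sse_progress(raw)] == ["connecting", "generating", "done"]
```

In the real app the stream arrives incrementally, so the client buffers partial lines; the decoding step per complete `data:` line is the same.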