John Qiu 712063071c refactor: split general-purpose skills into separate directories by category
skills/ → skills-dev(9), skills-req(10), skills-ops(4),
skills-integration(8), skills-biz(4), skills-workflow(7)

generate-marketplace.py now scans all skills-* directories automatically.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-14 11:31:58 +10:30

---
name: doubao-voice
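description: Doubao Voice API calls. Supports speech synthesis (TTS) and singing. Activates automatically when the user mentions speech synthesis, text-to-speech, singing, or other Doubao Voice tasks.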
---
# Doubao Voice API Skill
Calls the Volcengine Doubao Voice API to provide text-to-speech (TTS) and singing.
## Core Features ⭐
### 1. Text-to-Speech (TTS)
```bash
# 1. Set the environment variables
export DOUBAO_APP_ID="your_app_id"
export DOUBAO_ACCESS_TOKEN="your_access_token"
# 2. Convert text to speech (the sample text means "Hello, world")
python scripts/voice_converter.py tts "你好世界"
```
### 2. Singing 🎵
```bash
# Ask Doubao to sing (the prompt means "Please sing a song about spring")
python scripts/singing.py sing "请唱一首关于春天的歌"
# Interactive singing mode
python scripts/singing.py interactive
```
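## Feature Overview
| Module | Capabilities | Recommended model |
|--------|--------------|-------------------|
| **Speech synthesis (TTS)** | Text-to-speech with multiple voices | Doubao Speech Synthesis Model 2.0 |
| **Singing** | Real-time voice interaction, singing, role play | Doubao end-to-end real-time speech model |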
---
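## Environment Setup
### 1. Get Volcengine Doubao Voice credentials
1. Open the [Volcengine console](https://console.volcengine.com/)
2. Enable the "Doubao Voice" service
3. Create an application to obtain its `App ID` and `Access Token`
4. Enable the services you need:
   - "Speech synthesis" permission: large-model speech synthesis
### 2. Environment variables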
```bash
# ~/.zshrc or ~/.bashrc
export DOUBAO_APP_ID="your_app_id"
export DOUBAO_ACCESS_TOKEN="your_access_token"
export DOUBAO_CLUSTER="volcano_tts" # TTS service cluster
```
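Before calling the API, it can help to confirm that the variables above are actually set. The following is a minimal sketch; `load_doubao_config` is a hypothetical helper name and is not part of the skill's scripts. It assumes the same `volcano_tts` default cluster used elsewhere in this document.

```python
import os


def load_doubao_config(env=None):
    """Read the Doubao credentials from the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    required = ("DOUBAO_APP_ID", "DOUBAO_ACCESS_TOKEN")
    # Treat unset and empty values the same way.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "app_id": env["DOUBAO_APP_ID"],
        "access_token": env["DOUBAO_ACCESS_TOKEN"],
        # Same default cluster the examples in this document use.
        "cluster": env.get("DOUBAO_CLUSTER", "volcano_tts"),
    }
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.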
### 3. Python dependencies
```bash
# uv is recommended
uv pip install requests websocket-client
# Or use pip
pip install requests websocket-client
```
---
## API Basics
### Base URL
```
TTS API: https://openspeech.bytedance.com/api/v1/tts
```
### Authentication
Authenticate with the Access Token by adding the following request header:
```
Authorization: Bearer {access_token}
```
---
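## 1. Speech Synthesis (TTS)
### 1.1 Basic speech synthesis
Converts text into a speech audio file.
**Natural-language examples**:
- "Convert this text to speech"
- "Synthesize speech with Doubao"
- "Generate speech: Hello, welcome to Doubao Voice"
**Python implementation**: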
```python
import base64
import os

import requests


def text_to_speech(text: str, voice_type: str = "BV700_V2_streaming", output_file: str = "output.mp3") -> str:
    """
    Convert text to speech.

    Args:
        text: Text to synthesize
        voice_type: Voice code (default: BV700_V2_streaming)
        output_file: Path of the output audio file

    Returns:
        Path of the audio file
    """
    app_id = os.environ.get("DOUBAO_APP_ID")
    access_token = os.environ.get("DOUBAO_ACCESS_TOKEN")
    cluster = os.environ.get("DOUBAO_CLUSTER", "volcano_tts")

    url = "https://openspeech.bytedance.com/api/v1/tts"
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json"
    }
    payload = {
        "app": {
            "appid": app_id,
            "token": access_token,
            "cluster": cluster
        },
        "user": {
            "uid": "user123"
        },
        "audio": {
            "voice_type": voice_type,
            "encoding": "mp3",
            "speed_ratio": 1.0,
            "volume_ratio": 1.0,
            "pitch_ratio": 1.0
        },
        "request": {
            # Random request ID so each call is distinguishable
            "reqid": "req_" + os.urandom(8).hex(),
            "text": text,
            "text_type": "plain",
            "operation": "query"
        }
    }

    response = requests.post(url, headers=headers, json=payload)
    data = response.json()

    if data.get("code") == 3000:
        # Decode the base64-encoded audio data
        audio_data = base64.b64decode(data["data"])
        with open(output_file, "wb") as f:
            f.write(audio_data)
        return output_file
    else:
        raise Exception(f"TTS failed: {data}")


# Usage example (the text means "Hello, I am the Doubao voice assistant")
audio_file = text_to_speech("你好,我是豆包语音助手")
print(f"Speech generated: {audio_file}")
```
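The response handling in `text_to_speech` can be pulled into its own function so it can be checked without a network call. This is a minimal sketch that assumes the response shape used above (a success `code` of 3000 and base64-encoded audio in `data`); `decode_tts_response` is a hypothetical helper, not part of the API.

```python
import base64


def decode_tts_response(data: dict) -> bytes:
    """Return raw audio bytes from a parsed TTS response (hypothetical helper)."""
    # 3000 is the success code that text_to_speech checks for.
    if data.get("code") != 3000:
        raise RuntimeError(f"TTS failed: {data}")
    audio_b64 = data.get("data")
    if not audio_b64:
        raise RuntimeError("TTS response contained no audio data")
    return base64.b64decode(audio_b64)


# Simulated response for illustration only; no request is sent.
fake_response = {
    "code": 3000,
    "data": base64.b64encode(b"fake-mp3-bytes").decode("ascii"),
}
audio = decode_tts_response(fake_response)
print(f"Decoded {len(audio)} bytes")
```

Raising on an empty `data` field avoids silently writing a zero-byte audio file.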
### 1.2 Streaming speech synthesis
Suited to long text: audio is played back while it is still being generated.
```python
import base64
import json
import os

import websocket


def stream_tts(text: str, voice_type: str = "BV700_V2_streaming"):
    """
    Streaming speech synthesis.

    Args:
        text: Text to synthesize
        voice_type: Voice code
    """
    app_id = os.environ.get("DOUBAO_APP_ID")
    access_token = os.environ.get("DOUBAO_ACCESS_TOKEN")
    ws_url = f"wss://openspeech.bytedance.com/api/v1/tts/ws?appid={app_id}&token={access_token}"

    def on_message(ws, message):
        data = json.loads(message)
        if "audio" in data:
            # Decode the audio chunk
            audio_chunk = base64.b64decode(data["audio"])
            # Play or save the chunk here
            print(f"Received audio chunk: {len(audio_chunk)} bytes")

    def on_open(ws):
        payload = {
            "app": {
                "appid": app_id,
                "token": access_token,
                "cluster": "volcano_tts"
            },
            "user": {
                "uid": "user123"
            },
            "audio": {
                "voice_type": voice_type,
                "encoding": "mp3"
            },
            "request": {
                "reqid": "stream_" + os.urandom(8).hex(),
                "text": text,
                "text_type": "plain",
                "operation": "submit"
            }
        }
        ws.send(json.dumps(payload))

    ws = websocket.WebSocketApp(
        ws_url,
        on_message=on_message,
        on_open=on_open
    )
    ws.run_forever()


# Usage example (a long sample sentence about streaming synthesis)
stream_tts("这是一段很长的文本,使用流式合成可以边生成边播放...")
```
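The `on_message` callback above only prints the size of each chunk. One way to keep the audio is to accumulate the chunks and write them out once the stream ends. The sketch below assumes, as the example above does, that each message is JSON text with a base64 `audio` field; the real service may frame messages differently. `AudioCollector` is a hypothetical helper name.

```python
import base64
import json


class AudioCollector:
    """Accumulate streamed audio chunks in memory (hypothetical helper)."""

    def __init__(self):
        self.buffer = bytearray()

    def on_message(self, ws, message):
        # Same (ws, message) signature that WebSocketApp passes to on_message.
        data = json.loads(message)
        if "audio" in data:
            self.buffer.extend(base64.b64decode(data["audio"]))

    def save(self, path: str) -> int:
        """Write the collected audio to `path` and return the byte count."""
        with open(path, "wb") as f:
            f.write(self.buffer)
        return len(self.buffer)
```

To use it, create a collector, pass `collector.on_message` as the `on_message` argument of `websocket.WebSocketApp`, and call `collector.save("output.mp3")` after `run_forever()` returns.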
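### 1.3 Choosing a voice
Doubao Voice offers a range of voices:
| Voice code | Description | Use case |
|---------|------|------|
| BV700_V2_streaming | General-purpose female voice | General use |
| BV701_V2_streaming | General-purpose male voice | General use |
| BV406_streaming | Gentle female voice | Customer service, assistants |
| BV158_streaming | Lively female voice | Education, entertainment |
| BV115_streaming | Deep, resonant male voice | News, broadcasting |
**List the available voices**: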
```bash
TOKEN="${DOUBAO_ACCESS_TOKEN}"
APP_ID="${DOUBAO_APP_ID}"
curl -s "https://openspeech.bytedance.com/api/v1/tts/voices?appid=$APP_ID" \
  -H "Authorization: Bearer $TOKEN"
```
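The voice table in this section can be turned into a small lookup so callers pick a voice by use case rather than by code. This is only a sketch: the voice codes are copied from the table, while the scenario keys and the `pick_voice` helper are illustrative names chosen here.

```python
# Voice codes copied from the table in this section; the keys are illustrative.
VOICE_BY_SCENARIO = {
    "general_female": "BV700_V2_streaming",
    "general_male": "BV701_V2_streaming",
    "customer_service": "BV406_streaming",
    "education": "BV158_streaming",
    "news": "BV115_streaming",
}

# The default voice used by the examples in this document.
DEFAULT_VOICE = "BV700_V2_streaming"


def pick_voice(scenario: str) -> str:
    """Return the voice code for a scenario, falling back to the default voice."""
    return VOICE_BY_SCENARIO.get(scenario.strip().lower(), DEFAULT_VOICE)


print(pick_voice("news"))
```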
---
## Complete Utility Class
```python
import base64
import os
from typing import Optional

import requests


class DoubaoVoice:
    """Utility class for the Doubao Voice API."""

    BASE_URL = "https://openspeech.bytedance.com/api/v1"

    def __init__(self, app_id: Optional[str] = None, access_token: Optional[str] = None):
        # Explicit arguments take precedence over environment variables
        self.app_id = app_id or os.environ.get("DOUBAO_APP_ID")
        self.access_token = access_token or os.environ.get("DOUBAO_ACCESS_TOKEN")
        self.cluster_tts = os.environ.get("DOUBAO_CLUSTER", "volcano_tts")

    @property
    def headers(self):
        return {
            "Authorization": f"Bearer {self.access_token}",
            "Content-Type": "application/json"
        }

    def text_to_speech(
        self,
        text: str,
        voice_type: str = "BV700_V2_streaming",
        output_file: str = "output.mp3"
    ) -> str:
        """Convert text to speech and return the output file path."""
        url = f"{self.BASE_URL}/tts"
        payload = {
            "app": {
                "appid": self.app_id,
                "token": self.access_token,
                "cluster": self.cluster_tts
            },
            "user": {"uid": "user123"},
            "audio": {
                "voice_type": voice_type,
                "encoding": "mp3",
                "speed_ratio": 1.0,
                "volume_ratio": 1.0,
                "pitch_ratio": 1.0
            },
            "request": {
                "reqid": "req_" + os.urandom(8).hex(),
                "text": text,
                "text_type": "plain",
                "operation": "query"
            }
        }

        response = requests.post(url, headers=self.headers, json=payload)
        data = response.json()

        if data.get("code") == 3000:
            # Decode the base64-encoded audio data
            audio_data = base64.b64decode(data["data"])
            with open(output_file, "wb") as f:
                f.write(audio_data)
            return output_file
        else:
            raise Exception(f"TTS failed: {data}")

    def list_voices(self) -> list:
        """Return the list of available voices."""
        url = f"{self.BASE_URL}/tts/voices"
        params = {"appid": self.app_id}
        response = requests.get(url, headers=self.headers, params=params)
        data = response.json()

        if data.get("code") == 0:
            return data["voices"]
        else:
            raise Exception(f"Failed to list voices: {data}")


# ==================== Usage examples ====================
if __name__ == "__main__":
    voice = DoubaoVoice()

    # Example 1: text-to-speech
    audio_file = voice.text_to_speech("你好,我是豆包语音助手")
    print(f"Speech generated: {audio_file}")

    # Example 2: list the available voices
    voices = voice.list_voices()
    for v in voices[:5]:
        print(f"{v['voice_type']}: {v['description']}")
```
---
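## 2. Singing (Doubao end-to-end real-time speech model)
### 2.1 Basic singing
Ask Doubao to sing a song on any theme.
**Natural-language examples**:
- "Please sing a song about spring"
- "Sing a gentle lullaby"
- "Sing a cheerful children's song"
**Python implementation**: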
```python
import asyncio

from scripts.singing import DoubaoSinging


async def main():
    singing = DoubaoSinging()
    # Ask Doubao to sing (the prompt means "Please sing a song about spring")
    audio_file = await singing.sing(
        "请唱一首关于春天的歌",
        output_file="spring_song.mp3",
        language="zh-CN"
    )
    print(f"Singing finished: {audio_file}")


asyncio.run(main())
```
### 2.2 Interactive singing
Hold a real-time conversation with Doubao and ask it to sing, tell stories, and so on.
**Python implementation**:
```python
import asyncio

from scripts.singing import DoubaoSinging


async def main():
    singing = DoubaoSinging()
    # Start interactive mode
    await singing.interactive_singing(language="zh-CN")


asyncio.run(main())
```
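**Sample session**:
```
You: Please sing a love song
Doubao: [generates audio] I'll sing you a gentle love song...
You: Can you add some dialect?
Doubao: [sings again in dialect]
You: quit
Goodbye!
```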
---
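## Natural-Language Operation Examples
### TTS operations
| User says | Action |
|--------|----------|
| "Convert this to speech: Hello, world" | Call the TTS API to generate speech |
| "Synthesize speech with the gentle female voice" | Use the BV406_streaming voice |
| "Generate a news clip in a broadcaster's voice" | Use the deep, resonant male voice |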
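### Singing operations
| User says | Action |
|--------|----------|
| "Please sing a song about spring" | Call the end-to-end real-time speech model to generate singing audio |
| "Sing a lullaby" | Generate a gentle lullaby |
| "Tell a story while singing" | Sing and tell a story within an interactive conversation |
| "Start interactive singing mode" | Start real-time voice interaction |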
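
The two tables above amount to a routing decision between the TTS path and the singing path. In practice the skill is activated by the assistant's own language understanding, so the following keyword heuristic is only an illustrative sketch; `route_request` and its keyword list are assumptions made for this example.

```python
import re

# English singing words are matched on word boundaries so that "using" does not match "sing".
_SINGING_WORDS = re.compile(r"\b(sing|sings|singing|song|songs|lullaby)\b", re.IGNORECASE)


def route_request(user_text: str) -> str:
    """Return "singing" or "tts" for a user request (illustrative heuristic)."""
    # The character 唱 ("sing") covers the Chinese prompts used elsewhere in this document.
    if _SINGING_WORDS.search(user_text) or "唱" in user_text:
        return "singing"
    return "tts"


print(route_request("Please sing a song about spring"))
print(route_request("Convert this to speech: Hello, world"))
```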
---
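## Billing
### TTS billing
- **Concurrency plan**: CNY 2,000 per concurrent connection per month (billed purely on concurrency, with no per-character fee)
- **Pay-as-you-go**: billed by the number of characters synthesized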
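
Which plan is cheaper depends on monthly volume. The sketch below compares the two; the pay-as-you-go unit price is not stated in this document, so it is a required parameter here and must be taken from the console or the billing page. `cheaper_plan` is a hypothetical helper.

```python
# Monthly price per concurrent connection, from the list above.
CONCURRENCY_PLAN_CNY_PER_MONTH = 2000


def cheaper_plan(chars_per_month: int, price_per_10k_chars_cny: float, concurrency: int = 1) -> str:
    """Return the cheaper plan name for a given monthly character volume.

    price_per_10k_chars_cny is an assumption supplied by the caller;
    this document does not state the pay-as-you-go rate.
    """
    pay_as_you_go_cost = chars_per_month / 10_000 * price_per_10k_chars_cny
    concurrency_cost = CONCURRENCY_PLAN_CNY_PER_MONTH * concurrency
    return "pay-as-you-go" if pay_as_you_go_cost <= concurrency_cost else "concurrency"
```

With a unit price of `p` CNY per 10,000 characters, the break-even volume for one concurrent connection is `2000 / p * 10_000` characters per month.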
### Free trial
New users receive a free quota after enabling the service; the exact amount is shown in the console.

---
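## Notes
1. **Audio formats**: TTS supports mp3, wav, and pcm
2. **Text length**: a single TTS request accepts at most 5,000 characters
3. **Concurrency limits**: stay within the API's rate and concurrency limits
4. **Token security**: keep the Access Token in an environment variable; never hard-code it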
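
Text longer than the 5,000-character limit in note 2 has to be split before synthesis. The following is a minimal sketch of one way to do that, preferring to cut at sentence endings; `split_text` and its punctuation set are assumptions made for this example, and the limit is counted in Python characters, which may not match how the service counts.

```python
from typing import List

MAX_TTS_CHARS = 5000  # per-request limit stated in the notes above

# Chinese and ASCII sentence endings, plus newline.
SENTENCE_ENDINGS = "。!?!?.\n"


def split_text(text: str, limit: int = MAX_TTS_CHARS) -> List[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    if limit < 1:
        raise ValueError("limit must be positive")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + limit, len(text))
        if end < len(text):
            # Find the last sentence ending inside the current window.
            cut = max(text.rfind(ch, start, end) for ch in SENTENCE_ENDINGS)
            if cut >= start:
                end = cut + 1
            # Otherwise fall back to a hard cut at the limit.
        chunks.append(text[start:end])
        start = end
    return chunks


# Each chunk can then be passed to text_to_speech in turn.
for chunk in split_text("第一句。第二句。第三句。", limit=5):
    print(chunk)
```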
---
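## Error Handling
```python
def safe_tts(text: str):
    """TTS with basic error handling."""
    try:
        voice = DoubaoVoice()
        return voice.text_to_speech(text)
    except Exception as e:
        # These checks match on the exception text, so they only fire
        # when the error message itself contains the status code.
        if "401" in str(e):
            print("Authentication failed; check the Access Token")
        elif "429" in str(e):
            print("Too many requests; retry later")
        else:
            print(f"Synthesis failed: {e}")
        return None
```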
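For transient failures such as rate limiting, retrying with a growing delay is a common complement to the handler above. This is a minimal sketch of exponential backoff; `with_retries` is a hypothetical helper, and it retries on every exception, which is blunter than a production client would want.

```python
import time


def with_retries(func, attempts: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Call func(); after a failure wait base_delay * 2**n seconds and try again."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            # Re-raise once the final attempt has failed.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Example use with the class defined earlier in this document:
#   audio_file = with_retries(lambda: DoubaoVoice().text_to_speech("你好"))
```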
---
## Common Scenarios
### Scenario 1: Generating speech in multiple languages
```python
voice = DoubaoVoice()
# Chinese
voice.text_to_speech("你好", voice_type="BV700_V2_streaming", output_file="zh.mp3")
# English
voice.text_to_speech("Hello", voice_type="EN_001", output_file="en.mp3")
```
---
## References
- [Volcengine Doubao Voice documentation](https://www.volcengine.com/docs/6561/1359369)
- [Doubao Voice console](https://console.volcengine.com/speech/app)
- [API reference](https://www.volcengine.com/docs/6561/1359370)
- [Billing](https://www.volcengine.com/docs/6561/1359370)