refactor: merge claude-marketplace and restructure the directories into a single repository

- Rename plugins/ → skills/; move personal plugins to skills-personal/ (gitignored)
- Update generate-marketplace.py to read the config and scan skills-personal
- Add claude-config.yaml (skill enable/disable + MCP configuration)
- Add init.sh (interactive MCP initialization, supports stdio/SSE modes)
- Add CLAUDE.md project description
- Rewrite README.md to reflect the new structure
- Remove obsolete scripts: PUSH.sh, generate-marketplace.sh, convert-skills.sh

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
14
skills/doubao-voice-plugin/.claude-plugin/plugin.json
Normal file
@@ -0,0 +1,14 @@
{
  "name": "doubao-voice-plugin",
  "description": "Doubao (豆包) Voice API integration for TTS and ASR",
  "version": "1.0.0",
  "author": {
    "name": "qiudl"
  },
  "skills": [
    {
      "name": "doubao-voice",
      "path": "./skills/SKILL.md"
    }
  ]
}
54
skills/doubao-voice-plugin/.gitignore
vendored
Normal file
@@ -0,0 +1,54 @@
# Audio files (generated test output)
*.mp3
*.wav
*.pcm

# Test scripts (local use only)
scripts/test_*.py
scripts/check_credentials.py
scripts/README_TEST.md

# System files
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# IDE
.vscode/
.idea/
*.swp
*.swo

# Environment configuration (local files containing credentials)
setup_env.local.sh
.env
.env.local

# Files generated by tests
*.log
test_output/
201
skills/doubao-voice-plugin/DEPLOY.md
Normal file
@@ -0,0 +1,201 @@
# Deployment Guide

## Using This Skill on Another Computer

### ✅ Does it work out of the box?

**Most features work out of the box!** A little configuration is still required.

---

## 📋 Deployment Steps

### 1️⃣ Copy the plugin to the new computer

```bash
# Option 1: clone from Git
git clone <repo-url> doubao-voice-plugin

# Option 2: copy the folder
cp -r doubao-voice-plugin /path/to/new/location
```

### 2️⃣ Install dependencies

**Core dependency** (required):
```bash
pip3 install requests
```

`scripts/singing.py` also imports `websockets`; install it with `pip3 install websockets` if you plan to use the singing feature.

**Optional dependency** (only needed for voice_converter_sdk.py):
```bash
pip3 install volcengine
```

**Verify the installation**:
```bash
python3 -c "import requests; print('✅ requests is installed')"
```
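
The one-liner above only checks `requests`. A slightly broader sketch can report every module this plugin mentions without importing any of them. The module lists are assumptions drawn from this guide (`requests`, `volcengine`) and from the `import websockets` line in `scripts/singing.py`; the plugin ships no requirements file that confirms them.

```python
import importlib.util

# Assumed module lists: requests per this guide; volcengine is optional
# (voice_converter_sdk.py); websockets is only needed by scripts/singing.py.
REQUIRED = ["requests"]
OPTIONAL = ["volcengine", "websockets"]


def module_status(names):
    """Map each module name to True if it can be found, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def report():
    """Print one line per module and return True if every required module is present."""
    status = module_status(REQUIRED + OPTIONAL)
    for name, found in status.items():
        if found:
            mark = "✅"
        elif name in REQUIRED:
            mark = "❌"
        else:
            mark = "⚠️ (optional)"
        print(f"{mark} {name}")
    return all(status[name] for name in REQUIRED)


if __name__ == "__main__":
    report()
```

`find_spec` is used instead of a plain `import` so that a broken optional package cannot crash the check itself.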

### 3️⃣ Configure credentials

Create a local configuration file:
```bash
cd scripts
cp setup_env.local.sh.example setup_env.local.sh
```

Edit `setup_env.local.sh` and fill in your Volcengine credentials:
```bash
export DOUBAO_APP_ID="your_app_id"
export DOUBAO_ACCESS_TOKEN="your_access_token"
```
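
After sourcing the file, you may want to confirm from Python that both variables are set and no longer hold template placeholders. This is a minimal sketch, not the repository's `check_credentials.py` (whose contents are not shown here); the placeholder strings are copied from `setup_env.sh` and `setup_env.local.sh.example`.

```python
import os

# Variable names used by the plugin's scripts and docs
REQUIRED_VARS = ("DOUBAO_APP_ID", "DOUBAO_ACCESS_TOKEN")

# Placeholder values shipped in setup_env.sh and setup_env.local.sh.example
PLACEHOLDERS = {"your_app_id", "your_access_token",
                "your_app_id_here", "your_access_token_here"}


def credential_problems(env=None):
    """Return a list of problems; an empty list means both credentials look set."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name, "").strip()
        if not value:
            problems.append(f"{name} is not set")
        elif value in PLACEHOLDERS:
            problems.append(f"{name} still holds the placeholder {value!r}")
    return problems


if __name__ == "__main__":
    issues = credential_problems()
    for issue in issues:
        print("❌", issue)
    if not issues:
        print("✅ Credentials look set")
```

Passing a dict as `env` makes the check easy to test without touching the real environment.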

### 4️⃣ Use it

Run the following from the plugin root directory:

```bash
# Load the environment variables
source scripts/setup_env.local.sh

# Text to speech
python3 scripts/voice_converter.py tts "你好世界" -o hello.mp3

# Speech to text (enable the ASR service first)
python3 scripts/voice_converter.py asr audio.mp3
```

---

## 🔧 System Requirements

| Requirement | Version | Status |
|------|------|------|
| **Python** | 3.6+ (3.9+ for singing.py) | ✅ Required |
| **requests** | Any | ✅ Required |
| **websockets** | 14+ | ⚠️ Only for singing.py |
| **volcengine** | Any | ⚠️ Optional |
| **Operating system** | Linux/Mac/Windows | ✅ All supported |

---

## 🚨 FAQ

### Q: Error "ModuleNotFoundError: No module named 'requests'"
**Fix**:
```bash
pip3 install requests
```

### Q: Error "DOUBAO_APP_ID not found"
**Fix**:
```bash
# Check the environment variable
echo $DOUBAO_APP_ID

# If it is empty, reload the configuration (from the plugin root)
source scripts/setup_env.local.sh
```

### Q: Why doesn't ASR work?
**Cause**: The ASR service has to be enabled in the Volcengine console.
**Fix**: Go to https://console.volcengine.com/speech/service and enable the speech recognition service.

### Q: Does it work on Windows?
**Yes!** Environment variables are set differently, though:

```batch
REM Windows CMD
set DOUBAO_APP_ID=your_app_id
set DOUBAO_ACCESS_TOKEN=your_access_token
python scripts\voice_converter.py tts "你好" -o hello.mp3
```

Or in PowerShell:
```powershell
$env:DOUBAO_APP_ID="your_app_id"
$env:DOUBAO_ACCESS_TOKEN="your_access_token"
python scripts/voice_converter.py tts "你好" -o hello.mp3
```

### Q: How do I use it in Docker?
**Example Dockerfile**:
```dockerfile
FROM python:3.9-slim

WORKDIR /app
COPY . .

RUN pip install --no-cache-dir requests

# Credentials are supplied at run time with `docker run -e`;
# do not bake them into the image with ENV
ENTRYPOINT ["python", "scripts/voice_converter.py"]
```

Because `COPY . .` copies the whole build context, list `setup_env.local.sh` and `.env*` in a `.dockerignore` file so local credentials do not end up in the image.

Run it, mounting a host directory so the generated audio is not left inside the container:
```bash
docker build -t doubao-voice .
docker run --rm -e DOUBAO_APP_ID=xxx -e DOUBAO_ACCESS_TOKEN=xxx \
  -v "$(pwd)/out:/app/out" doubao-voice tts "你好" -o out/hello.mp3
```
---

## 📦 Three Ways to Use It

### Option 1: Command line (recommended for simple use)
```bash
python3 scripts/voice_converter.py tts "Your text" -o output.mp3
```

### Option 2: Import as a Python module
```python
import sys
sys.path.insert(0, 'scripts')  # make scripts/ importable (run from the plugin root)
from voice_converter import DoubaoVoiceConverter

converter = DoubaoVoiceConverter()
converter.text_to_speech("你好世界", output_file="hello.mp3")
```

### Option 3: Claude Code Skill (automatic)
When installed in Claude Code's plugins directory, it is recognized as a Skill automatically:
```
User: "Convert this to speech: Hello world"
→ Claude calls the TTS API automatically
```

---

## 🔐 Security Tips

✅ **Do**:
- Store credentials in `.local` files (kept out of Git)
- Use environment variables instead of hardcoding
- Rotate the Access Token regularly

❌ **Don't**:
- Commit credentials to Git
- Hardcode credentials in scripts
- Share configuration files that contain credentials

---

## 📝 Minimal Deployment Checklist

- ✅ Copy the folder
- ✅ `pip install requests`
- ✅ Copy and edit `scripts/setup_env.local.sh`
- ✅ `source scripts/setup_env.local.sh`
- ✅ `python3 scripts/voice_converter.py tts "测试"`
- ✅ Done!

---

## 🆘 Need Help?

1. Check README.md (user documentation)
2. See skills/SKILL.md (API documentation)
3. See STATUS.md (development status)
196
skills/doubao-voice-plugin/GIT_GUIDE.md
Normal file
@@ -0,0 +1,196 @@
# Git Commit Guide

## 📋 Commit Checklist

### ✅ Files that should be committed

```bash
git add .
git status   # confirm that the files below are staged
```

The staged files should include:

- `.claude-plugin/plugin.json`: plugin configuration
- `skills/SKILL.md`: skill documentation
- `scripts/voice_converter.py`: core tool
- `scripts/voice_converter_v2.py`: alternative implementation
- `scripts/voice_converter_sdk.py`: alternative implementation
- `scripts/setup_env.sh`: example script (placeholder version)
- `scripts/setup_env.local.sh.example`: local configuration template
- `README.md`: user documentation
- `STATUS.md`: development status
- `.gitignore`: Git ignore rules
- `GIT_GUIDE.md`: this file

`scripts/check_credentials.py` (diagnostic tool), `scripts/test_services.py` (service test) and `scripts/test_v3_debug.py` (V3 debugging tool) match the `scripts/check_credentials.py` and `scripts/test_*.py` rules in `.gitignore`, so `git add .` does not stage them; they stay local unless you force-add them with `git add -f`.

### ❌ Files ignored automatically (do not commit them manually)

`.gitignore` is already configured, so the following files will not be committed:

- `*.mp3`, `*.wav`, `*.pcm`: audio files
- `.DS_Store`: system files
- `setup_env.local.sh`: local credentials file
- `.env`, `.env.local`: environment variable files
- `__pycache__/`: Python cache
- `.vscode/`, `.idea/`: IDE settings

---

## 🔐 Credential Management (Important!)

### Local workflow

```bash
# 1. Create the local config file from the template
cd scripts
cp setup_env.local.sh.example setup_env.local.sh

# 2. Edit the local file and fill in your real credentials
nano setup_env.local.sh   # or your preferred editor

# 3. For local use, source the local file
source setup_env.local.sh

# 4. Verify (note: setup_env.local.sh is listed in .gitignore)
git status   # setup_env.local.sh should not appear
```

### Key security points

✅ **Do**:
- Keep credentials in local `.local` files
- Keep credentials in environment variables (never hardcoded)
- Make sure public files contain only the placeholders `your_app_id` and `your_access_token`
- Run `git status` regularly to make sure no credentials are exposed

❌ **Don't**:
- Commit real credentials to Git
- Hardcode credentials in Python files
- Change .gitignore in a way that lets sensitive files be tracked
- Share shell scripts that contain credentials

---

## 📝 Commit Steps

```bash
# 1. Make sure you have created the local config file
cd skills/doubao-voice-plugin/scripts   # from the repository root
cp setup_env.local.sh.example setup_env.local.sh
# Edit setup_env.local.sh and fill in your credentials

# 2. Check the status
cd ..
git status

# 3. Stage all files that should be committed
git add .

# 4. Check for leaked credentials
git diff --cached | grep -i "DOUBAO_APP_ID\|DOUBAO_ACCESS_TOKEN\|AKLT\|VOLCENGINE"
# Placeholder lines (your_app_id, your_access_token) and the docs also match,
# so read every hit; if one shows a real value, unstage it and fix the file

# 5. Commit
git commit -m "feat: Add Doubao Voice plugin with TTS/ASR support"

# 6. Double-check
git show HEAD   # confirm the commit contents

# 7. Push
git push origin main
```

---

## 🔍 Verification Checklist

Before committing, run the following commands to confirm nothing sensitive is staged:

```bash
# Check whether real credentials appear in the staged changes.
# Replace the two patterns with fragments of your own App ID and Access Token;
# never write the real values into a tracked file such as this guide.
git diff --cached | grep -E "REAL_APP_ID_FRAGMENT|REAL_TOKEN_FRAGMENT"
# Normally there should be no output

# Check that setup_env.local.sh is ignored
git status | grep setup_env.local.sh
# The file should not appear

# Check that .gitignore contains the rule
grep "setup_env.local" .gitignore
# This line should appear

# List the files in the index (tracked plus newly staged)
git ls-files
# Confirm the key files are listed and setup_env.local.sh is not
```
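
Grepping for variable names also matches every placeholder and every document that mentions them, so the output needs manual review. A small Python filter can narrow it to added lines that assign a `DOUBAO_*` variable something other than a known placeholder. This is a heuristic sketch: the regex and the list of safe values are assumptions, and secrets stored under other names will not be caught.

```python
import re

# Added diff lines such as: +export DOUBAO_ACCESS_TOKEN="..."  or  +DOUBAO_APP_ID=...
ASSIGNMENT = re.compile(r'^\+\s*(?:export\s+)?(DOUBAO_[A-Z0-9_]+)\s*=\s*["\']?([^"\'\s]*)')

# Values that are safe to commit: placeholders and fixed settings used in this repository
SAFE_VALUES = {
    "", "your_app_id", "your_access_token",
    "your_app_id_here", "your_access_token_here",
    "true", "volc.bigmodel.tts",
}


def suspicious_lines(diff_text):
    """Return (variable, line) pairs for added lines that assign a non-placeholder value."""
    hits = []
    for line in diff_text.splitlines():
        if line.startswith("+++"):
            continue  # file header line, not content
        match = ASSIGNMENT.match(line)
        if not match:
            continue  # context lines, comments and unrelated additions
        value = match.group(2)
        if value in SAFE_VALUES or value.startswith("$"):
            continue  # placeholder or a shell variable reference
        hits.append((match.group(1), line))
    return hits


def main(diff_text):
    """Print findings and return a shell-style exit code (1 if anything looks like a secret)."""
    hits = suspicious_lines(diff_text)
    for name, line in hits:
        print(f"⚠️ {name}: {line}")
    return 1 if hits else 0
```

Saved as `scan_staged.py` (a hypothetical name), it could be run from the same directory with `git diff --cached | python3 -c "import sys, scan_staged; sys.exit(scan_staged.main(sys.stdin.read()))"`; a non-zero exit status means at least one line deserves a closer look before committing.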

---

## Instructions (for other users)

After you publish the plugin, other users should:

```bash
# 1. Clone the plugin
git clone <repo-url> doubao-voice-plugin
cd doubao-voice-plugin/scripts

# 2. Create the local config
cp setup_env.local.sh.example setup_env.local.sh

# 3. Edit the config and fill in their own credentials
vim setup_env.local.sh

# 4. Load the environment variables
source setup_env.local.sh

# 5. Test it
python3 voice_converter.py tts "测试"

# 6. setup_env.local.sh is not tracked by version control
git status   # setup_env.local.sh does not appear ✅
```

---

## FAQ

**Q: What should I do if I accidentally committed credentials?**

A: Act immediately:
```bash
# Stop tracking the sensitive file and rewrite the most recent commit without it
git rm --cached scripts/setup_env.local.sh
git commit --amend --no-edit

# Rotate your Volcengine Access Token (for safety):
# generate a new token in the console
```

`git commit --amend` only rewrites the latest commit and only helps if that commit has not been pushed. If the credentials were pushed or sit in an older commit, treat them as leaked: rotating the token is the fix that matters, because rewriting history (for example with `git filter-repo`) cannot recall copies others have already fetched.

**Q: Why is setup_env.local.sh.example needed?**

A: It shows other users which environment variables the configuration file must contain, without exposing any real credentials.

**Q: Can I put the credentials in ~/.bashrc?**

A: Yes, but setup_env.local.sh is more flexible and keeps the configuration specific to this project.

**Q: How do I use sensitive credentials in CI/CD?**

A: Use the Secrets/Variables feature of your CI/CD platform (GitHub Actions, GitLab CI, etc.); never hardcode them in the code.

---

## Summary

✅ **Security measures in place**:
1. ✓ .gitignore contains ignore rules for sensitive files
2. ✓ setup_env.sh was changed to a placeholder version
3. ✓ A setup_env.local.sh.example template was created
4. ✓ All code files read credentials from environment variables
5. ✓ Clear instructions for local configuration are provided

You can now commit to Git safely! 🎉
182
skills/doubao-voice-plugin/README.md
Normal file
@@ -0,0 +1,182 @@
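# Doubao Voice Plugin

A plugin that integrates the Volcengine Doubao voice API, supporting text to speech (TTS) and singing.

## Features

- **✅ Text to speech (TTS)**: converts text to speech with multiple voices. **Tested and working**
- **🎵 Singing**: lets Doubao sing, with real-time voice interaction. **End-to-end model service enabled**
- **Easy to use**: a command-line tool; one command is enough
- **Multiple voices**: several basic female and male voices
- **Real-time interaction**: real-time conversation and singing with Doubao

## Quick Start

### 1. Get API credentials

Go to the [Volcengine console](https://console.volcengine.com/speech/app), create an application, and obtain:
- **App ID** (numeric)
- **Access Token** (long string)

Enable the required service:
1. In the console, select the **"Speech Synthesis" (语音合成)** service (TTS)

### 2. Configure environment variables

**Option 1: Use the configuration script (recommended)**
```bash
cd scripts
cp setup_env.local.sh.example setup_env.local.sh   # then fill in your credentials
source setup_env.local.sh
```

**Option 2: Set them manually**
```bash
export DOUBAO_APP_ID="your_app_id"
export DOUBAO_ACCESS_TOKEN="your_access_token"
```

### 3. Install dependencies

```bash
pip3 install requests websockets --break-system-packages   # websockets is used by singing.py
```

`--break-system-packages` is only needed when pip refuses to install into a system-managed Python (PEP 668); drop it inside a virtual environment.

### 4. Check credentials

```bash
# Check the credential configuration
python3 scripts/check_credentials.py
```

`scripts/check_credentials.py` is excluded by `.gitignore`, so a fresh clone may not include it.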

### 5. Usage examples

#### TTS, text to speech (command line)

```bash
cd scripts

# Basic usage (✅ tested and working)
python3 voice_converter.py tts "你好,我是豆包语音助手" -o output.mp3

# Use a different voice
python3 voice_converter.py tts "测试男声" -o male.mp3 -v BV701_V2_streaming
```

#### Singing (command line) 🎵

```bash
cd scripts

# Have Doubao sing
python3 singing.py sing "请唱一首关于春天的歌" -o spring.mp3

# Interactive singing mode (real-time conversation)
python3 singing.py interactive
```

#### From Python code

```python
# Run from the plugin root so that the `scripts` directory is importable

# TTS: text to speech
from scripts.voice_converter import DoubaoVoiceConverter

converter = DoubaoVoiceConverter()
audio_file = converter.text_to_speech("你好,欢迎使用豆包", output_file="hello.mp3")

# Singing
import asyncio
from scripts.singing import DoubaoSinging

async def sing():
    singing = DoubaoSinging()
    audio_file = await singing.sing("请唱一首情歌", output_file="love_song.mp3")

asyncio.run(sing())
```

## Natural-Language Invocation

In Claude Code you can invoke the plugin in natural language:

**TTS (text to speech)**:
- "Convert this to speech: Hello world"
- "Synthesize speech with a gentle female voice"
- "Read this text aloud with a male voice"

**Singing**:
- "Sing a song about spring"
- "Sing a gentle lullaby"
- "Start a real-time voice conversation with Doubao"

Example:
```
User: "Convert 'Welcome to Doubao Voice' to speech for me"
Claude: calls the TTS service and generates output.mp3
```

## Pricing

### TTS (speech synthesis)
- Large-model concurrency plan: CNY 2,000 per concurrent stream per month
- Pay-as-you-go: billed by character count

### Free trial
New users receive a free quota after enabling the service.

## Supported Voices

| Voice code | Description | Use case | Status |
|---------|------|------|------|
| BV700_V2_streaming | General female voice | General | ✅ Works with V1 |
| BV701_V2_streaming | General male voice | General | ✅ Works with V1 |
| BV406_streaming | Gentle female voice | Customer service, assistants | ✅ Works with V1 |
| BV158_streaming | Lively female voice | Education, entertainment | ✅ Works with V1 |
| BV115_streaming | Magnetic male voice | News, broadcasting | ✅ Works with V1 |

**Note**: The Doubao 2.0 premium voices require the V3 API, which is still being debugged.

## FAQ

### TTS returns "requested resource not granted"
**Fix**: Enable the "Speech Synthesis" (语音合成) service option in the console.

### Wrong Authorization header format
Make sure you use the `Bearer;{token}` format (note the semicolon), not `Bearer {token}`.
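
To make the difference concrete, here is a sketch of building the V1 header in Python. Only the `Bearer;{token}` format comes from this README; the helper names are hypothetical, and no endpoint URL is included because the request itself is handled by `scripts/voice_converter.py`.

```python
def v1_auth_header(token: str) -> dict:
    """Build the Authorization header in the `Bearer;{token}` form (semicolon, no space)."""
    if not token:
        raise ValueError("DOUBAO_ACCESS_TOKEN is empty")
    return {"Authorization": f"Bearer;{token}"}


def uses_wrong_bearer_format(header_value: str) -> bool:
    """True for the usual `Bearer {token}` form, which this README warns against."""
    return header_value.startswith("Bearer ") and not header_value.startswith("Bearer;")
```

In a script this would typically be called as `v1_auth_header(os.environ["DOUBAO_ACCESS_TOKEN"])` after the environment variables have been loaded.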

### Environment variables not taking effect
```bash
# Check the environment variables
echo $DOUBAO_APP_ID
echo $DOUBAO_ACCESS_TOKEN

# If they are empty, set them again
source setup_env.local.sh
```

## API Versions

### V1 API (currently used) ✅
- **Status**: tested, stable
- **Authentication**: Bearer Token
- **Voices**: basic voices
- **Recommendation**: recommended for everyday use

### V3 API (Doubao 2.0) ⚠️
- **Status**: being debugged; requests fail with "get resource id empty"
- **Authentication**: Bearer Token + Resource-Id
- **Voices**: Doubao 2.0 premium voices
- **Note**: contact Volcengine technical support for the correct configuration

## Technical Support

- [Official documentation](https://www.volcengine.com/docs/6561/1359369)
- [Console](https://console.volcengine.com/speech/app)
- [Billing](https://www.volcengine.com/docs/6561/1359370)

## License

This plugin is released under the MIT License.

## Author

qiudl @ zhiyuncai.com
200
skills/doubao-voice-plugin/STATUS.md
Normal file
@@ -0,0 +1,200 @@
# Doubao Voice Plugin: Development Status

**Last updated**: 2026-02-07
**Version**: 1.0.0

---

## ✅ Completed Features

### 1. TTS (text to speech): fully working ✅

**Test status**: passed
**API version**: V1
**Available voices**:
- BV700_V2_streaming (general female voice)
- BV701_V2_streaming (general male voice)
- BV406_streaming (gentle female voice)
- BV158_streaming (lively female voice)
- BV115_streaming (magnetic male voice)

**Test commands**:
```bash
source scripts/setup_env.local.sh
python3 scripts/voice_converter.py tts "你好世界" -o hello.mp3
```

**Test results**:
- ✅ HTTP 200 OK
- ✅ Code 3000 Success
- ✅ MP3 file generated
- ✅ Audio quality is normal

---

## ⚠️ Pending Features

### 2. ASR (speech to text): service not yet enabled

**Problem**: Code 1001, "requested resource not granted"

**Cause**: the ASR service has not been enabled correctly in the Volcengine console

**Steps to fix**:
1. Go to https://console.volcengine.com/speech/service
2. Find the "Speech Recognition (ASR)" (语音识别) service
3. Make sure the service is enabled and the required options are selected
4. Wait for the service to take effect (this can take a few minutes)
5. Test again

**Test command** (once the service is enabled):
```bash
python3 scripts/voice_converter.py asr audio.mp3
```
---

### 3. V3 API / Doubao 2.0 voices: debugging

**Problem**: Code 45000000, "get resource id empty"

**Approaches already tried**:
- [x] Resource-Id header
- [x] X-Resource-Id header
- [x] resource_id query parameter
- [x] resource_id in the app config
- [x] Several resource_id values: volc.bigmodel.tts, volc.seed-tts.default, volc.tts.default

**Current status**: every approach returns the same error

**Possible causes**:
1. The V3 API may need a different authentication method (IAM signature)
2. A special service-instance configuration is required
3. The Resource-Id is obtained or configured incorrectly

**Recommendations**:
- Contact Volcengine technical support for the correct V3 API configuration
- Or keep using the V1 API (it already covers the basic needs)

---

## 📁 Project File Structure

```
skills/doubao-voice-plugin/
├── .claude-plugin/
│   └── plugin.json              # plugin metadata
├── skills/
│   └── SKILL.md                 # skill definition and documentation
├── scripts/
│   ├── voice_converter.py       # main converter (V1 API, working)
│   ├── voice_converter_v2.py    # manual-signature version (untested)
│   ├── voice_converter_sdk.py   # SDK version (untested)
│   ├── check_credentials.py     # credential checker
│   ├── test_services.py         # service status test
│   ├── test_v3_debug.py         # V3 API debug script
│   ├── setup_env.sh             # environment setup script
│   └── README_TEST.md           # test report
├── README.md                    # user documentation
└── STATUS.md                    # this file (development status)
```

---

## 🔧 Diagnostic Tools

### Check the credential configuration
```bash
python3 scripts/check_credentials.py
```
Shows the current environment-variable configuration.

### Test service status
```bash
python3 scripts/test_services.py
```
Tests whether the TTS and ASR services are available.

### V3 API debugging
```bash
python3 scripts/test_v3_debug.py
```
Tries several V3 API configurations.

---

## 📊 Current Credential Configuration

```bash
DOUBAO_APP_ID="your_app_id"
DOUBAO_ACCESS_TOKEN="your_access_token"

# Optional V3 configuration (not working yet)
# DOUBAO_USE_V3="true"
# DOUBAO_RESOURCE_ID="volc.bigmodel.tts"
```

**Access Key information** (for signature authentication, not used yet):
- Access Key ID: your_access_key_id
- Secret Access Key: your_secret_access_key

---

## 🎯 Next Steps

### Available now
1. ✅ **Use the TTS feature**
   - Integrate it into applications
   - Test different voices
   - Deploy to production

### Short term (1 to 3 days)
2. ⚠️ **Enable the ASR service**
   - Enable the service in the console
   - Test speech recognition
   - Improve error handling

### Long term (optional)
3. 🔄 **V3 API support**
   - Contact Volcengine technical support
   - Get the correct Resource-Id configuration
   - Support the Doubao 2.0 premium voices

---

## 📞 Technical Support

### Volcengine
- Docs: https://www.volcengine.com/docs/6561/1329505
- Console: https://console.volcengine.com/speech/app
- Service management: https://console.volcengine.com/speech/service

### Troubleshooting
1. **TTS works but ASR does not**
   - Check whether the ASR service is enabled in the console
   - Confirm that the "Speech Recognition" (语音识别) option is selected

2. **The V3 API keeps failing**
   - Use the V1 API for now
   - Contact Volcengine technical support

3. **Authentication fails**
   - Check that the environment variables are set correctly
   - Confirm that the Access Token format is correct
   - Note that the Authorization header uses `Bearer;{token}` (with a semicolon)
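
The response codes quoted in this file and in `scripts/README.md` can be collected into a small lookup that a script could print next to a failing response. This is a hypothetical helper built only from those observed codes (3000, 1001, 4001, 45000000); it is not an official or complete list of Volcengine error codes.

```python
# Codes observed while testing this plugin; an assumed subset, not an official list.
KNOWN_CODES = {
    3000: ("Success", "TTS request succeeded (V1 API)."),
    1001: ("requested resource not granted",
           "The service (e.g. ASR) is not enabled; enable it at "
           "https://console.volcengine.com/speech/service"),
    4001: ("Invalid token", "Check whether the Access Token is correct or has expired."),
    45000000: ("get resource id empty",
               "V3 Resource-Id problem; use the V1 API or contact Volcengine support."),
}


def explain_code(code: int) -> str:
    """Return a one-line hint for a response code, with a generic fallback for unknown codes."""
    message, hint = KNOWN_CODES.get(
        code, ("unknown", "Not seen while testing this plugin; check the official docs.")
    )
    return f"code {code} ({message}): {hint}"
```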

---

## ✨ Summary

**Available now**: TTS (text to speech) is fully working and ready for use.

**Open items**:
1. Enable the ASR service in the console
2. (Optional) Resolve the V3 API configuration problem

**Recommendation**: Start with the V1 API's TTS for basic speech synthesis. ASR will work once the service is enabled in the console. The Doubao 2.0 voices on the V3 API are optional and can be tackled later.

---

*Generated by Claude Code on 2026-02-07*
186
skills/doubao-voice-plugin/scripts/README.md
Normal file
@@ -0,0 +1,186 @@
# Doubao Voice Tools: Usage Guide

Easy-to-use Doubao voice command-line tools supporting **text to speech (TTS)** and **singing**.

## Quick Start

### 1. Configure environment variables

```bash
# Add to ~/.zshrc or ~/.bashrc
export DOUBAO_APP_ID="your_app_id"
export DOUBAO_ACCESS_TOKEN="your_access_token"

# Apply the configuration
source ~/.zshrc
```

### 2. Install dependencies

```bash
pip install requests websockets   # websockets is only needed by singing.py
```

## Usage

### 📝 Text to speech (TTS)

**Basic usage:**
```bash
python voice_converter.py tts "你好,我是豆包语音助手"
```

**Specify the output file and voice:**
```bash
python voice_converter.py tts "欢迎使用豆包语音" -o welcome.mp3 -v BV701_V2_streaming
```

**Available voices:**
- `BV700_V2_streaming` - general female voice (default, recommended)
- `BV701_V2_streaming` - general male voice
- `BV406_streaming` - gentle female voice
- `BV158_streaming` - lively female voice
- `BV115_streaming` - magnetic male voice

### 🎵 Singing

**Basic usage:**
```bash
python singing.py sing "请唱一首关于春天的歌"
```

**Specify the output file:**
```bash
python singing.py sing "唱一个温柔的摇篮曲" -o lullaby.mp3
```

**Interactive mode (real-time conversation):**
```bash
python singing.py interactive
```

In interactive mode you can talk to Doubao naturally and ask her to sing, tell stories, and so on. Type `quit` to exit.

## Calling from Python

```python
# TTS: text to speech
from voice_converter import DoubaoVoiceConverter

converter = DoubaoVoiceConverter()
audio_file = converter.text_to_speech(
    "你好,欢迎使用豆包语音",
    output_file="hello.mp3",
    voice_type="BV700_V2_streaming"
)
print(f"Generated speech: {audio_file}")

# Singing
import asyncio
from singing import DoubaoSinging

async def main():
    singing = DoubaoSinging()

    # Have Doubao sing
    audio_file = await singing.sing(
        "请唱一首情歌",
        output_file="love_song.mp3",
        language="zh-CN"
    )
    print(f"Singing finished: {audio_file}")

    # Or start interactive mode
    # await singing.interactive_singing()

asyncio.run(main())
```

## Complete Examples

### Example 1: Generate notification audio

```bash
# Female-voice notification
python voice_converter.py tts "您有一条新消息,请注意查收" -o notification.mp3

# Male-voice notification
python voice_converter.py tts "系统将在5分钟后进行维护" -o maintenance.mp3 -v BV701_V2_streaming
```

### Example 2: Singing

```bash
# Have Doubao sing a love song
python singing.py sing "请唱一首温柔的情歌" -o love_song.mp3

# Have Doubao sing a children's song
python singing.py sing "唱一首欢快的儿歌" -o kids_song.mp3

# Start interactive mode to talk with Doubao
python singing.py interactive
```

## Error Handling

### Common errors

**1. Environment variables not set**
```
❌ Error: Please set the environment variables first:
export DOUBAO_APP_ID='your_app_id'
export DOUBAO_ACCESS_TOKEN='your_access_token'
```
**Fix:** make sure the environment variables are set correctly, then run `source ~/.zshrc`.

**2. API call failed**
```
❌ Error: TTS failed (code: 4001): Invalid token
```
**Fix:** check whether the Access Token is correct or has expired.

## Technical Parameters

### Audio format

**TTS output:**
- Format: MP3
- Sample rate: 16000 Hz
- Channels: mono

### API limits

- **TTS**: at most 5000 characters per request
- **Concurrency limit**: depends on the number of concurrent streams purchased

## Using It in Claude Code

In Claude Code you can invoke the tools directly in natural language:

**TTS (text to speech)**:
```
"Convert this to speech: Hello world"
"Synthesize with a gentle female voice: Welcome"
```

**Singing**:
```
"Sing a song about spring"
"Sing a gentle lullaby"
"Start a real-time voice conversation with Doubao"
```

## Getting API Credentials

1. Go to the [Volcengine console](https://console.volcengine.com/speech/app)
2. Create an application
3. Get the App ID and Access Token
4. Enable the required service:
   - Doubao speech synthesis model 2.0 (豆包语音合成模型2.0)

## References

- [Volcengine Doubao voice documentation](https://www.volcengine.com/docs/6561)
- [API reference](https://www.volcengine.com/docs/6561/1096680)
- [Billing](https://www.volcengine.com/docs/6561/1359370)
@@ -0,0 +1,21 @@
#!/bin/bash
# Doubao voice API environment variables (local version)
#
# Usage:
# 1. Copy this file: cp setup_env.local.sh.example setup_env.local.sh
# 2. Edit setup_env.local.sh and fill in your real credentials
# 3. Run: source setup_env.local.sh
# 4. .gitignore already ignores setup_env.local.sh, so your credentials will not be committed to Git

# ⚠️ Important: fill in your real credentials below (local use only)
export DOUBAO_APP_ID="your_app_id_here"
export DOUBAO_ACCESS_TOKEN="your_access_token_here"

# V3 API configuration (optional, only needed for the Doubao 2.0 voices)
# export DOUBAO_USE_V3="true"
# export DOUBAO_RESOURCE_ID="volc.bigmodel.tts"

echo "✅ Doubao voice API environment variables set (local configuration)"
echo ""
echo "App ID: ${DOUBAO_APP_ID:0:10}..."
echo "Access Token: ${DOUBAO_ACCESS_TOKEN:0:20}..."
22
skills/doubao-voice-plugin/scripts/setup_env.sh
Executable file
@@ -0,0 +1,22 @@
#!/bin/bash
# Doubao voice API environment variables (example)
#
# ⚠️ Important: this is an example script containing placeholders.
# For local use, create setup_env.local.sh from setup_env.local.sh.example
# and fill in your real credentials there. .gitignore already ignores the .local file.

export DOUBAO_APP_ID="your_app_id"
export DOUBAO_ACCESS_TOKEN="your_access_token"

# V3 API configuration (optional, only needed for the Doubao 2.0 voices)
# export DOUBAO_USE_V3="true"
# export DOUBAO_RESOURCE_ID="volc.bigmodel.tts"

echo "✅ Doubao voice API environment variables set"
echo ""
echo "App ID: $DOUBAO_APP_ID"
echo "Access Token: ${DOUBAO_ACCESS_TOKEN:0:20}..."
echo ""
echo "You can now run:"
echo "  python3 voice_converter.py tts \"你好世界\" -o hello.mp3"
echo "  python3 voice_converter.py asr audio.mp3  # enable the ASR service first"
327
skills/doubao-voice-plugin/scripts/singing.py
Executable file
@@ -0,0 +1,327 @@
#!/usr/bin/env python3
"""
Doubao singing tool
Based on Doubao's end-to-end real-time speech model; lets Doubao sing
Uses a WebSocket for real-time dialogue and audio generation
"""

import os
import sys
import json
import asyncio
import websockets
import struct
import uuid
from typing import Optional


# Connection-level events (no session_id needed)
CONNECTION_EVENTS = {1, 2, 50, 51, 52}


class DoubaoSinging:
    """Doubao singing tool class"""

    def __init__(self):
        # Read the configuration from environment variables
        self.app_id = os.environ.get("DOUBAO_APP_ID")
        self.access_token = os.environ.get("DOUBAO_ACCESS_TOKEN")

        if not self.app_id or not self.access_token:
            raise ValueError(
                "Please set the environment variables first:\n"
                "export DOUBAO_APP_ID='your_app_id'\n"
                "export DOUBAO_ACCESS_TOKEN='your_access_token'"
            )

        # WebSocket endpoint of the end-to-end real-time speech service
        self.ws_url = "wss://openspeech.bytedance.com/api/v3/realtime/dialogue"
        self.app_key = "PlgvMymc7f3tQnJ6"  # fixed value
        self.resource_id = "volc.speech.dialog"  # fixed value

    def _build_message(self, event_id: int, payload: Optional[dict] = None, session_id: Optional[str] = None) -> bytes:
        """
        Build a binary message

        Protocol layout:
        - header (4 bytes)
        - event_id (4 bytes, big-endian)
        - [session_id_len (4 bytes) + session_id (variable)] (only for non-connection events)
        - payload_len (4 bytes, big-endian)
        - payload (variable, JSON)
        """
        buf = bytearray()

        # Header (4 bytes)
        buf.append(0x11)  # version=1, header_size=1
        buf.append(0x14)  # FULL_CLIENT_REQUEST(0x1) + WITH_EVENT(0x4)
        buf.append(0x10)  # JSON serialization, no compression
        buf.append(0x00)  # reserved

        # Event ID
        buf.extend(struct.pack('>I', event_id))

        # Session ID (required for non-connection events)
        if event_id not in CONNECTION_EVENTS:
            sid_bytes = (session_id or "").encode('utf-8')
            buf.extend(struct.pack('>I', len(sid_bytes)))
            buf.extend(sid_bytes)

        # Payload
        if payload:
            payload_bytes = json.dumps(payload, ensure_ascii=False).encode('utf-8')
        else:
            payload_bytes = b'{}'
        buf.extend(struct.pack('>I', len(payload_bytes)))
        buf.extend(payload_bytes)

        return bytes(buf)

    def _parse_response(self, data: bytes) -> dict:
        """
        Parse a binary message from the server

        Returns:
            dict with keys such as: msg_type, event_id, connect_id or session_id,
            payload (or payload_text), payload_bytes, is_audio
        """
        result = {"raw": data}
        if len(data) < 4:
            return result

        # Header
        msg_type = (data[1] >> 4) & 0x0F
        flags = data[1] & 0x0F
        result["msg_type"] = msg_type

        offset = 4

        # Event ID (if the WITH_EVENT flag is set)
        if flags & 0x04 and len(data) >= offset + 4:
            event_id = struct.unpack('>I', data[offset:offset + 4])[0]
            result["event_id"] = event_id
            offset += 4

            # Connect ID for connection events (50, 51, 52)
            if event_id in {50, 51, 52} and len(data) >= offset + 4:
                cid_len = struct.unpack('>I', data[offset:offset + 4])[0]
                offset += 4
                if len(data) >= offset + cid_len:
                    result["connect_id"] = data[offset:offset + cid_len].decode('utf-8', errors='ignore')
                    offset += cid_len
            # Session ID for session-level events
            elif event_id not in CONNECTION_EVENTS and len(data) >= offset + 4:
                sid_len = struct.unpack('>I', data[offset:offset + 4])[0]
                offset += 4
                if len(data) >= offset + sid_len:
                    result["session_id"] = data[offset:offset + sid_len].decode('utf-8', errors='ignore')
                    offset += sid_len

        # Payload
        if len(data) >= offset + 4:
            payload_len = struct.unpack('>I', data[offset:offset + 4])[0]
            offset += 4
            if len(data) >= offset + payload_len:
                payload_raw = data[offset:offset + payload_len]
                result["payload_bytes"] = payload_raw
                # Audio-only responses (msg_type 0xB) carry raw audio
                if msg_type == 0x0B:
                    result["is_audio"] = True
                else:
                    try:
                        result["payload"] = json.loads(payload_raw.decode('utf-8'))
                    except ValueError:
                        # Covers both UnicodeDecodeError and json.JSONDecodeError
                        result["payload_text"] = payload_raw.decode('utf-8', errors='ignore')

        return result

    async def sing(
        self,
        song_request: str,
        output_file: str = "singing_output.mp3",
        language: str = "zh-CN",
        model: str = "1.2.1.0"
    ) -> Optional[str]:
        """
        Have Doubao sing

        Args:
            song_request: the singing request, e.g. "请唱一首关于春天的歌" ("sing a song about spring")
            output_file: output audio file path
            language: language code (zh-CN/en-US)
            model: model version

        Returns:
            str: output file path, or None if the session could not be started
        """
        print("🎵 Doubao is singing...")
        print(f"   Request: {song_request}")
        print(f"   Model: {model}")

        try:
            audio_data = bytearray()
            session_id = str(uuid.uuid4())

            # WebSocket connection headers
            headers = {
                "X-Api-App-ID": self.app_id,
                "X-Api-Access-Key": self.access_token,
                "X-Api-Resource-Id": self.resource_id,
                "X-Api-App-Key": self.app_key,
                "X-Api-Connect-Id": str(uuid.uuid4()),
            }

            # `additional_headers` is the keyword accepted by websockets.connect in websockets 14+
            async with websockets.connect(self.ws_url, additional_headers=headers) as websocket:
                print("✅ WebSocket connected")

                # 1. StartConnection (event_id=1, no session_id needed)
                await websocket.send(self._build_message(1))
                response = await asyncio.wait_for(websocket.recv(), timeout=5)
                resp = self._parse_response(response)
                if resp.get("event_id") == 50:
                    print("✅ Connection established")
                else:
                    print(f"⚠️ Connection response: {resp}")

                # 2. StartSession (event_id=100, session_id required)
                start_session_payload = {
                    "tts": {
                        "audio_config": {
                            "channel": 1,
                            "format": "pcm",
                            "sample_rate": 24000
                        }
                    },
                    "dialog": {
                        "extra": {
                            "enable_music": True,
                            "input_mod": "text",
                            "model": model
                        }
                    }
                }
                await websocket.send(self._build_message(100, start_session_payload, session_id))
                response = await asyncio.wait_for(websocket.recv(), timeout=5)
                resp = self._parse_response(response)
                if resp.get("event_id") == 150:
                    print("✅ Session established")
                elif resp.get("payload", {}).get("error"):
                    print(f"❌ Session error: {resp['payload']['error']}")
                    return None
                else:
                    print(f"📋 Session response: {resp}")

                # 3. SayHello/ChatTextQuery (event_id=300, session_id required)
                chat_payload = {"content": song_request}
                await websocket.send(self._build_message(300, chat_payload, session_id))
                print("📤 Singing request sent")

                # 4. Receive the audio stream (a receive timeout marks the end)
                print("\n📋 Receiving audio stream...")
                tts_started = False
                recv_timeout = 5  # treat 5 seconds without data as the end of the stream

                while True:
                    try:
                        message = await asyncio.wait_for(websocket.recv(), timeout=recv_timeout)
                    except asyncio.TimeoutError:
                        break
                    except websockets.exceptions.ConnectionClosed:
                        break

                    if isinstance(message, bytes) and len(message) >= 4:
                        resp = self._parse_response(message)
                        msg_type = resp.get("msg_type", 0)
                        flags = message[1] & 0x0F

                        # Audio-only response (0xB = 11)
                        if resp.get("is_audio") and resp.get("payload_bytes"):
                            audio_data.extend(resp["payload_bytes"])
                            if not tts_started:
                                print("   Receiving audio...", end="", flush=True)
                                tts_started = True
                            else:
                                print(".", end="", flush=True)

                            # NEG_SEQUENCE flag = last packet
                            if flags & 0x02:
                                break

                        # Server error (0xF = 15)
                        elif msg_type == 0x0F:
                            error = resp.get("payload", {}).get("error", "unknown")
                            print(f"\n❌ Server error: {error}")
                            break
|
||||
# Full server response (0x9) - session finished
|
||||
elif msg_type == 0x09:
|
||||
event_id = resp.get("event_id", 0)
|
||||
if event_id in {152, 52}:
|
||||
break
|
||||
|
||||
# 5. 保存音频文件
|
||||
if audio_data:
|
||||
# Save as PCM, convert extension if needed
|
||||
actual_output = output_file
|
||||
if output_file.endswith('.mp3'):
|
||||
actual_output = output_file.replace('.mp3', '.pcm')
|
||||
|
||||
with open(actual_output, "wb") as f:
|
||||
f.write(audio_data)
|
||||
|
||||
file_size = len(audio_data) / 1024
|
||||
print(f"\n\n✅ 唱歌完成!")
|
||||
print(f" 输出: {actual_output} ({file_size:.1f} KB)")
|
||||
print(f" 格式: PCM (24000Hz, 单声道)")
|
||||
return actual_output
|
||||
else:
|
||||
print("\n⚠️ 未收到音频数据,请检查:")
|
||||
print(" 1. 凭证是否正确")
|
||||
print(" 2. 端到端实时语音大模型是否已开通")
|
||||
print(" 3. 网络连接是否正常")
|
||||
return None
|
||||
|
||||
except websockets.exceptions.WebSocketException as e:
|
||||
raise Exception(f"WebSocket连接错误: {str(e)}")
|
||||
except Exception as e:
|
||||
raise Exception(f"唱歌调用失败: {str(e)}")
|
||||
|
||||
|
||||
def main():
    """Command-line tool."""
    import argparse

    parser = argparse.ArgumentParser(description="Doubao singing tool")
    subparsers = parser.add_subparsers(dest="command", help="Choose a feature")

    # sing command
    sing_parser = subparsers.add_parser("sing", help="Make Doubao sing")
    sing_parser.add_argument("request", help="Singing request, e.g. '请唱一首关于春天的歌'")
    sing_parser.add_argument(
        "-o", "--output", default="singing_output.mp3",
        help="Output audio file (default: singing_output.mp3; a .mp3 suffix is saved as .pcm)"
    )
    sing_parser.add_argument(
        "-l", "--language", default="zh-CN", help="Language code (default: zh-CN)"
    )
    sing_parser.add_argument(
        "-m", "--model", default="1.2.1.0", help="Model version (default: 1.2.1.0 = O2.0)"
    )

    args = parser.parse_args()

    if not args.command:
        parser.print_help()
        return

    try:
        singing = DoubaoSinging()

        if args.command == "sing":
            asyncio.run(singing.sing(args.request, args.output, args.language, args.model))

    except Exception as e:
        print(f"❌ Error: {e}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()
171
skills/doubao-voice-plugin/scripts/voice_converter.py
Executable file
@@ -0,0 +1,171 @@
#!/usr/bin/env python3
"""
Doubao voice conversion tool
Supports: text-to-speech (TTS)
"""

import os
import sys
import json
import base64
import requests


class DoubaoVoiceConverter:
    """Doubao voice conversion utility class."""

    def __init__(self):
        # Read configuration from environment variables
        self.app_id = os.environ.get("DOUBAO_APP_ID")
        self.access_token = os.environ.get("DOUBAO_ACCESS_TOKEN")

        if not self.app_id or not self.access_token:
            raise ValueError(
                "Please set the environment variables first:\n"
                "export DOUBAO_APP_ID='your_app_id'\n"
                "export DOUBAO_ACCESS_TOKEN='your_access_token'"
            )

        # API version: V1 (default, basic voices) or V3 (Doubao 2.0, needs extra configuration)
        self.use_v3 = os.environ.get("DOUBAO_USE_V3", "false").lower() == "true"

        if self.use_v3:
            self.tts_url = "https://openspeech.bytedance.com/api/v3/tts/unidirectional"
            self.resource_id = os.environ.get("DOUBAO_RESOURCE_ID", "volc.bigmodel.tts")
        else:
            # V1 API: stable, supports the basic voices
            self.tts_url = "https://openspeech.bytedance.com/api/v1/tts"

    def text_to_speech(
        self,
        text: str,
        output_file: str = "output.mp3",
        voice_type: str = "BV700_V2_streaming"
    ) -> str:
        """
        Text-to-speech (TTS)

        Args:
            text: text to convert
            output_file: output audio file path
            voice_type: voice type
                - BV700_V2_streaming: general female voice (recommended)
                - BV701_V2_streaming: general male voice
                - BV406_streaming: gentle female voice
                - BV158_streaming: lively female voice
                - BV115_streaming: magnetic male voice

        Returns:
            str: output file path
        """
        print("📝 Converting text to speech...")
        print(f"   Text: {text[:50]}{'...' if len(text) > 50 else ''}")
        print(f"   Voice: {voice_type}")

        headers = {
            # "Bearer" and the token are joined by a semicolon
            "Authorization": f"Bearer;{self.access_token}",
            "Content-Type": "application/json"
        }

        # The V3 API needs a Resource-Id header (when enabled)
        if self.use_v3:
            headers["Resource-Id"] = self.resource_id

        payload = {
            "app": {
                "appid": self.app_id,
                "token": self.access_token,
                "cluster": "volcano_tts"
            },
            "user": {
                "uid": "user_001"
            },
            "audio": {
                "voice_type": voice_type,
                "encoding": "mp3",
                "speed_ratio": 1.0,
                "volume_ratio": 1.0,
                "pitch_ratio": 1.0
            },
            "request": {
                "reqid": f"tts_{os.urandom(8).hex()}",
                "text": text,
                "text_type": "plain",
                "operation": "query"
            }
        }

        try:
            response = requests.post(self.tts_url, headers=headers, json=payload, timeout=30)

            # Print response header info
            print("\n📋 Response info:")
            print(f"   HTTP status: {response.status_code}")
            if "X-Tt-Logid" in response.headers:
                print(f"   RequestId: {response.headers['X-Tt-Logid']}")
            if "X-Request-Id" in response.headers:
                print(f"   X-Request-Id: {response.headers['X-Request-Id']}")

            data = response.json()

            # Print the full response, eliding the base64 audio so the console stays readable
            shown = dict(data)
            if isinstance(shown.get("data"), str):
                shown["data"] = f"<{len(shown['data'])} base64 chars omitted>"
            print("\n📄 Full response:")
            print(json.dumps(shown, indent=2, ensure_ascii=False))
            print()

            if data.get("code") == 3000:
                # Success: decode and save the audio
                audio_data = base64.b64decode(data["data"])
                with open(output_file, "wb") as f:
                    f.write(audio_data)

                file_size = len(audio_data) / 1024  # KB
                print("✅ Speech synthesis succeeded!")
                print(f"   Output: {output_file} ({file_size:.1f} KB)")
                return output_file
            else:
                error_msg = data.get("message", "unknown error")
                reqid = data.get("reqid", "unknown")
                raise Exception(
                    f"TTS failed\n   Error code: {data.get('code')}\n"
                    f"   Error message: {error_msg}\n   RequestId: {reqid}"
                )

        except requests.exceptions.Timeout as e:
            raise Exception("Request timed out; please check the network connection") from e
        except Exception as e:
            raise Exception(f"TTS call failed: {e}") from e


def main():
    """Command-line tool."""
    import argparse

    parser = argparse.ArgumentParser(description="Doubao voice conversion tool")
    subparsers = parser.add_subparsers(dest="command", help="Choose a feature")

    # TTS command
    tts_parser = subparsers.add_parser("tts", help="Text to speech")
    tts_parser.add_argument("text", help="Text to convert")
    tts_parser.add_argument("-o", "--output", default="output.mp3", help="Output audio file (default: output.mp3)")
    tts_parser.add_argument("-v", "--voice", default="BV700_V2_streaming",
                            help="Voice type (default: BV700_V2_streaming, general female voice)")

    args = parser.parse_args()

    if not args.command:
        parser.print_help()
        return

    try:
        converter = DoubaoVoiceConverter()

        if args.command == "tts":
            converter.text_to_speech(args.text, args.output, args.voice)

    except Exception as e:
        print(f"❌ Error: {e}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()
508
skills/doubao-voice-plugin/skills/SKILL.md
Normal file
@@ -0,0 +1,508 @@
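---
name: doubao-voice
description: Doubao (豆包) Voice API calls. Supports speech synthesis (TTS) and singing. Activates automatically when the user mentions speech synthesis, text-to-speech, singing, or other Doubao voice tasks.
---

# Doubao Voice API Skill

Calls the Volcengine Doubao Voice API to provide text-to-speech (TTS) and singing.
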
## Core Features ⭐

### 1. Text-to-Speech (TTS)

```bash
# 1. Configure the environment variables
export DOUBAO_APP_ID="your_app_id"
export DOUBAO_ACCESS_TOKEN="your_access_token"

# 2. Convert text to speech ("Hello world")
python scripts/voice_converter.py tts "你好世界"
```

### 2. Singing 🎵

```bash
# Make Doubao sing ("Please sing a song about spring"); the audio is saved as raw PCM
python scripts/singing.py sing "请唱一首关于春天的歌"

# Interactive singing mode (main() in scripts/singing.py currently registers only the `sing` subcommand)
python scripts/singing.py interactive
```

## Feature Overview

| Module | Features | Recommended model |
|------|------|---------|
| **Speech synthesis (TTS)** | Text-to-speech, multiple voices | Doubao speech synthesis model 2.0 |
| **Singing** | Real-time voice interaction, singing, role-play | Doubao end-to-end real-time speech model |

---

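## Environment Setup

### 1. Get Volcengine Doubao Voice credentials

1. Visit the [Volcengine console](https://console.volcengine.com/)
2. Enable the "Doubao Voice" (豆包语音) service
3. Create an application to obtain an `App ID` and `Access Token`
4. Enable the required services:
   - "Speech synthesis" permission: large-model speech synthesis

### 2. Environment variables

```bash
# ~/.zshrc or ~/.bashrc
export DOUBAO_APP_ID="your_app_id"
export DOUBAO_ACCESS_TOKEN="your_access_token"
export DOUBAO_CLUSTER="volcano_tts"  # TTS cluster read by the examples in this document

# Optional: switch scripts/voice_converter.py to the V3 API
# export DOUBAO_USE_V3="true"
# export DOUBAO_RESOURCE_ID="volc.bigmodel.tts"
```
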
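Before any request is sent, it can help to validate these variables in one place so a missing credential fails fast with a clear message. A minimal sketch; `load_doubao_config` is a hypothetical helper, not part of the scripts, and it accepts any mapping so it can be tested without touching the real environment:

```python
import os
from typing import Dict, Mapping, Optional


def load_doubao_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect the Doubao settings above, raising if a required one is missing."""
    env = os.environ if env is None else env
    required = ("DOUBAO_APP_ID", "DOUBAO_ACCESS_TOKEN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))
    return {
        "app_id": env["DOUBAO_APP_ID"],
        "access_token": env["DOUBAO_ACCESS_TOKEN"],
        # Same default cluster as the examples in this document
        "cluster": env.get("DOUBAO_CLUSTER") or "volcano_tts",
    }
```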
### 3. Python dependencies

```bash
# uv is recommended
# websockets: used by scripts/singing.py, which calls websockets.connect(..., additional_headers=...)
# (the keyword of the newer asyncio client); websocket-client: used by the streaming TTS example
uv pip install requests "websockets>=14" websocket-client

# or use pip
pip install requests "websockets>=14" websocket-client
```

---

## API Basics

### Base URL

```
TTS API: https://openspeech.bytedance.com/api/v1/tts
```

### Authentication

Authenticate with the Access Token by adding this request header. As in `scripts/voice_converter.py`, `Bearer` and the token are separated by a semicolon, not a space:

```
Authorization: Bearer;{access_token}
```

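`scripts/voice_converter.py` builds these headers inline and adds `Resource-Id` only when the V3 API is enabled. Factoring that into a helper keeps the token format identical across requests. A sketch; `build_tts_headers` is a hypothetical name:

```python
from typing import Dict, Optional


def build_tts_headers(access_token: str, resource_id: Optional[str] = None) -> Dict[str, str]:
    """Build TTS request headers the way scripts/voice_converter.py does."""
    headers = {
        # "Bearer" and the token are joined by a semicolon, not a space
        "Authorization": f"Bearer;{access_token}",
        "Content-Type": "application/json",
    }
    # voice_converter.py sends Resource-Id only on the V3 path
    if resource_id is not None:
        headers["Resource-Id"] = resource_id
    return headers
```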
---

## 1. Speech Synthesis (TTS)

### 1.1 Basic speech synthesis

Converts text into an audio file.

**Natural-language examples**:
- "Convert this text to speech"
- "Synthesize speech with Doubao"
- "Generate speech: 你好,欢迎使用豆包语音" ("Hello, welcome to Doubao Voice")

**Python implementation**:

```python
import os
import base64
import requests

def text_to_speech(text: str, voice_type: str = "BV700_V2_streaming", output_file: str = "output.mp3"):
    """
    Text to speech.

    Args:
        text: text to synthesize
        voice_type: voice type (default: BV700_V2_streaming)
        output_file: output audio file path

    Returns:
        Path of the audio file
    """
    app_id = os.environ.get("DOUBAO_APP_ID")
    access_token = os.environ.get("DOUBAO_ACCESS_TOKEN")
    cluster = os.environ.get("DOUBAO_CLUSTER", "volcano_tts")

    url = "https://openspeech.bytedance.com/api/v1/tts"

    headers = {
        "Authorization": f"Bearer;{access_token}",  # semicolon separator, as in voice_converter.py
        "Content-Type": "application/json"
    }

    payload = {
        "app": {
            "appid": app_id,
            "token": access_token,
            "cluster": cluster
        },
        "user": {
            "uid": "user123"
        },
        "audio": {
            "voice_type": voice_type,
            "encoding": "mp3",
            "speed_ratio": 1.0,
            "volume_ratio": 1.0,
            "pitch_ratio": 1.0
        },
        "request": {
            "reqid": "req_" + os.urandom(8).hex(),
            "text": text,
            "text_type": "plain",
            "operation": "query"
        }
    }

    response = requests.post(url, headers=headers, json=payload, timeout=30)
    data = response.json()

    if data.get("code") == 3000:
        # Decode the base64 audio data
        audio_data = base64.b64decode(data["data"])
        with open(output_file, "wb") as f:
            f.write(audio_data)
        return output_file
    else:
        raise Exception(f"TTS failed: {data}")

# Usage example ("Hello, I am the Doubao voice assistant")
audio_file = text_to_speech("你好,我是豆包语音助手")
print(f"Speech generated: {audio_file}")
```

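Separating payload construction from the HTTP call makes the request body easy to inspect or unit-test without credentials or network access. A sketch with a hypothetical helper name; the field values mirror the example above, and `uuid.uuid4()` is swapped in for `os.urandom(8).hex()` to generate the request ID:

```python
import uuid
from typing import Any, Dict


def build_tts_payload(
    app_id: str,
    access_token: str,
    text: str,
    voice_type: str = "BV700_V2_streaming",
    cluster: str = "volcano_tts",
    encoding: str = "mp3",
    uid: str = "user123",
) -> Dict[str, Any]:
    """Assemble the V1 TTS request body used above (no network I/O)."""
    return {
        "app": {"appid": app_id, "token": access_token, "cluster": cluster},
        "user": {"uid": uid},
        "audio": {
            "voice_type": voice_type,
            "encoding": encoding,
            "speed_ratio": 1.0,
            "volume_ratio": 1.0,
            "pitch_ratio": 1.0,
        },
        "request": {
            # A fresh request ID for every call
            "reqid": "req_" + uuid.uuid4().hex,
            "text": text,
            "text_type": "plain",
            "operation": "query",
        },
    }
```

The result can be passed straight to `requests.post(url, headers=headers, json=payload, timeout=30)`.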
### 1.2 Streaming speech synthesis

Suited to long texts: playback can start while audio is still being generated.

```python
import base64
import json
import os

import websocket  # provided by the websocket-client package

def stream_tts(text: str, voice_type: str = "BV700_V2_streaming"):
    """
    Streaming speech synthesis.

    Args:
        text: text to synthesize
        voice_type: voice type
    """
    app_id = os.environ.get("DOUBAO_APP_ID")
    access_token = os.environ.get("DOUBAO_ACCESS_TOKEN")

    ws_url = f"wss://openspeech.bytedance.com/api/v1/tts/ws?appid={app_id}&token={access_token}"

    def on_message(ws, message):
        data = json.loads(message)
        if "audio" in data:
            # Decode the audio data
            audio_chunk = base64.b64decode(data["audio"])
            # Play or save the audio chunk
            print(f"Received audio chunk: {len(audio_chunk)} bytes")

    def on_open(ws):
        payload = {
            "app": {
                "appid": app_id,
                "token": access_token,
                "cluster": "volcano_tts"
            },
            "user": {
                "uid": "user123"
            },
            "audio": {
                "voice_type": voice_type,
                "encoding": "mp3"
            },
            "request": {
                "reqid": "stream_" + os.urandom(8).hex(),
                "text": text,
                "text_type": "plain",
                "operation": "submit"
            }
        }
        ws.send(json.dumps(payload))

    ws = websocket.WebSocketApp(
        ws_url,
        on_message=on_message,
        on_open=on_open
    )
    ws.run_forever()

# Usage example ("This is a long text; streaming synthesis lets it play while it is generated...")
stream_tts("这是一段很长的文本,使用流式合成可以边生成边播放...")
```

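Instead of only printing each chunk, the handler can append the decoded bytes to a buffer that is written to disk once `run_forever()` returns. A sketch that assumes the same `{"audio": "<base64>"}` message shape as the example above (that shape has not been checked against the WebSocket API docs); `make_chunk_collector` is a hypothetical name:

```python
import base64
import json
from typing import Callable, Tuple


def make_chunk_collector() -> Tuple[Callable[[object, str], None], bytearray]:
    """Return an on_message handler plus the buffer it fills with decoded audio."""
    buffer = bytearray()

    def on_message(ws, message: str) -> None:
        data = json.loads(message)
        if "audio" in data:
            # Assumed message shape: {"audio": "<base64-encoded chunk>"}
            buffer.extend(base64.b64decode(data["audio"]))

    return on_message, buffer
```

Pass the handler as `on_message=` to `websocket.WebSocketApp`, then write `bytes(buffer)` to a file after `ws.run_forever()` returns.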
### 1.3 Voice selection

Doubao Voice offers several voices:

| Voice code | Description | Use case |
|---------|------|------|
| BV700_V2_streaming | General female voice | General |
| BV701_V2_streaming | General male voice | General |
| BV406_streaming | Gentle female voice | Customer service, assistants |
| BV158_streaming | Lively female voice | Education, entertainment |
| BV115_streaming | Magnetic male voice | News, broadcasting |

**Query the available voices**:

```bash
TOKEN="${DOUBAO_ACCESS_TOKEN}"
APP_ID="${DOUBAO_APP_ID}"

curl -s "https://openspeech.bytedance.com/api/v1/tts/voices?appid=$APP_ID" \
  -H "Authorization: Bearer;$TOKEN"
```

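When a request names a voice by its description (for example "use the gentle female voice"), a lookup table keeps the mapping from description to voice code in one place. A sketch with hypothetical names, limited to the five voices in the table above and keyed by both the original Chinese descriptions and their English glosses:

```python
from typing import Dict

# Descriptions from the voice table, mapped to their voice codes
VOICE_ALIASES: Dict[str, str] = {
    "通用女声": "BV700_V2_streaming", "general female": "BV700_V2_streaming",
    "通用男声": "BV701_V2_streaming", "general male": "BV701_V2_streaming",
    "温柔女声": "BV406_streaming", "gentle female": "BV406_streaming",
    "活泼女声": "BV158_streaming", "lively female": "BV158_streaming",
    "磁性男声": "BV115_streaming", "magnetic male": "BV115_streaming",
}


def resolve_voice(name: str, default: str = "BV700_V2_streaming") -> str:
    """Map a voice description (or an existing voice code) to a voice code."""
    key = name.strip().lower()
    if key in VOICE_ALIASES:
        return VOICE_ALIASES[key]
    # A value that is already one of the known codes passes through unchanged
    if name in VOICE_ALIASES.values():
        return name
    return default
```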
---

## Complete Utility Class

```python
import base64
import os
from typing import Optional

import requests

class DoubaoVoice:
    """Doubao Voice API utility class."""

    BASE_URL = "https://openspeech.bytedance.com/api/v1"

    def __init__(self, app_id: Optional[str] = None, access_token: Optional[str] = None):
        self.app_id = app_id or os.environ.get("DOUBAO_APP_ID")
        self.access_token = access_token or os.environ.get("DOUBAO_ACCESS_TOKEN")
        self.cluster_tts = os.environ.get("DOUBAO_CLUSTER", "volcano_tts")

    @property
    def headers(self):
        return {
            "Authorization": f"Bearer;{self.access_token}",
            "Content-Type": "application/json"
        }

    def text_to_speech(
        self,
        text: str,
        voice_type: str = "BV700_V2_streaming",
        output_file: str = "output.mp3"
    ) -> str:
        """Text to speech."""
        url = f"{self.BASE_URL}/tts"

        payload = {
            "app": {
                "appid": self.app_id,
                "token": self.access_token,
                "cluster": self.cluster_tts
            },
            "user": {"uid": "user123"},
            "audio": {
                "voice_type": voice_type,
                "encoding": "mp3",
                "speed_ratio": 1.0,
                "volume_ratio": 1.0,
                "pitch_ratio": 1.0
            },
            "request": {
                "reqid": "req_" + os.urandom(8).hex(),
                "text": text,
                "text_type": "plain",
                "operation": "query"
            }
        }

        response = requests.post(url, headers=self.headers, json=payload, timeout=30)
        response.raise_for_status()  # surface HTTP errors (401, 429, ...) as requests.HTTPError
        data = response.json()

        if data.get("code") == 3000:
            audio_data = base64.b64decode(data["data"])
            with open(output_file, "wb") as f:
                f.write(audio_data)
            return output_file
        else:
            raise Exception(f"TTS failed: {data}")

    def list_voices(self) -> list:
        """Get the list of available voices."""
        url = f"{self.BASE_URL}/tts/voices"
        params = {"appid": self.app_id}

        response = requests.get(url, headers=self.headers, params=params, timeout=30)
        response.raise_for_status()
        data = response.json()

        if data.get("code") == 0:
            return data["voices"]
        else:
            raise Exception(f"Failed to get the voice list: {data}")


# ==================== Usage examples ====================
if __name__ == "__main__":
    voice = DoubaoVoice()

    # Example 1: text to speech ("Hello, I am the Doubao voice assistant")
    audio_file = voice.text_to_speech("你好,我是豆包语音助手")
    print(f"Speech generated: {audio_file}")

    # Example 2: list the available voices
    voices = voice.list_voices()
    for v in voices[:5]:
        print(f"{v['voice_type']}: {v['description']}")
```

---

## 2. Singing (Doubao End-to-End Real-Time Speech Model)

### 2.1 Basic singing

Makes Doubao sing on any song topic.

**Natural-language examples**:
- "Please sing a song about spring"
- "Sing a gentle lullaby"
- "Sing a cheerful children's song"

**Python implementation**:

```python
import asyncio

from scripts.singing import DoubaoSinging

async def main():
    singing = DoubaoSinging()

    # Make Doubao sing ("Please sing a song about spring").
    # sing() saves raw PCM, so "spring_song.mp3" is written as "spring_song.pcm";
    # it returns None if no audio was received.
    audio_file = await singing.sing(
        "请唱一首关于春天的歌",
        output_file="spring_song.mp3",
        language="zh-CN"
    )
    print(f"Singing finished: {audio_file}")

asyncio.run(main())
```

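`sing()` writes raw PCM (24000 Hz, mono) even when the requested file name ends in `.mp3`, so most media players cannot open the result directly. Python's standard `wave` module can wrap it in a WAV header. A sketch with a hypothetical helper name; it assumes 16-bit signed little-endian samples, which the script does not state, and if the service actually streams 32-bit float PCM the header written here would be wrong:

```python
import wave


def pcm_to_wav(pcm_path: str, wav_path: str, sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> float:
    """Wrap raw PCM in a WAV container and return the audio duration in seconds.

    sample_width=2 (16-bit integer samples) is an assumption, not a documented value.
    """
    with open(pcm_path, "rb") as f:
        pcm = f.read()
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    # Bytes per second = sample_rate * channels * sample_width
    return len(pcm) / (sample_rate * channels * sample_width)
```

For example, `pcm_to_wav("spring_song.pcm", "spring_song.wav")` under these assumptions turns 96,000 bytes of audio into a 2-second WAV file.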
### 2.2 Interactive singing

Have a real-time conversation with Doubao; you can ask her to sing, tell stories, and so on.

**Python implementation**:

```python
import asyncio

from scripts.singing import DoubaoSinging

async def main():
    singing = DoubaoSinging()

    # Start interactive mode
    await singing.interactive_singing(language="zh-CN")

asyncio.run(main())
```

**Sample interaction**:
```
You: Please sing a love song
Doubao: [generates audio] I'll sing you a tender love song...

You: Can you add some dialect?
Doubao: [sings again in dialect]

You: quit
Goodbye!
```

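A turn-by-turn version of this exchange can be approximated on top of `sing()` alone. A sketch: `interactive_loop` is a hypothetical helper, and `sing_fn` is expected to behave like `DoubaoSinging.sing(song_request, output_file)`. Because each `sing()` call opens a new WebSocket session, context from earlier turns (such as "add some dialect") is not carried over:

```python
from typing import Awaitable, Callable, Iterable, List, Optional

# Same shape as DoubaoSinging.sing(song_request, output_file)
SingFn = Callable[[str, str], Awaitable[Optional[str]]]


async def interactive_loop(sing_fn: SingFn, lines: Iterable[str]) -> List[Optional[str]]:
    """Send each non-empty line as a singing request until the user types "quit"."""
    outputs: List[Optional[str]] = []
    for turn, line in enumerate(lines, start=1):
        request = line.strip()
        if request.lower() == "quit":
            print("Goodbye!")
            break
        if not request:
            continue
        # One output file per turn so earlier songs are not overwritten
        outputs.append(await sing_fn(request, f"turn_{turn}.mp3"))
    return outputs
```

Interactively this could run as `asyncio.run(interactive_loop(DoubaoSinging().sing, iter(input, None)))`, which keeps prompting until "quit" is entered.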
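---

## Natural-Language Operation Examples

### TTS operations

| User says | Action |
|--------|----------|
| "Convert this to speech: 你好世界" | Call the TTS API to generate speech |
| "Synthesize speech with the gentle female voice" | Use the BV406_streaming voice |
| "Generate a news clip in a broadcaster's voice" | Use the magnetic male voice (BV115_streaming) |

### Singing operations

| User says | Action |
|--------|----------|
| "Please sing a song about spring" | Call the end-to-end real-time speech model to generate singing audio |
| "Sing a lullaby" | Generate a gentle lullaby |
| "Tell a story while you sing" | Sing and tell a story in an interactive conversation |
| "Start interactive singing mode" | Start real-time voice interaction |

---

## Billing

### TTS billing

- **Concurrency plan**: 2000 CNY per concurrent channel per month (billed on concurrency only; no per-character charges)
- **Pay-as-you-go**: billed by the number of characters synthesized

### Free trial

New users receive a free quota after enabling the service; the console shows the exact amount.

---

## Notes

1. **Audio format**: TTS supports mp3/wav/pcm; `scripts/singing.py` saves raw PCM (24000 Hz, mono)
2. **Text length**: a single TTS request supports at most 5000 characters
3. **Rate limits**: observe the API's call-frequency and concurrency limits
4. **Token security**: keep the Access Token in environment variables; never hard-code it
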
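Texts longer than the per-request limit have to be split before synthesis. A sketch that prefers sentence boundaries and hard-wraps only sentences that exceed the limit on their own; `split_text` is a hypothetical helper, the default simply reuses the 5000-character figure from the notes, and if your endpoint measures the limit in bytes, compare `len(chunk.encode("utf-8"))` instead:

```python
import re
from typing import List

# Zero-width split points right after sentence-ending punctuation (Chinese and Western)
_SENTENCE_END = re.compile(r"(?<=[。!?!?.;;\n])")


def split_text(text: str, max_chars: int = 5000) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    chunks: List[str] = []
    current = ""
    for sentence in _SENTENCE_END.split(text):
        # Hard-wrap a single sentence that is longer than the limit
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized in turn, for example with `DoubaoVoice.text_to_speech(chunk, output_file=f"part_{i}.mp3")`.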
---

## Error Handling

```python
import requests

def safe_tts(text: str):
    """TTS with error handling, using the DoubaoVoice class above."""
    try:
        voice = DoubaoVoice()
        return voice.text_to_speech(text)
    except requests.HTTPError as e:
        status = e.response.status_code if e.response is not None else None
        if status == 401:
            print("Authentication failed; check the Access Token")
        elif status == 429:
            print("Too many requests; please retry later")
        else:
            print(f"Synthesis failed: {e}")
    except Exception as e:
        print(f"Synthesis failed: {e}")
    return None
```

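Rate-limit errors (HTTP 429) are usually worth retrying after a pause rather than failing immediately. A generic sketch with hypothetical helper names: exponential backoff with a little random jitter, and an `is_rate_limited` check that duck-types the `response.status_code` attribute carried by `requests.HTTPError`, so it only fires when the client raises on HTTP errors (for example via `raise_for_status()`):

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def is_rate_limited(exc: BaseException) -> bool:
    """True if the exception carries an HTTP response with status 429."""
    return getattr(getattr(exc, "response", None), "status_code", None) == 429


def retry_with_backoff(
    fn: Callable[[], T],
    should_retry: Callable[[BaseException], bool],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on a retryable error wait base_delay * 2**attempt (+ up to 10% jitter)."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception as e:
            if attempt == retries or not should_retry(e):
                raise
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

Usage: `retry_with_backoff(lambda: DoubaoVoice().text_to_speech("你好"), is_rate_limited)`.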
---

## Common Scenarios

### Scenario 1: Generating speech in multiple languages

```python
voice = DoubaoVoice()

# Chinese ("Hello")
voice.text_to_speech("你好", voice_type="BV700_V2_streaming", output_file="zh.mp3")

# English (EN_001 is the code given here; confirm it appears in your app's voice list)
voice.text_to_speech("Hello", voice_type="EN_001", output_file="en.mp3")
```

---

## References

- [Volcengine Doubao Voice documentation](https://www.volcengine.com/docs/6561/1359369)
- [Doubao Voice console](https://console.volcengine.com/speech/app)
- [API reference](https://www.volcengine.com/docs/6561/1359370)
- [Billing](https://www.volcengine.com/docs/6561/1359370)