refactor: split general-purpose skills into separate per-category directories

skills/ → skills-dev(9), skills-req(10), skills-ops(4), skills-integration(8), skills-biz(4), skills-workflow(7)
generate-marketplace.py now automatically scans all skills-* directories.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-14 11:31:58 +10:30
parent ea266e9cce
commit 712063071c
170 changed files with 341 additions and 346 deletions
@@ -0,0 +1,508 @@
+---
+name: doubao-voice
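+description: Calls the Doubao Voice API. Supports text-to-speech (TTS) and singing. Activates automatically when the user mentions speech synthesis, text-to-speech, singing, or other Doubao Voice tasks.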
+---
+
+# Doubao Voice API Skill
+
+Calls the Volcengine Doubao Voice API to provide text-to-speech (TTS) and singing.
+
+## Core Features ⭐
+
+### 1. Text-to-Speech (TTS)
+
+```bash
+# 1. Set environment variables
+export DOUBAO_APP_ID="your_app_id"
+export DOUBAO_ACCESS_TOKEN="your_access_token"
+
+# 2. Convert text to speech (the sample text means "Hello, world")
+python scripts/voice_converter.py tts "你好世界"
+```
+
+### 2. Singing 🎵
+
+```bash
+# Ask Doubao to sing (the prompt means "Please sing a song about spring")
+python scripts/singing.py sing "请唱一首关于春天的歌"
+
+# Interactive singing mode
+python scripts/singing.py interactive
+```
+
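+## Feature Overview
+
+| Module | Capabilities | Recommended model |
+|------|------|---------|
+| **Speech synthesis (TTS)** | Text-to-speech, multiple voices | Doubao Speech Synthesis Model 2.0 |
+| **Singing** | Real-time voice interaction, singing, role-play | Doubao end-to-end real-time speech model |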
+
+---
+
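+## Environment Setup
+
+### 1. Obtain Volcengine Doubao Voice credentials
+
+1. Open the [Volcengine console](https://console.volcengine.com/)
+2. Enable the "Doubao Voice" service
+3. Create an application to obtain an `App ID` and an `Access Token`
+4. Enable the required services:
+   - "Speech synthesis" permission: large-model speech synthesis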
+
+### 2. Configure environment variables
+
+```bash
+# ~/.zshrc or ~/.bashrc
+export DOUBAO_APP_ID="your_app_id"
+export DOUBAO_ACCESS_TOKEN="your_access_token"
+export DOUBAO_CLUSTER="volcano_tts"  # TTS service cluster
+```
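+
+Before calling the API it can help to confirm that the credentials are actually set. The sketch below is a hypothetical helper (`missing_credentials` is an illustrative name and is not part of this skill's scripts); it only checks the two variables that the examples in this document read without a fallback.
+
+```python
+import os
+from typing import List, Mapping, Optional
+
+# Variables the examples in this document read without a default.
+# DOUBAO_CLUSTER is left out because the code falls back to "volcano_tts".
+REQUIRED_VARS = ("DOUBAO_APP_ID", "DOUBAO_ACCESS_TOKEN")
+
+
+def missing_credentials(env: Optional[Mapping[str, str]] = None) -> List[str]:
+    """Return the names of required variables that are unset or empty."""
+    env = os.environ if env is None else env
+    return [name for name in REQUIRED_VARS if not env.get(name)]
+```
+
+A caller could stop early with a clear message when `missing_credentials()` returns a non-empty list, rather than sending a request that is bound to be rejected.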
+
+### 3. Python dependencies
+
+```bash
+# uv is recommended
+uv pip install requests websocket-client
+
+# Or use pip
+pip install requests websocket-client
+```
+
+---
+
+## API Basics
+
+### Base URL
+
+```
+TTS API: https://openspeech.bytedance.com/api/v1/tts
+```
+
+### Authentication
+
+Authenticate with the Access Token by adding this request header:
+```
+Authorization: Bearer {access_token}
+```
+
+---
+
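+## 1. Speech Synthesis (TTS)
+
+### 1.1 Basic speech synthesis
+
+Converts text into an audio file.
+
+**Natural-language examples**:
+- "Convert this text to speech"
+- "Synthesize speech with Doubao"
+- "Generate speech: Hello, welcome to Doubao Voice"
+
+**Python implementation**: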
+
+```python
+import os
+import requests
+import json
+import base64
+
+def text_to_speech(text: str, voice_type: str = "BV700_V2_streaming", output_file: str = "output.mp3"):
+    """
+    Convert text to speech.
+
+    Args:
+        text: Text to synthesize
+        voice_type: Voice type (default: BV700_V2_streaming)
+        output_file: Path of the output audio file
+
+    Returns:
+        Path of the audio file
+    """
+    app_id = os.environ.get("DOUBAO_APP_ID")
+    access_token = os.environ.get("DOUBAO_ACCESS_TOKEN")
+    cluster = os.environ.get("DOUBAO_CLUSTER", "volcano_tts")
+
+    url = "https://openspeech.bytedance.com/api/v1/tts"
+
+    headers = {
+        "Authorization": f"Bearer {access_token}",
+        "Content-Type": "application/json"
+    }
+
+    payload = {
+        "app": {
+            "appid": app_id,
+            "token": access_token,
+            "cluster": cluster
+        },
+        "user": {
+            "uid": "user123"
+        },
+        "audio": {
+            "voice_type": voice_type,
+            "encoding": "mp3",
+            "speed_ratio": 1.0,
+            "volume_ratio": 1.0,
+            "pitch_ratio": 1.0
+        },
+        "request": {
+            "reqid": "req_" + os.urandom(8).hex(),
+            "text": text,
+            "text_type": "plain",
+            "operation": "query"
+        }
+    }
+
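+    # A timeout keeps the call from hanging indefinitely on network problems
+    response = requests.post(url, headers=headers, json=payload, timeout=30)
+    data = response.json()
+
+    # This example treats code 3000 as success
+    if data.get("code") == 3000:
+        # Decode the base64-encoded audio data
+        audio_data = base64.b64decode(data["data"])
+        with open(output_file, "wb") as f:
+            f.write(audio_data)
+        return output_file
+    else:
+        raise Exception(f"TTS failed: {data}")
+
+# Usage example (the input text means "Hello, I am the Doubao voice assistant")
+audio_file = text_to_speech("你好，我是豆包语音助手")
+print(f"Speech generated: {audio_file}")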
+```
+
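+### 1.2 Streaming speech synthesis
+
+Suited to long text: audio can be played while it is still being generated.
+
+```python
+import base64  # needed by on_message to decode audio chunks
+import json
+import os
+
+import websocket
+
+def stream_tts(text: str, voice_type: str = "BV700_V2_streaming"):
+    """
+    Streaming speech synthesis.
+
+    Args:
+        text: Text to synthesize
+        voice_type: Voice type
+    """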
+    app_id = os.environ.get("DOUBAO_APP_ID")
+    access_token = os.environ.get("DOUBAO_ACCESS_TOKEN")
+
+    ws_url = f"wss://openspeech.bytedance.com/api/v1/tts/ws?appid={app_id}&token={access_token}"
+
+    def on_message(ws, message):
+        data = json.loads(message)
+        if "audio" in data:
+            # Handle the audio data
+            audio_chunk = base64.b64decode(data["audio"])
+            # Play or save the audio chunk
+            print(f"Received audio chunk: {len(audio_chunk)} bytes")
+
+    def on_open(ws):
+        payload = {
+            "app": {
+                "appid": app_id,
+                "token": access_token,
+                "cluster": "volcano_tts"
+            },
+            "user": {
+                "uid": "user123"
+            },
+            "audio": {
+                "voice_type": voice_type,
+                "encoding": "mp3"
+            },
+            "request": {
+                "reqid": "stream_" + os.urandom(8).hex(),
+                "text": text,
+                "text_type": "plain",
+                "operation": "submit"
+            }
+        }
+        ws.send(json.dumps(payload))
+
+    ws = websocket.WebSocketApp(
+        ws_url,
+        on_message=on_message,
+        on_open=on_open
+    )
+    ws.run_forever()
+
+# Usage example (the input is a long Chinese sample sentence about streaming playback)
+stream_tts("这是一段很长的文本，使用流式合成可以边生成边播放...")
+```
+
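+### 1.3 Choosing a voice
+
+Doubao Voice offers a range of voices:
+
+| Voice code | Description | Use case |
+|---------|------|------|
+| BV700_V2_streaming | General-purpose female voice | General use |
+| BV701_V2_streaming | General-purpose male voice | General use |
+| BV406_streaming | Gentle female voice | Customer service, assistants |
+| BV158_streaming | Lively female voice | Education, entertainment |
+| BV115_streaming | Resonant male voice | News, broadcasting |
+
+**List available voices**: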
+
+```bash
+TOKEN="${DOUBAO_ACCESS_TOKEN}"
+APP_ID="${DOUBAO_APP_ID}"
+
+curl -s "https://openspeech.bytedance.com/api/v1/tts/voices?appid=$APP_ID" \
+  -H "Authorization: Bearer $TOKEN"
+```
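+
+The voice table in this section can also be expressed as a small lookup, so that callers pick a voice by use case instead of memorizing codes. This is only a sketch: the voice codes are copied from the table, while the scenario keys and the names `VOICE_BY_SCENARIO` and `pick_voice` are illustrative assumptions rather than anything the API defines.
+
+```python
+# Voice codes copied from the table in this section; the scenario keys
+# are illustrative labels chosen for this sketch.
+VOICE_BY_SCENARIO = {
+    "general_female": "BV700_V2_streaming",
+    "general_male": "BV701_V2_streaming",
+    "customer_service": "BV406_streaming",
+    "education": "BV158_streaming",
+    "news": "BV115_streaming",
+}
+
+DEFAULT_VOICE = "BV700_V2_streaming"
+
+
+def pick_voice(scenario: str) -> str:
+    """Return the voice code for a scenario, or the default voice."""
+    return VOICE_BY_SCENARIO.get(scenario.strip().lower(), DEFAULT_VOICE)
+```
+
+Unknown scenarios fall back to the default voice used throughout this document, so a typo in the scenario name degrades gracefully instead of raising.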
+
+---
+
+## Complete Utility Class
+
+```python
+import os
+import requests
+import base64
+import json
+from typing import Optional
+
+class DoubaoVoice:
+    """Utility class for the Doubao Voice API"""
+
+    BASE_URL = "https://openspeech.bytedance.com/api/v1"
+
+    def __init__(self, app_id: Optional[str] = None, access_token: Optional[str] = None):
+        self.app_id = app_id or os.environ.get("DOUBAO_APP_ID")
+        self.access_token = access_token or os.environ.get("DOUBAO_ACCESS_TOKEN")
+        self.cluster_tts = os.environ.get("DOUBAO_CLUSTER", "volcano_tts")
+
+    @property
+    def headers(self):
+        return {
+            "Authorization": f"Bearer {self.access_token}",
+            "Content-Type": "application/json"
+        }
+
+    def text_to_speech(
+        self,
+        text: str,
+        voice_type: str = "BV700_V2_streaming",
+        output_file: str = "output.mp3"
+    ) -> str:
+        """Convert text to speech and write the audio to output_file"""
+        url = f"{self.BASE_URL}/tts"
+
+        payload = {
+            "app": {
+                "appid": self.app_id,
+                "token": self.access_token,
+                "cluster": self.cluster_tts
+            },
+            "user": {"uid": "user123"},
+            "audio": {
+                "voice_type": voice_type,
+                "encoding": "mp3",
+                "speed_ratio": 1.0,
+                "volume_ratio": 1.0,
+                "pitch_ratio": 1.0
+            },
+            "request": {
+                "reqid": "req_" + os.urandom(8).hex(),
+                "text": text,
+                "text_type": "plain",
+                "operation": "query"
+            }
+        }
+
+        response = requests.post(url, headers=self.headers, json=payload, timeout=30)
+        data = response.json()
+
+        if data.get("code") == 3000:
+            audio_data = base64.b64decode(data["data"])
+            with open(output_file, "wb") as f:
+                f.write(audio_data)
+            return output_file
+        else:
+            raise Exception(f"TTS failed: {data}")
+
+    def list_voices(self) -> list:
+        """Return the list of available voices"""
+        url = f"{self.BASE_URL}/tts/voices"
+        params = {"appid": self.app_id}
+
+        response = requests.get(url, headers=self.headers, params=params, timeout=30)
+        data = response.json()
+
+        if data.get("code") == 0:
+            return data["voices"]
+        else:
+            raise Exception(f"Failed to list voices: {data}")
+
+
+# ==================== Usage examples ====================
+if __name__ == "__main__":
+    voice = DoubaoVoice()
+
+    # Example 1: text to speech (the input means "Hello, I am the Doubao voice assistant")
+    audio_file = voice.text_to_speech("你好，我是豆包语音助手")
+    print(f"Speech generated: {audio_file}")
+
+    # Example 2: list available voices
+    voices = voice.list_voices()
+    for v in voices[:5]:
+        print(f"{v['voice_type']}: {v['description']}")
+```
+
+---
+
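+## 2. Singing (Doubao end-to-end real-time speech model)
+
+### 2.1 Basic singing
+
+Ask Doubao to sing; any song theme is supported.
+
+**Natural-language examples**:
+- "Please sing a song about spring"
+- "Sing a gentle lullaby"
+- "Give me a cheerful children's song"
+
+**Python implementation**: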
+
+```python
+import asyncio
+from scripts.singing import DoubaoSinging
+
+async def main():
+    singing = DoubaoSinging()
+
+    # Ask Doubao to sing (the prompt means "Please sing a song about spring")
+    audio_file = await singing.sing(
+        "请唱一首关于春天的歌",
+        output_file="spring_song.mp3",
+        language="zh-CN"
+    )
+    print(f"Singing finished: {audio_file}")
+
+asyncio.run(main())
+```
+
+### 2.2 Interactive singing
+
+Hold a real-time conversation with Doubao and ask it to sing, tell stories, and so on.
+
+**Python implementation**:
+
+```python
+import asyncio
+from scripts.singing import DoubaoSinging
+
+async def main():
+    singing = DoubaoSinging()
+
+    # Start interactive mode
+    await singing.interactive_singing(language="zh-CN")
+
+asyncio.run(main())
+```
+
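+**Sample session**:
+```
+You: Please sing a love song
+Doubao: [generates audio] I'll sing you a gentle love song...
+
+You: Can you add some dialect?
+Doubao: [sings again in dialect]
+
+You: quit
+Goodbye!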
+```
+
+---
+
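+## Natural-Language Operation Examples
+
+### TTS operations
+
+| User says | Action |
+|--------|----------|
+| "Convert this to speech: Hello, world" | Call the TTS API to generate speech |
+| "Synthesize speech with a gentle female voice" | Use the BV406_streaming voice |
+| "Generate news audio in a broadcaster's voice" | Use the resonant male voice (BV115_streaming) |
+
+### Singing operations
+
+| User says | Action |
+|--------|----------|
+| "Please sing a song about spring" | Call the end-to-end real-time speech model to generate singing audio |
+| "Sing a lullaby" | Generate a gentle lullaby |
+| "Sing and tell a story at the same time" | Sing and tell a story within an interactive conversation |
+| "Start interactive singing mode" | Start real-time voice interaction |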
+
+---
+
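+## Billing
+
+### TTS billing
+
+- **Concurrency plan**: 2000 CNY per concurrent connection per month (billed purely on concurrency, with no per-character charge)
+- **Pay-as-you-go**: billed by the number of characters synthesized
+
+### Free trial
+
+New users receive a free quota after enabling the service; the exact amount is shown in the console.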
+
+---
+
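+## Notes
+
+1. **Audio formats**: TTS supports mp3/wav/pcm
+2. **Text length**: a single TTS request accepts at most 5000 characters
+3. **Concurrency limits**: stay within the API's rate and concurrency limits
+4. **Token security**: keep the Access Token in an environment variable; never hard-code it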
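+
+Because a single TTS request is limited to 5000 characters, longer text has to be split before synthesis. The sketch below shows one possible approach under stated assumptions: `split_text` is a hypothetical helper, it counts Python characters (the service may count differently), and it prefers to cut after sentence-ending punctuation so that chunks do not break mid-sentence.
+
+```python
+from typing import List
+
+MAX_CHARS = 5000  # per-request limit stated in the notes of this document
+
+
+def split_text(text: str, limit: int = MAX_CHARS) -> List[str]:
+    """Split text into chunks of at most `limit` characters.
+
+    Prefers to cut after sentence-ending punctuation and falls back to
+    a hard cut when the window contains no such mark.
+    """
+    if limit <= 0:
+        raise ValueError("limit must be positive")
+    chunks = []
+    rest = text
+    while len(rest) > limit:
+        window = rest[:limit]
+        # Position of the last sentence-ending mark (Chinese or ASCII).
+        cut = max(window.rfind(ch) for ch in "。！？.!?\n")
+        end = cut + 1 if cut >= 0 else limit
+        chunks.append(rest[:end])
+        rest = rest[end:]
+    if rest:
+        chunks.append(rest)
+    return chunks
+```
+
+Each chunk could then be passed to `text_to_speech` with its own output file; joining the resulting audio files is left to the caller.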
+
+---
+
+## Error Handling
+
+```python
+def safe_tts(text: str):
+    """TTS with error handling"""
+    try:
+        voice = DoubaoVoice()
+        return voice.text_to_speech(text)
+    except Exception as e:
+        if "401" in str(e):
+            print("Authentication failed; check the Access Token")
+        elif "429" in str(e):
+            print("Too many requests; retry later")
+        else:
+            print(f"Synthesis failed: {e}")
+        return None
+        return None
+```
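+
+For rate-limit errors, retrying after a pause is often more useful than only printing a message. The following is a minimal sketch of exponential backoff, not part of this skill's scripts: `with_retry` and `is_rate_limited` are hypothetical names, and the check for "429" in the error text is the same heuristic that `safe_tts` uses.
+
+```python
+import time
+from typing import Callable, TypeVar
+
+T = TypeVar("T")
+
+
+def is_rate_limited(exc: Exception) -> bool:
+    """Heuristic borrowed from safe_tts: look for 429 in the error text."""
+    return "429" in str(exc)
+
+
+def with_retry(
+    call: Callable[[], T],
+    attempts: int = 3,
+    base_delay: float = 1.0,
+    should_retry: Callable[[Exception], bool] = is_rate_limited,
+    sleep: Callable[[float], None] = time.sleep,
+) -> T:
+    """Run call(), retrying with exponential backoff on retryable errors."""
+    if attempts < 1:
+        raise ValueError("attempts must be at least 1")
+    for attempt in range(attempts):
+        try:
+            return call()
+        except Exception as exc:
+            # Give up on the last attempt or on non-retryable errors.
+            if attempt == attempts - 1 or not should_retry(exc):
+                raise
+            # Wait base_delay, then 2x, 4x, ... before the next attempt.
+            sleep(base_delay * 2 ** attempt)
+    raise RuntimeError("unreachable")
+```
+
+A call such as `with_retry(lambda: DoubaoVoice().text_to_speech(text))` would retry only rate-limit failures; authentication errors are re-raised immediately because retrying them cannot succeed.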
+
+---
+
+## Common Scenarios
+
+### Scenario 1: Generate speech in multiple languages
+
+```python
+voice = DoubaoVoice()
+
+# Chinese
+voice.text_to_speech("你好", voice_type="BV700_V2_streaming", output_file="zh.mp3")
+
+# English
+voice.text_to_speech("Hello", voice_type="EN_001", output_file="en.mp3")
+```
+
+
+---
+
+## References
+
+- [Volcengine Doubao Voice documentation](https://www.volcengine.com/docs/6561/1359369)
+- [Doubao Voice console](https://console.volcengine.com/speech/app)
+- [API reference](https://www.volcengine.com/docs/6561/1359370)
+- [Billing](https://www.volcengine.com/docs/6561/1359370)