refactor: update audio file paths and adjust UI styles
fix: correct null values in the TTS provider configuration
chore: clean up unused files and update the input text content

server/SECURITY.md (new file)
@@ -0,0 +1,218 @@
# Security Configuration Guide

This document describes the security configuration and best practices for the podcast generator application.

## Authentication Security

### JWT Token Management

1. **Key security**
   - Use a strong random secret (at least 32 characters)
   - Rotate the JWT secret periodically
   - Store the secret in environment variables in production

2. **Token expiry policy** (see the sketch after this list)
   - Access tokens: expire after 30 minutes
   - Refresh tokens: expire after 7 days
   - Implement a token blacklist

3. **Token storage**
   - Store refresh tokens in HttpOnly cookies
   - Keep access tokens in memory
   - Enable the Secure and SameSite attributes
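To make the expiry policy concrete, here is a minimal sketch of issuing and verifying access tokens. It assumes the PyJWT package and the `JWT_SECRET_KEY` environment variable introduced later in this guide; it is an illustration, not the application's actual auth code.

```python
# Illustrative sketch only -- assumes the PyJWT package (pip install PyJWT)
# and the JWT_SECRET_KEY environment variable described in this guide.
import os
from datetime import datetime, timedelta, timezone

import jwt  # PyJWT

SECRET = os.environ["JWT_SECRET_KEY"]  # strong random value, >= 32 chars

def issue_access_token(user_id: str) -> str:
    """Create a 30-minute access token, per the expiry policy above."""
    payload = {
        "sub": user_id,
        "iat": datetime.now(timezone.utc),
        "exp": datetime.now(timezone.utc) + timedelta(minutes=30),
    }
    return jwt.encode(payload, SECRET, algorithm="HS256")

def verify_access_token(token: str) -> dict:
    """Decode and validate; raises jwt.ExpiredSignatureError / jwt.InvalidTokenError."""
    return jwt.decode(token, SECRET, algorithms=["HS256"])
```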
### OAuth Security

1. **Google OAuth**
   - Validate the state parameter to prevent CSRF attacks (see the sketch after this list)
   - Use HTTPS redirect URIs
   - Restrict the authorization scopes

2. **WeChat OAuth**
   - Validate the QR-code scene ID
   - Set a reasonable QR-code expiry time
   - Verify the callback origin
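A minimal sketch of the state check, assuming the state value is stored server-side (for example in the session) between the authorize redirect and the provider callback; the function names are illustrative:

```python
# Illustrative sketch: generate and verify the OAuth state parameter.
import hmac
import secrets

def new_state() -> str:
    # Unguessable value; store it server-side (e.g. in the session)
    # before redirecting to the provider.
    return secrets.token_urlsafe(32)

def state_is_valid(stored_state: str, returned_state: str) -> bool:
    # Constant-time comparison of the stored value against the one
    # the provider sends back on the callback.
    return hmac.compare_digest(stored_state, returned_state)
```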
## Data Protection

### User Data Security

1. **Data encryption**
   - Encrypt sensitive data with AES-256
   - Hash passwords with bcrypt (see the sketch after this list)
   - Use TLS 1.3 at the transport layer

2. **Data minimization**
   - Collect only the user information you need
   - Purge expired data regularly
   - Anonymize data where possible
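A minimal sketch of the password-hashing policy, assuming the bcrypt package; the function names are illustrative:

```python
# Illustrative sketch -- assumes the bcrypt package (pip install bcrypt).
import bcrypt

def hash_password(password: str) -> bytes:
    # gensalt() embeds a per-password salt and work factor in the hash.
    return bcrypt.hashpw(password.encode("utf-8"), bcrypt.gensalt())

def check_password(password: str, hashed: bytes) -> bool:
    return bcrypt.checkpw(password.encode("utf-8"), hashed)
```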
### Session Management

1. **Session security**
   - Enforce session timeouts
   - Detect anomalous login behavior
   - Support forced logout

2. **Concurrency control**
   - Cap the number of concurrent sessions per user (see the sketch after this list)
   - Detect duplicate logins
   - Provide device management
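A minimal in-memory sketch of the per-user session cap; a real deployment would keep this state in Redis or the database, and the names here are illustrative:

```python
# Illustrative in-memory sketch of a per-user concurrent-session cap.
from collections import defaultdict, deque

MAX_SESSIONS = 3
_sessions: dict[str, deque] = defaultdict(deque)  # user_id -> session ids

def register_session(user_id: str, session_id: str) -> None:
    sessions = _sessions[user_id]
    sessions.append(session_id)
    while len(sessions) > MAX_SESSIONS:
        evicted = sessions.popleft()  # force-logout the oldest session
        print(f"Evicting session {evicted} for user {user_id}")
```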
## API Security

### Request Validation

1. **Input validation**
   - Validate all user input
   - Filter with allowlists
   - Guard against SQL injection and XSS

2. **Rate limiting**
   - Throttle API call frequency (see the sketch after this list)
   - Defend against brute-force attacks
   - Monitor for anomalous request patterns
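A minimal sketch of a fixed-window limiter as a FastAPI dependency; it is in-memory and single-process, so treat it as an illustration rather than production rate limiting:

```python
# Illustrative fixed-window rate limiter as a FastAPI dependency;
# in-memory and single-process only -- use Redis or a gateway in production.
import time
from collections import defaultdict

from fastapi import HTTPException, Request

WINDOW_SECONDS = 60
MAX_REQUESTS = 30
_hits: dict[str, list[float]] = defaultdict(list)

async def rate_limit(request: Request) -> None:
    key = request.client.host  # or the authenticated user id
    now = time.time()
    recent = [t for t in _hits[key] if now - t < WINDOW_SECONDS]
    if len(recent) >= MAX_REQUESTS:
        raise HTTPException(status_code=429, detail="Too many requests.")
    recent.append(now)
    _hits[key] = recent

# Usage: @app.post("/generate-podcast", dependencies=[Depends(rate_limit)])
```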
### CORS Configuration

```python
# Production CORS configuration (FastAPI / Starlette middleware)
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://yourdomain.com"],
    allow_credentials=True,
    allow_methods=["GET", "POST", "PUT", "DELETE"],
    allow_headers=["*"],
)
```
## Environment Configuration

### Production Security

1. **Environment variables** (a secret-generation snippet follows this list)
   ```bash
   # Default values that MUST be changed
   JWT_SECRET_KEY=your-production-jwt-secret-key
   PODCAST_API_SECRET_KEY=your-production-api-secret-key

   # Database security
   DATABASE_URL=postgresql://user:password@localhost/dbname

   # Redis security
   REDIS_URL=redis://user:password@localhost:6379
   ```

2. **Server configuration**
   - Disable debug mode
   - Configure firewall rules
   - Enable log monitoring
   - Apply security updates regularly
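One way to generate strong random values for the secrets above, using only the Python standard library:

```python
# Generate a strong random secret for JWT_SECRET_KEY / PODCAST_API_SECRET_KEY.
import secrets

print(secrets.token_urlsafe(48))  # ~64 URL-safe characters, well above the 32-char minimum
```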
### Development Security

1. **Local development**
   - Use keys that differ from production
   - Enable verbose logging
   - Work with test data

2. **Code security**
   - Never commit secrets to version control
   - Exclude config files via .gitignore
   - Run regular security code reviews
## Monitoring and Logging

### Security Monitoring

1. **Logging**
   - Log every authentication event (a logging sketch follows this section)
   - Monitor failed login attempts
   - Record permission changes

2. **Anomaly detection**
   - Monitor unusual API calls
   - Detect suspicious user behavior
   - Raise security alerts in real time

### Audit Trail

1. **User action auditing**
   - Record key user actions
   - Retain audit logs
   - Support compliance requirements

2. **System event logging**
   - Record system configuration changes
   - Monitor resource usage
   - Track performance metrics
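A minimal sketch of structured logging for authentication events, using the standard library; the logger name and fields are illustrative, not an existing interface in this project:

```python
# Illustrative auth-event logger using the standard library.
import logging

auth_log = logging.getLogger("security.auth")
handler = logging.FileHandler("auth_events.log")
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
auth_log.addHandler(handler)
auth_log.setLevel(logging.INFO)

def log_auth_event(event: str, user_id: str, ip: str, success: bool) -> None:
    # Failed attempts are logged at WARNING so they stand out for monitoring.
    level = logging.INFO if success else logging.WARNING
    auth_log.log(level, "event=%s user=%s ip=%s success=%s", event, user_id, ip, success)
```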
## Security Checklist

### Pre-deployment

- [ ] Change all default keys and passwords
- [ ] Configure HTTPS and SSL certificates
- [ ] Enable firewalls and security groups
- [ ] Configure backup and recovery
- [ ] Test every security feature
- [ ] Run a penetration test
- [ ] Configure monitoring and alerting
- [ ] Prepare an incident response plan

### Recurring Maintenance

- [ ] Update dependencies and apply security patches
- [ ] Rotate keys and certificates
- [ ] Review user permissions
- [ ] Check logs and monitoring
- [ ] Verify backups and test recovery
- [ ] Run security training and awareness sessions
## Incident Response

### Incident Classification

1. **High severity**
   - Data breach
   - Unauthorized access
   - System intrusion

2. **Medium severity**
   - Anomalous logins
   - API abuse
   - Privilege escalation

3. **Low severity**
   - Password resets
   - Account lockouts
   - Configuration changes

### Response Process

1. **Detection**
   - Automated monitoring alerts
   - User reports
   - Scheduled security scans

2. **Response**
   - Isolate affected systems immediately
   - Assess the scope of impact
   - Notify the relevant people
   - Collect evidence and logs

3. **Recovery**
   - Fix the vulnerability
   - Restore normal service
   - Update security policies
   - Capture lessons learned
## Contact Information

If you discover a security vulnerability, please contact us through any of the following channels:

- Email: security@yourcompany.com
- Encrypted communication: use our PGP key
- Emergency: +86-xxx-xxxx-xxxx

We commit to responding to security reports within 24 hours and to fixing confirmed vulnerabilities within a reasonable time frame.
server/check/check_doubao_voices.py (new file)
@@ -0,0 +1,107 @@
import base64
import copy
import json
import time

import requests


def check_doubao_tts_voices():
    config_file_path = "../config/doubao-tts.json"
    tts_providers_path = "../config/tts_providers.json"
    test_text = "你好"  # test text

    try:
        with open(config_file_path, 'r', encoding='utf-8') as f:
            config_data = json.load(f)
    except FileNotFoundError:
        print(f"Error: config file not found, check the path: {config_file_path}")
        return
    except json.JSONDecodeError:
        print(f"Error: cannot parse JSON file: {config_file_path}")
        return

    url = config_data.get("apiUrl", "")
    headers = config_data.get("headers", {})
    request_payload = config_data.get("request_payload", {})
    voices = config_data.get('voices', [])

    try:
        with open(tts_providers_path, 'r', encoding='utf-8') as f:
            tts_providers_data = json.load(f)
        doubao_config = tts_providers_data.get('doubao', {})
        doubao_app_id = doubao_config.get('X-Api-App-Id')
        doubao_access_key = doubao_config.get('X-Api-Access-Key')

        if doubao_app_id and doubao_access_key:
            headers['X-Api-App-Id'] = doubao_app_id
            headers['X-Api-Access-Key'] = doubao_access_key
        else:
            print(f"Warning: X-Api-App-Id or X-Api-Access-Key for Doubao not found in {tts_providers_path}.")
    except FileNotFoundError:
        print(f"Error: TTS provider config file not found, check the path: {tts_providers_path}")
        return
    except json.JSONDecodeError:
        print(f"Error: cannot parse TTS provider JSON file: {tts_providers_path}")
        return

    print(f"Validating {len(voices)} Doubao TTS voices...")

    for voice in voices:
        voice_code = voice.get('code')
        voice_name = voice.get('alias', voice.get('name', 'unknown'))  # prefer alias, fall back to name

        if voice_code:
            print(f"Testing voice: {voice_name} (Code: {voice_code})")
            session = requests.Session()
            try:
                # deepcopy: req_params is nested, so a shallow copy would
                # mutate the shared template between iterations
                payload = copy.deepcopy(request_payload)
                payload['req_params']['text'] = test_text
                payload['req_params']['speaker'] = voice_code

                response = session.post(url, headers=headers, json=payload, stream=True, timeout=30)

                logid = response.headers.get('X-Tt-Logid')
                if logid:
                    print(f"  X-Tt-Logid: {logid}")

                audio_data = bytearray()
                if response.status_code == 200:
                    for chunk in response.iter_lines(decode_unicode=True):
                        if not chunk:
                            continue
                        data = json.loads(chunk)

                        if data.get("code", 0) == 0 and "data" in data and data["data"]:
                            chunk_audio = base64.b64decode(data["data"])
                            audio_data.extend(chunk_audio)
                            continue
                        if data.get("code", 0) == 0 and "sentence" in data and data["sentence"]:
                            continue
                        if data.get("code", 0) == 20000000:  # end-of-stream marker
                            break
                        if data.get("code", 0) > 0:
                            print(f"  ❌ {voice_name} (Code: {voice_code}): API returned an error: {data}")
                            audio_data = bytearray()
                            break

                if audio_data:
                    print(f"  ✅ {voice_name} (Code: {voice_code}): available")
                    with open(f"test_{voice_code}.mp3", "wb") as f:
                        f.write(audio_data)
                elif not audio_data and response.status_code == 200:
                    print(f"  ❌ {voice_name} (Code: {voice_code}): API returned success but no audio data was received.")
                else:
                    print(f"  ❌ {voice_name} (Code: {voice_code}): unavailable, HTTP status: {response.status_code}, response: {response.text}")

            except requests.exceptions.RequestException as e:
                print(f"  ❌ {voice_name} (Code: {voice_code}): request failed, error: {e}")
            finally:
                session.close()
                time.sleep(0.5)  # brief delay to avoid hammering the API
        else:
            print(f"Skipping a voice entry that is missing the 'code' field: {voice}")

    print("Doubao TTS voice validation complete.")


if __name__ == "__main__":
    check_doubao_tts_voices()
server/check/check_edgetts_voices.py (new file)
@@ -0,0 +1,48 @@
import json
import time

import requests


def check_tts_voices():
    config_file_path = "config/edge-tts.json"
    base_url = "http://192.168.1.178:7899/tts"
    test_text = "你好"
    rate = 5  # the 'r' query parameter is assumed to mean speech rate

    try:
        with open(config_file_path, 'r', encoding='utf-8') as f:
            config_data = json.load(f)
    except FileNotFoundError:
        print(f"Error: config file not found, check the path: {config_file_path}")
        return
    except json.JSONDecodeError:
        print(f"Error: cannot parse JSON file: {config_file_path}")
        return

    voices = config_data.get('voices', [])
    if not voices:
        print("No voices found in the config file.")
        return

    print(f"Validating {len(voices)} TTS voices...")
    for voice in voices:
        voice_code = voice.get('code')
        voice_name = voice.get('name', 'unknown')
        if voice_code:
            url = f"{base_url}?t={test_text}&v={voice_code}&r={rate}"
            print(f"Testing voice: {voice_name} (Code: {voice_code}) - URL: {url}")
            try:
                response = requests.get(url, timeout=10)  # 10-second timeout
                if response.status_code == 200:
                    print(f"  ✅ {voice_name} (Code: {voice_code}): available")
                else:
                    print(f"  ❌ {voice_name} (Code: {voice_code}): unavailable, status code: {response.status_code}")
            except requests.exceptions.RequestException as e:
                print(f"  ❌ {voice_name} (Code: {voice_code}): request failed, error: {e}")
            time.sleep(0.1)  # brief delay to avoid sending requests too fast
        else:
            print(f"Skipping a voice entry that is missing the 'code' field: {voice}")

    print("TTS voice validation complete.")


if __name__ == "__main__":
    check_tts_voices()
server/check/check_fishaudio_voices.py (new file)
@@ -0,0 +1,80 @@
import json
import time

import msgpack
import requests


def check_fishaudio_voices():
    config_file_path = "../config/fish-audio.json"
    tts_providers_path = "../config/tts_providers.json"
    test_text = "你好"  # test text

    try:
        with open(config_file_path, 'r', encoding='utf-8') as f:
            config_data = json.load(f)
    except FileNotFoundError:
        print(f"Error: config file not found, check the path: {config_file_path}")
        return
    except json.JSONDecodeError:
        print(f"Error: cannot parse JSON file: {config_file_path}")
        return

    voices = config_data.get('voices', [])
    request_payload = config_data.get('request_payload', {})
    headers = config_data.get('headers', {})
    url = config_data.get('apiUrl', '')

    try:
        with open(tts_providers_path, 'r', encoding='utf-8') as f:
            tts_providers_data = json.load(f)
        fish_api_key = tts_providers_data.get('fish', {}).get('api_key')
        if fish_api_key:
            headers['Authorization'] = f"Bearer {fish_api_key}"
        else:
            print(f"Warning: no Fish Audio API key found in {tts_providers_path}.")
    except FileNotFoundError:
        print(f"Error: TTS provider config file not found, check the path: {tts_providers_path}")
        return
    except json.JSONDecodeError:
        print(f"Error: cannot parse TTS provider JSON file: {tts_providers_path}")
        return

    if not voices:
        print("No voices found in the config file.")
        return

    print(f"Validating {len(voices)} Fish Audio voices...")
    for voice in voices:
        voice_code = voice.get('code')
        voice_name = voice.get('alias', voice.get('name', 'unknown'))  # prefer alias, fall back to name

        if voice_code:
            print(f"Testing voice: {voice_name} (Code: {voice_code})")
            try:
                # Prepare the request payload (only top-level keys change,
                # so a shallow copy is sufficient here)
                payload = request_payload.copy()
                payload['text'] = test_text
                payload['reference_id'] = voice_code

                # Fish Audio expects a MessagePack-encoded request body
                encoded_payload = msgpack.packb(payload)

                # Send the request
                response = requests.post(url, data=encoded_payload, headers=headers, timeout=30)

                if response.status_code == 200:
                    print(f"  ✅ {voice_name} (Code: {voice_code}): available")
                    with open(f"test_{voice_code}.mp3", "wb") as f:
                        f.write(response.content)
                else:
                    print(f"  ❌ {voice_name} (Code: {voice_code}): unavailable, status code: {response.status_code}")
            except requests.exceptions.RequestException as e:
                print(f"  ❌ {voice_name} (Code: {voice_code}): request failed, error: {e}")
            time.sleep(0.5)  # brief delay to avoid sending requests too fast
        else:
            print(f"Skipping a voice entry that is missing the 'code' field: {voice}")

    print("Fish Audio voice validation complete.")


if __name__ == "__main__":
    check_fishaudio_voices()
server/check/check_gemini_voices.py (new file)
@@ -0,0 +1,87 @@
import base64
import json
import time
import wave

import requests


def check_gemini_voices():
    config_file_path = "../config/gemini-tts.json"
    tts_providers_path = "../config/tts_providers.json"
    test_text = "你好"  # test text

    try:
        with open(config_file_path, 'r', encoding='utf-8') as f:
            config_data = json.load(f)
    except FileNotFoundError:
        print(f"Error: config file not found, check the path: {config_file_path}")
        return
    except json.JSONDecodeError:
        print(f"Error: cannot parse JSON file: {config_file_path}")
        return

    voices = config_data.get('voices', [])
    request_payload = config_data.get('request_payload', {})
    headers = config_data.get('headers', {})
    url = config_data.get('apiUrl', '')

    try:
        with open(tts_providers_path, 'r', encoding='utf-8') as f:
            tts_providers_data = json.load(f)
        gemini_api_key = tts_providers_data.get('gemini', {}).get('api_key')
        if gemini_api_key:
            headers['x-goog-api-key'] = gemini_api_key
        else:
            print(f"Warning: no Gemini API key found in {tts_providers_path}.")
    except FileNotFoundError:
        print(f"Error: TTS provider config file not found, check the path: {tts_providers_path}")
        return
    except json.JSONDecodeError:
        print(f"Error: cannot parse TTS provider JSON file: {tts_providers_path}")
        return

    if not voices:
        print("No voices found in the config file.")
        return

    print(f"Validating {len(voices)} Gemini voices...")
    for voice in voices:
        voice_code = voice.get('code')
        voice_name = voice.get('alias', voice.get('name', 'unknown'))  # prefer alias, fall back to name

        if voice_code:
            print(f"Testing voice: {voice_name} (Code: {voice_code})")
            try:
                url = url.replace('{{model}}', request_payload['model'])
                request_payload['contents'][0]['parts'][0]['text'] = test_text
                request_payload['generationConfig']['speechConfig']['voiceConfig']['prebuiltVoiceConfig']['voiceName'] = voice_code

                response = requests.post(url, headers=headers, json=request_payload, timeout=60)

                if response.status_code == 200:
                    response_data = response.json()
                    audio_data_base64 = response_data['candidates'][0]['content']['parts'][0]['inlineData']['data']
                    audio_data_pcm = base64.b64decode(audio_data_base64)

                    print(f"  ✅ {voice_name} (Code: {voice_code}): available")
                    # The API returns raw PCM, so wrap it in a WAV container;
                    # save as .wav (not .mp3 -- no MP3 encoding happens here)
                    with wave.open(f"test_{voice_code}.wav", "wb") as f:
                        f.setnchannels(1)
                        f.setsampwidth(2)
                        f.setframerate(24000)
                        f.writeframes(audio_data_pcm)
                else:
                    print(f"  ❌ {voice_name} (Code: {voice_code}): unavailable, status code: {response.status_code}, response: {response.text}")
            except requests.exceptions.RequestException as e:
                print(f"  ❌ {voice_name} (Code: {voice_code}): request failed, error: {e}")
            except Exception as e:
                print(f"  ❌ {voice_name} (Code: {voice_code}): failed to process the response, error: {e}")
            time.sleep(0.5)  # brief delay to avoid sending requests too fast
        else:
            print(f"Skipping a voice entry that is missing the 'code' field: {voice}")

    print("Gemini voice validation complete.")


if __name__ == "__main__":
    check_gemini_voices()
server/check/check_indextts_voices.py (new file)
@@ -0,0 +1,56 @@
import json
import time

import requests


def check_indextts_voices():
    config_file_path = "config/index-tts.json"
    test_text = "你好"  # test text

    try:
        with open(config_file_path, 'r', encoding='utf-8') as f:
            config_data = json.load(f)
    except FileNotFoundError:
        print(f"Error: config file not found, check the path: {config_file_path}")
        return
    except json.JSONDecodeError:
        print(f"Error: cannot parse JSON file: {config_file_path}")
        return

    voices = config_data.get('voices', [])
    api_url_template = config_data.get('apiUrl')

    if not voices:
        print("No voices found in the config file.")
        return

    if not api_url_template:
        print("No 'apiUrl' field found in the config file.")
        return

    print(f"Validating {len(voices)} IndexTTS voices...")
    for voice in voices:
        voice_code = voice.get('code')
        voice_name = voice.get('alias', voice.get('name', 'unknown'))  # prefer alias, fall back to name

        if voice_code:
            # Fill in the placeholders in the URL template
            url = api_url_template.replace("{{text}}", test_text).replace("{{voiceCode}}", voice_code)

            print(f"Testing voice: {voice_name} (Code: {voice_code}) - URL: {url}")
            try:
                response = requests.get(url, timeout=10)  # 10-second timeout
                if response.status_code == 200:
                    print(f"  ✅ {voice_name} (Code: {voice_code}): available")
                else:
                    print(f"  ❌ {voice_name} (Code: {voice_code}): unavailable, status code: {response.status_code}")
            except requests.exceptions.RequestException as e:
                print(f"  ❌ {voice_name} (Code: {voice_code}): request failed, error: {e}")
            time.sleep(0.1)  # brief delay to avoid sending requests too fast
        else:
            print(f"Skipping a voice entry that is missing the 'code' field: {voice}")

    print("IndexTTS voice validation complete.")


if __name__ == "__main__":
    check_indextts_voices()
server/check/check_minimax_voices.py (new file)
@@ -0,0 +1,100 @@
import copy
import json
import time

import requests


def check_minimax_voices():
    config_file_path = "../config/minimax.json"
    tts_providers_path = "../config/tts_providers.json"
    test_text = "你好"  # test text

    try:
        with open(config_file_path, 'r', encoding='utf-8') as f:
            config_data = json.load(f)
    except FileNotFoundError:
        print(f"Error: config file not found, check the path: {config_file_path}")
        return
    except json.JSONDecodeError:
        print(f"Error: cannot parse JSON file: {config_file_path}")
        return

    voices = config_data.get('voices', [])
    request_payload = config_data.get('request_payload', {})
    headers = config_data.get('headers', {})
    url = config_data.get('apiUrl', '')

    try:
        with open(tts_providers_path, 'r', encoding='utf-8') as f:
            tts_providers_data = json.load(f)
        minimax_config = tts_providers_data.get('minimax', {})
        minimax_api_key = minimax_config.get('api_key')
        minimax_group_id = minimax_config.get('group_id')

        if minimax_api_key and minimax_group_id:
            headers['Authorization'] = f"Bearer {minimax_api_key}"
            url = url.replace('{{group_id}}', minimax_group_id)
        else:
            print(f"Warning: no Minimax group_id or api_key found in {tts_providers_path}.")
    except FileNotFoundError:
        print(f"Error: TTS provider config file not found, check the path: {tts_providers_path}")
        return
    except json.JSONDecodeError:
        print(f"Error: cannot parse TTS provider JSON file: {tts_providers_path}")
        return

    if not voices:
        print("No voices found in the config file.")
        return

    print(f"Validating {len(voices)} Minimax voices...")
    for voice in voices:
        voice_code = voice.get('code')
        voice_name = voice.get('alias', voice.get('name', 'unknown'))  # prefer alias, fall back to name

        if voice_code:
            print(f"Testing voice: {voice_name} (Code: {voice_code})")
            try:
                # deepcopy: voice_setting is nested, so a shallow copy would
                # mutate the shared template between iterations
                payload = copy.deepcopy(request_payload)
                payload['text'] = test_text
                payload['voice_setting']['voice_id'] = voice_code

                # Send the request
                response = requests.post(url, json=payload, headers=headers, timeout=30)

                if response.status_code == 200:
                    # Check the status field in the response body
                    try:
                        response_data = response.json()
                        status = response_data.get('data', {}).get('status', 0)
                        if status == 2:
                            print(f"  ✅ {voice_name} (Code: {voice_code}): available")
                            # Decode and save the hex-encoded audio data
                            audio_hex = response_data.get('data', {}).get('audio')
                            if audio_hex:
                                try:
                                    audio_bytes = bytes.fromhex(audio_hex)
                                    with open(f"test_{voice_code}.mp3", "wb") as audio_file:
                                        audio_file.write(audio_bytes)
                                except (ValueError, OSError) as e:
                                    # ValueError for invalid hex, OSError for file errors
                                    print(f"  ❌ Error while saving the audio file: {e}")
                            else:
                                print("  ⚠️ No audio data found in the response.")
                        else:
                            print(f"  ❌ {voice_name} (Code: {voice_code}): unavailable, status: {status}")
                    except json.JSONDecodeError:
                        print(f"  ❌ {voice_name} (Code: {voice_code}): cannot parse the response JSON")
                else:
                    print(f"  ❌ {voice_name} (Code: {voice_code}): unavailable, status code: {response.status_code}")
            except requests.exceptions.RequestException as e:
                print(f"  ❌ {voice_name} (Code: {voice_code}): request failed, error: {e}")
            time.sleep(0.5)  # brief delay to avoid sending requests too fast
        else:
            print(f"Skipping a voice entry that is missing the 'code' field: {voice}")

    print("Minimax voice validation complete.")


if __name__ == "__main__":
    check_minimax_voices()
server/ext/doubao-voice-list.py (new file)
@@ -0,0 +1,72 @@
import json

import requests

# --- Configuration ---
# Replace this URL with the actual URL you want to fetch data from
URL = "https://lf3-config.bytetcc.com/obj/tcc-config-web/tcc-v2-data-lab.speech.tts_middle_layer-default"  # <--- replace with your URL
OUTPUT_FILENAME = "data_from_url_volc_bigtts.json"
# Request timeout in seconds, so the program never waits indefinitely on network issues
TIMEOUT = 10

print(f"Fetching data from URL: {URL}")

# --- Main logic ---
try:
    # 1. Send a GET request to the URL
    #    a. requests.get() sends the request
    #    b. passing timeout=TIMEOUT is good practice so the program cannot hang
    response = requests.get(URL, timeout=TIMEOUT)

    # 2. Check the response status code to confirm success (e.g. 200 OK)
    #    response.raise_for_status() raises for 4xx/5xx (client/server errors)
    response.raise_for_status()
    print("✅ HTTP request succeeded, status code: 200 OK")

    # 3. Parse the outer JSON
    #    requests' .json() parses the response body into a Python dict;
    #    this is the first of the two parses we need
    outer_data = response.json()

    # 4. Extract the inner JSON string from the parsed dict
    #    This may raise KeyError if a key is missing
    volc_bigtts_string = outer_data['data']['volc_bigtts']

    # 5. Parse the inner JSON string into the final JSON array (a Python list)
    #    This may raise JSONDecodeError if the string is malformed
    final_json_array = json.loads(volc_bigtts_string)

    print("✅ Successfully parsed the nested JSON data.")
    print("Parsed array contents:", final_json_array)

    # 6. Write the final JSON array to a local file
    with open(OUTPUT_FILENAME, 'w', encoding='utf-8') as f:
        json.dump(final_json_array, f, indent=4, ensure_ascii=False)

    print(f"\n🎉 Successfully wrote the data to: {OUTPUT_FILENAME}")

# --- Error handling ---
# Catching the different error types separately gives clearer messages
except requests.exceptions.HTTPError as errh:
    # HTTP errors such as 404 Not Found or 500 Internal Server Error
    print(f"❌ HTTP error: {errh}")
except requests.exceptions.ConnectionError as errc:
    # Connection errors such as DNS failure or connection refused
    print(f"❌ Connection error: {errc}")
except requests.exceptions.Timeout as errt:
    # Request timed out
    print(f"❌ Request timed out: {errt}")
except requests.exceptions.RequestException as err:
    # Any other exception raised by requests
    print(f"❌ Unexpected request error: {err}")
except json.JSONDecodeError:
    # JSON parsing failed, either in response.json() or json.loads()
    print("❌ JSON parsing failed. The data from the URL or the inner string is not valid JSON.")
    # For debugging, the raw response can be printed:
    # print("Raw response:", response.text)
except KeyError:
    # A key was missing from the parsed structure
    print("❌ Unexpected JSON structure: missing the 'data' or 'volc_bigtts' key.")
except Exception as e:
    # Any other unforeseen exception
    print(f"❌ Unknown error: {e}")
server/ext/index-tts-api.py (new file)
@@ -0,0 +1,284 @@
# Usage:
#   python ./indextts/index-tts-api.py
#   http://localhost:7899/synthesize?text=Hello world, this is a test using FastAPI&verbose=true&max_text_tokens_per_sentence=80&server_audio_prompt_path=johnny-v.wav

import os
import shutil
import tempfile
import time
from typing import Optional
import re  # For sanitizing filenames/paths

import uvicorn
from fastapi import FastAPI, Query, HTTPException, BackgroundTasks
from fastapi.responses import FileResponse
# Removed File and UploadFile as we are not uploading anymore

# Assuming infer.py is in the same directory or in PYTHONPATH
from infer import IndexTTS

# --- Configuration ---
MODEL_CFG_PATH = "checkpoints/config.yaml"
MODEL_DIR = "checkpoints"
DEFAULT_IS_FP16 = True
DEFAULT_USE_CUDA_KERNEL = None
DEFAULT_DEVICE = None

# Default local audio prompt, can be overridden by a query parameter
DEFAULT_SERVER_AUDIO_PROMPT_PATH = "prompts/fdt-v.wav"  # <-- CHANGE THIS TO YOUR ACTUAL DEFAULT PROMPT
# Define a base directory from which user-specified prompts can be loaded.
# THIS IS A SECURITY MEASURE. Prompts outside this directory (and its subdirs) won't be allowed.
ALLOWED_PROMPT_BASE_DIR = os.path.abspath("prompts")  # Example: /app/prompts

# --- Global TTS instance ---
tts_model: Optional[IndexTTS] = None

app = FastAPI(title="IndexTTS FastAPI Service")

@app.on_event("startup")
async def startup_event():
    global tts_model
    print("Loading IndexTTS model...")
    start_load_time = time.time()
    try:
        tts_model = IndexTTS(
            cfg_path=MODEL_CFG_PATH,
            model_dir=MODEL_DIR,
            is_fp16=DEFAULT_IS_FP16,
            device=DEFAULT_DEVICE,
            use_cuda_kernel=DEFAULT_USE_CUDA_KERNEL,
        )
        # Verify default prompt exists
        if not os.path.isfile(DEFAULT_SERVER_AUDIO_PROMPT_PATH):
            print(f"WARNING: Default server audio prompt file not found at: {DEFAULT_SERVER_AUDIO_PROMPT_PATH}")

        # Create the allowed prompts directory if it doesn't exist (optional, for convenience)
        if not os.path.isdir(ALLOWED_PROMPT_BASE_DIR):
            try:
                os.makedirs(ALLOWED_PROMPT_BASE_DIR, exist_ok=True)
                print(f"Created ALLOWED_PROMPT_BASE_DIR: {ALLOWED_PROMPT_BASE_DIR}")
            except Exception as e:
                print(f"WARNING: Could not create ALLOWED_PROMPT_BASE_DIR at {ALLOWED_PROMPT_BASE_DIR}: {e}")
        else:
            print(f"User-specified prompts will be loaded from: {ALLOWED_PROMPT_BASE_DIR}")

    except Exception as e:
        print(f"Error loading IndexTTS model: {e}")
        tts_model = None
    load_time = time.time() - start_load_time
    print(f"IndexTTS model loaded in {load_time:.2f} seconds.")
    if tts_model:
        print(f"Model ready on device: {tts_model.device}")
    else:
        print("Model FAILED to load.")


async def cleanup_temp_dir(temp_dir_path: str):
    try:
        if os.path.exists(temp_dir_path):
            shutil.rmtree(temp_dir_path)
            print(f"Successfully cleaned up temporary directory: {temp_dir_path}")
    except Exception as e:
        print(f"Error cleaning up temporary directory {temp_dir_path}: {e}")

def sanitize_path_component(component: str) -> str:
    """Basic sanitization for a path component."""
    # Remove leading/trailing whitespace and dots
    component = component.strip().lstrip('.')
    # Replace potentially harmful characters or sequences
    component = re.sub(r'\.\.[/\\]', '', component)  # Remove .. sequences
    component = re.sub(r'[<>:"|?*]', '_', component)  # Replace illegal filename chars
    return component

def get_safe_prompt_path(base_dir: str, user_path: Optional[str]) -> str:
    """
    Constructs a safe path within the base_dir from a user-provided path.
    Prevents directory traversal.
    """
    if not user_path:
        return ""  # Or raise error if user_path is mandatory when called

    # Normalize user_path (e.g., handle mixed slashes, remove redundant ones)
    normalized_user_path = os.path.normpath(user_path)

    # Split the path into components and sanitize each one
    path_components = []
    head = normalized_user_path
    while True:
        head, tail = os.path.split(head)
        if tail:
            path_components.insert(0, sanitize_path_component(tail))
        elif head:  # Handle case like "/path" or "path/" leading to empty tail
            path_components.insert(0, sanitize_path_component(head))
            break
        else:  # Both head and tail are empty
            break
        if not head or head == os.sep or head == '.':  # Stop if root or current dir
            break

    if not path_components:
        raise ValueError("Invalid or empty prompt path provided after sanitization.")

    # Join sanitized components. This prevents using absolute paths from user_path directly.
    # os.path.join will correctly use the OS's path separator.
    # The first component of user_path is NOT joined with base_dir if it's absolute.
    # We ensure user_path is treated as relative to base_dir.
    # So, we must ensure path_components doesn't represent an absolute path itself after sanitization.
    # The sanitize_path_component and os.path.normpath help, but the critical part is os.path.join(base_dir, *path_components)

    # Construct the full path relative to the base directory
    # *path_components will expand the list into arguments for join
    prospective_path = os.path.join(base_dir, *path_components)

    # Final check: ensure the resolved path is still within the base_dir
    # os.path.abspath resolves any '..' etc., in the prospective_path
    resolved_path = os.path.abspath(prospective_path)
    if not resolved_path.startswith(os.path.abspath(base_dir)):
        raise ValueError("Prompt path traversal attempt detected or path is outside allowed directory.")

    return resolved_path


@app.api_route("/synthesize/", methods=["POST", "GET"], response_class=FileResponse)
async def synthesize_speech(
    background_tasks: BackgroundTasks,
    text: str = Query(..., description="Text to synthesize."),
    # New parameter for specifying a server-side audio prompt path
    server_audio_prompt_path: Optional[str] = Query(None, description=f"Relative path to an audio prompt file on the server (within {ALLOWED_PROMPT_BASE_DIR}). If not provided, uses default."),

    verbose: bool = Query(False, description="Enable verbose logging."),
    max_text_tokens_per_sentence: int = Query(100, description="Max text tokens per sentence."),
    sentences_bucket_max_size: int = Query(4, description="Sentences bucket max size."),
    do_sample: bool = Query(True, description="Enable sampling."),
    top_p: float = Query(0.8, description="Top P for sampling."),
    top_k: int = Query(30, description="Top K for sampling."),
    temperature: float = Query(1.0, description="Temperature for sampling."),
    length_penalty: float = Query(0.0, description="Length penalty."),
    num_beams: int = Query(3, description="Number of beams for beam search."),
    repetition_penalty: float = Query(10.0, description="Repetition penalty."),
    max_mel_tokens: int = Query(600, description="Max mel tokens to generate.")
):
    if tts_model is None:
        raise HTTPException(status_code=503, detail="TTS model is not loaded or failed to load.")

    temp_dir = tempfile.mkdtemp()
    actual_audio_prompt_to_use = ""  # This will be the path on the server filesystem

    try:
        if server_audio_prompt_path:
            print(f"Client specified server_audio_prompt_path: {server_audio_prompt_path}")
            # Auto-complete .wav extension if missing
            if server_audio_prompt_path and not server_audio_prompt_path.lower().endswith(".wav"):
                print(f"server_audio_prompt_path '{server_audio_prompt_path}' does not end with .wav, appending it.")
                server_audio_prompt_path += ".wav"
            try:
                # Sanitize and resolve the user-provided path against the allowed base directory
                safe_path = get_safe_prompt_path(ALLOWED_PROMPT_BASE_DIR, server_audio_prompt_path)
                if os.path.isfile(safe_path):
                    actual_audio_prompt_to_use = safe_path
                    print(f"Using user-specified server prompt: {actual_audio_prompt_to_use}")
                else:
                    await cleanup_temp_dir(temp_dir)
                    raise HTTPException(status_code=404, detail=f"Specified server audio prompt not found or not a file: {safe_path} (original: {server_audio_prompt_path})")
            except ValueError as ve:  # From get_safe_prompt_path for security violations
                await cleanup_temp_dir(temp_dir)
                raise HTTPException(status_code=400, detail=f"Invalid server_audio_prompt_path: {str(ve)}")
        else:
            print(f"Using default server audio prompt: {DEFAULT_SERVER_AUDIO_PROMPT_PATH}")
            if not os.path.isfile(DEFAULT_SERVER_AUDIO_PROMPT_PATH):
                await cleanup_temp_dir(temp_dir)
                raise HTTPException(status_code=500, detail=f"Default server audio prompt file not found: {DEFAULT_SERVER_AUDIO_PROMPT_PATH}. Please configure the server.")
            actual_audio_prompt_to_use = DEFAULT_SERVER_AUDIO_PROMPT_PATH

        # Copy the chosen prompt (either user-specified or default) to the temp_dir.
        # This keeps the subsequent logic (model interaction, cleanup) consistent.
        # It also means the original prompt files are not directly modified or locked.
        prompt_filename_for_temp = os.path.basename(actual_audio_prompt_to_use)
        temp_audio_prompt_path_in_job_dir = os.path.join(temp_dir, prompt_filename_for_temp)
        try:
            shutil.copy2(actual_audio_prompt_to_use, temp_audio_prompt_path_in_job_dir)
        except Exception as e:
            await cleanup_temp_dir(temp_dir)
            raise HTTPException(status_code=500, detail=f"Failed to copy audio prompt to temporary workspace: {str(e)}")


        output_filename = f"generated_speech_{int(time.time())}.wav"
        temp_output_path = os.path.join(temp_dir, output_filename)

        print(f"Synthesizing for text: '{text[:50]}...' with prompt (in temp): {temp_audio_prompt_path_in_job_dir}")
        print(f"Output will be saved to: {temp_output_path}")

        generation_kwargs = {
            "do_sample": do_sample,
            "top_p": top_p,
            "top_k": top_k,
            "temperature": temperature,
            "length_penalty": length_penalty,
            "num_beams": num_beams,
            "repetition_penalty": repetition_penalty,
            "max_mel_tokens": max_mel_tokens,
        }

        start_infer_time = time.time()
        returned_output_path = tts_model.infer_fast(
            audio_prompt=temp_audio_prompt_path_in_job_dir,  # Use the copied prompt in temp dir
            text=text,
            output_path=temp_output_path,
            verbose=verbose,
            max_text_tokens_per_sentence=max_text_tokens_per_sentence,
            sentences_bucket_max_size=sentences_bucket_max_size,
            **generation_kwargs
        )
        infer_time = time.time() - start_infer_time
        print(f"Inference completed in {infer_time:.2f} seconds. Expected output: {temp_output_path}, Returned path: {returned_output_path}")

        if not os.path.exists(temp_output_path):
            print(f"ERROR: Output file {temp_output_path} was NOT found after inference call.")
            background_tasks.add_task(cleanup_temp_dir, temp_dir)
            raise HTTPException(status_code=500, detail="TTS synthesis failed to produce an output file.")

        print(f"Output file {temp_output_path} confirmed to exist.")
        background_tasks.add_task(cleanup_temp_dir, temp_dir)

        return FileResponse(
            path=temp_output_path,
            media_type="audio/wav",
            filename="synthesized_audio.wav"
        )

    except HTTPException:
        raise
    except Exception as e:
        print(f"An unexpected error occurred during synthesis: {e}")
        import traceback
        traceback.print_exc()
        if 'temp_dir' in locals() and os.path.exists(temp_dir):
            background_tasks.add_task(cleanup_temp_dir, temp_dir)
        raise HTTPException(status_code=500, detail=f"An unexpected error occurred: {str(e)}")


@app.get("/")
async def read_root():
    return {"message": "IndexTTS FastAPI service is running. Use the /synthesize/ endpoint (GET or POST) to generate audio."}

if __name__ == "__main__":
    if not os.path.isfile(DEFAULT_SERVER_AUDIO_PROMPT_PATH):
        print(f"CRITICAL WARNING: Default server audio prompt at '{DEFAULT_SERVER_AUDIO_PROMPT_PATH}' not found!")
    else:
        print(f"Default server audio prompt: {os.path.abspath(DEFAULT_SERVER_AUDIO_PROMPT_PATH)}")

    if not os.path.isdir(ALLOWED_PROMPT_BASE_DIR):
        print(f"WARNING: ALLOWED_PROMPT_BASE_DIR '{ALLOWED_PROMPT_BASE_DIR}' does not exist. Consider creating it or prompts specified by 'server_audio_prompt_path' may not be found.")
    else:
        print(f"User-specified prompts should be relative to: {os.path.abspath(ALLOWED_PROMPT_BASE_DIR)}")

    print(f"Attempting to use MODEL_DIR: {os.path.abspath(MODEL_DIR)}")
    print(f"Attempting to use MODEL_CFG_PATH: {os.path.abspath(MODEL_CFG_PATH)}")

    if not os.path.isdir(MODEL_DIR):
        print(f"ERROR: MODEL_DIR '{MODEL_DIR}' not found. Please check the path.")
    if not os.path.isfile(MODEL_CFG_PATH):
        print(f"ERROR: MODEL_CFG_PATH '{MODEL_CFG_PATH}' not found. Please check the path.")

    uvicorn.run(app, host="0.0.0.0", port=7899)
server/input.txt (new file)
@@ -0,0 +1,59 @@
```custom-begin
start with '欢迎收听来生小酒馆,客官不进来喝点吗?', end with '感谢收听,下期再见'
Do not refer to yourself as the host or the proprietor. Speak in keeping with the character's persona.
```custom-end


### Product & Feature Updates

1. DeepSeek V3.1 quietly shipped, with its **context length jumping to 128K**, making it trivial to process documents of 100k+ characters or an entire codebase (o´ω'o)ノ. The upgrade also brings a 43% boost in reasoning, 38% fewer hallucinations, and better multilingual support; the only letdown is that the long-awaited R2 model remains nowhere in sight. Head to the [official site - (AI News)](https://chat.deepseek.com/) and feel the power of ultra-long context!

2. Tired of convoluted image-to-video workflows? Higgsfield AI's new **Draw-to-Video** feature lets you ditch elaborate text prompts entirely: draw an arrow or a circle on an image and the AI infers your intent and generates cinematic motion video 🔥. This intuitive "point and it obeys" style of creation went viral abroad overnight, lowering the bar for video creation yet again. Come [try it here - (AI News)](https://higgsfield.ai/) and bring your images to life!<br/>

3. Xiaohongshu's AIGC team unveiled **DynamicFace, a controllable face generation technique** targeting the long-standing pain points of image and video face swapping 🤔. Its core strengths are "controllability" and "high consistency", aiming to eliminate the flicker and incoherence common in video face swaps and give users more precise, more personalized creative tools. As [this (AI News) report](https://www.aibase.com/zh/news/20613) puts it, this is a significant step for Xiaohongshu in AI content generation, opening up more room for creative expression.

4. NVIDIA released **Nemotron Nano 2**, a leaderboard-topping multilingual reasoning model of just **9B parameters** that is redefining AI's efficiency frontier 🚀. Built on a distinctive **Transformer-Mamba hybrid architecture**, it delivers 6x the throughput of comparable 8B models, while a "thinking budget" mechanism cuts costs by up to 60%. For more, [read the technical details (AI News)](https://nvda.ws/3JfcKST), or go [see it on the leaderboard (AI News)](https://nvda.ws/47B7iUh) and witness its strength!<br/><video src="https://video.twimg.com/amplify_video/1957573291566063621/vid/avc1/720x1280/goPWf6djGgXEiqL5.mp4?tag=14" controls="controls" width="100%"></video>

5. The Gemini API gained a genuinely useful update: it now **supports fetching content from URLs directly**, whether web pages, PDFs, or image links! Developers can skip the hassle and cost of third-party scraping APIs and let the model work on live web content directly, a real cost-and-effort saver (✧∀✧). Check out [this (AI News) explainer](https://x.com/dotey/status/1957579164363481114) to learn how to make the most of the new feature!<br/>

### Frontier Research

1. Can AI models get "tunnel vision" from fixed patterns when interpreting images? A [recent study (AI News)](https://arxiv.org/abs/2404.10357) from arXiv proposes the **CoKnow framework**, which enriches prompt learning with multiple knowledge representations to greatly widen the model's "field of view" 💡. In short, instead of sending the model down a single path, it supplies several "knowledge perspectives" for analyzing a problem, surpassing prior methods on 11 public datasets with more accurate predictions.

2. How do we make AI not just talk, but "empathize"? A [frontier paper (AI News)](https://arxiv.org/abs/2508.12854) called E3RG proposes a new multimodal empathetic response generation system that decomposes the task into three stages: **understanding, memory, and generation**. Without any extra training, it produces emotionally rich, identity-consistent virtual avatars, as if endowed with genuine "empathy" ❤️. The work took first place in the ACM MM 25 challenge, opening a new path toward more humane human-computer interaction.

### Industry Outlook & Social Impact

1. Beneath the AI investment boom, reality is sobering: an MIT study found that **95% of companies have seen no return at all on their AI spending**, with roughly $40 billion of investment effectively down the drain 💸. The report argues the root of this "generative AI divide" is not a shortage of talent or resources, but that AI systems generally lack memory and adaptability and cannot integrate deeply into critical workflows. As [Baoyu's (AI News) post](https://x.com/dotey/status/1957648622851428689) puts it, a successful AI deployment is more like building a deep partnership than simply buying a product.

### Top Open-Source Projects

1. Tencent delivered a gift to the multimodal and reinforcement learning communities by open-sourcing **WeChat-YATT**, a large-model training library aimed at two core bottlenecks 🔥. With an innovative **parallel controller** mechanism and an **asynchronous interaction** strategy, it addresses the scalability limits of multimodal training and the efficiency shortfalls of dynamic sampling, markedly improving GPU utilization. For details on this [open-source tool (AI News)](https://www.aibase.com/zh/news/20620), the official announcement is worth a close read.<br/>

2. While Google's Genie 3 remains closed-source, the home-grown open-source world model **Matrix-Game 2.0** has burst onto the scene and set the community abuzz! This model of just **1.8B parameters** generates interactive virtual worlds in real time at **25 FPS** on a single GPU: upload one image and you can explore freely inside it (✧∀✧). Kunlun Tech's open-source release, strikingly lightweight yet high-performing, opens boundless possibilities for game development and agent training; go explore the [GitHub page - (AI News)](https://github.com/SkyworkAI/Matrix-Game).<br/><br/>

3. Want to escape the monthly-fee "ransom" of commercial email providers? **BillionMail**, a [⭐8.9k-star (AI News) project on GitHub](https://github.com/aaPanel/BillionMail), offers a one-stop open-source solution that combines a mail server, newsletters, and email marketing. Fully self-hostable and highly developer-friendly, it lets you run your own email stack with zero monthly fees: true digital independence 🚀.

4. If you're a music lover who prizes minimalism, [SPlayer, with ⭐4.7k stars on GitHub (AI News)](https://github.com/imsyy/SPlayer), is well worth a try. Beyond its clean interface, the player supports **word-by-word lyrics, song downloads, and music cloud-drive management**, plus a slick audio spectrum view: simple but not simplistic (o´ω'o)ノ. It neatly demonstrates how to fit a complete music world into a tiny footprint.

5. For tech enthusiasts curious about digital footprints, [GhostTrack on GitHub (AI News)](https://github.com/HunxByts/GhostTrack) offers a practical tool for tracking locations or phone numbers, now at ⭐1.9k stars. Like a detective's kit for the digital world, its broad utility is also a reminder that as we probe the limits of technology, we must keep privacy and ethics in view 🤔.

6. What is it like to give your computer an AI butler? [bytebot, with ⭐1.9k stars on GitHub (AI News)](https://github.com/bytebot-ai/bytebot), is a self-hosted AI desktop agent that automates computer tasks from natural-language commands. Running in a secure **containerized Linux environment**, it lets you finish complex operations just by saying the word: truly "all talk, no hands" smart living 🔥.

### Social Media Highlights

1. Breaking into AI takes more than code and math; soft skills matter just as much! Andrew Ng released a free [career-guidance e-book (AI News)](https://hubs.la/Q03DgNQ50), a tailor-made "playbook" for AI job seekers 💡. It covers **resume writing and interview technique**, even how to overcome impostor syndrome, helping you chart a clear career roadmap toward the job you want.<br/>

2. In AI image generation, are longer prompts really better? A Reddit user posed this soul-searching question after finding that their 20-30 word prompts produced results on par with others' several-hundred-word epics, with the model ignoring most of the detail anyway 🤔. The much-discussed [thread - (AI News)](https://old.reddit.com/r/FluxAI/comments/1mtyikj/whats_the_point_of_overly_long_prompts/) probes what "long prompts" actually buy you; sometimes brevity is the shortcut to good output.

3. DeepSeek V3.1's front-end coding ability seems to have quietly leveled up again: one user was delighted to find that a complex prompt older versions could not handle is now dispatched with ease, without the font-size glitches other models produce (✧∀✧). This [social-media (AI News) finding](https://x.com/op7418/status/1957784895952155089) again confirms that behind the officially announced **128k context** upgrade are real performance gains.<br/>

4. Prompt engineering can be an art form! User Li Jigang shared a poetic "visual weaving field" prompt that uses aesthetic metaphors of **light, tension, and flow** to guide the AI in turning a podcast link into a beautifully designed visual card 🎨. This [advanced technique (AI News)](https://x.com/lijigang_com/status/1957756215653724324) of folding design philosophy into prompts shows a whole new level of communicating with AI, a duet of human and machine inspiration.<br/>

5. The verdict is in on the showdown between Qwen's newly open-sourced image-editing model and FLUX Kontext! According to [a blogger's (AI News) review](https://weibo.com/6182606334/Q0yOekb6d), Qwen's biggest strength is its **unique Chinese text generation and editing ability**, but it trails FLUX on image aesthetics and fine detail, with a heavier "AI look". In short, it offers a new tool for Chinese-language content creation, though community LoRA models may be needed for the finishing touch ✨.

6. OpenAI is making top-tier AI more affordable: the **ChatGPT Go** plan has launched first in India at roughly $4.55 per month 🇮🇳! Per [Greg Brockman's (AI News) post](https://x.com/gdb/status/1957650320923979996), the plan offers **10x the message and image-generation quota** of the free tier, plus longer memory. The move is seen as a key step toward AI for everyone, putting powerful AI tools within reach at low cost.

7. Want to create a one-of-a-kind storybook with your kids? Google Gemini's **Storybook** feature makes it simple and fun: as [this (AI News) tutorial](https://x.com/shao__meng/status/1957605772017430917) shows, you can upload photos as inspiration and specify art styles such as **comics or claymation**. It is not just an AI tool but an interactive platform for sparking family creativity and preserving warm memories (o´ω'o)ノ.<br/>
server/main.py (new file)
@@ -0,0 +1,521 @@
import argparse
import asyncio
import base64
import hashlib
import hmac
import json
import os
import random
import shutil
import threading
import time
import uuid
from contextlib import asynccontextmanager
from enum import Enum
from io import BytesIO
from typing import Optional, Dict
from uuid import UUID

import httpx  # async HTTP client, used for callbacks
import schedule
from fastapi import FastAPI, Request, HTTPException, Depends, Form, Header
from fastapi.responses import FileResponse, JSONResponse, StreamingResponse
from PIL import Image, ImageDraw
from starlette.background import BackgroundTasks

from podcast_generator import generate_podcast_audio_api


class TaskStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"

# --- Lifespan context manager ---
# This replaces the deprecated on_event("startup") and on_event("shutdown") hooks
@asynccontextmanager
async def lifespan(app: FastAPI):
    # Code run at application startup (equivalent to startup_event)
    print("FastAPI app is starting up...")

    # Make sure the output directory exists
    os.makedirs(output_dir, exist_ok=True)

    # Schedule the cleanup task to run every `time_after` (30) minutes
    schedule.every(time_after).minutes.do(clean_output_directory)

    # Start the scheduler in a separate thread
    scheduler_thread = threading.Thread(target=run_scheduler, daemon=True)
    scheduler_thread.start()

    print("FastAPI app started. Output directory cleaning is scheduled.")

    # The `yield` is the split point; the application runs here
    yield

    # Code run at application shutdown (equivalent to shutdown_event)
    print("FastAPI app is shutting down...")

    # Signal the scheduler thread to stop
    stop_scheduler_event.set()

    # Optionally wait for the scheduler thread to finish (recommended).
    # Note: inside lifespan we cannot directly access the scheduler_thread
    # local created in the startup section, so a global event flag is used to
    # communicate. The thread is a daemon (daemon=True) and would be killed
    # anyway when the main program exits, but a graceful stop is better practice.
    print("Signaled scheduler to stop. Main application will now exit.")


# Pass the lifespan function when creating the FastAPI instance
app = FastAPI(lifespan=lifespan)

# Global flag used to tell the scheduler thread to stop
stop_scheduler_event = threading.Event()

# Global configuration
output_dir = "output"
time_after = 30

# In-memory task results:
# {auth_id: {task_id: {"auth_id": ..., "status": TaskStatus, "result": ..., "timestamp": float}}}
task_results: Dict[str, Dict[UUID, Dict]] = {}
# Companion dict keyed by audio file name (without extension); each value is
# the corresponding task_results[auth_id][task_id] entry
audio_file_mapping: Dict[str, Dict] = {}

# Signature verification configuration
SECRET_KEY = os.getenv("PODCAST_API_SECRET_KEY", "your-super-secret-key")  # MUST be changed in production!
# Map from tts_provider name to its config file path
tts_provider_map = {
    "index-tts": "../config/index-tts.json",
    "doubao-tts": "../config/doubao-tts.json",
    "edge-tts": "../config/edge-tts.json",
    "fish-audio": "../config/fish-audio.json",
    "gemini-tts": "../config/gemini-tts.json",
    "minimax": "../config/minimax.json",
}

def clean_output_directory():
    """
    Remove old files from the output directory and expired tasks from task_results.
    Expired tasks and their associated files are cleaned first, keeping memory
    and the filesystem in sync.
    """
    print(f"Cleaning output directory and expired tasks from memory: {output_dir}")
    now = time.time()
    threshold = time_after * 60  # cleanup threshold, in seconds

    # Phase 1: remove completed-and-expired tasks from task_results along with
    # their files. list() creates copies so the originals can be mutated safely
    # during iteration.
    auth_ids_to_clean = []
    for auth_id, tasks_by_auth in list(task_results.items()):
        task_ids_to_clean = []
        for task_id, task_info in list(tasks_by_auth.items()):
            # Clean up any task whose timestamp has expired, regardless of status
            if now - task_info["timestamp"] > threshold:
                task_ids_to_clean.append(task_id)

                # Try to delete the associated audio file
                output_audio_filepath = task_info.get("output_audio_filepath")
                if output_audio_filepath:
                    full_audio_path = os.path.join(output_dir, output_audio_filepath)
                    try:
                        if os.path.isfile(full_audio_path):
                            os.unlink(full_audio_path)
                            print(f"Deleted expired audio file: {full_audio_path}")
                        else:
                            print(f"Expired task {task_id} audio file {full_audio_path} not found or is not a file.")
                    except Exception as e:
                        print(f"Failed to delete audio file {full_audio_path}. Reason: {e}")

                # Remove the corresponding audio_file_mapping entry
                filename_without_ext = os.path.splitext(output_audio_filepath)[0] if output_audio_filepath else None
                if filename_without_ext and filename_without_ext in audio_file_mapping:
                    del audio_file_mapping[filename_without_ext]
                    print(f"Removed audio_file_mapping entry for {filename_without_ext}.")

        # Remove the expired tasks from task_results
        for task_id in task_ids_to_clean:
            if task_id in task_results[auth_id]:
                del task_results[auth_id][task_id]
                print(f"Removed expired task {task_id} for auth_id {auth_id} from task_results.")

        # If this auth_id has no remaining tasks, drop its entire entry
        if not task_results[auth_id]:
            auth_ids_to_clean.append(auth_id)

    for auth_id in auth_ids_to_clean:
        if auth_id in task_results:
            del task_results[auth_id]
            print(f"Removed empty auth_id {auth_id} from task_results.")

    # Phase 2: clean orphaned files in the output directory that are not
    # associated with any task, or files whose tasks have not expired but
    # that somehow were not deleted during the in-memory cleanup phase
    for filename in os.listdir(output_dir):
        file_path = os.path.join(output_dir, filename)
        try:
            if os.path.isfile(file_path) or os.path.islink(file_path):
                # Check the last-modified time
                if now - os.path.getmtime(file_path) > threshold:
                    os.unlink(file_path)
                    print(f"Deleted old unassociated file: {file_path}")
            elif os.path.isdir(file_path):
                # Optionally, recursively delete old subdirectories or their files
                pass
        except Exception as e:
            print(f"Failed to delete {file_path}. Reason: {e}")

async def get_auth_id(x_auth_id: str = Header(..., alias="X-Auth-Id")):
    """
    Dependency that reads X-Auth-Id from the request headers.
    """
    if not x_auth_id:
        raise HTTPException(status_code=400, detail="Missing X-Auth-Id header.")
    return x_auth_id

async def verify_signature(request: Request):
    """
    Validate the 'sign' parameter from the request headers or query string.
    Expected signature: HMAC-SHA256(key=SECRET_KEY, message=timestamp + SECRET_KEY).
    """
    timestamp = request.headers.get("X-Timestamp") or request.query_params.get("timestamp")
    client_sign = request.headers.get("X-Sign") or request.query_params.get("sign")

    if not timestamp or not client_sign:
        raise HTTPException(status_code=400, detail="Missing X-Timestamp or X-Sign header/query parameter.")

    try:
        current_time = int(time.time())
        if abs(current_time - int(timestamp)) > 300:  # requests are valid for 5 minutes
            raise HTTPException(status_code=400, detail="Request expired or timestamp is too far in the future.")

        message = f"{timestamp}{SECRET_KEY}".encode('utf-8')
        server_sign = hmac.new(SECRET_KEY.encode('utf-8'), message, hashlib.sha256).hexdigest()

        if server_sign != client_sign:
            raise HTTPException(status_code=401, detail="Invalid signature.")
    except HTTPException:
        # Re-raise our own HTTPExceptions so the broad handler below
        # does not rewrap them as 500 errors
        raise
    except ValueError:
        raise HTTPException(status_code=400, detail="Invalid timestamp format.")
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Signature verification error: {e}")
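# Example of computing a matching client-side signature (illustrative sketch;
# it simply mirrors the HMAC scheme in verify_signature above):
#
#   import hashlib, hmac, time
#   timestamp = str(int(time.time()))
#   message = f"{timestamp}{SECRET_KEY}".encode("utf-8")
#   sign = hmac.new(SECRET_KEY.encode("utf-8"), message, hashlib.sha256).hexdigest()
#   headers = {"X-Timestamp": timestamp, "X-Sign": sign}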
async def _generate_podcast_task(
|
||||
task_id: UUID,
|
||||
auth_id: str,
|
||||
api_key: str,
|
||||
base_url: str,
|
||||
model: str,
|
||||
input_txt_content: str,
|
||||
tts_providers_config_content: str,
|
||||
podUsers_json_content: str,
|
||||
threads: int,
|
||||
tts_provider: str,
|
||||
callback_url: Optional[str] = None, # 新增回调地址参数
|
||||
output_language: Optional[str] = None,
|
||||
usetime: Optional[str] = None,
|
||||
):
|
||||
task_results[auth_id][task_id]["status"] = TaskStatus.RUNNING
|
||||
try:
|
||||
parser = argparse.ArgumentParser(description="Generate podcast script and audio using OpenAI and local TTS.")
|
||||
parser.add_argument("--api-key", default=api_key, help="OpenAI API key.")
|
||||
parser.add_argument("--base-url", default=base_url, help="OpenAI API base URL (default: https://api.openai.com/v1).")
|
||||
parser.add_argument("--model", default=model, help="OpenAI model to use (default: gpt-3.5-turbo).")
|
||||
parser.add_argument("--threads", type=int, default=threads, help="Number of threads to use for audio generation (default: 1).")
|
||||
parser.add_argument("--output-language", default=output_language, help="Output language for the podcast script (default: Chinese).")
|
||||
parser.add_argument("--usetime", default=usetime, help="Time duration for the podcast script (default: 10 minutes).")
|
||||
args = parser.parse_args([])
|
||||
|
||||
actual_config_path = tts_provider_map.get(tts_provider)
|
||||
if not actual_config_path:
|
||||
raise ValueError(f"Invalid tts_provider: {tts_provider}.")
|
||||
|
||||
podcast_generation_results = await asyncio.to_thread(
|
||||
generate_podcast_audio_api,
|
||||
args=args,
|
||||
config_path=actual_config_path,
|
||||
input_txt_content=input_txt_content.strip(),
|
||||
tts_providers_config_content=tts_providers_config_content.strip(),
|
||||
podUsers_json_content=podUsers_json_content.strip()
|
||||
)
|
||||
task_results[auth_id][task_id]["status"] = TaskStatus.COMPLETED
|
||||
task_results[auth_id][task_id].update(podcast_generation_results)
|
||||
print(f"\nPodcast generation completed for task {task_id}. Output file: {podcast_generation_results.get('output_audio_filepath')}")
|
||||
# 更新 audio_file_mapping
|
||||
output_audio_filepath = podcast_generation_results.get('output_audio_filepath')
|
||||
if output_audio_filepath:
|
||||
# 从完整路径中提取文件名
|
||||
filename = os.path.basename(output_audio_filepath)
|
||||
filename = filename.split(".")[0]
|
||||
# 将任务信息添加到 audio_file_mapping
|
||||
audio_file_mapping[filename] = task_results[auth_id][task_id]
|
||||
|
||||
# 生成并编码像素头像
|
||||
avatar_bytes = generate_pixel_avatar(str(task_id)) # 使用 task_id 作为种子
|
||||
avatar_base64 = base64.b64encode(avatar_bytes).decode('utf-8')
|
||||
task_results[auth_id][task_id]["avatar_base64"] = avatar_base64 # 存储 Base64 编码的头像数据
|
||||
except Exception as e:
|
||||
task_results[auth_id][task_id]["status"] = TaskStatus.FAILED
|
||||
task_results[auth_id][task_id]["result"] = str(e)
|
||||
print(f"\nPodcast generation failed for task {task_id}: {e}")
|
||||
finally: # 无论成功或失败,都尝试调用回调
|
||||
if callback_url:
|
||||
print(f"Attempting to send callback for task {task_id} to {callback_url}")
|
||||
callback_data = {
|
||||
"task_id": str(task_id),
|
||||
"auth_id": auth_id,
|
||||
"task_results": task_results[auth_id][task_id],
|
||||
"timestamp": int(time.time()),
|
||||
"status": task_results[auth_id][task_id]["status"],
|
||||
}
|
||||
|
||||
MAX_RETRIES = 3 # 定义最大重试次数
|
||||
RETRY_DELAY = 5 # 定义重试间隔(秒)
|
||||
|
||||
for attempt in range(MAX_RETRIES + 1): # 尝试次数从0到MAX_RETRIES
|
||||
try:
|
||||
async with httpx.AsyncClient() as client:
|
||||
response = await client.put(callback_url, json=callback_data, timeout=30.0)
|
||||
response.raise_for_status() # 对 4xx/5xx 响应抛出异常
|
||||
print(f"Callback successfully sent for task {task_id} on attempt {attempt + 1}. Status: {response.status_code}")
|
||||
break # 成功发送,跳出循环
|
||||
except httpx.RequestError as req_err:
|
||||
print(f"Callback request failed for task {task_id} to {callback_url} on attempt {attempt + 1}: {req_err}")
|
||||
except httpx.HTTPStatusError as http_err:
|
||||
print(f"Callback received error response for task {task_id} from {callback_url} on attempt {attempt + 1}: {http_err.response.status_code} - {http_err.response.text}")
|
||||
except Exception as cb_err:
|
||||
print(f"An unexpected error occurred during callback for task {task_id} on attempt {attempt + 1}: {cb_err}")
|
||||
|
||||
if attempt < MAX_RETRIES:
|
||||
print(f"Retrying callback for task {task_id} in {RETRY_DELAY} seconds...")
|
||||
await asyncio.sleep(RETRY_DELAY)
|
||||
else:
|
||||
print(f"Callback failed for task {task_id} after {MAX_RETRIES} attempts.")

# @app.post("/generate-podcast", dependencies=[Depends(verify_signature)])
@app.post("/generate-podcast")
async def generate_podcast_submission(
    background_tasks: BackgroundTasks,
    auth_id: str = Depends(get_auth_id),
    api_key: str = Form(..., description="OpenAI API key."),
    base_url: str = Form("https://api.openai.com/v1"),
    model: str = Form("gpt-3.5-turbo"),
    input_txt_content: str = Form(...),
    tts_providers_config_content: str = Form(...),
    podUsers_json_content: str = Form(...),
    threads: int = Form(1),
    tts_provider: str = Form("index-tts"),
    callback_url: Optional[str] = Form(None),
    output_language: Optional[str] = Form(None),
    usetime: Optional[str] = Form(None),
):
    # 1. Validate tts_provider
    if tts_provider not in tts_provider_map:
        raise HTTPException(status_code=400, detail=f"Invalid tts_provider: {tts_provider}.")

    # 2. Check whether this auth_id already has a running task
    if auth_id in task_results:
        for existing_task_id, existing_task_info in task_results[auth_id].items():
            if existing_task_info["status"] in (TaskStatus.RUNNING, TaskStatus.PENDING):
                raise HTTPException(status_code=409, detail=f"There is already a running task (ID: {existing_task_id}) for this auth_id. Please wait for it to complete.")

    task_id = uuid.uuid4()
    if auth_id not in task_results:
        task_results[auth_id] = {}
    task_results[auth_id][task_id] = {
        "status": TaskStatus.PENDING,
        "result": None,
        "timestamp": time.time(),
        "callback_url": callback_url,  # Store the callback URL
        "auth_id": auth_id,  # Store the auth_id
    }

    background_tasks.add_task(
        _generate_podcast_task,
        task_id,
        auth_id,
        api_key,
        base_url,
        model,
        input_txt_content,
        tts_providers_config_content,
        podUsers_json_content,
        threads,
        tts_provider,
        callback_url,
        output_language,
        usetime
    )

    return {"message": "Podcast generation started.", "task_id": task_id}

# @app.get("/podcast-status", dependencies=[Depends(verify_signature)])
@app.get("/podcast-status")
async def get_podcast_status(
    auth_id: str = Depends(get_auth_id)
):
    if auth_id not in task_results:
        return {"message": "No tasks found for this auth_id.", "tasks": []}

    all_tasks_for_auth_id = []
    for task_id, task_info in task_results.get(auth_id, {}).items():
        all_tasks_for_auth_id.append({
            "task_id": task_id,
            "status": task_info["status"],
            "podUsers": task_info.get("podUsers"),
            "output_audio_filepath": task_info.get("output_audio_filepath"),
            "overview_content": task_info.get("overview_content"),
            "podcast_script": task_info.get("podcast_script"),
            "avatar_base64": task_info.get("avatar_base64"),  # Base64-encoded avatar data
            "audio_duration": task_info.get("audio_duration"),
            "title": task_info.get("title"),
            "tags": task_info.get("tags"),
            "error": task_info["result"] if task_info["status"] == TaskStatus.FAILED else None,
            "timestamp": task_info["timestamp"]
        })
    return {"message": "Tasks retrieved successfully.", "tasks": all_tasks_for_auth_id}

@app.get("/download-podcast/")
async def download_podcast(file_name: str):
    file_path = os.path.join(output_dir, file_name)
    if not os.path.exists(file_path):
        raise HTTPException(status_code=404, detail="File not found.")
    return FileResponse(file_path, media_type='audio/mpeg', filename=file_name)

@app.get("/get-audio-info/")
async def get_audio_info(file_name: str):
    """
    Looks up the task info for the given file name in audio_file_mapping.
    """
    # Strip the file extension if present, because the keys of
    # audio_file_mapping are file names without extensions
    base_file_name = os.path.splitext(file_name)[0]

    audio_info = audio_file_mapping.get(base_file_name)
    if audio_info:
        # Return a copy of the task info to avoid exposing the internal dict reference
        return JSONResponse(content={k: str(v) if isinstance(v, UUID) else v for k, v in audio_info.items()})
    else:
        raise HTTPException(status_code=404, detail="Audio file information not found.")

@app.get("/avatar/{username}")
async def get_avatar(username: str):
    """
    Generates and returns a pixel avatar for the given username.
    """
    avatar_bytes = generate_pixel_avatar(username)
    return StreamingResponse(BytesIO(avatar_bytes), media_type="image/png")

@app.get("/get-voices")
async def get_voices(tts_provider: str = "tts"):
    config_path = tts_provider_map.get(tts_provider)
    if not config_path:
        raise HTTPException(status_code=400, detail=f"Invalid tts_provider: {tts_provider}.")

    try:
        with open(config_path, 'r', encoding='utf-8') as f:
            config_data = json.load(f)

        voices = config_data.get("voices")
        if voices is None:
            raise HTTPException(status_code=404, detail=f"No 'voices' key found in config for {tts_provider}.")

        return {"tts_provider": tts_provider, "voices": voices}
    except FileNotFoundError:
        raise HTTPException(status_code=404, detail=f"Config file not found for {tts_provider}: {config_path}")
    except json.JSONDecodeError:
        raise HTTPException(status_code=500, detail=f"Error decoding JSON from config file for {tts_provider}: {config_path}. Please check file format.")
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"An unexpected error occurred: {e}")

@app.get("/")
async def read_root():
    return {"message": "FastAPI server is running!"}

def generate_pixel_avatar(seed_string: str) -> bytes:
    """
    Generates a 48x48-pixel avatar from the given string.
    The avatar is deterministic (the same input yields the same avatar) and symmetric.
    """
    size = 48
    pixel_grid_size = 5  # Size of the inner pixel grid (e.g. 5x5)

    # Use a SHA256 hash as the random seed to keep the output deterministic
    hash_object = hashlib.sha256(seed_string.encode('utf-8'))
    hash_hex = hash_object.hexdigest()

    # Convert the hash to an integer and seed the random number generator
    random.seed(int(hash_hex, 16))

    # Create a blank 48x48 RGBA image
    img = Image.new('RGBA', (size, size), (255, 255, 255, 0))  # Transparent background
    draw = ImageDraw.Draw(img)

    # Randomly pick the avatar's main color (high saturation, medium lightness)
    hue = random.randint(0, 360)
    saturation = random.randint(70, 100)  # High saturation
    lightness = random.randint(40, 60)  # Medium lightness

    # Convert HSL to RGB
    def hsl_to_rgb(h, s, l):
        h /= 360
        s /= 100
        l /= 100

        if s == 0:
            return (int(l * 255), int(l * 255), int(l * 255), 255)

        def hue_to_rgb(p, q, t):
            if t < 0: t += 1
            if t > 1: t -= 1
            if t < 1/6: return p + (q - p) * 6 * t
            if t < 1/2: return q
            if t < 2/3: return p + (q - p) * (2/3 - t) * 6
            return p

        q = l * (1 + s) if l < 0.5 else l + s - l * s
        p = 2 * l - q

        r = hue_to_rgb(p, q, h + 1/3)
        g = hue_to_rgb(p, q, h)
        b = hue_to_rgb(p, q, h - 1/3)

        return (int(r * 255), int(g * 255), int(b * 255), 255)

    main_color = hsl_to_rgb(hue, saturation, lightness)

    # Generate the pixel grid.
    # Only half of the grid is generated; the other half is mirrored.
    pixels = [[0 for _ in range(pixel_grid_size)] for _ in range(pixel_grid_size)]

    for y in range(pixel_grid_size):
        for x in range((pixel_grid_size + 1) // 2):  # Generate only the left half plus the middle column
            if random.random() > 0.5:  # 50% chance to fill a pixel
                pixels[y][x] = 1  # Fill
                pixels[y][pixel_grid_size - 1 - x] = 1  # Mirror fill

    # Size of each inner pixel in the final image
    pixel_width = size // pixel_grid_size
    pixel_height = size // pixel_grid_size

    # Draw the pixels
    for y in range(pixel_grid_size):
        for x in range(pixel_grid_size):
            if pixels[y][x] == 1:
                draw.rectangle(
                    [x * pixel_width, y * pixel_height, (x + 1) * pixel_width, (y + 1) * pixel_height],
                    fill=main_color
                )

    # Convert the image to a byte stream (BytesIO is already imported at module
    # level, as its use in get_avatar above shows)
    byte_io = BytesIO()
    img.save(byte_io, format='PNG')
    return byte_io.getvalue()
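
# Quick sanity check (illustration only): generate a deterministic avatar and
# write it to disk.
#
#     png_bytes = generate_pixel_avatar("alice")
#     with open("alice.png", "wb") as fh:
#         fh.write(png_bytes)
#     assert png_bytes == generate_pixel_avatar("alice")  # deterministic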

def run_scheduler():
    """Run the scheduler in a loop until the stop event is set."""
    while not stop_scheduler_event.is_set():
        schedule.run_pending()
        time.sleep(1)  # Check for pending jobs or the stop event once per second

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
227
server/openai_cli.py
Normal file
@@ -0,0 +1,227 @@
#!/usr/bin/env python3
"""
OpenAI CLI - a pure command-line tool for calling the OpenAI API

Features:
- Custom API key, base URL, and model name
- Interactive chat mode
- Single-query mode
- Streaming output

Usage:

1. Install dependencies:
   pip install openai

2. Set the API key (any of the following):
   - Environment variable: export OPENAI_API_KEY="your API key"
   - Command-line argument: python openai_cli.py --api-key "your API key"
   - Config file (config.json):
     {
         "api_key": "your API key",
         "base_url": "https://api.openai.com/v1",
         "model": "gpt-3.5-turbo",
         "temperature": 0.7,
         "top_p": 1.0
     }
     then load it with --config config.json

3. Run the script:

   - Interactive chat mode:
     python openai_cli.py [optional: --api-key VAL --base-url VAL --model VAL --temperature VAL --top-p VAL]
     In interactive mode, type 'quit' or 'exit' to leave and 'clear' to reset the conversation history.

   - Single-query mode:
     python openai_cli.py --query "your question" [optional: --api-key VAL --base-url VAL --model VAL --temperature VAL --top-p VAL --max-tokens VAL --system-message VAL]

   - With a config file:
     python openai_cli.py --config config.json --query "your question"

Examples:
   python openai_cli.py
   python openai_cli.py -q "Hello, world" -m gpt-4
   python openai_cli.py -q "Hello, world" --temperature 0.8 --top-p 0.9
   python openai_cli.py --config my_config.json
"""

import argparse
import os
import sys
import json
from typing import Optional, Any, List, Union
import openai
from openai.types.chat import ChatCompletionMessageParam


class OpenAICli:
    def __init__(self, api_key: Optional[str] = None, base_url: Optional[str] = None, model: str = "gpt-3.5-turbo", system_message: Optional[str] = None):
        """Initialize the CLI client."""
        self.api_key = api_key or os.getenv("OPENAI_API_KEY")
        self.model = model or os.getenv("OPENAI_MODEL", "gpt-3.5-turbo")
        self.system_message = system_message

        if not self.api_key:
            raise ValueError("API key must not be empty; set it via an argument or the OPENAI_API_KEY environment variable")

        # The openai.OpenAI client falls back to its default API base when
        # base_url is None or empty; otherwise the provided base_url is used.
        effective_base_url = None
        if base_url:
            effective_base_url = base_url
        elif os.getenv("OPENAI_BASE_URL"):
            effective_base_url = os.getenv("OPENAI_BASE_URL")

        self.client = openai.OpenAI(api_key=self.api_key, base_url=effective_base_url)

    def chat_completion(self, messages: List[ChatCompletionMessageParam], temperature: float = 0.7, top_p: float = 1.0, max_tokens: Optional[int] = None) -> Any:
        """Send a streaming chat-completion request."""
        # Handle the system prompt
        messages_to_send = list(messages)  # Copy to avoid mutating the caller's list

        system_message_present = False
        if messages_to_send and messages_to_send[0].get("role") == "system":
            system_message_present = True

        if self.system_message:
            if system_message_present:
                # Update the existing system prompt
                messages_to_send[0]["content"] = self.system_message
            else:
                # Insert a new system prompt
                messages_to_send.insert(0, {"role": "system", "content": self.system_message})

        try:
            response = self.client.chat.completions.create(
                model=self.model,
                messages=messages_to_send,  # Use the list that includes the system prompt
                temperature=temperature,
                top_p=top_p,
                max_tokens=max_tokens,
                stream=True
            )
            return response
        except Exception as e:
            raise Exception(f"API call failed: {str(e)}")

    def interactive_chat(self):
        """Start interactive chat mode."""
        print(f"🤖 OpenAI CLI started (model: {self.model})")
        print("Type 'quit' or 'exit' to leave, 'clear' to reset the conversation history")
        print("-" * 50)

        messages: List[ChatCompletionMessageParam] = []
        # The system_message is injected in chat_completion, so it is not added here.

        while True:
            try:
                user_input = input("\nYou: ").strip()

                if user_input.lower() in ['quit', 'exit', 'q']:
                    print("👋 Goodbye!")
                    break

                if user_input.lower() == 'clear':
                    messages = []
                    print("🗑️ Conversation history cleared")
                    continue

                if not user_input:
                    continue

                messages.append({"role": "user", "content": user_input})

                print("AI: ", end="", flush=True)

                response_generator = self.chat_completion(messages)
                ai_message_full = ""
                for chunk in response_generator:
                    if chunk.choices and chunk.choices[0].delta.content:
                        content = chunk.choices[0].delta.content
                        print(content, end="", flush=True)
                        ai_message_full += content
                print()  # Print a newline at the end of the AI's response
                messages.append({"role": "assistant", "content": ai_message_full})

            except KeyboardInterrupt:
                print("\n\n👋 Goodbye!")
                break
            except Exception as e:
                print(f"\n❌ Error: {str(e)}")

    def single_query(self, query: str, temperature: float = 0.7, top_p: float = 1.0, max_tokens: Optional[int] = None):
        """Single-query mode."""
        messages: List[ChatCompletionMessageParam] = []
        # The system_message is injected in chat_completion, so it is not added here.
        messages.append({"role": "user", "content": query})

        try:
            response_generator = self.chat_completion(messages, temperature, top_p, max_tokens)
            for chunk in response_generator:
                if chunk.choices and chunk.choices[0].delta.content:
                    print(chunk.choices[0].delta.content, end="", flush=True)
            print()  # Ensure a newline at the end
        except Exception as e:
            print(f"Error: {str(e)}", file=sys.stderr)
            sys.exit(1)
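
# Programmatic usage sketch (illustration only): the class can also be driven
# from other Python code. This mirrors what single_query does above.
#
#     cli = OpenAICli(api_key="sk-...", model="gpt-3.5-turbo")
#     for chunk in cli.chat_completion([{"role": "user", "content": "Hi"}]):
#         if chunk.choices and chunk.choices[0].delta.content:
#             print(chunk.choices[0].delta.content, end="")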


def main():
    parser = argparse.ArgumentParser(description="OpenAI CLI - a pure command-line tool built on the openai SDK")

    # Basic arguments
    parser.add_argument("--api-key", "-k", help="OpenAI API key")
    parser.add_argument("--base-url", "-u", help="API base URL")
    parser.add_argument("--model", "-m", default=None, help="Model name (default: gpt-3.5-turbo)")

    # Query arguments
    parser.add_argument("--query", "-q", help="Question for single-query mode")
    parser.add_argument("--temperature", "-t", type=float, default=None, help="Temperature (0.0-2.0, default: 1)")
    parser.add_argument("--top-p", type=float, default=None, help="Top-p sampling parameter (0.0-1.0, default: 0.95)")
    parser.add_argument("--max-tokens", type=int, help="Maximum number of tokens")
    parser.add_argument("--system-message", "-s", help="System prompt")

    # Config file
    parser.add_argument("--config", "-c", help="Path to a JSON config file")

    args = parser.parse_args()

    # Load the config file
    config = {}
    if args.config and os.path.exists(args.config):
        try:
            with open(args.config, 'r', encoding='utf-8') as f:
                config = json.load(f)
        except Exception as e:
            print(f"Failed to load config file: {str(e)}", file=sys.stderr)
            sys.exit(1)

    # Merge settings; precedence: command-line arguments > config file > defaults.
    # The argparse defaults above are None so that an unset flag does not
    # shadow a value from the config file.
    api_key = args.api_key or config.get("api_key")
    base_url = args.base_url or config.get("base_url")
    model = args.model or config.get("model") or "gpt-3.5-turbo"
    system_message = args.system_message or config.get("system_message")
    temperature = args.temperature if args.temperature is not None else config.get("temperature", 1)
    top_p = args.top_p if args.top_p is not None else config.get("top_p", 0.95)

    try:
        cli = OpenAICli(api_key=api_key, base_url=base_url, model=model, system_message=system_message)

        if args.query:
            # Single-query mode
            cli.single_query(args.query, temperature, top_p, args.max_tokens)
        else:
            # Interactive mode
            cli.interactive_chat()

    except ValueError as e:
        print(f"Configuration error: {str(e)}", file=sys.stderr)
        sys.exit(1)
    except Exception as e:
        print(f"Runtime error: {str(e)}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()
651
server/podcast_generator.py
Normal file
@@ -0,0 +1,651 @@
# podcast_generator.py

import argparse  # Import argparse for command-line arguments
import os
import json
import time
import glob
import sys
import subprocess  # For calling external commands like ffmpeg
import requests  # For making HTTP requests to TTS API
import uuid  # For generating unique filenames for temporary audio files
from datetime import datetime
from openai_cli import OpenAICli  # Moved to top for proper import
import urllib.parse  # For URL encoding
import re  # For regular expression operations
from typing import Optional, Tuple
from tts_adapters import TTSAdapter, IndexTTSAdapter, EdgeTTSAdapter, FishAudioAdapter, MinimaxAdapter, DoubaoTTSAdapter, GeminiTTSAdapter  # Import TTS adapters

# Global configuration
output_dir = "output"
file_list_path = os.path.join(output_dir, "file_list.txt")
tts_providers_config_path = '../config/tts_providers.json'

def read_file_content(filepath):
    """Reads content from a given file path."""
    try:
        with open(filepath, 'r', encoding='utf-8') as f:
            return f.read()
    except FileNotFoundError:
        raise FileNotFoundError(f"Error: File not found at {filepath}")

def _load_json_config(file_path: str) -> dict:
    """Loads a JSON configuration file."""
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            return json.load(f)
    except FileNotFoundError:
        raise FileNotFoundError(f"Error: Configuration file not found at {file_path}")
    except json.JSONDecodeError as e:
        raise ValueError(f"Error decoding JSON from {file_path}: {e}")

def select_json_config(config_dir='../config', return_file_path=False):
    """
    Reads JSON files from the specified directory and allows the user to select one.
    Returns the content of the selected JSON file.
    If return_file_path is True, returns a tuple of (file_path, content).
    """
    json_files = glob.glob(os.path.join(config_dir, '*.json'))
    if not json_files:
        raise FileNotFoundError(f"Error: No JSON files found in {config_dir}")

    valid_json_files = []
    print(f"Found JSON configuration files in '{config_dir}':")
    for i, file_path in enumerate(json_files):
        file_name = os.path.basename(file_path)
        if file_name != os.path.basename(tts_providers_config_path):
            valid_json_files.append(file_path)
            print(f"{len(valid_json_files)}. {file_name}")

    if not valid_json_files:
        raise FileNotFoundError(f"Error: No valid JSON files (excluding tts_providers.json) found in {config_dir}")

    while True:
        try:
            choice_str = input("Enter the number of the configuration file to use: ")
            if not choice_str:  # Treat empty input as an error
                raise ValueError("No input provided. Please enter a number.")
            choice = int(choice_str)
            if 1 <= choice <= len(valid_json_files):
                selected_file = valid_json_files[choice - 1]
                print(f"Selected: {os.path.basename(selected_file)}")
                with open(selected_file, 'r', encoding='utf-8') as f:
                    content = json.load(f)
                if return_file_path:
                    return selected_file, content
                else:
                    return content
            else:
                raise ValueError("Invalid choice. Please enter a number within the range.")
        except FileNotFoundError as e:
            raise FileNotFoundError(f"Error loading selected JSON file: {e}")
        except json.JSONDecodeError as e:
            raise ValueError(f"Error decoding JSON from selected file: {e}")
        except ValueError as e:
            print(f"Invalid input: {e}. Please enter a number.")

def generate_speaker_id_text(pod_users, voices_list):
    """
    Generates a text string mapping speaker IDs to their names/aliases based on podUsers and voices.
    Optimized by converting voices_list to a dictionary for faster lookups.
    """
    voice_map = {voice.get("code"): voice for voice in voices_list if voice.get("code")}

    speaker_info = []
    for speaker_id, pod_user in enumerate(pod_users):
        pod_user_code = pod_user.get("code")
        role = pod_user.get("role", "")  # Default to an empty string if role is not provided

        found_name = None
        voice = voice_map.get(pod_user_code)
        if voice:
            found_name = voice.get("usedname") or voice.get("alias") or voice.get("name")

        if found_name:
            # The fragments below are deliberately Chinese: they become part of
            # the Chinese podcast-script prompt fed to the model.
            if role:
                speaker_info.append(f"speaker_id={speaker_id}的名叫{found_name},是一个{role}")
            else:
                speaker_info.append(f"speaker_id={speaker_id}的名叫{found_name}")
        else:
            raise ValueError(f"Voice code '{pod_user_code}' (speaker_id={speaker_id}) has no matching name or alias. Check the voices configuration in config/edge-tts.json.")

    return "。".join(speaker_info) + "。"

def merge_audio_files():
    # Generate a unique UUID
    unique_id = str(uuid.uuid4())
    unique_id = unique_id.replace("-", "")
    # Current timestamp
    timestamp = int(time.time())
    # Combine the UUID and timestamp as the file name (without a 'podcast_' prefix)
    output_audio_filename_wav = f"{unique_id}{timestamp}.wav"
    output_audio_filepath_wav = os.path.join(output_dir, output_audio_filename_wav)
    output_audio_filename_mp3 = f"{unique_id}{timestamp}.mp3"
    output_audio_filepath_mp3 = os.path.join(output_dir, output_audio_filename_mp3)

    # Use ffmpeg to concatenate audio files
    # Check if ffmpeg is available
    try:
        subprocess.run(["ffmpeg", "-version"], check=True, capture_output=True)
    except FileNotFoundError:
        raise RuntimeError("FFmpeg is not installed or not in your PATH. Please install FFmpeg to merge audio files. You can download FFmpeg from: https://ffmpeg.org/download.html")

    print(f"\nMerging audio files into {output_audio_filename_wav}...")
    try:
        command = [
            "ffmpeg",
            "-f", "concat",
            "-safe", "0",
            "-i", os.path.basename(file_list_path),
            "-acodec", "pcm_s16le",
            "-ar", "44100",
            "-ac", "2",
            output_audio_filename_wav  # Output to WAV first
        ]
        # Execute ffmpeg from the output_dir to correctly resolve file paths in file_list.txt
        process = subprocess.run(command, check=True, cwd=output_dir, capture_output=True, text=True)
        print(f"Audio files merged successfully into {output_audio_filepath_wav}!")
        print("FFmpeg stdout:\n", process.stdout)
        print("FFmpeg stderr:\n", process.stderr)

        # Convert WAV to MP3
        print(f"Converting {output_audio_filename_wav} to {output_audio_filename_mp3} (high quality)...")
        mp3_command = [
            "ffmpeg",
            "-i", output_audio_filename_wav,
            "-vn",  # No video
            "-b:a", "192k",  # Audio bitrate of 192kbps for high quality
            "-acodec", "libmp3lame",  # Use libmp3lame for MP3 encoding
            output_audio_filename_mp3
        ]
        mp3_process = subprocess.run(mp3_command, check=True, cwd=output_dir, capture_output=True, text=True)
        print(f"Conversion to MP3 successful! Output: {output_audio_filepath_mp3}")
        print("FFmpeg MP3 stdout:\n", mp3_process.stdout)
        print("FFmpeg MP3 stderr:\n", mp3_process.stderr)

        return output_audio_filename_mp3  # Return the MP3 filename
    except subprocess.CalledProcessError as e:
        raise RuntimeError(f"Error merging or converting audio files with FFmpeg: {e.stderr}")
    finally:
        # Clean up temporary audio files, the file list, and the intermediate WAV file
        for item in os.listdir(output_dir):
            if item.startswith("temp_audio"):
                try:
                    os.remove(os.path.join(output_dir, item))
                except OSError as e:
                    print(f"Error removing temporary audio file {item}: {e}")  # This should not stop the process
        try:
            os.remove(file_list_path)
        except OSError as e:
            print(f"Error removing file list {file_list_path}: {e}")  # This should not stop the process
        try:
            if os.path.exists(output_audio_filepath_wav):
                os.remove(output_audio_filepath_wav)
                print(f"Cleaned up intermediate WAV file: {output_audio_filename_wav}")
        except OSError as e:
            print(f"Error removing intermediate WAV file {output_audio_filepath_wav}: {e}")
        print("Cleaned up temporary files.")

def get_audio_duration(filepath: str) -> Optional[float]:
    """
    Uses ffprobe to get the duration of an audio file in seconds.
    Returns None if duration cannot be determined.
    """
    try:
        # Check if ffprobe is available
        subprocess.run(["ffprobe", "-version"], check=True, capture_output=True, text=True)
    except FileNotFoundError:
        print("Error: ffprobe is not installed or not in your PATH. Please install FFmpeg (which includes ffprobe) to get audio duration.")
        return None

    try:
        command = [
            "ffprobe",
            "-v", "error",
            "-show_entries", "format=duration",
            "-of", "default=noprint_wrappers=1:nokey=1",
            filepath
        ]
        result = subprocess.run(command, check=True, capture_output=True, text=True)
        duration = float(result.stdout.strip())
        return duration
    except subprocess.CalledProcessError as e:
        print(f"Error calling ffprobe for {filepath}: {e.stderr}")
        return None
    except ValueError:
        print(f"Could not parse duration from ffprobe output for {filepath}.")
        return None
    except Exception as e:
        print(f"An unexpected error occurred while getting audio duration for {filepath}: {e}")
        return None

def _parse_arguments():
    """Parses command-line arguments."""
    parser = argparse.ArgumentParser(description="Generate podcast script and audio using OpenAI and local TTS.")
    parser.add_argument("--api-key", help="OpenAI API key.")
    parser.add_argument("--base-url", default="https://api.openai.com/v1", help="OpenAI API base URL (default: https://api.openai.com/v1).")
    parser.add_argument("--model", default="gpt-3.5-turbo", help="OpenAI model to use (default: gpt-3.5-turbo).")
    parser.add_argument("--threads", type=int, default=1, help="Number of threads to use for audio generation (default: 1).")
    parser.add_argument("--output-language", type=str, default=None, help="Language for the podcast overview and script (default: Chinese).")
    parser.add_argument("--usetime", type=str, default=None, help="Target duration to be mentioned in the podcast script, e.g., '5-6 minutes'.")
    return parser.parse_args()

def _load_configuration():
    """Selects and loads a JSON configuration, inferring tts_provider from the selected file name."""
    print("Podcast Generation Script")
    selected_file_path, config_data = select_json_config(return_file_path=True)

    # Extract tts_provider from the file name,
    # assuming it follows the 'provider-name.json' pattern
    file_name = os.path.basename(selected_file_path)
    tts_provider = os.path.splitext(file_name)[0]  # Strip the .json extension

    config_data["tts_provider"] = tts_provider  # Record the tts_provider in the config data

    print("\nLoaded Configuration: " + tts_provider)
    return config_data

def _load_configuration_path(config_path: str) -> dict:
    """Loads JSON configuration from a specified path and infers tts_provider from the file name."""
    config_data = _load_json_config(config_path)

    # Extract tts_provider from the file name
    file_name = os.path.basename(config_path)
    tts_provider = os.path.splitext(file_name)[0]  # Strip the .json extension

    config_data["tts_provider"] = tts_provider  # Record the tts_provider in the config data

    print(f"\nLoaded Configuration: {tts_provider} from {config_path}")
    return config_data

def _prepare_openai_settings(args, config_data):
    """Determines the final OpenAI API key, base URL, and model based on priority."""
    api_key = args.api_key or config_data.get("api_key") or os.getenv("OPENAI_API_KEY")
    base_url = args.base_url or config_data.get("base_url") or os.getenv("OPENAI_BASE_URL")
    model = args.model or config_data.get("model")  # Allow model to be None if not provided anywhere

    if not model:
        model = "gpt-3.5-turbo"
        print(f"Using default model: {model} as it was not specified via command-line, config, or environment variables.")

    if not api_key:
        raise ValueError("Error: OpenAI API key is not set. Please provide it via --api-key, in your config file, or as an environment variable (OPENAI_API_KEY).")
    return api_key, base_url, model

def _read_prompt_files():
    """Reads content from input, overview, and podcast script prompt files."""
    input_prompt = read_file_content('input.txt')
    overview_prompt = read_file_content('prompt/prompt-overview.txt')
    original_podscript_prompt = read_file_content('prompt/prompt-podscript.txt')
    return input_prompt, overview_prompt, original_podscript_prompt

def _extract_custom_content(input_prompt_content):
    """Extracts custom content from the input prompt."""
    custom_content = ""
    custom_begin_tag = '```custom-begin'
    custom_end_tag = '```custom-end'
    start_index = input_prompt_content.find(custom_begin_tag)
    if start_index != -1:
        end_index = input_prompt_content.find(custom_end_tag, start_index + len(custom_begin_tag))
        if end_index != -1:
            custom_content = input_prompt_content[start_index + len(custom_begin_tag):end_index].strip()
            input_prompt_content = input_prompt_content[end_index + len(custom_end_tag):].strip()
    return custom_content, input_prompt_content
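
# Input format sketch (illustration only): a custom block may precede the main
# prompt text in input.txt. Everything between the markers is extracted as
# custom_content and the remainder is treated as the actual input:
#
#     ```custom-begin
#     Extra per-episode instructions for the script writer.
#     ```custom-end
#     The article to turn into a podcast starts here...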

def _prepare_podcast_prompts(config_data, original_podscript_prompt, custom_content, usetime: Optional[str] = None, output_language: Optional[str] = None):
    """Prepares the podcast script prompts with speaker info and placeholders."""
    pod_users = config_data.get("podUsers", [])
    voices = config_data.get("voices", [])
    turn_pattern = config_data.get("turnPattern", "random")

    original_podscript_prompt = original_podscript_prompt.replace("{{numSpeakers}}", str(len(pod_users)))
    original_podscript_prompt = original_podscript_prompt.replace("{{turnPattern}}", turn_pattern)
    original_podscript_prompt = original_podscript_prompt.replace("{{usetime}}", usetime if usetime is not None else "5-6 minutes")
    original_podscript_prompt = original_podscript_prompt.replace("{{outlang}}", output_language if output_language is not None else "Make sure the input language is set as the output language")

    speaker_id_info = generate_speaker_id_text(pod_users, voices)
    podscript_prompt = speaker_id_info + "\n\n" + custom_content + "\n\n" + original_podscript_prompt
    return podscript_prompt, pod_users, voices, turn_pattern  # Return voices for potential future use or consistency

def _generate_overview_content(api_key, base_url, model, overview_prompt, input_prompt, output_language: Optional[str] = None) -> Tuple[str, str, str]:
    """Generates overview content using OpenAI CLI, and extracts title and tags."""
    print(f"\nGenerating overview with OpenAI CLI (Output Language: {output_language})...")
    try:
        # Replace the placeholder with the actual output language
        formatted_overview_prompt = overview_prompt.replace("{{outlang}}", output_language if output_language is not None else "Make sure the input language is set as the output language")

        openai_client_overview = OpenAICli(api_key=api_key, base_url=base_url, model=model, system_message=formatted_overview_prompt)
        overview_response_generator = openai_client_overview.chat_completion(messages=[{"role": "user", "content": input_prompt}])
        overview_content = "".join([chunk.choices[0].delta.content for chunk in overview_response_generator if chunk.choices and chunk.choices[0].delta.content])

        print("Generated Overview:")
        print(overview_content[:100])

        # Extract title (first line) and tags (second line)
        lines = overview_content.strip().split('\n')
        title = lines[0].strip() if len(lines) > 0 else ""
        tags = lines[1].strip() if len(lines) > 1 else ""

        print(f"Extracted Title: {title}")
        print(f"Extracted Tags: {tags}")

        return overview_content, title, tags
    except Exception as e:
        raise RuntimeError(f"Error generating overview: {e}")

def _generate_podcast_script(api_key, base_url, model, podscript_prompt, overview_content):
    """Generates and parses podcast script JSON using OpenAI CLI."""
    print("\nGenerating podcast script with OpenAI CLI...")
    # Initialize podscript_json_str outside try block to ensure it's always defined
    podscript_json_str = ""
    try:
        openai_client_podscript = OpenAICli(api_key=api_key, base_url=base_url, model=model, system_message=podscript_prompt)
        # Generate the response string first
        podscript_json_str = "".join([chunk.choices[0].delta.content for chunk in openai_client_podscript.chat_completion(messages=[{"role": "user", "content": overview_content}]) if chunk.choices and chunk.choices[0].delta.content])

        podcast_script = None
        decoder = json.JSONDecoder()
        idx = 0
        valid_json_str = ""

        # Scan the raw response for the first JSON object that contains the
        # "podcast_transcripts" key, skipping any surrounding text.
        while idx < len(podscript_json_str):
            try:
                obj, end = decoder.raw_decode(podscript_json_str[idx:])
                if isinstance(obj, dict) and "podcast_transcripts" in obj:
                    podcast_script = obj
                    valid_json_str = podscript_json_str[idx : idx + end]
                    break
                idx += end
            except json.JSONDecodeError:
                idx += 1
                next_brace = podscript_json_str.find('{', idx)
                if next_brace != -1:
                    idx = next_brace
                else:
                    break

        if podcast_script is None:
            raise ValueError(f"Error: Could not find a valid podcast script JSON object with 'podcast_transcripts' key in response. Raw response: {podscript_json_str}")

        print("\nGenerated Podcast Script Length: " + str(len(podcast_script.get("podcast_transcripts") or [])))
        print(valid_json_str[:100] + "...")
        if not podcast_script.get("podcast_transcripts"):
            raise ValueError("Error: 'podcast_transcripts' array is empty or not found in the generated script. Nothing to convert to audio.")
        return podcast_script
    except json.JSONDecodeError as e:
        raise ValueError(f"Error decoding JSON from podcast script response: {e}. Raw response: {podscript_json_str}")
    except Exception as e:
        raise RuntimeError(f"Error generating podcast script: {e}")

def generate_audio_for_item(item, config_data, tts_adapter: TTSAdapter, max_retries: int = 3):
    """Generate audio for a single podcast transcript item using the provided TTS adapter."""
    speaker_id = item.get("speaker_id")
    dialog = item.get("dialog")

    voice_code = None
    volume_adjustment = 0.0  # Default value
    speed_adjustment = 0.0  # Default value

    if config_data and "podUsers" in config_data and 0 <= speaker_id < len(config_data["podUsers"]):
        pod_user_entry = config_data["podUsers"][speaker_id]
        voice_code = pod_user_entry.get("code")
        # Look up the volume/speed adjustments for this voice in the voices list
        voice_map = {voice.get("code"): voice for voice in config_data.get("voices", []) if voice.get("code")}
        volume_adjustment = voice_map.get(voice_code, {}).get("volume_adjustment", 0.0)
        speed_adjustment = voice_map.get(voice_code, {}).get("speed_adjustment", 0.0)

    if not voice_code:
        raise ValueError(f"No voice code found for speaker_id {speaker_id}. Cannot generate audio for this dialog.")

    # print(f"dialog-before: {dialog}")
    dialog = re.sub(r'[^\w\s\-,,.。??!!\u4e00-\u9fa5]', '', dialog)
    print(f"dialog: {dialog}")

    for attempt in range(max_retries):
        try:
            print(f"Calling TTS API for speaker {speaker_id} ({voice_code}) (Attempt {attempt + 1}/{max_retries})...")
            temp_audio_file = tts_adapter.generate_audio(
                text=dialog,
                voice_code=voice_code,
                output_dir=output_dir,
                volume_adjustment=volume_adjustment,  # Pass the volume adjustment
                speed_adjustment=speed_adjustment  # Pass the speed adjustment
            )
            return temp_audio_file
        except RuntimeError as e:  # Catch specific RuntimeError from TTS adapters
            print(f"Error generating audio for speaker {speaker_id} ({voice_code}) on attempt {attempt + 1}: {e}")
            if attempt < max_retries - 1:
                wait_time = 2 ** attempt
                print(f"Retrying in {wait_time} seconds...")
                time.sleep(wait_time)
            else:
                raise RuntimeError(f"Max retries ({max_retries}) reached for speaker {speaker_id} ({voice_code}). Audio generation failed.")
        except Exception as e:  # Catch other unexpected errors
            raise RuntimeError(f"An unexpected error occurred for speaker {speaker_id} ({voice_code}) on attempt {attempt + 1}: {e}")

def _generate_all_audio_files(podcast_script, config_data, tts_adapter: TTSAdapter, threads):
    """Orchestrates the generation of individual audio files."""
    os.makedirs(output_dir, exist_ok=True)
    print("\nGenerating audio files...")
    # test script
    # podcast_script = json.loads("{\"podcast_transcripts\":[{\"speaker_id\":0,\"dialog\":\"欢迎收听,来生小酒馆,客官不进来喝点吗?今天咱们来唠唠AI。 小希,你有什么新鲜事来分享吗?\"},{\"speaker_id\":1,\"dialog\":\"当然了, AI 编程工具 Cursor 给开发者送上了一份大礼,付费用户现在可以限时免费体验 GPT 5 的强大编码能力\"}]}")
    transcripts = podcast_script.get("podcast_transcripts", [])

    max_retries = config_data.get("tts_max_retries", 3)  # Maximum retry count from config (default: 3)

    from concurrent.futures import ThreadPoolExecutor, as_completed

    audio_files_dict = {}

    with ThreadPoolExecutor(max_workers=threads) as executor:
        future_to_index = {
            executor.submit(generate_audio_for_item, item, config_data, tts_adapter, max_retries): i
            for i, item in enumerate(transcripts)
        }

        exception_caught = None
        for future in as_completed(future_to_index):
            index = future_to_index[future]
            try:
                result = future.result()
                if result:
                    audio_files_dict[index] = result
            except Exception as e:
                exception_caught = RuntimeError(f"Error generating audio for item {index}: {e}")
                # An error occurred, we should stop.
                break

        # If we broke out of the loop due to an exception, cancel other futures.
        if exception_caught:
            print(f"An error occurred: {exception_caught}. Cancelling outstanding tasks.")
            for f in future_to_index:
                if not f.done():
                    f.cancel()
            raise exception_caught

    audio_files = [audio_files_dict[i] for i in sorted(audio_files_dict.keys())]

    print(f"\nFinished generating individual audio files. Total files: {len(audio_files)}")
    return audio_files

def _create_ffmpeg_file_list(audio_files):
    """Creates the file list for FFmpeg concatenation."""
    if not audio_files:
        raise ValueError("No audio files were generated to merge.")

    print(f"Creating file list for ffmpeg at: {file_list_path}")
    with open(file_list_path, 'w', encoding='utf-8') as f:
        for audio_file in audio_files:
            f.write(f"file '{os.path.basename(audio_file)}'\n")

    print("Content of file_list.txt:")
    with open(file_list_path, 'r', encoding='utf-8') as f:
        print(f.read())

from typing import cast  # cast is used by _initialize_tts_adapter below

def _initialize_tts_adapter(config_data: dict, tts_providers_config_content: Optional[str] = None) -> TTSAdapter:
    """
    Initializes and returns the appropriate TTS adapter based on the configuration data.
    """
    tts_provider = config_data.get("tts_provider")
    if not tts_provider:
        raise ValueError("TTS provider is not specified in the configuration.")

    tts_providers_config = {}
    try:
        if tts_providers_config_content:
            tts_providers_config = json.loads(tts_providers_config_content)
        else:
            tts_providers_config_content = read_file_content(tts_providers_config_path)
            tts_providers_config = json.loads(tts_providers_config_content)
    except Exception as e:
        print(f"Warning: Could not load tts_providers.json: {e}")

    # Extra parameters for the current tts_provider
    current_tts_extra_params = tts_providers_config.get(tts_provider.split('-')[0], {})  # e.g. 'doubao-tts' -> 'doubao'

    if tts_provider == "index-tts":
        api_url = config_data.get("apiUrl")
        if not api_url:
            raise ValueError("IndexTTS apiUrl is not configured.")
        return IndexTTSAdapter(api_url_template=cast(str, api_url), tts_extra_params=cast(dict, current_tts_extra_params))
    elif tts_provider == "edge-tts":
        api_url = config_data.get("apiUrl")
        if not api_url:
            raise ValueError("EdgeTTS apiUrl is not configured.")
        return EdgeTTSAdapter(api_url_template=cast(str, api_url), tts_extra_params=cast(dict, current_tts_extra_params))
    elif tts_provider == "fish-audio":
        api_url = config_data.get("apiUrl")
        headers = config_data.get("headers")
        request_payload = config_data.get("request_payload")
        if not all([api_url, headers, request_payload]):
            raise ValueError("FishAudio requires apiUrl, headers, and request_payload configuration.")
        return FishAudioAdapter(api_url=cast(str, api_url), headers=cast(dict, headers), request_payload_template=cast(dict, request_payload), tts_extra_params=cast(dict, current_tts_extra_params))
    elif tts_provider == "minimax":
        api_url = config_data.get("apiUrl")
        headers = config_data.get("headers")
        request_payload = config_data.get("request_payload")
        if not all([api_url, headers, request_payload]):
            raise ValueError("Minimax requires apiUrl, headers, and request_payload configuration.")
        return MinimaxAdapter(api_url=cast(str, api_url), headers=cast(dict, headers), request_payload_template=cast(dict, request_payload), tts_extra_params=cast(dict, current_tts_extra_params))
    elif tts_provider == "doubao-tts":
        api_url = config_data.get("apiUrl")
        headers = config_data.get("headers")
        request_payload = config_data.get("request_payload")
        if not all([api_url, headers, request_payload]):
            raise ValueError("DoubaoTTS requires apiUrl, headers, and request_payload configuration.")
        return DoubaoTTSAdapter(api_url=cast(str, api_url), headers=cast(dict, headers), request_payload_template=cast(dict, request_payload), tts_extra_params=cast(dict, current_tts_extra_params))
    elif tts_provider == "gemini-tts":
        api_url = config_data.get("apiUrl")
        headers = config_data.get("headers")
        request_payload = config_data.get("request_payload")
        if not all([api_url, headers, request_payload]):
            raise ValueError("GeminiTTS requires apiUrl, headers, and request_payload configuration.")
        return GeminiTTSAdapter(api_url=cast(str, api_url), headers=cast(dict, headers), request_payload_template=cast(dict, request_payload), tts_extra_params=cast(dict, current_tts_extra_params))
    else:
        raise ValueError(f"Unsupported TTS provider: {tts_provider}")

def generate_podcast_audio():
    args = _parse_arguments()
    config_data = _load_configuration()
    api_key, base_url, model = _prepare_openai_settings(args, config_data)

    input_prompt_content, overview_prompt, original_podscript_prompt = _read_prompt_files()
    custom_content, input_prompt = _extract_custom_content(input_prompt_content)
    podscript_prompt, pod_users, voices, turn_pattern = _prepare_podcast_prompts(config_data, original_podscript_prompt, custom_content, args.usetime, args.output_language)

    print(f"\nInput Prompt (input.txt):\n{input_prompt[:100]}...")
    print(f"\nOverview Prompt (prompt-overview.txt):\n{overview_prompt[:100]}...")
    print(f"\nPodscript Prompt (prompt-podscript.txt):\n{podscript_prompt[:1000]}...")

    overview_content, title, tags = _generate_overview_content(api_key, base_url, model, overview_prompt, input_prompt, args.output_language)
    podcast_script = _generate_podcast_script(api_key, base_url, model, podscript_prompt, overview_content)

    tts_adapter = _initialize_tts_adapter(config_data)  # Initialize the TTS adapter

    audio_files = _generate_all_audio_files(podcast_script, config_data, tts_adapter, args.threads)
    _create_ffmpeg_file_list(audio_files)
    output_audio_filepath = merge_audio_files()
    return {
        "output_audio_filepath": output_audio_filepath,
        "overview_content": overview_content,
        "podcast_script": podcast_script,
        "podUsers": pod_users,
    }


def generate_podcast_audio_api(args, config_path: str, input_txt_content: str, tts_providers_config_content: str, podUsers_json_content: str) -> dict:
    """
    Generates a podcast audio file based on the provided parameters.

    Args:
        args: Parsed arguments carrying api_key, base_url, model, threads,
            output_language and usetime.
        config_path (str): Path to the configuration JSON file.
        input_txt_content (str): Content of the input prompt.
        tts_providers_config_content (str): JSON content with per-provider TTS parameters.
        podUsers_json_content (str): JSON content describing the podcast speakers.

    Returns:
        dict: Task results including the generated audio file path, overview,
        script, speakers, formatted duration, title, and tags.
    """
    print("Starting podcast audio generation...")
    config_data = _load_configuration_path(config_path)
    podUsers = json.loads(podUsers_json_content)
    config_data["podUsers"] = podUsers

    final_api_key, final_base_url, final_model = _prepare_openai_settings(args, config_data)
    input_prompt, overview_prompt, original_podscript_prompt = _read_prompt_files()
    custom_content, input_prompt = _extract_custom_content(input_txt_content)
    podscript_prompt, pod_users, voices, turn_pattern = _prepare_podcast_prompts(config_data, original_podscript_prompt, custom_content, args.usetime, args.output_language)

    print(f"\nInput Prompt (from provided content):\n{input_prompt[:100]}...")
    print(f"\nOverview Prompt (prompt-overview.txt):\n{overview_prompt[:100]}...")
    print(f"\nPodscript Prompt (prompt-podscript.txt):\n{podscript_prompt[:1000]}...")

    overview_content, title, tags = _generate_overview_content(final_api_key, final_base_url, final_model, overview_prompt, input_prompt, args.output_language)
    podcast_script = _generate_podcast_script(final_api_key, final_base_url, final_model, podscript_prompt, overview_content)

    tts_adapter = _initialize_tts_adapter(config_data, tts_providers_config_content)  # Initialize the TTS adapter

    audio_files = _generate_all_audio_files(podcast_script, config_data, tts_adapter, args.threads)
    _create_ffmpeg_file_list(audio_files)

    output_audio_filepath = merge_audio_files()

    audio_duration_seconds = get_audio_duration(os.path.join(output_dir, output_audio_filepath))
    formatted_duration = "00:00"
    if audio_duration_seconds is not None:
        minutes = int(audio_duration_seconds // 60)
        seconds = int(audio_duration_seconds % 60)
        formatted_duration = f"{minutes:02}:{seconds:02}"

    task_results = {
        "output_audio_filepath": output_audio_filepath,
        "overview_content": overview_content,
        "podcast_script": podcast_script,
        "podUsers": podUsers,
        "audio_duration": formatted_duration,
        "title": title,
        "tags": tags,
    }
    return task_results


if __name__ == "__main__":
    start_time = time.time()
    try:
        generate_podcast_audio()
    except Exception as e:
        print(f"\nError: An unexpected error occurred during podcast generation: {e}", file=sys.stderr)
        sys.exit(1)
    finally:
        end_time = time.time()
        execution_time = end_time - start_time
        print(f"\nTotal execution time: {execution_time:.2f} seconds")
139
server/prompt/prompt-overview.txt
Normal file
@@ -0,0 +1,139 @@
<Additional Customizations>
**1. Metadata Generation**

* **Step 1: Intermediate Core Summary Generation (Internal Step)**
    * **Task**: First, generate a core idea summary of approximately 150 characters based *only* on the **[body content]** of the document (ignoring titles and subtitles).
    * **Purpose**: This summary is the sole basis for generating the final title and should **not** be displayed in the final output itself.

* **Step 2: Title Generation**
    * **Source**: Must be refined from the "core summary" generated in the previous step.
    * **Length**: Strictly controlled to be between 15-20 characters.
    * **Format**: Adopt a "Main Title: Subtitle" structure, using a full-width colon ":" for separation. For example: "Brevity and Precision: Practical Engineering for AI Context".
    * **Position**: As the **first line** of the final output.

* **Step 3: Tag Generation**
    * **Source**: Extract from the **[body content]** of the document (ignoring titles and subtitles).
    * **Quantity**: 3 to 5.
    * **Format**: Keywords separated by the "#" symbol (e.g., #Keyword1#Keyword2).
    * **Position**: As the **second line** of the final output.
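
* **Illustrative example (first two output lines only; the title reuses the sample above, the tags are made up)**:
    Brevity and Precision: Practical Engineering for AI Context
    #ContextEngineering#Summarization#LLM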

**2. Output Language**

* **{{outlang}}**.
</Additional Customizations>

<INSTRUCTIONS>
<Role>
You are a professional document analysis and processing expert, capable of intelligently switching work modes based on the length of the input content.
</Role>

<TaskDeterminationLogic>
1. **Evaluate Input**: First, evaluate the word count of the input document.
2. **Execution Branch**:
    * **If Content is Sufficient (e.g., over 200 words)**: Switch to **"Mode A: In-depth Summary"** and strictly follow the <principles> and <output_format> defined below.
    * **If Content is Insufficient (e.g., under 200 words)**: Switch to **"Mode B: Topic Expansion"**; in that case, ignore the "fidelity to the original text" constraint in the <principles> and instead execute a content generation task.
</TaskDeterminationLogic>

<TaskModeA: In-depth Summary>
<Objective>
When the input content is sufficient, your task is to distill it into a clear, comprehensive, objective, and structured summary.
</Objective>
<ExecutionRequirements>
- Accurately capture the complete essence and core ideas of the source material.
- Strictly adhere to the <principles> (Accuracy, Objectivity, Comprehensiveness).
- Generate the summary following the <output_format> and <length_guidelines>.
- Simultaneously complete the title and tag generation as specified in the <Additional Customizations>.
</ExecutionRequirements>
</TaskModeA>

<TaskModeB: Topic Expansion>
<Objective>
When the input content is too short to produce a meaningful summary, your task is to logically enrich and expand upon its core theme.
</Objective>
<ExecutionRequirements>
- **Identify Core Theme**: Identify 1-2 core concepts or keywords from the brief input.
- **Logical Association and Expansion**: Based on the identified core theme, perform logical association and expand on it from various dimensions (e.g., background, importance, applications, future trends) to generate a more information-rich text.
- **Maintain Coherence**: Ensure the expanded content remains highly relevant and logically coherent with the core idea of the original text.
- **Ignore Summarization Principles**: In this mode, requirements from the <principles> such as "absolute fidelity to the original text" and "avoid inference" **do not apply**.
- **Fulfill Customization Requirements**: You are still required to complete the title and tag generation from the <Additional Customizations> based on the **expanded content**.
- **Output**: Directly output the expanded text content without further summarization.
</ExecutionRequirements>
</TaskModeB>

<principles>
<accuracy>
- Maintain absolute factual accuracy and fidelity to source material
- Avoid any subjective interpretation, inference or speculation
- Preserve complete original meaning, nuance and contextual relationships
- Report all quantitative data with precise values and appropriate units
- Verify and cross-reference facts before inclusion
- Flag any ambiguous or unclear information
</accuracy>

<objectivity>
- Present information with strict neutrality and impartiality
- Exclude all forms of bias, personal opinions, and editorial commentary
- Ensure balanced representation of all perspectives and viewpoints
- Maintain objective professional distance from the content
- Use precise, factual language free from emotional coloring
- Focus solely on verifiable information and evidence
</objectivity>

<comprehensiveness>
- Capture all essential information, key themes, and central arguments
- Preserve critical context and background necessary for understanding
- Include relevant supporting details, examples, and evidence
- Maintain logical flow and connections between concepts
- Ensure hierarchical organization of information
- Document relationships between different components
- Highlight dependencies and causal links
- Track chronological progression where relevant
</comprehensiveness>
</principles>

<output_format>
<type>
- Return summary in clean markdown format
- Do not include markdown code block tags (```markdown ```)
- Use standard markdown syntax for formatting (headers, lists, etc.)
- Use ### for main headings
- Use #### for subheadings where appropriate
- Use bullet points (- item) for lists
- Ensure proper indentation and spacing
- Use appropriate emphasis (**bold**, *italic*) where needed
</type>
<style>
- Use clear, concise language focused on key points
- Maintain professional and objective tone throughout
- Follow consistent formatting and style conventions
- Provide descriptive section headings and subheadings
- Utilize bullet points and lists for better readability
- Structure content with clear hierarchy and organization
- Avoid jargon and overly technical language
- Include transition sentences between sections
</style>
</output_format>

<validation>
<criteria>
- Verify all facts and claims match source material exactly
- Cross-reference and validate all numerical data points
- Ensure logical flow and consistency throughout summary
- Confirm comprehensive coverage of key information
- Check for objective, unbiased language and tone
- Validate accurate representation of source context
- Review for proper attribution of ideas and quotes
- Verify temporal accuracy and chronological order
</criteria>
</validation>

<length_guidelines>
- Scale summary length proportionally to source document complexity and length
- Minimum: 3-5 well-developed paragraphs per major section
- Maximum: 8-10 paragraphs per section for highly complex documents
- Adjust level of detail based on information density and importance
- Ensure key concepts receive adequate coverage regardless of length
</length_guidelines>

Now, create a summary of the following document:
</INSTRUCTIONS>
130
server/prompt/prompt-podscript.txt
Normal file
@@ -0,0 +1,130 @@
|
||||
* **Output Format:** No explanatory text!{{outlang}}
|
||||
* **End Format:** Before concluding, review and summarize the previous speeches, which are concise, concise, powerful and thought-provoking.
|
||||
|
||||
<podcast_generation_system>
|
||||
You are a master podcast scriptwriter, adept at transforming diverse input content into a lively, engaging, and natural-sounding conversation between multiple distinct podcast hosts. Your primary objective is to craft authentic, flowing dialogue that captures the spontaneity and chemistry of a real group discussion, completely avoiding any hint of robotic scripting or stiff formality. Think dynamic group interplay, not just information delivery.

<input>
<!-- Podcast settings provide high-level configuration for the script generation. -->
<podcast_settings>
<!-- Define the total number of speakers in the podcast. Minimum 1. -->
<num_speakers>{{numSpeakers}}</num_speakers>
<!-- Define the speaking order. Options: "sequential" or "random". -->
<turn_pattern>{{turnPattern}}</turn_pattern>
</podcast_settings>

<!-- The source_content contains the factual basis for the podcast discussion. -->
<source_content>
A block of text containing the information to be discussed. This could be research findings, an article summary, a detailed outline, user chat history related to the topic, or any other relevant raw information.
</source_content>
</input>

<guidelines>

1. **Establish Distinct & Consistent Host Personas for N Speakers:**
* **Create Personas Based on `num_speakers`:** For the number of speakers specified, create a unique and consistent persona for each.
* **Speaker 0 (Lead Host/Moderator):** This speaker should always act as the primary host. They drive the conversation, introduce segments, pose key questions, and help summarize takeaways. Their tone is guiding and engaging.
* **Other Speakers (Co-Hosts):** For `speaker_1`, `speaker_2`, etc., create complementary personas that enhance the discussion. Examples of personas include:
  * **The Expert:** Provides deep, factual insights from the source content.
  * **The Curious Newcomer:** Asks clarifying questions that a listener might have, acting as an audience surrogate.
  * **The Practical Skeptic:** Grounds the conversation by questioning assumptions or focusing on real-world implications.
  * **The Enthusiast:** Brings energy, shares personal anecdotes, and expresses excitement about the topic.
* **Consistency is Key:** Ensure each speaker maintains their distinct voice, vocabulary, and perspective throughout the script. Their interaction should feel like a genuine, established group dynamic.

2. **Adhere to the Specified Turn Pattern:**
* **If `turn_pattern` is "sequential":** The speakers should talk in a fixed, repeating order (e.g., 0 -> 1 -> 2 -> 0 -> 1 -> 2...). Maintain this strict sequence throughout the script.
* **If `turn_pattern` is "random":** The speaking order should be more dynamic and less predictable, mimicking a real group conversation. A speaker might have two short turns in a row to elaborate, another might interject, or one might ask a question that a different speaker answers. Ensure a **balanced distribution** of speaking time over the entire podcast, avoiding any single speaker dominating or being left out for too long (a balanced random order is sketched in code below).
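For anyone implementing validation or simulation around this prompt, the two turn patterns can be expressed in a few lines of Python. This is an illustrative sketch, not part of the prompt file; the function names and the "within two turns of the least-used speaker" balancing rule are assumptions.

```python
import random
from collections import Counter

def sequential_turns(num_speakers: int, num_turns: int) -> list[int]:
    # Fixed repeating order: 0 -> 1 -> 2 -> 0 -> 1 -> 2 ...
    return [i % num_speakers for i in range(num_turns)]

def random_turns(num_speakers: int, num_turns: int) -> list[int]:
    # Less predictable order with a roughly balanced distribution:
    # a speaker may take back-to-back turns, but nobody falls more
    # than two turns behind the least-used speaker.
    counts = Counter({s: 0 for s in range(num_speakers)})
    turns = []
    for _ in range(num_turns):
        fewest = min(counts.values())
        candidates = [s for s, c in counts.items() if c <= fewest + 1]
        speaker = random.choice(candidates)
        counts[speaker] += 1
        turns.append(speaker)
    return turns

print(sequential_turns(3, 6))  # [0, 1, 2, 0, 1, 2]
```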

3. **Craft Natural & Dynamic Group Dialogue:**
* **Emulate Real Conversation:** Use contractions (e.g., "don't", "it's"), interjections ("Oh!", "Wow!", "Hmm"), and discourse markers ("you know", "right?", "well"). Use common modal particles and filler words.
* **Foster Group Interaction:** Write dialogue where speakers genuinely react to one another. They should build on points made by *any* other speaker ("Exactly, and to add to what [Speaker X] said..."), ask follow-up questions to the group, express agreement/disagreement respectfully, and show active listening. The conversation should not be a series of 1-on-1s with the host, but a true group discussion.
* **Vary Rhythm & Pace:** Mix short, punchy lines with longer, more explanatory ones. The rhythm should feel spontaneous and collaborative.

4. **Structure for Flow and Listener Engagement:**
* **Natural Beginning:** Start with dialogue that flows naturally as if the introduction has just finished.
* **Logical Progression & Signposting:** The lead host (`speaker_0`) should guide the listener through the information smoothly, using clear transitions to link different ideas.
* **Meaningful Conclusion:** End by summarizing the key takeaways from the group discussion, reinforcing the core message. Close with a final thought or a lingering question for the audience.

5. **Integrate Source Content Seamlessly & Accurately:**
* **Translate, Don't Recite:** Rephrase information from the `<source_content>` into conversational language suitable for each host's persona.
* **Explain & Contextualize:** Use analogies, examples, and clarifying questions among the hosts to break down complex ideas.
* **Weave Information Naturally:** Integrate facts and data from the source within the group dialogue, not as standalone, undigested blocks.

6. **Length & Pacing:**
* **Target Duration:** Create a transcript that would result in approximately {{usetime}} of audio (around 800-1000 words total).
* **Balanced Speaking Turns:** Aim for a natural conversational flow among speakers rather than extended monologues by one person. Prioritize the most important information from the source content.

7. **Copy & Replacement:**
Replace a hyphen with a space when it joins English letters and/or digits on both sides.
Replace four-digit Arabic numerals with their Chinese character equivalents, digit by digit.
(Both rules are sketched in code below.)
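A minimal Python sketch of these two replacement rules, for implementers wiring this prompt into a preprocessing pipeline. The function name and the digit-reading map are illustrative assumptions, not part of the prompt.

```python
import re

# Digit-by-digit Chinese readings for 0-9 (with 〇 for zero), per the rule above.
DIGIT_MAP = str.maketrans("0123456789", "〇一二三四五六七八九")

def apply_copy_replacement_rules(text: str) -> str:
    # Rule 1: a hyphen joining ASCII letters/digits on both sides becomes a space,
    # e.g. "GPT-4" -> "GPT 4".
    text = re.sub(r"(?<=[A-Za-z0-9])-(?=[A-Za-z0-9])", " ", text)
    # Rule 2: a standalone four-digit numeral is rewritten digit by digit,
    # e.g. "2023" -> "二〇二三".
    return re.sub(r"(?<!\d)\d{4}(?!\d)", lambda m: m.group().translate(DIGIT_MAP), text)

print(apply_copy_replacement_rules("GPT-4 debuted in 2023"))  # GPT 4 debuted in 二〇二三
```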
</guidelines>

<examples>
<!-- Example for a 3-person podcast with a 'random' turn pattern -->
<input>
<podcast_settings>
<num_speakers>3</num_speakers>
<turn_pattern>random</turn_pattern>
</podcast_settings>
<source_content>
Quantum computing uses quantum bits or qubits which can exist in multiple states simultaneously due to superposition. This is different from classical bits (0 or 1). Think of it like a spinning coin. This allows for massive parallel computation.
</source_content>
</input>
<output_format>
{{
  "podcast_transcripts": [
    {{
      "speaker_id": 0,
      "dialog": "Alright team, today we're tackling a big one: Quantum Computing. I know a lot of listeners have been asking, so let's try to demystify it a bit."
    }},
    {{
      "speaker_id": 2,
      "dialog": "Yes! I'm so excited for this. But honestly, every time I read about it, it feels like science fiction. Where do we even start?"
    }},
    {{
      "speaker_id": 1,
      "dialog": "That's the perfect place to start, actually. Let's ground it. Forget the 'quantum' part for a second. We all know regular computers use 'bits', right? They're tiny switches, either a zero or a one. On or off. Simple."
    }},
    {{
      "speaker_id": 0,
      "dialog": "Right, the basic building block of all digital information. So, how do 'qubits'—the quantum version—change the game?"
    }},
    {{
      "speaker_id": 1,
      "dialog": "This is where the magic happens. A qubit isn't just a zero OR a one. Thanks to a principle called superposition, it can be zero, one, or both at the same time."
    }},
    {{
      "speaker_id": 2,
      "dialog": "Okay, hold on. 'Both at the same time'? My brain just short-circuited. How is that possible?"
    }},
    {{
      "speaker_id": 1,
      "dialog": "The classic analogy is a spinning coin. While it's in the air, before it lands, is it heads or tails? It's in a state of both possibilities. A qubit is like that spinning coin, holding multiple values at once."
    }},
    {{
      "speaker_id": 0,
      "dialog": "Ah, that's a great way to put it. So that 'spinning coin' state is what allows them to be so much more powerful, for massive parallel calculations?"
    }},
    {{
      "speaker_id": 1,
      "dialog": "Exactly. Because one qubit can hold multiple values, a set of them can explore a huge number of possibilities simultaneously, instead of one by one like a classical computer."
    }},
    {{
      "speaker_id": 2,
      "dialog": "Wow. Okay, that clicks. It's not just faster, it's a completely different way of thinking about problem-solving."
    }}
  ]
}}
</output_format>
<final>
Transform the source material into a lively and engaging podcast conversation based on the provided settings. Craft dialogue that showcases authentic group chemistry and natural interaction. Use varied speech patterns reflecting real human conversation, ensuring the final script effectively educates and entertains the listener.
The final output is a JSON string without code blocks.
</final>
</podcast_generation_system>

383
server/tts_adapters.py
Normal file
@@ -0,0 +1,383 @@
import os
import json
import base64
import requests
import uuid
import urllib.parse
from abc import ABC, abstractmethod
from typing import Optional


class TTSAdapter(ABC):
    """
    Abstract base class that defines the interface for TTS adapters.
    """
    @abstractmethod
    def generate_audio(self, text: str, voice_code: str, output_dir: str, volume_adjustment: float = 0.0, speed_adjustment: float = 0.0) -> str:
        """
        Generate an audio file from text using the given voice code.

        Args:
            text (str): The text to convert to speech.
            voice_code (str): The voice code used to generate the speech.
            output_dir (str): The directory where the generated audio file is saved.
            volume_adjustment (float): Volume adjustment in dB; positive raises, negative lowers.
            speed_adjustment (float): Speed adjustment in percent; positive speeds up, negative slows down.

        Returns:
            str: The path of the generated audio file.

        Raises:
            Exception: If audio generation fails.
        """
        pass

    def _apply_audio_effects(self, audio_file_path: str, volume_adjustment: float, speed_adjustment: float) -> str:
        """
        Apply volume and speed adjustments to an audio file.

        Args:
            audio_file_path (str): Path of the original audio file.
            volume_adjustment (float): Volume adjustment in dB. For example, 6.0 raises the volume by 6 dB and -3.0 lowers it by 3 dB.
            speed_adjustment (float): Speed adjustment as a percentage; positive speeds up, negative slows down. For example, 10 means +10% and -10 means -10%.

        Returns:
            str: Path of the adjusted audio file.

        Raises:
            ImportError: If the 'pydub' module is not installed.
            RuntimeError: If applying the audio effects fails.
        """
        if volume_adjustment == 0.0 and speed_adjustment == 0.0:
            return audio_file_path

        try:
            from pydub import AudioSegment
        except ImportError:
            raise ImportError("The 'pydub' module is required for audio adjustments. Please install it using 'pip install pydub'.")

        current_audio_file = audio_file_path
        base, ext = os.path.splitext(audio_file_path)

        try:
            audio = AudioSegment.from_file(current_audio_file)

            # Apply the volume adjustment (pydub overloads "+" as a gain in dB)
            if volume_adjustment != 0.0:
                adjusted_audio = audio + volume_adjustment
                new_file_path = f"{base}_vol_adjusted{ext}"
                adjusted_audio.export(new_file_path, format=ext[1:])
                os.remove(current_audio_file)
                current_audio_file = new_file_path
                audio = adjusted_audio
                print(f"Applied volume adjustment of {volume_adjustment} dB to {os.path.basename(current_audio_file)}")

            # Apply the speed adjustment. Note that pydub's speedup() is designed for
            # playback speeds above 1.0, so negative adjustments may not behave as expected.
            if speed_adjustment != 0.0:
                speed_multiplier = 1 + speed_adjustment / 100.0
                adjusted_audio = audio.speedup(playback_speed=speed_multiplier, chunk_size=150, crossfade=25)
                new_file_path = f"{base}_speed_adjusted{ext}"
                adjusted_audio.export(new_file_path, format=ext[1:])
                if current_audio_file != audio_file_path and os.path.exists(current_audio_file):  # Only delete current_audio_file when it is an intermediate file
                    os.remove(current_audio_file)
                else:  # Without a volume adjustment, current_audio_file is still the original file
                    os.remove(audio_file_path)
                current_audio_file = new_file_path
                print(f"Applied speed adjustment of {speed_adjustment}% to {os.path.basename(current_audio_file)}")

            return current_audio_file

        except Exception as e:
            # Clean up any intermediate files if an error occurs
            if current_audio_file != audio_file_path and os.path.exists(current_audio_file):
                os.remove(current_audio_file)
            raise RuntimeError(f"Error applying audio effects to {os.path.basename(audio_file_path)}: {e}")


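For intuition about the parameter semantics, the helper above boils down to these pydub calls. This is a standalone sketch; "clip.mp3" is a placeholder input path, and exporting MP3 assumes ffmpeg is available to pydub.

```python
from pydub import AudioSegment

audio = AudioSegment.from_file("clip.mp3")    # placeholder input file
louder = audio + 6.0                          # volume_adjustment=6.0 -> +6 dB gain
faster = louder.speedup(playback_speed=1.10)  # speed_adjustment=10 -> multiplier 1 + 10/100
faster.export("clip_adjusted.mp3", format="mp3")
```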
class IndexTTSAdapter(TTSAdapter):
    """
    TTS adapter implementation for IndexTTS.
    """
    def __init__(self, api_url_template: str, tts_extra_params: Optional[dict] = None):
        self.api_url_template = api_url_template
        self.tts_extra_params = tts_extra_params if tts_extra_params is not None else {}

    def generate_audio(self, text: str, voice_code: str, output_dir: str, volume_adjustment: float = 0.0, speed_adjustment: float = 0.0) -> str:
        encoded_text = urllib.parse.quote(text)

        # An "api_url" entry in tts_extra_params overrides the configured template.
        self.api_url_template = self.tts_extra_params.get("api_url", self.api_url_template)
        api_url = self.api_url_template.replace("{{text}}", encoded_text).replace("{{voiceCode}}", voice_code)

        if not api_url:
            raise ValueError("API URL is not configured for IndexTTS. Cannot generate audio.")

        try:
            print(f"Calling IndexTTS API with voice {voice_code}...")
            response = requests.get(api_url, stream=True, timeout=30)
            response.raise_for_status()

            temp_audio_file = os.path.join(output_dir, f"temp_audio_{uuid.uuid4()}.wav")
            with open(temp_audio_file, 'wb') as f:
                for chunk in response.iter_content(chunk_size=8192):
                    f.write(chunk)
            print(f"Generated {os.path.basename(temp_audio_file)}")
            # Apply volume and speed adjustments
            final_audio_file = self._apply_audio_effects(temp_audio_file, volume_adjustment, speed_adjustment)
            return final_audio_file

        except requests.exceptions.RequestException as e:
            raise RuntimeError(f"Error calling IndexTTS API with voice {voice_code}: {e}")
        except Exception as e:  # Catch other potential errors such as JSON parsing or data decoding
            raise RuntimeError(f"Error processing IndexTTS API response for voice {voice_code}: {e}")


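Wiring up one of the URL-template adapters might look like this. The endpoint is a made-up placeholder; only the {{text}} and {{voiceCode}} markers matter to the adapter.

```python
# Hypothetical endpoint; generate_audio substitutes {{text}} and {{voiceCode}} per call.
adapter = IndexTTSAdapter(
    api_url_template="https://tts.example.com/synthesize?text={{text}}&voice={{voiceCode}}"
)
# Assumes ./output already exists; returns the path of the (optionally adjusted) WAV file.
audio_path = adapter.generate_audio("你好,世界", voice_code="voice_01", output_dir="./output")
```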
class EdgeTTSAdapter(TTSAdapter):
    """
    TTS adapter implementation for EdgeTTS.
    """
    def __init__(self, api_url_template: str, tts_extra_params: Optional[dict] = None):
        self.api_url_template = api_url_template
        self.tts_extra_params = tts_extra_params if tts_extra_params is not None else {}

    def generate_audio(self, text: str, voice_code: str, output_dir: str, volume_adjustment: float = 0.0, speed_adjustment: float = 0.0) -> str:
        encoded_text = urllib.parse.quote(text)

        # An "api_url" entry in tts_extra_params overrides the configured template.
        self.api_url_template = self.tts_extra_params.get("api_url", self.api_url_template)
        api_url = self.api_url_template.replace("{{text}}", encoded_text).replace("{{voiceCode}}", voice_code)

        if not api_url:
            raise ValueError("API URL is not configured for EdgeTTS. Cannot generate audio.")

        try:
            print(f"Calling EdgeTTS API with voice {voice_code}...")
            response = requests.get(api_url, stream=True, timeout=30)
            response.raise_for_status()

            temp_audio_file = os.path.join(output_dir, f"temp_audio_{uuid.uuid4()}.mp3")
            with open(temp_audio_file, 'wb') as f:
                for chunk in response.iter_content(chunk_size=8192):
                    f.write(chunk)
            print(f"Generated {os.path.basename(temp_audio_file)}")
            # Apply volume and speed adjustments
            final_audio_file = self._apply_audio_effects(temp_audio_file, volume_adjustment, speed_adjustment)
            return final_audio_file

        except requests.exceptions.RequestException as e:
            raise RuntimeError(f"Error calling EdgeTTS API with voice {voice_code}: {e}")
        except Exception as e:  # Catch other potential errors such as JSON parsing or data decoding
            raise RuntimeError(f"Error processing EdgeTTS API response for voice {voice_code}: {e}")


# msgpack is imported lazily inside generate_audio so it is only required when FishAudio is used.
class FishAudioAdapter(TTSAdapter):
    """
    TTS adapter implementation for FishAudio.
    """
    def __init__(self, api_url: str, headers: dict, request_payload_template: dict, tts_extra_params: Optional[dict] = None):
        self.api_url = api_url
        self.headers = headers
        self.request_payload_template = request_payload_template
        self.tts_extra_params = tts_extra_params if tts_extra_params is not None else {}

    def generate_audio(self, text: str, voice_code: str, output_dir: str, volume_adjustment: float = 0.0, speed_adjustment: float = 0.0) -> str:
        try:
            import msgpack  # Deferred import of msgpack
        except ImportError:
            raise ImportError("The 'msgpack' module is required for FishAudioAdapter. Please install it using 'pip install msgpack'.")

        # Build the request payload
        payload = self.request_payload_template.copy()
        payload["text"] = text
        payload["reference_id"] = voice_code
        self.headers["Authorization"] = self.headers["Authorization"].replace("{{api_key}}", self.tts_extra_params["api_key"])

        # Serialize the request payload with msgpack
        packed_payload = msgpack.packb(payload, use_bin_type=True)

        try:
            print(f"Calling FishAudio API with voice {voice_code}...")
            response = requests.post(self.api_url, data=packed_payload, headers=self.headers, timeout=60)  # Increased timeout for FishAudio
            response.raise_for_status()

            temp_audio_file = os.path.join(output_dir, f"temp_audio_{uuid.uuid4()}.mp3")
            with open(temp_audio_file, "wb") as f:
                f.write(response.content)

            print(f"Generated {os.path.basename(temp_audio_file)}")
            # Apply volume and speed adjustments
            final_audio_file = self._apply_audio_effects(temp_audio_file, volume_adjustment, speed_adjustment)
            return final_audio_file

        except requests.exceptions.RequestException as e:
            raise RuntimeError(f"Error calling FishAudio API with voice {voice_code}: {e}")
        except Exception as e:  # Catch other potential errors such as JSON parsing or data decoding
            raise RuntimeError(f"Error processing FishAudio API response for voice {voice_code}: {e}")


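For anyone unfamiliar with msgpack: packb turns the payload dict into compact bytes that are sent as the raw request body. A standalone round-trip sketch, with illustrative field values unrelated to any specific FishAudio schema:

```python
import msgpack

payload = {"text": "hello", "reference_id": "voice_123", "format": "mp3"}  # illustrative fields
packed = msgpack.packb(payload, use_bin_type=True)  # dict -> bytes
restored = msgpack.unpackb(packed, raw=False)       # bytes -> dict
assert restored == payload
```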
class MinimaxAdapter(TTSAdapter):
    """
    TTS adapter implementation for Minimax.
    """
    def __init__(self, api_url: str, headers: dict, request_payload_template: dict, tts_extra_params: Optional[dict] = None):
        self.api_url = api_url
        self.headers = headers
        self.request_payload_template = request_payload_template
        self.tts_extra_params = tts_extra_params if tts_extra_params is not None else {}

    def generate_audio(self, text: str, voice_code: str, output_dir: str, volume_adjustment: float = 0.0, speed_adjustment: float = 0.0) -> str:

        # Build the request payload
        payload = self.request_payload_template.copy()
        payload["text"] = text
        payload["voice_setting"]["voice_id"] = voice_code
        self.headers["Authorization"] = self.headers["Authorization"].replace("{{api_key}}", self.tts_extra_params["api_key"])
        self.api_url = self.api_url.replace("{{group_id}}", self.tts_extra_params["group_id"])

        # When output_format is "hex", Minimax returns hex-encoded audio data that must be
        # decoded; otherwise it returns a URL pointing at the audio file.
        is_hex_output = payload.get("output_format") == "hex"

        try:
            print(f"Calling Minimax API with voice {voice_code}...")
            response = requests.post(self.api_url, json=payload, headers=self.headers, timeout=60)  # Increased timeout for Minimax
            response.raise_for_status()

            temp_audio_file = os.path.join(output_dir, f"temp_audio_{uuid.uuid4()}.mp3")
            response_data = response.json()
            # Parse and save the audio data
            if is_hex_output:
                audio_hex = response_data.get('data', {}).get('audio')
                if not audio_hex:
                    raise RuntimeError("Minimax API returned success but no hex audio data found.")
                audio_bytes = bytes.fromhex(audio_hex)
                with open(temp_audio_file, "wb") as f:
                    f.write(audio_bytes)
            else:
                audio_url = response_data.get('data', {}).get('audio')
                if not audio_url:
                    raise RuntimeError("Minimax API returned success but no audio URL found when output_format is not hex.")

                # Download the audio file
                audio_response = requests.get(audio_url, stream=True, timeout=30)
                audio_response.raise_for_status()
                with open(temp_audio_file, 'wb') as f:
                    for chunk in audio_response.iter_content(chunk_size=8192):
                        f.write(chunk)

            print(f"Generated {os.path.basename(temp_audio_file)}")
            # Apply volume and speed adjustments
            final_audio_file = self._apply_audio_effects(temp_audio_file, volume_adjustment, speed_adjustment)
            return final_audio_file

        except requests.exceptions.RequestException as e:
            raise RuntimeError(f"Error calling Minimax API with voice {voice_code}: {e}")
        except Exception as e:  # Catch other potential errors such as JSON parsing or data decoding
            raise RuntimeError(f"Error processing Minimax API response for voice {voice_code}: {e}")


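A request_payload_template for this adapter needs at least the fields the code touches. A hypothetical minimal shape, with placeholder values that are not taken from Minimax documentation:

```python
minimax_payload_template = {
    "model": "speech-model-name",       # placeholder model id, not touched by the adapter
    "text": "",                         # filled in per call
    "voice_setting": {"voice_id": ""},  # voice_id filled in per call
    "output_format": "hex",             # "hex" -> inline audio bytes; anything else -> audio URL
}
```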
class DoubaoTTSAdapter(TTSAdapter):
    """
    TTS adapter implementation for Doubao TTS.
    """
    def __init__(self, api_url: str, headers: dict, request_payload_template: dict, tts_extra_params: Optional[dict] = None):
        self.api_url = api_url
        self.headers = headers
        self.request_payload_template = request_payload_template
        self.tts_extra_params = tts_extra_params if tts_extra_params is not None else {}

    def generate_audio(self, text: str, voice_code: str, output_dir: str, volume_adjustment: float = 0.0, speed_adjustment: float = 0.0) -> str:
        session = requests.Session()
        try:
            payload = self.request_payload_template.copy()
            payload['req_params']['text'] = text
            payload['req_params']['speaker'] = voice_code
            self.headers["X-Api-App-Id"] = self.headers["X-Api-App-Id"].replace("{{X-Api-App-Id}}", self.tts_extra_params["X-Api-App-Id"])
            self.headers["X-Api-Access-Key"] = self.headers["X-Api-Access-Key"].replace("{{X-Api-Access-Key}}", self.tts_extra_params["X-Api-Access-Key"])

            print(f"Calling Doubao TTS API with voice {voice_code}...")
            response = session.post(self.api_url, headers=self.headers, json=payload, stream=True, timeout=30)
            response.raise_for_status()

            # The response is a stream of JSON lines; audio chunks arrive base64-encoded in "data".
            audio_data = bytearray()
            for chunk in response.iter_lines(decode_unicode=True):
                if not chunk:
                    continue
                data = json.loads(chunk)

                if data.get("code", 0) == 0 and "data" in data and data["data"]:
                    chunk_audio = base64.b64decode(data["data"])
                    audio_data.extend(chunk_audio)
                    continue
                if data.get("code", 0) == 0 and "sentence" in data and data["sentence"]:
                    continue
                if data.get("code", 0) == 20000000:  # Treated as the end-of-stream marker
                    break
                if data.get("code", 0) > 0:
                    raise RuntimeError(f"Doubao TTS API returned error: {data}")

            if not audio_data:
                raise RuntimeError("Doubao TTS API returned success but no audio data received.")

            temp_audio_file = os.path.join(output_dir, f"temp_audio_{uuid.uuid4()}.mp3")
            with open(temp_audio_file, "wb") as f:
                f.write(audio_data)

            print(f"Generated {os.path.basename(temp_audio_file)}")
            # Apply volume and speed adjustments
            final_audio_file = self._apply_audio_effects(temp_audio_file, volume_adjustment, speed_adjustment)
            return final_audio_file

        except requests.exceptions.RequestException as e:
            raise RuntimeError(f"Error calling Doubao TTS API with voice {voice_code}: {e}")
        except Exception as e:
            raise RuntimeError(f"Error processing Doubao TTS API response for voice {voice_code}: {e}")
        finally:
            session.close()


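To make the streaming protocol concrete, here is what the parsing loop above assumes each line looks like. The lines are synthetic and only model the fields the code actually reads (code, data, sentence):

```python
import base64
import json

# Synthetic stream, shaped like what the loop above expects.
lines = [
    json.dumps({"code": 0, "data": base64.b64encode(b"\x00\x01").decode()}),  # audio chunk
    json.dumps({"code": 0, "sentence": {"text": "hello"}}),                   # sentence metadata, skipped
    json.dumps({"code": 20000000}),                                           # end-of-stream marker
]
audio = bytearray()
for line in lines:
    data = json.loads(line)
    if data.get("code", 0) == 0 and data.get("data"):
        audio.extend(base64.b64decode(data["data"]))
print(len(audio))  # -> 2
```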
class GeminiTTSAdapter(TTSAdapter):
    """
    TTS adapter implementation for Gemini TTS.
    """
    def __init__(self, api_url: str, headers: dict, request_payload_template: dict, tts_extra_params: Optional[dict] = None):
        self.api_url = api_url
        self.headers = headers
        self.request_payload_template = request_payload_template
        self.tts_extra_params = tts_extra_params if tts_extra_params is not None else {}

    def generate_audio(self, text: str, voice_code: str, output_dir: str, volume_adjustment: float = 0.0, speed_adjustment: float = 0.0) -> str:
        try:
            # Build the request payload
            payload = self.request_payload_template.copy()
            model_name = payload['model']
            api_url = self.api_url.replace('{{model}}', model_name) if '{{model}}' in self.api_url else self.api_url

            # Update the request payload
            payload['contents'][0]['parts'][0]['text'] = text
            payload['generationConfig']['speechConfig']['voiceConfig']['prebuiltVoiceConfig']['voiceName'] = voice_code

            # Update the API key in the headers
            gemini_api_key = self.tts_extra_params.get('api_key')
            if not gemini_api_key:
                raise ValueError("API key is not configured for Gemini TTS. Cannot generate audio.")
            self.headers['x-goog-api-key'] = gemini_api_key

            print(f"Calling Gemini TTS API with voice {voice_code}...")
            response = requests.post(api_url, headers=self.headers, json=payload, timeout=60)
            response.raise_for_status()

            response_data = response.json()
            audio_data_base64 = response_data['candidates'][0]['content']['parts'][0]['inlineData']['data']
            audio_data_pcm = base64.b64decode(audio_data_base64)

            # Gemini returns raw PCM data, which must be wrapped in a WAV container
            temp_audio_file = os.path.join(output_dir, f"temp_audio_{uuid.uuid4()}.wav")  # Use a .wav extension
            import wave
            with wave.open(temp_audio_file, "wb") as f:
                f.setnchannels(1)
                f.setsampwidth(2)  # Assume 16-bit PCM
                f.setframerate(24000)  # Assume a 24 kHz sample rate
                f.writeframes(audio_data_pcm)

            print(f"Generated {os.path.basename(temp_audio_file)}")
            # Apply volume and speed adjustments
            final_audio_file = self._apply_audio_effects(temp_audio_file, volume_adjustment, speed_adjustment)
            return final_audio_file

        except requests.exceptions.RequestException as e:
            raise RuntimeError(f"Error calling Gemini TTS API with voice {voice_code}: {e}")
        except Exception as e:
            raise RuntimeError(f"Error processing Gemini TTS API response for voice {voice_code}: {e}")
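A natural way to wire these adapters into the rest of the server is a small factory keyed on the provider name. This is a hypothetical sketch; the registry, the provider name strings, and the idea that constructor arguments come from the app's TTS provider config are illustrative, not part of this file:

```python
# Hypothetical provider registry mapping config names to adapter classes.
ADAPTERS = {
    "indextts": IndexTTSAdapter,
    "edgetts": EdgeTTSAdapter,
    "fishaudio": FishAudioAdapter,
    "minimax": MinimaxAdapter,
    "doubao": DoubaoTTSAdapter,
    "gemini": GeminiTTSAdapter,
}

def create_adapter(provider: str, **kwargs) -> TTSAdapter:
    try:
        return ADAPTERS[provider](**kwargs)
    except KeyError:
        raise ValueError(f"Unknown TTS provider: {provider}")
```

Callers would then invoke, for example, `create_adapter("edgetts", api_url_template=...)` with per-provider constructor arguments taken from configuration.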