feat(api): Add FastAPI service endpoints and improve TTS configuration management

Implement FastAPI service endpoints for podcast generation task submission, status queries, and audio download
Refactor TTS configuration management to unify API URL handling across TTS providers
Update the README with API usage instructions and project badges
Add scheduled cleanup of the output directory to improve resource management
This commit is contained in:
hex2077
2025-08-11 22:09:18 +08:00
parent 924ff6ef83
commit c2930e4340
8 changed files with 665 additions and 321 deletions

README.md

@@ -9,13 +9,13 @@
---
## ✨ Core Highlights
* **🤖 AI-Driven Scripts**: Automatically create high-quality, in-depth podcast dialogue scripts with the power of OpenAI models.
* **👥 Multi-Role Support**: Freely define multiple podcast roles (such as host and guest) and assign a unique TTS voice to each.
* **🔌 Flexible TTS Integration**: Seamlessly connect to your self-hosted or third-party TTS services through a simple API URL configuration.
* **🔊 Intelligent Audio Synthesis**: Automatically and precisely splice each role's voice segments, with **volume and speed adjustment**, into a complete, smooth podcast audio file (`.wav` format).
* **⌨️ Convenient Command-Line Interface**: Clear command-line parameters give you full control over every step of podcast generation.
---
@@ -28,7 +28,7 @@
2. **FFmpeg**
* This project relies on FFmpeg for audio merging. Please visit the [FFmpeg official website](https://ffmpeg.org/download.html) to download and install it.
* **Important Note**: After installation, make sure the `ffmpeg` command is added to your system's PATH environment variable so the script can call it.
### 🐍 Python Dependencies
@@ -37,6 +37,12 @@
pip install requests openai pydub msgpack
```
> **Dependency Notes**:
> - `requests`: Sends HTTP requests to TTS service APIs
> - `openai`: Interacts with the OpenAI API to generate podcast scripts
> - `pydub`: Audio processing, such as volume and speed adjustment
> - `msgpack`: Efficient data serialization for certain TTS services (such as Fish Audio)
---
## 🚀 Quick Start
@@ -71,11 +77,62 @@ python podcast_generator.py [可选参数]
#### **Run Example**
```bash
# Generate a podcast using the gpt-4o model, the edge-tts service, and 4 threads
python podcast_generator.py --api-key sk-xxxxxx --model gpt-4o --tts-provider edge --threads 4
```
### 5. Using the Web API (main.py)
This project also provides a FastAPI-based web service that lets you generate podcasts via HTTP requests.
#### Start the Web Service
```bash
python main.py
```
By default, the service runs at `http://localhost:8000`.
#### API Endpoints
1. **Generate a podcast** - `POST /generate-podcast`
   - Parameters:
     - `api_key`: OpenAI API key
     - `base_url`: OpenAI API base URL (optional)
     - `model`: OpenAI model name (optional)
     - `input_txt_content`: Input text content
     - `tts_providers_config_content`: TTS provider configuration content
     - `podUsers_json_content`: Podcast user JSON configuration
     - `threads`: Number of threads (optional, default `1`)
     - `tts_provider`: TTS provider name (optional, default `"index-tts"`)
2. **Get podcast generation status** - `GET /podcast-status`
   - Requires an `X-Auth-Id` header
3. **Download a podcast** - `GET /download-podcast/`
   - Parameters:
     - `file_name`: Name of the file to download
4. **Get the voice list** - `GET /get-voices`
   - Parameters:
     - `tts_provider`: TTS provider name (optional, default `"tts"`)
#### API Usage Example
```bash
# After starting the service, use curl to send a podcast generation request
curl -X POST "http://localhost:8000/generate-podcast" \
  -H "X-Auth-Id: your-auth-id" \
  -F "api_key=sk-xxxxxx" \
  -F "model=gpt-4o" \
  -F "input_txt_content=The future development of artificial intelligence" \
  -F "tts_providers_config_content={\"index\": {\"api_key\": \"your-api-key\"}}" \
  -F "podUsers_json_content={\"podUsers\":[{\"code\":\"zh-CN-XiaoxiaoNeural\",\"role\":\"Host\"}],\"voices\":[{\"name\":\"Xiaoxiao\",\"code\":\"zh-CN-XiaoxiaoNeural\"}]}" \
  -F "threads=4" \
  -F "tts_provider=index-tts"
```
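After submitting a generation request, a client typically polls the status endpoint until the task finishes. The sketch below assumes the status response carries a `status` field (an assumption — the README does not specify the response shape), so the fetch step is injected as a callable to keep the sketch self-contained:

```python
import time

def wait_for_podcast(fetch_status, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll GET /podcast-status (via the injected fetch_status callable) until
    the task reports a terminal state. The status values are assumptions."""
    waited = 0.0
    while waited <= timeout:
        status = fetch_status()  # e.g. requests.get(url, headers={"X-Auth-Id": ...}).json()
        if status.get("status") in ("completed", "failed"):
            return status
        sleep(interval)
        waited += interval
    raise TimeoutError("podcast generation did not finish in time")

# Demonstration with a stubbed fetcher; a real client would wrap requests.get:
responses = iter([{"status": "running"},
                  {"status": "completed", "file_name": "podcast_1678886400.wav"}])
result = wait_for_podcast(lambda: next(responses), interval=1.0, sleep=lambda s: None)
```

Once `status` is `completed`, the reported `file_name` can be passed to `GET /download-podcast/`.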
### 4. Customizing AI Prompts (`custom` code block)
To provide more detailed instructions or extra context to the AI, you can embed a `custom` code block in the `input.txt` file. Its content is added as additional instructions to the core prompt for podcast script generation (`prompt-podscript.txt`), thereby influencing the AI's output.
@@ -91,23 +148,13 @@ python podcast_generator.py --api-key sk-xxxxxx --model gpt-4o --threads 4
```custom-end
```
**Effect**:
All text inside the `custom` code block (excluding the `custom-begin` and `custom-end` tags themselves) is extracted and appended to the processed content of the [`prompt/prompt-podscript.txt`](prompt/prompt-podscript.txt) template. These custom instructions directly influence the AI's decisions and style when generating the podcast dialogue script, helping you control the output more precisely.
**Example Scenario**:
If you want the AI to emphasize the future of a particular technology trend when discussing a tech topic, add the following to `input.txt`:
```
```custom-begin
Please foresightedly analyze the disruptive changes AI might bring in the next five years, and mention the potential impact of quantum computing on existing encryption technologies.
```custom-end
```
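The extraction described above amounts to pulling the text between the begin/end tags. A minimal sketch (the actual parsing logic in `podcast_generator.py` may differ):

```python
import re

def extract_custom_block(input_text: str) -> str:
    """Return the text between ```custom-begin and ```custom-end tags, if any."""
    match = re.search(r"```custom-begin\s*(.*?)\s*```custom-end",
                      input_text, re.DOTALL)
    return match.group(1) if match else ""

sample = "AI topic overview\n```custom-begin\nEmphasize quantum computing.\n```custom-end\n"
extra = extract_custom_block(sample)  # the instructions to append to the prompt
```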
---
## ⚙️ Configuration File Details
The configuration files are the "brain" of the whole project: they tell the script how to work with the AI and TTS services.
### `config/[tts-provider].json` (TTS Roles and Voice Configuration)
This is your core TTS configuration file; its filename corresponds to the provider specified via the `--tts-provider` parameter.
```json
{
@@ -124,94 +171,57 @@ python podcast_generator.py --api-key sk-xxxxxx --model gpt-4o --threads 4
  "voices": [
    {
      "name": "XiaoMin",
      "code": "yue-CN-XiaoMinNeural",
      "volume_adjustment": 1.0,
      "speed_adjustment": 5.0
    }
  ],
  "apiUrl": "http://localhost:5000/api/tts?text={{text}}&voiceCode={{voiceCode}}",
  "turnPattern": "random",
  "tts_max_retries": 3
}
```
* `podUsers`: Defines the **roles** in the podcast. Each role's `code` must correspond to a valid voice in the `voices` list.
* `voices`: Defines all available TTS **voices**.
* `volume_adjustment` (optional): Volume adjustment in dB. For example, `6.0` increases volume by 6 dB.
* `speed_adjustment` (optional): Speed adjustment in percent. For example, `10.0` increases speed by 10%.
* `apiUrl`: Your TTS service API endpoint. `{{text}}` and `{{voiceCode}}` are placeholders.
* `turnPattern`: Defines the **turn-taking pattern** for role dialogue, such as `random` or `sequential`.
* `tts_max_retries` (optional): Maximum number of retries when a TTS API call fails (default `3`).
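Filling the `apiUrl` placeholders amounts to a simple substitution, with the text URL-encoded since it is passed as a query parameter. A sketch (the project's adapter code may differ):

```python
from urllib.parse import quote

def build_tts_url(api_url_template: str, text: str, voice_code: str) -> str:
    """Fill the {{text}} / {{voiceCode}} placeholders from the config's apiUrl."""
    return (api_url_template
            .replace("{{text}}", quote(text))
            .replace("{{voiceCode}}", quote(voice_code)))

template = "http://localhost:5000/api/tts?text={{text}}&voiceCode={{voiceCode}}"
url = build_tts_url(template, "你好", "yue-CN-XiaoMinNeural")
```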
### `config/tts_providers.json` (TTS Provider Authentication)
This file centrally manages authentication information (such as API keys) for all TTS service providers.
```json
{
  "index": { "api_key": null },
  "edge": { "api_key": null },
  "doubao": { "X-Api-App-Id": "null", "X-Api-Access-Key": "null" },
  "fish": { "api_key": "null" },
  "minimax": { "group_id": "null", "api_key": "null" },
  "gemini": { "api_key": "null" }
}
```
**Note**: In actual use, replace `"null"` with valid authentication information. You can create a `tts_providers-local.json` to store real keys; this file is ignored by `.gitignore`.
---
## 🔌 Supported TTS Services
This project is designed to be highly flexible and supports multiple TTS services.
### 💻 Local TTS Support
You can deploy the following open-source projects as local TTS services and integrate them via the `apiUrl` configuration:
* **index-tts**: [https://github.com/index-tts/index-tts](https://github.com/index-tts/index-tts)
  * **Usage**: Run it together with `ext/index-tts-api.py`, which provides a simple API wrapper that exposes `index-tts` as a service this project can call.
* **edge-tts**: [https://github.com/zuoban/tts](https://github.com/zuoban/tts)
  * A general-purpose TTS library that can be integrated via a custom adapter.
| Provider | Type | Status |
| :--- | :--- | :---: |
| **Index-TTS** | Local | ✅ Supported |
| **Edge-TTS** | Local | ✅ Supported |
| **Doubao** | Cloud | ✅ Supported |
| **Minimax** | Cloud | ✅ Supported |
| **Fish Audio** | Cloud | ✅ Supported |
| **Gemini** | Cloud | ✅ Supported |
| **OpenAI TTS** | Cloud | Planned |
| **Azure TTS** | Cloud | Planned |
---
@@ -219,37 +229,20 @@ python podcast_generator.py --api-key sk-xxxxxx --model gpt-4o --threads 4
All successfully generated podcast audio files are automatically saved in the `output/` directory. The filename format is `podcast_` followed by a generation timestamp, e.g., `podcast_1678886400.wav`.
---
## 🎧 Sample Audio
You can find sample podcast audio generated with different TTS services in the `example/` folder.

| TTS Service | Listen |
| :--- | :--- |
| **Edge TTS** | [▶️ edgeTTS.wav](example/edgeTTS.wav) |
| **Index TTS** | [▶️ indexTTS.wav](example/indexTTS.wav) |
| **Doubao TTS** | [▶️ doubaoTTS.wav](example/doubaoTTS.wav) |
| **Minimax** | [▶️ minimax.wav](example/minimax.wav) |
| **Fish Audio** | [▶️ fish.wav](example/fish.wav) |
| **Gemini TTS** | [▶️ geminiTTS.wav](example/geminiTTS.wav) |
---
@@ -258,32 +251,28 @@ python podcast_generator.py --api-key sk-xxxxxx --model gpt-4o --threads 4
```
.
├── config/                 # ⚙️ Configuration directory
│   ├── doubao-tts.json     # ... (configs for each TTS provider)
│   └── tts_providers.json  # Unified TTS authentication file
├── prompt/                 # 🧠 AI prompt directory
│   ├── prompt-overview.txt
│   └── prompt-podscript.txt
├── example/                # 🎧 Sample audio directory
├── output/                 # 🎉 Output audio directory
├── input.txt               # 🎙️ Podcast topic input file
├── openai_cli.py           # OpenAI command-line tool
├── podcast_generator.py    # 🚀 Main script
├── tts_adapters.py         # TTS adapter file
├── README.md               # 📄 Project documentation (Chinese)
└── README_EN.md            # 📄 Project documentation (English)
```
---
## 📝 Disclaimer
* **License**: This project is licensed under [GPL-3.0](https://www.gnu.org/licenses/gpl-3.0.html).
* **No Warranty**: This software is provided "as is" without any express or implied warranties.
* **Limitation of Liability**: Under no circumstances shall the authors or copyright holders be liable for any damages arising from the use of this software.
* **Third-Party Services**: Users bear the risks and responsibilities of using third-party services (such as the OpenAI API and TTS services).
* **Usage Purpose**: This project is for learning and research purposes only. Please comply with all applicable laws and regulations.
* **Final Interpretation**: We reserve the right to modify this disclaimer at any time.

README_EN.md

@@ -1,20 +1,20 @@
# 🎙️ Simple Podcast Generator
> Easily transform your ideas into lively, engaging multi-person dialogue podcasts with one click!
> [中文版](README.md)
This is a powerful script tool that leverages the intelligence of the **OpenAI API** to generate insightful podcast scripts and, through **TTS (Text-to-Speech)** API services, transforms cold text into warm audio. You just need to provide a topic, and it handles the rest!
✨ The podcast script generation logic of this project is deeply inspired by the [SurfSense](https://github.com/MODSetter/SurfSense) project. We sincerely thank them for their open-source contribution!
---
## ✨ Core Highlights
* **🤖 AI-Driven Scripts**: Automatically create high-quality, in-depth podcast dialogue scripts with the power of OpenAI models.
* **👥 Multi-Character Support**: Freely define multiple podcast characters (such as host and guest) and assign a unique TTS voice to each.
* **🔌 Flexible TTS Integration**: Seamlessly connect to your self-hosted or third-party TTS services through simple API URL configuration.
* **🔊 Intelligent Audio Synthesis**: Automatically and precisely splice each character's voice segments, with **volume and speed adjustment**, into a complete, smooth podcast audio file (`.wav` format).
* **⌨️ Convenient Command-Line Interface**: Provides clear command-line parameters, giving you full control over every aspect of the podcast generation process.
---
@@ -27,8 +27,8 @@ This is a powerful script tool that leverages the intelligence of **OpenAI API**
* Please ensure Python 3 is installed on your system.
2. **FFmpeg**
* This project depends on FFmpeg for audio merging. Please visit the [FFmpeg official website](https://ffmpeg.org/download.html) to download and install it.
* **Important Note**: After installation, please ensure the `ffmpeg` command has been added to your system's PATH environment variable so that the script can call it.
### 🐍 Python Dependencies
@@ -37,6 +37,12 @@ Open your terminal or command prompt and install the required Python libraries u
pip install requests openai pydub msgpack
```
> **Dependency Explanation**:
> - `requests`: Used to send HTTP requests to TTS service APIs
> - `openai`: Used to interact with OpenAI API to generate podcast scripts
> - `pydub`: Used for audio processing, such as adjusting volume and speed
> - `msgpack`: Used for efficient data serialization with certain TTS services (such as Fish Audio)
---
## 🚀 Quick Start
@@ -45,210 +51,198 @@ pip install requests openai pydub msgpack
Before running, please ensure the following files are ready:
* `input.txt`: Enter your **podcast topic** or core ideas in this file.
* `prompt/prompt-overview.txt`: A system prompt that guides the AI in generating the podcast's **overall outline**.
* `prompt/prompt-podscript.txt`: A system prompt that guides the AI in generating the **detailed dialogue script**. It contains dynamic placeholders (such as `{{numSpeakers}}` and `{{turnPattern}}`), which the script replaces automatically.
### 2. Configure TTS Services and Characters
* TTS configuration files (such as `edge-tts.json`) live in the `config/` directory. Each file defines the TTS service's API interface, the podcast characters (`podUsers`), and their corresponding voices (`voices`).
### 3. Run the Script
Execute the following command in the project root directory:
```bash
python podcast_generator.py [optional parameters]
```
#### **Optional Parameters**
* `--api-key <YOUR_OPENAI_API_KEY>`: Your OpenAI API key. If not provided, it will be read from the configuration file or `OPENAI_API_KEY` environment variable.
* `--base-url <YOUR_OPENAI_BASE_URL>`: Proxy address of the OpenAI API. If not provided, it will be read from the configuration file or `OPENAI_BASE_URL` environment variable.
* `--model <OPENAI_MODEL_NAME>`: Specify the OpenAI model to use (such as `gpt-4o`, `gpt-4-turbo`). Default value is `gpt-3.5-turbo`.
* `--threads <NUMBER_OF_THREADS>`: Specify the number of parallel threads for audio generation (default is `1`) to improve processing speed.
#### **Run Example**
```bash
# Use gpt-4o model, edge-tts service and 4 threads to generate podcast
python podcast_generator.py --api-key sk-xxxxxx --model gpt-4o --tts-provider edge --threads 4
```
### 5. Using the Web API (main.py)
This project also provides a FastAPI-based web service that allows you to generate podcasts through HTTP requests.
#### Start Web Service
```bash
python main.py
```
By default, the service will run on `http://localhost:8000`.
#### API Endpoints
1. **Generate Podcast** - `POST /generate-podcast`
- Parameters:
- `api_key`: OpenAI API key
- `base_url`: OpenAI API base URL (optional)
- `model`: OpenAI model name (optional)
- `input_txt_content`: Input text content
- `tts_providers_config_content`: TTS provider configuration content
- `podUsers_json_content`: Podcast user JSON configuration
- `threads`: Number of threads (optional, default is 1)
- `tts_provider`: TTS provider name (optional, default is "index-tts")
2. **Get Podcast Generation Status** - `GET /podcast-status`
- Requires `X-Auth-Id` header
3. **Download Podcast** - `GET /download-podcast/`
- Parameters:
- `file_name`: Name of the file to download
4. **Get Voice List** - `GET /get-voices`
- Parameters:
- `tts_provider`: TTS provider name (optional, default is "tts")
#### API Usage Example
```bash
# After starting the service, use curl to send a request to generate podcast
curl -X POST "http://localhost:8000/generate-podcast" \
-H "X-Auth-Id: your-auth-id" \
-F "api_key=sk-xxxxxx" \
-F "model=gpt-4o" \
-F "input_txt_content=The future development of artificial intelligence" \
-F "tts_providers_config_content={\"index\": {\"api_key\": \"your-api-key\"}}" \
-F "podUsers_json_content={\"podUsers\":[{\"code\":\"zh-CN-XiaoxiaoNeural\",\"role\":\"Host\"}],\"voices\":[{\"name\":\"Xiaoxiao\",\"code\":\"zh-CN-XiaoxiaoNeural\"}]}" \
-F "threads=4" \
-F "tts_provider=index-tts"
```
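The curl call above maps to multipart form fields plus one header. A helper that only assembles the request arguments keeps that shape explicit and easy to check; `build_generate_request` is a hypothetical client-side helper, not part of the project:

```python
import json

def build_generate_request(auth_id, api_key, input_text, pod_users, voices,
                           model="gpt-4o", threads=1, tts_provider="index-tts"):
    """Assemble the headers and form data for POST /generate-podcast."""
    return {
        "url": "http://localhost:8000/generate-podcast",
        "headers": {"X-Auth-Id": auth_id},
        "data": {
            "api_key": api_key,
            "model": model,
            "input_txt_content": input_text,
            # podUsers and voices travel as one JSON-encoded form field
            "podUsers_json_content": json.dumps(
                {"podUsers": pod_users, "voices": voices}),
            "threads": str(threads),
            "tts_provider": tts_provider,
        },
    }

req = build_generate_request(
    "your-auth-id", "sk-xxxxxx",
    "The future development of artificial intelligence",
    [{"code": "zh-CN-XiaoxiaoNeural", "role": "Host"}],
    [{"name": "Xiaoxiao", "code": "zh-CN-XiaoxiaoNeural"}],
    threads=4)
# A real client would then send it with:
# requests.post(req["url"], headers=req["headers"], data=req["data"])
```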
### 4. Customizing AI Prompts (`custom` code block)
To provide more detailed AI instructions or add specific context, you can embed `custom` code blocks in the `input.txt` file. The content in this code block will be used as additional instructions, built into the core prompt for podcast script generation (`prompt-podscript.txt`), thereby influencing the AI's generation behavior.
**Usage**:
In the `input.txt` file, define your custom content anywhere using the following format:
```
```custom-begin
Additional instructions or context you want to provide to the AI, for example:
- "Please ensure the discussion includes an in-depth analysis of [specific concept]."
- "Please add some humor to the conversation, especially jokes about [a certain topic]."
- "All characters' speeches must be brief, with each sentence not exceeding two lines."
```custom-end
```
**Effect**:
All text content within the `custom` code block (excluding the `custom-begin` and `custom-end` tags themselves) is extracted and appended to the processed content of the [`prompt/prompt-podscript.txt`](prompt/prompt-podscript.txt) template. These custom instructions directly influence the AI's decisions and style when generating the podcast dialogue script, helping you control the output more precisely.
**Example Scenario**:
If you want the AI to particularly emphasize the future development of a certain technological trend when discussing a tech topic, you can add this to `input.txt`:
```
```custom-begin
Please foresightedly analyze the disruptive changes AI might bring in the next five years, and mention the potential impact of quantum computing on existing encryption technologies.
```custom-end
```
---
## ⚙️ Configuration File Details
The configuration file is the "brain" of the entire project, telling the script how to work with AI and TTS services.
### `config/[tts-provider].json` (TTS Character and Voice Configuration)
This is your core TTS configuration file, with the filename corresponding to the provider specified by the `--tts-provider` parameter. It tells the script how to work with the TTS service.
```json
{
  "podUsers": [
    {
      "code": "zh-CN-XiaoxiaoNeural",
      "role": "Host"
    },
    {
      "code": "zh-CN-YunxiNeural",
      "role": "Tech Expert"
    }
  ],
  "voices": [
    {
      "name": "XiaoMin",
      "code": "yue-CN-XiaoMinNeural",
      "volume_adjustment": 1.0,
      "speed_adjustment": 5.0
    }
  ],
  "apiUrl": "http://localhost:5000/api/tts?text={{text}}&voiceCode={{voiceCode}}",
  "turnPattern": "random",
  "tts_max_retries": 3
}
```
* `podUsers`: Defines the **characters** in the podcast. Each character's `code` must correspond to a valid voice in the `voices` list.
* `voices`: Defines all available TTS **voices**.
* `volume_adjustment` (optional): Volume adjustment in dB. For example, `6.0` increases volume by 6 dB.
* `speed_adjustment` (optional): Speed adjustment in percent. For example, `10.0` increases speed by 10%.
* `apiUrl`: Your TTS service API endpoint. `{{text}}` and `{{voiceCode}}` are placeholders.
* `turnPattern`: Defines the **turn-taking pattern** for character dialogue, such as `random` or `sequential`.
* `tts_max_retries` (optional): Maximum number of retries when a TTS API call fails (default `3`).
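For reference, a gain in decibels corresponds to a linear amplitude factor of 10^(dB/20), so a `volume_adjustment` of `6.0` roughly doubles the amplitude:

```python
def amplitude_factor(db: float) -> float:
    """Linear amplitude multiplier corresponding to a gain in decibels."""
    return 10 ** (db / 20)

factor = amplitude_factor(6.0)  # ~1.995, i.e. roughly 2x amplitude
```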
### `config/tts_providers.json` (TTS Provider Authentication)
This file is used to centrally manage authentication information (such as API keys) for all TTS service providers.
```json
{
  "index": { "api_key": null },
  "edge": { "api_key": null },
  "doubao": { "X-Api-App-Id": "null", "X-Api-Access-Key": "null" },
  "fish": { "api_key": "null" },
  "minimax": { "group_id": "null", "api_key": "null" },
  "gemini": { "api_key": "null" }
}
```
**Note**: In actual use, please replace `"null"` with valid authentication information. You can create a `tts_providers-local.json` to store real keys; this file is ignored by `.gitignore`.
---
## 🔌 Supported TTS Services
This project is designed to be highly flexible and supports multiple TTS services.
### 💻 Local TTS Interface Support
You can deploy the following open-source projects as local TTS services and integrate them into this project via `apiUrl` configuration:
* **index-tts**: [https://github.com/index-tts/index-tts](https://github.com/index-tts/index-tts)
  * **Usage**: Run it together with `ext/index-tts-api.py`, which provides a simple API wrapper that exposes `index-tts` as a service this project can call.
* **edge-tts**: [https://github.com/zuoban/tts](https://github.com/zuoban/tts)
* This is a general TTS library that you can integrate by customizing an adapter.
| Provider | Type | Support Status |
| :--- | :--- | :---: |
| **Index-TTS** | Local | ✅ Supported |
| **Edge-TTS** | Local | ✅ Supported |
| **Doubao** | Network | ✅ Supported |
| **Minimax** | Network | ✅ Supported |
| **Fish Audio**| Network | ✅ Supported |
| **Gemini** | Network | ✅ Supported |
| **OpenAI TTS**| Network | Planned |
| **Azure TTS** | Network | Planned |
---
## 🎉 Output Results
All successfully generated podcast audio files are automatically saved in the `output/` directory. The filename format is `podcast_` followed by a generation timestamp, e.g., `podcast_1678886400.wav`.
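This commit also adds a scheduled cleanup of the output directory. The core of such a cleanup is deleting generated files older than a cutoff; the retention period and scheduling mechanism below are assumptions, not the project's actual values:

```python
import os
import tempfile
import time
from pathlib import Path

def clean_output_dir(output_dir="output", max_age_seconds=24 * 3600, now=None):
    """Delete podcast_* files older than max_age_seconds; return removed names."""
    now = time.time() if now is None else now
    removed = []
    for entry in Path(output_dir).glob("podcast_*"):
        if now - entry.stat().st_mtime > max_age_seconds:
            entry.unlink()
            removed.append(entry.name)
    return removed

# Demonstration in a throwaway directory:
with tempfile.TemporaryDirectory() as d:
    stale = Path(d) / "podcast_old.wav"
    stale.write_bytes(b"")
    os.utime(stale, (time.time() - 48 * 3600,) * 2)  # backdate by 48 hours
    (Path(d) / "podcast_new.wav").write_bytes(b"")
    removed = clean_output_dir(d)  # only the stale file is removed
```

A real deployment would invoke this periodically, e.g. from a background thread or scheduler.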
---
## 🎧 Sample Audio
You can find sample podcast audio generated using different TTS services in the `example/` folder.
| TTS Service | Listen Link |
| :--- | :--- |
| **Edge TTS** | [▶️ edgeTTS.wav](example/edgeTTS.wav) |
| **Index TTS** | [▶️ indexTTS.wav](example/indexTTS.wav) |
| **Doubao TTS** | [▶️ doubaoTTS.wav](example/doubaoTTS.wav) |
| **Minimax** | [▶️ minimax.wav](example/minimax.wav) |
| **Fish Audio**| [▶️ fish.wav](example/fish.wav) |
| **Gemini TTS**| [▶️ geminiTTS.wav](example/geminiTTS.wav) |
---
@@ -256,31 +250,29 @@ These audio files demonstrate the actual effect of this tool in practical applic
```
.
├── config/                 # ⚙️ Configuration directory
│   ├── doubao-tts.json     # ... (configuration for each TTS provider)
│   └── tts_providers.json  # Unified TTS authentication file
├── prompt/                 # 🧠 AI prompt directory
│   ├── prompt-overview.txt
│   └── prompt-podscript.txt
├── example/                # 🎧 Sample audio directory
├── output/                 # 🎉 Output audio directory
├── input.txt               # 🎙️ Podcast topic input file
├── openai_cli.py           # OpenAI command-line tool
├── podcast_generator.py    # 🚀 Main script
├── tts_adapters.py         # TTS adapter file
├── README.md               # 📄 Project documentation (Chinese)
└── README_EN.md            # 📄 Project documentation (English)
```
---
## 📝 Disclaimer
* **License**: This project is licensed under [GPL-3.0](https://www.gnu.org/licenses/gpl-3.0.html).
* **No Warranty**: This software is provided "as is" without any express or implied warranties.
* **Liability Limitation**: Under no circumstances shall the authors or copyright holders be liable for any damages arising from the use of this software.
* **Third-Party Services**: Users bear the risks and responsibilities of using third-party services (such as OpenAI API, TTS services) on their own.
* **Usage Purpose**: This project is for learning and research purposes only. Please comply with all applicable laws and regulations.
* **Final Interpretation Rights**: We reserve the right to modify this disclaimer at any time.

View File

@@ -2231,7 +2231,7 @@
"speed_adjustment": 0
}
],
"apiUrl": "http://192.168.1.178:7899/tts?t={{text}}&v={{voiceCode}}&r=5",
"apiUrl": "{{api_url}}",
"podUsers": [
{"role": "酒馆主理人", "code": "zh-CN-Yunyi:DragonHDFlashLatestNeural"},
{"role": "科技爱好者", "code": "zh-CN-Xiaochen:DragonHDFlashLatestNeural"}

View File

@@ -37,7 +37,7 @@
"locale": "zh-CN",
"gender": "Male",
"usedname": "大同",
"volume_adjustment": -1,
"volume_adjustment":3,
"speed_adjustment": 0
},
{
@@ -47,8 +47,8 @@
"locale": "zh-CN",
"gender": "Female",
"usedname": "凯琪",
"volume_adjustment": 0,
"speed_adjustment": 0
"volume_adjustment": 5,
"speed_adjustment": 8
},
{
"name": "Daibei",
@@ -61,7 +61,7 @@
"speed_adjustment": 0
}
],
"apiUrl": "http://192.168.1.232:7899/synthesize?text={{text}}&server_audio_prompt_path={{voiceCode}}",
"apiUrl": "{{api_url}}",
"podUsers": [
{"role": "节目主持人", "code": "zh-CN-DatongIndex"},
{"role": "科技爱好者", "code": "zh-CN-KaiQiIndex"}

View File

@@ -1,9 +1,9 @@
{
"index": {
"api_key": null
"api_url": null
},
"edge": {
"api_key": null
"api_url": null
},
"doubao": {
"X-Api-App-Id": "null",

287
main.py Normal file
View File

@@ -0,0 +1,287 @@
from fastapi import FastAPI, Request, HTTPException, Depends, Form, Header
from fastapi.responses import FileResponse, JSONResponse
from typing import Optional, Dict
import uuid
import asyncio
from starlette.background import BackgroundTasks
from uuid import UUID
import hashlib
import hmac
import time
import os
import json
import argparse
from enum import Enum
import shutil
import schedule
import threading
from podcast_generator import generate_podcast_audio
class TaskStatus(str, Enum):
PENDING = "pending"
RUNNING = "running"
COMPLETED = "completed"
FAILED = "failed"
app = FastAPI()
# Global flag to signal the scheduler thread to stop
stop_scheduler_event = threading.Event()
# Global reference for the scheduler thread
scheduler_thread = None
# Global configuration
output_dir = "output"
# Define a function to clean the output directory
def clean_output_directory():
"""Removes files from the output directory that are older than 30 minutes."""
print(f"Cleaning output directory: {output_dir}")
now = time.time()
# 30 minutes in seconds
threshold = 30 * 60
for filename in os.listdir(output_dir):
file_path = os.path.join(output_dir, filename)
try:
if os.path.isfile(file_path) or os.path.islink(file_path):
# Get last modification time
if now - os.path.getmtime(file_path) > threshold:
os.unlink(file_path)
print(f"Deleted old file: {file_path}")
elif os.path.isdir(file_path):
# Optionally, recursively delete old subdirectories or files within them
# For now, just skip directories
pass
except Exception as e:
print(f"Failed to delete {file_path}. Reason: {e}")
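The cleanup rule above is just a comparison against each file's mtime. A condensed, testable sketch of the same policy (the helper name `clean_old_files` is ours, not part of the project):

```python
import os
import tempfile
import time

def clean_old_files(directory: str, max_age_seconds: int = 30 * 60) -> list:
    """Delete regular files in `directory` older than max_age_seconds;
    return the paths removed (same policy as clean_output_directory)."""
    removed = []
    now = time.time()
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) > max_age_seconds:
            os.unlink(path)
            removed.append(path)
    return removed

with tempfile.TemporaryDirectory() as d:
    old_path, new_path = os.path.join(d, "old.wav"), os.path.join(d, "new.wav")
    for p in (old_path, new_path):
        open(p, "w").close()
    stale = time.time() - 31 * 60  # backdate old.wav by 31 minutes
    os.utime(old_path, (stale, stale))
    removed = clean_old_files(d)
    survived = os.path.exists(new_path)
```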
# In-memory store for task results, keyed by auth_id and then task_id:
# {auth_id: {task_id: {"status": TaskStatus, "result": any, "timestamp": float}}}
task_results: Dict[str, Dict[UUID, Dict]] = {}
# Configuration for signature verification
SECRET_KEY = os.getenv("PODCAST_API_SECRET_KEY", "your-super-secret-key") # Change this in production!
# Define a mapping from tts_provider names to their config file paths
tts_provider_map = {
"index-tts": "config/index-tts.json",
"doubao-tts": "config/doubao-tts.json",
"edge-tts": "config/edge-tts.json",
"fish-audio": "config/fish-audio.json",
"gemini-tts": "config/gemini-tts.json",
"minimax": "config/minimax.json",
}
async def get_auth_id(x_auth_id: str = Header(..., alias="X-Auth-Id")):
"""
Dependency to get X-Auth-Id from headers.
"""
if not x_auth_id:
raise HTTPException(status_code=400, detail="Missing X-Auth-Id header.")
return x_auth_id
async def verify_signature(request: Request):
"""
Verify the 'sign' parameter in the request headers or query parameters.
Expected signature format: SHA256(timestamp + SECRET_KEY)
"""
timestamp = request.headers.get("X-Timestamp") or request.query_params.get("timestamp")
client_sign = request.headers.get("X-Sign") or request.query_params.get("sign")
if not timestamp or not client_sign:
raise HTTPException(status_code=400, detail="Missing X-Timestamp or X-Sign header/query parameter.")
try:
current_time = int(time.time())
if abs(current_time - int(timestamp)) > 300:
raise HTTPException(status_code=400, detail="Request expired or timestamp is too far in the future.")
message = f"{timestamp}{SECRET_KEY}".encode('utf-8')
server_sign = hmac.new(SECRET_KEY.encode('utf-8'), message, hashlib.sha256).hexdigest()
if server_sign != client_sign:
raise HTTPException(status_code=401, detail="Invalid signature.")
except ValueError:
raise HTTPException(status_code=400, detail="Invalid timestamp format.")
except Exception as e:
raise HTTPException(status_code=500, detail=f"Signature verification error: {e}")
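Note that, despite the docstring's `SHA256(timestamp + SECRET_KEY)`, the code actually computes an HMAC-SHA256 keyed with the secret over that concatenation, so a client must mirror it exactly. A minimal signing sketch (secret and header names taken from the code above):

```python
import hashlib
import hmac
import time

def make_sign(secret_key: str, timestamp: int) -> str:
    """Build the X-Sign value verify_signature expects: HMAC-SHA256 over
    f"{timestamp}{secret_key}", keyed with secret_key itself."""
    message = f"{timestamp}{secret_key}".encode("utf-8")
    return hmac.new(secret_key.encode("utf-8"), message, hashlib.sha256).hexdigest()

ts = int(time.time())
sign = make_sign("your-super-secret-key", ts)
# Send as: headers={"X-Timestamp": str(ts), "X-Sign": sign}
```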
async def _generate_podcast_task(
task_id: UUID,
auth_id: str,
api_key: str,
base_url: str,
model: str,
input_txt_content: str,
tts_providers_config_content: str,
podUsers_json_content: str,
threads: int,
tts_provider: str
):
task_results[auth_id][task_id]["status"] = TaskStatus.RUNNING
try:
parser = argparse.ArgumentParser(description="Generate podcast script and audio using OpenAI and local TTS.")
parser.add_argument("--api-key", default=api_key, help="OpenAI API key.")
parser.add_argument("--base-url", default=base_url, help="OpenAI API base URL (default: https://api.openai.com/v1).")
parser.add_argument("--model", default=model, help="OpenAI model to use (default: gpt-3.5-turbo).")
parser.add_argument("--threads", type=int, default=threads, help="Number of threads to use for audio generation (default: 1).")
args = parser.parse_args([])
actual_config_path = tts_provider_map.get(tts_provider)
if not actual_config_path:
raise ValueError(f"Invalid tts_provider: {tts_provider}.") # Changed from HTTPException to ValueError
output_filepath = await asyncio.to_thread(
generate_podcast_audio,
args=args,
config_path=actual_config_path,
input_txt_content=input_txt_content.strip(),
tts_providers_config_content=tts_providers_config_content.strip(),
podUsers_json_content=podUsers_json_content.strip()
)
task_results[auth_id][task_id]["status"] = TaskStatus.COMPLETED
task_results[auth_id][task_id]["result"] = output_filepath
print(f"\nPodcast generation completed for task {task_id}. Output file: {output_filepath}")
except Exception as e:
task_results[auth_id][task_id]["status"] = TaskStatus.FAILED
task_results[auth_id][task_id]["result"] = str(e)
print(f"\nPodcast generation failed for task {task_id}: {e}")
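The store drives a simple per-task lifecycle: `PENDING` on submission, `RUNNING` when the background task starts, then `COMPLETED` (with the output file) or `FAILED` (with the error string). A condensed sketch of that bookkeeping (the `submit`/`finish` helpers are ours, for illustration):

```python
import time
import uuid
from enum import Enum

class TaskStatus(str, Enum):  # mirrors the enum at the top of main.py
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"

task_results: dict = {}

def submit(auth_id: str):
    """Register a new PENDING task for this auth_id and return its id."""
    task_id = uuid.uuid4()
    task_results.setdefault(auth_id, {})[task_id] = {
        "status": TaskStatus.PENDING, "result": None, "timestamp": time.time()}
    return task_id

def finish(auth_id: str, task_id, ok: bool, result):
    """Record the terminal state, storing either the output file or the error."""
    entry = task_results[auth_id][task_id]
    entry["status"] = TaskStatus.COMPLETED if ok else TaskStatus.FAILED
    entry["result"] = result

tid = submit("demo")
task_results["demo"][tid]["status"] = TaskStatus.RUNNING  # background task picks it up
finish("demo", tid, True, "podcast_123.wav")
```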
# @app.post("/generate-podcast", dependencies=[Depends(verify_signature)])
@app.post("/generate-podcast")
async def generate_podcast_submission(
background_tasks: BackgroundTasks,
auth_id: str = Depends(get_auth_id),
api_key: str = Form(..., description="OpenAI API key."),
base_url: str = Form("https://api.openai.com/v1"),
model: str = Form("gpt-3.5-turbo"),
input_txt_content: str = Form(...),
tts_providers_config_content: str = Form(...),
podUsers_json_content: str = Form(...),
threads: int = Form(1),
tts_provider: str = Form("index-tts")
):
# 1. Validate tts_provider
if tts_provider not in tts_provider_map:
raise HTTPException(status_code=400, detail=f"Invalid tts_provider: {tts_provider}.")
# 2. Check for existing running tasks for this auth_id
if auth_id in task_results:
for existing_task_id, existing_task_info in task_results[auth_id].items():
if existing_task_info["status"] == TaskStatus.RUNNING or existing_task_info["status"] == TaskStatus.PENDING:
raise HTTPException(status_code=409, detail=f"There is already a running task (ID: {existing_task_id}) for this auth_id. Please wait for it to complete.")
task_id = uuid.uuid4()
if auth_id not in task_results:
task_results[auth_id] = {}
task_results[auth_id][task_id] = {
"status": TaskStatus.PENDING,
"result": None,
"timestamp": time.time()
}
background_tasks.add_task(
_generate_podcast_task,
task_id,
auth_id,
api_key,
base_url,
model,
input_txt_content,
tts_providers_config_content,
podUsers_json_content,
threads,
tts_provider
)
return {"message": "Podcast generation started.", "task_id": task_id}
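A hypothetical client for this endpoint: the server address, API key, and voice codes below are placeholders, and the actual HTTP call is left as a comment so the sketch stands alone:

```python
import json

SERVER = "http://localhost:8000"  # placeholder for your deployment

def build_submission(input_text: str, pod_users: list, providers: dict) -> dict:
    """Assemble the form fields /generate-podcast expects; tts_provider
    must be one of the keys of tts_provider_map."""
    return {
        "api_key": "sk-xxxxxx",
        "base_url": "https://api.openai.com/v1",
        "model": "gpt-3.5-turbo",
        "input_txt_content": input_text,
        "tts_providers_config_content": json.dumps(providers),
        "podUsers_json_content": json.dumps(pod_users),
        "threads": 2,
        "tts_provider": "edge-tts",
    }

form = build_submission(
    "Source material for a discussion about open-source software",
    [{"role": "节目主持人", "code": "zh-CN-YunxiNeural"}],
    {"edge": {"api_url": None}},
)
# With the server running, submit it like so (requires `requests`):
#   resp = requests.post(f"{SERVER}/generate-podcast", data=form,
#                        headers={"X-Auth-Id": "demo-user"})
#   task_id = resp.json()["task_id"]
```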
# @app.get("/podcast-status", dependencies=[Depends(verify_signature)])
@app.get("/podcast-status")
async def get_podcast_status(
auth_id: str = Depends(get_auth_id)
):
if auth_id not in task_results:
return {"message": "No tasks found for this auth_id.", "tasks": []}
all_tasks_for_auth_id = []
for task_id, task_info in task_results[auth_id].items():
all_tasks_for_auth_id.append({
"task_id": task_id,
"status": task_info["status"],
"result": task_info["result"] if task_info["status"] == TaskStatus.COMPLETED else None,
"error": task_info["result"] if task_info["status"] == TaskStatus.FAILED else None,
"timestamp": task_info["timestamp"]
})
return {"message": "Tasks retrieved successfully.", "tasks": all_tasks_for_auth_id}
@app.get("/download-podcast/")
async def download_podcast(file_name: str):
file_path = os.path.join(output_dir, file_name)
if not os.path.exists(file_path):
raise HTTPException(status_code=404, detail="File not found.")
return FileResponse(file_path, media_type='audio/mpeg', filename=file_name)
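`/podcast-status` returns every task for the auth id, so a client typically polls, picks the newest entry, and downloads once it reaches a terminal state. The pure decision logic, with the HTTP calls left as comments (helper names are ours):

```python
TERMINAL = {"completed", "failed"}

def pick_latest(tasks: list) -> dict:
    """Return the most recently submitted task from a /podcast-status payload."""
    return max(tasks, key=lambda t: t["timestamp"])

def is_done(task: dict) -> bool:
    return task["status"] in TERMINAL

# Shape follows the response built by get_podcast_status above.
sample = {"message": "Tasks retrieved successfully.", "tasks": [
    {"task_id": "a", "status": "completed", "result": "podcast_1.wav",
     "error": None, "timestamp": 100.0},
    {"task_id": "b", "status": "running", "result": None,
     "error": None, "timestamp": 200.0},
]}
latest = pick_latest(sample["tasks"])
# Once is_done(latest) and latest["status"] == "completed", fetch the audio:
#   requests.get(f"{SERVER}/download-podcast/", params={"file_name": latest["result"]})
```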
@app.get("/get-voices")
async def get_voices(tts_provider: str = "index-tts"):
config_path = tts_provider_map.get(tts_provider)
if not config_path:
raise HTTPException(status_code=400, detail=f"Invalid tts_provider: {tts_provider}.")
try:
with open(config_path, 'r', encoding='utf-8') as f:
config_data = json.load(f)
voices = config_data.get("voices")
if voices is None:
raise HTTPException(status_code=404, detail=f"No 'voices' key found in config for {tts_provider}.")
return {"tts_provider": tts_provider, "voices": voices}
except FileNotFoundError:
raise HTTPException(status_code=404, detail=f"Config file not found for {tts_provider}: {config_path}")
except json.JSONDecodeError:
raise HTTPException(status_code=500, detail=f"Error decoding JSON from config file for {tts_provider}: {config_path}. Please check file format.")
except Exception as e:
raise HTTPException(status_code=500, detail=f"An unexpected error occurred: {e}")
@app.get("/")
async def read_root():
return {"message": "FastAPI server is running!"}
def run_scheduler():
"""Runs the scheduler in a loop until the stop event is set."""
while not stop_scheduler_event.is_set():
schedule.run_pending()
time.sleep(1) # Check every second for new jobs or stop event
@app.on_event("startup")
async def startup_event():
global scheduler_thread
# Ensure the output directory exists
os.makedirs(output_dir, exist_ok=True)
# Schedule the cleaning task to run every 30 minutes
schedule.every(30).minutes.do(clean_output_directory)
# Start the scheduler in a separate thread
if scheduler_thread is None or not scheduler_thread.is_alive():
scheduler_thread = threading.Thread(target=run_scheduler, daemon=True)
scheduler_thread.start()
print("FastAPI app started. Scheduled output directory cleaning.")
@app.on_event("shutdown")
async def shutdown_event():
global scheduler_thread
# Signal the scheduler thread to stop
stop_scheduler_event.set()
# Wait for the scheduler thread to finish (optional, but good practice)
if scheduler_thread is not None and scheduler_thread.is_alive():
scheduler_thread.join(timeout=5) # Wait for max 5 seconds
if scheduler_thread.is_alive():
print("Scheduler thread did not terminate gracefully.")
print("FastAPI app shutting down.")
if __name__ == "__main__":
import uvicorn
uvicorn.run(app, host="0.0.0.0", port=8000)

View File

@@ -13,6 +13,7 @@ from datetime import datetime
from openai_cli import OpenAICli # Moved to top for proper import
import urllib.parse # For URL encoding
import re # For regular expression operations
from typing import Optional, Tuple
from tts_adapters import TTSAdapter, IndexTTSAdapter, EdgeTTSAdapter, FishAudioAdapter, MinimaxAdapter, DoubaoTTSAdapter, GeminiTTSAdapter # Import TTS adapters
# Global configuration
@@ -28,6 +29,16 @@ def read_file_content(filepath):
except FileNotFoundError:
raise FileNotFoundError(f"Error: File not found at {filepath}")
def _load_json_config(file_path: str) -> dict:
"""Loads a JSON configuration file."""
try:
with open(file_path, 'r', encoding='utf-8') as f:
return json.load(f)
except FileNotFoundError:
raise FileNotFoundError(f"Error: Configuration file not found at {file_path}")
except json.JSONDecodeError as e:
raise ValueError(f"Error decoding JSON from {file_path}: {e}")
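A quick check of `_load_json_config`'s error contract (a valid file parses, malformed JSON becomes `ValueError`, a missing file stays `FileNotFoundError`), re-creating the same function against a throwaway directory:

```python
import json
import os
import tempfile

def _load_json_config(file_path: str) -> dict:
    """Load a JSON configuration file (same behavior as above)."""
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            return json.load(f)
    except FileNotFoundError:
        raise FileNotFoundError(f"Error: Configuration file not found at {file_path}")
    except json.JSONDecodeError as e:
        raise ValueError(f"Error decoding JSON from {file_path}: {e}")

with tempfile.TemporaryDirectory() as d:
    good, bad = os.path.join(d, "edge-tts.json"), os.path.join(d, "broken.json")
    with open(good, "w", encoding="utf-8") as f:
        f.write('{"apiUrl": "{{api_url}}"}')
    with open(bad, "w", encoding="utf-8") as f:
        f.write('{not json')
    loaded = _load_json_config(good)
    try:
        _load_json_config(bad)
        bad_error = None
    except ValueError as e:
        bad_error = str(e)
    try:
        _load_json_config(os.path.join(d, "missing.json"))
        missing_error = None
    except FileNotFoundError as e:
        missing_error = str(e)
```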
def select_json_config(config_dir='config', return_file_path=False):
"""
Reads JSON files from the specified directory and allows the user to select one.
@@ -42,7 +53,7 @@ def select_json_config(config_dir='config', return_file_path=False):
print(f"Found JSON configuration files in '{config_dir}':")
for i, file_path in enumerate(json_files):
file_name = os.path.basename(file_path)
if file_name != "tts_providers.json":
if file_name != os.path.basename(tts_providers_config_path):
valid_json_files.append(file_path)
print(f"{len(valid_json_files)}. {file_name}")
@@ -102,6 +113,8 @@ def generate_speaker_id_text(pod_users, voices_list):
def merge_audio_files():
output_audio_filename = f"podcast_{int(time.time())}.wav"
output_audio_filepath = "/".join([output_dir, output_audio_filename])
# Use ffmpeg to concatenate audio files
# Check if ffmpeg is available
try:
@@ -123,9 +136,10 @@ def merge_audio_files():
]
# Execute ffmpeg from the output_dir to correctly resolve file paths in file_list.txt
process = subprocess.run(command, check=True, cwd=output_dir, capture_output=True, text=True)
print("Audio files merged successfully!")
print(f"Audio files merged successfully into {output_audio_filepath}!")
print("FFmpeg stdout:\n", process.stdout)
print("FFmpeg stderr:\n", process.stderr)
return output_audio_filename
except subprocess.CalledProcessError as e:
raise RuntimeError(f"Error merging audio files with FFmpeg: {e.stderr}")
finally:
@@ -166,6 +180,19 @@ def _load_configuration():
print("\nLoaded Configuration: " + tts_provider)
return config_data
def _load_configuration_path(config_path: str) -> dict:
"""Loads JSON configuration from a specified path and infers tts_provider from the file name."""
config_data = _load_json_config(config_path)
# Derive tts_provider from the config file name
file_name = os.path.basename(config_path)
tts_provider = os.path.splitext(file_name)[0] # strip the .json extension
config_data["tts_provider"] = tts_provider # record the provider in the config data
print(f"\nLoaded Configuration: {tts_provider} from {config_path}")
return config_data
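The provider inference boils down to one expression over the path; keeping its result consistent with the keys of `tts_provider_map` is what ties `main.py` to these config files:

```python
import os

def infer_provider(config_path: str) -> str:
    """Same inference as _load_configuration_path: provider name =
    config file base name minus the .json extension."""
    return os.path.splitext(os.path.basename(config_path))[0]

provider = infer_provider("config/index-tts.json")
```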
def _prepare_openai_settings(args, config_data):
"""Determines final OpenAI API key, base URL, and model based on priority."""
api_key = args.api_key or config_data.get("api_key") or os.getenv("OPENAI_API_KEY")
@@ -368,7 +395,8 @@ def _create_ffmpeg_file_list(audio_files):
from typing import cast # Add import for cast
def _initialize_tts_adapter(config_data: dict, output_dir: str) -> TTSAdapter:
def _initialize_tts_adapter(config_data: dict, tts_providers_config_content: Optional[str] = None) -> TTSAdapter:
"""
Initialize and return the appropriate TTS adapter based on the configuration data.
"""
@@ -378,8 +406,11 @@ def _initialize_tts_adapter(config_data: dict, output_dir: str) -> TTSAdapter:
tts_providers_config = {}
try:
tts_providers_config_content = read_file_content(tts_providers_config_path)
tts_providers_config = json.loads(tts_providers_config_content)
if tts_providers_config_content:
tts_providers_config = json.loads(tts_providers_config_content)
else:
tts_providers_config_content = read_file_content(tts_providers_config_path)
tts_providers_config = json.loads(tts_providers_config_content)
except Exception as e:
print(f"Warning: Could not load tts_providers.json: {e}")
@@ -390,12 +421,13 @@ def _initialize_tts_adapter(config_data: dict, output_dir: str) -> TTSAdapter:
api_url = config_data.get("apiUrl")
if not api_url:
raise ValueError("IndexTTS apiUrl is not configured.")
return IndexTTSAdapter(api_url_template=cast(str, api_url))
return IndexTTSAdapter(api_url_template=cast(str, api_url), tts_extra_params=cast(dict, current_tts_extra_params))
elif tts_provider == "edge-tts":
api_url = config_data.get("apiUrl")
if not api_url:
raise ValueError("EdgeTTS apiUrl is not configured.")
return EdgeTTSAdapter(api_url_template=cast(str, api_url))
return EdgeTTSAdapter(api_url_template=cast(str, api_url), tts_extra_params=cast(dict, current_tts_extra_params))
elif tts_provider == "fish-audio":
api_url = config_data.get("apiUrl")
headers = config_data.get("headers")
@@ -443,12 +475,53 @@ def main():
overview_content = _generate_overview_content(api_key, base_url, model, overview_prompt, input_prompt)
podcast_script = _generate_podcast_script(api_key, base_url, model, podscript_prompt, overview_content)
tts_adapter = _initialize_tts_adapter(config_data, output_dir) # Initialize the TTS adapter
tts_adapter = _initialize_tts_adapter(config_data) # Initialize the TTS adapter
audio_files = _generate_all_audio_files(podcast_script, config_data, tts_adapter, args.threads)
_create_ffmpeg_file_list(audio_files)
def generate_podcast_audio(args, config_path: str, input_txt_content: str, tts_providers_config_content: str, podUsers_json_content: str) -> str:
"""
Generates a podcast audio file based on the provided parameters.
Args:
    args: Parsed argparse-style namespace providing api_key, base_url, model and threads.
    config_path (str): Path to the TTS provider configuration JSON file.
    input_txt_content (str): Content of the input prompt.
    tts_providers_config_content (str): JSON content of tts_providers.json.
    podUsers_json_content (str): JSON array describing the podcast roles and voice codes.
Returns:
    str: The file name of the generated audio file (as returned by merge_audio_files).
"""
print("Starting podcast audio generation...")
config_data = _load_configuration_path(config_path)
podUsers = json.loads(podUsers_json_content)
config_data["podUsers"] = podUsers
final_api_key, final_base_url, final_model = _prepare_openai_settings(args, config_data)
input_prompt, overview_prompt, original_podscript_prompt = _read_prompt_files()
custom_content, input_prompt = _extract_custom_content(input_txt_content)
podscript_prompt, pod_users, voices, turn_pattern = _prepare_podcast_prompts(config_data, original_podscript_prompt, custom_content)
print(f"\nInput Prompt (from provided content):\n{input_prompt[:100]}...")
print(f"\nOverview Prompt (prompt-overview.txt):\n{overview_prompt[:100]}...")
print(f"\nPodscript Prompt (prompt-podscript.txt):\n{podscript_prompt[:1000]}...")
overview_content = _generate_overview_content(final_api_key, final_base_url, final_model, overview_prompt, input_prompt)
podcast_script = _generate_podcast_script(final_api_key, final_base_url, final_model, podscript_prompt, overview_content)
tts_adapter = _initialize_tts_adapter(config_data, tts_providers_config_content) # Initialize the TTS adapter
audio_files = _generate_all_audio_files(podcast_script, config_data, tts_adapter, args.threads)
_create_ffmpeg_file_list(audio_files)
output_audio_filepath = merge_audio_files()
return output_audio_filepath
if __name__ == "__main__":
start_time = time.time()
try:
@@ -460,5 +533,4 @@ if __name__ == "__main__":
finally:
end_time = time.time()
execution_time = end_time - start_time
print(f"\nTotal execution time: {execution_time:.2f} seconds")
print(f"\nTotal execution time: {execution_time:.2f} seconds")

View File

@@ -96,12 +96,14 @@ class IndexTTSAdapter(TTSAdapter):
"""
TTS adapter implementation for IndexTTS.
"""
def __init__(self, api_url_template: str):
def __init__(self, api_url_template: str, tts_extra_params: Optional[dict] = None):
self.api_url_template = api_url_template
self.tts_extra_params = tts_extra_params if tts_extra_params is not None else {}
def generate_audio(self, text: str, voice_code: str, output_dir: str, volume_adjustment: float = 0.0, speed_adjustment: float = 0.0) -> str:
encoded_text = urllib.parse.quote(text)
self.api_url_template = self.tts_extra_params.get("api_url", self.api_url_template)
api_url = self.api_url_template.replace("{{text}}", encoded_text).replace("{{voiceCode}}", voice_code)
if not api_url:
@@ -130,12 +132,14 @@ class EdgeTTSAdapter(TTSAdapter):
"""
TTS adapter implementation for EdgeTTS.
"""
def __init__(self, api_url_template: str):
def __init__(self, api_url_template: str, tts_extra_params: Optional[dict] = None):
self.api_url_template = api_url_template
self.tts_extra_params = tts_extra_params if tts_extra_params is not None else {}
def generate_audio(self, text: str, voice_code: str, output_dir: str, volume_adjustment: float = 0.0, speed_adjustment: float = 0.0) -> str:
encoded_text = urllib.parse.quote(text)
self.api_url_template = self.tts_extra_params.get("api_url", self.api_url_template)
api_url = self.api_url_template.replace("{{text}}", encoded_text).replace("{{voiceCode}}", voice_code)
if not api_url:
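Both adapters share the same `{{text}}`/`{{voiceCode}}` substitution after URL-encoding the text. Extracted as a standalone helper (the URL and voice code here are illustrative, not project defaults):

```python
import urllib.parse

def render_tts_url(template: str, text: str, voice_code: str) -> str:
    """Apply the {{text}}/{{voiceCode}} placeholder substitution used by the
    adapters above, URL-encoding the text first."""
    return (template
            .replace("{{text}}", urllib.parse.quote(text))
            .replace("{{voiceCode}}", voice_code))

url = render_tts_url(
    "http://localhost:7899/tts?t={{text}}&v={{voiceCode}}&r=5",
    "你好, 世界", "zh-CN-YunyiNeural")
```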