# 🎙️ Simple Podcast Generator

> Easily transform your ideas into lively and engaging multi-person dialogue podcasts with one click!

> [中文版本 (Chinese version)](README.md)

This script tool uses the **OpenAI API** to generate podcast scripts and a **TTS (Text-to-Speech)** API service to turn those scripts into audio. You provide a topic, and it handles the rest.

✨ The podcast script generation logic of this project is inspired by the [SurfSense](https://github.com/MODSetter/SurfSense) project. Many thanks to its authors for their open-source contributions!

---

## ✨ Core Highlights

* **🤖 AI-Driven Scripts**: Automatically creates in-depth podcast dialogue scripts using OpenAI models.
* **👥 Multi-Character Support**: Define multiple podcast characters (such as host and guest) and assign a unique TTS voice to each one.
* **🔌 Flexible TTS Integration**: Connect to a self-hosted or third-party TTS service by configuring its API URL.
* **🔊 Intelligent Audio Synthesis**: Automatically splices each character's voice segments in order, with support for **volume and speed adjustment**, and produces a single podcast audio file (`.wav` format).
* **⌨️ Convenient Command-Line Interface**: Clear command-line parameters give you control over each step of the podcast generation process.

---

## 🛠️ Installation Guide

### 📝 Prerequisites

1. **Python 3.x**
    * Ensure Python 3 is installed on your system.

2. **FFmpeg**
    * This project depends on FFmpeg for audio merging. Download and install it from the [FFmpeg official website](https://ffmpeg.org/download.html).
    * **Important Note**: After installation, make sure the `ffmpeg` command is on your system's `PATH` so that the script can call it.

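
To confirm the FFmpeg prerequisite before running anything else, a small check like the one below can help. It is a minimal sketch and not part of the project: the helper name `find_ffmpeg` is hypothetical, and it only uses the standard library's `shutil.which` to look up the command on `PATH`.

```python
import shutil
from typing import Optional


def find_ffmpeg(command: str = "ffmpeg") -> Optional[str]:
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(command)


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg was not found on PATH; install it or add it to PATH.")
    else:
        print(f"ffmpeg found at: {path}")
```
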

### 🐍 Python Dependencies

Open your terminal or command prompt and install the required Python libraries using pip:

```bash
pip install requests openai pydub msgpack
```

> **Dependency Explanation**:
> - `requests`: Sends HTTP requests to TTS service APIs
> - `openai`: Interacts with the OpenAI API to generate podcast scripts
> - `pydub`: Handles audio processing, such as adjusting volume and speed
> - `msgpack`: Provides efficient data serialization for certain TTS services (such as Fish Audio)

---

## 🚀 Quick Start

### 1. Prepare Input Files

Before running, make sure the following files are ready:

* `input.txt`: Enter your **podcast topic** or core idea in this file.
* `prompt/prompt-overview.txt`: System prompt that guides the AI in generating the podcast's **overall outline**.
* `prompt/prompt-podscript.txt`: System prompt that guides the AI in generating the **detailed dialogue script**. It contains dynamic placeholders (such as `{{numSpeakers}}` and `{{turnPattern}}`), which the script replaces automatically.

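
The placeholder replacement described for `prompt-podscript.txt` can be pictured with the sketch below. It is illustrative only: `fill_placeholders` is a hypothetical helper, the template string is made up, and the project's actual substitution code may differ.

```python
import re


def fill_placeholders(template: str, values: dict) -> str:
    """Replace each {{name}} with str(values[name]); unknown names are left as-is."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


if __name__ == "__main__":
    # Made-up template text; only the placeholder names come from the README.
    template = "Write a dialogue for {{numSpeakers}} speakers; turn order: {{turnPattern}}."
    print(fill_placeholders(template, {"numSpeakers": 2, "turnPattern": "random"}))
```

Leaving unknown placeholders untouched makes a missing value easy to spot in the final prompt.
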

### 2. Configure TTS Services and Characters

* TTS configuration files (such as `edge-tts.json`) are stored in the `config/` directory. Each file defines the TTS service API interface, the podcast characters (`podUsers`), and their corresponding voices (`voices`).

### 3. Run the Script

Execute the following command in the project root directory:

```bash
python podcast_generator.py [optional parameters]
```

#### **Optional Parameters**

* `--api-key <YOUR_OPENAI_API_KEY>`: Your OpenAI API key. If not provided, it is read from the configuration file or the `OPENAI_API_KEY` environment variable.
* `--base-url <YOUR_OPENAI_BASE_URL>`: Proxy address of the OpenAI API. If not provided, it is read from the configuration file or the `OPENAI_BASE_URL` environment variable.
* `--model <OPENAI_MODEL_NAME>`: The OpenAI model to use (such as `gpt-4o` or `gpt-4-turbo`). The default is `gpt-3.5-turbo`.
* `--tts-provider <TTS_PROVIDER_NAME>`: The TTS provider to use (such as `edge`). It determines which TTS configuration file in `config/` is loaded.
* `--threads <NUMBER_OF_THREADS>`: The number of parallel threads for audio generation (default is `1`). Higher values can speed up processing.
* `--output-language <LANGUAGE_CODE>`: The output language of the podcast script (default is `Chinese`).
* `--usetime <TIME_DURATION>`: The target duration of the podcast script (default is `10 minutes`).

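
The sketch below shows one way the options and their fallbacks could fit together. It is not the project's code: the lookup order (command line, then configuration file, then environment variable) and the configuration key name `api_key` are assumptions made for illustration, while the option names and defaults are the ones documented above.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Declare some of the documented options with their documented defaults."""
    parser = argparse.ArgumentParser(description="Podcast generator options (sketch)")
    parser.add_argument("--api-key", default=None)
    parser.add_argument("--base-url", default=None)
    parser.add_argument("--model", default="gpt-3.5-turbo")
    parser.add_argument("--threads", type=int, default=1)
    parser.add_argument("--output-language", default="Chinese")
    parser.add_argument("--usetime", default="10 minutes")
    return parser


def resolve_setting(cli_value, config, config_key, env_name, env=None):
    """Return the first non-empty value: CLI, config file, environment (assumed order)."""
    env = os.environ if env is None else env
    return cli_value or config.get(config_key) or env.get(env_name)


if __name__ == "__main__":
    args = build_parser().parse_args(["--model", "gpt-4o", "--threads", "4"])
    # An empty dict stands in for the parsed configuration file.
    api_key = resolve_setting(args.api_key, {}, "api_key", "OPENAI_API_KEY")
    print(args.model, args.threads, "key set" if api_key else "key missing")
```
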

#### **Running Example**

```bash
# Use gpt-4o model, edge-tts service and 4 threads to generate podcast
python podcast_generator.py --api-key sk-xxxxxx --model gpt-4o --tts-provider edge --threads 4
```

### 5. Using Web API (main.py)

This project also provides a FastAPI-based web service that allows you to generate podcasts through HTTP requests.

#### Start Web Service

```bash
python main.py
```

By default, the service will run on `http://localhost:8000`.

#### API Endpoints

1. **Generate Podcast** - `POST /generate-podcast`
    - Parameters:
        - `api_key`: OpenAI API key
        - `base_url`: OpenAI API base URL (optional)
        - `model`: OpenAI model name (optional)
        - `input_txt_content`: Input text content
        - `tts_providers_config_content`: TTS provider configuration content
        - `podUsers_json_content`: Podcast user JSON configuration
        - `threads`: Number of threads (optional, default is 1)
        - `tts_provider`: TTS provider name (optional, default is "index-tts")

2. **Get Podcast Generation Status** - `GET /podcast-status`
    - Requires `X-Auth-Id` header

3. **Download Podcast** - `GET /download-podcast/`
    - Parameters:
        - `file_name`: Name of the file to download

4. **Get Voice List** - `GET /get-voices`
    - Parameters:
        - `tts_provider`: TTS provider name (optional, default is "tts")

#### API Usage Example

```bash
# After starting the service, use curl to send a request to generate podcast
curl -X POST "http://localhost:8000/generate-podcast" \
  -H "X-Auth-Id: your-auth-id" \
  -F "api_key=sk-xxxxxx" \
  -F "model=gpt-4o" \
  -F "input_txt_content=The future development of artificial intelligence" \
  -F "tts_providers_config_content={\"index\": {\"api_key\": \"your-api-key\"}}" \
  -F "podUsers_json_content={\"podUsers\":[{\"code\":\"zh-CN-XiaoxiaoNeural\",\"role\":\"Host\"}],\"voices\":[{\"name\":\"Xiaoxiao\",\"code\":\"zh-CN-XiaoxiaoNeural\"}]}" \
  -F "threads=4" \
  -F "tts_provider=index-tts"
```

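
Escaping JSON by hand inside a shell command is error-prone, so it can be easier to build the form fields in Python and let `json.dumps` produce the JSON strings. The sketch below is an illustration under assumptions: `build_generate_form` is a hypothetical helper, and the shape of `pod_config` (an object with `podUsers` and `voices` keys) mirrors the TTS configuration file format rather than a confirmed API contract.

```python
import json


def build_generate_form(api_key, topic, providers_config, pod_config,
                        model=None, threads=1, tts_provider="index-tts"):
    """Return the form fields for POST /generate-podcast as a dict of strings."""
    fields = {
        "api_key": api_key,
        "input_txt_content": topic,
        # json.dumps guarantees valid JSON, so no manual quote escaping is needed.
        "tts_providers_config_content": json.dumps(providers_config),
        "podUsers_json_content": json.dumps(pod_config, ensure_ascii=False),
        "threads": str(threads),
        "tts_provider": tts_provider,
    }
    if model is not None:
        fields["model"] = model
    return fields


if __name__ == "__main__":
    form = build_generate_form(
        api_key="sk-xxxxxx",
        topic="The future development of artificial intelligence",
        providers_config={"index": {"api_key": "your-api-key"}},
        pod_config={
            "podUsers": [{"code": "zh-CN-XiaoxiaoNeural", "role": "Host"}],
            "voices": [{"name": "Xiaoxiao", "code": "zh-CN-XiaoxiaoNeural"}],
        },
        model="gpt-4o",
        threads=4,
    )
    for name, value in form.items():
        print(f"{name} = {value}")
```

The resulting dictionary can then be sent as form fields with any HTTP client, for example `requests.post(url, headers={"X-Auth-Id": "your-auth-id"}, data=form)`.
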

### 4. Customizing AI Prompts (`custom` code block)

To give the AI more detailed instructions or extra context, you can embed `custom` code blocks in the `input.txt` file. The content of each block is added as additional instructions to the core prompt for podcast script generation (`prompt-podscript.txt`), and so influences what the AI generates.

**Usage**:
In the `input.txt` file, define your custom content anywhere using the following format:

````
```custom-begin
Additional instructions or context you want to provide to the AI, for example:
- "Please ensure the discussion includes an in-depth analysis of [specific concept]."
- "Please add some humor to the conversation, especially jokes about [a certain topic]."
- "All characters' speeches must be brief, with each sentence not exceeding two lines."
```custom-end
````
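
One way to picture how such blocks could be separated from the topic text is the sketch below. It is illustrative only and not the project's parser: `split_custom_blocks` is a hypothetical helper that treats everything between the `custom-begin` and `custom-end` markers as extra instructions and the remaining text as the topic.

```python
import re

# Matches everything between the begin and end markers, across line breaks.
CUSTOM_BLOCK = re.compile(r"```custom-begin(.*?)```custom-end", re.DOTALL)


def split_custom_blocks(text: str):
    """Return (topic_text, list_of_custom_instruction_blocks)."""
    instructions = [block.strip() for block in CUSTOM_BLOCK.findall(text)]
    topic = CUSTOM_BLOCK.sub("", text).strip()
    return topic, instructions


if __name__ == "__main__":
    sample = (
        "AI in healthcare\n"
        "```custom-begin\n"
        "- Keep every speech brief.\n"
        "```custom-end\n"
    )
    topic, extra = split_custom_blocks(sample)
    print("Topic:", topic)
    print("Extra instructions:", extra)
```
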

---

## 🌐 Web Application (Next.js)

In addition to the command-line script and the FastAPI service, this project provides a web user interface. It offers a more intuitive way to generate and manage podcasts by exposing the backend functions through a frontend.

### ✨ Core Features

* **Web Operation Interface**: An intuitive web interface that makes each step of podcast generation visible at a glance.
* **Lightweight User System**: Supports user login, registration, points, and billing.
* **Podcast Creation and Configuration**: Lets users enter topics through forms and configure TTS characters, volume, and speed.
* **Real-time Progress Tracking**: Displays the status and progress of podcast generation.
* **Podcast Playback and Management**: Integrates an audio player for listening to generated podcasts and may provide functions for managing past podcasts.
* **API Interaction**: Communicates with the backend Python service through APIs for podcast generation, status queries, and audio streaming.

### 🚀 Quick Start (Web)

1. **Install Node.js**: Please ensure Node.js is installed on your system (LTS version recommended).
2. **Install Dependencies**: Enter the `web/` directory and install all frontend dependencies.

    ```bash
    cd web/
    npm install
    # or yarn install
    ```

3. **Start Development Server**:

    ```bash
    npm run dev
    # or yarn dev
    ```

    The web application will start at `http://localhost:3000` (default).

4. **Build Production Environment**:

    ```bash
    npm run build
    # or yarn build
    npm run start
    # or yarn start
    ```

### 🐳 Docker Deployment

This project supports deployment via Docker. For detailed information, please refer to [Docker Usage Guide](DOCKER_USAGE.md).

---

## 🌍 Internationalization (i18n) Support

This project supports multilingual interfaces, currently supporting English (en), Chinese (zh-CN), and Japanese (ja).

### 📁 Language File Structure

Language files are located in the `web/public/locales/` directory, grouped by language code:
- `web/public/locales/en/common.json` - English translation
- `web/public/locales/zh-CN/common.json` - Chinese translation
- `web/public/locales/ja/common.json` - Japanese translation

### 🛠️ Adding New Languages

1. Create a new language folder in the `web/public/locales/` directory, for example `fr/`
2. Copy the `common.json` file to the new folder
3. Translate all key-value pairs in the file
4. Update the `languages` variable in the `web/src/i18n/settings.ts` file

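
When adding a language, it is easy to miss keys while translating. The sketch below compares a new `common.json` against a reference one and lists the keys that are still missing. It is a hypothetical helper script, not part of the project, and it assumes the locale files are JSON objects that may be nested.

```python
import json
from pathlib import Path


def flatten_keys(data: dict, prefix: str = "") -> set:
    """Collect dotted key paths such as 'nav.home' from a possibly nested dict."""
    keys = set()
    for key, value in data.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            keys |= flatten_keys(value, full_key)
        else:
            keys.add(full_key)
    return keys


def missing_keys(reference: dict, candidate: dict) -> list:
    """Return the keys present in the reference locale but absent from the candidate."""
    return sorted(flatten_keys(reference) - flatten_keys(candidate))


def load_locale(locales_dir: str, language: str) -> dict:
    """Load <locales_dir>/<language>/common.json as a dict."""
    path = Path(locales_dir) / language / "common.json"
    return json.loads(path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    # Inline sample data; real use would call load_locale("web/public/locales", "en").
    english = {"title": "Podcast", "nav": {"home": "Home", "login": "Log in"}}
    french = {"title": "Podcast", "nav": {"home": "Accueil"}}
    print(missing_keys(english, french))
```
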

### 🌐 Language Switching

The interface language can be selected through the URL path or detected from the browser's language settings:
- `http://localhost:3000/en/` - English interface
- `http://localhost:3000/zh-CN/` - Chinese interface
- `http://localhost:3000/ja/` - Japanese interface

---

## ⚙️ Configuration File Details

### `config/[tts-provider].json` (TTS Character and Voice Configuration)

This is your core TTS configuration file, with the filename corresponding to the provider specified by the `--tts-provider` parameter. It tells the script how to work with the TTS service.

```json
{
  "podUsers": [
    {
      "code": "zh-CN-XiaoxiaoNeural",
      "role": "Host"
    },
    {
      "code": "zh-CN-YunxiNeural",
      "role": "Tech Expert"
    }
  ],
  "voices": [
    {
      "name": "XiaoMin",
      "code": "yue-CN-XiaoMinNeural",
      "volume_adjustment": 1.0,
      "speed_adjustment": 5.0
    }
  ],
  "apiUrl": "http://localhost:5000/api/tts?text={{text}}&voiceCode={{voiceCode}}",
  "turnPattern": "random",
  "tts_max_retries": 3
}
```

* `podUsers`: Defines the **characters** in the podcast. Each character's `code` must correspond to a valid voice in the `voices` list.
* `voices`: Defines all available TTS **voices**.
    * `volume_adjustment` (optional): Volume adjustment (dB). For example, `6.0` increases volume by 6dB.
    * `speed_adjustment` (optional): Speed adjustment (%). For example, `10.0` increases speed by 10%.
* `apiUrl`: Your TTS service API endpoint. `{{text}}` and `{{voiceCode}}` are placeholders.
* `turnPattern`: Defines the **turn-taking mode** for character dialogue, such as `random` or `sequential`.
* `tts_max_retries` (optional): Maximum number of retries when TTS API calls fail (default is `3`).

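
Two of the rules above lend themselves to a quick check: every `podUsers` code should appear in `voices`, and the `apiUrl` placeholders are filled in per request. The sketch below illustrates both. The helper names are hypothetical, and URL-encoding the substituted values is an assumption about what a TTS endpoint would expect, not a description of the project's code.

```python
from urllib.parse import quote


def unknown_voice_codes(config: dict) -> list:
    """Return podUsers codes that have no matching entry in the voices list."""
    known = {voice["code"] for voice in config.get("voices", [])}
    return [user["code"] for user in config.get("podUsers", []) if user["code"] not in known]


def build_tts_url(api_url: str, text: str, voice_code: str) -> str:
    """Fill the {{text}} and {{voiceCode}} placeholders with URL-encoded values."""
    return (
        api_url.replace("{{text}}", quote(text, safe=""))
        .replace("{{voiceCode}}", quote(voice_code, safe=""))
    )


if __name__ == "__main__":
    # Sample data written for this sketch; the second character has no matching voice.
    config = {
        "podUsers": [
            {"code": "zh-CN-XiaoxiaoNeural", "role": "Host"},
            {"code": "zh-CN-YunxiNeural", "role": "Tech Expert"},
        ],
        "voices": [{"name": "Xiaoxiao", "code": "zh-CN-XiaoxiaoNeural"}],
        "apiUrl": "http://localhost:5000/api/tts?text={{text}}&voiceCode={{voiceCode}}",
    }
    print("Codes without a voice:", unknown_voice_codes(config))
    print(build_tts_url(config["apiUrl"], "Hello world", "zh-CN-XiaoxiaoNeural"))
```
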

### `config/tts_providers.json` (TTS Provider Authentication)

This file centrally manages authentication information (such as API keys) for all TTS service providers.

```json
{
  "index": { "api_key": null },
  "edge": { "api_key": null },
  "doubao": { "X-Api-App-Id": "null", "X-Api-Access-Key": "null" },
  "fish": { "api_key": "null" },
  "minimax": { "group_id": "null", "api_key": "null" },
  "gemini": { "api_key": "null" }
}
```

**Note**: In actual use, replace the `"null"` placeholders with valid authentication information. You can create a `tts_providers-local.json` file to store real keys; it is already ignored by `.gitignore`.

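
The sketch below shows how a loader could prefer the local credentials file when it exists. It is an illustration under an assumption: that `tts_providers-local.json` is used in place of `tts_providers.json` when present. The function name is hypothetical, and the project's actual lookup logic may differ.

```python
import json
from pathlib import Path


def load_provider_credentials(config_dir: str) -> dict:
    """Load tts_providers-local.json if it exists, otherwise tts_providers.json."""
    directory = Path(config_dir)
    local_path = directory / "tts_providers-local.json"
    default_path = directory / "tts_providers.json"
    # Assumed precedence: the untracked local file wins over the committed template.
    chosen = local_path if local_path.exists() else default_path
    return json.loads(chosen.read_text(encoding="utf-8"))


if __name__ == "__main__":
    config_dir = Path("config")
    if config_dir.is_dir():
        credentials = load_provider_credentials(str(config_dir))
        print("Configured providers:", sorted(credentials))
    else:
        print("No config/ directory found in the current working directory.")
```
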

---

## 🔌 Supported TTS Services

This project is designed to be highly flexible and supports multiple TTS services.

| Provider | Type | Support Status |
| :--- | :--- | :---: |
| **Index-TTS** | Local | ✅ Supported |
| **Edge-TTS** | Local | ✅ Supported |
| **Doubao** | Network | ✅ Supported |
| **Minimax** | Network | ✅ Supported |
| **Fish Audio** | Network | ✅ Supported |
| **Gemini** | Network | ✅ Supported |
| **OpenAI TTS** | Network | Planned |
| **Azure TTS** | Network | Planned |

---

## 🎉 Output Results

All successfully generated podcast audio files will be automatically saved in the `output/` directory. The filename format is `podcast_` plus the timestamp when it was generated, for example `podcast_1678886400.wav`.

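
The naming scheme can be reproduced with the short sketch below. It assumes the timestamp is a Unix time in whole seconds, which matches the example filename; the helper `output_path` is hypothetical and not taken from the project.

```python
import time
from pathlib import Path
from typing import Optional


def output_path(output_dir: str = "output", timestamp: Optional[float] = None) -> Path:
    """Build output/podcast_<unix seconds>.wav, using the current time by default."""
    seconds = int(time.time()) if timestamp is None else int(timestamp)
    return Path(output_dir) / f"podcast_{seconds}.wav"


if __name__ == "__main__":
    # Reproduces the example filename from the README.
    print(output_path(timestamp=1678886400).name)
    # Uses the current time, as a freshly generated podcast would.
    print(output_path())
```
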

---

## 🎧 Sample Audio

You can find sample podcast audio generated using different TTS services in the `example/` folder.

| TTS Service | Listen Link |
| :--- | :--- |
| **Edge TTS** | [▶️ edgeTTS.wav](example/edgeTTS.wav) |
| **Index TTS** | [▶️ indexTTS.wav](example/indexTTS.wav) |
| **Doubao TTS** | [▶️ doubaoTTS.wav](example/doubaoTTS.wav) |
| **Minimax** | [▶️ minimax.wav](example/minimax.wav) |
| **Fish Audio** | [▶️ fish.wav](example/fish.wav) |
| **Gemini TTS** | [▶️ geminiTTS.wav](example/geminiTTS.wav) |

---

## 📂 File Structure

```
.
├── config/                  # ⚙️ Configuration directory
│   ├── doubao-tts.json      # ... (configuration for each TTS provider)
│   └── tts_providers.json   # Unified TTS authentication file
├── server/                  # 🐍 Backend service directory
│   ├── main.py              # FastAPI Web API entry: Provides RESTful APIs for podcast generation, status query, audio download, manages task lifecycle, and performs data cleanup.
│   ├── podcast_generator.py # Core podcast generation logic: Responsible for interacting with OpenAI API to generate podcast scripts, calling TTS adapters to convert text to speech, and using FFmpeg to merge audio files.
│   ├── tts_adapters.py      # TTS adapter: Encapsulates interaction logic with different TTS services (such as Index-TTS, Edge-TTS, Doubao, Minimax, Fish Audio, Gemini).
│   ├── openai_cli.py        # OpenAI command-line tool
│   └── ...                  # Other backend files
├── web/                     # 🌐 Frontend Web Application Directory (Next.js)
│   ├── public/              # Static resources
│   ├── src/                 # Source code
│   │   ├── app/             # Next.js route pages
│   │   ├── components/      # React components
│   │   ├── hooks/           # React Hooks
│   │   ├── lib/             # Library files (authentication, database, API, etc.)
│   │   └── types/           # TypeScript type definitions
│   ├── package.json         # Frontend dependencies
│   ├── next.config.js       # Next.js configuration
│   └── ...                  # Other frontend files
├── prompt/                  # 🧠 AI prompt directory
│   ├── prompt-overview.txt
│   └── prompt-podscript.txt
├── example/                 # 🎧 Sample audio directory
├── output/                  # 🎉 Output audio directory
├── input.txt                # 🎙️ Podcast topic input file
├── README.md                # 📄 Project documentation (Chinese)
└── README_EN.md             # 📄 Project documentation (English)
```

---

## 📝 Disclaimer

* **License**: This project is licensed under [GPL-3.0](https://www.gnu.org/licenses/gpl-3.0.html).
* **No Warranty**: This software is provided "as is" without any express or implied warranties.
* **Liability Limitation**: Under no circumstances shall the authors or copyright holders be liable for any damages arising from the use of this software.
* **Third-Party Services**: Users assume all risks and responsibilities of using third-party services (such as the OpenAI API and TTS services).
* **Usage Purpose**: This project is for learning and research purposes only. Please comply with all applicable laws and regulations.
* **Final Interpretation Rights**: We reserve the right to modify this disclaimer at any time.