🎙️ Simple Podcast Generator
Easily transform your ideas into lively and engaging multi-person dialogue podcasts with one click! ([中文版本](README.md))
This is a powerful script tool that leverages the OpenAI API to generate insightful podcast scripts and, through TTS (Text-to-Speech) API services, turns plain text into warm, natural audio. You just need to provide a topic and leave the rest to it!
✨ The podcast script generation logic of this project is deeply inspired by the SurfSense project. We would like to express our heartfelt thanks for their open-source contribution!
✨ Core Highlights
- 🤖 AI-Driven Scripts: Automatically create high-quality, in-depth podcast dialogue scripts with the power of OpenAI models.
- 👥 Multi-Character Support: Freely define multiple podcast characters (such as host, guest), and assign unique TTS voices to each character.
- 🔌 Flexible TTS Integration: Seamlessly connect to your self-hosted or third-party TTS services through simple API URL configuration.
- 🔊 Intelligent Audio Synthesis: Accurately splices each character's voice segments, supports volume and speed adjustment, and synthesizes a complete, smooth podcast audio file (`.wav` format).
- ⌨️ Convenient Command-Line Interface: Provides clear command-line parameters, giving you full control over every aspect of the podcast generation process.
🛠️ Installation Guide
📝 Prerequisites
- Python 3.x
  - Please ensure Python 3 is installed on your system.
- FFmpeg
  - This project depends on FFmpeg for audio merging. Please visit the FFmpeg official website to download and install it.
  - Important Note: After installation, please ensure the `ffmpeg` command has been added to your system's PATH environment variable so that the script can call it.
🐍 Python Dependencies
Open your terminal or command prompt and install the required Python libraries using pip:
pip install requests openai pydub msgpack
Dependency Explanation:
- `requests`: Used to send HTTP requests to TTS service APIs
- `openai`: Used to interact with the OpenAI API to generate podcast scripts
- `pydub`: Used for audio processing, such as adjusting volume and speed
- `msgpack`: Used for efficient data serialization with certain TTS services (such as Fish Audio)
🚀 Quick Start
1. Prepare Input Files
Before running, please ensure the following files are ready:
- `input.txt`: Enter your podcast topic or core idea in this file.
- `prompt/prompt-overview.txt`: System prompt used to guide the AI in generating the podcast's overall outline.
- `prompt/prompt-podscript.txt`: System prompt used to guide the AI in generating the detailed dialogue script. It contains dynamic placeholders (such as `{{numSpeakers}}` and `{{turnPattern}}`), which the script replaces automatically (see the sketch below).
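As a rough illustration of how those placeholders get filled: the actual replacement logic lives in `podcast_generator.py`; plain string substitution is only an assumption here.

```python
# Hypothetical illustration of filling the prompt placeholders.
with open("prompt/prompt-podscript.txt", encoding="utf-8") as f:
    template = f.read()

# Values would come from the TTS config (podUsers, turnPattern).
prompt = (template
          .replace("{{numSpeakers}}", "2")
          .replace("{{turnPattern}}", "random"))
```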
2. Configure TTS Services and Characters
- TTS configuration files (such as `edge-tts.json`) are stored in the `config/` directory. Each file defines the TTS service API interface, the podcast characters (`podUsers`), and their corresponding voices (`voices`).
3. Run the Script
Execute the following command in the project root directory:
python podcast_generator.py [optional parameters]
Optional Parameters
- `--api-key <YOUR_OPENAI_API_KEY>`: Your OpenAI API key. If not provided, it is read from the configuration file or the `OPENAI_API_KEY` environment variable (see the sketch after this list).
- `--base-url <YOUR_OPENAI_BASE_URL>`: Proxy address of the OpenAI API. If not provided, it is read from the configuration file or the `OPENAI_BASE_URL` environment variable.
- `--model <OPENAI_MODEL_NAME>`: The OpenAI model to use (such as `gpt-4o`, `gpt-4-turbo`). Default value is `gpt-3.5-turbo`.
- `--tts-provider <PROVIDER_NAME>`: The TTS provider to use (such as `edge`); this determines which `config/[tts-provider].json` file is loaded (see Configuration File Details below).
- `--threads <NUMBER_OF_THREADS>`: Number of parallel threads for audio generation (default is `1`), to improve processing speed.
- `--output-language <LANGUAGE_CODE>`: Output language of the podcast script (default is `Chinese`).
- `--usetime <TIME_DURATION>`: Target length of the podcast script (default is `10 minutes`).
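A minimal sketch of the key resolution order described above (CLI flag first, then configuration file, then environment variable). The `config` dict here is a stand-in for whatever configuration file the script actually reads:

```python
import argparse
import os

def resolve_openai_key(cli_value, config):
    # CLI flag takes priority, then the config file, then OPENAI_API_KEY.
    return cli_value or config.get("api_key") or os.environ.get("OPENAI_API_KEY")

parser = argparse.ArgumentParser()
parser.add_argument("--api-key")
args = parser.parse_args()

api_key = resolve_openai_key(args.api_key, config={})  # empty config as a placeholder
```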
Running Example
# Use gpt-4o model, edge-tts service and 4 threads to generate podcast
python podcast_generator.py --api-key sk-xxxxxx --model gpt-4o --tts-provider edge --threads 4
4. Using the Web API (main.py)
This project also provides a FastAPI-based web service that allows you to generate podcasts through HTTP requests.
Start Web Service
python main.py
By default, the service will run on http://localhost:8000.
API Endpoints
- Generate Podcast: `POST /generate-podcast`
  - Parameters:
    - `api_key`: OpenAI API key
    - `base_url`: OpenAI API base URL (optional)
    - `model`: OpenAI model name (optional)
    - `input_txt_content`: Input text content
    - `tts_providers_config_content`: TTS provider configuration content
    - `podUsers_json_content`: Podcast user JSON configuration
    - `threads`: Number of threads (optional, default is 1)
    - `tts_provider`: TTS provider name (optional, default is "index-tts")
- Get Podcast Generation Status: `GET /podcast-status`
  - Requires the `X-Auth-Id` header
- Download Podcast: `GET /download-podcast/`
  - Parameters:
    - `file_name`: Name of the file to download
- Get Voice List: `GET /get-voices`
  - Parameters:
    - `tts_provider`: TTS provider name (optional, default is "tts")
API Usage Example
# After starting the service, use curl to send a request to generate podcast
curl -X POST "http://localhost:8000/generate-podcast" \
-H "X-Auth-Id: your-auth-id" \
-F "api_key=sk-xxxxxx" \
-F "model=gpt-4o" \
-F "input_txt_content=The future development of artificial intelligence" \
-F "tts_providers_config_content={\"index\": {\"api_key\": \"your-api-key\"}}" \
-F "podUsers_json_content=[{\"code\":\"zh-CN-XiaoxiaoNeural\",\"role\":\"Host\"}],\"voices\":[{\"name\":\"Xiaoxiao\",\"code\":\"zh-CN-XiaoxiaoNeural\"}]" \
-F "threads=4" \
-F "tts_provider=index-tts"
5. Customizing AI Prompts (custom code block)
To provide more detailed AI instructions or add specific context, you can embed custom code blocks in the input.txt file. The content in this code block will be used as additional instructions, built into the core prompt for podcast script generation (prompt-podscript.txt), thereby influencing the AI's generation behavior.
Usage:
In the input.txt file, define your custom content anywhere using the following format:
```custom-begin
Additional instructions or context you want to provide to the AI, for example:
- "Please ensure the discussion includes an in-depth analysis of [specific concept]."
- "Please add some humor to the conversation, especially jokes about [a certain topic]."
- "All characters' speeches must be brief, with each sentence not exceeding two lines."
```custom-end
🌐 Web Application (Next.js)
In addition to the command-line script and FastAPI service, this project also provides a fully functional web user interface. This interface aims to provide a more intuitive and convenient podcast generation and management experience, exposing complex backend functions through friendly frontend operations to users.
✨ Core Features
- Web Operation Interface: Intuitive and friendly web interface that makes the podcast generation process clear at a glance.
- Micro User System Integration: Supports user login, registration, points and billing functions, building a complete user ecosystem.
- Podcast Creation and Configuration: Allows users to enter topics through forms and configure TTS characters, volume and speed parameters.
- Real-time Progress Tracking: Displays the status and progress of podcast generation.
- Podcast Playback and Management: Integrates an audio player for users to listen to generated podcasts and may provide functions for managing historical podcasts.
- API Interaction: Seamless communication with the backend Python service through APIs, including podcast generation, status queries, and audio streaming.
🚀 Quick Start (Web)
- Install Node.js: Please ensure Node.js is installed on your system (LTS version recommended).
- Install Dependencies: Enter the `web/` directory and install all frontend dependencies:
  cd web/
  npm install   # or yarn install
- Start Development Server:
  npm run dev   # or yarn dev
  The web application will start at http://localhost:3000 by default.
- Build for Production:
  npm run build   # or yarn build
  npm run start   # or yarn start
🐳 Docker Deployment
This project supports deployment via Docker. For detailed information, please refer to Docker Usage Guide.
🌍 Internationalization (i18n) Support
This project supports multilingual interfaces, currently supporting English (en), Chinese (zh-CN), and Japanese (ja).
📁 Language File Structure
Language files are located in the web/public/locales/ directory, grouped by language code:
- `web/public/locales/en/common.json`: English translation
- `web/public/locales/zh-CN/common.json`: Chinese translation
- `web/public/locales/ja/common.json`: Japanese translation
🛠️ Adding New Languages
- Create a new language folder in the `web/public/locales/` directory, for example `fr/`
- Copy the `common.json` file into the new folder
- Translate all key-value pairs in the file
- Update the `languages` variable in the `web/src/i18n/settings.ts` file
🌐 Language Switching
Users can switch languages via the URL path, or the language is selected automatically from the browser's language settings:
- `http://localhost:3000/en/`: English interface
- `http://localhost:3000/zh-CN/`: Chinese interface
- `http://localhost:3000/ja/`: Japanese interface
⚙️ Configuration File Details
config/[tts-provider].json (TTS Character and Voice Configuration)
This is your core TTS configuration file, with the filename corresponding to the provider specified by the --tts-provider parameter. It tells the script how to work with the TTS service.
{
"podUsers": [
{
"code": "zh-CN-XiaoxiaoNeural",
"role": "Host"
},
{
"code": "zh-CN-YunxiNeural",
"role": "Tech Expert"
}
],
"voices": [
{
"name": "XiaoMin",
"code": "yue-CN-XiaoMinNeural",
"volume_adjustment": 1.0,
"speed_adjustment": 5.0
}
],
"apiUrl": "http://localhost:5000/api/tts?text={{text}}&voiceCode={{voiceCode}}",
"turnPattern": "random",
"tts_max_retries": 3
}
- `podUsers`: Defines the characters in the podcast. Each character's `code` must correspond to a valid voice in the `voices` list.
- `voices`: Defines all available TTS voices.
  - `volume_adjustment` (optional): Volume adjustment in dB. For example, `6.0` increases the volume by 6 dB.
  - `speed_adjustment` (optional): Speed adjustment in %. For example, `10.0` increases the speed by 10%.
- `apiUrl`: Your TTS service API endpoint. `{{text}}` and `{{voiceCode}}` are placeholders.
- `turnPattern`: Defines the turn-taking mode for the character dialogue, such as `random` or `sequential`.
- `tts_max_retries` (optional): Maximum number of retries when a TTS API call fails (default is `3`).
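To make the placeholder, retry, and adjustment semantics concrete, here is a minimal sketch of what one TTS call based on this configuration could look like. It assumes the service is called with a GET request on the filled-in `apiUrl` and returns WAV audio bytes; the real adapter logic lives in `server/tts_adapters.py` and may differ.

```python
import io
import time
from urllib.parse import quote

import requests
from pydub import AudioSegment
from pydub.effects import speedup

def synthesize(text, voice, config):
    """Sketch: call the TTS endpoint for one dialogue line and apply voice adjustments."""
    # Fill the {{text}} and {{voiceCode}} placeholders in apiUrl.
    url = (config["apiUrl"]
           .replace("{{text}}", quote(text))
           .replace("{{voiceCode}}", voice["code"]))

    # Retry up to tts_max_retries times (default 3), as described above.
    last_error = None
    for _ in range(config.get("tts_max_retries", 3)):
        try:
            resp = requests.get(url, timeout=90)  # generous timeout for slow TTS backends
            resp.raise_for_status()
            break
        except requests.RequestException as err:
            last_error = err
            time.sleep(1)
    else:
        raise RuntimeError(f"TTS request failed after retries: {last_error}")

    # Assumes the service returns raw WAV bytes.
    segment = AudioSegment.from_file(io.BytesIO(resp.content), format="wav")

    # volume_adjustment is in dB, speed_adjustment is in percent.
    segment = segment.apply_gain(voice.get("volume_adjustment", 0.0))
    speed_pct = voice.get("speed_adjustment", 0.0)
    if speed_pct > 0:  # pydub's speedup only speeds audio up; slowing down needs another approach
        segment = speedup(segment, playback_speed=1.0 + speed_pct / 100.0)
    return segment
```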
config/tts_providers.json (TTS Provider Authentication)
This file is used to centrally manage authentication information (such as API keys) for all TTS service providers.
{
"index": { "api_key": null },
"edge": { "api_key": null },
"doubao": { "X-Api-App-Id": "null", "X-Api-Access-Key": "null" },
"fish": { "api_key": "null" },
"minimax": { "group_id": "null", "api_key": "null" },
"gemini": { "api_key": "null" }
}
Note: In actual use, please replace "null" with valid authentication information. You can create a tts_providers-local.json to store real keys; it is already ignored by .gitignore.
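A minimal sketch of the override pattern described above, assuming the backend simply prefers `tts_providers-local.json` when it exists (the actual loading code is in the `server/` directory and may differ):

```python
import json
from pathlib import Path

def load_tts_providers(config_dir="config"):
    """Load TTS provider credentials, preferring the git-ignored local file."""
    local = Path(config_dir) / "tts_providers-local.json"
    default = Path(config_dir) / "tts_providers.json"
    path = local if local.exists() else default
    with open(path, encoding="utf-8") as f:
        return json.load(f)

providers = load_tts_providers()
print(providers.get("edge"))  # e.g. {"api_key": ...}
```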
🔌 Supported TTS Services
This project is designed to be highly flexible and supports multiple TTS services.
| Provider | Type | Support Status |
|---|---|---|
| Index-TTS | Local | ✅ Supported |
| Edge-TTS | Local | ✅ Supported |
| Doubao | Network | ✅ Supported |
| Minimax | Network | ✅ Supported |
| Fish Audio | Network | ✅ Supported |
| Gemini | Network | ✅ Supported |
| OpenAI TTS | Network | Planned |
| Azure TTS | Network | Planned |
🎉 Output Results
All successfully generated podcast audio files will be automatically saved in the output/ directory. The filename format is podcast_ plus the timestamp when it was generated, for example podcast_1678886400.wav.
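For reference, a filename in that format can be produced from the current Unix timestamp; the example name `podcast_1678886400.wav` above follows the same pattern.

```python
import os
import time

os.makedirs("output", exist_ok=True)
output_path = os.path.join("output", f"podcast_{int(time.time())}.wav")
print(output_path)  # e.g. output/podcast_1678886400.wav
```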
🎧 Sample Audio
You can find sample podcast audio generated using different TTS services in the example/ folder.
| TTS Service | Listen Link |
|---|---|
| Edge TTS | ▶️ edgeTTS.wav |
| Index TTS | ▶️ indexTTS.wav |
| Doubao TTS | ▶️ doubaoTTS.wav |
| Minimax | ▶️ minimax.wav |
| Fish Audio | ▶️ fish.wav |
| Gemini TTS | ▶️ geminiTTS.wav |
📂 File Structure
.
├── config/ # ⚙️ Configuration directory
│ ├── doubao-tts.json # ... (configuration for each TTS provider)
│ └── tts_providers.json # Unified TTS authentication file
├── server/ # 🐍 Backend service directory
│ ├── main.py # FastAPI Web API entry: Provides RESTful APIs for podcast generation, status query, audio download, manages task lifecycle, and performs data cleanup.
│ ├── podcast_generator.py # Core podcast generation logic: Responsible for interacting with OpenAI API to generate podcast scripts, calling TTS adapters to convert text to speech, and using FFmpeg to merge audio files.
│ ├── tts_adapters.py # TTS adapter: Encapsulates interaction logic with different TTS services (such as Index-TTS, Edge-TTS, Doubao, Minimax, Fish Audio, Gemini).
│ ├── openai_cli.py # OpenAI command-line tool
│ └── ... # Other backend files
├── web/ # 🌐 Frontend Web Application Directory (Next.js)
│ ├── public/ # Static resources
│ ├── src/ # Source code
│ │ ├── app/ # Next.js route pages
│ │ ├── components/ # React components
│ │ ├── hooks/ # React Hooks
│ │ ├── lib/ # Library files (authentication, database, API, etc.)
│ │ └── types/ # TypeScript type definitions
│ ├── package.json # Frontend dependencies
│ ├── next.config.js # Next.js configuration
│ └── ... # Other frontend files
├── prompt/ # 🧠 AI prompt directory
│ ├── prompt-overview.txt
│ └── prompt-podscript.txt
├── example/ # 🎧 Sample audio directory
├── output/ # 🎉 Output audio directory
├── input.txt # 🎙️ Podcast topic input file
├── README.md # 📄 Project documentation (Chinese)
└── README_EN.md # 📄 Project documentation (English)
📝 Disclaimer
- License: This project is licensed under GPL-3.0.
- No Warranty: This software is provided "as is" without any express or implied warranties.
- Liability Limitation: Under no circumstances shall the authors or copyright holders be liable for any damages arising from the use of this software.
- Third-Party Services: Users bear the risks and responsibilities of using third-party services (such as OpenAI API, TTS services) on their own.
- Usage Purpose: This project is for learning and research purposes only. Please comply with all applicable laws and regulations.
- Final Interpretation Rights: We reserve the right to modify this disclaimer at any time.