# 🎙️ Simple Podcast Generator

> Turn your ideas into lively, engaging multi-person dialogue podcasts with one click!
>
> [Chinese version](README.md)

This script tool uses the **OpenAI API** to generate podcast scripts and a **TTS (Text-to-Speech)** API service to turn the text into audio. You only need to provide a topic; the tool handles the rest. ✨

The podcast script generation logic of this project is inspired by the [SurfSense](https://github.com/MODSetter/SurfSense) project. We sincerely thank its authors for their open-source contributions!

---

## ✨ Core Highlights

* **🤖 AI-Driven Scripts**: Automatically create in-depth podcast dialogue scripts with OpenAI models.
* **👥 Multi-Character Support**: Define multiple podcast characters (such as host and guest) and assign a unique TTS voice to each one.
* **🔌 Flexible TTS Integration**: Connect to self-hosted or third-party TTS services through simple API URL configuration.
* **🔊 Intelligent Audio Synthesis**: Automatically splice each character's voice segments in order, with **volume and speed adjustment**, into one complete podcast audio file (`.wav` format).
* **⌨️ Convenient Command-Line Interface**: Clear command-line parameters give you control over each step of the podcast generation process.

---

## 🛠️ Installation Guide

### 📝 Prerequisites

1. **Python 3.x**
   * Ensure Python 3 is installed on your system.
2. **FFmpeg**
   * This project depends on FFmpeg for audio merging. Download and install it from the [FFmpeg official website](https://ffmpeg.org/download.html).
   * **Important Note**: After installation, ensure the `ffmpeg` command is on your system's `PATH` environment variable so that the script can call it.
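
Whether `ffmpeg` is reachable can be checked before the first run. The following is a minimal sketch that uses only the Python standard library; the helper name `find_ffmpeg` is hypothetical and is not part of this project:

```python
import shutil
from typing import Optional


def find_ffmpeg(executable: str = "ffmpeg") -> Optional[str]:
    """Return the full path of the executable if it is on PATH, else None.

    Hypothetical helper for illustration; the project itself may detect
    FFmpeg differently.
    """
    # shutil.which searches the directories listed in the PATH variable.
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg was not found on PATH; install it and reopen the terminal.")
    else:
        print(f"ffmpeg found at: {path}")
```

If the check reports that `ffmpeg` is missing right after installation, opening a new terminal session usually picks up the updated `PATH`.
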
### 🐍 Python Dependencies

Open your terminal or command prompt and install the required Python libraries using pip:

```bash
pip install requests openai pydub msgpack
```

> **Dependency Explanation**:
> - `requests`: Used to send HTTP requests to TTS service APIs
> - `openai`: Used to interact with the OpenAI API to generate podcast scripts
> - `pydub`: Used for audio processing, such as adjusting volume and speed
> - `msgpack`: Used for efficient data serialization with certain TTS services (such as Fish Audio)

---

## 🚀 Quick Start

### 1. Prepare Input Files

Before running, ensure the following files are ready:

* `input.txt`: Enter your **podcast topic** or core idea in this file.
* `prompt/prompt-overview.txt`: System prompt that guides the AI in generating the podcast's **overall outline**.
* `prompt/prompt-podscript.txt`: System prompt that guides the AI in generating the **detailed dialogue script**. It contains dynamic placeholders (such as `{{numSpeakers}}` and `{{turnPattern}}`), which the script replaces automatically.

### 2. Configure TTS Services and Characters

* TTS configuration files (such as `edge-tts.json`) are stored in the `config/` directory. Each file defines the TTS service API interface, the podcast characters (`podUsers`), and their corresponding voices (`voices`).

### 3. Run the Script

Execute the following command in the project root directory:

```bash
python podcast_generator.py [optional parameters]
```

#### **Optional Parameters**

* `--api-key`: Your OpenAI API key. If not provided, it is read from the configuration file or the `OPENAI_API_KEY` environment variable.
* `--base-url`: Proxy address of the OpenAI API. If not provided, it is read from the configuration file or the `OPENAI_BASE_URL` environment variable.
* `--model`: The OpenAI model to use (such as `gpt-4o` or `gpt-4-turbo`). The default value is `gpt-3.5-turbo`.
* `--threads`: Number of parallel threads for audio generation (default is `1`); higher values speed up processing.
* `--output-language`: Output language of the podcast script (default is `Chinese`).
* `--usetime`: Target length of the podcast script (default is `10 minutes`).
* `--tts-provider`: TTS provider to use (such as `edge`); it selects the matching configuration file in `config/`.

#### **Running Example**

```bash
# Use gpt-4o model, edge-tts service and 4 threads to generate podcast
python podcast_generator.py --api-key sk-xxxxxx --model gpt-4o --tts-provider edge --threads 4
```

### 4. Customizing AI Prompts (`custom` code block)

To give the AI more detailed instructions or specific context, embed a `custom` code block in the `input.txt` file. Its content is added as extra instructions to the core prompt for podcast script generation (`prompt-podscript.txt`) and so influences what the AI generates.

**Usage**: In the `input.txt` file, define your custom content anywhere using the following format:

```
```custom-begin
Additional instructions or context you want to provide to the AI, for example:
- "Please ensure the discussion includes an in-depth analysis of [specific concept]."
- "Please add some humor to the conversation, especially jokes about [a certain topic]."
- "All characters' speeches must be brief, with each sentence not exceeding two lines."
```custom-end
```

### 5. Using Web API (main.py)

This project also provides a FastAPI-based web service that lets you generate podcasts through HTTP requests.

#### Start Web Service

```bash
python main.py
```

By default, the service runs on `http://localhost:8000`.

#### API Endpoints

1. **Generate Podcast**
   - `POST /generate-podcast`
   - Parameters:
     - `api_key`: OpenAI API key
     - `base_url`: OpenAI API base URL (optional)
     - `model`: OpenAI model name (optional)
     - `input_txt_content`: Input text content
     - `tts_providers_config_content`: TTS provider configuration content
     - `podUsers_json_content`: Podcast user JSON configuration
     - `threads`: Number of threads (optional, default is 1)
     - `tts_provider`: TTS provider name (optional, default is "index-tts")
2. **Get Podcast Generation Status**
   - `GET /podcast-status`
   - Requires `X-Auth-Id` header
3. **Download Podcast**
   - `GET /download-podcast/`
   - Parameters:
     - `file_name`: Name of the file to download
4. **Get Voice List**
   - `GET /get-voices`
   - Parameters:
     - `tts_provider`: TTS provider name (optional, default is "tts")

#### API Usage Example

```bash
# After starting the service, use curl to send a request to generate a podcast.
# podUsers_json_content is assumed to mirror the podUsers/voices layout of config/[tts-provider].json.
curl -X POST "http://localhost:8000/generate-podcast" \
  -H "X-Auth-Id: your-auth-id" \
  -F "api_key=sk-xxxxxx" \
  -F "model=gpt-4o" \
  -F "input_txt_content=The future development of artificial intelligence" \
  -F "tts_providers_config_content={\"index\": {\"api_key\": \"your-api-key\"}}" \
  -F "podUsers_json_content={\"podUsers\":[{\"code\":\"zh-CN-XiaoxiaoNeural\",\"role\":\"Host\"}],\"voices\":[{\"name\":\"Xiaoxiao\",\"code\":\"zh-CN-XiaoxiaoNeural\"}]}" \
  -F "threads=4" \
  -F "tts_provider=index-tts"
```

---

## 🌐 Web Application (Next.js)

In addition to the command-line script and FastAPI service, this project also provides a full web user interface. It offers a more intuitive way to generate and manage podcasts by exposing the backend functions through a friendly frontend.

### ✨ Core Features

* **Web Operation Interface**: An intuitive web interface that makes the podcast generation process clear at a glance.
* **Micro User System Integration**: Supports user login, registration, points, and billing functions.
* **Podcast Creation and Configuration**: Lets users enter topics through forms and configure TTS characters, volume, and speed parameters.
* **Real-time Progress Tracking**: Displays the status and progress of podcast generation.
* **Podcast Playback and Management**: Integrates an audio player for listening to generated podcasts and may provide functions for managing historical podcasts.
* **API Interaction**: Communicates with the backend Python service through APIs, including podcast generation, status queries, and audio streaming.

### 🚀 Quick Start (Web)

1. **Install Node.js**: Ensure Node.js is installed on your system (LTS version recommended).
2. **Install Dependencies**: Enter the `web/` directory and install all frontend dependencies.
   ```bash
   cd web/
   npm install
   # or yarn install
   ```
3. **Start Development Server**:
   ```bash
   npm run dev
   # or yarn dev
   ```
   The web application will start at `http://localhost:3000` (default).
4. **Build Production Environment**:
   ```bash
   npm run build
   # or yarn build
   npm run start
   # or yarn start
   ```

### 🐳 Docker Deployment

This project supports deployment via Docker. For details, refer to the [Docker Usage Guide](DOCKER_USAGE.md).

---

## 🌍 Internationalization (i18n) Support

This project supports multilingual interfaces, currently English (en), Chinese (zh-CN), and Japanese (ja).

### 📁 Language File Structure

Language files are located in the `web/public/locales/` directory, grouped by language code:

- `web/public/locales/en/common.json` - English translation
- `web/public/locales/zh-CN/common.json` - Chinese translation
- `web/public/locales/ja/common.json` - Japanese translation

### 🛠️ Adding New Languages

1. Create a new language folder in the `web/public/locales/` directory, for example `fr/`
2. Copy the `common.json` file to the new folder
3. Translate all key-value pairs in the file
4. Update the `languages` variable in the `web/src/i18n/settings.ts` file

### 🌐 Language Switching

The interface language is switched automatically based on the URL path or the browser's language settings:

- `http://localhost:3000/en/` - English interface
- `http://localhost:3000/zh-CN/` - Chinese interface
- `http://localhost:3000/ja/` - Japanese interface

---

## ⚙️ Configuration File Details

### `config/[tts-provider].json` (TTS Character and Voice Configuration)

This is your core TTS configuration file. Its filename corresponds to the provider specified by the `--tts-provider` parameter, and it tells the script how to work with the TTS service.

```json
{
  "podUsers": [
    {
      "code": "zh-CN-XiaoxiaoNeural",
      "role": "Host"
    },
    {
      "code": "zh-CN-YunxiNeural",
      "role": "Tech Expert"
    }
  ],
  "voices": [
    {
      "name": "XiaoMin",
      "code": "yue-CN-XiaoMinNeural",
      "volume_adjustment": 1.0,
      "speed_adjustment": 5.0
    }
  ],
  "apiUrl": "http://localhost:5000/api/tts?text={{text}}&voiceCode={{voiceCode}}",
  "turnPattern": "random",
  "tts_max_retries": 3
}
```

* `podUsers`: Defines the **characters** in the podcast. Each character's `code` must correspond to a valid voice in the `voices` list.
* `voices`: Defines all available TTS **voices**.
  * `volume_adjustment` (optional): Volume adjustment in dB. For example, `6.0` increases the volume by 6 dB.
  * `speed_adjustment` (optional): Speed adjustment in percent. For example, `10.0` increases the speed by 10%.
* `apiUrl`: Your TTS service API endpoint. `{{text}}` and `{{voiceCode}}` are placeholders.
* `turnPattern`: Defines the **turn-taking mode** for character dialogue, such as `random` or `sequential`.
* `tts_max_retries` (optional): Maximum number of retries when a TTS API call fails (default is `3`).

### `config/tts_providers.json` (TTS Provider Authentication)

This file centrally manages authentication information (such as API keys) for all TTS service providers.

```json
{
  "index": { "api_key": null },
  "edge": { "api_key": null },
  "doubao": { "X-Api-App-Id": "null", "X-Api-Access-Key": "null" },
  "fish": { "api_key": "null" },
  "minimax": { "group_id": "null", "api_key": "null" },
  "gemini": { "api_key": "null" }
}
```

**Note**: In actual use, replace the `null` and `"null"` placeholders with valid authentication information. You can create a `tts_providers-local.json` file to store real keys; it is already listed in `.gitignore`.

---

## 🔌 Supported TTS Services

This project is designed to be flexible and supports multiple TTS services.

| Provider | Type | Support Status |
| :--- | :--- | :---: |
| **Index-TTS** | Local | ✅ Supported |
| **Edge-TTS** | Local | ✅ Supported |
| **Doubao** | Network | ✅ Supported |
| **Minimax** | Network | ✅ Supported |
| **Fish Audio** | Network | ✅ Supported |
| **Gemini** | Network | ✅ Supported |
| **OpenAI TTS** | Network | Planned |
| **Azure TTS** | Network | Planned |

---

## 🎉 Output Results

All successfully generated podcast audio files are saved automatically in the `output/` directory. The filename is `podcast_` followed by the generation timestamp, for example `podcast_1678886400.wav`.

---

## 🎧 Sample Audio

You can find sample podcast audio generated using different TTS services in the `example/` folder.


| TTS Service | Listen Link |
| :--- | :--- |
| **Edge TTS** | [▶️ edgeTTS.wav](example/edgeTTS.wav) |
| **Index TTS** | [▶️ indexTTS.wav](example/indexTTS.wav) |
| **Doubao TTS** | [▶️ doubaoTTS.wav](example/doubaoTTS.wav) |
| **Minimax** | [▶️ minimax.wav](example/minimax.wav) |
| **Fish Audio** | [▶️ fish.wav](example/fish.wav) |
| **Gemini TTS** | [▶️ geminiTTS.wav](example/geminiTTS.wav) |

---

## 📂 File Structure

```
.
├── config/                   # ⚙️ Configuration directory
│   ├── doubao-tts.json       # ... (configuration for each TTS provider)
│   └── tts_providers.json    # Unified TTS authentication file
├── server/                   # 🐍 Backend service directory
│   ├── main.py               # FastAPI Web API entry: Provides RESTful APIs for podcast generation, status query, audio download, manages task lifecycle, and performs data cleanup.
│   ├── podcast_generator.py  # Core podcast generation logic: Responsible for interacting with OpenAI API to generate podcast scripts, calling TTS adapters to convert text to speech, and using FFmpeg to merge audio files.
│   ├── tts_adapters.py       # TTS adapter: Encapsulates interaction logic with different TTS services (such as Index-TTS, Edge-TTS, Doubao, Minimax, Fish Audio, Gemini).
│   ├── openai_cli.py         # OpenAI command-line tool
│   └── ...                   # Other backend files
├── web/                      # 🌐 Frontend Web Application Directory (Next.js)
│   ├── public/               # Static resources
│   ├── src/                  # Source code
│   │   ├── app/              # Next.js route pages
│   │   ├── components/       # React components
│   │   ├── hooks/            # React Hooks
│   │   ├── lib/              # Library files (authentication, database, API, etc.)
│   │   └── types/            # TypeScript type definitions
│   ├── package.json          # Frontend dependencies
│   ├── next.config.js        # Next.js configuration
│   └── ...                   # Other frontend files
├── prompt/                   # 🧠 AI prompt directory
│   ├── prompt-overview.txt
│   └── prompt-podscript.txt
├── example/                  # 🎧 Sample audio directory
├── output/                   # 🎉 Output audio directory
├── input.txt                 # 🎙️ Podcast topic input file
├── README.md                 # 📄 Project documentation (Chinese)
└── README_EN.md              # 📄 Project documentation (English)
```

---

## 📝 Disclaimer

* **License**: This project is licensed under [GPL-3.0](https://www.gnu.org/licenses/gpl-3.0.html).
* **No Warranty**: This software is provided "as is" without any express or implied warranties.
* **Liability Limitation**: Under no circumstances shall the authors or copyright holders be liable for any damages arising from the use of this software.
* **Third-Party Services**: Users bear the risks and responsibilities of using third-party services (such as the OpenAI API and TTS services) on their own.
* **Usage Purpose**: This project is for learning and research purposes only. Please comply with all applicable laws and regulations.
* **Final Interpretation Rights**: We reserve the right to modify this disclaimer at any time.