# 🎙️ Simple Podcast Generator

> Easily transform your ideas into lively and engaging multi-person conversational podcasts with a single click!
>
> [中文版](README.md)

This is a powerful script tool that uses the **OpenAI API** to generate insightful podcast scripts and a **TTS (Text-to-Speech)** API service to turn cold text into warm audio. You just need to provide a topic, and it handles the rest! ✨

The podcast script generation logic of this project is deeply inspired by the [SurfSense](https://github.com/MODSetter/SurfSense) project. We express our sincere gratitude for its open-source contribution!

---

## ✨ Core Features

* **🤖 AI-Driven Scripting**: Automatically generate high-quality, in-depth podcast dialogue scripts with OpenAI models.
* **👥 Multi-Role Support**: Freely define multiple podcast roles (e.g., host, guest) and assign a unique TTS voice to each role.
* **🔌 Flexible TTS Integration**: Connect to your self-hosted or third-party TTS services through a simple API URL configuration.
* **🔊 Smart Audio Merging**: Automatically stitch together the voice segments of all roles into a single, complete podcast audio file (`.wav` format).
* **⌨️ Convenient Command-Line Interface**: Provides clear command-line parameters, giving you control over each step of the podcast generation process.

---

## 🛠️ Installation Guide

### 📝 Prerequisites

1. **Python 3.x**
   * Please ensure Python 3 is installed on your system.
2. **FFmpeg**
   * This project relies on FFmpeg for audio merging. Please visit the [FFmpeg official website](https://ffmpeg.org/download.html) to download and install it.
   * **Important**: After installation, make sure the directory containing the `ffmpeg` executable is on your system's `PATH` environment variable so that the script can call it.
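
One way to confirm that FFmpeg is reachable before running the generator is a small check like the one below. This is a minimal sketch, not part of the project: `find_ffmpeg` is a hypothetical helper name, and it relies only on the standard library's `shutil.which`, which looks a command up on `PATH`.

```python
import shutil
from typing import Optional


def find_ffmpeg(command: str = "ffmpeg") -> Optional[str]:
    """Return the full path of the command if it is on PATH, else None.

    Hypothetical helper for illustration only; it is not part of this project.
    """
    return shutil.which(command)


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg was not found on PATH; install it or adjust PATH.")
    else:
        print(f"ffmpeg found at: {path}")
```

If the check reports that `ffmpeg` is missing, revisit the installation step above before running the main script.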
### 🐍 Python Dependencies

Open your terminal or command prompt and install the required Python libraries using pip:

```bash
pip install requests openai
```

---

## 🚀 Quick Start

### 1. Prepare Input Files

Before running, please ensure the following files are ready:

* `input.txt`: Enter the **podcast topic** or core ideas you wish to discuss in this file.
* `prompt/prompt-overview.txt`: A system prompt used to guide the AI in generating the **overall outline** of the podcast.
* `prompt/prompt-podscript.txt`: A system prompt used to guide the AI in generating the **detailed dialogue script**. It contains dynamic placeholders (e.g., `{{numSpeakers}}`, `{{turnPattern}}`), which the script replaces automatically.

### 2. Configure TTS Service and Roles

* The `config/` directory contains your TTS configuration files (e.g., `edge-tts.json`). Each file defines the TTS service's API interface, the podcast roles (`podUsers`), and their corresponding voices (`voices`).

### 3. Run the Script

Execute the following command in the project root directory:

```bash
python podcast_generator.py [Optional Parameters]
```

#### **Optional Parameters**

* `--api-key`: Your OpenAI API key. If not provided, it is read from the configuration file or the `OPENAI_API_KEY` environment variable.
* `--base-url`: Base URL (for example, a proxy address) for the OpenAI API. If not provided, it is read from the configuration file or the `OPENAI_BASE_URL` environment variable.
* `--model`: The OpenAI model to use (e.g., `gpt-4o`, `gpt-4-turbo`). The default value is `gpt-3.5-turbo`.
* `--threads`: The number of parallel threads for audio generation (default is `1`). Higher values can speed up processing.

#### **Run Example**

```bash
# Use gpt-4o model and 4 threads to generate the podcast
python podcast_generator.py --api-key sk-xxxxxx --model gpt-4o --threads 4
```

### 4. Custom AI Prompts (`custom` code block)

To provide more detailed AI instructions or add specific context, you can embed `custom` code blocks in the `input.txt` file. The content of such a block serves as additional instructions that are appended to the core prompt (`prompt-podscript.txt`) for podcast script generation, thereby influencing the AI's output.

**Usage**:
Anywhere in the `input.txt` file, define your custom content using the following format:

````
```custom-begin
Additional instructions or context you wish to provide to the AI, for example:
- "Please ensure the discussion includes an in-depth analysis of [specific concept]."
- "Please add some humorous elements to the dialogue, especially jokes about [a certain topic]."
- "All character speeches must be concise, and each sentence should not exceed two lines."
```custom-end
````

**Effect**:
All text within the `custom` code block (excluding the `custom-begin` and `custom-end` tags themselves) is extracted and appended to the processed content of the [`prompt/prompt-podscript.txt`](prompt/prompt-podscript.txt) template. These custom instructions therefore directly influence the AI's decisions and style when it generates the podcast dialogue script, helping you control the output more precisely.

**Example Scenario**:
If you want the AI to emphasize the future development of a certain technological trend when discussing a tech topic, you can add this to `input.txt`:

````
```custom-begin
Please foresightedly analyze the disruptive changes AI might bring in the next five years, and mention the potential impact of quantum computing on existing encryption technologies.
```custom-end
````

---

## ⚙️ Configuration File Details (`config/*.json`)

The configuration file is the "brain" of the entire project, telling the script how to work with AI and TTS services.
```json
{
  "podUsers": [
    {
      "code": "zh-CN-XiaoxiaoNeural",
      "role": "主持人"
    },
    {
      "code": "zh-CN-YunxiNeural",
      "role": "技术专家"
    }
  ],
  "voices": [
    {
      "name": "XiaoMin",
      "alias": "晓敏",
      "code": "yue-CN-XiaoMinNeural",
      "locale": "yue-CN",
      "gender": "Female",
      "usedname": "晓敏"
    },
    {
      "name": "YunSong",
      "alias": "云松",
      "code": "yue-CN-YunSongNeural",
      "locale": "yue-CN",
      "gender": "Male",
      "usedname": "云松"
    }
  ],
  "apiUrl": "http://localhost:5000/api/tts?text={{text}}&voiceCode={{voiceCode}}",
  "turnPattern": "random"
}
```

* `podUsers`: Defines the **roles** in the podcast. The `code` for each role must correspond to a valid voice in the `voices` list. In the example above, the roles are 主持人 (host) and 技术专家 (technical expert).
* `voices`: Defines all available TTS **voices**.
* `apiUrl`: Your TTS service API endpoint. `{{text}}` is replaced with the dialogue text, and `{{voiceCode}}` is replaced with the character's voice code.
* `turnPattern`: Defines the **turn-taking pattern** for character dialogue, such as `random` or `sequential`.

---

## 🔌 TTS (Text-to-Speech) Service Integration

This project is designed to be flexible and supports various TTS services. Both locally deployed services and cloud-based web services can be integrated through simple configuration.

### 💻 Local TTS Interface Support

You can deploy the following open-source projects as local TTS services and integrate them into this project via the `apiUrl` configuration:

* **index-tts**: [https://github.com/index-tts/index-tts](https://github.com/index-tts/index-tts)
  * **Usage**: Run it together with `ext/index-tts-api.py`, which provides a simple API interface that wraps `index-tts` as a service callable by this project.
* **edge-tts**: [https://github.com/zuoban/tts](https://github.com/zuoban/tts)
  * This is a general TTS library that you can integrate by customizing an adapter.

### 🌐 Web TTS Interface Support (Pending)

This project can also be easily configured to integrate various web TTS services.
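
To make the `apiUrl` placeholders concrete, here is a minimal sketch of how `{{text}}` and `{{voiceCode}}` might be filled in before a request is sent. `build_tts_url` is a hypothetical helper, and percent-encoding the values with `urllib.parse.quote` is an assumption about sensible handling, not a description of what `podcast_generator.py` actually does.

```python
from urllib.parse import quote


def build_tts_url(api_url: str, text: str, voice_code: str) -> str:
    """Fill the {{text}} and {{voiceCode}} placeholders of an apiUrl template.

    Hypothetical helper for illustration only. Values are percent-encoded so
    that spaces, '&' and non-ASCII characters stay valid in the query string.
    """
    return (
        api_url
        .replace("{{text}}", quote(text, safe=""))
        .replace("{{voiceCode}}", quote(voice_code, safe=""))
    )


if __name__ == "__main__":
    # Template taken from the example configuration in this README.
    template = "http://localhost:5000/api/tts?text={{text}}&voiceCode={{voiceCode}}"
    print(build_tts_url(template, "Hello & welcome", "zh-CN-XiaoxiaoNeural"))
    # prints http://localhost:5000/api/tts?text=Hello%20%26%20welcome&voiceCode=zh-CN-XiaoxiaoNeural
```

A service that expects a different parameter layout only needs a different template string, as long as it contains the same two placeholders.
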
Just ensure your `apiUrl` configuration meets the service provider's requirements. Commonly supported services include:

* **OpenAI TTS**
* **Azure TTS**
* **Google Cloud Text-to-Speech (Vertex AI)**
* **Minimax TTS**
* **Gemini TTS** (may require integration via custom API adapter)
* **Fish Audio TTS**

---

## 🎉 Output Results

All successfully generated podcast audio files will be automatically saved in the `output/` directory. The filename format is `podcast_` followed by a timestamp, e.g., `podcast_1678886400.wav`.

## 🎧 Sample Audio

You can find sample podcast audio generated using different TTS services in the `example/` folder:

* **Edge TTS Sample**
* **Index TTS Sample**

These audio files demonstrate the actual effect of this tool in practical applications.

---

## 📂 File Structure

```
.
├── config/                  # ⚙️ Configuration Files Directory
│   ├── edge-tts.json
│   └── index-tts.json
├── prompt/                  # 🧠 AI Prompt Files Directory
│   ├── prompt-overview.txt
│   └── prompt-podscript.txt
├── output/                  # 🎉 Output Audio Directory
├── input.txt                # 🎙️ Podcast Topic Input File
├── openai_cli.py            # OpenAI Command Line Tool
├── podcast_generator.py     # 🚀 Main Running Script
└── README.md                # 📄 Project Documentation
```