
🎙️ Simple Podcast Generator

Easily transform your ideas into lively and engaging multi-person dialogue podcasts with one click! (Chinese version)

This script tool uses the OpenAI API to generate podcast scripts, then calls a TTS (Text-to-Speech) API service to turn the text into audio. You just need to provide a topic, and it handles the rest!

The podcast script generation logic of this project is deeply inspired by the SurfSense project. We would like to express our heartfelt thanks for their open-source contributions!


Core Highlights

  • 🤖 AI-Driven Scripts: Automatically create high-quality, in-depth podcast dialogue scripts with the power of OpenAI models.
  • 👥 Multi-Character Support: Freely define multiple podcast characters (such as host, guest), and assign unique TTS voices to each character.
  • 🔌 Flexible TTS Integration: Seamlessly connect to your self-hosted or third-party TTS services through simple API URL configuration.
  • 🔊 Intelligent Audio Synthesis: Automatically splices each character's voice segments in order, with optional volume and speed adjustment, into a complete, smooth podcast audio file (.wav format).
  • ⌨️ Convenient Command-Line Interface: Provides clear command-line parameters, giving you full control over every aspect of the podcast generation process.

🛠️ Installation Guide

📝 Prerequisites

  1. Python 3.x

    • Please ensure Python 3 is installed on your system.
  2. FFmpeg

    • This project depends on FFmpeg for audio merging. Please download it from the official FFmpeg website and install it.
    • Important Note: After installation, make sure the directory containing the ffmpeg command is on your system's PATH environment variable so that the script can call it.
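
One way to confirm that FFmpeg is reachable before running the script is sketched below. It uses only the Python standard library; the helper name find_ffmpeg is hypothetical and is not part of this project.

```python
import shutil
import subprocess


def find_ffmpeg(command="ffmpeg"):
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which(command)


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg was not found on PATH; install it or update PATH.")
    else:
        # "ffmpeg -version" prints a version banner and exits.
        result = subprocess.run([path, "-version"], capture_output=True, text=True)
        print(result.stdout.splitlines()[0] if result.stdout else path)
```

If the check reports that ffmpeg is missing, open a new terminal after changing PATH so the updated value is picked up.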

🐍 Python Dependencies

Open your terminal or command prompt and install the required Python libraries using pip:

pip install requests openai pydub msgpack

Dependency Explanation:

  • requests: Used to send HTTP requests to TTS service APIs
  • openai: Used to interact with OpenAI API to generate podcast scripts
  • pydub: Used for audio processing, such as adjusting volume and speed
  • msgpack: Used for efficient data serialization with certain TTS services (such as Fish Audio)

🚀 Quick Start

1. Prepare Input Files

Before running, please ensure the following files are ready:

  • input.txt: Enter your podcast topic or core idea in this file.
  • prompt/prompt-overview.txt: System prompt used to guide AI in generating the podcast overall outline.
  • prompt/prompt-podscript.txt: System prompt used to guide AI in generating detailed dialogue scripts. It contains dynamic placeholders (such as {{numSpeakers}}, {{turnPattern}}), which the script will automatically replace.
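
The placeholder replacement mentioned above can be pictured with the following minimal sketch. It is not the project's actual code: the function name fill_placeholders and the template text are made up for illustration, and only the {{name}} placeholder syntax comes from this document.

```python
import re


def fill_placeholders(template, values):
    """Replace every {{name}} in template with str(values[name]).

    Placeholders without a matching key are left untouched so they are easy to spot.
    """
    def substitute(match):
        key = match.group(1)
        return str(values[key]) if key in values else match.group(0)

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# Illustrative template text, not the contents of prompt/prompt-podscript.txt.
template = "Write a dialogue for {{numSpeakers}} speakers; turn order: {{turnPattern}}."
prompt = fill_placeholders(template, {"numSpeakers": 2, "turnPattern": "random"})
print(prompt)
```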

2. Configure TTS Services and Characters

  • TTS configuration files (such as edge-tts.json) are stored in the config/ directory. This file defines the TTS service API interface, podcast characters (podUsers) and their corresponding voices (voices).

3. Run the Script

Execute the following command in the project root directory:

python podcast_generator.py [optional parameters]

Optional Parameters

  • --api-key <YOUR_OPENAI_API_KEY>: Your OpenAI API key. If omitted, it is read from the configuration file or the OPENAI_API_KEY environment variable.
  • --base-url <YOUR_OPENAI_BASE_URL>: Base URL of the OpenAI API (for example, a proxy address). If omitted, it is read from the configuration file or the OPENAI_BASE_URL environment variable.
  • --model <OPENAI_MODEL_NAME>: The OpenAI model to use (such as gpt-4o, gpt-4-turbo). Default is gpt-3.5-turbo.
  • --tts-provider <TTS_PROVIDER>: The TTS provider to use (such as edge). It selects the matching config/[tts-provider].json file.
  • --threads <NUMBER_OF_THREADS>: Number of parallel threads used for audio generation (default is 1). Higher values can speed up processing.
  • --output-language <LANGUAGE_CODE>: Output language of the podcast script (default is Chinese).
  • --usetime <TIME_DURATION>: Target length of the podcast script (default is 10 minutes).
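
The fallback behavior described for --api-key and --base-url can be sketched as follows. This is an illustration under assumptions: this document does not state whether the configuration file or the environment variable is checked first, so the order below (command line, then configuration file, then environment) is a guess, and the config key names are hypothetical.

```python
import os


def resolve_setting(cli_value, config, config_key, env_var, default=None):
    """Pick the first non-empty value: CLI argument, config file, environment, default.

    The config-before-environment order is an assumption made for this sketch.
    """
    for candidate in (cli_value, config.get(config_key), os.environ.get(env_var)):
        if candidate:
            return candidate
    return default


# Hypothetical values for illustration only.
config = {"base_url": "https://example.invalid/v1"}
api_key = resolve_setting("sk-from-cli", config, "api_key", "OPENAI_API_KEY")
base_url = resolve_setting(None, config, "base_url", "OPENAI_BASE_URL")
model = resolve_setting(None, config, "model", "PODCAST_DEMO_UNSET_VAR", default="gpt-3.5-turbo")
print(api_key, base_url, model)
```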

Running Example

# Use gpt-4o model, edge-tts service and 4 threads to generate podcast
python podcast_generator.py --api-key sk-xxxxxx --model gpt-4o --tts-provider edge --threads 4
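
To illustrate what a --threads value greater than 1 buys, here is a minimal sketch of generating dialogue segments in parallel while keeping them in script order. It is not taken from podcast_generator.py; synthesize_segment is a stand-in that returns a label instead of calling a TTS service.

```python
from concurrent.futures import ThreadPoolExecutor


def synthesize_segment(segment):
    """Stand-in for a TTS request; a real version would call the TTS API here."""
    speaker, text = segment
    return f"{speaker}:{len(text)}"


def synthesize_all(segments, threads=1):
    """Run synthesize_segment over all segments using a pool of worker threads.

    Executor.map yields results in input order, so the dialogue order is
    preserved even when later segments finish first.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(synthesize_segment, segments))


segments = [("Host", "Welcome to the show."), ("Guest", "Glad to be here.")]
results = synthesize_all(segments, threads=4)
print(results)
```

Because TTS calls are mostly network-bound, threads are usually sufficient here; the useful upper limit depends on the rate limits of the TTS service in use.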

5. Using Web API (main.py)

This project also provides a FastAPI-based web service that allows you to generate podcasts through HTTP requests.

Start Web Service

python main.py

By default, the service will run on http://localhost:8000.

API Endpoints

  1. Generate Podcast - POST /generate-podcast

    • Parameters:
      • api_key: OpenAI API key
      • base_url: OpenAI API base URL (optional)
      • model: OpenAI model name (optional)
      • input_txt_content: Input text content
      • tts_providers_config_content: TTS provider configuration content
      • podUsers_json_content: Podcast user JSON configuration
      • threads: Number of threads (optional, default is 1)
      • tts_provider: TTS provider name (optional, default is "index-tts")
  2. Get Podcast Generation Status - GET /podcast-status

    • Requires X-Auth-Id header
  3. Download Podcast - GET /download-podcast/

    • Parameters:
      • file_name: Name of the file to download
  4. Get Voice List - GET /get-voices

    • Parameters:
      • tts_provider: TTS provider name (optional, default is "tts")

API Usage Example

# After starting the service, use curl to send a request to generate podcast
curl -X POST "http://localhost:8000/generate-podcast" \
  -H "X-Auth-Id: your-auth-id" \
  -F "api_key=sk-xxxxxx" \
  -F "model=gpt-4o" \
  -F "input_txt_content=The future development of artificial intelligence" \
  -F "tts_providers_config_content={\"index\": {\"api_key\": \"your-api-key\"}}" \
  -F "podUsers_json_content={\"podUsers\":[{\"code\":\"zh-CN-XiaoxiaoNeural\",\"role\":\"Host\"}],\"voices\":[{\"name\":\"Xiaoxiao\",\"code\":\"zh-CN-XiaoxiaoNeural\"}]}" \
  -F "threads=4" \
  -F "tts_provider=index-tts"
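
The same request can be prepared from Python, which avoids escaping the JSON fields by hand. The sketch below only builds the request and does not send it. It assumes that podUsers_json_content has the same podUsers/voices shape as config/[tts-provider].json and that the endpoint also accepts URL-encoded form bodies (the curl example sends multipart/form-data); neither assumption is confirmed by this document.

```python
import json
import urllib.parse
import urllib.request


def build_generate_request(base="http://localhost:8000", auth_id="your-auth-id"):
    """Build (but do not send) a POST /generate-podcast request."""
    pod_config = {
        "podUsers": [{"code": "zh-CN-XiaoxiaoNeural", "role": "Host"}],
        "voices": [{"name": "Xiaoxiao", "code": "zh-CN-XiaoxiaoNeural"}],
    }
    fields = {
        "api_key": "sk-xxxxxx",
        "model": "gpt-4o",
        "input_txt_content": "The future development of artificial intelligence",
        "tts_providers_config_content": json.dumps({"index": {"api_key": "your-api-key"}}),
        "podUsers_json_content": json.dumps(pod_config),
        "threads": "4",
        "tts_provider": "index-tts",
    }
    # Assumption: the endpoint accepts application/x-www-form-urlencoded bodies.
    body = urllib.parse.urlencode(fields).encode("utf-8")
    request = urllib.request.Request(
        f"{base}/generate-podcast",
        data=body,
        headers={"X-Auth-Id": auth_id},
        method="POST",
    )
    return request, fields


request, fields = build_generate_request()
print(request.get_method(), request.full_url)
# To send it against a running service: urllib.request.urlopen(request)
```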

4. Customizing AI Prompts (custom code block)

To provide more detailed AI instructions or add specific context, you can embed custom code blocks in the input.txt file. The content in this code block will be used as additional instructions, built into the core prompt for podcast script generation (prompt-podscript.txt), thereby influencing the AI's generation behavior.

Usage: In the input.txt file, define your custom content anywhere using the following format:

```custom-begin
Additional instructions or context you want to provide to the AI, for example:
- "Please ensure the discussion includes an in-depth analysis of [specific concept]."
- "Please add some humor to the conversation, especially jokes about [a certain topic]."
- "All characters' speeches must be brief, with each sentence not exceeding two lines."
```custom-end
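
As a rough picture of how such a block could be separated from the rest of input.txt, consider the sketch below. It is an assumption about the parsing, not the project's implementation; split_custom_block is a hypothetical helper, and only the custom-begin / custom-end markers come from this document.

```python
import re

FENCE = "`" * 3  # three backticks, as used by the custom-begin / custom-end markers
CUSTOM_BLOCK = re.compile(
    re.escape(FENCE) + r"custom-begin[ \t]*\n(.*?)" + re.escape(FENCE) + r"custom-end",
    re.DOTALL,
)


def split_custom_block(text):
    """Return (topic_text, custom_instructions) from the contents of input.txt.

    custom_instructions is an empty string when no block is present.
    """
    match = CUSTOM_BLOCK.search(text)
    if match is None:
        return text.strip(), ""
    topic = (text[:match.start()] + text[match.end():]).strip()
    return topic, match.group(1).strip()


sample = (
    "The future of AI\n"
    + FENCE + "custom-begin\n"
    + "Please add some humor to the conversation.\n"
    + FENCE + "custom-end\n"
)
topic, extra = split_custom_block(sample)
print(repr(topic), repr(extra))
```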

🌐 Web Application (Next.js)

In addition to the command-line script and FastAPI service, this project also provides a fully functional web user interface. It aims to make podcast generation and management more intuitive by exposing the backend's functionality through a friendly frontend.

Core Features

  • Web Operation Interface: Intuitive and friendly web interface that makes the podcast generation process clear at a glance.
  • Micro User System Integration: Supports user login, registration, points and billing functions, building a complete user ecosystem.
  • Podcast Creation and Configuration: Allows users to enter topics through forms and configure TTS characters, volume and speed parameters.
  • Real-time Progress Tracking: Displays the status and progress of podcast generation.
  • Podcast Playback and Management: Integrates an audio player for users to listen to generated podcasts and may provide functions for managing historical podcasts.
  • API Interaction: Seamless communication with the backend Python service through APIs, including podcast generation, status queries, and audio streaming.

🚀 Quick Start (Web)

  1. Install Node.js: Please ensure Node.js is installed on your system (LTS version recommended).
  2. Install Dependencies: Enter the web/ directory and install all frontend dependencies.
    cd web/
    npm install
    # or yarn install
    
  3. Start Development Server:
    npm run dev
    # or yarn dev
    
    The web application will start at http://localhost:3000 (default).
  4. Build Production Environment:
    npm run build
    # or yarn build
    npm run start
    # or yarn start
    

🐳 Docker Deployment

This project supports deployment via Docker. For details, please refer to the Docker Usage Guide.


🌍 Internationalization (i18n) Support

This project supports multilingual interfaces, currently supporting English (en), Chinese (zh-CN), and Japanese (ja).

📁 Language File Structure

Language files are located in the web/public/locales/ directory, grouped by language code:

  • web/public/locales/en/common.json - English translation
  • web/public/locales/zh-CN/common.json - Chinese translation
  • web/public/locales/ja/common.json - Japanese translation

🛠️ Adding New Languages

  1. Create a new language folder in the web/public/locales/ directory, for example fr/
  2. Copy the common.json file to the new folder
  3. Translate all key-value pairs in the file
  4. Update the languages variable in the web/src/i18n/settings.ts file
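
After step 3 it can help to check that the new translation file has not lost any keys. The sketch below compares two translation dictionaries; the helper names are hypothetical, and the sample data stands in for files that would normally be loaded with json.load from web/public/locales/en/common.json and the new language folder.

```python
def flatten_keys(data, prefix=""):
    """Return the set of dotted key paths in a (possibly nested) translation dict."""
    keys = set()
    for key, value in data.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            keys |= flatten_keys(value, path)
        else:
            keys.add(path)
    return keys


def missing_keys(reference, translation):
    """Keys present in the reference locale but absent from the translation."""
    return sorted(flatten_keys(reference) - flatten_keys(translation))


# Illustrative data, not the project's real translation keys.
en = {"confirm": {"title": "Confirm", "cancel": "Cancel"}, "generate": "Generate"}
fr = {"confirm": {"title": "Confirmer"}, "generate": "Générer"}
print(missing_keys(en, fr))
```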

🌐 Language Switching

The interface language can be selected through the URL path or the browser's language settings:

  • http://localhost:3000/en/ - English interface
  • http://localhost:3000/zh-CN/ - Chinese interface
  • http://localhost:3000/ja/ - Japanese interface

⚙️ Configuration File Details

config/[tts-provider].json (TTS Character and Voice Configuration)

This is your core TTS configuration file, with the filename corresponding to the provider specified by the --tts-provider parameter. It tells the script how to work with the TTS service.

{
  "podUsers": [
    {
      "code": "zh-CN-XiaoxiaoNeural",
      "role": "Host"
    },
    {
      "code": "zh-CN-YunxiNeural",
      "role": "Tech Expert"
    }
  ],
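  "voices": [
    {
      "name": "Xiaoxiao",
      "code": "zh-CN-XiaoxiaoNeural"
    },
    {
      "name": "Yunxi",
      "code": "zh-CN-YunxiNeural"
    },
    {
      "name": "XiaoMin",
      "code": "yue-CN-XiaoMinNeural",
      "volume_adjustment": 1.0,
      "speed_adjustment": 5.0
    }
  ],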
  "apiUrl": "http://localhost:5000/api/tts?text={{text}}&voiceCode={{voiceCode}}",
  "turnPattern": "random",
  "tts_max_retries": 3
}

  • podUsers: Defines the characters in the podcast. Each character's code must correspond to a valid voice in the voices list.
  • voices: Defines all available TTS voices.
    • volume_adjustment (optional): Volume adjustment in dB. For example, 6.0 increases the volume by 6 dB.
    • speed_adjustment (optional): Speed adjustment in percent. For example, 10.0 increases the speed by 10%.
  • apiUrl: Your TTS service API endpoint. {{text}} and {{voiceCode}} are placeholders for the text to synthesize and the voice code.
  • turnPattern: Defines the turn-taking mode for character dialogue, such as random or sequential.
  • tts_max_retries (optional): Maximum number of retries when TTS API calls fail (default is 3).
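
The rules above can be made concrete with a small sketch that checks a configuration and fills in the apiUrl placeholders. The helper names are hypothetical, and URL-encoding the substituted values is an assumption about how the script builds requests, not something this document states. The gain_factor helper only shows the standard decibel arithmetic behind volume_adjustment.

```python
from urllib.parse import quote


def undefined_voice_codes(config):
    """Return podUsers codes that have no matching entry in voices."""
    known = {voice["code"] for voice in config.get("voices", [])}
    return [user["code"] for user in config.get("podUsers", []) if user["code"] not in known]


def build_tts_url(api_url, text, voice_code):
    """Fill the {{text}} and {{voiceCode}} placeholders with URL-encoded values."""
    return (api_url
            .replace("{{text}}", quote(text, safe=""))
            .replace("{{voiceCode}}", quote(voice_code, safe="")))


def gain_factor(volume_adjustment_db):
    """Amplitude ratio for a change in dB (6 dB is roughly a doubling)."""
    return 10 ** (volume_adjustment_db / 20)


# Deliberately incomplete sample: the second character has no matching voice.
config = {
    "podUsers": [
        {"code": "zh-CN-XiaoxiaoNeural", "role": "Host"},
        {"code": "zh-CN-YunxiNeural", "role": "Tech Expert"},
    ],
    "voices": [{"name": "Xiaoxiao", "code": "zh-CN-XiaoxiaoNeural"}],
    "apiUrl": "http://localhost:5000/api/tts?text={{text}}&voiceCode={{voiceCode}}",
}
print(undefined_voice_codes(config))
url = build_tts_url(config["apiUrl"], "Hello world", "zh-CN-XiaoxiaoNeural")
print(url)
```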

config/tts_providers.json (TTS Provider Authentication)

This file is used to centrally manage authentication information (such as API keys) for all TTS service providers.

{
  "index": { "api_key": null },
  "edge": { "api_key": null },
  "doubao": { "X-Api-App-Id": "null", "X-Api-Access-Key": "null" },
  "fish": { "api_key": "null" },
  "minimax": { "group_id": "null", "api_key": "null" },
  "gemini": { "api_key": "null" }
}

Note: In actual use, replace "null" with valid authentication information. You can create a tts_providers-local.json file to store real keys; it is already listed in .gitignore.
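
A minimal sketch of reading these credentials is shown below. It assumes that tts_providers-local.json, when present, should take priority over tts_providers.json; this document does not describe how the project chooses between the two files, and both helper names are hypothetical. Local providers may legitimately need no key, so an empty field is not always an error.

```python
import json
import tempfile
from pathlib import Path


def load_tts_credentials(config_dir, provider):
    """Load one provider's credentials, preferring tts_providers-local.json if it exists.

    Preferring the local file is an assumption of this sketch.
    """
    config_dir = Path(config_dir)
    for name in ("tts_providers-local.json", "tts_providers.json"):
        path = config_dir / name
        if path.is_file():
            with path.open(encoding="utf-8") as handle:
                return json.load(handle).get(provider, {})
    return {}


def missing_credentials(credentials):
    """Names of fields still set to null, an empty string, or the placeholder "null"."""
    return sorted(key for key, value in credentials.items() if value in (None, "null", ""))


# Demonstration with a temporary directory instead of the real config/ folder.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "tts_providers.json").write_text(
        json.dumps({"fish": {"api_key": "null"}}), encoding="utf-8"
    )
    creds = load_tts_credentials(tmp, "fish")
    print(missing_credentials(creds))
```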


🔌 Supported TTS Services

This project is designed to be highly flexible and supports multiple TTS services.

| Provider | Type | Support Status |
|---|---|---|
| Index-TTS | Local | Supported |
| Edge-TTS | Local | Supported |
| Doubao | Network | Supported |
| Minimax | Network | Supported |
| Fish Audio | Network | Supported |
| Gemini | Network | Supported |
| OpenAI TTS | Network | Planned |
| Azure TTS | Network | Planned |

🎉 Output Results

All successfully generated podcast audio files will be automatically saved in the output/ directory. The filename format is podcast_ plus the timestamp when it was generated, for example podcast_1678886400.wav.
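
The naming scheme can be reproduced with a few lines of Python. This sketch assumes the timestamp is a Unix time in whole seconds, which matches the example filename but is not stated explicitly; output_path is a hypothetical helper.

```python
import time
from pathlib import Path


def output_path(output_dir="output", timestamp=None):
    """Build output/podcast_<timestamp>.wav; uses the current time by default."""
    if timestamp is None:
        timestamp = int(time.time())
    return Path(output_dir) / f"podcast_{timestamp}.wav"


example_name = output_path(timestamp=1678886400).name
print(example_name)
```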


🎧 Sample Audio

You can find sample podcast audio generated using different TTS services in the example/ folder.

| TTS Service | Listen Link |
|---|---|
| Edge TTS | ▶️ edgeTTS.wav |
| Index TTS | ▶️ indexTTS.wav |
| Doubao TTS | ▶️ doubaoTTS.wav |
| Minimax | ▶️ minimax.wav |
| Fish Audio | ▶️ fish.wav |
| Gemini TTS | ▶️ geminiTTS.wav |

📂 File Structure

.
├── config/                  # ⚙️ Configuration directory
│   ├── doubao-tts.json      # ... (configuration for each TTS provider)
│   └── tts_providers.json   # Unified TTS authentication file
├── server/                  # 🐍 Backend service directory
│   ├── main.py              # FastAPI Web API entry: Provides RESTful APIs for podcast generation, status query, audio download, manages task lifecycle, and performs data cleanup.
│   ├── podcast_generator.py # Core podcast generation logic: Responsible for interacting with OpenAI API to generate podcast scripts, calling TTS adapters to convert text to speech, and using FFmpeg to merge audio files.
│   ├── tts_adapters.py      # TTS adapter: Encapsulates interaction logic with different TTS services (such as Index-TTS, Edge-TTS, Doubao, Minimax, Fish Audio, Gemini).
│   ├── openai_cli.py        # OpenAI command-line tool
│   └── ...                  # Other backend files
├── web/                     # 🌐 Frontend Web Application Directory (Next.js)
│   ├── public/              # Static resources
│   ├── src/                 # Source code
│   │   ├── app/             # Next.js route pages
│   │   ├── components/      # React components
│   │   ├── hooks/           # React Hooks
│   │   ├── lib/             # Library files (authentication, database, API, etc.)
│   │   └── types/           # TypeScript type definitions
│   ├── package.json         # Frontend dependencies
│   ├── next.config.js       # Next.js configuration
│   └── ...                  # Other frontend files
├── prompt/                  # 🧠 AI prompt directory
│   ├── prompt-overview.txt
│   └── prompt-podscript.txt
├── example/                 # 🎧 Sample audio directory
├── output/                  # 🎉 Output audio directory
├── input.txt                # 🎙️ Podcast topic input file
├── README.md                # 📄 Project documentation (Chinese)
└── README_EN.md             # 📄 Project documentation (English)

📝 Disclaimer

  • License: This project is licensed under GPL-3.0.
  • No Warranty: This software is provided "as is" without any express or implied warranties.
  • Liability Limitation: Under no circumstances shall the authors or copyright holders be liable for any damages arising from the use of this software.
  • Third-Party Services: Users bear the risks and responsibilities of using third-party services (such as OpenAI API, TTS services) on their own.
  • Usage Purpose: This project is for learning and research purposes only. Please comply with all applicable laws and regulations.
  • Final Interpretation Rights: We reserve the right to modify this disclaimer at any time.