SoulX-Podcast is an innovative speech synthesis model launched by Soul AI Lab, specifically designed for podcast scenarios. It can generate long-form, natural, emotionally rich multi-speaker dialogue speech, authentically reproducing the intonation, pauses, and dialectal features of human conversation.
SoulX-Podcast can generate over 90 minutes of high-quality speech content, supporting multi-speaker podcasts, virtual interviews, dialogue novels, and other scenarios.
Supports Mandarin, English, and multiple Chinese dialects (Sichuanese, Henan dialect, Cantonese, etc.). Zero-shot voice cloning works without any additional training, giving each character a unique voice and accent.
Built-in rich paralinguistic controls naturally add laughter, sighs, tone changes, and other details to speech, making synthesized output more expressive and human-like.
The latest model is now available on Hugging Face with improved performance and capabilities.
Project paper officially published on arXiv. View Paper →
Quick start:
git clone https://github.com/Soul-AILab/SoulX-Podcast.git
huggingface-cli download Soul-AILab/SoulX-Podcast-1.7B
bash example/infer_dialogue.sh

Use cases:
Automated generation of podcasts and interviews
Create virtual characters and AI broadcasters
Research and education in multi-dialect speech
Create audio novels and radio dramas
SoulX-Podcast adopts the Apache 2.0 open source license and can be freely used for research and educational projects.
Please follow ethical guidelines and avoid using it for any unauthorized voice cloning or fraudulent activities.