OmniVoice

Developed by k2-fsa

Open Source Python Global free #text-to-speech #voice-cloning #multilingual-tts #diffusion-model

ABOUT

OmniVoice, developed by k2-fsa, is a massively multilingual zero-shot text-to-speech (TTS) model that supports over 600 languages. It uses a diffusion language model-style architecture to generate high-quality speech at high inference speed. Its core capabilities are voice cloning, voice design through attributes such as gender, age, pitch, and accent, and control over non-verbal symbols and pronunciation correction. Its broad language coverage and fast inference make it suited to multilingual content creation, personalized voice synthesis, and real-time interactive systems.

CAPABILITIES

Massively multilingual zero-shot TTS, supporting over 600 languages.
Voice cloning and voice design with control over attributes such as gender, age, and accent.
Control over non-verbal symbols, plus pronunciation correction via pinyin/phonemes.
Fast inference, with a real-time factor (RTF) as low as 0.025.
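
To put the RTF figure in concrete terms: real-time factor is conventionally defined as the time spent synthesizing divided by the duration of the audio produced, so lower is faster. The sketch below is plain arithmetic and does not call OmniVoice; the function names are hypothetical helpers introduced only for illustration, and 0.025 is the best-case figure quoted above, not a guaranteed value for any particular hardware.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return RTF = synthesis time / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def estimated_synthesis_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimate how long it takes to generate audio of a given duration."""
    if rtf < 0:
        raise ValueError("rtf must be non-negative")
    return audio_seconds * rtf


if __name__ == "__main__":
    quoted_rtf = 0.025  # best-case figure from the listing (assumption: it applies to your setup)
    clip_seconds = 10.0

    wait = estimated_synthesis_seconds(clip_seconds, quoted_rtf)
    speedup = 1.0 / quoted_rtf  # how many times faster than real time

    print(f"{clip_seconds:.0f} s of audio -> about {wait:.2f} s of compute")
    print(f"roughly {speedup:.0f}x faster than real time")
```

At the quoted RTF, a 10-second clip would take about a quarter of a second to generate, or roughly 40 times faster than real time. Actual figures depend on hardware, batch size, and settings.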

SUPPORTED PLATFORMS

Linux, macOS

EXTERNAL RESOURCES

GitHub Repository