INDEX // #TEXT-TO-SPEECH


PRODUCTS // Ecosystem Node TOTAL: 05
OmniVoice
OPEN SOURCE

OmniVoice by k2-fsa is a massively multilingual, zero-shot Text-to-Speech (TTS) model that supports over 600 languages and is described by the project as state-of-the-art. It uses a diffusion language model-style architecture, which the project credits for both high-quality speech and fast inference. Its core capabilities include voice cloning; voice design from attributes such as gender, age, pitch, and accent; control over non-verbal symbols; and pronunciation correction. Its broad language coverage and speed make it well suited to multilingual content creation, personalized voice synthesis, and real-time interactive systems.

#TEXT-TO-SPEECH #VOICE-CLONING #MULTILINGUAL-TTS
voice-pro
OPEN SOURCE

Voice-Pro by ABUS-AIKOREA is an AI-driven web application, run locally on the desktop, for creating and processing multimedia content. It combines YouTube video downloading, voice separation, speech recognition, multilingual translation, and text-to-speech, including zero-shot voice cloning and multilingual TTS, in a single tool aimed at content creators, researchers, and multilingual professionals. It builds on the Whisper family of models, F5-TTS, E2-TTS, and CosyVoice to provide speech recognition, voice cloning, and translation.

#AUDIOBOOK #FASTER-WHISPER #GRADIO
VoiceFlow-TTS
OPEN SOURCE

VoiceFlow is a Text-to-Speech (TTS) system based on rectified flow matching, designed to avoid the efficiency limits of diffusion-based speech synthesis, which typically needs many sampling steps. It is the official implementation of the authors' ICASSP 2024 paper. The model generates mel-spectrograms by learning a vector field that carries samples from noise to data along a continuous path. A flow rectification step then straightens these sampling trajectories, so that high-quality speech can be synthesized with only a small number of sampling steps. The repository also provides Kaldi-style data organization, flexible training configurations, supervised duration modeling, and experimental voice conversion.

#CONDITIONAL-FLOW-MATCHING #GENERATIVE-MODELS #PROBABILISTIC-MODELS
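The flow-matching idea behind VoiceFlow (regress a velocity field along straight noise-to-data paths, then integrate it with a few ODE steps) can be illustrated with a small sketch. This is not VoiceFlow's code: all function names are hypothetical, plain Python lists stand in for mel-spectrogram frames, and `ideal_field` is a hand-written stand-in for a trained, text-conditioned network. The reflow step itself (retraining on noise/output pairs produced by the model) is not shown; the example only demonstrates why a straight trajectory needs very few Euler steps.

```python
from typing import Callable, List

Vector = List[float]
# Hypothetical signature for a velocity model: v = model(x_t, t)
VelocityField = Callable[[Vector, float], Vector]


def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    """Point at time t on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0: Vector, x1: Vector) -> Vector:
    """Flow-matching regression target: the constant time derivative of interpolate()."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(model: VelocityField, x0: Vector, x1: Vector, t: float) -> float:
    """Mean squared error between the model's velocity at x_t and the target velocity."""
    xt = interpolate(x0, x1, t)
    pred = model(xt, t)
    target = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


def euler_sample(model: VelocityField, x0: Vector, steps: int) -> Vector:
    """Integrate dx/dt = model(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        v = model(x, k * dt)  # t = k*dt is always < 1
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    noise = [0.5, -1.0]
    data = [2.0, 3.0]

    # Hypothetical "perfectly rectified" field for one noise/data pair: from any
    # point it heads straight at `data` with the speed needed to arrive at t=1.
    def ideal_field(x: Vector, t: float) -> Vector:
        return [(d - xi) / (1.0 - t) for d, xi in zip(data, x)]

    print(flow_matching_loss(ideal_field, noise, data, 0.25))  # 0.0
    print(euler_sample(ideal_field, noise, steps=1))  # [2.0, 3.0]
```

Because the path is a straight line, a single Euler step already lands on the data point; a curved trajectory would need more steps for the same accuracy, which is the cost that rectification aims to reduce.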
VoxCPM
OPEN SOURCE

VoxCPM is a tokenizer-free Text-to-Speech system: instead of predicting discrete audio tokens, it directly generates continuous speech representations with an end-to-end diffusion-autoregressive architecture, producing highly natural and expressive speech. The latest model, VoxCPM2, has 2B parameters and is trained on over 2 million hours of multilingual speech data. It supports 30 languages, Voice Design, Controllable Voice Cloning, and 48 kHz studio-quality audio output via built-in super-resolution.

#AUDIO #DEEPLEARNING #MINICPM
pyvideotrans
OPEN SOURCE

pyVideoTrans is an open-source tool for video translation, audio transcription, AI dubbing, and subtitle generation. It converts a video into another language through an automated pipeline: speech recognition (ASR), subtitle translation, speech synthesis (TTS), and video-audio synchronization. Key features include speaker diarization and zero-shot voice cloning, and it works with both local offline models and mainstream cloud APIs. An interactive GUI supports manual proofreading, while a headless CLI supports batch deployment, making it a flexible option for multimedia localization.

#SPEECH-TO-TEXT #TEXT-TO-SPEECH #VIDEO-TRANSLATION
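The four-stage dubbing pipeline described for pyVideoTrans can be sketched as a chain of pluggable stages. Everything here is hypothetical: `Segment`, `localize`, `fit_speed`, and the stage callables are illustrative names, not pyVideoTrans's API, and the synchronization rule (speed up a dubbed clip just enough to fit its original subtitle slot, capped at an assumed maximum of 1.5x) is a simple assumed strategy, not necessarily the one the tool uses.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass
class Segment:
    start: float               # seconds, from ASR
    end: float                 # seconds, from ASR
    text: str                  # source-language transcript
    translation: str = ""      # target-language subtitle text
    dub_seconds: float = 0.0   # duration of the synthesized clip
    speed: float = 1.0         # playback speed-up applied during sync


# Hypothetical stage signatures; real ASR/MT/TTS backends would be wrapped to fit them.
ASR = Callable[[str], List[Segment]]
Translate = Callable[[str], str]
TTS = Callable[[str], float]  # returns the synthesized clip's duration in seconds


def fit_speed(slot: float, clip: float, max_speed: float = 1.5) -> float:
    """Assumed sync rule: speed-up factor so the clip fits its slot, capped at max_speed."""
    if slot <= 0 or clip <= slot:
        return 1.0
    return min(clip / slot, max_speed)


def localize(media_path: str, asr: ASR, translate: Translate, tts: TTS) -> List[Segment]:
    """ASR -> subtitle translation -> TTS -> per-segment synchronization."""
    out = []
    for seg in asr(media_path):
        translated = translate(seg.text)
        clip = tts(translated)
        speed = fit_speed(seg.end - seg.start, clip)
        # dataclasses.replace returns a copy, leaving the ASR segment untouched
        out.append(replace(seg, translation=translated, dub_seconds=clip, speed=speed))
    return out


if __name__ == "__main__":
    # Stub stages standing in for real models or cloud APIs
    fake_asr = lambda path: [Segment(0.0, 2.0, "hola"), Segment(2.0, 3.0, "buenos días")]
    fake_translate = lambda text: {"hola": "hello", "buenos días": "good morning"}[text]
    fake_tts = lambda text: 0.4 * len(text.split()) + 0.8  # toy duration model
    for s in localize("talk.mp4", fake_asr, fake_translate, fake_tts):
        print(s.translation, round(s.dub_seconds, 2), s.speed)
```

Keeping each stage behind a plain callable mirrors the tool's stated ability to swap between local offline models and cloud APIs without changing the pipeline itself.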