open source

OpenVoice

Provided by: MyShell AI

Framework: PyTorch

OpenVoice V2 is an open-source voice cloning and speech synthesis model developed by MyShell AI with contributions from MIT and Tsinghua University. Released under the MIT license in April 2024, it supports accurate tone color cloning, flexible style control (emotion, accent, rhythm), and zero-shot cross-lingual voice cloning across English, Japanese, Chinese, Spanish, French, and Korean, using only a short reference audio clip.

Model Performance Statistics

Views: 13

Released: November 30, 2023

Last Checked: Jul 20, 2025

Version: v1.1

Capabilities
  • Voice Cloning
  • TTS
Performance Benchmarks
RTF: 0.08
Similarity: 85%
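
The RTF figure above can be read with a little arithmetic. The page does not say how the number was measured or on what hardware, so the following is only a sketch that assumes the usual definition of real-time factor: time spent synthesizing divided by the duration of the audio produced. Under that assumption, a value below 1 means faster than real time. The function names and sample timings are hypothetical and chosen for illustration only.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return RTF, assumed here to be synthesis time / generated audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time_for(audio_seconds: float, rtf: float) -> float:
    """Estimate how long it takes to synthesize a clip at a given RTF."""
    return audio_seconds * rtf


if __name__ == "__main__":
    # Hypothetical measurement: 0.8 s of compute for a 10 s clip.
    rtf = real_time_factor(synthesis_seconds=0.8, audio_seconds=10.0)
    print(f"RTF: {rtf:.2f}")

    # At the listed RTF of 0.08, estimate the compute time for one minute of speech.
    print(f"Estimated time for 60 s of audio: {synthesis_time_for(60.0, 0.08):.1f} s")
```

Real throughput depends on hardware, batch size, and clip length, so the listed value is best treated as a rough indicator rather than a guarantee.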
Technical Specifications
Parameter Count: N/A
Training & Dataset

Dataset Used: VCTK, LibriTTS

Related AI Models

Model, open source

wav2vec 2.0

Provided by: Meta AI

wav2vec 2.0 is a self-supervised speech representation learning model developed by Meta AI, offering state-of-the-art performance in automatic speech recognition (ASR). Built on PyTorch and licensed under MIT, it drastically reduces the need for labeled data, making it ideal for multilingual transcription and voice applications. The model is widely used and integrated into the Hugging Face ecosystem.

Speech & Audio, speech-recognition
15
Model, open source

SpeechT5

Provided by: Microsoft

SpeechT5 is a versatile speech processing model developed by Microsoft, designed to handle speech recognition, speech synthesis, and speech translation tasks within a unified framework. Built using PyTorch and released under the MIT license, it leverages transformer architectures for improved accuracy and flexibility in various speech applications, including voice assistants and translation systems.

Speech & Audio, asr, speech-recognition
14
Model, open source

FastSpeech 2

Provided by: Microsoft Research Asia

FastSpeech 2 is an improved neural text-to-speech model from Microsoft that generates natural-sounding speech quickly and efficiently. Built with PyTorch and licensed under MIT, it enhances prosody modeling and robustness, making it suitable for real-time voice assistants, audiobooks, and accessibility tools. The open-source code allows developers to customize and deploy the model easily.

Speech & Audio, audio, text-to-speech
14