Explore 2 APIs and 10 AI models.
OpenVoice V2 is a cutting-edge open-source voice cloning and speech synthesis model from MyShell, focused on delivering high-fidelity voice output with control over emotion and speaking style.
https://research.myshell.ai/open-voice

MusicGen is a single-stage autoregressive transformer model from Meta AI, released through the AudioCraft library and designed for high-quality, text-conditioned music generation.
https://github.com/facebookresearch/audiocraft

Distil-Whisper is a distilled version of OpenAI's Whisper model created by Hugging Face. It achieves up to six times faster inference with under half the parameters while maintaining a word error rate close to the original's, making it well suited to real-time transcription.
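A minimal transcription sketch using the Hugging Face `transformers` ASR pipeline, which is the standard way to run these checkpoints. The one-second buffer of silence is a placeholder for real microphone audio, so the transcribed text is not meaningful here; `distil-whisper/distil-small.en` is the smallest English-only checkpoint and downloads on first use.

```python
# Distil-Whisper transcription sketch via the transformers ASR pipeline.
import numpy as np
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-small.en",
)

# One second of silence at 16 kHz stands in for real audio input.
audio = np.zeros(16000, dtype=np.float32)
result = asr({"raw": audio, "sampling_rate": 16000})
print(result["text"])
```

For real use, pass a file path or a waveform loaded at 16 kHz; the pipeline handles resampling and chunking of longer recordings.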
https://huggingface.co/distil-whisper

VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is an advanced end-to-end speech synthesis model developed by researchers at Kakao Enterprise. It combines a variational autoencoder with adversarial training to generate high-quality, natural-sounding speech directly from text.
https://arxiv.org/abs/2106.06103

SpeechT5 is a versatile speech processing model developed by Microsoft, designed to handle speech recognition, speech synthesis, and speech translation tasks within a unified encoder-decoder framework.
https://github.com/microsoft/SpeechT5

wav2vec 2.0 is a self-supervised speech representation learning model developed by Meta AI that significantly reduces the amount of labeled data needed for automatic speech recognition (ASR).
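A minimal CTC inference sketch for a fine-tuned wav2vec 2.0 checkpoint via `transformers` (`facebook/wav2vec2-base-960h`, the base model fine-tuned on 960 hours of LibriSpeech). The silent input is a placeholder, so the decoded transcription is empty or meaningless; the point is the shape of the pipeline: raw 16 kHz audio in, per-frame character logits out, greedy CTC decode.

```python
# wav2vec 2.0 CTC inference sketch with transformers.
import numpy as np
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# One second of silence at 16 kHz as placeholder input.
audio = np.zeros(16000, dtype=np.float32)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits  # (batch, frames, vocab)

# Greedy CTC decoding: argmax per frame, then collapse repeats/blanks.
ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(ids)[0]
print(transcription)
```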
https://github.com/facebookresearch/fairseq/tree/main/examples/wav2vec

SeamlessM4T v2 is Meta AI's advanced multilingual speech and text translation model, designed for translation across nearly 100 languages and serving as the foundation for Meta's real-time streaming translation work.
https://ai.meta.com/research/seamless-communication/