Explore 2 APIs and 10 AI models.
OpenVoice V2 is a cutting-edge open-source voice cloning and speech synthesis model from MyShell, focused on delivering high-fidelity voice output with control over emotion and speaking style.
https://research.myshell.ai/open-voice

MusicGen is a single-stage autoregressive transformer model from Meta AI, released through the AudioCraft library and designed for high-quality, text-conditioned music generation.
https://github.com/facebookresearch/audiocraft

Distil-Whisper is a distilled version of OpenAI's Whisper model created by Hugging Face. It achieves up to six times faster inference with under half the parameters while maintaining a word error rate close to the original's, making it well suited to real-time transcription.
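A minimal transcription sketch using the Hugging Face `transformers` ASR pipeline, which is the standard way to run these checkpoints. The one-second buffer of silence is a placeholder for real microphone audio, so the transcribed text is not meaningful here; `distil-whisper/distil-small.en` is the smallest English-only checkpoint and downloads on first use.

```python
# Distil-Whisper transcription sketch via the transformers ASR pipeline.
import numpy as np
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-small.en",
)

# One second of silence at 16 kHz stands in for real audio input.
audio = np.zeros(16000, dtype=np.float32)
result = asr({"raw": audio, "sampling_rate": 16000})
print(result["text"])
```

For real use, pass a file path or a waveform loaded at 16 kHz; the pipeline handles resampling and chunking of longer recordings.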
https://huggingface.co/distil-whisper

VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is an advanced end-to-end speech synthesis model developed by researchers at Kakao Enterprise. It combines a variational autoencoder with adversarial training to generate high-quality, natural-sounding speech directly from text.
https://arxiv.org/abs/2106.06103

SpeechT5 is a versatile speech processing model developed by Microsoft, designed to handle speech recognition, speech synthesis, and speech translation tasks within a unified encoder-decoder framework.
https://github.com/microsoft/SpeechT5

wav2vec 2.0 is a self-supervised speech representation learning model developed by Meta AI that significantly reduces the amount of labeled data needed for automatic speech recognition (ASR).
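A minimal CTC inference sketch for a fine-tuned wav2vec 2.0 checkpoint via `transformers` (`facebook/wav2vec2-base-960h`, the base model fine-tuned on 960 hours of LibriSpeech). The silent input is a placeholder, so the decoded transcription is empty or meaningless; the point is the shape of the pipeline: raw 16 kHz audio in, per-frame character logits out, greedy CTC decode.

```python
# wav2vec 2.0 CTC inference sketch with transformers.
import numpy as np
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# One second of silence at 16 kHz as placeholder input.
audio = np.zeros(16000, dtype=np.float32)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(inputs.input_values).logits  # (batch, frames, vocab)

# Greedy CTC decoding: argmax per frame, then collapse repeats/blanks.
ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(ids)[0]
print(transcription)
```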
https://github.com/facebookresearch/fairseq/tree/main/examples/wav2vec

SeamlessM4T v2 is Meta AI's advanced multilingual speech and text translation model, designed for translation across nearly 100 languages and serving as the foundation for Meta's real-time streaming translation work.
https://ai.meta.com/research/seamless-communication/