FreeAPIHub

The central hub for discovering, testing, and integrating the world's best AI models and APIs.

© 2026 FreeAPIHub. All rights reserved.

Speech & Audio

Explore 2 APIs and 10 AI models.

2 APIs, 10 AI Models, 12 Total

Speech & Audio, Public

Async.ai API

1 endpoint
75 popularity

The Async.ai API offers developers advanced tools for voice cloning and tex...

Authentication

Required

Base URL
https://api.async.ai/v1
real-time-tts, audio-synthesis
Speech & Audio, Public

Google Cloud Speech-to-Text API

1 endpoint
85 popularity

The Google Cloud Speech-to-Text API allows developers to convert spoken aud...

Authentication

Required

Base URL
https://speech.googleapis.com/v1
speech-recognition, audio-transcription, voice-commands
Speech & Audio
myshell.ai

OpenVoice

Open Source, PyTorch

OpenVoice V2 is a cutting-edge open-source voice cloning and speech synthesis model focused on delivering high-fidelity voice outputs with emotional and stylistic flexibility.

Views
590
Favorites
0
Released
2023
Official URL
https://research.myshell.ai/open-voice
voice cloning
Speech & Audio
Meta AI

MusicGen

Open Source, PyTorch

MusicGen is a single-stage autoregressive transformer model from Meta AI, distributed through the AudioCraft library and designed for high-quality music generation.

Views
1.0K
Favorites
0
Released
2023
Official URL
https://github.com/facebookresearch/audiocraft
audio, text-to-music
Speech & Audio
Hugging Face

Distil-Whisper

Open Source, PyTorch

Distil-Whisper is a distilled version of OpenAI's Whisper model created by Hugging Face. It achieves up to six times faster inference while using under half the parameters and maintaining a low word error rate, making it well suited to real-time transcription.

Views
152
Favorites
0
Released
2023
Official URL
https://huggingface.co/distil-whisper
asr, speech-recognition
Speech & Audio
Stability AI

Stable Audio 2.0

Open Source, PyTorch

Stable Audio 2.0 is an advanced open-source AI model developed by Stability AI for generating music and audio from textual descriptions.

Views
223
Favorites
0
Released
2024
Official URL
https://stability.ai/stable-audio
audio, music
Speech & Audio
Kakao Enterprise

VITS

Open Source, PyTorch

VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is a speech synthesis model from researchers at Kakao Enterprise and KAIST. It combines a conditional variational autoencoder with normalizing flows and adversarial training to generate natural-sounding speech directly from text.

Views
222
Favorites
0
Released
2021
Official URL
https://arxiv.org/abs/2106.06103
audio, text-to-speech
Speech & Audio
Microsoft

FastSpeech 2

Open Source, PyTorch

FastSpeech 2 is an improved neural text-to-speech model from Microsoft that generates natural-sounding speech quickly and efficiently.

Views
238
Favorites
0
Released
2020
Official URL
https://arxiv.org/abs/2006.04558
audio, text-to-speech
Speech & Audio
Microsoft

SpeechT5

Open Source, PyTorch

SpeechT5 is a versatile speech processing model developed by Microsoft, designed to handle speech recognition, speech synthesis, and speech translation tasks within a unified framework.

Views
169
Favorites
0
Released
2022
Official URL
https://github.com/microsoft/SpeechT5
asr, speech-recognition
Speech & Audio
Meta AI

wav2vec 2.0

Open Source, PyTorch

wav2vec 2.0 is a self-supervised speech representation learning model developed by Meta AI. It significantly reduces the amount of labeled data needed to train automatic speech recognition (ASR) systems.

Views
197
Favorites
0
Released
2020
Official URL
https://github.com/facebookresearch/fairseq/tree/main/examples/wav2vec
speech-recognition
Speech & Audio
Mozilla

DeepSpeech

Open Source, TensorFlow

DeepSpeech is an open-source automatic speech recognition (ASR) model developed by Mozilla.

Views
157
Favorites
0
Released
2017
Official URL
https://github.com/mozilla/DeepSpeech
speech-recognition, voice
Speech & Audio
Meta AI

SeamlessM4T v2

Open Source, PyTorch

SeamlessM4T v2 is Meta AI's advanced multilingual speech and text translation model, designed for real-time translation across over 100 languages.

Views
412
Favorites
0
Released
2023
Official URL
https://ai.meta.com/research/seamless-communication/
translationspeechai-models