🎙️

Speech & Audio

Unlock the power of speech and audio with free APIs and AI models at Free API Hub. Build voice-controlled apps, transcribe audio, and analyze sound with cutting-edge technology designed for easy integration and reliable performance.

3 APIs · 10 AI Models · 13 Total

13 resources

Speech & Audio · Public

AssemblyAI

1 endpoint

75 popularity

AssemblyAI offers developers a powerful API for transcribing audio and vide...

Authentication

Required

Base URL

https://api.assemblyai.com/v2

speaker-diarization, assemblyai, real-time, +4
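As a rough illustration of how the AssemblyAI base URL and required authentication fit together, the sketch below builds a transcription request without sending it. It assumes the `/transcript` path, the `audio_url` and `speaker_labels` fields, and an API key passed in the `Authorization` header; none of these appear in the listing, so check them against the provider's documentation. The key and audio URL are placeholders, and `build_transcript_request` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "https://api.assemblyai.com/v2"  # base URL from the listing


def build_transcript_request(api_key: str, audio_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for a transcript."""
    payload = {
        "audio_url": audio_url,  # assumed field: a publicly reachable audio file
        "speaker_labels": True,  # assumed field: enables speaker diarization
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/transcript",  # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": api_key,  # assumed auth scheme: raw API key
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; nothing is sent over the network here.
request = build_transcript_request("YOUR_API_KEY", "https://example.com/audio.mp3")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen(request)` with a real key would submit the job. Building the request separately keeps the example inspectable offline.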

Speech & Audio · Public

Async.ai TTS API

1 endpoint

75 popularity

The Async.ai TTS API offers developers free access to a robust text-to-spee...

Authentication

Required

Base URL

https://api.async.ai/v1

async-ai, TTS, Multilingual, +4

Speech & Audio · Public

Google Cloud Speech-to-Text API

1 endpoint

85 popularity

The Google Cloud Speech-to-Text API provides developers with free audio tra...

Authentication

Required

Base URL

https://speech.googleapis.com/v1

real-time, Transcription, audio, +4

Speech & Audio

MyShell.ai

OpenVoice

Open Source · PyTorch

OpenVoice by MyShell.ai is a free open-source voice-cloning AI that clones any voice from a short audio sample. Multilingual, controllable emotion/accent, MIT license. Best free ElevenLabs alternative for self-hosting.

Views

187

Released

2023

Official URL

https://research.myshell.ai/open-voice

Speech Synthesis, Openvoice, Myshell, +3

Speech & Audio

Meta AI

MusicGen

Open Source · PyTorch

MusicGen by Meta AI is a free open-source AI music generator that creates original music from text or melody prompts. Generate background music, soundtracks, and beats with no signup, running locally. The code is MIT-licensed; the pretrained model weights are released under CC-BY-NC 4.0, which restricts commercial use.

Released

2023

Official URL

https://github.com/facebookresearch/audiocraft

Musicgen, Music AI, Audio Generation, +3

Speech & Audio

Hugging Face

Distil-Whisper

Open Source · PyTorch

Distil-Whisper is a free open-source speech-to-text AI by Hugging Face: 6x faster and 49% smaller than Whisper, while staying within 1% word error rate (WER) of it on out-of-distribution test sets. MIT license, runs on CPU, well suited to transcription, podcasts, and subtitles.

Released

2023

Official URL

https://huggingface.co/distil-whisper

Audio AI, Whisper, Huggingface, +3

Speech & Audio

Stability AI

Stable Audio 2.0

Freemium · PyTorch

Stable Audio 2.0 by Stability AI is a free AI music and sound generator that creates full 3-minute tracks from text prompts. Audio-to-audio transformations, structured musical arrangements. Best free music AI for content creators.

Released

2024

Official URL

https://stability.ai/news/stable-audio-2-0

Sound Effects, Stable Audio, Music AI, +3

Speech & Audio

Kakao Enterprise

VITS

Open Source · PyTorch

VITS is a free open-source end-to-end text-to-speech AI that produces natural human-like voice from text in one step. MIT license, fast inference, supports multiple languages and voice cloning. Foundation of modern open TTS.

Released

2021

Official URL

https://github.com/jaywalnut310/vits

TTS, VITS, Speech Synthesis, +3

Speech & Audio

Microsoft Research

FastSpeech 2

Open Source · PyTorch

FastSpeech 2 by Microsoft is a free open-source non-autoregressive text-to-speech AI that trains about 3x faster than the original FastSpeech. MIT license, supports pitch/duration/energy control. Well suited to real-time TTS in production apps.

Released

2020

Official URL

https://speechresearch.github.io/fastspeech2/

Fastspeech, TTS, Microsoft Research, +3

Speech & Audio

Microsoft Research

SpeechT5

Open Source · PyTorch

SpeechT5 by Microsoft is a free open-source unified speech model that handles TTS, ASR, voice conversion, and speech-to-text translation in one architecture. MIT license, perfect for multi-task speech AI applications.

Released

2022

Official URL

https://www.microsoft.com/en-us/research/publication/speecht5-unified-modal-encoder-decoder-pre-training-for-spoken-language-processing/

Multi Task AI, Speecht5, Unified Model, +3

Speech & Audio

Meta AI

wav2vec 2.0

Open Source · PyTorch

wav2vec 2.0 by Meta AI is the foundational self-supervised speech recognition model. Free, open-source, MIT license. Powers free transcription, voice command systems, and supports 100+ languages with minimal training data.

Released

2020

Official URL

https://ai.meta.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/

Self Supervised, Wav2vec, Meta AI, +3

Speech & Audio

Mozilla

DeepSpeech

Open Source · TensorFlow

DeepSpeech is Mozilla's free open-source speech-to-text engine based on Baidu's Deep Speech research. Runs offline on Raspberry Pi and mobile devices. Licensed under the Mozilla Public License 2.0 (MPL 2.0), well suited to privacy-first voice apps and embedded systems.

Released

2017

Official URL

https://deepspeech.readthedocs.io/

Offline AI, Embedded AI, Deepspeech, +3