wav2vec 2.0 is a self-supervised speech representation learning model developed by Meta AI (then Facebook AI). By learning from unlabeled audio, it significantly reduces the amount of transcribed speech needed for automatic speech recognition (ASR), enabling speech-to-text systems with strong performance across multiple languages and domains.
Technical Overview
wav2vec 2.0 uses self-supervised learning to pretrain on raw audio without requiring large labeled speech corpora. During pretraining, the model learns contextualized speech representations by predicting quantized latent targets at masked timesteps; these representations can then be fine-tuned with a small amount of labeled data (as little as ten minutes in the original experiments) to reach strong ASR performance. The approach reduces reliance on costly and time-consuming manual transcription, making speech recognition more accessible and scalable.
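The pretraining objective can be illustrated with a minimal NumPy sketch of an InfoNCE-style contrastive loss: the context vector at a masked timestep must identify the true quantized latent among distractors drawn from other timesteps. The vector dimension, distractor count, and random data below are illustrative, not the paper's exact settings.

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def contrastive_loss(context, positive, distractors, kappa=0.1):
    """InfoNCE-style loss: score the true quantized latent (index 0)
    against distractors using temperature-scaled cosine similarity."""
    candidates = [positive] + list(distractors)
    sims = np.array([cosine(context, q) / kappa for q in candidates])
    sims -= sims.max()                        # numerical stability
    probs = np.exp(sims) / np.exp(sims).sum()
    return -np.log(probs[0])                  # positive sits at index 0

rng = np.random.default_rng(0)
dim = 8
target = rng.normal(size=dim)
distractors = rng.normal(size=(5, dim))

# A context vector that matches its target yields a much lower loss
# than a random (mismatched) context vector.
aligned = contrastive_loss(target, target, distractors)
mismatched = contrastive_loss(rng.normal(size=dim), target, distractors)
print(f"aligned loss {aligned:.3f}  vs  mismatched {mismatched:.3f}")
```

Minimizing this loss pushes contextual representations toward the true latent for each masked frame, which is what lets the model learn from audio alone.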
Framework & Architecture
- Framework: PyTorch
- Architecture: Convolutional feature encoder followed by a Transformer context network
- Parameters: Vary by model size; the Base model contains roughly 95 million parameters, the Large model roughly 317 million
- Version: 1.0
The architecture combines convolutional feature encoders with transformer layers to capture both local and global speech characteristics. This hybrid design allows effective feature extraction from raw audio waveforms and contextual understanding at multiple temporal scales.
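The feature encoder's downsampling can be verified with simple arithmetic. The kernel sizes and strides below follow the configuration reported in the wav2vec 2.0 paper: seven 1-D convolution layers with a total stride of 320 samples, so one latent frame covers about 20 ms of 16 kHz audio.

```python
# wav2vec 2.0 convolutional feature encoder configuration (from the paper).
KERNELS = (10, 3, 3, 3, 3, 2, 2)
STRIDES = (5, 2, 2, 2, 2, 2, 2)

def encoder_frames(num_samples: int) -> int:
    """Number of latent frames the encoder produces for a raw waveform."""
    length = num_samples
    for k, s in zip(KERNELS, STRIDES):
        length = (length - k) // s + 1   # valid (unpadded) 1-D convolution
    return length

total_stride = 1
for s in STRIDES:
    total_stride *= s

print(total_stride)             # 320 samples per frame (~20 ms at 16 kHz)
print(encoder_frames(16_000))   # one second of 16 kHz audio -> 49 frames
```

These latent frames are what the Transformer layers then contextualize across the whole utterance.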
Key Features / Capabilities
- Self-supervised pretraining on raw audio enabling reduced labeled data requirements
- State-of-the-art ASR accuracy at release, especially in low-label regimes
- Supports multi-lingual transcription and real-time applications
- Effective voice-controlled app integrations and speech-to-text systems
- Open-source with MIT license for broad commercial and research use
- Flexible fine-tuning for domain-specific speech recognition tasks
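Fine-tuning for ASR adds a linear output layer trained with a CTC loss, and transcripts are recovered by collapsing the per-frame predictions. A minimal sketch of greedy CTC decoding, using a toy vocabulary and hypothetical token ids:

```python
def ctc_greedy_decode(frame_ids, blank=0, id_to_char=None):
    """Collapse a per-frame argmax sequence the way CTC decoding does:
    merge consecutive repeats, then drop blank tokens."""
    out = []
    prev = None
    for i in frame_ids:
        if i != prev and i != blank:
            out.append(i)
        prev = i
    if id_to_char:
        return "".join(id_to_char[i] for i in out)
    return out

# Toy example: 9 frames of per-frame predictions over a 3-letter vocabulary.
vocab = {1: "c", 2: "a", 3: "t"}
frames = [1, 1, 0, 2, 2, 2, 0, 0, 3]
print(ctc_greedy_decode(frames, blank=0, id_to_char=vocab))  # "cat"
```

Production decoders typically replace this greedy step with beam search and a language model, but the collapse rule is the same.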
Use Cases
- Multilingual transcription across diverse languages and dialects
- Voice-controlled applications improving user interaction and accessibility
- Speech-to-text systems for dictation, captioning, and voice assistants
- Real-time translation enabling cross-language communication
Access & Licensing
wav2vec 2.0 is released as open source under the permissive MIT license, allowing free use and modification. Source code and pretrained models are available on GitHub in the official fairseq repository, which hosts the wav2vec 2.0 implementation. This open access supports transparency, reproducibility, and an active community ecosystem driving ongoing innovation in speech recognition.