open source

Stable Audio 2.0

Provided by: Stability AI

• Framework: PyTorch

Stable Audio 2.0 is an open-source model developed by Stability AI that generates music and audio from text descriptions. Built with PyTorch and licensed under MIT, it gives creators and developers a tool for music composition and sound design.

Released: March 12, 2024

Last Checked: Jul 20, 2025

Version: 2.0

Capabilities

Text-to-Audio
Music Generation

Performance Benchmarks

FAD: 1.8

Length: 90 seconds
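
FAD (Fréchet Audio Distance) compares the statistics of embeddings from generated audio against those of a reference set; lower is better. The listing does not say which embedding model or reference set produced the figure above, so the following is only a minimal sketch of the underlying distance. It assumes diagonal covariances (a simplification; the full metric uses full covariance matrices and embeddings from a pretrained audio model), and the function names and toy vectors are hypothetical.

```python
import math
from statistics import fmean, pvariance


def gaussian_stats(embeddings):
    """Per-dimension mean and population variance of equal-length vectors."""
    columns = list(zip(*embeddings))
    means = [fmean(col) for col in columns]
    variances = [pvariance(col) for col in columns]
    return means, variances


def frechet_distance_diagonal(reference, generated):
    """Frechet distance between two Gaussians, ASSUMING diagonal covariance.

    With diagonal covariances the general formula
        ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2))
    reduces to a sum over dimensions of squared differences of the
    means plus squared differences of the standard deviations.
    """
    mu_r, var_r = gaussian_stats(reference)
    mu_g, var_g = gaussian_stats(generated)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_g))
    cov_term = sum(
        (math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var_r, var_g)
    )
    return mean_term + cov_term


# Toy 2-dimensional "embeddings" (illustrative values, not real audio features).
reference = [[0.0, 0.0], [2.0, 2.0]]
shifted = [[1.0, 0.0], [3.0, 2.0]]  # same spread, mean moved by 1 in dimension 0

print(frechet_distance_diagonal(reference, reference))
print(frechet_distance_diagonal(reference, shifted))
```

Identical sets give a distance of zero, and the distance grows as the generated embeddings drift from the reference in mean or spread.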

Technical Specifications

Parameter Count: N/A

Training & Dataset

Dataset Used: AudioSparx, FreeSound

Related AI Models

Discover similar AI models that might interest you

Model, open source

FastSpeech 2

Microsoft Research Asia

FastSpeech 2 is an improved neural text-to-speech model from Microsoft that generates natural-sounding speech quickly and efficiently. Built with PyTorch and licensed under MIT, it enhances prosody modeling and robustness, making it suitable for real-time voice assistants, audiobooks, and accessibility tools. The open-source code allows developers to customize and deploy the model easily.

Speech & Audio, audio, text-to-speech

Model, open source

VITS

Kakao Enterprise

VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is a speech synthesis model developed by researchers at Kakao Enterprise. It combines a variational autoencoder with normalizing flows and adversarial training to generate natural-sounding speech directly from text. Built on PyTorch and licensed under MIT, VITS supports end-to-end training and inference, which has made it popular for voice assistants and media applications.

Speech & Audio, audio, text-to-speech

Model, open source

MusicGen

Meta AI

MusicGen is a single-stage autoregressive transformer from Meta AI, distributed through the AudioCraft library. It generates music conditioned on text or a melody, operating on audio tokens produced by the EnCodec tokenizer, and comes in several sizes, including small, medium (1.5B), and large (3.3B). The code is licensed under MIT and the weights under CC-BY-NC-4.0.

Speech & Audio, audio, text-to-music
