
Stable Audio 2.0

Provided by: Stability AI
Framework: PyTorch

Stable Audio 2.0 is an open-source AI model developed by Stability AI that generates music and audio from text descriptions. Built with PyTorch and licensed under MIT, it gives creators and developers a tool for producing high-fidelity audio for music composition and sound design.

Model Performance Statistics

  • Released: March 12, 2024
  • Last checked: Jul 20, 2025
  • Version: 2.0

Capabilities
  • Text-to-Audio
  • Music Generation
Performance Benchmarks
  • FAD (Fréchet Audio Distance): 1.8
  • Length: 90 seconds
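FAD compares embeddings of generated audio against embeddings of a reference set of real audio. It fits a Gaussian to each set and reports the Fréchet distance between them, FAD = ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}), so lower is better. The listing does not say which embedding network or reference set produced the 1.8 figure (the original FAD proposal used VGGish embeddings), so scores are only comparable under the same setup. The sketch below assumes you already have embedding matrices of shape (n_clips, dim) and only illustrates the distance itself. The function names are hypothetical and do not belong to any specific FAD toolkit.

```python
import numpy as np


def _sqrtm_psd(mat: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # clamp tiny negative eigenvalues caused by round-off
    return (vecs * np.sqrt(vals)) @ vecs.T  # V diag(sqrt(vals)) V^T


def frechet_distance(mu_r, sigma_r, mu_g, sigma_g) -> float:
    """Fréchet distance between Gaussians N(mu_r, sigma_r) and N(mu_g, sigma_g)."""
    mu_r = np.atleast_1d(np.asarray(mu_r, dtype=float))
    mu_g = np.atleast_1d(np.asarray(mu_g, dtype=float))
    sigma_r = np.atleast_2d(np.asarray(sigma_r, dtype=float))
    sigma_g = np.atleast_2d(np.asarray(sigma_g, dtype=float))
    diff = mu_r - mu_g
    # Tr((sigma_r @ sigma_g)^{1/2}) equals Tr((A @ sigma_g @ A)^{1/2}) with A = sigma_r^{1/2};
    # the second matrix is symmetric PSD, so the eigh-based square root applies.
    root_r = _sqrtm_psd(sigma_r)
    cross = _sqrtm_psd(root_r @ sigma_g @ root_r)
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * np.trace(cross))


def fad_from_embeddings(emb_ref, emb_gen) -> float:
    """Hypothetical helper: FAD between reference and generated embeddings, each (n_clips, dim)."""
    emb_ref = np.asarray(emb_ref, dtype=float)
    emb_gen = np.asarray(emb_gen, dtype=float)
    return frechet_distance(
        emb_ref.mean(axis=0), np.cov(emb_ref, rowvar=False),
        emb_gen.mean(axis=0), np.cov(emb_gen, rowvar=False),
    )


# Usage: identical covariances, means 5 units apart -> only the mean term remains.
print(frechet_distance([0.0, 0.0], np.eye(2), [3.0, 4.0], np.eye(2)))  # ~25.0
```

The matrix square root is computed with a symmetric eigendecomposition instead of a general `sqrtm`, which keeps the sketch NumPy-only; libraries often use `scipy.linalg.sqrtm` on Σ_r Σ_g directly.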
Technical Specifications
  • Parameter count: N/A
Training & Dataset
  • Dataset used: AudioSparx, Freesound

Related AI Models



FastSpeech 2

Microsoft Research Asia

FastSpeech 2 is a non-autoregressive neural text-to-speech model from Microsoft that generates natural-sounding speech quickly and efficiently. Built with PyTorch and licensed under MIT, it improves on the original FastSpeech by predicting duration, pitch, and energy, which strengthens prosody modeling and robustness and makes it suitable for real-time voice assistants, audiobooks, and accessibility tools. The open-source code lets developers customize and deploy the model easily.

Category: Speech & Audio; tags: audio, text-to-speech

VITS

Kakao Enterprise

VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is a speech synthesis model developed by Kakao Enterprise. It combines a conditional variational autoencoder with adversarial (GAN) training to generate high-quality, natural-sounding speech directly from text. Built on PyTorch and licensed under MIT, VITS supports single-stage, end-to-end training and fast inference, which has made it popular for voice assistants and media applications.

Category: Speech & Audio; tags: audio, text-to-speech

MusicGen

Meta AI

MusicGen is a single-stage autoregressive transformer model from Meta AI, distributed through the AudioCraft library. It generates high-quality music conditioned on text or a reference melody, modeling audio as discrete tokens produced by the EnCodec tokenizer, and comes in several sizes, including small, medium (1.5B parameters), and large (3.3B parameters). The code is licensed under MIT and the weights under CC-BY-NC-4.0; the model enables controllable, high-fidelity music synthesis across genres.

Category: Speech & Audio; tags: audio, text-to-music