Category
🎨

Generative Models

Foundational generative AI architectures: LLMs for text and code, diffusion models for image and video, VAEs for latent manipulation, and multimodal generators that produce text, images, audio, or video from a prompt.

2 APIs · 3 AI Models
Most Popular In
Large Language Models, Diffusion Models, Flow Matching Models
Auth Breakdown
API Key: 100%
Notable Developers
OpenAI, Anthropic, Black Forest Labs (FLUX), Google DeepMind, Meta AI
Updated Jun 12, 2026
Curated by FreeAPIHub editors
Topics: Large Language Models, Diffusion Models, Flow Matching Models, VAEs, Multimodal Generative, Video Generation Models

Tavus AI Video API

🔥 Hot
Generative Models

The Tavus AI Video API creates personalized and conversational video using digital replicas - AI versions of a real person's face and voice - that can say any script or talk live.

Freemium · API Key

Runway AI API

🔥 Hot
Generative Models

Runway's API brings its generative video models to your app. Submit a text prompt or an image and Runway returns AI-generated or extended video, handled as asynchronous jobs you poll, billed by credits.

Freemium · API Key
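
The "asynchronous jobs you poll" pattern described above can be sketched generically. This is a minimal illustration, not Runway's actual client: `poll_job`, the `fetch_status` callable, and the `SUCCEEDED`/`FAILED` status names are all hypothetical stand-ins, and a real integration would take its endpoint, status values, and rate limits from the provider's documentation.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical terminal states; a real provider defines its own status names.
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def poll_job(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Call fetch_status repeatedly until the job reaches a terminal state.

    fetch_status is assumed to wrap whatever HTTP call returns the job record
    as a dict with a "status" key. sleep and clock are injectable for testing.
    """
    deadline = clock() + timeout_s
    while True:
        job = fetch_status()
        if job["status"] in TERMINAL_STATES:
            return job
        if clock() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval_s)


# Stand-in for a real status request: yields three states in order.
_states = iter(["PENDING", "RUNNING", "SUCCEEDED"])


def fake_fetch() -> Dict[str, Any]:
    return {"status": next(_states)}


result = poll_job(fake_fetch, sleep=lambda _seconds: None)
print(result["status"])  # prints SUCCEEDED
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting; in production the defaults apply and only `fetch_status` needs to be supplied.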

AnimateDiff

🔥 Hot
by Shanghai AI Lab & CUHK MMLab

AnimateDiff is an open technique that turns text-to-image diffusion models into animators. By plugging a trained motion module into Stable Diffusion, it generates short animated clips from prompts while keeping the model's style and custom fine-tunes.

Apache 2.0 · ~1.7B (with SD 1.5 b

Stable Video Diffusion

🔥 Hot
by Stability AI

Stable Video Diffusion (SVD) is Stability AI's open image-to-video model. Built on Stable Diffusion, it animates a still image into a short, coherent video clip, bringing diffusion-based video generation to consumer hardware.

Stability AI Community License · ~1.5B (with image en

VideoGPT

🔥 Hot
by UC Berkeley (Wilson Yan et al.)

VideoGPT is an open research model for video generation that combines a VQ-VAE with a transformer. It learns to generate short video clips by modelling sequences of discrete spatiotemporal tokens, an influential early approach to AI video.

MIT · ~100M
Showing 5 of 5 resources

At a glance

Compare the top Generative Models APIs

API | Access | Auth | Formats
Tavus AI Video API | Freemium | API Key | REST, JSON
Runway AI API | Freemium | API Key | REST, JSON

About this category

Generative Models — developer guide

What Are Generative Models?

Generative models are the engine behind most AI-powered creative and productivity tools. Rather than classifying or analysing existing data, they synthesise new content (text, code, images, audio, video, and 3D objects) in response to a prompt. This category covers the foundational architectures behind the more specific tool and API categories elsewhere on this site, including LLMs, diffusion models, VAEs, and GANs. Understanding the architecture helps you choose the right base model for your use case, whether you're fine-tuning, building on top of a hosted API, or deploying open weights on your own infrastructure.

Key Generative Model Architectures

  • Large Language Models (LLMs): transformer-based autoregressive models for text, code, reasoning, and chat
  • Diffusion Models: iterative denoising models for photorealistic image, video, and audio generation
  • Variational Autoencoders (VAEs): latent space encoding for controllable, interpolatable generation
  • Masked Autoencoders (MAE): self-supervised pre-training backbone for vision and multimodal models
  • Flow Matching Models: the training formulation behind FLUX.1 (a rectified-flow transformer), which typically needs fewer sampling steps than DDPM-style diffusion
  • Autoregressive Image Models: token-based image generation (LlamaGen, Chameleon) using LLM-style decoder-only transformers

Leading Providers in 2025–2026

OpenAI's GPT-4o and o3 are among the leading commercial LLMs, and Meta's Llama 3.3 is a leading open-weight text model. FLUX.1 (Black Forest Labs) and Stable Diffusion 3 Medium are the open-weight image generation frontrunners. Sora (OpenAI) and Veo 3 (Google) represent the current state of the art in video generation. Gemini 2.5 Pro is a leading natively multimodal model, processing text, image, audio, and video in one unified architecture.