FreeAPIHub
The central hub for discovering, testing, and integrating the world's best AI models and APIs.

© 2026 FreeAPIHub. All rights reserved.

open source · video

VideoGPT

Free foundational video generation AI — pioneer of autoregressive video models

Developed by UC Berkeley (Wilson Yan et al.)

Try Model
Params: ~100M
API: Yes
Stability: stable
Version: VideoGPT (original)
License: MIT
Framework: PyTorch
Runs Local: Yes

Playground

Implementation Example

Example Prompt

user input
Generate a 16-frame video sample from the UCF-101 trained VideoGPT model.

Model Output

model response
Returns a short video sample (16 frames at 64x64 resolution) showing one of UCF-101's action categories. Quality is research-grade — significantly lower than modern Stable Video Diffusion outputs but useful for understanding the architectural foundations.

Examples

Real-World Applications

  • Academic baselines
  • Video generation tutorials
  • Learning autoregressive video modeling
  • Studying the foundations of modern video AI

Docs

Model Intelligence & Architecture

What is VideoGPT?

VideoGPT is a generative model for video synthesis released in April 2021 by researchers at UC Berkeley (Wilson Yan et al.). It combines a VQ-VAE with a GPT-style transformer: the VQ-VAE first compresses video frames into a grid of discrete tokens, and the transformer then models the token sequence autoregressively to generate new videos.
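The two stages above can be sketched concretely. Below is a minimal NumPy illustration of the VQ-VAE tokenization step, where continuous encoder features are snapped to their nearest codebook entries; the codebook size, feature dimension, and shapes here are illustrative, not the model's actual configuration:

```python
import numpy as np

def quantize(features, codebook):
    """Map continuous feature vectors to nearest-codebook token ids.

    features: (n, d) array of encoder outputs for one clip's latents.
    codebook: (k, d) array of learned embedding vectors.
    Returns (n,) integer token ids -- the discrete sequence that the
    GPT-style transformer is then trained to predict autoregressively.
    """
    # Squared Euclidean distance from every feature to every code.
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)

rng = np.random.default_rng(0)
codebook = rng.normal(size=(1024, 64))  # illustrative: 1024 codes, dim 64
features = rng.normal(size=(16, 64))    # 16 latent positions from the encoder
tokens = quantize(features, codebook)
print(tokens.shape)  # → (16,)
```

In the real model the encoder is a 3D convolutional network and the tokens form a spatio-temporal grid, but the principle is the same: once video is discrete, generation reduces to next-token prediction, exactly as GPT does for text.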

Released under the MIT license, it's free for any commercial use, though it's primarily used as a research baseline.

Why VideoGPT Is Still Relevant in 2026

While modern video AI like Sora, Runway Gen-4, Stable Video Diffusion, and CogVideoX have far surpassed VideoGPT in quality, it remains historically significant as one of the first open transformer-based video generators. The architectural concepts it pioneered influenced today's autoregressive video models.

Key Features and Capabilities

VideoGPT supports unconditional video generation, action-conditioned generation, frame interpolation, and short clip synthesis (typically 16 frames).
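Clip synthesis is plain autoregressive sampling over those discrete tokens. The toy sampler below shows the loop structure; the uniform `logits_fn` is a stand-in for the trained transformer, not the model's actual API:

```python
import numpy as np

def sample_tokens(logits_fn, seq_len, vocab_size, rng):
    """Autoregressively sample a token sequence, one token at a time,
    conditioning each step on everything sampled so far."""
    tokens = []
    for _ in range(seq_len):
        logits = logits_fn(tokens)           # transformer forward pass (stubbed)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                 # softmax over the codebook
        tokens.append(int(rng.choice(vocab_size, p=probs)))
    return tokens

rng = np.random.default_rng(1)
vocab = 1024
uniform = lambda ctx: np.zeros(vocab)        # stand-in logits (illustrative only)
clip_tokens = sample_tokens(uniform, seq_len=32, vocab_size=vocab, rng=rng)
print(len(clip_tokens))  # → 32
```

The sampled token grid is then decoded back to pixels by the VQ-VAE decoder, yielding the short clip.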

Who Should Use VideoGPT?

VideoGPT is built for computer vision researchers, students learning video AI, and academics studying autoregressive video generation history.

Top Use Cases

Real-world applications are mostly research-focused: academic baselines, video generation tutorials, learning autoregressive video modeling, and studying the foundations of modern video AI.

Where Can You Run It?

VideoGPT runs on any system with PyTorch and CUDA. Pre-trained checkpoints are available for BAIR Robot Pushing and UCF-101 datasets.

How to Use VideoGPT (Quick Start)

Clone the repository with git clone https://github.com/wilson1yan/VideoGPT and install its requirements. Then either train your own VQ-VAE and transformer, or download the pre-trained BAIR/UCF-101 checkpoints and generate samples with the repository's included scripts.
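One practical detail worth understanding before sampling: the sequence the transformer models is the flattened latent grid, so its length is set by the VQ-VAE's downsampling. The arithmetic below is illustrative; the 4× per-axis factors are an assumption for this sketch, not the checkpoints' exact configuration:

```python
# Token-count arithmetic for a BAIR-style 16-frame, 64x64 clip.
frames, height, width = 16, 64, 64
dt, dh, dw = 4, 4, 4                          # assumed VQ-VAE downsampling factors
latent = (frames // dt, height // dh, width // dw)
seq_len = latent[0] * latent[1] * latent[2]   # tokens the transformer must predict
print(latent, seq_len)  # → (4, 16, 16) 1024
```

Longer or higher-resolution clips grow this sequence quickly, which is one reason the original model is limited to short, low-resolution outputs.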

When Should You Choose VideoGPT?

Choose VideoGPT only for research baselines or learning purposes. For any production video generation, use Stable Video Diffusion, AnimateDiff, CogVideoX, or hosted services like Runway and Sora.

Pricing

VideoGPT is completely free under MIT license.

Pros and Cons

Pros: ✔ MIT license ✔ Foundational architecture ✔ Pioneered VQ-VAE + transformer for video ✔ Research-grade flexibility ✔ Influenced modern video AI

Cons: ✘ Quality dramatically surpassed by modern models ✘ Limited use beyond research ✘ Short clips only ✘ Resource-intensive training

Final Verdict

VideoGPT is a foundational research model from the early days of video AI — interesting for students and researchers but not for production in 2026. Discover modern video AI at FreeAPIHub.com.

Evaluation

Advantages & Limitations

Advantages
  • ✓ MIT license
  • ✓ Foundational architecture
  • ✓ Pioneered VQ-VAE + transformer for video
  • ✓ Research-grade flexibility
  • ✓ Influenced modern video AI
Limitations
  • ✗ Quality far below modern models
  • ✗ Limited use beyond research
  • ✗ Short clips only
  • ✗ Resource-intensive training

Important Notice

Verify Before You Decide

Last verified · Apr 29, 2026

The details on this page — including pricing, features, and availability — are based on our last review and may not reflect the provider's current offering. Providers update their products frequently, sometimes without prior notice.

What may have changed

Pricing Plans
Features & Limits
Availability
Terms & Policies

Always visit the official provider website to confirm the latest pricing, terms, and feature availability before subscribing or integrating.

Check official site

External Resources

Try the Model · Official Website · Source Code

Technical Details

Architecture
VQ-VAE + GPT-style Transformer
Stability
stable
Framework
PyTorch
License
MIT
Release Date
2021-04-20
Signup Required
No
API Available
Yes
Runs Locally
Yes

Rate Limits

No limits (self-hosted)

Pricing

Completely free under MIT license

Best For

Researchers and students studying foundational video generation AI

Alternative To

Stable Video Diffusion, CogVideoX (modern alternatives)

Compare With

videogpt vs sora · videogpt vs cogvideox · videogpt vs stable video · video generation research baseline · vq-vae video

Tags

#VQ-VAE #VideoGPT #UC Berkeley #Research AI #Open Source AI #video-generation

You Might Also Like

More AI Models Similar to VideoGPT

Stable Video Diffusion

Stable Video Diffusion (SVD) by Stability AI is a free open-source AI that turns any image into a 2-4 second video clip. Image-to-video, text-to-video, runs locally on a single GPU. Best free Runway Gen-2 alternative.

freemiumvideo

AnimateDiff

AnimateDiff is a free open-source AI that turns Stable Diffusion image models into video generators. Create animated GIFs and short clips from text prompts using your favorite SD checkpoints. MIT license, runs locally.

open sourcevideo

xLSTM 1.5B

xLSTM 1.5B by NXAI is a free open-source language model based on the modern xLSTM architecture — an evolution of LSTM that competes with transformers. Apache 2.0, efficient inference, breakthrough alternative architecture.

open sourcellm