Yes, it is Apache 2.0. The models you serve, however, carry their own licences.

A multi-turn benchmark FastChat uses to evaluate chatbot quality, alongside the human-preference Chatbot Arena.

FA

Open SourceServing Frameworkby LMSYS (UC Berkeley)

FastChat

FastChat is LMSYS's open platform for training, serving and evaluating large language model chatbots. It powers Vicuna and the Chatbot Arena, and exposes an OpenAI-compatible API server for local models.

chatbot-arenafastchatllmlmsysmodel-servingopen-source-ai

View on GitHub

Quick facts

LicenseApache 2.0

TypeServing Framework

APIOpenAI-Compatible

ByLMSYS

No ratings yet — be the first

Type

Serving + eval

framework

API

OpenAI-compatible

drop-in

License

Apache 2.0

open source

LMSYS

UC Berkeley

What is FastChat?

FastChat is an open platform from LMSYS (UC Berkeley and collaborators) for training, serving and evaluating large language model chatbots. Rather than a single model, it is the toolkit that made the open-chatbot wave practical: it trained and released Vicuna, runs the well-known Chatbot Arena for human preference evaluation, and provides a production-style serving stack — including an OpenAI-compatible API server — so you can run open models behind the same interface apps already use. It is released under Apache 2.0.

What it provides

FastChat bundles three things developers reach for. A serving system with a controller, model workers and a web/REST gateway lets you host one or many models and scale across GPUs. An OpenAI-compatible endpoint means existing OpenAI SDK code can point at your local model with almost no changes. And it includes training and evaluation utilities — the recipes used to fine-tune Vicuna, plus tools like MT-Bench and the Arena methodology for judging chatbot quality.

What it is good at

Its sweet spot is running open chat models yourself with a familiar API. Teams use it to self-host models like Vicuna, Llama-derived chatbots and others behind a drop-in OpenAI interface, to compare models with MT-Bench and Arena-style evaluation, and as a reference for chatbot fine-tuning. Because it is widely adopted, it is well documented and integrates with the broader open-LLM ecosystem.

Licensing & access

FastChat is open source under Apache 2.0, installed from PyPI or GitHub and run on your own hardware. Note that while FastChat itself is permissive, the models you serve carry their own licences (for example Llama-based weights) that you must respect. It supports a range of GPUs and can serve quantised models, so you can match it to anything from a single consumer card to a multi-GPU server.

Practical considerations

FastChat is a serving and research framework, not a turnkey product — you provide the models, the hardware and the operational care. For maximum raw inference throughput, dedicated engines like vLLM, TensorRT-LLM or MLC-LLM may be faster, and FastChat can integrate with some of them as workers. Mind GPU memory for larger models, and remember the quality of your deployment depends on the model you choose to serve.

How it compares

Compared with pure inference engines (TensorRT-LLM, MLC-LLM) that focus on squeezing maximum speed from a model, FastChat is broader: it covers serving, training and evaluation and made human-preference benchmarking mainstream through the Arena. If you want to self-host an open chatbot behind an OpenAI-style API and also measure how good it is, FastChat is the established, all-in-one choice.

Getting started

Install FastChat with pip, then launch the three serving components — a controller, a model worker that loads your chosen model, and the OpenAI-compatible API server. Point any OpenAI SDK at the local endpoint and start chatting. From there you can add more workers to scale, run the MT-Bench suite to score and rank models, or follow the published Vicuna recipes to fine-tune your own chatbot. Because the API mirrors OpenAI's, swapping a hosted model for a self-hosted one is often a one-line change to the base URL.

Capabilities

🚀

Distributed serving

A controller plus model workers host one or many models and scale across GPUs.

🔌

OpenAI-compatible API

Run open models behind the same interface existing OpenAI SDK code expects.

⚖️

Evaluation suite

Score chatbots with MT-Bench and the human-preference Chatbot Arena methodology.

🏋️

Training recipes

The fine-tuning recipes used to create Vicuna are included.

Pros & Cons

Pros6

Self-host open chatbots with an OpenAI-compatible API
Serving, training and evaluation in one toolkit
Powers Vicuna and the Chatbot Arena
Apache 2.0 and widely adopted
Scales across multiple GPUs
MT-Bench and Arena evaluation built in

Cons4

A framework, not a turnkey product
You supply models, hardware and ops
Pure inference engines can be faster
Served models have their own licences

Inspiration

FastChat use cases & project ideas

Self-host a chatbot

Serve an open model with an OpenAI API.

Drop-in API

Point OpenAI SDK code at a local model.

Model evaluation

Score chatbots with MT-Bench.

Fine-tune chat

Train a Vicuna-style assistant.

FAQ

Frequently asked questions

Is FastChat a model?+

No. It is an open platform for training, serving and evaluating LLM chatbots; it trained Vicuna and runs the Chatbot Arena.

Does it offer an OpenAI-compatible API?+

Is FastChat free?+

What is MT-Bench?+

Is it the fastest way to serve a model?+

More to explore

Learn more

From our blog

Tutorials