FA
Open SourceServing Frameworkby LMSYS (UC Berkeley)

FastChat

FastChat is LMSYS's open platform for training, serving and evaluating large language model chatbots. It powers Vicuna and the Chatbot Arena, and exposes an OpenAI-compatible API server for local models.

chatbot-arenafastchatllmlmsysmodel-servingopen-source-ai
Quick facts
LicenseApache 2.0
TypeServing Framework
APIOpenAI-Compatible
ByLMSYS
No ratings yet — be the first
Type
Serving + eval
framework
API
OpenAI-compatible
drop-in
License
Apache 2.0
open source
By
LMSYS
UC Berkeley

What is FastChat?

FastChat is an open platform from LMSYS (UC Berkeley and collaborators) for training, serving and evaluating large language model chatbots. Rather than a single model, it is the toolkit that made the open-chatbot wave practical: it trained and released Vicuna, runs the well-known Chatbot Arena for human preference evaluation, and provides a production-style serving stack — including an OpenAI-compatible API server — so you can run open models behind the same interface apps already use. It is released under Apache 2.0.

What it provides

FastChat bundles three things developers reach for. A serving system with a controller, model workers and a web/REST gateway lets you host one or many models and scale across GPUs. An OpenAI-compatible endpoint means existing OpenAI SDK code can point at your local model with almost no changes. And it includes training and evaluation utilities — the recipes used to fine-tune Vicuna, plus tools like MT-Bench and the Arena methodology for judging chatbot quality.

What it is good at

Its sweet spot is running open chat models yourself with a familiar API. Teams use it to self-host models like Vicuna, Llama-derived chatbots and others behind a drop-in OpenAI interface, to compare models with MT-Bench and Arena-style evaluation, and as a reference for chatbot fine-tuning. Because it is widely adopted, it is well documented and integrates with the broader open-LLM ecosystem.

Licensing & access

FastChat is open source under Apache 2.0, installed from PyPI or GitHub and run on your own hardware. Note that while FastChat itself is permissive, the models you serve carry their own licences (for example Llama-based weights) that you must respect. It supports a range of GPUs and can serve quantised models, so you can match it to anything from a single consumer card to a multi-GPU server.

Practical considerations

FastChat is a serving and research framework, not a turnkey product — you provide the models, the hardware and the operational care. For maximum raw inference throughput, dedicated engines like vLLM, TensorRT-LLM or MLC-LLM may be faster, and FastChat can integrate with some of them as workers. Mind GPU memory for larger models, and remember the quality of your deployment depends on the model you choose to serve.

How it compares

Compared with pure inference engines (TensorRT-LLM, MLC-LLM) that focus on squeezing maximum speed from a model, FastChat is broader: it covers serving, training and evaluation and made human-preference benchmarking mainstream through the Arena. If you want to self-host an open chatbot behind an OpenAI-style API and also measure how good it is, FastChat is the established, all-in-one choice.

Getting started

Install FastChat with pip, then launch the three serving components — a controller, a model worker that loads your chosen model, and the OpenAI-compatible API server. Point any OpenAI SDK at the local endpoint and start chatting. From there you can add more workers to scale, run the MT-Bench suite to score and rank models, or follow the published Vicuna recipes to fine-tune your own chatbot. Because the API mirrors OpenAI's, swapping a hosted model for a self-hosted one is often a one-line change to the base URL.

Capabilities

🚀
Distributed serving
A controller plus model workers host one or many models and scale across GPUs.
🔌
OpenAI-compatible API
Run open models behind the same interface existing OpenAI SDK code expects.
⚖️
Evaluation suite
Score chatbots with MT-Bench and the human-preference Chatbot Arena methodology.
🏋️
Training recipes
The fine-tuning recipes used to create Vicuna are included.

Pros & Cons

Pros6
  • Self-host open chatbots with an OpenAI-compatible API
  • Serving, training and evaluation in one toolkit
  • Powers Vicuna and the Chatbot Arena
  • Apache 2.0 and widely adopted
  • Scales across multiple GPUs
  • MT-Bench and Arena evaluation built in
Cons4
  • A framework, not a turnkey product
  • You supply models, hardware and ops
  • Pure inference engines can be faster
  • Served models have their own licences

Inspiration

FastChat use cases & project ideas

Self-host a chatbot

Serve an open model with an OpenAI API.

Drop-in API

Point OpenAI SDK code at a local model.

Model evaluation

Score chatbots with MT-Bench.

Fine-tune chat

Train a Vicuna-style assistant.

FAQ

Frequently asked questions

No. It is an open platform for training, serving and evaluating LLM chatbots; it trained Vicuna and runs the Chatbot Arena.

More to explore

You might also like

01
VV
Vicuna-13B v1.5
13B · Llama Community License