MP
Open SourceText Generationby MosaicML (Databricks)

MPT-7B

MPT-7B is an open, commercially usable large language model from MosaicML. Trained on 1 trillion tokens with the ALiBi method for long context, it shipped with finetuned Chat, Instruct and a 65K-token StoryWriter variant under Apache 2.0.

apache-2databricksllmlong-contextmosaicmlopen-source-ai
Quick facts
LicenseApache 2.0
Params7B
FeatureALiBi Long Context
ByMosaicML
No ratings yet — be the first
Params
7B
+ MPT-30B
Tokens
1T
trained on
License
Apache 2.0
commercial OK
By
MosaicML
now Databricks

What is MPT-7B?

MPT-7B is an open large language model from MosaicML (later acquired by Databricks), released in 2023 as a commercially usable alternative to other open models of the time. Trained from scratch on 1 trillion tokens of text and code, it matched the quality of comparable 7B models while being licensed under Apache 2.0 for unrestricted commercial use. It launched as a family — a base model plus finetuned Chat, Instruct, and a remarkable 65K-token StoryWriter variant — and demonstrated MosaicML's efficient training stack.

How it was built

MPT uses a modified decoder-only transformer with two notable engineering choices: ALiBi (Attention with Linear Biases) instead of standard positional embeddings, which lets the model handle much longer contexts than it was trained on, and optimisations like FlashAttention for fast training and inference. It was trained on MosaicML's platform on a carefully assembled text-and-code corpus. These choices made MPT-7B efficient to run and unusually flexible on context length — the StoryWriter variant was finetuned for 65K-token inputs and could extrapolate even further.

What it is good at

MPT-7B is a capable general-purpose model for text generation, chat (the Chat variant), instruction following (Instruct), and long-form tasks — the StoryWriter variant excels at reading and writing very long documents and stories thanks to ALiBi. Its commercial licence made it popular for products, fine-tuning and embedding into applications where licence restrictions on other models were a problem. The 7B size runs on a single consumer GPU.

Licensing & access

MPT-7B is open under Apache 2.0 — fully permissive for research and commercial use — with the base and finetuned variants on Hugging Face and Transformers support. There is also a larger MPT-30B. At 7B it runs on a single GPU (quantised on modest hardware), and MosaicML published its efficient training recipes. Its clean licence was a major draw before equally permissive models became common.

Practical considerations

MPT-7B is an earlier-generation model: excellent for its time and licence, but newer open models (Llama 3, Mistral, Qwen) generally surpass it on reasoning and quality. Use the Chat or Instruct variants for assistant behaviour and StoryWriter for long documents. As with any LLM it can hallucinate, so verify outputs. For new projects a more recent model is usually stronger, but MPT-7B remains a clean, permissive, well-documented option and a notable piece of open-LLM history.

How it compares

MPT-7B competed with Falcon, Llama and BLOOM. Against Falcon it shared the commercial-friendly Apache 2.0 appeal; versus the first Llama it offered a cleaner licence; and its ALiBi long-context and StoryWriter variant were distinctive. Today Llama 2 and newer models often exceed it on quality, but MPT's combination of permissive licensing, long context and efficient training made it influential. For a permissive, lightweight base with long-context heritage, it is still worth knowing.

Getting started

Load MPT-7B (use mpt-7b-chat or mpt-7b-instruct for assistants, or mpt-7b-storywriter for long documents) from Hugging Face with Transformers and prompt it; run a quantised build on a single GPU. Exploit ALiBi for longer-than-trained contexts where useful, use the base for fine-tuning, and benchmark against newer open models like Llama 3 or Mistral when you need the strongest quality for production.

Model variants

MPT-7B Base

7B
Base

For fine-tuning

MOST POPULAR

MPT-7B Chat

7B
Chat

Conversational

MOST POPULAR

MPT-7B Instruct

7B
Instruct

Instruction following

MOST POPULAR

MPT-7B StoryWriter

7B
65K context

Very long inputs

Capabilities

📏
Long context
ALiBi lets MPT handle inputs far longer than its training length.
✍️
StoryWriter 65K
A finetune for reading and writing very long documents.
💬
Chat & Instruct
Finetuned variants provide assistant and instruction-following behaviour.
🔓
Permissive licence
Apache 2.0 allows unrestricted commercial use.

Pros & Cons

Pros6
  • Open Apache 2.0 — commercial use
  • Trained on 1T tokens
  • ALiBi enables long context
  • 65K-token StoryWriter variant
  • Chat and Instruct variants
  • Runs on a single consumer GPU
Cons4
  • Earlier generation — newer models surpass it
  • Use Chat/Instruct for assistant behaviour
  • Can hallucinate — verify outputs
  • Newer models stronger for production

Inspiration

MPT-7B use cases & project ideas

Chat assistant

Use the Chat variant.

Long documents

StoryWriter 65K context.

Commercial base

Build under Apache 2.0.

Text generation

Write and summarise.

FAQ

Frequently asked questions

MosaicML's open, commercially usable LLM trained on 1T tokens, with Chat, Instruct and a 65K-token StoryWriter variant under Apache 2.0.

More to explore

You might also like

01
F4
Falcon 40B
40B · Apache 2.0