What is Mixtral 8x22B?
Mixtral 8x22B is an open sparse mixture-of-experts (SMoE) large language model from Mistral AI. It contains 141 billion total parameters spread across eight expert sub-networks, but for each token it routes to only two experts, activating just about 39 billion parameters. This gives it the knowledge capacity of a very large model with the inference cost of a far smaller one. Released under the permissive Apache 2.0 licence with a long 64K-token context, it was one of the strongest fully open models available at release.
The architecture
Mixtral uses a mixture-of-experts design: within each layer there are eight expert feed-forward networks, and a lightweight router selects the two best experts for each token. Because only a fraction of the network runs per token, Mixtral delivers high quality while keeping compute efficient. It builds on the smaller Mixtral 8x7B, scaling the experts up to 22B each. It is natively multilingual, strong at code and mathematics, and supports function calling, with the 64K context handling long documents.
What it is good at
Mixtral 8x22B is a strong general-purpose model: chat and reasoning, multilingual text (English, French, German, Spanish, Italian), code and maths, summarisation and function calling for tool use. Its efficiency-to-quality ratio makes it attractive for self-hosted assistants, RAG and agentic applications where you want high capability without the full cost of a dense model of equivalent quality. The Instruct variant is tuned for assistant and chat behaviour.
Licensing & access
Mixtral 8x22B is released under Apache 2.0 — fully permissive for research and commercial use — with weights on Hugging Face, easy local running via Ollama, and availability through Mistral's API and many inference providers. Despite activating only ~39B parameters per token, the full 141B must fit in memory, so self-hosting needs substantial multi-GPU hardware or quantisation; hosted endpoints are a simpler route for many.
Practical considerations
Use the Instruct variant for chat and the base for fine-tuning. The main practical hurdle is memory: all experts must be resident even though few run per token, so plan for multi-GPU or quantised deployment. MoE models are also a little more involved to serve efficiently than dense ones. Mistral has since released newer models, so for the latest quality compare options — but Mixtral 8x22B remains an excellent, permissively licensed open MoE.
How it compares
DBRX is another open MoE with finer-grained routing; Llama 2 and Falcon are dense open models. Mixtral's edges are its Apache 2.0 licence, 64K context, strong multilingual and code ability, and efficient MoE inference. Against dense models of similar quality it is cheaper to run per token; against DBRX it offers a different routing design and a longer context. For a permissive, efficient, capable open model, Mixtral 8x22B is a leading choice.
Getting started
The quickest path is a hosted endpoint (Mistral's API or a provider) with Mixtral 8x22B Instruct, prompting it like any chat model; or run locally via Ollama or Transformers with quantisation to fit your GPUs. Use the Instruct variant for assistants, the base for fine-tuning, exploit the 64K context for long inputs, and lean on its code/maths strength — validating quality on your own workloads before rollout.


