What is Mistral Small 3?
Mistral Small 3 is a 24-billion-parameter open-weights large language model released by Mistral AI in January 2025 under the permissive Apache 2.0 license. It is specifically designed for low-latency inference — using fewer transformer layers than typical 24B models to dramatically speed up forward passes.
The model achieves over 81% on MMLU at 150 tokens/second on consumer hardware, rivaling Llama 3.3-70B at 3× the speed.
Why Mistral Small 3 Is Trending in 2026
It hits a sweet spot that no other open model matches: frontier-level quality, single-GPU deployment, and Apache 2.0 freedom. With newer Mistral Small 3.1 (March 2025) adding multimodal support and 128K context, and Mistral Small 3.2 (June 2025) reaching 84.5% MMLU, this family has become the workhorse of efficient open AI.
It's also one of the few sub-30B models with native function calling and JSON mode for agentic workflows.
Key Features and Capabilities
Mistral Small 3 supports multilingual generation (10+ languages including Chinese, Japanese, Korean), function calling, JSON mode, and a 32K-token context window (128K in v3.1).
The 3.1 version adds vision input, making it competitive with GPT-4o-mini and Gemma 3 27B.
Who Should Use Mistral Small 3?
Mistral Small 3 is built for developers, AI startups, enterprise teams, and on-device app builders who need frontier quality with extremely low latency and full Apache 2.0 freedom.
Top Use Cases
Real-world applications include real-time chatbots, low-latency function-calling agents, document analysis, customer support, multilingual content generation, fine-tuning bases for vertical AI, and on-device assistants.
Where Can You Run It?
Mistral Small 3 runs on a single RTX 4090, A100 40GB, or 32 GB MacBook (quantized). It's available via Mistral's La Plateforme API, Hugging Face, AWS Bedrock, Azure AI, Together AI, Ollama, and Groq.
How to Use Mistral Small 3 (Quick Start)
Easiest: ollama pull mistral-small. Via API: sign up at console.mistral.ai for an OpenAI-compatible endpoint. Function calling and JSON mode work just like OpenAI's API.
When Should You Choose Mistral Small 3?
Choose it when you need maximum tokens-per-second and Apache 2.0 freedom. It's the best speed/quality ratio for any open-source LLM in its weight class in 2026.
Pricing
Free under Apache 2.0 for self-hosting. La Plateforme API: ~$0.10 per million input tokens, $0.30 per million output — among the cheapest frontier-class APIs.
Pros and Cons
Pros: ✔ True Apache 2.0 license ✔ 150 tokens/s latency ✔ Single-GPU friendly ✔ Function calling + JSON mode ✔ Multilingual ✔ Multimodal in v3.1+
Cons: ✘ Smaller world knowledge than 70B models ✘ 32K context (v3.0) ✘ Less RLHF refinement than DeepSeek R1
Final Verdict
Mistral Small 3 is one of the best-engineered open LLMs of 2026 — perfect for high-throughput production deployment. Discover more efficient AI at FreeAPIHub.com.