FreeAPIHub

The central hub for discovering, testing, and integrating the world's best AI models and APIs.

© 2026 FreeAPIHub. All rights reserved.

open source, llm

Mistral Small 3

24B Apache 2.0 LLM at 150 tokens/s — beats Llama 3.3-70B in speed

Developed by Mistral AI

Params: 24B
API: Yes
Stability: stable
Version: Mistral Small 3.2
License: Apache 2.0
Framework: PyTorch
Runs Local: Yes

Playground

Implementation Example

Example Prompt

user input
[INST] Extract event details from: 'Hi team, the quarterly review is moved to Tuesday March 11 at 2pm in the Birch conference room.' Return JSON with title, date, time, location. [/INST]

Model Output

model response
{"title": "Quarterly Review", "date": "2025-03-11", "time": "14:00", "location": "Birch conference room"}
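Because a model's JSON reply is still just text, it is worth validating before it reaches downstream code. The following is a minimal sketch, assuming the reply has exactly the shape shown above; the helper name parse_event and the added starts_at field are illustrative and not part of any Mistral tooling.

```python
import json
from datetime import datetime

REQUIRED_KEYS = ("title", "date", "time", "location")


def parse_event(raw: str) -> dict:
    """Parse and validate a JSON reply to the event-extraction prompt."""
    event = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(event, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in event]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    # Parsing date and time together confirms both use the expected formats.
    event["starts_at"] = datetime.strptime(
        f"{event['date']} {event['time']}", "%Y-%m-%d %H:%M"
    )
    return event


reply = (
    '{"title": "Quarterly Review", "date": "2025-03-11", '
    '"time": "14:00", "location": "Birch conference room"}'
)
event = parse_event(reply)
print(event["title"], "starts at", event["starts_at"].isoformat())
```

Note that the prompt does not state a year, so the 2025 in the reply is the model's inference; a production pipeline would probably pass the current date in the prompt or check the weekday after parsing.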

Examples

Real-World Applications

  • Real-time chatbots
  • Low-latency function-calling agents
  • Document analysis
  • Multilingual content
  • Fine-tuning base
  • On-device assistants

Docs

Model Intelligence & Architecture

What is Mistral Small 3?

Mistral Small 3 is a 24-billion-parameter open-weights large language model released by Mistral AI in January 2025 under the permissive Apache 2.0 license. It is designed for low-latency inference: it uses fewer transformer layers than comparably sized models, which shortens each forward pass.

Mistral AI reports over 81% on MMLU and a generation speed of about 150 tokens/second. On quality it is positioned as competitive with Llama 3.3-70B while running roughly 3× faster on the same hardware.
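To make the speed figures concrete, here is a back-of-the-envelope sketch of what 150 tokens/second and a 3× speed gap mean for a single reply. The 300-token reply length is a hypothetical value, and the calculation ignores prompt processing, batching, and network overhead.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to stream num_tokens at a steady decoding rate."""
    return num_tokens / tokens_per_second


REPLY_TOKENS = 300          # hypothetical reply length
SMALL3_TPS = 150.0          # figure quoted on this page
BASELINE_TPS = SMALL3_TPS / 3  # a model running 3x slower

small3_time = generation_seconds(REPLY_TOKENS, SMALL3_TPS)
baseline_time = generation_seconds(REPLY_TOKENS, BASELINE_TPS)
print(f"Mistral Small 3: {small3_time:.1f} s, 3x slower model: {baseline_time:.1f} s")
```

Real throughput depends heavily on hardware, quantization, and serving stack, so these numbers are best treated as an illustration of the ratio rather than a benchmark.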

Why Mistral Small 3 Is Trending in 2026

It hits a sweet spot that few open models match: strong quality for its size, single-GPU deployment, and Apache 2.0 freedom. Mistral Small 3.1 (March 2025) added multimodal support and a 128K context window, and Mistral Small 3.2 (June 2025) reportedly reaches 84.5% on MMLU, which has made this family a workhorse of efficient open AI.

It's also one of the few sub-30B models with native function calling and JSON mode for agentic workflows.

Key Features and Capabilities

Mistral Small 3 supports multilingual generation (10+ languages including Chinese, Japanese, Korean), function calling, JSON mode, and a 32K-token context window (128K in v3.1).

The 3.1 version adds vision input, making it competitive with GPT-4o-mini and Gemma 3 27B.
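The function-calling feature mentioned above can be sketched as follows. This example assumes the OpenAI-style tool format (a "tools" list of JSON Schema descriptions, and tool calls whose arguments arrive as a JSON string); the tool get_room_availability and the simulated tool call are hypothetical, and no request is sent to any API.

```python
import json


def get_room_availability(room: str, date: str) -> dict:
    """Hypothetical local tool; a real one would query a calendar system."""
    return {"room": room, "date": date, "available": True}


# Tool description in the OpenAI-style schema (assumed, not taken from this page).
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_room_availability",
        "description": "Check whether a meeting room is free on a given date.",
        "parameters": {
            "type": "object",
            "properties": {
                "room": {"type": "string"},
                "date": {"type": "string", "description": "ISO date, YYYY-MM-DD"},
            },
            "required": ["room", "date"],
        },
    },
}]

REGISTRY = {"get_room_availability": get_room_availability}


def run_tool_call(tool_call: dict) -> str:
    """Dispatch one tool call to its local handler and return a JSON result."""
    function = tool_call["function"]
    handler = REGISTRY[function["name"]]
    arguments = json.loads(function["arguments"])  # arguments arrive as a JSON string
    return json.dumps(handler(**arguments))


# Simulated model output, shaped like an OpenAI-style tool call.
simulated_call = {
    "id": "call_0",
    "type": "function",
    "function": {
        "name": "get_room_availability",
        "arguments": '{"room": "Birch", "date": "2025-03-11"}',
    },
}
result = run_tool_call(simulated_call)
print(result)
```

In a real agent loop, TOOLS would be sent with the chat request and the string returned by run_tool_call would go back to the model as a tool message.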

Who Should Use Mistral Small 3?

Mistral Small 3 is built for developers, AI startups, enterprise teams, and on-device app builders who need frontier quality with extremely low latency and full Apache 2.0 freedom.

Top Use Cases

Real-world applications include real-time chatbots, low-latency function-calling agents, document analysis, customer support, multilingual content generation, fine-tuning bases for vertical AI, and on-device assistants.

Where Can You Run It?

When quantized, Mistral Small 3 runs on a single RTX 4090, an A100 40GB, or a MacBook with 32 GB of RAM. It's available via Mistral's La Plateforme API, Hugging Face, AWS Bedrock, Azure AI, Together AI, Ollama, and Groq.
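A rough calculation shows why quantization matters for the hardware listed above. This sketch counts weight storage only, assuming exactly 24 billion parameters and a flat number of bits per weight; it leaves out the KV cache, activations, and the small overhead that real quantization formats add.

```python
PARAMS = 24e9  # 24 billion parameters


def weight_gb(params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for label, bits in (("16-bit", 16), ("8-bit", 8), ("4-bit", 4)):
    print(f"{label}: about {weight_gb(PARAMS, bits):.0f} GB of weights")
```

At 16 bits the weights alone come to about 48 GB, more than the 24 GB of an RTX 4090, while a 4-bit version needs about 12 GB and leaves room for the context cache.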

How to Use Mistral Small 3 (Quick Start)

The easiest route is local: run ollama pull mistral-small. To use the hosted API instead, sign up at console.mistral.ai for an OpenAI-compatible endpoint. Function calling and JSON mode work much as they do in OpenAI's API.
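The two routes above can be sketched with the standard library alone. The endpoint URLs, the hosted model name mistral-small-latest, and the response field paths in the comments are assumptions based on the public Ollama and Mistral documentation and should be checked against the current docs; the helper build_chat_request is illustrative, and the code only builds the requests without sending them.

```python
import json
import urllib.request


def build_chat_request(url, model, prompt, api_key=None):
    """Build (but do not send) a JSON chat request."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one complete reply instead of a stream
    }
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )


# Local route: assumes an Ollama server on its default port after `ollama pull mistral-small`.
local = build_chat_request(
    "http://localhost:11434/api/chat", "mistral-small", "Say hello in French."
)

# Hosted route: assumes Mistral's chat completions endpoint and a placeholder API key.
hosted = build_chat_request(
    "https://api.mistral.ai/v1/chat/completions",
    "mistral-small-latest",
    "Say hello in French.",
    api_key="YOUR_API_KEY",
)

# To send a request, uncomment one of the following:
# with urllib.request.urlopen(local) as resp:
#     print(json.load(resp)["message"]["content"])
# with urllib.request.urlopen(hosted) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Keeping request construction separate from sending makes the payload easy to inspect and test before any tokens are billed.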

When Should You Choose Mistral Small 3?

Choose it when you need maximum tokens-per-second and Apache 2.0 freedom. It offers one of the best speed-to-quality ratios among open-weights LLMs in its size class in 2026.

Pricing

Self-hosting is free under Apache 2.0. On the La Plateforme API, pricing is roughly $0.10 per million input tokens and $0.30 per million output tokens, which places it among the cheaper hosted LLM APIs.
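The per-token prices translate into a monthly bill as sketched below. The rates are the approximate figures quoted on this page, and the traffic volumes are hypothetical.

```python
INPUT_USD_PER_MILLION = 0.10   # approximate rate quoted on this page
OUTPUT_USD_PER_MILLION = 0.30  # approximate rate quoted on this page


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated API cost in US dollars for the given token counts."""
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


# Hypothetical month: 50M input tokens and 10M output tokens.
monthly_cost = estimate_cost(50_000_000, 10_000_000)
print(f"Estimated monthly cost: ${monthly_cost:.2f}")
```

Since output tokens cost three times as much as input tokens at these rates, workloads with long generations are affected more by reply length than by prompt length.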

Pros and Cons

Pros: ✔ True Apache 2.0 license ✔ 150 tokens/s generation speed ✔ Single-GPU friendly ✔ Function calling + JSON mode ✔ Multilingual ✔ Multimodal in v3.1+

Cons: ✘ Smaller world knowledge than 70B models ✘ 32K context (v3.0) ✘ Less RLHF refinement than DeepSeek R1

Final Verdict

Mistral Small 3 is one of the best-engineered open LLMs of 2026 and is well suited to high-throughput production deployment. Discover more efficient AI at FreeAPIHub.com.

Evaluation

Advantages & Limitations

Advantages
  • ✓ True Apache 2.0 license
  • ✓ 150 tokens/s speed
  • ✓ Single-GPU friendly
  • ✓ Function calling + JSON mode
  • ✓ Multilingual
  • ✓ Vision input in v3.1+
Limitations
  • ✗ Smaller world knowledge than 70B
  • ✗ 32K context in v3.0
  • ✗ Less RLHF refinement

Important Notice

Verify Before You Decide

Last verified · Apr 29, 2026

The details on this page — including pricing, features, and availability — are based on our last review and may not reflect the provider's current offering. Providers update their products frequently, sometimes without prior notice.

What may have changed

  • Pricing Plans
  • Features & Limits
  • Availability
  • Terms & Policies

Always visit the official provider website to confirm the latest pricing, terms, and feature availability before subscribing or integrating.

Technical Details

Architecture: Decoder Transformer with reduced layer count for low latency
Stability: stable
Framework: PyTorch
License: Apache 2.0
Release Date: 2025-01-30
Signup Required: No
API Available: Yes
Runs Locally: Yes

Rate Limits

No limits self-hosted; tiered free credits on La Plateforme

Pricing

Free Apache 2.0 self-hosting; API from $0.10/M input tokens

Best For

Developers who need a low-latency, high-quality LLM with Apache 2.0 freedom

Alternative To

GPT-4o-mini, Llama 3.3-70B, Gemma 3 27B

Compare With

  • mistral small 3 vs llama 3.3
  • mistral small 3 vs gpt-4o-mini
  • mistral small vs gemma 3
  • best fast open llm

Tags

#Low Latency  #Apache 2  #Function Calling  #Mistral AI  #Open Source AI  #llm

You Might Also Like

More AI Models Similar to Mistral Small 3

Mixtral 8x22B

Mixtral 8x22B is a sparse Mixture-of-Experts open-weights LLM by Mistral AI with 141B total / 39B active parameters and 64K context. Apache 2.0 license: free for commercial use, multilingual, with strong reasoning.

open source, llm

Granite 3.3

Granite 3.3 by IBM is a free open-source enterprise-grade LLM family with strong reasoning, code, and function calling. Apache 2.0, 128K context, sizes from 2B to 8B. Optimized for safe, governed enterprise AI.

open source, llm

MPT-7B

MPT-7B by MosaicML is a free 7-billion-parameter Apache 2.0 LLM trained on 1 trillion tokens. Includes special variants like MPT-7B-StoryWriter with 65K context and MPT-7B-Chat. Production-ready, commercially-friendly base model.

open source, llm