Open Source · Text Generation · by Technology Innovation Institute (TII)

Falcon 40B

Falcon 40B is an open large language model from the Technology Innovation Institute (TII). Trained on the high-quality RefinedWeb dataset and released under Apache 2.0, it was a leading open LLM and remains a strong, permissive base.

Tags: apache-2, falcon, llm, multilingual, open-source-ai, tii
Quick facts
License: Apache 2.0 (permissive)
Params: 40B (family: 7B/40B/180B, open weights)
Data: RefinedWeb (filtered web)
By: TII (Abu Dhabi)

What is Falcon 40B?

Falcon 40B is an open 40-billion-parameter large language model from the Technology Innovation Institute (TII) in Abu Dhabi. On release it was among the strongest openly available LLMs, topping Hugging Face's Open LLM Leaderboard, and it arrived with a genuinely permissive Apache 2.0 licence at a time when many strong open models had restrictive terms. Trained on TII's carefully filtered RefinedWeb dataset, Falcon demonstrated that high-quality web data plus an efficient architecture could rival much larger or more restricted models.

How it was built

Falcon is a decoder-only transformer with architectural efficiencies (such as multi-query attention) that improve inference speed and scalability. Its standout ingredient is data: RefinedWeb, a large corpus built by aggressively filtering and deduplicating web text, which TII showed could produce excellent models with less reliance on curated sources. The family spans 7B, 40B and the larger 180B, each with a base and a chat-tuned variant (Instruct for 7B and 40B, Chat for 180B), giving options across capability and hardware budgets.
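To see why multi-query attention helps at inference time, it can be useful to compare the size of the key/value cache with and without it. In multi-query attention all query heads share a single key/value head, so the cache shrinks by roughly the number of query heads. The sketch below is plain arithmetic; the layer count, head count, and head size are illustrative assumptions, not Falcon's published configuration, and `kv_cache_bytes` is a hypothetical helper.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Bytes needed to cache keys and values while decoding.

    bytes_per_value=2 assumes 16-bit values (fp16 or bf16).
    """
    # Factor 2: each layer stores one tensor of keys and one of values.
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


# Illustrative shape only (assumed, not Falcon's actual config).
N_LAYERS, N_QUERY_HEADS, HEAD_DIM, SEQ_LEN = 32, 64, 64, 2048

# Standard multi-head attention: one key/value head per query head.
multi_head = kv_cache_bytes(N_LAYERS, N_QUERY_HEADS, HEAD_DIM, SEQ_LEN)
# Multi-query attention: a single key/value head shared by all query heads.
multi_query = kv_cache_bytes(N_LAYERS, 1, HEAD_DIM, SEQ_LEN)

print(f"multi-head : {multi_head / 2**20:.0f} MiB")
print(f"multi-query: {multi_query / 2**20:.0f} MiB")
print(f"reduction  : {multi_head // multi_query}x")
```

A smaller cache leaves more GPU memory for larger batches or longer sequences, which is where most of the speed and scalability benefit is likely to come from.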

What it is good at

Falcon 40B is a capable general-purpose model for text generation, summarisation, question answering, reasoning and some code, with the Instruct variant tuned for assistant-style chat. Its permissive licence made it especially attractive for commercial products and fine-tuning, and it has been used as a base for many downstream models. The smaller 7B is a popular lightweight option, while 40B targets stronger quality on a single multi-GPU node.

Licensing & access

Falcon 7B and 40B are released under Apache 2.0, which permits both research and commercial use, with weights on Hugging Face and standard Transformers support. (Falcon 180B ships under TII's own licence, so check its terms separately.) The 40B model needs substantial GPU memory (multi-GPU or quantisation), while the 7B runs on a single consumer GPU. Both base and Instruct variants are available, and the permissive licence means you can build and ship on top of them without restrictive conditions.

Practical considerations

For chat, use the Instruct variant; the base model is for completion and fine-tuning. At 40B you need substantial GPU memory, so budget multi-GPU hardware or use quantised builds. As an earlier-generation model, Falcon 40B has been surpassed on many benchmarks by newer open LLMs (Llama 3, Mistral, Qwen), and its 2,048-token context window is small by current standards, so weigh a newer model if you need top reasoning or long context. It can also hallucinate, so verify outputs.
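A rough way to budget hardware is to multiply the parameter count by the bytes each parameter takes at a given precision. The sketch below does only that; it uses the nominal 7B/40B/180B sizes, covers weights only (activations, the key/value cache and framework overhead come on top), and `weight_memory_gb` is a hypothetical helper rather than part of any library.

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params_billion, precision):
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    # (n * 1e9 params) * (bytes per param) / 1e9 simplifies to n * bytes.
    return n_params_billion * BYTES_PER_PARAM[precision]


for size in (7, 40, 180):
    row = ", ".join(
        f"{precision}: {weight_memory_gb(size, precision):g} GB"
        for precision in ("bf16", "int8", "int4")
    )
    print(f"Falcon {size}B -> {row}")
```

By this estimate the 40B weights alone take about 80 GB in bf16, which is why a single GPU is rarely enough without quantisation, and about 20 GB at 4-bit.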

How it compares

Falcon competed directly with Llama 2 and models like MPT and BLOOM. Its differentiators were a fully permissive Apache 2.0 licence and the RefinedWeb data approach, which made it a favourite for commercial builders. Against Llama 2 it traded blows on quality while offering a cleaner licence; against BLOOM it was stronger on English benchmarks. For a permissive, well-known open base, especially the efficient 7B, Falcon remains a solid choice.

Getting started

Load Falcon from Hugging Face with Transformers and prompt it: start with Falcon-7B-Instruct to prototype, or move to 40B-Instruct for more quality. Use the Instruct variant for chat and the base model for fine-tuning, and run quantised builds to fit the GPUs you have. The Apache 2.0 licence lets you build commercial applications freely, but benchmark against newer open models such as Llama 3 or Mistral if you need the strongest possible quality for your use case.
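The loading step described above might look like the following minimal sketch. It assumes `torch`, `transformers` and `accelerate` are installed and that the checkpoints live under the `tiiuae` organisation on Hugging Face; `pick_model_id` and `generate` are hypothetical helper names, and the dtype and generation settings are illustrative choices, not recommendations from TII.

```python
# Hugging Face repository names, keyed by (size, instruct?).
MODEL_IDS = {
    ("7b", False): "tiiuae/falcon-7b",
    ("7b", True): "tiiuae/falcon-7b-instruct",
    ("40b", False): "tiiuae/falcon-40b",
    ("40b", True): "tiiuae/falcon-40b-instruct",
}


def pick_model_id(size="7b", instruct=True):
    """Instruct variant for chat, base variant for completion and fine-tuning."""
    return MODEL_IDS[(size.lower(), instruct)]


def generate(prompt, size="7b", instruct=True, max_new_tokens=64):
    """Download the chosen checkpoint and return a completion for `prompt`."""
    # Imported here so pick_model_id works without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = pick_model_id(size, instruct)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # assumed: halves memory versus fp32
        device_map="auto",           # assumed: requires the accelerate package
    )
    # Falcon's forward pass does not take token_type_ids, so skip them.
    inputs = tokenizer(
        prompt, return_tensors="pt", return_token_type_ids=False
    ).to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Summarise what a language model is in one sentence.")` would download Falcon-7B-Instruct on first use; pass `size="40b"` only on hardware with enough memory for the larger checkpoint.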

Model variants

  • Falcon 7B (most popular): 7B · Smaller · Single-GPU
  • Falcon 40B Instruct (most popular): 40B · Instruct, Chat · Chat-tuned
  • Falcon 180B: 180B · Largest · Highest capacity

Capabilities

  • General generation: Capable text generation, summarisation, QA and reasoning.
  • Permissive licence: Apache 2.0 allows free commercial use and redistribution.
  • Efficient architecture: Multi-query attention improves inference speed and scalability.
  • Base + Instruct: Chat-tuned and base variants across 7B, 40B and 180B.

Pros & Cons

Pros
  • Fully permissive Apache 2.0 licence
  • Leading open LLM at release
  • Trained on high-quality RefinedWeb data
  • Base and Instruct variants (7B/40B/180B)
  • Efficient architecture (multi-query attention)
  • Popular, well-supported fine-tuning base
Cons
  • 40B needs substantial GPU memory
  • Earlier generation: newer models surpass it
  • Modest context window
  • Can hallucinate, so verify outputs

Falcon 40B use cases & project ideas

  • Chat assistant: Use the Instruct variant.
  • Text generation: Summarise and write.
  • Commercial base: Build under Apache 2.0.
  • Fine-tuning: Adapt to a domain.

Frequently asked questions

What is Falcon 40B?
An open 40B large language model from TII, trained on RefinedWeb and released under the permissive Apache 2.0 licence.
