Open Source · Text Generation · by Technology Innovation Institute (TII)

Falcon 40B

Falcon 40B is an open large language model from the Technology Innovation Institute (TII). Trained on the high-quality RefinedWeb dataset and released under Apache 2.0, it was a leading open LLM and remains a strong, permissive base.

Tags: apache-2, falcon, llm, multilingual, open-source-ai, tii

Quick facts

Licence: Apache 2.0 (permissive)
Params: 40B (family: 7B/40B/180B, open weights)
Data: RefinedWeb (filtered web)
By: TII (Abu Dhabi)

What is Falcon 40B?

Falcon 40B is an open 40-billion-parameter large language model from the Technology Innovation Institute (TII) in Abu Dhabi. On release it was among the strongest openly available LLMs, topping the Hugging Face Open LLM Leaderboard, and it arrived with a genuinely permissive Apache 2.0 licence at a time when many strong open models had restrictive terms. Trained on TII's carefully filtered RefinedWeb dataset, Falcon demonstrated that high-quality web data plus an efficient architecture could rival much larger or more restricted models.

How it was built

Falcon is a decoder-only transformer with architectural efficiencies (such as multi-query attention) that improve inference speed and scalability. Its standout ingredient is data: RefinedWeb, a large corpus built by aggressively filtering and deduplicating web text, which TII showed could produce excellent models with less reliance on curated sources. The family spans 7B, 40B and the larger 180B, each with a base and a chat-tuned variant (Instruct for 7B and 40B, Chat for 180B), giving options across capability and hardware budgets.
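
To make the inference benefit of multi-query attention concrete, here is a minimal sketch that compares key/value cache sizes for standard multi-head attention and for multi-query attention. The model shape (32 layers, 64 heads, head dimension 64) is a hypothetical example chosen for round numbers, not Falcon's published configuration, and `kv_cache_bytes` is an illustrative helper rather than part of any library.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Approximate KV-cache size in bytes.

    Each layer stores one key and one value vector (hence the factor 2)
    per KV head and per token; bytes_per_value=2 corresponds to 16-bit floats.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_value


# Hypothetical model shape (an assumption for illustration only).
N_LAYERS, N_HEADS, HEAD_DIM, SEQ_LEN = 32, 64, 64, 2048

# Multi-head attention: every query head keeps its own keys and values.
mha = kv_cache_bytes(N_LAYERS, N_HEADS, HEAD_DIM, SEQ_LEN)
# Multi-query attention: all query heads share a single key/value head.
mqa = kv_cache_bytes(N_LAYERS, 1, HEAD_DIM, SEQ_LEN)

print(f"multi-head : {mha / 2**20:.0f} MiB")
print(f"multi-query: {mqa / 2**20:.0f} MiB")
print(f"reduction  : {mha // mqa}x")
```

Under these assumptions the cache shrinks by a factor equal to the number of query heads, which is what frees memory for longer sequences or larger batches during generation.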

What it is good at

Falcon 40B is a capable general-purpose model for text generation, summarisation, question answering, reasoning and some code, with the Instruct variant tuned for assistant-style chat. Its permissive licence made it especially attractive for commercial products and fine-tuning, and it has been used as a base for many downstream models. The smaller 7B is a popular lightweight option, while 40B targets stronger quality on a multi-GPU setup.

Licensing & access

Falcon 40B is released under Apache 2.0, which permits both research and commercial use, with weights on Hugging Face and standard Transformers support. The same licence covers the 7B; the larger 180B ships under TII's own Falcon 180B licence, so check its terms separately. The 40B model needs substantial GPU memory (multi-GPU or quantisation), while the 7B runs on a single consumer GPU. Both base and Instruct variants are available, and you can build and ship on top of them without restrictive conditions.
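
As a rough guide to those memory requirements, the sketch below estimates the space taken by the weights alone at different precisions. It uses nominal parameter counts (7 and 40 billion) and ignores activations, the KV cache and framework overhead, so real usage will be higher; `weight_memory_gb` is an illustrative helper, not a library function.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params, precision):
    """Memory for the weights only, in gigabytes (10^9 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


# Nominal sizes; the exact parameter counts differ slightly.
for name, n_params in [("Falcon 7B", 7e9), ("Falcon 40B", 40e9)]:
    for precision in ("bf16", "int8", "int4"):
        gb = weight_memory_gb(n_params, precision)
        print(f"{name:<10} {precision:<5} ~{gb:.0f} GB")
```

By this estimate the 40B weights need about 80 GB in 16-bit precision and about 20 GB at 4 bits, which is why the 40B model calls for multiple GPUs or a quantised build while the 7B fits on one card.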

Practical considerations

For chat, use the Instruct variant; the base model is for completion and fine-tuning. At 40B the weights alone take roughly 80 GB in 16-bit precision, so budget multi-GPU hardware or use quantised builds. As an earlier-generation model, Falcon 40B has been surpassed on many benchmarks by newer open LLMs (Llama 3, Mistral, Qwen), and its context window is a modest 2,048 tokens, so weigh a newer model if you need top reasoning or long context. It can also hallucinate, so verify outputs.

How it compares

Falcon competed directly with Llama 2 and models like MPT and BLOOM. Its differentiators were a fully permissive Apache 2.0 licence and the RefinedWeb data approach, which made it a favourite for commercial builders. Against Llama 2 it traded blows on quality while offering a cleaner licence; against BLOOM it was stronger on English benchmarks. For a permissive, well-known open base — especially the efficient 7B — Falcon remains a solid choice.

Getting started

Load Falcon from Hugging Face with Transformers and prompt it: start with Falcon-7B-Instruct to prototype, or 40B-Instruct for more quality. Use the Instruct variant for chat and the base model for fine-tuning, and run quantised builds to fit the GPUs you have. The Apache 2.0 licence lets you build commercial applications freely, but benchmark against newer open models such as Llama 3 or Mistral if you need the strongest possible quality for your use case.
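
The loading step might look like the sketch below. It assumes the `torch`, `transformers` and `accelerate` packages are installed and that the weights are pulled from the `tiiuae/falcon-7b-instruct` and `tiiuae/falcon-40b-instruct` repositories on Hugging Face. The helper names `pick_model_id` and `generate` are made up for this example, and the generation settings are placeholders rather than recommended values.

```python
def pick_model_id(prototype=True):
    """Choose the 7B Instruct model for prototyping, 40B Instruct for more quality."""
    return "tiiuae/falcon-7b-instruct" if prototype else "tiiuae/falcon-40b-instruct"


def generate(prompt, prototype=True, max_new_tokens=64):
    """Load the chosen model and return its completion for `prompt`."""
    # Heavy imports are deferred so pick_model_id works without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = pick_model_id(prototype)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # 16-bit weights: half the memory of fp32
        device_map="auto",           # spread layers over available GPUs (needs accelerate)
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Pass only the tensors generate() needs, in case the tokenizer returns extras.
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the weights on first use):
# print(generate("Summarise what a decoder-only transformer is in two sentences."))
```

To fit the 40B model on less hardware, one option is to pass a `BitsAndBytesConfig(load_in_4bit=True)` as `quantization_config` to `from_pretrained`, which additionally requires the `bitsandbytes` package and a CUDA GPU.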

Model variants

Falcon 7B: smaller, single-GPU
Falcon 40B Instruct: 40B, Instruct/Chat, chat-tuned
Falcon 180B: 180B, largest, highest capacity

Capabilities

💬

General generation

Capable text generation, summarisation, QA and reasoning.

🔓

Permissive licence

Apache 2.0 allows free commercial use and redistribution.

⚡

Efficient architecture

Multi-query attention improves inference speed and scalability.

🧩

Base + Instruct

Chat-tuned and base variants across 7B, 40B and 180B.

Pros & Cons

Pros

Fully permissive Apache 2.0 licence
Leading open LLM at release
Trained on high-quality RefinedWeb data
Base and Instruct variants (7B/40B/180B)
Efficient architecture (multi-query attention)
Popular, well-supported fine-tuning base

Cons

40B needs substantial GPU memory
Earlier generation — newer models surpass it
Modest context window
Can hallucinate — verify outputs

Inspiration

Falcon 40B use cases & project ideas

Chat assistant

Use the Instruct variant.

Text generation

Summarise and write.

Commercial base

Build under Apache 2.0.

Fine-tuning

Adapt to a domain.

FAQ

Frequently asked questions

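What is Falcon 40B?

An open 40B large language model from TII, trained on RefinedWeb and released under the permissive Apache 2.0 licence.

Can I use it commercially?

Yes. Apache 2.0 permits commercial use and redistribution.

What sizes are available?

The family spans 7B, 40B and 180B, with base and chat-tuned variants.

Which variant should I use for chat?

Use the Instruct variant; the base model is for completion and fine-tuning.

What hardware does 40B need?

Substantial GPU memory: multi-GPU hardware or a quantised build.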
