Open Source · Text Generation · by BigScience Workshop

Bloom

BLOOM is a 176-billion-parameter multilingual large language model from the BigScience collaboration, trained openly on 46 natural and 13 programming languages and released under the Responsible AI License.

Tags: bigscience · bloom · llm · low-resource-languages · multilingual-ai · open-source-ai
Quick facts
License: RAIL (use-restricted)
Params: 560M–176B (flagship 176B, open weights)
Languages: 46 natural + 13 programming
By: BigScience (open collaboration)

What is BLOOM?

BLOOM is a 176-billion-parameter multilingual large language model produced by BigScience, a year-long open research collaboration of more than a thousand researchers coordinated by Hugging Face. The model was trained on France's Jean Zay supercomputer. It was a landmark: the first LLM at that scale to be developed entirely in the open, designed for transparency and accessibility. It was trained on a documented corpus spanning 46 natural languages and 13 programming languages, and its weights were released for anyone to study and use.

The architecture & training

BLOOM is a decoder-only transformer in the GPT lineage, trained on the multilingual ROOTS corpus assembled specifically for the project. Its deliberate emphasis on language diversity — including many languages underrepresented in earlier models, such as several African and South Asian languages — set it apart from English-centric LLMs. The family spans 560M, 1.1B, 1.7B, 3B, 7.1B and the flagship 176B, so you can pick a size to match your hardware.
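Picking a size to match your hardware is mostly arithmetic on the weights. The sketch below assumes half-precision weights (2 bytes per parameter) and a hypothetical 1.2× headroom factor for activations and the KV cache, then picks the largest BLOOM size that fits a single GPU. The parameter counts are the nominal sizes listed above; the function names are illustrative, not part of any library, and real memory use varies with sequence length and batch size.

```python
from typing import Optional

# Nominal parameter counts for the BLOOM family (smallest to largest).
BLOOM_SIZES = {
    "560M": 0.56e9,
    "1.1B": 1.1e9,
    "1.7B": 1.7e9,
    "3B": 3.0e9,
    "7.1B": 7.1e9,
    "176B": 176e9,
}


def weights_gb(params: float, bytes_per_param: float = 2.0) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def largest_fitting_size(
    gpu_gb: float, bytes_per_param: float = 2.0, headroom: float = 1.2
) -> Optional[str]:
    """Largest size whose weights, scaled by an ASSUMED headroom factor,
    fit in gpu_gb; None if even the smallest size does not fit."""
    best = None
    # Dicts keep insertion order, so this walks from smallest to largest.
    for label, params in BLOOM_SIZES.items():
        if weights_gb(params, bytes_per_param) * headroom <= gpu_gb:
            best = label
    return best


for gpu in (8, 16, 24, 80):
    print(f"{gpu} GB GPU -> BLOOM {largest_fitting_size(gpu)}")
```

Under these assumptions an 8 GB or 16 GB card lands on 3B, while 24 GB and 80 GB cards both top out at 7.1B; the 176B flagship does not fit on any single card in this list.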

What it is good at

BLOOM is strongest as a multilingual base model: text generation, completion and few-shot tasks across many languages, with notably better coverage of non-English languages than its contemporaries. It is widely used for research into large multilingual models, as a foundation to fine-tune (the instruction-tuned BLOOMZ variant followed), and for generation in languages other open models handled poorly. Smaller sizes make capable, runnable baselines. Its place in history is also part of the appeal: as the first openly built model at this scale, with its data and process documented, it remains a teaching reference for how a large language model is actually assembled.
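Because the base checkpoints are plain next-token predictors, few-shot use comes down to prompt layout: show labelled examples, then leave the last label empty for the model to complete. The sketch below builds such a prompt. The `Text:`/`Label:` layout and the helper name are illustrative assumptions, not a format BLOOM requires; the examples are in French, Spanish and Portuguese, all among BLOOM's training languages.

```python
from typing import Sequence, Tuple


def build_few_shot_prompt(
    instruction: str,
    examples: Sequence[Tuple[str, str]],
    query: str,
    input_label: str = "Text",
    output_label: str = "Label",
) -> str:
    """Instruction, labelled examples, then the query with its label left
    blank so a base model continues the pattern (illustrative layout)."""
    lines = [instruction, ""]
    for text, label in examples:
        lines += [f"{input_label}: {text}", f"{output_label}: {label}", ""]
    lines += [f"{input_label}: {query}", f"{output_label}:"]
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Classify each review as positive or negative.",
    [
        ("Ce film était magnifique.", "positive"),  # French
        ("La comida estaba fría y sin sabor.", "negative"),  # Spanish
    ],
    "O hotel era limpo e tranquilo.",  # Portuguese query
)
print(prompt)
```

In practice you would pass `prompt` to a base checkpoint, generate a few tokens, and read the completion up to the first newline as the predicted label.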

Licensing & access

BLOOM is released under the BigScience Responsible AI License (RAIL): it is open and free to use, including commercially, but carries use-based restrictions that prohibit harmful applications. The weights are on Hugging Face with full Transformers support. The 176B model is very large and needs serious multi-GPU hardware (or quantisation) to run, while the smaller sizes are practical on a single GPU.
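To put numbers on the multi-GPU requirement, the sketch below estimates how many accelerators the 176B weights need at different precisions. It assumes 80 GB cards, a rough 1.2× headroom factor for activations and caches, and nominal bytes per parameter. Real quantised formats add some overhead for scales, and sharding is never perfectly even, so treat the results as lower bounds rather than deployment figures.

```python
import math

FLAGSHIP_PARAMS = 176e9  # BLOOM 176B

# Nominal storage per parameter in bytes (ignores quantisation scale overhead).
BYTES_PER_PARAM = {"fp32": 4.0, "bf16/fp16": 2.0, "int8": 1.0, "int4": 0.5}


def gpus_needed(
    params: float, bytes_per_param: float, gpu_gb: float = 80.0, headroom: float = 1.2
) -> int:
    """Minimum GPU count whose combined memory holds the weights times an
    ASSUMED headroom factor (a lower bound: real sharding is uneven)."""
    total_gb = params * bytes_per_param / 1e9 * headroom
    return math.ceil(total_gb / gpu_gb)


for precision, nbytes in BYTES_PER_PARAM.items():
    weights = FLAGSHIP_PARAMS * nbytes / 1e9
    print(f"{precision:>9}: {weights:4.0f} GB of weights, "
          f"at least {gpus_needed(FLAGSHIP_PARAMS, nbytes)} x 80 GB GPUs")
```

Under these assumptions half precision needs at least six 80 GB cards, 8-bit at least three and 4-bit at least two, while a 7.1B model in half precision stays within one card.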

Practical considerations

As an older base model, BLOOM trails today's frontier LLMs on reasoning and instruction following. The 176B version is heavy to serve, and the pretrained checkpoints are not chat-aligned (use BLOOMZ or fine-tune for instructions). Read the RAIL licence terms, since they restrict certain uses, and prefer a smaller size unless you specifically need the full model's capacity.

How it compares

Versus GPT-Neo (EleutherAI's earlier open, English-centric models) and T5 (Google's encoder-decoder), BLOOM's distinguishing strengths are scale and multilingual breadth. Newer open models such as Falcon match or exceed it on English with more modern training, but BLOOM remains a reference point for open, multilingual LLM development and a useful base model whenever broad, genuine language coverage matters most.

Getting started

Load a BLOOM size that fits your GPU through Transformers and generate text in your target language; start with 560M–1.7B to prototype before scaling up. For instruction following, use the BLOOMZ variant or fine-tune a base checkpoint. To use the 176B model without local hardware, call it through a hosted inference provider instead, and always check the RAIL licence terms for your particular use case.
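A minimal getting-started sketch with the Transformers `AutoTokenizer` and `AutoModelForCausalLM` classes follows. The Hub repository IDs are those of the base BLOOM checkpoints; the helper names `repo_id_for` and `generate`, greedy decoding and the 40-token limit are illustrative choices, not an official recipe. As written it loads the model on CPU in full precision, which suits 560M but not the larger sizes.

```python
# Hub repository IDs for the base BLOOM checkpoints.
BLOOM_CHECKPOINTS = {
    "560M": "bigscience/bloom-560m",
    "1.1B": "bigscience/bloom-1b1",
    "1.7B": "bigscience/bloom-1b7",
    "3B": "bigscience/bloom-3b",
    "7.1B": "bigscience/bloom-7b1",
    "176B": "bigscience/bloom",
}


def repo_id_for(size: str) -> str:
    """Map a size label such as "1.7B" to its Hub repository ID (illustrative helper)."""
    if size not in BLOOM_CHECKPOINTS:
        raise ValueError(f"unknown BLOOM size {size!r}; choose from {list(BLOOM_CHECKPOINTS)}")
    return BLOOM_CHECKPOINTS[size]


def generate(prompt: str, size: str = "560M", max_new_tokens: int = 40) -> str:
    """Greedy continuation of `prompt` with a base BLOOM checkpoint (illustrative helper)."""
    repo_id = repo_id_for(size)
    # Imported lazily so the lookup above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `print(generate("La capitale de la France est"))` downloads the 560M checkpoint on first use and prints the prompt followed by its continuation. For larger sizes, `from_pretrained` also accepts `torch_dtype` and `device_map="auto"` (the latter relies on the `accelerate` package) to load in half precision and spread layers across available GPUs.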

Model variants

BLOOM 560M

560M
Base

Runs on modest hardware

BLOOM 7.1B

7.1B
Base

Single-GPU capable

BLOOM 176B

176B
Flagship

Needs multi-GPU

BLOOMZ

560M–176B
Instruct

Instruction-tuned

Capabilities

🌐
Broad multilingual
Generates and completes text across 46 languages, including underrepresented ones.
💻
Code-aware
Trained on 13 programming languages alongside natural text.
📝
Few-shot prompting
Handles classification and extraction tasks from in-context examples.
🔎
Fully transparent
Open weights and a documented training corpus for study and reproduction.

Pros & Cons

Pros
  • Open weights at up to 176B parameters
  • Strong multilingual coverage (46 languages)
  • Transparently trained on a documented corpus
  • Multiple sizes from 560M to 176B
  • Includes code in 13 programming languages
  • Backed by a large open collaboration
Cons
  • 176B needs heavy multi-GPU hardware
  • Base models aren't chat-aligned (use BLOOMZ)
  • Trails newer LLMs on reasoning
  • RAIL licence has use-based restrictions

Inspiration

Bloom use cases & project ideas

Multilingual generation

Generate text across many languages.

LLM research

Study a fully open large model.

Fine-tuning base

Adapt to a task or language.

Few-shot tasks

Prompt for classification or extraction.

FAQ

Frequently asked questions

What sizes does BLOOM come in?
The flagship is 176B parameters, with smaller sizes of 560M, 1.1B, 1.7B, 3B and 7.1B.

More to explore

You might also like

GPT-Neo
125M / 1.3B / 2.7B · MIT