T5

Open Source Language Model by Google Research

T5 (Text-to-Text Transfer Transformer) is Google's influential encoder-decoder model that frames every NLP task as text-to-text. From translation to summarization to classification, one unified format handles them all.

Tags: google-research, nlp, open-source-ai, seq2seq, text-to-text, transformer

Quick facts

License: Apache 2.0 (open source)

Type: Encoder-decoder

Approach: Text-to-text

Sizes: Small to 11B, plus FLAN-T5

By: Google Research

What is T5?

T5, the Text-to-Text Transfer Transformer, is a highly influential model from Google Research that introduced a strikingly simple idea: cast every NLP task as a text-to-text problem. Whether the task is translation, summarisation, question answering, or classification, T5 takes text in and produces text out — you just prefix the input with a task description (like 'summarize:' or 'translate English to German:'). This unified framing meant one architecture, one training objective, and one set of techniques could handle the entire breadth of language tasks, making T5 a landmark in NLP.
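As a rough illustration of the prefix idea, the sketch below builds prefixed inputs in plain Python. The `PREFIXES` table and the `to_text_to_text` helper are hypothetical names made up for this example; the two prefix strings are the ones quoted above, and the exact prefixes a given checkpoint expects depend on how it was trained.

```python
# Hypothetical lookup table: task name -> text prefix placed in front of the input.
PREFIXES = {
    "summarize": "summarize: ",
    "translate_en_de": "translate English to German: ",
}


def to_text_to_text(task: str, text: str) -> str:
    """Return the model input for `task`: the task prefix followed by the raw text."""
    if task not in PREFIXES:
        raise KeyError(f"no prefix registered for task {task!r}")
    return PREFIXES[task] + text.strip()


print(to_text_to_text("translate_en_de", "That is good."))
# translate English to German: That is good.
```

Every task goes through the same function and yields a plain string, which is the point of the text-to-text framing: only the prefix changes, not the model interface.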

How it works

T5 is an encoder-decoder transformer: the encoder reads the input text and the decoder generates the output text, which suits both understanding and generation tasks. It was pretrained on the large C4 (Colossal Clean Crawled Corpus) with a span-corruption objective (masking and reconstructing spans of text), then fine-tuned on downstream tasks. By converting every task into the same text-to-text format, T5 unified pretraining and fine-tuning under one consistent recipe, which the original paper studied systematically across many design choices.
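The span-corruption objective can be sketched on a toy sentence. This is a simplified illustration, not the original preprocessing code: the spans are chosen by hand here, whereas real pretraining samples them at random over subword tokens. The function name `corrupt_spans` is hypothetical, and the `<extra_id_N>` sentinel names follow the convention of the Hugging Face T5 tokenizer.

```python
def corrupt_spans(tokens, spans):
    """Replace each (start, end) span with a sentinel and build the matching target.

    `spans` must be sorted, non-overlapping, half-open index pairs.
    """
    inputs, targets = [], []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])  # keep the text before the span
        inputs.append(sentinel)              # the span is replaced by one sentinel
        targets.append(sentinel)             # the target repeats the sentinel...
        targets.extend(tokens[start:end])    # ...followed by the dropped tokens
        cursor = end
    inputs.extend(tokens[cursor:])
    targets.append(f"<extra_id_{len(spans)}>")  # closing sentinel ends the target
    return " ".join(inputs), " ".join(targets)


tokens = "Thank you for inviting me to your party last week .".split()
corrupted, target = corrupt_spans(tokens, [(2, 4), (8, 9)])
print(corrupted)  # Thank you <extra_id_0> me to your party <extra_id_1> week .
print(target)     # <extra_id_0> for inviting <extra_id_1> last <extra_id_2>
```

The encoder sees the corrupted text and the decoder learns to emit only the missing spans, which keeps targets short compared with reconstructing the whole input.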

What it is good at

T5 excels at text-transformation tasks: summarisation, translation, paraphrasing, question answering, and text classification framed as generation. Its encoder-decoder design makes it especially strong for sequence-to-sequence work where input and output are both text. The instruction-tuned successor FLAN-T5 added strong zero/few-shot instruction following, which makes the family useful for many applications without task-specific training. Both T5 and FLAN-T5 come in sizes from Small to 11B.
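Classification framed as generation means the model emits the label as text, so the caller has to map that text back to a class. A minimal sketch, assuming a hypothetical two-label sentiment task (the `LABELS` list and `parse_label` helper are illustrative names, not part of any library):

```python
LABELS = ["negative", "positive"]  # hypothetical label set for this example


def parse_label(generated: str, labels=LABELS):
    """Map generated text back to a class index, or None if it is not a known label."""
    cleaned = generated.strip().lower()
    return labels.index(cleaned) if cleaned in labels else None


print(parse_label(" Positive "))  # 1
print(parse_label("maybe"))       # None
```

Returning `None` for unexpected text is a deliberate choice: a generative classifier can produce strings outside the label set, and the caller should decide how to handle them.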

Licensing & access

T5 is open source under Apache 2.0, with all sizes — Small, Base, Large, 3B and 11B — plus FLAN-T5 variants on Hugging Face and full Transformers support. The smaller sizes run on modest hardware (Small/Base even on CPU), while 3B and 11B need GPUs. Its permissive licence and wide availability have made T5 and FLAN-T5 staples for fine-tuning and production NLP across countless projects.
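A back-of-the-envelope estimate shows why the larger sizes need GPUs. The sketch below multiplies parameter count by bytes per parameter and covers weights only; activations, optimizer state and framework overhead come on top. The Base, Large and 11B counts are the figures quoted on this page; the Small (60M) and 3B counts are assumptions based on the original release and the model name.

```python
# Approximate parameter counts per size (see the assumptions in the text above).
PARAMS = {"small": 60e6, "base": 220e6, "large": 770e6, "3b": 3e9, "11b": 11e9}
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_gb(size: str, dtype: str = "float32") -> float:
    """Memory needed for the weights alone, in GB (10**9 bytes)."""
    return PARAMS[size] * BYTES_PER_PARAM[dtype] / 1e9


for size in PARAMS:
    fp32 = weight_gb(size)
    fp16 = weight_gb(size, "float16")
    print(f"{size:>5}: {fp32:6.2f} GB in float32, {fp16:6.2f} GB in float16")
```

By this estimate Base needs under 1 GB for its weights in float32, while 11B needs about 22 GB even in float16.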

Practical considerations

T5 typically benefits from fine-tuning for a specific task; for zero/few-shot use, the instruction-tuned FLAN-T5 is the better choice. Remember the task-prefix format: a fine-tuned T5 expects the same prefix it saw in training. The original T5 checkpoints were trained on inputs of up to 512 tokens, so very long documents need chunking, and as with any model, outputs should be verified. For pure free-form generation, a decoder-only model may be simpler, but for transform-style tasks T5 is excellent.
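Chunking can be sketched as a sliding window over token ids. This is one simple strategy among several, not a prescribed method; the helper name `chunk_tokens` is hypothetical, and the defaults (512-token windows, 64-token overlap) are assumptions. In practice the window should also leave room for the task prefix and the end-of-sequence token.

```python
def chunk_tokens(token_ids, max_len=512, overlap=64):
    """Split a list of token ids into windows of at most `max_len` ids.

    Consecutive windows share `overlap` ids so that text cut at a boundary
    still appears with some context in the next chunk.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    step = max_len - overlap
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # this window already reaches the end of the input
    return chunks


chunks = chunk_tokens(list(range(1000)))
print([len(c) for c in chunks])  # [512, 512, 104]
```

Each chunk is then processed separately (for example, summarised), and the partial outputs are combined in a later step.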

How it compares

BERT is an encoder-only model for understanding (no generation); GPT-Neo and BLOOM are decoder-only generators. T5's distinctive contribution is the unified text-to-text, encoder-decoder approach, which is particularly strong for summarisation, translation and other seq2seq tasks. Where BERT classifies and decoder models freely generate, T5 elegantly does both through one format — and FLAN-T5 keeps it competitive for instruction-following at efficient sizes.

Getting started

Load T5 (or, for instruction following, FLAN-T5) from Hugging Face with Transformers, format your input with the appropriate task prefix, and generate the output text. Start with Base or Large to prototype, fine-tune on your task for best results, and use FLAN-T5 for zero/few-shot use. Pick a size that matches your hardware, and chunk any long inputs into smaller pieces to respect the model's context length.
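A minimal sketch of these steps, assuming the `transformers` and `torch` packages are installed and that the `google/flan-t5-base` checkpoint can be downloaded from the Hugging Face Hub. The function name `generate_text` and its defaults are choices made for this example; the imports sit inside the function only so that defining it does not require the libraries.

```python
def generate_text(prompt, model_name="google/flan-t5-base", max_new_tokens=64):
    """Load a T5-family checkpoint and generate output text for one prompt."""
    # Imported lazily so the function can be defined without the libraries present.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # Truncate to 512 tokens (an assumed limit); chunk longer inputs beforehand.
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    # Downloads the checkpoint on first use.
    print(generate_text("translate English to German: The house is small."))
```

For repeated calls, load the tokenizer and model once and reuse them rather than reloading inside the function as this sketch does.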

Model variants

T5 Base: 220M parameters. Base, balanced.

T5 Large: 770M parameters. Larger.

FLAN-T5 XL: Instruction-tuned.

T5 11B: 11B parameters. Largest, highest capacity.

Capabilities

🔄

Text-to-text

Frames every task as text in, text out, using a task prefix.

🔁

Seq2seq strength

Encoder-decoder design excels at summarisation, translation and paraphrasing.

🎯

FLAN instruction tuning

FLAN-T5 follows instructions zero/few-shot without task-specific training.

📐

Size range

Sizes from Small to 11B suit hardware ranging from a single CPU to multiple GPUs.

Pros & Cons

Pros

Unified text-to-text framing for all NLP tasks
Strong encoder-decoder for seq2seq work
Excellent at summarisation and translation
FLAN-T5 adds instruction following
Open source (Apache 2.0), many sizes
Runs from CPU (small) to GPU (11B)

Cons

Often needs task-specific fine-tuning
Requires the right task prefix
Modest input length — chunk long docs
Base T5 is weaker at zero/few-shot use; FLAN-T5 is the better choice there

Inspiration

T5 use cases & project ideas

Summarization

Condense long text.

Translation

Translate between languages.

Question answering

Generate answers.

Classification

Label text by generating the class name.

FAQ

Frequently asked questions

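What is T5's core idea?

It frames every NLP task as text-to-text (text in, text out) using a task prefix, so one model handles many tasks.

What architecture is it?

An encoder-decoder transformer: the encoder reads the input text and the decoder generates the output text.

What is FLAN-T5?

An instruction-tuned successor that follows instructions zero/few-shot, making T5 more versatile without task-specific training.

Is it open source?

Yes, under Apache 2.0, with sizes from Small to 11B (plus FLAN-T5) on Hugging Face.

What is it best at?

Sequence-to-sequence tasks like summarisation, translation and paraphrasing, plus QA and classification framed as text.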
