T5
Open Source | Language Model | by Google Research

T5 (Text-to-Text Transfer Transformer) is Google's influential encoder-decoder model that frames every NLP task as text-to-text. From translation to summarization to classification, one unified format handles them all.

Tags: google-research, nlp, open-source-ai, seq2seq, text-to-text, transformer
Quick facts
License: Apache 2.0 (open source)
Type: Encoder-decoder
Approach: Text-to-text
Sizes: Small to 11B, plus FLAN-T5
By: Google Research

What is T5?

T5, the Text-to-Text Transfer Transformer, is a highly influential model from Google Research that introduced a strikingly simple idea: cast every NLP task as a text-to-text problem. Whether the task is translation, summarisation, question answering, or classification, T5 takes text in and produces text out — you just prefix the input with a task description (like 'summarize:' or 'translate English to German:'). This unified framing meant one architecture, one training objective, and one set of techniques could handle the entire breadth of language tasks, making T5 a landmark in NLP.
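
As a rough illustration of the text-to-text framing, the sketch below represents three different tasks as plain (input, output) string pairs. The `TextToTextExample` class and the `as_text_to_text` helper are hypothetical names introduced here, the prefixes follow the style described above, and the example sentences are illustrative rather than drawn from any dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextToTextExample:
    source: str  # what the encoder reads
    target: str  # what the decoder is trained to produce


def as_text_to_text(task_prefix: str, text: str, target: str) -> TextToTextExample:
    """Prepend the task prefix so every task becomes string -> string."""
    return TextToTextExample(f"{task_prefix.strip()} {text.strip()}", target.strip())


# Three different tasks, one shared format (illustrative strings only).
examples = [
    as_text_to_text("translate English to German:", "That is good.", "Das ist gut."),
    as_text_to_text("summarize:", "The meeting ran long because ...", "Meeting overran."),
    as_text_to_text("cola sentence:", "The course is jumping well.", "not acceptable"),
]

for ex in examples:
    print(f"{ex.source!r} -> {ex.target!r}")
```

Note that even the classification label ("not acceptable") is emitted as ordinary text, which is what lets a single model and loss cover all three tasks.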

How it works

T5 is an encoder-decoder transformer: the encoder reads the input text and the decoder generates the output text, which suits both understanding and generation tasks. It was pretrained on the large C4 (Colossal Clean Crawled Corpus) with a span-corruption objective (masking and reconstructing spans of text), then fine-tuned on downstream tasks. By converting every task into the same text-to-text format, T5 unified pretraining and fine-tuning under one consistent recipe, which the original paper studied systematically across many design choices.
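
To make the span-corruption objective more concrete, here is a minimal toy sketch. It assumes whitespace-separated words stand in for real subword tokens and that the spans to mask are supplied by hand (actual pretraining samples them randomly). The `span_corrupt` function is a hypothetical helper; the `<extra_id_N>` sentinel naming follows the convention used by the T5 tokenizer in Hugging Face Transformers.

```python
def span_corrupt(tokens, spans):
    """Replace each (start, end) span with a sentinel token.

    Returns (input_text, target_text): the input keeps the unmasked tokens
    with one sentinel per dropped span, and the target lists each sentinel
    followed by the tokens it replaced, closed by a final sentinel.
    """
    inputs, targets = [], []
    cursor = 0
    for i, (start, end) in enumerate(sorted(spans)):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])  # keep text before the span
        inputs.append(sentinel)              # mark where the span was
        targets.append(sentinel)
        targets.extend(tokens[start:end])    # the model must reconstruct this
        cursor = end
    inputs.extend(tokens[cursor:])           # keep text after the last span
    targets.append(f"<extra_id_{len(spans)}>")  # closing sentinel
    return " ".join(inputs), " ".join(targets)


tokens = "Thank you for inviting me to your party last week.".split()
corrupted, target = span_corrupt(tokens, [(2, 4), (8, 9)])
print(corrupted)  # Thank you <extra_id_0> me to your party <extra_id_1> week.
print(target)     # <extra_id_0> for inviting <extra_id_1> last <extra_id_2>
```

The encoder sees the corrupted input and the decoder is trained to produce the target, so pretraining already has the same text-in, text-out shape as the downstream tasks.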

What it is good at

T5 excels at text-transformation tasks: summarisation, translation, paraphrasing, question answering, and text classification framed as generation. Its encoder-decoder design makes it especially strong for sequence-to-sequence work where input and output are both text. The instruction-tuned successor FLAN-T5 added strong zero-shot and few-shot instruction following, making the family versatile for many applications without task-specific training. Both T5 and FLAN-T5 span sizes from Small to 11B.

Licensing & access

T5 is open source under Apache 2.0, with all sizes — Small, Base, Large, 3B and 11B — plus FLAN-T5 variants on Hugging Face and full Transformers support. The smaller sizes run on modest hardware (Small/Base even on CPU), while 3B and 11B need GPUs. Its permissive licence and wide availability have made T5 and FLAN-T5 staples for fine-tuning and production NLP across countless projects.

Practical considerations

T5 typically benefits from fine-tuning for a specific task; the instruction-tuned FLAN-T5 is the better choice for zero-shot or few-shot use. Remember the task-prefix format: the model expects the prefix it was trained with, so providing the right one matters. The original T5 was pretrained on sequences of up to 512 tokens, so very long documents need chunking, and as with any model, outputs should be verified. For pure free-form generation, a decoder-only model may be simpler, but for transform-style tasks T5 is excellent.
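
One simple way to chunk a long document is a sliding window with some overlap, sketched below. This is only an approximation: it counts whitespace-separated words rather than real tokenizer tokens, and the `chunk_words` name and the default sizes are assumptions chosen for illustration, not recommended settings.

```python
def chunk_words(text, max_words=350, overlap=50):
    """Split text into overlapping windows of at most max_words words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


demo = " ".join(str(i) for i in range(10))
print(chunk_words(demo, max_words=4, overlap=1))
# ['0 1 2 3', '3 4 5 6', '6 7 8 9']
```

Each chunk can then be prefixed and processed separately. Because a word usually maps to more than one subword token, the word budget should stay well below the model's token limit, or the chunks should be measured with the actual tokenizer instead.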

How it compares

BERT is an encoder-only model for understanding (no generation); GPT-Neo and BLOOM are decoder-only generators. T5's distinctive contribution is the unified text-to-text, encoder-decoder approach, which is particularly strong for summarisation, translation and other seq2seq tasks. Where BERT classifies and decoder models freely generate, T5 elegantly does both through one format — and FLAN-T5 keeps it competitive for instruction-following at efficient sizes.

Getting started

Load T5 (or, for instruction following, FLAN-T5) from Hugging Face with Transformers, format your input with the appropriate task prefix, and generate the output text. Start with Base or Large to prototype, fine-tune on your task for best results, and use FLAN-T5 for zero/few-shot work. Pick a size that matches your hardware, and chunk any long inputs into smaller pieces to respect the model's context length.
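
The steps above might look roughly like the following sketch. It assumes the `transformers` and `torch` packages are installed and that the checkpoint IDs `google/flan-t5-base` and `google-t5/t5-base` are available on the Hugging Face Hub. The helper names `pick_checkpoint` and `run_t5` are hypothetical, and the generation settings are placeholders rather than tuned values.

```python
def pick_checkpoint(zero_shot: bool) -> str:
    """Choose FLAN-T5 for zero/few-shot use, plain T5 as a fine-tuning base."""
    return "google/flan-t5-base" if zero_shot else "google-t5/t5-base"


def run_t5(prompt: str, model_name: str, max_new_tokens: int = 64) -> str:
    """Load a checkpoint and generate output text for one prefixed prompt."""
    # Imported lazily so the helper above works without the heavy dependencies.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # Truncate to 512 tokens; longer inputs should be chunked beforehand.
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    # Downloads the model on first run.
    checkpoint = pick_checkpoint(zero_shot=True)
    print(run_t5("translate English to German: That is good.", checkpoint))
```

Swapping in a larger checkpoint is only a change of `model_name`; the prompt format and the generate call stay the same.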

Model variants

T5 Base (220M, Base): Balanced. Most popular.
T5 Large (770M, Larger)
FLAN-T5 XL (3B, Instruction): Instruction-tuned. Most popular.
T5 11B (11B, Largest): Highest capacity.

Capabilities

Text-to-text: Frames every task as text in, text out, using a task prefix.
Seq2seq strength: Encoder-decoder design excels at summarisation, translation and paraphrasing.
FLAN instruction tuning: FLAN-T5 follows instructions zero/few-shot without task-specific training.
Size range: Sizes from Small to 11B fit hardware ranging from a CPU to multiple GPUs.

Pros & Cons

Pros (6)
  • Unified text-to-text framing for all NLP tasks
  • Strong encoder-decoder for seq2seq work
  • Excellent at summarisation and translation
  • FLAN-T5 adds instruction following
  • Open source (Apache 2.0), many sizes
  • Runs from CPU (small) to GPU (11B)
Cons (4)
  • Often needs task-specific fine-tuning
  • Requires the right task prefix
  • Modest input length — chunk long docs
  • Original T5 is weak at zero/few-shot use; FLAN-T5 is the better choice there

Inspiration

T5 use cases & project ideas

Summarization: Condense long text.
Translation: Translate between languages.
Question answering: Generate answers.
Classification: Label text by generating the class name.

FAQ

Frequently asked questions

It frames every NLP task as text-to-text — text in, text out — using a task prefix, so one model handles many tasks.
