What is T5?
T5, the Text-to-Text Transfer Transformer, is a highly influential model from Google Research that introduced a strikingly simple idea: cast every NLP task as a text-to-text problem. Whether the task is translation, summarisation, question answering, or classification, T5 takes text in and produces text out. You prefix the input with a short task description (such as 'summarize:' or 'translate English to German:'), and for classification the model simply generates the label as a word, such as 'positive' or 'negative'. This unified framing meant one architecture, one loss function, and one set of hyperparameters could handle a wide range of language tasks, making T5 a landmark in NLP.
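The text-to-text framing can be sketched in a few lines: every example, whatever its task, becomes a pair of plain strings. This is an illustration under stated assumptions, not T5's preprocessing code. The task keys (`summarize`, `translate_en_de`, `sentiment`) and the helper `to_text_to_text` are hypothetical names; the first two prefixes are the ones quoted above, and `sst2 sentence: ` imitates the style of the GLUE prefixes in the T5 paper.

```python
from typing import Dict, List, Tuple

# Illustrative prefix table (an assumption, not an official list). The task keys
# are hypothetical names chosen for this sketch.
TASK_PREFIXES: Dict[str, str] = {
    "summarize": "summarize: ",
    "translate_en_de": "translate English to German: ",
    "sentiment": "sst2 sentence: ",
}


def to_text_to_text(task: str, source: str, target: str) -> Tuple[str, str]:
    """Cast one example into an (input text, output text) pair."""
    prefix = TASK_PREFIXES[task]  # raises KeyError for an unknown task name
    return prefix + source, target


# Translation and classification end up in exactly the same shape:
# both the model input and the training target are plain strings.
pairs: List[Tuple[str, str]] = [
    to_text_to_text("translate_en_de", "That is good.", "Das ist gut."),
    to_text_to_text("sentiment", "a gripping, beautifully acted film.", "positive"),
]
for model_input, model_target in pairs:
    print(f"{model_input!r} -> {model_target!r}")
```

Because the target for classification is just the string "positive", the same decoder and the same loss serve every task.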
How it works
T5 is an encoder-decoder transformer: the encoder reads the input text and the decoder generates the output text, which suits both understanding and generation tasks. It was pretrained on C4 (the Colossal Clean Crawled Corpus, a large cleaned web crawl) with a span-corruption objective: contiguous spans of the input are replaced by sentinel tokens, and the decoder learns to reconstruct the missing spans. The model is then fine-tuned on downstream tasks. Because every task uses the same text-to-text format, pretraining and fine-tuning share one consistent recipe, which let the original paper compare architectures, objectives, datasets and transfer strategies systematically.
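The span-corruption objective can be sketched on whitespace-separated words. This is a simplified illustration, not T5's actual preprocessing: real pretraining operates on SentencePiece tokens and samples the spans at random, whereas here the spans are passed in explicitly so the result is reproducible. The sentinel names `<extra_id_0>`, `<extra_id_1>`, ... follow the convention of the Hugging Face T5 tokenizer, and `span_corrupt` is a hypothetical helper. The example sentence is the one used to illustrate the objective in the T5 paper.

```python
from typing import List, Tuple


def span_corrupt(tokens: List[str],
                 spans: List[Tuple[int, int]]) -> Tuple[List[str], List[str]]:
    """Replace each (start, length) span with a sentinel; return (inputs, targets).

    Spans must be sorted by start position and must not overlap.
    """
    inputs: List[str] = []
    targets: List[str] = []
    pos = 0
    for i, (start, length) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[pos:start])  # keep the uncorrupted text
        inputs.append(sentinel)           # one sentinel stands in for the whole span
        targets.append(sentinel)          # the target announces which span follows
        targets.extend(tokens[start:start + length])  # the dropped words
        pos = start + length
    inputs.extend(tokens[pos:])
    targets.append(f"<extra_id_{len(spans)}>")  # final sentinel ends the target
    return inputs, targets


words = "Thank you for inviting me to your party last week .".split()
inp, tgt = span_corrupt(words, [(2, 2), (8, 1)])  # drop "for inviting" and "last"
print(" ".join(inp))
print(" ".join(tgt))
```

The encoder sees the corrupted sentence and the decoder is trained to emit only the missing pieces, which keeps targets short and pretraining cheap.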
What it is good at
T5 excels at text-transformation tasks: summarisation, translation, paraphrasing, question answering, and text classification framed as generation. Its encoder-decoder design makes it especially strong for sequence-to-sequence work where both input and output are text. Its instruction-tuned successor, FLAN-T5, is much better at following natural-language instructions in zero- and few-shot settings, so the family can handle many applications without task-specific training. FLAN-T5 comes in the same range of sizes, from Small up to XXL (11B parameters).
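Framing classification as generation means the generated string has to be mapped back to a label before it can be scored. A minimal sketch, assuming a fixed label set; `parse_label` and `accuracy` are hypothetical helpers, and the normalisation rule (strip whitespace, ignore case) is a choice made here. The T5 paper reports counting outputs that match no valid label as wrong, which is what this scoring does.

```python
from typing import Optional, Sequence


def parse_label(generated: str, labels: Sequence[str]) -> Optional[str]:
    """Map generated text back to a class label, or None if it matches none."""
    cleaned = generated.strip().lower()  # tolerate stray whitespace and casing
    for label in labels:
        if cleaned == label.lower():
            return label
    return None  # the model produced something that is not a valid label


def accuracy(outputs: Sequence[str], gold: Sequence[str],
             labels: Sequence[str]) -> float:
    """Score generated outputs; unparseable outputs count as wrong."""
    correct = sum(parse_label(o, labels) == g for o, g in zip(outputs, gold))
    return correct / len(gold)


LABELS = ("positive", "negative")
# "banana" is not a label, so the second prediction is scored as wrong.
print(accuracy(["Positive", "banana", "negative"],
               ["positive", "negative", "negative"], LABELS))
```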
Licensing & access
T5 is open source under Apache 2.0. All five original sizes (Small, Base, Large, 3B and 11B) and the FLAN-T5 variants are available on the Hugging Face Hub with full Transformers support. The smaller sizes run on modest hardware (Small and Base even on CPU), while 3B and 11B need GPUs. The permissive licence and wide availability have made T5 and FLAN-T5 staples for fine-tuning and production NLP in a great many projects.
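The CPU/GPU split follows from simple arithmetic: weight memory is roughly parameter count times bytes per parameter. For example, T5-Base at about 220M parameters needs about 220M × 4 bytes ≈ 0.88 GB in fp32, while T5-11B needs about 44 GB in fp32 (22 GB in bf16). A rough sketch, assuming the approximate parameter counts commonly quoted for the original checkpoints and a hypothetical 1.5x headroom factor for activations and overhead; `weight_memory_gb` and `largest_fitting` are illustrative helpers, not part of any library.

```python
from typing import Dict, Optional

# Approximate parameter counts commonly quoted for the original T5 checkpoints,
# keyed by their Hugging Face Hub names. Rounded values, for estimates only.
T5_PARAMS: Dict[str, float] = {
    "t5-small": 60e6,
    "t5-base": 220e6,
    "t5-large": 770e6,
    "t5-3b": 3e9,
    "t5-11b": 11e9,
}
# Bytes needed to store one parameter in each numeric format.
BYTES_PER_PARAM: Dict[str, int] = {"fp32": 4, "fp16": 2, "bf16": 2}


def weight_memory_gb(name: str, dtype: str = "fp32") -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return T5_PARAMS[name] * BYTES_PER_PARAM[dtype] / 1e9


def largest_fitting(budget_gb: float, dtype: str = "fp32",
                    headroom: float = 1.5) -> Optional[str]:
    """Largest checkpoint whose weights, times a safety factor, fit the budget.

    `headroom` is a hypothetical allowance for inference overhead; fine-tuning
    needs far more memory (gradients and optimizer state).
    """
    best: Optional[str] = None
    for name in T5_PARAMS:  # dicts keep insertion order: smallest model first
        if weight_memory_gb(name, dtype) * headroom <= budget_gb:
            best = name
    return best


for name in T5_PARAMS:
    print(f"{name}: {weight_memory_gb(name):.2f} GB fp32, "
          f"{weight_memory_gb(name, 'bf16'):.2f} GB bf16")
```

With these assumptions, an 8 GB budget fits `t5-large` in fp32, and a 16 GB budget fits `t5-3b` in bf16.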
Practical considerations
T5 typically benefits from fine-tuning for a specific task; the instruction-tuned FLAN-T5 is the better choice for zero- or few-shot use. Remember the task-prefix format, since using the right prefix matters. The original T5 was pretrained on 512-token inputs, so very long documents need chunking, and as with any model, outputs should be verified. For pure free-form generation a decoder-only model may be simpler, but for transformation-style tasks T5 is excellent.
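Chunking on token IDs rather than characters guarantees that every piece fits. A minimal sketch, assuming a 512-token budget (matching T5's pretraining input length) and a 32-token overlap so text cut at a boundary appears in both neighbouring chunks; both numbers are tunable assumptions, and `chunk_ids` is a hypothetical helper. Each chunk repeats the task prefix, because the model needs it on every input.

```python
from typing import List


def chunk_ids(token_ids: List[int], prefix_ids: List[int],
              max_len: int = 512, overlap: int = 32) -> List[List[int]]:
    """Split token_ids into overlapping windows, each starting with prefix_ids."""
    body = max_len - len(prefix_ids)  # room left for document tokens per chunk
    if body <= overlap:
        raise ValueError("max_len leaves no room beyond the prefix and overlap")
    step = body - overlap  # consecutive chunks share `overlap` tokens
    chunks: List[List[int]] = []
    start = 0
    while True:
        chunks.append(prefix_ids + token_ids[start:start + body])
        if start + body >= len(token_ids):  # this window reached the end
            break
        start += step
    return chunks


# Toy data: 1,000 fake token IDs and a 2-token fake prefix.
doc = list(range(1000))
chunks = chunk_ids(doc, prefix_ids=[-1, -2])
print(len(chunks), [len(c) for c in chunks])
```

In practice `token_ids` and `prefix_ids` would come from the model's tokenizer, encoded without special tokens. T5's tokenizer appends an end-of-sequence token, so lower `max_len` by one to leave room for it.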
How it compares
BERT is an encoder-only model for understanding (no generation); GPT-Neo and BLOOM are decoder-only generators. T5's distinctive contribution is the unified text-to-text, encoder-decoder approach, which is particularly strong for summarisation, translation and other seq2seq tasks. Where BERT classifies and decoder-only models generate freely, T5 handles both through one format, and FLAN-T5 keeps the family competitive for instruction following at efficient sizes.
Getting started
Load T5 (or, for instruction following, FLAN-T5) from Hugging Face with Transformers, format your input with the appropriate task prefix, and generate the output text. Start with Base or Large to prototype, fine-tune on your task for best results, and use FLAN-T5 for zero- or few-shot use. Pick a size that matches your hardware, and chunk long inputs into smaller pieces to respect the model's context length.
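A minimal end-to-end sketch with Hugging Face Transformers, assuming `transformers` and PyTorch are installed and the checkpoint can be downloaded. `google/flan-t5-base` is the Hub ID of FLAN-T5 Base; swap in `t5-base` or another size to match your hardware. `build_inputs` and `t5_generate` are hypothetical helper names, while `AutoTokenizer`, `AutoModelForSeq2SeqLM`, `generate` and `batch_decode` are standard Transformers calls.

```python
from typing import List


def build_inputs(texts: List[str], prefix: str) -> List[str]:
    """Prepend the task prefix to every input string."""
    return [prefix + t for t in texts]


def t5_generate(texts: List[str], prefix: str = "summarize: ",
                model_name: str = "google/flan-t5-base",
                max_new_tokens: int = 64) -> List[str]:
    """Run a T5-family checkpoint on a batch of texts (needs transformers + torch)."""
    # Imported lazily so build_inputs works even without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    # Pad to a common length; truncate anything past 512 tokens.
    batch = tokenizer(build_inputs(texts, prefix), return_tensors="pt",
                      padding=True, truncation=True, max_length=512)
    output_ids = model.generate(**batch, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

For example, `t5_generate(["That is good."], prefix="translate English to German: ")` returns a list with one decoded string. The function reloads the model on every call for simplicity; a real application would load it once and reuse it. Because `truncation=True` silently drops tokens beyond 512, long documents should be chunked before being passed in.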


