What is Llama 2?
Llama 2 is Meta's landmark open large language model, whose 2023 release reshaped the AI landscape. Available in 7B, 13B and 70B parameter sizes — each with a base model and a chat-tuned Llama-2-Chat variant — it combined strong, near-leading quality with a broadly permissive licence that allowed commercial use. That combination was transformative: Llama 2 became the foundation for thousands of fine-tuned models and applications, effectively launching the modern open-LLM ecosystem that everything since has built upon.
How it was built
Llama 2 is a decoder-only transformer trained on roughly 2 trillion tokens of public text, with the chat variants further refined through supervised fine-tuning and reinforcement learning from human feedback (RLHF) for helpfulness and safety. Meta invested heavily in alignment and red-teaming, publishing detailed work on safety. The result was a family that delivered competitive performance — the 70B in particular rivalled much larger or closed models of its day — across a 4K-token context.
What it is good at
Llama 2 is a strong general-purpose model: conversational chat, question answering, summarisation, reasoning, writing and some coding, with the Chat variants tuned for assistant behaviour. Its real superpower, though, is as a foundation to build on — its open weights and permissive licence made it the base for countless domain models, fine-tunes and products, and it runs across a wide range of hardware thanks to the 7B/13B/70B size ladder.
Licensing & access
Llama 2 is released under the Llama 2 Community License — permissive for the vast majority of users and commercial use, with a special-case clause for services exceeding 700M monthly active users (review the terms). Weights are on Hugging Face (with a quick access request), run locally via Ollama and Transformers, and are offered by many inference providers. The 7B runs on consumer GPUs; the 70B needs multi-GPU or quantisation.
Practical considerations
By today's standards Llama 2 is superseded by Llama 3 and other newer models on reasoning, knowledge and especially context length (its 4K window is short by current norms). Use the Chat variant for assistants and the base for fine-tuning, and verify outputs as it can hallucinate. For new projects, Llama 3.x or another current model is usually the better default — but Llama 2 remains historically pivotal and still widely deployed.
How it compares
Llama 2 competed with Falcon, MPT and BLOOM at release, generally leading on quality while offering a workable commercial licence; Vicuna and many others were fine-tuned from it. Mixtral and later models pushed efficiency further with mixture-of-experts. Llama 2's defining role is as the catalyst of the open-LLM movement — the base that proved open models could be both capable and commercially usable, paving the way for its successors.
Getting started
Pull Llama 2 from Hugging Face (after the access request) or via Ollama for instant local chat — start with Llama-2-7b-chat to prototype, or 70B-chat for more quality. Use the Chat variant for assistants and the base for fine-tuning, run quantised builds to fit your GPU, and for the strongest results today consider benchmarking against Llama 3.x or other newer open models.


