Claude Fable 5 is Anthropic's most capable widely released model, built for the hardest reasoning and long-horizon agentic work. It behaves a little differently from the Opus-tier models, so this 2026 guide walks through what is new, how to call it in Python, and when it is worth its higher price.
What Is Claude Fable 5?
Claude Fable 5 (model id claude-fable-5) sits at the top of Anthropic's lineup. It is aimed at demanding tasks: deep multi-step reasoning, complex coding, and agents that run for a long time. You reach it through the same Messages API as the rest of the Claude family, so if you have used Claude before, the call shape is familiar — with a few important differences below.
What Is New Compared to Opus-Tier Models
- Thinking is always on. You do not send a thinking parameter; the model reasons internally by default. You control how hard it thinks with an effort level instead of a token budget.
- 1M token context. The context window is one million tokens, which is both the maximum and the default, with up to 128K tokens of output.
- A new tokenizer. The same text counts as roughly 30% more tokens than on Opus-tier models, so re-measure rather than reusing old token counts.
- A refusal stop reason. Safety classifiers can decline a request, returning stop_reason: "refusal" instead of normal content, so check the stop reason before reading the reply.
- No assistant prefill and no sampling knobs like temperature. Those return an error on this model.
Prerequisites
- Python 3.10 or newer and an Anthropic API key.
- An organization with at least 30-day data retention — Fable 5 is not available under zero data retention.
Step 1: Install the SDK and Set Your Key
pip install anthropic
Store your key in an environment variable so it never lives in your code:
export ANTHROPIC_API_KEY="your-key-here"
Step 2: Make Your First Call
Create fable_demo.py. Notice there is no thinking parameter — it is always on, so you simply omit it:
import os

from anthropic import Anthropic

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

message = client.messages.create(
    model="claude-fable-5",
    max_tokens=4096,
    messages=[
        {"role": "user", "content": "Plan a safe migration from SQLite to Postgres."},
    ],
)
print(message.content[-1].text)
The final answer is the last text block, so read message.content[-1].text. Because thinking is always on, the response may contain thinking blocks before that text block, and their reasoning content is omitted by default.
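Indexing with [-1] assumes the last block is a text block. A slightly more defensive reader can walk the blocks from the end and pick the first one whose type is "text". The sketch below is one way to do that; the helper name final_text is hypothetical, and the SimpleNamespace objects are stand-ins for real response blocks so the example runs without an API call.

```python
from types import SimpleNamespace


def final_text(content):
    """Return the text of the last block whose type is "text", or None."""
    for block in reversed(content):
        # Thinking blocks and any other non-text blocks are skipped.
        if getattr(block, "type", None) == "text":
            return block.text
    return None


# Stand-in blocks for illustration; a real response supplies message.content.
blocks = [
    SimpleNamespace(type="thinking", thinking=""),
    SimpleNamespace(type="text", text="Step 1: take a backup."),
]
print(final_text(blocks))
```

With a real response you would call final_text(message.content) and treat a None result as "no text to show".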
Step 3: Control How Hard It Thinks
Instead of a thinking budget, you set an effort level — from low through high, xhigh, and max. Use a lower level for routine work and a higher one for genuinely hard problems:
message = client.messages.create(
    model="claude-fable-5",
    max_tokens=8192,
    output_config={"effort": "high"},
    messages=[
        {"role": "user", "content": "Design a token-bucket rate limiter and explain the trade-offs."},
    ],
)
print(message.content[-1].text)
Higher effort means deeper reasoning and longer responses, so match it to the task rather than always reaching for the maximum.
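If several parts of your app pick an effort level, it can help to validate the value in one place before the request goes out. The following is a minimal sketch, not an official helper: build_request is a hypothetical name, and the level list contains the levels named above plus "medium", which is an assumption read from "low through high".

```python
# Levels named in this guide, plus "medium", which is an assumption.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")


def build_request(prompt, effort="high", max_tokens=8192):
    """Build keyword arguments for client.messages.create with a checked effort level."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    return {
        "model": "claude-fable-5",
        "max_tokens": max_tokens,
        "output_config": {"effort": effort},
        "messages": [{"role": "user", "content": prompt}],
    }


# Routine work gets a low level; nothing is sent to the API here.
request = build_request("Rename this variable consistently.", effort="low")
print(request["output_config"])
```

The returned dictionary can be passed on as client.messages.create(**request).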
Step 4: Always Check the Stop Reason
Fable 5 can decline a request with a refusal stop reason. Check it before you read the content so your app handles that case cleanly:
message = client.messages.create(
    model="claude-fable-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize this contract."}],
)
if message.stop_reason == "refusal":
    print("The model declined this request.")
else:
    print(message.content[-1].text)
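The same check can be extended to other stop reasons your app cares about. The sketch below classifies a response before any content is read. The function name reply_status and its return labels are hypothetical; "max_tokens" is the Messages API stop reason for output that hit the max_tokens limit, and the SimpleNamespace object stands in for a real response.

```python
from types import SimpleNamespace


def reply_status(message):
    """Classify a response by its stop_reason before any content is read."""
    if message.stop_reason == "refusal":
        return "refused"    # no usable content to read
    if message.stop_reason == "max_tokens":
        return "truncated"  # output stopped at the max_tokens limit
    return "complete"


# Stand-in object for illustration; a real call returns the message.
declined = SimpleNamespace(stop_reason="refusal", content=[])
print(reply_status(declined))
```

Branching on a small label like this keeps the refusal path and the truncation path out of the code that renders the reply.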
Step 5: Stream Long Responses
Hard tasks can run for a while, so stream the output to show progress and avoid request timeouts:
with client.messages.stream(
    model="claude-fable-5",
    max_tokens=4096,
    messages=[{"role": "user", "content": "Write a short essay on why clean code matters."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
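A refusal can also arrive partway through a stream, after some text has already been emitted. One cautious option is to buffer the chunks and release them only once the final stop reason is known. The sketch below assumes the stream object exposes text_stream and get_final_message(), as the SDK's streaming helper does; collect_stream and FakeStream are hypothetical names, and FakeStream exists only so the example runs offline.

```python
from types import SimpleNamespace


def collect_stream(stream):
    """Buffer streamed text and return it only if the model did not refuse."""
    parts = []
    for text in stream.text_stream:
        parts.append(text)
    final = stream.get_final_message()
    if final.stop_reason == "refusal":
        return None  # discard the partial output
    return "".join(parts)


class FakeStream:
    """Hypothetical stand-in that mimics the two stream members used above."""

    def __init__(self, chunks, stop_reason):
        self.text_stream = iter(chunks)
        self._stop_reason = stop_reason

    def get_final_message(self):
        return SimpleNamespace(stop_reason=self._stop_reason)


print(collect_stream(FakeStream(["Clean ", "code."], "end_turn")))
print(collect_stream(FakeStream(["Partial"], "refusal")))
```

Buffering gives up the live progress display, so it suits back-end jobs better than chat interfaces. An interactive app could instead print as it goes and clear the output if the final stop reason turns out to be a refusal.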
A Note on Token Counts
Because the tokenizer changed, do not reuse token counts or max_tokens settings measured on other Claude models. Re-baseline with the token-counting endpoint before you size requests:
count = client.messages.count_tokens(
    model="claude-fable-5",
    messages=[{"role": "user", "content": "How many tokens is this?"}],
)
print(count.input_tokens)
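Before you have real measurements, the "roughly 30% more" figure can serve as a rough planning estimate. The sketch below scales an old token count and checks it against the one-million-token window. The helper names are hypothetical, and the 30% factor is only the approximation quoted in this guide, so treat the result as a prompt to re-measure rather than a substitute for count_tokens.

```python
CONTEXT_WINDOW = 1_000_000  # tokens, as stated in this guide
OVERHEAD_PERCENT = 30       # "roughly 30% more"; a planning figure, not a measurement


def estimate_fable_tokens(old_count):
    """Scale a token count measured on an older model, rounding up."""
    scaled = old_count * (100 + OVERHEAD_PERCENT)
    return -(-scaled // 100)  # ceiling division on integers


def fits_context(old_count):
    """True if the estimated count still fits the context window."""
    return estimate_fable_tokens(old_count) <= CONTEXT_WINDOW


for old in (100_000, 800_000):
    print(old, estimate_fable_tokens(old), fits_context(old))
```

Under this estimate, a prompt that measured 800,000 tokens elsewhere would come out at about 1,040,000 and no longer fit, which is exactly the kind of case worth checking with the token-counting endpoint.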
Pricing and When to Use Fable 5
Claude Fable 5 costs about $10 per million input tokens and $50 per million output tokens, which is higher than the Opus tier. That makes it the right tool for the hardest reasoning, the most complex coding, and long-running agents — work where quality clearly justifies the cost. For everyday tasks, Opus 4.8 (around $5 input and $25 output) or the cheaper Sonnet and Haiku models are usually the better value. A common pattern is to draft on a smaller model and reserve Fable 5 for the parts that truly need it.
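To see what the price difference means for a given workload, you can turn the per-million rates into a small calculator. This is a sketch using the approximate prices quoted above; estimate_cost is a hypothetical helper, and the "opus-4.8" key is just a label for the table, not a confirmed model id.

```python
# Prices in USD per million tokens (input, output), as quoted in this guide.
PRICES = {
    "claude-fable-5": (10.0, 50.0),
    "opus-4.8": (5.0, 25.0),  # label only, not a confirmed model id
}


def estimate_cost(model, input_tokens, output_tokens):
    """Return the approximate USD cost of one request."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Example workload: 200,000 input tokens and 20,000 output tokens.
for name in PRICES:
    print(name, estimate_cost(name, 200_000, 20_000))
```

Keep in mind that the same text produces more tokens on Fable 5 because of the new tokenizer, so a fair comparison uses token counts measured separately for each model.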
Common Mistakes to Avoid
- Sending a thinking parameter. It is always on; an explicit thinking config returns an error. Omit it.
- Passing temperature or other sampling knobs. They are not supported on this model and will error.
- Reusing old token math. The new tokenizer changes counts, so re-measure with count_tokens.
- Ignoring the stop reason. Check for a refusal before reading content, and discard any partial output on a mid-stream refusal.
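When migrating existing code, the first two mistakes can be caught mechanically before a request is sent. The sketch below strips the rejected parameters from a dictionary of request arguments and flags an assistant prefill. adapt_request is a hypothetical helper, and treating top_p and top_k as "sampling knobs" alongside temperature is an assumption.

```python
# Parameters this guide says are rejected; top_p and top_k are assumed
# to count as sampling knobs alongside temperature.
UNSUPPORTED = ("thinking", "temperature", "top_p", "top_k")


def adapt_request(params):
    """Return a cleaned copy of request kwargs for claude-fable-5 and the dropped keys."""
    messages = params.get("messages", [])
    if messages and messages[-1].get("role") == "assistant":
        raise ValueError("assistant prefill is not supported on this model")
    cleaned = {k: v for k, v in params.items() if k not in UNSUPPORTED}
    cleaned["model"] = "claude-fable-5"
    dropped = sorted(k for k in params if k in UNSUPPORTED)
    return cleaned, dropped


# A request written for an older model; nothing is sent to the API here.
legacy = {
    "model": "some-older-model",
    "max_tokens": 1024,
    "temperature": 0.2,
    "thinking": {"type": "enabled", "budget_tokens": 2000},
    "messages": [{"role": "user", "content": "Summarize this contract."}],
}
cleaned, dropped = adapt_request(legacy)
print(dropped)
print(sorted(cleaned))
```

Logging the dropped keys makes it easier to spot call sites that still rely on a thinking budget or a temperature setting.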
Frequently Asked Questions
How is Fable 5 different from Opus 4.8?
Fable 5 is more capable on the hardest tasks but costs more. It also has always-on thinking controlled by effort levels, a 1M-token context window, and a new tokenizer. Opus 4.8 is cheaper and well suited to most work.
Do I need to turn thinking on?
No. Thinking is on by default; you only choose how much effort to spend. Sending a thinking parameter returns an error.
Why did my request return no content?
Check stop_reason. A value of "refusal" means a safety classifier declined the request, so there is no usable content to read.
Can I use the same code as my other Claude models?
Mostly. Use the model id claude-fable-5, drop the thinking and temperature parameters, set an effort level if needed, and re-measure your token counts.
Wrapping Up
Claude Fable 5 brings Anthropic's strongest reasoning to the same Messages API you already know, with a few key changes: always-on thinking with effort levels, a 1M context window, a new tokenizer, and a refusal stop reason to handle. Reach for it on the hard problems, lean on cheaper models for the rest, and you will get the most from it without overspending.



