Claude is one of the most capable AI models for writing, coding, and careful reasoning, and its API is straightforward to use from Python. This tutorial gets you from an empty file to streaming responses and step-by-step thinking, with current 2026 model names and the official Anthropic SDK.
You will learn to authenticate, send a message, steer behavior with a system prompt, hold a multi-turn conversation, stream output, and turn on extended thinking for harder tasks.
What Is the Claude API?
The Claude API, from Anthropic, gives you programmatic access to the Claude family of models through a single endpoint. You send a list of messages and get a reply back. The official anthropic Python library wraps all of this, so you write clean Python instead of raw HTTP requests.
Step 1: Get a Key and Install the SDK
Sign up at the Anthropic console, create an API key, and keep it private. Then install the SDK:
pip install anthropic
Step 2: Send Your First Message
Create claude_demo.py. Load the key from an environment variable and call the model:
import os
from anthropic import Anthropic
client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain what an API is in one sentence."},
    ],
)
print(message.content[0].text)
Two things to note. The max_tokens value caps the length of the reply, and the response text lives at message.content[0].text. The model id claude-opus-4-8 selects the most capable model in 2026 — more on choosing models below.
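Both points can be checked in code. The following is a minimal sketch, assuming the response exposes stop_reason and usage alongside content, as the SDK's message object does. summarize_response is a hypothetical helper name, and the stand-in object only mimics the response shape so the snippet runs without a network call.

```python
from types import SimpleNamespace


def summarize_response(message):
    """Return the reply text plus a flag showing whether max_tokens cut it off."""
    # Join every text block; non-text blocks (for example thinking) are skipped.
    text = "".join(b.text for b in message.content if b.type == "text")
    return {
        "text": text,
        # stop_reason is "max_tokens" when the reply hit the cap.
        "truncated": message.stop_reason == "max_tokens",
        "output_tokens": message.usage.output_tokens,
    }


# Stand-in shaped like an SDK response, for illustration only.
fake = SimpleNamespace(
    content=[SimpleNamespace(type="text", text="An API is a contract between programs.")],
    stop_reason="end_turn",
    usage=SimpleNamespace(input_tokens=14, output_tokens=9),
)
print(summarize_response(fake))
```

If truncated comes back True, raising max_tokens is usually the first thing to try.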
Step 3: Steer Behavior with a System Prompt
The system parameter sets the assistant's role and tone for the whole conversation:
message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    system="You are a concise, friendly coding tutor. Use simple examples.",
    messages=[{"role": "user", "content": "What is a REST API?"}],
)
print(message.content[0].text)
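Note that the system prompt is a top-level argument rather than an entry in the messages list. A small sketch of one way to keep that straight when you reuse prompts; build_request and TUTOR are hypothetical names introduced here, not part of the SDK.

```python
def build_request(user_text, system=None, model="claude-opus-4-8", max_tokens=1024):
    """Assemble keyword arguments for client.messages.create()."""
    kwargs = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": user_text}],
    }
    if system:
        # The system prompt travels beside the messages, not inside them.
        kwargs["system"] = system
    return kwargs


TUTOR = "You are a concise, friendly coding tutor. Use simple examples."
request = build_request("What is a REST API?", system=TUTOR)
print(sorted(request))
```

With the client from Step 2, the call would then be client.messages.create(**request).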
Step 4: Hold a Multi-Turn Conversation
The API is stateless between calls, so you pass the running history each time. Roles alternate between user and assistant:
messages = [
    {"role": "user", "content": "Suggest a name for a weather app."},
    {"role": "assistant", "content": "How about SkyCast?"},
    {"role": "user", "content": "Give me three more in that style."},
]
message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=512,
    messages=messages,
)
print(message.content[0].text)
To keep a chat going, append the model's reply as an assistant message, then add the next user message, and call again.
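That append-and-call loop can be sketched as a small helper. This is an illustration under assumptions: continue_chat is a hypothetical name, and send stands for any callable that takes the message list and returns the reply text, so the example runs offline with a fake.

```python
def continue_chat(history, user_text, send):
    """Add a user turn, get a reply via send(history), and record it."""
    history.append({"role": "user", "content": user_text})
    reply = send(history)
    # Store the reply so the next call carries the full context.
    history.append({"role": "assistant", "content": reply})
    return reply


def fake_send(messages):
    # Stand-in for the API call so the sketch runs without a key.
    return f"(reply to: {messages[-1]['content']})"


history = []
continue_chat(history, "Suggest a name for a weather app.", fake_send)
continue_chat(history, "Give me three more in that style.", fake_send)
print([m["role"] for m in history])
# → ['user', 'assistant', 'user', 'assistant']
```

With the real client, send could be a function that calls client.messages.create(model=..., max_tokens=512, messages=messages) and returns the reply text.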
Step 5: Stream the Response
For chat interfaces, stream the reply so it appears as it is written:
with client.messages.stream(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about clean code."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
Streaming also avoids request timeouts on long replies, so it is a good default for anything that may produce a lot of output.
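Often you want to show the text as it arrives and also keep the full reply, for example to append it to the chat history. A minimal sketch, assuming only that the stream yields strings; print_and_collect is a hypothetical helper and the demo chunks are made up.

```python
import sys


def print_and_collect(chunks, out=sys.stdout):
    """Echo each text chunk as it arrives and return the full reply."""
    parts = []
    for chunk in chunks:
        out.write(chunk)
        out.flush()  # push partial output to the terminal right away
        parts.append(chunk)
    out.write("\n")
    return "".join(parts)


# Any iterable of strings works here.
demo_chunks = ["Names ", "that ", "explain, ", "tests ", "that ", "pass."]
full_text = print_and_collect(demo_chunks)
```

With the real client, you would pass stream.text_stream as chunks from inside the with block.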
Step 6: Turn On Thinking for Hard Problems
For planning, math, or tricky debugging, enable extended thinking so the model reasons before answering:
message = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=2048,
    thinking={"type": "adaptive"},
    messages=[{"role": "user", "content": "Plan the steps to migrate a SQLite database to Postgres."}],
)
print(message.content[-1].text)
Adaptive thinking lets the model spend more effort on harder requests and less on easy ones, which keeps quality high without you tuning a budget by hand.
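With thinking on, the reply can hold thinking blocks ahead of the text block, which is why the example reads the last block rather than the first. A more defensive sketch separates blocks by type. It assumes thinking blocks carry type "thinking" with the reasoning in a thinking attribute; split_blocks is a hypothetical helper, and the stand-in content only mimics that shape.

```python
from types import SimpleNamespace


def split_blocks(content):
    """Separate visible answer text from thinking blocks in a reply."""
    thinking, answer = [], []
    for block in content:
        if block.type == "thinking":
            thinking.append(block.thinking)
        elif block.type == "text":
            answer.append(block.text)
        # Any other block type is ignored in this sketch.
    return "\n".join(thinking), "".join(answer)


# Stand-in content for illustration; a real reply comes from message.content.
fake_content = [
    SimpleNamespace(type="thinking", thinking="Inventory the schema first."),
    SimpleNamespace(type="text", text="1. Export the schema. 2. Load the data."),
]
reasoning, answer = split_blocks(fake_content)
print(answer)
```

Thinking also uses output tokens, so a reply that stops early on a hard task may just need a higher max_tokens.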
Add a Retry for Reliability
Network calls fail sometimes, and busy periods can return a rate-limit error. A small retry with a short pause keeps your script steady without extra libraries:
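import time
from anthropic import Anthropic, APIConnectionError, APIStatusError
client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
def ask(prompt, retries=3):
    for attempt in range(retries):
        try:
            msg = client.messages.create(
                model="claude-opus-4-8",
                max_tokens=1024,
                messages=[{"role": "user", "content": prompt}],
            )
            return msg.content[0].text
        except (APIConnectionError, APIStatusError) as err:
            # Connection errors carry no status code; treat them as transient.
            status = getattr(err, "status_code", None)
            transient = status is None or status == 429 or status >= 500
            # A 400 or 401 would fail the same way again, so re-raise it.
            if not transient or attempt == retries - 1:
                raise
            time.sleep(2 * (attempt + 1))  # wait 2s, then 4s
print(ask("Name three classic sorting algorithms."))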
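The pause above grows linearly. A common alternative is exponential backoff with jitter, sketched here as a generic wrapper. Everything in it is illustrative: retry_with_backoff is a hypothetical helper, the sleep argument is injectable so the demo finishes instantly, and flaky stands in for an API call.

```python
import random
import time


def retry_with_backoff(fn, retryable=(Exception,), retries=4, base=1.0, cap=30.0,
                       sleep=time.sleep):
    """Call fn(), retrying on `retryable` errors with exponential backoff and jitter."""
    for attempt in range(retries):
        try:
            return fn()
        except retryable:
            if attempt == retries - 1:
                raise
            # Delay doubles each attempt (base, 2*base, 4*base, ...) up to cap;
            # the random factor spreads out clients that failed together.
            delay = min(cap, base * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))


calls = {"count": 0}


def flaky():
    # Fails twice, then succeeds, to mimic a transient outage.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


waits = []
result = retry_with_backoff(flaky, retryable=(ConnectionError,), sleep=waits.append)
print(result, len(waits))
# → ok 2
```

In a real script you would leave sleep at its default and pass the exception types you consider transient as retryable.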
Real-World Use Cases
- Coding assistant: explain errors, draft tests, and refactor snippets straight from your editor or a script.
- Content drafting: outline articles, rewrite text for tone, or summarize long documents in one call.
- Support automation: answer common questions from your own docs, with a system prompt that sets the rules.
- Data extraction: turn messy text into clean structured fields for the rest of your pipeline.
Each of these is the same messages.create call you already wrote — only the prompt and model change, which makes it easy to grow from one script into a real feature.
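For the data extraction case, the fiddly part is usually parsing the reply. A minimal sketch, assuming you asked the model for JSON only and that it may still wrap the answer in a Markdown code fence; parse_json_reply and the sample reply are hypothetical.

```python
import json

FENCE = "`" * 3  # three backticks, the Markdown code-fence marker


def parse_json_reply(text):
    """Parse a model reply that should be JSON, tolerating a surrounding fence."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (which may carry a language tag).
        first_newline = cleaned.find("\n")
        cleaned = cleaned[first_newline + 1:] if first_newline != -1 else ""
        # Drop the closing fence, if present.
        cleaned = cleaned.rsplit(FENCE, 1)[0]
    return json.loads(cleaned)


reply = FENCE + 'json\n{"name": "Ada", "email": "ada@example.com"}\n' + FENCE
print(parse_json_reply(reply)["email"])
# → ada@example.com
```

json.loads raises an error on malformed output, which is a reasonable point to retry or log the raw reply.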
Choosing a Model in 2026
Anthropic offers a few models so you can trade capability for speed and cost:
- claude-opus-4-8 — the most capable, for the hardest reasoning, coding, and long agent tasks.
- claude-sonnet-4-6 — a balanced everyday model that is faster and cheaper than Opus.
- claude-haiku-4-5 — the fastest and lowest cost, great for high-volume, simpler calls.
Start on Opus while you build, then drop to Sonnet or Haiku for parts of your app where speed and price matter more than peak quality. Check the official pricing page for current per-token rates before you scale.
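One way to make that switch painless is to keep the model ids in one place and pick by task tier. A small sketch using the three ids listed above; the tier names and pick_model are hypothetical.

```python
# Model ids as listed in this tutorial; the tier labels are made up for the example.
MODELS = {
    "hard": "claude-opus-4-8",
    "everyday": "claude-sonnet-4-6",
    "bulk": "claude-haiku-4-5",
}


def pick_model(tier="everyday"):
    """Return the model id for a task tier, failing loudly on typos."""
    try:
        return MODELS[tier]
    except KeyError:
        raise ValueError(f"unknown tier {tier!r}; choose from {sorted(MODELS)}") from None


print(pick_model("bulk"))
# → claude-haiku-4-5
```

Moving a feature from Opus to Haiku then means changing one tier label, not hunting for model strings across the codebase.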
Common Errors and Fixes
- 401 authentication error: the key is missing or wrong. Load it from ANTHROPIC_API_KEY rather than pasting it into the file.
- max_tokens too low: replies get cut off. Raise the limit for longer answers.
- Rate limits: add a short retry with backoff, and stream long responses to avoid timeouts.
- Wrong content access: remember the text is at message.content[0].text, not on the message directly.
Frequently Asked Questions
Is the Claude API free?
It is a paid API, though new accounts usually include some starting credit. Costs scale with the model you pick and how many tokens you use, so Haiku is the budget choice for high volume.
Do I need to send the whole conversation every time?
Yes. The API is stateless, so you include the message history on each call. Keep only what the model needs to stay cheap and fast.
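A simple way to keep only what the model needs is to cap the history at the most recent messages. This sketch assumes the conversation should still open with a user turn after trimming; trim_history is a hypothetical helper, and a summary of older turns would be another option.

```python
def trim_history(messages, max_messages=6):
    """Keep the most recent messages, starting on a user turn."""
    if max_messages <= 0:
        return []
    recent = messages[-max_messages:]
    # Drop leading assistant turns so the list still opens with a user message.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent


chat = [
    {"role": "user", "content": "q1"},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "q2"},
    {"role": "assistant", "content": "a2"},
    {"role": "user", "content": "q3"},
]
print([m["content"] for m in trim_history(chat, max_messages=4)])
# → ['q2', 'a2', 'q3']
```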
What does the thinking parameter do?
It lets Claude reason internally before replying, which improves accuracy on math, planning, and complex coding. Adaptive mode scales that effort to the task.
Which language SDKs exist?
Anthropic ships official SDKs for Python and TypeScript among others, plus a plain REST endpoint you can call from any language.
How do I keep my API key safe?
Store it in an environment variable or a secrets manager, load it at runtime, and never commit it to a repository. If a key ever leaks, revoke it in the console and generate a new one right away.
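A short sketch of the load-at-runtime habit, with a clear error when the variable is missing and a masked form for logs. load_api_key and mask are hypothetical helpers, and the demo value is not a real key.

```python
import os


def load_api_key(env=None, name="ANTHROPIC_API_KEY"):
    """Read the key from the environment, failing with a clear message if absent."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the script")
    return key


def mask(key, visible=4):
    """Return a log-safe version showing only the last few characters."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


# A dict stands in for os.environ so the example runs anywhere.
demo_env = {"ANTHROPIC_API_KEY": "sk-demo-not-a-real-key-1234"}
print(mask(load_api_key(demo_env)))
```

Logging only the masked form keeps the full key out of terminals and log files while still letting you tell keys apart.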
A Few Prompting Tips
The model is only as good as what you ask. A few habits make a big difference. Put clear instructions in the system prompt rather than repeating them every turn. Ask for a specific format — a list, JSON, or a table — when you need to parse the output. Give one short example of the result you want, since a single example often guides the model better than a long description. And when an answer misses, say what was wrong and ask for a revision instead of starting over; Claude handles follow-up corrections well, which keeps your conversation efficient and your token use low.
Wrapping Up
You can now use Claude from Python end to end: a basic message, system prompts, multi-turn chat, streaming, extended thinking, and a sensible way to choose between Opus, Sonnet, and Haiku. That covers the foundation for assistants, coding helpers, and agent tools.
Want to see how Claude stacks up against other assistants and where it fits your stack? Explore AI tools and assistants at Free API Hub and plan your build.



