API Client

The network layer — how Claude Code talks to the model, handles errors, and supports multiple providers.

Providers

One interface, four backends. The provider is selected by environment variable:

Your Code → Query Engine → Unified Client → API Client Interface
→ Backends: Anthropic Direct | AWS Bedrock | Google Vertex | Foundry
| Env Variable | Provider | Auth Method |
|---|---|---|
| ANTHROPIC_API_KEY | Anthropic Direct | API key or OAuth |
| CLAUDE_CODE_USE_BEDROCK=1 | AWS Bedrock | AWS credential chain (env, profile, IAM, SSO) |
| CLAUDE_CODE_USE_VERTEX=1 | Google Vertex AI | GOOGLE_APPLICATION_CREDENTIALS or gcloud |
| Foundry config | Foundry | Foundry credentials |
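As a rough sketch, the selection logic implied by the table can be expressed as a small function. The function name, the precedence order, and the Foundry fallback are assumptions for illustration; the source only states which variable maps to which provider:

```python
def select_provider(env: dict) -> str:
    """Pick a backend from environment variables, per the table above.
    Precedence order is an assumption, not documented behavior."""
    if env.get("CLAUDE_CODE_USE_BEDROCK") == "1":
        return "bedrock"
    if env.get("CLAUDE_CODE_USE_VERTEX") == "1":
        return "vertex"
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    # Foundry is configured via its own config file rather than an
    # env variable; treated here as the fallback (assumption).
    return "foundry"
```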

Request Shape

Every API call sends the system prompt, conversation history, and tool definitions:

TypeScript

```typescript
{
  model: "claude-sonnet-4-20250514",
  system: assembledSystemPrompt,   // ~10K+ tokens
  messages: conversationHistory,   // grows each turn
  tools: toolDefinitions,          // 44 tool schemas
  stream: true,
  max_tokens: responseLimit,
}
```
Python

```python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    system=assembled_system_prompt,
    messages=conversation_history,
    tools=tool_definitions,
    stream=True,
    max_tokens=response_limit,
)
```

Streaming

SSE events arrive from the API and map directly to what you see in the terminal:

[Interactive demo: POST /v1/messages (stream: true) → SSE Events → Terminal Output]

Text renders progressively as content_block_delta events arrive. Tool calls appear inline as they're detected. Tool input JSON arrives in chunks and is accumulated via a partial JSON parser until complete.

Error Handling & Retries

| Error | What Happens | Strategy |
|---|---|---|
| 429 Rate Limit | Wait and retry | Exponential backoff with jitter (1s → 2s → 4s) |
| 5xx Server Error | Retry up to 3 times | Same backoff strategy |
| Network Error | Retry with delay | Increasing intervals |
| 401 Unauthorized | Re-authenticate | Prompt for API key or re-launch OAuth |
| Context Too Large | Trigger compaction | Summarize old messages, retry |
Thundering herd prevention

The backoff includes random jitter — each retry waits a slightly different amount of time. This prevents many clients from retrying at the exact same moment after an outage.
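A minimal sketch of that backoff schedule, assuming a base of 1s doubling per attempt as the table describes; the cap and the exact jitter distribution are assumptions, not documented values:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 32.0) -> float:
    """Exponential backoff (1s, 2s, 4s, ...) plus random jitter.

    The jitter term means two clients retrying after the same outage
    wake up at slightly different times instead of all at once.
    """
    delay = min(cap, base * (2 ** attempt))
    return delay + random.uniform(0, delay)  # jittered wait in seconds
```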

Cost Tracking

Every request tracks token usage, feeding into:

| Metric | What | Where It Shows |
|---|---|---|
| Input tokens | System prompt + messages + tool schemas | Token budget UI |
| Output tokens | Model response text + tool calls | Token budget UI |
| Cache tokens | Prompt caching reads/writes | Analytics |
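A per-session accumulator for those metrics might look like this. The class and field names are illustrative assumptions; the usage keys themselves (input_tokens, output_tokens, cache_read_input_tokens, cache_creation_input_tokens) are the ones the Anthropic Messages API reports on each response:

```python
from dataclasses import dataclass

@dataclass
class UsageTotals:
    """Running token totals, as fed to the token-budget UI (sketch)."""
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_tokens: int = 0
    cache_write_tokens: int = 0

    def record(self, usage: dict) -> None:
        # Each API response carries a usage object; accumulate it.
        self.input_tokens += usage.get("input_tokens", 0)
        self.output_tokens += usage.get("output_tokens", 0)
        self.cache_read_tokens += usage.get("cache_read_input_tokens", 0)
        self.cache_write_tokens += usage.get("cache_creation_input_tokens", 0)
```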

Key Source Files

| File | Purpose |
|---|---|
| src/services/api/claude.ts | Main API client wrapper |
| src/services/api/bootstrap.ts | User profile and session data |
| src/services/api/ | Provider-specific clients (20+ files) |
| src/utils/auth.ts | Authentication logic |