How to use Claude Code with DeepSeek (v4-pro) via powapi
A 30-second setup that routes Claude Code through DeepSeek V4-pro for the bulk of your turns and escalates to Anthropic Sonnet when the task warrants it. No tool-side patches, no weekly cap, no custom client.
Why use DeepSeek under Claude Code?
Anthropic ships a fantastic CLI, but two things bite real users:
- Cost. Sonnet 4.6 is $3 / 1M input and $15 / 1M output. A busy day on Claude Code can easily burn through $5–$15 in tokens. The Pro plan's included quota disappears faster than most people expect.
- The weekly cap. Anthropic introduced weekly limits on top of the 5-hour rolling window. Hit them mid-week and you're locked out until the next reset.
DeepSeek V4-pro hits ~80% of Sonnet's SWE-bench score at roughly 1/6th the price. The catch: Claude Code talks Anthropic wire format, and DeepSeek (until recently) didn't. powapi bridges the gap — Claude Code keeps thinking it's hitting Anthropic, we route each turn to whichever model fits best.
Setup — 30 seconds, 2 environment variables
Sign up at powapi.io and mint a key from the Keys page. Then point Claude Code at our endpoint by setting two environment variables — either inline or in your shell rc file:
export ANTHROPIC_BASE_URL=https://api.powapi.io/claudecode
export ANTHROPIC_API_KEY=pk_live_yourkeyhere
# launch Claude Code as usual
claudeThat's it. Claude Code now routes every request through powapi. The dedicated /claudecode endpoint applies routing rules tuned for the Claude Code traffic profile (long system prompts, parallel sub-agents, tool calls, file edits).
Prefer Cursor, Cline, Roo, Aider, OpenCode or Continue? Each has its own endpoint — see the tool endpoints reference in the docs.
How the routing actually picks a model
powapi runs a small classifier on every incoming turn that scores the request along four axes:
- Difficulty — short edit vs multi-step reasoning vs architectural question.
- Context length — how much of the conversation actually matters for this turn.
- Modality — text only, or does it include images, PDFs, or large diff blocks?
- Tool intensity — single function call vs the model orchestrating 6 tools in sequence.
The classifier itself runs on a fast, cheap model (Llama-3.3 served by Cerebras) and takes <100ms. Based on its score, we pick from:
- DeepSeek V4-flash — trivial turns (renames, single-file edits, "what does this function do" lookups).
- DeepSeek V4-pro — the workhorse. Handles the vast majority of Claude Code traffic.
- Anthropic Sonnet 4.6 — escalation for long-context, vision, or turns where the verifier flagged a low-quality first attempt.
Every routing decision is recorded in your dashboard at /usage, so you can see exactly which models actually answered each request and what they cost.
Cost & latency numbers from production
Real numbers from powapi's production database after ~1500 Claude Code requests in May 2026 (one heavy daily user):
| Provider / model | Share of requests | Avg COGS / request | Avg latency (TTFT) |
|---|---|---|---|
| DeepSeek V4-pro | 83% | $0.0025 | ~1.1s |
| DeepSeek V4-flash | 15% | $0.0017 | ~0.6s |
| Anthropic Sonnet 4.6 | 2% | $0.226 | ~1.4s |
The Sonnet escalations are 2% of requests but 64% of total cost — that's the whole point of cascading. You pay Sonnet prices for the turns that need it, DeepSeek prices for everything else.
When does Sonnet take over?
Three triggers escalate a request to Sonnet, in priority order:
- Modality. Vision input or PDF parsing? Direct route to Sonnet — DeepSeek V4 is text-only.
- Context length. Over ~120K tokens in flight? Sonnet's 200K window handles it cleanly; DeepSeek >128K runs into degraded recall.
- Quality verifier. A small judge scores DeepSeek's response. If it's vague, contradicts the prompt, or hallucinates a function signature, we retry once with a critique prompt on the same adapter (cheap) and then escalate to Sonnet if still low.
The cascade is transparent — Claude Code sees a single successful response and never knows it came from a different provider on the inside.
Sub-agents, tool calls, and vision
Claude Code lean heavily on three Anthropic features. Here's how each one survives the DeepSeek bridge:
- Sub-agents. Each agent launched by Claude Code is a separate API request to powapi, so each one gets its own routing decision. A diff-summarizer agent might land on flash, a refactor agent on V4-pro, a "review this 50-file PR" agent on Sonnet — all in the same parent session.
- Tool calls. DeepSeek V4-pro implements the full Anthropic tool-use schema including
parallel_tool_use. Round-trips work the same as native Anthropic. - Vision & PDFs. Routed straight to Sonnet. Costs about 6× more than text turns but Claude Code uses vision sparingly (mostly for screenshots when debugging UIs), so it barely moves the monthly bill.
FAQ
Will Claude Code still feel like Claude with DeepSeek under the hood?▾
Does this work with tool calls and the new Claude Code agents?▾
How much does it cost compared to using Anthropic directly?▾
What happens when DeepSeek can't handle a request?▾
Is there a free tier or trial?▾
Sign up, mint a key, paste the two env vars above. The first request lands in under 30 seconds.