Skip to content
Guide · 8 min read

How to use Claude Code with DeepSeek (v4-pro) via powapi

A 30-second setup that routes Claude Code through DeepSeek V4-pro for the bulk of your turns and escalates to Anthropic Sonnet when the task warrants it. No tool-side patches, no weekly cap, no custom client.

Why use DeepSeek under Claude Code?

Anthropic ships a fantastic CLI, but two things bite real users:

  • Cost. Sonnet 4.6 is $3 / 1M input and $15 / 1M output. A busy day on Claude Code can easily burn through $5–$15 in tokens. The Pro plan's included quota disappears faster than most people expect.
  • The weekly cap. Anthropic introduced weekly limits on top of the 5-hour rolling window. Hit them mid-week and you're locked out until the next reset.

DeepSeek V4-pro hits ~80% of Sonnet's SWE-bench score at roughly 1/6th the price. The catch: Claude Code talks Anthropic wire format, and DeepSeek (until recently) didn't. powapi bridges the gap — Claude Code keeps thinking it's hitting Anthropic, we route each turn to whichever model fits best.

Setup — 30 seconds, 2 environment variables

Sign up at powapi.io and mint a key from the Keys page. Then point Claude Code at our endpoint by setting two environment variables — either inline or in your shell rc file:

text
export ANTHROPIC_BASE_URL=https://api.powapi.io/claudecode
export ANTHROPIC_API_KEY=pk_live_yourkeyhere

# launch Claude Code as usual
claude

That's it. Claude Code now routes every request through powapi. The dedicated /claudecode endpoint applies routing rules tuned for the Claude Code traffic profile (long system prompts, parallel sub-agents, tool calls, file edits).

Prefer Cursor, Cline, Roo, Aider, OpenCode or Continue? Each has its own endpoint — see the tool endpoints reference in the docs.

How the routing actually picks a model

powapi runs a small classifier on every incoming turn that scores the request along four axes:

  • Difficulty — short edit vs multi-step reasoning vs architectural question.
  • Context length — how much of the conversation actually matters for this turn.
  • Modality — text only, or does it include images, PDFs, or large diff blocks?
  • Tool intensity — single function call vs the model orchestrating 6 tools in sequence.

The classifier itself runs on a fast, cheap model (Llama-3.3 served by Cerebras) and takes <100ms. Based on its score, we pick from:

  • DeepSeek V4-flash — trivial turns (renames, single-file edits, "what does this function do" lookups).
  • DeepSeek V4-pro — the workhorse. Handles the vast majority of Claude Code traffic.
  • Anthropic Sonnet 4.6 — escalation for long-context, vision, or turns where the verifier flagged a low-quality first attempt.

Every routing decision is recorded in your dashboard at /usage, so you can see exactly which models actually answered each request and what they cost.

Cost & latency numbers from production

Real numbers from powapi's production database after ~1500 Claude Code requests in May 2026 (one heavy daily user):

Provider / modelShare of requestsAvg COGS / requestAvg latency (TTFT)
DeepSeek V4-pro83%$0.0025~1.1s
DeepSeek V4-flash15%$0.0017~0.6s
Anthropic Sonnet 4.62%$0.226~1.4s

The Sonnet escalations are 2% of requests but 64% of total cost — that's the whole point of cascading. You pay Sonnet prices for the turns that need it, DeepSeek prices for everything else.

When does Sonnet take over?

Three triggers escalate a request to Sonnet, in priority order:

  1. Modality. Vision input or PDF parsing? Direct route to Sonnet — DeepSeek V4 is text-only.
  2. Context length. Over ~120K tokens in flight? Sonnet's 200K window handles it cleanly; DeepSeek >128K runs into degraded recall.
  3. Quality verifier. A small judge scores DeepSeek's response. If it's vague, contradicts the prompt, or hallucinates a function signature, we retry once with a critique prompt on the same adapter (cheap) and then escalate to Sonnet if still low.

The cascade is transparent — Claude Code sees a single successful response and never knows it came from a different provider on the inside.

Sub-agents, tool calls, and vision

Claude Code lean heavily on three Anthropic features. Here's how each one survives the DeepSeek bridge:

  • Sub-agents. Each agent launched by Claude Code is a separate API request to powapi, so each one gets its own routing decision. A diff-summarizer agent might land on flash, a refactor agent on V4-pro, a "review this 50-file PR" agent on Sonnet — all in the same parent session.
  • Tool calls. DeepSeek V4-pro implements the full Anthropic tool-use schema including parallel_tool_use. Round-trips work the same as native Anthropic.
  • Vision & PDFs. Routed straight to Sonnet. Costs about 6× more than text turns but Claude Code uses vision sparingly (mostly for screenshots when debugging UIs), so it barely moves the monthly bill.

FAQ

Will Claude Code still feel like Claude with DeepSeek under the hood?
Yes. powapi exposes an Anthropic-compatible API surface — Claude Code thinks it's talking to Claude. We route most turns to DeepSeek V4-pro for cost and escalate to Anthropic Sonnet when the task warrants it. The cascade is transparent to the client.
Does this work with tool calls and the new Claude Code agents?
Yes. DeepSeek V4-pro implements the Anthropic tool-use schema, including parallel tool calls. Sub-agents launched by Claude Code work the same way — each agent gets its own routing decision. Vision inputs are automatically routed to Sonnet.
How much does it cost compared to using Anthropic directly?
DeepSeek V4-pro is roughly 6× cheaper than Claude Sonnet 4.6 per token. The powapi Starter plan ($9/mo) gives 750K tokens per 5h window with no weekly cap — comparable headroom to Anthropic Pro ($20/mo) at less than half the price.
What happens when DeepSeek can't handle a request?
powapi watches the response. If DeepSeek returns a vague or low-quality answer, a verifier scores it and we either retry with a critique prompt on the same provider or escalate to Sonnet automatically. The client sees one successful response.
Is there a free tier or trial?
powapi is in public beta. The $9 Starter plan is the entry point. Cancel anytime, billed monthly. Pay-per-token mode lets you load a prepaid balance and only pay for what you use.
Try it on your next coding session

Sign up, mint a key, paste the two env vars above. The first request lands in under 30 seconds.

Published 2026-05-20. Last updated 2026-05-20. Written by the powapi team. Feedback: hi@powapi.io.