How do I calculate the cost of an OpenAI or Claude API?

Monthly cost = (requests/day × input tokens ÷ 1M × input price) + (requests/day × output tokens ÷ 1M × output price), all × 30.4 days. At 500 requests/day with 2,000 input and 500 output tokens on Claude Sonnet ($3/M in, $15/M out) that's roughly $205 every month — and it grows linearly with traffic.

What does it cost to run a GPU for inference?

Only electricity, if you already own the card. A GPU draws full power only while actively generating. A 4090 pulling 450W that's busy ~70 minutes a day at $0.15/kWh costs about 8 cents a day. If you'd have to buy a card purely for inference, add its price divided by the months you'll keep it (a $1,600 card over 3 years is ~$44/month).

← all tools

free tool · no signup · runs in your browser

Cloud API vs Your GPU: AI Cost Calculator

Punch in your daily request volume and token sizes, pick a cloud model and a GPU you own, and see the real monthly delta. Cloud APIs bill every token forever — a GPU you already paid for only costs electricity. Find your break-even.

your usage

requests / day500avg input tokens / requestavg output tokens / request

what you're comparing

cloud api modelyour gpuelectricity price ($ / kWh)estimated output speed (tok/s)

GPU busy time is estimated at 30 tok/s (7B–14B class) ≈ 2.3 h/day · 1.04 kWh/day.

cloud api · per month

$121.60

GPT-4.1

your gpu · electricity / month

$4.75

RTX 4090 (450W)

you save by self-hosting

$116.85

/ month

$1,402

/ year

That is the recurring bill you avoid — you already own the GPU, so electricity is the only marginal cost of running it.

cost at scale · monthly

Cloud spend scales linearly with every token. Your electricity scales too — but from a far lower base.

usage	cloud / mo	your gpu / mo	you save / mo
1× (500 req/day)	$121.60	$4.75	$116.85
5× (2,500 req/day)	$608.00	$23.75	$584.25
10× (5,000 req/day)	$1,216	$47.50	$1,169

How the calculation works

The cloud side is pure arithmetic. Every provider prices in dollars per million tokens, split into a cheaper input (prompt) rate and a pricier output (generation) rate. Your monthly cost is just:

(requests/day × input tokens ÷ 1M × input price) + (requests/day × output tokens ÷ 1M × output price), all × 30.4 days

At 500 requests a day with 2,000 input and 500 output tokens, you push 30.4M input and 7.6M output tokens a month. On Claude Sonnet 4.5 ($3/M in, $15/M out) that is roughly $91 + $114 = $205 every month, forever, and it grows linearly the moment your traffic does.

The cost of running your own GPU

Here is the part the comparison sites leave out: if you already own the GPU, the marginal cost of an extra token is electricity, not hardware. A card only draws full power while it is actively generating. We estimate generation throughput at 30 tokens/second for a 7B–14B class model — a realistic figure for an RTX 4070-class card running a Q4 GGUF — and compute how many GPU-hours per day your output volume needs:

GPU hours/day = (requests/day × output tokens) ÷ 30 tok/s ÷ 3600

Multiply by the card's sustained wattage and your electricity rate and you get the daily energy cost. A 4090 pulling 450W that is busy 70 minutes a day at $0.15/kWh costs about 8 cents. That is the whole bill.

But the GPU wasn't free

True — and that is the honest counter-argument. This tool deliberately treats the GPU as a sunk costbecause for most people it is: it is the card already in your gaming rig, workstation, or Mac. You are not buying hardware to save money; you are putting hardware you own to work instead of renting someone else's. If you would have to buy a card purely for inference, add its price divided by the months you expect to keep it (a $1,600 card over 3 years is ~$44/month) to the electricity figure before comparing.

Hidden costs on both sides

	cloud api	your gpu
marginal cost / token	metered, forever	electricity only
upfront	$0	card (often already owned)
data leaves your network	yes	no
rate limits / throttling	yes	no
idle cost	$0	$0 (card sleeps)
scales with traffic	linearly, painfully	flatly, cheaply

Cloud wins on near-zero usage and on giant frontier models you cannot run locally. Local wins on steady, high-volume, or privacy-sensitive workloads — coding agents, batch jobs, RAG pipelines that re-read the same documents thousands of times.

When does self-hosting pay off?

The break-even point is wherever the amber savings number above crosses zero. For light, bursty chat use, the API is genuinely cheaper and simpler. But the 5× and 10× rows in the table show how fast the cloud bill compounds. A single coding agent running 32k-context requests all day can burn through tens of millions of tokens a week — exactly the workload where a GPU you already own quietly wins.

Where Wide Area Intelligence fits

Wide Area Intelligence is the bridge: it exposes the GPU in your own machine as an OpenAI-compatible endpoint at https://wideareaai.com/api/v1, with automatic cloud failover for the occasional request that needs a frontier model. You get the electricity-only economics for the bulk of your traffic and the cloud as a safety net — without choosing one forever. Add a node from the dashboard in one command, no port forwarding.

Frequently asked questions

Is it cheaper to run AI locally or use a cloud API?: It depends on volume. Cloud APIs bill every token forever, so light, bursty chat use is genuinely cheaper and simpler in the cloud. But steady, high-volume workloads — coding agents, batch jobs, RAG pipelines — flip the math fast: if you already own the GPU, the marginal cost of a token is electricity, often pennies a day. Enter your usage above to see your break-even.
How do I calculate the cost of an OpenAI or Claude API?: Monthly cost = (requests/day × input tokens ÷ 1M × input price) + (requests/day × output tokens ÷ 1M × output price), all × 30.4 days. At 500 requests/day with 2,000 input and 500 output tokens on Claude Sonnet ($3/M in, $15/M out) that's roughly $205 every month — and it grows linearly with traffic.
What does it cost to run a GPU for inference?: Only electricity, if you already own the card. A GPU draws full power only while actively generating. A 4090 pulling 450W that's busy ~70 minutes a day at $0.15/kWh costs about 8 cents a day. If you'd have to buy a card purely for inference, add its price divided by the months you'll keep it (a $1,600 card over 3 years is ~$44/month).
Do I have to choose cloud or local forever?: No. Wide Area Intelligence exposes your own GPU as an OpenAI-compatible endpoint and fails over to cloud models automatically for the occasional request that needs a frontier model. You get electricity-only economics on the bulk of your traffic and the cloud as a safety net, behind one endpoint — free for 2 nodes.

Related reading: OpenAI API vs a gaming PC: the real cost. Ready to use that hardware? Turn your GPU into an OpenAI-compatible endpoint — free for 2 nodes.

/// wide area ai

These numbers are theory. Your GPU is real — put it on the network.

Wide Area Intelligence turns any machine with a GPU into an OpenAI-compatible endpoint — routed, cached, and failed over automatically. Free for 2 nodes.

Start routing — free →