free tool · no signup · runs in your browser

GPU Power Cost Calculator

Pick your graphics card, set how many hours a day it actually works, and enter your electricity rate to see what it costs to run per day, month, and year. It's the same math that tells you whether a home AI node is cheaper than the cloud.

cost / day: $0.59 (3.95 kWh)

cost / month: $18 (120 kWh)

cost / year: $216 (1,442 kWh)

That monthly bill is the cloud-API equivalent of roughly 24M GPT-4o-mini tokens per month — at a blended $0.75 / million. Run more than that through your own card and the electricity is already cheaper than the API.
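The break-even figure above is a single division. A minimal sketch, assuming the $18 monthly bill and the blended $0.75 per million tokens quoted on this page (the function name is illustrative, not part of the tool):

```python
def breakeven_tokens(monthly_cost_usd: float, price_per_million_usd: float) -> float:
    """Tokens per month that the same money would buy from a cloud API."""
    if price_per_million_usd <= 0:
        raise ValueError("price per million tokens must be positive")
    return monthly_cost_usd / price_per_million_usd * 1_000_000


# Figures taken from the page: $18 per month, $0.75 per million tokens (blended).
tokens = breakeven_tokens(monthly_cost_usd=18.0, price_per_million_usd=0.75)
print(f"break-even: {tokens / 1e6:.0f}M tokens per month")
```

This only compares electricity against API spend. It ignores hardware cost and assumes the local model is an acceptable substitute for the hosted one.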

cost by daily duty cycle

Full-load hours only (card off the rest of the day), for NVIDIA RTX 4090 at $0.15/kWh.

duty cycle   energy / day   cost / day   cost / month   cost / year
4h / day     1.8 kWh        $0.27        $8.21          $99
8h / day     3.6 kWh        $0.54        $16            $197
12h / day    5.4 kWh        $0.81        $25            $296
24h / day    10.8 kWh       $1.62        $49            $591

How the calculation works

Electricity is billed in kilowatt-hours (kWh): one kWh is 1,000 watts drawn for one hour. A GPU pulling 450 watts for one hour uses 0.45 kWh. Multiply by your rate in $/kWh and you have the cost. This tool splits the day into three buckets: full load (you're running inference or training), idle (the machine is on but the GPU is sitting at desktop draw), and off. The split matters because a card that's "not doing anything" is rarely drawing zero.

The formula per day is: (load_hours × load_watts + idle_hours × idle_watts) ÷ 1000 × rate. We use 30.4 days per month and 365 per year so the monthly and yearly numbers line up.
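The formula can be sketched in a few lines of Python. This is an illustration, not the tool's actual source: the function names are made up, and the example inputs (8 load hours at 450W, 16 idle hours at 22W, $0.15/kWh) are assumptions chosen because they reproduce the headline figures at the top of the page.

```python
DAYS_PER_MONTH = 30.4  # the page's convention; 30.4 * 12 = 364.8
DAYS_PER_YEAR = 365


def daily_kwh(load_hours, load_watts, idle_hours=0.0, idle_watts=0.0):
    """Energy per day in kWh; hours not counted as load or idle are 'off'."""
    if load_hours < 0 or idle_hours < 0 or load_hours + idle_hours > 24:
        raise ValueError("load and idle hours must be non-negative and sum to at most 24")
    return (load_hours * load_watts + idle_hours * idle_watts) / 1000


def running_costs(kwh_per_day, rate_per_kwh):
    """Cost per day, month, and year in the currency of the rate."""
    per_day = kwh_per_day * rate_per_kwh
    return {
        "day": per_day,
        "month": per_day * DAYS_PER_MONTH,
        "year": per_day * DAYS_PER_YEAR,
    }


# Assumed inputs: 8 h at 450 W, 16 h idle at 22 W, $0.15/kWh.
kwh = daily_kwh(load_hours=8, load_watts=450, idle_hours=16, idle_watts=22)
costs = running_costs(kwh, rate_per_kwh=0.15)
print(f"{kwh:.2f} kWh/day")
for period, usd in costs.items():
    print(f"{period}: ${usd:.2f}")
```

With these inputs the sketch gives about 3.95 kWh and $0.59 per day, in line with the figures shown above. Setting the idle hours to zero gives the "full-load hours only" numbers from the duty cycle table.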

TDP vs. real-world draw

The wattage we list per card is its TDP (thermal design power), roughly the sustained board power under heavy load. Real draw is often a bit lower for LLM inference, which is usually memory-bandwidth bound rather than compute bound: a 4090 doing token generation may sit well under its 450W rating because the shader cores aren't pinned. Short bursts can also spike above TDP. TDP is the right number for a conservative estimate; if you have a wall meter, plug the measured figure into the custom watts option.

Undervolting cuts the bill

Most modern GPUs run far past their efficiency sweet spot from the factory. Undervolting — lowering the voltage at a given clock — commonly cuts 15-25% off power draw for a 2-5% performance loss, which on a card running many hours a day pays for itself in lower bills and lower heat. For an always-on node, capping the power limit (e.g. nvidia-smi -pl 300 on a 4090) is the simplest lever.
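To put a number on a power cap, compare the draw before and after over the hours the card is under load. A rough sketch, assuming the card would otherwise sit at its full 450W rating (an upper bound, since real inference draw is often lower) and a rate of $0.15/kWh; the function name is illustrative:

```python
def yearly_savings_usd(watts_before, watts_after, load_hours_per_day,
                       rate_per_kwh, days_per_year=365):
    """Yearly saving from lowering the draw during load hours."""
    saved_kwh_per_day = (watts_before - watts_after) * load_hours_per_day / 1000
    return saved_kwh_per_day * days_per_year * rate_per_kwh


# Assumed scenario: a 450 W card capped at 300 W, at $0.15/kWh.
for hours in (8, 24):
    saved = yearly_savings_usd(450, 300, load_hours_per_day=hours, rate_per_kwh=0.15)
    print(f"{hours} h/day under load: about ${saved:.0f} saved per year")
```

The saving scales linearly with load hours, so the cap matters most on a node that is busy for most of the day.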

Why idle draw matters for 24/7 nodes

If you leave a machine on around the clock to serve requests, idle hours usually outnumber busy ones, and their cost adds up. A card idling at 22W for 16 hours a day still burns ~0.35 kWh daily doing nothing; over a year that's about 128 kWh, or roughly $19 at $0.15/kWh, before a single token is generated. On a 450W card that is busy only an hour or so a day, idle draw becomes the larger part of the bill. This is why the idle figure is a first-class input here: for an always-available endpoint, shaving idle watts (and letting the box sleep when truly unused) can save more than optimizing the busy hours.
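How much of the bill idle draw accounts for depends on how busy the card is. A small sketch, assuming a machine that stays on 24 hours a day with a 450W load draw and a 22W idle draw (the function name is illustrative):

```python
def idle_share(load_hours, load_watts, idle_watts, hours_on=24):
    """Fraction of daily energy spent idling, for a machine on `hours_on` hours."""
    if not 0 <= load_hours <= hours_on:
        raise ValueError("load_hours must be between 0 and hours_on")
    load_wh = load_hours * load_watts
    idle_wh = (hours_on - load_hours) * idle_watts
    total_wh = load_wh + idle_wh
    return idle_wh / total_wh if total_wh else 0.0


# Assumed draws: 450 W under load, 22 W at idle.
for hours in (1, 4, 8):
    share = idle_share(load_hours=hours, load_watts=450, idle_watts=22)
    print(f"{hours} h of load per day: idle is {share:.0%} of the energy")
```

Under these assumptions idle draw is a bit over half of the energy at one load hour per day and falls below a tenth at eight, so the lighter the workload, the more the idle figure matters.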

The cheapest watt is the one you were already paying for. A GPU you keep powered for gaming or work is mostly sunk cost — running inference in the gaps is nearly free.

The night-shift argument

The economic trick for self-hosting is filling idle time with work that would otherwise cost cloud money. If your card is on anyway, batching bulk inference into the hours it'd be idle converts wasted standby power into useful output. Wide Area Intelligence's batch night-shift does exactly this: it runs queued bulk jobs on idle nodes, so the kWh you're already spending produces results instead of heat. Compare the "equivalent tokens" line above against your actual usage: past that break-even, your own GPU is the cheaper machine.

FAQ

What does it cost to run an RTX 4090 24/7?

At 450W full load around the clock and the US-average $0.15/kWh, that's about $1.62/day, ~$49/month, ~$591/year. No node runs at 100% load all day, though, so set realistic load and idle hours above for your number.

Does this include the rest of the PC?

No, it's GPU board power only. Add roughly 50-100W for the CPU, motherboard, and fans if you want a whole-system figure (use custom watts).

Are Apple Silicon numbers really that low?

Yes. Unified-memory Macs draw a fraction of a discrete GPU's power for LLM work, which is part of why they make efficient always-on nodes.

/// wide area ai

These numbers are theory. Your GPU is real — put it on the network.

Wide Area Intelligence turns any machine with a GPU into an OpenAI-compatible endpoint — routed, cached, and failed over automatically. Free for 2 nodes.
