kill the per-token bill
Stop paying per token — route to your own GPU.
Every token on a hosted API is metered. Wide Area Intelligence serves repeated requests from an edge cache, runs the rest on GPUs you already own with no per-token fees, and treats the cloud as burst-only failover. The API is OpenAI-compatible, and credit-billed cloud applies only to the overflow.
/// where the tokens stop costing money
Repeats cost $0
Identical requests — the same prompt, model, and params — are served from Cloudflare's edge cache. The prompt never touches a GPU and never bills a token. Set your own TTL per account.
Your GPU, no per-token fee
Everything that misses the cache runs on hardware you already own. No metered tokens: you pay for the plan and your electricity, and the power bill is your only marginal cost. We never put a per-token fee on your own hardware.
Cloud is burst-only
The cloud bills against prepaid credits and fires only when your node can't serve. You pay cloud rates, markup included, on the overflow and never on the baseline.
/// how the routing works
Cache the repeats, own the rest. Pay cloud rates only on the overflow.
EDGE CACHE
An identical request you've served before is returned straight from Cloudflare's edge — the prompt never reaches a GPU. Per-account toggle, your own TTL.
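To make "identical" concrete, here is a minimal sketch of how a cache key could be derived from the prompt, model, and params. The function name `cache_key` and the choice of SHA-256 over canonical JSON are assumptions for illustration, not a description of the actual edge implementation.

```python
import hashlib
import json


def cache_key(model: str, messages: list, **params) -> str:
    """Hypothetical helper: hash the fields that define an 'identical' request."""
    payload = {"model": model, "messages": messages, "params": params}
    # sort_keys plus fixed separators keep the serialization stable
    # regardless of the order in which params were passed.
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under this sketch, any change to the prompt, the model, or a single param such as `temperature` produces a different key and therefore a cache miss.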
YOUR HARDWARE
A one-line install turns any machine with a GPU into a node. It opens a secure Cloudflare Tunnel — no port forwarding, no static IP — and serves llama.cpp behind your gateway with no per-token fees.
CLOUD FAILOVER
Node busy, offline, or timed out? The same request silently re-routes to the cloud through our managed gateway, billed only against prepaid credits — burst only, never the baseline.
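The three stages above can be sketched as a single routing function. This is an illustration under stated assumptions: `route`, `NodeUnavailable`, and the two callables are hypothetical names, the cache is a plain dict with no TTL, and storing cloud responses in the cache is a guess rather than documented behavior.

```python
from typing import Callable, Dict, Tuple


class NodeUnavailable(Exception):
    """Hypothetical signal that the node is busy, offline, or timed out."""


def route(
    key: str,
    cache: Dict[str, str],
    run_on_node: Callable[[], str],
    run_on_cloud: Callable[[], str],
) -> Tuple[str, str]:
    """Return (response, source), where source is 'cache', 'node', or 'cloud'."""
    # 1. Edge cache: a repeat never reaches a GPU.
    if key in cache:
        return cache[key], "cache"
    # 2. Your hardware: the baseline path, no per-token fee.
    try:
        result, source = run_on_node(), "node"
    except NodeUnavailable:
        # 3. Cloud failover: burst only, billed against prepaid credits.
        result, source = run_on_cloud(), "cloud"
    # Assumption: responses from either backend are cached for later repeats.
    cache[key] = result
    return result, source
```

The ordering is the point of the design: the cheapest path is tried first, and the cloud callable runs only when the node raises.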
/// drop-in integration
Bring your own key.
Change one URL.
Keep the OpenAI SDK and your gateway key — just point the base URL at us. The cache absorbs repeats, your hardware serves the rest with no per-token fees, and cloud burst (prepaid credits) catches only what's left over.
- Change one line: the base URL.
- Works with the OpenAI SDK, LangChain, agents, curl — anything OpenAI-compatible.
- Bring your own gateway key; routing, caching, and failover are automatic.
from openai import OpenAI

client = OpenAI(
    base_url="https://wideareaai.com/api/v1",
    api_key="wai_sk_…",
)

resp = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[…],
)
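For clients that don't use the SDK, the same call can be sketched with only the Python standard library. This assumes the gateway follows the usual OpenAI convention of a `/chat/completions` path under the base URL with a Bearer token. `build_chat_request` is a hypothetical helper, and it only builds the request without sending it.

```python
import json
import urllib.request

BASE_URL = "https://wideareaai.com/api/v1"  # base URL from the snippet above


def build_chat_request(api_key, model, messages, base_url=BASE_URL):
    """Hypothetical helper: build (but do not send) a chat completion request."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        # Assumed path, following the OpenAI-compatible convention.
        url=base_url.rstrip("/") + "/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Switching providers then comes down to the `base_url` argument, which is the one-line change described above.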
/// initialize
Your hardware is already paid for. Stop renting tokens.
no credit card · 2 nodes free, forever · openai-compatible