kill the per-token bill

Stop paying per token — route to your own GPU.

Every token on a hosted API is metered. Wide Area Intelligence serves repeated requests from an edge cache, runs the rest on GPUs you already own with no per-token fees, and treats the cloud as burst-only failover. OpenAI-compatible, credit-billed cloud only on the overflow.

Cut my token bill — free

[ no per-token fees on local ][ edge-cached repeats ][ 2 nodes free ]

/// where the tokens stop costing money

Repeats cost $0

Identical requests — the same prompt, model, and params — are served from Cloudflare's edge cache. The prompt never touches a GPU and never bills a token. Set your own TTL per account.

Your GPU, no per-token fee

Everything that misses the cache runs on hardware you already own. No metered tokens — your only marginal cost is the power bill. The plan and your electricity are the real costs; we never put a per-token fee on your own hardware.

Cloud is burst-only

The cloud bills against prepaid credits and fires only when your node can't serve. You pay cloud rates on the overflow, never on the baseline — burst carries the markup, the baseline doesn't.

/// how the routing works

Cache the repeats, own the rest. Pay cloud rates only on the overflow.

01metric 0 · ~10ms

EDGE CACHE

An identical request you've served before is returned straight from Cloudflare's edge — the prompt never reaches a GPU. Per-account toggle, your own TTL.

02metric 1 · no token fees

YOUR HARDWARE

A one-line install turns any machine with a GPU into a node. It opens a secure Cloudflare Tunnel — no port forwarding, no static IP — and serves llama.cpp behind your gateway with no per-token fees.

03metric 2 · always up

CLOUD FAILOVER

Node busy, offline, or timed out? The same request silently re-routes to the cloud through our managed gateway, billed only against prepaid credits — burst only, never the baseline.

/// drop-in integration

Bring your own key.
Change one URL.

Keep the OpenAI SDK and your gateway key — just point the base URL at us. The cache absorbs repeats, your hardware serves the rest with no per-token fees, and cloud burst (prepaid credits) catches only what's left over.

Change one line: the base URL.
Works with the OpenAI SDK, LangChain, agents, curl — anything OpenAI-compatible.
Bring your own gateway key; routing, caching, and failover are automatic.

app.py

from openai import OpenAI

client = OpenAI(
    base_url="https://wideareaai.com/api/v1",
    api_key="wai_sk_…",
)

resp = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[…],
)

/// initialize

Your hardware is already paid for. Stop renting tokens.