wide area network → wide area intelligence

Your hardware is
the datacenter.
The cloud is the failover.

Wide Area Intelligence is an OpenAI-compatible gateway that routes every request to your own hardware first — office workstations, on-prem servers, lab machines — and silently fails over to the cloud when it has to. Your apps never know the difference — and requests served by your own hardware carry no per-token fees.

[ 2 nodes free ][ openai-compatible ][ no port forwarding ]
live route topology routing
YOURAPPWIDE AREAGATEWAYcache · route · failoverYOUR GPUno token feesCLOUDfailover onlyopenai sdkPRIORITY 1 · tunnelPRIORITY 2 · openrouter
dstviametricstate
inference/*node-01.tunnel0UP
inference/*openrouter.ai100STANDBY
cache/hitedge.kv38.2%
$0.00
per-token fees on your own hardware
<50ms
edge routing overhead
3 hops
cache → local → cloud
100%
openai sdk compatible

/// powering inference across the portfolio

/// how it works

Three-tier routing. Zero code changes.

01metric 0 · ~10ms

EDGE CACHE

Identical request seen before? Served instantly from Cloudflare's edge — the prompt never touches a GPU. Toggle it per-account, set your own TTL.

02metric 1 · no token fees

YOUR HARDWARE

A one-line Docker command turns any machine with a GPU into a node. It opens a secure Cloudflare Tunnel — no port forwarding, no static IP — and serves llama.cpp behind your gateway.

03metric 2 · always up

CLOUD FAILOVER

Node offline? Timed out? The same request silently re-routes to the cloud — GPT, Gemini, Claude — through our managed gateway, billed only against your prepaid credits. If one provider has an outage, the gateway fails over to the next. Your app sees one response, every time.

/// node setup

One command.
Any machine. Any OS.

The installer finds Docker, Podman, or Colima — or sets one up for you — then starts an agent bundling llama.cpp and a Cloudflare Tunnel. Run it on a workstation, a Mac Studio, or a rack server in the office: it registers itself and starts serving traffic in about a minute.

node-01 · office
$ curl -fsSL https://wideareaai.com/install.sh \
    | sh -s -- --key wai_node_…

[wai] Platform: linux
[wai] Using container runtime: docker
[wai] NVIDIA GPU passthrough enabled
✓ tunnel established
✓ node registered · node-01 ONLINE

/// app integration

Keep your SDK.
Change one URL.

Anything that speaks the OpenAI API speaks Wide Area Intelligence — Python, TypeScript, LangChain, agents, curl. Point it at the gateway and routing, caching, and failover happen automatically.

app.py
from openai import OpenAI

client = OpenAI(
    base_url="https://wideareaai.com/api/v1",
    api_key="wai_sk_…",
)

resp = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[…],
)

/// pricing

Own your baseline. Burst to the cloud.

Your GPUs serve the baseline with no per-token fees — your costs are the plan and the power bill. Cloud failover is prepaid credits, billed per request only on the overflow. The plans below cover nodes, request volume, and team features.

STARTER

$0forever
  • 2 compute nodes
  • 100k requests / mo
  • Exact-match edge cache
  • Cloud failover (credits)
  • Community support
Start free
most popular

PRO

$15/ month
  • 3 nodes included
  • +$5 / mo per extra node
  • 1M requests / mo
  • Request analytics
  • Budget alerts
  • Email support
Go Pro

TEAM

$49/ month
  • 10 nodes included
  • +$5 / mo per extra node
  • 5M requests / mo
  • Virtual keys w/ budgets
  • Priority support
Scale up

/// initialize

Your hardware is already paid for.
Put it on the network.

Create your gateway →