wide area network → wide area intelligence
Your hardware is
the datacenter.
The cloud is the failover.
Wide Area Intelligence is an OpenAI-compatible gateway that routes every request to your own hardware first — office workstations, on-prem servers, lab machines — and silently fails over to the cloud when it has to. Your apps never know the difference — and requests served by your own hardware carry no per-token fees.
| dst | via | metric | state |
|---|---|---|---|
| inference/* | node-01.tunnel | 0 | UP |
| inference/* | openrouter.ai | 100 | STANDBY |
| cache/hit | edge.kv | — | 38.2% |
/// powering inference across the portfolio
/// how it works
Three-tier routing. Zero code changes.
EDGE CACHE
Identical request seen before? Served instantly from Cloudflare's edge — the prompt never touches a GPU. Toggle it per-account, set your own TTL.
YOUR HARDWARE
A one-line Docker command turns any machine with a GPU into a node. It opens a secure Cloudflare Tunnel — no port forwarding, no static IP — and serves llama.cpp behind your gateway.
CLOUD FAILOVER
Node offline? Timed out? The same request silently re-routes to the cloud — GPT, Gemini, Claude — through our managed gateway, billed only against your prepaid credits. If one provider has an outage, the gateway fails over to the next. Your app sees one response, every time.
/// node setup
One command.
Any machine. Any OS.
The installer finds Docker, Podman, or Colima — or sets one up for you — then starts an agent bundling llama.cpp and a Cloudflare Tunnel. Run it on a workstation, a Mac Studio, or a rack server in the office: it registers itself and starts serving traffic in about a minute.
$ curl -fsSL https://wideareaai.com/install.sh \ | sh -s -- --key wai_node_… [wai] Platform: linux [wai] Using container runtime: docker [wai] NVIDIA GPU passthrough enabled ✓ tunnel established ✓ node registered · node-01 ONLINE
/// app integration
Keep your SDK.
Change one URL.
Anything that speaks the OpenAI API speaks Wide Area Intelligence — Python, TypeScript, LangChain, agents, curl. Point it at the gateway and routing, caching, and failover happen automatically.
from openai import OpenAI client = OpenAI( base_url="https://wideareaai.com/api/v1", api_key="wai_sk_…", ) resp = client.chat.completions.create( model="llama-3.1-8b-instruct", messages=[…], )
/// pricing
Own your baseline. Burst to the cloud.
Your GPUs serve the baseline with no per-token fees — your costs are the plan and the power bill. Cloud failover is prepaid credits, billed per request only on the overflow. The plans below cover nodes, request volume, and team features.
STARTER
- 2 compute nodes
- 100k requests / mo
- Exact-match edge cache
- Cloud failover (credits)
- Community support
PRO
- 3 nodes included
- +$5 / mo per extra node
- 1M requests / mo
- Request analytics
- Budget alerts
- Email support
TEAM
- 10 nodes included
- +$5 / mo per extra node
- 5M requests / mo
- Virtual keys w/ budgets
- Priority support