wide area network → wide area intelligence

Serve from
your hardware.
Fail over to the cloud.

Wide Area Intelligence is an OpenAI-compatible AI gateway that routes every request to your own hardware first — office workstations, on-prem servers, lab machines — and silently fails over to the cloud when it has to. It's an LLM gateway for the models you self-host: your apps never know the difference, and requests served by your own hardware carry no per-token fees.

Start routing — free See the route table

[ 2 nodes free ][ openai-compatible ][ no port forwarding ]

live route topology routing

dst	via	metric	state
inference/*	node-01.tunnel	0	UP
inference/*	openrouter.ai	100	STANDBY
cache/hit	edge.kv	—	38.2%

$0.00

per-token fees on your own hardware

<50ms

edge routing overhead

3 hops

cache → local → cloud

100%

openai sdk compatible

/// the explainer

The idea, in a couple of minutes.

What Wide Area Intelligence is, and why running inference on your own hardware first — cache, then your GPUs, then the cloud as plan B — costs less than sending every token to someone else's API.

explainer.mp4

A quick explainer — what it is and how hybrid routing works.

/// powering inference across the portfolio

Inventive HQcybersecurity Alert24incident mgmt GlitchReplayerror tracking PushMailtransactional email OmniCanvasspatial notes Planet Roadmapproduct mgmt Superpower Resumecareer AI CalBurndownfitness tracking

/// what brings you here

One gateway. Three reasons to run it.

cut the bill

Your AI spend is too high

Run the bulk of your traffic on hardware you own — no per-token fees — and burst to the cloud only for the hard part. OpEx becomes CapEx.

Explore →keep it private

Your data can't leave

Inference runs on machines you control and is never fed into Big AI training loops. Disable cloud failover and nothing leaves your network.

Explore →own the stack

You want to run your own AI

Any open-weight model, on the hardware you already have, pooled behind one endpoint — no vendor deciding what you can run.

Explore →

/// how it works

The AI gateway, in three tiers. Zero code changes.

01metric 0 · ~10ms

EDGE CACHE

Identical request seen before? Served instantly from Cloudflare's edge — the prompt never touches a GPU. Toggle it per-account, set your own TTL.

02metric 1 · no token fees

YOUR HARDWARE

A one-line Docker command turns any machine with a GPU into a node. It opens a secure Cloudflare Tunnel — no port forwarding, no static IP — and serves llama.cpp behind your gateway.

03metric 2 · always up

CLOUD FAILOVER

Node offline? Timed out? The same request silently re-routes to the cloud — GPT, Gemini, Claude — through our managed gateway, billed only against your prepaid credits. If one provider has an outage, the gateway fails over to the next. Your app sees one response, every time.

/// node setup

One command.
Any machine. Any OS.

The installer finds Docker, Podman, or Colima — or sets one up for you — then starts an agent bundling llama.cpp and a Cloudflare Tunnel. Run it on a workstation, a Mac Studio, or a rack server in the office: it registers itself and starts serving traffic in about a minute.

node-01 · office

$ curl -fsSL https://wideareaai.com/install.sh \
    | sh -s -- --key wai_node_…

[wai] Platform: linux
[wai] Using container runtime: docker
[wai] NVIDIA GPU passthrough enabled
✓ tunnel established
✓ node registered · node-01 ONLINE

/// app integration

Keep your SDK.
Change one URL.

Anything that speaks the OpenAI API speaks Wide Area Intelligence — Python, TypeScript, LangChain, agents, curl. Point it at the gateway and routing, caching, and failover happen automatically.

app.py

from openai import OpenAI

client = OpenAI(
    base_url="https://wideareaai.com/api/v1",
    api_key="wai_sk_…",
)

resp = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[…],
)

/// pricing

Own your baseline. Burst to the cloud.

Your GPUs serve the baseline with no per-token fees — your costs are the nodes and the power bill. The first two nodes are free; each node after that is $100/mo ($1000/yr). Team seats are unlimited. Cloud failover is prepaid credits, billed per request only on the overflow.

STARTER

$0forever

2 compute nodes — free
Unlimited team seats
2M requests / mo per node
Exact-match edge cache
Cloud failover (prepaid credits)

Start free

PER NODE

$100/ mo per node

Everything in Starter
Each node beyond the first 2
Or $1000 / yr per node (2 months free)
Auto-billed as you add & remove nodes
Cloud usage still metered on credits

Add a node

/// initialize

Your hardware is already paid for.
Put it on the network.

Create your gateway →

Serve fromyour hardware.Fail over to the cloud.