ollama · lm studio · llama.cpp

Turn your GPU into an OpenAI-compatible endpoint.

You already run models locally. Wide Area Intelligence puts that hardware behind one OpenAI-compatible endpoint — reachable from anywhere over a Cloudflare Tunnel, edge-cached for repeat requests, and backed by cloud burst only when your node can't serve.

Expose my GPU — free

[ no port forwarding ][ openai-compatible ][ 2 nodes free ]

/// from a local-only server to a real endpoint

Reachable anywhere

llama.cpp / Ollama bound to localhost only works on your LAN. The agent opens a secure Cloudflare Tunnel so your apps, CI, and teammates can reach it — no port forwarding, no static IP, no exposed ports.

One stable URL

Quick Tunnel URLs rotate on restart; your gateway URL never does. Apps point at wideareaai.com/api/v1 once and the gateway tracks which node is online behind it.

Never a hard down

When the box is asleep, mid-update, or maxed out, requests fail over to the cloud instead of erroring. Your endpoint stays up even when your GPU doesn't.

/// how the routing works

Edge cache → your hardware → cloud. One endpoint in front of all three.

01metric 0 · ~10ms

EDGE CACHE

An identical request you've served before is returned straight from Cloudflare's edge — the prompt never reaches a GPU. Per-account toggle, your own TTL.

02metric 1 · no token fees

YOUR HARDWARE

A one-line install turns any machine with a GPU into a node. It opens a secure Cloudflare Tunnel — no port forwarding, no static IP — and serves llama.cpp behind your gateway with no per-token fees.

03metric 2 · always up

CLOUD FAILOVER

Node busy, offline, or timed out? The same request silently re-routes to the cloud through our managed gateway, billed only against prepaid credits — burst only, never the baseline.

/// drop-in integration

Point your OpenAI SDK
at your own GPU.

Anything that already speaks the OpenAI API speaks Wide Area Intelligence. Swap the base URL for the gateway and your existing code now runs on the hardware under your desk — falling back to the cloud only when it has to.

Change one line: the base URL.
Works with the OpenAI SDK, LangChain, agents, curl — anything OpenAI-compatible.
Bring your own gateway key; routing, caching, and failover are automatic.

app.py

from openai import OpenAI

client = OpenAI(
    base_url="https://wideareaai.com/api/v1",
    api_key="wai_sk_…",
)

resp = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[…],
)

/// initialize

Your model is already loaded. Give it an endpoint.