ollama · lm studio · llama.cpp
Turn your GPU into an OpenAI-compatible endpoint.
You already run models locally. Wide Area Intelligence puts that hardware behind one OpenAI-compatible endpoint — reachable from anywhere over a Cloudflare Tunnel, edge-cached for repeat requests, and backed by cloud burst only when your node can't serve.
/// from a local-only server to a real endpoint
Reachable anywhere
A llama.cpp or Ollama server bound to localhost is reachable only from the machine it runs on. The agent opens a secure Cloudflare Tunnel so your apps, CI, and teammates can reach it, with no port forwarding, no static IP, and no exposed ports.
One stable URL
Quick Tunnel URLs rotate on restart; your gateway URL never does. Apps point at wideareaai.com/api/v1 once and the gateway tracks which node is online behind it.
Never a hard down
When the box is asleep, mid-update, or maxed out, requests fail over to the cloud instead of erroring. Your endpoint stays up even when your GPU doesn't.
/// how the routing works
Edge cache → your hardware → cloud. One endpoint in front of all three.
EDGE CACHE
An identical request you've served before is returned straight from Cloudflare's edge — the prompt never reaches a GPU. Per-account toggle, your own TTL.
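What counts as an "identical request" is not spelled out here. As a rough sketch, one plausible approach is to hash a canonical form of the request body and scope the key to the account. The function name `cache_key` and the `account_id` parameter are illustrative assumptions, not the gateway's actual scheme.

```python
import hashlib
import json


def cache_key(account_id: str, body: dict) -> str:
    """Derive a deterministic cache key for a request body (hypothetical scheme)."""
    # Canonical JSON: sorted keys and fixed separators, so two requests that
    # differ only in key order or whitespace produce the same hash.
    canonical = json.dumps(
        body, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # Assumption: keys are scoped per account, since caching is a per-account toggle.
    return f"{account_id}:{digest}"
```

Under this sketch, any change to the model, the messages, or a sampling parameter yields a different key, so only a truly repeated request would be served from the cache.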
YOUR HARDWARE
A one-line install turns any machine with a GPU into a node. The agent opens a secure Cloudflare Tunnel (no port forwarding, no static IP) and serves llama.cpp behind your gateway with no per-token fees.
CLOUD FAILOVER
Node busy, offline, or timed out? The same request silently re-routes to the cloud through our managed gateway, billed only against prepaid credits — burst only, never the baseline.
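The three tiers above can be read as a simple fallback chain. The sketch below is only an illustration of that order, not the gateway's implementation: `route`, `NodeUnavailable`, and the callables passed in are hypothetical names, and storing every served response in the cache is an assumption.

```python
from typing import Callable, Dict, Tuple


class NodeUnavailable(Exception):
    """Hypothetical signal that the node is busy, offline, or timed out."""


def route(
    key: str,
    cache: Dict[str, str],
    call_node: Callable[[], str],
    call_cloud: Callable[[], str],
) -> Tuple[str, str]:
    """Return (source, response), trying cache, then node, then cloud."""
    # 1. Edge cache: a previously served identical request never reaches a GPU.
    if key in cache:
        return "cache", cache[key]
    try:
        # 2. Your hardware: the node behind the tunnel serves the request.
        source, response = "node", call_node()
    except NodeUnavailable:
        # 3. Cloud failover: used only when the node cannot serve.
        source, response = "cloud", call_cloud()
    # Assumption: whatever was served is stored for future identical requests.
    cache[key] = response
    return source, response
```

The caller sees one response either way; only the `source` label reveals which tier answered.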
/// drop-in integration
Point your OpenAI SDK at your own GPU.
Anything that already speaks the OpenAI API speaks Wide Area Intelligence. Swap the base URL for the gateway and your existing code now runs on the hardware under your desk — falling back to the cloud only when it has to.
- Change one line: the base URL.
- Works with the OpenAI SDK, LangChain, agents, curl — anything OpenAI-compatible.
- Bring your own gateway key; routing, caching, and failover are automatic.
from openai import OpenAI

client = OpenAI(
    base_url="https://wideareaai.com/api/v1",
    api_key="wai_sk_…",
)

resp = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[…],
)
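For clients that do not use the SDK, the same call can be sketched as a plain HTTP request. This assumes the gateway follows the usual OpenAI conventions: a `POST` to `/chat/completions` under the base URL with a `Bearer` key. The helper name `build_chat_request` is illustrative, and the request is only built here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://wideareaai.com/api/v1"


def build_chat_request(
    api_key: str, model: str, messages: list
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completions request."""
    payload = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        # Assumption: the standard OpenAI path is appended to the base URL.
        url=f"{BASE_URL}/chat/completions",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; whether the response came from the cache, your node, or the cloud is decided by the gateway, not the client.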
/// initialize
Your model is already loaded. Give it an endpoint.
no credit card · 2 nodes free, forever · openai-compatible