Docs

How Wide Area Intelligence works, and how to run it.

Plain-English definitions and pronunciation guides for local-AI terms like llama.cpp, Ollama, LM Studio, GGUF, LoRA, RAG, and KV cache.

[ coding agents ] Claude Code →

Run Claude Code through WAI's Anthropic Messages-compatible gateway: local GPU routing, Claude-shaped aliases, model discovery, token counting, and cloud failover.

[ agents ] Agent Mode API →

Run reusable agent workflows with external-user isolation, persistent local browser profiles, login handoff, schedules, and exports.

[ how it works ] Routing & failover →

How the gateway decides where a request runs: edge cache first, then your own GPUs, then capability-aware cloud failover. Also covers substitution mode, vision models, and cross-provider backups.

[ patterns ] Recipes: multi-model pipelines →

Use a fast, cheap model for some steps and a strong one for others (classify-then-generate, extract-then-write), all over one endpoint. Run the cheap step on your own GPUs for free.

[ media ] Video generation →

Run text-to-video generation on your own GPU (Wan 2.2, LTX-2.3): deploy a video model, smoke-test it with wai test-video, and use it from Chat, the playground, and the node API.

[ node ops ] wai CLI reference →

Every command for the node CLI: status, logs, start/stop/restart, update, models-dir, uninstall, plus config and auto-update.