oh-my-pi (the binary is omp) is an open-source AI coding agent for the terminal — hash-anchored edits, LSP integration, in-process search and shell, browser automation, and subagents, with a Rust core for the hot paths. Like the other good agents, it isn't welded to one vendor: it speaks openai-completions, anthropic-messages, google-generative-ai and more, and lets you declare your own providers.
That's the hook. Declare a Wide Area Intelligence provider and omp drives a model running on hardware you own: zero per-token costs, your code never leaving your machines, and no rate limits beyond what your GPU can physically generate. This guide wires omp to a node serving Google's gemma-4-12b-qat (a QAT-quantized 12B that fits a single 12–16GB GPU) end to end.
What you need
A Wide Area Intelligence account
A node online and READY
(The qwen-code-with-wideareaai guide walks through bringing a node online and deploying a model.)
A gateway API key
Copy the wai_sk_… value when the key is created; it's shown once. One key per tool keeps your logs tidy and revocable.
Step 1 — Install omp
# macOS / Linux: Homebrew (recommended on a Mac)
brew install can1357/tap/omp
# …or the universal install script
curl -fsSL https://omp.sh/install | sh
# …or with Bun (cross-platform)
bun install -g @oh-my-pi/pi-coding-agent
# Confirm it's on your PATH
omp --version
On Windows, irm https://omp.sh/install.ps1 | iex does the same job. Either way you end up with an omp binary and a config directory at ~/.omp/agent.
Step 2 — Declare WideAreaAI as a provider
omp loads custom providers from ~/.omp/agent/models.yml. Create that file with a single wideareaai provider whose baseUrl is the gateway's OpenAI-compatible endpoint and whose api is openai-completions:
# ~/.omp/agent/models.yml
providers:
  wideareaai:
    baseUrl: https://wideareaai.com/api/v1
    # apiKey resolution order: a "!command" is run for its stdout, else the
    # value is looked up as an env-var name, else it's used as a literal key.
    apiKey: WIDEAREAAI_API_KEY
    api: openai-completions
    models:
      - id: google/gemma-4-12b-qat
        name: WideAreaAI Gemma 12B QAT
        reasoning: true
        input: [text]
        contextWindow: 32768
        maxTokens: 8192
        cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 }
A few fields earn their place. id must match exactly what your node advertises (more on that in Step 3). reasoning: true tells omp this model thinks before it answers: the WideAreaAI Gemma QAT build streams a reasoning channel and accepts effort levels (minimal/low/medium/high/xhigh). And contextWindow is the ceiling omp uses for context management, so set it to match your node, not higher (see the callout in Step 5).
How apiKey resolves, in order: a value starting with ! is run as a shell command and its stdout is used; otherwise the value is looked up as an environment-variable name; and only if no such variable exists is it treated as a literal key. So apiKey: wai_sk_… works inline, but the tidier move is the next step.
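The three-way resolution order can be sketched as a small shell function. This is an illustration of the rule as described above, not omp's actual code: resolve_api_key is a hypothetical helper, and it assumes the "!" command is handed to sh.

```shell
# Hypothetical helper mirroring the documented apiKey resolution order.
# Not omp's implementation; it only illustrates the three cases.
resolve_api_key() {
  value=$1
  case $value in
    '!'*)
      # Case 1: a leading "!" means run the rest as a command, use its stdout
      sh -c "${value#?}"
      ;;
    *)
      if printenv "$value" >/dev/null 2>&1; then
        # Case 2: the value names an existing environment variable
        printenv "$value"
      else
        # Case 3: no such variable, so the value itself is the key
        printf '%s\n' "$value"
      fi
      ;;
  esac
}

export WIDEAREAAI_API_KEY=wai_sk_example
resolve_api_key WIDEAREAAI_API_KEY      # env-var lookup
resolve_api_key '!echo from-command'    # command stdout
resolve_api_key wai_sk_literal          # literal fallback
```

The ordering matters: a literal key is only ever used when no environment variable of the same name exists, which is why the env-var form is the safer default.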
Rather than commit a secret to a YAML file, point apiKey at an environment variable and drop the real key in omp's agent .env (it's read automatically):
# Keep the secret out of the YAML: drop it in the agent .env instead.
# models.yml's "apiKey: WIDEAREAAI_API_KEY" then resolves to this value.
echo 'WIDEAREAAI_API_KEY=wai_sk_...' >> ~/.omp/agent/.env
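Appending with >> adds a second line every time the key is rotated. A minimal sketch of an idempotent alternative follows; set_env_key is a hypothetical helper (not part of omp), and the only assumption about the .env format is the NAME=value line shown above.

```shell
# Hypothetical helper: write NAME=value into an env file exactly once
# and keep the file readable by the current user only.
set_env_key() {
  file=$1
  name=$2
  value=$3
  mkdir -p "$(dirname "$file")"
  touch "$file"
  chmod 600 "$file"
  tmp=$(mktemp)
  # Drop any previous definition of NAME (grep exits 1 when nothing is left)
  grep -v "^${name}=" "$file" > "$tmp" || true
  # Append the new definition, then write back in place to keep permissions
  printf '%s=%s\n' "$name" "$value" >> "$tmp"
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Usage (left commented out so nothing is written by accident):
# set_env_key "$HOME/.omp/agent/.env" WIDEAREAAI_API_KEY 'wai_sk_...'
```

Running it twice with different values leaves a single line holding the latest key.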
Step 3 — Match the model id your node serves
The id in models.yml has to be the model id the gateway actually exposes, because that's how requests route to the node holding it. Ask the gateway directly:
# The exact ids your nodes serve right now, straight from the gateway
curl https://wideareaai.com/api/v1/models \
  -H "Authorization: Bearer wai_sk_..."
Each entry's id is your models.yml id, and owned_by tells you which node is serving it (e.g. wai-node:m3max). Once the file is saved, confirm omp picked the models up:
# omp merges your models.yml with its built-in catalog.
# Filter to confirm the WideAreaAI models are registered:
omp --list-models gemma
You'll see them listed under the wideareaai provider as wideareaai/google/gemma-4-12b-qat — that fully-qualified name is what you select next.
Step 4 — Make it the default model
omp resolves which model to use from the modelRoles record (the default role is your main model; smol and slow are the fast and reasoning roles). Because it's a record, set it as JSON rather than a dotted key:
# Make WideAreaAI's Gemma the default model for every session.
# modelRoles is a record, so pass it as JSON (dot-keys aren't accepted).
omp config set modelRoles '{"default":"wideareaai/google/gemma-4-12b-qat"}'
# Check it stuck:
omp config get modelRoles
Prefer not to touch global config? Skip this step and pass --model gemma-4-12b-qat per run; omp fuzzy-matches, so you rarely type the full id.
Step 5 — Code
# Interactive TUI in any project: uses the default model you just set
cd ~/code/my-project
omp
# One-shot, non-interactive (great for scripts and CI)
omp -p "Summarize what this repository does"
# Or choose a model ad hoc: fuzzy matching, no full id needed
omp --model gemma-4-12b-qat "Add input validation to the signup handler"
Every request routes through Wide Area Intelligence to your node. Watch them land in real time on the dashboard's Overview page; Analytics breaks down tokens and generation speed per model and per node.
Set the context window on the node, not just in YAML. A node's llama-server defaults to a small 4,096-token window: fine for chat, fatal for an agent that stuffs system prompt, files, diffs, and history into context. Open Nodes → your node, set the context window to 32768, and save. Keep contextWindow in models.yml at or below that number, because the server-side window is the hard ceiling.
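The arithmetic behind that callout can be checked with a few lines of shell. This is a sketch under one assumption that the guide does not state: generated tokens share the same window as the prompt, so maxTokens is subtracted from contextWindow. context_budget is a hypothetical helper.

```shell
# Hypothetical check: is the YAML contextWindow within the node's window,
# and how many tokens remain for the prompt after reserving maxTokens?
context_budget() {
  node_window=$1   # context window configured on the node (server side)
  yaml_window=$2   # contextWindow from models.yml
  max_tokens=$3    # maxTokens from models.yml
  if [ "$yaml_window" -gt "$node_window" ]; then
    echo "contextWindow $yaml_window exceeds node window $node_window" >&2
    return 1
  fi
  # Assumption: output tokens are carved out of the same window as the prompt
  echo $((yaml_window - max_tokens))
}

# Values from this guide: 32768-token window, 8192 max output tokens
context_budget 32768 32768 8192   # prints 24576
# A node left at the 4,096 default fails the check
context_budget 4096 32768 8192 || echo "raise the node's context window first"
```

With the guide's numbers, roughly 24k tokens remain for system prompt, files, diffs, and history.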
How requests pick a node
If you have more than one ready node serving the model, the gateway load-balances across them: it prefers the least-loaded node and, among comparably-loaded ones, the faster node (it tracks each node's tokens/sec). If a node drops mid-session, requests can fail over to a cloud model on prepaid credits. Want everything pinned to one machine instead? omp lets you attach headers per provider: add X-WAI-Node: your-node-name to route to exactly one node with no fallback.
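The selection rule is easy to picture as a two-key sort. The sketch below is purely illustrative and is not the gateway's implementation: the input format (name, active requests, tokens/sec) and the node names are made up, and "comparably loaded" is simplified to "exactly equal load".

```shell
# Illustrative only: each input line is "name active_requests tokens_per_sec".
# Sort by load ascending, then tokens/sec descending, and take the first.
pick_node() {
  LC_ALL=C sort -t ' ' -k2,2n -k3,3nr | head -n 1 | cut -d ' ' -f 1
}

# Two idle nodes tie on load, so the faster one wins
printf '%s\n' \
  'm3max 2 41.5' \
  'rtx4090 0 88.0' \
  'rtx3060 0 35.2' | pick_node   # prints rtx4090
```

A real balancer would likely treat loads within some tolerance as equal rather than requiring an exact tie.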
Two-model setup (optional)
Running a second node with a different model? Add it as another entry in the same models.yml models: list, then wire it into a role. A common split is a fast model for the smol role (titles, quick edits) and the bigger reasoning model as default:
| role | used for | good fit |
|---|---|---|
| default | main agent loop, edits, reasoning | gemma-4-12b-qat (reasoning) |
| smol | titles, classification, lightweight calls | a smaller/faster node model |
| slow | deep analysis when you invoke it | your largest available model |
Set them all at once:
omp config set modelRoles '{"default":"wideareaai/google/gemma-4-12b-qat","smol":"wideareaai/amoral-gemma3-12b-v2-qat-q4_0"}'
Why self-host the agent's model
Coding agents are token furnaces — one focused afternoon can burn through millions of tokens of context re-reads, diffs, and retries. On a metered API that's real money; on your own GPU it's electricity you were already paying for. Pair that with the privacy of code that never leaves your hardware and a QAT 12B that holds its own on everyday edits, and pointing omp at your own node stops being the compromise option.