Run the oh-my-pi coding agent on your own GPU with Wide Area Intelligence

Point can1357's oh-my-pi (omp) terminal agent at hardware you own: install it, declare a WideAreaAI provider in models.yml, set it as the default model, and code against your own Gemma node with zero token costs.

oh-my-pi (the binary is omp) is an open-source AI coding agent for the terminal — hash-anchored edits, LSP integration, in-process search and shell, browser automation, and subagents, with a Rust core for the hot paths. Like the other good agents, it isn't welded to one vendor: it speaks openai-completions, anthropic-messages, google-generative-ai and more, and lets you declare your own providers.

That's the hook. Declare a Wide Area Intelligence provider and omp drives a model running on hardware you own: zero per-token costs, your code never leaving your machines, and no rate limits beyond what your GPU can physically generate. This guide wires omp to a node serving Google's gemma-4-12b-qat (a QAT-quantized 12B that fits a single 12–16GB GPU) end to end.

What you need

A Wide Area Intelligence account

It's free for up to 2 nodes — sign in with Google.

A node online and READY

A GPU machine running the one-line installer, with a 12B-class model deployed. A 12GB+ GPU or a 16GB+ Apple Silicon Mac comfortably runs a QAT 12B. (New here? The qwen-code-with-wideareaai guide walks through bringing a node online and deploying a model.)

A gateway API key

API Keys → Create a key. Copy the wai_sk_… value; it's shown only once. One key per tool keeps your logs tidy and makes each key easy to revoke.

Step 1 — Install omp

install

# macOS / Linux — Homebrew (recommended on a Mac)
brew install can1357/tap/omp

# …or the universal install script
curl -fsSL https://omp.sh/install | sh

# …or with Bun (cross-platform)
bun install -g @oh-my-pi/pi-coding-agent

# Confirm it's on your PATH
omp --version

On Windows, irm https://omp.sh/install.ps1 | iex does the same job. Either way you end up with an omp binary and a config directory at ~/.omp/agent.

Step 2 — Declare WideAreaAI as a provider

omp loads custom providers from ~/.omp/agent/models.yml. Create that file with a single wideareaai provider whose baseUrl is the gateway's OpenAI-compatible endpoint and whose api is openai-completions:

~/.omp/agent/models.yml

# ~/.omp/agent/models.yml
providers:
  wideareaai:
    baseUrl: https://wideareaai.com/api/v1
    # apiKey resolution order: a "!command" is run for its stdout, else the
    # value is looked up as an env-var name, else it's used as a literal key.
    apiKey: WIDEAREAAI_API_KEY
    api: openai-completions
    models:
      - id: google/gemma-4-12b-qat
        name: WideAreaAI Gemma 12B QAT
        reasoning: true
        input: [text]
        contextWindow: 32768
        maxTokens: 8192
        cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 }

A few fields earn their place. id must match exactly what your node advertises (more on that in Step 3). reasoning: true tells omp this model thinks before it answers — the WideAreaAI Gemma QAT build streams a reasoning channel and accepts effort levels (minimal/low/medium/high/xhigh). And contextWindow is the ceiling omp uses for context management — set it to match your node, not higher (see the callout in Step 5).

How apiKey resolves, in order: a value starting with ! is run as a shell command and its stdout is used; otherwise the value is looked up as an environment-variable name; and only if no such variable exists is it treated as a literal key. So apiKey: wai_sk_… works inline, but the tidier move is the next step.

Rather than commit a secret to a YAML file, point apiKey at an environment variable and drop the real key in omp's agent .env (it's read automatically):

keep the key out of git

# Keep the secret out of the YAML — drop it in the agent .env instead.
# models.yml's "apiKey: WIDEAREAAI_API_KEY" then resolves to this value.
echo 'WIDEAREAAI_API_KEY=wai_sk_...' >> ~/.omp/agent/.env
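
The apiKey lookup order described earlier in this step can be sketched as a small shell function. This only illustrates the documented order and is not omp's actual implementation; resolve_api_key is a hypothetical helper name, and the key values are placeholders.

```shell
#!/usr/bin/env bash
# Sketch of the apiKey resolution order described in this guide.
# resolve_api_key is a hypothetical helper, not part of omp.
resolve_api_key() {
  local value="$1"
  case "$value" in
    '!'*)
      # Leading "!": run the remainder as a shell command, use its stdout.
      sh -c "${value#?}"
      ;;
    *)
      if printenv "$value" >/dev/null 2>&1; then
        # An exported variable with this name exists: use its value.
        printenv "$value"
      else
        # Otherwise treat the value itself as the literal key.
        printf '%s\n' "$value"
      fi
      ;;
  esac
}

export WIDEAREAAI_API_KEY='wai_sk_from_env'      # placeholder value
resolve_api_key 'WIDEAREAAI_API_KEY'             # prints wai_sk_from_env
resolve_api_key '!echo wai_sk_from_command'      # prints wai_sk_from_command
resolve_api_key 'wai_sk_literal'                 # prints wai_sk_literal
```

The third call shows why an inline key works: no environment variable carries that name, so the value falls through to the literal case.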

Step 3 — Match the model id your node serves

The id in models.yml has to be the model id the gateway actually exposes, because that's how requests route to the node holding it. Ask the gateway directly:

list your models

# The exact ids your nodes serve right now, straight from the gateway
curl https://wideareaai.com/api/v1/models \
  -H "Authorization: Bearer wai_sk_..."

Each entry's id is your models.yml id, and owned_by tells you which node is serving it (e.g. wai-node:m3max). Once the file is saved, confirm omp picked the models up:

confirm omp sees them

# omp merges your models.yml with its built-in catalog.
# Filter to confirm the WideAreaAI models are registered:
omp --list-models gemma

You'll see them listed under the wideareaai provider as wideareaai/google/gemma-4-12b-qat — that fully-qualified name is what you select next.
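
If the raw JSON from the gateway is hard to scan, the ids can be pulled out with standard tools. The sketch below assumes the response is the usual OpenAI-style list with each model's id in an "id" field. extract_model_ids is a hypothetical helper, and the sample response is made up for illustration. It is a plain-text match rather than a real JSON parser, so a tool such as jq would be the sturdier choice if you have it installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a /models response on stdin, print one id per line.
# Assumes an OpenAI-style payload where each model object has an "id" field.
extract_model_ids() {
  grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# Made-up sample response, shaped like the fields this guide mentions.
sample='{"data":[{"id":"google/gemma-4-12b-qat","owned_by":"wai-node:m3max"}]}'
printf '%s\n' "$sample" | extract_model_ids    # prints google/gemma-4-12b-qat

# Against the live gateway (needs a real key in WIDEAREAAI_API_KEY):
# curl -s https://wideareaai.com/api/v1/models \
#   -H "Authorization: Bearer $WIDEAREAAI_API_KEY" | extract_model_ids
```

Copy an id from that output into models.yml exactly as printed, including the google/ prefix.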

Step 4 — Make it the default model

omp resolves which model to use from the modelRoles record (the default role is your main model; smol and slow are the fast and reasoning roles). Because it's a record, set it as JSON rather than a dotted key:

set the default model

# Make WideAreaAI's Gemma the default model for every session.
# modelRoles is a record, so pass it as JSON (dot-keys aren't accepted).
omp config set modelRoles '{"default":"wideareaai/google/gemma-4-12b-qat"}'

# Check it stuck:
omp config get modelRoles

Prefer not to touch global config? Skip this step and pass --model gemma-4-12b-qat per run — omp fuzzy-matches, so you rarely type the full id.

Step 5 — Code

run it

# Interactive TUI in any project — uses the default model you just set
cd ~/code/my-project
omp

# One-shot, non-interactive (great for scripts and CI)
omp -p "Summarize what this repository does"

# Or choose a model ad hoc — fuzzy matching, no full id needed
omp --model gemma-4-12b-qat "Add input validation to the signup handler"

Every request routes through Wide Area Intelligence to your node. Watch them land in real time on the dashboard's Overview page; Analytics breaks down tokens and generation speed per model and per node.

Set the context window on the node, not just in YAML. A node's llama-server defaults to a small 4,096-token window, which is fine for chat but fatal for an agent that stuffs system prompt, files, diffs, and history into context. Open Nodes → your node, set the context window to 32768, and save. Keep contextWindow in models.yml at or below that number; the server-side window is the hard ceiling.
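
The arithmetic behind that advice is worth spelling out. The context window has to hold the prompt and the reply together, so the room left for system prompt, files, and history is roughly the window minus the reply budget. The sketch below uses the contextWindow and maxTokens values from the models.yml in Step 2. node_window stands in for whatever you set on the node, and the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Values from the models.yml in this guide; node_window is whatever you
# configured on the node (32768 in this walkthrough).
context_window=32768
max_tokens=8192
node_window=32768

# Tokens left for the prompt if the model uses its full reply budget.
prompt_budget=$((context_window - max_tokens))
echo "prompt budget: $prompt_budget tokens"    # prints: prompt budget: 24576 tokens

# The YAML value should never exceed the server-side window.
if [ "$context_window" -le "$node_window" ]; then
  echo "ok: contextWindow fits the node's window"
else
  echo "warning: contextWindow exceeds the node's window" >&2
fi
```

Plug in the 4,096-token default and the budget goes negative, because an 8,192-token reply alone would not fit the window.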

How requests pick a node

If you have more than one ready node serving the model, the gateway load-balances across them: it prefers the least-loaded node and, among comparably loaded ones, the faster node (it tracks each node's tokens/sec). If a node drops mid-session, requests can fail over to a cloud model on prepaid credits. Want everything pinned to one machine instead? omp lets you attach headers per provider: add X-WAI-Node: your-node-name to route to exactly one node with no fallback.

Two-model setup (optional)

Running a second node with a different model? Add it as another entry in the same models.yml models: list, then wire it into a role. A common split is a fast model for the smol role (titles, quick edits) and the bigger reasoning model as default:

role	used for	good fit
default	main agent loop, edits, reasoning	gemma-4-12b-qat (reasoning)
smol	titles, classification, lightweight calls	a smaller/faster node model
slow	deep analysis when you invoke it	your largest available model

Set them all at once: omp config set modelRoles '{"default":"wideareaai/google/gemma-4-12b-qat","smol":"wideareaai/amoral-gemma3-12b-v2-qat-q4_0"}'

Why self-host the agent's model

Coding agents are token furnaces — one focused afternoon can burn through millions of tokens of context re-reads, diffs, and retries. On a metered API that's real money; on your own GPU it's electricity you were already paying for. Pair that with the privacy of code that never leaves your hardware and a QAT 12B that holds its own on everyday edits, and pointing omp at your own node stops being the compromise option.

Create your gateway and put that GPU to work →