Aider is an AI pair programmer that lives in your terminal. You add files to the chat, describe what you want in plain English, and Aider edits the code in place — then commits each change to git with a sensible message. It builds a repo mapso it understands code it can't see, applies edits as real diffs (not copy-paste), and ties everything to your git history so every change is reviewable and reversible with /undo.
Aider speaks the OpenAI API, which means it doesn't care whether the model behind the endpoint lives at OpenAI, on OpenRouter, or on a GPU in your spare room. This guide does the last one: run a coder model on your own hardware through Wide Area Intelligence and pair-program with zero per-token costs, no rate limits, and your code never leaving machines you own.
Why run Aider on your own GPU
Aider is a token furnace, and that's by design. Every turn it re-sends the files in the chat, the repo map, the system prompt, and the running conversation so the model always has full context. A focused hour of pairing on a medium repo routinely moves through millions of input tokens — reads, diffs, retries when an edit doesn't apply cleanly. On a metered API that's a real line item; on hardware you own it's the electricity you were already paying for.
Two more reasons. Privacy: Aider sends your actual source — proprietary logic, secrets in config files, customer code — to whatever endpoint you point it at. Keeping that on your own node means it never touches a third party. No rate limits: no TPM ceilings, no 429s mid-refactor, no daily caps. The only limit is how fast your GPU generates tokens.
What you need
A Wide Area Intelligence account
A machine with a real GPU
Python 3.10+ on your dev machine
pipx is the cleanest way to install it.Step 1 — Bring a node online and deploy a coder model
In the dashboard, go to Nodes → Add a node, name it, and run the one-line installer on the GPU machine. It works on macOS (native Metal), Linux (Docker with NVIDIA passthrough), and Windows (native CUDA, no WSL). The node opens an outbound Cloudflare Tunnel — no port forwarding, no static IP, no firewall changes. Within a minute it shows CONNECTED.
Now open Models, search qwen coder, and deploy one to your node with one click — the catalog comes straight from Hugging Face's GGUF library and the dashboard only offers quantizations that actually fit your node's memory. For Aider, bigger is genuinely better: it relies on precise instruction-following so its diffs apply cleanly on the first try, and that's exactly where the larger models pull ahead. Treat the 14B as the floor for serious work and reach for the 32B if your hardware allows.
| your hardware | model for aider | download | notes |
|---|---|---|---|
| 12GB VRAM / 16GB Mac | Qwen2.5-Coder-7B-Instruct · Q4_K_M | ~4.7GB | usable, but diffs miss more |
| 16GB VRAM / 32GB Mac | Qwen2.5-Coder-14B-Instruct · Q4_K_M | ~9GB | the practical minimum |
| 24GB VRAM / 48GB Mac | Qwen2.5-Coder-32B-Instruct · Q4_K_M | ~20GB | the sweet spot for Aider |
| 2× 24GB / 96GB+ Mac | Qwen3-Coder-30B-A3B-Instruct · Q5_K_M | ~22GB | fast MoE, great quality |
Aider rates models on a public leaderboard by how often their edits apply without a retry. Smaller models score lower not because they write worse code but because they botch the diff format more often — which is why the 32B is worth the VRAM if you can spare it.
Step 2 — Raise the context window to 32k
This is the step that makes or breaks Aider on local hardware. A node's llama-server defaults to a 4,096-token context window— fine for chat, useless for an agent. Aider's repo map alone can run a few thousand tokens before you've added a single file, and once you /adda couple of source files plus the conversation, you'll blow past 4k immediately and start seeing truncated edits or "context length exceeded" errors.
The fix takes ten seconds. Open your node's detail page (Nodes → your node), find the context window setting, type 32768 (or 32k), and save. The node picks it up on its next heartbeat, restarts its inference server with the new window, and shows the applied value when it's done — under a minute, no SSH required.
The trade-off is memory: the KV cache grows linearly with context, on top of the model weights. Rough numbers for the 32B (Q4_K_M ≈ 20GB of weights):
| context window | extra (kv cache) | total ≈ | notes |
|---|---|---|---|
| 4k (default) | ~0.5GB | ~20.5GB | too small for Aider — skip |
| 32k | ~4GB | ~24GB | the target — fits 24GB with care / 48GB Mac |
| 64k | ~8GB | ~28GB | comfortable on a 48GB+ Mac |
| 128k (model max) | ~16GB | ~36GB | big repos — needs serious memory |
Rule of thumb: set 32768. It holds the repo map, several files, and a real conversation without truncation, and it lines up with the max_input_tokensyou'll declare to Aider in a moment. If the 32k KV cache crowds the 32B model off a 24GB card, either run the 14B at 32k or keep the 32B and drop to a smaller window — the node's detail page does the memory math for you.
Step 3 — Create a gateway key
Go to API Keys → Create a key, name it aider, and copy the wai_sk_…value — it's shown once. One key per tool keeps your request logs readable and lets you revoke access per-app later.
Step 4 — Install and configure Aider
# Recommended: pipx keeps Aider isolated from your project deps pipx install aider-chat # Or plain pip (use a venv) pip install aider-chat aider --version
Aider reads OPENAI_API_BASE and OPENAI_API_KEY for the endpoint, and takes the model on the command line. The openai/ prefix is the important bit: it tells Aider to talk to the endpoint with its generic OpenAI-compatible client rather than looking the name up against a known provider. And --no-show-model-warningssilences the "unknown model" metadata warning Aider prints for any model it doesn't recognize — which includes every local one.
# Point Aider at your Wide Area Intelligence gateway export OPENAI_API_BASE="https://wideareaai.com/api/v1" export OPENAI_API_KEY="wai_sk_..." # from API Keys # Run it inside a git repo. The openai/ prefix is what tells Aider # to use its OpenAI-compatible client instead of a named provider. cd ~/code/my-project aider --model openai/Qwen2.5-Coder-32B-Instruct-Q4_K_M --no-show-model-warnings
# Get the exact model string your node serves right now curl https://wideareaai.com/api/v1/models \ -H "Authorization: Bearer wai_sk_..."
The model string after openai/must match what your node serves — it's the .gguf filename without the extension, shown on your Nodes page and returned by the /models call above.
Typing those flags every time gets old. Aider reads a .aider.conf.ymlfrom your repo root (or home directory) automatically, so you can commit your team's setup and just run aider:
# .aider.conf.yml — drop this in your repo root (or ~/) so you can # just type 'aider' with no flags. Aider reads it automatically. openai-api-base: https://wideareaai.com/api/v1 model: openai/Qwen2.5-Coder-32B-Instruct-Q4_K_M show-model-warnings: false # Optional niceties for local models: edit-format: diff # ask for unified diffs, not whole-file rewrites auto-commits: true # Aider commits each change with a clear message map-tokens: 2048 # cap the repo map so it doesn't eat your context
One more file pays for itself. By default Aider has no cost or context data for your local model, so it warns on startup and — worse — guesses a conservative context size and trims your chat history too aggressively. Drop a .aider.model.metadata.json next to it to set the real window and tell Aider tokens are free:
// .aider.model.metadata.json — repo root. Tells Aider the real
// context window of your local model and that tokens are free, so it
// stops nagging and stops over-trimming the chat history.
{
"openai/Qwen2.5-Coder-32B-Instruct-Q4_K_M": {
"max_input_tokens": 32768,
"max_output_tokens": 8192,
"input_cost_per_token": 0,
"output_cost_per_token": 0
}
}Keep the max_input_tokensin the JSON in sync with the server-side context window you set in Step 2. The node's window is the hard ceiling; the JSON just tells Aider how much room it's allowed to use. Setting the JSON higher than the server only causes truncation errors.
Step 5 — A working session
Aider is driven by a handful of slash commands. The three you'll use constantly: /add puts files in the chat so Aider can edit them; /askasks a question without changing any code (great for "how does this work?" before you commit to a change); and /architect switches into two-step mode, where the model first plans the change in prose, then proposes the concrete edits for you to confirm — ideal for anything spanning multiple files.
# A real session, start to finish $ aider --no-show-model-warnings Aider v0.x — model: openai/Qwen2.5-Coder-32B-Instruct-Q4_K_M > /add src/auth/login.py src/auth/tokens.py Added src/auth/login.py and src/auth/tokens.py to the chat. > add a 15-minute expiry to access tokens and refresh them automatically Applied edit to src/auth/tokens.py Applied edit to src/auth/login.py Commit a1b2c3d feat: expire access tokens after 15m, auto-refresh > /ask why did you put the refresh check in the middleware? (answers without touching files) > /architect split tokens.py into issuance and validation modules (plans the refactor, then proposes concrete edits to confirm) > /undo Removed last commit.
Every one of those requests routes through Wide Area Intelligence to your node. You can watch them land in real time on the dashboard's Overview page, and the Analytics page breaks down tokens and generation speed per model and per node. Because Aider auto-commits, your git log doubles as an audit trail — and /undo rolls back the last change, commit and all, when the model gets it wrong.
Running more than one node? Add an X-WAI-Node: your-node-name header to pin Aider to one specific machine. In .aider.conf.yml that's extra-headers: { X-WAI-Node: your-node-name }. Without it, requests load-balance across all your ready nodes.
Where local models still fall short — and the answer
Honesty time: a 32B model on your GPU is not Claude or GPT-class for the hardest Aider tasks. The gap shows up in three places. Architect mode on sprawling changes — a refactor touching a dozen files with subtle cross-cutting dependencies — is where frontier models reason more reliably and local models lose the thread. Diff discipline under pressure: smaller models botch the edit format more often, so Aider retries, which costs you time even if it costs no money. And very large contexts: if a change genuinely needs 100k+ tokens of repo in view, a cloud model with a huge window will simply see more than your 32k node.
The pragmatic workflow is to use both, and Wide Area Intelligence is built for exactly that. Do the bulk of your pairing — the 90% that's feature work, tests, refactors within a file or two — on your own GPU for free. When you hit a genuinely hard architectural change, point Aider at a frontier model for that one session. You can configure cloud failover on prepaid credits so that if your node goes down mid-session, requests spill over to a cloud model instead of failing — no config change in Aider, the gateway handles it.
Add it up and the math is hard to argue with: open coder models now write genuinely good code, Aider is a great harness for them, and running it on your own GPU turns AI pair programming from a metered subscription into a fixed cost you already pay. Deploy a model, create a key, point Aider at the gateway — and keep the cloud in your back pocket for the days you actually need it.