Two features make Wide Area Intelligence cheaper and more resilient, and both work by changing the model that serves a request. Always use default models swaps a cloud-pinned model for your own hardware. Cross-provider failoverswaps a downed vendor for one that's up. Useful — until the swap lands on a model that can't do what the request needs.
The classic break is vision. An app uploads an image for captioning, sending it as an OpenAI image_url content part. If a model swap quietly routes that to a text-only model — a local model, or a cheaper text default — the image is dropped. The response comes back blank or with a hard error, and the app that worked yesterday is broken today. Nothing in the request was wrong; the routing was.
The fix: route by what the request needs
The gateway now reads each request before it picks a model. Does it carry an image? Does it define tools? Those are needs. When the gateway builds the cloud chain, the model you explicitly named always heads it untouched — your deliberate choice — but every fallback the gateway picks for youis filtered to models that can actually meet those needs. A text-only model simply isn't offered an image.
| # | role | model | capabilities | verdict |
|---|---|---|---|---|
| 01 | caller | openai/gpt-4o-mini | vision, tools | ✓ head — never filtered |
| 02 | vision default | google/gemini-2.5-flash | vision, tools | ✓ leads image requests |
| 03 | vision backup | anthropic/claude-sonnet | vision, tools | ✓ if Google is down |
| 04 | text default | google/gemma-text-only | tools | ✗ pruned — can't see |
| 05 | platform | openai/gpt-4o-mini | vision, tools | ✓ always backstops |
The text-only default is dropped for this request — sending the image there would return blanks. For a plain text request it stays in the chain. The filter is per-request, not per-account.
The filter is per-request, not per-account. The same text-only model that gets dropped for an image request stays in the chain for a plain text one. And because a node reports the model it has loaded but no capability signal, image requests skip local nodes entirely and go straight to a vision-capable cloud model — the gateway won't gamble an image on a node it can't vet.
Give vision its own model
If your cloud default can see — many do — you're already covered. But the common local-first setup is a small, fast, text-only model on your own GPUs, with cloud only as overflow. In that setup, image requests have nowhere good to land. So vision gets its own slot:
Set a default vision model
Settings → default models, pick a vision-capable model (the picker only lists models that can see). Image requests route here; everything else stays on your text default.Add a backup for each chain
Turn on cross-provider failover
What the chain looks like
For an image request, the resolved order is: your named model (if any) → vision default → vision backup → text default and its backup → the platform default — with every text-only entry filtered out along the way. For a plain text request it's simply your text default → backup → platform default. Either way, the platform default — which can see — always backstops the chain, so an image request can never run out of capable options.
Outages vs. real errors
Cross-provider failover only advances the chain on an availability failure — a 5xx, a rate-limit, a timeout. A deterministic client error, like a malformed request or a content-policy refusal, never switches models: it would fail identically on any vendor, so the gateway surfaces it instead of burning your credits trying the same thing three more times.
Set it once: a vision model for images, a backup per chain, cross-provider failover on. After that, cheaper routing and vendor outages are invisible to your app — the request always reaches a model that can serve it.
The full set of rules lives in the routing & failover reference. Or open your settings and give vision its own model.