Failover that doesn't break your app: capability-aware routing

Swapping models on the fly — for cost or for resilience — quietly breaks any request that needs vision or tools, if the substitute can't do them. Here's how the gateway routes by what a request actually needs, and how to set a vision model and backups.

Two features make Wide Area Intelligence cheaper and more resilient, and both work by changing the model that serves a request. "Always use default models" swaps a cloud-pinned model for your own hardware. Cross-provider failover swaps a downed vendor for one that's up. Both are useful, until the swap lands on a model that can't do what the request needs.

The classic break is vision. An app uploads an image for captioning, sending it as an OpenAI image_url content part. If a model swap quietly routes that to a text-only model — a local model, or a cheaper text default — the image is dropped. The response comes back blank or with a hard error, and the app that worked yesterday is broken today. Nothing in the request was wrong; the routing was.

The fix: route by what the request needs

The gateway now reads each request before it picks a model. Does it carry an image? Does it define tools? Those are needs. When the gateway builds the cloud chain, the model you explicitly named always heads it untouched, because that was your deliberate choice, but every fallback the gateway picks for you is filtered to models that can actually meet those needs. A text-only model simply isn't offered an image.
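As a rough illustration of "reading the request", the sketch below derives a set of needs from an OpenAI-style chat body. The function name `request_needs` and the need labels `"vision"` and `"tools"` are assumptions made for this example; the gateway's actual detection logic is not shown in this post.

```python
from typing import Any


def request_needs(body: dict[str, Any]) -> set[str]:
    """Return the capabilities a chat request requires (hypothetical helper)."""
    needs: set[str] = set()

    # A non-empty "tools" list means the serving model must support tool calls.
    if body.get("tools"):
        needs.add("tools")

    # Images arrive as content parts of type "image_url" inside a message
    # whose content is a list rather than a plain string.
    for message in body.get("messages", []):
        content = message.get("content")
        if isinstance(content, list):
            for part in content:
                if isinstance(part, dict) and part.get("type") == "image_url":
                    needs.add("vision")

    return needs
```

A plain text request yields an empty set, so nothing is filtered for it; a captioning request with an image part yields `{"vision"}`.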

Vision request: resolved chain, capability-filtered

image_url: the request carries an image, so it needs a model that can see

#	role	model	capabilities	verdict
01	caller	openai/gpt-4o-mini	vision, tools	✓ head — never filtered
02	vision default	google/gemini-2.5-flash	vision, tools	✓ leads image requests
03	vision backup	anthropic/claude-sonnet	vision, tools	✓ if Google is down
04	text default	google/gemma-text-only	tools	✗ pruned — can't see
05	platform	openai/gpt-4o-mini	vision, tools	✓ always backstops

The text-only default is dropped for this request, because sending the image there would return blanks.

The filter is per-request, not per-account. The same text-only model that gets dropped for an image request stays in the chain for a plain text one. And because a node reports the model it has loaded but no capability signal, image requests skip local nodes entirely and go straight to a vision-capable cloud model — the gateway won't gamble an image on a node it can't vet.
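The per-request filter can be sketched in a few lines. The `Candidate` type and `filter_chain` function are hypothetical names for this example, and the chain data simply mirrors the table above; this is one way the rule could look, not the gateway's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    role: str                      # e.g. "caller", "vision default", "platform"
    model: str
    capabilities: frozenset[str]


def filter_chain(chain: list[Candidate], needs: set[str]) -> list[Candidate]:
    """Keep the caller's named model untouched; keep any other entry only
    if its capabilities cover every need of this request."""
    return [c for c in chain if c.role == "caller" or needs <= c.capabilities]


SEES = frozenset({"vision", "tools"})
CHAIN = [
    Candidate("caller", "openai/gpt-4o-mini", SEES),
    Candidate("vision default", "google/gemini-2.5-flash", SEES),
    Candidate("vision backup", "anthropic/claude-sonnet", SEES),
    Candidate("text default", "google/gemma-text-only", frozenset({"tools"})),
    Candidate("platform", "openai/gpt-4o-mini", SEES),
]

image_chain = filter_chain(CHAIN, {"vision"})  # text default is pruned
text_chain = filter_chain(CHAIN, set())        # nothing is pruned
```

Because the needs are computed per request, the same `CHAIN` produces a different result for an image request than for a plain text one.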

Give vision its own model

If your cloud default can see — many do — you're already covered. But the common local-first setup is a small, fast, text-only model on your own GPUs, with cloud only as overflow. In that setup, image requests have nowhere good to land. So vision gets its own slot:

Set a default vision model

In Settings → default models, pick a vision-capable model (the picker only lists models that can see). Image requests route here; everything else stays on your text default.

Add a backup for each chain

One backup model for the text chain, one for the vision chain. They're tried after the primary when its provider is down — so a Google outage falls over to OpenAI instead of erroring.

Turn on cross-provider failover

The backups only engage with this on. Off, each chain is just its primary model; on, the gateway walks the whole chain on a vendor outage. You're billed for whichever model actually served.

What the chain looks like

For an image request, the resolved order is: your named model (if any) → vision default → vision backup → text default and its backup → the platform default — with every text-only entry filtered out along the way. For a plain text request it's simply your text default → backup → platform default. Either way, the platform default — which can see — always backstops the chain, so an image request can never run out of capable options.
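The ordering described above can be written down as a small function. `RoutingSettings` and `resolve_order` are hypothetical names, and this sketch covers ordering only; the capability filter would then run over the result. It also assumes a named model heads the plain text chain too, following the earlier statement that an explicitly named model always heads the chain.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoutingSettings:
    text_default: str
    platform_default: str
    text_backup: Optional[str] = None
    vision_default: Optional[str] = None
    vision_backup: Optional[str] = None


def resolve_order(
    settings: RoutingSettings, named_model: Optional[str], is_image: bool
) -> list[str]:
    """Return candidate models in the order they would be tried."""
    if is_image:
        slots = [
            named_model,
            settings.vision_default,
            settings.vision_backup,
            settings.text_default,
            settings.text_backup,
            settings.platform_default,
        ]
    else:
        slots = [
            named_model,
            settings.text_default,
            settings.text_backup,
            settings.platform_default,
        ]
    # Unset slots (no named model, no backup configured) are skipped.
    return [model for model in slots if model is not None]
```

The platform default is a required field here, which reflects the claim that it always sits at the end of the chain.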

Outages vs. real errors

Cross-provider failover only advances the chain on an availability failure — a 5xx, a rate-limit, a timeout. A deterministic client error, like a malformed request or a content-policy refusal, never switches models: it would fail identically on any vendor, so the gateway surfaces it instead of burning your credits trying the same thing three more times.
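A minimal sketch of that rule, assuming failures surface as an HTTP status code and a timeout is represented by a status of `None`. `ProviderError`, `is_availability_failure`, and `call_with_failover` are hypothetical names for this example, not part of any published API.

```python
from typing import Callable, Optional


class ProviderError(Exception):
    """Raised by a provider call; status is None for a timeout."""

    def __init__(self, status: Optional[int], message: str = "") -> None:
        super().__init__(message or f"provider error (status={status})")
        self.status = status


def is_availability_failure(status: Optional[int]) -> bool:
    # Timeout, rate limit, or any 5xx: another vendor might succeed.
    return status is None or status == 429 or 500 <= status <= 599


def call_with_failover(
    chain: list[str], send: Callable[[str], str]
) -> tuple[str, str]:
    """Try each model in order; return (model that served, response)."""
    last_error: Optional[ProviderError] = None
    for model in chain:
        try:
            return model, send(model)
        except ProviderError as err:
            if not is_availability_failure(err.status):
                # Deterministic client error: surface it, do not retry elsewhere.
                raise
            last_error = err
    if last_error is None:
        raise ValueError("chain is empty")
    raise last_error
```

Returning the model alongside the response keeps track of which model actually served the request, which is the one that would be billed.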

Set it once: a vision model for images, a backup per chain, cross-provider failover on. After that, cheaper routing and vendor outages are invisible to your app — the request always reaches a model that can serve it.

The full set of rules lives in the routing & failover reference. Or open your settings and give vision its own model.


That GPU is already paid for. Put it on the network.
