← all posts
[ deep dive ]June 11, 20268 min read

Failover that doesn't break your app: capability-aware routing

Swapping models on the fly — for cost or for resilience — quietly breaks any request that needs vision or tools, if the substitute can't do them. Here's how the gateway routes by what a request actually needs, and how to set a vision model and backups.

Two features make Wide Area Intelligence cheaper and more resilient, and both work by changing the model that serves a request. Always use default models swaps a cloud-pinned model for your own hardware. Cross-provider failoverswaps a downed vendor for one that's up. Useful — until the swap lands on a model that can't do what the request needs.

The classic break is vision. An app uploads an image for captioning, sending it as an OpenAI image_url content part. If a model swap quietly routes that to a text-only model — a local model, or a cheaper text default — the image is dropped. The response comes back blank or with a hard error, and the app that worked yesterday is broken today. Nothing in the request was wrong; the routing was.

The fix: route by what the request needs

The gateway now reads each request before it picks a model. Does it carry an image? Does it define tools? Those are needs. When the gateway builds the cloud chain, the model you explicitly named always heads it untouched — your deliberate choice — but every fallback the gateway picks for youis filtered to models that can actually meet those needs. A text-only model simply isn't offered an image.

vision request · resolved chain capability-filtered
image_url ↦request carries an image → needs a model that can see
#rolemodelcapabilitiesverdict
01calleropenai/gpt-4o-minivision, toolshead — never filtered
02vision defaultgoogle/gemini-2.5-flashvision, toolsleads image requests
03vision backupanthropic/claude-sonnetvision, toolsif Google is down
04text defaultgoogle/gemma-text-onlytoolspruned — can't see
05platformopenai/gpt-4o-minivision, toolsalways backstops

The text-only default is dropped for this request — sending the image there would return blanks. For a plain text request it stays in the chain. The filter is per-request, not per-account.

The filter is per-request, not per-account. The same text-only model that gets dropped for an image request stays in the chain for a plain text one. And because a node reports the model it has loaded but no capability signal, image requests skip local nodes entirely and go straight to a vision-capable cloud model — the gateway won't gamble an image on a node it can't vet.

Give vision its own model

If your cloud default can see — many do — you're already covered. But the common local-first setup is a small, fast, text-only model on your own GPUs, with cloud only as overflow. In that setup, image requests have nowhere good to land. So vision gets its own slot:

01

Set a default vision model

In Settings → default models, pick a vision-capable model (the picker only lists models that can see). Image requests route here; everything else stays on your text default.
02

Add a backup for each chain

One backup model for the text chain, one for the vision chain. They're tried after the primary when its provider is down — so a Google outage falls over to OpenAI instead of erroring.
03

Turn on cross-provider failover

The backups only engage with this on. Off, each chain is just its primary model; on, the gateway walks the whole chain on a vendor outage. You're billed for whichever model actually served.

What the chain looks like

For an image request, the resolved order is: your named model (if any) → vision default → vision backup → text default and its backup → the platform default — with every text-only entry filtered out along the way. For a plain text request it's simply your text default → backup → platform default. Either way, the platform default — which can see — always backstops the chain, so an image request can never run out of capable options.

Outages vs. real errors

Cross-provider failover only advances the chain on an availability failure — a 5xx, a rate-limit, a timeout. A deterministic client error, like a malformed request or a content-policy refusal, never switches models: it would fail identically on any vendor, so the gateway surfaces it instead of burning your credits trying the same thing three more times.

Set it once: a vision model for images, a backup per chain, cross-provider failover on. After that, cheaper routing and vendor outages are invisible to your app — the request always reaches a model that can serve it.

The full set of rules lives in the routing & failover reference. Or open your settings and give vision its own model.

/// get started

That GPU is already paid for.
Put it on the network.

Create your gateway — free →