Anatomy of an inference request: how the gateway decides where it runs
Your code calls one OpenAI-compatible endpoint, but each request quietly walks through three stages: edge cache, your own GPU, then capability-aware cloud failover. Here's the whole decision path and the settings that shape it.
read the guide →