Virtual models & fallback
A virtual model is a named, client-callable model id that resolves to an ordered list of real targets, each a (provider/model, account, weight) triple, which merido tries in turn, falling back to the next target on failure. It's the LiteLLM "model group" concept: callers name one stable model, and merido decides which real target serves it.
Why use one
- A stable name across providers. Your tools call smart-coder; you control which real models back it without touching the tools.
- Fallback. If a target errors or is rate-limited, merido advances to the next one.
- Cost / latency / health awareness. Targets can be ordered by live cost or latency scores, and unhealthy or cooled-down targets are skipped.
Creating one
You can create virtual models three ways:
Dashboard
Use the Virtual Models page: name it, pick a strategy, and add targets (the target picker suggests real provider/model options per connected account, so you select rather than type).
CLI / env (MERIDO_ROUTES)
Define them declaratively as a JSON array in the MERIDO_ROUTES environment variable. Each entry is a virtual-model definition (name, strategy, targets). This is parsed specially at startup — see Configuration.
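As a rough illustration, the snippet below builds one such array and serializes it into the environment variable. This page names only the three parts of a definition (name, strategy, targets) and the target triple (provider/model, account, weight); the JSON key names inside each target (model, account, weight) and all provider, model, and account values here are assumptions, so check Configuration for the actual schema.

```python
import json
import os

# Hypothetical virtual-model definition. The key names inside each target
# are assumptions; this page only names name / strategy / targets.
routes = [
    {
        "name": "smart-coder",
        "strategy": "failover",
        "targets": [
            {"model": "provider-a/model-large", "account": "team-main", "weight": 1},
            {"model": "provider-b/model-small", "account": "team-backup", "weight": 1},
        ],
    }
]

# MERIDO_ROUTES holds a JSON array, so the list is serialized as-is.
# Setting os.environ affects this process and its children only.
os.environ["MERIDO_ROUTES"] = json.dumps(routes)

# Sanity check: the value round-trips as a JSON array of definitions.
parsed = json.loads(os.environ["MERIDO_ROUTES"])
print(parsed[0]["name"], len(parsed[0]["targets"]))
```

Generating the value from a script like this, rather than hand-writing the JSON, makes quoting mistakes less likely when the variable is exported from a shell or a deployment manifest.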
API
POST /api/virtual-models (and GET / PUT / DELETE per id; POST /api/virtual-models/{id}/toggle, POST /api/virtual-models/reorder). See API endpoints.
Strategies
A virtual model declares one strategy (validated up front):
| Strategy | Behavior |
|---|---|
| failover | Try targets strictly in declared order, advancing only on failure. |
| load_balance | Rotate the lead target across invocations (sticky window), then fall back through the rotated order. |
| weighted | Load-balance variant that biases the lead target by each target's weight. |
| cost_optimized | Order targets by a live cost score (lower is better), then fall back through that order. |
| latency_based | Order targets by a live latency score (lower is better), then fall back through that order. |
For load_balance, a sticky limit pins consecutive requests to the chosen lead target before the rotation cursor advances. For the score-driven strategies, merido falls back to declared order when no score map is available yet.
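The ordering rules above can be sketched in a few lines. This is a simplified model, not merido's implementation: the Target class, the order_targets function, the request_index counter, and the scores map keyed by provider/model are all hypothetical. Two behaviors are assumptions rather than documented facts: that weighted gives a target with weight w exactly w slots in the rotation, and that the sticky limit applies to weighted as well.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Target:
    model: str      # "provider/model"
    account: str
    weight: int = 1


def order_targets(
    strategy: str,
    targets: List[Target],
    request_index: int = 0,
    sticky_limit: int = 1,
    scores: Optional[Dict[str, float]] = None,
) -> List[Target]:
    """Return the fallback order for one request (illustrative only)."""
    if not targets:
        return []
    if strategy == "failover":
        return list(targets)
    if strategy in ("load_balance", "weighted"):
        if strategy == "weighted":
            # Assumption: a target with weight w gets w slots in the rotation.
            slots = [i for i, t in enumerate(targets) for _ in range(max(t.weight, 1))]
        else:
            slots = list(range(len(targets)))
        # The lead stays fixed for `sticky_limit` consecutive requests,
        # then the rotation cursor advances by one slot.
        lead = slots[(request_index // sticky_limit) % len(slots)]
        return targets[lead:] + targets[:lead]
    if strategy in ("cost_optimized", "latency_based"):
        if not scores:
            return list(targets)  # no score map yet: declared order
        # Lower score is better; sorted() is stable, so ties keep declared order.
        return sorted(targets, key=lambda t: scores.get(t.model, float("inf")))
    raise ValueError(f"unknown strategy: {strategy}")


targets = [Target("a/x", "acct1"), Target("b/y", "acct1"), Target("c/z", "acct2")]

# With a sticky limit of 2, each lead target serves two requests in a row.
leads = [order_targets("load_balance", targets, i, sticky_limit=2)[0].model for i in range(6)]
print(leads)

# Score-driven ordering: cheapest first, falling back through the rest.
by_cost = order_targets("cost_optimized", targets, scores={"a/x": 3.0, "b/y": 1.0, "c/z": 2.0})
print([t.model for t in by_cost])
```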
How a request is resolved
When a caller names a virtual model:
- merido expands it to its ordered list of targets.
- Live feedback (health, quota, per-(account, model) locks, ranking) filters and orders candidates.
- The router picks a target and an account for it (cost-, latency-, health-, circuit-breaker-aware).
- On failure it walks the fallback chain: next account → next target → next tier.
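The fallback walk in the last step can be pictured as three nested loops. The sketch below is only a mental model: the tiers structure (a list of tiers, each a list of provider/model names with candidate accounts), the is_available filter, and the send callback are hypothetical stand-ins for merido's internal state, and a real router would also re-rank candidates between attempts.

```python
from typing import Callable, List, Optional, Tuple

Candidate = Tuple[str, str]  # (provider/model, account)


def resolve(
    tiers: List[List[Tuple[str, List[str]]]],
    is_available: Callable[[str, str], bool],
    send: Callable[[str, str], Optional[str]],
) -> Tuple[Optional[str], List[Candidate]]:
    """Walk tiers, then targets, then accounts until one attempt succeeds."""
    tried: List[Candidate] = []
    for tier in tiers:                      # next tier
        for model, accounts in tier:        # next target
            for account in accounts:        # next account
                if not is_available(model, account):
                    continue                # unhealthy, locked, or cooling down
                tried.append((model, account))
                response = send(model, account)
                if response is not None:    # None stands in for a failed call
                    return response, tried
    return None, tried


tiers = [
    [("a/x", ["acct1", "acct2"]), ("b/y", ["acct1"])],
    [("c/z", ["acct3"])],
]
down = {("a/x", "acct1")}  # pretend this pair is on cooldown


def is_available(model: str, account: str) -> bool:
    return (model, account) not in down


def send(model: str, account: str) -> Optional[str]:
    # Pretend only b/y answers successfully.
    return f"ok from {model} via {account}" if model == "b/y" else None


response, tried = resolve(tiers, is_available, send)
print(response)
print(tried)
```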
Context-window fallback
When a target rejects a prompt as too long for its context window, merido does not just fail down the chain — it reorders the untried targets largest-context-window first, so the next attempt lands on a model that can actually hold the prompt. Context windows come from the model metadata catalog (/v1/model_group/info), synced from the upstream rate card with a built-in fallback table. This mirrors LiteLLM's context_window_fallbacks, but uses the virtual model's own targets — no separate fallback list to configure.
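A minimal sketch of that reordering step, assuming the context windows are available as a plain mapping from provider/model to token count. The function name, the mapping, and the model names are hypothetical; treating unknown models as having a window of 0 (so they sort last) is also an assumption.

```python
from typing import Dict, List


def reorder_for_context(untried: List[str], context_windows: Dict[str, int]) -> List[str]:
    """Put the untried targets with the largest context window first."""
    # Negating the window sorts descending; sorted() is stable, so targets
    # with equal windows keep their existing relative order.
    return sorted(untried, key=lambda model: -context_windows.get(model, 0))


context_windows = {"a/small": 8_000, "b/large": 200_000, "c/medium": 32_000}
untried = ["a/small", "c/medium", "b/large", "d/unknown"]

reordered = reorder_for_context(untried, context_windows)
print(reordered)
```

Only the untried targets are reordered, so a target that has already rejected the prompt is not retried.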
Circuit breaker & cooldowns
When a target/account repeatedly fails (or returns a 429), the circuit breaker records the failure and the account+model is put on a cooldown so the router stops hammering it; healthy alternatives are tried instead. The breaker recovers automatically once the cooldown expires. In a multi-instance deployment these cooldowns are shared across instances via Redis.
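The cooldown behavior can be approximated with a small in-memory class. This is a sketch under stated assumptions, not merido's breaker: the CooldownBreaker name, the failure threshold, the cooldown length, and the rule that a 429 trips the breaker immediately are all illustrative choices. A multi-instance deployment would keep this state in Redis rather than in process memory.

```python
import time
from typing import Callable, Dict, Tuple

Key = Tuple[str, str]  # (account, model)


class CooldownBreaker:
    def __init__(
        self,
        threshold: int = 3,
        cooldown_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures: Dict[Key, int] = {}
        self.cooling_until: Dict[Key, float] = {}

    def record_failure(self, key: Key, rate_limited: bool = False) -> None:
        count = self.failures.get(key, 0) + 1
        self.failures[key] = count
        # Assumption: a 429 starts the cooldown at once; other errors
        # need `threshold` failures without an intervening success.
        if rate_limited or count >= self.threshold:
            self.cooling_until[key] = self.clock() + self.cooldown_s

    def record_success(self, key: Key) -> None:
        self.failures.pop(key, None)

    def is_available(self, key: Key) -> bool:
        until = self.cooling_until.get(key)
        if until is None:
            return True
        if self.clock() >= until:
            # Cooldown expired: recover automatically and reset the count.
            del self.cooling_until[key]
            self.failures.pop(key, None)
            return True
        return False


# A fake clock makes the example deterministic.
now = [0.0]
breaker = CooldownBreaker(threshold=2, cooldown_s=10.0, clock=lambda: now[0])
key = ("acct1", "a/x")

breaker.record_failure(key)
after_one = breaker.is_available(key)    # one failure: still routable
breaker.record_failure(key)
after_two = breaker.is_available(key)    # threshold reached: cooling down
now[0] = 10.0
after_wait = breaker.is_available(key)   # cooldown elapsed: routable again
print(after_one, after_two, after_wait)
```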
Related
- Add providers & keys — the accounts virtual models route across.
- Usage & the Advisor — the advisor can suggest cheaper targets.