Virtual models & fallback

A virtual model is a named, client-callable model id that resolves to an ordered list of real targets, each a (provider/model, account, weight) tuple. merido tries the targets in turn and falls back to the next one when a target fails. It's the LiteLLM "model group" concept: callers name one stable model, and merido decides which real target serves it.

Why use one

  • A stable name across providers. Your tools call smart-coder; you control which real models back it without touching the tools.
  • Fallback. If a target errors or is rate-limited, merido advances to the next one.
  • Cost / latency / health awareness. Targets can be ordered by live cost or latency scores, and unhealthy or cooled-down targets are skipped.

Creating one

You can create virtual models three ways:

Dashboard

Use the Virtual Models page: name it, pick a strategy, and add targets (the target picker suggests real provider/model options per connected account, so you select rather than type).

CLI / env (MERIDO_ROUTES)

Define them declaratively as a JSON array in the MERIDO_ROUTES environment variable. Each entry is a virtual-model definition (name, strategy, targets). This is parsed specially at startup — see Configuration.
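As a minimal sketch of what such a value could look like, the snippet below builds a one-entry array and prints a shell export line for it. The top-level keys (name, strategy, targets) come from the description above; the key names inside each target (model, account, weight) and all the example values are assumptions made for illustration, so check Configuration for the exact schema.

```python
import json
import shlex

# Hypothetical definition. Only "name", "strategy", and "targets" are named
# in the docs; the per-target keys below are assumed for illustration.
routes = [
    {
        "name": "smart-coder",
        "strategy": "failover",
        "targets": [
            {"model": "provider-a/model-x", "account": "acct-1", "weight": 1},
            {"model": "provider-b/model-y", "account": "acct-2", "weight": 1},
        ],
    }
]


def encode_routes(definitions):
    """Serialize virtual-model definitions as a compact JSON array."""
    return json.dumps(definitions, separators=(",", ":"))


if __name__ == "__main__":
    # shlex.quote keeps the JSON intact when pasted into a POSIX shell.
    print(f"export MERIDO_ROUTES={shlex.quote(encode_routes(routes))}")
```

Generating the value from a script rather than hand-writing it avoids quoting mistakes, since the whole array has to survive as a single environment variable.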

API

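Create a virtual model with POST /api/virtual-models; read, update, or delete one with GET / PUT / DELETE per id. Two further endpoints enable or disable a virtual model and change ordering: POST /api/virtual-models/{id}/toggle and POST /api/virtual-models/reorder. See API endpoints.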
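The following sketch shows how a client might prepare the create and toggle calls using only the Python standard library. The endpoint paths come from the list above; the base URL, the request body shape, and the absence of any authentication header are assumptions, so treat this as an outline rather than the documented request format. The requests are built but not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: adjust to your merido instance


def create_request(definition, base_url=BASE_URL):
    """Build (but do not send) a POST /api/virtual-models request."""
    return urllib.request.Request(
        url=f"{base_url}/api/virtual-models",
        data=json.dumps(definition).encode("utf-8"),  # assumed JSON body
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def toggle_request(vm_id, base_url=BASE_URL):
    """Build (but do not send) a POST /api/virtual-models/{id}/toggle request."""
    return urllib.request.Request(
        url=f"{base_url}/api/virtual-models/{vm_id}/toggle",
        method="POST",
    )


# To actually send a request, pass it to urllib.request.urlopen(...).
```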

Strategies

A virtual model declares one strategy (validated up front):

Strategy        Behavior
failover        Try targets strictly in declared order, advancing only on failure.
load_balance    Rotate the lead target across invocations (sticky window), then fall back through the rotated order.
weighted        Load-balance variant that biases the lead target by each target's weight.
cost_optimized  Order targets by a live cost score (lower is better), then fall back through that order.
latency_based   Order targets by a live latency score (lower is better), then fall back through that order.

For load_balance, a sticky limit pins consecutive requests to the chosen lead target before the rotation cursor advances. For the score-driven strategies, merido falls back to declared order when no score map is available yet.
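To make the five strategies concrete, here is a rough sketch of how each one could turn the declared targets into an attempt order. It is not merido's implementation: the Target and RotationState types, the function name, the score-map keys, and the use of a weighted random draw for the weighted strategy are all assumptions chosen to match the descriptions above.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    model: str        # "provider/model"
    account: str
    weight: int = 1


class RotationState:
    """Rotation cursor with a sticky limit (hypothetical helper)."""

    def __init__(self, sticky_limit=1):
        self.sticky_limit = sticky_limit
        self.cursor = 0
        self.served = 0

    def lead_index(self, n):
        # Advance the cursor only after the current lead has served
        # sticky_limit consecutive requests.
        if self.served >= self.sticky_limit:
            self.cursor = (self.cursor + 1) % n
            self.served = 0
        self.served += 1
        return self.cursor % n


def order_targets(strategy, targets, state=None, scores=None, rng=random):
    """Return the order in which targets would be attempted."""
    targets = list(targets)
    if strategy == "failover":
        return targets
    if strategy == "load_balance":
        i = state.lead_index(len(targets))
        return targets[i:] + targets[:i]
    if strategy == "weighted":
        # Assumption: the lead is drawn at random, proportionally to weight.
        i = rng.choices(range(len(targets)), weights=[t.weight for t in targets])[0]
        return targets[i:] + targets[:i]
    if strategy in ("cost_optimized", "latency_based"):
        if not scores:
            return targets  # no score map yet: fall back to declared order
        # Lower score is better; unscored targets sort last. The sort is
        # stable, so ties keep their declared order.
        return sorted(targets, key=lambda t: scores.get((t.model, t.account), float("inf")))
    raise ValueError(f"unknown strategy: {strategy}")
```

With a sticky limit of 2 and three targets, the lead target in this sketch follows the pattern first, first, second, second, third, which is one reading of "pins consecutive requests to the chosen lead target".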

How a request is resolved

When a caller names a virtual model:

  1. merido expands it to its ordered list of targets.
  2. Live feedback (health, quota, per-(account, model) locks, ranking) filters and orders candidates.
  3. The router picks a target and an account for it (cost-, latency-, health-, circuit-breaker-aware).
  4. On failure it walks the fallback chain: next account → next target → next tier.
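The steps above can be sketched as a nested loop over tiers, targets, and accounts. This is an illustrative outline only: the shape of the tiers argument, the is_available and send callbacks, and the AllTargetsFailed exception are hypothetical names, and the real router also weighs cost, latency, and ranking, which this sketch leaves out.

```python
class AllTargetsFailed(Exception):
    """Raised when every candidate has been tried without success."""


def resolve(tiers, is_available, send):
    """Walk the fallback chain: next account, then next target, then next tier.

    tiers        -- list of tiers; each tier is a list of (model, [accounts])
    is_available -- callable(account, model) -> bool (health, quota, locks)
    send         -- callable(model, account) -> response; raises on failure
    """
    errors = []
    for tier in tiers:
        for model, accounts in tier:
            for account in accounts:
                if not is_available(account, model):
                    continue  # filtered out by live feedback
                try:
                    return send(model, account)
                except Exception as exc:
                    # Remember the failure and move on to the next candidate.
                    errors.append((model, account, exc))
    raise AllTargetsFailed(errors)
```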

Context-window fallback

When a target rejects a prompt as too long for its context window, merido does not just fail down the chain — it reorders the untried targets largest-context-window first, so the next attempt lands on a model that can actually hold the prompt. Context windows come from the model metadata catalog (/v1/model_group/info), synced from the upstream rate card with a built-in fallback table. This mirrors LiteLLM's context_window_fallbacks, but uses the virtual model's own targets — no separate fallback list to configure.
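The reordering step can be pictured as a descending sort of the untried targets by context window. The sketch below assumes a plain dictionary of window sizes keyed by model name; the function name, the dictionary, and the model names in the usage lines are hypothetical stand-ins for the metadata catalog.

```python
def reorder_for_context(untried, context_windows, default_window=0):
    """Order untried models largest-context-window first.

    Models missing from context_windows are treated as default_window and
    therefore sort last. The sort is stable, so models with equal windows
    keep their previous relative order.
    """
    return sorted(
        untried,
        key=lambda model: context_windows.get(model, default_window),
        reverse=True,
    )


# Hypothetical catalog values, in tokens.
windows = {"provider-a/small": 8_000, "provider-b/large": 200_000, "provider-c/mid": 32_000}
remaining = ["provider-a/small", "provider-c/mid", "provider-b/large"]
next_attempts = reorder_for_context(remaining, windows)
```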

Circuit breaker & cooldowns

When a target/account repeatedly fails (or returns a 429), the circuit breaker records the failure and the account+model is put on a cooldown so the router stops hammering it; healthy alternatives are tried instead. The breaker recovers automatically once the cooldown expires. In a multi-instance deployment these cooldowns are shared across instances via Redis.
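A single-process sketch of this behavior follows. The class name, the failure threshold, and the cooldown length are assumptions (the text above does not give the actual values), and the state lives in local dictionaries, so the Redis-backed sharing across instances is not shown.

```python
import time


class CooldownBreaker:
    """In-memory circuit breaker keyed by (account, model). Illustrative only."""

    def __init__(self, failure_threshold=3, cooldown_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # assumed value
        self.cooldown_seconds = cooldown_seconds    # assumed value
        self.clock = clock
        self._failures = {}    # (account, model) -> consecutive failure count
        self._open_until = {}  # (account, model) -> time the cooldown ends

    def record_failure(self, account, model, rate_limited=False):
        key = (account, model)
        count = self._failures.get(key, 0) + 1
        self._failures[key] = count
        # A 429 starts the cooldown immediately; other errors must repeat.
        if rate_limited or count >= self.failure_threshold:
            self._open_until[key] = self.clock() + self.cooldown_seconds
            self._failures[key] = 0

    def record_success(self, account, model):
        self._failures.pop((account, model), None)

    def is_available(self, account, model):
        key = (account, model)
        until = self._open_until.get(key)
        if until is None:
            return True
        if self.clock() >= until:
            del self._open_until[key]  # cooldown expired: recover automatically
            return True
        return False
```

A router would call is_available before each attempt and record_failure or record_success afterwards, so a cooled-down account+model pair is skipped until its cooldown ends.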

MIT / Apache-2.0 licensed.