High availability
A single merido instance keeps several pieces of runtime state in memory: rate-limit buckets, the dedup cache, the live-event bus, and circuit-breaker cooldowns. If you run more than one instance behind a load balancer, that state needs to be shared. Otherwise each instance has its own view: with N instances, a per-key rate limit would be N times too generous (each instance grants the full budget), a cooldown on one instance wouldn't be honored by another, and a dashboard on instance B wouldn't see traffic from instance A.
merido solves this by keeping cluster-wide state in a shared Redis.
Enabling it
Point every instance at the same Redis and the same store:
```bash
REDIS_URL=redis://your-redis:6379
```
All instances must also share the same DATABASE_URL and MERIDO_MASTER_KEY, so they are genuinely one logical deployment.
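One way to confirm that instances really share these settings, without printing the master key, is to compare a hash of them on each instance. This is a hypothetical operator check, not part of merido; `SHARED_KEYS`, `config_fingerprint`, and `fingerprint_from_environment` are illustrative names.

```python
import hashlib
import os

# Settings that must match on every instance (from this page). The hashing
# approach is a hypothetical operator check, not a merido feature.
SHARED_KEYS = ("REDIS_URL", "DATABASE_URL", "MERIDO_MASTER_KEY")


def config_fingerprint(env: dict[str, str]) -> str:
    """Return a short SHA-256 digest of the shared settings.

    Comparing digests avoids printing MERIDO_MASTER_KEY itself. A missing
    variable raises KeyError instead of being silently hashed as empty.
    """
    digest = hashlib.sha256()
    for key in SHARED_KEYS:
        # NUL separators keep ("ab", "c") and ("a", "bc") from colliding;
        # environment variable values cannot contain NUL.
        digest.update(f"{key}={env[key]}\0".encode("utf-8"))
    return digest.hexdigest()[:16]


def fingerprint_from_environment() -> str:
    # Run this on each instance and compare the results.
    return config_fingerprint(dict(os.environ))
```

If any two instances print different fingerprints, they are not one logical deployment.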
/healthz reports Redis status in its "redis" field: ok (configured and reachable), down (configured but unreachable; the instance has degraded to in-memory state), or off (no Redis configured).
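For monitoring, a small probe can read that field. This sketch assumes the /healthz body is a JSON object with a top-level "redis" key (the `"redis":"down"` form mentioned on this page); `redis_status` and `check_instance` are hypothetical helpers. `base_url` must address one specific instance rather than the load balancer, otherwise you only learn about whichever instance answered.

```python
import json
import urllib.request

# The three Redis states this page documents for /healthz.
REDIS_STATES = {"ok", "down", "off"}


def redis_status(healthz_body: str) -> str:
    """Return the Redis state from a /healthz response body.

    Assumes a JSON object with a top-level "redis" key.
    """
    state = json.loads(healthz_body)["redis"]
    if state not in REDIS_STATES:
        raise ValueError(f"unexpected redis state: {state!r}")
    return state


def check_instance(base_url: str, timeout: float = 5.0) -> str:
    """Fetch /healthz from one instance (hypothetical address) and return its state.

    urlopen raises urllib.error.HTTPError for non-2xx responses.
    """
    with urllib.request.urlopen(f"{base_url}/healthz", timeout=timeout) as resp:
        return redis_status(resp.read().decode("utf-8"))
```

In a multi-instance deployment, any state other than "ok" deserves an alert: "down" means that instance is running on in-memory state, and "off" means Redis isn't configured there at all.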
What becomes shared
When REDIS_URL is set, these per-instance components become cluster-correct:
- Rate limiting — the per-key token bucket lives in Redis, so spending it on one instance leaves less for the others (authoritative, shared budget).
- Login-failure throttle — brute-force protection counts across instances.
- Dedup cache — a completed deduped response on one instance can be served by another.
- Live events — completed-request events publish cluster-wide, so every dashboard sees every instance's traffic exactly once.
- Circuit-breaker / cooldown deltas — a cooldown tripped on one instance is applied on the others via pub/sub, so a 429'd account+model isn't re-hammered elsewhere.
- Per-(account, model) model locks — the same way, via pub/sub deltas.
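The cooldown propagation described in the list above can be illustrated with a small in-process simulation. This is a sketch, not merido's code: `FakeBus` stands in for a Redis pub/sub channel, and the message fields (`account`, `model`, `until`) and the keep-the-later-expiry merge rule are assumptions.

```python
from typing import Callable


class FakeBus:
    """In-process stand-in for a Redis pub/sub channel."""

    def __init__(self) -> None:
        self.handlers: list[Callable[[dict], None]] = []

    def subscribe(self, handler: Callable[[dict], None]) -> None:
        self.handlers.append(handler)

    def publish(self, message: dict) -> None:
        # Deliver to every subscriber, including the publishing instance.
        for handler in self.handlers:
            handler(message)


class Instance:
    """One proxy instance with a local cooldown table keyed by (account, model)."""

    def __init__(self, bus: FakeBus) -> None:
        self.bus = bus
        self.cooldown_until: dict[tuple[str, str], float] = {}
        bus.subscribe(self._apply_delta)

    def _apply_delta(self, msg: dict) -> None:
        key = (msg["account"], msg["model"])
        # Keep the later expiry so a stale delta never shortens a cooldown.
        self.cooldown_until[key] = max(self.cooldown_until.get(key, 0.0), msg["until"])

    def trip_cooldown(self, account: str, model: str, seconds: float, now: float) -> None:
        # Publish only the change (a delta), not the whole table.
        self.bus.publish({"account": account, "model": model, "until": now + seconds})

    def is_cooling_down(self, account: str, model: str, now: float) -> bool:
        return self.cooldown_until.get((account, model), 0.0) > now
```

After `a.trip_cooldown("acct-1", "model-x", 30, now=100.0)`, a second instance `b` on the same bus reports `b.is_cooling_down("acct-1", "model-x", now=110.0)` as true. Keep in mind that Redis pub/sub is fire-and-forget: a subscriber that is disconnected when a delta is published never receives it.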
Redis is a soft dependency: if it goes down mid-run, requests keep succeeding (rate limiting fails open, events fall back to the local bus), /healthz flips to "redis":"down", and nothing crashes. Subscribers reconnect on their own when Redis returns.
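The shared rate-limit budget and its fail-open behavior can be sketched as a token bucket over a pluggable store. `FakeSharedStore` is a dict-backed stand-in for Redis, and the class names and the `rl:<api key>` key format are assumptions, not merido's implementation. Against a real Redis the refill-and-take step must run atomically (for example in a Lua script), or two instances could spend the same token.

```python
class StoreDown(Exception):
    """Raised by the fake store to simulate Redis being unreachable."""


class FakeSharedStore:
    """Dict-backed stand-in for Redis, shared by every instance that holds it.

    Single-threaded, so refill-and-take is trivially atomic here; against a
    real Redis this step would need to run atomically (e.g. as a Lua script).
    """

    def __init__(self) -> None:
        self.buckets: dict[str, tuple[float, float]] = {}  # key -> (tokens, last_seen)
        self.up = True

    def take(self, key: str, capacity: float, rate: float, now: float) -> bool:
        if not self.up:
            raise StoreDown(key)
        tokens, last = self.buckets.get(key, (capacity, now))
        tokens = min(capacity, tokens + (now - last) * rate)  # refill for elapsed time
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self.buckets[key] = (tokens, now)
        return allowed


class RateLimiter:
    """Per-instance limiter that delegates the bucket to a (possibly shared) store."""

    def __init__(self, store: FakeSharedStore, capacity: float, rate: float) -> None:
        self.store = store
        self.capacity = capacity  # burst size in requests
        self.rate = rate          # tokens added per second

    def allow(self, api_key: str, now: float) -> bool:
        try:
            return self.store.take(f"rl:{api_key}", self.capacity, self.rate, now)
        except StoreDown:
            return True  # fail open: an unreachable store never rejects requests


def count_allowed(limiters: list[RateLimiter], requests: int, now: float = 0.0) -> int:
    """Send requests round-robin across instances, as a load balancer might."""
    return sum(limiters[i % len(limiters)].allow("key-1", now) for i in range(requests))
```

With two limiters sharing one store, 20 round-robin requests against a 10-token bucket admit 10; with one store per limiter they admit 20, which is the N-times-too-generous case described at the top of this page. Setting `up = False` on the store admits every request, mirroring fail-open.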
What stays per-instance
By design, the semantic cache is per-instance and the proxy-client pool is stateless. Only completed dedup responses are shared across instances, so two identical requests that are in flight at the same time on different instances may both reach upstream.
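The in-flight race follows from sharing only a cache of completed responses. In this simulation one plain dict plays the shared cache that two instances read, and a counting `Upstream` stub plays the provider; both, and the use of the prompt string as the dedup key, are illustrative assumptions.

```python
class Upstream:
    """Counting stub for the upstream provider."""

    def __init__(self) -> None:
        self.calls = 0

    def complete(self, prompt: str) -> str:
        self.calls += 1
        return f"response to {prompt}"


def handle(cache: dict[str, str], upstream: Upstream, key: str) -> str:
    """One instance serving one request: shared cache first, then upstream."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    response = upstream.complete(key)
    cache[key] = response  # only now can other instances see it
    return response


def concurrent_pair(cache: dict[str, str], upstream: Upstream, key: str) -> None:
    """Two instances check the cache before either request has completed."""
    misses = [cache.get(key) is None for _ in range(2)]  # both look first...
    for missed in misses:
        if missed:
            cache[key] = upstream.complete(key)  # ...so both go upstream
```

Two sequential `handle` calls with the same key reach upstream once; `concurrent_pair` reaches it twice.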
Run at least two instances
Redis only becomes load-bearing once you run two or more instances. On Fly.io, raise min_machines_running to ≥ 2 in fly.toml; a single machine runs fine without Redis.
For a hands-on, two-instance verification runbook (shared rate-limit budget, cross-instance cooldowns, cross-instance live events), see docs/REDIS-HA.md in the repository.
Related
- Deploy to production — the base cloud deployment this builds on.
- Configuration — REDIS_URL and related knobs.