High availability

A single merido instance keeps several pieces of runtime state in memory: rate-limit buckets, the dedup cache, the live-event bus, and circuit-breaker cooldowns. If you run more than one instance behind a load balancer, that state has to be shared. Otherwise each instance has its own view: a per-key rate limit is N times too generous across N instances, a cooldown tripped on one instance isn't honored by another, and a dashboard on instance B doesn't see traffic from instance A.

merido solves this by keeping the cluster's shared state in Redis.
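To make the "N times too generous" point concrete, here is a toy calculation, not merido's code. It assumes a single window with no refill and a load balancer that spreads requests round-robin; the function names are made up for illustration.

```bash
# Requests admitted when each of $1 instances enforces limit $2 on its own,
# with $3 requests spread round-robin by the load balancer.
admitted_per_instance() {
  local instances=$1 limit=$2 requests=$3
  local admitted=0 k=0 seen
  while [ "$k" -lt "$instances" ]; do
    # Round-robin share of the traffic that instance k receives.
    seen=$(( requests / instances ))
    if [ "$k" -lt $(( requests % instances )) ]; then
      seen=$(( seen + 1 ))
    fi
    # Each instance admits at most its own local limit.
    if [ "$seen" -gt "$limit" ]; then
      seen=$limit
    fi
    admitted=$(( admitted + seen ))
    k=$(( k + 1 ))
  done
  echo "$admitted"
}

# Requests admitted when every instance draws from one shared budget.
admitted_shared() {
  local limit=$1 requests=$2
  if [ "$requests" -lt "$limit" ]; then echo "$requests"; else echo "$limit"; fi
}

admitted_per_instance 3 10 100   # prints 30
admitted_shared 10 100           # prints 10
```

With three instances and a per-key limit of 10, isolated counters let 30 requests through, while a shared budget admits 10.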

Enabling it

Point every instance at the same Redis and the same store:

```bash
REDIS_URL=redis://your-redis:6379
```

All instances must also share the same DATABASE_URL and MERIDO_MASTER_KEY, so they are genuinely one logical deployment.
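A small pre-flight check can catch an instance that was started without one of these variables. This is a sketch, not part of merido, and `require_env` is a hypothetical helper name. It only checks that the variables are present in the exported environment; it cannot confirm that the values match across instances.

```bash
# Report every named variable that is unset or empty in the exported environment.
# Returns 0 when all are present, 1 otherwise.
require_env() {
  local name missing=0
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Run on each instance before starting merido:
#   require_env REDIS_URL DATABASE_URL MERIDO_MASTER_KEY || exit 1
```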

/healthz reports Redis status: ok (configured and reachable), down (configured but unreachable — degraded to in-memory), or off (no Redis configured).
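A monitor can turn these three states into exit codes. The sketch below assumes the /healthz body is JSON with a flat string field such as "redis":"ok"; the rest of the body is not inspected. The function names, host, and port are placeholders, not part of merido.

```bash
# Extract the "redis" field from a /healthz JSON body passed as $1.
# Assumes a flat string field such as "redis":"ok".
redis_status() {
  printf '%s\n' "$1" | sed -n 's/.*"redis"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Map the status to an exit code:
#   0 = ok, 1 = down (degraded to in-memory), 2 = off (not configured), 3 = unrecognized.
check_redis_status() {
  case "$(redis_status "$1")" in
    ok)   return 0 ;;
    down) return 1 ;;
    off)  return 2 ;;
    *)    return 3 ;;
  esac
}

# Usage against a live instance (hypothetical host and port):
#   check_redis_status "$(curl -fsS http://localhost:8080/healthz)"
```

In a multi-instance deployment, treating off as a failure is a sensible default, since an instance without Redis keeps its own private state.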

What becomes shared

When REDIS_URL is set, these per-instance components become cluster-correct:

Rate limiting — the per-key token bucket lives in Redis, so spending it on one instance leaves less for the others (authoritative, shared budget).
Login-failure throttle — brute-force protection counts across instances.
Dedup cache — a completed deduped response on one instance can be served by another.
Live events — completed-request events publish cluster-wide, so every dashboard sees every instance's traffic exactly once.
Circuit-breaker / cooldown deltas — a cooldown tripped on one instance is applied on the others via pub/sub, so a 429'd account+model isn't re-hammered elsewhere.
Per-(account, model) model locks — the same way, via pub/sub deltas.

Redis is a soft dependency: if it goes down mid-run, requests keep succeeding (rate limiting fails open, events fall back to the local bus), /healthz flips to "redis":"down", and nothing crashes. Subscribers reconnect on their own when Redis returns.

What stays per-instance

By design, the semantic cache is per-instance and the proxy-client pool is stateless. In-flight deduplication is also per-instance: only completed dedup responses are shared across instances, so two identical in-flight requests that land on different instances may both reach upstream.

Run at least two instances

Redis only becomes load-bearing once you run two or more instances. On Fly.io, set min_machines_running to 2 or more in fly.toml; a single machine runs fine without Redis.
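As a sketch, the relevant fly.toml fragment might look like the following. It assumes the app is served through an [http_service] section; if your fly.toml uses a different service layout, put the setting there instead.

```toml
# fly.toml (fragment): keep at least two machines running.
# Assumes an [http_service] section; other settings omitted.
[http_service]
  min_machines_running = 2
```

This setting keeps machines from being stopped rather than creating them, so the app also needs at least two machines to exist (for example via fly scale count 2). Check Fly.io's configuration reference for how it interacts with auto-stop and regions.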

For a hands-on, two-instance verification runbook (shared rate-limit budget, cross-instance cooldowns, cross-instance live events), see docs/REDIS-HA.md in the repository.

Related

Deploy to production: the base cloud deployment this builds on.
Configuration: REDIS_URL and related knobs.
