Smart routing
Spread requests across all your accounts and free tiers with cost- and latency-aware selection, load balancing, request dedup, a semantic cache, and a four-tier fallback chain that keeps you coding when a provider blinks.
merido is a fast, local-first AI gateway. One endpoint fans out across all your providers and accounts — paid and free-tier — routing on cost and latency with automatic failover, so no capacity goes to waste. Then it proves the tokens and dollars it saved you.
The Token-Optimization Advisor inspects real traffic per CLI and project, detects waste, and recommends concrete savings you apply in one click. Every optimization is booked to a savings ledger — so you see exactly how many tokens and dollars merido saved, and whether it paid for itself.
tool_result output, losslessly.tool_result · claude-codeSpread requests across all your accounts and free tiers with cost- and latency-aware selection, load balancing, request dedup, a semantic cache, and a four-tier fallback chain that keeps you coding when a provider blinks.
Circuit breakers, active & passive health checks, per-key rate limits, smart retry.
One model ID that fans out to an ordered list of real targets.
Images, embeddings, audio (TTS + STT), sandboxed MCP and WASM plugins.
OpenTelemetry, Prometheus, a queryable audit log, and a live request stream.
Encryption-at-rest for every credential, argon2 auth, hash-chained audit logging, and a BYOK secrets vault — your keys never leave your machine.
Validate the client key and per-key rate limit, then detect the source format — OpenAI, Anthropic, or Responses.
Resolve the model, alias, or Virtual Model, then pick the best provider + account — health-, breaker-, and cost-aware.
Compress tool results, then translate through one canonical IR into the target wire format.
Stream with retry, transform the response back, track tokens + cost — and fall back on any error.
Define a client-callable model that fans out to an ordered list of real
provider/model targets. merido rotates and fails over automatically —
your CLI keeps calling one name.
strategy: failover
targets:
- anthropic/claude-sonnet
- openrouter/claude-sonnet
- gemini/gemini-2.5-pro
# your CLI just calls:
model: "smart-sonnet"
The server crate embeds the dashboard at compile time — build the UI once, then run a single binary that serves the API, the dashboard, and this page.
# 1 · build the dashboard (needs bun)
cd dashboard && bun install && bun run build
# 2 · run the gateway (embedded SQLite)
cargo run -p merido -- start
→ http://127.0.0.1:8788
# multi-stage build, slim runtime image
docker build -t merido .
# persist the data dir on a volume
docker run --rm -p 8788:8788 \
-v merido-data:/data merido
Then merido keys create, merido providers add, and
merido gain to see what you've saved.
free forever · no open-core
Dual-licensed MIT / Apache-2.0. Every feature ships in the binary you self-host.
An open-source, local-first AI gateway written in Rust. It sits between your AI coding tools — Claude Code, Codex, Cursor, Cline, Continue — and 40+ upstream LLM providers, translating formats, compressing tool output to save tokens, and failing over so you never hit a wall.
merido speaks an OpenAI-compatible API and routes to 40+ providers including OpenAI, Anthropic, Google Gemini, OpenRouter, GLM and Kiro. Any CLI targeting an OpenAI or Anthropic endpoint works out of the box.
Yes — dual-licensed under MIT and Apache-2.0 with no billing and no open-core paywall. Every feature ships in a single static binary you self-host.
It analyzes your real usage, detects token waste per CLI and project, and recommends concrete changes — compressing tool results, switching to a cheaper model, or enabling prompt caching — then lets you apply them with a guarded probation window and auto-rollback.
Yes. The default local profile runs as a single binary with embedded SQLite, serving both API and dashboard from one port (8788). A cloud profile adds Postgres, Redis and multi-tenancy when you scale.
A clean-room reimplementation in Rust with a canonical-IR translation core, a built-in Token-Optimization Advisor, and a single self-hosted binary that serves both the API and a full operator dashboard. It's Helicone-header compatible, so existing clients point at it directly.