merido is an open-source, local-first AI gateway written in Rust. It sits between your AI coding tools (Claude Code, Codex, Cursor, Cline, Continue) and 40+ upstream LLM providers, translating request and response formats, compressing tool output to save tokens, and failing over across providers and accounts so you never hit a wall.

Which LLM providers and coding CLIs does merido support?

merido speaks an OpenAI-compatible API and routes to 40+ providers including OpenAI, Anthropic, Google Gemini, OpenRouter, GLM, Kiro and many more. Any coding CLI that targets an OpenAI or Anthropic endpoint works — Claude Code, Codex, Cursor, Cline and Continue are first-class.

Is merido free and open source?

Yes. merido is fully open source under the MIT and Apache-2.0 licenses, with no billing and no open-core paywall. Every feature ships in a single static binary you self-host.

Does merido run locally?

Yes. The default local profile runs as a single binary with an embedded SQLite database and serves both the API and the dashboard from one port (8788). A cloud profile adds Postgres, Redis and multi-tenancy when you need it.

v0 · open source · written in Rust →

Route every coding CLI across every key you own.

merido is a fast, local-first AI gateway. One endpoint fans out across all your providers and accounts — paid and free-tier — routing on cost and latency with automatic failover, so no capacity goes to waste. Then it proves the tokens and dollars it saved you.

Get started → Star on GitHub

40+: providers
1: static binary
MIT: / Apache-2.0

coding clis Claude Code Codex Cursor Cline Continue

merido canonical IR

40+ providers OpenAI Anthropic Gemini OpenRouter GLM · Kiro · …

01★ the differentiator

Every gateway shows what you spent.
merido shows what you're wasting.

The Token-Optimization Advisor inspects real traffic per CLI and project, detects waste, and recommends concrete savings you apply in one click. Every optimization is booked to a savings ledger — so you see exactly how many tokens and dollars merido saved, and whether it paid for itself.

Compresses bloated tool_result output, losslessly.
Flags requests that would be cheaper on another model.
Surfaces prompt-cache opportunities and spend anomalies.
Books every saving — real dollars, with an ROI you can share.

advisor · this week live

38.2% fewer tokens vs. baseline · ~$214 saved

−21% Compress tool_result · claude-code
−$0.42/run Route refactors to a cheaper tier
cache Enable prompt cache on system prefix

02what makes it stronger

One gateway.
Every capability you need.

Smart routing

Spread requests across all your accounts and free tiers with cost- and latency-aware selection, load balancing, request dedup, a semantic cache, and a four-tier fallback chain that keeps you coding when a provider blinks.

all your accountscost-awareload-balanced4-tier fallback

Reliability

Circuit breakers, active & passive health checks, per-key rate limits, smart retry.

Virtual Models

One model ID that fans out to an ordered list of real targets.

Multimodal

Images, embeddings, audio (TTS + STT), sandboxed MCP and WASM plugins.

Observability

OpenTelemetry, Prometheus, a queryable audit log, and a live request stream.

Security & secrets

Encryption-at-rest for every credential, argon2 auth, hash-chained audit logging, and a BYOK secrets vault — your keys never leave your machine.

encrypted at restargon2BYOK vaultaudit chain

03the request lifecycle

How a request flows through merido.

01
Authenticate & detect

Validate the client key and per-key rate limit, then detect the source format — OpenAI, Anthropic, or Responses.
02
Resolve & route

Resolve the model, alias, or Virtual Model, then pick the best provider + account — health-, breaker-, and cost-aware.
03
Save tokens & translate

Compress tool results, then translate through one canonical IR into the target wire format.
04
Execute & track

Stream with retry, transform the response back, track tokens + cost — and fall back on any error.

04virtual models

One model ID.
A whole fallback strategy.

Define a client-callable model that fans out to an ordered list of real provider/model targets. merido rotates and fails over automatically — your CLI keeps calling one name.

failover — drop to the next target when one is down.
load_balance — spread load across healthy targets.
cost_optimized — prefer the cheapest capable target.
latency_based — pick the fastest responder.

virtual-model · "smart-sonnet"

strategy: failover
targets:
  - anthropic/claude-sonnet
  - openrouter/claude-sonnet
  - gemini/gemini-2.5-pro

# your CLI just calls:
model: "smart-sonnet"

05quick start

Running in two commands.

The server crate embeds the dashboard at compile time — build the UI once, then run a single binary that serves the API, the dashboard, and this page.

build from source

# 1 · build the dashboard (needs bun)
cd dashboard && bun install && bun run build

# 2 · run the gateway (embedded SQLite)
cargo run -p merido -- start
  → http://127.0.0.1:8788

run with Docker

# multi-stage build, slim runtime image
docker build -t merido .

# persist the data dir on a volume
docker run --rm -p 8788:8788 \
  -v merido-data:/data merido

Then merido keys create, merido providers add, and merido gain to see what you've saved.

free forever · no open-core

Stop watching tokens drain.
Start routing smarter.

Dual-licensed MIT / Apache-2.0. Every feature ships in the binary you self-host.

Get started → View source

06faq

Questions, answered.

What is merido?

An open-source, local-first AI gateway written in Rust. It sits between your AI coding tools — Claude Code, Codex, Cursor, Cline, Continue — and 40+ upstream LLM providers, translating formats, compressing tool output to save tokens, and failing over so you never hit a wall.

Which providers and coding CLIs are supported?

merido speaks an OpenAI-compatible API and routes to 40+ providers including OpenAI, Anthropic, Google Gemini, OpenRouter, GLM and Kiro. Any CLI targeting an OpenAI or Anthropic endpoint works out of the box.

Is it really free and open source?

Yes — dual-licensed under MIT and Apache-2.0 with no billing and no open-core paywall. Every feature ships in a single static binary you self-host.

How does the Token-Optimization Advisor work?

It analyzes your real usage, detects token waste per CLI and project, and recommends concrete changes — compressing tool results, switching to a cheaper model, or enabling prompt caching — then lets you apply them with a guarded probation window and auto-rollback.

Can I run merido locally?

Yes. The default local profile runs as a single binary with embedded SQLite, serving both API and dashboard from one port (8788). A cloud profile adds Postgres, Redis and multi-tenancy when you scale.

How is it different from other AI gateways?

A clean-room reimplementation in Rust with a canonical-IR translation core, a built-in Token-Optimization Advisor, and a single self-hosted binary that serves both the API and a full operator dashboard. It's Helicone-header compatible, so existing clients point at it directly.

Route every coding CLI across every key you own.

Every gateway shows what you spent. merido shows what you're wasting.

One gateway.Every capability you need.

Smart routing

Reliability

Virtual Models

Multimodal

Observability

Security & secrets

How a request flows through merido.

Authenticate & detect

Resolve & route

Save tokens & translate

Execute & track

One model ID.A whole fallback strategy.