The AI gateway space has matured fast. In 2026 there are several credible options for putting a proxy in front of your LLM providers — and the differences between them have sharpened. If you are evaluating gateways for self-hosting, cost control, or both, here is a fair look at the landscape: what each tool does well, where each one makes tradeoffs, and how to match the right tool to what you actually need.
Why an AI gateway in the first place
An AI gateway is a proxy that sits between your applications (or coding tools such as Claude Code, Codex and Cursor) and the LLM providers behind them. Instead of every client talking directly to a dozen different APIs, they all hit one stable, OpenAI-compatible endpoint. The gateway resolves models, picks providers, handles failover, and adds the cross-cutting concerns you would otherwise reimplement in every app: authentication, rate limiting, observability, cost tracking and, increasingly, token optimization.
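To make the "resolves models, picks providers, handles failover" step concrete, here is a minimal sketch of ordered failover. It is not taken from any of the gateways discussed here: `Provider`, `ProviderError`, `route_with_failover`, the provider names and the model name `example-model` are all hypothetical, and the `send` callables stand in for real HTTP calls.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence


class ProviderError(Exception):
    """Raised by a provider callable when the upstream request fails."""


@dataclass(frozen=True)
class Provider:
    # Hypothetical shape: a name, the models it serves, and a send function.
    name: str
    models: frozenset[str]
    send: Callable[[str, str], str]  # (model, prompt) -> completion text


def route_with_failover(
    providers: Sequence[Provider], model: str, prompt: str
) -> tuple[str, str]:
    """Try each provider that serves `model`, in order, until one succeeds.

    Returns (provider_name, completion). Raises ProviderError if none succeed.
    """
    candidates = [p for p in providers if model in p.models]
    if not candidates:
        raise ProviderError(f"no provider configured for model {model!r}")

    errors = []
    for provider in candidates:
        try:
            return provider.name, provider.send(model, prompt)
        except ProviderError as exc:
            # Remember the failure and fall through to the next candidate.
            errors.append(f"{provider.name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Stand-ins for real upstream calls (assumptions for illustration only).
def _flaky(model: str, prompt: str) -> str:
    raise ProviderError("503 from upstream")


def _healthy(model: str, prompt: str) -> str:
    return f"[{model}] echo: {prompt}"


PROVIDERS = [
    Provider("primary", frozenset({"example-model"}), _flaky),
    Provider("backup", frozenset({"example-model"}), _healthy),
]

name, text = route_with_failover(PROVIDERS, "example-model", "hello")
print(name, text)  # backup [example-model] echo: hello
```

A real gateway would add timeouts, retries with backoff and health tracking on top of this loop; the sketch only shows why clients can keep a single endpoint while the provider behind it changes.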
The case for a gateway is strongest when you care about more than one provider, want to keep keys and data off third-party infrastructure, or need spend visibility and controls across multiple teams or tools. For a deeper look at the self-hosting angle, see the self-hosted LLM gateway guide.
The 2026 landscape
As of 2026, the AI gateway space has seen consolidation. Several tools that started as open-source projects have added enterprise tiers, moved toward cloud-managed offerings, or been acquired. That shift is not inherently bad, since enterprise tooling serves real needs, but it has created a gap for teams that want a genuinely self-hostable, open-core, cost-control-focused gateway without a managed-cloud dependency or an enterprise sales motion. The options below cover the major tools in the space; each has real strengths, and this comparison tries to name them honestly.
The tools
LiteLLM
LiteLLM is the most widely deployed open-source LLM proxy. Written in Python, it offers the broadest provider coverage in the space — over 100 providers — and integrates naturally with the Python ML and AI ecosystem. Its strength is ecosystem breadth: if a provider exists, LiteLLM likely supports it. It includes a proxy server, an OpenAI-compatible API, load balancing, a dashboard, and spend tracking.
Where LiteLLM makes tradeoffs: the Python runtime adds latency overhead and memory footprint compared to compiled alternatives, which matters under high concurrency or on resource-constrained hosts. Cost optimization is primarily observational (spend tracking, budget limits) rather than structural (it does not compress tokens before they reach the model). The self-hosted proxy is open source; advanced features are gated behind a commercial tier. For teams already deep in Python infrastructure and needing maximum provider coverage, LiteLLM is a strong default.
Portkey
Portkey started as an AI gateway focused on governance, guardrails and prompt management. Its strength is operational depth for teams that need to control what models do, not just which ones they route to: prompt versioning, guardrails, virtual keys with per-key policies, and detailed audit trails. It has a polished dashboard and a thoughtful developer experience.
Portkey’s primary offering is a managed cloud gateway with a self-hosted option. As of 2026, the product has moved further toward enterprise; the open-source layer is maintained but the governance-depth features that distinguish Portkey are increasingly cloud-managed. If governance, guardrails and prompt management are your primary requirements, Portkey built that tooling first and built it well.
Helicone
Helicone is an observability-first LLM gateway. Its core value is logging, analytics and replay: every request and response is captured, searchable and visualizable. It integrates with a very wide range of providers and is exceptionally easy to get started with — in many cases a single header change routes traffic through it. The Helicone dashboard is purpose-built for debugging, cost analysis and prompt iteration.
Helicone’s architecture is primarily a cloud-hosted proxy with a self-hosted option. As of 2026, reported acquisition activity has moved the project further into enterprise territory. If deep observability — request logging, replay, analytics across your full prompt history — is the primary requirement, Helicone built that surface more carefully than most alternatives. Cost controls are lighter; the focus is on seeing spend, not structurally reducing it.
merido
merido is an open-source, local-first LLM gateway written in Rust. It takes a different starting point: cost optimization is a first-class feature, not an add-on. Where most gateways track spend after the fact, merido acts before the request reaches the provider: compressing tool output, routing across accounts by cost and latency, and recording only measured savings in a ledger that shows $0 when it cannot prove a saving.
It ships as a single static binary that runs on embedded SQLite locally (no external dependencies to get started) and scales to Postgres + Redis for multi-tenant deployments. The same binary serves the API, a built-in dashboard, and documentation from one port. merido is the newest entrant in this comparison; it has a smaller provider ecosystem than LiteLLM and less observability depth than Helicone. Its edge is the specific combination: Rust performance, structural cost optimization, an honest savings ledger, BYOK with no pooled keys, and a self-hostable binary under an open-source license.
Side by side
| Dimension | LiteLLM | Portkey | Helicone | merido |
|---|---|---|---|---|
| Open source | Yes (MIT); enterprise tier is commercial. | Open-source layer exists; governance features are cloud-managed. | Open-source layer exists; primary product is cloud-hosted. | Yes (MIT / Apache-2.0); no closed enterprise fork. |
| Self-host | Yes — Python service + Redis/Postgres. | Yes — Docker-based; managed cloud is primary. | Yes — self-hosted option available; managed cloud is primary. | Yes — single static binary, SQLite default, zero external deps. |
| BYOK / key control | Yes — your keys, but proxied through the process you run. | Yes — virtual keys with per-key policies; cloud-managed by default. | Yes — keys forwarded through the proxy. | Yes — your keys only, encrypted at rest, never pooled or shared. |
| Provider coverage | 100+ providers — the broadest in the space. | Wide coverage; strong on enterprise providers. | Very wide coverage; one-header integration for many providers. | 40+ providers; growing. Translates OpenAI / Anthropic / Gemini formats. |
| Cost optimization | Spend tracking and budget limits; no token compression. | Budget limits, virtual keys with spend policies; no token compression. | Spend analytics and caching; no token compression. | Tool-output compression (lossless, by command type), routing by cost/latency, savings ledger, caching. |
| Observability | Dashboard, spend tracking, logging. | Audit trail, prompt versioning, governance reports. | Purpose-built: full request/response logging, replay, analytics. | Built-in dashboard, burn-rate meter, Token-Optimization Advisor, per-session savings ledger. |
| Governance / guardrails | Rate limits, budget caps, per-key policies. | Strongest in class: guardrails, prompt management, audit, RBAC. | Moderate: logging-based; limited active guardrails. | Per-key rate limits, budget caps, RBAC; guardrails are lighter. |
| Runtime / performance | Python — flexible, higher overhead at scale. | Managed cloud (runtime not disclosed); self-host is Node/Python. | Managed cloud (runtime not disclosed); self-host varies. | Rust — compiled, low overhead, stable under sustained streaming load. |
| 2026 trajectory | Active OSS; enterprise tier expanding. | Moving further into enterprise / managed cloud. | Reported acquisition; enterprise focus increasing. | Active indie OSS; cost-optimization-first, self-host focus. |
No single tool wins every row. The right choice depends on your actual requirements: if you need 100+ providers or deep Python integration, LiteLLM is hard to beat. If guardrails and prompt governance matter most, Portkey built that tooling first. If your primary need is request-level observability and replay, Helicone is purpose-built for it. merido’s edge is the combination of structural cost optimization, genuine self-hosting, and Rust performance — not breadth.
How to pick
01 Start with the self-hosting question
If running a managed-cloud gateway is acceptable — you do not have hard data-residency or key-control requirements — Portkey and Helicone offer polished managed experiences with lower operational overhead. If self-hosting is a requirement (compliance, cost, control, or ToS posture), narrow to LiteLLM and merido as the strongest self-hosted options; Portkey and Helicone have self-hosted paths but their primary investment is in the managed product.
02 Match the primary use case to the tool’s core
Each tool built something first. LiteLLM built provider coverage. Portkey built governance. Helicone built observability. merido built cost optimization. Pick the tool whose primary investment maps to your primary requirement — you will get the depth you need without working against the grain of the tool’s design.
03 Think about cost optimization: tracking vs. structural
Most gateways offer cost tracking: they record spend after the fact. That is useful but limited, because it tells you what you spent, not what you could have avoided. Structural cost optimization acts before the tokens reach the provider: compressing tool output, routing to the cheapest capable account, caching responses. If AI coding costs are the primary motivation (and if you have ever seen a session bill spike from bulky git or test output, you know the scope of the problem; see the AI coding cost calculator), a gateway that acts structurally can lower the bill itself, where a tracking-only gateway can only report it.
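As an illustration of "routing to the cheapest capable account", the sketch below filters accounts by a latency ceiling and then picks the lowest estimated cost. It is a simplified assumption about how such routing could work, not merido's actual algorithm; the `Account` fields, the account names and every price and latency figure are made up for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Account:
    # Hypothetical per-account pricing and observed latency.
    name: str
    usd_per_million_input: float
    usd_per_million_output: float
    p50_latency_ms: float


def estimated_cost_usd(account: Account, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from per-million-token prices."""
    return (
        input_tokens * account.usd_per_million_input
        + output_tokens * account.usd_per_million_output
    ) / 1_000_000


def pick_account(
    accounts: Sequence[Account],
    input_tokens: int,
    output_tokens: int,
    max_latency_ms: float,
) -> Optional[Account]:
    """Return the cheapest account within the latency ceiling, or None."""
    eligible = [a for a in accounts if a.p50_latency_ms <= max_latency_ms]
    if not eligible:
        return None
    # Cheapest first; ties are broken in favour of the lower latency.
    return min(
        eligible,
        key=lambda a: (estimated_cost_usd(a, input_tokens, output_tokens), a.p50_latency_ms),
    )


# Illustrative numbers only, not real provider prices.
ACCOUNTS = [
    Account("account-a", 3.0, 15.0, 800.0),
    Account("account-b", 1.0, 5.0, 2500.0),
    Account("account-c", 2.0, 10.0, 900.0),
]

choice = pick_account(ACCOUNTS, 10_000, 1_000, max_latency_ms=1000)
print(choice.name)  # account-c
```

Raising `max_latency_ms` to 3000 would let the cheaper but slower `account-b` win, which is the tradeoff a cost-and-latency-aware router has to expose as a setting.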
04 Evaluate the savings ledger honestly
Any gateway can show you a “savings” number. The question is how it is calculated. A ledger that counts every cache hit or compression event as a saving, regardless of whether the request would have been made otherwise, overstates what the tool actually did. merido’s savings ledger reports only measured numbers, against a recorded baseline, and shows $0 when it cannot prove a saving. That is a useful signal about the tool’s posture; look for the same rigor in any gateway you evaluate.
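The rule "measured against a recorded baseline, $0 when it cannot prove a saving" can be written down in a few lines. The sketch below is one way to express that rule, not merido's ledger implementation; `LedgerEntry`, its fields, and the token counts and prices are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class LedgerEntry:
    # Hypothetical record for one request.
    request_id: str
    baseline_tokens: Optional[int]  # measured size before optimization; None if never recorded
    actual_tokens: int              # tokens actually sent to the provider
    usd_per_million_tokens: float


def proven_saving_usd(entry: LedgerEntry) -> float:
    """Return the saving for one entry, or 0.0 when it cannot be proven."""
    if entry.baseline_tokens is None:
        return 0.0  # no recorded baseline, so no claim
    saved_tokens = entry.baseline_tokens - entry.actual_tokens
    if saved_tokens <= 0:
        return 0.0  # nothing was actually removed
    return saved_tokens * entry.usd_per_million_tokens / 1_000_000


def ledger_total_usd(entries: Iterable[LedgerEntry]) -> float:
    """Sum only the provable savings across all entries."""
    return sum(proven_saving_usd(e) for e in entries)


# Illustrative entries only.
ENTRIES = [
    LedgerEntry("r1", baseline_tokens=12_000, actual_tokens=4_000, usd_per_million_tokens=3.0),
    LedgerEntry("r2", baseline_tokens=None, actual_tokens=4_000, usd_per_million_tokens=3.0),
    LedgerEntry("r3", baseline_tokens=5_000, actual_tokens=5_000, usd_per_million_tokens=3.0),
]

print(f"{ledger_total_usd(ENTRIES):.3f}")  # 0.024
```

Only `r1` contributes: `r2` has no baseline and `r3` removed nothing, so both are recorded as zero rather than estimated.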
05 Consider the operational footprint
Python services with external dependencies (Redis, Postgres) are flexible but carry operational overhead. A compiled binary with embedded storage as a default is easier to run, upgrade and secure — especially on a developer workstation or a single-instance self-hosted setup. If you are operating a team deployment, the Postgres + Redis path is available in merido and LiteLLM; weigh what you are willing to operate vs. what you want managed for you.
Where merido fits — honestly
merido is not the right tool for every situation. It has a smaller provider ecosystem than LiteLLM (40+ vs. 100+), lighter governance depth than Portkey, and less observability surface than Helicone. It is the newest project in this comparison, and its provider support is still growing.
Where merido earns its place is the specific combination no other tool offers together: structural cost optimization built into the core (lossless tool-output compression, cost-and-latency-aware routing, prompt-cache management), a savings ledger that only reports what it can prove, a genuinely self-hostable single binary with no mandatory external dependencies, encryption-at-rest for credentials, and a Rust runtime that stays out of the way under sustained streaming load. It is built for teams and individual developers whose primary concern is controlling AI spend — not just watching it — while keeping keys and data on their own infrastructure.
merido uses your own API keys, runs on your own infrastructure, and never pools, shares or resells credentials. It is MIT / Apache-2.0 licensed — no proprietary control plane, no managed-cloud dependency, no enterprise tier gating the cost-optimization features. Your keys, your billing, your machine.
If you are evaluating for self-hosting depth, read the self-hosted LLM gateway guide for a fuller look at what to require from a gateway you run yourself. And if the underlying question is “how much does agentic AI coding actually cost and where does it go?”, the AI coding cost calculator gives you a concrete starting point before you route a single request.
Self-host a cost-optimization-first AI gateway
Open source, single static binary, on your own keys. Route across 40+ providers, compress tool output before it reaches the model, and see only what you actually saved.
Related guides
- Self-hosted LLM gateway — own your keys, data, and routing.
- AI coding cost calculator — estimate your bill before you route.
- Why AI coding bills explode — the root cause, explained.
- How to reduce Claude Code costs — 8 concrete tactics.
Frequently asked questions
What is the best open-source LiteLLM alternative?
It depends on what you need LiteLLM to do differently. LiteLLM's strengths are its enormous provider ecosystem and Python-native integration. If you want a compiled, low-overhead single binary with cost optimization as a core feature — tool-output compression, routing across your own accounts, a savings ledger — merido is a focused alternative worth evaluating. If you need enterprise governance depth, Portkey is worth a look; if observability is the primary concern, Helicone was purpose-built for that. The right answer is the narrowest tool that covers your actual requirements.
Should I self-host an AI gateway or use a SaaS?
Self-hosting keeps your API keys and request data on infrastructure you control, eliminates per-request markup from a hosted router, and is the ToS-clean way to do BYOK — you call providers with your own credentials, not pooled through someone else's account. The tradeoff is operational responsibility: you run and maintain the process. If operational overhead is a deal-breaker and data residency is not a constraint, a SaaS gateway can be the right call. For teams where keys, data, or compliance posture matter, self-hosting is usually the right architecture.
Is merido production-ready?
merido is open source and actively developed. It runs on embedded SQLite for local single-user use and on Postgres + Redis for multi-tenant deployments. It has a circuit-breaker-backed routing engine, encryption-at-rest for credentials, and a savings ledger that only reports measured numbers. Like any self-hosted tool, production readiness depends on your operating environment and risk tolerance — evaluate it against your requirements, read the docs, and run your own load tests. The binary, source, and documentation are all publicly available.
Does merido pool or resell API keys?
No. merido uses your own API keys exclusively, runs on your own infrastructure, and never pools, shares or resells credentials. That is the core BYOK design: your keys, your billing, your machine. It is the same arrangement you have when calling providers directly — the gateway just adds routing, compression, and observability on top.