The AI gateway space has matured fast. In 2026 there are several credible options for putting a proxy in front of your LLM providers — and the differences between them have sharpened. If you are evaluating gateways for self-hosting, cost control, or both, here is a fair look at the landscape: what each tool does well, where each one makes tradeoffs, and how to match the right tool to what you actually need.

Why an AI gateway in the first place

An AI gateway is a proxy that sits between your applications — or coding CLIs like Claude Code, Codex and Cursor — and the LLM providers behind them. Instead of every client talking directly to a dozen different APIs, they all hit one stable, OpenAI-compatible endpoint. The gateway resolves models, picks providers, handles failover, and adds the cross-cutting concerns you would otherwise reimplement in every app: authentication, rate limiting, observability, cost tracking and, increasingly, token optimization.
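To make the failover step concrete, here is a minimal sketch of how a gateway might walk an ordered list of providers. Everything in it is an assumption for illustration: `Provider`, `ProviderError`, `complete_with_failover` and the two stand-in provider functions are hypothetical names, and real provider calls would be HTTP requests rather than local functions. It is not the implementation of any tool discussed here.

```python
from dataclasses import dataclass
from typing import Callable


class ProviderError(Exception):
    """Raised by a provider call that failed in a retryable way."""


@dataclass
class Provider:
    name: str
    # Hypothetical callable standing in for a real HTTP call to the provider.
    call: Callable[[str, str], str]


def complete_with_failover(
    providers: list[Provider], model: str, prompt: str
) -> tuple[str, str]:
    """Try each provider in order; return (provider_name, response_text)."""
    errors: list[str] = []
    for provider in providers:
        try:
            return provider.name, provider.call(model, prompt)
        except ProviderError as exc:
            # Record the failure and fall through to the next provider.
            errors.append(f"{provider.name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


def flaky(model: str, prompt: str) -> str:
    """Stand-in for a provider that is currently down."""
    raise ProviderError("503 upstream unavailable")


def healthy(model: str, prompt: str) -> str:
    """Stand-in for a provider that answers normally."""
    return f"[{model}] echo: {prompt}"


if __name__ == "__main__":
    chain = [Provider("primary", flaky), Provider("fallback", healthy)]
    print(complete_with_failover(chain, "example-model", "hello"))
```

The client only ever sees one endpoint and one response; which provider answered is a gateway-side detail, which is the point of putting the logic in a proxy instead of in every app.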

The case for a gateway is strongest when you care about more than one provider, want to keep keys and data off third-party infrastructure, or need spend visibility and controls across multiple teams or tools. For a deeper look at the self-hosting angle, see the self-hosted LLM gateway guide.

The 2026 landscape

As of 2026, the AI gateway space has consolidated. Several tools that started as open-source projects have added enterprise tiers, moved toward cloud-managed offerings, or been acquired. That shift is not inherently bad, since enterprise tooling serves real needs, but it has left a gap for teams that want a genuinely self-hostable, open-core, cost-control-focused gateway without a managed-cloud dependency or an enterprise sales motion. The options below cover the major tools in the space; each has real strengths, and this comparison tries to name them honestly.

The tools

LiteLLM

LiteLLM is the most widely deployed open-source LLM proxy. Written in Python, it offers the broadest provider coverage in the space — over 100 providers — and integrates naturally with the Python ML and AI ecosystem. Its strength is ecosystem breadth: if a provider exists, LiteLLM likely supports it. It includes a proxy server, an OpenAI-compatible API, load balancing, a dashboard, and spend tracking.

Where LiteLLM makes tradeoffs: the Python runtime adds latency overhead and memory footprint compared to compiled alternatives, which matters under high concurrency or on resource-constrained hosts. Cost optimization is primarily observational (spend tracking, budget limits) rather than structural (it does not compress tokens before they reach the model). The self-hosted proxy is open source; advanced features are gated behind a commercial tier. For teams already deep in Python infrastructure and needing maximum provider coverage, LiteLLM is a strong default.

Portkey

Portkey started as an AI gateway focused on governance, guardrails and prompt management. Its strength is operational depth for teams that need to control what models do, not just which ones they route to: prompt versioning, guardrails, virtual keys with per-key policies, and detailed audit trails. It has a polished dashboard and a thoughtful developer experience.

Portkey’s primary offering is a managed cloud gateway with a self-hosted option. As of 2026, the product has moved further toward enterprise; the open-source layer is maintained but the governance-depth features that distinguish Portkey are increasingly cloud-managed. If governance, guardrails and prompt management are your primary requirements, Portkey built that tooling first and built it well.

Helicone

Helicone is an observability-first LLM gateway. Its core value is logging, analytics and replay: every request and response is captured, searchable and visualizable. It integrates with a very wide range of providers and is exceptionally easy to get started with — in many cases a single header change routes traffic through it. The Helicone dashboard is purpose-built for debugging, cost analysis and prompt iteration.

Helicone’s architecture is primarily a cloud-hosted proxy with a self-hosted option. As of 2026, reported acquisition activity has moved the project further into enterprise territory. If deep observability — request logging, replay, analytics across your full prompt history — is the primary requirement, Helicone built that surface more carefully than most alternatives. Cost controls are lighter; the focus is on seeing spend, not structurally reducing it.

merido

merido is an open-source, local-first LLM gateway written in Rust. It takes a different starting point: cost optimization is a first-class feature, not an add-on. Where most gateways track spend after the fact, merido acts before the request reaches the provider — compressing tool output, routing across accounts by cost and latency, and recording only measured savings in a ledger that shows $0 when it cannot prove a saving.

It ships as a single static binary that runs on embedded SQLite locally (no external dependencies to get started) and scales to Postgres + Redis for multi-tenant deployments. The same binary serves the API, a built-in dashboard, and documentation from one port. merido is the newest entrant in this comparison; it has a smaller provider ecosystem than LiteLLM and less observability depth than Helicone. Its edge is the specific combination: Rust performance, structural cost optimization, an honest savings ledger, genuine BYOK, and a genuinely self-hostable binary under an open-source license.
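Routing across accounts by cost and latency can be pictured as a weighted score over healthy candidates. The sketch below is an assumption about how such a rule could work, not merido's actual algorithm: the `Account` fields, the default weights and the `pick_account` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    name: str
    usd_per_million_tokens: float  # blended price, assumed known per account
    p50_latency_ms: float          # recent median latency, assumed measured
    healthy: bool = True


def pick_account(
    accounts: list[Account],
    cost_weight: float = 0.7,
    latency_weight: float = 0.3,
) -> Account:
    """Return the healthy account with the lowest weighted, normalized score."""
    candidates = [a for a in accounts if a.healthy]
    if not candidates:
        raise ValueError("no healthy accounts to route to")

    # Normalize by the worst candidate so cost and latency are comparable.
    max_cost = max(a.usd_per_million_tokens for a in candidates) or 1.0
    max_latency = max(a.p50_latency_ms for a in candidates) or 1.0

    def score(a: Account) -> float:
        return (
            cost_weight * a.usd_per_million_tokens / max_cost
            + latency_weight * a.p50_latency_ms / max_latency
        )

    return min(candidates, key=score)
```

With the default weights a cheap but slower account tends to win; shifting weight toward latency flips the choice for interactive workloads.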

Side by side

| Dimension | LiteLLM | Portkey | Helicone | merido |
| --- | --- | --- | --- | --- |
| Open source | Yes (MIT); enterprise tier is commercial. | Open-source layer exists; governance features are cloud-managed. | Open-source layer exists; primary product is cloud-hosted. | Yes (MIT / Apache-2.0); no closed enterprise fork. |
| Self-host | Yes: Python service + Redis/Postgres. | Yes: Docker-based; managed cloud is primary. | Yes: self-hosted option available; managed cloud is primary. | Yes: single static binary, SQLite default, zero external deps. |
| BYOK / key control | Yes: your keys, but proxied through the process you run. | Yes: virtual keys with per-key policies; cloud-managed by default. | Yes: keys forwarded through the proxy. | Yes: your keys only, encrypted at rest, never pooled or shared. |
| Provider coverage | 100+ providers, the broadest in the space. | Wide coverage; strong on enterprise providers. | Very wide coverage; one-header integration for many providers. | 40+ providers; growing. Translates OpenAI / Anthropic / Gemini formats. |
| Cost optimization | Spend tracking and budget limits; no token compression. | Budget limits, virtual keys with spend policies; no token compression. | Spend analytics and caching; no token compression. | Tool-output compression (lossless, by command type), routing by cost/latency, savings ledger, caching. |
| Observability | Dashboard, spend tracking, logging. | Audit trail, prompt versioning, governance reports. | Purpose-built: full request/response logging, replay, analytics. | Built-in dashboard, burn-rate meter, Token-Optimization Advisor, per-session savings ledger. |
| Governance / guardrails | Rate limits, budget caps, per-key policies. | Strongest in class: guardrails, prompt management, audit, RBAC. | Moderate: logging-based; limited active guardrails. | Per-key rate limits, budget caps, RBAC; guardrails are lighter. |
| Runtime / performance | Python: flexible, higher overhead at scale. | Managed cloud (runtime not disclosed); self-host is Node/Python. | Managed cloud (runtime not disclosed); self-host varies. | Rust: compiled, low overhead, stable under sustained streaming load. |
| 2026 trajectory | Active OSS; enterprise tier expanding. | Moving further into enterprise / managed cloud. | Reported acquisition; enterprise focus increasing. | Active indie OSS; cost-optimization-first, self-host focus. |

Reading this table

No single tool wins every row. The right choice depends on your actual requirements: if you need 100+ providers or deep Python integration, LiteLLM is hard to beat. If guardrails and prompt governance matter most, Portkey built that tooling first. If your primary need is request-level observability and replay, Helicone is purpose-built for it. merido’s edge is the combination of structural cost optimization, genuine self-hosting, and Rust performance — not breadth.

How to pick

01 Start with the self-hosting question

If running a managed-cloud gateway is acceptable — you do not have hard data-residency or key-control requirements — Portkey and Helicone offer polished managed experiences with lower operational overhead. If self-hosting is a requirement (compliance, cost, control, or ToS posture), narrow to LiteLLM and merido as the strongest self-hosted options; Portkey and Helicone have self-hosted paths but their primary investment is in the managed product.

02 Match the primary use case to the tool’s core

Each tool built something first. LiteLLM built provider coverage. Portkey built governance. Helicone built observability. merido built cost optimization. Pick the tool whose primary investment maps to your primary requirement — you will get the depth you need without working against the grain of the tool’s design.

03 Think about cost optimization: tracking vs. structural

Most gateways offer cost tracking: they record spend after the fact. That is useful but limited: it tells you what you spent, not what you could have avoided. Structural cost optimization acts before the tokens reach the provider by compressing tool output, routing to the cheapest capable account, and caching responses. If AI coding costs are your primary motivation, a gateway that acts structurally can cut spend that a tracking-only gateway can only report. Anyone who has seen a session bill spike from bulky git or test output knows the scope of the problem; the AI coding cost calculator helps size it.
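As a rough illustration of what compressing tool output before it reaches the model can mean, the sketch below collapses runs of identical consecutive lines, the kind of repetition common in test and build logs. This is an assumed technique chosen for illustration (run-length encoding of lines), not a description of how merido compresses output, and `encode_runs`, `decode_runs` and `render_runs` are hypothetical names.

```python
from itertools import groupby


def encode_runs(text: str) -> list[tuple[str, int]]:
    """Collapse consecutive identical lines into (line, count) pairs."""
    return [
        (line, sum(1 for _ in group))
        for line, group in groupby(text.split("\n"))
    ]


def decode_runs(runs: list[tuple[str, int]]) -> str:
    """Invert encode_runs exactly; the pair list loses no information."""
    return "\n".join(line for line, count in runs for _ in range(count))


def render_runs(runs: list[tuple[str, int]]) -> str:
    """Compact text form: repeated lines get a count suffix.

    Note: the rendered text is only unambiguous if no original line already
    ends with a "[xN]" marker; a real implementation would need escaping.
    """
    return "\n".join(
        line if count == 1 else f"{line}  [x{count}]"
        for line, count in runs
    )
```

The pair list round-trips to the original text, which is the property to check before calling any scheme lossless; the rendered form is what would be forwarded in place of the raw log.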

04 Evaluate the savings ledger honestly

Any gateway can show you a “savings” number. The question is how it is calculated. A ledger that counts every cache hit or compression event as a saving, regardless of whether the request would have been made otherwise, overstates what the tool actually did. merido’s savings ledger reports only measured numbers against a recorded baseline, and shows $0 when it cannot prove a saving. That is a useful signal about the tool’s posture; look for the same rigor in any gateway you evaluate.
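The "report $0 unless a saving is proven" rule is easy to state in code. The following is a minimal sketch under stated assumptions: the `LedgerEntry` and `SavingsLedger` types, their field names and the per-token pricing are hypothetical and do not reflect merido's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LedgerEntry:
    request_id: str
    baseline_tokens: Optional[int]  # measured unoptimized size, if recorded
    actual_tokens: int              # tokens actually sent to the provider
    usd_per_token: float

    def proven_saving_usd(self) -> float:
        # Without a recorded baseline there is nothing to compare against,
        # so the entry claims no saving instead of an estimate.
        if self.baseline_tokens is None:
            return 0.0
        # Never report a negative "saving" if the request grew.
        saved_tokens = max(self.baseline_tokens - self.actual_tokens, 0)
        return saved_tokens * self.usd_per_token


@dataclass
class SavingsLedger:
    entries: list[LedgerEntry] = field(default_factory=list)

    def record(self, entry: LedgerEntry) -> None:
        self.entries.append(entry)

    def total_usd(self) -> float:
        return sum(e.proven_saving_usd() for e in self.entries)
```

When evaluating any gateway, the equivalent question is what it does in the `baseline_tokens is None` case: a ledger that substitutes an estimate there is reporting a guess, not a measurement.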

05 Consider the operational footprint

Python services with external dependencies (Redis, Postgres) are flexible but carry operational overhead. A compiled binary with embedded storage as a default is easier to run, upgrade and secure — especially on a developer workstation or a single-instance self-hosted setup. If you are operating a team deployment, the Postgres + Redis path is available in merido and LiteLLM; weigh what you are willing to operate vs. what you want managed for you.

Where merido fits — honestly

merido is not the right tool for every situation. It has a smaller provider ecosystem than LiteLLM (40+ vs. 100+), lighter governance depth than Portkey, and less observability surface than Helicone. It is also the newest project in this comparison, so its provider support is still growing.

Where merido earns its place is the specific combination no other tool offers together: structural cost optimization built into the core (lossless tool-output compression, cost-and-latency-aware routing, prompt-cache management), a savings ledger that only reports what it can prove, a genuinely self-hostable single binary with no mandatory external dependencies, encryption-at-rest for credentials, and a Rust runtime that stays out of the way under sustained streaming load. It is built for teams and individual developers whose primary concern is controlling AI spend — not just watching it — while keeping keys and data on their own infrastructure.

BYOK · self-hosted · open source

merido uses your own API keys, runs on your own infrastructure, and never pools, shares or resells credentials. It is MIT / Apache-2.0 licensed — no proprietary control plane, no managed-cloud dependency, no enterprise tier gating the cost-optimization features. Your keys, your billing, your machine.

If you are evaluating for self-hosting depth, read the self-hosted LLM gateway guide for a fuller look at what to require from a gateway you run yourself. And if the underlying question is “how much does agentic AI coding actually cost and where does it go?”, the AI coding cost calculator gives you a concrete starting point before you route a single request.

Self-host a cost-optimization-first AI gateway

Open source, single static binary, on your own keys. Route across 40+ providers, compress tool output before it reaches the model, and see only what you actually saved.

Frequently asked questions

What is the best open-source LiteLLM alternative?

It depends on what you need LiteLLM to do differently. LiteLLM's strengths are its enormous provider ecosystem and Python-native integration. If you want a compiled, low-overhead single binary with cost optimization as a core feature — tool-output compression, routing across your own accounts, a savings ledger — merido is a focused alternative worth evaluating. If you need enterprise governance depth, Portkey is worth a look; if observability is the primary concern, Helicone was purpose-built for that. The right answer is the narrowest tool that covers your actual requirements.

Should I self-host an AI gateway or use a SaaS?

Self-hosting keeps your API keys and request data on infrastructure you control, eliminates per-request markup from a hosted router, and is the ToS-clean way to do BYOK — you call providers with your own credentials, not pooled through someone else's account. The tradeoff is operational responsibility: you run and maintain the process. If operational overhead is a deal-breaker and data residency is not a constraint, a SaaS gateway can be the right call. For teams where keys, data, or compliance posture matter, self-hosting is usually the right architecture.

Is merido production-ready?

merido is open source and actively developed. It runs on embedded SQLite for local single-user use and on Postgres + Redis for multi-tenant deployments. It has a circuit-breaker-backed routing engine, encryption-at-rest for credentials, and a savings ledger that only reports measured numbers. Like any self-hosted tool, production readiness depends on your operating environment and risk tolerance — evaluate it against your requirements, read the docs, and run your own load tests. The binary, source, and documentation are all publicly available.
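For readers unfamiliar with the term, a circuit breaker stops sending traffic to a provider after repeated failures and retries only after a cooldown. The sketch below shows the general pattern only; the `CircuitBreaker` class, its thresholds and its method names are assumptions for illustration and say nothing about how merido's routing engine is built.

```python
import time


class CircuitBreaker:
    """Open after N consecutive failures; allow a trial call after a cooldown."""

    def __init__(self, failure_threshold=3, cooldown_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock          # injectable so tests can control time
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: once the cooldown has elapsed, let a trial request through.
        return self.clock() - self.opened_at >= self.cooldown_s

    def record_success(self) -> None:
        # A success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            # Open (or re-open) the circuit and restart the cooldown.
            self.opened_at = self.clock()
```

A useful load test is to take one upstream down on purpose and confirm the gateway stops hammering it, then recovers without a restart.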

Does merido pool or resell API keys?

No. merido uses your own API keys exclusively, runs on your own infrastructure, and never pools, shares or resells credentials. That is the core BYOK design: your keys, your billing, your machine. It is the same arrangement you have when calling providers directly — the gateway just adds routing, compression, and observability on top.