Your coding workload

Input tokens per task: 800k
Output tokens per task: 40k
Tasks per month: 60

Model tiers — prices editable, per 1M tokens

Generic tiers — model prices change often; enter your model's current price per million tokens from the provider's official pricing page.

Monthly cost comparison

The Frontier, Balanced, and Economy tiers each show a monthly total computed from your inputs, along with the monthly difference between the cheapest and most expensive tier.

Formula: monthly = tasks × (input ÷ 1M × input price + output ÷ 1M × output price). All arithmetic is done in your browser on the numbers you enter. The cheapest tier is highlighted in teal.
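The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the page's actual code. The `Tier` class, the `monthly_cost` helper, and all prices below are hypothetical placeholders, and the workload figures (800k, 40k, 60) are assumed to mean input tokens per task, output tokens per task, and tasks per month.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    input_price: float   # USD per 1M input tokens (placeholder, not a real quote)
    output_price: float  # USD per 1M output tokens (placeholder, not a real quote)


def monthly_cost(tasks: int, input_tokens: int, output_tokens: int, tier: Tier) -> float:
    """monthly = tasks * (input / 1M * input price + output / 1M * output price)."""
    per_task = (input_tokens / 1_000_000) * tier.input_price
    per_task += (output_tokens / 1_000_000) * tier.output_price
    return tasks * per_task


# Hypothetical prices for illustration only; enter your provider's current prices.
TIERS = [
    Tier("Frontier", 10.0, 30.0),
    Tier("Balanced", 2.0, 8.0),
    Tier("Economy", 0.5, 2.0),
]

# Assumed workload: 60 tasks per month, 800k input and 40k output tokens per task.
costs = {t.name: round(monthly_cost(60, 800_000, 40_000, t), 2) for t in TIERS}
cheapest = min(costs, key=costs.get)
spread = round(max(costs.values()) - min(costs.values()), 2)

print(costs, cheapest, spread)
```

With these placeholder prices the input side dominates every tier's total, because the assumed workload sends twenty times more input than output.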

How model price interacts with the cumulative-input tax

Switching from a frontier model to a cheaper one is an obvious lever, but it captures only part of the story. Agentic coding tools like Claude Code, Cursor, and Cline work over stateless model APIs: every turn re-sends the entire conversation so far. A 60-turn session therefore doesn't cost 60× a single turn. Its input cost follows the sum of a growing series, which rises roughly with the square of the turn count when each turn adds a similar amount of context. This is what we call the cumulative-input tax.

A cheaper model lowers the price applied to each token, but a longer session raises the number of tokens billed much faster. Read more about this mechanism in why AI coding bills explode. The comparator above works with per-task token counts, so if you enter counts that already reflect a long session, the price difference between tiers will look large, because it is.
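To make the re-sending mechanism concrete, here is a small sketch of how input tokens accumulate across a session. It assumes a simplified model that is not taken from any specific tool: the context starts at `base_context` tokens and grows by a fixed `growth_per_turn` each turn, with no caching or context compaction. Both parameter names and the numbers are illustrative.

```python
def cumulative_input_tokens(turns: int, base_context: int, growth_per_turn: int) -> int:
    """Total input tokens billed when every turn re-sends the whole conversation.

    Turn i (0-indexed) sends base_context + i * growth_per_turn tokens.
    """
    return sum(base_context + i * growth_per_turn for i in range(turns))


# Illustrative numbers: 5k tokens of starting context, 2k tokens added per turn.
single_turn = cumulative_input_tokens(1, 5_000, 2_000)
session = cumulative_input_tokens(60, 5_000, 2_000)

# Naive estimate that ignores context growth: 60 turns at the single-turn size.
naive = 60 * single_turn
ratio = session / naive

print(single_turn, session, naive, ratio)
```

Under these assumptions the 60-turn session bills 3,840,000 input tokens, against 300,000 for the naive estimate. A cheaper tier scales both figures by the same price, so it does not change that ratio.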

The right model depends on the task

A cheaper, smaller model can handle high-volume, repetitive coding steps (linting, simple refactors, test generation from templates) at dramatically lower cost. But the same model may require several extra turns to reason through a complex architectural decision — turns that each re-send the full context and erode the saving.
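The trade-off can be sketched as a break-even calculation: how many turns can the cheaper model take before its session costs as much as the frontier model's? The sketch below counts input tokens only and reuses the simplified linear-growth assumption (fixed starting context, fixed growth per turn, no caching). The function names, prices, and token counts are all hypothetical.

```python
def session_input_cost(turns: int, base_context: int, growth_per_turn: int,
                       price_per_million: float) -> float:
    """Input cost of a session in which each turn re-sends the full context."""
    # Closed form of the sum over turns of (base_context + i * growth_per_turn).
    tokens = turns * base_context + growth_per_turn * turns * (turns - 1) // 2
    return tokens / 1_000_000 * price_per_million


def max_turns_within_budget(frontier_turns: int, frontier_price: float,
                            cheap_price: float, base_context: int = 5_000,
                            growth_per_turn: int = 2_000, limit: int = 10_000) -> int:
    """Largest turn count at which the cheap model still costs less than the frontier run."""
    budget = session_input_cost(frontier_turns, base_context, growth_per_turn, frontier_price)
    turns = frontier_turns
    while turns < limit and session_input_cost(
            turns + 1, base_context, growth_per_turn, cheap_price) < budget:
        turns += 1
    return turns


# Hypothetical: the frontier model finishes in 10 turns at $10 per 1M input tokens,
# and the cheaper model costs $2 per 1M input tokens.
break_even = max_turns_within_budget(frontier_turns=10, frontier_price=10.0, cheap_price=2.0)
print(break_even)
```

With these placeholder numbers a 5× lower price buys only 24 turns instead of 10, because each extra turn re-sends an ever larger context.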

Practical rule of thumb

Use cheaper tiers for well-scoped, self-contained tasks where the output is easy to validate. Reserve frontier models for tasks that need strong reasoning in one pass, where the extra turns a cheaper model would need cost more than the price difference between tiers saves.

See also the AI coding cost calculator for a session-level breakdown that shows how the cumulative-input tax compounds across turns at any price point.

How merido helps without hand-picking per request

Manually choosing a model for each task doesn't scale — and the comparator above shows why the tradeoff isn't always obvious. merido is an open-source, self-hosted AI gateway that sits between your coding CLI and the providers you already have accounts with:

  • Routes by cost and latency across the API keys and providers you own, automatically.
  • Fails over gracefully: if one provider is slow or rate-limited, the next account or provider picks up the request.
  • Compresses bulky tool output losslessly, reducing the re-send tax before it hits the wire.
  • BYOK, self-hosted: merido uses your own keys and never pools or resells them.
  • No promised savings percentage: it routes on real signals and reports what it actually observed.

Honest framing

merido does not tell you it will cut your bill by a headline percentage. The comparator above shows the arithmetic; whether the saving is real for you depends on which providers you have, your task mix, and whether quality is acceptable at the lower price point. merido measures what actually happened on your traffic and shows it — no fabricated baselines.

Route your real traffic, measure what moves

Open source, one self-hosted binary, on your own keys. Point your CLI at merido and let it route and compress automatically.

Frequently asked questions

How is the monthly cost calculated?

monthly = tasks × (input ÷ 1,000,000 × input price + output ÷ 1,000,000 × output price). You supply tasks per month, input and output tokens per task, and the price per million tokens for each tier. All computation happens in your browser.

Are the default prices real?

No — the defaults are generic illustrative tiers (Frontier, Balanced, Economy) with no specific vendor attached. Model prices change frequently. Enter the current per-million price from your provider's official pricing page for an accurate comparison.

Which model is cheapest for coding?

It depends on your workload and current prices. Cheaper models can save a lot on repetitive, well-scoped tasks, but may need extra turns on hard reasoning — which adds tokens and can erase the saving. Enter your real numbers to find out.

Does merido guarantee savings?

No. merido routes on real signals and reports what it actually observed. The comparator only shows the arithmetic for the prices you enter; the actual saving depends on your task mix, provider availability, and whether quality is acceptable at each tier.