Where the input goes
With the default settings, you send 5,000 input tokens at turn 1; by turn 30 each request carries 34,000. You pay for that growing context on every turn, and that's the tax.
Method: input at turn t = starting context + growth × (t − 1). Session input is that
summed across all turns, so the re-send term grows with the square of session length;
session output = turns × output per turn. Cost = (input ÷ 1M × input price) +
(output ÷ 1M × output price), then scaled by sessions per day and working days.
Defaults are calibrated to public figures for agentic coding (first turn ≈ 5K input,
~25–35K by turn 30, roughly $10–13 per active developer-day). Your real numbers will
differ, which is why everything here is editable.
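To make the method concrete, here is a minimal Python sketch of the same arithmetic. It is an illustration, not the calculator's source: the function names are hypothetical, and the inputs in the usage lines (1,000 output tokens per turn, placeholder prices of $3 and $15 per 1M tokens, 4 sessions a day) are assumptions chosen only to show the mechanics, not real provider prices or the page's defaults.

```python
def input_at_turn(t: int, start_ctx: int, growth: int) -> int:
    """Input tokens sent at turn t (1-indexed): starting context plus growth per earlier turn."""
    return start_ctx + growth * (t - 1)


def session_tokens(turns: int, start_ctx: int, growth: int, output_per_turn: int) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) for one session."""
    # Every turn re-sends the whole context so far, so session input is a sum over turns.
    input_tokens = sum(input_at_turn(t, start_ctx, growth) for t in range(1, turns + 1))
    output_tokens = turns * output_per_turn
    return input_tokens, output_tokens


def estimated_spend(turns: int, start_ctx: int, growth: int, output_per_turn: int,
                    input_price: float, output_price: float,
                    sessions_per_day: int, working_days: int) -> float:
    """Dollar cost with prices per 1M tokens, scaled by sessions per day and working days."""
    input_tokens, output_tokens = session_tokens(turns, start_ctx, growth, output_per_turn)
    per_session = (input_tokens / 1_000_000 * input_price
                   + output_tokens / 1_000_000 * output_price)
    return per_session * sessions_per_day * working_days


if __name__ == "__main__":
    # Hypothetical inputs: 5K starting context growing 1K per turn, 1K output per turn,
    # placeholder prices $3 / $15 per 1M tokens, 4 sessions in a single day.
    print(input_at_turn(30, 5_000, 1_000))          # 34000
    print(session_tokens(30, 5_000, 1_000, 1_000))  # (585000, 30000)
    print(round(estimated_spend(30, 5_000, 1_000, 1_000, 3.0, 15.0, 4, 1), 2))  # 8.82
```

The explicit loop mirrors the turn-by-turn wording of the method; a closed form would give the same session total.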
Why the bill grows faster than your usage
The surprising part of agentic coding cost isn't the price per token; it's that every turn resends the entire conversation. The model is stateless, so to continue, each request has to include everything that came before. Double the length of a session and you more than double the input cost, because the cumulative-input tax grows roughly with the square of session length. That's why a 60-turn refactor can cost considerably more than two 30-turn ones, even for the same work.
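To put numbers on the 60-turn vs. two 30-turn comparison, the Python sketch below uses the closed form of the session sum, T·S + g·T(T − 1)/2, where T is turns, S the starting context and g the per-turn growth. It assumes S = 5,000 and g = 1,000 (the values implied by the 5,000 and 34,000 figures at the top of the page); the helper name is hypothetical, and output tokens are left out because they only grow linearly.

```python
def session_input(turns: int, start_ctx: int, growth: int) -> int:
    """Closed form of sum over t = 1..turns of (start_ctx + growth * (t - 1))."""
    # growth * turns * (turns - 1) is always even, so integer division is exact.
    return turns * start_ctx + growth * turns * (turns - 1) // 2


one_long = session_input(60, 5_000, 1_000)       # one 60-turn session
two_short = 2 * session_input(30, 5_000, 1_000)  # the same turns split into two sessions
print(one_long, two_short, round(one_long / two_short, 2))  # 2070000 1170000 1.77
```

Under these assumptions, the single long session sends about 1.8× the input tokens of the split version, purely from re-sending context.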
The biggest lever is not a cheaper model — it's keeping the resent context small: clear or compact between tasks, scope what the agent reads, compress bulky tool output, and cache stable prefixes. The calculator's “re-send tax” bar is exactly the part those tactics attack.
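Within the calculator's linear model, scoping reads and compressing tool output act on the per-turn growth g, and only the re-send term g·T(T − 1)/2 depends on it. The sketch below assumes, purely for illustration, that those tactics halve growth from 1,000 to 500 tokens per turn in a 30-turn session with a 5,000-token starting context; the 50% figure is hypothetical, not a measured result. Caching is left out because it changes the price paid for repeated tokens rather than their count.

```python
def resend_tokens(turns: int, growth: int) -> int:
    """Re-send term of the session sum: growth * turns * (turns - 1) / 2."""
    return growth * turns * (turns - 1) // 2


def session_input(turns: int, start_ctx: int, growth: int) -> int:
    """Session input = starting context on every turn + the re-send term."""
    return turns * start_ctx + resend_tokens(turns, growth)


before = session_input(30, 5_000, 1_000)  # baseline growth of 1,000 tokens per turn
after = session_input(30, 5_000, 500)     # hypothetical: growth halved by compression/scoping
print(before, after, f"{1 - after / before:.0%} less input")  # 585000 367500 37% less input
```

Halving growth cuts the re-send term in half but leaves the starting-context term untouched, so total input falls by less than 50%.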
Turning the estimate into real savings
A calculator can show you where the money could be going. To know what you're actually spending and saving, you need to measure your real traffic. That's what merido does: it sits between your coding CLI and your LLM providers and
- shows a live burn-rate meter per session — the real version of the number above;
- compresses bulky tool output losslessly, shrinking the re-send tax at its source;
- routes across the providers and accounts you own, by cost and latency, with failover;
- caps a per-session budget and can auto-downgrade the model as you approach it;
- records measured savings against a baseline, and shows $0 when it can't prove one.
merido never promises a headline savings number. It measures against your real spend, shows the conditions of each measurement, and runs self-hosted on your own API keys, which it never pools or resells.
See your real number, not an estimate
Open source, single self-hosted binary, on your own keys. Point your CLI at merido and watch the burn-rate meter live.
Related guides
- How to reduce Claude Code costs — 8 tactics that attack the re-send tax.
- How to reduce Cursor AI costs — the same mechanism for Cursor.
- Claude Code vs Cursor pricing — a price-free comparison.
- Self-hosted LLM gateway — own your keys, data, and routing.
Frequently asked questions
How is the cost calculated?
Input at turn t is the starting context plus per-turn growth times the number
of earlier turns, (t − 1). The session sums that across every turn, so the
re-send term grows with the square of session length. Output is turns ×
output per turn. Multiply tokens by your model's price per million and scale
by sessions per day and working days. Everything is editable.
What is the cumulative-input tax?
The share of your bill that comes from re-sending context accumulated after the first turn, beyond the starting context every request carries. Because each turn carries everything before it, this share usually dominates a long session, and it is the part that compression, context hygiene and caching reduce.
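A worked example with the default figures (S = 5,000 starting tokens, g = 1,000 tokens of growth per turn, T = 30 turns), assuming the tax is the growth term of the session sum, g·T(T − 1)/2:

```latex
\text{tax share}
  = \frac{g\,T(T-1)/2}{T\,S + g\,T(T-1)/2}
  = \frac{1000 \cdot 30 \cdot 29 / 2}{30 \cdot 5000 + 435\,000}
  = \frac{435\,000}{585\,000}
  \approx 0.74
```

Under these assumptions, roughly three quarters of the input tokens in a 30-turn session are re-sent growth rather than starting context.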
Are the prices real?
The price fields default to generic tiers and are fully editable. Model prices change frequently — enter the current per-million price from your provider's official pricing page for an accurate estimate.
Does merido guarantee these savings?
No. This tool estimates where your money goes; it doesn't promise a savings percentage. merido measures actual savings on your real traffic against a baseline and reports nothing when it can't prove a saving.