If your Aider API bill is higher than you expected, the cause is almost certainly not the model’s per-token price, which keeps falling. The cause is how much Aider sends on every turn. It resends the entire conversation with each request, its repo map puts real repository structure into every prompt, and whole-file edits mean large blocks of code travel back and forth. The good news: most of that cost is reducible with the right habits and, for the mechanical parts, a gateway. Here is where the money goes and what you can do about it.
Why Aider gets expensive
Aider shares one cost mechanism with every other agentic coding tool — Claude Code, Cline, Cursor, Codex — but it adds its own amplifiers on top. The shared mechanism: the model is stateless, so every turn resends the entire conversation. Turn 1 might send a few thousand input tokens. By turn 20, each request can carry tens of thousands of tokens and you pay for that cumulative input on every turn, not just the first. See why AI coding bills explode for a detailed breakdown of the cumulative-input problem.
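To make the cumulative-input effect concrete, here is a minimal sketch. The token figures are hypothetical, chosen only to match the orders of magnitude described above, and the linear growth per turn is a simplifying assumption rather than a measurement of Aider.

```python
def cumulative_input_tokens(base_tokens: int, growth_per_turn: int, turns: int) -> int:
    """Total input tokens billed over a session when every turn resends the full history."""
    total = 0
    for turn in range(turns):
        # Each request carries the first prompt plus everything earlier turns added.
        total += base_tokens + growth_per_turn * turn
    return total


# Hypothetical session: 3,000-token first prompt, 2,000 tokens added per turn.
print(cumulative_input_tokens(3_000, 2_000, 1))   # 3000
print(cumulative_input_tokens(3_000, 2_000, 20))  # 440000
```

In this example the twentieth request alone carries 41,000 input tokens, but the session as a whole has been billed for 440,000.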
Aider’s amplifiers are specific to how it works:
The repo map. Aider builds a map of your repository — file names, symbols, call signatures — and includes it in the prompt so the model understands the shape of your codebase without reading every file in full. On a small project this is compact; on a larger one it can add thousands of tokens to every single request, and those tokens are resent on every following turn.
Whole-file edits. When Aider uses its whole edit format (the default for some models, while others default to a diff-based format), the model returns the full content of each changed file, not just a diff. When the model is editing a 200-line file, those 200 lines go into context and stay there for the rest of the session, resent on every turn.
Shell command output. Aider can run tests, linters, and build commands and append the results to context. A test suite that prints a hundred lines of output becomes a permanent passenger — paid for on every subsequent turn, just like Cline’s terminal output.
Because the bill is driven by re-sent input, the durable way to cut it is to shrink and cache the input that gets resent — not just to switch to a cheaper model. A smaller context on the right model beats a cheaper model on a bloated one.
8 ways to reduce your Aider bill
01 Keep your /add file set tight: only add what you are editing
Every file you /add goes into context and stays there until you /drop it. Aider does not silently drop files as you move through a task. Add only the files you are actively changing right now; drop them before pivoting to a different part of the codebase. The repo map gives the model enough structural awareness of files you have not added; full file content is for files you are actually editing.
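A rough way to see what an added file costs is to multiply its size by the number of turns it stays in context. The sketch below assumes about 4 characters per token, which is a crude rule of thumb and not a real tokenizer. The file names and sizes are made up for illustration.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (an assumption, not a tokenizer)."""
    return len(text) // 4


def resent_tokens(file_contents: dict[str, str], turns: int) -> dict[str, int]:
    """Input tokens each added file contributes if it stays in context for `turns` turns."""
    return {name: estimate_tokens(body) * turns for name, body in file_contents.items()}


# Hypothetical files: one small file being edited, one large file added "just in case".
files = {"handlers.py": "x" * 8_000, "utils.py": "x" * 40_000}
print(resent_tokens(files, turns=15))
# {'handlers.py': 30000, 'utils.py': 150000}
```

Under these assumptions, dropping the large file you are not editing removes most of the resent input.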
02 Start a fresh session per task: don’t drag old sessions along
The cheapest token is the one you never resend. When you finish one sub-task, exit and start a new Aider session instead of continuing the same thread. A fresh session resets the input to turn-1 size; dragging a completed task’s conversation into the next one means every new turn re-reads all that stale context. This costs nothing and has an immediate effect on cumulative cost.
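The arithmetic behind this habit can be sketched as follows. The numbers are hypothetical and the model assumes context grows by a fixed amount per turn, which real sessions only approximate.

```python
def session_tokens(base: int, growth: int, turns: int) -> int:
    """Input tokens billed over one session where each turn resends the full history."""
    return sum(base + growth * turn for turn in range(turns))


# Hypothetical: two 10-turn tasks, 3,000-token first prompt, 2,000 tokens added per turn.
one_long_session = session_tokens(3_000, 2_000, 20)
two_fresh_sessions = 2 * session_tokens(3_000, 2_000, 10)

print(one_long_session)    # 440000
print(two_fresh_sessions)  # 240000
```

The work done is the same 20 turns either way; the difference is how much stale history rides along on the second task.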
03 Match the model to the task difficulty
Aider is BYOK by nature — you supply your own provider key — and it supports switching models. Not every edit needs the flagship model. Renaming, boilerplate, simple refactors, and routine Q&A can run on a smaller, less expensive model. Reserve the expensive reasoning model for the steps that genuinely need it. The easiest way to get task-level routing without changing your workflow is a gateway that selects the right model per request.
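As an illustration of task-level routing, here is a deliberately naive keyword heuristic. The hint list and the model names `small-model` and `large-model` are hypothetical placeholders, and this is not how any particular gateway decides; it only shows the shape of the decision.

```python
# Hypothetical hints that a task is mechanical enough for a cheaper model.
SIMPLE_HINTS = ("rename", "boilerplate", "typo", "docstring")


def pick_model(task: str, small: str = "small-model", large: str = "large-model") -> str:
    """Return the cheaper model for routine edits, the stronger one otherwise."""
    lowered = task.lower()
    return small if any(hint in lowered for hint in SIMPLE_HINTS) else large


print(pick_model("Rename the foo variable to bar"))   # small-model
print(pick_model("Design a concurrency-safe cache"))  # large-model
```

A real router would need a better signal than substring matching, but the trade-off is the same: misrouting a hard task to the small model costs retries, so the default should lean toward the stronger model.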
04 Use prompt caching on the stable prefix
If a stable block of context — system prompt, project conventions, a small set of key files that do not change often — is reused across many turns, caching it lets you pay the full token price once and a fraction thereafter. The catch: if the cached prefix keeps changing (because files you are editing are part of it), the cache is invalidated and you pay full price anyway. The win is in caching the genuinely stable part. This depends on provider support and requires getting the cache boundary right.
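The effect of the cache boundary can be sketched with a simple billing model. It assumes cached prefix tokens are billed at a fixed percentage of the normal input price and ignores any cache-write surcharge or expiry, both of which vary by provider. All figures are hypothetical.

```python
def billable_input_tokens(prefix: int, rest: int, turns: int,
                          cached_percent: int, prefix_stable: bool) -> int:
    """Full-price-equivalent input tokens over a session.

    prefix: tokens in the would-be cached block; rest: other input tokens per turn.
    cached_percent: assumed price of a cached token as a percentage of full price.
    """
    total = 0
    for turn in range(turns):
        # The first turn always pays full price; later turns hit the cache
        # only if the prefix has not changed.
        cache_hit = prefix_stable and turn > 0
        prefix_charge = prefix * cached_percent // 100 if cache_hit else prefix
        total += prefix_charge + rest
    return total


print(billable_input_tokens(8_000, 4_000, 10, cached_percent=10, prefix_stable=True))   # 55200
print(billable_input_tokens(8_000, 4_000, 10, cached_percent=10, prefix_stable=False))  # 120000
```

With a prefix that changes every turn, the second call shows caching buying nothing, which is the failure mode described above.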
05 Compress shell and command output
When Aider runs a test suite or build command and captures the output, that output gets appended to context. Verbose build logs, test runner output, and linter results are often far larger than the useful signal they contain. Compressing that output before it enters context removes a significant source of cumulative input. Depending on the command, these specific payloads can shrink 60–90%, losslessly — without losing the information the model needs to keep going.
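One simple lossless technique is to collapse runs of identical consecutive lines into a single line with a repeat count. The sketch below is a naive illustration of that idea only; it is an assumption for this example and not a description of how any specific tool compresses output.

```python
from itertools import groupby


def collapse_repeats(output: str) -> str:
    """Replace each run of identical consecutive lines with one line and a repeat count."""
    lines = []
    for line, run in groupby(output.splitlines()):
        count = sum(1 for _ in run)
        lines.append(line if count == 1 else f"{line}  [repeated x{count}]")
    return "\n".join(lines)


raw = "ok\nok\nok\nFAIL test_login\nok"
print(collapse_repeats(raw))
# ok  [repeated x3]
# FAIL test_login
# ok
```

The failing line survives untouched, and the repeated noise shrinks to a single line that still records how many times it occurred.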
06 Cap a budget before you start
Knowing the spend after the fact does not prevent bill shock. A hard per-session or per-task budget cap — optionally downgrading the model as you approach the limit — turns “how much did that cost?” into a number you chose, not a surprise. Set it before the session starts.
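A hard cap with a downgrade threshold might look like the following sketch. The class name, the 80% threshold, and the model names are all hypothetical; the point is that the cap is checked before spend is recorded, so the limit is never crossed.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a request would push the session past its cap."""


class SessionBudget:
    def __init__(self, cap_usd: float, downgrade_at: float = 0.8):
        self.cap_usd = cap_usd
        self.downgrade_at = downgrade_at  # fraction of the cap that triggers a downgrade
        self.spent_usd = 0.0

    def record(self, cost_usd: float) -> None:
        """Refuse the spend if it would exceed the cap; otherwise add it to the total."""
        if self.spent_usd + cost_usd > self.cap_usd:
            raise BudgetExceeded(f"cap of ${self.cap_usd:.2f} would be exceeded")
        self.spent_usd += cost_usd

    def model_for_next_turn(self, preferred: str, fallback: str) -> str:
        """Switch to the cheaper fallback once spend reaches the downgrade threshold."""
        near_limit = self.spent_usd >= self.downgrade_at * self.cap_usd
        return fallback if near_limit else preferred


budget = SessionBudget(cap_usd=1.00)
budget.record(0.50)
print(budget.model_for_next_turn("large-model", "small-model"))  # large-model
budget.record(0.40)
print(budget.model_for_next_turn("large-model", "small-model"))  # small-model
```

A further `budget.record(0.50)` at this point would raise `BudgetExceeded` instead of silently spending past the line.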
07 Route across the provider accounts you already own
Aider is BYOK by design; you are already supplying your own key. Most developers have more than one provider account, or unused quota on a free tier. A gateway can route across all of them — by cost, latency, and availability — with automatic failover when one provider is rate-limited or down. That uses capacity you are already paying for instead of funneling everything through a single key on a single provider.
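Cost-ordered routing with failover can be sketched in a few lines. The provider tuples, account names, and send functions below are hypothetical stand-ins for real API clients, and a real router would also weigh latency and remaining quota.

```python
from typing import Callable, Sequence, Tuple


class ProviderUnavailable(Exception):
    """Raised by a send function when its provider is rate-limited or down."""


# (name, cost per request in USD, send function); all values are illustrative.
Provider = Tuple[str, float, Callable[[str], str]]


def route_with_failover(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers from cheapest to most expensive; return (provider name, reply)."""
    failures = []
    for name, _cost, send in sorted(providers, key=lambda p: p[1]):
        try:
            return name, send(prompt)
        except ProviderUnavailable as exc:
            failures.append(f"{name}: {exc}")
    raise ProviderUnavailable("all providers failed: " + "; ".join(failures))


def rate_limited(prompt: str) -> str:
    raise ProviderUnavailable("rate limited")


def healthy(prompt: str) -> str:
    return f"reply to: {prompt}"


accounts = [("account-b", 0.02, healthy), ("account-a", 0.01, rate_limited)]
print(route_with_failover("hi", accounts))
# ('account-b', 'reply to: hi')
```

Here the cheaper account is tried first, fails, and the request falls through to the next one without the caller having to retry by hand.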
08 Measure before you trust a number
Any tool that claims big savings percentages without showing you the baseline and the conditions is not giving you a real number. That applies to your own intuitions too. The honest way to know you are saving is a ledger that compares against measured spend and reports $0 when it cannot prove a saving. Use the AI coding cost calculator to estimate your baseline before you start optimizing.
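The "report $0 when you cannot prove it" rule can be expressed as a small function. This is a sketch of the principle only; the function name is hypothetical and the baseline is assumed to be a separately measured figure for comparable work.

```python
from typing import Optional


def proven_saving(baseline_usd: Optional[float], actual_usd: float) -> float:
    """Saving versus a measured baseline; 0.0 when there is no baseline or no saving."""
    if baseline_usd is None:
        # No measured baseline means nothing can be proven, so claim nothing.
        return 0.0
    return max(0.0, baseline_usd - actual_usd)


print(proven_saving(None, 1.20))  # 0.0
print(proven_saving(2.00, 1.50))  # 0.5
print(proven_saving(1.00, 1.50))  # 0.0
```

The third case matters as much as the first: if actual spend exceeds the baseline, the ledger reports no saving rather than a negative or estimated one.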
Let a gateway handle the mechanical parts
Tactics 1–3 are habits you change once. Tactics 4–8 are work you do not want to do by hand on every session, and that is what merido is for. merido is an open-source, local-first AI gateway written in Rust. Because Aider supports any OpenAI-compatible endpoint via the --openai-api-base flag (or the openai-api-base config entry), pointing it at merido takes a single change:
aider --openai-api-base http://127.0.0.1:8788/v1 --model openai/<your-model>
Once it is pointed at merido, you get:
- Shell and command output compression shrinks verbose build logs, test output, and linter results before they enter context (tactic 5) — 60–90% on specific command payloads, losslessly, depending on the command.
- Cost-, quota- and latency-aware routing spreads requests across every provider and account you own, with automatic failover when one is rate-limited or unavailable (tactics 3 & 7).
- Strategic prompt-cache control manages the cache boundary so caching helps instead of backfiring (tactic 4).
- A live burn-rate meter and per-session budget caps show what you are spending and stop it where you set the line (tactics 6 & 8).
- A savings ledger records measured savings against a baseline, and shows $0 when it cannot prove one (tactic 8).
merido uses your own API keys, runs self-hosted on your own machine, and never pools, shares, or resells credentials. Aider is already BYOK — merido just adds routing, compression, and observability on top of the keys you already own.
merido speaks an OpenAI-compatible API and supports Aider, Claude Code, Cline, Cursor, Codex, and Continue as first-class clients. Point Aider at merido’s local endpoint and keep working exactly as before — just with the bill in plain sight and the mechanical savings handled automatically. The same core cumulative-input mechanism affects every agentic tool: see reducing Cline costs and reducing Claude Code costs for the same analysis applied to those tools.
See, cap, and prove your Aider spend
Open source, single self-hosted binary, on your own keys. Get started in a couple of minutes.
Related guides
- Why AI coding bills explode — a deeper look at the cumulative-input problem that drives Aider's cost.
- AI coding cost calculator — estimate your baseline before optimizing.
- How to reduce Cline costs — the same mechanism for Cline.
- How to reduce Claude Code costs — the same mechanism for Claude Code.
Frequently asked questions
Why is Aider running up my API bill?
Aider sends the entire conversation — including every file edit and shell command result — to the model on every turn. The model is stateless, so it re-reads all of that on each request. Aider's repo map and whole-file editing style also put substantial repository context into each prompt. By the time you are a dozen turns into a task, each request can carry tens of thousands of input tokens, and you pay for that on every turn.
Does the repo map cost a lot?
It depends on your repository size and how many files Aider includes. The repo map is designed to be a compact summary, but on larger codebases it can still add thousands of tokens to every prompt — and because it is resent on every turn, those tokens compound. Limiting the files you actively add with /add keeps the per-turn input much smaller than letting Aider scan broadly.
Can I point Aider at a gateway instead of a provider directly?
Yes. Aider is BYOK by nature and supports any OpenAI-compatible endpoint via --openai-api-base (or the equivalent config). Pointing it at a gateway like merido takes a single flag or config entry. The gateway can then compress command output, route across the provider accounts you own, manage prompt-cache boundaries, enforce spend caps, and record measured savings — all transparently, without changing how Aider behaves.
Can merido lower my Aider bill automatically?
merido handles the mechanical tactics: compressing shell and command output before it enters context, routing across every provider key you own, managing the prompt-cache boundary, enforcing spend caps, and recording measured savings against a baseline — showing $0 when it cannot prove one. It runs self-hosted on your own machine, using your own keys, and never pools or resells them. Get started here.