Token saving

Every request through merido can be made cheaper without changing your tools. Two mechanisms do the work, and a ledger proves the result. A hard rule underpins both: the token-saver must never grow or empty content — it fails safe.
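The fail-safe rule can be pictured as a small guard around any compression step. The sketch below is illustrative only and is not merido's implementation: the function name fail_safe is hypothetical, and it uses character length as a stand-in for token count.

```python
def fail_safe(original: str, candidate: str) -> str:
    """Return the compressed candidate only when it is safe to use.

    Hypothetical helper: character length stands in for token count.
    """
    if not candidate.strip():
        # The candidate is empty or whitespace only: never empty the content.
        return original
    if len(candidate) >= len(original):
        # The candidate is no smaller than the input: never grow the content.
        return original
    return candidate
```

Under these assumptions, a filter that misbehaves costs nothing: the caller gets the original text back unchanged.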

tool_result compression (automatic)

Coding agents send large tool_result blocks back to the model — file dumps, command output, search results. These are mostly low-signal tokens. merido runs RTK-style compression filters over tool output, auto-detecting the kind of content and shrinking it while preserving meaning.
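To make "auto-detecting the kind of content and shrinking it" concrete, here is a toy sketch. It is not merido's filter set: the names detect_kind and shrink, the two content kinds, and the "(xN)" run marker are all assumptions chosen for illustration.

```python
import json


def detect_kind(text: str) -> str:
    """Guess the content kind: 'json' if the text parses as JSON, else 'text'."""
    try:
        json.loads(text)
        return "json"
    except ValueError:
        return "text"


def shrink(text: str) -> str:
    """Apply a kind-specific shrinking step (toy example)."""
    if detect_kind(text) == "json":
        # Minify: drop insignificant whitespace, keep every key and value.
        return json.dumps(json.loads(text), separators=(",", ":"), ensure_ascii=False)
    # Plain text: collapse runs of identical consecutive lines into one line.
    lines = text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        j = i
        while j < len(lines) and lines[j] == lines[i]:
            j += 1
        run = j - i
        out.append(lines[i] if run == 1 else f"{lines[i]} (x{run})")
        i = j
    return "\n".join(out)
```

Note that a toy filter like this can make very short inputs longer (a run of two one-character lines, for example), which is exactly the case the never-grow rule exists to catch.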

This is enabled by default in the gateway and requires no per-request configuration. You can inspect which filters are active via GET /api/token-saver/filters (or in the dashboard).

Caveman mode (opt-in output compression)

Where tool-result compression shrinks input, Caveman mode cuts output verbosity by injecting an intensity-tuned, format-aware system prompt that nudges the model toward terser responses. Set MERIDO_CAVEMAN_LEVEL to one of:

Level                                Effect
lite                                 Light trimming of filler; safest.
full                                 Noticeably terser output.
ultra                                Aggressive compression.
wenyan-lite / wenyan / wenyan-ultra  Classical-Chinese-style ultra-dense variants.

Leave it unset (or blank) to disable; an unrecognised value is ignored and logged at startup.
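The handling of MERIDO_CAVEMAN_LEVEL described above could look roughly like the sketch below. This is an assumption about the behaviour, not merido's code: the function name resolve_caveman_level is hypothetical, and exact lowercase matching of the level names is a guess.

```python
import logging
import os

# Level names taken from the table above.
VALID_LEVELS = {"lite", "full", "ultra", "wenyan-lite", "wenyan", "wenyan-ultra"}


def resolve_caveman_level(env=None):
    """Return the active Caveman level, or None when the mode is disabled."""
    env = os.environ if env is None else env
    raw = env.get("MERIDO_CAVEMAN_LEVEL", "").strip()
    if not raw:
        # Unset or blank: Caveman mode is off.
        return None
    if raw not in VALID_LEVELS:
        # Unrecognised value: ignore it and leave a log line.
        logging.warning("ignoring unrecognised MERIDO_CAVEMAN_LEVEL=%r", raw)
        return None
    return raw
```

Passing a plain dict as env makes the logic easy to check without touching the real environment, for example resolve_caveman_level({"MERIDO_CAVEMAN_LEVEL": "lite"}).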

The tradeoff: more aggressive levels save more output tokens but can strip nuance, formatting, or explanation. Start at lite, measure, and increase only if the terser output is acceptable for your use case. Caveman affects style, not correctness-critical content.

The savings ledger

merido records what each optimization saved in a savings ledger: the receipts that show the gateway paid for itself. You can view it in three ways:

  • CLI: merido gain — usage totals and estimated cost.
  • API: GET /api/savings (raw receipts), /api/savings/totals, /api/savings/rollup, /api/savings/export.
  • Dashboard: the savings view renders these totals and rollups.
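The API endpoints listed above can be reached with any HTTP client. The sketch below uses only the Python standard library; the base URL (http://localhost:8080 in the usage note), the absence of authentication, and the helper names are assumptions, and the response is returned as raw bytes because its format is not specified here.

```python
import urllib.request

# Paths as listed above; the short view names are hypothetical labels.
SAVINGS_PATHS = {
    "receipts": "/api/savings",
    "totals": "/api/savings/totals",
    "rollup": "/api/savings/rollup",
    "export": "/api/savings/export",
}


def savings_url(base_url: str, view: str) -> str:
    """Build the full URL for one savings view."""
    return base_url.rstrip("/") + SAVINGS_PATHS[view]


def fetch_savings(base_url: str, view: str = "totals", timeout: float = 5.0) -> bytes:
    """Issue a GET request and return the undecoded response body."""
    with urllib.request.urlopen(savings_url(base_url, view), timeout=timeout) as resp:
        return resp.read()
```

With a gateway assumed to be listening locally, fetch_savings("http://localhost:8080", "rollup") would return the rollup payload for further processing.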

MIT / Apache-2.0 licensed.