fix(analytics/gain): cap per-call saved tokens at Claude tool-result ceiling #1978

Open
YOMXXX wants to merge 1 commit into rtk-ai:develop from YOMXXX:fix/gain-per-call-cap

@YOMXXX YOMXXX commented May 20, 2026

Summary

`rtk gain` has been reporting impossible figures: issue #1973 shows ~1.6M saved tokens per command on average, and issue #1935 shows one day with 1.4B tokens used and saved. Both share one root cause. When a command like `rtk read 50MB.log` filters a multi-megabyte file, the naive `saved = input_tokens - output_tokens` attribution records millions of "saved" tokens, but Claude's tool-result surface is capped at around 25K tokens, so anything beyond that would never reach Claude under any scheme.

This PR caps the per-call saved attribution at the Claude tool-result ceiling so the dashboard reflects realistic LLM-side savings.

Reproduction

```bash
# Before this PR
rtk read /tmp/huge.log   # any file > ~100KB
rtk gain --history | head -3
# Shows e.g. 'Saved: 12M tokens (100%)' for one call

# After this PR (or after the first `rtk gain` re-open of an existing DB)
# Per-call saved_tokens ≤ 25_000.
```

Root cause

`src/core/tracking.rs::Tracker::record` computed `saved = input_tokens.saturating_sub(output_tokens)` and stored it verbatim. There was no realistic upper bound, so a 12M-token `input` (raw `stdout`) yielded a 12M-token "saved" figure even though Claude's tool-result cap would have truncated that input to ~25K before it reached the model.
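
To make the failure mode concrete, here is a minimal sketch of the uncapped attribution described above. `naive_saved` is a hypothetical stand-in written for this illustration, not the actual body of `Tracker::record`, and the `u64` types are an assumption; the token counts are the illustrative figures used elsewhere in this PR.

```rust
/// Hypothetical stand-in for the pre-fix attribution (not the real
/// `Tracker::record`; names and integer types are assumptions).
fn naive_saved(input_tokens: u64, output_tokens: u64) -> u64 {
    // No upper bound: the entire raw input counts as potential savings,
    // even the part Claude would never have received.
    input_tokens.saturating_sub(output_tokens)
}

fn main() {
    // A 12M-token raw stdout filtered down to 5K tokens.
    let saved = naive_saved(12_000_000, 5_000);
    println!("naive saved = {saved}"); // prints "naive saved = 11995000"
}
```

Almost all of those 11,995,000 tokens lie beyond the ~25K tool-result ceiling, which is why the dashboard totals grew by orders of magnitude.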

Fix approach

  • New constant `CLAUDE_TOOL_RESULT_CAP = 25_000` (mirrors Claude Code's default `MAX_OUTPUT_TOKENS` for tool results).
  • Write side (`Tracker::record`): `saved` is capped to `CLAUDE_TOOL_RESULT_CAP`; the `savings_pct` denominator is also capped so a 25K / 12M ratio doesn't crush the percentage to ~0% on legitimately-filtered large logs.
  • Read side migration (`Tracker::new`): on first open, rows with `saved_tokens > 25_000` get clamped to 25_000, and their `savings_pct` is recomputed using the capped denominator. The migration is idempotent: re-running it has no effect once every row is within the cap.
  • `input_tokens` and `output_tokens` are intentionally left raw (so the schema can still surface "raw local-processing volume" if a future UI wants it). Only the attribution metric is normalized.
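
The write-side cap and the read-side clamp can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `capped_attribution`, `savings_pct`, `Row`, and `migrate` are hypothetical names, the rows live in memory rather than in the tracking DB, and the "capped denominator" is assumed to be `min(input_tokens, CLAUDE_TOOL_RESULT_CAP)`, which the description does not spell out.

```rust
/// Value taken from the PR description.
const CLAUDE_TOOL_RESULT_CAP: u64 = 25_000;

/// Assumption: the capped denominator is min(input_tokens, CAP).
fn savings_pct(saved: u64, input_tokens: u64) -> f64 {
    let denominator = input_tokens.min(CLAUDE_TOOL_RESULT_CAP);
    if denominator == 0 {
        return 0.0;
    }
    // Guard against inconsistent rows producing more than 100%.
    (saved as f64 * 100.0 / denominator as f64).min(100.0)
}

/// Hypothetical write-side helper: returns (saved_tokens, savings_pct).
fn capped_attribution(input_tokens: u64, output_tokens: u64) -> (u64, f64) {
    let saved = input_tokens
        .saturating_sub(output_tokens)
        .min(CLAUDE_TOOL_RESULT_CAP);
    (saved, savings_pct(saved, input_tokens))
}

/// Hypothetical in-memory stand-in for a stored tracking row.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    input_tokens: u64, // left raw on purpose
    saved_tokens: u64,
    savings_pct: f64,
}

/// Hypothetical read-side migration: clamps over-cap rows and returns
/// how many rows it changed. A second run finds nothing to change.
fn migrate(rows: &mut [Row]) -> usize {
    let mut changed = 0;
    for row in rows
        .iter_mut()
        .filter(|r| r.saved_tokens > CLAUDE_TOOL_RESULT_CAP)
    {
        row.saved_tokens = CLAUDE_TOOL_RESULT_CAP;
        row.savings_pct = savings_pct(row.saved_tokens, row.input_tokens);
        changed += 1;
    }
    changed
}

fn main() {
    // Huge input, small savings, and passthrough, as in the test plan.
    for (input, output) in [(12_000_000, 5_000), (1_000, 200), (4_096, 4_096)] {
        let (saved, pct) = capped_attribution(input, output);
        println!("{input} -> {output}: saved={saved}, pct={pct:.1}%");
    }

    // A historical row written before the cap existed.
    let mut rows = vec![Row {
        input_tokens: 12_000_000,
        saved_tokens: 11_995_000,
        savings_pct: 99.96,
    }];
    let first = migrate(&mut rows);
    let second = migrate(&mut rows);
    println!("migrated {first} row(s), then {second} on re-run");
}
```

Because the filter only selects rows strictly above the cap, a clamped row is never selected again, which is what makes the clamp safe to run on every open.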

Behavior changes

  • The dashboard's headline numbers will drop dramatically for users who heavily use `rtk read` on large logs. This is intentional and correct: the previous numbers measured local processing volume, not Claude-quota impact. The realistic-cap version is what users were asking for in #1973 (Misleading analytics: gain over-counts "tokens saved"; discover under-counts adoption when hook rewrites are in play).
  • Per-call `savings_pct` for legitimately-filtered large inputs is now meaningful (60-100%) instead of being diluted to ~0%.
  • Small-savings commands (well under 25K) are unaffected.
  • `rtk proxy` passthrough rows (input == output) still record 0 saved / 0% — no change.

Test plan

  • `test_record_caps_saved_tokens_at_claude_tool_result_cap` — 12M input, 5K output → `saved == 25_000`.
  • `test_record_pct_uses_capped_denominator` — `pct ≥ 75%` for the same scenario (not diluted to ~0%).
  • `test_record_small_savings_unchanged` — 1000 → 200 still yields `saved == 800` and `pct == 80%`.
  • `test_record_passthrough_unaffected_by_cap` — input == output → 0 saved, 0%.
  • `cargo fmt --all` clean.
  • `cargo clippy --all-targets` zero warnings.
  • `cargo test --bin rtk -- --test-threads=8` 1905 passed, 0 failed.

Fixes #1973, #1935

…ceiling

Issue rtk-ai#1973 / rtk-ai#1935: 'rtk gain' was reporting impossible figures — e.g.
~1.6M saved tokens per command on average, with one day showing 1.4B
total tokens used and saved. The root cause: when a command like
'rtk read 50MB.log' filters a multi-megabyte file, the naive
'saved = input_tokens - output_tokens' attribution recorded millions
of 'saved' tokens. But Claude's tool-result surface is capped around
25K tokens; anything beyond that wouldn't have reached Claude under
any scheme, so the attribution overstated RTK's contribution by orders
of magnitude.

This change caps the per-call saved_tokens attribution at
CLAUDE_TOOL_RESULT_CAP (25_000, matching Claude Code's default
MAX_OUTPUT_TOKENS for tool results), in two places:

- write side (Tracker::record): new rows are capped on insert; the
  savings_pct denominator is also capped so a 25K / 12M ratio doesn't
  flatten the percentage to ~0% for legitimately-filtered large logs.
- read side (Tracker::new migration): on first open, historical rows
  with saved_tokens > 25_000 are clamped to 25_000, and their
  savings_pct is recomputed using the capped denominator. Idempotent.

Four new tests cover: huge-input cap, small savings unchanged, pct
uses capped denominator, passthrough (input == output) yields 0 saved.

Fixes rtk-ai#1973, rtk-ai#1935

CLAassistant commented May 20, 2026

CLA assistant check
All committers have signed the CLA.

YOMXXX commented May 20, 2026

recheck

@YOMXXX YOMXXX closed this May 20, 2026
@YOMXXX YOMXXX reopened this May 20, 2026

YOMXXX commented May 20, 2026

@CLAassistant recheck
