fix(analytics/gain): cap per-call saved tokens at Claude tool-result ceiling #1978
Open
YOMXXX wants to merge 1 commit into
…ceiling

Issue rtk-ai#1973 / rtk-ai#1935: 'rtk gain' was reporting impossible figures, e.g. ~1.6M saved tokens per command on average, with one day showing 1.4B total tokens used and saved.

The root cause: when a command like 'rtk read 50MB.log' filters a multi-megabyte file, the naive 'saved = input_tokens - output_tokens' attribution recorded millions of 'saved' tokens. But Claude's tool-result surface is capped around 25K tokens; anything beyond that wouldn't have reached Claude under any scheme, so the attribution overstated RTK's contribution by orders of magnitude.

This change caps the per-call saved_tokens attribution at CLAUDE_TOOL_RESULT_CAP (25_000, matching Claude Code's default MAX_OUTPUT_TOKENS for tool results), in two places:

- write side (Tracker::record): new rows are capped on insert; the savings_pct denominator is also capped so a 25K / 12M ratio doesn't flatten the percentage to ~0% for legitimately-filtered large logs.
- read side (Tracker::new migration): on first open, historical rows with saved_tokens > 25_000 are clamped to 25_000, and their savings_pct is recomputed using the capped denominator. Idempotent.

Four new tests cover: huge-input cap, small savings unchanged, pct uses capped denominator, passthrough (input == output) yields 0 saved.

Fixes rtk-ai#1973, rtk-ai#1935
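The read-side clamp described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `Row`, `capped_pct`, `clamp_row`, and `migrate` are hypothetical names, the rows live in memory rather than in the tracking database, and the percentage formula (denominator = `min(input_tokens, CLAUDE_TOOL_RESULT_CAP)`) is one plausible reading of "capped denominator".

```rust
/// Ceiling taken from the commit message (25_000 tokens).
const CLAUDE_TOOL_RESULT_CAP: u64 = 25_000;

/// Hypothetical in-memory stand-in for one historical tracking row.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    input_tokens: u64,
    saved_tokens: u64,
    savings_pct: f64,
}

/// Savings percentage with the denominator capped at the ceiling.
/// Assumed formula: saved / min(input_tokens, cap) * 100.
fn capped_pct(saved: u64, input_tokens: u64) -> f64 {
    let denom = input_tokens.min(CLAUDE_TOOL_RESULT_CAP);
    if denom == 0 {
        return 0.0; // avoid dividing by zero for empty inputs
    }
    saved as f64 / denom as f64 * 100.0
}

/// Clamp a single row in place; returns true if the row was changed.
/// Rows already at or below the cap are left untouched, which is what
/// makes a second pass a no-op.
fn clamp_row(row: &mut Row) -> bool {
    if row.saved_tokens <= CLAUDE_TOOL_RESULT_CAP {
        return false;
    }
    row.saved_tokens = CLAUDE_TOOL_RESULT_CAP;
    row.savings_pct = capped_pct(row.saved_tokens, row.input_tokens);
    true
}

/// Clamp every row and report how many were rewritten.
fn migrate(rows: &mut [Row]) -> usize {
    rows.iter_mut().map(clamp_row).filter(|changed| *changed).count()
}
```

Under these assumptions, running `migrate` twice over the same rows rewrites the oversized rows on the first pass and nothing on the second, which is the idempotence property the commit message claims for the real migration.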
Summary
`rtk gain` has been reporting impossible figures — issue #1973 shows ~1.6M saved tokens per command on average; issue #1935 shows one day with 1.4B tokens used and saved. Both are the same root cause: when a command like `rtk read 50MB.log` filters a multi-megabyte file, the naive `saved = input_tokens - output_tokens` attribution recorded millions of "saved" tokens, but Claude's tool-result surface is capped around 25K tokens — anything beyond that never reaches Claude under any scheme.
This PR caps the per-call saved attribution at the Claude tool-result ceiling so the dashboard reflects realistic LLM-side savings.
Reproduction
```bash
# Before this PR
rtk read /tmp/huge.log   # any file > ~100KB
rtk gain --history | head -3
# Shows e.g. 'Saved: 12M tokens (100%)' for one call

# After this PR (or after first `rtk gain` re-open of an existing DB)
# Per-call saved_tokens ≤ 25_000.
```
Root cause
`src/core/tracking.rs::Tracker::record` computed `saved = input_tokens.saturating_sub(output_tokens)` and stored it verbatim. There was no realistic upper bound, so a 12M-token `input` (raw `stdout`) yielded a 12M-token "saved" figure even though Claude's tool-result cap would have truncated that input to ~25K before it reached the model.
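To make the gap concrete, here is a minimal sketch contrasting the uncapped attribution with a capped one. It is not the actual body of `Tracker::record`: the function names `naive_saved` and `capped_saved` and the `u64` token type are assumptions for illustration, and only the `saturating_sub` expression and the 25_000 ceiling come from this PR.

```rust
/// Ceiling named in this PR (25_000 tokens).
const CLAUDE_TOOL_RESULT_CAP: u64 = 25_000;

/// The attribution as described in the root cause: no upper bound.
fn naive_saved(input_tokens: u64, output_tokens: u64) -> u64 {
    input_tokens.saturating_sub(output_tokens)
}

/// Hypothetical capped variant: the same difference, limited to the
/// number of tokens that could have reached the model in one tool result.
fn capped_saved(input_tokens: u64, output_tokens: u64) -> u64 {
    naive_saved(input_tokens, output_tokens).min(CLAUDE_TOOL_RESULT_CAP)
}
```

With a 12M-token input and a 2K-token output, `naive_saved` returns 11_998_000 while `capped_saved` returns 25_000. Small savings pass through unchanged, and a passthrough call (input equal to output) still yields 0.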
Fix approach
Behavior changes
Test plan
Fixes #1973, #1935