Here are our proven tips for reducing Claude Code's token consumption, from the simplest to the most advanced.
This is the problem of the moment for millions of developers: using Claude Code for a few hours without hitting its hourly or daily quotas. Running out of tokens in the middle of a project, or having to upgrade your plan, is frustrating. Fortunately, several proven techniques can limit your token consumption and partially compensate for Claude Code’s haphazard management of it. From the simplest to the most complex, here are the 6 best according to our tests.
1. Use the “opusplan” planning mode
This is one of the most direct ways to reduce the risk of being temporarily locked out of Claude Code: match the model to the task. Claude Sonnet 4.6 is, on paper, less capable than Claude Opus 4.6, but it remains one of the best coding models on the market. To reserve Opus 4.6 for when it counts, Anthropic built a “hidden” mode into Claude Code that uses Opus 4.6 only for planning and Sonnet 4.6 for everything else. To activate it, simply type the command: /model opusplan.
This is, in our opinion, the best compromise for 90% of use cases. When Claude gets stuck on a bug or a change requires the most advanced expertise, switch back to Opus 4.6 with maximum reasoning (Max effort).
2. Guide Claude in managing its context
This is the main lever for reducing Claude Code's token consumption. The /compact command, which compacts the agent's context by summarizing it, can be triggered manually. We recommend triggering it systematically before a major modification or before adding a new feature to your project. Smarter still, you can guide Claude's use of the command directly in the CLAUDE.md file, which is used to give instructions to the AI.
Example of an instruction to add: “Before each modification that you consider important, compact the context of our conversation with /compact.”
Another discovery, taken directly from the Claude Code leak: natural-language instructions inviting the AI to be brief, particularly between tool calls, reportedly reduce token consumption significantly over time. Example: “Use 25 words maximum between two tool calls, 100 words for final responses.”
3. Manually clean your context
The context of Claude Code consists of the system prompt hardcoded by Anthropic, the CLAUDE.md file and the conversation history, but also the tools activated by default in your conversation. Depending on their number, these tools can represent between 5 and 15% of the total size of the context. Added up, these building blocks gradually saturate your context window, especially since the recent arrival of skills of all kinds, which many users use and abuse (sometimes rightly).
Our advice here is very clear: systematically check whether you really need the MCP servers, skills and plugins activated by default in your conversation. Temporarily deactivating them often saves valuable tokens. For example, do you need the front-end design skill to work on the back-end part of your project? Probably not.
4. Use a compression proxy
Here we move on to more advanced solutions, which are even more effective at drastically reducing the average size of Claude Code's context. Each command that Claude Code executes in the terminal returns text to the conversation, which is added to the agent’s context. Multiplied by dozens of calls in a session, these outputs quickly saturate the context window, even though 80% of this text is unnecessary noise for the AI.
The idea is therefore to insert a compression proxy, an intermediary between the model and the shell, which filters and compresses the output before it reaches the context. The open source tool RTK has established itself as the reference in this niche. Concretely, it completely rewrites the bash output: RTK removes noise (comments, spaces, boilerplate), groups similar elements and deduplicates repeated lines. On its repository, RTK announces gains of 60 to 90% on around a hundred common commands (git, cargo, pytest, npm, docker, etc.).
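To make the principle concrete, here is a minimal Python sketch of the kind of filtering such a proxy performs. It is not RTK's actual implementation: the function name compress_output and the sample output are hypothetical, and the sketch only strips ANSI color codes, drops blank lines and collapses consecutive duplicate lines.

```python
import re
from itertools import groupby

# Matches ANSI color/style escape sequences such as "\x1b[32m".
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")


def compress_output(raw: str) -> str:
    """Shrink command output before it is added to the agent's context."""
    lines = []
    for line in raw.splitlines():
        line = ANSI_ESCAPE.sub("", line).rstrip()
        if line:  # drop blank lines
            lines.append(line)

    compressed = []
    # Collapse each run of identical consecutive lines into one line with a count.
    for line, run in groupby(lines):
        count = sum(1 for _ in run)
        compressed.append(line if count == 1 else f"{line} (x{count})")
    return "\n".join(compressed)


# Hypothetical build output, for illustration only.
raw_output = "\n".join(
    ["\x1b[32mCompiling app v0.1.0\x1b[0m", ""]
    + ["warning: unused variable `x`"] * 3
    + ["", "Finished in 2.1s   "]
)

print(compress_output(raw_output))
```

On this sample, seven lines of raw output are reduced to three. A real proxy would go much further, with rules specific to each command.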
5. Use a knowledge graph
This is a solution that has been gaining momentum in recent weeks: adding a knowledge graph on top of your code base. When Claude Code reads a large codebase, it uses subagents to understand the entire structure and its dependencies, a step that consumes a large part of your context window. The solution? Provide it with this map in advance, in an already digested form, rather than letting it explore the project again each session. Several open source projects are positioned in this niche, including code-review-graph, which already has more than 10,000 stars on GitHub.
Concretely, the tool builds a map of your code base (functions, classes, dependencies) and exposes it to Claude via an MCP server. When you modify a file, the graph identifies the files actually impacted by the change, and Claude only reads those instead of scanning the entire project. Code-review-graph reports an average 8.2x reduction in the number of tokens used. Very effective, therefore.
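The impact analysis described above can be illustrated with a small Python sketch. It is an assumption about the general technique, not code-review-graph's actual code: the imports dictionary, the file names and the impacted_files function are all hypothetical. The idea is to invert the dependency map, then walk it from the modified file to find everything that depends on it, directly or indirectly.

```python
from collections import defaultdict, deque


def build_reverse_graph(imports: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each file to the set of files that import it directly."""
    reverse = defaultdict(set)
    for source, targets in imports.items():
        for target in targets:
            reverse[target].add(source)
    return reverse


def impacted_files(changed: str, imports: dict[str, list[str]]) -> set[str]:
    """Return the changed file plus every file that depends on it, transitively."""
    reverse = build_reverse_graph(imports)
    seen = {changed}
    queue = deque([changed])
    while queue:  # breadth-first walk over reverse dependencies
        current = queue.popleft()
        for dependent in reverse.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


# Hypothetical project: each file lists the files it imports.
imports = {
    "api.py": ["services.py"],
    "services.py": ["models.py", "utils.py"],
    "cli.py": ["utils.py"],
    "models.py": [],
    "utils.py": [],
    "docs.py": [],
}

print(sorted(impacted_files("models.py", imports)))
```

In this toy project, a change to models.py affects only three of the six files, so the other three never need to enter the context.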
6. Force Claude to speak like… a caveman
This is the most original solution on this list. It starts from an observation: 30 to 40% of the tokens in a natural language text are used solely for grammar (articles, connectors, passive voice, filler words), elements that LLMs know how to reconstruct perfectly themselves. Hence the idea behind the Claude Code skill caveman-compression: make the model reason in a deliberately telegraphic style, stripped of any embellishment. “In order to optimize the database query performance, we should consider implementing an index” becomes “Need fast queries. Add index to frequently used columns”, i.e. 29% savings without loss of information. On verbose system prompts, the announced gains go up to 58%.
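To get a feel for the principle, here is a deliberately naive Python sketch. It is not how the caveman-compression skill works (the skill asks the model itself to write in this style): the word lists and the telegraphic and savings functions are hypothetical, and the saving is measured in words, which is only a rough proxy for tokens.

```python
import re

# Hypothetical rules, for illustration only: a real list would be much longer.
FILLER_PHRASES = {"in order to": "to", "we should consider": "consider"}
FILLER_WORDS = {"a", "an", "the", "very", "really", "basically", "actually", "just"}


def telegraphic(text: str) -> str:
    """Shorten wordy phrases, then drop articles and filler words."""
    for phrase, short in FILLER_PHRASES.items():
        text = re.sub(re.escape(phrase), short, text, flags=re.IGNORECASE)
    kept = [
        word for word in text.split()
        if word.lower().strip(".,;:") not in FILLER_WORDS
    ]
    return " ".join(kept)


def savings(original: str, compressed: str) -> float:
    """Fraction of words removed (a rough stand-in for token savings)."""
    return 1 - len(compressed.split()) / len(original.split())


sentence = (
    "In order to optimize the database query performance, "
    "we should consider implementing an index"
)
short = telegraphic(sentence)
print(short)
print(f"{savings(sentence, short):.0%} fewer words")
```

Counting words rather than tokens keeps the sketch dependency-free; an actual measurement would use the tokenizer of the model in question, and the figures would differ.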




