I Worked All Day With Opus 4.7. Never Hit a Limit.

· claude-code, tokens, workflow

Everyone on my feed is complaining about token limits since Opus 4.7 dropped. Burning through context in an hour. Paying more for less. Wondering if it is worth it.

I worked a full day. Eight hours. Multiple projects. Never hit a wall.

This is not luck. It is a system.

The problem is not the model

Opus 4.7 is more capable. More capability means more output. More output means more tokens. People are blaming Anthropic when the real issue is their workflow has no discipline.

Claude will fill every token you give it permission to fill. “Let me think through this carefully…” — that sentence alone costs you nothing except the fifty sentences like it that follow. Claude is not being helpful. It is being verbose. There is a difference.

What I run

RTK (Rust Token Killer). A CLI proxy that hooks into every bash command before Claude sees the output. Filters noise, compresses results, strips what does not matter. Transparent. I never think about it. It just runs.

My current lifetime numbers: 27.9 million tokens saved across 4,849 commands. 93 percent efficiency. That is not a projection. That is what rtk gain prints when I run it right now. The biggest single contributor is rtk read, which alone has saved 22.5M tokens by stripping boilerplate and binary noise from file reads.

A CLAUDE.md that means business. Every session loads a file with hard rules: no preamble, no postamble, no meta-commentary (“Let me now…”, “I’ll start by…”), maximum three prose lines between tool calls, one line per file changed after edits. The model reads it. The model follows it. Verbosity is a default, not a law.

A memory layer that actually remembers. Qdrant holds the long-term knowledge. A small index file sits at the root of my workspace and loads at the start of every session. It points to project state, open decisions, what failed last week. I do not re-explain the codebase every time. I do not paste the same context block into every prompt. When a session ends, it writes back what changed. The next one picks up where this one left off. This is the single biggest reason I do not hit limits: I am not paying tokens to rebuild context I already paid for yesterday.

Domain separation, enforced at the config layer. Trading tools do not load in the writing project. The novel tools do not load in the trading lab. Every loaded MCP server costs tokens on every turn, because its tool definitions get injected into the system prompt.

How it works in practice: each project directory has its own .mcp.json that declares only the servers that project needs. When I open Claude Code from one project folder, only that project’s tools load. Open it from a different folder, a different toolset. Claude Code respects the working-directory scope by default. Most people put everything in a global config and pay for it on every turn.

Route work to cheaper models. Claude is the orchestrator, not the workhorse. Synthesis across large document sets runs on Groq’s Llama 3.3 70B. Bulk work like classification and summarization runs on Groq’s 8B Instant. Anything sensitive or offline runs on Ollama locally. Claude only sees the distilled result, not the raw input. A 50-page PDF summarized by Groq costs me a few hundred Claude tokens for the summary, not the 80,000 it would take to put the whole thing in context.

Short sessions, measured in prompts, not hours. This is the one most people get wrong. A session is not a time window. A session is a prompt budget.

I typically end a session after fewer than ten prompts, at the first natural savepoint. I save the work, write a handoff note, and start a fresh session for the next chunk. A task can take hours and span four or five sessions. Each session starts lean, does one thing, writes down what happened, and closes.

The reason this works: the context window grows linearly with every turn, but the cost of each new turn grows with the total context already loaded. Turn 40 is not twice as expensive as turn 20. It is much more than that. Ending early and resuming with a handoff note resets the clock. Most people think they are saving time by not stopping. They are paying a compounding tax they cannot see.

This is tighter than most people can stomach. That is fine. The point is the direction, not my specific number.

Three rules you can apply today

One: Write a CLAUDE.md. Put it in your project root. Here is a block you can paste in right now:

## Output Token Conservation
- NO preamble/postamble/pleasantries.
- NO meta-commentary ("Let me read...", "I'll now..."). Just execute.
- After edits: one line per file changed. No code recap.
- After commands: report only unexpected results or errors.
- Errors: fix silently. Report only persistent failures.
- Code speaks for itself. Explain only when non-obvious or asked.
- Maximum 3 prose lines between consecutive tool calls.
- ALWAYS Edit for existing files. Write ONLY for new files.

That is what runs at the top of every one of my sessions. Watch your token count drop the first time you ship it.

Two: End sessions early. If the immediate task is done, stop. Do not turn one session into a sprawling conversation. Context bloat is cumulative and it accelerates.

Three: Filter your bash output. Whether you use RTK or write your own hook, stop piping raw command output into the context window. A 500-line git diff does not need to be there in full. Summarize it.

The bigger point

If you are running Claude Code on factory settings, you are leaving tokens on the table. Not a little. A lot. 93 percent, in my case, just on the bash-output layer alone.

The pattern that works: configure the environment so the model operates within constraints by default. Tight output rules. Structured memory. Filtered inputs. Per-project tool scopes. Work routed to the cheapest model that can do it. Sessions measured in prompts, not hours.

This is not clever. It is plumbing. The model’s defaults are tuned for a general audience, not for a professional workflow. If you do not override them, you pay for that choice every day.

None of this is hidden knowledge. Every piece of it is in Anthropic’s docs or one config file away. The only reason most people have not done it is that nobody told them it was the actual job.

Stop blaming the model. Configure the environment.


If your Claude Code workflow is burning through tokens and slowing you down, I can fix that. One session, your specific stack, a setup that works.

Book a session