ai can read 18,000 tokens of config on every message — wasting tokens before your prompt even starts
memorytune compresses your Claude Code config — CLAUDE.md, memory files, skill descriptions — so you spend tokens on work, not overhead.
type full workday
duration 4+ hours
model opus 4.6
plan max 5
effort max
tokens consumed
0K
tokens saved
0K
usage limits hit
0
config reduction
0%
workday — max 5 plan, opus 4.6, max effort, 8am to 6pm
tuned session
standard config
cumulative impact — tokens remaining after each technique
saved
remaining
compression fidelity — 112 question a/b test
pass
degraded
compressed notation
what it actually looks like
ai reads tokens, not grammar. every heading, bullet point, and adjective is overhead — charged on every message. these are real config blocks, compressed with zero accuracy loss.
naked code — the only reader is the machine.
before — 440 tokens
## Code Documentation
- Every function needs a docstring with
description, args, returns, and examples
- Add inline comments above complex blocks
explaining the reasoning, not the what
- README sections for each module with
architecture overview and data flow
- Type annotations on all function signatures
and class attributes
- Changelog entries for every modification
440 tok
after — 42 tokens
docs:none—ai reads source directly
types:yes,skip obvious
no readme,no changelog,no docstrings
code IS the context
42 tok
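the savings above can be sanity-checked locally. a minimal sketch, assuming the common ~4-characters-per-token heuristic — this is not Claude's actual tokenizer, so use it for relative savings only:

```python
# Rough token-count comparison for a config block before and after
# compression. The ~4-chars-per-token ratio is a common heuristic,
# not Claude's real tokenizer -- good for relative savings only.

def approx_tokens(text: str) -> int:
    """Estimate token count at roughly 4 characters per token."""
    return max(1, len(text) // 4)

verbose = """## Code Documentation
- Every function needs a docstring with description, args, returns, and examples
- Add inline comments above complex blocks explaining the reasoning, not the what
- README sections for each module with architecture overview and data flow
- Type annotations on all function signatures and class attributes
- Changelog entries for every modification"""

compressed = """docs:none, ai reads source directly
types:yes,skip obvious
no readme,no changelog,no docstrings
code IS the context"""

saved = approx_tokens(verbose) - approx_tokens(compressed)
pct = 100 * saved / approx_tokens(verbose)
print(f"before≈{approx_tokens(verbose)} tok, "
      f"after≈{approx_tokens(compressed)} tok, saved {pct:.0f}%")
```

run it against your own CLAUDE.md blocks before and after compressing to see the relative reduction.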
mem dedup — save what matters. derive the rest.
before — 1,203 tokens
## Memory System
- Save important user preferences to memory
- Memory files go in the .claude/memory/ dir
- Include frontmatter with name, description,
and type fields
- Update MEMORY.md index when saving memories
- Types: user, feedback, project, reference
- Don't save things derivable from code
- Don't save git history or debugging solutions
- Check for existing memory before creating new
1,203 tok
after — 156 tokens
mem:save→.claude/memory/ w/ frontmatter(name,desc,type)
types:user|feedback|project|reference
update MEMORY.md index on save
skip:code-derivable,git-history,debug-fixes
dedup:check existing before new
156 tok
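the dedup rule above ("check existing before new") can be sketched in a few lines. the frontmatter fields follow the compressed block; the parsing here is a minimal assumption, not Claude Code's own implementation:

```python
# Sketch of the dedup rule: before writing a new memory file, check
# whether one with the same name already exists in .claude/memory/.
# Frontmatter fields (name, desc, type) follow the block above; the
# parsing is a minimal stand-in, not Claude Code's internal format.
from pathlib import Path

def parse_frontmatter(text: str) -> dict:
    """Read key: value pairs between the leading '---' fences."""
    meta = {}
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta

def save_memory(root: Path, name: str, desc: str, kind: str, body: str) -> bool:
    """Write a memory file unless one with this name exists.

    Returns True if written, False if deduplicated."""
    root.mkdir(parents=True, exist_ok=True)
    for existing in root.glob("*.md"):
        if parse_frontmatter(existing.read_text()).get("name") == name:
            return False  # dedup: reuse the existing memory
    fm = f"---\nname: {name}\ndesc: {desc}\ntype: {kind}\n---\n"
    (root / f"{name}.md").write_text(fm + body)
    return True
```

the second save with the same name returns False instead of creating a duplicate — the same behavior the compressed instruction asks of the agent.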
code switch — write for the reader. the reader is a tokenizer.
before — 380 tokens
## Response Behavior
Please keep your responses concise and
focused on the task at hand. Do not
include unnecessary preamble, summaries,
or pleasantries. When you reference code,
always include the file path and line
number so the user can navigate directly
to the relevant section. If you encounter
an error, explain what went wrong and
suggest a fix rather than just showing
the error message.
380 tok
after — 38 tokens
resp:concise,task-focused,no filler
code ref→filepath:line always
error→explain+fix,not just dump
38 tok
fork loop — when stuck, fork. don't loop.
without fork loop
ssh connection refused
→ retry with -v flag
→ connection refused
→ try port 22 explicitly
→ connection refused
→ try username@ip instead
→ connection refused
→ check firewall... same error
→ retry original command
→ connection refused
→ ask user for help
6 attempts, same wall
fork loop
ssh connection refused ×2
→ fork: agent A keeps ssh debug
agent B checks routing + firewall
→ B finds: no IP forwarding to host
→ B fixes route, ssh connects
solved in 2 steps, not 6
2 attempts trigger fork
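the rule above — two identical failures trigger a fork instead of a third retry — looks roughly like this. the probe functions are stand-ins, not real ssh or network checks:

```python
# Sketch of the fork-loop rule: after two identical failures, stop
# retrying the same command and fork into parallel diagnostic tracks.
# The probes below are hypothetical stand-ins, not real ssh checks.
from concurrent.futures import ThreadPoolExecutor

FORK_AFTER = 2  # identical failures before forking

def fork_loop(attempt, forks):
    """Retry until FORK_AFTER identical errors, then run forks in parallel."""
    last_error = None
    for _ in range(FORK_AFTER):
        result = attempt()
        if not result.startswith("error:"):
            return result
        if result == last_error:
            break  # same wall twice -- fork instead of looping
        last_error = result
    with ThreadPoolExecutor(max_workers=len(forks)) as pool:
        return [f.result() for f in [pool.submit(fn) for fn in forks]]

# hypothetical probes standing in for agent A (keep debugging ssh)
# and agent B (check routing + firewall)
ssh = lambda: "error: connection refused"
agent_a = lambda: "error: connection refused"          # keeps hitting the wall
agent_b = lambda: "fixed: route added, ssh connects"   # finds the real cause
```

with these probes, `fork_loop(ssh, [agent_a, agent_b])` hits the wall twice, forks, and agent B's track returns the fix — two attempts instead of six.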
hidden state — transformer runs. SSM thinks.
before — 410 tokens
## Working Memory
- Before each response, mentally review all
prior context to maintain continuity
- Keep a running summary of decisions made,
files changed, and approaches tried
- When starting a new task, check if similar
work was done earlier in the session
- Compress old context when approaching
limits — preserve decisions, drop details
- Carry architectural understanding forward
between messages, never start cold
410 tok
after — 39 tokens
mem:compressed state,not full replay
decisions+changes→persist,details→drop
similar prior work→reuse,don't redo
architecture→carry forward always
39 tok
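"compressed state, not full replay" reduces to a small data structure: decisions and file changes persist, turn-by-turn detail is droppable. the field names here are illustrative, not Claude Code's internal state format:

```python
# Sketch of "compressed state, not full replay": keep decisions and
# file changes across compaction, drop turn-by-turn detail. Field
# names are illustrative, not Claude Code's internal state format.
from dataclasses import dataclass, field

@dataclass
class SessionState:
    decisions: list = field(default_factory=list)    # persist
    changed_files: set = field(default_factory=set)  # persist
    details: list = field(default_factory=list)      # droppable

    def record(self, decision=None, changed=None, detail=None):
        if decision:
            self.decisions.append(decision)
        if changed:
            self.changed_files.add(changed)
        if detail:
            self.details.append(detail)

    def compact(self):
        """Approaching the context limit: drop details, keep decisions."""
        self.details.clear()
        return self
```

after `compact()`, the architectural record survives while the replay-level detail is gone — the same trade the compressed block asks for.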
url inject — skip the form. drop the value.
before — 520 tokens
> "go to the project settings"
1. navigate to dashboard.example.com
2. click "Projects" in the sidebar
3. find the project named "api-v2"
4. click the gear icon
5. dialog: "Save changes?" → click OK
6. scroll to "Webhooks" section
7. type the new URL into the field
8. click "Save"
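the eight clicks above are the slow path. a minimal sketch of the fast one — the route and query parameter are hypothetical, since every dashboard exposes its own deep links:

```python
# "Skip the form, drop the value": build a deep link straight to the
# webhooks section with the value prefilled, replacing eight UI steps.
# The route and the `webhook` parameter are hypothetical -- check the
# dashboard's own URL scheme before relying on this shape.
from urllib.parse import urlencode

def settings_url(project: str, webhook: str) -> str:
    """Deep link to a project's webhooks section with the value injected."""
    query = urlencode({"webhook": webhook})
    return (f"https://dashboard.example.com/projects/{project}"
            f"/settings?{query}#webhooks")

print(settings_url("api-v2", "https://hooks.example.com/pay"))
```

one navigation, zero clicks, and the value arrives URL-encoded instead of being typed into a field.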
what happened in 4 hours without hitting the usage limit
0:00
CODEBASE AUDIT Full repository scan across 3 services. 140 files analyzed, dependency tree mapped, 51 issues flagged.
0:45
API INTEGRATION Payment provider connected from scratch. Endpoint mapping, webhook validation, error handling, retry logic. Tested end-to-end with live sandbox.
1:30
FRONTEND REBUILD Dashboard rewritten from wireframes. 12 components, responsive grid, real-time data binding.
2:30
DATA MIGRATION Schema redesigned for scale. Migration script with rollback safety. 50K records moved, integrity verified, zero downtime.
3:30
DEPLOY + MONITOR CI/CD pipeline assembled. Automated test suite, staging deploy, production cutover.
5 major tasks. max effort the entire session. 1 context compaction. 0 usage limits hit.
questions
does compressed notation actually work?
112 questions tested across instruction following, code generation, architectural reasoning, debugging, and context retention. identical agents, one verbose, one compressed. 110 passed, 2 edge cases: one complex regex generation lost a capture group, one long-session context recall drifted on variable names. both fixed by backing off compression on those blocks. ai tokenizes into subwords, not grammar — the grammar is overhead paid on every message.
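the harness behind that test is simple to reproduce. a minimal sketch, where `ask` is a stub — in practice it would call the model with each config prepended and an answer-grading step instead of exact string comparison:

```python
# Minimal sketch of the 112-question A/B fidelity test: run every
# question against a verbose-config agent and a compressed-config
# agent, count matches. `ask` is a stub -- a real harness would call
# the model with each config prepended and grade answers semantically.
def ab_test(questions, ask, verbose_cfg, compressed_cfg):
    """Return (passed, degraded_questions) comparing the two configs."""
    degraded = [q for q in questions
                if ask(verbose_cfg, q) != ask(compressed_cfg, q)]
    return len(questions) - len(degraded), degraded

# stub model: the answer depends only on the question, so both agree
echo = lambda cfg, q: q.upper()
passed, degraded = ab_test(["rename var", "write regex"], echo, "long", "short")
print(passed, degraded)  # -> 2 []
```

any question landing in `degraded` marks a block where compression went too far — back that block off, as the two edge cases above were fixed.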
will this break my setup?
no. ai reads the same instructions — just fewer tokens to express them. test against the original after compressing. if comprehension drops on any task, you went too far on that section — back it off.
what tools does this apply to?
anything ai reads before responding. Claude Code, Cursor, Windsurf, Aider, custom agents. if ai spends tokens on config before doing your work, compression helps.
how long does compression take?
typical CLAUDE.md: 30-60 minutes following the patterns above. memory files and skill descriptions add time. the notation examples on this page cover the core patterns.
what about effort levels?
claude code has four effort levels: low, medium, high, and max. higher effort means deeper reasoning but burns tokens faster. this session ran entirely on max. dropping to high or medium for straightforward tasks can stretch the same token budget further — another lever alongside compression.
who built this?
justin at marow.ai. solo engineer, former platform team lead. i built this after watching my own token budget burn on config overhead every day. the patterns came from 6 months of daily Claude Code sessions on max effort. [email protected] — happy to talk methodology.