Context Management
You're using AI to refactor a large module. The Agent has read a dozen files, run a string of shell commands, and looked up dozens of symbol references. Half an hour in, it starts getting "sluggish": answers drift off-track, it repeats earlier mistakes, and it forgets key information you already told it.
The model didn't get dumber. The context window got overwhelmed.
Helix solves this with two complementary strategies: Cache (store and recall on demand) and Compact (compress into structured summaries).
Why Context Management Is Critical
Token consumption in tool-intensive AI engineering tasks grows far faster than in normal conversation:
- Reading a 500-line file → ~2,000 tokens
- Executing a `find` command → potentially thousands of lines of output
- Searching code references → match results across dozens of files
- Every tool call's parameters and results count toward context
After 10 tool calls, you may have consumed 50K+ tokens. And most models concentrate effective attention on recent content — important early decisions and context are being forgotten.
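To make that arithmetic concrete, here is a rough sketch of how the numbers add up. The ~4-characters-per-token ratio and the per-call output sizes are illustrative assumptions, not Helix internals:

```go
package main

import "fmt"

// Rough heuristic: ~4 characters per token (an assumption for
// illustration; real tokenizers vary by model).
func estimateTokens(chars int) int { return chars / 4 }

func main() {
	// Hypothetical mix of 10 tool calls: file reads, shell
	// commands, and reference searches of varying sizes (in
	// characters of output).
	outputChars := []int{8000, 20000, 12000, 30000, 8000,
		25000, 15000, 40000, 22000, 20000}
	total := 0
	for _, chars := range outputChars {
		total += estimateTokens(chars)
	}
	fmt.Printf("~%d tokens from tool outputs alone\n", total) // ~50,000
}
```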
Without context management, you have two choices:
- Start a new chat — lose all progress
- Push through — output quality keeps declining, costs keep rising
Helix offers a third way.
Cache: Store Large Outputs, Recall on Demand
Cache solves this problem: tool outputs are too large, but you don't need them every time.
How It Works
- After conversation turns exceed a threshold, Helix automatically replaces large tool outputs in earlier messages with lightweight cache markers
- Original content is stored in a local KV cache (Pebble storage engine with SHA256 content-addressed deduplication)
- Cache markers remain in conversation history, in the form `[CACHED] recall_cached_content("sha256hash")`
- When the Agent needs to revisit a historical tool output, it recalls the original content on demand via the `recall_cached_content` tool
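The mechanism above can be sketched as a small content-addressed store. Helix itself uses the Pebble KV engine; a Go map stands in here, and the type and method names are assumptions for illustration:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// ContextCache is a minimal in-memory stand-in for Helix's
// Pebble-backed, SHA256 content-addressed cache.
type ContextCache struct {
	store map[string]string // SHA256 hex → original content
}

func NewContextCache() *ContextCache {
	return &ContextCache{store: make(map[string]string)}
}

// Put stores a large tool output and returns the lightweight marker
// that replaces it in conversation history. Identical content hashes
// to the same key, so it is stored only once.
func (c *ContextCache) Put(content string) string {
	sum := sha256.Sum256([]byte(content))
	key := hex.EncodeToString(sum[:])
	c.store[key] = content
	return fmt.Sprintf("[CACHED] recall_cached_content(%q)", key)
}

// Recall returns the original content for a hash, mirroring the
// recall_cached_content tool's lookup.
func (c *ContextCache) Recall(key string) (string, bool) {
	content, ok := c.store[key]
	return content, ok
}

func main() {
	c := NewContextCache()
	marker := c.Put("...imagine 4,000 lines of build output here...")
	fmt.Println(marker) // a few dozen bytes instead of the full output
}
```

Note the dedup property: putting the same output twice yields the same marker and one stored copy, which is what makes the scheme cheap for repetitive tool results.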
Key Design Choices
- Content-addressed deduplication — identical content stored only once, SHA256 hash ensures uniqueness
- High reversibility — any cached content can be recalled in full, no information loss
- Automatic triggering — no user intervention needed, the system handles it based on conversation turn count
- Workspace-level isolation — each Workspace has its own cache instance, managed through the `builtin_context_cache` MCP server
Practical Effect
Imagine a session with 80 tool calls where the first 60 tool outputs are no longer in the "active working area." Cache replaces them with markers of a few dozen bytes, freeing massive context space, while the Agent can still "look back" at any historical output whenever needed.
Compact: Compress Old Conversations into Structured Summaries
Compact solves this problem: conversation history is too long, but you can't just throw it all away.
Trigger Conditions
Compact triggers automatically when token usage reaches 85% of the context window (default window size: 128K tokens). Additional conditions must also be met:
- No compression operation is currently in progress
- At least 60 seconds have passed since the last compression (cooldown period)
- There are enough historical messages to compress
Compression Method: "Handover Document" Format
Compact doesn't simply truncate or delete old messages. It calls AI to generate a structured handover document summary containing six sections:
| Summary Section | Contents |
|---|---|
| Project Background | Project type, tech stack, key architecture information |
| Technical Details | Important technical facts and constraints discovered |
| Completed Work | What's been done so far, key decisions and rationale |
| Current Issues | Blockers and unresolved problems |
| TODO Items | What to do next, prioritized |
| User Preferences | Style, constraints, and special requirements expressed by the user |
This summary is injected into conversation history as a `[Conversation History Summary]` message, replacing the compressed old messages.
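One way to picture the handover document is as a struct with one field per section. The six fields mirror the table above, but the concrete schema and rendering format Helix uses internally are not shown in the text, so this shape is an assumption:

```go
package main

import "fmt"

// HandoverSummary holds the six sections of the handover document.
type HandoverSummary struct {
	ProjectBackground string // project type, tech stack, architecture
	TechnicalDetails  string // facts and constraints discovered
	CompletedWork     string // what's done, key decisions and rationale
	CurrentIssues     string // blockers and unresolved problems
	TodoItems         string // next steps, prioritized
	UserPreferences   string // style and constraints from the user
}

// Render produces the message injected into conversation history
// in place of the compressed messages (format assumed).
func (h HandoverSummary) Render() string {
	return fmt.Sprintf(
		"[Conversation History Summary]\n"+
			"Project Background: %s\n"+
			"Technical Details: %s\n"+
			"Completed Work: %s\n"+
			"Current Issues: %s\n"+
			"TODO Items: %s\n"+
			"User Preferences: %s\n",
		h.ProjectBackground, h.TechnicalDetails, h.CompletedWork,
		h.CurrentIssues, h.TodoItems, h.UserPreferences)
}

func main() {
	s := HandoverSummary{
		ProjectBackground: "Go monorepo, large-scale refactor",
		TodoItems:         "migrate remaining handlers to the new API",
	}
	fmt.Print(s.Render())
}
```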
Three-Phase Lock Strategy
To ensure the compression process doesn't conflict with ongoing conversations, Compact uses a three-phase lock strategy:
- Hold lock, read data — acquire lock, read messages to compress
- Release lock, call AI — release the lock and call AI to generate the summary (this step takes a while but doesn't block the conversation)
- Re-acquire lock, write results — lock again, write summary into conversation history
This means users can continue chatting normally while Compact generates its summary — no blocking.
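The three phases map naturally onto a mutex that is held only for the fast read and write steps. This Go sketch uses assumed names; the `summarize` callback stands in for the slow AI call:

```go
package main

import (
	"fmt"
	"sync"
)

type Session struct {
	mu       sync.Mutex
	messages []string
}

// Compact implements the three-phase lock strategy.
func (s *Session) Compact(summarize func([]string) string) {
	// Phase 1: hold lock, read data — copy the messages to compress.
	s.mu.Lock()
	old := append([]string(nil), s.messages...)
	s.mu.Unlock()

	// Phase 2: release lock, call AI — the slow summary call runs
	// with no lock held, so concurrent appends are not blocked.
	summary := summarize(old)

	// Phase 3: re-acquire lock, write results — replace the old
	// messages with the summary, keeping anything that arrived
	// while the summary was being generated.
	s.mu.Lock()
	newer := s.messages[len(old):]
	s.messages = append([]string{summary}, newer...)
	s.mu.Unlock()
}

func main() {
	s := &Session{messages: []string{"q1", "a1", "q2", "a2"}}
	s.Compact(func(old []string) string {
		return fmt.Sprintf("[Conversation History Summary] (%d messages)", len(old))
	})
	fmt.Println(s.messages)
}
```

The key design choice is that the lock is dropped for phase 2: holding it across the AI call would freeze the conversation for the entire generation time.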
Cache vs Compact: When to Use Which
| Dimension | Cache | Compact |
|---|---|---|
| Optimizes | Repeated transmission of large tool outputs | Overly long conversation history |
| Reversibility | ✅ High — full original content recoverable anytime | ⚠️ Low — compressed to summary, details lost |
| Trigger | Based on conversation turn threshold | Token usage hits 85% of window |
| Storage | Local KV cache (Pebble + SHA256) | Replaced with summary message |
| User Perception | Nearly invisible — Agent auto-decides when to recall | May be noticeable — early details get summarized |
| Applies To | File contents, command output, search results | Completed discussion segments, earlier reasoning |
Complete Lifecycle of Dual-Strategy Collaboration
Session starts
│
▼ Normal conversation (all content in context)
│
├─ Tool outputs accumulate...
│
▼ Turn count exceeds threshold → Cache activates
│ Early tool outputs → cache markers
│ When needed → recall_cached_content retrieves them
│
├─ Conversation keeps growing...
│
▼ Tokens hit 85% of window → Compact triggers
│ Old conversations → "handover document" summary
│ Recent conversations remain intact
│
├─ Work continues...
│
▼ Approaching limit again → Compact fires again (after 60s cooldown)
│
└─ Session can run for hours / across days
Real-World Scenario
Scenario: A multi-day large-scale refactoring task
- Hour 1: Understanding code structure, reading a dozen files → Cache starts storing early tool outputs
- Hour 2: Forming a plan, beginning code changes → context approaches 85% → Compact generates the first handover summary
- Hour 3: Continuing changes, encountering an issue that requires reviewing an earlier file → Agent recalls it via Cache
- Next day: Another Compact → summary updates → work continues without losing key context
Throughout this entire process, users don't need to manage anything manually. The system automatically balances context capacity, information retention, and cost.
Synergies with Other Capabilities
With SubAgents
SubAgents run in isolated contexts with their own independent Cache/Compact strategies. A subtask's intermediate tool outputs never enter the main session's context — the main session only receives distilled summary results. This is another form of "context protection."
With Workspaces
Both Cache and Compact operate at the Workspace level — each Workspace has independent cache storage and compression policies. Context management across different projects never interferes.
Related Documentation
- Multi-Agent Architecture — how SubAgents coordinate with context management
- Workspace Architecture — Workspace-level context isolation
- Feature Overview — back to the core capabilities overview