
Context Management

You're using AI to refactor a large module. The Agent has read a dozen files, executed a string of shell commands, and traced dozens of symbol references. Half an hour in, it starts getting "sluggish": answers go off-track, it repeats earlier mistakes, and it forgets key information you already provided.

The model didn't get dumber. The context window got overwhelmed.

Helix solves this with two complementary strategies: Cache (store and recall on demand) and Compact (compress into structured summaries).


Why Context Management Is Critical

Token consumption in tool-intensive AI engineering tasks grows far faster than in normal conversation:

  • Reading a 500-line file → ~2,000 tokens
  • Executing a find command → potentially thousands of lines of output
  • Searching code references → match results across dozens of files
  • Every tool call's parameters and results count toward context

After 10 tool calls, you may have consumed 50K+ tokens. Meanwhile, most models concentrate effective attention on recent content, so important early decisions and context fade from view.
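The arithmetic above can be sketched with a rough estimator. All per-call token counts here are illustrative assumptions, not Helix internals; real tokenizers and tool outputs vary widely:

```python
# Back-of-envelope context accounting for tool-heavy sessions.
# Per-call costs below are assumed averages, not measured values.

TOKENS_PER_FILE_READ = 2_000   # ~500-line file
TOKENS_PER_SHELL_CMD = 8_000   # e.g. a chatty `find` emitting thousands of lines
TOKENS_PER_SEARCH = 5_000      # reference matches across dozens of files

def estimate_tokens(file_reads: int, shell_cmds: int, searches: int) -> int:
    """Estimate context consumed by tool outputs alone."""
    return (file_reads * TOKENS_PER_FILE_READ
            + shell_cmds * TOKENS_PER_SHELL_CMD
            + searches * TOKENS_PER_SEARCH)

# Ten mixed tool calls already eat a large slice of a 128K window:
used = estimate_tokens(file_reads=4, shell_cmds=3, searches=3)
print(used)                                   # 47000
print(f"{used / 128_000:.0%} of a 128K window")  # 37% of a 128K window
```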

Without context management, you have two choices:

  1. Start a new chat — lose all progress
  2. Push through — output quality keeps declining, costs keep rising

Helix offers a third way.


Cache: Store Large Outputs, Recall on Demand

Cache solves this problem: tool outputs are too large, but you don't need them every time.

How It Works

  1. After conversation turns exceed a threshold, Helix automatically replaces large tool outputs in earlier messages with lightweight cache markers
  2. Original content is stored in a local KV cache (Pebble storage engine with SHA256 content-addressed deduplication)
  3. Cache markers remain in conversation history, in the form [CACHED] recall_cached_content("sha256hash")
  4. When the Agent needs to revisit a historical tool output, it recalls the original content on demand via the recall_cached_content tool
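The four steps above can be sketched as a minimal in-memory stand-in. Helix actually persists to a Pebble KV store behind an MCP server; the class and its simplified dict storage are assumptions for illustration:

```python
import hashlib

class ToolOutputCache:
    """Minimal sketch of content-addressed caching of tool outputs."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}   # sha256 hex -> original content

    def cache(self, content: str) -> str:
        """Store content once; return the marker that replaces it in history."""
        digest = hashlib.sha256(content.encode()).hexdigest()
        self._store[digest] = content      # identical content dedupes to one entry
        return f'[CACHED] recall_cached_content("{digest}")'

    def recall_cached_content(self, digest: str) -> str:
        """Recover the full original output on demand."""
        return self._store[digest]

cache = ToolOutputCache()
marker = cache.cache("...500 lines of file contents...")
# The marker is a few dozen bytes; the original stays fully recoverable:
digest = marker.split('"')[1]
assert cache.recall_cached_content(digest) == "...500 lines of file contents..."
```

Because the key is a hash of the content itself, caching the same output twice maps onto the existing entry, which is where the deduplication comes from.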

Key Design Choices

  • Content-addressed deduplication — identical content stored only once, SHA256 hash ensures uniqueness
  • High reversibility — any cached content can be recalled in full, no information loss
  • Automatic triggering — no user intervention needed, the system handles it based on conversation turn count
  • Workspace-level isolation — each Workspace has its own cache instance, managed through the builtin_context_cache MCP server

Practical Effect

Imagine a session with 80 tool calls where the first 60 tool outputs are no longer in the "active working area." Cache replaces them with markers of a few dozen bytes, freeing massive context space, while the Agent can still "look back" at any historical output whenever needed.


Compact: Compress Old Conversations into Structured Summaries

Compact solves this problem: conversation history is too long, but you can't just throw it all away.

Trigger Conditions

Compact triggers automatically when token usage reaches 85% of the context window (default window size: 128K tokens). All of the following conditions must also hold:

  • No compression operation is currently in progress
  • At least 60 seconds have passed since the last compression (cooldown period)
  • There are enough historical messages to compress

Compression Method: "Handover Document" Format

Compact doesn't simply truncate or delete old messages. Instead, it asks the model to generate a structured "handover document" summary containing six sections:

| Summary Section | Contents |
| --- | --- |
| Project Background | Project type, tech stack, key architecture information |
| Technical Details | Important technical facts and constraints discovered |
| Completed Work | What's been done so far, key decisions and rationale |
| Current Issues | Blockers and unresolved problems |
| TODO Items | What to do next, prioritized |
| User Preferences | Style, constraints, and special requirements expressed by the user |

This summary is injected into conversation history as a [Conversation History Summary] message, replacing the compressed old messages.
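The six-section format can be modeled as a simple structure rendered into the replacement message. The section names follow the table above; the class, field names, and rendering format are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class HandoverSummary:
    """The six sections of a Compact 'handover document' (sketch)."""
    project_background: str   # project type, tech stack, key architecture
    technical_details: str    # important facts and constraints discovered
    completed_work: str       # what's done, key decisions and rationale
    current_issues: str       # blockers and unresolved problems
    todo_items: str           # what to do next, prioritized
    user_preferences: str     # style, constraints, special requirements

    def to_message(self) -> str:
        """Render as the message that replaces compressed history."""
        sections = [
            ("Project Background", self.project_background),
            ("Technical Details", self.technical_details),
            ("Completed Work", self.completed_work),
            ("Current Issues", self.current_issues),
            ("TODO Items", self.todo_items),
            ("User Preferences", self.user_preferences),
        ]
        body = "\n".join(f"## {title}\n{text}" for title, text in sections)
        return f"[Conversation History Summary]\n{body}"
```

Keeping the summary structured rather than free-form is what lets a later Compact pass update it section by section instead of summarizing a summary blindly.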

Three-Phase Lock Strategy

To ensure the compression process doesn't conflict with ongoing conversations, Compact uses a three-phase lock strategy:

  1. Hold lock, read data — acquire lock, read messages to compress
  2. Release lock, call AI — release lock, call AI to generate summary (this step takes a while, doesn't block conversation)
  3. Re-acquire lock, write results — lock again, write summary into conversation history

This means users can continue chatting normally while Compact generates its summary — no blocking.
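The three phases can be sketched as follows, with the slow model call deliberately placed outside the lock. Class and method names are illustrative, and `summarize` stands in for the AI call:

```python
import threading

class Compactor:
    """Sketch of the three-phase lock strategy around compression."""

    def __init__(self, history: list[str]) -> None:
        self.lock = threading.Lock()
        self.history = history

    def compact(self, summarize) -> None:
        # Phase 1: hold lock, snapshot the messages to compress.
        with self.lock:
            to_compress = list(self.history)

        # Phase 2: lock released; the slow AI call runs without
        # blocking users who keep appending to the conversation.
        summary = summarize(to_compress)

        # Phase 3: re-acquire lock, splice the summary in. Messages
        # that arrived during phase 2 are kept intact after it.
        with self.lock:
            new_messages = self.history[len(to_compress):]
            self.history[:] = [f"[Conversation History Summary]\n{summary}"] + new_messages
```

The design choice is the classic one of never holding a lock across a slow I/O call: only the cheap snapshot and splice are serialized, so the conversation stays responsive during summarization.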


Cache vs Compact: When to Use Which

| Dimension | Cache | Compact |
| --- | --- | --- |
| Optimizes | Repeated transmission of large tool outputs | Overly long conversation history |
| Reversibility | ✅ High — full original content recoverable anytime | ⚠️ Low — compressed to summary, details lost |
| Trigger | Conversation turn threshold | Token usage hits 85% of window |
| Storage | Local KV cache (Pebble + SHA256) | Replaced with summary message |
| User Perception | Nearly invisible — Agent auto-decides when to recall | May be noticeable — early details get summarized |
| Applies To | File contents, command output, search results | Completed discussion segments, earlier reasoning |

Complete Lifecycle of Dual-Strategy Collaboration

Session starts
▼ Normal conversation (all content in context)
├─ Tool outputs accumulate...
▼ Turn count exceeds threshold → Cache activates
│    Early tool outputs → cache markers
│    When needed → recall_cached_content retrieves them
├─ Conversation keeps growing...
▼ Tokens hit 85% of window → Compact triggers
│    Old conversations → "handover document" summary
│    Recent conversations remain intact
├─ Work continues...
▼ Approaching limit again → Compact fires again (after 60s cooldown)
└─ Session can run for hours / across days

Real-World Scenario

Scenario: A multi-day large-scale refactoring task

  • Hour 1: Understanding code structure, reading a dozen files → Cache starts storing early tool outputs
  • Hour 2: Forming a plan, beginning code changes → context approaches 85% → Compact generates the first handover summary
  • Hour 3: Continuing changes, encountering an issue that requires reviewing an earlier file → Agent recalls it via Cache
  • Next day: Another Compact → summary updates → work continues without losing key context

Throughout this entire process, users don't need to manage anything manually. The system automatically balances context capacity, information retention, and cost.


Synergies with Other Capabilities

With SubAgents

SubAgents run in isolated contexts with their own independent Cache/Compact strategies. A subtask's intermediate tool outputs never enter the main session's context — the main session only receives distilled summary results. This is another form of "context protection."

With Workspaces

Both Cache and Compact operate at the Workspace level — each Workspace has independent cache storage and compression policies, so context management in one project never interferes with another.