Architecture Overview
Most AI coding tools are a chat box plus a model. Helix is not.
Helix is a workspace-centered multi-agent engineering system — it unifies your codebase, terminal, models, toolchain, and multi-agent orchestration into a single workflow, enabling AI to complete real software engineering tasks end-to-end, not just answer questions.
System Overview
┌────────────────────────────────────────────────────┐
│ Helix UI (helix · Flutter Desktop / Web)           │
│ ├─ Workspace & session management                  │
│ ├─ SubAgent parallel status panel                  │
│ ├─ MCP tool execution live view                    │
│ └─ Model selector & Dual Agent mode                │
└─────────────────────────┬──────────────────────────┘
                          │ REST + WebSocket
                          ▼
┌────────────────────────────────────────────────────┐
│ Helix Backend (helix-agent · Go)                   │
│ ├─ Workspace Manager — isolation + lifecycle       │
│ ├─ Session Engine — streaming + tool calls + retry │
│ ├─ Multi-Agent — Manager / Execution / SubAgent    │
│ ├─ Context Engine — KV Cache + Compact compression │
│ ├─ 12 built-in MCP servers — Shell / FS / LSP / …  │
│ └─ Provider adapters — DeepSeek / Claude / OpenAI  │
└────────────────────────────────────────────────────┘
Four Design Principles
1. Keep user intent stable over long tasks
A single Agent often forgets the original goal after 30+ consecutive tool calls. Helix solves this with layered orchestration: a Manager Agent locks the objective, an Execution Agent drives implementation, and SubAgents handle deep subtasks. Each layer has a single responsibility, and the goal is protected at every node in the execution chain.
2. Make parallel execution observable
Many products let AI "think silently" in the background while users stare at a loading spinner. Helix exposes subtask creation, execution status, and intermediate results in the UI — you can see in real time which Agent is working on what, how it's progressing, and what it found.
3. Keep long sessions healthy
Tool-intensive conversations consume context at an alarming rate. Helix uses a Cache + Compact dual strategy: large tool outputs are stored in a KV cache and recalled on demand; old conversations are compressed into structured summaries. This lets a single session run for hours or even across days without needing to "start a new chat."
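The Compact half of this strategy can be sketched as follows. This is a minimal illustration, not the actual helix-agent API: `compact`, `Message`, and the `summarize` callback are all illustrative stand-ins for the real summarization pipeline.

```go
package main

import "fmt"

// Message is a simplified conversation entry for this sketch.
type Message struct{ Role, Text string }

// compact keeps the most recent `keep` messages verbatim and folds
// everything older into a single structured-summary message.
// summarize stands in for the model call that produces the summary.
func compact(history []Message, keep int, summarize func([]Message) string) []Message {
	if len(history) <= keep {
		return history // nothing to compress yet
	}
	old, recent := history[:len(history)-keep], history[len(history)-keep:]
	out := []Message{{Role: "system", Text: summarize(old)}}
	return append(out, recent...)
}

func main() {
	h := make([]Message, 10)
	for i := range h {
		h[i] = Message{Role: "user", Text: fmt.Sprintf("msg %d", i)}
	}
	c := compact(h, 4, func(old []Message) string {
		return fmt.Sprintf("summary of %d earlier messages", len(old))
	})
	fmt.Println(len(c)) // 1 summary + 4 recent = 5
}
```

The key property is that compaction is lossy but bounded: the live context grows with recent activity only, while older turns collapse into a fixed-size summary.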
4. Strictly isolate projects
Every Workspace has its own session history, tool configuration, model selection, and MCP server instances. You can work on a frontend repo, a backend service, and an infrastructure project simultaneously — their contexts never cross-contaminate.
Multi-Agent Execution Model
Helix uses three execution roles to collaboratively complete tasks:
| Role | Responsibility | Execution Mode |
|---|---|---|
| Manager Agent | Guards user objective, validates completion quality | Throughout the task lifecycle |
| Execution Agent | Drives primary coding and tool workflow | Sequential in the main session |
| SubAgent | Runs focused subtasks in isolated context | Parallel execution, returns distilled summaries |
Key SubAgent design details (from the backend implementation):
- Each SubAgent creates an independent sub-session (ID format: `sub_{parentSessionID}_{timestamp}`) with fully isolated context
- To prevent recursive creation, SubAgents automatically filter out three tools (`run_subagent`, `create_worktree_binding`, and `start_code_review`) and exclude the entire `builtin_subagent` MCP server
- If a SubAgent's result exceeds 2,000 characters, AI automatically generates a distilled summary of ≤ 500 characters; if summarization fails, the result is truncated
- Multiple SubAgents run truly in parallel without interference — total wall-clock time equals that of the slowest subtask
Data and Interaction Flow
User sends a task
        │
        ▼
Backend streams model output over WebSocket
        │
        ├─ Tool calls execute (filesystem / LSP / Git / terminal / MCP)
        │
        ├─ SubAgents launch in parallel when needed
        │     ├─ SubAgent A: search multiple modules ──→ return summary
        │     └─ SubAgent B: analyze security patterns ──→ return summary
        │
        ├─ Results merge back into the main session
        │
        └─ Cache / Compact policies keep context healthy over time
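The parallel SubAgent step in this flow can be sketched with goroutines. `runParallel` is an illustrative name, not the actual backend API; the point is that SubAgents run concurrently and only their distilled summaries merge back into the main session.

```go
package main

import (
	"fmt"
	"sync"
)

// runParallel launches each subtask in its own goroutine and blocks
// until all summaries are back, so wall-clock time tracks the slowest
// subtask rather than the sum of all of them.
func runParallel(tasks map[string]func() string) map[string]string {
	var (
		mu        sync.Mutex
		wg        sync.WaitGroup
		summaries = make(map[string]string)
	)
	for name, task := range tasks {
		wg.Add(1)
		go func(name string, task func() string) {
			defer wg.Done()
			s := task() // isolated subtask work
			mu.Lock()
			summaries[name] = s // merge point: only the summary returns
			mu.Unlock()
		}(name, task)
	}
	wg.Wait()
	return summaries
}

func main() {
	out := runParallel(map[string]func() string{
		"search":   func() string { return "3 modules matched" },
		"security": func() string { return "no issues found" },
	})
	fmt.Println(len(out)) // prints 2
}
```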
Workspace: The Fundamental Execution Unit
A Workspace isn't just "opening a directory." In Helix, it's the fundamental boundary for state isolation:
- Independent session history — each Workspace maintains its own conversations and context
- Independent tool configuration — Shell, filesystem, LSP, and other MCP servers are instantiated per Workspace
- Independent model & profile — different projects can use different models and system prompts
- Local and remote targets — both local directories and remote servers can serve as workspaces
- Lifecycle management — create → connect → lazy-load → disconnect → idle timeout (default 30 minutes) → auto-close
Typical usage:
- Workspace A: Frontend iteration (Claude + Serena LSP)
- Workspace B: Backend migration (DeepSeek + Remote SSH)
- Workspace C: Infrastructure checks (GPT-4o + Tmux terminal)
Runtime Components
Frontend: helix
A cross-platform client built with Flutter, available as a macOS desktop app (recommended) or as a web deployment.
- Three-panel layout: session list + chat area + tool panel
- Native multi-window support on desktop (chat, SubAgent, Workspace selector windows)
- WebSocket streaming with SSE fallback
- Live tool call display (MCP tool name, parameters, results)
- Phased display for Dual Agent mode (Think → Discuss → Synthesize → Execute)
Backend: helix-agent
A Go-based server built around the Workspace → Session → Agent → MCP Tools four-layer architecture.
- Each Workspace has an independent Session Manager and built-in MCP manager
- Session Engine supports streaming, tool call loops (max 100 iterations), and automatic retries
- Built-in KV cache (Pebble storage engine, SHA256 content-addressed deduplication)
- Worktree binding: write operations execute in isolated git worktree branches, protecting the main branch
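The content-addressed deduplication mentioned above can be sketched as follows. The in-memory map stands in for the Pebble storage engine, and the function names are illustrative; only the SHA256 content-addressing comes from the description above.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// store is a stand-in for the Pebble KV engine used by helix-agent.
var store = map[string][]byte{}

// put writes a value under its SHA256 digest, so identical tool
// outputs are stored exactly once (content-addressed deduplication).
func put(value []byte) (key string, deduped bool) {
	sum := sha256.Sum256(value)
	key = hex.EncodeToString(sum[:])
	if _, ok := store[key]; ok {
		return key, true // same content already present
	}
	store[key] = value
	return key, false
}

func main() {
	k1, _ := put([]byte("tool output"))
	k2, dup := put([]byte("tool output"))
	fmt.Println(k1 == k2, dup) // prints: true true
}
```

Because the key is derived from the content itself, repeated tool runs that produce identical output cost no additional storage.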
Model Adapter Layer
A unified Provider abstraction adapting four types of model providers:
- Anthropic Claude — supports extended thinking (default 32K token thinking budget)
- DeepSeek — passes through `reasoning_content` for cost-effective coding
- OpenAI — standard Chat Completions API
- OpenAI Responses API — next-generation interface for GPT-5.x series
Model routing supports both `providerId:modelId` exact specification and prefix-based auto-inference.
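Routing can be sketched like this. The two-form resolution (exact vs. prefix) is from the sentence above, but the specific prefix-to-provider mapping is an assumption for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// route resolves a model spec: "providerId:modelId" is an exact
// specification; a bare model ID falls back to prefix inference.
func route(spec string) (provider, model string) {
	if p, m, ok := strings.Cut(spec, ":"); ok {
		return p, m // exact providerId:modelId form
	}
	switch { // assumed prefix table, for illustration only
	case strings.HasPrefix(spec, "claude"):
		return "anthropic", spec
	case strings.HasPrefix(spec, "deepseek"):
		return "deepseek", spec
	case strings.HasPrefix(spec, "gpt"):
		return "openai", spec
	}
	return "", spec // unknown prefix: provider left unresolved
}

func main() {
	p, m := route("deepseek:deepseek-chat")
	fmt.Println(p, m) // prints: deepseek deepseek-chat
}
```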
Deployment Options
| Option | Best For | Highlights |
|---|---|---|
| Desktop app | Daily development (recommended) | Fastest onboarding, full local file/terminal/Workspace integration |
| Web + backend | Remote environments, team sharing, quick evaluation | Backend can run locally or remotely, Web supports PWA |
| Self-hosted backend | Enterprise intranets, compliance requirements | Full control over data and network |
Security and Privacy
- Provider API keys are user-controlled — Helix never holds them
- Workspace data is stored in local or self-hosted backend data directories
- No mandatory third-party data storage
- Self-hosting path available for strict compliance environments
Continue Reading
- Feature Overview — complete overview of core capabilities
- Multi-Agent Architecture — deep dive into three-layer collaboration
- Context Management — Cache + Compact dual strategy explained
- Workspace Architecture — Workspace isolation and lifecycle
- Multi-Model Support — model selection and switching strategies