Architecture Overview
Most AI coding tools are a chat box plus a model. Helix is not.
Helix is a workspace-centered multi-agent engineering system — it unifies your codebase, terminal, models, toolchain, and multi-agent orchestration into a single workflow, enabling AI to complete real software engineering tasks end-to-end, not just answer questions.
System Overview
┌────────────────────────────────────────────────────┐
│ Helix UI (helix · Flutter Desktop / Web)           │
│ ├─ Workspace & session management                  │
│ ├─ SubAgent parallel status panel                  │
│ ├─ MCP tool execution live view                    │
│ └─ Model selector & Dual Agent mode                │
└─────────────────────────┬──────────────────────────┘
                          │ REST + WebSocket
                          ▼
┌────────────────────────────────────────────────────┐
│ Helix Backend (helix-agent · Go)                   │
│ ├─ Workspace Manager — isolation + lifecycle       │
│ ├─ Session Engine — streaming + tool calls + retry │
│ ├─ Multi-Agent — Manager / Execution / SubAgent    │
│ ├─ Context Engine — KV Cache + Compact compression │
│ ├─ 12 built-in MCP servers — Shell / FS / LSP / …  │
│ └─ Provider adapters — DeepSeek / Claude / OpenAI  │
└────────────────────────────────────────────────────┘
Four Design Principles
1. Keep user intent stable over long tasks
A single Agent often forgets the original goal after 30+ consecutive tool calls. Helix solves this with layered orchestration: a Manager Agent locks the objective, an Execution Agent drives implementation, and SubAgents handle deep subtasks. Each layer has a single responsibility, and the goal is protected at every node in the execution chain.
2. Make parallel execution observable
Many products let AI "think silently" in the background while users stare at a loading spinner. Helix exposes subtask creation, execution status, and intermediate results in the UI — you can see in real time which Agent is working on what, how it's progressing, and what it found.
3. Keep long sessions healthy
Tool-intensive conversations consume context at an alarming rate. Helix uses a Cache + Compact dual strategy: large tool outputs are stored in a KV cache and recalled on demand; old conversations are compressed into structured summaries. This lets a single session run for hours or even across days without needing to "start a new chat."
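The Compact half of this strategy can be sketched as follows. This is a minimal illustration, not the actual helix-agent API: `compact`, `Message`, and the `summarize` callback are all illustrative stand-ins for the real summarization pipeline.

```go
package main

import "fmt"

// Message is a simplified conversation entry for this sketch.
type Message struct{ Role, Text string }

// compact keeps the most recent `keep` messages verbatim and folds
// everything older into a single structured-summary message.
// summarize stands in for the model call that produces the summary.
func compact(history []Message, keep int, summarize func([]Message) string) []Message {
	if len(history) <= keep {
		return history // nothing to compress yet
	}
	old, recent := history[:len(history)-keep], history[len(history)-keep:]
	out := []Message{{Role: "system", Text: summarize(old)}}
	return append(out, recent...)
}

func main() {
	h := make([]Message, 10)
	for i := range h {
		h[i] = Message{Role: "user", Text: fmt.Sprintf("msg %d", i)}
	}
	c := compact(h, 4, func(old []Message) string {
		return fmt.Sprintf("summary of %d earlier messages", len(old))
	})
	fmt.Println(len(c)) // 1 summary + 4 recent = 5
}
```

The key property is that compaction is lossy but bounded: the live context grows with recent activity only, while older turns collapse into a fixed-size summary.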
4. Strictly isolate projects
Every Workspace has its own session history, tool configuration, model selection, and MCP server instances. You can work on a frontend repo, a backend service, and an infrastructure project simultaneously — their contexts never cross-contaminate.
Multi-Agent Execution Model
Helix uses three execution roles to collaboratively complete tasks:
| Role | Responsibility | Execution Mode |
|---|---|---|
| Manager Agent | Guards user objective, validates completion quality | Throughout the task lifecycle |
| Execution Agent | Drives primary coding and tool workflow | Sequential in the main session |
| SubAgent | Runs focused subtasks in isolated context | Parallel execution, returns distilled summaries |
Key SubAgent design details (from the backend implementation):
- Each SubAgent creates an independent sub-session (ID format: `sub_{parentSessionID}_{timestamp}`) with fully isolated context
- To prevent recursive creation, SubAgents automatically filter out three tools (`run_subagent`, `create_worktree_binding`, and `start_code_review`) and exclude the entire `builtin_subagent` MCP server
- If a SubAgent's result exceeds 2,000 characters, AI automatically generates a distilled summary of ≤ 500 characters; if summarization fails, the result is truncated
- Multiple SubAgents run truly in parallel without interference — total wall-clock time equals that of the slowest subtask
Data and Interaction Flow
User sends a task
        │
        ▼
Backend streams model output over WebSocket
        │
        ├─ Tool calls execute (filesystem / LSP / Git / terminal / MCP)
        │
        ├─ SubAgents launch in parallel when needed
        │     ├─ SubAgent A: search multiple modules ──→ return summary
        │     └─ SubAgent B: analyze security patterns ──→ return summary
        │
        ├─ Results merge back into the main session
        │
        └─ Cache / Compact policies keep context healthy over time
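The parallel SubAgent step in this flow can be sketched with goroutines. `runParallel` is an illustrative name, not the actual backend API; the point is that SubAgents run concurrently and only their distilled summaries merge back into the main session.

```go
package main

import (
	"fmt"
	"sync"
)

// runParallel launches each subtask in its own goroutine and blocks
// until all summaries are back, so wall-clock time tracks the slowest
// subtask rather than the sum of all of them.
func runParallel(tasks map[string]func() string) map[string]string {
	var (
		mu        sync.Mutex
		wg        sync.WaitGroup
		summaries = make(map[string]string)
	)
	for name, task := range tasks {
		wg.Add(1)
		go func(name string, task func() string) {
			defer wg.Done()
			s := task() // isolated subtask work
			mu.Lock()
			summaries[name] = s // merge point: only the summary returns
			mu.Unlock()
		}(name, task)
	}
	wg.Wait()
	return summaries
}

func main() {
	out := runParallel(map[string]func() string{
		"search":   func() string { return "3 modules matched" },
		"security": func() string { return "no issues found" },
	})
	fmt.Println(len(out)) // prints 2
}
```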
Workspace: The Fundamental Execution Unit
A Workspace isn't just "opening a directory." In Helix, it's the fundamental boundary for state isolation:
- Independent session history — each Workspace maintains its own conversations and context
- Independent tool configuration — Shell, filesystem, LSP, and other MCP servers are instantiated per Workspace
- Independent model & profile — different projects can use different models and system prompts
- Local and remote targets — both local directories and remote servers can serve as workspaces
- Lifecycle management — create → connect → lazy-load → disconnect → idle timeout (default 30 minutes) → auto-close
Typical usage:
- Workspace A: Frontend iteration (Claude + Serena LSP)
- Workspace B: Backend migration (DeepSeek + Remote SSH)
- Workspace C: Infrastructure checks (GPT-4o + Tmux terminal)
Runtime Components
Frontend: helix
A cross-platform client built with Flutter, available as a macOS desktop app (recommended) or as a web deployment.
- Three-panel layout: session list + chat area + tool panel
- Native multi-window support on desktop (chat, SubAgent, Workspace selector windows)
- WebSocket streaming with SSE fallback
- Live tool call display (MCP tool name, parameters, results)
- Phased display for Dual Agent mode (Think → Discuss → Synthesize → Execute)
Backend: helix-agent
A Go-based server built around the Workspace → Session → Agent → MCP Tools four-layer architecture.
- Each Workspace has an independent Session Manager and built-in MCP manager
- Session Engine supports streaming, tool call loops (max 100 iterations), and automatic retries
- Built-in KV cache (Pebble storage engine, SHA256 content-addressed deduplication)
- Worktree binding: write operations execute in isolated git worktree branches, protecting the main branch
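The content-addressed deduplication mentioned above can be sketched as follows. The in-memory map stands in for the Pebble storage engine, and the function names are illustrative; only the SHA256 content-addressing comes from the description above.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// store is a stand-in for the Pebble KV engine used by helix-agent.
var store = map[string][]byte{}

// put writes a value under its SHA256 digest, so identical tool
// outputs are stored exactly once (content-addressed deduplication).
func put(value []byte) (key string, deduped bool) {
	sum := sha256.Sum256(value)
	key = hex.EncodeToString(sum[:])
	if _, ok := store[key]; ok {
		return key, true // same content already present
	}
	store[key] = value
	return key, false
}

func main() {
	k1, _ := put([]byte("tool output"))
	k2, dup := put([]byte("tool output"))
	fmt.Println(k1 == k2, dup) // prints: true true
}
```

Because the key is derived from the content itself, repeated tool runs that produce identical output cost no additional storage.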
Model Adapter Layer
A unified Provider abstraction adapting four types of model providers:
- Anthropic Claude — supports extended thinking (default 32K token thinking budget)
- DeepSeek — passes through `reasoning_content` for cost-effective coding
- OpenAI — standard Chat Completions API
- OpenAI Responses API — next-generation interface for GPT-5.x series
Model routing supports both `providerId:modelId` exact specification and prefix-based auto-inference.
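Routing can be sketched like this. The two-form resolution (exact vs. prefix) is from the sentence above, but the specific prefix-to-provider mapping is an assumption for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// route resolves a model spec: "providerId:modelId" is an exact
// specification; a bare model ID falls back to prefix inference.
func route(spec string) (provider, model string) {
	if p, m, ok := strings.Cut(spec, ":"); ok {
		return p, m // exact providerId:modelId form
	}
	switch { // assumed prefix table, for illustration only
	case strings.HasPrefix(spec, "claude"):
		return "anthropic", spec
	case strings.HasPrefix(spec, "deepseek"):
		return "deepseek", spec
	case strings.HasPrefix(spec, "gpt"):
		return "openai", spec
	}
	return "", spec // unknown prefix: provider left unresolved
}

func main() {
	p, m := route("deepseek:deepseek-chat")
	fmt.Println(p, m) // prints: deepseek deepseek-chat
}
```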
Deployment Options
| Option | Best For | Highlights |
|---|---|---|
| Desktop app | Daily development (recommended) | Fastest onboarding, full local file/terminal/Workspace integration |
| Web + backend | Remote environments, team sharing, quick evaluation | Backend can run locally or remotely, Web supports PWA |
| Self-hosted backend | Enterprise intranets, compliance requirements | Full control over data and network |
Security and Privacy
- Provider API keys are user-controlled — Helix never holds them
- Workspace data is stored in local or self-hosted backend data directories
- No mandatory third-party data storage
- Self-hosting path available for strict compliance environments
Continue Reading
- Feature Overview — complete overview of core capabilities
- Multi-Agent Architecture — deep dive into three-layer collaboration
- Context Management — Cache + Compact dual strategy explained
- Workspace Architecture — Workspace isolation and lifecycle
- Multi-Model Support — model selection and switching strategies