Architecture Overview

Most AI coding tools are a chat box plus a model. Helix is not.

Helix is a workspace-centered multi-agent engineering system — it unifies your codebase, terminal, models, toolchain, and multi-agent orchestration into a single workflow, enabling AI to complete real software engineering tasks end-to-end, not just answer questions.

System Overview

┌─────────────────────────────────────────────────────┐
│ Helix UI (helix · Flutter Desktop / Web)            │
│ ├─ Workspace & session management                   │
│ ├─ SubAgent parallel status panel                   │
│ ├─ MCP tool execution live view                     │
│ └─ Model selector & Dual Agent mode                 │
└────────────────────┬────────────────────────────────┘
                     │ REST + WebSocket
                     ▼
┌─────────────────────────────────────────────────────┐
│ Helix Backend (helix-agent · Go)                    │
│ ├─ Workspace Manager — isolation + lifecycle        │
│ ├─ Session Engine — streaming + tool calls + retry  │
│ ├─ Multi-Agent — Manager / Execution / SubAgent     │
│ ├─ Context Engine — KV Cache + Compact compression  │
│ ├─ 12 built-in MCP servers — Shell / FS / LSP / …   │
│ └─ Provider adapters — DeepSeek / Claude / OpenAI   │
└─────────────────────────────────────────────────────┘

Four Design Principles

1. Keep user intent stable over long tasks

A single Agent often forgets the original goal after 30+ consecutive tool calls. Helix solves this with layered orchestration: a Manager Agent locks the objective, an Execution Agent drives implementation, and SubAgents handle deep subtasks. Each layer has a single responsibility, and the goal is protected at every node in the execution chain.

2. Make parallel execution observable

Many products let AI "think silently" in the background while users stare at a loading spinner. Helix exposes subtask creation, execution status, and intermediate results in the UI — you can see in real time which Agent is working on what, how it's progressing, and what it found.

3. Keep long sessions healthy

Tool-intensive conversations consume context at an alarming rate. Helix uses a Cache + Compact dual strategy: large tool outputs are stored in a KV cache and recalled on demand; old conversations are compressed into structured summaries. This lets a single session run for hours or even across days without needing to "start a new chat."
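The Compact half of that strategy can be sketched as a simple fold of older messages into one summary message. This is a toy illustration; `Message`, `compact`, and the retention policy are hypothetical names, not the actual helix-agent implementation:

```go
package main

import "fmt"

// Message is a minimal stand-in for a conversation turn.
type Message struct {
	Role, Content string
}

// compact replaces everything but the most recent `keep` messages
// with a single structured summary message produced by `summarize`.
func compact(history []Message, keep int, summarize func([]Message) string) []Message {
	if len(history) <= keep {
		return history // nothing to fold yet
	}
	old := history[:len(history)-keep]
	recent := history[len(history)-keep:]
	summary := Message{Role: "system", Content: summarize(old)}
	return append([]Message{summary}, recent...)
}

func main() {
	h := []Message{{"user", "a"}, {"assistant", "b"}, {"user", "c"}, {"assistant", "d"}}
	h = compact(h, 2, func(ms []Message) string {
		return fmt.Sprintf("summary of %d earlier messages", len(ms))
	})
	fmt.Println(len(h)) // 3: summary + 2 recent messages
}
```

In a real session the summarizer would be a model call and the trigger a token budget rather than a message count, but the shape of the transformation is the same.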

4. Strictly isolate projects

Every Workspace has its own session history, tool configuration, model selection, and MCP server instances. You can work on a frontend repo, a backend service, and an infrastructure project simultaneously — their contexts never cross-contaminate.


Multi-Agent Execution Model

Helix uses three execution roles to collaboratively complete tasks:

| Role | Responsibility | Execution Mode |
| --- | --- | --- |
| Manager Agent | Guards the user objective, validates completion quality | Throughout the task lifecycle |
| Execution Agent | Drives primary coding and tool workflow | Sequential in the main session |
| SubAgent | Runs focused subtasks in isolated context | Parallel execution, returns distilled summaries |

Key SubAgent design details (from the backend implementation):

  • Each SubAgent creates an independent sub-session (ID format: sub_{parentSessionID}_{timestamp}) with fully isolated context
  • To prevent recursive creation, SubAgents filter out three tools (run_subagent, create_worktree_binding, start_code_review) and exclude the builtin_subagent MCP server entirely
  • If a SubAgent's result exceeds 2,000 characters, the model generates a distilled summary of ≤ 500 characters; if summarization fails, the result is truncated instead
  • Multiple SubAgents run truly in parallel without interference — total wall-clock time equals that of the slowest one
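The fan-out described above can be sketched with goroutines: each subtask runs concurrently in its own isolated scope, and results longer than the cap are distilled (with truncation as the documented fallback). All names here are illustrative, not the real helix-agent API:

```go
package main

import (
	"fmt"
	"sync"
)

const maxSummaryLen = 500 // the documented cap on distilled summaries

// distill stands in for the summarization step; per the docs, a
// failed summarization falls back to truncation.
func distill(result string) string {
	if len(result) <= maxSummaryLen {
		return result
	}
	return result[:maxSummaryLen]
}

// runSubAgents executes every task in its own goroutine and waits
// for all of them, so wall-clock time tracks the slowest subtask.
func runSubAgents(tasks []string, run func(task string) string) []string {
	summaries := make([]string, len(tasks))
	var wg sync.WaitGroup
	for i, task := range tasks {
		wg.Add(1)
		go func(i int, task string) {
			defer wg.Done()
			summaries[i] = distill(run(task)) // isolated per goroutine
		}(i, task)
	}
	wg.Wait()
	return summaries
}

func main() {
	out := runSubAgents(
		[]string{"search multiple modules", "analyze security patterns"},
		func(task string) string { return "summary of: " + task },
	)
	fmt.Println(out)
}
```

The real implementation additionally creates a sub-session per goroutine and filters the tool list, but the concurrency shape is the same: fan out, wait, merge distilled summaries.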

Data and Interaction Flow

User sends a task
        │
        ▼
Backend streams model output over WebSocket
  ├─ Tool calls execute (filesystem / LSP / Git / terminal / MCP)
  ├─ SubAgents launch in parallel when needed
  │    ├─ SubAgent A: search multiple modules ──→ return summary
  │    └─ SubAgent B: analyze security patterns ──→ return summary
  ├─ Results merge back into the main session
  └─ Cache / Compact policies keep context healthy over time

Workspace: The Fundamental Execution Unit

A Workspace isn't just "opening a directory." In Helix, it's the fundamental boundary for state isolation:

  • Independent session history — each Workspace maintains its own conversations and context
  • Independent tool configuration — Shell, filesystem, LSP, and other MCP servers are instantiated per Workspace
  • Independent model & profile — different projects can use different models and system prompts
  • Local and remote targets — both local directories and remote servers can serve as workspaces
  • Lifecycle management — create → connect → lazy-load → disconnect → idle timeout (default 30 minutes) → auto-close

Typical usage:

  • Workspace A: Frontend iteration (Claude + Serena LSP)
  • Workspace B: Backend migration (DeepSeek + Remote SSH)
  • Workspace C: Infrastructure checks (GPT-4o + Tmux terminal)

Runtime Components

Frontend: helix

A cross-platform client built with Flutter, available as a macOS desktop app (recommended) or a web deployment.

  • Three-panel layout: session list + chat area + tool panel
  • Native multi-window support on desktop (chat, SubAgent, Workspace selector windows)
  • WebSocket streaming with SSE fallback
  • Live tool call display (MCP tool name, parameters, results)
  • Phased display for Dual Agent mode (Think → Discuss → Synthesize → Execute)

Backend: helix-agent

A Go-based server built around the Workspace → Session → Agent → MCP Tools four-layer architecture.

  • Each Workspace has an independent Session Manager and built-in MCP manager
  • Session Engine supports streaming, tool call loops (max 100 iterations), and automatic retries
  • Built-in KV cache (Pebble storage engine, SHA256 content-addressed deduplication)
  • Worktree binding: write operations execute in isolated git worktree branches, protecting the main branch
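The content-addressed deduplication mentioned above can be sketched with an in-memory map standing in for Pebble: identical payloads hash to the same SHA-256 key, so each distinct value is stored exactly once. Names are illustrative, not the actual cache API:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// KVCache is a toy content-addressed store; the real backend uses
// the Pebble storage engine instead of a map.
type KVCache struct {
	store map[string][]byte
}

func NewKVCache() *KVCache { return &KVCache{store: make(map[string][]byte)} }

// Put stores the value under its SHA-256 content hash and reports
// whether an identical value was already present (deduplicated).
func (c *KVCache) Put(value []byte) (key string, deduped bool) {
	sum := sha256.Sum256(value)
	key = hex.EncodeToString(sum[:])
	if _, ok := c.store[key]; ok {
		return key, true
	}
	c.store[key] = value
	return key, false
}

func main() {
	c := NewKVCache()
	k1, _ := c.Put([]byte("large tool output"))
	k2, dup := c.Put([]byte("large tool output"))
	fmt.Println(k1 == k2, dup) // true true
}
```

Content addressing is what makes the dedup free: the key is derived from the bytes themselves, so repeated tool outputs cost one stored copy no matter how often they recur in a session.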

Model Adapter Layer

A unified Provider abstraction that adapts four types of model providers:

  • Anthropic Claude — supports extended thinking (default 32K token thinking budget)
  • DeepSeek — passes through reasoning_content for cost-effective coding
  • OpenAI — standard Chat Completions API
  • OpenAI Responses API — next-generation interface for GPT-5.x series

Model routing supports both providerId:modelId exact specification and prefix-based auto-inference.
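That routing rule can be sketched as a small resolver: an exact `providerId:modelId` spec wins, otherwise the provider is inferred from the model-name prefix. The prefix table below is an illustrative guess, not the actual routing table:

```go
package main

import (
	"fmt"
	"strings"
)

// routeModel resolves a model spec to (provider, model).
// "providerId:modelId" is taken literally; a bare model name falls
// back to prefix-based inference.
func routeModel(spec string) (provider, model string) {
	if i := strings.Index(spec, ":"); i >= 0 {
		return spec[:i], spec[i+1:] // exact specification
	}
	switch { // hypothetical prefix table
	case strings.HasPrefix(spec, "claude"):
		return "anthropic", spec
	case strings.HasPrefix(spec, "deepseek"):
		return "deepseek", spec
	case strings.HasPrefix(spec, "gpt"):
		return "openai", spec
	}
	return "", spec // unknown prefix: let the caller decide
}

func main() {
	p, m := routeModel("deepseek:deepseek-chat")
	fmt.Println(p, m) // deepseek deepseek-chat
	p, m = routeModel("claude-sonnet-4")
	fmt.Println(p, m) // anthropic claude-sonnet-4
}
```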


Deployment Options

| Option | Best For | Highlights |
| --- | --- | --- |
| Desktop app | Daily development (recommended) | Fastest onboarding, full local file/terminal/Workspace integration |
| Web + backend | Remote environments, team sharing, quick evaluation | Backend can run locally or remotely; Web supports PWA |
| Self-hosted backend | Enterprise intranets, compliance requirements | Full control over data and network |

Security and Privacy

  • Provider API keys are user-controlled — Helix never holds them
  • Workspace data is stored in local or self-hosted backend data directories
  • No mandatory third-party data storage
  • Self-hosting path available for strict compliance environments
