<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet type="text/xsl" href="rss.xsl"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Helix Blog</title>
        <link>https://helix.iqe.me/en/blog/</link>
        <description>Helix Blog</description>
        <lastBuildDate>Mon, 11 May 2026 00:00:00 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <item>
            <title><![CDATA[HelixVM: Stop Letting AI Agents Run Naked on Your Real Machine]]></title>
            <link>https://helix.iqe.me/en/blog/helixvm-intro/</link>
            <guid>https://helix.iqe.me/en/blog/helixvm-intro/</guid>
            <pubDate>Mon, 11 May 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[HelixVM puts AI agents inside a lightweight local VM sandbox — secure isolation without buying cloud resources, installing VMware, or learning virtualization.]]></description>
            <content:encoded><![CDATA[<p>What's really blocking AI agent adoption isn't model capability — it's the <strong>conflict between permission and safety</strong>. HelixVM exists to let agents work efficiently without putting your real machine at risk.</p>
<p><img decoding="async" loading="lazy" alt="HelixVM: an AI agent running inside a lightweight local VM sandbox" src="https://helix.iqe.me/en/assets/images/helixvm-agent-sandbox-c116977dfa39438bab09da66cd02fc24.png" width="1792" height="1024" class="img_ev3q"></p>
<p>The harder I think about AI agent products, the more I believe the central problem isn't "can the model write code?" — it's:</p>
<p><strong>Do you actually trust it enough to hand over the keys?</strong></p>
<p>Give it too little permission, and it has to stop and ask for approval on every step. Give it too much, and it can quietly delete files, corrupt your environment, or even break your system.</p>
<p>This isn't an abstract design question. To actually finish a task, an agent has to touch real things: read files, change code, install dependencies, run commands, start services, move directories, access local ports.</p>
<p>So we built <strong>HelixVM</strong> — not because the world needs another VM tool, but to answer this:</p>
<blockquote>
<p><strong>How do we let everyday users run a high-efficiency AI agent safely, without forcing them to learn virtualization, buy cloud resources, or tolerate constant approval prompts?</strong></p>
</blockquote>
<p>A quick naming note: internally you'll see project names like aiagent, agentui, and helix-vm. For end users we want the brand to be cleaner: <strong>Helix Agent, Helix, and HelixVM</strong>.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-aggressive-prompting-looks-safe--but-isnt-really">1. Aggressive prompting looks safe — but isn't really<a href="https://helix.iqe.me/en/blog/helixvm-intro/#1-aggressive-prompting-looks-safe--but-isnt-really" class="hash-link" aria-label="Direct link to 1. Aggressive prompting looks safe — but isn't really" title="Direct link to 1. Aggressive prompting looks safe — but isn't really" translate="no">​</a></h2>
<p><img decoding="async" loading="lazy" alt="Three permission models for AI agents" src="https://helix.iqe.me/en/assets/images/helixvm-permission-models-199c3c482550f21d324c66395d53fb04.png" width="1792" height="1024" class="img_ev3q"></p>
<p>The permission model in many agent products is essentially: every potentially risky action triggers a confirmation dialog and waits for the user to approve.</p>
<p>On paper, this is reasonable. Should you confirm a file read? A code change? Running a shell command? Installing a dependency? Deleting a file? The user should know — sure.</p>
<p>The problem is what happens once the agent actually starts working continuously. The prompts pile up fast, and the user quickly falls into a pattern:</p>
<ul>
<li class="">Prompt, click approve</li>
<li class="">Prompt again, click approve</li>
<li class="">Prompt again, keep approving</li>
<li class="">Eventually stop reading at all and just rubber-stamp everything</li>
</ul>
<p>At that point "user confirmation" stops meaning anything. It doesn't reduce risk, but it absolutely slows everything down.</p>
<p>Worse, this design quietly shifts the safety burden onto the user: <em>"I asked you. You clicked approve."</em> But can a normal user really judge whether each shell command, each file write, each dependency install is safe? Usually not.</p>
<p><strong>Constant approval is not safety. It's mostly the UI of safety.</strong></p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-full-permission-is-genuinely-dangerous">2. Full permission is genuinely dangerous<a href="https://helix.iqe.me/en/blog/helixvm-intro/#2-full-permission-is-genuinely-dangerous" class="hash-link" aria-label="Direct link to 2. Full permission is genuinely dangerous" title="Direct link to 2. Full permission is genuinely dangerous" translate="no">​</a></h2>
<p>The other extreme isn't an answer either.</p>
<p>If you grant the agent everything in the name of efficiency and let it execute freely on the host, the experience is glorious: no interruptions, no approvals, automatic code changes, automatic commands, automatic dependency installs, automatic cleanup.</p>
<p>This is exactly the kind of high-efficiency experience an agent should deliver — except it's all happening on your real machine.</p>
<p>Which means it can: delete your actual project files, corrupt your system environment, pollute your global dependencies, break a previously working dev setup, or run a destructive command in the wrong directory.</p>
<p>These aren't theoretical. Helix's early users hit them. Users of other agents have hit them too.</p>
<p>So we're stuck in a very concrete tension:</p>
<blockquote>
<p><strong>Efficiency means not asking every time. Safety means not letting the agent run naked on the host.</strong></p>
</blockquote>
<p>Most existing solutions swing between the two extremes. HelixVM tries a third path.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-cloud-containers-and-sandboxes--right-direction-wrong-fit-for-many-users">3. Cloud containers and sandboxes — right direction, wrong fit for many users<a href="https://helix.iqe.me/en/blog/helixvm-intro/#3-cloud-containers-and-sandboxes--right-direction-wrong-fit-for-many-users" class="hash-link" aria-label="Direct link to 3. Cloud containers and sandboxes — right direction, wrong fit for many users" title="Direct link to 3. Cloud containers and sandboxes — right direction, wrong fit for many users" translate="no">​</a></h2>
<p>For the last few years, every cloud vendor has been talking about containers, sandboxes, remote dev environments, isolated cloud execution. The direction is right. An agent running inside an isolated environment is safer than one running directly on the user's machine.</p>
<p>But cloud-based solutions usually have a hidden assumption:</p>
<blockquote>
<p><strong>Safety = move to the cloud = consume cloud resources = enter the vendor's infrastructure stack.</strong></p>
</blockquote>
<p>That's great business for cloud vendors. But for many individual users, indie devs, and small teams, it's not the most comfortable answer. It typically means extra budget, remote environments, network latency, cloud resource management, and migrating your workflow onto someone else's infrastructure.</p>
<p>A lot of users do care about safety — they just don't want to buy a cloud subscription, learn another console, and maintain a remote environment as a prerequisite. If safety requires "go cloud first," you've already excluded all the local-first users.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="4-traditional-local-vms-arent-the-answer-either">4. Traditional local VMs aren't the answer either<a href="https://helix.iqe.me/en/blog/helixvm-intro/#4-traditional-local-vms-arent-the-answer-either" class="hash-link" aria-label="Direct link to 4. Traditional local VMs aren't the answer either" title="Direct link to 4. Traditional local VMs aren't the answer either" translate="no">​</a></h2>
<p>So why not just spin up a local VM? VMware, VirtualBox, Parallels, or roll your own Linux with QEMU.</p>
<p>From a pure isolation standpoint, sure. But from a product experience standpoint, traditional VMs are too heavy.</p>
<p><strong>First, they demand a lot of resources.</strong> Traditional VMs eat memory, CPU, and disk. Most users don't want a permanent desktop VM running just so an agent can edit some code.</p>
<p><strong>Second, they demand a lot of knowledge.</strong> You need to understand images, disk layouts, networking, port forwarding, shared folders, system bootstrap, dependency installation. That's a lot for a normal user — and even people with a CS background often won't bother learning virtualization just to use an agent.</p>
<p><strong>Third, even if you know how, it's still annoying.</strong> Today everyone is spoiled by one-click products. You can't really ask users to install VM software, download images, configure networking, tune ports, and manually launch the agent — all in the name of safety.</p>
<p>If the cost of safety is significantly more complexity, most users will just abandon safety.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="5-so-helixvms-goal-make-local-vm-isolation-feel-invisible">5. So HelixVM's goal: make local VM isolation feel invisible<a href="https://helix.iqe.me/en/blog/helixvm-intro/#5-so-helixvms-goal-make-local-vm-isolation-feel-invisible" class="hash-link" aria-label="Direct link to 5. So HelixVM's goal: make local VM isolation feel invisible" title="Direct link to 5. So HelixVM's goal: make local VM isolation feel invisible" translate="no">​</a></h2>
<p>HelixVM's idea is simple:</p>
<blockquote>
<p><strong>Run the agent inside a lightweight local VM, but don't make the user feel the weight of a traditional VM.</strong></p>
</blockquote>
<p>Users don't need to know what QEMU is. They don't prepare images by hand. They don't configure port forwarding. They don't SSH into the VM to set up services. They don't buy a cloud server.</p>
<p>The experience should feel more like:</p>
<blockquote>
<p>Pick an environment image → click Create → wait for boot → land in a ready agent workspace.</p>
</blockquote>
<p>The isolation happens underneath. The complexity stays out of the user's face.</p>
<p>A traditional VM is a tool <em>for the user</em> to operate. HelixVM is an isolated runtime <em>for the agent</em> to live in. What the user actually cares about isn't "how do I configure the VM?" — it's "can my agent safely and efficiently get this done?"</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="6-how-it-works-in-helix-helix--helixvm--helix-agent">6. How it works in Helix: Helix + HelixVM + Helix Agent<a href="https://helix.iqe.me/en/blog/helixvm-intro/#6-how-it-works-in-helix-helix--helixvm--helix-agent" class="hash-link" aria-label="Direct link to 6. How it works in Helix: Helix + HelixVM + Helix Agent" title="Direct link to 6. How it works in Helix: Helix + HelixVM + Helix Agent" translate="no">​</a></h2>
<p><img decoding="async" loading="lazy" alt="How Helix, HelixVM, and Helix Agent work together" src="https://helix.iqe.me/en/assets/images/helixvm-architecture-flow-7ebf4873beebb869955d022d68510131.png" width="1792" height="1024" class="img_ev3q"></p>
<p>HelixVM isn't a single feature — it's an experience produced by several layers of Helix working together. From the user's perspective: pick a VM workspace, pick an image, create, then enter a ready agent workspace. Underneath, three layers cooperate.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="layer-1-helix--the-user-facing-experience">Layer 1: Helix — the user-facing experience<a href="https://helix.iqe.me/en/blog/helixvm-intro/#layer-1-helix--the-user-facing-experience" class="hash-link" aria-label="Direct link to Layer 1: Helix — the user-facing experience" title="Direct link to Layer 1: Helix — the user-facing experience" translate="no">​</a></h3>
<p>Helix is the product surface users actually see. It exposes the VM workspace entry, starts the local HelixVM control plane, lets the user pick an image, creates the VM, manages port mappings, waits for both VM and in-guest Helix Agent readiness, and finally drops the user into a usable workspace.</p>
<p>One thing worth emphasizing: HelixVM's control plane is local by default — you don't need to register a remote cloud console first.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="layer-2-helixvm--the-local-vm-control-plane">Layer 2: HelixVM — the local VM control plane<a href="https://helix.iqe.me/en/blog/helixvm-intro/#layer-2-helixvm--the-local-vm-control-plane" class="hash-link" aria-label="Direct link to Layer 2: HelixVM — the local VM control plane" title="Direct link to Layer 2: HelixVM — the local VM control plane" translate="no">​</a></h3>
<p>HelixVM is where all the messy VM details get wrapped up. It handles things the user shouldn't have to: the VM registry, template and downloaded image management, parsing bundles / disk images, allocating control / SSH / business ports, generating low-level VM launch plans, starting and stopping VMs, checking guest readiness, and cleaning up residual processes.</p>
<p>In other words, the user doesn't deal with QEMU directly. QEMU is still doing the heavy lifting underneath, but it's hidden behind HelixVM.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="layer-3-helix-agent--inside-the-guest">Layer 3: Helix Agent — inside the guest<a href="https://helix.iqe.me/en/blog/helixvm-intro/#layer-3-helix-agent--inside-the-guest" class="hash-link" aria-label="Direct link to Layer 3: Helix Agent — inside the guest" title="Direct link to Layer 3: Helix Agent — inside the guest" translate="no">​</a></h3>
<p>Once the VM is up, Helix's execution environment runs inside the guest, with Helix Agent at its core. It does the actual work: reading and writing the workspace, executing shell, running builds and tests, managing sessions, exposing the agent API, establishing trusted connections back to Helix.</p>
<p>What the user opens at the end isn't an abstract VM — it's a ready agent workspace.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="7-invisible-pairing-no-manual-pairing-codes">7. Invisible pairing: no manual pairing codes<a href="https://helix.iqe.me/en/blog/helixvm-intro/#7-invisible-pairing-no-manual-pairing-codes" class="hash-link" aria-label="Direct link to 7. Invisible pairing: no manual pairing codes" title="Direct link to 7. Invisible pairing: no manual pairing codes" translate="no">​</a></h2>
<p>A lot of local / self-hosted agent systems have one painful step: pairing. To make the UI trust the local agent, you usually have to start a service, find a pairing code, paste it into the client, wait for the binding, and save credentials.</p>
<p>HelixVM does it more smoothly. When the VM is created, Helix generates a one-time bootstrap secret and hands it to HelixVM as part of the launch. HelixVM injects this bootstrap data into the guest startup parameters. Once Helix Agent starts inside the guest, it reads the secret and treats it as the first-time binding credential.</p>
<p>Helix then calls Helix Agent's pairing endpoint with that secret. Helix Agent verifies three things: that the secret matches, that it has not expired, and that it has not already been consumed. On success, Helix Agent enters a bound state and issues a long-lived credential.</p>
<p>From the user's perspective, all they see is: <strong>the VM is ready, and the agent workspace is already connected.</strong></p>
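<p>To make the verification step concrete, here is a minimal sketch of a one-time secret check. It is an illustration under assumptions, not Helix Agent's actual code: the class name <code>BootstrapPairing</code>, the <code>ttl_seconds</code> parameter, and the choice to leave a secret usable after a failed attempt are all hypothetical.</p>

```python
import hmac
import secrets
import time


class BootstrapPairing:
    """Hypothetical one-time bootstrap secret: correct, unexpired, unconsumed."""

    def __init__(self, secret: str, ttl_seconds: float, now=time.monotonic):
        self._secret = secret
        self._now = now  # injectable clock, so expiry can be tested
        self._expires_at = now() + ttl_seconds
        self._consumed = False

    def pair(self, presented: str) -> str:
        """Return a long-lived credential, or raise PermissionError."""
        if self._consumed:
            raise PermissionError("bootstrap secret already consumed")
        if self._now() > self._expires_at:
            raise PermissionError("bootstrap secret expired")
        # Constant-time comparison avoids leaking the secret through timing.
        if not hmac.compare_digest(presented.encode(), self._secret.encode()):
            raise PermissionError("bootstrap secret mismatch")
        # Mark the secret as used only after a successful match.
        self._consumed = True
        return secrets.token_urlsafe(32)
```

<p>In this sketch a second call to <code>pair</code> with the same secret is rejected, which is the property that makes the secret "one-time".</p>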
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="8-port-forwarding--ready-checks-making-the-in-vm-agent-feel-local">8. Port forwarding + ready checks: making the in-VM agent feel local<a href="https://helix.iqe.me/en/blog/helixvm-intro/#8-port-forwarding--ready-checks-making-the-in-vm-agent-feel-local" class="hash-link" aria-label="Direct link to 8. Port forwarding + ready checks: making the in-VM agent feel local" title="Direct link to 8. Port forwarding + ready checks: making the in-VM agent feel local" translate="no">​</a></h2>
<p>The agent runs inside the VM. Helix runs on the host. They need to talk. HelixVM's default network mode is port forwarding: it allocates ports for control, SSH, and business services and forwards them to the guest.</p>
<p>Helix then runs health checks to confirm the in-guest Helix Agent is actually responding — not just that a VM process exists.</p>
<p>This matters. "VM running" only means the VM process started; it does not mean the agent inside is ready. So Helix waits for two layers of readiness:</p>
<ol>
<li class="">HelixVM reports guest control plane ready.</li>
<li class="">Helix Agent's health check inside the guest passes.</li>
</ol>
<p>That's why a HelixVM workspace looks simple to create, but the underlying flow is more than "spawn a VM process."</p>
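<p>The two-layer wait can be sketched as a small polling loop. This is an assumption-laden illustration rather than Helix's implementation: the function name <code>wait_until_ready</code>, the two probe callables, and the timeout and interval defaults are hypothetical stand-ins for whatever HelixVM and Helix Agent actually expose.</p>

```python
import time
from typing import Callable


def wait_until_ready(
    guest_control_ready: Callable[[], bool],  # layer 1: HelixVM's guest report
    agent_healthy: Callable[[], bool],        # layer 2: in-guest agent health check
    timeout_seconds: float = 60.0,
    poll_interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    now: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once both layers report ready, False if the deadline passes."""
    deadline = now() + timeout_seconds
    while True:
        # Short-circuit: the agent is only probed once the guest layer is up.
        if guest_control_ready() and agent_healthy():
            return True
        if now() >= deadline:
            return False
        sleep(poll_interval)
```

<p>Because the second probe is skipped until the first succeeds, a VM process that has started but whose guest is not yet up never counts as ready.</p>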
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="9-image-marketplace-a-safe-environment-shouldnt-start-from-scratch">9. Image marketplace: a safe environment shouldn't start from scratch<a href="https://helix.iqe.me/en/blog/helixvm-intro/#9-image-marketplace-a-safe-environment-shouldnt-start-from-scratch" class="hash-link" aria-label="Direct link to 9. Image marketplace: a safe environment shouldn't start from scratch" title="Direct link to 9. Image marketplace: a safe environment shouldn't start from scratch" translate="no">​</a></h2>
<p>Handing the user a blank VM isn't enough. What they actually need is a usable dev environment, not a Linux system waiting to be configured.</p>
<p>So HelixVM ships with a template / image marketplace. Users can pick an image that fits what they're doing, for example:</p>
<ul>
<li class="">Lightweight Linux + Helix Agent environment</li>
<li class="">Common dev toolchain environments</li>
<li class="">Browser-enabled automation environments</li>
<li class="">Environments tailored to specific languages or project types</li>
</ul>
<p>This is where the experience really clicks. "Safe isolated environment" stops being an ops task and becomes a product choice: <strong>What am I working on today? Pick the matching image.</strong></p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="10-for-the-first-time-safety-and-efficiency-arent-opposites">10. For the first time, safety and efficiency aren't opposites<a href="https://helix.iqe.me/en/blog/helixvm-intro/#10-for-the-first-time-safety-and-efficiency-arent-opposites" class="hash-link" aria-label="Direct link to 10. For the first time, safety and efficiency aren't opposites" title="Direct link to 10. For the first time, safety and efficiency aren't opposites" translate="no">​</a></h2>
<p>Helix has always been focused on high-efficiency agents. We don't want users to be interrupted constantly while the agent is working — the whole point of an agent is to chain a complex task end to end: search code, edit files, run tests, analyze errors, fix, verify, summarize.</p>
<p>If every step needs user approval, you lose most of that automation value. But we also don't want the agent running unrestricted on the host.</p>
<p>So HelixVM's role is very clear:</p>
<blockquote>
<p><strong>Get safety from an isolated runtime, not from constant approval prompts.</strong></p>
</blockquote>
<p>With HelixVM, the agent can move much more freely inside the VM. Even if it makes a mistake, the blast radius stays inside the virtual machine — not on your real system.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="11-helixvm-isnt-a-vm-feature--its-infrastructure-for-agent-products">11. HelixVM isn't a "VM feature" — it's infrastructure for agent products<a href="https://helix.iqe.me/en/blog/helixvm-intro/#11-helixvm-isnt-a-vm-feature--its-infrastructure-for-agent-products" class="hash-link" aria-label="Direct link to 11. HelixVM isn't a &quot;VM feature&quot; — it's infrastructure for agent products" title="Direct link to 11. HelixVM isn't a &quot;VM feature&quot; — it's infrastructure for agent products" translate="no">​</a></h2>
<p>The VM is just the mechanism. The real change is what it does to the agent's permission model.</p>
<p>Traditional agent products keep swinging between two bad options:</p>
<ul>
<li class=""><strong>Option A: constant approvals.</strong> Looks safe, but it fatigues the user, kills efficiency, and turns approval into a formality.</li>
<li class=""><strong>Option B: full permission.</strong> Efficient, but the agent is touching your host directly.</li>
</ul>
<p>HelixVM offers a third one:</p>
<ul>
<li class=""><strong>Option C: a highly permissive agent, inside an isolated environment.</strong> Inside the VM, the agent works fast. Outside the VM, the host still has a clear safety boundary.</li>
</ul>
<p>This is the shape of AI agents I think actually fits everyday users: <strong>safe but not annoying; efficient but not naked on your machine.</strong></p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="closing">Closing<a href="https://helix.iqe.me/en/blog/helixvm-intro/#closing" class="hash-link" aria-label="Direct link to Closing" title="Direct link to Closing" translate="no">​</a></h2>
<p>HelixVM isn't trying to teach users how to run a VM. The opposite — we want users to get VM-level isolation without ever needing to understand a VM.</p>
<p>What we want to leave them with is a simple loop:</p>
<blockquote>
<p>Pick an image. Click create. Let the agent work.</p>
</blockquote>
<p>No cloud server. No VMware install. No virtualization crash course. No frantic approval clicking. And no agent running naked on your real machine.</p>
<p><strong>We want AI agents to be more automated — but automation shouldn't cost you your real system.</strong> That's what HelixVM is for.</p>
<p>HelixVM is currently in private beta. If this resonates, come join the Helix beta and try it out.</p>]]></content:encoded>
            <category>helixvm</category>
            <category>sandbox</category>
            <category>vm</category>
            <category>security</category>
            <category>architecture</category>
            <category>deep-dive</category>
        </item>
        <item>
            <title><![CDATA[Automatic Worktree: Stop Letting Agents Run Around on Your Main Branch]]></title>
            <link>https://helix.iqe.me/en/blog/automatic-worktree/</link>
            <guid>https://helix.iqe.me/en/blog/automatic-worktree/</guid>
            <pubDate>Thu, 02 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Helix treats worktree isolation as a system-level constraint. Agents never touch main directly — every change lands on an isolated branch, passes a code review gate, and only then merges back as a single clean commit.]]></description>
            <content:encoded><![CDATA[<p><img decoding="async" loading="lazy" alt="Automatic Worktree — agents work on isolated branches while main stays clean" src="https://helix.iqe.me/en/assets/images/automatic-worktree-cover-55bdd0d5847b8a30339e312ffbedfbe6.png" width="1792" height="1024" class="img_ev3q"></p>
<blockquote>
<p>Every coding agent eventually writes to your repository.<br>
The question is: what branch does it write to?</p>
</blockquote>
<p>Most AI coding tools answer that question by making it your problem. <strong>Helix answers it at the system level, before the agent touches a single file.</strong></p>
<p>This is the third boundary in <a class="" href="https://helix.iqe.me/en/blog/introducing-helix/">Helix's multi-agent architecture</a>. Manager Mode guards the boundary of <em>intent</em>. <a class="" href="https://helix.iqe.me/en/blog/helixvm-intro/">HelixVM</a> guards the boundary of the <em>host machine</em>. Automatic Worktree guards the boundary that matters most to your code: <strong>the repository branch</strong>.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-writing-directly-to-main-the-part-agent-demos-quietly-skip">1. Writing directly to main: the part agent demos quietly skip<a href="https://helix.iqe.me/en/blog/automatic-worktree/#1-writing-directly-to-main-the-part-agent-demos-quietly-skip" class="hash-link" aria-label="Direct link to 1. Writing directly to main: the part agent demos quietly skip" title="Direct link to 1. Writing directly to main: the part agent demos quietly skip" translate="no">​</a></h2>
<p>When an AI agent edits files, it needs somewhere to put them. The path of least resistance is the working directory the user opened — which is usually main itself.</p>
<p>This creates a class of problems nobody talks about in agent demos:</p>
<ul>
<li class=""><strong>A task that dies halfway leaves main dirty.</strong> The agent started a refactor, got three files in, hit an error, and stopped. <code>git status</code> is a mess. The user has no clean way to tell what is safe to commit and what should be rolled back.</li>
<li class=""><strong>Concurrent tasks collide.</strong> Run multiple agents or sessions simultaneously, and they all write to the same checkout. File conflicts are unpredictable, hard to debug, and hard to attribute: you cannot always tell which session dirtied which file.</li>
<li class=""><strong>Rollback is painful.</strong> The agent made changes the user did not want, but also made changes the user did want. They are tangled together in the same working copy, with no clean boundary to revert.</li>
<li class=""><strong>There is no "review before merge".</strong> The code is already on main. Review becomes retroactive acknowledgement instead of a preventive gate.</li>
</ul>
<p>This problem is not unique to AI agents. It is a well-understood challenge in any parallel development workflow. Git's own answer is the worktree: a separate checkout of the repository, on its own branch and in its own directory, with changes kept isolated until they are deliberately merged.</p>
<p>The real question is: <strong>who creates and manages that worktree?</strong></p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-the-industrys-answer-opt-in-worktree--manual-merge">2. The industry's answer: opt-in worktree + manual merge<a href="https://helix.iqe.me/en/blog/automatic-worktree/#2-the-industrys-answer-opt-in-worktree--manual-merge" class="hash-link" aria-label="Direct link to 2. The industry's answer: opt-in worktree + manual merge" title="Direct link to 2. The industry's answer: opt-in worktree + manual merge" translate="no">​</a></h2>
<p><img decoding="async" loading="lazy" alt="Opt-in worktree versus system-level isolation" src="https://helix.iqe.me/en/assets/images/automatic-worktree-vs-others-684299cd4c0efff3eafaa636815e1d8a.png" width="1792" height="1024" class="img_ev3q"></p>
<p>Several coding agent tools have added worktree support. The pattern is consistent:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">User: enable worktree mode</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Tool: ok, worktrees are now enabled</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">User: run task</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Tool: creating worktree... done</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">User: review output</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">User: merge branch          ← manual step</span><br></span></code></pre></div></div>
<p>There are two structural problems with this shape:</p>
<p><strong>Worktrees are opt-in.</strong> Users have to know the feature exists and remember to turn it on. Forget — or decide a small task is not worth the ceremony — and the agent writes directly to the working copy again.</p>
<p><strong>Merge is always manual.</strong> The tool creates the worktree and supervises the agent, but bringing changes back to main is the user's job. That is fine for a single task. Across five concurrent tasks, or a workflow with dozens of daily agent runs, the manual merge cost compounds into real friction.</p>
<p>The direction is right. But "opt-in worktree with manual merge" still leaves the default path — the unconfigured, don't-think-about-it path — pointing straight at main.</p>
<p>And most accidental main-branch pollution starts exactly there: <em>"It's just a small task, I won't bother with a worktree."</em></p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-helixs-approach-worktree-as-a-system-constraint">3. Helix's approach: worktree as a system constraint<a href="https://helix.iqe.me/en/blog/automatic-worktree/#3-helixs-approach-worktree-as-a-system-constraint" class="hash-link" aria-label="Direct link to 3. Helix's approach: worktree as a system constraint" title="Direct link to 3. Helix's approach: worktree as a system constraint" translate="no">​</a></h2>
<p>In Helix, worktree isolation is not a feature users enable. It is an <strong>architectural constraint</strong> built into how agents interact with repositories. The rules are enforced at the system level, not asked for in a prompt.</p>
<p>Helix puts three hard rules around worktrees:</p>
<ol>
<li class="">
<p><strong>No binding, no write.</strong> Neither the Execution Agent nor any SubAgent can write to a git-tracked file on the current branch until a worktree binding exists for that repository. This is not a warning — the system rejects the write outright.</p>
</li>
<li class="">
<p><strong>Bindings are declared by the agent, not configured by the user.</strong> When the agent decides a task requires changes to a repository, it calls <code>create_worktree_binding</code> first. The system creates the worktree, generates an isolated branch, and returns the path the agent should work in — <strong>all of this happens before the agent touches a single file.</strong></p>
</li>
<li class="">
<p><strong>Merge is automatic.</strong> When the session completes, code review passes, and the changes are committed, the system automatically merges the worktree branch back to base, removes the worktree, and deletes the temporary branch. Users do not have to manage any of it.</p>
</li>
</ol>
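<p>The first rule, "no binding, no write", can be pictured as a guard that every write passes through. The sketch below is hypothetical: the <code>WorktreeBindings</code> class and its method names are invented for illustration, and it skips the check for whether a file is git-tracked.</p>

```python
from pathlib import Path


class WorktreeBindings:
    """Hypothetical per-session registry: project root to worktree path."""

    def __init__(self) -> None:
        self._bindings: dict[Path, Path] = {}

    def bind(self, project_root: str, worktree_path: str) -> None:
        # Recorded when the agent declares it needs to change a repository.
        self._bindings[Path(project_root)] = Path(worktree_path)

    def resolve_write(self, project_root: str, relative_file: str) -> Path:
        """Route a write into the worktree, or reject it if no binding exists."""
        worktree = self._bindings.get(Path(project_root))
        if worktree is None:
            # A hard rejection, not a warning the agent could ignore.
            raise PermissionError(
                f"no worktree binding for {project_root}; write rejected"
            )
        return worktree / relative_file
```

<p>The point of the design is that the rejection lives in the system, so it holds regardless of what the agent was prompted to do.</p>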
<p>The end-to-end flow, as seen by a user, looks like this:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">Agent: I need to write to /projects/my-repo</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Agent: create_worktree_binding(project="/projects/my-repo", task="add auth middleware")</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">System: worktree created</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        branch: aiagent/{session}/add-auth-middleware-a3f7c91d</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        base:   main</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Agent: [works entirely inside the worktree path]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Agent: [task complete, code review passed]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">System: stage → commit → switch to main → pull → no-ff merge → remove worktree → delete branch</span><br></span></code></pre></div></div>
<p>The agent never had access to main. Main was not dirty for a single moment during the task. The merge landed as a clean, traceable commit.</p>
<blockquote>
<p><strong>Isolation is not a toggle. It is the system's default shape.</strong></p>
</blockquote>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="4-the-session-lifecycle-from-binding-to-cleanup">4. The session lifecycle: from binding to cleanup<a href="https://helix.iqe.me/en/blog/automatic-worktree/#4-the-session-lifecycle-from-binding-to-cleanup" class="hash-link" aria-label="Direct link to 4. The session lifecycle: from binding to cleanup" title="Direct link to 4. The session lifecycle: from binding to cleanup" translate="no">​</a></h2>
<p><img decoding="async" loading="lazy" alt="Session lifecycle of an Automatic Worktree" src="https://helix.iqe.me/en/assets/images/automatic-worktree-lifecycle-37e42d758eb9369065f9c8e4e96fc0d1.png" width="1792" height="1024" class="img_ev3q"></p>
<p>Once a binding is in place, every write the agent performs is routed to the worktree path. The original checkout stays untouched.</p>
<p>In practical terms, <code>create_worktree_binding</code> does a few things:</p>
<ul>
<li class="">Walks up from the given path to find the Git repository root (the user's opened repo)</li>
<li class="">Reads the current branch — that becomes the merge target</li>
<li class="">Generates a branch name from the session ID and a sanitized task description, shaped like <code>aiagent/{session}/{task}-{hash}</code>, so any future change can be traced back to the session that produced it</li>
<li class="">Creates the worktree in a dedicated directory outside the repository</li>
<li class="">Records the binding — <code>{project_root} → {worktree_path, branch, base_branch}</code> — in session state</li>
</ul>
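<p>To make the naming step concrete, here is a minimal Python sketch of how a branch name of that shape could be built. The sanitizing rules, the hash input, and the names <code>sanitize_task</code>, <code>branch_name</code>, and <code>WorktreeBinding</code> are assumptions for illustration, not Helix's actual implementation.</p>

```python
import hashlib
import re
from dataclasses import dataclass


def sanitize_task(task: str, max_len: int = 40) -> str:
    """Lowercase the task and collapse non-alphanumeric runs into single hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", task.lower()).strip("-")
    return slug[:max_len].rstrip("-") or "task"


def branch_name(session_id: str, task: str) -> str:
    """Build aiagent/{session}/{task}-{hash}. What gets hashed is an assumption."""
    digest = hashlib.sha256(f"{session_id}:{task}".encode("utf-8")).hexdigest()[:8]
    return f"aiagent/{session_id}/{sanitize_task(task)}-{digest}"


@dataclass(frozen=True)
class WorktreeBinding:
    """The value side of the project_root -> binding record kept in session state."""
    worktree_path: str
    branch: str
    base_branch: str
```

<p>Calling <code>branch_name("s1", "Add auth middleware!")</code> yields a name that starts with <code>aiagent/s1/add-auth-middleware-</code> and ends in an eight-character hash suffix.</p>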
<p>When the session enters finalization, the system runs merge and cleanup in a fixed sequence:</p>
<ol>
<li class="">Stage and commit any uncommitted changes left in the worktree</li>
<li class="">Switch back to the base branch and <code>pull --ff-only</code> to pick up remote updates first</li>
<li class="">Perform a <strong>non-fast-forward merge</strong> of the worktree branch into base — preserving an explicit merge commit</li>
<li class="">Remove the worktree directory</li>
<li class="">Delete the temporary branch</li>
</ol>
<p>The non-fast-forward merge in step 3 is deliberate. Branch history stays intact in the git log, and every stretch of agent work shows up as a distinct merge node. Anyone reviewing, auditing, or reverting later has a clean boundary to operate on.</p>
<p>After all of that, main has one new merge commit. The session's intermediate state is gone. <strong>No half-staged files, no orphan branches, no leftover worktree directories.</strong></p>
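<p>The fixed sequence maps onto ordinary git commands. The sketch below only builds the command list; the function name <code>finalize_steps</code> and the exact flags are assumptions about how such a pass might look, not a copy of what Helix runs.</p>

```python
from typing import List, Tuple

Step = Tuple[str, List[str]]  # (directory to run in, git argv)


def finalize_steps(repo_root: str, worktree_path: str, branch: str,
                   base_branch: str, message: str) -> List[Step]:
    """Return the git commands for one merge-and-cleanup pass, in order."""
    return [
        # Stage and commit leftovers inside the worktree. A real runner would
        # skip the commit when there is nothing to commit.
        (worktree_path, ["git", "add", "--all"]),
        (worktree_path, ["git", "commit", "-m", message]),
        # Back in the original checkout: update base, then merge with a merge commit.
        (repo_root, ["git", "switch", base_branch]),
        (repo_root, ["git", "pull", "--ff-only"]),
        (repo_root, ["git", "merge", "--no-ff", "--no-edit", branch]),
        # Cleanup: remove the worktree, then the now-merged branch.
        (repo_root, ["git", "worktree", "remove", worktree_path]),
        (repo_root, ["git", "branch", "-d", branch]),
    ]
```

<p>Each step could be executed with <code>subprocess.run(argv, cwd=directory, check=True)</code>, stopping at the first failure so that the worktree and branch are still there to inspect. <code>git switch</code> needs Git 2.23 or later; <code>git checkout</code> does the same job on older versions.</p>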
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="5-cross-repo-sessions-one-binding-per-repository">5. Cross-repo sessions: one binding per repository<a href="https://helix.iqe.me/en/blog/automatic-worktree/#5-cross-repo-sessions-one-binding-per-repository" class="hash-link" aria-label="Direct link to 5. Cross-repo sessions: one binding per repository" title="Direct link to 5. Cross-repo sessions: one binding per repository" translate="no">​</a></h2>
<p><img decoding="async" loading="lazy" alt="One session, isolated work across multiple repositories" src="https://helix.iqe.me/en/assets/images/automatic-worktree-multi-repo-6110c9eb9a3cc74caea54491536ed3c7.png" width="1792" height="1024" class="img_ev3q"></p>
<p>Real engineering tasks rarely fit inside a single repository. A gRPC migration may touch backend, frontend, and a shared library. An analytics event addition may need both an app and a tracking SDK update.</p>
<p>Helix accounts for this at the worktree layer itself: <strong>one session can hold multiple worktree bindings — one per repository.</strong></p>
<p>The session state holds a map of <code>project_root → {repo_path, worktree_path, branch, base_branch}</code>. Every repository gets:</p>
<ul>
<li class="">its own isolated branch</li>
<li class="">its own worktree directory</li>
<li class="">its own base branch (each repo's main might be named differently)</li>
</ul>
<p>At finalization, the system processes each binding in turn: merge, then clean up. If a particular repository's merge fails, the error is surfaced explicitly and that repository's worktree is preserved for human inspection — but <strong>repositories that have already merged successfully are not dragged into the failure.</strong></p>
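<p>A rough Python sketch of that per-binding isolation follows. <code>merge_one</code> is a hypothetical callback standing in for one repository's merge and cleanup, and continuing on to later repositories after a failure is an assumption; the only behavior taken from the description above is that earlier successes are left alone and the failure is reported.</p>

```python
from typing import Callable, Dict, List, NamedTuple


class FinalizeReport(NamedTuple):
    merged: List[str]        # project roots that merged and were cleaned up
    failed: Dict[str, str]   # project root -> error message, worktree left in place


def finalize_all(project_roots: List[str],
                 merge_one: Callable[[str], None]) -> FinalizeReport:
    """Finalize each binding in turn; a failure never undoes an earlier success."""
    merged: List[str] = []
    failed: Dict[str, str] = {}
    for root in project_roots:
        try:
            merge_one(root)      # hypothetical: merge + cleanup for one repository
            merged.append(root)
        except Exception as exc:
            # Record the error and leave this repository's worktree untouched.
            failed[root] = str(exc)
    return FinalizeReport(merged, failed)
```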
<p>This is what makes cross-repo tasks actually tractable. A single session can span five repositories without dirtying any of them; at the end, each repository receives its own clean merge commit.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="6-working-with-manager-mode-parallel-subagents-inside-a-shared-boundary">6. Working with Manager Mode: parallel SubAgents inside a shared boundary<a href="https://helix.iqe.me/en/blog/automatic-worktree/#6-working-with-manager-mode-parallel-subagents-inside-a-shared-boundary" class="hash-link" aria-label="Direct link to 6. Working with Manager Mode: parallel SubAgents inside a shared boundary" title="Direct link to 6. Working with Manager Mode: parallel SubAgents inside a shared boundary" translate="no">​</a></h2>
<p><img decoding="async" loading="lazy" alt="Execution Agent and SubAgents share a single worktree boundary" src="https://helix.iqe.me/en/assets/images/automatic-worktree-manager-mode-cadcb5c00ffac4d2fa149995807f0f91.png" width="1792" height="1024" class="img_ev3q"></p>
<p>The worktree system becomes significantly more powerful when combined with <a class="" href="https://helix.iqe.me/en/blog/manager-mode/">Manager Mode</a>'s parallel SubAgent execution.</p>
<p>In Manager Mode the Execution Agent can dispatch multiple SubAgents to run concurrently. Each SubAgent has its own context, its own tool calls, its own LLM interaction. <strong>Without worktree isolation, parallel SubAgents writing to the same repository would collide instantly.</strong></p>
<p>With worktree isolation, the picture is different:</p>
<ul>
<li class="">The worktree is created by the <strong>top-level Execution Agent</strong> before any SubAgent is dispatched.</li>
<li class="">All SubAgents work <strong>in the same worktree path</strong> — that path is the shared isolation boundary.</li>
<li class="">SubAgents <strong>cannot create their own worktree bindings</strong>. <code>create_worktree_binding</code> is filtered out of the SubAgent tool list at the system level.</li>
<li class="">They inherit the worktree context that was set up before they were dispatched, but they cannot change it.</li>
</ul>
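<p>Filtering a tool out "at the system level" can be as simple as never offering it. A minimal sketch, assuming tools are identified by name; <code>read_file</code> and <code>edit_file</code> are made-up tool names, and including <code>run_subagent</code> in the blocked set is an assumption:</p>

```python
from typing import Iterable, List

# Tools assumed to be reserved for the top-level Execution Agent.
EXECUTION_ONLY_TOOLS = frozenset({"create_worktree_binding", "run_subagent"})


def subagent_tools(all_tools: Iterable[str]) -> List[str]:
    """Return the tool names a SubAgent is offered, preserving order."""
    return [name for name in all_tools if name not in EXECUTION_ONLY_TOOLS]


# Example with hypothetical tool names:
offered = subagent_tools(
    ["read_file", "create_worktree_binding", "edit_file", "run_subagent"]
)
```

<p>Because the SubAgent never sees the tool in its list, there is nothing for a prompt to talk it into calling.</p>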
<p>In other words, no matter how many SubAgents run in parallel — ten, twenty — <strong>the Execution Agent remains the single point of coordination for repository state.</strong> Manager Mode guards the boundary of intent; Automatic Worktree guards the boundary of physical writes. Together they make "a fleet of agents working in one repository without hurting each other" something the user no longer has to think about.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="7-the-code-review-gate-skip-review-skip-merge">7. The code review gate: skip review, skip merge<a href="https://helix.iqe.me/en/blog/automatic-worktree/#7-the-code-review-gate-skip-review-skip-merge" class="hash-link" aria-label="Direct link to 7. The code review gate: skip review, skip merge" title="Direct link to 7. The code review gate: skip review, skip merge" translate="no">​</a></h2>
<p>Worktree finalization — merge plus cleanup — is gated on a passing code review. The session cannot complete and merge until that gate flips green.</p>
<p>This is not a prompt instruction asking the agent to review its work. It is a state machine check inside the session: a "review passed" flag must be set, or the finalize call refuses to proceed. The merge does not happen.</p>
<p>In practice, the workflow is forced into this exact order:</p>
<ol>
<li class="">The agent decides it is done with the task</li>
<li class="">A code review is triggered</li>
<li class="">Review passes → the flag is set; review fails → the session continues working</li>
<li class="">Changes are summarized</li>
<li class="">The worktree branch is merged to the base branch</li>
</ol>
<p>Skipping the review is not a way to merge faster. <strong>Skipping review means skipping merge, period.</strong> The two are tied together at the system level. If you want to bypass review, you have to give up the merge — and the worktree stays quietly in its isolated directory waiting for human attention.</p>
<p>This turns "review before merge" from a good practice into a path the agent cannot work around.</p>
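<p>As a sketch of what a "state machine check" means here, the Python below models the gate as a flag that <code>finalize</code> refuses to ignore. The class and method names are hypothetical, and the real merge is replaced by a stand-in.</p>

```python
class ReviewNotPassed(RuntimeError):
    """Raised when finalize is requested before a review has passed."""


class Session:
    def __init__(self) -> None:
        self.review_passed = False
        self.merged = False

    def record_review(self, passed: bool) -> None:
        # A failed review leaves the flag unset, so the session keeps working.
        self.review_passed = passed

    def finalize(self) -> None:
        if not self.review_passed:
            raise ReviewNotPassed("code review has not passed; merge refused")
        self.merged = True  # stand-in for the real merge and cleanup
```

<p>The point of the design is that the check lives in <code>finalize</code> itself, so no caller, agent or otherwise, has a path to the merge that skips it.</p>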
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="8-failure-and-cleanup-an-explicit-destructive-boundary">8. Failure and cleanup: an explicit destructive boundary<a href="https://helix.iqe.me/en/blog/automatic-worktree/#8-failure-and-cleanup-an-explicit-destructive-boundary" class="hash-link" aria-label="Direct link to 8. Failure and cleanup: an explicit destructive boundary" title="Direct link to 8. Failure and cleanup: an explicit destructive boundary" translate="no">​</a></h2>
<p>Worktree creation and merge can both fail partway: the directory exists but the branch was never created, or the session is interrupted before finalize runs. Helix splits failure handling into two clearly separated paths:</p>
<p><strong>Explicit failure during finalize.</strong> The error is surfaced as-is, the session is not marked complete, and the worktree and branch both stay intact. The user can inspect the state, fix things manually, and retry. This is the "nothing is broken, it just hasn't merged yet" path.</p>
<p><strong>Abandoning without merge.</strong> When a session is being given up — task cancelled, error makes the changes unwanted — the system invokes an explicit "best-effort cleanup": remove the worktree directory, and <strong>force-delete</strong> the unmerged branch.</p>
<p>The force-delete is intentional. The normal <code>git branch -d</code> refuses to delete an unmerged branch, which is a protection in regular development. On the "discard agent work-in-progress" path, that protection becomes an obstacle. So Helix opts in to destructive deletion only on this specific path.</p>
<blockquote>
<p>This path is explicit about what it is: <strong>it represents discarded work.</strong></p>
</blockquote>
<p>With both paths clearly separated, user expectations become predictable. Merge failure leaves things intact and recoverable. Abandonment leaves the repository clean and free of residue. The two never cross-contaminate.</p>
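<p>For the abandonment path, the git side might look like the sketch below. The function name is hypothetical, and using <code>--force</code> on the worktree removal is an assumption; the <code>-D</code> flag is the standard way to delete a branch that was never merged.</p>

```python
from typing import List


def abandon_steps(worktree_path: str, branch: str) -> List[List[str]]:
    """Git commands to discard a session's work; run from the repository root."""
    return [
        # --force removes the worktree even if it holds uncommitted changes.
        ["git", "worktree", "remove", "--force", worktree_path],
        # -D deletes the branch even though it was never merged.
        ["git", "branch", "-D", branch],
    ]
```

<p>Contrast this with the finalize path, where plain <code>git branch -d</code> is enough because the branch has already been merged.</p>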
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="9-why-automatic-beats-opt-in">9. Why automatic beats opt-in<a href="https://helix.iqe.me/en/blog/automatic-worktree/#9-why-automatic-beats-opt-in" class="hash-link" aria-label="Direct link to 9. Why automatic beats opt-in" title="Direct link to 9. Why automatic beats opt-in" translate="no">​</a></h2>
<p>The case for opt-in worktree is "give users control" — skip the worktree overhead on small tasks.</p>
<p>Turn that around: <strong>"small tasks" are exactly where most accidental main-branch pollution starts.</strong></p>
<p>The task looked small. The user did not bother to set up worktree. The agent did nine things the user expected and one thing the user did not. Now the user is untangling a mixed-up working copy with no clean boundary to revert.</p>
<p>Helix's position is that the overhead of worktree isolation is now low enough — creating a worktree is a fast git operation, cleanup is automatic, branch naming is automatic — that the tradeoff is worth making unconditionally. The constraint removes an entire category of repository-state problems from the user's mental load.</p>
<p>You don't enable worktree isolation. You don't remember which tasks to enable it for. <strong>It's simply how Helix works.</strong></p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="10-what-users-actually-get">10. What users actually get<a href="https://helix.iqe.me/en/blog/automatic-worktree/#10-what-users-actually-get" class="hash-link" aria-label="Direct link to 10. What users actually get" title="Direct link to 10. What users actually get" translate="no">​</a></h2>
<p>Translated into day-to-day experience, Helix's Automatic Worktree system means:</p>
<ul>
<li class=""><strong>Main is never dirty from agent work.</strong> In-progress tasks always live in isolated branches and separate directories.</li>
<li class=""><strong>Parallel sessions don't interfere.</strong> Each session has its own worktree and its own branch. Five concurrent sessions, five completely independent working copies.</li>
<li class=""><strong>Merge stops being a manual step.</strong> When a task completes and clears review, the change lands on main as a proper merge commit.</li>
<li class=""><strong>History is clean and traceable.</strong> Every agent-driven change appears as a distinct merge commit. Branch names encode session ID and task description — any change traces back to the session that produced it.</li>
<li class=""><strong>Rollback is unambiguous.</strong> If a session produced changes you don't want, the merge commit itself is a clean revert target. No need to untangle half-finished file edits.</li>
<li class=""><strong>Cross-repo tasks are a first-class citizen.</strong> One session gracefully handles changes across multiple repositories, each receiving its own clean merge commit.</li>
</ul>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="11-how-to-use-it">11. How to use it<a href="https://helix.iqe.me/en/blog/automatic-worktree/#11-how-to-use-it" class="hash-link" aria-label="Direct link to 11. How to use it" title="Direct link to 11. How to use it" translate="no">​</a></h2>
<p>Worktree isolation is <strong>enabled by default in every Helix session</strong>. There is nothing to configure.</p>
<p>Open a session, run a task that writes to a repository, and the agent will set up the worktree before writing; when the task completes, merge and cleanup happen automatically. From the user's perspective, the experience is just "tell an AI coworker what to do, watch the result land on main" — every isolation step in between is invisible.</p>
<p>If you want to observe the behavior directly:</p>
<ol>
<li class="">Open a Helix session in a workspace with a git repository</li>
<li class="">Run any task that writes to repository files</li>
<li class="">While it runs, peek at <code>~/.aiagent/worktree/</code> — you'll see the isolated working copy</li>
<li class="">When the task completes, the worktree is gone and the changes are on main as a merge commit</li>
</ol>
<p>For complex tasks that span multiple repositories or need parallel execution, combine Automatic Worktree with <a class="" href="https://helix.iqe.me/en/blog/manager-mode/">Manager Mode</a> to get the full benefit of multiple SubAgents working safely inside one isolation boundary.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="12-whats-coming">12. What's coming<a href="https://helix.iqe.me/en/blog/automatic-worktree/#12-whats-coming" class="hash-link" aria-label="Direct link to 12. What's coming" title="Direct link to 12. What's coming" translate="no">​</a></h2>
<p>A few worktree workflow improvements are in flight:</p>
<ul>
<li class=""><strong>Worktree inspection UI</strong> — view active worktrees, their branches, and pending changes directly from the session panel, without dropping to a terminal.</li>
<li class=""><strong>Selective merge</strong> — approve or reject individual commits from a session before they land on main.</li>
<li class=""><strong>Cross-session worktree sharing</strong> — let related sessions share a worktree boundary for coordinated multi-session work.</li>
<li class=""><strong>Conflict resolution tooling</strong> — a better UI for cases where automatic merge fails and human intervention is required.</li>
</ul>
<p>One core principle is not going to change:</p>
<blockquote>
<p><strong>Agents work on isolated branches. Main only receives deliberate, reviewed merges.</strong></p>
</blockquote>
<p>This is one of the inevitable consequences of designing agents as an engineering system rather than as a chat box. Together with <a class="" href="https://helix.iqe.me/en/blog/manager-mode/">Manager Mode's goal-keeping</a> and <a class="" href="https://helix.iqe.me/en/blog/helixvm-intro/">HelixVM's execution boundary</a>, Automatic Worktree forms Helix's answer to a deceptively simple question: <strong>when an agent is genuinely doing the work for you, who is keeping its boundaries?</strong></p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="try-it-out">Try it out<a href="https://helix.iqe.me/en/blog/automatic-worktree/#try-it-out" class="hash-link" aria-label="Direct link to Try it out" title="Direct link to Try it out" translate="no">​</a></h2>
<ul>
<li class="">🚀 <a class="" href="https://helix.iqe.me/en/app/">Launch the web app</a></li>
<li class="">💾 <a class="" href="https://helix.iqe.me/en/download/">Download the desktop client</a></li>
<li class="">📖 <a class="" href="https://helix.iqe.me/en/docs/">Quickstart docs</a></li>
<li class="">🧩 <a class="" href="https://helix.iqe.me/en/blog/manager-mode/">Deep dive into Manager Mode</a></li>
<li class="">🔒 <a class="" href="https://helix.iqe.me/en/blog/helixvm-intro/">Deep dive into HelixVM</a></li>
<li class="">🌿 <a class="" href="https://helix.iqe.me/en/blog/introducing-helix/">Back to the Helix overview</a></li>
</ul>]]></content:encoded>
            <category>deep-dive</category>
            <category>git</category>
            <category>worktree</category>
            <category>architecture</category>
            <category>parallel</category>
        </item>
        <item>
            <title><![CDATA[Manager Mode: How Helix Keeps AI Agents on Track for Real Engineering Tasks]]></title>
            <link>https://helix.iqe.me/en/blog/manager-mode/</link>
            <guid>https://helix.iqe.me/en/blog/manager-mode/</guid>
            <pubDate>Wed, 01 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Manager Mode is Helix's multi-agent orchestration architecture for complex engineering work — a Manager, an Execution Agent, and parallel SubAgents with strict separation of duties, scope locking, and evidence-based completion.]]></description>
            <content:encoded><![CDATA[<p><img decoding="async" loading="lazy" alt="Manager Mode — keeping AI agents on track in long-running engineering tasks" src="https://helix.iqe.me/en/assets/images/manager-mode-cover-0430f9693a65ab5de1f1fd037ceb072e.png" width="1792" height="1024" class="img_ev3q"></p>
<blockquote>
<p>Most AI agents fail not because they lack capability — they fail because they <strong>drift</strong>.</p>
</blockquote>
<p>They start with your intent, pick up momentum, and end up doing twelve things you never asked for. Or they declare success before anything actually ships.</p>
<p><strong>Manager Mode</strong> in Helix is the architectural answer to that problem. It is not a longer prompt or a cleverer system message. It is a real multi-agent orchestration layer implemented at the system level — one agent guards intent, one agent runs the work, and several SubAgents go deep in parallel, with strict separation of duties and mutual constraint.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-the-drift-problem-nobody-talks-about">1. The drift problem nobody talks about<a href="https://helix.iqe.me/en/blog/manager-mode/#1-the-drift-problem-nobody-talks-about" class="hash-link" aria-label="Direct link to 1. The drift problem nobody talks about" title="Direct link to 1. The drift problem nobody talks about" translate="no">​</a></h2>
<p><img decoding="async" loading="lazy" alt="Scope drift — single-agent chaos vs Manager Mode scope lock" src="https://helix.iqe.me/en/assets/images/manager-mode-scope-drift-d66c23c09a8b4a76887c1ff586429d0c.png" width="1792" height="1024" class="img_ev3q"></p>
<p>You ask an AI to "refactor the authentication module." Three minutes later it has:</p>
<ul>
<li class="">refactored authentication ✓</li>
<li class="">"improved" a bunch of unrelated utilities</li>
<li class="">changed the error handling convention across five files</li>
<li class="">added a new dependency it thought was "clearly better"</li>
<li class="">written a summary explaining why all of this was necessary</li>
</ul>
<p>The core task might be done. But now you have a diff that touches forty files, your code review is a nightmare, and you have no idea what actually changed versus what the agent decided to change on its own.</p>
<p>This is <strong>scope drift</strong>. It happens because a single-agent system has no separation between <em>understanding what was asked</em> and <em>executing what it thinks is needed</em>. Those two things collapse into one thread with no guardrails.</p>
<p>There is a dual problem that is just as common — <strong>premature completion</strong>. The agent writes a tidy summary: "I've finished refactoring the authentication module with changes X, Y, Z." You go check the repository and find no commit in <code>git log</code>, no merge to main, no test run at all.</p>
<p>Drift and premature completion look like two different bugs. The root cause is the same: <strong>no independent role is responsible for "what did the user originally ask?" and "is this task actually done?".</strong></p>
<p>Manager Mode solves this with a three-layer architecture where intent preservation, task execution, and parallel subtask handling are handled by separate, specialized agents.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-what-is-manager-mode">2. What is Manager Mode<a href="https://helix.iqe.me/en/blog/manager-mode/#2-what-is-manager-mode" class="hash-link" aria-label="Direct link to 2. What is Manager Mode" title="Direct link to 2. What is Manager Mode" translate="no">​</a></h2>
<p>Manager Mode is Helix's orchestration layer for complex, multi-step engineering work.</p>
<p>When you enable it, your session gains a <strong>Manager Agent</strong> that sits between you and execution. The Manager does not write code. It does not run tools. Its job is to:</p>
<ol>
<li class="">Receive your request and forward it to the Execution Agent — faithfully, without modification</li>
<li class="">Verify that what actually got done matches what you actually asked for</li>
<li class="">Enforce a strict definition of "done" that includes commit, merge, verification, and clean workspace</li>
<li class=""><strong>Refuse</strong> to call anything complete until all five criteria are met — with evidence</li>
</ol>
<p>The Execution Agent handles the real work, breaking tasks into subtasks and running them in parallel using SubAgents. But it always operates under the Manager's scope constraints.</p>
<p>Think of it as having a <strong>technical project manager</strong> and a <strong>senior engineer</strong> on every task, where the project manager's only job is to make sure the engineer doesn't go off-script. That role split has worked in real engineering teams for decades; Helix transplants it verbatim into the AI agent system.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-three-layer-architecture">3. Three-layer architecture<a href="https://helix.iqe.me/en/blog/manager-mode/#3-three-layer-architecture" class="hash-link" aria-label="Direct link to 3. Three-layer architecture" title="Direct link to 3. Three-layer architecture" translate="no">​</a></h2>
<p><img decoding="async" loading="lazy" alt="Helix Manager Mode three-layer architecture — Manager / Execution / SubAgent" src="https://helix.iqe.me/en/assets/images/manager-mode-three-layer-53d56f2a7dac78f0db986bfca6f691b6.png" width="1792" height="1024" class="img_ev3q"></p>
<p>Here is how the three layers interact:</p>
<ul>
<li class=""><strong>You</strong> submit a task via chat or UI.</li>
<li class=""><strong>Manager Agent</strong> locks the original intent as a baseline, forwards the task verbatim to the Execution Agent, and checks completion against five criteria: implementation, commit, merge, post-merge verification, and a clean workspace with no scope creep. It demands evidence, not summaries.</li>
<li class=""><strong>Execution Agent</strong> is the layer that actually does the work: plans and implements, decomposes work, manages all tool calls (file edits, shell, LSP, MCP), and reports back with verifiable evidence.</li>
<li class=""><strong>SubAgent A / B / C</strong> are parallel execution units dispatched by the Execution Agent via <code>run_subagent</code>. Each one focuses on a single independent task and all of them share the Execution Agent's git worktree.</li>
</ul>
<p>Because every SubAgent operates in the same working copy of the repository, the Execution Agent coordinates all file changes and, when the task is complete, commits and merges the result back to the main branch.</p>
<p><strong>SubAgents cannot spawn their own SubAgents.</strong> This is intentional. Unbounded recursion in agent systems leads to unpredictable resource usage and hard-to-trace execution paths. The three-layer limit is enforced <strong>at the system level</strong>, not as a polite reminder in a prompt.</p>
<blockquote>
<p>The key design stance: <strong>no single agent simultaneously owns "defining the goal" and "executing the goal."</strong> The PM does not write code, and the engineer does not change the requirements. Real engineering teams call this "separation of duties." In AI agents, we call it Manager Mode.</p>
</blockquote>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="4-core-mechanisms">4. Core mechanisms<a href="https://helix.iqe.me/en/blog/manager-mode/#4-core-mechanisms" class="hash-link" aria-label="Direct link to 4. Core mechanisms" title="Direct link to 4. Core mechanisms" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="41-scope-locking">4.1 Scope locking<a href="https://helix.iqe.me/en/blog/manager-mode/#41-scope-locking" class="hash-link" aria-label="Direct link to 4.1 Scope locking" title="Direct link to 4.1 Scope locking" translate="no">​</a></h3>
<p>When the Manager Agent receives your request, it records your original intent as a <strong>baseline</strong>. Every subsequent action by the Execution Agent is evaluated against this baseline.</p>
<p>The rules are strict:</p>
<ul>
<li class="">"Improvements" that weren't requested → out of scope</li>
<li class="">Refactors that touch files not related to the task → out of scope</li>
<li class="">New dependencies added because the agent thought they were better → out of scope</li>
</ul>
<p>The Manager maintains a mental model of what belongs to this task and what does not. If the Execution Agent tries to expand the scope — even with good justification — the Manager flags it and either rejects it or surfaces it to you as an <strong>explicit proposal</strong>.</p>
<p>An often-overlooked detail: <strong>the Manager has no file tools and no shell access.</strong> It can only observe and direct. This "invisible hand" design is precisely what makes scope enforcement credible — the enforcer cannot be tempted to "just fix one more thing."</p>
<p><strong>Why can't a prompt solve this?</strong> This is the first question many people ask about Manager Mode — if the Manager only "guards intent," why not just write "don't go out of scope" in the system prompt?</p>
<p>The answer: prompts have no enforcement. If the same agent both interprets what you want and decides what to do, it will eventually override "the user didn't ask for this" with "I think this change is better." <strong>Only when the enforcer and the executor are two separate instances, and the enforcer literally has no ability to act</strong>, does the constraint actually hold. That is an architectural decision, not something prompt engineering can patch over.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="42-the-five-completion-criteria">4.2 The five completion criteria<a href="https://helix.iqe.me/en/blog/manager-mode/#42-the-five-completion-criteria" class="hash-link" aria-label="Direct link to 4.2 The five completion criteria" title="Direct link to 4.2 The five completion criteria" translate="no">​</a></h3>
<p><img decoding="async" loading="lazy" alt="The five completion criteria: evidence-based &quot;done&quot;" src="https://helix.iqe.me/en/assets/images/manager-mode-five-criteria-6dc723cfd095ef3fed6fabd3f133ff54.png" width="1792" height="1024" class="img_ev3q"></p>
<p>The Manager will not report a task as complete until all five of these are true — with <strong>verifiable evidence</strong>:</p>
<table><thead><tr><th>#</th><th>Criterion</th><th>What counts as evidence</th></tr></thead><tbody><tr><td>1</td><td>Original requirement implemented correctly</td><td>Test output, output of relevant commands</td></tr><tr><td>2</td><td>Changes committed</td><td><code>git log</code> showing the commit</td></tr><tr><td>3</td><td>Merged to main branch</td><td><code>git log main</code> showing the merge</td></tr><tr><td>4</td><td>Main branch verified post-merge</td><td>Build/test run on main after merge</td></tr><tr><td>5</td><td>Workspace clean, no scope violations</td><td><code>git status</code> clean, diff shows only expected files</td></tr></tbody></table>
<p><strong>"The agent said it's done" does not count.</strong> The Manager requires actual command output, tool results, or test runs. This eliminates the most common — and most insidious — failure mode: an agent that summarizes success without actually delivering it.</p>
<p>This principle can feel overly strict until the first time the Execution Agent reports "done", the Manager pulls up <code>git status</code>, finds uncommitted changes still sitting in the worktree, and sends the task back. That is the moment you realize what the word "done" really means.</p>
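<p>One way to picture an evidence-based "done" is a checklist that only closes when every criterion has captured output attached. The criterion keys and the <code>CompletionCheck</code> class below are illustrative assumptions that mirror the table, not Helix internals.</p>

```python
from dataclasses import dataclass, field
from typing import Dict, List

# One key per row of the criteria table (names are illustrative).
CRITERIA = (
    "implemented",
    "committed",
    "merged",
    "verified_on_main",
    "workspace_clean",
)


@dataclass
class CompletionCheck:
    evidence: Dict[str, str] = field(default_factory=dict)

    def attach(self, criterion: str, output: str) -> None:
        """Store captured command or test output for one criterion."""
        if criterion not in CRITERIA:
            raise KeyError(criterion)
        self.evidence[criterion] = output

    def missing(self) -> List[str]:
        """Criteria with no evidence, or only blank evidence."""
        return [c for c in CRITERIA if not self.evidence.get(c, "").strip()]

    def is_done(self) -> bool:
        return not self.missing()
```

<p>A summary from the agent never enters this structure; only output does. Whether that output really proves the criterion is still the Manager's call.</p>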
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="43-parallel-subagent-execution">4.3 Parallel SubAgent execution<a href="https://helix.iqe.me/en/blog/manager-mode/#43-parallel-subagent-execution" class="hash-link" aria-label="Direct link to 4.3 Parallel SubAgent execution" title="Direct link to 4.3 Parallel SubAgent execution" translate="no">​</a></h3>
<p><img decoding="async" loading="lazy" alt="Parallel SubAgent execution — three tasks in the time of one" src="https://helix.iqe.me/en/assets/images/manager-mode-parallel-subagents-f2787da87d834fdfd7cfdac1a4f24a04.png" width="1792" height="1024" class="img_ev3q"></p>
<p>When the Execution Agent identifies independent subtasks, it dispatches them as SubAgents that run <strong>concurrently</strong>:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">Execution Agent calls run_subagent("Fix auth module",  model="claude-sonnet-4-5")</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Execution Agent calls run_subagent("Write tests",      model="claude-sonnet-4-5")</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Execution Agent calls run_subagent("Update docs",      model="claude-haiku-4-5")</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          ↓                              ↓                      ↓</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    [runs in parallel]            [runs in parallel]     [runs in parallel]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          ↓                              ↓                      ↓</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    SubAgent returns result       SubAgent returns result  SubAgent returns result</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          ↓                              ↓                      ↓</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    Execution Agent collects all results</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                              ↓</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                    Runs verification, hands evidence to Manager</span><br></span></code></pre></div></div>
<p>SubAgents share the Execution Agent's worktree and coordinate their file changes through the Execution Agent, which sequences writes and manages the final merge. This avoids the race condition of concurrent SubAgents stepping on each other's writes.</p>
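<p>The dispatch-then-sequence pattern can be sketched in a few lines of Python. This is a minimal illustration, not Helix's implementation: <code>run_subagent</code> here is a hypothetical stand-in with an invented <code>delay</code> parameter, and "applying" a result is reduced to appending a string. It shows only that results can be collected concurrently and still applied in a fixed order.</p>

```python
import asyncio

# Hypothetical stand-in for a SubAgent call; Helix's real run_subagent
# signature is not documented in this post.
async def run_subagent(task, model, delay):
    await asyncio.sleep(delay)  # simulate work of varying length
    return {"task": task, "model": model, "patch": "patch for " + task}

async def execute(tasks):
    # Dispatch every SubAgent at once; gather returns results in dispatch
    # order, regardless of which SubAgent finishes first.
    results = await asyncio.gather(
        *(run_subagent(task, model, delay) for task, model, delay in tasks)
    )
    applied = []
    for result in results:
        # Apply one result at a time so writes never interleave.
        applied.append(result["patch"])
    return applied

# Delays are invented so that completion order differs from dispatch order.
TASKS = [
    ("Fix auth module", "claude-sonnet-4-5", 0.03),
    ("Write tests", "claude-sonnet-4-5", 0.01),
    ("Update docs", "claude-haiku-4-5", 0.02),
]

if __name__ == "__main__":
    print(asyncio.run(execute(TASKS)))
```

<p>Even though the second task finishes first in this sketch, its patch is applied second, which is the property the sequencing step is meant to provide.</p>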
<p>Parallelism is not only a performance win. <strong>It fundamentally changes the waiting experience</strong> — the original "ask, wait, ask again, wait again" serial rhythm is replaced by "set the full goal once, watch multiple workstreams converge."</p>
<p>And because the Manager Agent is guarding scope upstream, the "loss of control" risk that usually comes with parallelism is held in check. No matter how fast three SubAgents run, the Execution Agent still merges in order, and the Manager still verifies the whole delivery against the same standard. <strong>Faster, without blurring the boundaries.</strong></p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="44-context-management-under-long-tasks">4.4 Context management under long tasks<a href="https://helix.iqe.me/en/blog/manager-mode/#44-context-management-under-long-tasks" class="hash-link" aria-label="Direct link to 4.4 Context management under long tasks" title="Direct link to 4.4 Context management under long tasks" translate="no">​</a></h3>
<p>Long-running tasks accumulate a lot of history. Helix uses two mechanisms to keep sessions healthy:</p>
<p><strong>KV Caching</strong>: Large tool outputs (file reads, command results, search results) are cached so they don't need to be re-sent with every LLM request. The cache is transparent: there is nothing to configure.</p>
<p><strong>Auto-compression</strong>: When conversation history grows beyond a threshold, Helix compresses older messages into a concise summary and moves the "active window" forward. The agent keeps the essentials of what happened without paying the token cost of the full history.</p>
<p>Both mechanisms are invisible during normal use. They're what makes a <strong>50-turn task feel as responsive as a 5-turn one</strong>.</p>
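<p>As a rough mental model of auto-compression, consider the sketch below. Everything in it is an assumption made for illustration: the threshold is counted in messages rather than tokens, <code>keep_recent</code> is an invented parameter, and <code>summarize</code> is a placeholder where a real system would presumably ask a model for a summary.</p>

```python
def summarize(messages):
    # Placeholder: a real system would generate an actual summary here.
    return "[summary of " + str(len(messages)) + " earlier messages]"

def compress_history(messages, threshold=8, keep_recent=4):
    # Below the threshold, the history is passed through untouched.
    if len(messages) > threshold:
        older = messages[:-keep_recent]
        recent = messages[-keep_recent:]
        # Older messages collapse into one summary; the active window
        # (the most recent messages) is kept verbatim.
        return [summarize(older)] + recent
    return messages

if __name__ == "__main__":
    history = ["message " + str(i) for i in range(10)]
    print(compress_history(history))
```

<p>The trade-off is visible even at this scale: the recent window stays exact, while older turns survive only as a summary.</p>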
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="45-session-level-identity-isolation">4.5 Session-level identity isolation<a href="https://helix.iqe.me/en/blog/manager-mode/#45-session-level-identity-isolation" class="hash-link" aria-label="Direct link to 4.5 Session-level identity isolation" title="Direct link to 4.5 Session-level identity isolation" translate="no">​</a></h3>
<p>When you run multiple Manager sessions in parallel — one refactoring auth, one running a data migration, one building a new feature — Helix guarantees:</p>
<ul>
<li class="">Each session has an <strong>independent</strong> Manager / Execution / SubAgent stack</li>
<li class="">Sessions do not bleed into each other — task A's scope baseline does not pollute task B</li>
<li class="">Switching workspaces also switches session state, model selection, and connection configs — you don't re-explain context</li>
</ul>
<p>This isolation is a background mechanism. Users rarely notice it. But it is exactly what makes "leave Manager Mode running on multiple tasks and walk away" a safe thing to do.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="5-how-to-use-manager-mode">5. How to use Manager Mode<a href="https://helix.iqe.me/en/blog/manager-mode/#5-how-to-use-manager-mode" class="hash-link" aria-label="Direct link to 5. How to use Manager Mode" title="Direct link to 5. How to use Manager Mode" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="51-enabling-it">5.1 Enabling it<a href="https://helix.iqe.me/en/blog/manager-mode/#51-enabling-it" class="hash-link" aria-label="Direct link to 5.1 Enabling it" title="Direct link to 5.1 Enabling it" translate="no">​</a></h3>
<p>On the workspace selector page, you will find a <strong>Manager</strong> entry alongside the Chat option. Click it to open a Manager session. The three-layer architecture is automatic — no extra configuration required.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="52-writing-good-task-requests-for-manager-mode">5.2 Writing good task requests for Manager Mode<a href="https://helix.iqe.me/en/blog/manager-mode/#52-writing-good-task-requests-for-manager-mode" class="hash-link" aria-label="Direct link to 5.2 Writing good task requests for Manager Mode" title="Direct link to 5.2 Writing good task requests for Manager Mode" translate="no">​</a></h3>
<p>Manager Mode is most effective when your request is <strong>specific about scope boundaries</strong>. Compare:</p>
<p><strong>Less effective:</strong></p>
<blockquote>
<p>Improve the login flow</p>
</blockquote>
<p><strong>More effective:</strong></p>
<blockquote>
<p>Refactor the login flow to use the new AuthService interface. Only touch files in <code>src/auth/</code> and <code>src/components/Login/</code>. Don't change the API contracts.</p>
</blockquote>
<p>The Manager uses your request as its scope baseline. <strong>The more precisely you describe what's in scope, the more precisely it can guard against drift.</strong></p>
<p>You don't need to be exhaustive — the Manager can handle ambiguity. But explicit scope boundaries give it harder constraints to enforce.</p>
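<p>To see why explicit boundaries are easier to enforce, here is a minimal sketch of a scope check over a list of changed files. It is an illustration under assumptions, not how the Manager is implemented: the function name <code>out_of_scope</code> and the file names are hypothetical, and the directories come from the example request above.</p>

```python
from pathlib import PurePosixPath

def out_of_scope(changed_files, allowed_dirs):
    """Return the changed files that fall outside every allowed directory."""
    allowed = [PurePosixPath(d) for d in allowed_dirs]
    violations = []
    for name in changed_files:
        path = PurePosixPath(name)
        # is_relative_to compares whole path components, so "src/authz"
        # is not mistaken for something inside "src/auth".
        if not any(path.is_relative_to(root) for root in allowed):
            violations.append(name)
    return violations

if __name__ == "__main__":
    changed = [
        "src/auth/token.ts",               # hypothetical file names
        "src/components/Login/Form.tsx",
        "src/api/users.ts",
    ]
    print(out_of_scope(changed, ["src/auth/", "src/components/Login/"]))
```

<p>A request like "Improve the login flow" gives a check like this nothing to compare against, which is the practical difference between the two examples.</p>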
<p>A simple heuristic: <strong>if you would normally write a task spec or ticket before handing this work to another person, it's a Manager Mode task.</strong></p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="53-handling-scope-expansion-proposals">5.3 Handling scope expansion proposals<a href="https://helix.iqe.me/en/blog/manager-mode/#53-handling-scope-expansion-proposals" class="hash-link" aria-label="Direct link to 5.3 Handling scope expansion proposals" title="Direct link to 5.3 Handling scope expansion proposals" translate="no">​</a></h3>
<p>Sometimes the Execution Agent will identify something it thinks should be part of the task. The Manager will surface this to you as an explicit question rather than silently including it:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">Execution Agent found an issue in the session middleware that may affect</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">the auth refactor. This was not in the original scope.</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Expand scope to include middleware fix? [Yes / No / Defer]</span><br></span></code></pre></div></div>
<p>Saying "No" or "Defer" keeps the current task clean. You can always start a new session for the follow-up work. This is Manager Mode's explicit boundary between <strong>focus</strong> and <strong>flexibility</strong>.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="54-monitoring-progress">5.4 Monitoring progress<a href="https://helix.iqe.me/en/blog/manager-mode/#54-monitoring-progress" class="hash-link" aria-label="Direct link to 5.4 Monitoring progress" title="Direct link to 5.4 Monitoring progress" translate="no">​</a></h3>
<p>While a Manager session is running, you can see in real time:</p>
<ul>
<li class="">Which SubAgents are active and what they're working on</li>
<li class="">Token usage per agent</li>
<li class="">Tool calls in flight</li>
<li class="">What changes are currently pending in the worktree</li>
</ul>
<p>All of this is visible in the session's live event stream. If you step away and come back, you can scroll the stream to see which SubAgents the Execution Agent dispatched, their individual results, and the Manager's per-criterion completion checks.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="6-real-world-scenarios">6. Real-world scenarios<a href="https://helix.iqe.me/en/blog/manager-mode/#6-real-world-scenarios" class="hash-link" aria-label="Direct link to 6. Real-world scenarios" title="Direct link to 6. Real-world scenarios" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="scenario-1-multi-module-api-migration">Scenario 1: Multi-module API migration<a href="https://helix.iqe.me/en/blog/manager-mode/#scenario-1-multi-module-api-migration" class="hash-link" aria-label="Direct link to Scenario 1: Multi-module API migration" title="Direct link to Scenario 1: Multi-module API migration" translate="no">​</a></h3>
<p><strong>The task:</strong> Migrate three service modules from REST to gRPC.</p>
<p><strong>Without Manager Mode:</strong> You start a session, the agent begins migrating the auth service, notices the user service uses a similar pattern, starts touching that too, then realizes the test fixtures need updates, then decides to refactor the error types "since we're here anyway." Two hours later you have a diff across eleven modules and a broken build.</p>
<p><strong>With Manager Mode:</strong></p>
<p>You submit:</p>
<blockquote>
<p>Migrate auth-service, payment-service, and notification-service from REST to gRPC. Use the existing proto definitions in <code>/proto/</code>. Don't touch other services or shared utilities.</p>
</blockquote>
<p>The Manager locks this scope. The Execution Agent dispatches three SubAgents — one per service — running in parallel:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">SubAgent A: auth-service migration      [parallel]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">SubAgent B: payment-service migration   [parallel]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">SubAgent C: notification-service        [parallel]</span><br></span></code></pre></div></div>
<p>Each SubAgent works independently. When all three complete, the Execution Agent runs the merge sequence and verifies the build. The Manager reviews the final diff, confirms it <strong>only touches the three specified services</strong>, then presents you with a commit hash, test results, and a clean <code>git status</code>.</p>
<p>Total scope: exactly what you asked for.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="scenario-2-large-codebase-refactor-with-test-coverage">Scenario 2: Large codebase refactor with test coverage<a href="https://helix.iqe.me/en/blog/manager-mode/#scenario-2-large-codebase-refactor-with-test-coverage" class="hash-link" aria-label="Direct link to Scenario 2: Large codebase refactor with test coverage" title="Direct link to Scenario 2: Large codebase refactor with test coverage" translate="no">​</a></h3>
<p><strong>The task:</strong> A legacy data model class (<code>LegacyUserRecord</code>) needs to be replaced with the new <code>UserProfile</code> type across a large codebase — 60+ files.</p>
<p><strong>Without Manager Mode:</strong> A single-agent session can lose track of its own progress in long tasks. It might fix 40 files, think it's done, write a summary, and stop. Or it might fix 60 files but introduce subtle differences in how it handled edge cases across different parts of the codebase.</p>
<p><strong>With Manager Mode:</strong></p>
<p>The Execution Agent uses LSP tools to find all 63 references to <code>LegacyUserRecord</code>, groups them into logical clusters by module, and dispatches SubAgents for each cluster:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">SubAgent A: core domain models (12 files)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">SubAgent B: API layer (8 files)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">SubAgent C: service layer (18 files)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">SubAgent D: repository layer (14 files)</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">SubAgent E: test files (11 files)</span><br></span></code></pre></div></div>
<p>Each cluster is internally consistent. When all SubAgents complete, the Execution Agent runs the full test suite. The Manager verifies:</p>
<ul>
<li class="">All 63 references migrated (via <code>grep -r LegacyUserRecord .</code> returning no matches)</li>
<li class="">Tests pass</li>
<li class="">No unrelated files changed</li>
</ul>
<p>If any SubAgent missed a reference or introduced a regression, the Manager identifies the gap and sends the Execution Agent back to fix specifically that issue — <strong>not restart everything</strong>. In long tasks this matters: one missing reference should never force a full redo.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="scenario-3-parallel-feature-development-with-merge-coordination">Scenario 3: Parallel feature development with merge coordination<a href="https://helix.iqe.me/en/blog/manager-mode/#scenario-3-parallel-feature-development-with-merge-coordination" class="hash-link" aria-label="Direct link to Scenario 3: Parallel feature development with merge coordination" title="Direct link to Scenario 3: Parallel feature development with merge coordination" translate="no">​</a></h3>
<p><strong>The task:</strong> Implement a new analytics dashboard that requires backend API endpoints, frontend components, and database migrations — all independent work streams.</p>
<p><strong>The challenge:</strong> Three engineers would normally do this in parallel. With a single AI agent, it becomes a serial slog.</p>
<p><strong>With Manager Mode:</strong></p>
<p>You send:</p>
<blockquote>
<p>Build the analytics dashboard feature. Backend: add <code>/api/analytics/summary</code> and <code>/api/analytics/events</code> endpoints in <code>src/api/</code>. Frontend: create <code>AnalyticsDashboard</code> component in <code>src/components/</code>. Database: add migration for <code>analytics_events</code> table. These are independent — parallelize them.</p>
</blockquote>
<p>The Execution Agent dispatches three SubAgents simultaneously:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">SubAgent A [backend]   → writes API endpoints, runs unit tests</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">SubAgent B [frontend]  → builds React component with mock data</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">SubAgent C [database]  → writes migration, tests locally</span><br></span></code></pre></div></div>
<p>SubAgent A and SubAgent C finish first. SubAgent B finishes 40 seconds later. The Execution Agent then:</p>
<ol>
<li class="">Collects and applies all three SubAgents' results in sequence</li>
<li class="">Runs integration tests that connect all three layers</li>
<li class="">Fixes one minor import path conflict from the merge</li>
<li class="">Verifies the full test suite passes</li>
</ol>
<p>The Manager confirms: <strong>three independent workstreams, completed in roughly the time it would have taken to do one serially</strong>, with verified integration.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="7-when-to-use-manager-mode">7. When to use Manager Mode<a href="https://helix.iqe.me/en/blog/manager-mode/#7-when-to-use-manager-mode" class="hash-link" aria-label="Direct link to 7. When to use Manager Mode" title="Direct link to 7. When to use Manager Mode" translate="no">​</a></h2>
<p>Manager Mode adds orchestration overhead. For quick, scoped tasks it's often more than you need. Here's a rough guide:</p>
<table><thead><tr><th>Task type</th><th>Recommended mode</th></tr></thead><tbody><tr><td>Quick question, explanation, code snippet</td><td>Standard chat</td></tr><tr><td>Single-file edit or small bug fix</td><td>Standard or Coder mode</td></tr><tr><td>Multi-file refactor within one module</td><td>Coder mode</td></tr><tr><td>Cross-module refactor, feature spanning multiple layers</td><td><strong>Manager Mode</strong></td></tr><tr><td>Large migration (many files, parallel workstreams)</td><td><strong>Manager Mode</strong></td></tr><tr><td>Long-running task where you need to walk away</td><td><strong>Manager Mode</strong></td></tr><tr><td>Task where scope drift has burned you before</td><td><strong>Manager Mode</strong></td></tr></tbody></table>
<p>The signal: <strong>if you'd normally write a task spec or ticket before handing it to another person, you probably want Manager Mode.</strong></p>
<p>The reverse is also true: if you just want to ask "why is this code erroring?", the three-layer architecture is overkill — standard chat is enough. <strong>One mark of a good tool is knowing when not to use it.</strong></p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="8-what-makes-this-work-in-practice">8. What makes this work in practice<a href="https://helix.iqe.me/en/blog/manager-mode/#8-what-makes-this-work-in-practice" class="hash-link" aria-label="Direct link to 8. What makes this work in practice" title="Direct link to 8. What makes this work in practice" translate="no">​</a></h2>
<p>A few design decisions that make the system reliable rather than just theoretically sound:</p>
<p><strong>The Manager never executes.</strong> It has no file tools, no shell access. It can only observe and direct. This separation is what makes scope enforcement credible — the enforcer can't be tempted to "just fix one more thing."</p>
<p><strong>SubAgents are recursion-limited.</strong> SubAgents cannot spawn their own SubAgents. This is a hard <strong>system-level</strong> constraint, not a prompt instruction. It keeps execution depth predictable and prevents runaway branching.</p>
<p><strong>Evidence is required, not requested.</strong> The Manager's completion check is not "did the agent say it's done?" It's "can I see the command output that proves it?" The prompting enforces that distinction explicitly, and the system makes it part of the completion judgment.</p>
<p><strong>The worktree is managed by the Execution Agent.</strong> SubAgents share the Execution Agent's git worktree. The Execution Agent coordinates write sequencing across parallel subtasks, so changes from concurrent SubAgents are applied in a controlled order rather than colliding.</p>
<p><strong>Retries are built in.</strong> Every LLM call uses exponential backoff retry (up to 3 attempts, 2s initial delay). Transient API failures don't break long tasks.</p>
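<p>The retry policy described above (up to 3 attempts, 2s initial delay) can be sketched as follows. The doubling factor and the <code>sleep</code> parameter are assumptions: the post says "exponential backoff" without giving a multiplier, and the injectable <code>sleep</code> exists only so the sketch can be exercised without waiting.</p>

```python
import time

def call_with_retry(fn, attempts=3, initial_delay=2.0, sleep=time.sleep):
    """Call fn, retrying on any exception with exponentially growing waits."""
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay *= 2  # assumed multiplier; not stated in the post

if __name__ == "__main__":
    waits = []
    state = {"calls": 0}

    def flaky():
        # Fails twice, then succeeds, to mimic a transient API error.
        state["calls"] += 1
        if state["calls"] != 3:
            raise RuntimeError("transient failure")
        return "ok"

    print(call_with_retry(flaky, sleep=waits.append), waits)
```

<p>With these numbers, a call that keeps failing waits 2s and then 4s before the third and final attempt.</p>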
<p><strong>Sessions are isolated.</strong> Multiple Manager sessions don't bleed into each other: role baselines, scope memory, and SubAgent state are all kept separate. That's what makes "run several tasks in parallel" a safe thing to do.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="9-get-started">9. Get started<a href="https://helix.iqe.me/en/blog/manager-mode/#9-get-started" class="hash-link" aria-label="Direct link to 9. Get started" title="Direct link to 9. Get started" translate="no">​</a></h2>
<p>If you haven't enabled Manager Mode yet:</p>
<ol>
<li class="">Open Helix and create a new session</li>
<li class="">Click the <strong>Manager</strong> entry on the workspace selector page</li>
<li class="">Write your task with <strong>explicit scope boundaries</strong></li>
<li class="">Watch the live event stream as subtasks execute in parallel</li>
</ol>
<p>The first time a task that would have drifted <strong>stays clean</strong> — or the first time you see three SubAgents completing a week's worth of parallel work in minutes — is when the model clicks.</p>
<blockquote>
<p>To understand the bigger picture of how Helix treats agents as an engineering system, read <a class="" href="https://helix.iqe.me/en/blog/introducing-helix/">Introducing Helix</a>.<br>
Manager Mode runs on top of <a class="" href="https://helix.iqe.me/en/blog/automatic-worktree/">Automatic Worktree</a>: the Execution Agent never modifies your main branch directly; all changes happen in an isolated branch first.<br>
If your Manager Mode task is doing environment-heavy work, <a class="" href="https://helix.iqe.me/en/blog/helixvm-intro/">HelixVM</a> turns the execution boundary into a security boundary as well.</p>
</blockquote>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="10-whats-coming">10. What's coming<a href="https://helix.iqe.me/en/blog/manager-mode/#10-whats-coming" class="hash-link" aria-label="Direct link to 10. What's coming" title="Direct link to 10. What's coming" translate="no">​</a></h2>
<p>We're continuing to improve Manager Mode:</p>
<ul>
<li class=""><strong>Richer evidence pages</strong> — visual breakdowns of what each SubAgent did, with diff summaries and test results inline</li>
<li class=""><strong>Scope proposals UI</strong> — cleaner interface for reviewing and approving scope expansion requests</li>
<li class=""><strong>Workflow templates</strong> — pre-built task templates for common patterns (migration, feature build, test coverage)</li>
<li class=""><strong>Team visibility</strong> — let collaborators see live task execution status too, so "what is the agent doing right now?" becomes a team-level signal rather than a single user's view</li>
</ul>
<p>Questions, edge cases where it broke, tasks where it surprised you in a good way — send them our way. <strong>Manager Mode gets better from real workloads.</strong></p>]]></content:encoded>
            <category>deep-dive</category>
            <category>multi-agent</category>
            <category>manager</category>
            <category>architecture</category>
        </item>
        <item>
            <title><![CDATA[Switch Models Mid-Conversation: No Restarts, No Lost Context]]></title>
            <link>https://helix.iqe.me/en/blog/model-switching/</link>
            <guid>https://helix.iqe.me/en/blog/model-switching/</guid>
            <pubDate>Wed, 01 Apr 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Helix treats the model as the current engine of a session, not the identity of it. Users can switch models anywhere in a conversation — default model, mid-session, or even per tool call — with the full history preserved, no restart required.]]></description>
            <content:encoded><![CDATA[<p><img decoding="async" loading="lazy" alt="Helix — the session continues, the engine is swappable" src="https://helix.iqe.me/en/assets/images/model-switching-cover-d57923d96ec5c46f44a7ca1d4b781918.png" width="1792" height="1024" class="img_ev3q"></p>
<blockquote>
<p>Want a cheaper model? Paste your requirements again.<br>
Want a stronger model? Re-explain the whole context.<br>
That is what switching models looks like in most AI tools today.</p>
</blockquote>
<p>Switching models mid-session sounds simple. In most systems, it actually means: <strong>start over</strong>.</p>
<p>You pick a model at the beginning of a conversation. You build context — twenty messages deep, a dozen tool calls, a pile of file reads. Then you realize the model is too slow, too expensive, or missing a capability you need. Your options: abandon the session and start fresh, or keep going with the wrong tool for the job.</p>
<p>Helix is not designed that way.</p>
<p>In Helix, the model is the session's <strong>current engine</strong>, not its <strong>identity</strong>. Users can change models at any point in a conversation — the same history, the same tool config, the same thinking context, just a different engine driving the next message.</p>
<blockquote>
<p>This is the concrete product expression of the position stated in <a class="" href="https://helix.iqe.me/en/blog/introducing-helix/">the Helix introduction</a>: <strong>"the session is the unit of continuity; the model is just the current engine."</strong></p>
</blockquote>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-committing-to-a-model-upfront-is-a-structural-waste">1. Committing to a model upfront is a structural waste<a href="https://helix.iqe.me/en/blog/model-switching/#1-committing-to-a-model-upfront-is-a-structural-waste" class="hash-link" aria-label="Direct link to 1. Committing to a model upfront is a structural waste" title="Direct link to 1. Committing to a model upfront is a structural waste" translate="no">​</a></h2>
<p>Every model has a different cost-capability tradeoff.</p>
<p>A model that's ideal for deep reasoning on a complex architecture problem is expensive for quick drafting work. A fast, cheap model that handles routine edits well falls short when you need multi-step reasoning across a large codebase.</p>
<p>Real work doesn't fit neatly into one category.</p>
<p>A coding session often starts with exploration — reading files, understanding structure, asking clarifying questions — and ends with implementation work that demands more capability. Or the reverse: start with a powerful model on the hard part, then switch to something faster for the follow-through.</p>
<p>The conventional approach forces users to make this choice once, at session start, with the least information they will ever have about what the task actually requires.</p>
<p>This isn't a limitation of the models themselves. It's the product form putting "pick the model" at the wrong moment.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-helix-model-switching-traditional-way-vs-helix-way">2. Helix model switching: traditional way vs Helix way<a href="https://helix.iqe.me/en/blog/model-switching/#2-helix-model-switching-traditional-way-vs-helix-way" class="hash-link" aria-label="Direct link to 2. Helix model switching: traditional way vs Helix way" title="Direct link to 2. Helix model switching: traditional way vs Helix way" translate="no">​</a></h2>
<p><img decoding="async" loading="lazy" alt="Traditional way (copy/paste, restart) vs Helix (seamless switch)" src="https://helix.iqe.me/en/assets/images/model-switching-vs-traditional-55a8f59ea6ef005372408b5a7f20921b.png" width="1792" height="1024" class="img_ev3q"></p>
<p>In Helix, the model selector is always available in the chat toolbar. Users can change it at any point during a conversation.</p>
<p>The next message they send uses the new model — with full access to everything that happened before.</p>
<p><strong>No reset. No re-explaining. No "let me catch you up."</strong></p>
<p>The conversation history travels with the user across model changes. This is not a handoff trick where earlier messages are summarized and pasted into a fresh prompt: <strong>the actual message history is transferred to the new model directly</strong>, so it has the same context depth as if it had been in the conversation from the start.</p>
<p>Helix exposes model switching at three different levels of granularity:</p>
<ul>
<li class=""><strong>Default model</strong> — the account- or workspace-level default engine; new sessions inherit from here</li>
<li class=""><strong>In-session switch</strong> — replace the engine at any point mid-conversation; takes effect on the next message</li>
<li class=""><strong>Per-tool-call routing</strong> — certain specialized tools (lightweight prompts, code completion) can use a cheaper model than the main conversation, routed automatically by the Agent system</li>
</ul>
<p>All three levels share the same underlying switching mechanism; they differ only in where the switch is triggered.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-three-switching-paths">3. Three switching paths<a href="https://helix.iqe.me/en/blog/model-switching/#3-three-switching-paths" class="hash-link" aria-label="Direct link to 3. Three switching paths" title="Direct link to 3. Three switching paths" translate="no">​</a></h2>
<p><img decoding="async" loading="lazy" alt="Three switch paths: same provider / same family / cross provider" src="https://helix.iqe.me/en/assets/images/model-switching-three-paths-e512251c8a7c429c6e5595883c3f9ec2.png" width="1792" height="1024" class="img_ev3q"></p>
<p>Not all model switches are the same. Helix handles three distinct scenarios differently, based on what needs to change under the hood.</p>
<p><strong>Same provider, same base URL</strong> (e.g., <code>gpt-4o</code> → <code>gpt-4.1</code>): the LLM session updates its model ID in place. The existing connection, tool configuration, and message history are untouched. This is nearly instantaneous.</p>
<p><strong>Same provider family, different base URL</strong> (e.g., switching between two custom OpenAI-compatible endpoints): the session updates its model ID, base URL, and API key. No Runner rebuild required.</p>
<p><strong>Cross-provider switch</strong> (e.g., GPT-4o → Claude Sonnet, or any model → CLI mode): a full Runner rebuild happens. The message history is extracted from the old Runner, sanitized, and loaded into the new one. This is the most interesting case — and the one worth understanding in detail.</p>
<p>At the engineering level, "light switches" and "heavy switches" travel different code paths. From the user's point of view, they are all the same single click in the toolbar. Hiding that complexity is part of what the product is for.</p>
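<p>The decision between the three paths can be pictured as a small classifier. This is a sketch under assumptions: the <code>ModelTarget</code> type, its field names, the returned labels, and the base URLs are all hypothetical, and the "any model to CLI mode" case is left out.</p>

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelTarget:
    provider: str   # hypothetical field names, for illustration only
    base_url: str
    model_id: str

def switch_path(current, target):
    """Pick the lightest switch that is sufficient for the change."""
    if current.provider != target.provider:
        # Cross-provider: history is extracted, sanitized, and reloaded.
        return "rebuild_runner"
    if current.base_url != target.base_url:
        # Same family, different endpoint: model ID, base URL, and key change.
        return "update_endpoint"
    # Same provider and endpoint: only the model ID changes.
    return "update_model_id"

if __name__ == "__main__":
    a = ModelTarget("openai", "https://api.example.com/v1", "gpt-4o")
    b = ModelTarget("openai", "https://api.example.com/v1", "gpt-4.1")
    print(switch_path(a, b))
```

<p>Checking the provider first means the expensive rebuild is chosen only when nothing lighter will do.</p>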
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="4-what-happens-to-your-conversation-history-on-a-cross-provider-switch">4. What happens to your conversation history on a cross-provider switch<a href="https://helix.iqe.me/en/blog/model-switching/#4-what-happens-to-your-conversation-history-on-a-cross-provider-switch" class="hash-link" aria-label="Direct link to 4. What happens to your conversation history on a cross-provider switch" title="Direct link to 4. What happens to your conversation history on a cross-provider switch" translate="no">​</a></h2>
<p><img decoding="async" loading="lazy" alt="Three-pass canonicalization pipeline" src="https://helix.iqe.me/en/assets/images/model-switching-three-pass-1366042e06336391ea4691a278a5f886.png" width="1792" height="1024" class="img_ev3q"></p>
<p>When you switch across providers, Helix canonicalizes your message history before handing it to the new model.</p>
<p>Here's why that's necessary.</p>
<p>Different providers have subtly different requirements for what constitutes a valid message sequence. After a long session, your history may contain:</p>
<ul>
<li class=""><strong>Empty assistant messages</strong> — left behind when a response was interrupted before content arrived</li>
<li class=""><strong>Orphaned tool calls</strong> — the assistant requested a tool but the result was never received (cancellation, network interruption, etc.)</li>
<li class=""><strong>Consecutive messages from the same role</strong> — an artifact of certain error recovery paths</li>
</ul>
<p>Any one of these can cause the new provider's API to return a 400. The session would appear broken even though the underlying content is intact.</p>
<p>Canonicalization fixes this before the new model ever sees the history, in three passes:</p>
<ol>
<li class=""><strong>Pass 1: Remove empty assistant messages.</strong> An assistant message with <code>content=""</code> and no <code>tool_calls</code> triggers "non-empty content" errors, so these are stripped first.</li>
<li class=""><strong>Pass 2: Merge consecutive user messages.</strong> Pass 1 can leave two user messages adjacent. Identical ones are deduplicated; different ones are joined with <code>\n</code>.</li>
<li class=""><strong>Pass 3: Trim unpaired tool calls.</strong> Scan the last 5 assistant tool-call messages, find any <code>tool_call_id</code> with no matching tool response, and truncate the history from that message onward.</li>
</ol>
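<p>The three passes can be sketched in Go roughly as follows. This is a minimal illustration, not Helix's actual implementation: the <code>Message</code> type, its field names, and the helper function names are all assumptions made for the example.</p>

```go
package main

import "fmt"

// Message is a hypothetical provider-neutral message; the field names are
// assumptions for this sketch, not Helix's actual types.
type Message struct {
	Role       string   // "user", "assistant", or "tool"
	Content    string   // text content, may be empty
	ToolCalls  []string // IDs of tool calls requested by an assistant message
	ToolCallID string   // on a "tool" message: the ID of the call it answers
}

// dropEmptyAssistant is Pass 1: remove assistant messages that have neither
// content nor tool calls.
func dropEmptyAssistant(in []Message) []Message {
	out := make([]Message, 0, len(in))
	for _, m := range in {
		if m.Role == "assistant" && m.Content == "" && len(m.ToolCalls) == 0 {
			continue
		}
		out = append(out, m)
	}
	return out
}

// mergeUsers is Pass 2: collapse adjacent user messages. Identical content is
// kept once; different content is joined with a newline.
func mergeUsers(in []Message) []Message {
	out := make([]Message, 0, len(in))
	for _, m := range in {
		if n := len(out); n > 0 && m.Role == "user" && out[n-1].Role == "user" {
			if out[n-1].Content != m.Content {
				out[n-1].Content += "\n" + m.Content
			}
			continue
		}
		out = append(out, m)
	}
	return out
}

// trimUnpaired is Pass 3: among the last `window` assistant tool-call
// messages, find the earliest one with an unanswered call and cut there.
func trimUnpaired(in []Message, window int) []Message {
	answered := make(map[string]bool)
	for _, m := range in {
		if m.Role == "tool" {
			answered[m.ToolCallID] = true
		}
	}
	cut, seen := len(in), 0
	for i := len(in) - 1; i >= 0 && seen < window; i-- {
		if in[i].Role != "assistant" || len(in[i].ToolCalls) == 0 {
			continue
		}
		seen++
		for _, id := range in[i].ToolCalls {
			if !answered[id] {
				cut = i // keep scanning: an earlier message may be unpaired too
				break
			}
		}
	}
	return in[:cut]
}

// Canonicalize runs the three passes in order.
func Canonicalize(history []Message) []Message {
	return trimUnpaired(mergeUsers(dropEmptyAssistant(history)), 5)
}

func main() {
	history := []Message{
		{Role: "user", Content: "fix the bug"},
		{Role: "assistant"}, // interrupted before any content arrived
		{Role: "user", Content: "fix the bug"},
		{Role: "assistant", ToolCalls: []string{"call_1"}},
		{Role: "tool", ToolCallID: "call_1", Content: "tests passed"},
		{Role: "assistant", Content: "step 1 done"},
		{Role: "assistant", ToolCalls: []string{"call_2"}}, // never answered
	}
	for _, m := range Canonicalize(history) {
		fmt.Printf("%-9s %q %v\n", m.Role, m.Content, m.ToolCalls)
	}
}
```

<p>The order matters in this sketch: removing the empty assistant message is what makes the two user messages adjacent, so merging has to run second.</p>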
<p>The cleaned history is loaded into the new Runner. Tool config is restored (<code>toolChoice: auto</code>). If the new model supports extended thinking and it was enabled before, it is re-enabled — otherwise it is automatically disabled for that model.</p>
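<p>One way to picture the settings carry-over is as a pure function of the target model's capabilities and the user's preference. The sketch below is an assumption about the shape of that logic; <code>ModelCaps</code>, <code>RunnerSettings</code>, and <code>settingsFor</code> are hypothetical names, and only the <code>toolChoice: auto</code> value comes from the text above.</p>

```go
package main

import "fmt"

// ModelCaps describes what a target model supports (hypothetical shape).
type ModelCaps struct {
	ToolCalls bool
	Thinking  bool
}

// RunnerSettings are the options applied to the rebuilt Runner (hypothetical).
type RunnerSettings struct {
	ToolChoice string // "auto" when the model can call tools, "" otherwise
	Thinking   bool
}

// settingsFor derives the new Runner's settings. The user's thinking
// preference is passed in unchanged, so it is only masked for models that
// lack the capability and comes back after a switch to one that has it.
func settingsFor(caps ModelCaps, userWantsThinking bool) RunnerSettings {
	s := RunnerSettings{}
	if caps.ToolCalls {
		s.ToolChoice = "auto"
	}
	s.Thinking = userWantsThinking && caps.Thinking
	return s
}

func main() {
	withThinking := ModelCaps{ToolCalls: true, Thinking: true}
	withoutThinking := ModelCaps{ToolCalls: true, Thinking: false}
	fmt.Printf("%+v\n", settingsFor(withThinking, true))
	fmt.Printf("%+v\n", settingsFor(withoutThinking, true))
}
```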
<p><strong>Result: the new model sees a complete, clean conversation. The substance of the conversation is preserved; only empty messages and the unanswered tool-call tail are dropped, and the provider-specific quirks from the old session are gone.</strong></p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="5-three-real-world-scenarios">5. Three real-world scenarios<a href="https://helix.iqe.me/en/blog/model-switching/#5-three-real-world-scenarios" class="hash-link" aria-label="Direct link to 5. Three real-world scenarios" title="Direct link to 5. Three real-world scenarios" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="scenario-1-draft-fast-refine-with-depth">Scenario 1: Draft fast, refine with depth<a href="https://helix.iqe.me/en/blog/model-switching/#scenario-1-draft-fast-refine-with-depth" class="hash-link" aria-label="Direct link to Scenario 1: Draft fast, refine with depth" title="Direct link to Scenario 1: Draft fast, refine with depth" translate="no">​</a></h3>
<p>You're writing a technical spec for a new API. The structure is straightforward — resource definitions, endpoint signatures, error codes. You want to get the draft out quickly without burning expensive reasoning capacity on scaffolding work.</p>
<p>You start with a fast, cost-efficient model. It handles the scaffolding well: proposes the initial endpoint list, drafts the request/response schema, sketches the error taxonomy. Thirty messages in, you have a solid skeleton.</p>
<p>Now the hard part: inconsistencies in the auth model, edge cases in the pagination design, questions about backward compatibility. This is where you want the strongest reasoning you can get.</p>
<p>You switch to your most capable model — right there, same session. It picks up the draft exactly where it is. You ask it to audit the auth design. It reads the full thirty-message history of decisions already made, flags two contradictions you hadn't noticed, and proposes a cleaner approach that's consistent with the patterns already established.</p>
<p><strong>You didn't restart anything. You didn't paste the spec into a new window.</strong> The fast model did the work it was good at; the powerful model did the work it was good at. Total cost: a fraction of what it would have cost to run everything on the capable model from the start.</p>
<hr>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="scenario-2-hit-a-capability-wall-mid-session-keep-going">Scenario 2: Hit a capability wall mid-session, keep going<a href="https://helix.iqe.me/en/blog/model-switching/#scenario-2-hit-a-capability-wall-mid-session-keep-going" class="hash-link" aria-label="Direct link to Scenario 2: Hit a capability wall mid-session, keep going" title="Direct link to Scenario 2: Hit a capability wall mid-session, keep going" translate="no">​</a></h3>
<p>You're in a debugging session. A Go service is misbehaving under load — requests are stalling and you suspect a goroutine leak. You've been using a model with strong reasoning capability. Over the past fifteen messages it has traced the issue to a goroutine that's consuming from a message queue without a timeout.</p>
<p>Now you need to fix it: edit three files, run the test suite, check that the queue consumer behavior changes as expected. Your current model doesn't support tool calls.</p>
<p>You switch to a model with tool access. Same session, same history.</p>
<p>The new model can see the full diagnostic trail — the stack traces you explored, the hypothesis you validated, the exact files you identified. It doesn't need any re-explanation. It goes straight to the implementation, runs the tests, confirms the fix holds.</p>
<p>No re-diagnosis. No "can you summarize what we found?" <strong>The context is already there because the history is already there.</strong></p>
<hr>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="scenario-3-cost-aware-multi-phase-review">Scenario 3: Cost-aware multi-phase review<a href="https://helix.iqe.me/en/blog/model-switching/#scenario-3-cost-aware-multi-phase-review" class="hash-link" aria-label="Direct link to Scenario 3: Cost-aware multi-phase review" title="Direct link to Scenario 3: Cost-aware multi-phase review" translate="no">​</a></h3>
<p>Your team has a batch of pull requests queued for AI-assisted review. Most are mechanical — check for common patterns, flag style violations, confirm test coverage. A few are genuinely complex — architecture decisions, security-sensitive changes, subtle logic in concurrent code.</p>
<p>You work through the batch in a single session. For the routine reviews, you stay on a fast, cost-effective model. It handles the pattern-matching well. When you hit a PR that touches the auth layer and the billing service simultaneously, you switch to your highest-capability model for that one.</p>
<p>Then switch back.</p>
<p>The session thread keeps the full record of every review, every flag, every comment. <strong>The model switches are invisible to anyone reading the session history</strong> — they just see a coherent thread of review work. The cost profile matches the actual complexity of each piece of work, not the worst-case complexity of any single piece.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="6-why-the-model-is-a-parameter-not-an-identity">6. Why the model is a parameter, not an identity<a href="https://helix.iqe.me/en/blog/model-switching/#6-why-the-model-is-a-parameter-not-an-identity" class="hash-link" aria-label="Direct link to 6. Why the model is a parameter, not an identity" title="Direct link to 6. Why the model is a parameter, not an identity" translate="no">​</a></h2>
<p>The design decision here is that <strong>the session, not the model, is the persistent entity</strong>.</p>
<p>Your conversation state lives in the session. The model is a parameter of how the next message gets processed. Switching the model doesn't change whose conversation this is — it's still the same conversation; only the next message gets handled by a different engine.</p>
<p>This means the model selector in Helix works differently from a provider switcher in other tools. Picking Claude does not start a "new conversation with Claude"; it changes which engine answers the next message in the conversation you already have.</p>
<p>The WebSocket protocol reflects this. Every outbound message carries the current model ID. The backend checks it against the session's current model on each message and runs the appropriate switch path before sending to the LLM. <strong>There is no separate "switch model" API call. The switch and the message are one atomic operation.</strong></p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">Every message over WebSocket:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">{</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "type": "message",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "content": "...",</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "model": "builtin-anthropic:claude-sonnet-4-5",   ← current selection</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  "req_id": "req_xxx"</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">}</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">Backend on receipt:</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  if msg.Model != session.CurrentModel {</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      runSwitchPath(session, msg.Model)   // one of the three paths above</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  }</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  // then process message with (possibly new) model</span><br></span></code></pre></div></div>
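<p>A slightly fuller Go sketch of the same check is shown here, under stated assumptions: the JSON field names follow the message shown above, but the <code>Inbound</code> and <code>Session</code> types, the rule that the provider is the part of the model ID before the colon, and the model ID <code>builtin-openai:gpt-4o</code> are all hypothetical.</p>

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Inbound mirrors the WebSocket message shown above.
type Inbound struct {
	Type    string `json:"type"`
	Content string `json:"content"`
	Model   string `json:"model"`
	ReqID   string `json:"req_id"`
}

// Session holds only what this sketch needs.
type Session struct {
	CurrentModel string
}

// provider assumes model IDs look like "<provider>:<model>".
func provider(modelID string) string {
	p, _, _ := strings.Cut(modelID, ":")
	return p
}

// switchKind classifies the change between two model IDs.
func switchKind(from, to string) string {
	switch {
	case from == to:
		return "none"
	case provider(from) == provider(to):
		return "same-provider"
	default:
		return "cross-provider"
	}
}

// handle decodes one message and applies the model it carries. A real
// backend would run the matching switch path where the field is updated.
func handle(s *Session, raw []byte) (string, error) {
	var msg Inbound
	if err := json.Unmarshal(raw, &msg); err != nil {
		return "", err
	}
	kind := switchKind(s.CurrentModel, msg.Model)
	if kind != "none" {
		s.CurrentModel = msg.Model
	}
	return kind, nil
}

func main() {
	s := &Session{CurrentModel: "builtin-anthropic:claude-sonnet-4-5"}
	raw := []byte(`{"type":"message","content":"hi","model":"builtin-openai:gpt-4o","req_id":"req_1"}`)
	kind, err := handle(s, raw)
	fmt.Println(kind, err, s.CurrentModel)
}
```

<p>Because the decision is recomputed from the session's current model and the incoming ID on every message, nothing about earlier switches has to be remembered.</p>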
<p>A side effect of this design: you can change models as often as you want. There is no accumulated penalty for switching back and forth. Every switch is evaluated fresh against the current state. There is no "the session gets weird after three switches" trap — because the switch itself carries no accumulated state.</p>
<p>It also means the session's continuity does not depend on "not changing models." Continuity comes from the message history stored on the session; the model is just the component consuming that history and producing the next response.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="7-get-started">7. Get started<a href="https://helix.iqe.me/en/blog/model-switching/#7-get-started" class="hash-link" aria-label="Direct link to 7. Get started" title="Direct link to 7. Get started" translate="no">​</a></h2>
<p>Model switching requires no configuration. The model selector is in the chat toolbar of every Helix session.</p>
<p>A few things worth knowing before you use it:</p>
<ul>
<li class=""><strong>Switch any time.</strong> There is no right or wrong moment. The switch takes effect on the next message you send.</li>
<li class=""><strong>History is fully preserved.</strong> The new model sees everything that happened before it — not a summary, the actual history.</li>
<li class=""><strong>Tool configuration carries over.</strong> The new model gets the same tool access, provided it supports tool calls.</li>
<li class=""><strong>Thinking mode follows capability.</strong> If the new model supports extended thinking and you have it enabled, it continues. If it doesn't, it's disabled automatically for that model.</li>
<li class=""><strong>Switching is free.</strong> There's no cost to the switch itself — only to the messages you send after it.</li>
</ul>
<blockquote>
<p><strong>The session is the conversation. The model is just whichever engine is running it right now.</strong></p>
</blockquote>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="go-deeper-into-helix">Go deeper into Helix<a href="https://helix.iqe.me/en/blog/model-switching/#go-deeper-into-helix" class="hash-link" aria-label="Direct link to Go deeper into Helix" title="Direct link to Go deeper into Helix" translate="no">​</a></h2>
<p>Model switching is one concrete expression of a broader stance in Helix: the session is the persistent entity. The same thinking shows up in several larger places in the product:</p>
<ul>
<li class="">🧩 <a class="" href="https://helix.iqe.me/en/blog/manager-mode/">Manager Mode — treating an Agent as a system, not a conversation</a></li>
<li class="">🌿 <a class="" href="https://helix.iqe.me/en/blog/automatic-worktree/">Automatic Worktree — Agents work on isolated branches and never touch your main</a></li>
<li class="">🔒 <a class="" href="https://helix.iqe.me/en/blog/helixvm-intro/">HelixVM — putting Agents inside a local VM sandbox</a></li>
<li class="">🚀 <a class="" href="https://helix.iqe.me/en/blog/introducing-helix/">Introducing Helix — not a smarter AI assistant, an AI coworker that ships</a></li>
</ul>
<p>Or just open Helix, grab a session that's currently stuck, and switch to a different model — see whether it picks up where the last one left off.</p>]]></content:encoded>
            <category>features</category>
            <category>models</category>
            <category>workflow</category>
            <category>deep-dive</category>
        </item>
        <item>
            <title><![CDATA[Introducing Helix — Not a Smarter AI Assistant, but an AI Teammate That Actually Delivers]]></title>
            <link>https://helix.iqe.me/en/blog/introducing-helix/</link>
            <guid>https://helix.iqe.me/en/blog/introducing-helix/</guid>
            <pubDate>Tue, 20 Jan 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Helix isn't yet another AI that can write code. It's a product that treats AI agents as a real engineering system — multi-agent orchestration, parallel execution, durable context, and cross-workspace delivery.]]></description>
            <content:encoded><![CDATA[<p><img decoding="async" loading="lazy" alt="Helix — not another AI chat box, but an engineering system built to deliver in parallel" src="https://helix.iqe.me/en/assets/images/introducing-helix-cover-d4884035871a5e36103e363ae69deb12.png" width="1792" height="1024" class="img_ev3q"></p>
<blockquote>
<p>Most AI coding assistants keep getting smarter at <em>answering</em>.<br>
They've barely moved at <em>delivering</em>.<br>
That's what Helix is built to change.</p>
</blockquote>
<p>Over the past year, AI coding assistants have multiplied. Models got smarter, context windows grew longer, reasoning chains went deeper — and yet, for people actually running real engineering tasks through them, the lived experience didn't improve nearly as much.</p>
<p>The experience tends to follow a similar arc. The first session feels magical: the tool writes a function that almost looks production-ready. After a few weeks of real use, it starts to feel off. After a few more weeks, the pattern becomes clear: what these tools are really good at is <strong>answering beautifully</strong>, not <strong>getting things done</strong>.</p>
<p>Ask one to "refactor the auth module." It returns a polished, well-structured explanation: "Here's the new AuthService design…" — and then what? Editing the code, running the tests, fixing the build, committing, merging — that's still on the user.</p>
<p><strong>It hands back an answer, not an outcome.</strong></p>
<p>Real work needs outcomes. A cross-module refactor that actually runs to completion. A test suite that turns green. A commit that lands on <code>main</code>. A board where progress is visible to the team. None of that involves "saying it well."</p>
<p>That's the problem Helix was built to solve.</p>
<blockquote>
<p>Helix isn't another AI assistant that answers questions — it's designed to be <strong>an AI teammate that actually delivers</strong>.</p>
</blockquote>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-the-ceiling-of-single-thread-chat">1. The ceiling of single-thread chat<a href="https://helix.iqe.me/en/blog/introducing-helix/#1-the-ceiling-of-single-thread-chat" class="hash-link" aria-label="Direct link to 1. The ceiling of single-thread chat" title="Direct link to 1. The ceiling of single-thread chat" translate="no">​</a></h2>
<p><img decoding="async" loading="lazy" alt="Single-thread chat vs Helix multi-agent parallel" src="https://helix.iqe.me/en/assets/images/introducing-helix-single-vs-multi-d677a407e6453c962d7a01570138b712.png" width="1792" height="1024" class="img_ev3q"></p>
<p>Most AI coding tools are, at heart, just a smarter chat box.</p>
<p>The user talks, the model replies. Every piece of work gets compressed into the same conversational thread: understanding intent, exploring code, writing code, running tests, explaining errors, summarizing progress — all in sequence, all in one timeline.</p>
<p>The real problem isn't model capability. It's that the chat-box shape <strong>structurally can't carry complex work</strong>:</p>
<ul>
<li class="">A task spans three modules; thirty turns in, the user has lost track of what the agent promised and what it skipped.</li>
<li class="">Long sessions get <strong>expensive and fragile</strong>: every new message replays the entire history through the model, so the cost of each request grows with the length of the session while quality often does not.</li>
<li class="">The user can't see what the agent is actually doing. It says "done" — open the IDE, and the work is incomplete, or wrong.</li>
<li class="">Tasks cross local and remote environments; every switch means re-explaining context.</li>
</ul>
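<p>The cost point can be made concrete with a little arithmetic. If every turn adds about the same number of tokens and each request resends the whole history, request <em>k</em> carries <em>k</em> times that amount, so the running total grows with the square of the session length. The sketch below uses an illustrative figure of 1,000 tokens per turn, which is an assumption and not a measurement.</p>

```go
package main

import "fmt"

// replayedTokens returns the total input tokens sent over n turns when each
// turn adds perTurn tokens and every request resends the whole history.
func replayedTokens(n, perTurn int) int {
	total := 0
	for k := 1; k <= n; k++ {
		total += k * perTurn // request k carries k turns of history
	}
	return total
}

func main() {
	for _, n := range []int{5, 50} {
		fmt.Printf("%d turns: %d input tokens\n", n, replayedTokens(n, 1000))
	}
}
```

<p>Under these assumptions a 5-turn session sends 15,000 input tokens in total and a 50-turn session sends 1,275,000: ten times the turns, eighty-five times the tokens.</p>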
<p>None of this is the model's fault. It's the ceiling of "conversation" as a shape.</p>
<p>How do human engineering teams handle problems like this? They <strong>split the work</strong>. A PM keeps the goal. Engineers implement. Multiple people work on sub-tasks in parallel. A board makes progress visible.</p>
<p>AI agents should do the same.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-helix-treats-agents-as-a-system-not-a-conversation">2. Helix treats agents as a <strong>system</strong>, not a <strong>conversation</strong><a href="https://helix.iqe.me/en/blog/introducing-helix/#2-helix-treats-agents-as-a-system-not-a-conversation" class="hash-link" aria-label="Direct link to 2-helix-treats-agents-as-a-system-not-a-conversation" title="Direct link to 2-helix-treats-agents-as-a-system-not-a-conversation" translate="no">​</a></h2>
<p>The core stance of Helix can be put in one sentence:</p>
<blockquote>
<p><strong>The session is the unit of continuity; the model is just the current engine. The task is the unit of delivery; the conversation is just the record of how it happened.</strong></p>
</blockquote>
<p>Once the frame shifts from "who am I talking to" to "how does this task ship," the rest follows naturally:</p>
<ul>
<li class="">Tasks can be decomposed, so there should be <strong>Manager + Execution + SubAgent</strong> role separation.</li>
<li class="">Sub-tasks are independent, so they should run <strong>in parallel</strong> instead of serially queuing.</li>
<li class="">Long tasks inevitably accumulate context, so <strong>Cache + Compact</strong> should protect quality and contain cost.</li>
<li class="">Real work moves between local and remote, so a <strong>workspace should be an independent execution boundary</strong>, not tied to a specific model or chat.</li>
</ul>
<p>The rest of this post unpacks each of those.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-three-layer-multi-agent-architecture-agents-that-constrain-each-other">3. Three-layer multi-agent architecture: agents that constrain each other<a href="https://helix.iqe.me/en/blog/introducing-helix/#3-three-layer-multi-agent-architecture-agents-that-constrain-each-other" class="hash-link" aria-label="Direct link to 3. Three-layer multi-agent architecture: agents that constrain each other" title="Direct link to 3. Three-layer multi-agent architecture: agents that constrain each other" translate="no">​</a></h2>
<p><img decoding="async" loading="lazy" alt="Helix three-layer architecture — Manager / Execution / SubAgent" src="https://helix.iqe.me/en/assets/images/introducing-helix-three-layer-976ebecd02f79ad32c761d13e1dca182.png" width="1792" height="1024" class="img_ev3q"></p>
<p>The most damaging failure mode of single-agent systems is <strong>scope drift</strong>. The user says "refactor the auth module," and three minutes later the agent has also "optimized" five unrelated utilities, swapped in a new error-handling library, and pulled in a new dependency — then written a thorough summary explaining "why all of this was necessary."</p>
<p>The root cause is direct: a single-agent system <strong>never separates "understand what the user asked for" from "decide what to do."</strong> Both responsibilities get fused into the same loop, with no guardrail between them.</p>
<p>Helix's answer is to split those responsibilities <strong>across different agents</strong>, so they can constrain each other.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="manager-agent--guards-the-goal-never-touches-code">Manager Agent — guards the goal, never touches code<a href="https://helix.iqe.me/en/blog/introducing-helix/#manager-agent--guards-the-goal-never-touches-code" class="hash-link" aria-label="Direct link to Manager Agent — guards the goal, never touches code" title="Direct link to Manager Agent — guards the goal, never touches code" translate="no">​</a></h3>
<p>The Manager doesn't write code, doesn't run shell commands, doesn't call tools. Its single responsibility is this: <strong>make sure what gets delivered is what was originally asked for</strong>.</p>
<p>It locks the user's original request as a baseline. Every action by the execution layer is evaluated against that baseline for scope creep. And it requires <strong>evidence</strong> of completion — not "the agent said it's done," but the commit actually in <code>git log</code>, the working tree actually clean, the test output actually green.</p>
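<p>As a rough illustration of what "evaluated against the baseline" and "evidence of completion" could mean in code, here is a small Go sketch. The <code>Baseline</code> and <code>Evidence</code> types, the path-prefix scope rule, and the <code>review</code> function are assumptions made for the example; the text does not describe how Helix represents either.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// Baseline is the locked request plus the paths it is allowed to touch
// (hypothetical representation).
type Baseline struct {
	Request      string
	AllowedPaths []string
}

// Evidence holds facts gathered after the execution layer reports "done".
type Evidence struct {
	CommitInLog      bool
	WorkingTreeClean bool
	TestsGreen       bool
	FilesTouched     []string
}

// inScope reports whether path falls under one of the allowed prefixes.
func inScope(path string, allowed []string) bool {
	for _, prefix := range allowed {
		if strings.HasPrefix(path, prefix) {
			return true
		}
	}
	return false
}

// review returns every reason the delivery cannot be accepted yet.
// An empty result means the evidence matches the baseline.
func review(b Baseline, e Evidence) []string {
	var problems []string
	if !e.CommitInLog {
		problems = append(problems, "commit not found in git log")
	}
	if !e.WorkingTreeClean {
		problems = append(problems, "working tree not clean")
	}
	if !e.TestsGreen {
		problems = append(problems, "tests not green")
	}
	for _, f := range e.FilesTouched {
		if !inScope(f, b.AllowedPaths) {
			problems = append(problems, "out of scope: "+f)
		}
	}
	return problems
}

func main() {
	b := Baseline{Request: "refactor the auth module", AllowedPaths: []string{"auth/"}}
	e := Evidence{
		CommitInLog: true, WorkingTreeClean: true, TestsGreen: true,
		FilesTouched: []string{"auth/service.go", "utils/errors.go"},
	}
	for _, p := range review(b, e) {
		fmt.Println(p)
	}
}
```

<p>In this sketch a run that is green on all three checks is still rejected if it touched a file outside the baseline, which is the scope-drift case described earlier.</p>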
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="execution-agent--drives-execution-decomposes-work">Execution Agent — drives execution, decomposes work<a href="https://helix.iqe.me/en/blog/introducing-helix/#execution-agent--drives-execution-decomposes-work" class="hash-link" aria-label="Direct link to Execution Agent — drives execution, decomposes work" title="Direct link to Execution Agent — drives execution, decomposes work" translate="no">​</a></h3>
<p>This is the layer that actually works. It reads code, writes code, calls tools, runs tests. But it doesn't try to be a hero — it actively breaks work into parallelizable pieces and dispatches them to SubAgents below.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="subagent--goes-deep-runs-in-parallel">SubAgent — goes deep, runs in parallel<a href="https://helix.iqe.me/en/blog/introducing-helix/#subagent--goes-deep-runs-in-parallel" class="hash-link" aria-label="Direct link to SubAgent — goes deep, runs in parallel" title="Direct link to SubAgent — goes deep, runs in parallel" translate="no">​</a></h3>
<p>Each SubAgent is an isolated execution context. It focuses on one thing inside its own view of the world, then reports results back to the Execution Agent.</p>
<p>The elegance of this architecture isn't "we have N agents." It's this:</p>
<blockquote>
<p><strong>No single agent is responsible for both defining the goal and executing it.</strong></p>
</blockquote>
<p>PMs don't write code. Engineers don't redefine requirements. With these layers, an AI system gets role boundaries that resemble those of a real team.</p>
<blockquote>
<p>For a deep dive into how the three-layer architecture defends scope in complex tasks, see <a class="" href="https://helix.iqe.me/en/blog/manager-mode/">Manager Mode</a>.</p>
</blockquote>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="4-parallel-scheduling-three-tasks-in-the-time-of-one">4. Parallel scheduling: three tasks in the time of one<a href="https://helix.iqe.me/en/blog/introducing-helix/#4-parallel-scheduling-three-tasks-in-the-time-of-one" class="hash-link" aria-label="Direct link to 4. Parallel scheduling: three tasks in the time of one" title="Direct link to 4. Parallel scheduling: three tasks in the time of one" translate="no">​</a></h2>
<p>A chat box is a serial interface. Real work is often parallel.</p>
<p>Once the Execution Agent decomposes a task, Helix <strong>actually runs the sub-tasks concurrently</strong> — not just lists them, but dispatches and executes them at the same time.</p>
<p>Concrete example. The user says: "Migrate auth, payment, and notification services from REST to gRPC."</p>
<p>In traditional chat-based AI, this becomes:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">user → model → auth migrated → user reviews → user says continue →</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">model → payment migrated → user reviews → user says continue → ...</span><br></span></code></pre></div></div>
<p>In Helix it becomes:</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">Execution Agent</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   ├─→ SubAgent A: auth migration         [parallel]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   ├─→ SubAgent B: payment migration      [parallel]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   └─→ SubAgent C: notification migration [parallel]</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                ↓</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   Execution Agent: collect → verify → deliver</span><br></span></code></pre></div></div>
<p>Three independent pieces of work, <strong>finished in roughly the time of one</strong>.</p>
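<p>The fan-out and collect pattern in the diagram maps naturally onto goroutines. The Go sketch below is only an illustration of that pattern: <code>runSubAgent</code> is a stand-in that returns a string, and the <code>Result</code> type and function names are assumptions.</p>

```go
package main

import (
	"fmt"
	"sync"
)

// Result is what a SubAgent reports back (hypothetical shape).
type Result struct {
	Task   string
	Output string
}

// runSubAgent stands in for a real SubAgent run.
func runSubAgent(task string) Result {
	return Result{Task: task, Output: "migrated " + task}
}

// dispatch runs every task concurrently and returns the results in the
// same order as the input, so the caller can merge them deterministically.
func dispatch(tasks []string) []Result {
	results := make([]Result, len(tasks))
	var wg sync.WaitGroup
	for i, task := range tasks {
		wg.Add(1)
		go func(i int, task string) {
			defer wg.Done()
			results[i] = runSubAgent(task) // each goroutine writes only its own slot
		}(i, task)
	}
	wg.Wait() // collect: block until every SubAgent has reported
	return results
}

func main() {
	for _, r := range dispatch([]string{"auth", "payment", "notification"}) {
		fmt.Println(r.Task, "->", r.Output)
	}
}
```

<p>Writing each result into a pre-sized slot avoids a mutex and keeps the output order independent of which SubAgent finishes first.</p>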
<p>Parallelism isn't just a speedup. It changes the <em>waiting</em> experience: the rhythm of "say something, wait, say something else, wait again" gets replaced by "state the full goal once, watch multiple tracks advance simultaneously."</p>
<p>More importantly — because the Manager Agent guards the upstream boundary, the "loss of control" usually associated with parallel agents is contained. SubAgents can run as fast as they want; the Execution Agent still merges their outputs in sequence, and the Manager still validates the integrated delivery against the original scope.</p>
<p><strong>Faster, but the boundary holds.</strong></p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="5-context-management-a-50-turn-task-that-feels-like-a-5-turn-one">5. Context management: a 50-turn task that feels like a 5-turn one<a href="https://helix.iqe.me/en/blog/introducing-helix/#5-context-management-a-50-turn-task-that-feels-like-a-5-turn-one" class="hash-link" aria-label="Direct link to 5. Context management: a 50-turn task that feels like a 5-turn one" title="Direct link to 5. Context management: a 50-turn task that feels like a 5-turn one" translate="no">​</a></h2>
<p>The most demoralizing thing about long sessions is that <strong>the deeper progress gets, the worse the experience becomes</strong>.</p>
<p>In the first few turns the AI is fast and accurate. After dozens of turns, the whole session feels heavy — every message reloads the entire history through the model, latency creeps up, quality drifts down, and the token bill starts to hurt.</p>
<p>Helix counters that curve with two mechanisms:</p>
<p><strong>KV Caching.</strong> Large tool outputs — file reads, shell command results, search results — are cached. The agent knows "that history is sitting there, retrieve it when needed" instead of re-sending it verbatim with every model request.</p>
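<p>Reading "KV" here as a key-value store for tool outputs (an assumption; the text does not spell it out), the idea can be sketched like this. Large outputs are stored once under a short key, and only the key goes into the history that is sent to the model. The <code>OutputCache</code> type, the size threshold, and the <code>cached:</code> reference format are all hypothetical.</p>

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// OutputCache stores large tool outputs outside the message history.
type OutputCache struct {
	store     map[string]string
	threshold int // outputs longer than this many bytes are cached
}

// NewOutputCache creates an empty cache with the given size threshold.
func NewOutputCache(threshold int) *OutputCache {
	return &OutputCache{store: make(map[string]string), threshold: threshold}
}

// Put returns the text that should go into the history: the output itself
// when it is small, or a short reference when it is large.
func (c *OutputCache) Put(output string) string {
	if len(output) <= c.threshold {
		return output
	}
	sum := sha256.Sum256([]byte(output))
	ref := "cached:" + hex.EncodeToString(sum[:8])
	c.store[ref] = output
	return ref
}

// Get resolves a reference back to the full output when it is needed.
func (c *OutputCache) Get(ref string) (string, bool) {
	out, ok := c.store[ref]
	return out, ok
}

func main() {
	cache := NewOutputCache(16)
	ref := cache.Put("a long tool output that exceeds the threshold")
	full, ok := cache.Get(ref)
	fmt.Println(len(ref), len(full), ok)
}
```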
<p><strong>Auto-compression.</strong> Once history exceeds a threshold, Helix compresses older sections into concise summaries and slides the active window forward. The agent still understands what happened; the user doesn't pay token cost for the entire transcript.</p>
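<p>A sliding-window compaction step can be sketched as follows. This is an illustration under simplifying assumptions: the threshold is counted in messages rather than tokens, history is a plain slice of strings, and <code>countSummary</code> is a placeholder for whatever actually produces the summary.</p>

```go
package main

import "fmt"

// compact replaces everything except the last keepRecent messages with a
// single summary once the history grows past maxMessages.
func compact(history []string, maxMessages, keepRecent int, summarize func([]string) string) []string {
	if len(history) <= maxMessages || keepRecent >= len(history) {
		return history // under the threshold: nothing to do
	}
	cut := len(history) - keepRecent
	out := make([]string, 0, keepRecent+1)
	out = append(out, summarize(history[:cut])) // older section becomes one entry
	return append(out, history[cut:]...)        // recent window is kept verbatim
}

// countSummary is a placeholder summarizer used only for this example.
func countSummary(old []string) string {
	return fmt.Sprintf("[summary of %d earlier messages]", len(old))
}

func main() {
	history := []string{"m1", "m2", "m3", "m4", "m5", "m6"}
	fmt.Println(compact(history, 4, 2, countSummary))
}
```

<p>Passing the summarizer in as a function keeps the windowing logic testable on its own, separate from how summaries are generated.</p>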
<p>Both are <strong>enabled by default and invisible to the user</strong>. The goal isn't to look technically sophisticated — it's to make a 50-turn complex task <strong>feel as responsive as a 5-turn quick one</strong>.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="6-multi-workspace-local-and-remote-stop-being-two-separate-products">6. Multi-workspace: local and remote stop being two separate products<a href="https://helix.iqe.me/en/blog/introducing-helix/#6-multi-workspace-local-and-remote-stop-being-two-separate-products" class="hash-link" aria-label="Direct link to 6. Multi-workspace: local and remote stop being two separate products" title="Direct link to 6. Multi-workspace: local and remote stop being two separate products" translate="no">​</a></h2>
<p>Many AI tools assume the user works in one place — a local repo, a cloud IDE, or a remote dev container. The moment work crosses that boundary, the experience breaks.</p>
<p>Helix treats a <strong>workspace</strong> as an independent execution boundary.</p>
<p>Within a workspace, the sessions running there, the code those sessions write, the tools they call, the ports they reach — all stay inside that workspace's boundary. The user can have a local repo, a remote dev environment, an ephemeral VM, and a Docker container open at the same time. They're independent, but all driven by the same agent system.</p>
<p>That means:</p>
<ul>
<li class="">"An experimental task on the laptop" and "a real deployment task on the remote machine" can <strong>run at the same time</strong>.</li>
<li class="">Switching workspaces carries session state, model choice, and connection config along — no re-explaining.</li>
<li class="">When something goes wrong, the workspace is a natural isolation unit — one workspace blowing up doesn't affect the others.</li>
</ul>
<blockquote>
<p>This idea extends further in <a class="" href="https://helix.iqe.me/en/blog/helixvm-intro/">HelixVM</a>: when the workspace itself is a virtual machine, the agent's execution boundary becomes a <strong>safety boundary</strong> as well.</p>
</blockquote>
<p>And at the source-control layer, Helix provides <a class="" href="https://helix.iqe.me/en/blog/automatic-worktree/">Automatic Worktree</a>: the agent never touches the user's main branch directly. All changes happen on isolated branches, with code review gating the merge back to main.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="7-the-commitments-helix-makes-to-itself">7. The commitments Helix makes to itself<a href="https://helix.iqe.me/en/blog/introducing-helix/#7-the-commitments-helix-makes-to-itself" class="hash-link" aria-label="Direct link to 7. The commitments Helix makes to itself" title="Direct link to 7. The commitments Helix makes to itself" translate="no">​</a></h2>
<p>By now, the way Helix talks about agents probably feels different from most AI products.</p>
<p>Helix doesn't lean on phrases like "smarter model," "longer context," or "deeper reasoning." The team has seen too many products where the model keeps getting stronger but the user experience barely moves — great demo videos, same old problems in real use.</p>
<p>Helix holds itself to a short, plain list:</p>
<ol>
<li class=""><strong>Not a pretty demo — a real task that actually runs to completion.</strong> A task that "looks done" doesn't count. A commit on main with a green test run does.</li>
<li class=""><strong>Not a smooth conversation — a change that actually lands.</strong> What matters in the end is whether the commit exists in <code>git log</code>, not how nicely the chat read.</li>
<li class=""><strong>Not the quality of a single answer — the reliability of the whole delivery.</strong> Turn 47 should still behave like turn 1.</li>
<li class=""><strong>Not the model doing the work for the user — the agent system doing it.</strong> The model is the engine; the architecture surrounding it is what shapes the experience.</li>
</ol>
<p>If someone just wants an AI that answers questions, there are plenty of options out there.</p>
<p>But Helix is built for people who <strong>actually run engineering tasks through AI</strong>: the kind of 30-turn task that spans five files, needs parallelism, needs visibility, and needs to keep going when something breaks.</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="8-whats-coming-next">8. What's coming next<a href="https://helix.iqe.me/en/blog/introducing-helix/#8-whats-coming-next" class="hash-link" aria-label="Direct link to 8. What's coming next" title="Direct link to 8. What's coming next" translate="no">​</a></h2>
<p>Helix is iterating quickly. On the roadmap:</p>
<ul>
<li class=""><strong>Richer evidence views</strong>: visualizing what each SubAgent did, which files it touched, which commands it ran.</li>
<li class=""><strong>Stronger workflow templates and a skill system</strong>: turning repeatable engineering patterns — migrations, test coverage, logging instrumentation — into reusable "playbooks."</li>
<li class=""><strong>Cross-platform desktop parity</strong>: bringing macOS, Windows, and Linux experiences to the same level.</li>
<li class=""><strong>Team collaboration paths</strong>: letting humans and agents work together on the same task flow.</li>
</ul>
<p>One thing won't change:</p>
<blockquote>
<p><strong>An agent shouldn't be designed as a toy that talks. It should be designed as a teammate that delivers.</strong></p>
</blockquote>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="want-to-try-it">Want to try it?<a href="https://helix.iqe.me/en/blog/introducing-helix/#want-to-try-it" class="hash-link" aria-label="Direct link to Want to try it?" title="Direct link to Want to try it?" translate="no">​</a></h2>
<p>Helix is currently open to beta users.</p>
<p>If the idea that "agents should be designed as an engineering system" resonates, take Helix for a spin on the project you know best, and don't pick an easy task. Pick the one that makes you hesitate to hand it to AI again. That's the case Helix really wants to be tested against.</p>
<ul>
<li class="">🚀 <a class="" href="https://helix.iqe.me/en/app/">Launch the web app</a></li>
<li class="">💾 <a class="" href="https://helix.iqe.me/en/download/">Download the desktop client</a></li>
<li class="">📖 <a class="" href="https://helix.iqe.me/en/docs/">Quick start docs</a></li>
<li class="">🧩 <a class="" href="https://helix.iqe.me/en/blog/manager-mode/">Deep dive: Manager Mode</a></li>
<li class="">🌿 <a class="" href="https://helix.iqe.me/en/blog/automatic-worktree/">Deep dive: Automatic Worktree</a></li>
<li class="">🔒 <a class="" href="https://helix.iqe.me/en/blog/helixvm-intro/">Deep dive: HelixVM safety sandbox</a></li>
</ul>]]></content:encoded>
            <category>announcement</category>
            <category>release</category>
            <category>launch</category>
            <category>multi-agent</category>
            <category>architecture</category>
        </item>
    </channel>
</rss>