Every push to holdthought-web passed through the same deployment gate regardless of what it was changing. A product bug fix and a personal dashboard tweak ran identical CI. The dashboard had no business being there. It had grown there organically, and by the time I noticed the friction it was load-bearing enough that I could not cleanly extract it without a harder decision.
what was living in the product repo
HoldThought is a customer-facing product. holdthought-web is its application layer. At the time of the split, the repo also contained a personal operator dashboard, a session-resume bridge that let me pick up Claude Code sessions from a browser, and a vault portal layer that proxied my private knowledge base.
None of that belongs in a customer product repo. Each piece had been added in a moment of “I’ll clean this up later.” The dashboard got wired to the same Next.js app because it was convenient. The bridge used the same auth middleware because I did not want to build another one. The vault portal shared the same origin because it was already there.
What I had built, without planning to, was a half-product, half-operator chimera. One codebase trying to serve a paying customer and its developer simultaneously, with the same deployment surface, the same test suite, and the same release story.
the fork, not the rewrite
The fix was not to extract the operator layer into a new project from scratch. I forked holdthought-web into a new repo called benagents. That choice matters. A fork inherits the full commit history, the existing service configs, and all the wiring that already worked. A rewrite inherits none of it.
Starting from the fork meant I could strip backward, removing everything product-specific, rather than assemble forward from zero. The product continued deploying from its original tree. benagents became a separate deployment that happened to start from the same point in history and then diverge immediately.
the shape the new deployment took
The operator cockpit lives on a tailnet-only host now. Traffic never reaches it from the public internet. IIS handles the reverse proxy layer, and Caddy sits behind it as the TLS terminator.
The cert chain deserved attention. Certificates are issued through the ACME DNS-01 challenge, answered via Cloudflare DNS, which means no inbound HTTP challenge traffic and no need to open port 80. The cert syncs on a weekly cron baked into the repo. NSSM runs the app as a Windows service that survives reboots without a run-command wrapper holding it together.
Five pieces working together: reverse proxy, TLS terminator, cert issuer, cert sync, service supervisor. Written as a list it sounds like overhead. In practice it is a stable stack that has not needed intervention since the deploy landed. The tailnet boundary is the only real architectural commitment. Everything else follows from standard IIS and Windows patterns with one extra layer of trust enforcement around who can reach the host at all.
the session-resume bridge
The piece I had wanted most, and deferred longest, was a proper operator console. Something that could attach to a running Claude Code session and show context without me switching windows or losing thread history to a tab reload.
The bridge uses WebSocket for the session stream and REST for the control surface. Claude session state persists across page reloads. Navigate away, come back, and the conversation is still there. That sounds minor until the alternative is losing a long agent thread because Chrome decided to discard the tab.
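A minimal sketch of how that kind of persistence can work, assuming the server keeps each thread keyed by a session id and a reconnecting browser asks for whatever it has not seen. The names here (`SessionStore`, `append`, `replaySince`) are hypothetical and not taken from the benagents code, and the store is in-memory only to keep the example short.

```typescript
// Hypothetical sketch: these types and names are illustrative, not the real bridge.
type ChatMessage = { seq: number; role: "user" | "assistant"; text: string };

class SessionStore {
  private sessions = new Map<string, ChatMessage[]>();

  // Append a message to a session and return it with its sequence number.
  append(sessionId: string, role: ChatMessage["role"], text: string): ChatMessage {
    const history = this.sessions.get(sessionId) ?? [];
    const message: ChatMessage = { seq: history.length + 1, role, text };
    history.push(message);
    this.sessions.set(sessionId, history);
    return message;
  }

  // Return every message the client has not seen yet.
  // A freshly reloaded page passes lastSeenSeq = 0 and gets the full thread.
  replaySince(sessionId: string, lastSeenSeq: number): ChatMessage[] {
    const history = this.sessions.get(sessionId) ?? [];
    return history.filter((m) => m.seq > lastSeenSeq);
  }
}
```

In a layout like this, the WebSocket side would call `append` as messages stream in, and the REST side would expose `replaySince` so a reloaded tab can rebuild the conversation before reattaching to the stream.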
I could not have shipped that inside holdthought-web without the bridge bleeding into the product UX layer. Every session-resume request would have hit the same origin as a customer’s session-management call. The split made the boundary obvious before I wrote a single line of bridge code, and the bridge shipped in the same week as the fork.
what came out of holdthought-web
Once benagents was running, I stripped the dashboard, the bridge, and the vault portal layer out of holdthought-web. One less chimera, one cleaner product repo.
I had expected the stripping to feel like loss. It did not. The deployment config got shorter. The environment variable list shed several CLAUDE_* entries that had no business in a product’s env surface. The test suite stopped failing intermittently on session-bridge code that had nothing to do with the features under test. The product got cleaner because the foreign material was gone.
the fixes that only became cheap after the split
After benagents was running independently, I fixed three things I had been accepting as normal because fixing them inside the product repo cost more than living with them.
The root route on the cockpit was wrong. It had been routing to the vault portal index, which made sense when both things shared an origin and made no sense as a standalone operator console. Fixing it took one config change. It had not been fixed before because touching routing inside holdthought-web required coordinating with product concerns.
CLAUDE_BIN was being resolved at runtime via PATH. That worked in my shell and failed silently in the NSSM service context. Baking the absolute path into the service definition, once I owned that definition fully, took ten minutes. I had been living with the ambient ambiguity for months because the service config lived inside a product repo that had other priorities.
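One way to make that failure loud instead of silent is to refuse anything but an absolute path at startup. This is a sketch under assumptions: only the `CLAUDE_BIN` variable name comes from the setup described here, while `resolveClaudeBin`, `isAbsolutePath`, and the injected `fileExists` check are hypothetical helpers.

```typescript
// Hypothetical helper; only the CLAUDE_BIN variable name comes from the post.
type Env = Record<string, string | undefined>;

// True for Windows drive paths (C:\...), UNC paths (\\host\share), and POSIX paths (/usr/...).
function isAbsolutePath(p: string): boolean {
  return /^(?:[A-Za-z]:[\\/]|\\\\|\/)/.test(p);
}

// Resolve the binary from the environment, failing loudly instead of
// falling back to whatever PATH happens to contain in the service context.
function resolveClaudeBin(env: Env, fileExists: (p: string) => boolean): string {
  const bin = env["CLAUDE_BIN"];
  if (!bin) {
    throw new Error("CLAUDE_BIN is not set; refusing to fall back to PATH lookup");
  }
  if (!isAbsolutePath(bin)) {
    throw new Error(`CLAUDE_BIN must be an absolute path, got "${bin}"`);
  }
  if (!fileExists(bin)) {
    throw new Error(`CLAUDE_BIN points at a missing file: ${bin}`);
  }
  return bin;
}
```

In a Node process this could be called as `resolveClaudeBin(process.env, fs.existsSync)`, so a misconfigured service dies at boot with a readable message rather than failing later on the first session spawn.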
The SessionChat component had been built with an infinite-scroll history model that worked fine as a novelty and badly as a daily operator console. Sessions load from the bottom now, chunked, pinned to the newest message. That refactor took an afternoon. It would have taken two weeks of convincing myself it was worth the disruption inside the product’s release cycle.
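The bottom-up chunking can be sketched as a cursor that walks backward through the history. This assumes the messages are held in chronological order in an array; `loadChunk` and the `Chunk` type are hypothetical names, not the actual SessionChat internals.

```typescript
// Hypothetical sketch of newest-first chunked loading.
type Chunk<T> = { items: T[]; nextBefore: number | null };

// Return up to `size` items ending just before index `before`, in chronological order.
// Pass before = messages.length to get the first (newest) chunk.
function loadChunk<T>(messages: readonly T[], before: number, size: number): Chunk<T> {
  const end = Math.max(0, Math.min(before, messages.length));
  const start = Math.max(0, end - size);
  return {
    items: messages.slice(start, end),
    // null means the top of the history has been reached
    nextBefore: start > 0 ? start : null,
  };
}
```

The view renders the first chunk pinned to the newest message, then passes `nextBefore` back in when the operator scrolls up, so older messages are only fetched on demand.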
None of those fixes was complicated. They were deferred because the operator system had no maintenance surface of its own. It was a passenger in the product repo, and any change to it required negotiating with the product’s release concerns.
when the architecture has two masters
The operator tooling for an AI-driven solo business has different availability requirements, a different change cadence, and different access control than any customer-facing product. Conflating their deployment stories produces friction in both directions. Every change to the cockpit gets taxed by the product’s release cycle, and every change to the product is complicated by the cockpit’s presence in the same tree.
The cost of that arrangement was invisible because it was paid in small delays and mental overhead rather than visible failures. Nothing broke dramatically. I just moved slower than I should have, accepted friction I had written off as inherent, and let fixes like the CLAUDE_BIN path ambiguity sit unresolved longer than they deserved.
benagents now deploys on its own schedule, restarts without coordination, and can accept breaking changes to the session bridge without triggering a product review cycle. That last property is what made the SessionChat rewrite cheap enough to actually do. The only person affected by a bad deploy is me, the NSSM service recovers in seconds, and the change does not need to be explained to anyone.
Internal tooling stops being fragile the moment you stop pretending it belongs inside the same repo and deployment story as the product.