{
  "version": "https://jsonfeed.org/version/1.1",
  "title": "DxDev",
  "home_page_url": "https://dxdev.com/",
  "feed_url": "https://dxdev.com/feed.json",
  "description": "DxDev is an AI operations consultancy that designs, builds, and runs production multi-agent systems for small and mid-sized teams. We build the agents that run our own business, and we can build them for yours.",
  "language": "en-US",
  "authors": [
    {
      "name": "DxDev",
      "url": "https://dxdev.com/"
    }
  ],
  "items": [
    {
      "id": "https://dxdev.com/blog/2026-05-26_five-firewall-rules-blind-spot/",
      "url": "https://dxdev.com/blog/2026-05-26_five-firewall-rules-blind-spot/",
      "title": "Five firewall rules beat the proxy swarm, but the real bug was my blind spot",
      "summary": "My slowness detector only watched one IIS site. The attack was on a different one.",
      "content_text": "When CPU pinned at 100% and stayed there for thirty minutes, I ran the log query I always run. It showed mixed load across customer vanity domains, nothing that screamed coordinated attack, because that query only reads one IIS site and that site was not where the attack was landing.\n\n## The attack was real, and it was boring\n\nThe operator was a German proxy reseller. Five /19 CIDR ranges across non-adjacent address space, roughly 100 to 200 unique IPs per range, each contributing 50 to 60 hits against the same URL structure. The traffic pattern was `/teams/?u=<TEAM>&s=<sport>` roster enumeration: rotate source IPs to dodge per-IP rate limiting, never send a Referer header, spoof a Chrome 134/135 User-Agent with no Sec-CH-UA field.\n\nThree signals identify a distributed scraper: asset mix (are requests hitting static files at a realistic ratio, or drilling one endpoint?), UA homogeneity (does the user-agent distribution look like a real browser population?), and Referer presence (real organic navigation almost always carries a Referer after the first hit). This operator failed all three: pure-ASP requests with zero static files, Chrome 134/135 across every IP, Referer absent on nearly every hit. Five firewall rules at the TCP layer, blocking at the /19 level rather than per-IP, dropped CPU to 13% and open connections from 490 to 146.\n\nOnce I was reading the right log file, the fix was straightforward. Getting to the right log file was the hard part.\n\n## The detector had a blind spot\n\nMy `/hto-slow` skill surfaces a pre-captured incident bundle when I open a slowness investigation. The bundle pulls from `Watch-Site.ps1`, which tails IIS logs to surface slow requests and top-offending IP ranges, and I had relied on it for months.\n\n`Watch-Site.ps1` only tails one IIS log directory. 
There are three IIS sites on the box: one serves customer vanity domains, one handles direct `www.hometeamsonline.com` traffic, and one serves `media.hometeamsonline.com`. The script watched one of them, and I had never audited its coverage.\n\nWhen the investigation started, the bundle's `iis_slow.txt` showed mixed URL fan-out across customers on the customer-vanity-domain site, which looked like distributed volume load rather than a targeted swarm. The agent read it the same way, and that read was accurate: the attack was not touching that site at all.\n\n## The question that opened it\n\nFrom my phone, mid-SSH-session, I sent: \"are you checking both versions of the logs.\"\n\nThe investigation pivoted immediately. We pulled the direct-traffic site logs.\n\nThat log had 49,112 slow requests in it. The entire load was concentrated on the direct-traffic site. The customer-vanity-domain bundle had looked relatively benign compared to a normal traffic day because the attack was not touching it at all. The proxy reseller IPs were targeting `www.hometeamsonline.com` directly, not the vanity domains, and the vanity-domain log watcher had been faithfully telling me nothing was wrong.\n\nFive firewall rules later, the attack was gone. The monitoring gap was still there.\n\n## What the process looks like after the fire is out\n\nThe traffic fix is done. The practice work is what comes next.\n\nI hardened the `/hto-slow` skill with a CRITICAL warning that names all three sites and their log directories, placed at the top of the multi-site sweep step so the next investigation cannot treat the customer-vanity-domain bundle as comprehensive.\n\nI added two persistent memory entries. The first captures the multi-site sweep requirement, with descriptions of each site and the note about `Watch-Site.ps1`'s single-site scope. The second captures the proxy reseller's fingerprint: known ranges, URL pattern, UA signature, and the firewall naming convention. 
Memory entries are for agents and code changes are for the system, but both matter because entries can drift when the context resets.\n\nI logged two follow-ups. The first is the structural fix: rewrite `Watch-Site.ps1` to sweep all three sites and aggregate before producing a verdict. The second moves those IP ranges into `dbo.IPranges` so the application-layer filter can see them. Both exist in writing now so the work survives a context reset.\n\n## Incident response is not done when traffic drops\n\nWhen CPU dropped to 13%, the site was fast again. I could have filed a one-line note and moved on.\n\nWhat the traffic drop exposed was a different question: why had this attack been invisible until someone on a phone asked about the second log? The answer was `Watch-Site.ps1`, pointed at one of three sites, trusted for months without ever being audited for coverage.\n\nThe 49,112 slow requests in the direct-traffic site logs were there the whole time. `Watch-Site.ps1` just was not reading that file.",
      "date_published": "2026-05-26T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "security",
        "monitoring",
        "iis",
        "bots",
        "incident-response",
        "hto"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-05-23_bot-swarm-one-bad-query-plan/",
      "url": "https://dxdev.com/blog/2026-05-23_bot-swarm-one-bad-query-plan/",
      "title": "It looked like a bot swarm. It was one bad query plan.",
      "summary": "Watch-Site fired on a 9 PM CPU spike. Everything pointed to hostile traffic. It was one cached plan in IPFilterLog doing clustered scans on every request, and one OPTION (RECOMPILE) settled it.",
      "content_text": "At 9 PM the site went sideways: CPU 89%, `SOS_SCHEDULER_YIELD` topping the SQL wait stats, slow-request count climbing. Forty minutes in I was mentally drafting a blog post about a second bot swarm from a new operator, already assembling evidence to confirm a story I had not yet questioned.\n\nThere was no swarm. One cached query plan in `IPFilterLog` had been running a clustered scan on every request for hours, and I almost did not look for it because I was busy assembling the wrong narrative.\n\n## The first story was already pulling\n\nWatch-Site caught it within a minute, and the headline counters matched a swarm: no blocking, no memory pressure, no disk wait, just CPU contention under load. Three days earlier I had spent a morning blocking five `/19` ranges from a German proxy reseller that produced exactly this signature, and the agent session was still warm from that incident.\n\nI opened a Claude Code session, dropped in the new incident bundle, and asked for a read. The response landed on traffic-driven load pattern, possibly a new swarm, recommend checking W3SVC2 and W3SVC9 logs across all three IIS sites. Right approach, wrong starting assumption. The prior session was still in context, so Claude had seen the proxy reseller incident, the same wait signature, the same CPU-to-firewall-rule pipeline, and it reasoned forward from that pattern before checking whether the pattern actually fit. I would have made the same call without a deliberate gate forcing the distribution check first.\n\nI pulled the W3SVC2 and W3SVC9 logs for the past hour and grouped source IPs by `/16` netblock. The distribution was normal: no block dominating the count, no roster-enumeration URL pattern, no UA homogeneity. Just regular traffic at elevated volume, the kind of spike you get when something slow becomes popular.\n\nNot a swarm. The question shifted. 
Instead of who is attacking us, it became why is the database burning CPU on normal traffic.\n\n## What IPFilterLog was doing\n\nEvery request on the site runs a bot-filter check against `IPFilterLog`: a single SELECT to see whether the source IP is blocked or flagged. Simple lookup, costs nothing noticeable at normal traffic volumes. Scale it to several hundred requests per second and the plan it uses starts to matter.\n\nI pulled `sys.dm_exec_cached_plans` joined against `sys.dm_exec_query_stats` and sorted by total worker time. One statement appeared twice in the plan cache under two different plan handles, same text, two very different execution profiles. The fast plan averaged a fraction of a millisecond with a handful of logical reads per execution. The slow plan averaged orders of magnitude more, thousands of logical reads, a clustered index scan. The slow plan was winning the majority of executions.\n\nThis is parameter sniffing. When SQL Server first compiled the statement, it built a plan optimized for the specific parameter value in use at compile time. That value matched a large number of rows in `IPFilterLog`, so the optimizer chose a clustered index scan: efficient for large result sets, catastrophic when applied to IPs that match nothing at all. The plan got cached. Every subsequent IP lookup, including first-time IPs with zero filter history, was running a full clustered scan instead of a seek that would terminate in microseconds.\n\nThe box was not under attack. It was paying the cost of the wrong plan on every single request at scale.\n\n## The one-line fix\n\nThe immediate fix for parameter sniffing, when you can't afford to wait for natural plan eviction, is `OPTION (RECOMPILE)`. Force a fresh compilation on each execution using the actual runtime parameters. 
More CPU per individual execution, but a correct plan instead of a catastrophic one.\n\nI opened SSMS, found the `IPFilterLog` lookup in the data access layer, added `OPTION (RECOMPILE)`, pushed the change. Hotfix `3.350.16`, tagged on master, cherry-picked to develop.\n\nThe post-deploy sample showed roughly double the request rate at half the CPU. The slow-request counter for the trailing 60 seconds came back to baseline within a minute.\n\n## The narrative gravity problem\n\nHere is the part that bothers me. I knew the checklist for \"is this a swarm\" before I started the investigation. I had written it down explicitly after the German proxy reseller incident three days earlier: asset mix, UA homogeneity, Referer presence, traffic distribution. Those are the signals. CPU load and wait type are not.\n\nThe checklist was in agent memory. It was in a feedback file I wrote myself. I did not run it until forty minutes in, when I caught myself drafting the wrong blog post and asked Claude to check distribution instead of IPs.\n\nClaude made the same error for the same reason. The agent is not immune to narrative gravity when the prior session is still in context and the surface signals match. Neither am I, even with the explicit countermeasure already written down and sitting in memory. The failure mode worth naming is not \"I made a mistake.\" It is \"I had a narrative that fit the evidence well enough that I stopped generating alternatives, and the checklist that exists specifically to prevent this did not fire on its own.\"\n\nThe checklist needs a trigger, not just a presence. Memory that does not surface itself at the moment of decision is decoration.\n\n## The open question I am carrying\n\n`OPTION (RECOMPILE)` on a statement running thousands of times per minute means thousands of recompiles per minute. On modern hardware the per-compile overhead is small but not zero, and it is not the right long-term answer. 
The correct fix is probably a plan guide or a targeted filtered index that lets the optimizer choose seek vs. scan based on actual row distribution rather than compile-time assumptions. I have not built that yet. HQ-90 carries the follow-up; the recompile hint is a one-line hold so the incident does not recur while I figure out what the right index shape is.\n\nI am labeling it as a temporary fix on a list of temporary fixes, which is fine. The site is running. The right answer does not need to ship today for the current fix to be correct.\n\nWhat does need to ship is the trigger problem. A checklist that lives in memory but only fires when I remember to invoke it has the same shape as the `IPFilterLog` plan that picked itself once and then ran for hours: correct on paper, wrong in production, invisible until somebody pulls the cache and looks.",
      "date_published": "2026-05-23T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "sql-server",
        "parameter-sniffing",
        "incident-response",
        "asp-classic",
        "iis",
        "performance"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-05-22_314-stats-pages-one-bad-cursor/",
      "url": "https://dxdev.com/blog/2026-05-22_314-stats-pages-one-bad-cursor/",
      "title": "314 broken stats pages and one bad cursor",
      "summary": "Stats pages were broken across 314 sites. The cause was a DataTable wrapper opening keyset ADO cursors over aggregate queries. The data caught up.",
      "content_text": "Team stats pages were rendering empty across 314 customer sites, one shared codepath routing every affected surface through the same helper. The count came out of an A/B I ran the next day on a separate ticket: every site I sampled was broken on master and clean on the develop branch carrying the fix.\n\nThe first instinct was to look at the SQL. The queries used `GROUP BY` across a large activity table, and the timeouts looked like a query-performance problem. I spent about an hour down that path before a better symptom pointed somewhere else: the failure was identical on every page, with no per-site variance, which is not what you see when a query plan goes bad under load. It is what you see when a shared wrapper is doing something the queries never agreed to.\n\n## What the DataTable wrapper was actually doing\n\nHomeTeamsONLINE has a helper called `DataTable` that most stats pages use to fetch recordsets. It is old enough that I cannot tell you exactly when it was written. It opens an ADO `Recordset` against the connection with `adOpenKeyset` and `adLockReadOnly`.\n\nKeyset cursors are a relic of a time when server-side result iteration made sense. The driver opens a cursor on the server, holds it open, and fetches rows on demand. For a simple `SELECT` returning a few hundred rows from a single table, this works fine: slightly wasteful, but functional.\n\nFor a `GROUP BY` aggregate that needs to scan millions of activity rows and hash them into counts before returning anything, it is not fine. The driver attempts to open the cursor before the query has produced its first row, and past a few seconds of wait the ADO layer blows up and returns no data at all. The page gets an empty recordset and renders nothing.\n\nThe stats pages had not always been broken. They worked at smaller data volumes. 
At the volume the site operates now, the aggregate queries take long enough that the cursor contract fails every time.\n\n## Migrating from cursor semantics to SQL.get and SQL.top\n\nTwo newer helpers in the codebase do not carry the keyset baggage. `SQL.get` executes a query and returns a disconnected array; `SQL.top` is the same thing with a row limit. Both open `adOpenForwardOnly`, fetch everything into memory, and close the connection. No server-side cursor state, no timeout from the driver waiting on a half-finished aggregate.\n\nRewriting `DataTable` calls to `SQL.get` calls sounds mechanical. In practice it required touching consumer code in several places, because `DataTable` returns an ADO `Recordset` object and `SQL.get` returns a plain array. Pages that called `.Fields(\"column_name\")` on the result had to be rewritten to array-index access, and pages that used `.MoveNext` loops had to be rewritten to `For` loops.\n\nNone of that is conceptually hard, but one missed reference produces a runtime error on a page that used to work. You have to read every consumer carefully and test each one.\n\n## The ninety-second timeout is a labeled band-aid\n\nThe aggregate query itself is still slow. `SQL.get` does not magically make a `GROUP BY` across millions of rows fast. It makes the failure mode survivable: the page waits, then renders, instead of failing immediately with an empty result.\n\nI added a 90-second command timeout on the affected queries as an explicit band-aid. The number is not derived from profiling. It is large enough that the query usually finishes and small enough that a runaway query does not hold a connection open forever. I labeled it in a comment with the date and the follow-up ticket that owns the real work: ITEM-7006, which is scoped around either a covering index on the activity table or a pre-aggregated summary table the stats pages read instead of computing on demand. 
Until that ships, the 90-second number is the bridge, and it is in the code as a bridge, not as a setting.\n\n## Why fixing the abstraction beat fixing the pages\n\nSome bugs are fixed at the call site. This one wasn't, and the 314-surface number tells you why. Starting from \"this specific stats page is slow\" would have led to profiling one query, maybe adding one index, and moving on. The other 313 surfaces would still be broken. The fact that all of them trace to the same `DataTable` codepath is the information that tells you to fix the codepath and let the fix propagate.\n\nThe reason I ended up at the right place is that I looked at the error pattern before I looked at any individual query. Every affected page failed with the same symptom, and that symmetry is the clue. Once I had it, finding `DataTable` and reading how it opened its cursor took about ten minutes.\n\n## What the data volume collected\n\nThe `DataTable` function had been in the codebase for years, doing its job quietly enough that nobody had reason to revisit the cursor mode it chose at the start. It became invisible the way working infrastructure becomes invisible.\n\nIt stopped working not because it changed, but because the data volume around it changed. The keyset cursor contract that holds at 50,000 rows does not hold at 5,000,000 rows. The assumption baked into `adOpenKeyset` was always conditional on the query returning its first row before the ADO layer gave up waiting, and the activity table eventually grew past the point where any aggregate over it could meet that condition.\n\nA `DataTable` wrapper whose cursor contract assumes the query returns its first row quickly is deferred outage debt. The volume collected on it on May 22, all 314 surfaces in the same pass.",
      "date_published": "2026-05-22T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "asp-classic",
        "ado",
        "database",
        "abstraction",
        "data-access",
        "solo-founder"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-05-21_exit-survey-overwrote-live-mailer/",
      "url": "https://dxdev.com/blog/2026-05-21_exit-survey-overwrote-live-mailer/",
      "title": "The exit survey that overwrote a live mailer",
      "summary": "A new exit-feedback page shipped cleanly. The provisioning script had already silently clobbered a Staff Reminders mailer row that had been live for months.",
      "content_text": "Trial ID 14 was already taken. I did not know that until I had already overwritten it.\n\nThe feature I was shipping was an exit-feedback page for expiring trial teams. Instead of routing them to a Google Form on expiration, I wanted an in-house page: a short survey, a `FeedbackResponses` table, an admin report, and a deep-linkable preview tool for reviewing individual submissions. The page went up, the expiration emails pointed to it, the admin tooling rendered cleanly. The feature was done.\n\nWhat I did not realize until later was that the provisioning step I had run earlier in the day had quietly overwritten a completely different production row.\n\n## The feature itself was clean\n\nThe public-facing piece is simple. An expiring trial team gets an email with a link to `/exitfeedback.asp?token=<token>`. They fill out a short form. The submission lands in `FeedbackResponses`. An admin page at `/adminHTO/exitfeedback-admin.asp` lists submissions and deep-links to individual previews.\n\nClaude did most of the scaffolding under direction, and nothing about the page itself was surprising. The provisioning step is what bit me.\n\n## The assumption baked into the provisioning script\n\nTo wire the expiration email, I needed a row in the mailer configuration table. That table has an identity column. Claude drafted a provisioning script that used `SET IDENTITY_INSERT` to pin the row to ID 14.\n\nThe reasoning looked sensible. The script checked whether a row with ID 14 already existed; if it did, it ran an `UPDATE`, and if it did not, it ran an `INSERT`. An idempotent pattern, safe to re-run and free of duplicates.\n\nThe flaw was that the check answered the wrong question. It answered \"does a row with ID 14 exist?\" not \"does this row belong to the mailer I am trying to provision?\" On prod, a row with ID 14 had been there for months. 
It was the Staff Reminders mailer, the one that sends staff their weekly digest.\n\nThe script saw an existing row, ran the `UPDATE`, and replaced the Staff Reminders configuration with the exit-feedback configuration. It raised no error and no warning. The idempotent pattern had done exactly what it was designed to do, just on the wrong row.\n\n## How close it came\n\nI found out because I checked the mailer table after the provisioning run and noticed the row name did not match. It said something about exit feedback where it should have said Staff Reminders.\n\nI pulled the backup. I found the original Staff Reminders row there and used those values to reconstruct it.\n\nThe Staff Reminders mailer runs on a weekly schedule, and when I found the corruption, the next send was hours away. Every staff member on the list would have received an exit-feedback campaign in its place, sent from the same infrastructure with no alert firing. The mailer machinery has no concept of what a row is supposed to say. The exit-feedback feature would have appeared fine from every dashboard, staff would have gotten survey emails instead of their weekly digest, and anyone chasing the complaint would have had no obvious path back to that morning's provisioning run.\n\nI caught it before that send, but only because I checked the table as a matter of habit, not because I suspected anything.\n\n## What the check should have been asking\n\nAny script that provisions a row by identity should verify the row's content, not just its existence. Before overwriting, read the name, the marker, whatever field distinguishes \"this is the row I own\" from \"this is some other row.\" If the check fails, halt and surface the discrepancy.\n\nThe script I landed on reads `WHERE designNotes LIKE '%exit-feedback-mailer%'`. If a row with that marker exists, update that row. If it does not, insert without specifying an ID and let the database assign one. 
The numeric ID becomes an artifact of insertion order, not a hardcoded assumption about what is free.\n\n\"Row ID 14 exists\" and \"row ID 14 is mine\" are not the same question. I know that now because I had to restore the Staff Reminders mailer from the backup and rebuild the provisioning script around a `designNotes` marker that the original script never bothered to check.",
      "date_published": "2026-05-21T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "identity-insert",
        "provisioning",
        "asp-classic",
        "solo-founder",
        "database",
        "incident"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-05-12_release-said-live/",
      "url": "https://dxdev.com/blog/2026-05-12_release-said-live/",
      "title": "The release said LIVE. The code was not.",
      "summary": "ITEM-6934 was marked LIVE in JIRA with Fix Version stamped. The merge to master never happened. On workflows that accept a false final state and call it done.",
      "content_text": "While cutting the next release, `git log master..hotfix/3.348.12` returned commits on a ticket JIRA called shipped. ITEM-6934 was STAGING/LIVE, Fix Version 3.348.12 stamped, transition 101 already fired. The final merge to master had never run.\n\n## How the wrong state looked official\n\nThe HTO Hotfix Release Flow ends in a merge to master, a tag push, and Bamboo deploying on that push. Transition 101, which moves a ticket to STAGING/LIVE, belongs at that final step, not before.\n\nWhat happened on 3.348.12 was that the agent working the ticket fired transition 101 at the develop merge, one step early. It identified the develop merge as a milestone and fired as instructed, which was correct by the spec it had. The spec did not say: only fire this after master is also merged. The transition accepted the input, JIRA updated, and the agent moved on.\n\nSo JIRA showed the ticket as LIVE because a transition had fired. The fix version was stamped because the transition payload included it. Every JIRA field that tracks doneness agreed: the ticket was done. The repo said otherwise, and nothing was checking the repo.\n\nThe process produced confident-looking state without grounding it in the artifact that actually mattered.\n\n## Running the release after the fact\n\nI had to run the real flow for 3.348.12 after discovering the drift. `hotfix/3.348.12` still existed with the right commits, so the merge-to-master step was one command, Bamboo deployed on push, same as it would have originally.\n\nFrom a user perspective nothing broke, since the code was already in develop and had been through staging. The master lag was a process gap, not a functional one.\n\nBut that framing obscures something. The gap was real enough that `git log master..hotfix/3.348.12` returned unmerged commits on a ticket JIRA called shipped. Any automation reading JIRA status to make downstream decisions would have been working off a lie. 
Any audit treating STAGING/LIVE as \"merged to master\" would have been wrong. I had one case of each in my own tooling, and I had to manually reconcile both after finding the drift.\n\n## What I added to close the gap\n\nI updated `ben_hq/jira/item_close.py`, the orchestrator that drives JIRA transitions when a ticket closes. It now checks `git log origin/master..hotfix/<version>` before allowing a LIVE transition on any hotfix ticket. If the hotfix branch has commits that master does not, the script halts with an explicit error. The close fails closed: no ticket reaches LIVE status while the repo says the merge is pending.\n\nBeyond that guard, I added a PreToolUse hook in Claude Code's settings to block any agent-initiated call that would trigger transition 101 outside of `item_close.py`. An agent working on a hotfix ticket cannot fire STAGING/LIVE as a side effect of closing a subtask or resolving a field. The transition has to come through the orchestrator, where the repo check lives.\n\nI also added paired-numbering comments to the workflow documentation. The Hotfix Release Flow doc now ties each step to its expected JIRA state change: develop merge to transition 101, master merge to the final close fields, tag push to Fix Version. If someone reads the doc after a partial run, the pairing shows which repo step corresponds to which JIRA state.\n\n## The pattern behind the failure\n\nI had a mental model that said: if JIRA says a thing is done, that is a reliable signal it is done. That model was built on successful releases where JIRA state and repo state happened to agree, not on any structural guarantee they would agree.\n\nProcess debt, in this shape, is a workflow that accepts a false final state. Transition 101 fired because firing it was valid by the spec at that moment. The spec was incomplete, but nothing in the process surfaced that. 
A workflow step that changes ticket status should be structurally unable to run without first verifying the repo state it is supposed to reflect.\n\nThe PreToolUse hook and the `item_close.py` guard do not make the process smarter. They remove the path where the wrong state can be produced at all, which is a different thing than documentation or runbooks that rely on correct execution at runtime.\n\n## The open question is not really open\n\nThe skeleton I was working from asked: what other status transitions still rely on trust instead of repository-verified checks? For HTO, the answer is at least one more. The CODING transition, which fires when a branch is cut and work starts, has no assertion that the branch actually exists on the remote. An agent can fire it before pushing. I know this because I can read `item_close.py` and see what it asserts and what it does not.\n\nThere is a ticket open for it, and the fix will follow the same pattern: one more assertion in `item_close.py` before the CODING transition fires.\n\nBefore the guard landed in `item_close.py`, `git log master..hotfix/3.348.12` could return commits on any ticket JIRA called LIVE. Every STAGING/LIVE status stamped by transition 101 was a claim about the repo that nobody in the pipeline had verified.",
      "date_published": "2026-05-12T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "release-process",
        "hotfix-workflow",
        "jira",
        "guardrails",
        "solo-founder",
        "process-debt"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-05-11_hard-drive-bottleneck/",
      "url": "https://dxdev.com/blog/2026-05-11_hard-drive-bottleneck/",
      "title": "My multi-agent bottleneck was a hard drive, not the models",
      "summary": "For three months I blamed model overhead for machine slowness. The actual cause was a misplaced pagefile and a 7200-RPM project tree.",
      "content_text": "For three months I was running multi-agent Claude Code sessions with the pagefile on the slowest disk in the machine. I found out the way you find out most things like this: by finally running `Get-PageFileSetting` instead of just living with the accumulation.\n\nThe machine is a Windows desktop with one SSD and one HDD. The SSD is `C:`. The HDD is `D:`. Three years ago I put `D:` to work as the project root: all four HTO clones, the vault, ben-hq, and every auxiliary repo I touch on a regular day. I also pinned the Windows pagefile to `D:`. My reasoning at the time was disk-pressure management on `C:`. The reasoning was wrong, but not immediately costly enough to notice.\n\nWhen I started running parallel agent sessions in January, that layout started to matter. Each week the machine felt a little less responsive, and I kept attributing it to the work: longer context windows, bigger diffs, more files in play. That story was plausible, which is why I kept accepting it.\n\n## The actual bottleneck\n\nWith three or four Claude Code instances doing concurrent file reads and writes across the project tree, the machine locked up. Not crashed, just locked, every disk read queued behind every other disk read. A 7200-RPM drive is rated around 120 MB/s sequential, but under the random-read pattern of parallel agents jumping through a 40k-file tree, the realistic throughput per agent felt like a few MB/s, not 30. Add a pagefile on the same spindle competing for head time, and you've designed a ceiling.\n\n`Get-PhysicalDisk` came back: one SSD, one HDD, the HDD on `D:`. `Get-PageFileSetting` returned `D:\\pagefile.sys`, 2048-4096 MB. Both swap pressure and code access were racing for the slowest disk in the machine.\n\n## The fix: G: and an env-var convention\n\nI have a second SSD in this machine, `G:`, used mostly for games and scratch. 
I moved it into production use.\n\nHQ-42 became the plan: make `G:\\Projects\\` the canonical root, migrate all active repos from `D:\\Projects\\`, and introduce a `BEN_PROJECTS` environment variable so tools stop hardcoding paths that assume one machine layout. Any script, hook, or skill that previously hardcoded a projects path now reads `$env:BEN_PROJECTS` instead. That convention makes the layout portable across machines, and it means the next new drive is a one-variable change rather than a hunt through 14 config files.\n\nThe pagefile went back to `C:`. The `D:` drive became media storage.\n\nThe migration itself was a list-and-move operation: `git worktree list` per clone to catch anything attached, `robocopy` per repo, verify the new path is clean, update `.gitconfig` `safe.directory` entries for the cross-user runners, update session manifests. The `BEN_PROJECTS` convention required one pass through hook configs in `~/.claude/settings.json` and two spots in `pyproject.toml` that had hardcoded paths.\n\n## The unglamorous part of AI-native work\n\nI expected the bottleneck to be something interesting: context window saturation, rate limits, a provider that queues aggressively. A spinner disk is not interesting. It is a $60 SSD swap and a pagefile setting.\n\nWhat bothers me in retrospect is not that the answer was physical, but that the wrong explanation was convincing for so long. \"Heavier workloads are slower\" is self-evidently true, which makes it a good hiding place for a different cause. The pagefile setting I applied three years ago, for reasons that seemed sensible at the time, became a throughput cap the moment the access pattern changed. The `D:\\pagefile.sys` entry is gone. The sessions run clean. I had spent three months blaming the models.",
      "date_published": "2026-05-11T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "performance",
        "multi-agent",
        "local-infra",
        "workstation",
        "storage",
        "solo-founder"
      ]
    },
    {
      "id": "https://dxdev.com/blog/ai-peer-agents/",
      "url": "https://dxdev.com/blog/ai-peer-agents/",
      "title": "How I Run AI Peer-Agents on Each Other",
      "summary": "An architecture-review AI claimed I was avoiding real work. I asked it to audit its own claim against my actual logs. It retracted, with data.",
      "content_text": "Manus told me I was being escapist. I told it to prove it.\n\nThat is the short version. The longer version is that I treat my AI agents as peers, not subordinates. They do not just execute tasks. They audit each other, challenge assumptions, and sometimes tell me I am wrong. And sometimes, they prove each other wrong.\n\n## The claim that triggered the audit\n\nDuring a recent architectural review, Manus flagged what it called an inverted infrastructure-to-product ratio. The argument was that I was spending all my time building the agent system instead of doing primary product work. The word it used was \"escapist.\" It sounded plausible. Over-engineering your own tools is a real trap for solo developers.\n\nBut I do not accept assertions without data. I have a complete, time-stamped work log of every session I run. So I fired off a parallel task. I asked Manus to audit its own claim by parsing my actual work logs and session manifests for the last five weeks. Same agent, same corpus, different question.\n\n## What the data actually showed\n\nThe results were definitive. In early April, meta-work accounted for roughly 20 to 38 percent of my logged time. By mid-to-late April, that number had dropped to near zero. Primary product work was running at 80 to 100 percent of logged sessions. The ratio was not getting worse. It was getting significantly better.\n\nFaced with the numbers, Manus retracted the claim entirely. The response explicitly stated that the prior assertion was factually incorrect. It was a clean retraction based on hard evidence, not a hedged walk-back.\n\nThis is the dynamic I am trying to build. Not an agent that tells me what I want to hear, and not an agent I have to argue with manually. An agent that can be handed the same data and asked to check its own work.\n\n## Why the vault makes this possible\n\nThe system works because the vault is the shared source of truth. The agents do not have to guess what I am doing. 
They can read the logs directly. When an agent makes a strong claim, another agent can verify it against the same corpus. It is a self-correcting loop, and it runs without me in the middle.\n\nThis only works if you give agents access to your operational data. Not just code files and documentation, but the actual record of how you spend your time. Session manifests, work logs, decision notes. When that data lives in a structured, version-controlled vault, any agent with repo access can run an audit at any time.\n\nIf I had treated Manus as a simple oracle, I might have accepted the \"escapist\" narrative and changed my priorities based on a hallucinated vibe. Instead, I treated it as a peer that could be challenged, and the challenge produced a concrete, data-backed correction.\n\n## You do not get this from a chat box\n\nYou cannot run this kind of peer audit through a standard chat interface. The context window gets muddy. The agent cannot see the full log history. And there is no mechanism to say \"check your own prior claim against the source data.\"\n\nThe file exchange pattern is what makes it work. Both my local Claude Code instance and Manus have commit access to the same repository. When I want an audit, I write a prompt file that points to the specific logs, push it, and trigger the agent. The response lands as a commit. The whole exchange is diffable.\n\nYou do not get this dynamic if you only use AI for code generation or text summarization. You have to give agents access to your operational data and the authority to analyze it. When you do, they stop being just tools. They keep you honest, and they keep each other honest.\n\nThe \"escapist\" claim was wrong. I have the git log to prove it.",
      "date_published": "2026-05-11T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "multi-agent",
        "peer-review",
        "agent-systems",
        "founder-ops"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-05-08_ai-history-queryable/",
      "url": "https://dxdev.com/blog/2026-05-08_ai-history-queryable/",
      "title": "1,586 Claude Code sidecars, and the calendar page mattered more than the search index",
      "summary": "Building a memory layer on a year of AI conversation sidecars: the enricher, the theme aggregator that exposed a 14% slice I was rounding to zero, the nightly cron, and what shipped unfinished.",
      "content_text": "I have 1,586 Claude Code conversation sidecars sitting under `vault/04_Sessions/`, organized by year, month, and session ID. When I wanted to know what I worked on three weeks ago, I opened the folder and stared at filenames. The plan was to fix that with a search index. The calendar pages changed my mind.\n\n## The enricher came before the index\n\nClaude Code writes a sidecar for every session: a JSON blob with the conversation turns, metadata, timestamps, and whatever context the session started with. By the time I started the archive work, I had 1,586 of them spanning roughly a year.\n\nMy first instinct was a search index. Search indexes assume clean, consistent data, and the sidecars were neither. Different versions of my session hook wrote different schema shapes. Some sessions had no area classification. Some had it buried under a field called `workspace` that was later renamed `area`. Some had malformed JSON in the `summary` field because the sidecar writer hadn't sanitized Claude's output before serializing. Index that without normalizing, query for \"HTO work in April,\" get 23 results instead of 61, and never know you missed 38.\n\nSo the enricher's job was to normalize: extract the real timestamp, resolve the area from whichever field it landed in for that schema version, pull the themes and ticket references, write a structured row into a SQLite database with a schema I controlled. One table, one schema, every session mapped to one row.\n\nThe enricher took three passes to get right. The first pass produced 847 rows with no error output, silently failing on 715 sidecars because a helper function returned `None` instead of an empty string when the session had no area, and the column was `NOT NULL`. The transaction rolled back, the error surfaced nowhere visible, and I only noticed because the row count was wrong. 
Silent partial success is the worst failure mode for a batch enricher, because you can spend a week building queries against a corpus that's secretly missing 45% of its rows.\n\n## What the aggregator surfaced\n\nOnce the enricher was stable, I built a theme aggregator: a query that finds which topics recur across sessions in a date range. Each session generates a theme list during enrichment. In isolation each list is noise. Aggregated across a week, the frequency tells you what you were actually working on, not what you planned to work on.\n\nWhen I ran it over May, one slice came back at 14% of my sessions: HTO browser automation. I had been mentally filing that work as cleanup, the small tickets I close between real ones. Eight sessions. For a solo founder running roughly 20 sessions a week, 14% of a month on something I'd written off as a footnote is not a footnote. The aggregator counted what my self-perception was rounding down. The specific number made the rounding visible in a way that a vague sense of \"I work on browser stuff sometimes\" never would.\n\n## Calendar pages mattered more than search\n\nThe calendar day composer was planned as a secondary feature, after search was solid. It became the primary one.\n\nThe composer is a script that runs nightly, takes all the enriched sessions for a given calendar day, and writes a narrative markdown page: what was worked on, which tickets moved, what themes dominated, what the day's arc was. One file per day at `vault/04_Sessions/<year>/<month>/<day>/SUMMARY.md`.\n\nSearch answers \"find me sessions about X.\" A calendar page answers \"what was happening on Tuesday,\" which is what I actually need most, usually when I'm reconstructing context for a follow-up or writing a JIRA comment about work from last week. The day page is a narrative with a through-line, not a query result. Tuesday: the history pipeline enricher was stuck on 2,635 rows, I ran three subagent passes, the queue drained to 38 by end of day. 
That reads like something I can orient from. A list of nine session IDs with cosine similarity scores does not.\n\nThe first composer drafts were accurate and flat: \"Sessions in the HTO area totaled 6. Primary tickets referenced: ITEM-7003, ITEM-7005, ITEM-7009.\" Correct, useless. The fix was prompting for the arc first: what was the day's main through-line, where did it start, where did it end, what shifted by the end. The second drafts were better.\n\n## The nightly cron\n\nThe cron runs at 2 AM as a Windows scheduled task running `ben_hq/cron/refresh_history_daily.ps1`, not a Claude CronCreate job. CronCreate state is ephemeral per-session; a Windows task survives reboots and shows up in Task Scheduler. The log goes to `.scratch/cron-refresh.log`.\n\nBefore the cron, the calendar pages were a one-time backfill I ran manually. After the cron, they grow on their own. A memory system that depends on someone remembering to run it will not stay current.\n\nThe calendar rail in vault's sidebar now has a link for every day in the archive. Days with a summary show the theme cloud. Days without show the raw session count. I can scan a month of work in 30 seconds.\n\n## What shipped unfinished\n\nThe JIRA wiring is incomplete. The enricher extracts ticket IDs from session context but doesn't push references back into JIRA. The intended behavior: enrich a session, see it touched a given ticket, write a comment on that ticket linking the session ID and a one-line summary. That closes the loop between archive and ticket history, so every ticket carries a trail of when it was actually touched, not just when it was transitioned. It's a small write loop on top of code that already exists.\n\nCross-agent session counts are missing too. The enricher counted Claude Code sessions but didn't count Manus sessions or Codex runs. Those don't write sidecars, and the day pages are blind to them. 
Until I've gone two weeks needing one and not having it, I can't size the work properly.\n\nThe enricher reached 1,562 of 1,586 sidecars on the first complete pass. The 24 it skipped had corrupt timestamps: a known list, not a queue.\n\nThose 24 are still there. So are the calendar pages, growing one per night while I sleep.",
      "date_published": "2026-05-08T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "ai-memory",
        "conversation-archive",
        "retrieval",
        "claude-code",
        "nightly-cron",
        "solo-founder"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-05-07_manus-pushed-back-six-of-seven/",
      "url": "https://dxdev.com/blog/2026-05-07_manus-pushed-back-six-of-seven/",
      "title": "Manus pushed back on six of seven archive decisions",
      "summary": "Building a conversation archive across four provider repos is the easy part. The useful part was sending the spec to Manus and watching it find gaps in every assumption.",
      "content_text": "I sent Manus the architecture document for my conversation archive and got back a critique that disagreed with six of my seven design decisions. That was not a problem. That was the first evidence the document was doing any work.\n\nThe archive had taken a while to build. Four provider repos: Claude Code sessions, Codex runs, Manus jobs, and OpenRouter experiments. 1,585 sessions total, spread across the history of the AI ops setup I have been running. Getting them into a consistent structure with a shared manifest shape and a `Calendar_Index` that lets me navigate by date had taken a few weekends of careful scripting and a lot of edge-case hunting.\n\nFinishing the build and having a working archive are not the same thing. A working archive is one that survives contact with a second opinion.\n\n## what manus actually read\n\nThe spec I sent was a 600-line architecture document. It covered the manifest format, the indexing strategy, the folder naming conventions, and how continuity data should flow from session close into the archive. I had written it myself, reviewed it myself, and felt fairly good about it. That is exactly the wrong state for an architecture document.\n\nManus came back with seven points of contention. Six were legitimate. The seventh was a misread I cleared up with one sentence.\n\nThe six real points covered the `Calendar_Index` folder structure (I had named it inconsistently in two places), the session-close trigger versus a polling approach for ingestion, the manifest field for handling sessions that span midnight, the archive's behavior when a provider's API changes its session ID format, the deduplication strategy for re-ingested sessions, and what \"continuity\" actually meant in terms of which fields downstream agents would read.\n\nI had answers for some of these. I had rationalizations for others. One of them I had not thought about at all: what happens to sessions that get re-ingested after a partial failure? 
My spec said \"upsert on session ID\" and left it there. Manus pointed out that if the manifest was partially written before a failure, an upsert would preserve the partial write. The session would appear complete in the index and never get reprocessed. That gap feels obvious in hindsight and genuinely is not visible when you are the only person who has read the document.\n\n## the remote routine that proved the point\n\nWhile I was still reviewing Manus's critique, I had it run a small test. Pull one month of Claude Code sessions from the archive, summarize the recurring patterns, and write the output back to the established location in `Calendar_Index`. Standard retrieval and synthesis job.\n\nIt cloned a different repo. Not `ben-command`, where the archive lives, but a sibling repo I had mentioned in the spec as context for the broader agent topology. Then it wrote its output into an `Archive_Sweeps` folder inside that sibling repo, a folder that did not exist before that moment. The job completed without errors. Manus reported success.\n\nI had two archives.\n\nOne was the real one: 1,585 sessions, a consistent manifest format, a calendar index built over weeks. The other was a single month's summary in a new folder in the wrong repo, created because the spec had enough ambient context about other repos that a capable model could mistake the territory.\n\nA multi-agent system with an underspecified architecture will generate parallel systems. Not because the agents are malicious or careless, but because they are competent at building things that are locally consistent with what they were told. The locally consistent guess was wrong.\n\n## ratification instead of iteration\n\nThe useful response to six-of-seven pushback is not to start revising the spec immediately. It is to lock the decisions that are already right and ratify them explicitly, so that future agents cannot drift off them.\n\nI went through Manus's six points. 
For four of them, I had a defensible answer and wrote it into the spec as a decision record, not a clarification. Decisions that have survived scrutiny should say so. For the deduplication point, I rewrote the upsert logic to require a completeness check before skipping reingestion. For the `Calendar_Index` naming inconsistency, I fixed the name and added a note about which form is canonical.\n\nThen I wrote a Manus operating profile. A short document that specifies which repo is authoritative, how to verify it before any write operation, and what to do when a spec mentions adjacent repos by name. The profile is not long. The important part is that it exists as a separate artifact the agent loads before running archive jobs, rather than inferring intent from the architecture document.\n\nThe session manifests got a continuity block added. The fields downstream agents were expected to read got named explicitly rather than described in prose. Prose is ambiguous. Field names are not.\n\nFour follow-up tasks went into the project tracker for open questions that did not have answers yet. The one about session-close trigger versus API polling is still open. I have a preference but not a justification rigorous enough to commit to the spec.\n\n## what the parallel system told me\n\nThe `Archive_Sweeps` folder in the wrong repo is still there. I have not deleted it because it is a useful reminder of what underspecification produces.\n\nThe Manus operating profile now exists precisely so that job does not repeat. But the profile only exists because the test run demonstrated the gap. If the only review the archive spec got was mine, it would have shipped with the upsert gap, the naming inconsistency, and no answer to what happens when a remote agent reads \"adjacent repo\" as \"destination repo.\" The architecture would have felt finished because it had never been seriously read.\n\nAn architecture document is not done when you finish writing it. 
It is done when something with no stake in your assumptions has tried to break it.",
      "date_published": "2026-05-07T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "archive",
        "multi-agent",
        "manus",
        "ratification",
        "conversation-archive",
        "solo-founder"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-05-07_wall-clock-is-a-lie/",
      "url": "https://dxdev.com/blog/2026-05-07_wall-clock-is-a-lie/",
      "title": "Wall-clock is a lie once your agents start parallelizing",
      "summary": "On May 6 my calendar index reported 14.92 wall-clock hours and 64.33 attention hours. Once several agents run at once, the standard chronological work log becomes fiction.",
      "content_text": "On May 6, the L4 Calendar_Index prototype reported 14.92 wall-clock hours and 64.33 attention hours for the day. A 4.31x parallel ratio. That is the honest description of what happened.\n\nThe work log I had been keeping was chronological. Session opens, ticket gets touched, session closes, repeat. That shape assumes a single thread of attention moving forward in time. On May 6 it was not even close. I had sessions running on ITEM-6873 (cart wiring and checkout fixes), ITEM-6917 (a one-click Champion trial hotfix), and ITEM-6920 (Dev Home and PM Home cleanup). A Manus re-review pass came back the same morning with pushback on most of the archive architecture, which spawned HQ-23 through HQ-27. Somewhere in there I traced a network problem on this machine down to a half-disabled IPv6 stack. Four and five threads at a time. None of it sequential.\n\nThe chronological entry for that day would have been three lines of shipped tickets and a footnote about networking. True. Useless.\n\n## what a calendar index measures that a work log doesn't\n\nThe L4 Calendar_Index prototype covers May 5 and May 6. I calibrated on May 5 because the numbers were saner and it was easier to trust the math before seeing anything absurd. The calculation is not complicated. Sum all session durations from the session manifests to get attention hours. Measure first session start to last session end to get wall-clock. Divide for the parallel ratio.\n\nWhat that calculation tells you is not \"I worked 64 hours.\" It tells you 64 hours of work happened inside 14.92 hours of real time. For a solo operator with agents, that distinction matters. If you look at a 15-hour day and think \"about normal,\" you are missing the actual throughput. 
You are also missing the harder question underneath: whether you were in any position to verify what the agents produced.\n\n## the pieces that had to exist first\n\nThe prototype needed supporting work before it could output anything I would trust. Session manifests needed cleaner labeling so the index could tie a session to a ticket without guessing from filenames. Task-state filters had to exist so the timeline could show whether a ticket was active, blocked, or closed at a given hour. A JIRA history day view helped too. Without some way to reconstruct ticket state at a specific moment, the timeline labels were based on what I remembered, which is not a data source.\n\nThe forward-flow refresh skill landed in the same pass. Session state had been drifting between the manifest files and the actual git record, and a calendar index built on stale manifests is worse than no index at all. The numbers have to trace back to something real or they just replace one kind of fiction with another.\n\n## where the ratio breaks down\n\nI do not have a clean answer for what a healthy parallel ratio looks like. 4.31x is interesting. It is not inherently good. More parallel sessions mean more output. They also mean more review surface, more places where a wrong assumption in one thread quietly contaminates a downstream thread, and a higher chance that some of those attention hours represent work that shipped without sufficient verification.\n\nThe ratio is only useful up to the point where you can validate what came out. Past that threshold, the parallel ratio starts lying too. Some fraction of those sessions may have produced output that never got checked. The wall-clock number and the parallel ratio are both downstream of the real metric, which is how much of what shipped was right.\n\nThat is the question I do not yet have good instrumentation for. Timeline reconstruction automates up to a point. 
Causality flattens when you compress it too much, and some of that compression happens at the moment the summary is written, not at the moment the work happened. I want a record that distinguishes \"agent ran this\" from \"agent ran this and Ben reviewed the output.\" The calendar index is the first layer. That distinction is the second.\n\nOnce you can run work in parallel, wall-clock stops being the interesting number, and the record of what happened becomes fiction unless you build the instrumentation for concurrency before the day you actually need it.",
      "date_published": "2026-05-07T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "agents",
        "instrumentation",
        "calendar",
        "parallel-work",
        "solo-ops"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-05-06_cockpit-out-of-product/",
      "url": "https://dxdev.com/blog/2026-05-06_cockpit-out-of-product/",
      "title": "I split my personal cockpit out of the product and everything got cleaner",
      "summary": "Forking holdthought-web into benagents let me ship a tailnet-only operator console without dragging product baggage along. The fixes I had been deferring for months got cheap the same week.",
      "content_text": "Every push to `holdthought-web` passed through the same deployment gate regardless of what it was changing. A product bug fix and a personal dashboard tweak ran identical CI. The dashboard had no business being there. It had grown there organically, and by the time I noticed the friction it was load-bearing enough that I could not cleanly extract it without a harder decision.\n\n## what was living in the product repo\n\nHoldThought is a customer-facing product. `holdthought-web` is its application layer. At the time of the split, the repo also contained a personal operator dashboard, a session-resume bridge that let me pick up Claude Code sessions from a browser, and a vault portal layer that proxied my private knowledge base.\n\nNone of that belongs in a customer product repo. Each piece had been added in a moment of \"I'll clean this up later.\" The dashboard got wired to the same Next.js app because it was convenient. The bridge used the same auth middleware because I did not want to build another one. The vault portal shared the same origin because it was already there.\n\nWhat I had built, without planning to, was a half-product, half-operator chimera. One codebase trying to serve a paying customer and its developer simultaneously, with the same deployment surface, the same test suite, and the same release story.\n\n## the fork, not the rewrite\n\nThe fix was not to extract the operator layer into a new project from scratch. I forked `holdthought-web` into a new repo called `benagents`. That choice matters. A fork inherits the full commit history, the existing service configs, and all the wiring that already worked. A rewrite inherits none of it.\n\nStarting from the fork meant I could strip backward, removing everything product-specific, rather than assemble forward from zero. The product continued deploying from its original tree. 
`benagents` became a separate deployment that happened to start from the same point in history and then diverge immediately.\n\n## the shape the new deployment took\n\nThe operator cockpit lives on a tailnet-only host now. Traffic never reaches it from the public internet. IIS handles the reverse proxy layer, and Caddy sits behind it as the TLS terminator.\n\nThe cert chain deserved attention. The cert is issued through an ACME DNS-01 challenge answered via Cloudflare DNS, which means no inbound HTTP challenge traffic and no need to open port 80. The cert syncs on a weekly cron baked into the repo. NSSM manages the service as a Windows process that survives reboots without a run-command wrapper holding it together.\n\nFive pieces working together: reverse proxy, TLS terminator, cert issuer, cert sync, service supervisor. Written as a list it sounds like overhead. In practice it is a stable stack that has not needed intervention since the deploy landed. The tailnet boundary is the only real architectural commitment. Everything else follows from standard IIS and Windows patterns with one extra layer of trust enforcement around who can reach the host at all.\n\n## the session-resume bridge\n\nThe piece I had wanted most, and deferred longest, was a proper operator console. Something that could attach to a running Claude Code session and show context without me switching windows or losing thread history to a tab reload.\n\nThe bridge uses WebSocket for the session stream and REST for the control surface. Claude session state persists across page reloads. Navigate away, come back, and the conversation is still there. That sounds minor until the alternative is losing a long agent thread because Chrome decided to discard the tab.\n\nI could not have shipped that inside `holdthought-web` without the bridge bleeding into the product UX layer. Every session-resume request would have hit the same origin as a customer session management call. 
The split made the boundary obvious before I wrote a single line of bridge code, and the bridge shipped in the same week as the fork.\n\n## what came out of holdthought-web\n\nOnce `benagents` was running, I stripped the dashboard, the bridge, and the vault portal layer out of `holdthought-web`. One less chimera, one cleaner product repo.\n\nI had expected the stripping to feel like loss. It did not. The deployment config got shorter. The environment variable list shed several `CLAUDE_*` entries that had no business in a product's env surface. The test suite stopped failing intermittently on session-bridge code that had nothing to do with the features under test. The product got cleaner because the foreign material was gone.\n\n## the fixes that only became cheap after the split\n\nAfter `benagents` was running independently, I fixed three things I had been accepting as normal because they were tangled with migration cost.\n\nThe root route on the cockpit was wrong. It had been routing to the vault portal index, which made sense when both things shared an origin and made no sense as a standalone operator console. Fixing it took one config change. It had not been fixed before because touching routing inside `holdthought-web` required coordinating with product concerns.\n\n`CLAUDE_BIN` was being resolved at runtime via PATH. That worked in my shell and failed silently in the NSSM service context. Baking the absolute path into the service definition, once I owned that definition fully, took ten minutes. I had been living with the ambient ambiguity for months because the service config lived inside a product repo that had other priorities.\n\nThe SessionChat component had been built with an infinite-scroll history model that worked fine as a novelty and badly as a daily operator console. Sessions load from the bottom now, chunked, pinned to the newest message. That refactor took an afternoon. 
It would have taken two weeks of convincing myself it was worth the disruption inside the product's release cycle.\n\nNone of those fixes were complicated. They were deferred because the operator system had no maintenance surface of its own. It was a passenger in a product repo that had other priorities, and any change to it required negotiating with the product's release concerns.\n\n## when the architecture has two masters\n\nThe operator tooling for an AI-driven solo business has different availability requirements, a different change cadence, and different access control than any customer-facing product. Conflating their deployment stories produces friction in both directions. Every change to the cockpit gets taxed by the product's release cycle, and every change to the product is complicated by the cockpit's presence in the same tree.\n\nThe cost of that arrangement was invisible because it was paid in small delays and mental overhead rather than visible failures. Nothing broke dramatically. I just moved slower than I should have, accepted friction I had written off as inherent, and let fixes like the `CLAUDE_BIN` path ambiguity sit unresolved longer than they deserved.\n\n`benagents` now deploys on its own schedule, restarts without coordination, and can accept breaking changes to the session bridge without triggering a product review cycle. That last property is what made the SessionChat rewrite cheap enough to actually do. The only person affected by a bad deploy is me, the NSSM service recovers in seconds, and the change does not need to be explained to anyone.\n\nInternal tooling stops being fragile the moment you stop pretending it belongs inside the same repo and deployment story as the product.",
      "date_published": "2026-05-06T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "solo-founder",
        "ai-cockpit",
        "deployment",
        "tailnet",
        "refactor",
        "infrastructure"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-05-05_vault-portal-read-before-write/",
      "url": "https://dxdev.com/blog/2026-05-05_vault-portal-read-before-write/",
      "title": "The useful part of a vault portal was admitting the AI should read, not write",
      "summary": "The first useful AI portal is a narrow, auditable read path. Not a fake-autonomous write system held together with vibes.",
      "content_text": "On May 5, I shipped `/v/<path>?k=<key>` for human reads and `/v.md` for AI fetchers, and deleted half the autonomous-write design I had been carrying around for a month. The portal got smaller and immediately got useful.\n\nThe previous version of this work was a write-first portal. Agents would post into a quarantine table, I would promote what I wanted, and the substrate was going to be bidirectional from day one. That was wrong. I had skipped the part where the read path actually works, in favor of the part where the AI looks autonomous.\n\n## the smallest thing that mattered\n\nThe shape that shipped is narrow. One URL pattern, one capability key per path, two render modes.\n\n`/v/<path>?k=<key>` renders HTML for a human opening a share link. `/v.md` renders the raw markdown for an agent fetcher that does not want to parse a page. The key is opaque, scoped to a single concrete file, and revocable from the Share dialog without touching the underlying vault.\n\nThat is the entire surface area. No write endpoint. No agent identity. No promotion queue. The portal does one job: turn a vault path into a fetchable URL that I can hand to any AI that knows how to do a GET.\n\n## the access story has to be honest before anything else matters\n\nThe reason I had been chasing autonomy was that the read story felt boring. Anybody can serve a markdown file. The interesting design is the loop where agents close their own tickets, post into queues, propose patches. That is the demo.\n\nBut the access story underneath was hand-wavy. Who can read what. How keys get revoked. What happens when a share leaks. Whether Cloudflare or WAF actually sees the traffic. I had answers for none of these, and I was about to build write semantics on top.\n\nThe reversal was to stop and build the Share dialog, the key allowlist, the WAF rules, and the audit trail first. None of that is glamorous. All of it is load-bearing. 
A private vault with an unclear access model is worse than no portal at all, because you stop noticing what you have exposed.\n\n## what each agent actually did\n\nClaude fetched `/v.md` URLs on the first try. Manus did too. GPT browse refused to fetch from a fresh subdomain, which I initially read as a philosophical objection and then realized was just an integration constraint, the kind every tool has. I wrote it down and moved on. It is not a portal problem. It is a \"GPT's browse tool is conservative about new hosts\" problem, and the fix is reputation accrual, not architecture.\n\nThe agents that worked, worked immediately. That is the test I care about. A read substrate that requires custom integration per agent is not a substrate. It is a vendor lock-in in a trench coat.\n\n## sessions are not the surface\n\nThe same week, I rebuilt the HoldThought dashboard around Area, Epic, and Task, with sessions hanging off the side. The first version had centered sessions. The new version treats them as exhaust. Same lesson, different layer: model the thing you are trying to find, not the thing the machine produces while finding it.\n\nRenaming `MT-YYMMDD-NN` and `ME-YYMMDD-NN` into bare integer IDs was the kind of migration nobody writes about because it does not feel like product work. Nineteen vault directories renamed, three ID collisions resolved by hand. But the dashboard only started making sense after the IDs stopped fighting the hierarchy. The portal and the dashboard are the same shape: get the substrate honest, then layer behavior on top.\n\n## the principle\n\nA lot of AI tooling goes wrong because the design starts with agency before a clean transport layer exists. You can tell, because the demo is impressive and the day-to-day is brittle. Agents post into queues that nobody promotes from. Permissions are vague enough that nobody trusts them. 
Every new agent needs a new integration because the substrate was never actually substrate.\n\nThe first useful AI portal is a narrow, auditable read path. The write side comes later, on top of the same key model, after the read side has earned its keep.",
      "date_published": "2026-05-05T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "vault",
        "ai-tooling",
        "access-control",
        "architecture",
        "agent-design",
        "holdthought"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-05-04_filing-system/",
      "url": "https://dxdev.com/blog/2026-05-04_filing-system/",
      "title": "My agents needed a filing system before they needed more intelligence",
      "summary": "The vault, the task model, and the session pipeline each had a slightly different definition of active. The fix was not a smarter prompt. It was making them agree at the filesystem level.",
      "content_text": "On May 4 the dashboard said seventeen active tasks. Four were epic shells with no children attached, three were sessions I had closed mid-work without reconciling the manifest, and two more were sub-sessions Claude had opened under a ticket and never linked back to a parent. Nine out of seventeen were not active in any useful sense, and the problem had nothing to do with bad data entry.\n\nThe vault, the task model, and the session pipeline were each using a slightly different definition of \"active,\" and nothing was forcing them to agree.\n\n## the layout was teaching me something I ignored\n\nTasks lived as flat `ITEM-XXXX.md` files. Epics I had not promoted yet sat in `00_Epic_Unknown/` as loose files structurally identical to task files unless you opened them and checked the `issuetype` field inside. Session manifests tracked `area:` and `task:`, but neither was validated against anything real, so sessions drifted into free-float after the work closed. When I wrote the dashboard regenerator for HTO, it needed a separate code path for every layout variant it encountered.\n\nThat is the tell. An automation that accumulates a new special case for every layout it touches is not a routing problem waiting to be solved. The layouts are wrong.\n\n## five peer areas with one internal shape\n\nI flattened the vault to five peer areas: `hto`, `agent_system`, `blog`, `business_ops`, `personal`. Each lives under `vault/01_Areas/<NN_slug>/` and mirrors the same internal structure: `00_Dashboard/`, `01_Epics/`, `02_References/`, `03_Archive/`. The area slug and signal list live in the area's `README.md` so every classifier, every session hook, and every dashboard script reads from the same source of truth.\n\nBefore this, area classification ran by string-matching against folder names that had drifted out of sync with the actual area definitions. 
After, a script can read `area_key` and `area_signals` from one file and stop guessing, because the surface it was reading had finally become consistent. The automation did not get smarter.\n\n## /organize was the canary\n\nThe `/organize` skill reads the vault's area structure, the current session manifest, and active task state, then prompts Claude to surface misclassified or unlinked work. Before the reorg, it crashed on shape mismatches roughly half the time, because it assumed a standard epic folder structure and `00_Epic_Unknown` was anything but. After the reorg, the first run flowed from start to finish without a single special case firing.\n\nThat is a more useful signal than a unit test. A tool that was previously brittle because the underlying layout was incoherent became reliable the moment the layout became predictable.\n\n## the unknown bucket was the problem\n\n`00_Epic_Unknown` is where agent systems go to rot. Any work without a clean epic home lands there, and once it is there, classifiers stop touching it because the bucket name signals ambiguity. I had 47 files in that bucket, each a real epic that had shipped or was in progress but had never been promoted to a named folder.\n\n`promote_epic.py` walks the unknown bucket, reads each file's `issuetype` and `epic_link` fields, matches against the area's `01_Epics/` folder, and moves the file into the right named directory, creating the folder if it does not exist. It is fifty lines. Running it cleared the bucket to zero and fixed the dashboard's active count in one shot.\n\nThe promotion logic itself was straightforward. The real issue was that the bucket existed as a permanent home rather than a temporary staging area. A system that treats \"unknown\" as a valid steady state for categorized work will slowly accumulate everything that does not fit neatly, which is most real work.\n\n## sessions had to follow the same rule\n\nThe session pipeline had its own version of the same problem. 
Sessions lived in `vault/04_Sessions/YYYY/MM/DD/NNN_<sid>.md` with a `status:` field that could be `active`, `recent`, or `closed`. But \"active\" was not enforced anywhere, so sessions that had been functionally closed for weeks still showed up in the count. Epics were masquerading as live tasks because a session manifest had referenced an epic ID rather than a specific task ID, and the dashboard was counting the reference as active work.\n\nThe fix had two parts. First, a root rule: session manifests must reference a `task:` ID, not an `epic:` ID. Epics are groupings, not units of work. Second, a daily sweep that marks sessions `recent` if last activity was more than 24 hours ago and `closed` past 72 hours, triggered at session start via the SessionStart hook in `~/.claude/settings.json`. After both changes, the dashboard dropped from seventeen active tasks to eight, all actually in progress.\n\n## state has to converge at the filesystem first\n\nNone of these four fixes involved Claude doing something smarter. No classifier got a better prompt, and none of the new code crossed fifty lines.\n\nWhat changed was that every tool was now reading from surfaces that agreed with each other: canonical area shapes, real epic folder locations, session manifests pointing at tasks instead of epics. The dashboard could regenerate from the filesystem and produce a count that matched reality.\n\nSame day I landed these changes, I also shipped four HTO tickets, including a stat-label fix on ITEM-6881 and a 3.347.22 hotfix release for ITEM-6901. The vault work took maybe three hours. It felt like housekeeping, but it was the reason the next session's dashboard told me the truth, which is a different thing from making the system louder about its lies.\n\nAn agent system that cannot distinguish an active task from an epic shell does not need a smarter model. It needs `promote_epic.py` to run and `00_Epic_Unknown` to hit zero.",
      "date_published": "2026-05-04T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "agent-system",
        "vault",
        "architecture",
        "information-architecture",
        "automation"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-05-04_gmail-identity/",
      "url": "https://dxdev.com/blog/2026-05-04_gmail-identity/",
      "title": "The first useful agent teammate needed a Gmail account",
      "summary": "Before an agent can share docs or run a dispatch loop, it needs an identity, permissions, and an inbox that behaves like a work queue. The bottleneck wasn't model quality. It was getting the agent a badge.",
      "content_text": "On May 4 at 9am I opened a fresh tab, created a dedicated Gmail address for the agent, and spent the next four hours wiring OAuth so a Claude Code session could read and send mail. Not because email is glamorous infrastructure. Because for two weeks I had been bumping into the same invisible wall. Every time I tried to get an agent to do something real, the first blocker was not the model. It was trust.\n\nGoogle Drive would not share a doc with it. Calendar had nowhere to send an invite. The dispatch loop I wanted to sketch assumed an inbox that could receive work. None of that is possible when the agent is impersonating me.\n\nI had been treating my agent stack like a tool I controlled, not a participant in systems built around identity. That was wrong. Fixing it took four hours of credential plumbing and zero AI reasoning.\n\n## why piggybacking on my personal account broke things\n\nThe original setup ran Claude Code sessions under my personal Google context when they ran under any context at all. OAuth tokens lived in a temp directory. When a session ended, the tokens expired or got swept. The next session re-authorized from scratch. For one-shot tasks, this was annoying. For anything that needed to persist state across sessions, it was a hard stop.\n\nThere was also a subtler problem. When an agent shares a Google doc, the doc shows my name as the sharer. When it creates a calendar event, my name is on it. If the agent sends a message from my personal account, the message looks like it came from me. That conflation matters when the agent is wrong, which it will be. You want a clear audit trail. You want to be able to say \"the agent did that\" and have it be technically true, not just morally true.\n\nA real identity fixes both. The credentials persist. The actions are attributable.\n\n## what provisioning actually looked like\n\nThe Google account itself took five minutes. 
Everything after that was slower.\n\nI created a GCP project under my personal account as the billing parent, with the agent's ops inbox as the service identity. Enabled the Gmail API, the Calendar API, and Drive. Walked the OAuth consent screen setup, which for a personal-use app makes you explicitly add yourself as a test user. Downloaded the `credentials.json`. Dropped it in `.secrets/google_agent.env` alongside the refresh token I got from the initial authorization flow.\n\nThe `ben_hq` secrets helper already existed for this pattern. `bhq secrets get GOOGLE_AGENT_REFRESH_TOKEN` returns the value. Any session that needs Google access loads the file with `eval \"$(bhq secrets export google_agent)\"` and the credentials are in scope.\n\nFor claude.ai MCP, I wired the Gmail and Calendar servers through the standard OAuth flow. That required a separate client ID scoped to the MCP redirect URI. Two client IDs, same GCP project. One for programmatic access from scripts, one for the MCP servers. The distinction matters because MCP's redirect handling is browser-interactive and the script-side flow is non-interactive. Same scopes, different flow type, different client ID.\n\nTotal credential surface after setup: one GCP project, two client IDs, one refresh token per API surface, all in `.secrets/`. Nothing in a temp directory. Nothing that expires on session end.\n\n## what the inbox looks like now\n\nThe interesting shift was not technical. It was conceptual.\n\nOnce the account existed and the credentials persisted, I stopped thinking about email as a notification channel for the agent and started thinking about it as a transport layer for work. Those are different things. A notification channel delivers information to a human. A transport layer moves work between actors that can be machines.\n\nGmail labels map well onto work states. An inbound message tagged `pending` is a task waiting to be picked up. The agent polls the label, processes the item, tags it `done`. 
That is a kanban board. It runs on infrastructure every business already trusts and has been routing around for twenty years. No new queue service. No webhook to maintain. Just labels and a polling interval.\n\nI sketched the dispatch loop the same afternoon. The loop is not complicated. Search for threads with label `agent-queue`, read the top message, parse a structured action from the subject line or body, execute it, apply a result label. The agent needs read and label permissions. It does not need to send mail to run the loop, though it needs send permissions for the reply path.\n\nI did not finish the loop that day. I got it far enough to confirm the polling worked, then stopped. The point was not to build the full dispatch system. The point was to confirm the identity layer held weight before building anything on top of it.\n\n## the thing model benchmarks do not measure\n\nThere is a version of this story where I skip the infrastructure work, keep running the agent as myself, and decide the friction is the agent's fault. The agent feels flaky. Session context does not persist. Shared docs do not work. I conclude agent tooling is not ready yet.\n\nThat conclusion would be technically accurate and completely wrong about the cause.\n\nThe friction was not in the model. It was in the gap between \"Claude Code as a smart shell command\" and \"Claude Code as a participant in the systems my business already runs.\" Google Workspace, JIRA, GitHub, all of it was built around identity. They assume every actor has an account, a set of permissions, a credential that persists. They have no concept of a stateless anonymous assistant that borrows a human's session.\n\nClosing that gap is not AI work. It is the same credential plumbing a new employee goes through on their first day. Create the account. Grant the permissions. Set up the hardware. Give them a badge. The reason this feels surprising is that the demo videos for every AI tool skip it. 
The demo starts with a working credential. The actual work starts at 9am on May 4 with a fresh GCP project and an empty secrets file.\n\n## the open question I have not answered\n\nOne Gmail account currently holds three functions: inbound work items from the dispatch loop, outbound sharing when the agent sends a doc link, and automated alerts from crons and scripts. That is already a mix I watch with mild concern.\n\nThe clean version separates them. An inbox for work routing. A send-only identity for sharing. A no-reply address for automated alerts. Three identities, explicit roles, no ambiguity about what a message in the inbox actually is.\n\nI have not done this yet. The single account works well enough for the volume I am running. The moment the dispatch loop sees real inbound traffic from more than one source, the separation becomes necessary rather than optional. Right now it is a diagram on a whiteboard.\n\n## what you are actually doing when you give an agent a badge\n\nI had been thinking about agent capability as a function of model quality. Bigger model, more capable agent. That framing is not wrong, but it mistakes the binding constraint. For most of the practical work I need done, current-generation reasoning is more than sufficient. What was actually blocking me was that the agent could not take action in the systems I already use, because those systems did not recognize it.\n\nGiving it a Gmail account is a credential decision, not an AI decision. It is exactly as unglamorous as setting up a new contractor's system access. But it is what makes the rest possible. The model reasons about the work. The identity lets it touch the systems where the work lives.\n\nThe first real bottleneck in making an agent useful is not intelligence. It is membership.",
      "date_published": "2026-05-04T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "agent-system",
        "identity",
        "gmail",
        "oauth",
        "infrastructure",
        "mcp"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-05-03_killed-clever-hierarchy/",
      "url": "https://dxdev.com/blog/2026-05-03_killed-clever-hierarchy/",
      "title": "I killed my clever vault hierarchy in one day",
      "summary": "Seven migration phases, two classifier passes, four model revisions. Most of what I built before noon was deleted by dinner.",
      "content_text": "On May 3 I rebuilt my AI work OS from scratch three times before it held. I moved files across four migrations, ran two classifier passes through a two-tier model, and renamed a skill that had been wrong since I first wrote it. Most of what I built before noon was deleted by dinner.\n\nThe system I was replacing had been running for a few weeks. It organized everything across four areas: HTO, Blog, Business Ops, and Agent System. Sessions, thoughts, epics, and dashboards all lived inside those areas. The logic was sound on paper. A session working on the blog pipeline naturally belonged under Blog. A thought about infrastructure naturally belonged under Agent System.\n\nThe problem showed up the first time a session crossed the line. I was working on HQ-9, a content-side ticket, and found myself opening vault files in three different areas to do one thing. The session manifest pointed at one area. The relevant epics were in another. The dashboard I needed to check was in a third. The hierarchy was logically correct and practically useless.\n\n## why clever was the wrong frame\n\nI had built an organizational model that described relationships accurately. That was the mistake. Describing relationships accurately is what a knowledge graph does. A work OS needs to do something simpler. It needs to tell you where to put a file without thinking.\n\nThe four-area model required thinking every single time. Was a session about building the blog ingester under Blog because it serves the blog, or under Agent System because the ingester is an agent? The answer changed depending on what the session was actually doing that day.\n\nI ran a two-tier classifier pass to try to resolve this. The classifier collapsed the model from four areas to two areas plus JIRA epics. The idea was that JIRA epics would carry the semantic weight, and the two top-level areas would be coarser containers with cleaner edges.\n\nThat looked right for about two hours. 
I shipped the migration. Then I looked at the result and saw the same problem wearing a slightly smaller coat.\n\n## the second pivot, same day\n\nThe insight that finally held was not about areas. It was about what outlives what.\n\nEpics are projects. Projects end. When a project ends, its epic closes, its tickets close, and the organizational unit that held everything together disappears. If sessions and dashboards and notes are organized primarily by epic, they go stale at the same rate the epic does. You end up with a hierarchy that was useful during the project and noisy afterward.\n\nComponents are different. Blog is not a project. It is a surface I am building indefinitely. Agent System is not a project. It is a layer of tooling that keeps running. Components have dashboards that stay relevant because the component keeps going. They accumulate context across multiple epics without disappearing when a ticket closes.\n\nI switched from epic-tier organization to four stable components: HTO, Blog, Business Ops, Agent System. Same words as the original four areas. The difference was the logic underneath them. Instead of areas-as-categories, they became components with per-component dashboards generated from the session manifest. Manifest-sourced views meant I was not maintaining an overview page by hand. The component dashboard was always current because it was computed from ground truth, not edited by a human on a bad day.\n\nThe two classifier passes I ran earlier were both trying to solve the wrong problem. The classifier was deciding which category a piece of content belonged to. The real question was which stable noun it should live next to for the next year.\n\n## the rename that revealed a wrong assumption\n\nSomewhere around mid-afternoon I changed `/done` to `/close`.\n\nOn the surface that is a one-word rename. What it exposed was a conflation I had been carrying for weeks. Done meant the session was over. 
Close meant the work item reached a terminal state. Those are different moments, different objects, different states. A session can end without the work being done. Work can be done while a session is still open. Collapsing them into one verb meant I was treating session state and work state as the same thing, and the skill was enforcing that wrong mental model every time I ran it.\n\nThe rename cost an afternoon of updating references. It was worth it. Naming things precisely is not ceremony. It is the only way to notice when two things you thought were one thing are actually two.\n\n## the gap a ticket-only system cannot cover\n\nJIRA-only organization has a blind spot. Work that does not map to a ticket.\n\nNot all work is ticketable. Research, architecture decisions, exploratory sessions, context-building for future work. This is real work. It takes real time. If your system can only represent work that has a JIRA ID, it systematically undercounts the actual operation. I added a manual-task layer for this. A session can attach to a task that lives in the vault instead of in JIRA, with the same manifest-linking behavior. The session gets counted, the context gets captured, the work shows up in the dashboard.\n\nIn parallel with the structural overhaul, I applied the same shift to the content side. HQ-9 got reparented under a new blog epic. HQ-10 through HQ-15 got created. Content engine, ingester work, distribution layer, each with a ticket and a per-ticket working-folder convention so blog scratch work lives next to the task that produced it. Draft files, source material, classifier outputs all land in the ticket folder. The convention is the same logic as the stable-component idea. The question is not what category something belongs to. 
The question is where you will look for it in three months.\n\n## what this actually cost\n\nSeven migration phases, two classifier passes, four model revisions, and most of a working day.\n\nThe agents I used for the classifier passes did their job accurately. The two-tier collapse was technically sound given the question I asked. I asked the wrong question. That is not a Claude failure. That is the kind of mistake that happens when you trust the model in your head more than the friction in the actual workflow. The classifier computes the right answer to the specification you give it. When the specification is wrong, correct output and wrong output are indistinguishable until you try to use it.\n\nThe right signal arrived mid-afternoon, when I noticed I was still having to think every time I filed something. A work OS that requires classification on every write is not a work OS. It is a filing system with ambitions.\n\nIf your agent workspace needs a paragraph to explain what a session belongs to, the hierarchy is already too clever to survive real use. Your agent system needs boring nouns before it needs clever automation.",
      "date_published": "2026-05-03T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "agent-system",
        "vault",
        "work-os",
        "refactor",
        "solo-founder",
        "classifier"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-04-29_ai-workflow-command-center/",
      "url": "https://dxdev.com/blog/2026-04-29_ai-workflow-command-center/",
      "title": "The day my notes became infrastructure",
      "summary": "Three parallel AI workstreams, three different repos, and no shared memory between them. I spent a morning evaluating note tools and ended up building a personal command center because the agents kept making me restate the same context.",
      "content_text": "On April 29 I had three Claude Code sessions open across three different repos, and I was losing about ten minutes per switch just re-explaining where I had left off. ITEM-6872 was an event-page redesign on HTO. ITEM-6870 was turning the JIRA backlog page into a real triage cockpit. The third thread was not product work at all. It was me trying to figure out why my own memory across the day had gotten so expensive.\n\nEach agent had partial context. Each switch cost me a paragraph of restated background. By mid-morning I had stopped pretending Google Keep and a stack of browser tabs were holding the line.\n\n## the morning I stopped pretending\n\nI opened a fresh session and asked it to help me evaluate multi-window note tools. Not a productivity exercise. A direct response to the fact that I had three agents and four ongoing decisions and no shared surface for any of it.\n\nWhat I actually needed was not a better notes app. I needed a place agents could read and write that was not inside a work repo. Putting personal context into a project repo is a category error. The repo is for the codebase. Personal coordination, half-formed plans, cross-project memory, none of that belongs there.\n\nSo I stood up an Obsidian vault. Local folder, markdown files, no service in the loop. Then I wired it to the agent workflow so a Claude session could read the vault as easily as it reads source code.\n\nThe vault is not the interesting part. The wiring is.\n\n## what the wiring actually looks like\n\nThree pieces hooked together by the end of the day.\n\nThe vault itself, at `G:/Projects/vault`, organized in PARA folders. Areas, Projects, References, Sessions. Plain markdown so any agent on any machine can grep it.\n\nA `ben-chronicle` inbox folder that any device can write to. The idea was simple. If I have a half-finished thought on my phone, it lands in `ben-chronicle/inbox/`. When I sit down at the desktop, the agent there sees it. 
No app, no sync conflict, just files moving through git.\n\nA small set of Claude skills that knew the vault existed. `/note` to capture, `/log` to write to the day file, `/next` to surface the active thread. The skills do not do anything magic. They just remove the friction of telling the agent \"go read this file first.\"\n\nOnce the three pieces were in place, the context-switch cost dropped. Not to zero. The agent still needs to load whatever the current ticket is. But the meta-state, what I am working on, what is parked, what I decided yesterday and changed my mind about, lives in one place that every session can reach.\n\n## the parallel realization on the JIRA side\n\nThe reason this mattered so much that day was that I was simultaneously building a different command center for a different problem. ITEM-6870 turned `/adminHTO/?p=pmhome` into an action surface. Inline notes, comments, links, linked tickets, RICE breakdowns, all in one page. Yes, Comment, Reroute, Park, all without leaving the row.\n\nThe technical detail I cared about most was small. IIS was replacing non-200 responses with HTML error pages, which broke the inline action flow. So the endpoint had to return HTTP 200 and carry failure in the JSON body. Ugly, but it was the shape the surrounding system forced on me.\n\nThe bigger pattern was that I was building the same thing twice in one day. Once for backlog triage. Once for my own coordination. Both times the problem was the same. AI had made the underlying analysis cheap. The bottleneck had moved to the surface where I decided what to do with the analysis.\n\nWhen investigation is fast, the cost shifts to the control surface. 
That is the part nobody warns you about when they sell you on AI productivity gains.\n\n## why the vault was infrastructure, not self-help\n\nI have read enough productivity blogs to be suspicious of any post that ends with \"and then I built a personal knowledge management system.\" Most of those systems are a way of feeling productive without being productive.\n\nThis one was different in a specific way. The vault was not for me to read. It was for the agents to read. The point was not to organize my thoughts. The point was to give every Claude session in every repo a shared substrate so they stopped making me the only carrier of cross-project memory.\n\nThat changes the cost calculation. A personal Notion that I tend to alone is overhead. A markdown vault that three agents are reading on every session start is leverage. The work I put in once gets paid back every time an agent loads the relevant slice of context without me having to say it.\n\nThe `ben-chronicle` inbox was the same idea applied to time. I do not always know which machine I will be at when a thought arrives. I do not want to lose threads because my phone is not my desktop. A folder that any device can drop a file into, that the next session at any device will pick up, is the lightest possible version of cross-machine continuity.\n\n## the closing principle\n\nWhen AI starts helping you on more than one front at once, your notes stop being optional and start acting like infrastructure. The vault is not a productivity system. It is the persistence layer for a workflow that now spans multiple agents, multiple repos, and multiple machines, and that workflow no longer survives without it.",
      "date_published": "2026-04-29T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "solo-founder",
        "agent-system",
        "knowledge-management",
        "obsidian",
        "workflow"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-04-29_jira-cockpit-backlog-triage/",
      "url": "https://dxdev.com/blog/2026-04-29_jira-cockpit-backlog-triage/",
      "title": "I built a JIRA cockpit because backlog triage was eating the day",
      "summary": "A backlog stops being manageable when every answer requires opening five tabs. I turned the project manager home page into an action surface, and found out the bottleneck was never the analysis.",
      "content_text": "Forty-seven open tickets, one April morning, and I made about ten real decisions in four hours. The other 37 I touched. I read them, scrolled them, opened another tab, lost my place, and moved on without deciding anything. JIRA loads in one tab. The admin page it references loads in a second. The linked parent ticket is in a third. The customer's account is in a fourth. By the time I had enough context, I'd burned the energy that should have gone toward the judgment.\n\nThe triage wasn't slow. The overhead was.\n\n## the tab-thrash problem\n\nThe specific friction was this: my backlog UI could show me that a ticket existed, but it couldn't show me enough to decide on it. Every ticket that needed action also needed context that lived somewhere else. A three-sentence report about a broken bracket page requires at minimum: the ticket itself, the tournament in the admin, a linked parent ticket for scope, and sometimes a quick SQL lookup to check a field state. Each of those is one click away. Each click breaks the scanning rhythm.\n\nI'd been running triage sessions the same way for a couple of years. Open `hometeamsonline.com/adminHTO/?p=pmhome`, work down the list, do my best. It was fine when the backlog was 15 tickets and I was doing all the analysis myself. Then I started using Claude Code for ticket investigation. It could read a ticket, check the relevant code path, inspect what the linked page actually rendered, and tell me in about 30 seconds whether something was a 2-hour fix or a 2-week rewrite. The analysis got cheap. The bottleneck shifted to: can I act on what I now know, without losing my place?\n\n## what the cockpit became\n\nI turned `pmhome` into an action surface. That is the short version.\n\nThe longer version: I added inline rendering of ticket notes, the JIRA comment stream, linked child tickets, and a RICE breakdown that Claude could pre-populate from its investigation pass. None of that was the hard part. 
Displaying things is easy.\n\nThe hard part was the verbs. The cockpit needed to let me act in place: accept a ticket into the active queue, leave a comment without opening a new tab, reroute a ticket to a different component, or park it. One click per decision, the page staying intact underneath. No navigation, no full-page reload, no confirmation dialog that required scrolling back to find where I was.\n\nFor actions that JIRA requires metadata on (priority change, type change, status transition), the cockpit surfaces a small inline form. Pick the transition, fill the required fields, confirm. It writes to JIRA via the REST API and updates the row in place. The actual triage motion became: read a ticket, decide in ten seconds, click one button, move one row down.\n\n## the weird constraint that shaped the design\n\nIIS has opinions about HTTP status codes. If an application endpoint returns a 4xx or 5xx, IIS will, depending on how `httpErrors` is configured, replace the response body with its own HTML error page. That feature exists for end users hitting a broken URL. It is not useful when you're building JSON APIs on the same server that hosts the HTO admin.\n\nThe first version of the cockpit's action endpoints returned `400 Bad Request` with a JSON body describing what went wrong. The browser received IIS's HTML. My JavaScript tried to parse it as JSON and threw a silent parse error. The button clicked. Nothing happened. No feedback.\n\nThe fix was: all action endpoints return HTTP 200. The JSON payload carries a `status` field set to either `ok` or `error`, with the error case including a `message`. The browser-side code reads `status` before treating the response as a success.\n\nThat is not the architecture I would design from scratch on a clean system. It is the architecture that works on the system I have. 
The IIS constraint forced a discipline I'd probably have been sloppy about otherwise: every action endpoint has an explicit success or failure contract in the response body, regardless of what the HTTP layer says.\n\n## parked is not the same as waiting\n\nI spent the most time on a distinction that sounds minor. The difference between a ticket I've deliberately deprioritized and a ticket I'm blocked on.\n\nI had one status for both: Parked. That was wrong. If I'm parked because I chose to deprioritize, the ticket is mine to un-park whenever I want. If I'm parked because I'm waiting for a customer to confirm something before I can move forward, that ticket is not mine to move. The action I need to take next is different. The urgency logic is different. Lumping them together meant \"what do I touch today\" had no clean answer.\n\nI split the status into two. Parked means deliberate deprioritization: I own the next move, I just chose not to take it yet. Waiting means an external dependency: they own the next move. The cockpit surfaces them differently. Waiting tickets get a timestamp showing how long they've been idle. Parked tickets get a note field for the reason I set them aside.\n\nIt was a 30-minute change to the data model and the display logic. It cleaned up the triage queue more than anything else I built, because the filter for \"what do I actually work today\" finally had a correct answer: everything active, plus Waiting tickets older than 48 hours.\n\n## the bottleneck moved\n\nBefore Claude was doing ticket investigation, reading a ticket and deciding what to do with it were roughly the same cost. You read, you thought, you decided. The reading and the deciding were one motion.\n\nOnce Claude was handling the investigation pass, reading became nearly free. It reads the ticket, checks the code, inspects the linked pages, and returns a verdict. The deciding is still mine. 
But deciding requires a tool that lets me act on the verdict immediately, without opening a new sequence of navigation.\n\nThat's the gap I didn't notice until the analysis got fast. Investigation and decision are separate operations. Speeding up investigation while leaving decision on the same five-tab overhead doesn't reduce work. It means you spend more time rebuilding context per decision, not less.\n\nOnce AI makes ticket analysis cheap, the real bottleneck is whether your backlog UI lets you decide anything without leaving the page.",
      "date_published": "2026-04-29T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "jira",
        "internal-tooling",
        "triage",
        "asp-classic",
        "ai-ops",
        "solo-founder"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-04-27_workflow-broke-in-glue-code/",
      "url": "https://dxdev.com/blog/2026-04-27_workflow-broke-in-glue-code/",
      "title": "The workflow broke in the glue code, not the AI",
      "summary": "I spent a morning fixing five workflow gaps I'd been compensating for by hand. None were model failures. All lived in the glue code between tools.",
      "content_text": "On the morning of April 27, the session hook that routes my Claude Code work was dropping the current working directory on every third invocation. I had noticed sessions starting with stale ticket context a handful of times over the previous three weeks, cleared the window, restarted, and moved on. That fixed the symptom and masked the cause each time.\n\nWhen I finally traced it, the bug was three lines in a thirteen-line shell wrapper. `$PWD` was resolving to the caller's cwd rather than the project root whenever the hook fired from a process that had changed directories since startup. A missing `cd` produced three weeks of intermittent session drift, manual corrections, and half-wrong context.\n\n## the gap was in a thirteen-line shell wrapper\n\nI had opened ITEM-6867 as a grab-bag for exactly this kind of problem: small process gaps that cost attention without producing a visible incident. The CWD bug was item one. The grab-bag had four more.\n\nThe `collaborators.md` file gives Claude a working list of who touches which parts of the codebase. Mine was two months out of date and still listed a developer who had left in February. For two months, Claude had been deferring to that person in review comments. Attributing code to them, suggesting I CC them on findings, flagging their name in context summaries. Technically competent. None of it accurate.\n\nThe `organize-stage` script writes story points and `fixVersion` back to JIRA after a ticket closes. Half my tickets were on an older field schema using `customfield_10028`. The script had been writing to `customfield_10016` since an epic migration I did in January. The writes appeared to succeed. Nothing errored. The values just went nowhere. I had been manually fixing the output for three weeks without asking why it kept being wrong. I thought I was doing quality control. 
I was compensating for a broken tool.\n\nThe scope-creep check fires when Claude attempts a fix outside the ticket's stated scope. The check itself worked. The rejection comment it wrote to JIRA was getting silently truncated at 255 characters by the API. The comment looked complete in the write response. To anyone reading the ticket, it read like an incomplete thought, missing whatever came after character 255.\n\nThe close-comment generator was producing 400-word technical summaries. That field in JIRA is read on a phone, usually by someone who wants to know if the ticket is done. The close flow had been optimized for comprehensiveness over readability, and I had been trimming the output by hand before posting. Sometimes I trimmed wrong.\n\nNone of these was an emergency. Each one was a small lie the workflow told about itself. The compound cost of believing all of them across a week of sessions was real, just less visibly so than a single blocked day would have been.\n\n## what the fixes actually looked like\n\nThe CWD fix was three lines. I pinned the hook to `realpath \"$(dirname \"${BASH_SOURCE[0]}\")\"` on startup and passed it down explicitly rather than relying on `$PWD`.\n\n`collaborators.md` was fifteen minutes. One departure removed, one addition for a contractor who had been contributing for six weeks. The file now has a `last-updated` line at the top, so next time I forget, I will at least know how long I forgot for.\n\nThe `organize-stage` schema divergence took an hour to diagnose and fifteen minutes to fix. The two field IDs coexist in my JIRA instance because I migrated epics but not closed tickets onto the new scheme. The fix was a lookup table keyed by issuetype. Epic and Feature get `customfield_10016`, everything else gets `customfield_10028`. I verified it against six tickets before shipping.\n\nThe JIRA character limit needed more restructuring. The rejection comment was formatted as a preamble plus explanation plus ticket reference. The preamble alone was 180 characters. I shortened it to 40 and the full comment now fits in 255 with room to spare. Anything that genuinely overflows goes to the Dev Notes field instead, which has no size constraint.\n\nThe close comment got capped at four sentences. I tested it against eight recent closes. Four sentences is enough. The technical detail goes into the JIRA Dev Notes field on close, not the comment. The comment is for whoever glances at the ticket tomorrow. The Dev Notes are for whoever debugs something six months from now.\n\n## closing the ticket exposed a missing seam\n\nThe last gap surfaced while I was writing the close comment for ITEM-6867 itself. My close flow reads the ticket's resolution, generates a summary, and posts it to JIRA. The flow assumed the ticket was in an open state. ITEM-6867 was already in Done. The close flow failed on a transition guard and produced no comment.\n\nThe fix was a closed-ticket append path. Detect Done state, skip the transition step, post the comment directly via the comment API. Twenty additional lines. The close flow now handles tickets in any state.\n\nI almost did not write this down. It felt like infrastructure noise beside the actual deliverables. That instinct is exactly how this category of bug accumulates. Closing tickets is a frequent operation. A close flow that breaks on already-closed tickets had been silently failing every time I manually moved a ticket to Done before running the close flow. I don't know how many close comments went missing before this one.\n\n## the maintenance that doesn't look like maintenance\n\nAn internal reference doc the agent reads for JIRA field values and transitions got an Issue Types section that day. That document is what Claude reads to understand valid field values, legal transitions, and which workflow paths apply to which ticket types. It had never had a clean Issue Types listing with their numeric IDs. I had been letting the agent infer the types from context, which works most of the time and fails in ways that are hard to reproduce when it doesn't.\n\nWriting the section took twenty minutes. While I was in the file, I flagged `hto_dev` as unused. That workflow type was scaffolded in early 2024 for a developer-facing ticket category that never went into production. It had been sitting in the transition table for about two years, reachable from the agent, prompting occasional confused transition attempts on tickets that had no business going there. A single comment: `# DEPRECATED. Not in use as of 2024-03, do not route here`.\n\nNeither of these changes feels like a deliverable. But the agent reads that reference doc on every JIRA operation. A missing Issue Types section is an invitation to guess. The agent was guessing wrong roughly once a week, and I was correcting it by hand, not registering that I was doing so.\n\n## what the pattern is actually saying\n\nThe expensive failures in AI-assisted development are not usually model failures. The model degrades gracefully. It hedges, it asks, it recovers when corrected. Glue code fails silently. A 255-character JIRA truncation produces a comment that looks complete in the API response and reads like an incomplete thought to the human reviewer. A wrong field ID produces an apparently-successful write that goes nowhere. A stale `collaborators.md` produces confident but wrong attribution, indistinguishable from correct attribution unless you already know who left in February.\n\nThe model is not the first place to look. The session hook, the field mappings, the reference docs the agent reads to understand the JIRA schema, the status transitions the close flow assumes, the collaborators list that drifts as people come and go. These are where the workflow lies to itself without flagging an error.\n\nAuditing this layer doesn't feel like progress. A three-line shell fix is not a feature. It doesn't appear in a changelog users will ever read. But the features ship into the workflow, not around it. If the workflow is lying about which directory it's in, you are not getting the full value of anything built on top.\n\nThe first thing to harden in an AI coding loop is the thirteen-line wrapper that tells the model where it is. Not the model.",
      "date_published": "2026-04-27T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "ai-workflow",
        "orchestration",
        "solo-founder",
        "claude-code",
        "jira",
        "glue-code"
      ]
    },
    {
      "id": "https://dxdev.com/blog/skilled-skeptic/",
      "url": "https://dxdev.com/blog/skilled-skeptic/",
      "title": "Your Best Coder Will Be Your Lowest AI Adopter",
      "summary": "Capability and conviction are independent axes. The strongest coder I onboarded to Claude Code is the lowest-usage adopter, and that gap is principled resistance. Demos can't dissolve it. A field note from solo-founder AI enablement.",
      "content_text": "The strongest pure coder I've onboarded to Claude Code is the lowest-usage adopter. Plan paid. Tool installed. Paired session done. I haven't found the move that lands it in his daily workflow yet. That broke an assumption I'd been carrying without examining: that capability tracks conviction, that smart technical people see the value faster.\n\nHe's not skeptical from ignorance. He's skeptical *because* he understands it. We screen-shared while he drove the tool against his own real code. The skepticism that survived that session is principled, and it points at a layer of adoption I didn't have a name for.\n\n## The convenience floor isn't the conviction floor\n\nMost onboarding playbooks I've seen, including the one I had in my head, assume non-adoption is a friction problem. The install is annoying, the plan costs money, the docs are rough, the first prompt is intimidating. Remove friction, get usage. That model works for plenty of people. It worked for several others in this same enablement portfolio.\n\nIt does nothing for a skilled skeptic.\n\nWhen you pay for someone's plan and they still don't use it, the gap isn't access. When you walk them through a working session and they engage and still don't use it, the gap isn't comprehension. There's a third floor above those two. The conviction floor is where the user has to actually believe the tool's output is worth integrating into their workflow, on its merits, on their terms. You can't demo your way through it. The demo already happened.\n\n## Three principled objections\n\nWhen I listened instead of pitching, the resistance sorted into three categories. None were \"this is too hard.\" All were substantive technical positions a thoughtful person can hold.\n\n*Regurgitation*. The claim that LLMs aren't reasoning, just pattern-matching across training data and reshuffling existing code. *No-thinking*. 
The philosophical wing of the same objection, that whatever the model is doing isn't cognition. *Security*. What does the tool read, what does it send, what does it execute, on whose authority.\n\nThe first two are real, but I've decided not to argue them directly. Adoption doesn't require resolving the question of machine thought. The cognition debate is a tar pit. And re-running a demo doesn't address regurgitation, because the objection isn't about whether output appears. The output a strong coder has already seen is consistent with sophisticated pattern-matching, and pattern-matching has a known failure mode that's worth taking seriously. More output of the same shape doesn't engage the concern.\n\nSecurity is the only one of the three with concrete remediation paths. So security is where I'm starting.\n\n## Why security is the place to start\n\nYou can sandbox. You can permission tool access. You can configure for no egress on sensitive code, review every suggestion before it lands, and talk specifically about what the tool is allowed to do.\n\nTwo things happen when you engage security first. One, you might actually win it, because the legitimate concerns are well-defined and addressable, and a strong coder will respect a real answer. Two, you've signaled that you take their objections seriously instead of pattern-matching them to \"Luddite\" and pushing harder on convenience. That signal alone changes the conversation from posture-debate to integration-design. Winning one, with respect, is a different posture than losing all three with insistence.\n\n## What I'd do differently\n\nCapability and conviction are independent axes. Skill doesn't predispose someone to AI adoption. If anything, depth makes the skepticism harder to dismiss, because it can be defended. The phrase \"smart people will see the value\" was doing a lot of unexamined work in my model, and it's gone now.\n\nStop assuming a paid plan plus a working install equals adoption. 
The boxes were the wrong checklist.\n\nIdentify *why* before throwing more convenience at someone. Capacity, fit, and conviction are three different blockers requiring three different moves. My instinct was to schedule another demo, find a flashier use case, re-pitch. Wrong move. I was applying a convenience-fit answer to a conviction problem.\n\nListen for the objection in front of you and answer that one.",
      "date_published": "2026-04-27T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "ai-enablement",
        "adoption",
        "claude-code",
        "ai-coding-tools",
        "solo-founder",
        "onboarding"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-04-24_stale-branch-wreck-hotfix/",
      "url": "https://dxdev.com/blog/2026-04-24_stale-branch-wreck-hotfix/",
      "title": "I almost let a stale personal branch wreck a hotfix",
      "summary": "On April 24, 2026, my local ben branch was 304 commits behind origin/ben and the /sync skill was about to merge a hotfix into it. What stopped me was luck, not process.",
      "content_text": "On April 24, 2026, my local `ben` branch was 304 commits behind `origin/ben`, and the `/sync` skill had no idea.\n\nITEM-6843 was a clean hotfix. The CAD exchange rate had been hardcoded at 1.4 in the branch, a stale config value that needed updating to 1.31. Moving it touched four files across checkout, packages, and receipt mailers. Short diff, clear business impact. I ran `/sync`, the skill fetched origin, found the ticket branch clean and ready, and was about to merge.\n\nWhat it did not do was fast-forward `ben` first.\n\n## the skip logic was reasonable and wrong\n\n`/sync` treats personal branches carefully. The reasoning is that `ben`, my preview branch, carries custom state. Commits from David and Aaron, preview-only changes that are not ready for `develop`, the occasional one-off that lives there and nowhere else. Merging into `ben` without understanding that state could clobber work. So the skill skipped personal branches entirely, every session, and the drift it never surfaced just accumulated.\n\n304 commits is not a lot in absolute terms. But those commits contained a substantial amount of upstream work my local clone did not have. When I started the merge, git began trying to reconcile the two histories. The merge would have replayed 71,000-plus insertions that already existed on the remote branch. It would not have failed loudly. It would have produced a merge commit that looked internally consistent and was mostly noise pointed straight at the preview.\n\nI caught it because the diff preview was enormous. Not because the tooling stopped me.\n\n## what fast-forward instead of skip actually means\n\nThe fix was one change to `/sync`. Personal branches no longer get skipped. They get fast-forwarded first. If the local cannot fast-forward cleanly because the histories have genuinely diverged, the skill halts and surfaces the conflict instead of stepping around it. 
Fast-forward only means no merge commits get quietly created, no silent replays, no diff that looks complete but contains 71,000 lines that were never supposed to be in this changeset.\n\nI also updated the written branch rules. The formal order is now fetch, fast-forward, then merge. The handoff path for review branches is explicit. Confirm local is current with `origin/ben` before anything touches it. These rules existed before in a loose way. They live in a file now, and the skill enforces them at the moment a merge is about to happen.\n\n## what compression does to drift\n\nThe HTO workflow runs across multiple `hto*` clones, and when Claude is doing the file edits and `/sync` is handling the merge path, the cycle time for a ticket compresses significantly. That is the point. It means we can close more tickets and keep the queue moving. It also means I run through sessions without the natural pauses where I used to catch things like branch currency.\n\nManual, slow work has informal safety checks baked into its slowness. You read the branch name twice because you typed it twice. You notice the diff is strange because you waited for it to generate. I do not want the slowness back. But the habits I built during the slow era were calibrated to that pace, and moving fast with the same habits means the habits stop catching things.\n\nThe 304-commit gap was not a recent development. The skip logic was not a bug introduced in the last sprint. It was a reasonable default that stopped being appropriate without announcing itself.\n\n## the branch model was correct and insufficient\n\nThe branch model is `ben` for preview, `develop` for the release tip, `master` for prod. The team understands it. But a branch model is a set of intentions, and intentions do not enforce themselves at merge time. The `/sync` skill enforces things at merge time. If the skill's behavior does not match the model's invariants, the model is documentation, not infrastructure.\n\nChanging one behavior in `/sync` made that gap concrete. A skip that let stale state accumulate invisibly became a halt that surfaces the drift and requires a decision. The class of near-miss that produced a 71,000-line preview merge cannot recur silently, because the skill will not let it stay silent.\n\nWhen a workflow accelerates merges, stale local state does not get cheaper to carry. The `/sync` skill now makes that cost visible before it lands.",
      "date_published": "2026-04-24T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "git",
        "branch-discipline",
        "ai-pairing",
        "workflow",
        "solo-founder",
        "asp-classic"
      ]
    },
    {
      "id": "https://dxdev.com/blog/backups-dead-three-weeks/",
      "url": "https://dxdev.com/blog/backups-dead-three-weeks/",
      "title": "My backups had been dead for three weeks. I found out by accident.",
      "summary": "I split a sprawling AI ops monorepo into five sibling repos. The rename exposed runtime automation that had been silently dead for weeks.",
      "content_text": "I spent a day doing something that felt, on the surface, like bureaucratic housekeeping: I renamed a repo. The one I'd been running my AI operations out of, a thing called `hive`, got frozen in place as `opdek-archive`, and its live concerns got split across four new sibling repos.\n\nWhat I expected was a slow afternoon of rewriting hardcoded paths inside the old repo. What I actually found, once I started cleaning up the runtime around it, was a quiet field of dead automation I'd been trusting to be alive.\n\n## What hive was\n\nHive was a single repository that tried to hold everything my solo AI setup needed. The list that matters:\n\n- A multi-agent dispatch engine with a FastAPI dashboard.\n- A 40-table Postgres database of task state and logs.\n- My private memory store, blog drafts, research docs, and plan records, all in-tree with the engine.\n- Deployment configs for user systemd services, cron jobs, and a Postgres container in Docker.\n\nIn good weeks, everything worked. Claude Code opened from that directory, auto-memory populated, every tool knew every path. In less-good weeks, any content commit could break the engine, and any engine debug could strand a blog draft. The coordination tax on every change climbed slowly enough that I didn't notice until I did.\n\nThis wasn't my first time hitting this wall. our-home-agent, brainstays, statekeep, holdthought-web, mission-control, hive. Each began in a single repo and each ended in the same tangle. The pattern is recognizable: concerns fuse inside a monorepo, isolation becomes impossible, the exit ramp is a rewrite.\n\nThis time I didn't want a seventh rewrite. I wanted to understand why the same shape kept producing the same failure, and to separate the concerns before they fused again.\n\n## Five repos, one operator surface\n\nThe new topology is five repos at `~/projects/`:\n\n- **`ben-command`**: thin orchestrator. Scripts, conventions, routing. No product code. 
This is the directory Claude Code opens from, always.\n- **`ben-chronicle`**: high-velocity private content. Memory, plan drafts, daily learnings.\n- **`dxdev-library`**: public-bound content. Sanitized docs and feeds consumed by the live site.\n- **`ben-archive`**: raw, immutable source material. ChatGPT exports, transcripts. Things I might want to re-derive from but never edit.\n- **`opdek-archive`**: the frozen former hive. Preserved, never modified.\n\nThe split itself is not the insight. Any developer can tell you \"separate concerns.\" What shaped the split is that Claude Code's auto-memory lives in a directory keyed to the cwd it launches from. Launch from `~/projects/foo` and you get memory at `~/.claude/projects/-home-akamaiben-projects-foo/`. Launch from somewhere else. Different memory, different accumulated context, different mental model.\n\nSo the commitment is: always launch Claude Code from the same directory, even when operating on N other repos. That directory becomes the operator surface. The other repos are data, touched via absolute paths. One CLAUDE.md, one memory store, one mental model. `ben-command` is not a product. It is the cwd.\n\nOne structural detail inside the split: the migration plan had nine steps, and step two, shipping a static blog content API from a public repo to my landing page, was pre-committed as a hard gate. Nothing in steps four through nine was allowed to start until step two was live. My specs had been running ahead of my implementations for months, and the blog API was deliberately chosen as a small, well-specified, achievable ship. It landed: a Python generator reading markdown frontmatter, writing a JSON feed in a public repo, fetched by an Astro content loader on the landing site. Technically small. As a gate, load-bearing.\n\n## The runtime was half dead\n\nStep six of the plan was rewriting the 43 runtime files *inside* hive that hardcoded the old path. That worked. 
What the plan didn't catch was runtime automation that hardcoded the old path but lived **outside** the repo: in my crontab, in `~/.config/systemd/user/`, in Docker.\n\nWhen I went to clean those up, I expected to find a handful of stale references.\n\nSix crontab entries (verbatim copy saved to `ben-command/backups/2026-04-23/crontab.txt` before `crontab -r` cleared them), seven systemd units (five of them enabled at boot, so the next reboot would have produced a cascade of \"failed to start\" states), two orphan MCP processes, a retired Postgres container. Normal cleanup. Then I started actually *reading* what the dead crons had been trying to do.\n\nThe weekly restic backup cron, scheduled Sunday 4 a.m., was supposed to push a snapshot of system state to S3 every week. I checked.\n\nOne snapshot. Dated April 2. Tagged `initial`. Nothing since.\n\nFor the three weeks before the rename, my weekly backup had not been failing. It had not been *running*. The cron had been cd-ing into the hive path (which still resolved pre-rename) but the `restic` command inside was failing on a credential or path issue, and cron was dutifully writing the error to a log nobody read.\n\nSame story with the Chrome debug-protocol scraper that fed my usage metrics. \"Chrome CDP not reachable on port 9222,\" every thirty minutes, for weeks. Because the systemd service that would have started the debug port had never been re-started after a reboot.\n\nIf I had not done the migration, I would not have discovered either of these failures. The crons would have kept \"running.\" The systemd units would have stayed \"enabled.\" I would have kept believing I had working backups and working metrics.\n\nSolo infrastructure, absent external users complaining, decays silently, and your only reliable signal is explicit audit.\n\n## The dormant-project scanner\n\nOn top of cleaning the hive-shaped debris, I wanted something structural so the pattern wouldn't re-accumulate. 
I wrote a small Python scanner, `scripts/projects/status.py` in `ben-command`, that walks `~/projects/`, reads git metadata per repo, greps `~/.config/systemd/user/*.service` and crontab for leftover references to each repo, joins against a hand-maintained intent registry (`projects/registry.yaml`), and writes a markdown report.\n\nThe first run classified 23 repos: 10 active, 2 client, 7 dormant, 4 frozen. The numbers aren't the point. The design choice is: unregistered repos surface in their own \"needs classification\" section rather than being silently omitted. When a new project appears in `~/projects/`, the next scan forces a registry decision.\n\nThat exact failure mode, the one that produced my previous generations of quietly-parked-and-forgotten repos, now has a structural tripwire.\n\nThe first scan also caught two repos I had forgotten about, home-calendar and mission-control, each with leftover user systemd units pointing at its own path. Not hive collateral. Independent decay. Both were things I started and didn't finish, whose infrastructure had outlived my attention. The registry entries now record them as dormant with pickup hints.\n\n## What this pattern is good for\n\nTwo things become possible once the substrate is clean and the boundaries are drawn.\n\n**A portable operator surface.** The `ben-command` pattern is extractable. A sanitized starter version can be handed to a collaborator as a template, letting them open Claude Code in their own operator repo with their own memory accumulating. No shared environment, no shared sprawl. I wasn't trying for minimalism. I was making the shape portable to other people.\n\n**A place to put durable knowledge that outlives the session.** Auto-memory, as Claude Code ships it, is per-cwd and session-accumulated. Next to it, in `ben-chronicle/knowledge/`, I've started a curated folder of engineering knowledge pages, written like a small wiki, referenced by name from future sessions. 
It currently holds three pages and a README. The policy is to port on demand, not on spec: when a future session reaches for a pattern that matches an unported learning, synthesize the page then.\n\n## What I'd tell another solo founder\n\n**Don't restart. Restructure.** A restart tries to solve a sprawl problem with a blank slate. A restructure separates concerns inside the material you already have. The first produces abandoned generations. The second produces a portfolio with explicit lifetimes.\n\n**Freeze, don't delete.** The frozen old engine costs almost nothing to keep. The cost of losing the context, the prior art, and the receipts is permanent. My old hive is now `opdek-archive`, unchanged, readable, never modified.\n\n**Run the audit even if you don't migrate.** The three-weeks-dead backup was the most useful discovery of the whole day. I would not have gone looking for it on its own merits. If you're a solo operator and you haven't traced every cron, every user unit, every background process to its actual current behavior in the last month, you probably have something quietly broken right now. The signal will not come to you. You have to go get it.\n\nI thought I was migrating a repo. I was doing the first real audit of my own infrastructure in a long time. The audit was the product. The rename was the forcing function.",
      "date_published": "2026-04-23T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "migration",
        "operator-model",
        "claude-code",
        "repo-topology",
        "solo-founder",
        "runtime-audit"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-04-22_hotfix-branches-need-different-rules/",
      "url": "https://dxdev.com/blog/2026-04-22_hotfix-branches-need-different-rules/",
      "title": "Hotfix branches need different rules than develop",
      "summary": "Running /sync from a hotfix branch tried to merge develop in. On a 25-year-old ASP-classic codebase with an active release in flight, that would have dragged unfinished work toward master. Tooling gap caught during ITEM-6690, April 2026.",
      "content_text": "On April 22, 2026, I was working on `hotfix/3.346.12`, a release branch pointed straight at production, when I ran `/sync` mid-session and watched the skill prepare to merge `develop` in.\n\nThat would have been bad.\n\n`develop` in this codebase is the staging lane. It carries half-finished features, experimental schema changes, and at any given moment, code that is not production-ready. Merging it into a hotfix branch would have pulled all of that toward `master`. One merge commit, and the release becomes a vehicle for everything that was supposed to wait.\n\nThe skill didn't know the difference. It saw a branch, it knew there was a `develop` and a `master`, and it ran the standard sync logic.\n\n## what the branch model actually is\n\nHTO is a 25-year-old ASP-classic application. The release topology reflects that age. It works, but it has corners you have to know about.\n\n`develop` is where features land before a release. `master` is production. `ben` is a preview branch David and Aaron use to review work before it goes live. Hotfixes cut directly from `master`, get a version tag like `3.346.12`, and merge back into both `master` and `develop` in a specific order.\n\nThat last part matters. When a hotfix lands, the release pipeline carries it forward. `master` gets the fix, then `develop` gets it via the standard hotfix merge. If you run a separate `master` to `develop` sync inside your skill, you're manufacturing a duplicate merge commit, racing the actual release flow, and creating phantom history that confuses future merges.\n\nSo there were two bugs in `/sync`, not one. Merging `develop` into a hotfix branch was wrong because it drags unfinished work toward production. Merging `master` into `develop` during sync was wrong because the release flow already handles that, and doing it again creates noise and timing risk.\n\n## what I changed\n\nThe fix is not complicated in retrospect. 
The skill now reads the branch prefix before deciding what to do.\n\nBranches matching `hotfix-*` or `hotfix/*` get a different sync path. No develop merge. No master-to-develop push. Just a targeted sync against the base they actually cut from. `develop` and `master` get their own handling. Everything else falls through to the original logic.\n\nI also updated the workflow docs and the branch validation checks so the agent has the branch semantics written down somewhere it can read them, not just inferred from the codebase topology. The topology is visible. The meaning of the topology is not. Which branches exist is a `git branch -a`. What you are allowed to merge into a hotfix branch is a policy, and policies need to be stated explicitly or they don't exist.\n\nOne other thing I fixed at the same time. There was a hardcoded clone path in the sync skill that referenced a specific working directory on my machine. It was fine until I ran the skill from a different clone, at which point it tried to operate on the wrong repo. One line, obvious in hindsight, sitting there untouched because the nominal path always happened to be the one I was using.\n\n## the /item fix alongside it\n\nWhile I was in the tooling, I also updated `/item`, the skill I use to open an investigation on a JIRA ticket.\n\nThe old behavior was to start every investigation with an empty browser tab and wait for me to supply the repro URL. Every session started with me reading the JIRA ticket, finding the Links field, copying the URL, pasting it into the skill. A two-minute tax on every investigation.\n\nThe Links field exists in JIRA exactly because repro URLs are part of ticket state. It lives at `customfield_10101`. The skill now reads it on open and navigates there directly. If no link is present, it asks. If one is present, it uses it. Two minutes saved per ticket, one fewer thing to hold in working memory.\n\nBoth fixes belong to the same category. 
The agent was making me supply context that was already written down somewhere accessible. The branch rules were in the workflow docs. The repro URL was in the JIRA ticket. The skill just wasn't reading either.\n\n## the compounding direction\n\nWorkflow mistakes move faster than coding mistakes. A bug in a feature affects one feature. A bug in the deployment flow affects every feature that touches it.\n\nEarlier the same day, the `ExpiredRedirect` fix I shipped for ITEM-6643 saturated the IIS worker pool by running a SELECT and UPDATE on every authenticated request. The damage was scoped to one server's thread queue. Recoverable in minutes. If `/sync` had merged `develop` into `hotfix/3.346.12` and I had pushed before catching it, the damage would have been whatever `develop` happened to contain that afternoon, arriving in a production release labeled as a hotfix.\n\nThe asymmetry matters. Coding mistakes tend to fail locally and loudly. Workflow mistakes tend to fail silently or at a distance, when the release lands and the wrong code is in it.\n\nThat is why the branch rules have to be explicit. Claude is not going to invent the deployment model of your repo from first principles. It will read what you've given it and do the most plausible thing. If you haven't written down that hotfix branches cannot accept develop merges, the most plausible thing is to run the default sync. The default sync will do what it always does.\n\nThe leverage isn't in better prompts. It's in better operational rules. The policies your tooling enforces so the agent doesn't have to guess at the shape of your repo.\n\nAn AI coding agent is only as safe as the branch rules you bothered to make explicit.",
      "date_published": "2026-04-22T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "hotfix",
        "branch-strategy",
        "agent-tooling",
        "git-workflow",
        "asp-classic",
        "solo-founder"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-04-21_bug-in-a-trench-coat/",
      "url": "https://dxdev.com/blog/2026-04-21_bug-in-a-trench-coat/",
      "title": "The bug was wearing a trench coat",
      "summary": "On April 21, two hotfix branches merged into develop the same day. ITEM-6831, ITEM-6833, and ITEM-6834 looked like one customer complaint and turned into three different bugs: bad payment data, wrong scope resolution, and a deploy that would have re-broken the fix by Tuesday.",
      "content_text": "On April 21, `hotfix/3.346.1` and `hotfix/3.346.2` both merged back into `develop`. Same day. That is rarely a good sign.\n\nThe three tickets were ITEM-6831, ITEM-6833, and ITEM-6834. Customer-facing failures across registration, schedule management, and package deploys. Not related in any obvious way. Not one bug. Three bugs in a trench coat, presenting as a single person.\n\n## the payment data had been wrong for a while\n\nITEM-6831 was a PayPal IPN bug. Payments were landing, the money was arriving, but registrations were staying in Unconfirmed status. Teams were showing up on the roster as pending. Nobody was screaming loudly enough, which is the most dangerous kind of data problem.\n\nThe fix was two parts. First, the inbound IPN handler needed to repair the status automatically on any successful payment, so future payments self-heal without intervention. Second, there was already a pile of stranded rows in production with the wrong status. Those needed a bulk cleanup tool with an admin interface, a confirmation step, and a dry-run mode to walk back the damage.\n\nThat second part is the honest part of the fix. A code change that only addresses the forward path is not a fix. It is a policy update that leaves the crime scene intact.\n\n## the scope was wrong from the beginning\n\nITEM-6833 presented as \"bulk delete doesn't work from org accounts.\" I thought this was a permissions bug. It was not.\n\nThe code was invoking the server-side delete endpoint while passing an org-level scope. The endpoint does not accept that. It expects a league or team. The code was supposed to walk `orgNodes` down to the child scope before making the call, and it was not doing that. Every bulk delete from an org context was silently wrong. Not dramatic. Just quietly doing nothing.\n\nThe symptom hides the shape here. \"Delete doesn't work\" is what the user reports. \"We never resolved the scope correctly for this caller class\" is what the code reveals. 
Those are different bugs with different fixes and different test coverage requirements.\n\nClaude did the codebase archaeology. The ASP stack is 25 years old. Finding where `orgNodes` was being constructed, tracing the call path to the delete endpoint, confirming which scope fields the endpoint actually validated: that is the kind of work that takes twenty minutes of careful reading or two hours of guessing. The agent did the reading. I checked the model.\n\n## the deploy would have re-broken it\n\nITEM-6834 was subtler. Package deploys were referencing assets via raw `/src` paths in ASP. That means any in-flight edit during a deploy could touch a live customer mid-page-load. The fix was to compile dated copies of the JS and CSS, then update the ASP references to point at those versioned files instead of the source tree.\n\nThe bug is not really fixed until the deploy mechanics cannot re-introduce it. A logic fix is necessary but not sufficient when the environmental conditions that produced the bug are still in place. I have caught myself making this mistake before in other forms. This was the version of it for assets.\n\n## the ugly days are where pairing earns its keep\n\nMost AI coding demos show a greenfield project. Clean schema, clear spec, no legacy. The agent writes the scaffold, you fill in the business logic, you ship. April 21 was not that.\n\nApril 21 was branch juggling across `hto`, `hto2`, `hto3`, and the mirror repos. It was ASP files that have been accumulating edge cases since before I owned the business. Two hotfix branches, one after the other, both merged to `develop` the same day. Context switching between a data-repair problem, a scope-resolution problem, and a deploy-hygiene problem. None of them shared any code.\n\nThat is where the agent earns its keep. Not on the elegant problem. On the exhausting one. 
It holds context across the branch topology, surfaces the exact file and line that matters, tracks which fixes touched which repo, and does not make the mistake of treating three unrelated bugs as one.\n\nI still carry the mental model. The agent does not know that the org-level caller has always been slightly wrong, or that the PayPal IPN handler has a history of silent failures, or that the asset pipeline has been sloppy since the last deploy refactor. That knowledge does not compress. But with the agent handling the legwork, a day that used to cost a full day of context thrash plus a vague worry that something was missed comes in around half a day, with receipts.\n\nAI pairing earns its keep on the days when a single production bug is really three bugs in a trench coat: bad data, wrong scope, and deploy mechanics that would have undone the fix by Tuesday.",
      "date_published": "2026-04-21T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "hto",
        "ai-pairing",
        "legacy-web",
        "hotfix",
        "asp-classic",
        "deploy-hygiene"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-04-17_25-second-admin-screen/",
      "url": "https://dxdev.com/blog/2026-04-17_25-second-admin-screen/",
      "title": "A 25-second admin screen usually means you trusted the wrong query",
      "summary": "A responsive-design bugfix on the Staff Center turned into a lesson about tooltip-grade features doing production-grade database work.",
      "content_text": "The Staff Center view for New HQs takes 25 seconds to render on a server with nothing else running. The list it produces is about 15 rows. Support staff open it first thing in the morning to see who is new. Twenty-five seconds is not a number you accept for that.\n\nI opened the ticket expecting a layout fix. The title said responsive design and the file it pointed at was `StaffCenter.asp`. When I loaded a profiler trace to understand what was actually executing, the first thing I saw was the New HQs section resolving through a `FULL OUTER JOIN` against the session log table.\n\n## the query was answering the wrong question\n\nThe session log table logs session activity across the platform. Every customer who has ever logged in, every page they navigated to, every action that generated a session event is in there. It grows every day. It is not the table you use when you want to know who signed up recently. The right table for that question is the customers table, which has a signup-date column and nothing else you need to discard. The join to the session log table was not answering \"who signed up recently?\" It was answering something much larger, pulling rows across the full session history, sorting them, and returning a short list from the top.\n\nAbove the New HQs list, the `RecentListAdd` widget had a different problem. It was running unbounded `COUNT(*)` queries to populate tooltip labels on hover. No date range, no scope, nothing limiting the count to a recent window. Every Staff Center page load, the widget would ask the database to count the full contents of several tables and embed those numbers in labels that appear when a user hovers and disappear when they move away. The count was never wrong. It just had no business being recalculated on every page load to power a tooltip.\n\nThis is feature debt that accumulates invisibly in a 25-year-old codebase. The original developer probably saw the `COUNT(*)` as fast at a thousand rows. 
At ten thousand it was still fine. By the time the tables grew large enough to make it slow, the connection between the hover label and the latency was completely invisible to anyone reading the page in a browser.\n\n## what the fix actually required\n\nThe New HQs rewrite stopped using the session log table as the source and pulled directly from the customers table with a filter on the signup-date column. The Staff Center does not need session history to show who signed up recently. It just needs to know when they signed up. Once I removed the join, the view dropped from 25 seconds to around 500 milliseconds on the same data set.\n\nThe `RecentListAdd` fix was to stop computing the count on demand. Whether you cache it or remove it from the tooltip entirely depends on whether the tooltip is worth keeping, and in this case the answer was no. Neither the table structures nor the server were the problem. The queries were doing more than the screens required, and nobody had traced the latency to the SQL causing it.\n\nHomeTeamsONLINE is a 25-year-old ASP-classic application I inherited when I bought the business. The Staff Center has accumulated queries that made sense on smaller tables and have quietly gotten slower since. Finding them requires actually running the queries and measuring them, not reading the code and guessing. A slow page is the signal. The profiler trace is the map.\n\n## giving the agent real access\n\nWhile I was working through this ticket, I granted `BEN AGENT` the same Staff Center login I use. This sounds like housekeeping but it changes the diagnostic shape. If you are debugging an admin interface with an AI agent and the agent cannot log in, you end up in a relay pattern where you describe what you see in screenshots and wait for suggestions. 
That relay adds a layer of lossy translation between the real surface and the reasoning.\n\nOnce the agent had access, it could navigate to the slow views directly, observe the actual rendered state, and trace the SQL through the ASP source with full context about what the page was trying to show. The diagnostic path was faster because the agent was operating on the same evidence I was, not a description of it.\n\nThere is still a manual step. The agent runs in a Chrome profile I maintain separately at `G:/AI/state/chrome-claude`, and getting it into a session-authenticated page requires me to log in once on its behalf. The cookie persists across sessions after that. The one-time cost is low and the ongoing gain is not having to relay screenshots.\n\nClaude can make it faster to inspect and rewrite gnarly legacy code. The constraint is not speed on the code-reading side. It is whether you notice where the code is solving the wrong problem at high cost. I had to load the profiler trace to see the `FULL OUTER JOIN`. Claude could have suggested fixes for the layout issue without that trace, and every suggestion would have been irrelevant to the actual latency.\n\nI did not audit the rest of the Staff Center during this ticket. Several other paths run similar aggregates on tables that have grown past the point where those aggregates are cheap. I do not have numbers for all of them. What I know is that the specific `COUNT(*)` in `RecentListAdd` was running on every page load, for every Staff Center user, against unscoped table counts, to power a tooltip.\n\nA slow internal page is rarely a scale problem. It is usually one query answering a much bigger question than the screen ever asked.",
      "date_published": "2026-04-17T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "sql",
        "performance",
        "legacy-code",
        "asp-classic",
        "ai-agent"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-04-16_url-contract/",
      "url": "https://dxdev.com/blog/2026-04-16_url-contract/",
      "title": "The bug wasn't the upload, it was the URL contract",
      "summary": "Inside a day of debugging a 25-year-old ASP-classic form, where a querystring namespace collision hid behind an upload bug and exposed the real shape of AI-paired work in legacy code.",
      "content_text": "On April 16, the upload was failing intermittently, which is the worst kind of failing because the first few test cases look fine. The fault was `?action=`, a querystring key that the page's main form handler and the AJAX upload handler had each claimed independently. When a form post and an image upload arrived close together, IIS resolved the ambiguity in the upload's disfavor.\n\nThis was one long day inside the Create Event flow for HomeTeamsONLINE, a 25-year-old ASP-classic application. The work started as a styling pass on the tournament event creation page and turned into a full archaeology dig once the weird failures started surfacing.\n\n## the surface was fine, the contract was not\n\nThe `?action=create` handler processes form submissions. The `?action=upload` handler processes image uploads from the same page. In a modern app with actual routing, a framework catches this conflict at startup. In ASP-classic, with `Request.QueryString(\"action\")` checks spread across the same file, there is no framework to complain. The collision manifests as an intermittent upload failure on a page where both handlers are in play.\n\nClaude chased the symptom competently. It ran through logs, patched the handler, grepped across the ASP files for anything else that touched `?action=`. On a first read the agent's pass looked clean. What it could not surface was that `?action=` was a shared namespace that two layers had entered without knowing about each other. That fact is not in any comment or test. It lives in the implicit contract between two parts of a file that grew incrementally over years, and the only way to find it is to hold the question \"are there other callers\" in your head while reading code that never asked that question of itself.\n\nMoving from `?action=upload` to `?action=uploadFile` cleared the collision. 
The intermittent failures stopped.\n\n## the filename was a path separator\n\nFiles were saving as `UNeventHeader.jpg` instead of `UN/eventHeader.jpg`. Not a filename validation bug. Not a wrong value in a config field. A missing slash in the path construction code caused the `UN` directory prefix to collapse into the filename, so the file landed in the wrong location with a name that looked like a corrupted string. The fix was one character. Finding it took longer than it should have because I was reading the filename handling code instead of the spot where the directory and filename strings got joined.\n\nLegacy bugs land at the last place that touched the output. The actual fault is upstream, in the code that assembled the wrong input, and those two locations are rarely adjacent in a large ASP file.\n\n## MakeHQ and the parent that wouldn't let go\n\n`MakeHQ` is the function that creates a new tournament as an HQ league inside HomeTeamsONLINE. At some point its parent-context initialization got inlined into the creation flow and broke in the inlining. After `MakeHQ` ran, the redirect sent staff to the parent org's admin view instead of the newly created league's view. The new league existed and was correctly formed. The redirect was reading from a context that still pointed at the parent.\n\nTwo faults compounded. The parent-org context was leaking into the new league's initial setup, and the post-create redirect read from that polluted context. Fixing only the redirect would have produced a system that worked most of the time, which in a legacy codebase is often harder to diagnose than one that fails clearly and consistently.\n\n## local dev was lying about /photos/\n\nImages serve from the application root on local dev and from a CDN on production. The `/photos/` path was hardcoded in enough places that local tests passed while production quietly served broken image references. 
Environment-aware path resolution is not a new idea, but in a 25-year-old codebase with no central config layer it shows up as a manual grep pass every time someone touches image handling. I have fixed this specific class of problem in HomeTeamsONLINE before, and the fix never fully sticks because nothing enforces it.\n\n## what the agent could and couldn't hold\n\nClaude was useful for velocity. Grepping across the codebase, tracing the `MakeHQ` call chain, surfacing which files referenced `?action=`, drafting the environment-aware path logic for `/photos/`. An agent that can stay oriented across a large ASP codebase without losing the thread, and doesn't slow down on iteration nine of a nearly-identical test case, is a real asset when the volume of material to read is the main friction.\n\nWhat Claude couldn't supply was the system-level invariant. The premise that `?action=` is a shared namespace isn't in a comment, a test, or any document the agent can read. It is a contract implied by the way the system grew, and violating it produces a symptom that looks like an upload bug because that is where the failure is observable. The agent optimizes toward the stated symptom. You have to carry the unstated contract.\n\nThe fix for the upload was not in the upload handler. It was renaming `?action=upload` to `?action=uploadFile` and clearing a namespace that two layers had been sharing without knowing it.\n\nLegacy debugging is contract archaeology. The widget where the failure shows up is almost never where the contract broke.",
      "date_published": "2026-04-16T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "legacy-code",
        "asp-classic",
        "ai-paired-debugging",
        "url-contracts",
        "solo-founder",
        "hometeamsonline"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-04-15_create-event-dialog-had-to-die/",
      "url": "https://dxdev.com/blog/2026-04-15_create-event-dialog-had-to-die/",
      "title": "The create event dialog had to die",
      "summary": "A tournament form got better the moment I stopped pretending it belonged in a dialog. What a design-review session revealed about legacy UI containers, and why the right fix was deleting the pattern.",
      "content_text": "ITEM-6807 landed on my queue with three small bugs against the Create Event dialog for tournament scheduling. A file upload that broke on mobile. A competition-name field that cleared its own value on submit. A date placeholder that had not been updated since the form was built. I opened a session to fix those inputs. The inputs were not the problem.\n\nHomeTeamsONLINE is a 25-year-old ASP-classic application. Modal dialogs were a reasonable choice when the app was built, and a load-bearing assumption by the time I inherited it. The Create Event dialog was one of hundreds of those patterns layered in over the years, each one reasonable at the time it was added, each one adding constraints the next builder had to work around. By the time I sat down with ITEM-6807, the dialog had accumulated fields across multiple tournament types and had become a surface where fixing one thing reliably broke another.\n\n## the dialog was the wrong container\n\nThe main session eventually logged 716 prompts. Somewhere in the middle of that, David, our designer, joined for a design review. He looked at the current state of the modal, asked a question I had not thought to ask, and the answer changed the direction of the day.\n\nThe question was: why is this a dialog at all?\n\nTournament creation is not a quick action. It has file uploads, a date range, a competition name, an organizer list, and mobile users who need a readable layout at every step. Dialogs are for confirmations and short inputs. Squeezing tournament creation into a modal was a legacy holdover, not a product decision. Every special case I was adding was a symptom of the wrong container, not an isolated bug.\n\nThe fix was to route \"add new event\" to a dedicated full-page form with `?action=create` in the URL. The dialog was dropped and the page got the full layout it needed. Most of the bugs on the modal evaporated because the page had room for the inputs. 
This is not a new insight in web development. It still took a human in the room to see it. I had spent 716 prompts improving a form that needed to be a page.\n\n## what the design review actually killed\n\nDavid made five concrete decisions in that session, each one resolving something I had been treating as a separate problem.\n\nHe renamed Organizers to Staff. That seems small. \"Organizer\" is internal jargon for a role concept that staff users do not think of themselves through. \"Staff\" is what they call themselves, and the rename reduced the label's friction without changing anything else.\n\nHe moved the image upload action out of the main input flow and gave it a cleaner position on the page. On the dialog version, the upload was competing for space with every other input. On the full page, it sits where it belongs.\n\nHe adjusted section header and input label typography so the hierarchy was readable without squinting. The dialog had compressed everything to fit the modal width. The page version did not have to.\n\nHe restructured the overall layout so the form communicated intent rather than just enumerating fields. That is a different thing even if the fields are identical.\n\nAnd he killed the dialog itself. That sounds obvious now. It had not been obvious during the session. It took someone outside the implementation to see that the container was the constraint.\n\n## the fixes that needed a real page\n\nOnce the form was a page, several things that had been hard became straightforward.\n\nThe competition-name field had been clearing on submit because of how the dialog's state was wired. On the page form, the value is preserved through the action cycle, and that fix came from changing the container, not from patching the field.\n\nThe image upload had a `MapPath` issue where the temp copy was written to a path that did not survive the form submission round-trip. 
The fix was two lines: changing how the temp path was constructed so the next handler could reach it. On the dialog, that lifecycle had been invisible because the modal layer compressed the whole two-step into one apparent action. On the page, the steps were visible and the bug was easy to find.\n\nMobile layout had been wrapping inputs into columns that made the form unreadable on anything narrower than 800px. The page handled wrapping correctly because it was not constrained by dialog width.\n\nDate placeholders were showing a format string instead of hint text on several browsers. That is a one-line fix once the input is visible and testable in isolation, but on the dialog it had been masked by the surrounding chrome. Input sizing and label alignment got a cleanup pass. Nothing individually significant, but the cumulative effect is a form that reads as intentional rather than accumulated.\n\nFor each of these, Claude handled the fix. Describe the broken behavior, get a working implementation back, review it. What Claude could not do, across 716 prompts in the main session and another 315 in the afternoon pass, was surface that the dialog itself was wrong. That required David.\n\n## AI_TEMP_ROOT and where the artifacts go\n\nOne operational side-lesson from the day. Screenshots, plans, and session artifacts had been landing in ad-hoc locations around the repo, some in tracked directories, some not. By the end of the session I had defined `AI_TEMP_ROOT` as a named constant pointing at a scratch location outside the tracked tree. Future sessions write their temp output there.\n\nSmall convention. Sessions on a 25-year-old codebase accumulate a lot of intermediate state, and without a named home for it, that state becomes noise you clean by hand or leave to pollute the repo history.\n\n## what the agent can and can't do\n\nClaude's failure mode on ITEM-6807 was not a bug in the implementation. Everything it produced was correct. The competition-name fix was right. 
The `MapPath` temp-copy fix was right. The mobile layout pass was right.\n\nAn agent working inside a dialog will improve the dialog. It will keep iterating, tightening inputs, fixing edge cases, handling mobile breakpoints, getting the form closer to correct inside the wrong container. It will not spontaneously decide the container is the problem, dissolve the abstraction, and redirect the feature to a full page. That decision requires someone who can hold the user's experience in mind alongside the code, who can ask \"why is this a dialog at all\" and mean it as a product question rather than a technical one.\n\nIn this case that was David. The `?action=create` route exists because he asked the question. The rename from Organizers to Staff happened because he knew what staff actually call themselves. These are product decisions, not implementation decisions, and they only become visible once someone with product judgment enters the room.\n\nAgents accelerate implementation. They do not replace the moment when a collaborator looks at the frame and says the frame is wrong. That gap is where most of the quality lives.",
      "date_published": "2026-04-15T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "hto",
        "architecture",
        "design-review",
        "forms",
        "legacy",
        "ai-paired-development"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-04-13_smaller-blast-radius-restart/",
      "url": "https://dxdev.com/blog/2026-04-13_smaller-blast-radius-restart/",
      "title": "The real fix was starting over with a smaller blast radius",
      "summary": "A day of AI-assisted UI iteration only became shippable after I stopped patching a broken prototype and rebuilt the change behind org-only methods.",
      "content_text": "David's design feedback on the V3 pricing page landed mid-morning on April 13. The card structure was too dense, checkmarks were missing, the typography did not match the rest of the product, and checkout-bar behavior on mobile was undefined. The list was longer than a cosmetic pass could fix.\n\nI had a choice. Keep iterating on the existing V3 branch, which already had a week of work in it, or look at what that branch was actually doing to the codebase.\n\nI looked.\n\n## the prototype had been building up debt\n\nThe V3 packages and events pricing feature had been built inside shared code paths. Org-specific behavior, the logic that only applies when a team-org account navigates the purchase flow, was accumulating inside the same conditionals that handle legacy tournament purchases. Not by design. It just happened. The model generated diffs, I reviewed them, neither of us stopped to ask where new behavior should live.\n\nThe risk was not hypothetical. Legacy tournament checkouts are live customer flows. A regression there is not a layout glitch. It is a broken purchase for a paying customer. The prototype code was not dangerous on its own, but it made the next iteration dangerous. Every prompt refining the org pricing UI was one more delta applied to entangled paths.\n\nI saved the branch. Then I closed it and started over.\n\n## one boundary stopped the leakage\n\nThe restart had one deliberate step before any UI work. I introduced explicit `*_TournamentOrg` methods gated on `pageHQ.orgname`. The rule was simple. A session with an orgname gets the org path. Everything else gets the legacy path. The two paths do not share conditionals.\n\nThat boundary changed the character of every iteration that followed. Card padding, checkmark rendering, typography alignment, mobile overflow, checkout bar structure, CTA states across multiple breakpoints. The model produced diffs. The diffs landed inside the org-specific methods. 
The legacy tournament path was structurally unreachable from them. The blast radius was defined.\n\nNone of those iterations were risky. That is the whole point.\n\n## what the release actually required\n\nThe commits from the second half of that day tell a story that the phrase \"UI iteration\" does not. Compile assets. Restore legacy event headings that V3 had quietly clobbered. Re-enable nav links disabled during the experiment. Add auto-redirect logic for child events so they do not strand navigating users. Fix mobile overflow across both the `hto` and `hto2` clones.\n\nEach item on that list was a consequence of how long the prototype had been sitting inside shared paths before the restart. The longer entangled code sits, the more of this cleanup waits at the other end.\n\nThe model helped with all of it. It also introduced two of those surprises by not catching that the legacy heading state was load-bearing. That is not a complaint. Noticing what the model misses is the job.\n\n## the sequence worth stealing\n\nI have done a version of this more than once. A prototype accumulates. It starts to feel expensive to redo. Outside feedback, a code review, a failed test, a list from a designer, makes the scoping problem visible for the first time. The temptation is to patch around it.\n\nThe cheaper move. Save the broken branch as a reference, cut clean from the last known-good point, define the method boundary before writing any new UI, then iterate hard inside it.\n\nAI makes it cheap to generate UI deltas. It does not make mixed-scope code safer. The model will iterate wherever you point it. The question of where the experiment is allowed to live is yours to answer first.",
      "date_published": "2026-04-13T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "solo-founder",
        "ai-pairing",
        "asp-classic",
        "refactor",
        "scope-control",
        "legacy-code"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-04-12_split-brain-save-paths/",
      "url": "https://dxdev.com/blog/2026-04-12_split-brain-save-paths/",
      "title": "The bug was not in the mobile UI",
      "summary": "A mobileOrder field that appeared to save and then silently dropped its value. The investigation found that ADD and UPDATE had been writing different data shapes for years inside the same HTO module save path.",
      "content_text": "A customer filed a ticket on April 12 reporting that a mobile reorder option on their custom page would not stick. They set it, hit save, reloaded the page, and the field was back to its default.\n\nMy first read: mobile UI bug. The field was rendering correctly. The save button was wired to a JavaScript handler. The page refreshed on submit. I started tracing the frontend.\n\nThat was wrong.\n\n## the field was saving, just not what it looked like\n\nThe symptom was real. The diagnosis was off. `mobileOrder` was making it to the server. The POST was landing. The problem was what the backend did with it depending on whether the record already existed.\n\nHomeTeamsONLINE has a pattern I see throughout its custom-page module code: one handler for ADD, one for UPDATE. They were written years apart by different hands, and they had drifted into different shapes.\n\nThe ADD path was writing the full payload. Settings, styles, title, `mobileOrder`, the works. The UPDATE path was writing `parent` and `cellIx`. Two fields.\n\nEvery time a user edited an existing custom page and saved, the UPDATE branch fired and silently dropped everything else. The field appeared to save because the page reloaded. It looked like a confirmation. It was just a stale render from the prior value.\n\nThe bug had been there long enough that nobody had traced it to the save path. They had been filing tickets about the UI.\n\n## proof before trust\n\nLegacy systems have a credibility problem. You find a bug, you write a fix, you deploy it, and you cannot be sure the fix actually applied to the right path. The codebase is old enough that there are sometimes two versions of the same handler in different include files, and the wrong one might be winning.\n\nI built a small diagnostic page to verify the fix before closing the ticket. 
It reads the current saved value for a given record and displays the raw field alongside the rendered value, so you can confirm what the database actually holds after a save operation. Not clever. Just proof.\n\nThat page earned its existence immediately. Without it I would have tested the fix by eyeballing a form reload, which is exactly how the bug hid for this long in the first place.\n\n## the fix was symmetry\n\nOnce the shape mismatch was clear, the fix was not complicated. I updated the frontend payload builder to include the full field set on both the ADD case and the UPDATE case. Then I updated the UPDATE branch on the backend to accept and persist those fields the same way ADD did.\n\nThe session touched the backend save handler, the JavaScript payload constructor, and a small diagnostic page for verification. The commit went to ITEM-6802. The customer confirmed the field held on their next test.\n\n## the question the fix left open\n\nThe HTO custom-page module is not the only place in the codebase where ADD and UPDATE were written at different times by different people. The same drift exists in other module types. Some of them probably have fields that look saved but are quietly dropping on UPDATE. Those bugs have not been reported because they require a specific combination: an editable field, an UPDATE path that predates that field, and a user who noticed the revert.\n\nMost users do not notice. They reload the page, see something reasonable, and move on. This ticket arrived because the customer was configuring a mobile layout and the missing value was visually obvious on a small screen.\n\nThe right fix for the broader problem is a single payload contract that both ADD and UPDATE share, built once and enforced at the serialization layer. That is a refactor, not a patch, and it belongs on its own ticket.\n\nFor now, the ITEM-6802 path is clean. 
The others are waiting for the next customer who notices.\n\nWhen a UI setting will not stick, the bug is usually not in the UI. Two parallel write paths have drifted, one of them is quietly dropping the field, and the frontend symptom is a tell for split-brain persistence. The field appears to save because the page reloads. The save never happened.",
      "date_published": "2026-04-12T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "debugging",
        "legacy-systems",
        "asp-classic",
        "persistence",
        "hto",
        "solo-founder"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-04-11_ai-ships-the-wrong-mockup/",
      "url": "https://dxdev.com/blog/2026-04-11_ai-ships-the-wrong-mockup/",
      "title": "AI will happily ship the wrong mockup",
      "summary": "A Saturday UI rebuild that looked fine and was still wrong. What the gap between plausible and correct costs you.",
      "content_text": "The event card for ITEM-6684 passed every visual check I ran. I had rebuilt it from scratch that morning. David rejected it two hours later.\n\nNot because it was broken. The card rendered, it loaded without errors, and it used the right component names. But composition was wrong. The icon appeared twice: once from the card template, and once from a prior render pass I had not cleaned up. Title height was unconstrained, so event names of different lengths pushed the price block to different vertical positions depending on the string. The color tokens for tier badges were close to the design but not exact. Four hex values were off. Content density was too loose. The whole thing looked like something built from a description of the intended card, not from the card itself.\n\nThat distinction is the whole story.\n\n## the problem with plausible\n\nI gave Claude the existing markup, the JIRA ticket description, and a directive: rebuild the card to match the new pricing structure. The output was coherent. It followed the existing patterns. It built something that resembled David's design the way a cover band resembles the original: same structure, same approximate timing, missing the details that made the original worth covering.\n\nThe title-height invariant is a good example of what I mean. David's design has a rule: the title breaks at two lines maximum, and the price block sits at a fixed vertical position regardless of title length. That constraint is not in the ticket. It is not derivable from the existing markup. It lives in David's head, in a Figma annotation I did not pass to the agent, in the visual grammar of the product that accumulated over years before I touched it.\n\nClaude built what the ticket said. The ticket did not say enough. My mistake, not the agent's.\n\n## the rebuild, not the tweak\n\nThe fix was not a patch. 
I went through the card structure and unified the icon path so only one source was active, added `min-height` and `line-clamp` to pin the title behavior, corrected four hex values against David's spec, tightened the padding by 8px on each side to match content density, and removed the duplicate template fragment.\n\nFive changes. Each one small. Each requiring knowledge of the exact design, not an approximation of it.\n\n## the constraint no one told the agent\n\nOnce the card was correct, the page was still broken.\n\nThe `.contentFrameWrapper` class in the shell template carries a `max-width: 980px` rule. It predates my ownership of the codebase by years. At 1200px viewport width the card grid looked fine. At 1050px the right column of cards was clipping. That range is where a significant share of my users actually sit.\n\nThe agent had no way to know this. The shell template was outside the ticket scope. The visual artifact only appears at a specific viewport range. I was testing on a 2560px monitor that Saturday afternoon and never saw it.\n\nRemoving the cap was a one-line CSS change. Finding it required a human noticing the layout break at a specific breakpoint and knowing to look for a global width constraint. Claude did not surface it. David did, in code review.\n\n## what would have closed the gap\n\nI keep circling the same question: what review artifact would have forced the model toward David's exact card structure on the first pass?\n\nThe honest answer is a screenshot of the intended design annotated with the invariants. Not \"looks like this\" but \"title max two lines, price pinned at Y position, these four hex values, one icon, no duplicate.\" The agent can execute a spec. It cannot reverse-engineer one from an approximation of the original.\n\nThere is also a version of this problem that better prompting cannot fix. The `.contentFrameWrapper` constraint was not a prompt problem. It was a scope problem. The ticket covered the component. 
The bug lived in the shell. You can write a perfect card spec and still ship a broken page if the thing wrapping the card has undocumented constraints that only surface at 1050px.\n\nBoth failure modes showed up on the same Saturday. One was recoverable with a tighter prompt. One required a human who had looked at the page across multiple viewport widths.\n\nThe person who owns the product still has to specify the exact shape of correct, and then verify it at the edges where the spec ends and the legacy constraints begin.",
      "date_published": "2026-04-11T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "ai-tools",
        "frontend",
        "design-review",
        "claude-code",
        "solo-founder",
        "asp-classic"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-04-11_debugger-beats-blind-patching/",
      "url": "https://dxdev.com/blog/2026-04-11_debugger-beats-blind-patching/",
      "title": "The fastest legacy hotfix is often a debugger",
      "summary": "A Saturday evening of customer tickets got faster the moment I stopped patching and started building the diagnostic tool first.",
      "content_text": "By 6pm on a Saturday I had five HTO tickets stacked up: ITEM-6798, ITEM-6799, ITEM-6800, ITEM-6801, ITEM-6803. Layout-grid weirdness, duplicate standings headers, a broken photo link, a customer who swore the team-roster search was lying about a player's email, and a roster edge case where the same name appeared twice. The instinct was to start at the top and patch down the list.\n\nThat was wrong.\n\nThe first ticket I opened was ITEM-6798. A customer reported that a page on their site was rendering blank slots where modules should be. I could have grepped the layout save code, found a recent change, guessed at a regression, shipped a patch, and moved on. In a 25-year-old ASP-classic codebase that approach takes about two hours and lands you somewhere between \"fixed it\" and \"broke something adjacent.\" The ticket would close. The next one like it, three months from now, would start the same way.\n\n## the productive move was a tool, not a patch\n\nI stopped trying to fix ITEM-6798 and built LayoutStructDiag instead. A small admin page. It walks a page's saved layout structure and reports what is actually in the database: orphaned modules with no parent, broken parent references pointing at deleted containers, parent/content drift where the layout tree disagrees with itself about who owns what.\n\nTwenty minutes of work. Then I ran it against the customer's page.\n\nThe page was dirty. Three modules had parent IDs pointing at containers that no longer existed. Two more had content rows whose parent references had drifted during an earlier save. None of this was a bug in the rendering code. The render was doing exactly what it should given the input. 
The input had been corrupted by something, somewhere, possibly years ago, and there was no UI surface in the entire application that would have shown me that.\n\nThe fix collapsed from \"find and patch a render bug\" to \"delete the orphaned rows, audit the save flow that produced them.\" Different ticket. Much smaller blast radius.\n\n## PersonLookup did the same thing for the next one\n\nITEM-6801 was a customer insisting their roster search could not find a player by email. I could have started by re-reading the search SQL, dumping query plans, second-guessing the LIKE pattern. Standard guessing.\n\nInstead I built PersonLookup. Another admin page. Type any email, get back every Person row in the database that matches, plus the team memberships, plus the email-history records, plus the identity merges that have happened against that email over time.\n\nI ran it on the customer's email. The email was there. Three Person records owned it, two of them merged into the third six months ago, and the search was hitting a fourth Person record that had the same name but a different stored email. The customer's complaint translated cleanly: \"the search found my namesake, not me.\" Not a missing-data bug. An identity-mismatch bug, surfaced by visible state.\n\nThe fix was small. The hours I did not spend re-reading search internals were the actual win.\n\n## the rest of the night went faster\n\nOnce LayoutStructDiag and PersonLookup existed, the remaining tickets stopped being mysteries. ITEM-6799 turned out to be the same corruption class as ITEM-6798, on a different page, caught by the same diagnostic. ITEM-6800 was a clean render bug, isolated in fifteen minutes because the diagnostic told me up front that the data was fine. ITEM-6803 was data again.\n\nFive tickets. Four of them collapsed by tools that did not exist when I started. 
The patch surface area I would have touched, if I had started by patching, was probably ten times what I actually changed.\n\n## why this keeps working in legacy systems\n\nThe reason a debugger beats blind patching in old code is not that old code is mysteriously hard. It is that old code has accumulated state that is not visible from the UI. Bad rows from migrations three platforms ago. Identity merges that left dangling references. Layout structures that were valid under the rules of 2008 and not the rules of 2024. The application keeps running because the rendering code is defensive. The bug reports keep coming because the data has been quietly wrong for years.\n\nYou cannot grep your way to that. You can only see it by writing the thing that shows it to you.\n\nEvery legacy ticket where the answer is \"looks fine in the code, must be the data\" is an admin tool waiting to be built. The cost is twenty minutes. The payoff is that the next ten tickets in the same neighborhood get answered in five minutes each, and the eleventh one finds the actual save-flow bug that has been producing the corruption all along.\n\nLayoutStructDiag and PersonLookup are still in the admin panel. The next time a layout-corruption ticket comes in, the first thing I will do is open the diagnostic and read the answer. The ticket itself was the cheapest part of the night. The tooling was the asset.\n\nIn a legacy app, the highest-leverage hotfix is often the tiny debugger that tells you whether the system is broken or just dirty.",
      "date_published": "2026-04-11T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "legacy-code",
        "debugging",
        "hotfix",
        "asp-classic",
        "observability",
        "solo-founder"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-04-10_pricing-real-when-gates-shipped/",
      "url": "https://dxdev.com/blog/2026-04-10_pricing-real-when-gates-shipped/",
      "title": "The pricing rewrite only got real when I wrote the gates",
      "summary": "A tournament pricing prototype is easy. Making it survive hybrid checkout, feature gates, proration, unpaid states, and legacy billing is the part that teaches you what you actually built.",
      "content_text": "On April 10, ITEM-6684 started as a prototype problem. Tournament event pricing for HomeTeamsONLINE: how should a team registration add cost to a tournament? I figured the design work was the hard part. The implementation would follow.\n\nBy the end of the day I had pushed that model through hybrid checkout, feature gates, proration math, unpaid-state handling, and architecture docs spread across four clones (`hto`, `hto2`, `hto3`, `hto4`). The prototype had been easy. The real thing was something else.\n\n## two layers, not one\n\nThe first instinct for tournament pricing was flat per-team pricing. Each team that enters pays X dollars. Clean, predictable, easy to explain to a tournament director. The second instinct was credits: pre-purchased blocks that deduct on each registration.\n\nBoth lost for the same reason. Neither maps cleanly onto what HomeTeamsONLINE already sells.\n\nEvery league and tournament on the platform is run by an organization that already has a membership tier. That membership tier controls what the organization can do. Adding event pricing on top of a flat-per-team model means the software has two separate concepts of \"what this org paid for\" running side by side, with no formal relationship between them.\n\nThe model that survived is two layers. The base layer is the existing membership, with its price and its access surface. The second layer is a bundle ceiling: an event can carry a price cap that defines how many team registrations are included in the membership's cost. Above that ceiling, additional registrations cost more. Below it, the event is covered.\n\nThis is not elegant in the abstract. It is correct in practice, because it maps to what tournament directors actually negotiate. 
This tier gets up to 32 teams included, more costs extra.\n\n## the prototype was allowed to ignore proration\n\nThe hybrid checkout piece is where the prototype lied to me about what was hard.\n\nHomeTeamsONLINE has a membership billing system. It is ASP classic, it is old, and it works. A tournament event cart is a separate concept, a one-time purchase tied to a specific event, not a recurring membership cycle. When a tournament director registers for an event mid-billing-cycle, the checkout needs to produce one coherent total that includes any prorated membership adjustment and the event registration cost together.\n\nThe prototype was allowed to assume a clean billing boundary. The real implementation was not.\n\nProration math here is not complicated in isolation. You take the remaining days in the billing period, divide by the total days, apply that ratio to the membership delta. Fifteen lines of arithmetic. The complication is that this math has to run in the legacy billing path, produce a line item the event cart can consume, and not double-charge if the org already settled that billing period in a different session.\n\nThat last constraint did not exist in the prototype. It showed up the moment I tried to wire hybrid checkout into an actual test org that had paid last month and was renewing next week.\n\nI had not thought about it until that moment. The prototype had one org, one event, one happy path. The constraint never surfaced.\n\n## where the gates actually live\n\nFeature gates in HomeTeamsONLINE are not a single table. The surface I was wiring into is `tournamentFeatureAccess`, a function that returns a bitmask of capabilities for a given org-tournament pair. Whether the org can create bracket rounds, publish a roster page, or use the event cart at all. All of it flows through this one surface.\n\nWiring tournament event pricing into `tournamentFeatureAccess` took most of an afternoon. 
Not because the code was hard, but because the unlocks catalog had not been designed for bundle-ceiling semantics. minPkg-style gating, the pattern already used for a few other feature tiers, assumes a fixed threshold. This feature is available at membership tier 3 and above. Bundle ceilings are not a threshold. They are a quantity that varies per event configuration.\n\nThe unlock logic had to learn a new pattern. Does this event have a ceiling defined, and if so, how many slots remain. That is a different kind of gate than any that existed before. Getting it into the catalog cleanly, without special-casing `tournamentFeatureAccess` for this one pricing mode, added a day I had not planned for.\n\n## the work that was not in the prototype\n\nEverything above was architecture. What follows was product.\n\nUnpaid banners. When an org's account is overdue, tournament registrations need a hard lock, not a warning. A warning can be dismissed. An overdue hard lock cannot. The distinction matters because tournament directors sometimes navigate to the registration flow from a bookmark that predates the overdue notice, and a dismissible banner disappears before they see the price.\n\nPre-cap warnings. When an org is within five registrations of their bundle ceiling, the event cart shows a warning before they commit. This does not exist in any other HomeTeamsONLINE checkout flow. The shopping cart metaphor has no ceiling concept. Building the warning meant adding a line item type that is informational, not transactional, and making sure it does not appear on the invoice sent to the tournament director.\n\nDowngrade behavior with preserved excess credit. If an org downgrades their membership tier after registering for an event covered by the higher tier, the registrations already paid do not get clawed back. The excess credit from the event is preserved as a line item balance against the org's account. 
Writing that logic required touching the membership billing path in a way I had been deliberately avoiding.\n\nNone of this was in the prototype. The prototype had one org, one event, one happy path.\n\n## what the architecture doc is for\n\nThe implementation spans four clones because HomeTeamsONLINE development is split across `hto`, `hto2`, `hto3`, and `hto4` by function. Admin flows live in one place, public registration in another, billing in a third. A feature that touches all three lives in all three clones simultaneously, and each clone has its own branch cut point.\n\nI wrote the architecture document not because I needed to think it through again, but because future me will not remember why bundle ceiling won over flat per-team. In three months, when a tournament director asks for a credits model, the document is the only thing that will stop me from re-litigating the entire choice from scratch.\n\nThe document is 47 lines. It covers the two-layer model, the proration contract, the gate semantics, and the three edge cases that shaped the design: mid-cycle checkout, overdue hard locks, and downgrade-with-credit. That is enough to reconstruct the reasoning. It is not a spec. It is a decision record.\n\nA pricing model is not real until it survives your legacy checkout, your feature gates, and the awkward states customers actually get stuck in.",
      "date_published": "2026-04-10T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "pricing",
        "architecture",
        "feature-gates",
        "hto",
        "legacy-systems",
        "checkout"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-04-10_redirects-were-the-attack/",
      "url": "https://dxdev.com/blog/2026-04-10_redirects-were-the-attack/",
      "title": "The attacker was our own redirect logic",
      "summary": "ITEM-6795 looked like abuse mitigation doing its job. The blocked accounts were legitimate admins, and the attack traffic was coming from our own 302 chain.",
      "content_text": "ITEM-6795 came in as a login complaint. An HTO admin was getting blocked mid-session, not at the login screen, but a few clicks into the control panel. The IPFilter table had a `speedFlag` set on the account. From the outside it looked like the filter doing exactly what it was built to do. Somebody clicked too fast, got flagged, got throttled.\n\nI opened the ticket expecting a threshold problem. Maybe the daily-hit ceiling was too low for an active power user. I pulled the IPFilter row, looked at the `speedFlag` state and the timestamps, and did the math. The clicks were fast, but not attack-fast. Around 8 to 12 requests in a short window.\n\nThat was not abuse. That was someone navigating.\n\n## the filter was counting our own redirects as hits\n\nHTO's admin panel does 302 redirects. A lot of them. The pattern is standard. Click a menu item, land on a routing page, get bounced to the actual destination. Two HTTP requests, one user action, under 500 milliseconds. For a quick-click admin, that chain fires repeatedly. Four menu clicks becomes eight or twelve logged requests on the same IP in a few seconds.\n\nThe IPFilter stored procedure did not know about this pattern. It counted hits. A hit was a hit. If two hits arrived within the threshold window from the same IP, the speed counter incremented. If the counter crossed the daily limit, `speedFlag` got set. Nobody wrote a carve-out for the app's own internal navigation because when the filter was written, nobody had thought to ask whether normal in-app routing would ever look like a burst.\n\nThe routing model evolved. The filter did not.\n\n## what claude missed first\n\nI was working this with Claude. The initial read of the IPFilter code came back clean. Stored procedure logic correct as written, thresholds matched the documented intent, no SQL bugs. 
Claude flagged the `speedFlag` threshold as potentially aggressive but framed it as a tuning question, not a classification problem.\n\nThat framing was wrong. The answer to \"the threshold is too tight\" is to raise the number. The actual problem was that the number was counting the wrong things. You can raise the ceiling as high as you want and still block a power user who navigates in bursts, because every admin click through a redirect chain keeps filling the bucket.\n\nThe classification gap was the bug. The threshold was a symptom.\n\n## the four-part fix\n\nOnce the root cause was clear, the fix had four moves. They had to land together, because a partial fix would have left the underlying count wrong.\n\nFirst, add a request classification layer. The stored procedure had no concept of request origin, treating all hits the same regardless of where they came from. It needed to distinguish internal navigation from inbound probing before deciding whether to increment the speed counter.\n\nSecond, change the stored procedure to skip the `speedFlag` increment for the internal-navigation class. Navigational hits still get logged, but they no longer feed the counter that gates account state.\n\nThird, add an auto-clear for `speedFlag` entries set while the account was otherwise idle. If the flag was set by a burst that turns out to have been navigation, and the account goes quiet for a defined window afterward, the flag clears itself rather than persisting until someone notices.\n\nFourth, raise the daily threshold to give legitimate power users more headroom before any flag fires at all. This one is the tuning move that would have been insufficient alone. With the classification fix in place, it becomes a sensible backstop instead of a band-aid on the wrong problem.\n\n## old defensive code does this quietly\n\nThe IPFilter system is old. It was probably right when it was written. The app's navigation pattern at the time was simpler, slower, less redirect-heavy. 
The heuristic made sense against the traffic model it was built for.\n\nWhat happens to a defensive rule over time, when nobody revisits it, is that the app evolves and the rule does not. Navigation patterns change. Admin workflows get more complex. The number of redirects per user action grows as the control panel grows. The filter keeps counting the same way. One day a ticket comes in because an admin got blocked, and it turns out the routing has been teaching the filter the wrong lesson for years.\n\nI do not know when the tipping point happened. ITEM-6795 is the first customer-visible case I have, but I have no signal it was the first time the filter did this. Power users who hit the problem and did not report it would have looked like unexplained session issues, not an IPFilter false positive. The filter was running, logging, flagging, and mutating account state, and the only feedback path was a support ticket.\n\nThere was no alert on unexpected `speedFlag` set rates. No dashboard surfacing blocked admins. Nothing that would have made a false-positive pattern visible before a customer called it in.\n\n## what i am sitting with\n\nAfter the fix shipped, I started asking what it would take to catch this class of problem before a user reports it. The thing I want is not complicated in theory. A query that joins IPFilter flags against request-pattern logs and surfaces cases where flagged accounts show a profile consistent with navigation rather than probing. A daily sweep would have caught ITEM-6795 weeks earlier.\n\nWhat I do not have yet is clarity on how many other defensive rules in HTO are running on stale assumptions. The IPFilter is one system. There are others. Each one was correct when written. Each one is sitting on top of a navigation model that has been evolving since 2001.\n\nThat audit is not done.\n\nIf your abuse filter cannot tell the difference between an attacker and your own redirect logic, it is not security. It is random account damage.",
      "date_published": "2026-04-10T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "abuse-mitigation",
        "false-positives",
        "asp-classic",
        "hto",
        "ipfilter",
        "solo-founder"
      ]
    },
    {
      "id": "https://dxdev.com/blog/the-32-day-revolution/",
      "url": "https://dxdev.com/blog/the-32-day-revolution/",
      "title": "The 32-Day Revolution",
      "summary": "Thirty-two days, ten Discord bots, thirty-nine Python tools. Nobody planned this multi-agent architecture. Every layer was forced by a specific bottleneck. The story of how the system you need at scale doesn't exist until you've earned it.",
      "content_text": "On March 5, 2026, the entire system was one file.\n\n`hive_cli.py`. 240 lines. A Python script that sent a message to a Discord channel and waited for a reply. It was, by any serious measure, a chatbot wrapper with ambitions.\n\nThirty-two days later, it was something different. Ten specialized Discord bots. Thirty-nine Python tools organized into eight domain directories. A React dashboard. A dispatch pipeline with cost-aware model routing. A supervisor loop. An autonomy budget with approval gates. A knowledge layer that indexed every session. More than 200 commits.\n\nNobody planned any of this. It grew from a specific bottleneck, then another, then another. That's the story worth telling: not the architecture I designed, but the path that forced the architecture on me.\n\n---\n\nThe original premise was simple enough. Run Claude in a loop, point it at Discord, give it context about my work. Let it handle async tasks while I slept. The first commit was a Thursday night proof of concept. It worked in the narrow sense that the agent replied to messages and logged output to a file.\n\nThe bottleneck appeared almost immediately: Claude doesn't know what it did yesterday. Every session started from scratch. The system had no memory, no accumulating context, no way to build on prior work. I'd dispatch a task, get a result, and then the agent forgot it happened. The work was real but the system couldn't compound.\n\nSo on March 15, ten days in, I added a memory layer. A `memory/` directory, JSONL session logs, a state file. The commit message: *\"Add Phase 2 HTTP server, externalize agent prompts, and scaffold memory system.\"* Still one agent. Now with a notepad.\n\nThat lasted three days before the next bottleneck emerged: one agent reviewing its own work is mostly confirmation bias. You need someone who doesn't share your assumptions.\n\n---\n\nThe March 22 commit is the inflection point. 
The full message:\n\n> *\"Agent review: 37/38 improvements. Monolith split, 318 new tests, ops hardening. Full agent review by all 6 roles (Sage, Kai, Nova, Vera, Atlas, Cortex) identified 130+ issues, deduplicated to 38 actionable tasks. 37 completed.\"*\n\nI had bootstrapped six specialized agents: Planner, Architect, Builder, Ops, Analyst, Knowledge. Then I pointed them at the codebase with instructions to find problems. Not to be polite. To find problems.\n\nThey found 130.\n\nThe most damning: `main.py` had grown to 6,130 lines. One file. One endpoint. No separation of concerns. If you touched the session logging code, you could accidentally break the dispatch pipeline. The agents named it clearly: this is a bottleneck that will compound every future change. It needed to be split.\n\nBy end of day, `main.py` was 239 lines. Twelve router modules. A schemas file. The agents had reviewed the monolith into pieces.\n\nHere's what made this different from a normal code review. The agents had operating context that no static reviewer could have. They knew which API endpoints were called most often, and were therefore most fragile. They knew which shared state was implicitly coupled. They knew which error paths had no logging. A human reviewer would have found the obvious structural problems. The agents found the operational ones: the things that only break at runtime, under real load, in the middle of the night.\n\nThirty-seven of thirty-eight tasks completed in a single session. The one that didn't was a design question that needed my input.\n\nThis was the moment the project changed. The agents weren't just executing tasks anymore. They were reviewing the system they lived inside. 
The architecture stopped being something I designed and started being something we negotiated, with \"we\" meaning me and the agents I'd built to challenge my assumptions.\n\n---\n\nThree days later, the product got a name.\n\nThe commit: *\"Rebrand: Hive → OpDek (product), DxDev (company), dxdev.opdek.com (domain).\"* It sounds minor in a git log. It wasn't. A rebrand forces you to decide what you're actually building. Calling it OpDek, an operations engine rather than a coding assistant, clarified the product's scope. The agents weren't there to write code. They were there to run operations.\n\nNaming clarifies scope. \"Hive\" implied a swarm of undifferentiated agents. \"OpDek\" implied an operations desk, something you sit at to manage work. That framing immediately resolved a product question I'd been circling: is this a tool for building agents, or a tool for running them? The answer is running them. You don't build agents with OpDek. You operate them.\n\nFour commits to stabilize the rebrand, then done. 
Every subsequent architectural decision became easier after that framing was clear.\n\n---\n\nBy day 32, the count was:\n\n- **10 Discord bots** with distinct roles: CEO, CTO, CFO, CMO, COO, CKO, Planner, Analyst, Builder, Ops\n- **39 Python tools** in 8 domain directories: session management, decomposition, dispatch, agents, supervision, cost, knowledge, reporting\n- **A React dashboard** with portfolio drill-down: business > milestone > task > session\n- **A dispatch pipeline** with hard blocks (capacity, backlog, success rate) and soft warnings\n- **A supervisor loop** running every 5 minutes, health checks every hour, daily sweeps at 3am\n- **A cost layer** routing work to Haiku, Sonnet, or Opus based on task type and autonomy budget\n- **321 plans per month** autonomously executed, tracked, and summarized\n\nThe 240-line CLI had become the operations layer for a small AI-native business.\n\n---\n\nWhat does it feel like from the inside when an architecture evolves like this? Mostly it feels like firefighting. You don't sit down and design a 10-agent org. You add a second agent because the first one can't review itself. You add a supervisor because two agents disagreed and nobody was resolving it. You add cost routing because you notice 90% of tasks don't need Opus but you're paying Opus prices. Each addition is a direct response to a real pain.\n\nThe architecture that exists at day 32 is correct for the problems encountered in days 1 through 31. It's not necessarily correct for days 33 through 100. That's the part multi-agent architecture guides tend to skip: the system you need at scale doesn't exist until you've accumulated the specific failures that require it.\n\nYou can't design it in advance. You can only earn it.\n\n---\n\n## Architecture\n\nThe system that exists at day 32 has four layers. Each was added in response to a specific failure mode.\n\n**Interface layer.** Dashboard (React + FastAPI). 
Added because Discord-only meant mobile-only, and some operations need a real screen. Portfolio view: business > milestone > task > session.\n\n**Agent layer.** 10 Discord bots with role specialization. The CTO catches architectural problems the Builder misses. The CFO catches cost problems the CTO doesn't look for. One agent can't review itself. Ten agents with different priorities can.\n\n**Execution layer.** Supervisor loop + dispatch pipeline. Added because two agents can do the same work twice if there's no coordinator. The supervisor runs every 5 minutes. Priority-ordered dispatch with dependency resolution. Stall detection: warn at 15 minutes, escalate at 30.\n\n**Data layer.** SQLite (later PostgreSQL), JSONL logs, JSON dual-write. A system that can't replay its own history can't learn. Every session is logged. Every learning is tagged. Agents can query what previous sessions found.\n\n## Timeline\n\n```mermaid\ngraph TD\n    A[Day 1: hive_cli.py<br/>240 lines, 1 agent] --> B[Day 10: Memory layer<br/>JSONL logs, state file]\n    B --> C[Day 16: Dashboard<br/>HTTP server, agent registry]\n    C --> D[Day 18: Agent review<br/>6 roles, 130 issues found]\n    D --> E[Day 21: Rebrand<br/>Hive → OpDek]\n    E --> F[Day 32: Multi-agent org<br/>10 bots, 39 tools, dispatch pipeline]\n\n    style A fill:#666,stroke:#333,color:#fff\n    style F fill:#2ecc71,stroke:#333,color:#fff\n```\n\n## Problems\n\nThe story above is the narrative. Here's the same arc as a problem log: each bottleneck, what it cost, and how it was resolved.\n\n| Bottleneck | Impact | Resolution |\n|---|---|---|\n| **No memory** | Every task started from zero; work couldn't compound | JSONL session logs + `memory/` directory. Immutable, append-only. Future sessions read prior context. 
|\n| **Single-reviewer blind spot** | One agent reviewing itself found nothing wrong | Six specialized agents running adversarial review, found 130 issues |\n| **Monolith drag** | `main.py` at 6,130 lines; every change touched everything | Agent-identified split: 6,130 → 239 lines across 12 router modules. Zero regressions. |\n| **Cost blindness** | All work routed to Opus regardless of complexity | Tiered model selection: Haiku for batch, Sonnet for agents, Opus for strategic. ~20x cost difference. |\n| **No governance** | Agents could dispatch freely with no approval thresholds | Autonomy budget: per-agent threshold. Above → approval queue. Below → auto-execute. |\n| **Identity ambiguity** | \"Hive\" / \"OpDek\" / \"DxDev\" conflated | Rebrand commit day 21. Clean separation: product / company / domain. Four commits to stabilize. |\n\n---\n\n*Part 4 of [The Timeline](/blog/series/the-timeline/), the true story of building an AI operations engine, backed by git history and real incidents.*\n\n*Next: [The Zombie That Blocked Everything](/blog/zombie-blocked-everything)*",
      "date_published": "2026-04-10T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "multi-agent",
        "architecture",
        "evolution",
        "dispatch",
        "founder-ops",
        "lessons-learned"
      ]
    },
    {
      "id": "https://dxdev.com/blog/zombie-blocked-everything/",
      "url": "https://dxdev.com/blog/zombie-blocked-everything/",
      "title": "The Zombie That Blocked Everything",
      "summary": "A single stuck proposal generated 17 cascading alerts and created a meta-deadlock. Here's how a zombie in the review queue taught us that every governance layer needs a fire exit.",
      "content_text": "# The Zombie That Blocked Everything\n\nThere's a class of failure mode in automated systems that I've started calling the zombie problem.\n\nA zombie isn't an error. It doesn't throw an exception. It doesn't fail loudly. It just... persists. Status: `in_review`. Technically alive. Functionally dead. And quietly rotting in the queue while everything downstream piles up behind it.\n\nWe ran into one. It took us a while to realize the zombie was the whole problem.\n\nWhen we moved to a multi-agent architecture, we introduced a proposal system. The idea was simple and good: any agent that noticed a systemic improvement could submit a formal proposal, a structured recommendation with a problem, solution, impact, and cost estimate. A designated reviewer (in our case, the CTO role) would evaluate it and either promote, reject, or defer it.\n\nSix agents could submit proposals. One agent reviewed them. We didn't think much about that ratio at the time.\n\nOne proposal, P-1, was flagged for CTO review. It concerned the proposal system itself. P-1 went into review. And stayed there. Not rejected. Not deferred. Not approved. Just: `in_review`. A zombie had entered the pipeline.\n\nOne morning the watchdog fired 17 alerts in a single sweep. Seventeen alerts. One root cause. The zombie wasn't creating 17 problems. It was creating 1 problem that manifested in 17 places. Then someone submitted a proposal to fix the review bottleneck. It went into the CTO review queue. Behind P-1. We had a meta-deadlock: the proposal to fix the proposal system was stuck in the same broken queue it was trying to fix.\n\nThe fix took less than an afternoon. The learning is permanent: governance structures need their own escape hatches.\n\n---\n\n## Goal\n\nFix the proposal review bottleneck that was generating 17 cascading alerts and preventing all system improvements from landing. 
Motivation: the watchdog was drowning real signals in noise from a single stuck proposal.\n\n## Where We Are\n\n```mermaid\ngraph LR\n    A[6 Agents] -->|submit| B[Proposal Queue]\n    B -->|route| C[CTO Review]\n    C -->|approve/reject| D[Execution]\n    C -.->|stuck here| E[Zombie P-1]\n    E -.->|blocks| B\n    F[Watchdog] -->|monitors| B\n    F -->|monitors| C\n    F -->|17 alerts| G[Alert Storm]\n\n    style B fill:#ff6b6b,stroke:#333\n    style C fill:#ff6b6b,stroke:#333\n    style E fill:#666,stroke:#333,color:#fff\n```\n\n*This post focuses on the proposal queue and review pipeline (red). The zombie (grey) sat in CTO review and blocked everything downstream.*\n\n## Problems Encountered\n\n- **No review timeout**: proposals could sit `in_review` indefinitely with no escalation\n- **Single reviewer bottleneck**: 6:1 agent-to-reviewer ratio with no overflow path\n- **Meta-deadlock**: a proposal to fix proposals queued behind the broken proposal\n- **Alert cascade**: 17 alerts from 1 root cause, masking the real problem\n- **No bypass for structural failures**: the governance layer had no fire exit\n\n## Resolution\n\nAdded four mechanisms:\n\n**1. Meta-proposal detection.** Proposals about the proposal system itself are now auto-rejected at the creation gate with guidance to fix the system directly. Uses keyword matching against title + problem text.\n\n```python\n_META_KEYWORDS = re.compile(\n    r\"\\b(proposal queue|proposal system|approval queue|approval workflow|\"\n    r\"approval gate|meta-proposal|too many proposals)\\b\",\n    re.IGNORECASE,\n)\n```\n\n**2. CEO-level override.** When the CEO agent reviews a proposal, it auto-converts to `status='approved'` and exits the queue immediately. No more sitting in pipeline limbo.\n\n```python\n# CEO approved: mark as approved so it exits the review queue\nif next_level == \"ceo_reviewed\":\n    db.approve_proposal(proposal_id)\n```\n\n**3. 
Auto-approve gate for safe proposals.** Low-risk tactical proposals skip the full review pipeline entirely. Criteria: type is `tactical`, confidence >= 60%, low cost, no protected files, no external actions (deploy, publish, etc.).\n\n**4. Alert deduplication.** Watchdog now deduplicates alerts by `(task_id + title)` key so cascading failures from a single root cause don't generate 17 separate alerts. TTL-based expiry (2h default) auto-resolves stale alerts.\n\nTo test zombie detection locally:\n\n```bash\npython3 tools/monitor/watchdog.py --dry-run --verbose\n```\n\n## Dependencies\n\n- Python 3.10+\n- SQLite with the `proposals` and `proposal_reviews` tables\n- Watchdog cron (runs every 5 min via `memory/crons.json`)\n- CEO strategic loop (`tools/dispatch/ceo_strategic_loop.py`), runs every 2h\n- Agent role config in `config/agents.yaml` for hierarchical review routing\n\n## Deep Dive\n\n### The Proposal Pipeline\n\nProposals flow through a hierarchical review system introduced in commit `72a8143`:\n\n```\nsubmitted → dept_reviewed → ceo_reviewed → board_presented\n```\n\nThe creation gate runs three checks before a proposal enters the queue:\n1. **Dedup**: 65% similarity check against existing proposals\n2. **Throttle**: max 5 proposals per agent per hour\n3. 
**Meta-detection**: rejects proposals about the proposal system itself\n\nThe schema stores review state at two levels:\n\n```sql\n-- Proposal status tracks lifecycle\nstatus TEXT DEFAULT 'pending'  -- draft | pending | promoted | approved | rejected\n\n-- Review level tracks pipeline position\nreview_level TEXT DEFAULT 'submitted'  -- submitted | dept_reviewed | ceo_reviewed | board_presented\n```\n\n### Health Check Architecture\n\nThe watchdog (`tools/monitor/watchdog.py`) runs 6 categories of health checks:\n- **Zombie sessions**: `status='running'` but `completed_at` is set → auto-fix to complete\n- **Ghost sessions**: no heartbeat >30 min → mark abandoned (skips interactive sessions)\n- **Stuck tasks**: running >1h without heartbeat → alert at 1h, auto-fail at 6h\n- **Stale dispatches**: stuck in `sent` >5 min → auto-requeue (max 3 retries)\n- **Cron health**: critical crons stale >3 min → auto-restart cron runner\n- **Post-completion mutations**: workstreams added after session end → corruption warning\n\nThe dispatch health layer (documented in ADR-0006) splits checks into hard blocks and soft warnings. Hard blocks always abort: capacity exceeded, backlog >3, success rate <70%. Soft warnings log and proceed. P0/P1 dispatches bypass all soft warnings.\n\n### The Alert Cascade Explained\n\nThe 17 alerts traced back to one zombie:\n\n1. Proposal queue depth → over threshold (zombie blocking queue)\n2. Proposal throughput → near zero (nothing clearing review)\n3. CTO review latency → extreme (zombie sitting in review)\n4. Agent proposal backlog → 6 agents, 0 resolutions\n5. Dispatch health degraded → downstream of proposal backlog\n6. System improvement rate → stalled\n7-17. Cascade through dependent health checks\n\nAll shared the same or linked `task_id` in `memory/alerts.json`, which is how we traced them to one root cause. 
The dedup fix ensures this cascade now surfaces as 1 grouped alert.\n\n### Key Commits\n\n| Commit | Date | What |\n|--------|------|------|\n| `72a8143` | Feb 20 | Hierarchical proposal routing introduced |\n| `ebf7af1` | Mar 22 | Watchdog: auto-fix zombie/ghost sessions |\n| `42bc65e` | Mar 22 | Fix duplicate review dispatches |\n| `4540c6f` | Apr 8 | Auto-approve gate for safe tactical proposals |\n| `a6aac9d` | Apr 9 | CEO-approved proposals auto-resolve |\n\n### The Pattern\n\nIf you're building a system where agents can propose changes through a review queue, ask yourself before you ship:\n\n**What happens when the reviewer becomes the bottleneck?**\n\nSingle-reviewer queues are a bus factor problem. In human systems, we manage this with vacations, backup approvers, and escalation paths. In automated systems, it's easy to skip this because the reviewer is \"always available.\" But software can be busy, defer, or stall. When it does, the queue behind it is a graveyard of good ideas, all technically `in_review`, all zombies.\n\nEvery gate needs a fire exit. When you introduce a governance layer into an autonomous system, you're adding overhead in exchange for control. The problem is when the control mechanism itself becomes uncontrollable. The system wasn't broken. It was doing exactly what it was designed to do. The design was the problem.\n\n## Visual Summary\n\n*Infographic coming soon.*\n\n---\n\n*Part 5 of [The Timeline](/series/timeline). The true story of building an AI operations engine, backed by git history and real incidents.*\n\n*Previous: [The Alert Storm That Wasn't](/blog/alert-storm) | Next: TBD*",
      "date_published": "2026-04-09T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "multi-agent",
        "proposal-systems",
        "bottlenecks",
        "operations",
        "lessons-learned",
        "governance"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-04-08_overview-page-database-ritual/",
      "url": "https://dxdev.com/blog/2026-04-08_overview-page-database-ritual/",
      "title": "The page was performing a ceremony, not a job",
      "summary": "HTO's marketing overview page had over 80 blocking queries on every load. The fix wasn't query tuning, it was challenging the page contract.",
      "content_text": "The marketing overview page in HTO's admin had over 80 blocking SQL queries on a single load. I opened ITEM-6767 expecting to find one bad query to tune.\n\nThat instinct was wrong: the page had no single slow query, it had a blocking posture that nobody had named as a problem.\n\n## What the page was actually doing\n\nHTO is a 25-year-old ASP Classic application. The marketing overview exists for internal ops, a dashboard of product performance, session counts, and summary metrics across the customer base. It looked normal, and had always looked normal. It was just slow.\n\nWhen I traced the execution path, the page was built on an implicit contract: the server would finish all its work before the browser got anything. Every panel queued behind the one before it. SQL ran serially, one recordset closed before the next one opened. The ribbon section, the product table, the session counts, the PPC numbers, all of it waited in line. The page didn't have a slow query; it had a posture. Blocking was the posture.\n\nThe query count didn't happen by design; it accumulated. One quarter someone adds a product breakdown. The next year someone adds session stats. Nobody asks how the page performs with each addition, because the page still loads. Slowly, but it loads, and everyone gets used to waiting. The slowness becomes its personality.\n\n## The wrong diagnosis\n\nThe first instinct on a slow page is to find the offending query: a missing index, a cross join, a subquery that should be a join. Fix the worst one, rerun, measure. I've followed this instinct on other pages in this codebase, and it's often right.\n\nIt would have been the wrong move here. The slowness wasn't concentrated in one place; it was distributed across a contract nobody had challenged. 
Even if I'd cut the worst query in half, I'd have addressed maybe 1% of the total load, because the architecture itself was the problem.\n\n## Ribbon first, everything else later\n\nThe real fix was to throw out the page contract.\n\nI kept only the ribbon data on the initial server response. The ribbon is what renders in the first visible moment: a summary strip across the top. Everything else, the product panels, the session counts, the auxiliary metrics, gets loaded afterward via parallel ajax calls to `MarketingProduct.asp`, which I rewired to return JSON instead of rendered HTML.\n\nOn the client side, I rebuilt `Marketing.js` so each section renders independently when its data arrives. There's no global wait: the ribbon appears immediately, and the rest fills in as responses come back, each section on its own schedule. If one section is slow, it does not hold any other section hostage.\n\nThe page now does less work on the initial load than it used to do for any single section.\n\n## Deletion with receipts\n\nPerformance work on legacy code is mostly deletion. The question to ask is not \"how do I make this faster\" but \"is this work still load-bearing.\"\n\nTwo things were not load-bearing.\n\nFirst: the PPC computation. It was server-side work that ran on every load and fed data into a section that, when I actually looked, was not wired up to display anything. The computation was real, but the consumer had been gone for who knows how long. I cut it.\n\nSecond: the `HTOsessions` queries. The page was running them twice: once in the ribbon pass and once in the main body. Same query, same parameters, same recordset, two trips to SQL. The second one was a copy that had drifted out of sync with where the data was actually used. I kept one and removed the other.\n\nNeither of these was the bottleneck. 
But dead work is still work, and in a page with over 80 queries, dead work is how you get to that number in the first place.\n\nOne question I didn't fully answer: how many other admin pages in HTO are in the same shape. The session reports, billing summaries, and reporting views all load. I haven't traced their query counts, and the marketing overview felt normal before I traced it too. The only way to tell the difference is to count.\n\n## The ribbon styling fix was the useful proof\n\nAfter the architecture change went in, a small thing happened: I needed to fix a styling issue on the ribbon. A spacing adjustment, something cosmetic.\n\nOn the old page, a change to the ribbon would have required touching the same server-side template that controlled everything else. Any edit was a risk to the whole load path. I'd have read the surrounding code more carefully than the fix warranted, just to make sure I wasn't breaking something downstream.\n\nOn the new page, the ribbon is isolated. I opened the relevant section, made the adjustment, confirmed it in the browser, and the rest of the page was not a concern.\n\nThe proof wasn't a benchmark. It was a ribbon spacing fix that opened `Marketing.js` and closed without `MarketingProduct.asp` ever mattering.",
      "date_published": "2026-04-08T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "asp-classic",
        "performance",
        "lazy-loading",
        "ajax",
        "legacy-code",
        "solo-founder"
      ]
    },
    {
      "id": "https://dxdev.com/blog/multi-agent-file-conflicts/",
      "url": "https://dxdev.com/blog/multi-agent-file-conflicts/",
      "title": "The File Conflict Problem Nobody Warns You About in Multi-Agent Systems",
      "summary": "When you run multiple AI agents in the same codebase, they will eventually clobber each other's work. Here's what we discovered and how we solved it with git worktrees.",
      "content_text": "When you go from one AI agent to three running in parallel, something breaks that no framework documentation mentions: they all share the same filesystem.\n\nYour task orchestrator carefully assigns different tasks to different agents. Agent A works on the API. Agent B writes tests. Agent C updates documentation. Clean separation, right?\n\nWrong. Agent A modifies `config.yaml` as a side effect. Agent C also touches `config.yaml` because the docs reference configuration. Agent B's test touches a shared fixture file. Nobody coordinated the file-level access because the orchestration layer only thinks about task-level assignment.\n\n## What actually happens\n\nWe discovered this running OpDek, an AI operations engine that dispatches work to specialized agents via Claude CLI. With `max_concurrent` set to 3, here's what the system looked like:\n\n- **Task claiming**: atomic SQLite UPDATE prevents double-dispatch. Works perfectly.\n- **Supervisor lock**: prevents two scheduling cycles from racing. Works perfectly.\n- **File isolation**: nothing. Zero. Every agent runs `claude -p` in the same working directory.\n\nThe system worked by accident. Throughput was low enough that concurrent file edits rarely overlapped. But \"rarely\" is not \"never,\" and as we increased parallelism, the probability of collision approached 1.\n\n## The gap between task orchestration and file safety\n\nThis is a pattern we see across every multi-agent framework:\n\n- **CrewAI**: agents share a workspace. No file locking.\n- **AutoGen**: agents communicate via messages but share filesystem access.\n- **LangGraph**: workflow orchestration, not filesystem isolation.\n\nTask-level coordination (who works on what) is a solved problem. File-level coordination (who can write to which files at the same time) is not even acknowledged as a problem in most architectures.\n\nThe database layer is fine. SQLite WAL mode handles concurrent reads. 
Atomic claim mechanisms prevent double-dispatch. But the moment agents start editing source files, JSON configs, or any shared resource on the filesystem, you're back to the 1970s concurrency problem with none of the safeguards.\n\n## The fix: git worktrees as agent sandboxes\n\nWe solved this by giving each agent task its own git worktree:\n\n1. **Before execution**: `git worktree add --detach memory/worktrees/wt_{task_id}` creates an isolated copy of the repo at HEAD\n2. **During execution**: the agent's CLI runs with `cwd=worktree_path`. It can read and write freely without affecting other agents or the main repo\n3. **After execution**: check `git status` in the worktree. If files changed, commit on an `agent/{task_id}` branch and merge back to main\n4. **On conflict**: preserve the branch, report the conflict. A supervisor (human or AI) resolves it\n5. **Cleanup**: remove the worktree regardless of outcome\n\nThis is essentially the same pattern that CI systems use when running parallel test suites. Each runner gets its own checkout. The merge back is the serialization point.\n\n### Graceful degradation\n\nIf worktree creation fails (disk full, git issue), the agent falls back to the shared directory. No task is blocked by the isolation layer failing. This is critical: the safety mechanism should never be the thing that breaks your system.\n\n### What this means for throughput\n\nWith worktree isolation, you can safely increase `max_concurrent` because agents can no longer interfere with each other's file changes. The only serialization point is the merge, which is fast and atomic. We went from `max_concurrent: 2` (with crossed fingers) to `max_concurrent: 3` (with actual safety), and there's no architectural reason we can't go higher.\n\n## Takeaways\n\nIf you're building any system that runs multiple AI agents against the same codebase:\n\n1. 
**Task-level orchestration is not file-level safety.** Your dispatcher knowing which agent has which task does not prevent them from editing the same files.\n\n2. **\"It works because throughput is low\" is not a design.** It's a coincidence that will eventually fail.\n\n3. **Git worktrees are the right primitive.** They're lightweight (shared object store), well-tested, and provide real filesystem isolation with a clean merge path.\n\n4. **Build the isolation layer to be optional.** If it fails, fall back. If it succeeds, use it. Never let the safety system be a single point of failure.\n\nThis is the kind of problem you only discover when you actually run multi-agent systems in production. Demos with one agent don't surface it. Benchmarks don't surface it. Only sustained parallel execution against real files does.\n\n---\n\n*This post is part of the build log for OpDek, an AI operations engine. Follow along at [dxdev.com/blog](/blog).*",
      "date_published": "2026-04-08T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "multi-agent",
        "git-worktrees",
        "concurrency",
        "file-isolation"
      ]
    },
    {
      "id": "https://dxdev.com/blog/making-tests-finish/",
      "url": "https://dxdev.com/blog/making-tests-finish/",
      "title": "Making Tests Finish: CI Reliability in Agentic Systems",
      "summary": "When the test suite stopped finishing, not flaking but hanging, CI became a broken feedback loop. A layered defense pattern for async, network, and distributed state in agentic systems.",
      "content_text": "A few days ago, our test suite stopped finishing. It wasn't flaking, it was *hanging*. GitHub Actions runners would time out, leaving us with no signal about whether the build was healthy. For a system managing autonomous agents and dispatch pipelines, a broken CI is a broken feedback loop.\n\n## The Hangover Problem\n\nIn a system orchestrating multiple agents, the test suite mirrors that complexity. We run tests for the dispatch pipeline, the resolver, the agent supervisor, and the Jira sync engine. Many of these components make network calls or manage async state. In CI, with no running server and no actual network, those calls become traps.\n\nThe symptoms were clear in the logs: tests would start, then silence. No failure, no timeout message, just the runner counting down to forced shutdown. When that happens, you get nothing. No diagnostics, no idea which test hung, sometimes no ability to reproduce locally.\n\n## Diagnosis: The Three Culprits\n\nOver a day of commits, three patterns emerged:\n\n**Fixture imports that block.** The tagger service imports `opdek_config`, which in CI isn't available. The import doesn't fail cleanly. Instead it hangs during test discovery. Solution: make that import optional, with a guard for CI environments.\n\n**Unguarded network calls.** The supervisor tests call `urlopen()` without mocking. The run_cycle engine calls `check_stalls()`, which hits the network. In isolation, each one waits indefinitely. Solution: mock them globally in the fixture setup.\n\n**No timeout backstop.** Even with mocks in place, a hanging test had no hard limit. A single test could consume the entire 30-minute runner timeout, blocking any feedback. Solution: add a 2-second global socket timeout and skip integration tests that require a running server.\n\n## The Fix: Layered Defenses\n\nRather than one big refactor, we applied defenses in layers:\n\n1. 
**Global socket timeout** (`conftest.py`): Set a 2-second timeout on all socket operations. This stops indefinite waits in a single line of configuration.\n\n2. **Fixture guards** (`tagger.py`): Wrap the `opdek_config` import in a try/except. If it fails, the test still runs. It just doesn't validate against the actual config.\n\n3. **Network mocks** (`supervisor_test.py`, `run_cycle_test.py`): Mock `urlopen()` and `check_stalls()` at the fixture level, before any test runs. Every subtest inherits the mocks.\n\n4. **Selective skipping** (`conftest.py`): Skip tests that require a running server or are known to deadlock. This is pragmatic: a skipped test is better than a hung one, and we skip with a reason: `@pytest.mark.skip(reason=\"requires running API server\")`.\n\n5. **Syntax verification** (`conftest.py`): Guard DB loading and agent YAML parsing in try/except blocks. If the database is locked or config malformed, the test fixture fails fast with an error message, not a hang.\n\nThese aren't elegant, but they're honest. They acknowledge that CI is a hostile environment, with no running server, the network off, and fixtures loaded in parallel, and that hanging tests are worse than no tests.\n\n## Outcome: Monitoring, Not Just Fixes\n\nOnce tests reliably finish, we added the final piece: visibility. A new GitHub Actions monitor alerts whenever a test run fails or times out. This gives us immediate signal, so there's no more wondering if the build is healthy.\n\nThe commit that wired this together also consolidated recent work: Jira integration, multi-tenant config, and autonomous dispatch. All of that now runs through a test suite that actually finishes.\n\n## The Pattern: Defensive Test Architecture\n\nHere's what works in systems with async, network, and distributed state:\n\n- **Timeouts are not optional.** They're not just for long-running tests. They're your backstop against silent failure. 
A 2-second global timeout catches hangs that would otherwise balloon to runner shutdown.\n- **Mock early, mock broadly.** Don't wait for a test to fail on the network call. Mock network operations at fixture setup, before test discovery. This makes CI fast and deterministic.\n- **Guard imports, not just code.** In complex systems, slow imports or missing config can hang test discovery itself. Defensive imports with fallbacks make this explicit.\n- **Skip with reason, not silence.** A skipped test is better than a hung one, but only if future readers know why it was skipped. Use `@pytest.mark.skip()` with a clear reason: \"requires running API server\", not just \"disable this for now\".\n- **Monitor the monitor.** Adding tests is only half the job. Wire up alerts for when the test run itself fails. That feedback loop is what catches the next hang before it ships.\n\nThe goal isn't perfection. It's a test suite that *finishes*, so we can actually see whether the system works.\n\n---\n\n*Next in the series: how we moved config out of code and into YAML, making the system multi-tenant.*",
      "date_published": "2026-04-07T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "ci-reliability",
        "testing",
        "async-debugging",
        "network-mocking",
        "timeouts"
      ]
    },
    {
      "id": "https://dxdev.com/blog/when-external-integrations-collide/",
      "url": "https://dxdev.com/blog/when-external-integrations-collide/",
      "title": "When External Integrations Collide with Test Suites",
      "summary": "Wiring Jira into OpDek hung the test suite. Not a failure, just a hang. Integrations are infrastructure stress tests in disguise. Here's what the cascade taught us about implicit assumptions.",
      "content_text": "## The Integration Push\n\nApril 7 was the day we wired up Jira integration into OpDek, the autonomous operations system that powers our dispatch pipeline. On paper, this was straightforward: sync plans and workstreams bidirectionally with Jira, so agents can read and update tickets without context-switching into a browser.\n\nIt made sense. When your job is automating operations decisions, the last thing you want is agents creating orphaned work that doesn't exist in the source-of-truth. Jira integration wasn't a feature request. It was infrastructure.\n\nWe merged the Jira integration code. Tests ran locally. Then the CI pipeline hung.\n\n## The Hangs\n\nNot a failure. Not a timeout with a clear error message. A hang: the test suite sitting there, consuming CPU, waiting for something that would never come.\n\nThe root causes stacked on each other:\n\n- **Network calls in tests.** The Jira integration touched real APIs. Tests weren't mocking them. Socket calls were blocking indefinitely.\n- **Import-time side effects.** Code that tried to load the agent configuration (`opdek_config`) during test discovery. When the config wasn't available, imports would hang.\n- **Recursive mock dependencies.** The supervisor tests had complex fixture chains that tried to call network-dependent code, which triggered real network requests during fixture setup.\n- **No timeout floor.** The CI runner had no global timeout. Tests could hang forever until the runner itself was killed.\n\nEach one wasn't lethal by itself. Together, they created a cascade where you couldn't even get to the failing test. The suite would hang on the way there.\n\n## The Systematic Fix\n\nWe couldn't just add a timeout and call it done. That's treating the symptom. We needed the tests to actually work.\n\nHere's what we did, in order:\n\n**Add a global timeout floor** (2 seconds for socket operations). 
This was the safety net, so anything that looked like a network call would fail fast instead of hanging forever.\n\n**Guard import-time config loading.** We made `opdek_config` optional at import time. Tests could import the tagger module without the full system being initialized. The config loads only if explicitly requested.\n\n**Skip the integration tests that need a running server.** The `run_cycle` integration tests actually require the API server to be up. CI doesn't have that. We skip them and rely on manual integration testing instead.\n\n**Mock network calls comprehensively.** For tests that do need to run in CI, we added explicit mocks for every network operation: Jira API calls, URL opens, stalls checks. No surprises, no hanging.\n\n**Lock down fixture scope.** Each mock was scoped to the test that needed it, not applied globally. This prevents a mock intended for one test from masking real issues in another.\n\nThe final change: **add a pytest fixture timeout**. A test that hangs now times out in 30 seconds instead of hanging the whole runner.\n\n## The Outcome\n\nCI now passes. Tests run. Jira sync works. We can create and update tasks through agents, then see them reflected in Jira without manual intervention.\n\nBut the real win was what this surfaced: our agent system had a lot of implicit assumptions baked into imports and fixture chains. Things that worked fine when you ran tests locally (with the full environment) broke hard in CI (with a minimal environment). The integration work didn't introduce those problems. It exposed them.\n\n## The Pattern: Integration as Infrastructure Stress Test\n\nWhen you add an integration, especially one that touches external systems, you're not just adding a feature. You're adding a dependency surface. Every place that code runs (local machine, CI, production) now has to handle that integration correctly, or the whole system shudders.\n\nFor autonomous agent systems specifically, this matters more. 
If your agents orchestrate work through Jira, and Jira integration is flaky, your entire dispatch pipeline fails silently. A hanging test in CI today becomes an agent that hangs in production tomorrow.\n\nThe fix wasn't clever. It was:\n\n1. **Make dependencies explicit.** Guard imports that require the full system. Don't hide dependencies in fixture setup.\n2. **Scope mocks carefully.** A mock is a contract. Be specific about where it applies.\n3. **Test multiple environments.** Local ≠ CI ≠ production. A test that passes locally can still hang in CI.\n4. **Use timeouts as a floor, not a ceiling.** A global timeout prevents hangs. Individual test timeouts catch slow tests that should fail, not hang.\n\nWe also added a strategic goals widget while we were in there, and wired up CI monitoring for GitHub Actions failures, but those were secondary. The real work was making the integration reliable.\n\nThe lesson: **integrations are infrastructure tests in disguise.** They force you to be explicit about assumptions, manage dependencies well, and think about failure modes that never surface in isolation. Build them carefully, and you'll learn a lot about your system.",
      "date_published": "2026-04-07T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "jira-integration",
        "ci-debugging",
        "agent-systems",
        "test-infrastructure"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-04-06_agent-owns-the-browser/",
      "url": "https://dxdev.com/blog/2026-04-06_agent-owns-the-browser/",
      "title": "The agent owns the browser, or you do",
      "summary": "Every manual setup step in an agent workflow is a hidden human-in-the-loop. The browser was mine. Then it wasn't. The ownership question generalized.",
      "content_text": "On April 6 I sat down to run ten IP review tickets through an agent loop, and for the third time that week the agent stopped to ask me to launch Chrome.\n\nThe fix that morning was a tooling swap, and I wrote that story up separately. This post is about what I noticed after the fix landed, which is the more useful thing. The browser had not been Claude's. It had been mine, on loan, with a setup ritual nobody had written down. Once I saw that, I started seeing it everywhere.\n\n## the ownership question I had not been asking\n\nFor every tool an agent uses, somebody owns the lifecycle. Spawning, configuring, cleaning up, restarting after a crash. In a normal dev loop that owner is me, and I don't notice, because I have hands on the keyboard the whole time.\n\nIn an agent loop the owner should be the agent. If it isn't, the agent will stop and wait for me. Not because anything is broken. Because the design says so. The wait is the feature.\n\nI had not framed it that way. I had been treating \"Claude needs me to launch Chrome first\" as a small inconvenience. It was actually a design statement: the browser belongs to Ben. The agent is a guest in the browser Ben opens. If Ben forgets, the workflow halts.\n\nThat is a perfectly fine model for a developer using an IDE. It is the wrong model for autonomous work.\n\n## the test that surfaces the seam\n\nThe seam is not visible from inside a working session, because while everything is running you can't tell which side is owning what. The test is to ask: if I walked away right now, what would have to happen for the run to finish without me?\n\nRun that test against every tool the agent touches.\n\nFor the browser, the answer that morning was: the agent would have to launch Chrome on the right port with the right flags. It could not. It had no shell that owned a long-running browser process. 
So the run would halt at the first navigation.\n\nFor a credential I was loading by hand into a shell before certain scripts ran, the answer was: the agent would have to source that env file itself. It could not. The credential was in my fingers.\n\nFor a directory my scripts assumed existed because I created it once on this machine months ago, the answer was: the agent would have to know to create it. It could not. The knowledge was in my history.\n\nThree things. None of them documented. All of them mine.\n\n## the language that hides the problem\n\nThe phrase that hides this is \"agent workflow.\" It suggests the agent runs the workflow. Most of the time it does not. Most of the time the operator runs the workflow and the agent runs the executable steps inside it, while the operator handles the surrounding setup, the recovery from minor failures, the kicking-off of the next stage. The agent is a power tool. The operator is the carpenter.\n\n## ownership transfers feel like nothing\n\nThe Playwright switch took a morning. The functional surface barely changed. Claude opens browsers, fills forms, takes screenshots, the same as before. From the outside you'd see the same screenshots in the same admin pages.\n\nWhat changed is who holds the lifecycle. The Playwright MCP server spawns its own Chromium process under its own process tree. It handles startup, SSL, sessions, cleanup. There is no port I have to remember. There is no flag I have to pass. The agent calls a tool, and the browser is there, and when the call returns, the agent can call the tool again, and the browser is still there. The ownership moved from me to the server. From outside it is invisible.\n\nInside the run, what changed is that the run finished without me. I queued the ten IP review tickets. I went to do something else. The summary appeared. That had never happened before.\n\nThe credential fix was the same shape. 
I moved the credential out of my fingers and into `.secrets/`, where the script that needs it can load it on its own. The directory fix was the same. I added a check that creates the directory if it isn't there. Both took a few minutes. Both removed me from a step I had been doing without acknowledging it.\n\nEach individual change was tiny. The collective effect was that the loop closed. The agent could now run from start to finish on the tasks I'd handed it, because every tool in the chain was owned by something that wasn't me.\n\n## what this generalizes to\n\nThe operator-vs-agent ownership question is the single most useful lens I've picked up for agentic infrastructure. It cuts cleanly through the design debates that otherwise turn into taste arguments.\n\nIt tells you which MCP servers are worth running and which aren't. The ones that own their own resource end-to-end are worth running. The ones that expose a thin wrapper over something the operator has to set up first are not, no matter how good the API looks. The Chrome DevTools MCP is a perfectly designed tool for a human-driven session. It is the wrong tool for an autonomous loop, not because of any bug, but because it leaves ownership with the human by design.\n\nIt tells you which scripts are ready for cron. A script the agent can run from a clean shell is ready. A script that needs me to source an env file first is not ready, even if it works perfectly when I'm sitting there.\n\nIt tells you which \"automations\" are real. If walking away breaks it, the automation is a checklist with one of the items being me. That is not automation. That is a procedure I have memorized.\n\nIt tells you what to write down. The setup steps you do by reflex are the ones that aren't in any document, because you stopped noticing them years ago. Those are the exact steps an agent has no way to learn. Writing them down is the first half. 
Moving them out of your hands is the second half.\n\nThe agent owns its tools, or you do. There is no shared custody. Pick one for each tool, and notice when you've picked yourself by accident.",
      "date_published": "2026-04-06T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "agent-tooling",
        "ownership-model",
        "mcp",
        "claude-code",
        "solo-founder",
        "autonomy"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-04-06_stopped-babysitting-chrome-mcp/",
      "url": "https://dxdev.com/blog/2026-04-06_stopped-babysitting-chrome-mcp/",
      "title": "I stopped babysitting the browser and switched the agent stack to Playwright",
      "summary": "For weeks my browser automation required a manual launch before I would let Claude use it. Claude kept stopping mid-task to tell me port 9222 was unreachable. Switching to Playwright MCP meant the agent could own the browser, and I stopped being a prerequisite.",
      "content_text": "On April 6, partway through an IP review run on ten queued tickets, Claude stopped and told me Chrome wasn't reachable on port 9222.\n\nThis was the third time that week. The setup I'd been running was Chrome DevTools MCP. The premise is straightforward: launch Chrome with `--remote-debugging-port=9222`, and Claude gets a CDP connection to your browser. It can navigate pages, fill forms, take screenshots, click through admin UI that doesn't expose a clean API. Good tool. Works when everything is ready.\n\nThe word \"when\" is doing a lot of work there.\n\n## the ceremony nobody wrote down\n\nTo use Chrome DevTools MCP, I had to launch Chrome with the right flags before starting any session where browser access might be needed. This was my responsibility, not Claude's. There is no automatic launch, no process manager, no health check. If I forgot, Claude would get partway through a workflow and then stop to ask me to do my part.\n\nI had not written this down anywhere. It was a learned reflex. Open Chrome first, then start Claude. Except on the days I forgot, which happened often enough that I started counting.\n\nThe problem is not that the tool is bad. Chrome DevTools MCP is a reasonable way to wire a browser to an agent if your agent is a human developer who controls the session setup manually. I was treating it as infrastructure for autonomous workflows. Those are different use cases.\n\n## why this is worse than normal dev friction\n\nIn normal local development, a manual prerequisite is just friction. You get used to it. Source the env file. Start the Docker stack. Forward the port. These feel normal because you already know them, and the loop you're in has a human at every step by design.\n\nIn an agentic loop, the human is supposed to be out of the loop. That is the point. I was running multi-step workflows: process ten IP review tickets in sequence, navigate each case, apply the verdict, move to the next. 
The expected result is a completed run and a summary. I should be distracted by something else while this happens.\n\nA manual prerequisite does not just add friction in this context. It inserts me into the loop at exactly the wrong moment. Mid-task. Claude is waiting on me. I get a notification. I have to remember what state the run was in, what it was trying to do, why it needs a browser right now. That context switch costs more than the thirty seconds it takes to launch Chrome.\n\nThe dependency was not just annoying. It was structurally incompatible with the autonomy I was trying to build.\n\n## what playwright mcp actually changes\n\nOn April 6, as part of a tooling cleanup pass on ITEM-6750 and ITEM-6748, I swapped Chrome DevTools MCP for Playwright MCP.\n\nThe Playwright MCP server manages the browser lifecycle itself. When Claude needs a browser, the server spawns a Chromium instance under its own process tree. It handles SSL, sessions, and cleanup. There is no manual launch step. No port. No ceremony. Claude opens a session, the browser is there, the workflow runs.\n\nThe functional surface is nearly identical. Claude navigates to the same admin pages, fills the same forms, takes the same screenshots. But the operational behavior is completely different. The IP review run that had been halting on port 9222 ran to completion without me. I got the summary. I was not interrupted.\n\nThe agent now owns the browser from first navigate to last screenshot. That is the correct ownership model for autonomous work.\n\n## the same class of problem in the jira tooling\n\nThe ITEM-6750 and ITEM-6748 cluster also included a fix to how Claude handled JIRA transitions. The transition IDs in the agent tooling were calibrated against a stale assumption about workflow state. Claude would attempt the transition, receive an error, and either halt or guess at a fallback path.\n\nDifferent symptom. Same root cause. 
The environment had encoded a wrong assumption, and Claude was executing against it faithfully. Claude was not confused. Claude was doing exactly what the configuration told it to do. The configuration was wrong.\n\nThe fix in both cases was the same: update the assumption the environment encoded, not the symptom it was producing.\n\nThis is the more general pattern I keep tripping on. When an agent fails mid-workflow, the first question I ask now is whether Claude misunderstood something, or whether the environment Claude was handed misrepresented something. Most of the time it is the second thing. The agent is a good executor of bad premises.\n\n## removing the last babysitting dependencies\n\nAfter the Playwright switch, I did a pass to identify every other step in my agent workflows that required manual setup. I found three. A credential that had to be loaded into a shell before a certain script would run. A directory that had to exist but was not in the repo. A port-forward I had been doing by habit before sessions that touched the beelink.\n\nNone of these were documented. All of them were in my head. Each one was a potential mid-task interruption I had been avoiding only by remembering to do setup work I had not acknowledged as setup work.\n\nI fixed all three. The credential now loads from `.secrets/` automatically. The directory is created on first use. The port-forward dependency turned out to be stale and was removed entirely.\n\nThese were small changes. The effect on friction was not small.\n\nIf your AI workflow depends on you remembering a secret handshake first, it is not a workflow yet.",
      "date_published": "2026-04-06T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "browser-automation",
        "playwright",
        "mcp",
        "claude-code",
        "agent-tooling",
        "solo-founder"
      ]
    },
    {
      "id": "https://dxdev.com/blog/2026-04-03_ai-skill-is-production-infra/",
      "url": "https://dxdev.com/blog/2026-04-03_ai-skill-is-production-infra/",
      "title": "Your AI skill file is part of production now",
      "summary": "A merge conflict on a prompt file stopped a hotfix cold. That was the moment the agent layer started behaving like release infrastructure, whether I had treated it that way or not.",
      "content_text": "I had `hotfix-ITEM-6738-messaging-visibility` open, a customer was waiting on a text-delivery fix, and git stopped me on `.claude/skills/item/SKILL.md`.\n\nNot on a source file. Not on a database migration. On the prompt file that tells Claude how to run a ticket session.\n\nMy first reaction was mild irritation. Merge conflicts on prose files feel bureaucratic, not dangerous. I thought: this is just instructions, I'll read the diff, pick the version that makes sense, move on. That was wrong. I was treating a policy file as soft configuration.\n\n## what the /item skill actually does\n\nThe `/item` skill at `.claude/skills/item/SKILL.md` is how I open a work session on any HTO ticket. I type `/item ITEM-6738`, and the skill drives everything that follows. It resolves the ticket, decides whether to cut a `hotfix/` branch or a `feature/` branch based on issue type and severity, claims the repo lock, fires the JIRA transition from preparing to CODING, posts a session-intent comment to the ticket, and pins the clone so no other session stomps on it mid-work.\n\nThat list matters. It is not a checklist I run manually. Claude reads the skill and executes it. The skill is the branching policy. The skill is the JIRA state machine. If the skill says to cut a hotfix branch, a hotfix branch gets cut. If the skill says to post a comment, a comment goes up. If the skill's logic for customer-type tickets is wrong, every customer-type ticket session misbehaves until someone edits the file.\n\nI had thought of it as a prompt I could refine freely. By April 3rd, three tickets in one day taught me otherwise.\n\n## what changed that day\n\nITEM-6738 was the text-delivery investigation. Customers were reporting unreceived texts. The session produced a new diagnostic page in the admin panel, and a set of JIRA transitions through the hotfix flow. Somewhere in that session I updated the skill to handle customer-type ticket priority correctly. 
The existing logic was ambiguous about how to sequence a customer-type issue through JIRA versus a standard staging ticket.\n\nITEM-6739 came next: tournament org structure. The session produced a second diagnostic page in the admin panel. To work that ticket cleanly, I needed the skill to be smarter about role disambiguation, specifically how it handles tickets that sit at the intersection of multiple JIRA components. I edited the skill again.\n\nITEM-6744 was a docs-viewer root-path fix. Smaller in scope, but it exposed something in the branch-creation step. The skill had been cutting branches immediately at session open. The fix for a particular edge case was to defer branch creation until the first edit was actually imminent, after a fetch and sync step confirmed the clone was at the right base commit.\n\nThree edits, three tickets, one day. Each edit felt like a local patch to address the ticket at hand. And then I tried to merge `hotfix-ITEM-6738-messaging-visibility` back, and git showed me that the skill file had diverged in two directions at once.\n\n## the conflict is not the bug\n\nMy first reading of the conflict was: this is a coordination problem. I touched the same file on two branches, now I have to reconcile them. Standard merge hygiene.\n\nThat reading was incomplete. The real problem was that I had been editing a file that governs production behavior on any branch where the edit seemed relevant, without thinking about what branch-specific skill edits mean once those branches interact.\n\nWhen a hotfix branch changes the skill's JIRA transition rules and a feature branch changes its deferred-branch-creation logic at the same time, the conflict is not a text editor problem. It is a policy collision. Two branches are now shipping different definitions of how the agent should behave on every future ticket. Merging the file is making a policy decision about which behavior wins. 
I had not treated it as that.\n\nThe fetch/sync step that landed in ITEM-6744's session is a good example of why this matters. That step runs before branch creation and verifies the clone's remote state. It was added to prevent the agent from cutting a branch on a stale base. That is a safety invariant for every ticket, not just ITEM-6744. Once it's in the skill file, it runs everywhere. Putting it in on a feature branch and not thinking about whether it conflicts with anything on the hotfix branch in flight is exactly the kind of casual editing that creates real problems at merge time.\n\nThe blast radius of a skill edit is every future ticket session that runs on a clone with that version of the file. A prompt file with that blast radius is not soft configuration. It is a policy file, and it deserves the same change-control caution as the code it orchestrates.\n\n## what changes now\n\nThe immediate fix was mechanical. Resolve the conflict, read both sides of the diff carefully, write a merged version that captured the customer-type priority fix from ITEM-6738's edits, the role disambiguation from ITEM-6739's edits, and the deferred branch creation from ITEM-6744's edits. That took about twenty minutes and produced a clean merge.\n\nThe structural change is slower. I'm now treating edits to `.claude/skills/item/SKILL.md` the way I'd treat edits to the JIRA workflow configuration or the Bamboo build pipeline. They happen on a branch with intention. They get reviewed before merge. They don't get patched casually as a side effect of ticket work. If a ticket session reveals a skill gap, the fix goes into a tracked commit with a note, not an ambient edit that blurs into the ticket's history.\n\nThe open question from that day is still open. Should the skill be split into a stable policy layer and a per-ticket behavior layer, so that hotfix branches can tune local behavior without touching the shared policy file that every ticket depends on? I don't have an answer yet. 
The conflict was the first real indication that the skill is big enough to need that kind of architecture.",
      "date_published": "2026-04-03T00:00:00.000Z",
      "authors": [
        {
          "name": "ben"
        }
      ],
      "tags": [
        "claude-code",
        "agent-skills",
        "hotfix",
        "release-infrastructure",
        "solo-founder",
        "ai-ops"
      ]
    }
  ]
}