I have 1,586 Claude Code conversation sidecars sitting under vault/04_Sessions/, organized by year, month, and session ID. When I wanted to know what I worked on three weeks ago, I opened the folder and stared at filenames. The plan was to fix that with a search index. The calendar pages changed my mind.
The enricher came before the index
Claude Code writes a sidecar for every session: a JSON blob with the conversation turns, metadata, timestamps, and whatever context the session started with. By the time I started the archive work, I had 1,586 of them spanning roughly a year.
My first instinct was a search index. Search indexes assume clean, consistent data, and the sidecars were neither. Different versions of my session hook wrote different schema shapes. Some sessions had no area classification. Some had it buried under a field called workspace that was later renamed area. Some had malformed JSON in the summary field because the sidecar writer hadn’t sanitized Claude’s output before serializing. Index that without normalizing, query for “HTO work in April,” get 23 results instead of 61, and never know you missed 38.
So the enricher’s job was to normalize: extract the real timestamp, resolve the area from whichever field it landed in for that schema version, pull the themes and ticket references, write a structured row into a SQLite database with a schema I controlled. One table, one schema, every session mapped to one row.
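A minimal sketch of that normalization step, under stated assumptions. Only the workspace and area field names come from the sidecars described above; session_id, timestamp, themes, tickets, and the table layout are hypothetical names chosen for illustration, not the actual schema.

```python
import json
import sqlite3

# Hypothetical table layout; the real column names are not given in the post.
# themes and tickets are stored as JSON-encoded lists.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sessions (
    session_id TEXT PRIMARY KEY,
    started_at TEXT NOT NULL,
    area       TEXT NOT NULL,
    themes     TEXT NOT NULL,
    tickets    TEXT NOT NULL
)
"""


def resolve_area(sidecar: dict) -> str:
    """Return the area from whichever field this schema version used."""
    # Newer sidecars use "area"; older ones used "workspace".
    # Fall back to an empty string, never None, so a NOT NULL column accepts it.
    return sidecar.get("area") or sidecar.get("workspace") or ""


def normalize(sidecar: dict) -> tuple:
    """Map one sidecar dict onto one row of the sessions table."""
    return (
        sidecar["session_id"],   # assumed key
        sidecar["timestamp"],    # assumed key
        resolve_area(sidecar),
        json.dumps(sidecar.get("themes") or []),
        json.dumps(sidecar.get("tickets") or []),
    )


def enrich(conn: sqlite3.Connection, sidecars: list) -> None:
    """Create the table if needed and upsert one row per sidecar."""
    conn.execute(SCHEMA)
    with conn:  # commit on success, roll back on any exception
        conn.executemany(
            "INSERT OR REPLACE INTO sessions VALUES (?, ?, ?, ?, ?)",
            [normalize(s) for s in sidecars],
        )
```

The point of resolve_area is that the schema-version branching lives in one place, so every query downstream can rely on a single area column.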
The enricher took three passes to get right. The first pass produced 847 rows with no error output, silently failing on 715 sidecars because a helper function returned None instead of an empty string when the session had no area, and the column was NOT NULL. Each failed insert rolled back on its own, the error surfaced nowhere visible, and I only noticed because the row count was wrong. Silent partial success is the worst failure mode for a batch enricher, because you can spend a week building queries against a corpus that’s secretly missing 45% of its rows.
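One way to make that failure loud is to collect every rejected row and refuse to finish quietly when the counts do not reconcile. This is a sketch, not the fix that actually shipped; the two-column table and the function names are hypothetical.

```python
import sqlite3

# Hypothetical two-column table, just enough to reproduce a NOT NULL failure.
DEMO_SCHEMA = (
    "CREATE TABLE IF NOT EXISTS sessions "
    "(session_id TEXT PRIMARY KEY, area TEXT NOT NULL)"
)


def insert_rows(conn: sqlite3.Connection, rows: list) -> tuple:
    """Insert rows one at a time; return (inserted_count, failures)."""
    inserted = 0
    failures = []  # list of (session_id, error message)
    for row in rows:
        try:
            with conn:  # commits this row, or rolls it back on error
                conn.execute(
                    "INSERT INTO sessions (session_id, area) VALUES (?, ?)", row
                )
            inserted += 1
        except sqlite3.Error as exc:
            # Record the failure instead of swallowing it.
            failures.append((row[0], str(exc)))
    return inserted, failures


def require_complete(expected: int, inserted: int, failures: list) -> None:
    """Raise if any row failed or the counts do not add up."""
    if failures or inserted != expected:
        raise RuntimeError(
            f"enriched {inserted} of {expected} rows; "
            f"{len(failures)} failed, first: {failures[:1]}"
        )
```

Calling require_complete at the end of every batch turns a wrong row count from something you happen to notice into something the run reports.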
What the aggregator surfaced
Once the enricher was stable, I built a theme aggregator: a query that finds which topics recur across sessions in a date range. Each session generates a theme list during enrichment. In isolation each list is noise. Aggregated across a week, the frequency tells you what you were actually working on, not what you planned to work on.
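The aggregation itself is simple counting. A rough sketch, assuming each enriched session is available as a dict with a day (a date) and a themes list; both key names are illustrative, and the real aggregator is described above as a query, not a Python loop.

```python
from collections import Counter
from datetime import date


def aggregate_themes(sessions: list, start: date, end: date) -> tuple:
    """Count how many sessions in [start, end] mention each theme."""
    counts = Counter()
    total = 0
    for session in sessions:
        if start <= session["day"] <= end:
            total += 1
            # set() so a theme repeated within one session counts once.
            counts.update(set(session["themes"]))
    return total, counts


def theme_share(sessions: list, start: date, end: date) -> dict:
    """Return each theme's share of sessions in the range, largest first."""
    total, counts = aggregate_themes(sessions, start, end)
    if total == 0:
        return {}
    return {theme: n / total for theme, n in counts.most_common()}
```

Counting sessions per theme, not mentions per theme, keeps one long session from dominating the week.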
When I ran it over May, one slice came back at 14% of my sessions: HTO browser automation. I had been mentally filing that work as cleanup, the small tickets I close between real ones. Eight sessions. For a solo founder running roughly 20 sessions a week, 14% of a month on something I’d written off as a footnote is not a footnote. The aggregator counted what my self-perception was rounding down. The specific number made the rounding visible in a way that a vague sense of “I work on browser stuff sometimes” never would.
Calendar pages mattered more than search
The calendar day composer was planned as a secondary feature, after search was solid. It became the primary one.
The composer is a script that runs nightly, takes all the enriched sessions for a given calendar day, and writes a narrative markdown page: what was worked on, which tickets moved, what themes dominated, what the day’s arc was. One file per day at vault/04_Sessions/<year>/<month>/<day>/SUMMARY.md.
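The mechanical half of the composer, grouping sessions by day and deciding where each page lands, could be sketched like this. The zero-padded month and day directories and the day key on each session are assumptions; the narrative writing step is left out entirely.

```python
from collections import defaultdict
from datetime import date
from pathlib import PurePosixPath


def summary_path(day: date, root: str = "vault/04_Sessions") -> PurePosixPath:
    """Build the SUMMARY.md path for one calendar day."""
    # Zero padding is an assumption; the post only gives <year>/<month>/<day>.
    return (
        PurePosixPath(root)
        / f"{day.year:04d}"
        / f"{day.month:02d}"
        / f"{day.day:02d}"
        / "SUMMARY.md"
    )


def group_by_day(sessions: list) -> dict:
    """Bucket enriched sessions by their calendar day."""
    by_day = defaultdict(list)
    for session in sessions:
        by_day[session["day"]].append(session)
    return dict(by_day)
```

For example, summary_path(date(2025, 5, 6)) gives vault/04_Sessions/2025/05/06/SUMMARY.md under these assumptions.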
Search answers “find me sessions about X.” A calendar page answers “what was happening on Tuesday,” which is what I actually need most, usually when I’m reconstructing context for a follow-up or writing a JIRA comment about work from last week. The day page is a narrative with a through-line, not a query result. Tuesday: the history pipeline enricher was stuck on 2,635 rows, I ran three subagent passes, the queue drained to 38 by end of day. That reads like something I can orient from. A list of nine session IDs with cosine similarity scores does not.
The first composer drafts were accurate and flat: “Sessions in the HTO area totaled 6. Primary tickets referenced: ITEM-7003, ITEM-7005, ITEM-7009.” Correct, useless. The fix was prompting for the arc first: what was the day’s main through-line, where did it start, where did it end, what shifted by the end. The second drafts were better.
The nightly cron
The cron runs at 2 AM as a Windows scheduled task that invokes ben_hq/cron/refresh_history_daily.ps1. It is deliberately not a Claude CronCreate job: CronCreate state is ephemeral per-session, while a Windows task survives reboots and shows up in Task Scheduler. The log goes to .scratch/cron-refresh.log.
Before the cron, the calendar pages were a one-time backfill I ran manually. After the cron, they grow on their own. A memory system that depends on someone remembering to run it will not stay current.
The calendar rail in vault’s sidebar now has a link for every day in the archive. Days with a summary show the theme cloud. Days without show the raw session count. I can scan a month of work in 30 seconds.
What shipped unfinished
The JIRA wiring is incomplete. The enricher extracts ticket IDs from session context but doesn’t push references back into JIRA. The intended behavior: enrich a session, see it touched a given ticket, write a comment on that ticket linking the session ID and a one-line summary. That closes the loop between archive and ticket history, so every ticket carries a trail of when it was actually touched, not just when it was transitioned. It’s a small write loop on top of code that already exists.
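A sketch of the part of that loop that needs no network access: pulling ticket IDs out of session text and preparing one comment per ticket. The regex, the function names, and the comment wording are all assumptions, and the actual JIRA write is deliberately left out.

```python
import re

# Loose pattern for keys shaped like ITEM-7003. It will also match strings
# such as "UTF-8", so a real version would probably restrict the project prefix.
TICKET_RE = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def extract_tickets(text: str) -> list:
    """Return ticket IDs in order of first appearance, without duplicates."""
    seen = []
    for ticket in TICKET_RE.findall(text):
        if ticket not in seen:
            seen.append(ticket)
    return seen


def build_comments(session_id: str, summary: str, text: str) -> dict:
    """Map each referenced ticket to the comment that would be posted on it."""
    body = f"Touched in session {session_id}: {summary}"
    return {ticket: body for ticket in extract_tickets(text)}
```

Keeping the comment construction separate from the posting step means the output can be reviewed as a dry run before anything is written to a ticket.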
Cross-agent session counts are missing too. The enricher counts Claude Code sessions but not Manus sessions or Codex runs. Those don’t write sidecars, so the day pages are blind to them. Until I’ve gone two weeks needing those counts and not having them, I can’t size the work properly.
The enricher reached 1,562 of 1,586 sidecars on the first complete pass. The 24 it skipped had corrupt timestamps: a known list, not a queue.
Those 24 are still there. So are the calendar pages, growing one per night while I sleep.