---
title: "The first useful agent teammate needed a Gmail account"
canonical: https://dxdev.com/blog/2026-05-04_gmail-identity/
datePublished: 2026-05-04
---
On May 4 at 9am I opened a fresh tab, created a dedicated Gmail address for the agent, and spent the next four hours wiring OAuth so a Claude Code session could read and send mail. Not because email is glamorous infrastructure. Because for two weeks I had been bumping into the same invisible wall. Every time I tried to get an agent to do something real, the first blocker was not the model. It was trust.

Google Drive would not share a doc with it. Calendar had nowhere to send an invite. The dispatch loop I wanted to sketch assumed an inbox that could receive work. None of that is possible when the agent is impersonating me.

I had been treating my agent stack like a tool I controlled, not a participant in systems built around identity. That was wrong. Fixing it took four hours of credential plumbing and zero AI reasoning.

## why piggybacking on my personal account broke things

The original setup ran Claude Code sessions under my personal Google context when they ran under any context at all. OAuth tokens lived in a temp directory. When a session ended, the tokens expired or got swept. The next session re-authorized from scratch. For one-shot tasks, this was annoying. For anything that needed to persist state across sessions, it was a hard stop.

There was also a subtler problem. When an agent shares a Google doc, the doc shows my name as the sharer. When it creates a calendar event, my name is on it. If the agent sends a message from my personal account, the message looks like it came from me. That conflation matters when the agent is wrong, which it will be. You want a clear audit trail. You want to be able to say "the agent did that" and have it be technically true, not just morally true.

A real identity fixes both. The credentials persist. The actions are attributable.

## what provisioning actually looked like

The Google account itself took five minutes. Everything after that was slower.

I created a GCP project under my personal account as the billing parent, with the agent's ops inbox as the service identity. Enabled the Gmail, Calendar, and Drive APIs. Walked through the OAuth consent screen setup, which for a personal-use app makes you explicitly add yourself as a test user. Downloaded the `credentials.json`. Dropped it in `.secrets/google_agent.env` alongside the refresh token I got from the initial authorization flow.

The `ben_hq` secrets helper already existed for this pattern. `bhq secrets get GOOGLE_AGENT_REFRESH_TOKEN` returns the value. Any session that needs Google access loads the file with `eval "$(bhq secrets export google_agent)"` and the credentials are in scope.
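For sessions that run Python rather than a shell, the same file can be read directly. This is a minimal sketch, not how `bhq` works internally: it assumes the file holds plain `KEY=VALUE` lines (optionally prefixed with `export`), and the `GOOGLE_AGENT_CLIENT_ID` and `GOOGLE_AGENT_CLIENT_SECRET` names are hypothetical. Only `GOOGLE_AGENT_REFRESH_TOKEN` appears in my actual setup as described above.

```python
from pathlib import Path

# Hypothetical key names, except GOOGLE_AGENT_REFRESH_TOKEN.
REQUIRED_KEYS = (
    "GOOGLE_AGENT_CLIENT_ID",
    "GOOGLE_AGENT_CLIENT_SECRET",
    "GOOGLE_AGENT_REFRESH_TOKEN",
)


def load_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment, ignore it
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_keys(values, keys=REQUIRED_KEYS):
    """Fail loudly at startup instead of midway through an API call."""
    missing = [key for key in keys if not values.get(key)]
    if missing:
        raise KeyError("missing secrets: " + ", ".join(missing))
    return {key: values[key] for key in keys}
```

Checking for missing keys up front is the design choice that matters here: a session that starts without a refresh token should stop immediately, not discover the gap when the first Gmail call fails.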

For claude.ai MCP, I wired the Gmail and Calendar servers through the standard OAuth flow. That required a separate client ID scoped to the MCP redirect URI. Two client IDs, same GCP project. One for programmatic access from scripts, one for the MCP servers. The distinction matters because MCP's redirect handling is browser-interactive and the script-side flow is non-interactive. Same scopes, different flow type, different client ID.
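The script-side flow is non-interactive because a stored refresh token can be exchanged for a short-lived access token with a single POST to Google's token endpoint, with no browser involved. A rough sketch of that exchange using only the standard library follows. The function names are mine, for illustration, and a real script would more likely lean on Google's auth libraries than hand-roll this.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://oauth2.googleapis.com/token"


def build_refresh_request(client_id, client_secret, refresh_token):
    """Build the form-encoded POST that trades a refresh token for an access token."""
    form = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
    }).encode("ascii")
    return urllib.request.Request(TOKEN_URL, data=form, method="POST")


def fetch_access_token(request, opener=urllib.request.urlopen):
    """Send the request and return the access token from the JSON response.

    `opener` is injectable so the parsing can be exercised without network access.
    """
    with opener(request) as response:
        payload = json.load(response)
    return payload["access_token"]
```

The MCP-side client ID cannot use this shortcut for its first authorization, since that step needs a human to approve the consent screen in a browser. That difference is why the two flows get separate client IDs.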

Total credential surface after setup: one GCP project, two client IDs, one refresh token per API surface, all in `.secrets/`. Nothing in a temp directory. Nothing that expires on session end.

## what the inbox looks like now

The interesting shift was not technical. It was conceptual.

Once the account existed and the credentials persisted, I stopped thinking about email as a notification channel for the agent and started thinking about it as a transport layer for work. Those are different things. A notification channel delivers information to a human. A transport layer moves work between actors that can be machines.

Gmail labels map well onto work states. An inbound message tagged `pending` is a task waiting to be picked up. The agent polls the label, processes the item, tags it `done`. That is a kanban board. It runs on infrastructure every business already trusts and has been routing work through for twenty years. No new queue service. No webhook to maintain. Just labels and a polling interval.
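Moving a card across that board is one label swap. The Gmail API's message `modify` call takes a body with `addLabelIds` and `removeLabelIds`, and it works on label IDs, not display names. A small sketch of building that body, assuming placeholder IDs (`Label_1`, `Label_2`) that would really come from listing the account's labels:

```python
# Placeholder label IDs; real ones are assigned by Gmail per account.
LABEL_IDS = {"pending": "Label_1", "done": "Label_2"}

# The only transition described so far: pending -> done.
ALLOWED_TRANSITIONS = {("pending", "done")}


def transition_body(current, target, label_ids=LABEL_IDS):
    """Return the request body that moves a message from one work state to another."""
    if (current, target) not in ALLOWED_TRANSITIONS:
        raise ValueError(f"transition {current!r} -> {target!r} is not allowed")
    return {
        "addLabelIds": [label_ids[target]],
        "removeLabelIds": [label_ids[current]],
    }
```

Keeping the allowed transitions in one explicit set means a buggy handler cannot quietly move a `done` item back into the queue.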

I sketched the dispatch loop the same afternoon. The loop is not complicated. Search for threads with label `agent-queue`, read the top message, parse a structured action from the subject line or body, execute it, apply a result label. The agent needs read and label permissions. It does not need to send mail to run the loop, though it needs send permissions for the reply path.
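One pass of that loop can be sketched without touching Gmail at all. Everything specific below is an assumption for illustration: the `[action] argument` subject format, the `agent-done`, `agent-failed`, and `agent-rejected` result labels, and the `InMemoryMailbox` stand-in, whose `search` and `relabel` methods would be backed by the Gmail API's list and modify calls in a real version.

```python
import re
from dataclasses import dataclass, field

QUEUE_LABEL = "agent-queue"
# Assumed subject convention: "[action] free-form argument"
ACTION_RE = re.compile(r"^\[(?P<action>[a-z_]+)\]\s*(?P<arg>.*)$")


@dataclass
class Message:
    id: str
    subject: str
    labels: set = field(default_factory=set)


class InMemoryMailbox:
    """Test double standing in for the Gmail-backed mailbox."""

    def __init__(self, messages):
        self.messages = list(messages)

    def search(self, label):
        return [m for m in self.messages if label in m.labels]

    def relabel(self, message, add, remove):
        message.labels.discard(remove)
        message.labels.add(add)


def parse_action(subject):
    """Return (action, argument), or None if the subject is not structured."""
    match = ACTION_RE.match(subject.strip())
    if match is None:
        return None
    return match["action"], match["arg"].strip()


def run_once(mailbox, handlers):
    """Process the first queued message; return (message id, result label) or None."""
    queued = mailbox.search(QUEUE_LABEL)
    if not queued:
        return None
    message = queued[0]
    parsed = parse_action(message.subject)
    if parsed is None or parsed[0] not in handlers:
        result = "agent-rejected"  # unparseable or unknown action
    else:
        action, arg = parsed
        try:
            handlers[action](arg)
            result = "agent-done"
        except Exception:
            result = "agent-failed"
    # Always take the item off the queue so one bad message cannot block the loop.
    mailbox.relabel(message, add=result, remove=QUEUE_LABEL)
    return message.id, result
```

A polling wrapper would call `run_once` on an interval until it returns `None`. Nothing in this pass sends mail, which matches the permission split: read and label for the loop, send only for the reply path.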

I did not finish the loop that day. I got it far enough to confirm the polling worked, then stopped. The point was not to build the full dispatch system. The point was to confirm the identity layer held weight before building anything on top of it.

## the thing model benchmarks do not measure

There is a version of this story where I skip the infrastructure work, keep running the agent as myself, and decide the friction is the agent's fault. The agent feels flaky. Session context does not persist. Shared docs do not work. I conclude agent tooling is not ready yet.

That conclusion would be technically accurate and completely wrong about the cause.

The friction was not in the model. It was in the gap between "Claude Code as a smart shell command" and "Claude Code as a participant in the systems my business already runs." Google Workspace, Jira, GitHub: all of it was built around identity. These systems assume every actor has an account, a set of permissions, a credential that persists. They have no concept of a stateless anonymous assistant that borrows a human's session.

Closing that gap is not AI work. It is the same credential plumbing a new employee goes through on their first day. Create the account. Grant the permissions. Set up the hardware. Give them a badge. The reason this feels surprising is that the demo videos for every AI tool skip it. The demo starts with a working credential. The actual work starts at 9am on May 4 with a fresh GCP project and an empty secrets file.

## the open question I have not answered

One Gmail account currently holds three functions: inbound work items from the dispatch loop, outbound sharing when the agent sends a doc link, and automated alerts from crons and scripts. That is already a mix I watch with mild concern.

The clean version separates them. An inbox for work routing. A send-only identity for sharing. A no-reply address for automated alerts. Three identities, explicit roles, no ambiguity about what a message in the inbox actually is.

I have not done this yet. The single account works well enough for the volume I am running. The moment the dispatch loop sees real inbound traffic from more than one source, the separation becomes necessary rather than optional. Right now it is a diagram on a whiteboard.

## what you are actually doing when you give an agent a badge

I had been thinking about agent capability as a function of model quality. Bigger model, more capable agent. That framing is not wrong, but it mistakes the binding constraint. For most of the practical work I need done, current-generation reasoning is more than sufficient. What was actually blocking me was that the agent could not take action in the systems I already use, because those systems did not recognize it.

Giving it a Gmail account is a credential decision, not an AI decision. It is exactly as unglamorous as setting up a new contractor's system access. But it is what makes the rest possible. The model reasons about the work. The identity lets it touch the systems where the work lives.

The first real bottleneck in making an agent useful is not intelligence. It is membership.
