From Assistant to Collaborator: How My AI Second Brain Grew Up

Over the past few months I’ve been writing about how I use AI for product work. The first post covered the philosophy: context files, opinionated prompts, and how to compose the right inputs for each task. The second added slash commands and daily summaries. The third was a hands-on setup guide. And the fourth introduced project brains for keeping complex initiatives organized.

This post covers a different kind of change. The earlier additions were incremental: more commands, better context, smoother workflows. What changed recently feels more like a threshold. The system graduated from a tool I invoke for specific tasks to something closer to a collaborator I dispatch to do real work. Three capabilities drove that shift: multi-agent orchestration, cross-session memory, and the encoding of domain expertise into the system itself.

Multi-Agent Workflows

The clearest example is customer escalation investigations. As a PM for data products, I regularly investigate customer-reported issues: logging gaps, data discrepancies, behavior that doesn’t match expectations. These investigations require pulling information from multiple sources and cross-referencing it all into an analysis that engineering can act on.

I built a slash command that handles this as a multi-phase workflow. When I run it with a ticket ID, here’s what happens:

  1. The system reads the customer ticket, extracts the core problem, identifies which product area is involved, and classifies the issue type.
  2. Three specialist agents launch simultaneously, each focused on a different data source. One searches the codebase for the relevant logic and recent changes. Another searches for related tickets and prior incidents across projects. A third checks documentation and internal wiki pages for relevant operational context.
  3. A fourth agent receives the combined findings and produces database queries that can confirm or refute the working hypothesis.
  4. The system combines everything into a structured analysis: issue classification, root cause anchored in code where possible, customer impact, and recommended next steps.
  5. A blind validator independently re-fetches every source cited in the draft to verify the claims hold up. Then an adversarial challenger looks for alternative explanations and tests whether the classification is correct.

The output is a document I can review with an engineering colleague or paste into a chat thread. It includes a confidence assessment and a data collection status table showing what was checked and what was unavailable, along with how the analysis compensated for gaps.

The command file that orchestrates all of this isn’t prompting in the traditional sense. It defines which agents to dispatch, what information each one needs, when to wait for results before proceeding, and how to handle failures gracefully. Writing this felt more like designing a workflow than writing a prompt.
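To make the shape of that orchestration concrete, here is a minimal Python sketch of the parallel phase. It is an illustration only, not the actual command file: the three agent functions, the ticket ID, and the simulated wiki failure are all hypothetical stand-ins, and a real agent would call a model with tools rather than return a canned string.

```python
import asyncio

# Hypothetical specialist agents. These stubs only return canned findings
# so that the control flow is visible.
async def search_code(ticket: str) -> str:
    return f"code findings for {ticket}"

async def search_tickets(ticket: str) -> str:
    return f"related tickets for {ticket}"

async def search_docs(ticket: str) -> str:
    raise TimeoutError("wiki unavailable")  # simulate a source being down

async def investigate(ticket: str) -> dict:
    specialists = {"code": search_code, "tickets": search_tickets, "docs": search_docs}
    # Launch all specialists at once and wait for every one of them.
    # return_exceptions=True keeps one failure from cancelling the others.
    results = await asyncio.gather(
        *(agent(ticket) for agent in specialists.values()),
        return_exceptions=True,
    )
    findings, status = {}, {}
    for name, result in zip(specialists, results):
        if isinstance(result, Exception):
            status[name] = f"unavailable: {result}"
        else:
            findings[name] = result
            status[name] = "checked"
    # Later phases (query generation, synthesis, validation) would consume
    # `findings`; `status` feeds the data collection status table.
    return {"findings": findings, "status": status}

report = asyncio.run(investigate("TICKET-123"))
```

The point of the sketch is the failure handling: a source that is down becomes a row in the status table instead of aborting the whole investigation.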

I’ve applied the same pattern to other tasks. A “fix feasibility” command evaluates whether a ticket describes a code change simple enough for a PM to implement with AI coding assistance, and produces an implementation brief if the answer is yes. The specific use cases differ, but the architecture is the same: break the problem into specialist tasks that run in parallel, then synthesize and validate the results.

Cross-Session Memory

AI conversations are stateless by default. Every new session starts from zero, which means re-explaining context that should already be established. Over a few weeks of working on the same projects, this friction adds up.

I addressed this with a four-layer memory system:

  • The first layer is stable facts: a compact file that captures the current state of all active work, including project status, recent decisions, and environment constraints. This is the primary orientation file. When I start a session, the AI reads it and immediately knows what’s in flight.
  • The second is a session log: a reverse-chronological list of handoff notes. Each entry records what happened in a session and what threads remain open. The last three entries give enough context to pick up where I left off.
  • Third, a corrections file. This holds behavioral fixes for things the AI consistently gets wrong. It’s a staging area that should shrink over time as fixes get promoted elsewhere.
  • And finally, a decisions log: a cross-cutting record of decisions that don’t belong to a specific project. Each entry captures context and rationale so I don’t relitigate settled questions.

Two commands manage this. /session-start loads all four files and presents a brief summary of current state and recent sessions. /session-end reviews the conversation, writes a handoff note, and then checks whether any learnings should be promoted to infrastructure.
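As a rough sketch of what the loading step amounts to, the Python below reads four files and keeps only the newest three session entries. The file names and the "## " heading convention for log entries are assumptions made for illustration; the real command is a prompt file, not a script.

```python
from pathlib import Path

# File names are assumptions for illustration; the post does not name them.
MEMORY_FILES = ["stable-facts.md", "session-log.md", "corrections.md", "decisions.md"]

def recent_sessions(log_text: str, count: int = 3) -> list:
    """Return the newest `count` entries from a reverse-chronological log.

    Assumes each entry starts with a '## ' heading, newest first.
    """
    entries = [e.strip() for e in log_text.split("\n## ") if e.strip()]
    # Only the first entry still carries its heading marker after the split.
    return [e[3:] if e.startswith("## ") else e for e in entries[:count]]

def session_start(memory_dir: Path) -> dict:
    """Load all four memory layers; a missing file loads as empty text."""
    layers = {}
    for name in MEMORY_FILES:
        path = memory_dir / name
        layers[name] = path.read_text(encoding="utf-8") if path.exists() else ""
    return {
        "stable_facts": layers["stable-facts.md"],
        "recent_sessions": recent_sessions(layers["session-log.md"]),
        "corrections": layers["corrections.md"],
        "decisions": layers["decisions.md"],
    }
```

Calling `session_start(Path("memory"))` would return the material for the opening summary. Treating a missing file as empty means a brand-new setup still starts cleanly.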

“Promote to infrastructure” means taking something learned during a session and baking it into the files the agent actually reads. A correction about how to handle a specific edge case in escalation investigations might start in the corrections file, then get promoted into the escalation command or a domain skill once it’s validated. The corrections file shrinks over time as knowledge graduates into the right places.

This creates a loop where the system improves its own instructions. I approve every change, so it’s not self-modifying in a creepy way. But in practice each work session can make the next one slightly better, and the compound effect over weeks is noticeable.
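The promotion step can be pictured as a small data move gated on approval. The sketch below is a simplification under stated assumptions: the `Memory` class, the target path, and the example correction text are all hypothetical, and in practice the model proposes the edit in prose and a person accepts or rejects it.

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    corrections: list = field(default_factory=list)      # staging area
    infrastructure: dict = field(default_factory=dict)   # target file -> lines

def promote(memory: Memory, correction: str, target: str, approved: bool) -> bool:
    """Move one correction into a target file, only with human approval."""
    if not approved or correction not in memory.corrections:
        return False
    memory.corrections.remove(correction)
    memory.infrastructure.setdefault(target, []).append(correction)
    return True

# Made-up correction and target path, for illustration only.
memory = Memory(corrections=["Check job state before blaming the delivery pipeline"])
promoted = promote(memory, memory.corrections[0], "commands/escalation.md", approved=True)
```

After the call, the staging list is empty and the correction lives under its target, which is the "corrections file shrinks" behavior in miniature.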

Domain Expertise

The earlier posts described skills like pm-thinking, which applies product methodology (problem-first thinking, measurable outcomes) to any PM-related conversation. That’s useful, but generic. It works the same way regardless of what product you’re building.

The bigger shift was building skills that encode institutional knowledge about specific products. I now have skills for each major product area my team owns: log delivery, analytics, audit logs, alerting, and data pipelines. Each skill contains the product’s architecture and common failure modes, along with which code repositories to search and which database tables hold relevant data.

This is what makes the multi-agent workflows genuinely useful. When the code investigator agent examines an escalation about missing logs, the domain skill tells it which service handles job state and which repository contains the delivery pipeline. It also flags recent architectural changes that might be relevant. Without that context, the agent produces plausible-sounding analysis that misses the specific details engineering needs.
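One way to think about a domain skill is as structured data that gets rendered into an agent's instructions. The Python below is a sketch of that idea, not the real skill format (which is a text file). Every field value is a made-up placeholder rather than a real service, repository, or table.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DomainSkill:
    product: str
    architecture: str
    failure_modes: tuple
    repositories: tuple
    tables: tuple

    def as_context(self) -> str:
        """Render the skill as a text block to prepend to an agent's task."""
        return "\n".join([
            f"Product: {self.product}",
            f"Architecture: {self.architecture}",
            "Common failure modes: " + "; ".join(self.failure_modes),
            "Repositories to search: " + ", ".join(self.repositories),
            "Relevant tables: " + ", ".join(self.tables),
        ])

# Every value below is a made-up placeholder, not a real system.
log_delivery = DomainSkill(
    product="log delivery",
    architecture="job scheduler -> delivery pipeline -> customer destination",
    failure_modes=("job stuck in pending", "destination rejects writes"),
    repositories=("example/delivery-pipeline",),
    tables=("delivery_jobs",),
)
prompt = log_delivery.as_context() + "\n\nTask: investigate missing logs."
```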

Now every investigation that uses a skill validates or extends the knowledge it contains, and /session-end catches insights that should be added back.

How The Work Changes

A few practical observations from working this way:

  • The role has shifted from “write the right prompt” to “design the right process.” The escalation command is a workflow with phases, dependencies, and validation steps. Thinking about it that way produces better results than trying to pack everything into a single conversation.
  • Validation has to be built in. The blind validator exists because agents make mistakes. They cite files that don’t exist, mischaracterize what code does, or draw conclusions the evidence doesn’t support. Catching those issues before they reach anyone else is the whole point.
  • Cross-session memory requires discipline. The system only works if I run /session-end after substantive sessions and keep stable facts current. When I skip it, the next session starts cold and I lose the compounding benefit. Automation helps, but the commitment to maintain the memory is mine.
  • And domain skills need regular maintenance. Products change. Code gets refactored, pipelines get rearchitected. Skills that aren’t periodically updated drift from reality. I haven’t solved this well yet. It’s still a manual process of noticing when a skill’s knowledge is stale and updating it.

The system still makes mistakes. Multi-agent workflows are more thorough than single-prompt conversations, but they’re not infallible. The confidence assessment in the escalation output exists because sometimes the answer is “medium confidence, we couldn’t confirm this from the available data.” That honesty about limitations is more useful than false certainty.

Where This Is Going

I’m sure the specific commands and skills will look different in six months as I learn what works and what doesn’t. But the underlying pattern feels durable: compose specialist agents with deep domain context, validate their output, and feed learnings back into the system.

I’ve published updated files to the Product AI Public repo, including the session memory commands and a generalized version of the multi-agent escalation workflow. If you’re building something similar, those might be useful starting points.

The value of this system is in how the pieces reinforce each other. Domain skills make agents useful for real investigations. Session memory means the system gets smarter over time. And the promote-to-infrastructure loop ties it together, so each piece of work has a chance to make the next one better.

When Using AI Leads to “Brain Fry”

I am definitely feeling the “brain fry” right now:

We found that the phenomenon described in these posts—cognitive exhaustion from intensive oversight of AI agents—is both real and significant. We call it “AI brain fry,” which we define as mental fatigue from excessive use or oversight of AI tools beyond one’s cognitive capacity. Participants described a “buzzing” feeling or a mental fog with difficulty focusing, slower decision-making, and headaches.

The research is fascinating and worth reading, with super interesting findings like this:

As employees go from using one AI tool to two simultaneously, they experience a significant increase in productivity. As they incorporate a third tool, productivity again increases, but at a lower rate. After three tools, though, productivity scores dipped. Multitasking is notoriously unproductive, and yet we fall for its allure time and again.

Earlier this week I had this thought: “Oh no, I think I’ve blown out my context window. I wish I could add some more tokens to my brain. Until then I might just have to respond to new requests with 401 Unauthorized.”

And that’s when I realized I probably need to go touch grass or something.

AI should help us produce better code

As usual, Simon Willison hits the nail on the head here:

If adopting coding agents demonstrably reduces the quality of the code and features you are producing, you should address that problem directly: figure out which aspects of your process are hurting the quality of your output and fix them. Shipping worse code with agents is a choice. We can choose to ship code that is better instead.

Also see Mitchell Hashimoto’s idea of “harness engineering”:

It is the idea that anytime you find an agent makes a mistake, you take the time to engineer a solution such that the agent never makes that mistake again.

On Meeting Your Child Again, and Again

Derek Thompson wrote a wonderful essay on what happens when you become a parent:

The baby you bring home from the hospital is not the baby you rock to sleep at two weeks, and the baby at three months is a complete stranger to both. In a phenomenological sense, parenting a newborn is not at all like parenting “a” singular newborn, but rather like parenting hundreds of babies, each one replacing the previous week’s child, yet retaining her basic facial structure. “Parenthood abruptly catapults us into a permanent relationship with a stranger,” Andrew Solomon wrote in Far From the Tree. Almost. Parenthood catapults us into a permanent relationship with strangers, plural to the extreme.

Why It’s Still Valuable To Learn To Code

Carson Gross has a good essay on whether junior programmers should still learn to code given how capable AI has become. His core warning to students:

Yes, AI can generate the code for this assignment. Don’t let it. You have to write the code. I explain that, if they don’t write the code, they will not be able to effectively read the code. The ability to read code is certainly going to be valuable, maybe more valuable, in an AI-based coding future. If you can’t read the code you are going to fall into The Sorcerer’s Apprentice Trap, creating systems you don’t understand and can’t control.

And on what separates senior engineers who can use AI well from those who can’t:

Senior programmers who already have a lot of experience from the pre-AI era are in a good spot to use LLMs effectively: they know what ‘good’ code looks like, they have experience with building larger systems and know what matters and what doesn’t. The danger with senior programmers is that they stop programming entirely and start suffering from brain rot.

This maps directly onto what I’ve been writing about with AI for product work and the second brain setup I’ve built. The system works because I spent years writing and reading PRDs, strategy docs, and OKRs, enough to develop actual opinions about what good looks like. You have to do the work first; only then is the second brain worth building.

An AI Wake-Up Call

Matt Shumer’s Something Big Is Happening has made the rounds over the last couple of weeks, but just in case you haven’t seen it, I think it’s very much worth reading. He’s an AI startup founder writing for the non-technical people in his life:

AI isn’t replacing one specific skill. It’s a general substitute for cognitive work. It gets better at everything simultaneously. When factories automated, a displaced worker could retrain as an office worker. When the internet disrupted retail, workers moved into logistics or services. But AI doesn’t leave a convenient gap to move into. Whatever you retrain for, it’s improving at that too.

Previous waves of automation always left somewhere to go. The uncomfortable implication here is that the escape routes are closing as fast as they open.

There are too many quotes worth commenting on, but this observation about what we tell our kids feels important:

The people most likely to thrive are the ones who are deeply curious, adaptable, and effective at using AI to do things they actually care about. Teach your kids to be builders and learners, not to optimize for a career path that might not exist by the time they graduate.

Predictions about the pace of change tend to be simultaneously too aggressive and too conservative in ways that are hard to anticipate. But the direction feels right, and the practical advice is sound: use the tools seriously, don’t assume they can’t do something just because it seems too hard, and spend your energy adapting rather than debating whether this is real.

Toolshed, blueprints, and why good agents need good DevEx

Alistair Gray published part two of Stripe’s “Minions” series, going deeper on how they built their internal coding agents. It’s a great read throughout, but three ideas really stood out to me.

First, blueprints. These are workflows that mix deterministic steps with agentic ones:

Blueprints are workflows defined in code that direct a minion run. Blueprints combine the determinism of workflows with agents’ flexibility in dealing with the unknown: a given node can run either deterministic code or an agent loop focused on a task. In essence, a blueprint is like a collection of agent skills interwoven with deterministic code so that particular subtasks can be handled most appropriately.

If you know a step should always happen the same way, don’t let an LLM decide how to do it. Let the agent handle the ambiguous parts, and hardcode the rest (this can also dramatically reduce token cost).
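Here is a minimal Python sketch of that mix, with plain functions as nodes. It is my own illustration of the pattern and not Stripe's implementation. The node names and the state dictionary are hypothetical, and the "agent" node is a stub where a real system would run a model loop.

```python
from typing import Callable, List

Step = Callable[[dict], dict]

def run_blueprint(steps: List[Step], state: dict) -> dict:
    """Run each node in order, threading shared state through."""
    for step in steps:
        state = step(state)
    return state

def checkout_branch(state: dict) -> dict:
    # Deterministic node: always happens the same way, so no LLM is involved.
    return {**state, "branch": f"agent/{state['task_id']}"}

def implement_change(state: dict) -> dict:
    # Agentic node: a stand-in for an agent loop. A real one would call a model.
    return {**state, "diff": f"<changes for: {state['description']}>"}

def run_linters(state: dict) -> dict:
    # Deterministic node again. This stub only checks that a diff exists.
    return {**state, "lint_ok": "diff" in state}

result = run_blueprint(
    [checkout_branch, implement_change, run_linters],
    {"task_id": "42", "description": "rename config flag"},
)
```

Only `implement_change` would spend tokens; the steps around it are ordinary code that behaves the same on every run.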

Second, their centralized MCP server:

We built a centralized internal MCP server called Toolshed, which makes it easy for Stripe engineers to author new tools and make them automatically discoverable to our agentic systems. All our agentic systems are able to use Toolshed as a shared capability layer; adding a tool to Toolshed immediately grants capabilities to our whole fleet of hundreds of different agents.

A shared tool layer that all agents can use… 500 tools, one server, hundreds of agents. Very cool idea.
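The core idea, register once and every agent can discover it, fits in a few lines of Python. This sketch is an in-process registry for illustration only. It does not implement MCP and says nothing about how Toolshed is built, and the `lookup_ticket` tool is a made-up stub.

```python
from typing import Callable, Dict

class ToolRegistry:
    """One shared place where tools are registered and discovered."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable] = {}

    def tool(self, func: Callable) -> Callable:
        # Decorator: registering a function makes it visible to every caller.
        self._tools[func.__name__] = func
        return func

    def describe(self) -> Dict[str, str]:
        # What an agent would see when it lists the available tools.
        return {name: (f.__doc__ or "").strip() for name, f in self._tools.items()}

    def call(self, name: str, **kwargs):
        return self._tools[name](**kwargs)

registry = ToolRegistry()

@registry.tool
def lookup_ticket(ticket_id: str) -> str:
    """Return a one-line summary for a ticket (stubbed)."""
    return f"summary of {ticket_id}"

# Two "agents" are just two callers sharing the same registry.
agent_a_view = registry.describe()
agent_b_result = registry.call("lookup_ticket", ticket_id="T-1")
```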

And third, what they call “shifting feedback left”:

We have pre-push hooks to fix the most common lint issues. A background daemon precomputes lint rule heuristics that apply to a change and caches the results of running those lints, so developers can usually get lint fixes in well under a second on a push.

If you can catch a problem before it hits CI, do it there. A sub-second lint fix on push is better than a 10-minute CI failure, whether you’re a person or an LLM burning tokens.
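The caching half of that idea can be sketched in Python as a lookup keyed by file content. This is an assumption-heavy toy, not Stripe's daemon: the `LintCache` class and the trailing-whitespace rule are invented for illustration, and a real setup would precompute results in the background rather than on first request.

```python
import hashlib

class LintCache:
    """Cache lint results by file content so repeat checks are instant."""

    def __init__(self, lint_fn):
        self._lint_fn = lint_fn
        self._results = {}
        self.misses = 0

    def check(self, source: str) -> list:
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if key not in self._results:
            self.misses += 1  # only unseen content pays the lint cost
            self._results[key] = self._lint_fn(source)
        return self._results[key]

def trailing_whitespace(source: str) -> list:
    # Stand-in lint rule: report 1-based line numbers with trailing whitespace.
    return [i for i, line in enumerate(source.splitlines(), 1) if line != line.rstrip()]

cache = LintCache(trailing_whitespace)
first = cache.check("a = 1 \nb = 2\n")   # computed on first sight
second = cache.check("a = 1 \nb = 2\n")  # served from the cache
```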

So much of Stripe’s agent success is built on top of investments they made for human developer productivity. Good dev environments, fast feedback loops, shared tooling. The agents benefit from all of it, and developers remain in control.

Project Brains: Organizing Complex Initiatives for AI-Assisted Work

I’ve written before about how I use AI for product work and how that workflow evolved with slash commands and skills. This post focuses on how to maintain context for complex, long-running projects.

The Problem: Context Fragmentation

When I’m working on a major initiative, relevant information ends up scattered everywhere: PRDs in one tool, tickets in another, meeting notes in a third, plus emails and chat threads. Every time I return to a project after a few days, I spend time reconstructing where things stand.

AI assistants can make this worse because each conversation starts fresh. I can reference files, but the model doesn’t know which files matter for this project, what decisions we’ve already made, or what questions remain open. I end up re-explaining context that should be obvious.

Project brains solve this by creating a dedicated folder for each major initiative with a standard structure that both humans and AI can navigate.

What a Project Brain Looks Like

The structure looks like this:

projects/[project-name]/
├── CONTEXT.md        # The hub: status, stakeholders, decisions, open questions
├── artifacts/        # PRDs, specs, designs, one-pagers
├── decisions/        # Decision logs with rationale and alternatives
├── research/         # Customer feedback, data analysis, technical investigation
└── meetings/         # Meeting notes related to this project

The CONTEXT.md file is a living document that answers the questions I’d need to answer every time I pick up a project:

  • What’s the current status?
  • Who are the stakeholders and what do they care about?
  • What decisions have we made and why?
  • What questions are still open?
  • Where are the relevant artifacts?

When I start a conversation about a project, I point the AI to the project folder. It reads CONTEXT.md first, then can drill into specific artifacts as needed. The model immediately knows the project state without me explaining it.

A Real Example

Say I’m working on adding observability to an internal platform—something that needs coordination across multiple teams over several months. The CONTEXT.md includes:

  • Quick reference table: Status, PM, engineering lead, target dates, links to the PRD and relevant tickets. Everything I’d need to orient myself.
  • Problem statement: A clear articulation of the user pain. In this case: “Platform incidents go undetected until users report them, and debugging takes hours due to lack of visibility.”
  • Success metrics with baselines and targets: Things like uptime targets, reduction in mean time to resolution, and alert accuracy. These anchor every conversation about scope.
  • Key decisions made: A table showing what was decided, when, why, and what alternatives we considered. When someone asks “why aren’t we including component X in v1?”, the answer is already documented.
  • Open questions: A checklist of unresolved issues. This prevents the AI from assuming things are settled when they’re not.
  • Links: Direct paths to the PRD, spec, analysis docs, and related pages.

The decisions/ folder contains detailed decision logs for significant choices. The research/ folder holds whatever analysis informed the project direction. The meetings/ folder captures sync notes that would otherwise disappear into Gemini notes in a Google Drive… somewhere.

When to Create a Project Brain

Not every task needs this treatment. I create a project brain when:

  • The work spans multiple weeks or months. Short-term tasks don’t need the overhead.
  • Multiple stakeholders are involved. If I need to coordinate with other teams, having a single source of context helps.
  • Decisions require documented rationale. If someone might ask “why did you do it this way?” later, a decision log is worth the investment.
  • The project crosses team boundaries. Cross-functional initiatives benefit from dedicated context that doesn’t live in any one team’s space.

For simpler work, I use a flatter folder structure with documents organized by type. Project brains are for the complex initiatives where context fragmentation is a real cost.

How AI Uses Project Brains

The payoff comes when I’m working with AI on project-specific tasks. A few examples:

  • Preparing for a meeting: “Read the CONTEXT.md in the [project] folder. I have a spec review meeting tomorrow. What are the open questions I should raise?”
  • Drafting an update: “Based on the project context, draft a status update for leadership. Focus on progress since the start of the month and remaining blockers.”
  • Decision analysis: “We need to decide whether to include [component] in scope. Read the research folder and the current CONTEXT.md. What would you recommend and why?”

The AI knows the project history, the stakeholders, the constraints. Its recommendations are grounded in documented context rather than generic best practices.

Maintaining the Project Brain

The value depends on keeping CONTEXT.md current. I’ve found a few practices help:

  • Update after significant events. When a decision is made, a meeting happens, or the status changes, update the file immediately. “I’ll do it later” means it won’t happen. LLMs are great at making these updates, so you can simply say “update relevant files based on the session we just concluded.”
  • Move open questions to resolved. When a question gets answered, don’t delete it. Mark it resolved and note the answer. This preserves the reasoning trail.
  • Link, don’t duplicate. CONTEXT.md should point to artifacts, not contain them. Keep PRDs in the artifacts folder. Keep meeting notes in the meetings folder. The context file is a hub, not a repository.
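The "mark it resolved, don’t delete it" practice is mechanical enough to sketch in Python. The checkbox syntax (`- [ ]` and `- [x]`) and the "(Resolved: ...)" suffix are assumptions about how the checklist is written, and the example questions and answer are made up.

```python
def resolve_question(context_md: str, question: str, answer: str) -> str:
    """Mark an open checklist item resolved and record the answer."""
    open_item = f"- [ ] {question}"
    resolved_item = f"- [x] {question} (Resolved: {answer})"
    lines = context_md.splitlines()
    if open_item not in lines:
        raise ValueError(f"open question not found: {question!r}")
    # Rewrite only the matching line; everything else is kept as-is.
    return "\n".join(resolved_item if line == open_item else line for line in lines)

# Made-up checklist content, for illustration only.
context = "## Open questions\n- [ ] Include component X in v1?\n- [ ] Who owns alert routing?"
updated = resolve_question(context, "Include component X in v1?", "No, deferred to v2")
```

Raising an error when the question is missing is a deliberate choice: a silent no-op would let the file drift from reality without anyone noticing.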

Scaffolding New Projects

I have a slash command that scaffolds new project brains:

/new-project platform-observability

This creates the folder structure, generates a CONTEXT.md from a template, and fills out a rough draft based on whatever context I provide. Removing the friction of setup means I’m more likely to actually use the system. You can view the command here.

The template includes the standard sections (Quick Reference, Problem Statement, Success Metrics, etc.) with placeholder text. I fill in what I know and mark other sections as TBD. Even an incomplete project brain is more useful than scattered notes.
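For anyone who would rather script the scaffolding than use a slash command, a minimal Python equivalent might look like the sketch below. The folder names come from the structure shown earlier and the section names from the template described above. The placeholder wording, the `new_project` function, and the choice never to overwrite an existing file are my assumptions, and the sketch skips the step where the AI drafts content from provided context.

```python
from pathlib import Path

SUBFOLDERS = ["artifacts", "decisions", "research", "meetings"]

# Section names follow the template described above; the placeholder
# wording is an assumption.
TEMPLATE = """# {name}

## Quick Reference
TBD

## Problem Statement
TBD

## Success Metrics
TBD

## Key Decisions
TBD

## Open Questions
TBD

## Links
TBD
"""

def new_project(root: Path, name: str) -> Path:
    """Create projects/<name>/ with the standard subfolders and CONTEXT.md."""
    project = root / "projects" / name
    for sub in SUBFOLDERS:
        (project / sub).mkdir(parents=True, exist_ok=True)
    context = project / "CONTEXT.md"
    if not context.exists():  # never overwrite an existing hub file
        context.write_text(TEMPLATE.format(name=name), encoding="utf-8")
    return project
```

Calling `new_project(Path("."), "platform-observability")` would create the same layout as the command above, with every section marked TBD.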

What I’ve Learned

A few observations from using this approach:

  • Structure beats volume. A well-organized project brain with sparse content is more useful than a folder full of undifferentiated documents. The AI (and future me) can navigate structure. It can’t navigate chaos.
  • Decision logs compound. Every decision I document now saves time later. When stakeholders ask “why didn’t we do X?”, I can point to a decision log instead of reconstructing my reasoning from memory.
  • CONTEXT.md is for humans too. I originally built this for AI assistance, but I reference these files constantly in my own work. The discipline of maintaining project context keeps both me and the AI oriented.
  • The folder structure is flexible. Some projects need more subfolders (like research/customer-interviews/). Some need fewer. The template is a starting point, not a requirement.

This approach requires discipline to maintain, and the upfront setup takes time. But for complex initiatives where context fragmentation is a real problem, project brains have been worth the investment. The AI becomes a more useful collaborator when it has access to the same context I do.

I’m still iterating on the structure. I suspect the template will look different six months from now as I learn what sections actually get used and which ones I skip every time. The point isn’t to get the folder structure perfect, but to stop losing context between conversations and start building on what you already know.

The A.I. Disruption Has Arrived, and It Sure Is Fun

Paul Ford writes about vibe coding for the NYT (gift link) and what happens when software suddenly becomes cheap and fast to ship:

There are many arguments against vibe coding through A.I. It is an ecological disaster, with data centers consuming billions of gallons of water for cooling each year; it can generate bad, insecure code; it creates cookie-cutter apps instead of real, thoughtful solutions; the real value is in people, not software. All of these are true and valid. But I’ve been around too long. The web wasn’t “real” software until it was. Blogging wasn’t publishing. Big, serious companies weren’t going to migrate to the cloud, and then one day they did.

And then he brings it home in a way that continues to make him one of my favorite web writers:

The simple truth is that I am less valuable than I used to be. It stings to be made obsolete, but it’s fun to code on the train, too. And if this technology keeps improving, then everyone who tells me how hard it is to make a report, place an order, upgrade an app or update a record — they could get the software they deserve, too. That might be a good trade, long term.

We can grieve what we lost while also being optimistic about the future AI is unlocking for all of us. It’s uncomfortable, but that’s OK. All technological shifts are.

The Father-Daughter Divide

Isabel Woodford has a research-heavy essay in The Atlantic about why dads and daughters crave closeness but struggle to find it. According to the essay, 28% of American women are estranged from their father, and even where relationships are intact, they tend to be thinner (more transactional, less emotionally honest) than daughters want.

At the root of the modern father-daughter divide seems to be a mismatch in expectations. Fathers, generally speaking, have for generations been less involved than mothers in their kids’ (and especially their daughters’) lives. But lots of children today expect more: more emotional support and more egalitarian treatment. Many fathers, though, appear to have struggled to adjust to their daughters’ expectations. The result isn’t a relationship that has suddenly ruptured so much as one that has failed to fully adapt.

And the psychological explanation that cuts deepest:

“What generates closeness is another person’s vulnerability,” Coleman explained, and dads may not be ready for that.

Daughters aren’t asking for grand gestures or dramatic change—they’re asking for their fathers to show up emotionally. Which turns out to be hard for a lot of men who were raised to see that kind of openness as weakness.