
Posts tagged “product management”

The Product Leader’s Influence on the World We All Will Live in

Petra Wille shares a practical example of brain fry from her coaching work:

The product leaders and CPOs I coach tell me their people are completely fried before lunch—after a morning of generating content and reviewing outputs in Claude, Gemini, and ChatGPT, they’re just done. Adapting to this new type of work doesn’t make them more productive because they’re out of energy and brainpower by noon.

So conversations about how we actually work—what a sustainable rhythm looks like for humans in this new setup—still need to happen.

This has become a pretty common complaint/concern among people I talk to, and it gets me too. I’ve been sitting on posting this link because I wanted to include some kind of proposal but… I got nothing. Just agreement with Petra that we really really need to figure out how to work in this new world in a way that avoids mass burnout.

You're Worse at Your Job Because You Care Too Much

Yes, it’s a clickbaity title, but if you read this as an essay about what to care about at work, it has some good reminders like this:

“Care less” is directionally right, but let’s get more specific. The real shift is learning to place your care deliberately — to get good at telling the difference between what’s strategically important and what’s just noisy. A lot of what happens inside companies is frustrating without being important. Reacting to a messy call that you personally wouldn’t have made as if it’s a strategic risk is what drains you. So is holding on to every detail as if it’s existential. Not everything deserves to be treated with equal importance. A gut check that helps: Will this matter in a year? If not, it probably doesn’t deserve much energy now. What’s the worst-case scenario? Often, it’s not that bad.

How to stay relevant when the PM role keeps rewriting itself

Melissa Perri chimes in on how AI is changing the product role, and makes the case for measuring PMs by decisions changed and outcomes shipped, not by tickets written and docs generated:

If you are a PM, stop measuring your productivity by how many tickets you wrote, how many pages of documentation you spun up, or how fast you closed the loop on the last sprint. That work is going to keep getting easier.

Measure your productivity by how often you changed a decision that mattered, how often you saw around a corner, how often a senior leader walked out of a room thinking differently because of something you said. How often your shipped features translate into real customer outcomes is what matters.

Everything I read is saying the same thing right now: judgment, customer understanding, and the ability to change a senior leader’s mind in a room are the skills that AI can’t touch. I’m not necessarily disagreeing, but I do think that narrative misses a big new skill. I wrote about this in What actually changed about being a PM:

I was talking to my wife the other day about what I’m doing, and she asked the obvious question: “Why are you automating your job away?” My answer: the people who automate their own jobs away are the ones who become more valuable, because the craft is now in orchestration — setting up the layers so the AI does the right thing.

I also continue to think about this quote from Org Design in the Age of AI and how the focus is shifting from “information movers” to builders:

The old PM spent most of their energy making ideas legible to other people. The new PM validates directly — prototyping, running data analyses, generating first-pass implementations. […] The managers who thrive will be the ones whose real contribution was always judgment, coaching, and navigating ambiguity — not routing information.

Product Roadmaps: How the Best Product Teams Plan for Uncertainty

I’m a big fan of Now/Next/Later roadmaps, and I think the format adapts particularly well to an AI-assisted world, so I was curious to read Teresa’s Take on different roadmap models. It’s a fun trip through different prioritization frameworks, and I do like her reframing of the Now/Next/Later approach:

Here’s what I’ve seen work best: Take the Now Next Later format, but instead of filling every column with features at different levels of detail, change the type of content as you move across columns. […]

Specifically, I list solutions in the Now column, opportunities in the Next column, and outcomes in the Later column.

The Slide

The single biggest challenge for new managers — giving up the responsibility for the product… for the building. Learning how to give accountability for projects of significance to the team. It’s an essential set of complex skills involving trust, communication, and, most importantly, judgment. Failure to understand delegation is failing to be a leader. Senior or not.

— Michael Lopp, The Slide

AI Prototyping Is Changing How We Build Products at Uber

There is no doubt that this post was at least 80% written by AI, but I’m not even super mad about it because that’s just the way of the world now, and the summary it generated of how Uber works is actually legit interesting:

A prototype without a PRD can drift away from the problem the team intends to solve. A PRD without a prototype can remain abstract, leaving room for inconsistent interpretations. […] If going from idea to prototype is now fast and cheap, the PRD can no longer be the primary place where ideas are defined. Its value increasingly lies in capturing intent, tradeoffs, success metrics, and decisions.

The PRD as an artifact is in the spotlight right now in a way that I think is really healthy. Should it remain but change its JTBD? Should it be an eval instead? Who knows. Let’s figure it out together…

What actually changed about being a PM

I have decided that in this new AI era I will be practicing FDD. Fear-Driven Development. Every time I send a pull request, which happens a lot now, I'm terrified of an engineer sending it back to me and asking me to please stay in my lane and stop sending them slop. So I plan, write specs and implementation plans, test thoroughly, and I don't trust the agent's inevitable confidence.

I'll come back to that, but let me first frame what this post is about. The loudest take on PM work right now is that AI is collapsing the role — that we're one product cycle away from redundancy, or being reduced to prompt jockeys. That hasn't been my experience at all. The job got more hands-on, harder (brain fry is real), but also a lot more fun. What follows is what actually shifted for me over the last 5 months at Cloudflare, what didn't, and a couple of things I got wrong.


Evals Are the New PRD

Braintrust makes a good case (apologies for the X.com link…) for rethinking how PMs work on AI products: the eval replaces the PRD.

An eval is a structured, repeatable test that answers one question. Does my AI system do the right thing? You define a set of inputs along with expected outputs, run them through your AI system, and score the results using algorithms or AI judges.

The eval becomes both the spec and the acceptance criteria. The directive to engineering:

“Here is the eval. Make this number go up.”

That’s very different to how most teams work today, but I can definitely see the industry moving this way. Product usage generates signals, observability captures them, and evals turn them into improvement targets. The PM’s job is to define what “good” looks like in code and curate the data that reveals what “bad” looks like.

The PM skills that transfer are the same ones that always mattered — discovering needs and opportunities, and making judgment calls about what to build for business value. The difference is that instead of a document that describes the intent, you have a test suite that encodes it.
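To make “a test suite that encodes intent” concrete, here is a minimal sketch of an eval in Python. It assumes the simplest possible setup: a few hypothetical input/expected-output pairs, an exact-match scorer standing in for the “algorithms or AI judges” Braintrust mentions, and a toy keyword classifier standing in for the AI system. None of this is Braintrust’s API; every name is made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    input: str      # what we send to the AI system
    expected: str   # what "good" looks like for that input

def exact_match(output: str, expected: str) -> float:
    """Algorithmic scorer: 1.0 if the normalized strings match, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0

def run_eval(system: Callable[[str], str], cases: list[EvalCase],
             scorer: Callable[[str, str], float] = exact_match) -> float:
    """Run every case through the system and return the mean score."""
    if not cases:
        raise ValueError("an eval needs at least one case")
    scores = [scorer(system(case.input), case.expected) for case in cases]
    return sum(scores) / len(scores)

# Hypothetical toy "AI system": labels a ticket by keyword.
def toy_classifier(ticket: str) -> str:
    return "logging gap" if "missing logs" in ticket.lower() else "other"

# Hypothetical eval cases; in practice these would come from curated production data.
cases = [
    EvalCase("Customer reports missing logs since Tuesday", "logging gap"),
    EvalCase("Dashboard totals differ from the CSV export", "data discrepancy"),
    EvalCase("MISSING LOGS for one zone", "logging gap"),
]

score = run_eval(toy_classifier, cases)
print(f"eval score: {score:.2f}")  # the number to make go up
```

The mean score is the “number” in “make this number go up.” Swapping the scorer for an LLM judge would change how each case is graded, but not the shape of the loop.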

AI might actually need more PMs

Amol Avasare, Anthropic’s Head of Growth, said on Lenny’s Podcast that maybe PM jobs are not going to shrink as much as we may have thought…

Rather than immediately replacing PMs, AI is currently increasing engineering leverage the fastest, which creates new pressure on PMs and designers. In larger organizations, that may actually increase the value of PMs who can guide priorities, manage alignment, and sharpen decision-making—especially as engineers take on more “mini-PM” responsibilities.

From Assistant to Collaborator: How My AI Second Brain Grew Up

Over the past few months I’ve been writing about how I use AI for product work. The first post covered the philosophy: context files, opinionated prompts, and how to compose the right inputs for each task. The second added slash commands and daily summaries. The third was a hands-on setup guide. And the fourth introduced project brains for keeping complex initiatives organized.

This post covers a different kind of change. The earlier additions were incremental: more commands, better context, smoother workflows. What changed recently feels more like a threshold. The system went from a tool I invoke for specific tasks to something closer to a collaborator I dispatch to do real work. Three capabilities drove that shift: multi-agent orchestration, cross-session memory, and the encoding of domain expertise into the system itself.

Multi-Agent Workflows

The clearest example is customer escalation investigations. As a PM for data products, I regularly investigate customer-reported issues: logging gaps, data discrepancies, behavior that doesn’t match expectations. These investigations require pulling information from multiple sources and cross-referencing it all into an analysis that engineering can act on.

I built a slash command that handles this as a multi-phase workflow. When I run it with a ticket ID, here’s what happens:

  1. The system reads the customer ticket, extracts the core problem, identifies which product area is involved, and classifies the issue type.
  2. Three specialist agents launch simultaneously, each focused on a different data source. One searches the codebase for the relevant logic and recent changes. Another searches for related tickets and prior incidents across projects. A third checks documentation and internal wiki pages for relevant operational context.
  3. A fourth agent receives the combined findings and produces database queries that can confirm or refute the working hypothesis.
  4. The system combines everything into a structured analysis: issue classification, root cause anchored in code where possible, customer impact, and recommended next steps.
  5. A blind validator independently re-fetches every source cited in the draft to verify the claims hold up. Then an adversarial challenger looks for alternative explanations and tests whether the classification is correct.
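The phases above map fairly naturally onto async control flow. Here is a rough sketch in Python with asyncio, assuming hypothetical stub functions in place of real agent calls: phase 1 runs alone, the three specialists fan out in parallel with asyncio.gather, and everything after that waits for their combined findings. It illustrates the dependency structure only; it is not the actual command file.

```python
import asyncio

# Hypothetical stand-ins for agent calls; a real version would call a model API.
async def classify(ticket_id: str) -> dict:
    return {"ticket": ticket_id, "area": "log delivery", "type": "logging gap"}

async def search_code(issue: dict) -> str:
    await asyncio.sleep(0.01)  # simulate I/O latency
    return f"code findings for {issue['area']}"

async def search_tickets(issue: dict) -> str:
    await asyncio.sleep(0.01)
    return "related tickets and prior incidents"

async def search_docs(issue: dict) -> str:
    await asyncio.sleep(0.01)
    return "docs and wiki context"

async def draft_queries(findings: list[str]) -> list[str]:
    return ["-- query to confirm or refute the working hypothesis"]

def synthesize(issue: dict, findings: list[str], queries: list[str]) -> dict:
    return {"classification": issue["type"], "evidence": findings, "queries": queries}

async def validate(draft: dict) -> dict:
    # Placeholder: the blind validator and adversarial challenger would run here.
    draft["validated"] = True
    return draft

async def investigate(ticket_id: str) -> dict:
    issue = await classify(ticket_id)                  # phase 1: everything depends on it
    findings = await asyncio.gather(                   # phase 2: three specialists in parallel
        search_code(issue), search_tickets(issue), search_docs(issue)
    )
    queries = await draft_queries(list(findings))      # phase 3: waits for all specialists
    draft = synthesize(issue, list(findings), queries) # phase 4: structured analysis
    return await validate(draft)                       # phase 5: validation

report = asyncio.run(investigate("TICKET-123"))  # hypothetical ticket ID
print(report["classification"], len(report["evidence"]))
```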

The output is a document I can review with an engineering colleague or paste into a chat thread. It includes a confidence assessment and a data collection status table showing what was checked and what was unavailable, along with how the analysis compensated for gaps.

The command file that orchestrates all of this isn’t prompting in the traditional sense. It defines which agents to dispatch, what information each one needs, when to wait for results before proceeding, and how to handle failures gracefully. Writing this felt more like designing a workflow than writing a prompt.
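Handling failures gracefully and producing the data collection status table could look something like this sketch, again with hypothetical stub agents (one of which deliberately times out). Passing return_exceptions=True to asyncio.gather means a failed source comes back as a row marked unavailable instead of aborting the whole investigation.

```python
import asyncio

# Hypothetical stub agents; the ticket search simulates an unavailable source.
async def search_code(issue: dict) -> str:
    return "found recent change in delivery pipeline"

async def search_tickets(issue: dict) -> str:
    raise TimeoutError("ticket search timed out")

async def search_docs(issue: dict) -> str:
    return "runbook describes retry behaviour"

async def collect(issue: dict) -> list[tuple[str, str, str]]:
    sources = {"codebase": search_code, "tickets": search_tickets, "docs": search_docs}
    # With return_exceptions=True, exceptions are returned in place of results.
    results = await asyncio.gather(*(fn(issue) for fn in sources.values()),
                                   return_exceptions=True)
    rows = []
    for name, result in zip(sources, results):
        if isinstance(result, Exception):
            rows.append((name, "unavailable", f"{type(result).__name__}: {result}"))
        else:
            rows.append((name, "checked", result))
    return rows

rows = asyncio.run(collect({"area": "log delivery"}))
for name, status, detail in rows:
    print(f"{name:<10} {status:<12} {detail}")
```

The synthesis step can then read the status column to decide which conclusions need hedging.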

I’ve applied the same pattern to other tasks. A “fix feasibility” command evaluates whether a ticket describes a code change simple enough for a PM to implement with AI coding assistance, and produces an implementation brief if the answer is yes. The specific use cases differ, but the architecture is the same: break the problem into specialist tasks that run in parallel, then synthesize and validate the results.

Cross-Session Memory

AI conversations are stateless by default. Every new session starts from zero, which means re-explaining context that should already be established. Over a few weeks of working on the same projects, this friction adds up.

I addressed this with a four-layer memory system:

  • The first layer is stable facts: a compact file that captures the current state of all active work, including project status, recent decisions, and environment constraints. This is the primary orientation file. When I start a session, the AI reads it and immediately knows what’s in flight.
  • The second is a session log: a reverse-chronological list of handoff notes. Each entry records what happened in a session and what threads remain open. The last three entries give enough context to pick up where I left off.
  • Third, a corrections file. This holds behavioral fixes for things the AI consistently gets wrong. It’s a staging area that should shrink over time as fixes get promoted elsewhere.
  • And finally, a decisions log: a cross-cutting record of decisions that don’t belong to a specific project. Each entry captures context and rationale so I don’t relitigate settled questions.

Two commands manage this. /session-start loads all four files and presents a brief summary of current state and recent sessions. /session-end reviews the conversation, writes a handoff note, and then checks whether any learnings should be promoted to infrastructure.
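A minimal sketch of what /session-start and /session-end might do with the files, assuming the four layers live as Markdown files in one directory (the file names are hypothetical) and that each session log entry starts with a `## <date>` heading. The real commands are AI-driven; here the handoff summary is simply passed in as a string.

```python
import re
import tempfile
from datetime import date
from pathlib import Path
from typing import Optional

# Hypothetical file names for the four memory layers.
LAYERS = ["stable-facts.md", "session-log.md", "corrections.md", "decisions.md"]

def session_start(memory_dir: Path) -> str:
    """Read all four layers and return a short orientation summary."""
    texts = {}
    for name in LAYERS:
        path = memory_dir / name
        texts[name] = path.read_text() if path.exists() else ""
    # The log is reverse-chronological, so the first three entries are the latest.
    entries = [e for e in re.split(r"^## ", texts["session-log.md"], flags=re.MULTILINE)
               if e.strip()]
    recent = [entry.splitlines()[0] for entry in entries[:3]]
    # This sketch only summarizes two layers; corrections and decisions are just loaded.
    return texts["stable-facts.md"].strip() + "\nRecent sessions: " + ", ".join(recent)

def session_end(memory_dir: Path, summary: str, open_threads: list[str],
                day: Optional[date] = None) -> None:
    """Prepend a handoff note so the newest entry is always first."""
    day = day or date.today()
    log = memory_dir / "session-log.md"
    previous = log.read_text() if log.exists() else ""
    threads = "\n".join(f"- {t}" for t in open_threads) or "- none"
    entry = f"## {day.isoformat()}\n{summary}\nOpen threads:\n{threads}\n\n"
    log.write_text(entry + previous)

with tempfile.TemporaryDirectory() as tmp:
    memory = Path(tmp)
    (memory / "stable-facts.md").write_text("Active: escalation workflow v2\n")
    for d in range(1, 5):
        session_end(memory, f"Session {d} notes", ["follow up"], date(2025, 1, d))
    summary = session_start(memory)

print(summary)
```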

“Promote to infrastructure” means taking something learned during a session and baking it into the files the agent actually reads. A correction about how to handle a specific edge case in escalation investigations might start in the corrections file, then get promoted into the escalation command or a domain skill once it’s validated. The corrections file shrinks over time as that knowledge moves into the right places.

This creates a loop where the system improves its own instructions. I approve every change, so it’s not self-modifying in a creepy way. But in practice each work session can make the next one slightly better, and the compound effect over weeks is noticeable.
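Stripped of the AI parts, the promotion step with its approval gate might reduce to something like this sketch. The `approve` callback stands in for the human review; the file names and the one-correction-per-line format are assumptions for illustration.

```python
import tempfile
from pathlib import Path
from typing import Callable

def promote_correction(corrections: Path, target: Path, correction: str,
                       approve: Callable[[str, Path], bool]) -> bool:
    """Move one correction line from the staging file into a file the agent reads.

    `approve` stands in for the human review step; nothing changes unless it returns True.
    """
    lines = corrections.read_text().splitlines()
    if correction not in lines:
        raise ValueError(f"not in corrections file: {correction!r}")
    if not approve(correction, target):
        return False
    with target.open("a") as f:
        f.write(correction + "\n")
    # The staging file shrinks as knowledge moves to where it belongs.
    remaining = [line for line in lines if line != correction]
    corrections.write_text("".join(line + "\n" for line in remaining))
    return True

with tempfile.TemporaryDirectory() as tmp:
    corrections = Path(tmp) / "corrections.md"     # hypothetical staging file
    command = Path(tmp) / "escalation-command.md"  # hypothetical command file
    corrections.write_text("- Check job state before assuming a delivery gap\n"
                           "- Use UTC in all queries\n")
    command.write_text("# Escalation investigation\n")
    promoted = promote_correction(corrections, command,
                                  "- Check job state before assuming a delivery gap",
                                  approve=lambda text, path: True)
    staging_after = corrections.read_text()
    command_after = command.read_text()
```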

Domain Expertise

The earlier posts described skills like pm-thinking, which applies product methodology (problem-first thinking, measurable outcomes) to any PM-related conversation. That’s useful, but generic. It works the same way regardless of what product you’re building.

The bigger shift was building skills that encode institutional knowledge about specific products. I now have skills for each major product area my team owns: log delivery, analytics, audit logs, alerting, and data pipelines. Each skill contains the product’s architecture and common failure modes, along with which code repositories to search and which database tables hold relevant data.

This is what makes the multi-agent workflows useful. When the code investigator agent examines an escalation about missing logs, the domain skill tells it which service handles job state and which repository contains the delivery pipeline. It also flags recent architectural changes that might be relevant. Without that context, the agent produces plausible-sounding analysis that misses the specific details engineering needs.
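Here is one hypothetical way to structure a domain skill as data so each agent gets only the slice of context it needs. The agent role names, product details, repositories, and tables are invented placeholders, not real systems.

```python
from dataclasses import dataclass, field

@dataclass
class DomainSkill:
    product_area: str
    architecture: str
    repositories: list[str]
    tables: list[str]
    failure_modes: list[str]
    recent_changes: list[str] = field(default_factory=list)

    def context_for(self, agent: str) -> str:
        """Return the slice of product knowledge a given agent role needs."""
        lines = [f"Product area: {self.product_area}", f"Architecture: {self.architecture}"]
        if agent == "code_investigator":
            lines.append("Search these repositories: " + ", ".join(self.repositories))
            lines += [f"Recent change: {c}" for c in self.recent_changes]
        elif agent == "query_writer":
            lines.append("Relevant tables: " + ", ".join(self.tables))
        lines += [f"Known failure mode: {m}" for m in self.failure_modes]
        return "\n".join(lines)

# All values below are invented placeholders.
SKILLS = {
    "log delivery": DomainSkill(
        product_area="log delivery",
        architecture="a scheduler tracks job state; a delivery pipeline pushes batches",
        repositories=["example/job-scheduler", "example/delivery-pipeline"],
        tables=["delivery_jobs", "delivery_attempts"],
        failure_modes=["destination credentials expire and jobs silently pause"],
        recent_changes=["batching logic moved into the delivery pipeline"],
    ),
}

def build_prompt(agent: str, area: str, task: str) -> str:
    skill = SKILLS.get(area)
    # Without a skill, say so explicitly instead of letting the agent guess.
    context = (skill.context_for(agent) if skill
               else "No domain skill for this area; treat findings as low confidence.")
    return f"{context}\n\nTask: {task}"

print(build_prompt("code_investigator", "log delivery", "Find why logs stopped arriving"))
```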

Now every investigation that uses a skill validates or extends the knowledge it contains, and /session-end catches insights that should be added back.

How The Work Changes

The biggest change is in my own role. It’s gone from “write the right prompt” to “design the right process.” The escalation command is a workflow with phases, dependencies, and validation steps, and thinking about it that way beats trying to pack everything into a single conversation. A few other things I’ve noticed:

  • Validation has to be built in. The blind validator exists because agents make mistakes. They cite files that don’t exist, mischaracterize what code does, or draw conclusions the evidence doesn’t support. Catching those issues before they reach anyone else is the whole point.
  • Cross-session memory requires discipline. The system only works if I run /session-end after substantive sessions and keep stable facts current. When I skip it, the next session starts cold and I lose the compounding benefit. Automation helps, but the commitment to maintain the memory is mine.
  • And domain skills need regular maintenance. Products change. Code gets refactored, pipelines get rearchitected. Skills that aren’t periodically updated drift from reality. I haven’t solved this well yet. It’s still a manual process of noticing when a skill’s knowledge is stale and updating it.
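The simplest checks a blind validator can run are mechanical ones. This sketch assumes the draft records each citation as a file path plus a quoted snippet (a hypothetical format) and re-reads each file from scratch, flagging files that don’t exist and snippets that don’t appear. It catches fabricated citations, but not subtler mischaracterizations, which still need a model or a human.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path

# Hypothetical citation format: a path plus the text the draft says is in it.
@dataclass
class Citation:
    path: str     # file the draft cites, relative to the repo root
    snippet: str  # text the draft claims appears in that file

def validate_citations(repo_root: Path, citations: list[Citation]) -> list[tuple[str, str]]:
    """Re-read every cited file from scratch and report claims that don't hold up."""
    problems = []
    for c in citations:
        cited = repo_root / c.path
        if not cited.is_file():
            problems.append((c.path, "cited file does not exist"))
        elif c.snippet not in cited.read_text():
            problems.append((c.path, "quoted snippet not found in file"))
    return problems

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pipeline").mkdir()
    (root / "pipeline" / "retry.py").write_text("MAX_RETRIES = 3\n")
    draft_citations = [
        Citation("pipeline/retry.py", "MAX_RETRIES = 3"),  # holds up
        Citation("pipeline/retry.py", "MAX_RETRIES = 5"),  # mischaracterized
        Citation("pipeline/backoff.py", "def backoff("),   # file doesn't exist
    ]
    problems = validate_citations(root, draft_citations)

for path, issue in problems:
    print(f"{path}: {issue}")
```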

The system still makes mistakes. Multi-agent workflows are more thorough than single-prompt conversations, but they’re not infallible. The confidence assessment in the escalation output exists because sometimes the answer is “medium confidence, we couldn’t confirm this from the available data.” That honesty about limitations is more useful than false certainty.

Where This Is Going

I’m sure the specific commands and skills will look different in six months as I learn what works and what doesn’t. But the underlying pattern feels durable: compose specialist agents with deep domain context, validate their output, and feed learnings back into the system.

I’ve published updated files to the Product AI Public repo, including the session memory commands and a generalized version of the multi-agent escalation workflow. If you’re building something similar, those might be useful starting points.

None of these pieces does much on its own. It’s the way they feed each other that turned a pile of separate prompts into something I lean on every day.