No One Else Can Speak the Words on Your Lips

Ben Roy explains why prompting an LLM to write an essay misunderstands what writing actually is:

People fundamentally can’t prompt good essays into existence because writing is not a top-down exercise of applying knowledge you have upfront and asking an LLM to create something. AI agents also can’t create good essays for the same reason. Even though their step-by-step reasoning is more complex and iterative than human prompting, a chain of thought is still trying to accomplish a predefined goal. By contrast, real writing is bottom up. You don’t know what you want to say in advance. It’s a process of discovery where you start with a set of half-baked ideas and work with them in non-linear ways to find out what you really think.

I will continue to argue that for general business writing LLMs are fantastic if they are given the right context and guidance, and that it can save hours of work (with high quality results). But all my experiments with using LLMs for creative writing has so far fallen flat. Maybe—likely?—that will change within the next few months. But for now, the brain work this kind of writing requires remains. Not a bad thing imo.

12 April 2026

Zombie Flow

Derek Thompson goes into the history of the “flow” concept, and how tech and entertainment companies learned to simulate it without any of the substance psychologist Mihaly Csikszentmihalyi originally had in mind:

Algorithmic flow is flow without achievement, flow without challenge, flow without even volition… To be lost in the lazy river of algorithmic media is to be lost the current of life without a mind. Zombie flow.

Ten years ago the question was how to get into flow more often. Now it might be how to get out of the fake version fast enough to remember what the real one felt like.

10 April 2026

AI might actually need more PMs

Amol Avasare, Anthropic’s Head of Growth, said on Lenny’s Podcast that maybe PM jobs are not going to shrink as much as we may have thought…

Rather than immediately replacing PMs, AI is currently increasing engineering leverage the fastest, which creates new pressure on PMs and designers. In larger organizations, that may actually increase the value of PMs who can guide priorities, manage alignment, and sharpen decision-making—especially as engineers take on more “mini-PM” responsibilities.

10 April 2026

Eight years of wanting, three months of building with AI

Lalit Maganti writes about building a SQLite parser with AI — a project he’d been putting off for eight years, finished in three months. His comparison of AI coding to slot machines is uncomfortably familiar:

I found myself up late at night wanting to do “just one more prompt,” constantly trying AI just to see what would happen even when I knew it probably wouldn’t work. The sunk cost fallacy kicked in too: I’d keep at it even in tasks it was clearly ill-suited for, telling myself “maybe if I phrase it differently this time.”

Also, I agree that this is still true today, but I’m not convinced it will remain true beyond 2026:

AI is an incredible force multiplier for implementation, but it’s a dangerous substitute for design.

29 March 2026

Endgame for the open web

Anil Dash has a long essay on the state of the open web and not all of it rings true for me, but buried in the opening is a wonderful definition of what the open web actually is:

The open web is something extraordinary: anybody can use whatever tools they have, to create content following publicly documented specifications, published using completely free and open platforms, and then share that work with anyone, anywhere in the world, without asking for permission from anyone. Think about how radical that is.

It does feel like if the web got invented in 2026, it would not have been left as an open technology for long (see also AI and how much open source models are lagging).

23 March 2026

Negative space in writing

Tracy Durnell explores non-visual negative space—what happens when writing leaves room for the reader to think:

The current design trend of business and self-help style books is to use tons of subheadings and callout boxes and always, a list of the key points at the end of the chapter. While this is a highly skimmable format and often nice visual design, it essentially sucks the negative space out of the text — the places in which the reader might step back and consider their own examples or anticipate what point the author is trying to make. There’s no time for hunches here.

And:

The negative space of the text helps build the aesthetic experience. Small details flavor the text with a sense of reality. Drawing out events — leaving questions unresolved and conflicts unsettled — can build tension. And textual space creates a gap for the reader to make the personal decodings of the text that build meaning.

Not everything has to get to the point immediately. Sometimes the best thing a writer can do is leave room for the reader to get there on their own. I’m thinking about this because I’m currently reading The Will of the Many. It is slow, and long, and one of the best books I’ve read in ages. The negative space is probably a big reason why I love it so much.

23 March 2026

Agentic manual testing

Simon Willison has a practical guide on manual testing with coding agents. Two tips I’ve already started using:

It’s still quick for an agent to write out a demo file and then compile and run it. I sometimes encourage it to use /tmp purely to avoid those files being accidentally committed to the repository later on.

And:

If an agent finds something that doesn’t work through their manual testing, I like to tell them to fix it with red/green TDD. This ensures the new case ends up covered by the permanent automated tests.

14 March 2026

From Assistant to Collaborator: How My AI Second Brain Grew Up

Over the past few months I’ve been writing about how I use AI for product work. The first post covered the philosophy: context files, opinionated prompts, and how to compose the right inputs for each task. The second added slash commands and daily summaries. The third was a hands-on setup guide. And the fourth introduced project brains for keeping complex initiatives organized.

This post covers a different kind of change. The earlier additions were incremental: more commands, better context, smoother workflows. What changed recently feels more like a threshold. The system graduated from a tool I invoke for specific tasks to something closer to a collaborator I dispatch to do real work. Three capabilities drove that shift: multi-agent orchestration, cross-session memory, and the encoding of domain expertise into the system itself.

Multi-Agent Workflows

The clearest example is customer escalation investigations. As a PM for data products, I regularly investigate customer-reported issues: logging gaps, data discrepancies, behavior that doesn’t match expectations. These investigations require pulling information from multiple sources and cross-referencing it all into an analysis that engineering can act on.

I built a slash command that handles this as a multi-phase workflow. When I run it with a ticket ID, here’s what happens:

The system reads the customer ticket, extracts the core problem, identifies which product area is involved, and classifies the issue type.
Three specialist agents launch simultaneously, each focused on a different data source. One searches the codebase for the relevant logic and recent changes. Another searches for related tickets and prior incidents across projects. A third checks documentation and internal wiki pages for relevant operational context.
A fourth agent receives the combined findings and produces database queries that can confirm or refute the working hypothesis.
The system combines everything into a structured analysis: issue classification, root cause anchored in code where possible, customer impact, and recommended next steps.
A blind validator independently re-fetches every source cited in the draft to verify the claims hold up. Then an adversarial challenger looks for alternative explanations and tests whether the classification is correct.

The output is a document I can review with an engineering colleague or paste into a chat thread. It includes a confidence assessment and a data collection status table showing what was checked and what was unavailable, along with how the analysis compensated for gaps.

The command file that orchestrates all of this isn’t prompting in the traditional sense. It defines which agents to dispatch, what information each one needs, when to wait for results before proceeding, and how to handle failures gracefully. Writing this felt more like designing a workflow than writing a prompt.

I’ve applied the same pattern to other tasks. A “fix feasibility” command evaluates whether a ticket describes a code change simple enough for a PM to implement with AI coding assistance, and produces an implementation brief if the answer is yes. The specific use cases differ, but the architecture is the same: break the problem into specialist tasks that run in parallel, then synthesize and validate the results.

Cross-Session Memory

AI conversations are stateless by default. Every new session starts from zero, which means re-explaining context that should already be established. Over a few weeks of working on the same projects, this friction adds up.

I addressed this with a four-layer memory system:

The first layer is stable facts: a compact file that captures the current state of all active work, including project status, recent decisions, and environment constraints. This is the primary orientation file. When I start a session, the AI reads it and immediately knows what’s in flight.
The second is a session log: a reverse-chronological list of handoff notes. Each entry records what happened in a session and what threads remain open. The last three entries give enough context to pick up where I left off.
Third, a corrections file. This holds behavioral fixes for things the AI consistently gets wrong. It’s a staging area that should shrink over time as fixes get promoted elsewhere.
And finally, a decisions log: a cross-cutting record of decisions that don’t belong to a specific project. Each entry captures context and rationale so I don’t relitigate settled questions.

Two commands manage this. /session-start loads all four files and presents a brief summary of current state and recent sessions. /session-end reviews the conversation, writes a handoff note, and then checks whether any learnings should be promoted to infrastructure.

“Promote to infrastructure” means taking something learned during a session and baking it into the files the agent actually reads. A correction about how to handle a specific edge case in escalation investigations might start in the corrections file, then get promoted into the escalation command or a domain skill once it’s validated. The corrections file shrinks over time as knowledge graduates into the right places.

This creates a loop where the system improves its own instructions. I approve every change, so it’s not self-modifying in a creepy way. But in practice each work session can make the next one slightly better, and the compound effect over weeks is noticeable.

Domain Expertise

The earlier posts described skills like pm-thinking, which applies product methodology (problem-first thinking, measurable outcomes) to any PM-related conversation. That’s useful, but generic. It works the same way regardless of what product you’re building.

The bigger shift was building skills that encode institutional knowledge about specific products. I now have skills for each major product area my team owns: log delivery, analytics, audit logs, alerting, and data pipelines. Each skill contains the product’s architecture and common failure modes, along with which code repositories to search and which database tables hold relevant data.

This is what makes the multi-agent workflows genuinely useful. When the code investigator agent examines an escalation about missing logs, the domain skill tells it which service handles job state and which repository contains the delivery pipeline. It also flags recent architectural changes that might be relevant. Without that context, the agent produces plausible-sounding analysis that misses the specific details engineering needs.

Now every investigation that uses a skill validates or extends the knowledge it contains, and /session-end catches insights that should be added back.

How The Work Changes

A few practical observations from working this way:

The role has shifted from “write the right prompt” to “design the right process.” The escalation command is a workflow with phases, dependencies, and validation steps. Thinking about it that way produces better results than trying to pack everything into a single conversation.
Validation has to be built in. The blind validator exists because agents make mistakes. They cite files that don’t exist, mischaracterize what code does, or draw conclusions the evidence doesn’t support. Catching those issues before they reach anyone else is the whole point.
Cross-session memory requires discipline. The system only works if I run /session-end after substantive sessions and keep stable facts current. When I skip it, the next session starts cold and I lose the compounding benefit. Automation helps, but the commitment to maintain the memory is mine.
And domain skills need regular maintenance. Products change. Code gets refactored, pipelines get rearchitected. Skills that aren’t periodically updated drift from reality. I haven’t solved this well yet. It’s still a manual process of noticing when a skill’s knowledge is stale and updating it.

The system still makes mistakes. Multi-agent workflows are more thorough than single-prompt conversations, but they’re not infallible. The confidence assessment in the escalation output exists because sometimes the answer is “medium confidence, we couldn’t confirm this from the available data.” That honesty about limitations is more useful than false certainty.

Where This Is Going

I’m sure the specific commands and skills will look different in six months as I learn what works and what doesn’t. But the underlying pattern feels durable: compose specialist agents with deep domain context, validate their output, and feed learnings back into the system.

I’ve published updated files to the Product AI Public repo, including the session memory commands and a generalized version of the multi-agent escalation workflow. If you’re building something similar, those might be useful starting points.

The value of this system is in how the pieces reinforce each other. Domain skills make agents useful for real investigations. Session memory means the system gets smarter over time. And the promote-to-infrastructure loop ties it together, so each piece of work has a chance to make the next one better.

14 March 2026

When Using AI Leads to “Brain Fry"

I am definitely feeling the “brain fry” right now:

We found that the phenomenon described in these posts—cognitive exhaustion from intensive oversight of AI agents—is both real and significant. We call it “AI brain fry,” which we define as mental fatigue from excessive use or oversight of AI tools beyond one’s cognitive capacity. Participants described a “buzzing” feeling or a mental fog with difficulty focusing, slower decision-making, and headaches.

The research is fascinating and worth reading, with super interesting findings like this:

As employees go from using one AI tool to two simultaneously, they experience a significant increase in productivity. As they incorporate a third tool, productivity again increases, but at a lower rate. After three tools, though, productivity scores dipped. Multitasking is notoriously unproductive, and yet we fall for its allure time and again.

Earlier this week I had this thought: “Oh no, I think I’ve blown out my context window. I wish I could add some more tokens to my brain. Until then I might just have to respond to new requests with 401 Unauthorized.”

And that’s when I realized I probably need to go touch grass or something.

14 March 2026

AI should help us produce better code

As usual, Simon Willison hits the nail on the head here:

If adopting coding agents demonstrably reduces the quality of the code and features you are producing, you should address that problem directly: figure out which aspects of your process are hurting the quality of your output and fix them. Shipping worse code with agents is a choice. We can choose to ship code that is better instead.

Also see Mitchell Hashimoto’s idea of “harness engineering”:

It is the idea that anytime you find an agent makes a mistake, you take the time to engineer a solution such that the agent never makes that mistake again.