Building MCP servers in the real world

This has been my experience with MCP servers as well. As useful as I think my Last.fm MCP server is, I can’t see it ever having more than a dozen users. But internal company servers are massively useful:

MCP is being used especially heavily by internal data and platform teams to give internal users access to systems. These are systems that these users perhaps already had access to, but it was either too complex or too broad, or needed a lot of documentation or special skills to use.

Wiki search is so much better now that I can use our internal MCP server for it via Windsurf.

Source: Building MCP servers in the real world

Measuring AI's Impact on Shipping Speed and Code Quality

Will Larson has a good post about how they’re adopting AI at his company. The process is interesting, but this is the part that jumped out at me:

My biggest fear for AI adoption is that they can focus on creating the impression of adopting AI, rather than focusing on creating additional productivity. Optics are a core part of any work, but almost all interesting work occurs where optics and reality intersect.

It’s really hard to figure out if AI tools are (1) helping teams ship faster (2) without sacrificing quality.

We’re working on figuring out this problem right now at Cloudflare. Our proposed approach sidesteps the problem of per-commit AI attribution (did Copilot write this line? did Claude?) by correlating team-level AI tool usage with team-level health metrics over time. If a team’s AI adoption increases by 30% and their change failure rate stays stable, that’s a useful signal. If AI usage spikes and incidents start trending up, that’s worth investigating.

The key insight is that you don’t need perfect attribution to get directionally useful data. Correlation isn’t causation, and teams adopting AI tools may already be more experimental or higher-performing. But at least you’re measuring something real instead of something like “# of lines written by AI”, which leads straight to the Goodhart’s Law problem where metrics become targets.
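To make the team-level idea concrete, here is a minimal Python sketch. It is not our actual implementation: the team names, the numbers, the `pearson` and `flag_teams` helpers, and the 0.8 threshold are all invented for illustration. It only shows the shape of the comparison, which is AI usage over time against change failure rate over time, per team.

```python
from math import sqrt

# Hypothetical monthly, team-level data (all values invented for illustration).
# ai_usage: share of engineers on the team actively using AI tools.
# change_failure_rate: share of deployments that caused a failure.
teams = {
    "payments": {
        "ai_usage": [0.20, 0.30, 0.40, 0.50],
        "change_failure_rate": [0.10, 0.10, 0.11, 0.10],
    },
    "search": {
        "ai_usage": [0.10, 0.25, 0.45, 0.60],
        "change_failure_rate": [0.08, 0.10, 0.13, 0.16],
    },
}


def pearson(xs, ys):
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def flag_teams(data, threshold=0.8):
    """Return teams whose failure rate rises with AI usage strongly enough to review."""
    return [
        name
        for name, series in data.items()
        if pearson(series["ai_usage"], series["change_failure_rate"]) >= threshold
    ]


print(flag_teams(teams))
```

With this made-up data, only the second team is flagged: its failure rate climbs in step with AI usage, while the first team's stays flat. A flag here is a prompt to investigate, not proof that AI tools caused the failures.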

How I Use AI for Product Work

Update! I wrote a follow-up post here: How My AI Product “Second Brain” Evolved.

I’ve been refining my approach to using LLMs for product work, and I figured it’s time to write up how I actually use them day-to-day.

The most valuable thing an AI assistant can do for product work is push back on weak reasoning, spot gaps you missed, and force you to articulate why your idea is good. A peer reviewer who shares your product philosophy, basically. With the right prompts, AI assistants are also good at producing background and framing documents: explainers that synthesize complex topics and summaries of technical concepts.

Here’s how I’ve set it up, what makes it work, and how you might build something similar.

The Philosophy: A Sparring Partner

I believe LLMs are most useful when you give them two things: context and constraints.

Context tells the model who you are, what you’re working on, and what “good” looks like in your world. Constraints keep it from drifting into generic advice or invented frameworks.

Every prompt I use is designed to provide both. They’re opinionated on purpose. I’d rather have an assistant that pushes back on bad ideas than one that says “Great idea!” to everything.

I don’t want AI writing presentations for me. I want a thinking partner that:

  • Challenges weak problem statements before I waste time on solutions
  • Spots missing success criteria I forgot to define
  • Asks “why?” when my reasoning gets hand-wavy
  • Points out when I’m jumping to solutions before understanding the problem
  • Helps me create background docs and explainers that set context for others

The Building Blocks

Here’s the folder structure I maintain, with a series of Markdown files organized by purpose:

llm-prompts/
├── prompts/           # System prompts for different use cases
│   ├── pm/            # Product management prompts
│   └── technical/     # Technical/engineering prompts
├── context/           # Personal context files (who I am, how I work)
├── reference/         # Syntax guides and reference docs
└── work/              # Saved feedback and refined docs

What makes this work is how you combine them.

Layer 1: System Prompts

These are the instructions that tell the AI how to behave for a specific task. I have different prompts for different jobs:

  • General PM sparring: A prompt that knows my product philosophy and pushes back on weak reasoning. I use this for thinking through tradeoffs, preparing for meetings, and sanity-checking my approach.
  • Document review: Prompts specifically designed to critique PRDs, OKRs, strategy docs, and other artifacts. These encode what “good” looks like and call out common anti-patterns.
  • Idea stress-testing: A prompt that I stole from my friend Stephen, which simulates a debate between an optimist and a skeptic to pressure-test new ideas before I get too attached to them.
  • Technical understanding: Prompts that help me understand systems, architectural decisions, and technical concepts well enough to lead effectively (I’m not an engineer, but I need to hold my own in architecture reviews).

Each one carries a specific point of view about what good work looks like.

Layer 2: Personal Context

I maintain files that describe:

  • Who I am: My role, my experience, my communication style
  • How I work: My product philosophy, my management approach, my values
  • What I’m working on: Current projects, team context, company priorities

When I start a conversation, I can pull in the relevant context files alongside my prompt. The model then has the background it needs to give me advice that fits my situation, instead of generic best practices from a blog post.

Layer 3: Reference Materials

Sometimes you need the model to follow specific formats or conventions. I keep reference files for things like wiki markup syntax, documentation templates, or internal style guides. These keep the output usable without a round of reformatting.

How I Actually Use This

I use Windsurf as my daily driver, and it has a feature that makes this whole system work: the @ mention. In the chat panel, I can reference any file by typing @ followed by the path. Windsurf then includes that file’s contents as context for the conversation.

This means I can compose my “assistant” on the fly by combining:

  1. A system prompt for the task at hand
  2. Relevant personal context files
  3. The document or code I’m working on
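Windsurf does this composition for me, but if your tool doesn’t have an @ mention, the same idea is easy to approximate. Here is a rough Python sketch that stitches the three pieces into one block of text. The `compose_context` helper and the file names in the usage comment are hypothetical; only the folder layout comes from my setup.

```python
from pathlib import Path


def compose_context(root, relative_paths):
    """Concatenate the given files into one context string.

    Each file is placed under a header naming its path, in the order given,
    so the model can tell the system prompt, context, and document apart.
    """
    sections = []
    for rel in relative_paths:
        text = (Path(root) / rel).read_text(encoding="utf-8").strip()
        sections.append(f"--- {rel} ---\n{text}")
    return "\n\n".join(sections)


# Example usage (hypothetical file names inside the llm-prompts/ folder):
# context = compose_context("llm-prompts", [
#     "prompts/pm/prd-review.md",        # 1. system prompt for the task
#     "context/product-philosophy.md",   # 2. personal context
#     "work/drafts/my-prd.md",           # 3. the document being reviewed
# ])
```

The order matters less than keeping the list short: each extra file is more text the model has to weigh against the document you actually want reviewed.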

Example: Document Review

When I need feedback on a PRD before sharing it with stakeholders, I’ll start a conversation and reference my PRD review prompt plus my product philosophy context. Then I paste in the PRD and ask for critique.

The model comes back with feedback measured against my own standards. It’ll call out if my problem statement is vague, if my success metrics aren’t measurable, or if I’m jumping to solutions before properly framing the problem—the kind of pushback I’d want from a peer reviewer.

Example: Brainstorming Partner

For early-stage thinking, I use a more conversational prompt that knows how I like to explore ideas. I’ll describe what I’m thinking about and ask it to poke holes, suggest angles I haven’t considered, or help me articulate why something feels off.

This helps most before big meetings. I can rehearse my reasoning and get challenged on the weak spots before I’m in front of stakeholders.

Example: Technical Understanding

I’m not an engineer, but I work with technical teams. When I need to understand how a system works well enough to ask good questions or spot when something doesn’t add up, I use prompts designed for technical explanation.

The key is that these prompts know to explain things without condescension but also without assuming I know the jargon. They cite specific files and line numbers when relevant, and they explain the “why” behind design decisions.

Connecting to Real Data

MCP (Model Context Protocol) servers connect the AI to external data sources: internal wikis, documentation sites, code repositories, APIs. That lets the model answer from real information instead of guessing from training data.

In my prompts, I tell the model which MCP servers are available and when to use them. For example, my technical prompts instruct the model to:

  • Search official documentation first to ground answers in verified information
  • Check internal wikis for known issues, edge cases, and workarounds
  • Look at code repositories when documentation is incomplete
  • Always cite sources with links so I can verify

Now the AI answers like someone with access to your company’s actual knowledge base, citing real docs and real code instead of best-practice boilerplate.
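In my setup the model follows that lookup order because the prompt tells it to, not because any code enforces it. Still, the intent is easier to see as plain fallback logic. The Python sketch below is purely illustrative: `Answer`, `answer_with_citation`, and the stub lookups are hypothetical stand-ins, and no real MCP API is used.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Answer:
    text: str
    source_url: str


# A lookup takes a question and returns an Answer, or None if it finds nothing.
Lookup = Callable[[str], Optional[Answer]]

NO_ANSWER = "No grounded answer found; say so instead of guessing."


def answer_with_citation(question, sources):
    """Try each (name, lookup) pair in priority order and cite the first hit."""
    for name, lookup in sources:
        hit = lookup(question)
        if hit is not None:
            return f"{hit.text} (source: {name}, {hit.source_url})"
    return NO_ANSWER


# Stub lookups standing in for real data sources (placeholder URLs).
def search_docs(question):
    return None  # pretend the official docs have nothing on this


def search_wiki(question):
    return Answer("Known issue with a documented workaround.",
                  "https://wiki.example.com/known-issues")


priority = [("official docs", search_docs), ("internal wiki", search_wiki)]
print(answer_with_citation("Why does the sync job fail?", priority))
```

The two properties worth keeping from the sketch are the fixed priority order and the refusal to answer without a source.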

Keeping a Record

Save the conversation output somewhere useful.

I have a work/ folder organized by topic where I save feedback and refined thinking. When the model gives me good critique on a PRD, I’ll ask it to write a summary of the key issues to a Markdown file I can reference later. This keeps the insights from getting lost in chat history.

What I’ve Learned

  1. Context files are worth the investment. I have files that describe who I am, how I work, and what I value. Updating these takes time, but it pays off in every conversation.
  2. Pushback is the point. These prompts are designed to challenge bad thinking. If the model is pushing back on your approach, take it seriously: it might be right.
  3. Iterate on the prompts. I update these regularly based on what works and what doesn’t. If a prompt isn’t helping, change it.
  4. Less context is often more. Including too much context can dilute the signal. Start with the minimum you need, and add more only if the model seems confused.

The thinking is still mine

None of this does the thinking for me. It encodes my preferences and philosophies into something an LLM can use as a baseline for pushing back. I still write my own PRDs, OKRs, and strategy docs, because those represent my actual thinking. But I let AI help me create background documents, explainers, and context-setting materials. And I have a sparring partner that catches the gaps I miss and asks the uncomfortable questions before stakeholders do.

If you build something similar, I’d love to hear how it goes.

"Disagree and Let’s See"

I like this alternative to the “Disagree and Commit” saying:

“Disagree and let’s see” allows you to stay aligned with the team without forcing you to pretend you had conviction you didn’t have. It lets you walk into a room with your team and be honest:

“Here’s the path that was chosen. It wasn’t my first pick, but here’s the experiment we’re running, and here’s what we’re trying to learn.”

That’s a much more authentic stance for most leaders than repeating something with a tight smile and hoping no one notices your doubt.

Source: “Disagree and Let’s See”

New side project: Discord Stock & Crypto Bot

Not sure how many people would be interested in this, but it was fun to make so I thought I’d share. This is a Discord bot that provides real-time stock and cryptocurrency information, 30-day price trends, and AI-powered news summaries through slash commands. Once you add the bot to your Discord server, you can use the /stock and /crypto commands to pull up this information.

Want to add it to your Discord server? Head over here!

Horrible edge cases to consider when dealing with music

Metadata is the hardest problem in software, and these examples prove my point. Don’t @ me!

My favourite: a band named brouillard, with a single member called brouillard, whose every single album is named brouillard, and of course, so is every single track.

Source: Horrible edge cases to consider when dealing with music

Brief thoughts on the recent Cloudflare outage

Lorin Hochstein is a big name in the LFI (Learning From Incidents) space. He often writes about post-incident reviews, and he has a very interesting write-up on the Cloudflare outage of November 18, 2025. I especially loved this part:

Companies generally err on the side of saying less rather than more. After all, if you provide more detail, you open yourself up to criticism that the failure was due to poor engineering. The fewer details you provide, the fewer things people can call you out on. It’s not hard to find people online criticizing Cloudflare using the details they provided as the basis for their criticism.

I think it would advance our industry if people held the opposite view: the more details that are provided in an incident writeup, the higher esteem we should hold that organization. I respect Cloudflare as an engineering organization a lot more precisely because they are willing to provide these sorts of details. I don’t want to hear what Cloudflare should have done from people who weren’t there, I want to hear us hold other companies up to Cloudflare’s standard for describing the details of a failure mode and the inherently confusing nature of incident response.

Source: Brief thoughts on the recent Cloudflare outage

The price of admission

Some tough love here about what it means to have “executive presence”.

When someone tells you that you need more business sense, or that you’re not ready for more scope, or that you need to level up, this is typically what they’re trying to communicate. That you’re more concerned with how work happens than with what work should happen in the first place.

Source: The price of admission

How I give the right amount of context (in any situation)

A great list of things to keep in mind when communicating via writing. The article is focused on “managing up” but these principles are relevant in a much broader context as well.

What questions does your manager usually ask? Answer those questions yourself. If you take anything away from this article, make it this. Every manager has their own idiosyncrasies, worldview, values, etc. That’s why the best thing to do is to pattern match. Consider what they’ve asked you in the past, when talking to you or others. Try to give context through that lens.

Source: How I give the right amount of context (in any situation)

Selling Lemons

This is an essay I think everyone should read, front to back. It’s about all the things we are living through right now, but it’s especially about work (and AI):

I’m not sure hiring can ever be much more efficient, because neither side has reason to show themselves as they really are, warts and all. Idealistically, both would come straight; pragmatically, it is a game of chicken. Candidates polish résumés and present curated versions of their abilities, listing outcomes and impact statistics with dubious accuracy and provenance. Companies do the same, putting culture and mission front and center while hiding systematic dysfunctions and looming existential risks. When neither side is forthcoming, you’re left with proxies: a famous logo on a resume, a polished culture deck.

Source: Selling Lemons