Evals Are the New PRD

Braintrust makes a good case (apologies for the X.com link…) for rethinking how PMs work on AI products: the eval replaces the PRD.

An eval is a structured, repeatable test that answers one question. Does my AI system do the right thing? You define a set of inputs along with expected outputs, run them through your AI system, and score the results using algorithms or AI judges.

The eval becomes both the spec and the acceptance criteria. The directive to engineering:

“Here is the eval. Make this number go up.”

That’s very different to how most teams work today, but I can definitely see the industry moving this way. Product usage generates signals, observability captures them, and evals turn them into improvement targets. The PM’s job is to define what “good” looks like in code and curate the data that reveals what “bad” looks like.

The PM skills that transfer are the same ones that always mattered — discovering needs and opportunities, and making judgment calls about what to build for business value. The difference is that instead of a document that describes the intent, you have a test suite that encodes it.