Will Larson has a good post about how his company is adopting AI. The process is interesting, but this is the part that jumped out at me:
My biggest fear for AI adoption is that they can focus on creating the impression of adopting AI, rather than focusing on creating additional productivity. Optics are a core part of any work, but almost all interesting work occurs where optics and reality intersect.
It’s really hard to figure out whether AI tools are (1) helping teams ship faster and (2) doing so without sacrificing quality.
We’re working on exactly this problem right now at Cloudflare. Our proposed approach sidesteps the problem of per-commit AI attribution (did Copilot write this line? did Claude?) by correlating team-level AI tool usage with team-level health metrics over time. If a team’s AI adoption increases by 30% and their change failure rate stays stable, that’s a useful signal. If AI usage spikes and incidents start trending up, that’s worth investigating.
The key insight is that you don’t need perfect attribution to get directionally useful data. Correlation isn’t causation, and teams adopting AI tools may already be more experimental or higher-performing. But at least you’re measuring something real instead of something like “# of lines written by AI”, which leads straight into the Goodhart’s Law problem where the metric becomes the target.