No Satisfaction

Posted 2025-07-18

I don’t actually get much satisfaction from saying “I told you so” because I only do it when the bad thing I was worried about has happened. What motivates this observation is the recent METR paper claiming that using AI lowers developers’ productivity. Cat Hicks and others have done a better job of explaining the flaws in METR’s methodology than I possibly could. What I want now is a workshop that teaches programmers how to do trustworthy experiments on LLMs and spot the mistakes in others’ studies.

Thousands and thousands of developers are trying to benchmark bots, figure out if spicy autocomplete is making them more productive, et endless cetera. Most of those efforts are methodological and statistical dumpster fires; while I don’t want to be cynical, I think the same programmers who ignored It Will Never Work in Theory might pay attention to something with “AI” in its title. Please, someone, save us from ourselves…