Time horizon

How METR measures Long Tasks and Experienced Open Source Dev Productivity - Joel Becker, METR

AI models show remarkable progress on benchmarks, yet a field study with experienced developers revealed no productivity gains. This summary explores the disconnect between lab results and real-world impact, examining the causal relationship between compute and AI capabilities, the nuances of the developer productivity study, and future directions for measuring what AI can truly do.