It's 2026, and We're Still Talking Evals
Maggie Konstanty, AI Product Manager at Prosus, offers a candid look at the realities of LLM evaluation in production. She argues that standard metrics like accuracy are misleading and advocates for a culture of continuous, goal-oriented evaluation built on deep failure analysis and an understanding of real user behavior. She also asserts that mature teams inevitably build custom tooling to meet their specific needs.