EDD: The Science of Improving AI Agents // Shahul Elavakkattil Shereef // Agents in Production 2025
This talk introduces Eval-Driven Development (EDD) as a scientific alternative to 'vibe-based' iteration for improving AI agents. It covers quantitative evaluation (choosing strong end-to-end metrics, aligning LLM judges) and qualitative evaluation (using error and attribution analysis to debug failures), providing a structured framework for consistent agent improvement.