Agent Evals

Tokenless

Jan 09, 2026

Collaborative AI Agents At OpenAI

Robert from OpenAI discusses the critical role of structured evaluations (evals) and graders for developing advanced collaborative agents. He explores the limitations of 'vibe-based' assessments, introduces a maturity model for evals, and presents a comprehensive rubric for measuring agent performance beyond simple accuracy, connecting these concepts to the power of Reinforcement Fine-Tuning (RFT).