AI evals

Why Tejal Patwardhan stopped underestimating the models - Episode 21

Tejal Patwardhan, head of OpenAI's frontier evals team, discusses the critical evolution of AI evaluations. She explains why traditional benchmarks fail as models become more capable, how OpenAI develops realistic, long-horizon tests (including groundbreaking wet lab experiments), and the implications of rapidly advancing multimodal and reasoning models for scientific discovery and the future of human work.

Why experts writing AI evals is creating the fastest-growing companies in history | Brendan Foody

Brendan Foody, CEO of Mercor, discusses the critical role of AI evaluations (evals) in model improvement, detailing how his company achieved unprecedented growth by supplying high-skilled experts to top AI labs. He explores the shift to Reinforcement Learning from AI Feedback (RLAIF), the future of work in an AI-driven economy, and why he believes the path to AGI is paved with better evals, not just more data.