AI evaluation

The 100-person lab that became Anthropic and Google's secret weapon | Edwin Chen (Surge AI)

Edwin Chen, founder and CEO of Surge AI, discusses his contrarian, bootstrapped approach to building a billion-dollar company, the critical role of high-quality data and 'taste' in training advanced AI models, the pitfalls of current benchmarks, and why reinforcement learning (RL) environments are the next frontier in AI.

Ideas: Community building, machine learning, and the future of AI

Co-founders Jenn Wortman Vaughan and Hanna Wallach reflect on 20 years of the Women in Machine Learning (WiML) workshop, discussing its origins, their parallel careers in responsible AI, and the future challenges of evaluating generative AI and fostering critical thought.

Designing AI Agents for the Complex Realities of Healthcare

Dr. Sarah Gebauer presents a clinical framework for deploying AI agents in healthcare, drawing a powerful analogy between AI agents and medical residents. She outlines the critical risks, validation strategies, and post-deployment monitoring required to make agents useful, safe, and credible in high-stakes clinical environments.

2025 is the Year of Evals! Just like 2024, and 2023, and … — John Dickerson, CEO Mozilla AI

A deep dive into why 2025 is poised to be the 'Year of Evals' for AI. Dickerson argues that a confluence of factors (the C-suite's post-ChatGPT awakening, budget dynamics, and the rise of autonomous agentic systems) has finally made AI evaluation a critical, top-of-mind issue for enterprise leaders.