Evals

Apr 28, 2026

Why building eval platforms is hard — Phil Hetzel, Braintrust

An evaluation platform is more than a simple test runner; it's a complex system for creating shared definitions of quality. This talk explores the evolution of eval platforms from basic spreadsheets to sophisticated, integrated systems, highlighting the hidden data and systems engineering challenges involved in making them credible, scalable, and usable for building trustworthy AI agents.

Jan 11, 2026

What OpenAI & Google engineers learned deploying 50+ AI products in production

Aishwarya Naresh Reganti and Kiriti Badam, with experience from OpenAI, Google, and Amazon, share a framework for building successful enterprise AI products. They detail why AI development differs from traditional software, emphasizing the challenges of non-determinism and the agency-control trade-off, and introduce their 'Continuous Calibration, Continuous Development' (CC/CD) lifecycle to build reliable, value-driven AI systems.

Oct 29, 2025

Build Hour: AgentKit

A deep dive into OpenAI's AgentKit, demonstrating how to visually build, deploy, and optimize multi-step, tool-calling agents using Agent Builder, ChatKit, and the integrated Evals platform.

Oct 23, 2025

AI Engineering 101 with Chip Huyen (Nvidia, Stanford, Netflix)

Chip Huyen, an AI expert and author of 'AI Engineering', explains the realities of building successful AI applications. She covers the nuances of model training, the critical role of data quality in RAG systems, the mechanics of RLHF, and why the future of AI improvement lies in post-training, system-level thinking, and solving UX problems rather than just chasing the newest models.

Sep 05, 2025

Before Building AI Agents Watch This (Deep Agent Expertise)

Nishikant Dhanuka from Prosus Group shares practical lessons on building effective AI agents for e-commerce and productivity. He covers why context engineering is more crucial than prompt tweaking, how to build a modern search pipeline, the failures of pure-chat interfaces, and why a robust evaluation framework is the real competitive advantage.

Sep 03, 2025

Build Hour: Voice Agents

A deep dive into building sophisticated voice agents using OpenAI's Realtime API and Agents SDK. The session covers architectural patterns like chained vs. end-to-end models, the use of multi-agent systems with handoffs for specialized tasks, and best practices for production including debugging with traces, implementing guardrails, and creating robust evaluations.