Evals

Build Hour: AgentKit

A deep dive into OpenAI's AgentKit, demonstrating how to visually build, deploy, and optimize multi-step, tool-calling agents using Agent Builder, ChatKit, and the integrated Evals platform.

AI Engineering 101 with Chip Huyen (Nvidia, Stanford, Netflix)

Chip Huyen, an AI expert and author of 'AI Engineering', explains the realities of building successful AI applications. She covers the nuances of model training, the critical role of data quality in RAG systems, the mechanics of RLHF, and why the future of AI improvement lies in post-training, system-level thinking, and solving UX problems rather than just chasing the newest models.

Before Building AI Agents Watch This (Deep Agent Expertise)

Nishikant Dhanuka from Prosus Group shares practical lessons on building effective AI agents for e-commerce and productivity. He covers why context engineering is more crucial than prompt tweaking, how to build a modern search pipeline, the failures of pure-chat interfaces, and why a robust evaluation framework is the real competitive advantage.

Build Hour: Voice Agents

A deep dive into building sophisticated voice agents using OpenAI's Realtime API and Agents SDK. The session covers architectural patterns like chained vs. end-to-end models, the use of multi-agent systems with handoffs for specialized tasks, and best practices for production including debugging with traces, implementing guardrails, and creating robust evaluations.

Build Hour: Reinforcement Fine-Tuning

A deep dive into Reinforcement Fine-Tuning (RFT), covering how to set up tasks, design effective graders, and run efficient training loops to improve model reasoning, based on a live demonstration from OpenAI's Build Hours.

Perceptual Evaluations: Evals for Aesthetics — Diego Rodriguez, Krea.ai

Krea.ai cofounder Diego Rodriguez discusses how current AI evaluation metrics fail to capture human perception and aesthetics, and advocates for a new paradigm of personalized, perceptually aware evals.