Evals

What OpenAI & Google engineers learned deploying 50+ AI products in production

Aishwarya Naresh Reganti and Kiriti Badam, with experience from OpenAI, Google, and Amazon, share a framework for building successful enterprise AI products. They detail why AI development differs from traditional software, emphasizing the challenges of non-determinism and the agency-control trade-off, and introduce their 'Continuous Calibration, Continuous Development' (CC/CD) lifecycle to build reliable, value-driven AI systems.

Build Hour: AgentKit

A deep dive into OpenAI's AgentKit, demonstrating how to visually build, deploy, and optimize multi-step, tool-calling agents using Agent Builder, ChatKit, and the integrated Evals platform.

AI Engineering 101 with Chip Huyen (Nvidia, Stanford, Netflix)

Chip Huyen, an AI expert and author of 'AI Engineering', explains the realities of building successful AI applications. She covers the nuances of model training, the critical role of data quality in RAG systems, the mechanics of RLHF, and why the future of AI improvement lies in post-training, system-level thinking, and solving UX problems rather than just chasing the newest models.

Before Building AI Agents Watch This (Deep Agent Expertise)

Nishikant Dhanuka from Prosus Group shares practical lessons on building effective AI agents for e-commerce and productivity. He covers why context engineering is more crucial than prompt tweaking, how to build a modern search pipeline, the failures of pure-chat interfaces, and why a robust evaluation framework is the real competitive advantage.

Build Hour: Voice Agents

A deep dive into building sophisticated voice agents using OpenAI's Realtime API and Agents SDK. The session covers architectural patterns like chained vs. end-to-end models, the use of multi-agent systems with handoffs for specialized tasks, and best practices for production including debugging with traces, implementing guardrails, and creating robust evaluations.

Build Hour: Reinforcement Fine-Tuning

A deep dive into Reinforcement Fine-Tuning (RFT), covering how to set up tasks, design effective graders, and run efficient training loops to improve model reasoning, based on a live demonstration from OpenAI's Build Hours.