Reliability

Catastrophic agent failure and how to avoid it // Edward Upton // Agents in Production 2025

Edward Upton, a founding engineer at Asteroid, discusses the challenge of managing catastrophic failures in browser-based agents, particularly in high-stakes domains like healthcare and insurance. He shares real-world examples of agent failures and outlines a practical framework for building more reliable, predictable, and accountable agents: scoping their capabilities, implementing robust human-in-the-loop tooling, and employing independent evaluation systems.

Evals Are Not Unit Tests — Ido Pesok, Vercel v0

Ido Pesok from Vercel explains why LLM-based applications often fail in production despite successful demos, and presents a systematic framework for building reliable AI systems using application-layer evaluations ("evals").

Practical tactics to build reliable AI apps — Dmitry Kuchin, Multinear

Moving an AI PoC from 50% to 100% reliability requires a new development paradigm. This talk introduces a practical, evaluations-first approach, reverse-engineering tests from real-world user scenarios and business outcomes to build a robust benchmark, prevent regressions, and enable confident optimization.

From Self-driving to Autonomous Voice Agents — Brooke Hopkins, Coval

Brooke Hopkins, founder of Coval, discusses how evaluation methodologies from the autonomous vehicle industry, particularly from her experience at Waymo, can be adapted to build reliable, scalable, and trustworthy voice and conversational AI systems.

Scaling AI Agents Without Breaking Reliability — Preeti Somal, Temporal

Preeti Somal from Temporal explains that AI agents face significant reliability and scalability challenges as they move to production. She introduces Temporal as a platform that abstracts away this complexity, letting developers build robust, stateful AI agents by focusing on business logic instead of infrastructure plumbing like retries and error handling.