Testing

Aug 06, 2025

Evals Are Not Unit Tests — Ido Pesok, Vercel v0

Ido Pesok from Vercel explains why LLM-based applications often fail in production despite successful demos, and presents a systematic framework for building reliable AI systems using application-layer evaluations ("evals").

Aug 03, 2025

Practical tactics to build reliable AI apps — Dmitry Kuchin, Multinear

Moving an AI PoC from 50% to 100% reliability requires a new development paradigm. This talk introduces a practical, evaluations-first approach: reverse-engineer tests from real-world user scenarios and business outcomes to build a robust benchmark, prevent regressions, and enable confident optimization.

Jul 25, 2025

Beyond the Prototype: Using AI to Write High-Quality Code - Josh Albrecht, Imbue

Josh Albrecht, CTO of Imbue, discusses the engineering challenges in building reliable AI coding agents. He introduces Sculptor, an experimental environment designed to build trust in AI-generated code. It focuses on preventing and detecting problems through structured workflows, automated testing, and AI-driven analysis, moving beyond simple code generation toward maintainable software.
