Logfire

Playground in Prod - Optimising Agents in Production Environments — Samuel Colvin, Pydantic

Samuel Colvin, creator of Pydantic, demonstrates a hands-on workflow for continuously optimizing AI agents in production. The session covers running evaluations with Logfire, using GEPA (Genetic-Pareto) to autonomously evolve better prompts, and using managed variables to roll those improvements out to live services without redeploying.

Evaluating AI Agents: Why It Matters and How We Do It

Annie Condon and Jeff Groom from Acre Security detail their practical approach to evaluating non-deterministic AI agents robustly. They explain why they treat evaluations as critical for quality, introduce their "X-ray machine" analogy for observability, and walk through their evaluation stack, including versioning strategies, Logfire for tracing, and Confident AI's DeepEval for systematic metric tracking.