Pydantic

Playground in Prod - Optimising Agents in Production Environments — Samuel Colvin, Pydantic

Samuel Colvin, creator of Pydantic, demonstrates a hands-on workflow for continuously optimizing AI agents in production. The session covers using Logfire for running evaluations, GEPA (Genetic Pareto) for autonomously evolving better prompts, and managed variables to deploy these improvements to live services without redeployment.

⚡️Monty: the ultrafast Python interpreter by Agents for Agents — Samuel Colvin, Pydantic

Samuel Colvin, the creator of Pydantic, introduces Monty, a new, secure, high-performance Python interpreter written in Rust. Monty is designed specifically for AI agents and bridges the gap between simple but limited tool calling and full-featured sandboxes that are complex and slow.

Evaluating AI Agents: Why It Matters and How We Do It

Annie Condon and Jeff Groom from Acre Security detail their practical approach to robustly evaluating non-deterministic AI agents. They argue that evaluations are critical for quality, introduce their "X-ray machine" analogy for observability, and walk through their evaluation stack, including versioning strategies and the use of Logfire for tracing and Confident AI (DeepEval) for systematic metric tracking.