LLM Evals

May 15, 2026

Combine Skills and MCP to Close the Context Gap — Pedro Rodrigues, Supabase

Pedro Rodrigues from Supabase shares key lessons from building an agent skill to work with Postgres and Supabase. He explains why critical security rules must go in the main skill file, the importance of pointing to living documentation, and how providing opinionated workflow guidance closes the reliability gap for agents in production systems.

May 07, 2026

Playground in Prod - Optimising Agents in Production Environments — Samuel Colvin, Pydantic

Samuel Colvin, creator of Pydantic, demonstrates a hands-on workflow for continuously optimizing AI agents in production. The session covers running evaluations with Logfire, using GEPA (Genetic-Pareto) to autonomously evolve better prompts, and using managed variables to roll those improvements out to live services without redeploying code.