LLM evals

Combine Skills and MCP to Close the Context Gap — Pedro Rodrigues, Supabase

Pedro Rodrigues from Supabase shares key lessons from building an agent skill to work with Postgres and Supabase. He explains why critical security rules must go in the main skill file, the importance of pointing to living documentation, and how providing opinionated workflow guidance closes the reliability gap for agents in production systems.

Playground in Prod - Optimising Agents in Production Environments — Samuel Colvin, Pydantic

Samuel Colvin, creator of Pydantic, demonstrates a hands-on workflow for continuously optimizing AI agents in production. The session covers using Logfire for running evaluations, GEPA (Genetic Pareto) for autonomously evolving better prompts, and managed variables to deploy these improvements to live services without redeployment.