MLOps

May 11, 2026

Why AI Agents Shouldn't Replace Your Fraud Models

Varant Zanoyan, original author of the Chronon feature platform, introduces 'agentic experimentation', a pattern where AI agents improve high-stakes ML systems without making live decisions. He explains how Chronon addresses infrastructure sprawl, safety, and reproducibility through a semantic API, branch-based isolation, and compute reuse, enabling agents to safely create production-ready pipelines for human review.

May 07, 2026

Playground in Prod - Optimising Agents in Production Environments — Samuel Colvin, Pydantic

Samuel Colvin, creator of Pydantic, demonstrates a hands-on workflow for continuously optimizing AI agents in production. The session covers using Logfire for running evaluations, GEPA (Genetic Pareto) for autonomously evolving better prompts, and managed variables to deploy these improvements to live services without redeployment.

May 05, 2026

The Small Model Infrastructure Nobody Built (So We Did) — Filip Makraduli, Superlinked

Filip Makraduli from Superlinked discusses the common infrastructure gaps and profiling mistakes encountered when deploying small embedding and transformer models. He introduces the Superlinked Inference Engine (SIE), an open-source solution designed for dynamic model loading, hot-swapping, and memory-aware eviction to maximize GPU utilization and streamline the path from development to production.

May 01, 2026

Getting Humans Out of the Way: How to Work with Teams of Agents

Rob Ennals, creator of Broomy, discusses a paradigm shift in working with AI coding agents: moving away from micromanagement towards orchestrating teams of parallel agents. The key is to design robust, automated validation systems and reshape the development environment to empower agents to work autonomously, efficiently, and at scale.

Apr 28, 2026

Why building eval platforms is hard — Phil Hetzel, Braintrust

An evaluation platform is more than a simple test runner; it's a complex system for creating shared definitions of quality. This talk explores the evolution of eval platforms from basic spreadsheets to sophisticated, integrated systems, highlighting the hidden data and systems engineering challenges involved in making them credible, scalable, and usable for building trustworthy AI agents.

Apr 27, 2026

It's 2026, and We're Still Talking Evals

Maggie Konstanty, AI Product Manager at Prosus, provides a candid look into the realities of LLM evaluation in production. She argues that standard metrics like accuracy are misleading and advocates for a culture of continuous, goal-oriented evaluation focused on deep failure analysis and understanding real user behavior, asserting that mature teams inevitably build custom tooling to meet their specific needs.
