MLOps

Catastrophic agent failure and how to avoid it // Edward Upton // Agents in Production 2025

Edward Upton, a founding engineer at Asteroid, discusses the critical challenge of managing catastrophic failures in agentic browser solutions, particularly in high-stakes domains like healthcare and insurance. He shares real-world examples of agent failures and outlines a practical framework for building more reliable, predictable, and accountable agents by scoping their capabilities, implementing robust human-in-the-loop tooling, and employing independent evaluation systems.

Advancing the Cost-Quality Frontier in Agentic AI // Krista Opsahl-Ong // Agents in Production 2025

Krista Opsahl-Ong from Databricks introduces Agent Bricks, a platform designed to overcome the key challenges of productionizing enterprise AI agents. The talk covers common use cases, the difficult trade-offs between cost and quality, and how Agent Bricks uses automated evaluation and advanced optimization techniques to build cost-effective, high-performance agents.

Small Language Models are the Future of Agentic AI Reading Group

This paper challenges the prevailing "bigger is better" narrative in AI, arguing that Small Language Models (SLMs) are not just sufficient but often superior for agentic AI tasks due to their efficiency, speed, and specialization. The discussion explores the paper's core arguments, counterarguments, and the practical implications of adopting a hybrid LLM-SLM approach.

Too much lock-in for too little gain: agent frameworks are a dead-end // Valliappa Lakshmanan

Lak Lakshmanan presents a robust architecture for building production-quality, framework-agnostic agentic systems. He advocates for using simple, composable GenAI patterns, off-the-shelf tools for governance, and a strong emphasis on a human-in-the-loop design to create continuously learning systems that avoid vendor lock-in.

AI traces are worth a thousand logs

An exploration of how a single structured trace, based on OpenTelemetry standards, offers a superior method for debugging, testing, and understanding AI agent behavior compared to traditional logging. Learn how programmatic access to traces enables robust evaluation and the creation of golden datasets for building more reliable autonomous systems.
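The point about programmatic access can be sketched in a few lines: a structured trace is a tree of spans that can be queried directly, rather than reconstructed from interleaved log lines. The span shape, span names, and attributes below are illustrative stand-ins, not OpenTelemetry's actual API.

```python
from dataclasses import dataclass, field

# Illustrative span shape: real OpenTelemetry spans carry the same
# idea (name, attributes, parent/child links) with a richer schema.
@dataclass
class Span:
    name: str
    attributes: dict
    children: list = field(default_factory=list)

def find_spans(span, name):
    """Walk the trace tree and yield every span with a matching name."""
    if span.name == name:
        yield span
    for child in span.children:
        yield from find_spans(child, name)

# A toy agent trace: one root span containing an LLM call and a tool call.
trace = Span("agent.run", {"input": "What is 2+2?"}, [
    Span("llm.call", {"model": "some-model", "tokens": 112}),
    Span("tool.call", {"tool": "calculator", "result": "4"}),
])

# Evaluation becomes a query over the trace, not a grep over logs:
tool_calls = list(find_spans(trace, "tool.call"))
assert len(tool_calls) == 1
assert tool_calls[0].attributes["result"] == "4"
```

Golden datasets fall out of the same idea: a saved trace tree can be replayed and asserted on span by span.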

Iterating on Your AI Evals // Mariana Prazeres // Agents in Production 2025

Moving an AI agent from a promising demo to a reliable product is challenging. This talk presents a startup-friendly, iterative process for building robust evaluation frameworks, emphasizing that you must iterate on the evaluations themselves—the metrics and the data—not just the prompts and models. It outlines a practical "crawl, walk, run" approach, starting with simple heuristics and scaling to an advanced system with automated checks and human-in-the-loop validation.