MLOps

Small Language Models are the Future of Agentic AI Reading Group

This paper challenges the prevailing "bigger is better" narrative in AI, arguing that Small Language Models (SLMs) are not just sufficient but often superior for agentic AI tasks due to their efficiency, speed, and specialization. The discussion explores the paper's core arguments, counterarguments, and the practical implications of adopting a hybrid LLM-SLM approach.
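
One way to make the hybrid LLM-SLM idea concrete is a task router: specialized small models handle the narrow, repetitive subtasks, and only open-ended work falls back to a large generalist model. A minimal sketch, where the model names and the `call_model()` helper are hypothetical stand-ins rather than anything from the paper:

```python
def call_model(model: str, prompt: str) -> str:
    """Placeholder for a real inference call (local SLM or hosted LLM)."""
    return f"[{model}] response to: {prompt}"

ROUTES = {
    "extract_fields": "slm-3b-extractor",    # structured extraction: cheap SLM
    "classify_intent": "slm-1b-classifier",  # short classification: cheap SLM
    "plan": "llm-frontier",                  # open-ended reasoning: big LLM
}

def route(task: str, prompt: str) -> str:
    # Unknown or unrouted tasks default to the large model.
    model = ROUTES.get(task, "llm-frontier")
    return call_model(model, prompt)

print(route("classify_intent", "cancel my subscription"))
```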

Too much lock-in for too little gain: agent frameworks are a dead-end // Valliappa Lakshmanan

Lak Lakshmanan presents a robust architecture for building production-quality, framework-agnostic agentic systems. He advocates simple, composable GenAI patterns, off-the-shelf tools for governance, and human-in-the-loop design to create continuously learning systems that avoid vendor lock-in.
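
The framework-agnostic style amounts to writing agent patterns as plain composable functions over a raw model client. A minimal sketch of one such pattern (reflection), using the OpenAI SDK purely as an example provider; swapping vendors touches only the `llm()` helper:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def llm(prompt: str) -> str:
    """The only provider-specific code; everything built on it is portable."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def reflect(draft: str) -> str:
    """Composable 'reflection' pattern: critique the draft, then revise it."""
    critique = llm(f"List concrete flaws in this answer:\n{draft}")
    return llm(f"Rewrite the answer fixing these flaws:\n{critique}\n\n{draft}")

answer = reflect(llm("Explain vendor lock-in in two sentences."))
```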

AI traces are worth a thousand logs

An exploration of how a single, structured trace, based on OpenTelemetry standards, offers a superior method for debugging, testing, and understanding AI agent behavior compared to traditional logging. Learn how programmatic access to traces enables robust evaluation and the creation of golden datasets for building more reliable autonomous systems.
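
In practice this means emitting nested OpenTelemetry spans rather than flat log lines, so a whole agent run is one queryable tree. A minimal sketch using the standard OpenTelemetry Python SDK; the attribute names (`agent.input`, `tool.name`) are illustrative conventions, not an official semantic standard:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Wire up a tracer that prints spans to the console for inspection.
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    SimpleSpanProcessor(ConsoleSpanExporter())
)
tracer = trace.get_tracer("agent")

# One agent run = one root span; each tool call is a child span,
# so the full decision path is recoverable programmatically.
with tracer.start_as_current_span("agent.run") as run:
    run.set_attribute("agent.input", "book a flight to NYC")
    with tracer.start_as_current_span("agent.tool_call") as tool:
        tool.set_attribute("tool.name", "search_flights")
```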

Iterating on Your AI Evals // Mariana Prazeres // Agents in Production 2025

Moving an AI agent from a promising demo to a reliable product is challenging. This talk presents a startup-friendly, iterative process for building robust evaluation frameworks, emphasizing that you must iterate on the evaluations themselves—the metrics and the data—not just the prompts and models. It outlines a practical "crawl, walk, run" approach, starting with simple heuristics and scaling to an advanced system with automated checks and human-in-the-loop validation.
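
The "crawl" stage can be as simple as deterministic heuristics run over a small hand-built dataset, before any LLM-as-judge machinery. A sketch under that assumption; the specific checks and dataset schema are illustrative, not from the talk:

```python
def eval_response(response: str, must_contain: list[str]) -> dict:
    """Cheap heuristic checks: fast, deterministic, and easy to iterate on."""
    return {
        "non_empty": bool(response.strip()),
        "no_refusal": "I can't" not in response,
        "has_required_facts": all(
            s.lower() in response.lower() for s in must_contain
        ),
    }

# A tiny golden dataset; iterating on the cases is iterating on the eval.
golden = [
    {"q": "What is our refund window?",
     "r": "Refunds are accepted within 30 days.",
     "facts": ["30 days"]},
]
for case in golden:
    print(case["q"], eval_response(case["r"], case["facts"]))
```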

Integration of AI into Traditional Systems // Hakan Tek // Agents in Production 2025

This talk explores practical, low-disruption strategies for integrating AI capabilities into traditional and legacy enterprise systems without requiring a complete overhaul. It covers common challenges, effective integration patterns like API-based and hybrid approaches, and highlights readily available tools to help teams start small and deliver immediate business value.
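
The API-based pattern typically means a thin service that exposes one AI capability over plain HTTP, so the legacy system calls it like any other internal endpoint. A minimal sketch; Flask and the `classify()` stub are illustrative choices, not from the talk:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def classify(text: str) -> str:
    """Stub for the actual model call (hosted API or local model)."""
    return "invoice" if "invoice" in text.lower() else "other"

@app.post("/classify")
def classify_endpoint():
    # The legacy system only sees JSON in, JSON out; no AI stack required.
    payload = request.get_json(force=True)
    return jsonify({"label": classify(payload["text"])})

if __name__ == "__main__":
    app.run(port=8080)  # legacy code POSTs to http://localhost:8080/classify
```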

Evaluation-Driven Development with MLflow 3.0

Yuki Watanabe from Databricks introduces Evaluation-Driven Development (EDD) as a critical methodology for building production-ready AI agents. This talk explores the five pillars of EDD and demonstrates how MLflow 3.0's new features—including one-line tracing, automated evaluation, human-in-the-loop feedback, and monitoring—provide a comprehensive toolkit to ensure agent quality and reliability.
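
A minimal sketch of the one-line tracing pillar, assuming MLflow's tracing APIs (available since MLflow 2.14) and the OpenAI SDK; the agent logic itself is a placeholder:

```python
import mlflow
from openai import OpenAI

mlflow.openai.autolog()  # the "one line": auto-traces every OpenAI call
client = OpenAI()

@mlflow.trace  # records this function as a span alongside the autologged calls
def answer(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

answer("Summarize evaluation-driven development in one sentence.")
```

The resulting traces then feed the other EDD pillars: they become inputs to automated evaluation, targets for human feedback, and the raw material for production monitoring.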