Integration of AI into Traditional Systems // Hakan Tek // Agents in Production 2025

This talk explores practical, low-disruption strategies for integrating AI capabilities into traditional and legacy enterprise systems without requiring a complete overhaul. It covers common challenges, effective integration patterns like API-based and hybrid approaches, and highlights readily available tools to help teams start small and deliver immediate business value.
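
The hybrid pattern mentioned above can be sketched in a few lines. This is a hypothetical illustration (the `classify_ticket` facade, `ai_classify` hook, and ticket categories are invented, not from the talk): the legacy rule-based path stays authoritative, and the AI capability is consulted behind the same interface, with a fallback so an AI outage never breaks the existing system.

```python
from typing import Callable, Optional

def legacy_classify(ticket: str) -> str:
    """Existing rule-based logic, untouched by the AI rollout."""
    if "refund" in ticket.lower():
        return "billing"
    return "general"

def classify_ticket(
    ticket: str,
    ai_classify: Optional[Callable[[str], Optional[str]]] = None,
) -> str:
    """Facade the rest of the legacy system calls.

    Tries the AI path first; any exception or low-confidence answer
    (signalled here as None) falls back to the legacy rules.
    """
    if ai_classify is not None:
        try:
            label = ai_classify(ticket)
            if label is not None:
                return label
        except Exception:
            pass  # an AI failure must never break the legacy flow
    return legacy_classify(ticket)
```

Because callers only see `classify_ticket`, the AI path can be rolled out, tuned, or removed without touching the rest of the system, which is what makes this a low-disruption integration.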

Evaluation-Driven Development with MLflow 3.0

Yuki Watanabe from Databricks introduces Evaluation-Driven Development (EDD) as a critical methodology for building production-ready AI agents. This talk explores the five pillars of EDD and demonstrates how MLflow 3.0's new features—including one-line tracing, automated evaluation, human-in-the-loop feedback, and monitoring—provide a comprehensive toolkit to ensure agent quality and reliability.
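
The automated-evaluation pillar boils down to running every agent change against a fixed eval set and a battery of metrics before shipping. A framework-agnostic sketch follows; the `evaluate_agent` harness and `exact_match` metric are invented for illustration and are not MLflow's API (MLflow 3.0 ships its own evaluate and tracing interfaces).

```python
from typing import Callable, Dict, List

# A metric maps (prediction, reference) to a score in [0, 1].
Metric = Callable[[str, str], float]

def exact_match(prediction: str, reference: str) -> float:
    return 1.0 if prediction.strip() == reference.strip() else 0.0

def evaluate_agent(
    agent: Callable[[str], str],
    eval_set: List[Dict[str, str]],
    metrics: Dict[str, Metric],
) -> Dict[str, float]:
    """Run the agent over the eval set and average each metric."""
    totals = {name: 0.0 for name in metrics}
    for example in eval_set:
        prediction = agent(example["input"])
        for name, metric in metrics.items():
            totals[name] += metric(prediction, example["expected"])
    return {name: total / len(eval_set) for name, total in totals.items()}
```

Tracking these aggregate scores per agent version is what turns iteration from guesswork into a measurable loop.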

Why Language Models Need a Lesson in Education

Stephanie Kirmer, a staff machine learning engineer at DataGrail, adapts her experience as a former professor to address the challenge of evaluating LLMs in production. She proposes a robust methodology using LLM-based evaluators guided by rigorous, human-calibrated rubrics to bring objectivity and scalability to the subjective task of assessing text generation quality.
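
The core idea, giving an LLM judge an explicit graded rubric instead of an open-ended "rate this" prompt, can be sketched as below. The criteria, prompt wording, and `judge` interface are all hypothetical, not Kirmer's actual rubric.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Criterion:
    name: str
    description: str
    max_points: int

# Example human-calibrated rubric (invented for illustration).
RUBRIC: List[Criterion] = [
    Criterion("accuracy", "Claims are factually correct and supported.", 4),
    Criterion("completeness", "All parts of the question are addressed.", 3),
    Criterion("clarity", "The answer is well organized and concise.", 3),
]

def build_judge_prompt(question: str, answer: str, rubric: List[Criterion]) -> str:
    """Spell out each criterion and its point range for the LLM judge."""
    lines = [
        "Grade the answer against each criterion. Return one integer per line.",
        f"Question: {question}",
        f"Answer: {answer}",
        "Rubric:",
    ]
    for c in rubric:
        lines.append(f"- {c.name} (0-{c.max_points}): {c.description}")
    return "\n".join(lines)

def score(question: str, answer: str, judge: Callable[[str], List[int]],
          rubric: List[Criterion] = RUBRIC) -> float:
    """Normalize the judge's per-criterion points to [0, 1]."""
    points = judge(build_judge_prompt(question, answer, rubric))
    return sum(points) / sum(c.max_points for c in rubric)
```

In practice `judge` would wrap an LLM call; fixing the criteria and point ranges up front is what makes the resulting scores comparable across answers and across judge versions.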

EDD: The Science of Improving AI Agents // Shahul Elavakkattil Shereef // Agents in Production 2025

This talk introduces Eval-Driven Development (EDD) as a scientific alternative to 'vibe-based' iteration for improving AI agents. It covers quantitative evaluation (choosing strong end-to-end metrics, aligning LLM judges) and qualitative evaluation (using error and attribution analysis to debug failures), providing a structured framework for consistent agent improvement.

When Agents Hire Their Own Team: Inside Hypermode’s Concierge // Ryan Fox-Tyler

Ryan Fox-Tyler from Hypermode explains their philosophy of empowering AI agents to design and deploy other agents. He introduces Concierge, an agent that builds other agents, and details the underlying actor-based runtime built for scalability, fault tolerance, and efficient, event-driven execution of thousands of parallel agent instances.
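
The actor model the talk describes can be illustrated generically: each agent instance is an actor with a private mailbox, processing one message at a time, so many instances run in parallel without shared-state locking. This asyncio sketch is an assumption-laden illustration of the pattern, not Hypermode's actual runtime.

```python
import asyncio

class AgentActor:
    """One agent instance: private state plus a mailbox of events."""

    def __init__(self, name: str):
        self.name = name
        self.mailbox: asyncio.Queue = asyncio.Queue()
        self.handled: list = []

    async def run(self) -> None:
        """Event-driven loop: idle (consuming no CPU) until a message arrives."""
        while True:
            message = await self.mailbox.get()
            if message is None:  # poison pill shuts the actor down
                return
            self.handled.append(message)  # stand-in for real agent work

async def demo() -> list:
    actor = AgentActor("agent-1")
    task = asyncio.create_task(actor.run())
    for msg in ("plan", "act", "report"):
        await actor.mailbox.put(msg)
    await actor.mailbox.put(None)
    await task
    return actor.handled
```

Because actors only ever touch their own state and communicate through mailboxes, a scheduler can fan thousands of them across workers and restart a failed one from its last message without affecting the rest, which is the fault-tolerance property the talk highlights.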

The Truth About LLM Training

Paul van der Boor and Zulkuf Genc from Prosus discuss the practical realities of deploying AI agents in production. They cover their in-house evaluation framework, strategies for navigating the GPU market, the importance of fine-tuning over building from scratch, and how they use AI to analyze usage patterns in a privacy-preserving manner.