Observability

AI Agents: Transforming Anomaly Detection & Resolution

AI Agents: Transforming Anomaly Detection & Resolution

Martin Keen explores how agentic AI can significantly reduce IT downtime and Mean Time To Repair (MTTR) by moving beyond naive data dumps and embracing context-aware analysis. The key lies in using topology-aware correlation to curate relevant data for an AI agent, which can then systematically identify the root cause, provide explainable insights, and generate actionable remediation steps, ultimately augmenting human SREs rather than replacing them.

Evaluation-Driven Development with MLflow 3.0

Evaluation-Driven Development with MLflow 3.0

Yuki Watanabe from Databricks introduces Evaluation-Driven Development (EDD) as a critical methodology for building production-ready AI agents. This talk explores the five pillars of EDD and demonstrates how MLflow 3.0's new features—including one-line tracing, automated evaluation, human-in-the-loop feedback, and monitoring—provide a comprehensive toolkit to ensure agent quality and reliability.

Streamline evaluation, monitoring, optimization of AI data flywheel with NVIDIA and Weights & Biases

Streamline evaluation, monitoring, optimization of AI data flywheel with NVIDIA and Weights & Biases

A walkthrough of the NVIDIA Data Flywheel Blueprint, demonstrating how to use production data and Weights & Biases to systematically fine-tune AI agents. This process enhances model accuracy and efficiency by creating a continuous improvement cycle, moving beyond the limitations of prompt engineering.

The Hidden Bottlenecks Slowing Down AI Agents

The Hidden Bottlenecks Slowing Down AI Agents

Paul van der Boor and Bruce Martens from Prosus discuss the real bottlenecks in AI agent development, arguing that the primary challenges are not tools, but rather evaluation, data quality, and feedback loops. They detail their 'buy-first' philosophy, the practical reasons they often build in-house, and how new coding agents like Devon and Cursor are changing their development workflows.

MLflow 3.0: The Future of AI Agents

MLflow 3.0: The Future of AI Agents

Eric Peter from Databricks outlines the evolution from the traditional MLOps lifecycle to the more complex Agent Ops lifecycle. He details the five essential components of a successful agent development platform and introduces MLflow 3.0, a new release designed to provide a comprehensive, open-standard solution for building, evaluating, and deploying AI agents.

LLMOps for eval-driven development at scale

LLMOps for eval-driven development at scale

Mercari's engineering team shares their practical, evaluation-centric approach to LLMOps. Learn how they leverage tiered evaluations, strategic tooling for observability, and rapid iteration to productionize LLM features for over 23 million users, emphasizing that good 'evals' are often more critical than model fine-tuning or RAG.