Observability

AI traces are worth a thousand logs

An exploration of how a single structured trace, built on OpenTelemetry standards, offers a better way to debug, test, and understand AI agent behavior than traditional logging. Learn how programmatic access to traces enables robust evaluation and the creation of golden datasets for building more reliable autonomous systems.
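A minimal sketch of the core idea: a trace is a nested tree of spans that can be queried programmatically, unlike interleaved log lines. The span model below is loosely inspired by the OpenTelemetry data model (named spans, attributes, parent/child nesting) but is not the real SDK; all names and values are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical minimal span model, loosely modeled on OpenTelemetry's
# (named spans with attributes and child spans); not the real SDK.
@dataclass
class Span:
    name: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def find(self, name):
        """Walk the span tree, yielding every span with a matching name."""
        if self.name == name:
            yield self
        for child in self.children:
            yield from child.find(name)

# One agent run captured as a single nested trace rather than scattered logs.
trace = Span("agent_run", {"user_query": "refund status"}, [
    Span("llm_call", {"model": "gpt-4o", "tokens": 812}),
    Span("tool_call", {"tool": "order_lookup", "status": "ok"}, [
        Span("llm_call", {"model": "gpt-4o", "tokens": 214}),
    ]),
])

# Programmatic access: aggregate over the whole run for evaluation,
# or snapshot the trace into a golden dataset for regression tests.
total_tokens = sum(s.attributes["tokens"] for s in trace.find("llm_call"))
print(total_tokens)  # 1026
```

Because the run is one tree, an evaluation harness can assert on structure (which tools were called, in what order, at what cost) instead of regex-matching log text.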

AI Agents: Transforming Anomaly Detection & Resolution

Martin Keen explores how agentic AI can significantly reduce IT downtime and Mean Time To Repair (MTTR) by moving beyond naive data dumps and embracing context-aware analysis. The key lies in using topology-aware correlation to curate relevant data for an AI agent, which can then systematically identify the root cause, provide explainable insights, and generate actionable remediation steps, ultimately augmenting human SREs rather than replacing them.
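The topology-aware curation step can be sketched with a toy dependency graph: rather than dumping every alert into the agent's context, keep only alerts from the failing service and its upstream dependencies. Everything below (service names, alert shapes) is hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical service dependency graph: service -> services it calls.
dependencies = {
    "checkout": ["payments", "inventory"],
    "payments": ["postgres"],
    "inventory": ["postgres"],
    "search": ["elasticsearch"],
}

# Raw alert stream (a naive data dump would hand all of these to the agent).
alerts = [
    {"service": "checkout", "msg": "p99 latency spike"},
    {"service": "postgres", "msg": "connection pool exhausted"},
    {"service": "search", "msg": "index lag"},  # unrelated noise
]

def upstream(service, deps):
    """Return the service plus everything reachable via its dependencies."""
    seen, stack = set(), [service]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(deps.get(node, []))
    return seen

# Curate: only alerts inside the failing service's dependency neighbourhood.
scope = upstream("checkout", dependencies)
relevant = [a for a in alerts if a["service"] in scope]
print(sorted(a["service"] for a in relevant))  # ['checkout', 'postgres']
```

The unrelated "search" alert is filtered out before the agent ever sees it, which is the context-curation step the talk argues makes root-cause analysis tractable.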

Evaluation-Driven Development with MLflow 3.0

Yuki Watanabe from Databricks introduces Evaluation-Driven Development (EDD) as a critical methodology for building production-ready AI agents. This talk explores the five pillars of EDD and demonstrates how MLflow 3.0's new features—including one-line tracing, automated evaluation, human-in-the-loop feedback, and monitoring—provide a comprehensive toolkit to ensure agent quality and reliability.

Streamline evaluation, monitoring, optimization of AI data flywheel with NVIDIA and Weights & Biases

A walkthrough of the NVIDIA Data Flywheel Blueprint, demonstrating how to use production data and Weights & Biases to systematically fine-tune AI agents. This process enhances model accuracy and efficiency by creating a continuous improvement cycle, moving beyond the limitations of prompt engineering.

The Hidden Bottlenecks Slowing Down AI Agents

Paul van der Boor and Bruce Martens from Prosus discuss the real bottlenecks in AI agent development, arguing that the primary challenges are not tools but evaluation, data quality, and feedback loops. They detail their 'buy-first' philosophy, the practical reasons they often build in-house, and how new coding agents like Devin and Cursor are changing their development workflows.

MLflow 3.0: The Future of AI Agents

Eric Peter from Databricks outlines the evolution from the traditional MLOps lifecycle to the more complex Agent Ops lifecycle. He details the five essential components of a successful agent development platform and introduces MLflow 3.0, a new release designed to provide a comprehensive, open-standard solution for building, evaluating, and deploying AI agents.