Observability

Evaluating AI Agents: Why It Matters and How We Do It

Annie Condon and Jeff Groom from Acre Security detail their practical approach to robustly evaluating non-deterministic AI agents. They share their philosophy that evaluations are critical for quality, introduce their "X-ray machine" analogy for observability, and walk through their evaluation stack, including versioning strategies and the use of tools like Logfire for tracing and Confident AI's DeepEval for systematic metric tracking.
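The core idea of evaluating a non-deterministic agent can be sketched as a repeated-run pass rate tracked per prompt version. This is a minimal illustration, not Acre Security's actual stack: the fake `agent` function and the exact-match metric stand in for a real model call and for a framework like DeepEval.

```python
# Minimal sketch: score a non-deterministic agent by pass rate over
# repeated runs. The agent below is a stand-in that varies its answer
# from run to run (simulated via the run index); real setups would call
# a model and use richer metrics (e.g. from DeepEval).

def agent(question: str, run: int) -> str:
    """Stand-in for a non-deterministic agent call."""
    return "4" if run % 3 else "four"   # every third run phrases it differently

def pass_rate(question: str, expected: str, runs: int = 10) -> float:
    """Fraction of runs whose answer exactly matches the expected string."""
    hits = sum(agent(question, run) == expected for run in range(runs))
    return hits / runs

rate = pass_rate("What is 2 + 2?", expected="4")
print(f"v1.0 exact-match pass rate: {rate:.0%}")  # 60%
```

Logging this rate against a version label is what makes regressions visible when prompts or models change.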

Building Multi-Player AI Systems (and why it’s SO hard)

MeshAgent introduces a multiplayer AI paradigm, shifting from single-user systems to collaborative "Rooms" where teams of humans and agents can work together with shared context. This talk explores the platform's architecture, developer tools, and its approach to solving real-world collaborative tasks.

Reliability Engineering Mindset • Alex Ewerlöf & Charity Majors • GOTO 2025

Alex Ewerlöf, author of "Reliability Engineering Mindset," discusses the significant gap between Google's idealized SRE practices and the resource-constrained reality of most companies. The conversation focuses on making Service Level Objectives (SLOs) practical by tying Service Level Indicators (SLIs) directly to business impact, using them as a data-driven communication tool to negotiate reliability costs, and moving from a "best practice" to a "fit practice" mindset.
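The SLO-as-negotiation-tool idea above rests on simple arithmetic: an SLO target implies an error budget, and burn against that budget is the data-driven number you bring to the cost conversation. A minimal sketch, assuming an availability SLI (good events / total events); the 99.9% target and request counts are illustrative, not from the talk.

```python
# Sketch: turning an SLO target into an error budget for an
# availability SLI. All numbers below are invented examples.

def error_budget_remaining(slo_target: float, good: int, total: int) -> float:
    """Fraction of the error budget still unspent (1.0 = untouched, < 0 = blown)."""
    allowed_failures = (1.0 - slo_target) * total   # budget, in failed requests
    actual_failures = total - good
    if allowed_failures == 0:
        return 0.0 if actual_failures else 1.0      # a 100% SLO has no budget
    return 1.0 - actual_failures / allowed_failures

# A month with 10M requests at a 99.9% SLO allows 10,000 failures;
# 4,000 observed failures leaves 60% of the budget.
remaining = error_budget_remaining(0.999, good=9_996_000, total=10_000_000)
print(f"{remaining:.0%}")  # 60%
```

The point of the number is communication: "we have 60% of the budget left" is something product and engineering can negotiate over, unlike a raw latency graph.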

From Spikes to Stories: AI-Augmented Troubleshooting in the Network Wild // Shraddha Yeole

Shraddha Yeole from Cisco ThousandEyes explains how they are transforming network observability by moving from complex dashboards to AI-augmented storytelling. The session details their use of an LLM-powered agent to interpret vast telemetry data, accelerate fault isolation, and improve MTTR, covering the technical architecture, advanced prompt engineering techniques, evaluation strategies, and key challenges.

AI traces are worth a thousand logs

An exploration of how a single structured trace, built on OpenTelemetry standards, offers a superior method for debugging, testing, and understanding AI agent behavior compared to traditional logging. Learn how programmatic access to traces enables robust evaluation and the creation of golden datasets for building more reliable autonomous systems.
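The "programmatic access" point can be made concrete with a toy model of a trace. The `Span` shape below only loosely mirrors OpenTelemetry's (name, attributes, children); the agent steps and the checks are invented for illustration, not taken from the talk.

```python
# Sketch: an agent run captured as one structured trace tree instead of
# loose log lines. Span loosely mirrors an OpenTelemetry span; the run
# contents below are invented examples.
from dataclasses import dataclass, field

@dataclass
class Span:
    name: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def walk(self):
        """Yield this span and all descendants, depth-first."""
        yield self
        for child in self.children:
            yield from child.walk()

# One agent run as a single trace tree.
trace = Span("agent.run", {"input": "summarize ticket"}, [
    Span("llm.call", {"model": "gpt-x", "tokens": 512}),
    Span("tool.call", {"tool": "search", "status": "ok"}),
    Span("llm.call", {"model": "gpt-x", "tokens": 301}),
])

# Programmatic access: aggregate usage and assert invariants across the
# whole run -- the kind of check that feeds evaluations and golden datasets.
total_tokens = sum(s.attributes.get("tokens", 0) for s in trace.walk())
failed_tools = [s for s in trace.walk()
                if s.name == "tool.call" and s.attributes.get("status") != "ok"]
print(total_tokens, len(failed_tools))  # 813 0
```

Because the whole run is one tree, an assertion like "no failed tool calls" covers every step at once, which is hard to express over interleaved log lines.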

AI Agents: Transforming Anomaly Detection & Resolution

Martin Keen explores how agentic AI can significantly reduce IT downtime and Mean Time To Repair (MTTR) by moving beyond naive data dumps toward context-aware analysis. The key is topology-aware correlation, which curates relevant data for an AI agent; the agent can then systematically identify the root cause, provide explainable insights, and generate actionable remediation steps, augmenting human SREs rather than replacing them.
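The topology-aware curation step can be sketched as a graph walk: instead of handing the agent every alert in the estate, keep only alerts on services within a few hops of the failing one in the dependency graph. The topology, alert text, and hop limit below are all invented for illustration.

```python
# Sketch: topology-aware alert curation via BFS over a service
# dependency graph. Only alerts within max_hops of the failing node are
# kept as context for the AI agent. All names and alerts are examples.
from collections import deque

def alerts_near(topology: dict, alerts: dict, source: str, max_hops: int = 1):
    """Return alerts on nodes within max_hops of `source`."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue                       # don't expand past the hop limit
        for neighbor in topology.get(node, []):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return {node: alerts[node] for node in hops if node in alerts}

topology = {
    "checkout": ["payments"],
    "payments": ["db-primary"],
    "reco": ["cache"],                     # separate subgraph
}
alerts = {
    "payments": "latency p99 > 2s",
    "db-primary": "connection pool exhausted",
    "cache": "evictions spike",            # unrelated noise elsewhere
}
# Curate context around the failing "checkout" service: the cache alert
# is dropped because it is not topologically connected to checkout.
curated = alerts_near(topology, alerts, "checkout", max_hops=2)
print(sorted(curated))  # ['db-primary', 'payments']
```

Shrinking the context this way is what lets the agent reason about a handful of plausibly related signals instead of drowning in the full alert stream.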