Observability

May 12, 2026

Dark Factory: How OpenClaw Ships Faster Than You Can Read the Diff — Vincent Koc

Vincent Koc argues that static benchmarks are failing in the era of adaptive AI. He proposes a shift from static testing to 'malleable evals,' where agents self-optimize and curate their own test suites based on user intent and production data, treating evaluation as a living, evolving system.

May 12, 2026

State of Serverless DevEx & Observability • Jones Zachariah Noel N • GOTO 2025

Explore the evolution of AWS Lambda and the serverless ecosystem over the last decade, focusing on the two core pillars that have revolutionized the developer journey: Developer Experience (DevEx) and Observability. This session covers the shift from console-based workflows to modern IDE-integrated development, the role of tools like Lambda Layers and Extensions, and the importance of the broader serverless landscape, including AWS Step Functions.

May 07, 2026

Everything You Need To Know About Agent Observability — Danny Gollapalli and Ben Hylak, Raindrop

Agent failures are unlike traditional software failures. This workshop provides a practical framework for monitoring production agents, moving beyond evals to real-world observability by using explicit signals (errors, latency) and implicit signals (user frustration, refusals, self-diagnostics) to catch regressions and understand agent behavior.

May 03, 2026

Context Is the New Code — Patrick Debois, Tessl

Patrick Debois argues that as AI coding agents become more capable, the context that drives them (prompts, rules, and memory) needs its own engineering discipline, much like the one we apply to code. He introduces the Context Development Lifecycle (Generate, Evaluate, Distribute, and Observe) to make context a shared, repeatable, and improvable part of software delivery. The result is a flywheel effect: better context leads to better agent output and to continuous improvement.

May 02, 2026

Human-in-the-Loop Automation with n8n — Liam McGarrigle

Liam McGarrigle demonstrates how to build secure, observable, and controllable AI agents in n8n. The workshop walks through a human-in-the-loop workflow for managing Gmail and Google Calendar, focusing on n8n's visual system for configuring tools, on prompting strategies, and on the approval steps needed to prevent unintended actions.

Apr 28, 2026

Why building eval platforms is hard — Phil Hetzel, Braintrust

An evaluation platform is more than a simple test runner; it's a complex system for creating shared definitions of quality. This talk explores the evolution of eval platforms from basic spreadsheets to sophisticated, integrated systems, highlighting the hidden data and systems engineering challenges involved in making them credible, scalable, and usable for building trustworthy AI agents.