Testing

Evals Aren't Useful? Really?

A deep dive into the critical importance of robust evaluation for building reliable AI agents. It covers bootstrapping evaluation sets, advanced testing techniques such as multi-turn simulations and red teaming, and the necessity of integrating traditional software engineering and MLOps practices into the agent development lifecycle.
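
To make the multi-turn simulation idea concrete, here is a minimal sketch of a conversation-driving harness in Python; agent_reply, simulated_user, and the stopping rule are hypothetical placeholders, not something prescribed in the talk.

```python
# Minimal multi-turn simulation harness (illustrative sketch only).
# `agent_reply` and `simulated_user` are hypothetical callables: the agent
# under test and an LLM-driven user simulator, respectively.

def run_simulation(agent_reply, simulated_user, opening_message, max_turns=5):
    """Drive a simulated conversation and return the full transcript."""
    transcript = [{"role": "user", "content": opening_message}]
    for _ in range(max_turns):
        answer = agent_reply(transcript)          # agent sees the full history
        transcript.append({"role": "assistant", "content": answer})
        follow_up = simulated_user(transcript)    # simulator plays the user
        if follow_up is None:                     # simulator chose to end the chat
            break
        transcript.append({"role": "user", "content": follow_up})
    return transcript

# Reviewed and labeled transcripts like these are one way to bootstrap an
# evaluation set: keep the good runs and replay them as regression cases.
```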

Evaluating AI Agents: Why It Matters and How We Do It

Annie Condon and Jeff Groom from Acre Security detail their practical approach to robustly evaluating non-deterministic AI agents. They share their philosophy that evaluations are critical for quality, introduce their "X-ray machine" analogy for observability, and walk through their evaluation stack, including versioning strategies and the use of tools like Logfire for tracing and Confident AI (DeepEval) for systematic metric tracking.
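
For a rough sense of how such a stack fits together, the sketch below wraps one agent call in a Logfire span and scores the answer with DeepEval's answer-relevancy metric; the ask_agent function, the example question, and the 0.7 threshold are illustrative assumptions rather than Acre Security's actual pipeline, and running it requires configured Logfire and LLM-judge credentials.

```python
# Illustrative sketch: one traced agent call, scored with a DeepEval metric.
import logfire
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

logfire.configure()  # sends spans to a configured Logfire project

def ask_agent(question: str) -> str:
    # Placeholder for the real agent invocation.
    return "You can reset a badge from the admin console under Access > Badges."

question = "How do I reset a badge?"
with logfire.span("agent.answer"):  # trace the agent call
    answer = ask_agent(question)

# Score the run with an LLM-judged relevancy metric and record the result.
evaluate(
    test_cases=[LLMTestCase(input=question, actual_output=answer)],
    metrics=[AnswerRelevancyMetric(threshold=0.7)],
)
```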

AI traces are worth a thousand logs

An exploration of how a single structured trace, based on OpenTelemetry standards, offers a superior method for debugging, testing, and understanding AI agent behavior compared to traditional logging. Learn how programmatic access to traces enables robust evaluation and the creation of golden datasets for building more reliable autonomous systems.
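
As a small illustration, the sketch below emits one nested agent trace with the OpenTelemetry Python SDK and prints it to the console; the span names and attributes are assumptions chosen for the example, not a schema from the article.

```python
# Sketch: one structured agent run emitted as a nested OpenTelemetry trace.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("agent")

with tracer.start_as_current_span("agent.run") as run:
    run.set_attribute("agent.input", "Summarize yesterday's incidents")
    with tracer.start_as_current_span("agent.tool_call") as tool:
        tool.set_attribute("tool.name", "search_logs")
        tool.set_attribute("tool.result_count", 3)
    run.set_attribute("agent.output", "3 incidents found; none critical")

# Exported spans can be read back programmatically to score a run or to
# promote a known-good trace into a golden dataset.
```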

Iterating on Your AI Evals // Mariana Prazeres // Agents in Production 2025

Moving an AI agent from a promising demo to a reliable product is challenging. This talk presents a startup-friendly, iterative process for building robust evaluation frameworks, emphasizing that you must iterate on the evaluations themselves—the metrics and the data—not just the prompts and models. It outlines a practical "crawl, walk, run" approach, starting with simple heuristics and scaling to an advanced system with automated checks and human-in-the-loop validation.
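
The "crawl" stage can be as simple as a handful of deterministic checks over saved agent outputs; the sketch below shows what that might look like, with the rules and example outputs invented for illustration.

```python
# Sketch of "crawl"-stage heuristics: cheap deterministic checks over saved
# agent outputs, run before investing in model-graded metrics.
import re

def heuristic_checks(output: str) -> dict:
    return {
        "non_empty": bool(output.strip()),
        "no_apology_loop": output.lower().count("sorry") <= 1,
        "no_leaked_prompt": "system prompt" not in output.lower(),
        "has_citation": bool(re.search(r"\[\d+\]", output)),
    }

saved_outputs = [
    "According to the runbook [1], restart the ingestion worker first.",
    "Sorry, sorry, I can't help with that.",
]

for output in saved_outputs:
    results = heuristic_checks(output)
    verdict = "PASS" if all(results.values()) else "FAIL"
    print(f"{verdict}: {results}")
```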

Reading Code Effectively: An Overlooked Developer Skill • Marit van Dijk & Hannes Lowette

Marit van Dijk and Hannes Lowette discuss why reading code is a critical, yet underdeveloped, skill for software developers. They explore research-backed strategies like structured code reading clubs, leveraging modern IDEs and AI assistants to comprehend complex codebases, and the importance of empathy in code reviews. The conversation emphasizes using tests as documentation and writing clear commit messages to improve collaboration and long-term maintainability.

Evals Are Not Unit Tests — Ido Pesok, Vercel v0

Ido Pesok from Vercel explains why LLM-based applications often fail in production despite successful demos, and presents a systematic framework for building reliable AI systems using application-layer evaluations ("evals").
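
One way to picture the distinction: a unit test asserts a single exact output, while an eval scores a dataset of cases and gates on an aggregate threshold. The sketch below illustrates that pattern with a toy grader; the dataset, grading rule, and 90% bar are assumptions for illustration, not Vercel's actual framework.

```python
# Illustrative eval-vs-unit-test sketch: score a dataset and gate on a
# pass rate instead of asserting one exact string.

def grade(case: dict, output: str) -> float:
    """Toy grader: fraction of required phrases present in the output."""
    required = case["must_include"]
    return sum(phrase in output for phrase in required) / len(required)

dataset = [
    {"prompt": "Build a login form", "must_include": ["<form", "password"]},
    {"prompt": "Add a dark mode toggle", "must_include": ["toggle", "dark"]},
]

def generate(prompt: str) -> str:
    # Stand-in for the real application-layer generation call.
    return "<form><input type='password'></form> wired to a dark mode toggle"

scores = [grade(case, generate(case["prompt"])) for case in dataset]
pass_rate = sum(score == 1.0 for score in scores) / len(scores)
assert pass_rate >= 0.9, f"Eval regression: pass rate {pass_rate:.0%}"
print(f"Pass rate: {pass_rate:.0%}")
```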