System design

Beyond the Gold Standard: Evaluating and Trusting Agents in the Wild // Sanjana Sharma

A deep dive into the challenges of deploying AI agents in production, arguing that reliability stems not from model intelligence but from a "system-first" approach. The talk introduces a new architecture that separates the LLM's reasoning from a versioned, auditable "Context Layer" containing business logic and expert knowledge, which is continuously updated through a "Living Ground Truth" loop driven by expert feedback.

Creating Momentum with The Value Flywheel Effect • David Anderson • GOTO 2025

David Anderson explains his "Value Flywheel Effect" framework, a model for continuous cloud modernization that joins business and technology strategy. He details how creating psychological safety, adopting a robust serverless-first technology strategy, and focusing on system design over code together build the momentum and shared context needed to harness future technologies like AI effectively.

This Startup Beat Gemini 3 on ARC-AGI — at Half the Cost

Poetiq, a startup founded by ex-DeepMind researchers, has significantly advanced performance on the ARC-AGI benchmark by applying a recursive self-improvement system to Gemini 3. Co-founder Ian Fischer discusses how their approach of automating prompt and system engineering provides a substantial performance boost without needing access to model weights, and explores its potential as a path toward AGI.

Multi-Agent Systems for the Misinformation Lifecycle

A detailed overview of a modular, five-agent system designed to combat the entire lifecycle of digital misinformation. Based on an ICWSM research paper, this practitioner's guide details the roles of the Classifier, Indexer, Extractor, Corrector, and Verifier agents. The system emphasizes scalability, explainability, and high precision, moving beyond the limitations of single-LLM solutions. The talk covers the complete blueprint, from agent coordination and MLOps to holistic evaluation and optimization strategies for production environments.

The Infinite Software Crisis – Jake Nations, Netflix

In an era of the "Infinite Software Crisis" where AI-generated code outpaces human understanding, this talk argues for choosing "simple" design over "easy" generation. The speaker presents a three-phase methodology—Research, Planning, and Implementation—that forces developers to think critically before generating code. This approach leverages AI for mechanical tasks while ensuring that human judgment, context, and a deep understanding of the system remain the core of the software development process, turning human insight into the ultimate competitive advantage.

Designing safe digital systems for the humanitarian sector

Carmela Troncoso from EPFL discusses her collaboration with the International Committee of the Red Cross (ICRC) to digitalize humanitarian aid distribution. She advocates for a paradigm shift from data minimization to "purpose limitation," designing systems that are structurally incapable of being misused, even if the data is accessed. The talk details a practical, low-cost, and connectivity-resilient system built on this principle, using smart cards and cryptographic techniques to protect vulnerable aid recipients while meeting the operational needs of the ICRC.