Ground truth

Beyond the Gold Standard: Evaluating and Trusting Agents in the Wild // Sanjana Sharma

A deep dive into the challenges of deploying AI agents in production, arguing that reliability stems not from model intelligence but from a "system-first" approach. The talk introduces a new architecture that separates the LLM's reasoning from a versioned, auditable "Context Layer" containing business logic and expert knowledge, which is continuously updated through a "Living Ground Truth" loop driven by expert feedback.
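
To make the separation concrete, here is a minimal sketch of what a versioned, auditable context store kept apart from the model might look like. The abstract does not describe the implementation, so every name here (`ContextVersion`, `ContextLayer`, `apply_feedback`, `build_prompt`, the `refund_window` rule) is a hypothetical chosen for illustration, and no real LLM is called.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ContextVersion:
    """One immutable snapshot of business rules (hypothetical structure)."""
    version: int
    rules: Tuple[Tuple[str, str], ...]  # (rule name, rule text) pairs
    note: str                           # audit note: why this version exists


class ContextLayer:
    """Append-only store of context versions, kept separate from any model."""

    def __init__(self, initial_rules: Dict[str, str]) -> None:
        first = ContextVersion(1, tuple(sorted(initial_rules.items())), "initial")
        self._history: List[ContextVersion] = [first]

    @property
    def current(self) -> ContextVersion:
        return self._history[-1]

    def history(self) -> List[ContextVersion]:
        # Return a copy so callers cannot rewrite the audit trail.
        return list(self._history)

    def apply_feedback(self, rule: str, corrected_text: str, expert: str) -> ContextVersion:
        """Record an expert correction as a new version; old versions are kept."""
        rules = dict(self.current.rules)
        rules[rule] = corrected_text
        new = ContextVersion(
            self.current.version + 1,
            tuple(sorted(rules.items())),
            f"{expert}: updated {rule}",
        )
        self._history.append(new)
        return new


def build_prompt(layer: ContextLayer, question: str) -> str:
    """Assemble model input from the current context version (no model call here)."""
    v = layer.current
    lines = [f"[context v{v.version}]"]
    lines += [f"- {name}: {text}" for name, text in v.rules]
    lines.append(f"Question: {question}")
    return "\n".join(lines)
```

In this sketch, an expert correction never edits a rule in place: it appends a new version with a note recording who changed what, so any past answer can be traced back to the context version it was built from. That is one plausible reading of "versioned, auditable"; the talk's actual design may differ.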