Production ML

It's 2026, and We're Still Talking Evals

Maggie Konstanty, AI Product Manager at Prosus, gives a candid look at the realities of LLM evaluation in production. She argues that standard metrics like accuracy are misleading and advocates a culture of continuous, goal-oriented evaluation focused on deep failure analysis and real user behavior. She also asserts that mature teams inevitably build custom tooling to meet their specific needs.

Agents as Search Engineers // Santoshkalyan Rayadhurgam

Large language models are transforming search from a static, stateless process into a dynamic, agent-based reasoning system. This talk explores practical patterns for building and deploying these 'agentic search' systems at scale, such as query rewriting, hybrid retrieval, and agent-based reranking. It covers the architectural principles, the production challenges, and a future trajectory in which search itself may dissolve into understanding.

LLMOps for eval-driven development at scale

Mercari's engineering team shares their practical, evaluation-centric approach to LLMOps. Learn how they leverage tiered evaluations, strategic tooling for observability, and rapid iteration to productionize LLM features for over 23 million users, emphasizing that good 'evals' are often more critical than model fine-tuning or RAG.