Agentic systems

Build Hour: Agentic Tool Calling

A deep dive into building agentic systems using OpenAI's latest APIs. The session covers the core concept of 'agentic tool calling' (reasoning + tools), outlines a four-part framework (Agent, Infrastructure, Product, Evaluation) for designing long-horizon tasks, and provides a hands-on demonstration of building a non-blocking task processing system with a real-time progress UI.
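The session itself is a live demo; as a rough, non-authoritative sketch of the "reasoning + tools" loop it describes, here is a minimal tool-calling example against the OpenAI Chat Completions API. The model name, the get_order_status tool, and the single-turn loop are illustrative assumptions, not the session's code.

```python
# Minimal sketch of the reasoning + tools loop (illustrative names only).
# Assumes OPENAI_API_KEY is set in the environment.
import json
from openai import OpenAI

client = OpenAI()

def get_order_status(order_id: str) -> dict:
    # Hypothetical local tool the model can call.
    return {"order_id": order_id, "status": "shipped"}

tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the status of an order by its ID.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

messages = [{"role": "user", "content": "Where is order 1234?"}]
response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
message = response.choices[0].message

if message.tool_calls:
    messages.append(message)  # keep the assistant's tool-call turn in context
    for call in message.tool_calls:
        args = json.loads(call.function.arguments)
        result = get_order_status(**args)
        messages.append({"role": "tool", "tool_call_id": call.id,
                         "content": json.dumps(result)})
    # Let the model reason over the tool output and answer the user.
    final = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
    print(final.choices[0].message.content)
```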

Too much lock-in for too little gain: agent frameworks are a dead-end // Valliappa Lakshmanan

Lak Lakshmanan presents a robust architecture for building production-quality, framework-agnostic agentic systems. He advocates simple, composable GenAI patterns, off-the-shelf governance tools, and human-in-the-loop design to create continuously learning systems that avoid vendor lock-in.
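As a toy illustration of the "simple, composable patterns instead of a framework" argument (not Lakshmanan's code; the function names and the stubbed retrieval step are assumptions), each pipeline stage below is a plain Python function that can be swapped, wrapped for governance, or routed to a human reviewer without touching the rest.

```python
from openai import OpenAI

client = OpenAI()

def retrieve(query: str) -> list[str]:
    # Stub: replace with your own search index or vector store.
    return ["Example context passage relevant to the query."]

def generate(query: str, passages: list[str]) -> str:
    prompt = "Answer using only this context:\n" + "\n".join(passages) + f"\n\nQuestion: {query}"
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def with_review(answer: str) -> str:
    # Human-in-the-loop hook: log the answer, queue it for review, collect feedback.
    return answer

print(with_review(generate("What is in the knowledge base?", retrieve("knowledge base"))))
```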

From Spikes to Stories: AI-Augmented Troubleshooting in the Network Wild // Shraddha Yeole

Shraddha Yeole from Cisco ThousandEyes explains how her team is transforming network observability by moving from complex dashboards to AI-augmented storytelling. The session details their use of an LLM-powered agent to interpret vast telemetry data, accelerate fault isolation, and improve mean time to resolution (MTTR), covering the technical architecture, advanced prompt engineering techniques, evaluation strategies, and key challenges.

Evaluation-Driven Development with MLflow 3.0

Yuki Watanabe from Databricks introduces Evaluation-Driven Development (EDD) as a critical methodology for building production-ready AI agents. This talk explores the five pillars of EDD and demonstrates how MLflow 3.0's new features—including one-line tracing, automated evaluation, human-in-the-loop feedback, and monitoring—provide a comprehensive toolkit to ensure agent quality and reliability.
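As a hedged sketch of what the "one-line tracing" idea can look like in practice (based on MLflow's tracing and autologging docs as I understand them, not the talk's demo code; the experiment name, model, and prompt are assumptions):

```python
import mlflow
from openai import OpenAI

mlflow.openai.autolog()              # the "one line": auto-trace every OpenAI call
mlflow.set_experiment("agent-dev")   # assumed experiment name

@mlflow.trace                        # also record this wrapper as a parent span
def answer(question: str) -> str:
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

print(answer("What is evaluation-driven development?"))
# Inspect inputs, outputs, latency, and token usage in the MLflow UI (`mlflow ui`).
```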

Why Language Models Need a Lesson in Education

Stephanie Kirmer, a staff machine learning engineer at DataGrail, draws on her experience as a former professor to address the challenge of evaluating LLMs in production. She proposes a robust methodology using LLM-based evaluators guided by rigorous, human-calibrated rubrics to bring objectivity and scalability to the subjective task of assessing text generation quality.
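A minimal sketch of the rubric-guided LLM-judge idea (the rubric criteria, score scale, JSON schema, and model below are placeholder assumptions, not Kirmer's actual rubric):

```python
import json
from openai import OpenAI

client = OpenAI()

# Placeholder rubric: in practice this is written and calibrated against human scores.
RUBRIC = """Score the ANSWER from 1-5 on each criterion:
1. Faithfulness: every claim is supported by the provided CONTEXT.
2. Completeness: the answer addresses all parts of the QUESTION.
3. Clarity: the answer is concise and unambiguous.
Return JSON: {"faithfulness": int, "completeness": int, "clarity": int, "rationale": str}."""

def grade(question: str, context: str, answer: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user",
             "content": f"QUESTION:\n{question}\n\nCONTEXT:\n{context}\n\nANSWER:\n{answer}"},
        ],
    )
    return json.loads(resp.choices[0].message.content)

# Calibrate by running grade() over a small human-scored sample and checking
# agreement with the human scores before trusting the judge at scale.
```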

When Agents Hire Their Own Team: Inside Hypermode’s Concierge // Ryan Fox-Tyler

Ryan Fox-Tyler from Hypermode explains their philosophy of empowering AI agents to design and deploy other agents. He introduces Concierge, an agent that builds other agents, and details the underlying actor-based runtime built for scalability, fault tolerance, and efficient, event-driven execution of thousands of parallel agent instances.