MLOps

The Production AI Playbook: Deploying Agents at Enterprise Scale — Sandipan Bhaumik, Databricks

Sandipan Bhaumik presents a five-pillar framework for successfully moving AI systems from demos to production, inspired by a retail bank's failed chatbot PoC. The framework covers defining numerical success (Evaluation), tracing every AI decision (Observability), building robust data pipelines (Data Foundation), managing multiple AI interactions (Multi-agent Orchestration), and ensuring accountability and security (Governance). He illustrates these concepts with a banking chatbot case study, emphasizing continuous evaluation, data quality, and a proactive incident playbook.
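The Evaluation pillar, "defining numerical success", can be illustrated with a small release gate. This is a minimal sketch, not the framework from the talk: the `Metric` class, the `release_gate` function, the metric names, and the thresholds are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """One numeric success criterion (hypothetical structure)."""
    name: str
    value: float              # measured on an evaluation set
    threshold: float          # target agreed before launch
    higher_is_better: bool = True

    def passes(self) -> bool:
        if self.higher_is_better:
            return self.value >= self.threshold
        return self.value <= self.threshold


def release_gate(metrics):
    """Return (ok, names_of_failed_metrics) for a list of Metric objects."""
    failures = [m.name for m in metrics if not m.passes()]
    return (not failures, failures)


# Illustrative numbers only; not taken from the banking case study.
metrics = [
    Metric("answer_accuracy", 0.93, 0.90),
    Metric("p95_latency_seconds", 2.4, 2.0, higher_is_better=False),
]
ok, failures = release_gate(metrics)
print(ok, failures)
```

Running the same gate on every model or prompt change is one simple way to make evaluation continuous rather than a one-off check.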

Power agents with full context of your experiments and traces with W&B MCP server

The W&B Model Context Protocol (MCP) server is a hosted endpoint that lets AI agents query Weights & Biases data, including runs, traces, evaluations, and reports. It provides discovery tools for building queries, automated analysis for comparing experiments and identifying regressions, and integration with IDEs, coding agents, and chat interfaces such as Mistral AI for streamlined ML workflows and on-the-go reporting.

Scaling Meta's Multi-Agent Systems to a Billion Videos

This talk describes Meta's approach to solving modality misalignment and content theft in short-form video with a multi-agent system of smaller, specialized models instead of a single large LLM. It covers the architecture (Perceiver, Retriever, Reasoner), the evaluation stack, and key cost-saving optimizations.

Lobster Trap: OpenClaw in Containers from Local to K8s and Back — Sally Ann O'Malley, Red Hat

This talk presents a container-first methodology for developing, distributing, and managing AI agents. Using a stack of Podman for local development and Kubernetes for scalable deployment, this approach transforms personalized agent setups from messy collections of files into reproducible, secure, and portable container images that can serve as a team-wide baseline. The session covers practical techniques for secrets management, state persistence, and automated setup, highlighted by a real-world example from an Nvidia team using this pattern for model evaluations.

Building AI Agents That Survive Production

Haytham Abuelfutuh, CTO of Union.ai, argues that the key to production-ready AI agents is not preventing failure, but embracing it. He introduces the '3 D's' framework—Dynamic, Durable, and Defended—for building agents that can fail cheaply and recover automatically, grounded in a real-world case study of an agent system indexing over 250,000 products on Flyte.
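The idea of failing cheaply and recovering automatically can be sketched with checkpointed retries. This is an illustration under assumptions, not Flyte's API and not the speaker's implementation: `run_durably`, its parameters, and the JSON checkpoint file are hypothetical. Finished items are saved after each step, so a rerun skips completed work instead of starting over.

```python
import json
from pathlib import Path


def run_durably(items, process, checkpoint_path, max_attempts=3):
    """Process items one at a time, retrying failures and checkpointing results.

    Hypothetical helper: results are keyed by str(item) in a JSON file, so a
    later call with the same checkpoint_path resumes where the last one stopped.
    """
    path = Path(checkpoint_path)
    done = json.loads(path.read_text()) if path.exists() else {}

    for item in items:
        key = str(item)
        if key in done:
            continue  # already processed in an earlier run
        for attempt in range(1, max_attempts + 1):
            try:
                done[key] = process(item)
                break
            except Exception:
                if attempt == max_attempts:
                    # Persist progress before giving up so a rerun is cheap.
                    path.write_text(json.dumps(done))
                    raise
        path.write_text(json.dumps(done))  # checkpoint after each item
    return done
```

A production system would add backoff between attempts and distinguish retryable from permanent errors; both are omitted here to keep the sketch short.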

Lessons from Trillion Token Deployments at Fortune 500s — Alessandro Cappelli, Adaptive ML

According to this talk, 95% of GenAI pilots fail because of feedback integration issues, not deployment challenges. Alessandro Cappelli argues that Reinforcement Learning (RL) provides the only systematic way to incorporate business metrics and production signals to continuously improve models, especially for complex agent-based systems.