LLM

Ship Agents that Ship: A Hands-On Workshop - Kyle Penfound, Jeremy Adams, Dagger

A detailed summary of a workshop on building and deploying production-minded AI coding agents using Dagger. The session covers creating controlled, observable, and test-driven agent workflows and integrating them into CI/CD systems like GitHub Actions for automated, reliable software development.

Strategies for LLM Evals (GuideLLM, lm-eval-harness, OpenAI Evals Workshop) — Taylor Jordan Smith

Traditional benchmarks and leaderboards are insufficient for production AI. This summary details a practical, multi-layered evaluation strategy, moving from foundational system performance to factual accuracy and finally to safety and bias, using open-source tools like GuideLLM, lm-eval-harness, and Promptfoo.

Beyond the Prototype: Using AI to Write High-Quality Code - Josh Albrecht, Imbue

Josh Albrecht, CTO of Imbue, discusses the engineering challenges in building reliable AI coding agents. He introduces Sculptor, an experimental environment designed to build trust in AI-generated code by focusing on preventing and detecting problems through structured workflows, automated testing, and AI-driven analysis, moving beyond simple code generation to create maintainable software.

Devin 2.0 and the Future of SWE - Scott Wu, Cognition

Scott Wu, CEO of Cognition AI, discusses the exponential growth of AI capabilities in software engineering, likening it to a "Moore's Law for AI agents" in which capability doubles every 70 days. He chronicles the evolution of Cognition's AI agent, Devin, from handling repetitive code migrations to autonomously managing entire backlogs, highlighting the key technical challenges and paradigm shifts at each stage.

Winning & Attracting AI Researchers in the Age of $100M Bounties | Babak Hodjat | CTO AI Cognizant

Babak Hodjat, CTO of AI at Cognizant and a co-inventor of the technology behind Siri, discusses the strategic shift from generative AI to autonomous, multi-agent systems. He explores how these agentic systems will redefine enterprise operations, the intense "arms race" for AI talent, and the critical need for a decentralized, secure framework for agent collaboration.

MLflow 3.0: The Future of AI Agents

Eric Peter from Databricks outlines the evolution from the traditional MLOps lifecycle to the more complex Agent Ops lifecycle. He details the five essential components of a successful agent development platform and introduces MLflow 3.0, a new release designed to provide a comprehensive, open-standard solution for building, evaluating, and deploying AI agents.