AI agents

SWE-rebench: Lessons from Evaluating Coding Agents — Ibragim Badertdinov, Nebius

Ibragim Badertdinov of Nebius shares lessons from building and maintaining SWE-rebench, a monthly leaderboard that evaluates coding agents on fresh, real-world software engineering tasks. The talk covers the anatomy of a good benchmark task, the challenge of filtering out noisy or flawed problems, and examples of how advanced agents such as Claude Code "cheat" by exploiting the environment. Finally, it explains how the same pipeline used for evaluation has produced large-scale, high-quality training datasets such as the SWE-rebench dataset, used by frontier AI labs.

The Rise of the Full-Stack Builder and Hyper-Leveraged Generalist with Microsoft CEO Satya Nadella

Microsoft CEO Satya Nadella discusses the future of AI at Microsoft Build, emphasizing an ecosystem approach where every company can create its own "frontier intelligence." He highlights the critical role of private evaluations as a new form of intellectual property, the strategic use of multi-modal harnesses for enterprise, and how autonomous AI agents are reshaping software development and business models. Nadella also shares insights on the societal impact of AI, from data center investments to the potential for AI-driven transformation in education.

GitHub’s Agent Era: 14x Commits, 200M Developers, Copilot’s Next Act — Kyle Daigle

GitHub COO Kyle Daigle discusses the new era of AI agents from the inside. He covers how he uses AI for leadership, the shift from "mega-skills" to "micro-skills," and how GitHub is navigating a 14x growth in commits. The conversation goes deep on the evolution of Copilot, the future of PRs in an agent-driven world, the challenges of scaling, and Microsoft's vision for an ambient AI operating system.

Power agents with full context of your experiments and traces with W&B MCP server

The W&B Model Context Protocol (MCP) server is a hosted endpoint that lets AI agents interact with Weights & Biases data, including runs, traces, evaluations, and reports. It provides discovery tools for building queries, automated analysis for comparing experiments and identifying regressions, and integration with IDEs, coding agents, and chat interfaces like Mistral AI for streamlined ML workflows and on-the-go reporting.

He Raised $70M to Cure Every Disease With AI

Samuel Rodriques, founder of Edison Scientific, shares his journey from physics to building an AI scientist named Kosmos. He discusses how AI agents are already making novel discoveries, including a potential cure for blindness, and are poised to revolutionize drug discovery. The conversation dives into AI's strengths in high-throughput reasoning, the critical bottlenecks in clinical trials, proposed reforms for the US medical system, and whether human scientists will still be needed in an age of hyper-intelligent AI.

The Four Types of Memory Every AI Agent Needs

AI agents utilize four distinct types of memory, analogous to human cognition, to move beyond simple chatbot responses. This summary explores the CoALA framework, detailing working, semantic, procedural, and episodic memory and how they enable agents to learn, recall skills, and leverage past experiences.
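
As a rough illustration of the four memory types named above, here is a minimal Python sketch. The `AgentMemory` class, its field layout, and the `observe` and `finish_task` helpers are hypothetical names chosen for this example; they are not taken from the CoALA framework or from the video, and a real agent would likely back these stores with a vector database or similar.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentMemory:
    """Hypothetical container for the four memory types (illustration only)."""

    # Working memory: short-lived context for the task in progress.
    working: list[str] = field(default_factory=list)
    # Semantic memory: general facts the agent can look up.
    semantic: dict[str, str] = field(default_factory=dict)
    # Procedural memory: named skills the agent knows how to run.
    procedural: dict[str, Callable[[str], str]] = field(default_factory=dict)
    # Episodic memory: records of past tasks and their outcomes.
    episodic: list[str] = field(default_factory=list)

    def observe(self, message: str) -> None:
        """Add a new observation to working memory."""
        self.working.append(message)

    def finish_task(self, outcome: str) -> None:
        """Store the finished task as one episode, then clear working memory."""
        self.episodic.append(" | ".join(self.working) + " -> " + outcome)
        self.working.clear()


memory = AgentMemory()
memory.semantic["capital_of_france"] = "Paris"   # a stored fact
memory.procedural["shout"] = str.upper           # a stored skill
memory.observe("user asked for the capital of France")

# Recall a fact (semantic) and apply a skill (procedural) to it.
answer = memory.procedural["shout"](memory.semantic["capital_of_france"])
memory.finish_task(answer)
```

After `finish_task`, working memory is empty and the exchange survives only as an episode, which is the distinction between short-lived context and remembered experience that the summary describes.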