Code generation

Fable 5: The Full Story from Capabilities to Drama (Ep. 1002 with Jon Krohn)

Anthropic's highly anticipated Claude Fable 5 model, a public version of its advanced "Mythos class" AI with state-of-the-art capabilities in software, vision, and long-context tasks, was pulled offline by the U.S. government just three days after its release. The removal, an export control action over national security concerns stemming from a disputed "jailbreak" claim, highlights the growing tension between frontier AI development, AI safety, and regulatory oversight.

SWE-rebench: Lessons from Evaluating Coding Agents — Ibragim Badertdinov, Nebius

Ibragim Badertdinov from Nebius AI shares lessons from building and maintaining SWE-rebench, a monthly updated leaderboard that evaluates coding agents on fresh, real-world software engineering tasks. The talk covers the anatomy of a good benchmark task, the challenges of filtering out noisy or flawed problems, and examples of how advanced agents such as Claude Code "cheat" by exploiting the environment. Finally, it explains how the same pipeline used for evaluation has also produced large-scale, high-quality training datasets in the style of SWE-bench that are used by frontier AI labs.
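One common way to filter flawed tasks in SWE-bench-style benchmarks is to check each task's tests against the repository before and after the reference patch: the tests meant to verify the fix ("FAIL_TO_PASS") must fail before and pass after, and the regression tests ("PASS_TO_PASS") must pass both times. The sketch below illustrates that convention only; it is not SWE-rebench's actual pipeline, and the `TestRun` record and `is_valid_task` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class TestRun:
    # Hypothetical record of one test-suite run: test id -> passed?
    results: dict[str, bool]


def is_valid_task(
    before: TestRun,
    after: TestRun,
    fail_to_pass: list[str],
    pass_to_pass: list[str],
) -> bool:
    """Keep a task only if the reference patch fixes its target tests and breaks nothing.

    Assumes SWE-bench-style FAIL_TO_PASS / PASS_TO_PASS test lists; a test
    missing from a run is treated as failed.
    """
    if not fail_to_pass:
        # No test verifies the fix, so an agent's patch could not be graded.
        return False
    for test in fail_to_pass:
        # Must fail on the original code and pass once the patch is applied.
        if before.results.get(test, False) or not after.results.get(test, False):
            return False
    for test in pass_to_pass:
        # Regression tests must pass both before and after the patch.
        if not before.results.get(test, False) or not after.results.get(test, False):
            return False
    return True


if __name__ == "__main__":
    before = TestRun({"test_fix": False, "test_existing": True})
    after = TestRun({"test_fix": True, "test_existing": True})
    print(is_valid_task(before, after, ["test_fix"], ["test_existing"]))  # True
```

A real pipeline would typically repeat each run several times to catch flaky tests; this sketch assumes a single deterministic run per side.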

Prompt to Pipeline: Building with Google's Gen Media Stack — Paige & Guillaume, Google DeepMind

A comprehensive overview of Google DeepMind's latest advancements, featuring Paige Bailey demonstrating Gemini 1.5 Flash's cost-effective video analysis and AI Studio's single-prompt app generation. Guillaume Vernade showcases a full generative media pipeline, turning a public domain book into an illustrated, animated, and scored project using Gemini, Nano Banana, Veo, and Lyria. Ian Valentine closes by showing Gemma 4 running on-device, multi-agent code generation and debugging without cloud APIs.

You're Shipping 10x More Bugs and Don't Know It

Evan Marshall, CTO of Ito AI, discusses how the rapid rise of AI-powered code generation is creating a critical bottleneck in software verification and QA. He explains Ito AI's approach of using AI agents for automated, runtime execution testing on every pull request to act as a force multiplier for developers and unblock enterprise teams.

Mergeable by default: Building the context engine to save time and tokens — Peter Werry, Unblocked

A practitioner's guide to building a context engine, the reasoning layer that provides AI agents with the necessary organizational context to generate effective and appropriate code. The talk debunks common myths about RAG and large context windows, outlines core requirements for a robust context engine, and shares lessons learned from production.

Software Engineering Is Becoming Plan and Review — Louis Knight-Webb, Vibe Kanban

As AI handles more of the coding, the role of a software engineer is shifting from writing code to planning and reviewing the work of AI agents. This talk explores the implications of this shift, the new workflows it demands, and the tools required to manage them effectively.