Reinforcement learning

Continual System Prompt Learning for Code Agents – Aparna Dhinakaran, Arize

The talk by Aparna Dhinakaran introduces "system prompt learning" as an efficient alternative to traditional reinforcement learning for improving LLM-based coding agents. An LLM-as-a-judge evaluates the agent's failed attempts and writes English feedback explaining each failure, and that feedback is used to refine the agent's system prompt and rules automatically. The method, demonstrated on Claude Code and Cline, significantly boosts performance on benchmarks like SWE-bench with minimal data, and its results depend heavily on the quality of the evaluation prompts.
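The feedback loop described above can be sketched in a few lines. This is only an illustration of the general shape, not the method from the talk: `learn_system_prompt`, `Feedback`, and the three callables are hypothetical names, and the agent, judge, and reviser are deterministic stand-ins where a real setup would call an LLM.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Feedback:
    passed: bool
    explanation: str  # English explanation of the failure, written by the judge


def learn_system_prompt(
    base_prompt: str,
    tasks: List[str],
    run_agent: Callable[[str, str], str],
    judge: Callable[[str, str], Feedback],
    revise: Callable[[str, List[str]], str],
    rounds: int = 3,
) -> str:
    """Refine a system prompt from judge explanations (hypothetical sketch)."""
    prompt = base_prompt
    for _ in range(rounds):
        explanations = []
        for task in tasks:
            output = run_agent(prompt, task)
            feedback = judge(task, output)
            if not feedback.passed:
                explanations.append(feedback.explanation)
        if not explanations:  # every task passed, nothing left to learn
            break
        prompt = revise(prompt, explanations)
    return prompt


# Toy stand-ins (assumptions): real versions would call an agent and an LLM judge.
def toy_agent(prompt: str, task: str) -> str:
    # Pretend the agent only runs tests when a rule in the prompt tells it to.
    return "patch+tests" if "run the test suite" in prompt.lower() else "patch"


def toy_judge(task: str, output: str) -> Feedback:
    if "tests" in output:
        return Feedback(True, "")
    return Feedback(False, "Always run the test suite before submitting a patch.")


def toy_revise(prompt: str, explanations: List[str]) -> str:
    # Append each distinct explanation as a rule, skipping ones already present.
    rules = [e for e in dict.fromkeys(explanations) if e not in prompt]
    return prompt + "".join(f"\n- {rule}" for rule in rules)


learned = learn_system_prompt(
    "You are a coding agent.",
    ["fix bug A", "fix bug B"],
    toy_agent,
    toy_judge,
    toy_revise,
)
print(learned)
```

In this toy run the first round fails both tasks, the repeated explanation is added once as a rule, and the second round passes, so the loop stops early. No model weights change at any point; only the prompt text is updated.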

Code World Model: Building World Models for Computation – Jacob Kahn, FAIR Meta

Jacob Kahn of FAIR at Meta introduces the Code World Model (CWM), a new paradigm for AI models that learn from program execution rather than just code syntax. By training on detailed execution traces, CWM builds an internal world model of computation that lets it predict a program's behavior. The talk covers CWM's architecture, its highly scalable asynchronous reinforcement learning setup, and applications such as a 'neural debugger' that understands user intent from code structure, along with the potential to approximate answers to undecidable problems like the halting problem.
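To make "execution trace" concrete, here is a minimal sketch that records the line being executed and a snapshot of local variables at each step of a Python function, using the standard `sys.settrace` hook. The trace format CWM is actually trained on is not described in this summary, so the tuple layout and the helper name `trace_execution` are assumptions for illustration.

```python
import sys


def trace_execution(func, *args):
    """Run func(*args) and record (event, relative_line, data) steps.

    Hypothetical helper: "line" steps carry a copy of the locals as they
    are just before that line runs; the "return" step carries the result.
    """
    steps = []
    first_line = func.__code__.co_firstlineno

    def tracer(frame, event, arg):
        if frame.f_code is not func.__code__:
            return None  # do not trace other functions
        rel = frame.f_lineno - first_line
        if event == "line":
            steps.append(("line", rel, dict(frame.f_locals)))
        elif event == "return":
            steps.append(("return", rel, arg))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore the earlier trace hook
    return result, steps


def total(n):
    s = 0
    for i in range(n):
        s += i
    return s


result, steps = trace_execution(total, 3)
for step in steps:
    print(step)
```

A model trained on sequences like these sees how state changes line by line, not only the source text that produced it.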

Why Physical AI Needs a new Data Set | Rerun CEO

Nikolaus West, CEO of Rerun, explains how their data logging and visualization platform, built on an Entity Component System (ECS) inspired by gaming, is unlocking new capabilities in physical AI. He discusses the rapid progress in robot manipulation through imitation learning, the gap between impressive demos and real-world products, and the critical need for better data tooling to handle complex, multi-rate sensor data in robotics and AR/VR.

How We Built a Leading Reasoning Model (Olmo 3)

A comprehensive overview of the entire process behind building Olmo 3 Think, covering the full stack from pre-training architecture and data selection to the detailed post-training recipe involving SFT, DPO, and a deep dive into the advanced infrastructure for scaling Reinforcement Learning (RL). The summary also includes critical reflections on the challenges and nuances of evaluating modern reasoning models.

Efficient Reinforcement Learning – Rhythm Garg & Linden Li, Applied Compute

At Applied Compute, efficient Reinforcement Learning is critical for delivering business value. This talk explores the transition from inefficient synchronous RL to a high-throughput asynchronous 'Pipeline RL' system. The core challenge is managing 'staleness', a side effect of in-flight weight updates: samples may be generated under older weights than the ones being trained, which can destabilize training. The speakers detail their first-principles systems model, based on the Roofline model, which they use to simulate and find the optimal allocation of GPU resources between sampling and training, balancing throughput with algorithmic stability and achieving significant speedups.
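A rough sketch of how a Roofline-style estimate can drive a sampler/trainer split follows. It is not the speakers' model: the hardware numbers, the arithmetic intensities, the common approximations of 2 FLOPs per parameter per sampled token and 6 per trained token, and the function names are all assumptions, and staleness is not modeled at all.

```python
def roofline_flops(peak_flops: float, mem_bandwidth: float, intensity: float) -> float:
    """Attainable FLOP/s = min(peak compute, memory bandwidth * FLOPs per byte)."""
    return min(peak_flops, mem_bandwidth * intensity)


def best_split(total_gpus: int, sample_rate: float, train_rate: float):
    """Return (samplers, trainers, tokens_per_s) maximizing pipeline throughput.

    In steady state the pipeline runs at the rate of its slower stage, so
    each candidate split is scored by the minimum of the two stage rates.
    """
    best = None
    for samplers in range(1, total_gpus):
        trainers = total_gpus - samplers
        rate = min(samplers * sample_rate, trainers * train_rate)
        if best is None or rate > best[2]:
            best = (samplers, trainers, rate)
    return best


# Hypothetical hardware and model figures, for illustration only.
PEAK_FLOPS = 1e15   # FLOP/s per GPU
MEM_BW = 3e12       # bytes/s per GPU
PARAMS = 1e10       # model parameters

# Decoding is assumed memory-bound (low intensity), training compute-bound.
sample_flops = roofline_flops(PEAK_FLOPS, MEM_BW, intensity=50)
train_flops = roofline_flops(PEAK_FLOPS, MEM_BW, intensity=500)

sample_rate = sample_flops / (2 * PARAMS)  # tokens/s per sampling GPU
train_rate = train_flops / (6 * PARAMS)    # tokens/s per training GPU

split = best_split(16, sample_rate, train_rate)
print(split)
```

With these made-up numbers the search puts 11 of 16 GPUs on sampling, because memory-bound decoding yields fewer tokens per GPU than compute-bound training consumes. A fuller model would also cap how far sampling may run ahead of training to bound staleness.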

Efficient Reinforcement Learning – Rhythm Garg & Linden Li, Applied Compute

A deep dive into the challenges and solutions for efficient Reinforcement Learning (RL) in enterprise settings. The talk contrasts synchronous and asynchronous RL, explains the critical trade-off of "staleness" versus stability, and details a first-principles system model used to optimize GPU allocation for maximum throughput.