Reinforcement learning

Efficient Reinforcement Learning – Rhythm Garg & Linden Li, Applied Compute

At Applied Compute, efficient reinforcement learning is critical for delivering business value. This talk explores the transition from inefficient synchronous RL to a high-throughput asynchronous "Pipeline RL" system. The core challenge is managing "staleness", a side effect of in-flight weight updates that can destabilize training. The speakers detail their first-principles systems model, based on the Roofline model, used to simulate and find the optimal allocation of GPU resources between sampling and training, balancing throughput with algorithmic stability and achieving significant speedups.
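As a toy illustration of the allocation problem the summary describes, here is a hypothetical sketch of splitting a fixed GPU pool between sampling and training so that neither stage bottlenecks the pipeline. The per-GPU token rates and the GPU count are illustrative assumptions, not figures from the talk.

```python
# Hypothetical sketch: choosing a GPU split between sampling and training.
# The throughput numbers below are assumed for illustration only.

def pipeline_throughput(sample_gpus, train_gpus,
                        tokens_per_sample_gpu=2_000,   # assumed rollout tok/s per GPU
                        tokens_per_train_gpu=6_000):   # assumed training tok/s per GPU
    """Steady-state tokens/s of the pipeline is bounded by the slower stage."""
    return min(sample_gpus * tokens_per_sample_gpu,
               train_gpus * tokens_per_train_gpu)

def best_split(total_gpus=8):
    """Brute-force the allocation that maximizes end-to-end throughput."""
    return max(
        ((g, total_gpus - g) for g in range(1, total_gpus)),
        key=lambda split: pipeline_throughput(*split),
    )

if __name__ == "__main__":
    split = best_split(8)
    print(split, pipeline_throughput(*split))
```

With these assumed rates the optimum lands where the two stages' token rates match; a real system model would also account for staleness limits and memory constraints, which this sketch ignores.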

The 100-person lab that became Anthropic and Google's secret weapon | Edwin Chen (Surge AI)

Edwin Chen, founder and CEO of Surge AI, discusses his contrarian, bootstrapped approach to building a billion-dollar company, the critical role of high-quality data and "taste" in training advanced AI models, the pitfalls of current benchmarks, and why reinforcement learning environments are the next frontier in AI.

Shaping Model Behavior in GPT-5.1 — the OpenAI Podcast Ep. 11

OpenAI's Christina Kim (Research) and Laurentia Romaniuk (Product) discuss the development of GPT-5.1, detailing the shift to universal "reasoning models" to enhance both IQ and EQ. They explore the nuances of "model personality," the technical challenges of balancing steerability with safety, and how features like Memory create a more personalized, context-aware user experience.

Agents are Robots Too: What Self-Driving Taught Me About Building Agents — Jesse Hu, Abundant

Drawing surprising parallels between AI agents and robotics, this talk argues that the agent development community is repeating a key mistake from the self-driving industry: underestimating the difficulty of action and over-focusing on reasoning. It covers essential robotics concepts like DAgger, MDPs, simulation, and the critical importance of a robust offline infrastructure, explaining why perfect reasoning doesn't guarantee successful execution in the real world.
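Since the summary name-checks DAgger, a minimal toy sketch of that loop may help: roll out the current learner, have an expert relabel the states the learner actually visits, aggregate those labels, and retrain. The toy expert, the one-dimensional state, and the lookup-table "training" are illustrative assumptions, not the speaker's implementation.

```python
# Hypothetical DAgger sketch: the expert labels states visited by the
# learner's own rollouts, so the dataset covers the learner's mistakes.
import random

def expert_action(state):
    """Toy expert policy: always step toward 0."""
    return -1 if state > 0 else 1

def rollout(policy, start=5, horizon=10):
    """Run the learner and record the states it actually visits."""
    state, visited = start, []
    for _ in range(horizon):
        visited.append(state)
        state += policy(state)
    return visited

def dagger(iterations=5):
    dataset = []                                  # (state, expert label) pairs
    policy = lambda s: random.choice([-1, 1])     # untrained initial learner
    for _ in range(iterations):
        # Aggregate expert labels on the learner's own visitation distribution.
        for state in rollout(policy):
            dataset.append((state, expert_action(state)))
        # "Train" via lookup over the aggregated dataset (toy stand-in for
        # supervised learning); fall back to the expert on unseen states.
        table = dict(dataset)
        policy = lambda s, t=table: t.get(s, expert_action(s))
    return policy
```

The key property the sketch preserves is that labels come from states the learner reaches, not from expert demonstrations alone, which is exactly the distribution-shift fix the talk's self-driving analogy points at.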