Reinforcement Learning

May 26, 2026

How Cursor Trained Composer on Fireworks: Distributed Infrastructure for High-Performance RL

Cursor's Federico Cassano and Fireworks' Dmytro Dzhulgakov detail their collaboration on Composer 2, a specialized foundation model for software engineering. They discuss their top-down training strategy, the infrastructure challenges of large-scale distributed Reinforcement Learning on sparse models, and how model specialization achieves frontier performance with superior efficiency.

May 26, 2026

End-to-End Foundation Models for the Energy Industry — with Jazmia Henry

Jazmia Henry details the end-to-end process of building specialized foundation models for the energy industry. She covers the four key stages from data curation of unstructured, handwritten documents to optimizing inference, and introduces her Grounded Continuous Evaluation (GCE) framework to combat reward hacking in reinforcement learning.

May 13, 2026

The Founders Who Left Tesla to Rebuild America | a16z

Erin Price-Wright, Turner Caldwell (Mariana Minerals), and Drew Baglino (Heron Power) discuss closing America's critical minerals gap and modernizing the power grid for the AI economy. They cover how automation, reinforcement learning, and lessons from Tesla can accelerate mining, refining, and grid infrastructure development to compete with China and enable re-industrialization.

May 12, 2026

Lessons from Trillion Token Deployments at Fortune 500s — Alessandro Cappelli, Adaptive ML

Alessandro Cappelli argues that 95% of GenAI pilots fail because of feedback integration issues, not deployment challenges, and that Reinforcement Learning (RL) provides the only systematic way to incorporate business metrics and production signals to continuously improve models, especially for complex agent-based systems.

May 06, 2026

AI That Designs Its Own Chips: Ricursive's Anna Goldie and Azalia Mirhoseini

Ricursive Intelligence co-founders Anna Goldie and Azalia Mirhoseini outline their thesis that AI should design the chips that train AI. They detail a three-phase plan: first, accelerate chip design with AI tools that are 100,000x faster than current software; then become a 'design-less' platform for custom silicon; and finally achieve vertical integration by building their own chips and models.

May 01, 2026

Waymo's Dmitri Dolgov: 20 Million Rides and the Road to Full Autonomy

Dmitri Dolgov, co-CEO of Waymo, discusses the 20-year journey from the DARPA challenge to full autonomy. He explains the Waymo Foundation Model—a multimodal world action model powering the driver, simulator, and critic—and how their "end-to-end plus" architecture enables superhuman safety and exponential scaling.
