Reinforcement learning

Scale AI CEO on Meta’s $14B deal, scaling Uber Eats to $80B, & what frontier labs are building next

Jason Droege, CEO of Scale AI, discusses the evolution of AI training from simple labeling to complex, expert-driven tasks. He shares insights on the future of AI agents, the reality of enterprise AI adoption, and crucial business lessons learned from building Uber Eats from zero to a multi-billion dollar business.

Some thoughts on the Sutton interview

A reflection on Richard Sutton's "Bitter Lesson," arguing that while his critique of LLMs' inefficiency and lack of continual learning is valid, imitation learning is a complementary and necessary precursor to true reinforcement learning, much as fossil fuels were to renewable energy.

Building an AI Physicist: ChatGPT Co-Creator’s Next Venture

Liam Fedus and Ekin Dogus Cubuk, former researchers at OpenAI and Google DeepMind, discuss their new venture, Periodic Labs. They aim to create an "AI physicist" by integrating large language models with real-world, iterative experiments, moving beyond simulation to tackle fundamental challenges in physics and chemistry, starting with high-temperature superconductivity.

Richard Sutton – Father of RL thinks LLMs are a dead end

Richard Sutton, a foundational figure in reinforcement learning, argues that large language models (LLMs) are a flawed paradigm for achieving true intelligence. He posits that LLMs merely mimic human-generated text, lacking genuine goals, world models, and the ability to learn continually from experience. Sutton advocates a return to the principles of reinforcement learning, where an agent learns from the consequences of its actions in the real world, a method he believes is truly scalable and fundamental to all animal and human intelligence.

From Vibe Coding to Vibe Researching: OpenAI’s Mark Chen and Jakub Pachocki

OpenAI’s Chief Scientist, Jakub Pachocki, and Chief Research Officer, Mark Chen, discuss the research behind GPT-5, the push toward long-horizon reasoning, and the grand vision of an automated researcher. They cover how OpenAI evaluates progress beyond saturated benchmarks, the surprising durability of reinforcement learning, and the culture required to protect fundamental research while shipping world-class products.

Upwork's Radical Bet on Reinforcement Learning: Building RLEF from Scratch | Andrew Rabinovich (CTO)

Andrew Rabinovich, CTO and Head of AI at Upwork, details their strategy for building AI agents for digital work. He introduces a custom reinforcement learning approach called RLEF (Reinforcement Learning from Experience), explains why digital work marketplaces are ideal training grounds, and shares his vision for a future where AI delivers finished projects, orchestrated by a meta-agent named Uma.