Reinforcement Learning

Jul 30, 2025

OpenAI’s IMO Team on Why Models Are Finally Solving Elite-Level Math

Members of the OpenAI team, Alex Wei, Sheryl Hsu, and Noam Brown, discuss their model's historic gold-medal performance at the International Mathematical Olympiad (IMO). They detail their unique approach using general-purpose reinforcement learning for hard-to-verify tasks, the model's surprising self-awareness, and the vast gap that remains between solving competition problems and achieving true mathematical research breakthroughs.

Jul 29, 2025

Scaling and the Road to Human-Level AI | Anthropic Co-founder Jared Kaplan

Jared Kaplan, co-founder of Anthropic, explains how the discovery of predictable, physics-like scaling laws in AI training provides a clear roadmap for progress. He details the two main phases of model training (pre-training and RL), discusses how scaling compute predictably unlocks longer-horizon task capabilities, and outlines the remaining challenges—memory, nuanced oversight, and organizational knowledge—on the path to human-level AI.

Jul 29, 2025

[Full Workshop] Building Metrics that actually work — David Karam, Pi Labs (fmr Google Search)

This workshop, led by former Google product directors, introduces a methodology for building reliable and tunable evaluation metrics for LLM applications. It details how to create granular 'scoring systems' that break down complex evaluations into simple, objective signals, and then use these systems for model comparison, prompt optimization, and online reinforcement learning.

Jul 24, 2025

No Priors Ep. 124 | With SurgeAI Founder and CEO Edwin Chen

Edwin Chen, CEO of Surge AI, discusses the critical role of high-quality human data in training frontier models, the flaws in current evaluation benchmarks like LMSys and IF-Eval, the future of complex RL environments, and why he bootstrapped Surge to over $1 billion in revenue.

Jul 23, 2025

The U.S. Can’t Build AI Without These Materials

The Western mining industry is broken, hampered by a talent drain, slow technology adoption, and misaligned incentives. A new, vertically integrated, software-first approach that uses reinforcement learning (RL) and LLMs can build and operate mines and refineries faster, cheaper, and more flexibly, addressing critical geopolitical supply-chain risks.

Jul 22, 2025

OpenAI Just Released ChatGPT Agent, Its Most Powerful Agent Yet

The OpenAI team details the creation of a new, powerful AI agent in ChatGPT, achieved by unifying the Deep Research and Operator models. They cover its unified architecture with shared state across tools, the reinforcement learning techniques used for training, and the critical safety measures required for an agent that can take real-world actions.
