Reinforcement learning

Building the GitHub for RL Environments: Prime Intellect's Will Brown & Johannes Hagemann

Prime Intellect's Will Brown and Johannes Hagemann discuss the paradigm shift from static prompting to dynamic, environment-based AI development. They introduce their Environments Hub, a platform aimed at democratizing frontier-level training and enabling companies to build specialized models by compounding institutional knowledge.

Reinforcement Learning for Agents — with Amazon AGI Labs’ Antje Barth

Antje Barth of Amazon's AGI Labs discusses Nova Act, a new service for building reliable AI agents. She explores how her team achieves over 90% reliability by training agents with reinforcement learning in 'web gyms', the shift toward 'normcore' agents for practical automation, and the future of AI as a digital co-worker.

How Ricursive Intelligence’s Founders are Using AI to Shape The Future of Chip Design

Anna Goldie and Azalia Mirhoseini of Ricursive Intelligence discuss how their work on Google's AlphaChip, which used AI to design TPUs, is now being extended to automate the entire chip design process. They explain their vision for a 'designless' industry and a recursive self-improvement loop where AI designs better chips, which in turn accelerates AI development.

Collaborative AI Agents At OpenAI

Robert from OpenAI discusses the critical role of structured evaluations (evals) and graders in developing advanced collaborative agents. He explores the limitations of 'vibe-based' assessments, introduces a maturity model for evals, presents a comprehensive rubric for measuring agent performance beyond simple accuracy, and connects these concepts to the power of Reinforcement Fine-Tuning (RFT).

Post-training best-in-class models in 2025

An expert overview of post-training techniques for language models, covering the entire workflow from data generation and curation to advanced algorithms like Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Reinforcement Learning (RL), along with practical advice on evaluation and iteration.

How We Built a Leading Reasoning Model (Olmo 3)

A comprehensive overview of the process behind building Olmo 3 Think, covering the full stack from pre-training architecture and data selection to the detailed post-training recipe of SFT and DPO, with a deep dive into the advanced infrastructure for scaling Reinforcement Learning (RL). The episode also includes critical reflections on the challenges and nuances of evaluating modern reasoning models.