SFT

Post-training best-in-class models in 2025

An expert overview of post-training techniques for language models, covering the entire workflow from data generation and curation through the core algorithms, Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Reinforcement Learning (RL), along with practical advice on evaluation and iteration.
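
Since DPO recurs in this post and the next, here is a minimal sketch of the DPO objective as published by Rafailov et al. (2023). It is generic PyTorch, not code from the post, and assumes you have already computed summed per-token log-probabilities for each (chosen, rejected) pair under the policy and a frozen reference model.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss for a batch of preference pairs.

    Each argument is a 1-D tensor of summed token log-probabilities for
    the chosen / rejected completions under either the policy or the
    frozen reference model; beta scales the implicit KL penalty.
    """
    # Implicit rewards are the policy-vs-reference log-ratios.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the log-odds that the chosen completion is preferred.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```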

How We Built a Leading Reasoning Model (Olmo 3)

A comprehensive overview of the process behind building Olmo 3 Think, covering the full stack from pre-training architecture and data selection to the detailed post-training recipe of SFT, DPO, and Reinforcement Learning (RL), including a deep dive into the infrastructure needed to scale RL. The summary closes with critical reflections on the challenges and nuances of evaluating modern reasoning models.
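
The post's RL deep dive centers on infrastructure, but the algorithmic core of the GRPO-style recipes widely used for reasoning models is compact: sample a group of completions per prompt and normalize rewards within the group, removing the need for a learned value function. A minimal, generic sketch (not Olmo 3's actual implementation):

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-relative advantages as used in GRPO-style RL.

    `rewards` has shape (num_prompts, group_size): each row holds the
    scalar rewards of several sampled completions for the same prompt.
    Normalizing within each row turns raw rewards into advantages
    without a separate value model.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + 1e-6)

# Example: 2 prompts, 4 sampled completions each.
adv = grpo_advantages(torch.tensor([[1.0, 0.0, 0.0, 1.0],
                                    [0.0, 0.0, 0.0, 1.0]]))
```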

Introducing serverless reinforcement learning: Train reliable AI agents without worrying about GPUs

Kyle Corbitt and Daniel from CoreWeave (formerly OpenPipe) discuss the practical advantages of Reinforcement Learning (RL) over Supervised Fine-Tuning (SFT) for building reliable, efficient AI agents. They introduce Serverless RL, a new platform designed to eliminate the infrastructure complexity of RL training, and share a playbook for teams getting started.
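
A core part of the argument for RL over SFT with agents is that you only need to score outcomes, not demonstrate every intermediate step. As a toy illustration of an outcome-based reward (all names hypothetical, not OpenPipe's API):

```python
from dataclasses import dataclass

@dataclass
class Rollout:
    # Hypothetical fields: did the agent reach a verifiable end state,
    # and how many tool calls / steps did it take to get there.
    task_succeeded: bool
    num_steps: int

def outcome_reward(r: Rollout) -> float:
    """Score only the final result of an agent rollout.

    RL optimizes this signal directly; SFT instead requires a correct
    step-by-step demonstration, which is often harder to collect.
    """
    if r.task_succeeded:
        # Small per-step penalty nudges the policy toward efficiency.
        return max(0.0, 1.0 - 0.01 * r.num_steps)
    return 0.0
```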