Accordance

Build Hour: Reinforcement Fine-Tuning

A deep dive into Reinforcement Fine-Tuning (RFT), covering how to set up tasks, design effective graders, and run efficient training loops to improve model reasoning, based on a live demonstration from OpenAI's Build Hours.