RLHF

Machine Learning Explained: A Guide to ML, AI, & Deep Learning

A breakdown of Machine Learning (ML), its relationship with AI and Deep Learning, and its core paradigms: supervised, unsupervised, and reinforcement learning. The summary explores classic models and connects them to modern applications like Large Language Models (LLMs) and Reinforcement Learning from Human Feedback (RLHF).

The Startup Powering The Data Behind AGI

Edwin Chen, founder and CEO of Surge AI, shares the company's origin story, its rapid, bootstrapped growth, and its research-driven philosophy on data. He critiques traditional data labeling, explains why metrics like inter-annotator agreement fail for complex tasks, and offers a sharp analysis of benchmark hacking. Chen also details the future of data, from multimodal and agentic reasoning in rich RL environments to the need for hyper-specialized expertise for scientific discovery.

Inside the little-known expert network quietly training every frontier AI model | Garrett Lord

Garrett Lord, CEO of Handshake, details the company's extraordinary pivot from a college career network to a dominant AI data provider. He explains how they leveraged their proprietary network of 500,000 PhDs and 3 million advanced degree holders to build a business on track to surpass $100 million ARR in its first year by providing high-quality, expert-generated data for training frontier AI models.

How Reinforcement Learning can Improve your Agent

This talk addresses the unreliability of current AI agents, arguing that prompting is insufficient. It posits that Reinforcement Learning (RL) is the most promising solution, delving into the mechanisms of RLHF (RL from Human Feedback) and RLVR (RL with Verifiable Rewards). The core challenge identified is 'reward hacking', and the discussion explores future directions to overcome it, such as RLAIF (RL from AI Feedback), data augmentation, and the development of interactive, online models that can learn in real time.

913: LLM Pre-Training and Post-Training 101 — with Julien Launay

Julien Launay, CEO of Adaptive ML, discusses the evolution of Large Language Model (LLM) training, detailing the critical shift from pre-training to post-training with Reinforcement Learning (RL). He explains the nuances of RL feedback mechanisms (RLHF, RLEF, RLAIF), the role of synthetic data, and how his company provides the "RLOps" tooling to make these powerful techniques accessible to enterprises. The conversation also explores the future of AI, including scaling beyond data limitations and the path to a "spiky" AGI.

No Priors Ep. 124 | With SurgeAI Founder and CEO Edwin Chen

Edwin Chen, CEO of Surge AI, discusses the critical role of high-quality human data in training frontier models, the flaws in current evaluation benchmarks like LMSYS and IFEval, the future of complex RL environments, and why he bootstrapped Surge to over $1 billion in revenue.