RLHF

The 100-person lab that became Anthropic and Google's secret weapon | Edwin Chen (Surge AI)

Edwin Chen, founder and CEO of Surge AI, discusses his contrarian approach to building a bootstrapped, billion-dollar company, the critical role of high-quality data and 'taste' in training AI, the flaws in current benchmarks, and why reinforcement learning environments are the next frontier for creating models that truly advance humanity.

AI Engineering 101 with Chip Huyen (Nvidia, Stanford, Netflix)

Chip Huyen, an AI expert and author of 'AI Engineering', explains the realities of building successful AI applications. She covers the nuances of model training, the critical role of data quality in RAG systems, the mechanics of RLHF, and why the future of AI improvement lies in post-training, system-level thinking, and solving UX problems rather than just chasing the newest models.

Machine Learning Explained: A Guide to ML, AI, & Deep Learning

A breakdown of Machine Learning (ML), its relationship with AI and Deep Learning, and its core paradigms: supervised, unsupervised, and reinforcement learning. It covers classic models and connects them to modern applications such as Large Language Models (LLMs) and Reinforcement Learning from Human Feedback (RLHF).

The Startup Powering The Data Behind AGI

Edwin Chen, founder and CEO of Surge AI, shares the company's origin story, its rapid, bootstrapped growth, and its research-driven philosophy on data. He critiques traditional data labeling, explains why metrics like inter-annotator agreement fail for complex tasks, and offers a sharp analysis of benchmark hacking. Chen also details the future of data, from multimodal and agentic reasoning in rich RL environments to the need for hyper-specialized expertise for scientific discovery.

Inside the little-known expert network quietly training every frontier AI model | Garrett Lord

Garrett Lord, CEO of Handshake, details the company's extraordinary pivot from a college career network to a dominant AI data provider. He explains how they leveraged their proprietary network of 500,000 PhDs and 3 million advanced degree holders to build a business on track to surpass $100 million ARR in its first year by providing high-quality, expert-generated data for training frontier AI models.

How Reinforcement Learning can Improve your Agent

This talk addresses the unreliability of current AI agents and argues that prompting alone cannot fix it. It presents Reinforcement Learning (RL) as the most promising solution and walks through the mechanisms of RLHF and RLVR (reinforcement learning with verifiable rewards). It identifies 'reward hacking' as the core challenge and explores future directions for overcoming it, such as RLAIF (reinforcement learning from AI feedback), data augmentation, and interactive, online models that can learn in real time.