RLHF

⚡️Every product of the future will be a living system — Ronak Malde, Trajectory.ai

Ronak Malde, CEO of Trajectory.ai, discusses his journey from building AI coding agents at Windsurf to his current focus on continual learning for enterprise AI. He shares insights on leveraging real-world user data, the unique challenges of model acquisition, and how Trajectory.ai's platform, powered by innovations like scaled SDPO and a novel training stack, enables dynamic, always-learning AI models for diverse industries from legal to finance.

The State of Frontier Post-Training Recipes | Conversation with Finbarr Timbers

This discussion with Finbarr Timbers reviews the evolution of frontier post-training recipes, highlighting the shift from simpler SFT-DPO-RL to complex multi-teacher on-policy distillation (MOPD). It covers the organizational challenges of building models like Olmo, the rise of synthetic data and reasoning-focused RL in DeepSeek, and the complexities of integrating expert teachers, while also exploring open questions on environments, specialized APIs, and career strategies in the rapidly changing AI landscape.

Lessons from Trillion Token Deployments at Fortune 500s — Alessandro Cappelli, Adaptive ML

95% of GenAI pilots fail due to feedback integration issues, not deployment challenges. Alessandro Cappelli argues that Reinforcement Learning (RL) provides the only systematic way to incorporate business metrics and production signals to continuously improve models, especially for complex agent-based systems.

What is Human In The Loop with AI? How HITL Shapes AI Systems

Exploring the concept of Human-in-the-Loop (HITL) AI, this summary details the spectrum of human involvement—from strict HITL to full autonomy. It covers how humans are integrated at different stages of the AI workflow, including training (Active Learning), tuning (RLHF), and inference (runtime oversight), to ensure safety, instill judgment, and build trust in AI systems.

The 100-person lab that became Anthropic and Google's secret weapon | Edwin Chen (Surge AI)

Edwin Chen, founder and CEO of Surge AI, discusses his contrarian approach to building a bootstrapped, billion-dollar company, the critical role of high-quality data and 'taste' in training AI, the flaws in current benchmarks, and why reinforcement learning environments are the next frontier for creating models that truly advance humanity.

AI Engineering 101 with Chip Huyen (Nvidia, Stanford, Netflix)

Chip Huyen, an AI expert and author of 'AI Engineering', explains the realities of building successful AI applications. She covers the nuances of model training, the critical role of data quality in RAG systems, the mechanics of RLHF, and why the future of AI improvement lies in post-training, system-level thinking, and solving UX problems rather than just chasing the newest models.