Reinforcement learning

Fully autonomous robots are much closer than you think – Sergey Levine

Fully autonomous robots are much closer than you think – Sergey Levine

Sergey Levine, co-founder of Physical Intelligence, outlines the path to general-purpose robots, predicting a 'self-improvement flywheel' could lead to fully autonomous household robots by 2030. He discusses the architecture of vision-language-action models, the critical role of embodiment in solving the data problem, and how robotics will scale faster than self-driving cars.

GPT-OSS vs. Qwen vs. Deepseek: Comparing Open Source LLM Architectures

GPT-OSS vs. Qwen vs. Deepseek: Comparing Open Source LLM Architectures

A technical breakdown and comparison of the architectures, training methodologies, and post-training techniques of three leading open-source models: OpenAI's GPT-OSS, Alibaba's Qwen-3, and DeepSeek V3. The summary explores their different approaches to Mixture-of-Experts, long-context, and attention mechanisms.

The $10 Trillion AI Revolution: Why It’s Bigger Than the Industrial Revolution

The $10 Trillion AI Revolution: Why It’s Bigger Than the Industrial Revolution

Sequoia Capital's Konstantine Buhler presents an investment thesis on the AI-driven "Cognitive Revolution," framing it as a transformation larger and faster than the Industrial Revolution. The core of the thesis is the $10 trillion opportunity in automating the US services market and the shift in work from certainty to high leverage. Buhler outlines five current investment trends, including real-world validation over academic benchmarks and compute as the new production function, and five future themes Sequoia is betting on, such as persistent memory, AI-to-AI communication, and AI security.

How Reinforcement Learning can Improve your Agent

How Reinforcement Learning can Improve your Agent

This talk addresses the unreliability of current AI agents, arguing that prompting is insufficient. It posits that Reinforcement Learning (RL) is the most promising solution, delving into the mechanisms of RLHF and RLVR. The core challenge identified is 'reward hacking', and the discussion explores future directions to overcome it, such as RLAIF, data augmentation, and the development of interactive, online models that can learn in real-time.

Google DeepMind Lead Researchers on Genie 3 & the Future of World-Building

Google DeepMind Lead Researchers on Genie 3 & the Future of World-Building

Google DeepMind researchers Jack Parker-Holder and Shlomi Fruchter detail the creation of Genie 3, a model that generates interactive, persistent worlds from text in real time. They cover its breakthrough spatial memory, emergent physical intuition, and its potential to revolutionize gaming, robotics, and AI agent training.

913: LLM Pre-Training and Post-Training 101 — with Julien Launay

913: LLM Pre-Training and Post-Training 101 — with Julien Launay

Julien Launay, CEO of Adaptive ML, discusses the evolution of Large Language Model (LLM) training, detailing the critical shift from pre-training to post-training with Reinforcement Learning (RL). He explains the nuances of RL feedback mechanisms (RLHF, RLEF, RLAIF), the role of synthetic data, and how his company provides the "RLOps" tooling to make these powerful techniques accessible to enterprises. The conversation also explores the future of AI, including scaling beyond data limitations and the path to a "spiky" AGI.