Reinforcement learning

I’m Teaching AI Self-Improvement Techniques

Aman Khan from Arize discusses the challenges of building reliable AI agents and introduces a novel technique called "metaprompting". This method uses continuous, natural language feedback to optimize an agent's system prompt, effectively training its "memory" or context, leading to significant performance gains even for smaller models.
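The talk does not publish an implementation, but the core loop of metaprompting can be sketched as follows. Everything here is a hypothetical stand-in: `run_agent`, `critique`, and `rewrite_prompt` would be real LLM calls in practice, and the feedback format is an assumption.

```python
# Minimal sketch of a metaprompting loop (assumed shape, not Arize's actual
# code): the agent's system prompt is iteratively rewritten based on
# natural-language feedback about its outputs.

def run_agent(system_prompt: str, task: str) -> str:
    """Stand-in for a real LLM call; returns the agent's answer."""
    return f"[{system_prompt}] answer to: {task}"

def critique(answer: str, expected: str) -> str:
    """Stand-in for an LLM judge producing natural-language feedback."""
    return "OK" if expected in answer else "the answer missed the expected result"

def rewrite_prompt(system_prompt: str, feedback: str) -> str:
    """Stand-in for an LLM that folds feedback back into the system prompt."""
    if feedback == "OK":
        return system_prompt
    return f"{system_prompt} (note: {feedback})"

def metaprompt(system_prompt: str, tasks: list[tuple[str, str]], rounds: int = 3) -> str:
    """Optimize the system prompt by looping: run, critique, rewrite."""
    for _ in range(rounds):
        for task, expected in tasks:
            answer = run_agent(system_prompt, task)
            feedback = critique(answer, expected)
            system_prompt = rewrite_prompt(system_prompt, feedback)
    return system_prompt
```

The key point of the technique is that the "training signal" lands in the prompt (the model's context) rather than in its weights, which is why it can lift smaller models without any fine-tuning.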

Build Hour: Agent RFT

Will Hang and Theophile Sautory from OpenAI provide a deep dive into Agent RFT, a powerful method for fine-tuning large language models to become more effective, tool-using agents. They explain how Agent RFT enables models to learn directly from their interactions with custom tools and reward signals, leading to significant improvements in performance, latency, and efficiency on specialized tasks. The session includes a detailed code demo, best practices, and success stories from companies like Cognition, Ambience, and Rogo.

Introducing serverless reinforcement learning: Train reliable AI agents without worrying about GPUs

Kyle Corbett and Daniel from CoreWeave (formerly Openpipe) discuss the practical advantages of Reinforcement Learning (RL) over Supervised Fine-Tuning (SFT) for building reliable and efficient AI agents. They introduce Serverless RL, a new platform designed to eliminate the infrastructure complexities of RL training, and share a playbook for teams looking to get started.

ChatGPT Atlas, OpenAI’s new web browser

A discussion on OpenAI's new browser ChatGPT Atlas, Andrej Karpathy's pessimistic timeline for AI agents, the DeepSeek-OCR paper on visual context compression, and a study suggesting large language models can suffer from "brain rot" when trained on low-quality social media data.

Marc Andreessen & Amjad Masad on “Good Enough” AI, AGI, and the End of Coding

Amjad Masad, founder of Replit, joins a16z to discuss the rise of AI agents that can now plan, reason, and code for hours. He explains how reinforcement learning and verification loops unlocked long-horizon reasoning, why AI is advancing fastest in verifiable domains like code, and debates whether "good enough" AI might be a local maximum that blocks the path to AGI.

Machine Learning Explained: A Guide to ML, AI, & Deep Learning

A breakdown of Machine Learning (ML), its relationship with AI and Deep Learning, and its core paradigms: supervised, unsupervised, and reinforcement learning. The summary explores classic models and connects them to modern applications like Large Language Models (LLMs) and Reinforcement Learning with Human Feedback (RLHF).
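To make the reinforcement-learning paradigm concrete, here is a toy two-armed bandit with an epsilon-greedy agent: unlike supervised learning, there are no labeled examples, only reward signals from the environment. The payout probabilities and hyperparameters are illustrative choices, not from the video.

```python
import random

def pull(arm: int) -> float:
    """Toy environment: arm 1 pays 1.0 with 80% probability, arm 0 with 20%."""
    return 1.0 if random.random() < (0.8 if arm == 1 else 0.2) else 0.0

def train(steps: int = 2000, epsilon: float = 0.1, seed: int = 0) -> list[float]:
    """Epsilon-greedy learning: estimate each arm's value purely from rewards."""
    random.seed(seed)
    values = [0.0, 0.0]  # running estimate of expected reward per arm
    counts = [0, 0]
    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the best-known arm.
        if random.random() < epsilon:
            arm = random.randrange(2)
        else:
            arm = max(range(2), key=lambda a: values[a])
        reward = pull(arm)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    return values

values = train()  # values[1] converges toward 0.8, values[0] toward 0.2
```

The same reward-driven loop, scaled up to sequences of tokens and a learned reward model, is the basis of RLHF for LLMs.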