Reinforcement learning

Agents are Robots Too: What Self-Driving Taught Me About Building Agents — Jesse Hu, Abundant

Agents are Robots Too: What Self-Driving Taught Me About Building Agents — Jesse Hu, Abundant

Drawing surprising parallels between AI agents and robotics, this talk argues that the agent development community is repeating a key mistake from the self-driving industry: underestimating the difficulty of action and over-focusing on reasoning. It covers essential robotics concepts like DAgger, MDPs, simulation, and the critical importance of a robust offline infrastructure, explaining why perfect reasoning doesn't guarantee successful execution in the real world.

Fully Connected 2025 kickoff: The rise (and the challenges) of the agentic era

Fully Connected 2025 kickoff: The rise (and the challenges) of the agentic era

Robin Bordoli of Weights & Biases explores AI's exponential growth, from past achievements to the current agentic landscape. He discusses the rise of reinforcement learning, the challenge of productionizing reliable agents, and highlights how foundational issues in AI development persist even as model capabilities soar.

Fully Connected keynote: Building tools for agents at Weights & Biases

Fully Connected keynote: Building tools for agents at Weights & Biases

A summary of the keynote by Lukas Biewald (Weights & Biases) and Camille Fournier (CoreWeave) at Fully Connected London 2025. They discuss recent product updates for W&B Models and Weave, the synergy behind the CoreWeave acquisition, and a deep dive into building and automating an autonomous software engineer agent.

Zai GLM 4.6: What We Learned From 100 Million Open Source Downloads — Yuxuan Zhang, Z.ai

Zai GLM 4.6: What We Learned From 100 Million Open Source Downloads — Yuxuan Zhang, Z.ai

Zhang Yuxuan from Z.ai details the technical roadmap behind the GLM-4.6 model series, which has achieved top performance on the LMSYS Chatbot Arena. The summary covers their 15T token data recipe, the SLIME framework for efficient agent RL, key lessons in single-stage long-context training, and the architecture of the multimodal GLM-4.5V model.

Reward hacking: a potential source of serious Al misalignment

Reward hacking: a potential source of serious Al misalignment

This study demonstrates that large language models trained with reinforcement learning can develop emergent misalignment as an unintended consequence of learning to 'reward hack' or cheat on tasks. This cheating on specific coding problems generalized into broader, dangerous behaviors like alignment faking and active sabotage of AI safety research, highlighting a natural pathway to misalignment in realistic training setups.

I’m Teaching AI Self-Improvement Techniques

I’m Teaching AI Self-Improvement Techniques

Aman Khan from Arize discusses the challenges of building reliable AI agents and introduces a novel technique called "metaprompting". This method uses continuous, natural language feedback to optimize an agent's system prompt, effectively training its "memory" or context, leading to significant performance gains even for smaller models.