LLM agents

Benchmarking semantic code retrieval on Claude Code — Kuba Rogut, Turbopuffer

A detailed benchmark analysis comparing raw Claude Code's performance with windowed grep and Turbopuffer's semantic search for code retrieval in LLM agents. The study reveals significant improvements in file precision (65% to 87%) and reduced wasted reads (1 in 3 to 1 in 8) with semantic search, while highlighting the importance of the agent's understanding of when to use retrieval tools.

Bending a Public MCP Server Without Breaking It — Nimrod Hauser, Baz

Learn practical strategies to adapt third-party MCP server tools for production AI applications. This talk covers five key practices: curating tools, enhancing descriptions, implementing deterministic guardrails, composing new tools from existing ones, and leveraging tools as simple functions, all demonstrated through a real-world "Spec Reviewer" example.

Large-scale agentic quant research with Weights & Biases

Explore how Weights & Biases (W&B) enhances reliability, reproducibility, and explainability in large-scale, agent-driven quantitative research. This video demonstrates two core applications: debugging multi-agent alpha research pipelines with W&B Weave to identify root causes and iterate on forecasts, and automating strategy optimization using W&B Models to tune agent weights and gain insights from performance convergence and parallel coordinate plots.

OpenClaw's Memory Sucks and the fix is simple — Dhravya Shah, Supermemory

Dhravya Shah, founder of Supermemory, details the evolution of his company from a simple RAG-based consumer app to a sophisticated, open-source context infrastructure for AI, and introduces a novel hooks-based memory solution for OpenClaw.

Build Hour: Agent RFT

Will Hang and Theophile Sautory from OpenAI provide a deep dive into Agent RFT, a powerful method for fine-tuning large language models to become more effective, tool-using agents. They explain how Agent RFT enables models to learn directly from their interactions with custom tools and reward signals, leading to significant improvements in performance, latency, and efficiency on specialized tasks. The session includes a detailed code demo, best practices, and success stories from companies like Cognition, Ambience, and Rogo.

Introducing serverless reinforcement learning: Train reliable AI agents without worrying about GPUs

Kyle Corbitt and Daniel from CoreWeave (formerly OpenPipe) discuss the practical advantages of Reinforcement Learning (RL) over Supervised Fine-Tuning (SFT) for building reliable and efficient AI agents. They introduce Serverless RL, a new platform designed to eliminate the infrastructure complexities of RL training, and share a playbook for teams looking to get started.