LLM

Aug 06, 2025

How to look at your data — Jeff Huber (Chroma) + Jason Liu (567)

A detailed summary of a talk by Jeff Huber (Chroma) and Jason Liu on systematically improving AI applications. The talk covers using fast, inexpensive evaluations for retrieval systems (inputs) and applying structured data analysis and clustering to conversational logs (outputs) to derive actionable product insights.

Aug 06, 2025

Evals Are Not Unit Tests — Ido Pesok, Vercel v0

Ido Pesok from Vercel explains why LLM-based applications often fail in production despite successful demos, and presents a systematic framework for building reliable AI systems using application-layer evaluations ("evals").

Aug 06, 2025

The Hidden Bottlenecks Slowing Down AI Agents

Paul van der Boor and Bruce Martens from Prosus discuss the real bottlenecks in AI agent development, arguing that the primary challenges are not tools but evaluation, data quality, and feedback loops. They detail their 'buy-first' philosophy, the practical reasons they often build in-house anyway, and how new coding agents like Devin and Cursor are changing their development workflows.

Aug 05, 2025

AI Coding Agents Change Software Development Forever

A discussion on the promise and limitations of coding agents, covering key challenges like verification and debugging, and exploring how they can support developers through improved abstraction, collaboration, and handling long-term tasks.

Aug 03, 2025

Full Workshop: Realtime Voice AI — Mark Backman, Daily

An in-depth look at building real-time, production-grade voice AI agents using the open-source Pipecat framework. This summary covers the core concepts of voice AI pipelines, the shift to speech-to-speech models like Gemini Live, and advanced techniques for managing latency, context, and turn-taking.

Aug 03, 2025

Practical tactics to build reliable AI apps — Dmitry Kuchin, Multinear

Moving an AI PoC from 50% to 100% reliability requires a new development paradigm. This talk introduces a practical, evaluations-first approach, reverse-engineering tests from real-world user scenarios and business outcomes to build a robust benchmark, prevent regressions, and enable confident optimization.
