Latency

May 20, 2026

The Latency Goldilocks Zone Explained

Rafael Borger and Daniel Wolbert from iFood discuss the engineering and product strategy behind ILO-Agent, their conversational AI for 200 million users. They cover hyper-personalized recommendation systems, the "Latency Goldilocks Zone" where AI responses can be too fast for users to trust, and the architectural challenges of building multi-channel agents for text and voice.

May 09, 2026

Voice AI: when is the "Her" moment? — Neil Zeghidour, Gradium AI

Neil Zeghidour, CEO of Gradium AI, deconstructs the gap between current voice AI and the "Her" ideal. He argues that while cascaded systems are practical, they are architecturally flawed for natural conversation. The future lies in full-duplex, speech-to-speech models that not only solve latency but also integrate deep paralinguistic understanding and overcome significant cost barriers.

Feb 18, 2026

Build Hour: Prompt Caching

Explore prompt caching to significantly reduce latency and costs for your AI applications. This guide breaks down the mechanics of KV caching, best practices for maximizing cache hits using `prompt_cache_key` and the Responses API, and real-world implementation insights from the agentic development platform, Warp.
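As a rough illustration of why prompt structure matters for cache hits, the sketch below assumes the cache reuses work only for the leading tokens that two requests share. It calls no real API: `tokenize`, `static_first`, `dynamic_first`, and the sample prompts are hypothetical, and whitespace splitting stands in for a real tokenizer.

```python
from itertools import takewhile


def tokenize(text: str) -> list[str]:
    # Whitespace split is a stand-in for a real tokenizer (assumption).
    return text.split()


def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Count the leading tokens that two prompts have in common."""
    return sum(1 for _ in takewhile(lambda pair: pair[0] == pair[1], zip(a, b)))


# Hypothetical static instructions that are identical across requests.
SYSTEM = "You are a support agent. Answer briefly and cite the order ID."


def static_first(user_msg: str) -> list[str]:
    # Static content first, variable content last.
    return tokenize(SYSTEM + " " + user_msg)


def dynamic_first(user_msg: str) -> list[str]:
    # Variable content first, which breaks the shared prefix immediately.
    return tokenize(user_msg + " " + SYSTEM)


q1 = "Where is order 1182?"
q2 = "Cancel order 2047 please."

# Tokens a prefix-based cache could reuse between the two requests.
reusable_static_first = shared_prefix_len(static_first(q1), static_first(q2))
reusable_dynamic_first = shared_prefix_len(dynamic_first(q1), dynamic_first(q2))

print(reusable_static_first, reusable_dynamic_first)
```

Under these assumptions, placing the static instructions first keeps the whole instruction block reusable across requests, while leading with the user message leaves nothing to reuse.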

Feb 12, 2026

Inference at Scale: Breaking the Memory Wall

Sid Sheth, CEO of d-matrix, details their memory-centric approach to AI inference hardware, focusing on their Digital In-Memory Compute (DIMC) architecture. He explains how DIMC, an augmented SRAM technology, minimizes data movement to solve the memory bottleneck, delivering significant gains in latency and energy efficiency, particularly for the 'decode' phase of large language models.

Nov 13, 2025

Building Voice Agents Just Got Easier

Anoop Dawar from Deepgram discusses the evolution of voice AI, from basic transcription to sophisticated, real-time voice agents. He covers the key technical challenges in production, such as latency and interruption handling, and introduces Deepgram's Flux system. The talk concludes with a look at the future of speech-to-speech models that can understand emotional nuance, moving closer to passing the audio Turing Test.

Jul 31, 2025

Your realtime AI is ngmi — Sean DuBois (OpenAI), Kwindla Kramer (Daily)

Sean DuBois (OpenAI, Pion) and Kwindla Hultman Kramer (Daily, Pipecat) argue that to build successful real-time AI applications, developers must start from the network layer up, prioritizing WebRTC over WebSockets to manage latency effectively and enable advanced features like interruption and state management.