Real-time AI

Flipping the Inference Stack — Robert Wachen, Etched

The current AI inference stack, reliant on general-purpose GPUs, is economically and technically unsustainable for real-time AI at scale. AI hardware expert Robert Wachen argues that the future is specialized hardware, like Transformer-specific ASICs, which can unlock currently bottlenecked applications such as real-time video, code generation, and large-scale enterprise deployments by solving critical latency and cost-per-user challenges.

Pipecat Cloud: Enterprise Voice Agents Built On Open Source - Kwindla Hultman Kramer, Daily

A deep dive into the challenges of building production-grade, low-latency voice AI agents, and how the open-source, vendor-neutral framework Pipecat provides a comprehensive solution for development, deployment, and scaling. Learn about voice AI architecture, the trade-offs between speech-to-speech and text-based models, and practical deployment strategies.

Your realtime AI is ngmi — Sean DuBois (OpenAI), Kwindla Kramer (Daily)

Sean DuBois (OpenAI, Pion) and Kwindla Hultman Kramer (Daily, Pipecat) argue that to build successful real-time AI applications, developers must start from the network layer up, prioritizing WebRTC over WebSockets to manage latency effectively and enable advanced features like interruption and state management.

Serving Voice AI at $1/hr: Open-source, LoRAs, Latency, Load Balancing - Neil Dwyer, Gabber

An in-depth look at Gabber's experience deploying the Orpheus text-to-speech model to production, covering latency optimization, high-fidelity LoRA-based voice cloning, and a cost-effective inference stack that uses vLLM for serving and a consistent hash ring for load balancing.
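
For readers unfamiliar with the term, a consistent hash ring can be sketched as below. This is a minimal illustration of the general technique, not Gabber's implementation. The class name `HashRing`, the server names `gpu-0` to `gpu-2`, the `voice-…` request keys, and the choice of 100 virtual nodes are all assumptions made for the example. Routing by voice ID, so that requests for one cloned voice tend to land on the server that already has its adapter loaded, is one plausible use that the summary does not confirm.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a stable 64-bit integer (md5 is used for spread, not security)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) entries
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each server is placed at many points so load spreads more evenly.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        # Dropping a server only frees the points it owned.
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def get(self, key: str) -> str:
        """Return the first server at or after the key's hash, wrapping around."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


# Assumed server names and request key, for illustration only.
ring = HashRing(["gpu-0", "gpu-1", "gpu-2"])
server = ring.get("voice-42")  # the same key always maps to the same server
```

The property that makes this attractive for load balancing is that removing or adding a server remaps only the keys that server owned; every other key keeps its assignment, which a plain `hash(key) % n` scheme does not guarantee.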