Build Hour: GPT-Realtime-2
Explore GPT-Realtime-2, OpenAI's advanced voice AI model, through practical demos and a deep dive with Sierra on building production-grade, low-latency voice agents with complex reasoning and tool use.
Neil Zeghidour, CEO of Gradium AI, deconstructs the gap between current voice AI and the "Her" ideal. He argues that while cascaded systems are practical, they are architecturally flawed for natural conversation. The future lies in full-duplex, speech-to-speech models that not only solve latency but also integrate deep paralinguistic understanding and overcome significant cost barriers.
In a YC Founder Fireside chat, AssemblyAI founder Dylan Fox discusses his journey from starting as a solo founder in 2017 to leading a major voice AI infrastructure platform. He covers the early challenges, the technological shifts that fueled growth, the development of intelligent, promptable voice models, and the lessons learned in scaling a deep-tech company.
Founders of Simple AI, Catheryn Li & Zach Kamran, discuss their journey from building consumer apps to creating an AI sales agent that handles inbound calls for major brands. They cover their pivot, the technical challenges of integrating with legacy systems, and how their AI outperforms human reps by leveraging hyper-personalization and rapid A/B testing.
Mati and Piotr, the founders of ElevenLabs, discuss their journey from a weekend project to a major player in voice AI. They cover their unique remote-first culture, their philosophy of combining product and research, and their vision for voice as the next fundamental human-computer interface, aiming to create AI that can pass a 'vocal Turing test'.
Panos Stravopodis, CTO of Elyos AI, shares the infrastructure and orchestration challenges of building production-ready voice AI agents. He details the four pillars for success (latency, consistency, context, and recovery) and provides engineering patterns for error handling, context management, and achieving conversational coherence in real-time systems.