Speech to speech

Building Voice Agents Just Got Easier

Anoop Dawar from Deepgram discusses the evolution of voice AI, from basic transcription to sophisticated, real-time voice agents. He covers the key technical challenges in production, such as latency and interruption handling, and introduces Deepgram's Flux system. The talk concludes with a look at the future of speech-to-speech models that can understand emotional nuance, moving closer to passing the audio Turing Test.

Build Hour: Voice Agents

A deep dive into building sophisticated voice agents using OpenAI's Realtime API and Agents SDK. The session covers architectural patterns like chained vs. end-to-end models, the use of multi-agent systems with handoffs for specialized tasks, and best practices for production including debugging with traces, implementing guardrails, and creating robust evaluations.

Introducing gpt-realtime in the API

An overview of the new gpt-realtime speech-to-speech model and the general availability of the Realtime API, covering the model's architecture, advanced capabilities such as image input and multilingual support, training methodology, and new enterprise-ready features.

Pipecat Cloud: Enterprise Voice Agents Built On Open Source - Kwindla Hultman Kramer, Daily

A deep dive into the challenges of building production-grade, low-latency voice AI agents, and how the open-source, vendor-neutral framework Pipecat provides a comprehensive solution for development, deployment, and scaling. Learn about voice AI architecture, the trade-offs between speech-to-speech and text-based models, and practical deployment strategies.

Why Voice Security Is Your Next Big Problem

Yishay Carmiel and Roy Zanbel of Apollo Defend explore the state of voice AI, detailing the shift from cascaded models to end-to-end speech-to-speech systems. They break down the imminent security threats, including accessible voice cloning and sophisticated agent-based attacks, and discuss the nascent defense mechanisms and the urgent need for a new layer of voice security for governments, enterprises, and consumers.