Speech recognition

Building Voice Agents Just Got Easier

Anoop Dawar from Deepgram discusses the evolution of voice AI, from basic transcription to sophisticated, real-time voice agents. He covers the key technical challenges in production, such as latency and interruption handling, and introduces Deepgram's Flux system. The talk concludes with a look at the future of speech-to-speech models that can understand emotional nuance, moving closer to passing the audio Turing Test.

Distant conversational speech recognition: Challenges and Opportunities

Dr. Samuele Cornell from Carnegie Mellon University discusses the persistent challenges in distant automatic speech recognition (DASR) for spontaneous, multi-party conversations. He explains why state-of-the-art systems falter in real-world scenarios and presents recent advancements through three key efforts: (1) insights from the CHiME-7/8 DASR challenges, which benchmark robust meeting transcription; (2) progress towards unified end-to-end models that jointly handle diarization and recognition; and (3) novel techniques for generating realistic, large-scale training data using a combination of large language models and multi-speaker text-to-speech systems.

How DeepL Built a Translation Powerhouse with AI with CEO Jarek Kutylowski

Jarek Kutylowski, CEO of DeepL, discusses the company's technical strategy for competing with large language models in the translation space. He covers their focus on specialized model architectures, the critical role of curated data, the engineering challenges of building custom GPU data centers and large-scale inference systems, and the future of AI-driven translation in enterprise workflows.