Multimodal AI

How Google’s Nano Banana Achieved Breakthrough Character Consistency

Nicole Brichtova and Hansa Srinivasan, the leads behind Google's Nano Banana image model, detail the technical breakthroughs behind its character consistency. They discuss how a focus on high-quality data, Gemini's multimodal architecture, and rigorous human evaluation enabled the model to realistically render an individual from a single photo. The conversation also covers the future of visual AI, from moving beyond text prompts toward specialized UIs to the ultimate goal of a single, powerful model that can transform any modality into any other, unlocking new applications in personalized education, professional design, and creative storytelling.

How LiveKit Became An AI Company By Accident

Russ d'Sa, CEO of LiveKit, recounts the company's unexpected journey from a pandemic-era open-source WebRTC project to becoming a crucial infrastructure provider for AI voice interfaces, most notably for OpenAI's ChatGPT. He details the serendipitous moments that led to this pivot and shares his vision for LiveKit as the nervous system for a multimodal AI future.

Introducing gpt-realtime in the API

An overview of the new gpt-realtime speech-to-speech model and the general availability of OpenAI's Realtime API, detailing the model's architecture and training methodology, advanced capabilities such as image input and multilingual speech, and new enterprise-ready features.
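As a rough illustration of what the announcement covers, here is a minimal sketch of a Realtime API session over WebSocket. It follows the beta event protocol (`session.update`, `response.create`); the endpoint URL, model name, and session fields are assumptions and may differ in the GA release.

```python
# Minimal sketch: open a Realtime API session over WebSocket and request a
# spoken response. Endpoint, model name, and session fields are assumptions
# based on the beta protocol and may differ in the GA release.
import asyncio
import json
import os

import websockets  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-realtime"  # model name assumed

async def main():
    headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}
    # websockets>=14 uses `additional_headers`; older releases use `extra_headers`.
    async with websockets.connect(URL, additional_headers=headers) as ws:
        # Configure the session for speech output.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {"modalities": ["audio", "text"], "voice": "alloy"},
        }))
        # Ask the model to respond; audio arrives as base64 chunks in delta events.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {"instructions": "Say hello in French."},
        }))
        async for message in ws:
            event = json.loads(message)
            print(event["type"])
            if event["type"] == "response.done":
                break

asyncio.run(main())
```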

Building Production-Grade RAG at Scale

Douwe Kiela, CEO of Contextual AI, explains the evolution from basic RAG to "RAG 2.0", an end-to-end, trainable system. He argues that this system-level approach, which integrates optimized document parsing, retrieval, reranking, and grounded models, is superior to relying on massive context windows alone and is a fundamental tool for next-generation AI agents.
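To make the system-level framing concrete, below is a hypothetical skeleton of the four stages Kiela names (parsing, retrieval, reranking, grounded generation). The function names and toy keyword scoring are illustrative stand-ins for trained components, not Contextual AI's implementation; in RAG 2.0 these stages are optimized jointly rather than bolted together.

```python
# Hypothetical skeleton of a RAG pipeline: parse -> retrieve -> rerank ->
# grounded generation. All names and scoring logic are illustrative only.
from dataclasses import dataclass

@dataclass
class Chunk:
    doc_id: str
    text: str
    score: float = 0.0

def parse_documents(raw_docs: list[str]) -> list[Chunk]:
    """Split raw documents into retrievable chunks (real parsers also
    handle tables, layout, and OCR)."""
    chunks = []
    for i, doc in enumerate(raw_docs):
        for para in doc.split("\n\n"):
            if para.strip():
                chunks.append(Chunk(doc_id=f"doc{i}", text=para.strip()))
    return chunks

def retrieve(query: str, chunks: list[Chunk], k: int = 20) -> list[Chunk]:
    """First-stage retrieval: a toy keyword-overlap score stands in for a
    dense or hybrid retriever."""
    q_terms = set(query.lower().split())
    for c in chunks:
        c.score = len(q_terms & set(c.text.lower().split()))
    return sorted(chunks, key=lambda c: c.score, reverse=True)[:k]

def rerank(query: str, candidates: list[Chunk], k: int = 5) -> list[Chunk]:
    """Second-stage reranking: a real reranker would cross-encode the query
    against each candidate; here we simply keep the top-k."""
    return candidates[:k]

def generate_grounded(query: str, evidence: list[Chunk]) -> str:
    """Grounded generation: the prompt constrains the model to the
    retrieved evidence (the model call itself is omitted in this sketch)."""
    context = "\n".join(f"[{c.doc_id}] {c.text}" for c in evidence)
    return f"Answer '{query}' using only:\n{context}"

if __name__ == "__main__":
    docs = ["RAG 2.0 trains retrieval and generation end to end.\n\nIt adds reranking."]
    query = "what is RAG 2.0"
    top = rerank(query, retrieve(query, parse_documents(docs)))
    print(generate_grounded(query, top))
```

The interfaces matter more here than the toy internals: because each stage is an ordinary function over `Chunk` objects, any of them can be swapped for a trained component, which is what distinguishes an end-to-end trainable system from a frozen pipeline.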