Inference

The CEO Behind the Fastest-Growing AI Inference Company | Tuhin Srivastava

Tuhin Srivastava, CEO of Baseten, joins Gradient Dissent to discuss the core challenges of AI inference, from infrastructure and runtime bottlenecks to the practical trade-offs between vLLM, TensorRT-LLM, and SGLang. He shares how Baseten spent years searching for a market before the explosion of large-scale models, and explains a company-building philosophy focused on avoiding premature scaling and "burning the boats" to chase the biggest opportunities.

Building the Real-World Infrastructure for AI, with Google, Cisco & a16z

AI is driving an unprecedented buildout of physical infrastructure. Experts from Google and Cisco discuss the "AI industrial revolution," where power, compute, and networking are the new scarce resources, demanding a complete reinvention of the technology stack from silicon to software.

Nvidia CTO Michael Kagan: Scaling Beyond Moore's Law to Million-GPU Clusters

Nvidia CTO Michael Kagan explains how the Mellanox acquisition was key to scaling AI infrastructure from single GPUs to million-GPU data centers. He covers the critical role of networking in system performance, the shift from training to inference workloads, and his vision for AI's future in scientific discovery.

No Priors Ep. 126 | With Cloudflare CEO Matthew Prince

Matthew Prince, CEO of Cloudflare, discusses the internet's architectural and economic shift from a search-driven model to an AI-native one. He outlines the existential threat to content creators as AI consumes content without providing traffic, and proposes a new marketplace where creators are compensated for providing value and filling knowledge gaps, rather than generating clicks.

Solving AI Video: How Fal.ai is Making AI Video Generation Faster & Easier

Fal co-founder Burkay Gur and head of engineering Batuhan Taskaya discuss their journey building a high-performance generative media cloud. They cover their strategic pivot to media models, core optimization principles born from early GPU scarcity, and the development of a customer-obsessed culture to navigate the fast-paced AI model landscape.

Flipping the Inference Stack — Robert Wachen, Etched

The current AI inference stack, built on general-purpose GPUs, is economically and technically unsustainable for real-time AI at scale. AI hardware expert Robert Wachen argues that the future belongs to specialized hardware, such as Transformer-specific ASICs, which can unlock currently bottlenecked applications like real-time video, code generation, and large-scale enterprise deployments by solving critical latency and cost-per-user challenges.