GPU

Why AI needs a new kind of supercomputer network — the OpenAI Podcast Ep. 18

OpenAI's Mark Handley and Greg Steinbrecher detail Multipath Reliable Connection (MRC), a new networking protocol designed to overcome the unique challenges of large-scale AI model training. They explain how moving intelligence to the network's edge creates a resilient, efficient, and simple system that handles constant hardware failures without disrupting massive, synchronized GPU workloads.

The Small Model Infrastructure Nobody Built (So We Did) — Filip Makraduli, Superlinked

Filip Makraduli from Superlinked discusses the common infrastructure gaps and profiling mistakes encountered when deploying small embedding and transformer models. He introduces the Superlinked Inference Engine (SIE), an open-source solution designed for dynamic model loading, hot-swapping, and memory-aware eviction to maximize GPU utilization and streamline the path from development to production.

Baseten CEO Tuhin Srivastava on Custom Models, and Building the Inference Cloud

Baseten CEO Tuhin Srivastava discusses the explosive growth in AI inference, driven by the adoption of specialized and post-trained open-source models. He covers the strategic importance of owning the software layer on top of compute, navigating the severe GPU supply crunch with a multi-cloud fabric, the evolving landscape of AI workloads, and the operational lessons learned from scaling 30x in one year.

Enter the Matrix • Conor Hoekstra • YOW! 2025

Conor Hoekstra demonstrates how to achieve exponential productivity by combining AI-assisted development, array programming, and high-performance computing. Using a financial dashboard app built entirely with AI ("vibe coding"), he showcases a custom array-based DSL with two backends (interpreted BQN and compiled NVIDIA Parrot for GPUs) and urges developers to fully embrace modern tools and raise their expectations of what is possible.

Performance Optimization and Software/Hardware Co-design across PyTorch, CUDA, and NVIDIA GPUs

Chris Fregly discusses his new book, "AI Systems Performance Engineering", covering the co-design and optimization of hardware, software, and algorithms across PyTorch, CUDA, and NVIDIA GPUs. The talk explores GPU architecture, system-level reliability challenges, and the use of modern coding agents for low-level kernel optimization.

Why AI Engineers Need to Understand GPU Hardware (with Chris Fregly)

Chris Fregly, author of "AI Systems Performance Engineering", explains that true performance gains in AI come not from raw compute but from a deep, holistic understanding of the entire hardware and software stack. He emphasizes that memory bandwidth is the most critical GPU metric and introduces "mechanical sympathy" (the co-design of hardware, software, and algorithms) as the key to unlocking efficiency and overcoming modern bottlenecks.