Performance engineering

Performance Optimization and Software/Hardware Co-design across PyTorch, CUDA, and NVIDIA GPUs

Chris Fregly discusses his new book, "AI Systems Performance Engineering", covering the co-design and optimization of hardware, software, and algorithms across PyTorch, CUDA, and NVIDIA GPUs. The talk explores GPU architecture, system-level reliability challenges, and the use of modern coding agents for low-level kernel optimization.

Why AI Engineers Need to Understand GPU Hardware (with Chris Fregly)

Chris Fregly, author of "AI Systems Performance Engineering", explains that true performance gains in AI come not from raw compute but from a deep, holistic understanding of the entire hardware and software stack. He emphasizes that memory bandwidth is the most critical GPU metric and introduces the concept of "mechanical sympathy" (the co-design of hardware, software, and algorithms) as the key to unlocking efficiency and overcoming modern bottlenecks.