CUDA

Your Coding Agent Should Do AI System Engineering — Ben Burtenshaw, Hugging Face

Ben Burtenshaw from Hugging Face demonstrates how coding agents are tackling complex AI systems engineering tasks. He outlines a three-tiered approach: interactively writing CUDA kernels, autonomously fine-tuning LLMs, and deploying a multi-agent research lab (AutoLab) to parallelize experiments, all powered by file-based "skills" and open primitives on the Hugging Face Hub.

Jensen Huang – Will Nvidia’s moat persist?

Nvidia CEO Jensen Huang discusses the company's core strategy, which he defines as transforming electrons into tokens by orchestrating a vast supply chain. He details how Nvidia's true moat lies in its ecosystem and its ability to manage supply bottlenecks. Huang contrasts Nvidia's versatile 'accelerated computing' platform with competing accelerators such as TPUs, arguing that programmability via CUDA is key to AI innovation. He also presents a strong case against broad AI chip export controls on China, warning that they could backfire by forcing the creation of a competing tech stack. Finally, he explains why Nvidia invests in the ecosystem rather than becoming a hyperscaler itself.

Performance Optimization and Software/Hardware Co-design across PyTorch, CUDA, and NVIDIA GPUs

Chris Fregly discusses his new book, "AI Systems Performance Engineering", covering the co-design and optimization of hardware, software, and algorithms across PyTorch, CUDA, and NVIDIA GPUs. The talk explores GPU architecture, system-level reliability challenges, and the use of modern coding agents for low-level kernel optimization.

Why AI Engineers Need to Understand GPU Hardware (with Chris Fregly)

Chris Fregly, author of 'AI Systems Performance Engineering', explains that true performance gains in AI come not from raw compute but from a deep, holistic understanding of the entire hardware and software stack. He emphasizes that memory bandwidth is the most critical GPU metric and introduces the concept of 'mechanical sympathy'—the co-design of hardware, software, and algorithms—as the key to unlocking efficiency and overcoming modern bottlenecks.

OpenAI, Oracle & AMD shake up AI

The panel discusses the shifting AI hardware landscape as Oracle and OpenAI bet on AMD, challenging Nvidia's dominance. They also analyze a US government report on the risks of the DeepSeek model, debate the viability of Reflection AI's new $2B open-source venture, and dissect the story of a VC fund replacing analysts with AI agents.

Introduction to LLM serving with SGLang - Philip Kiely and Yineng Zhang, Baseten

A deep dive into SGLang, an open-source serving framework for LLMs. This summary covers its core features, its history, performance optimization techniques such as CUDA graphs and EAGLE-3 speculative decoding, and how to contribute to the project.