Introduction to LLM serving with SGLang - Philip Kiely and Yineng Zhang, Baseten
A deep dive into SGLang, an open-source serving framework for LLMs. This summary covers its core features, its history, performance optimization techniques such as CUDA Graph and Eagle 3 speculative decoding, and how to contribute to the project.