Inference

Introduction to LLM serving with SGLang - Philip Kiely and Yineng Zhang, Baseten

A deep dive into SGLang, an open-source serving framework for LLMs. This summary covers its core features, its history, performance optimization techniques such as CUDA graphs and EAGLE-3 speculative decoding, and how to contribute to the project.

How DeepL Built a Translation Powerhouse with AI with CEO Jarek Kutylowski

Jarek Kutylowski, CEO of DeepL, discusses the company's technical strategy for competing with large language models in the translation space. He covers their focus on specialized model architectures, the critical role of curated data, the engineering challenges of building custom GPU data centers and large-scale inference systems, and the future of AI-driven translation in enterprise workflows.