Communication bottleneck

Quantized LLM Training at Scale with ZeRO++ // Guanhua Wang // AI in Production 2025

Guanhua Wang from Microsoft's DeepSpeed team explains ZeRO++, a system that tackles the communication bottleneck in large-scale LLM training. By quantizing weights and gradients before they are exchanged between GPUs, ZeRO++ reduces communication volume by 4x, yielding training speedups of over 2x, especially in low-bandwidth clusters and small-batch-size settings.
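As a rough illustration of the underlying idea (not the DeepSpeed implementation itself), the sketch below shows block-wise symmetric int8 quantization of a weight shard, the family of techniques ZeRO++ applies to weight and gradient communication: compressing a tensor before a collective roughly halves (int8) or quarters (int4) the bytes on the wire, at the cost of a small reconstruction error. Function names, the block size, and the tensor shapes are all hypothetical.

```python
import numpy as np

def quantize_blockwise(x, block_size=2048):
    """Symmetric int8 block-wise quantization: each block keeps its own
    scale, so an outlier in one block does not degrade the others."""
    flat = x.astype(np.float32).ravel()
    pad = (-flat.size) % block_size
    blocks = np.pad(flat, (0, pad)).reshape(-1, block_size)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0                      # avoid divide-by-zero on all-zero blocks
    q = np.clip(np.round(blocks / scales), -127, 127).astype(np.int8)
    return q, scales.astype(np.float16), x.shape, pad

def dequantize_blockwise(q, scales, shape, pad):
    """Recover an fp16 approximation of the original tensor after communication."""
    out = (q.astype(np.float32) * scales).ravel()
    if pad:
        out = out[:-pad]
    return out.reshape(shape).astype(np.float16)

# An fp16 shard costs 2 bytes/element; its int8 payload costs 1 byte/element
# plus a small per-block scale, so an all-gather moves roughly half the bytes.
w = np.random.randn(1024, 4096).astype(np.float16)
q, s, shape, pad = quantize_blockwise(w)
w_hat = dequantize_blockwise(q, s, shape, pad)
print("payload bytes:", w.nbytes, "->", q.nbytes + s.nbytes)
print("max abs error:", np.abs(w.astype(np.float32) - w_hat.astype(np.float32)).max())
```

In the talk's framing, this trade is what turns a bandwidth-bound all-gather or reduce-scatter into a much cheaper one, which is why the gains are largest when inter-node bandwidth, not compute, is the limiting factor.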