Multimodal models

Zai GLM 4.6: What We Learned From 100 Million Open Source Downloads — Yuxuan Zhang, Z.ai

Yuxuan Zhang of Z.ai details the technical roadmap behind the GLM-4.6 model series, which has reached top performance on the LMSYS Chatbot Arena. The summary covers the 15T-token data recipe, the SLIME framework for efficient agent RL, key lessons from single-stage long-context training, and the architecture of the multimodal GLM-4.5V model.

Waymo: The future of autonomous driving with Vincent Vanhoucke

Waymo Distinguished Engineer Vincent Vanhoucke discusses the core challenges of autonomous driving, explaining how Waymo fuses data from cameras, LiDAR, and radar to build a robust perception system. He covers the "closed-loop" problem, the critical role of generative AI and simulation in training and validation, and how modern multimodal models are used in a teacher-student framework to distill vast world knowledge into the vehicle's onboard system. The goal is a safety standard that surpasses human performance.
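The teacher-student distillation mentioned above can be sketched with the classic temperature-softened KL objective. This is a minimal, generic illustration in plain Python; the function names are hypothetical and Waymo's actual training setup and loss are not described in the talk summary.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; a higher temperature softens the
    # distribution so the student also learns the teacher's "dark knowledge"
    # about relative probabilities of non-top classes.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) between temperature-softened distributions,
    # the standard knowledge-distillation objective. In practice this is
    # combined with a task loss on ground-truth labels.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# A student that matches the teacher exactly incurs (near) zero loss;
# a mismatched student incurs a positive loss.
teacher = [2.0, 0.5, -1.0]
print(round(distillation_loss(teacher, teacher), 6))   # → 0.0
print(distillation_loss(teacher, [0.1, 0.2, 0.3]) > 0)  # → True
```

In a driving context the "classes" would be discretized actions or trajectory bins, with the large offboard multimodal model as teacher and the compact onboard model as student.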

Waymo's EMMA: Teaching Cars to Think - Jyh Jing Hwang, Waymo

An exploration of Waymo's research into EMMA, an End-to-End Multimodal Model for Autonomous Driving. This summary details how foundation models like Gemini are adapted to create a single, generalizable system that maps raw sensor data directly to driving decisions, aiming to address the long-tail problem and improve scalability. It also covers the use of generative AI for advanced sensor simulation and model evaluation.