Model Architecture

Apr 29, 2026

Everything I Learned Training Frontier Small Models — Maxime Labonne, Liquid AI

Maxime Labonne from Liquid AI shares a playbook for post-training frontier small models (under 1GB) for on-device deployment. The talk breaks down the LFM2.5 recipe, which includes on-policy preference alignment and agentic reinforcement learning. It also addresses challenges specific to the 1B scale, such as capability interference and 'doom loops', and offers concrete solutions for building efficient models for tasks like data extraction and tool use.

Apr 21, 2026

Building Generative Image & Video models at Scale - Sander Dieleman (Veo and Nano Banana)

Sander Dieleman from Google DeepMind provides a behind-the-scenes look at the key components of training large-scale diffusion models for audio-visual data. The talk covers the entire pipeline, from the critical role of data curation and latent representations to the mechanics of diffusion, network architectures, sampling with guidance, and advanced control signals.

Dec 29, 2025

Memory in LLMs: Weights and Activations - Jack Morris, Cornell

This talk explores the limitations of current methods for providing knowledge to LLMs, such as large context windows and Retrieval-Augmented Generation (RAG). The speaker argues that the future lies in training knowledge directly into the model's weights. This is achieved through a combination of generating large synthetic datasets from small amounts of source material and using parameter-efficient fine-tuning (PEFT) techniques like LoRA to avoid catastrophic forgetting. The goal is to create more capable, personalized, and efficient models by fundamentally altering how they store and access information.

Aug 29, 2025

GPT-OSS vs. Qwen vs. Deepseek: Comparing Open Source LLM Architectures

A technical breakdown and comparison of the architectures, training methodologies, and post-training techniques of three leading open-source models: OpenAI's GPT-OSS, Alibaba's Qwen-3, and DeepSeek V3. The summary explores their different approaches to Mixture-of-Experts, long-context handling, and attention mechanisms.