Tokenless

Machine Learning

Frontier results, on device - RL Nabors, Arize

RL Nabors discusses the significant costs associated with using frontier AI models, covering security, latency, and financial implications. She introduces a framework for right-sizing AI solutions by leveraging smaller, task-specific models and Small Language Models (SLMs). The framework details how to prove task feasibility, establish success criteria with golden datasets, conduct capability evaluations (using tools like Phoenix), and select the most appropriate "Small And Good Enough" (SAGE) model. Nabors further demonstrates how prompt engineering, particularly few-shot prompting, and post-processing can close performance gaps with larger models, while advocating for continuous regression evaluations to maintain performance integrity. The overarching message is to "prototype big, deploy small" to optimize AI deployments.

Research to Reality: Bringing Frontier ML Research to Production - Vaidas Razgaitis, Higharc

Vaidas Razgaitis, Senior Research Engineer at Higharc, shares three tactical tips to accelerate the transition of novel AI/ML research into production-ready features. He emphasizes addressing the critical handoff challenge between ML researchers and software engineers through structured documentation (Research Prototype Taxonomy Document), a well-organized monorepo utilizing decoupled microservices, and a systematic approach to code decomposition and PR review. These strategies aim to improve legibility, maintainability, and delivery speed for ML-driven products.

Uncertainty-Guided Data Augmentation for Engineers | Deep Dive - Yongmin Kwon

This session details a data-efficient method for training engineering surrogate models by using uncertainty quantification (UQ) to guide geometric data augmentation. Instead of random deformations, the approach lets the deep ensemble model identify its own knowledge gaps (epistemic uncertainty), then uses Free-Form Deformation (FFD) to generate new shapes specifically in those uncertain regions. This ensures every expensive simulation run yields maximally informative data, significantly improving model accuracy for a fixed computational budget across domains like structural mechanics and aerodynamics.
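The selection step described above can be sketched roughly as follows. This is a minimal illustration under loud assumptions, not the speaker's method: a bootstrap ensemble of straight-line fits stands in for the deep ensemble, a single scalar `x` stands in for an FFD control-point displacement, and all function names are hypothetical. Disagreement between ensemble members is used as the epistemic-uncertainty signal, and the candidate shapes with the largest disagreement are the ones sent to simulation.

```python
import random
import statistics

def fit_line(xs, ys):
    """Closed-form least squares for y = a*x + b."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0  # degenerate resample: fall back to a flat line
    return a, my - a * mx

def train_ensemble(xs, ys, n_members=20, seed=0):
    """Assumption: bootstrap resampling replaces independently initialised networks."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        idx = [rng.randrange(len(xs)) for _ in xs]
        members.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return members

def epistemic_std(members, x):
    """Spread of member predictions at x, used as the uncertainty estimate."""
    return statistics.pstdev([a * x + b for a, b in members])

def pick_most_uncertain(members, candidates, k=1):
    """Return the k candidate parameters where the ensemble disagrees most."""
    return sorted(candidates, key=lambda x: epistemic_std(members, x), reverse=True)[:k]

# Toy training data: simulated responses for deformation parameters in [0, 1].
xs = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
ys = [0.1, 0.3, 0.9, 1.1, 1.7, 1.9]
members = train_ensemble(xs, ys)
next_to_simulate = pick_most_uncertain(members, [0.5, 5.0], k=1)
```

In this toy setup the candidate far outside the training range is selected, because the member fits diverge most there. In a real pipeline the chosen parameters would drive the deformation and the resulting shapes would be simulated and added to the training set.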

Artificial Intelligence

Grant Sanderson (@3blue1brown) – AI and the future of math

Grant Sanderson and Dwarkesh Patel discuss AI's rapid but uneven progress in mathematics, exploring whether AI can achieve true conceptual breakthroughs, the challenge of measuring creativity, and the long-term implications for human understanding and the future roles of mathematicians. They delve into the unique 'grindability' of math for AI training, the potential of formalization, and why AI currently struggles with 'theory of mind' in writing, offering advice for students navigating an AI-transformed world.

Sustainable Augmented Development • Kent Beck • YOW! 2025

Kent Beck's presentation at YOW! Australia 2025 explores how augmented development ('the genie') is transforming software engineering. He argues that AI shifts programming into an exploratory phase, requiring a re-evaluation of traditional practices. Beck introduces the concept of 'resting between the notes' to foster optionality over mere feature churn, and argues that junior developers become more valuable as AI tools turn into powerful learning aids. He stresses that 'nobody knows' the future, so we must 'take our time' to adapt effectively.

How KV Cache Speeds Up LLMs for Faster AI Models on GPUs

LLMs often slow down under heavy traffic because GPU memory is managed inefficiently during inference. This overview explains how the KV cache and PagedAttention, as implemented in vLLM, optimize memory usage across the prefill and decode phases, boosting throughput, reducing latency, and improving GPU utilization. It also covers context handling and tuning techniques such as prefix caching and speculative decoding.
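The core idea of a KV cache can be sketched in a few lines. This is a toy, single-head illustration in plain Python, not vLLM's implementation: the `project` function is a hypothetical stand-in for learned key, query, and value projections, and paging, batching, and the prefill phase are left out. It shows only that storing each token's key and value once gives the same attention output as recomputing them for the whole prefix at every decode step.

```python
import math

def project(x, scale):
    """Hypothetical stand-in for a learned linear projection."""
    return [scale * xi for xi in x]

def attend(q, keys, values):
    """Scaled dot-product attention of one query over all keys/values."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(len(q)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(weights)
    return [sum(w / total * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))]

def decode_with_cache(tokens):
    """Each step projects only the new token and appends its K/V to the cache."""
    keys, values, outputs = [], [], []
    for x in tokens:
        keys.append(project(x, 0.5))
        values.append(project(x, 2.0))
        outputs.append(attend(project(x, 1.0), keys, values))
    return outputs

def decode_without_cache(tokens):
    """Each step re-projects the entire prefix: quadratic work in sequence length."""
    outputs = []
    for t in range(len(tokens)):
        prefix = tokens[: t + 1]
        keys = [project(x, 0.5) for x in prefix]
        values = [project(x, 2.0) for x in prefix]
        outputs.append(attend(project(tokens[t], 1.0), keys, values))
    return outputs

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cached = decode_with_cache(tokens)
uncached = decode_without_cache(tokens)
```

Both functions produce the same outputs. The cached version trades memory for compute, which is why managing that memory well matters once many requests share a GPU.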

Technology

Platforms: Build Abstractions, not Illusions • Gregor Hohpe • GOTO 2025

Gregor Hohpe explains the critical role of platforms in managing the growing cognitive load on developers due to complex distributed systems. He contrasts platforms, driven by "economies of speed" and fostering innovation through diversity, with traditional IT services and oversimplified abstractions that create dangerous illusions. Hohpe emphasizes building platforms that provide intuitive, domain-specific abstractions to solve real business problems, rather than just repackaging existing cloud services.

Full Stack Greenfield Projects: Are they still relevant?

Bharat Goenka, co-founder of Tally, discusses the company's unconventional approach to software development through "Full Stack Greenfield" projects. He explains why building every component from scratch, despite being a high-risk strategy, has been crucial for Tally's success in serving the SMB market, fostering extreme customer loyalty, and aspiring to connect 200 million businesses. The talk delves into the historical context, the philosophy of questioning and choosing constraints, and the distinction between product and custom engineering.

3‑2‑1 Backup Rule Explained: Protect Your Data from Disaster

Jeff Crume outlines essential data resiliency strategies, starting with the 3-2-1 backup rule—three copies, two media types, one offsite—and expanding to include immutable or air-gapped backups, rigorous testing, and encryption. He emphasizes these principles for robust disaster recovery, ransomware protection, and minimizing costly downtime, highlighting the trade-offs in achieving high availability.
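As a rough illustration, the basic 3-2-1 rule can be written as a small check over an inventory of copies. The `BackupCopy` type, its field names, and the example inventory are all hypothetical, and the sketch assumes the production copy counts as one of the three. It does not cover immutability, air-gapping, testing, or encryption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BackupCopy:
    name: str      # hypothetical label for the copy
    media: str     # e.g. "disk", "tape", "object-storage"
    offsite: bool  # True if stored away from the primary site

def satisfies_3_2_1(copies):
    """At least three copies, on at least two media types, with at least one offsite."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.media for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite

inventory = [
    BackupCopy("production", "disk", offsite=False),
    BackupCopy("local-snapshot", "disk", offsite=False),
    BackupCopy("cloud-archive", "object-storage", offsite=True),
]
compliant = satisfies_3_2_1(inventory)
```

Dropping the cloud copy from this inventory fails all three conditions at once, which is a reminder that a single well-chosen copy often carries more than one of the rule's guarantees.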


Recent Posts

AI Won't Take Your Job—It Will Make You the CEO | The a16z Show

Balaji Srinivasan discusses the paradoxical nature of AI, which lowers creation costs while simultaneously raising verification costs. He argues this tension pushes society toward a "trusted tribe" model, similar to the Chinese internet, where AI excels within high-trust groups but struggles between them. The conversation covers why physical tasks are easier to automate than digital ones, how AI makes everyone a CEO rather than obsolete, and why crypto, particularly Zcash, serves as a necessary counterbalance for inter-tribe transactions in an AI-driven world.

SpaceX IPO & AI data centers in space

A discussion on the feasibility of AI data centers in space, the user backlash against Bluesky's AI assistant "Attie," and the fine line between using AI as a tool (cognitive offloading) and relinquishing thought (cognitive surrender).

Moonlake: Multimodal, Interactive, and Efficient World Models — with Fan-yun Sun and Chris Manning

Moonlake AI presents a distinctive approach to world modeling, prioritizing interactive, action-conditioned environments built on symbolic representations and game engines over purely pixel-based generative models. This method focuses on causal reasoning, long-term consistency, and programmable rendering (via their 'Reverie' diffusion model) to create dynamic, multiplayer worlds, positioning itself as a platform for training embodied AI and revolutionizing game development.

How Bots, Deepfakes and AI Agents Are Forcing a New Internet Identity Layer | Alex Blania on a16z

Alex Blania, cofounder and CEO of Tools for Humanity (Worldcoin), details the critical challenge of proving human uniqueness in the AI era. He explains Worldcoin's iris biometric approach, its sophisticated privacy architecture using Multi-Party Computation and Zero-Knowledge Proofs, and the pervasive impact of AI agents and deepfakes on social media, dating, gaming, and government. Blania also outlines Worldcoin's strategy to scale this proof-of-human network globally, particularly in the US.

Learning API Styles • Lukasz Dynowski & Sam Newman • GOTO 2026

This GOTO Book Club episode features an in-depth conversation between Sam Newman and Lukasz Dynowski, co-author of "Learning API Styles," exploring the foundational network layer of APIs, various API styles, critical trade-off decisions, and future trends like WebTransport and gRPC. The discussion emphasizes treating APIs as products, understanding consumer context, and the eight key characteristics of a well-designed API, complemented by a cautionary tale on database access.

Large-scale agentic quant research with Weights & Biases

Explore how Weights & Biases (W&B) enhances reliability, reproducibility, and explainability in large-scale, agent-driven quantitative research. This video demonstrates two core applications: debugging multi-agent alpha research pipelines with W&B Weave to identify root causes and iterate on forecasts, and automating strategy optimization using W&B Models to tune agent weights and gain insights from performance convergence and parallel coordinate plots.
