Tokenization

Training an LLM from Scratch, Locally — Angelos Perivolaropoulos, ElevenLabs

A practical guide to the engineering principles and trade-offs involved in training a small language model from scratch on a local machine, based on a workshop by Angelos Perivolaropoulos from ElevenLabs.

How AI Agents Will Transform the Financial System with Circle Co-Founder and CEO Jeremy Allaire

Circle CEO Jeremy Allaire explains how programmable money and the Arc blockchain will power the emerging economy of AI agents. He describes how stablecoins like USDC offer an internet-native financial infrastructure for both micro-transactions and large settlements, addressing the limitations of traditional banking for AI agents. The discussion covers the foundational principles of full-reserve banking, the attributes of the Arc blockchain that suit machine-driven economic activity, the tokenization of real-world assets, and a bold vision for AI's potential to drive double-digit GDP growth and foster new on-chain organizational structures within the next decade.

Now Is The Best Time To Build In Crypto

A summary of the conversation between YC's Harj Taggar and Base's Jesse Pollak about the 'golden age of crypto,' covering the evolution to Fintech 3.0, the technological and regulatory shifts enabling it, and the key opportunities for founders in stablecoins, tokenization, and the intersection of AI and crypto.

Make some noise: Teaching the language of audio to an LLM using sound tokens

Shivam Mehta from KTH presents a method for teaching Large Language Models (LLMs) to understand and generate audio by treating it as a discrete language. The approach involves a two-step process: first, creating an ultra-low bitrate (0.293 kbps) audio representation using a causal variational autoencoder, and second, fine-tuning a Llama 7B model with these audio tokens using LoRA.

Inside GPT – The Maths Behind the Magic • Alan Smith • GOTO 2024

A deep dive into the internal workings of Large Language Models like GPT, explaining the journey from a text prompt through tokenization, embeddings, and the attention mechanism to generate a response.