Audio generation

Introducing Sora 2

A detailed overview of OpenAI's announcement of Sora 2, a flagship video and audio generation model, and the new Sora app, which introduces novel features like "Cameo" for personalized content creation and a new social experience.

Make some noise: Teaching the language of audio to an LLM using sound tokens

Shivam Mehta from KTH presents a method for teaching Large Language Models (LLMs) to understand and generate audio by treating it as a discrete language. The approach involves a two-step process: first, creating an ultra-low bitrate (0.293 kbps) audio representation using a causal variational autoencoder, and second, fine-tuning a Llama 7B model with these audio tokens using LoRA.
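
To put the 0.293 kbps figure in perspective, the sketch below computes the bitrate of a discrete token stream and compares the quoted figure with uncompressed PCM audio. The 16 kHz, 16-bit mono baseline and the 50 tokens/s, 1024-entry codebook example are illustrative assumptions, not values taken from the talk; the function names are hypothetical.

```python
import math

def token_bitrate_kbps(tokens_per_second: float, codebook_size: int) -> float:
    """Bitrate of a discrete token stream in kbps: tokens/s times bits per token."""
    return tokens_per_second * math.log2(codebook_size) / 1000.0

def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Bitrate of uncompressed PCM audio in kbps."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0

# Figure quoted in the summary above.
TARGET_KBPS = 0.293

# Assumed baseline: 16 kHz, 16-bit, mono PCM (not stated in the source).
pcm_kbps = pcm_bitrate_kbps(16_000, 16)
compression_ratio = pcm_kbps / TARGET_KBPS

# Hypothetical tokenizer settings, for comparison only (not the talk's values).
example_kbps = token_bitrate_kbps(tokens_per_second=50, codebook_size=1024)

print(f"PCM baseline: {pcm_kbps:.1f} kbps")
print(f"Ratio vs {TARGET_KBPS} kbps: about {compression_ratio:.0f}x")
print(f"Hypothetical 50 tok/s, 1024-entry codebook: {example_kbps:.3f} kbps")
```

Under these assumptions the quoted bitrate is several hundred times smaller than raw PCM, which is what keeps audio token sequences short enough to fine-tune a language model on.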