Introducing Sora 2
A detailed overview of OpenAI's announcement of Sora 2, a flagship video and audio generation model, and the new Sora app, which introduces novel features like "Cameo" for personalized content creation and a new social experience.
Shivam Mehta from KTH presents a method for teaching Large Language Models (LLMs) to understand and generate audio by treating it as a discrete language. The approach involves a two-step process: first, creating an ultra-low bitrate (0.293 kbps) audio representation using a causal variational autoencoder, and second, fine-tuning a Llama 7B model with these audio tokens using LoRA.
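To get a feel for the numbers behind the two steps, here is a rough back-of-envelope sketch. It assumes the audio tokens come from a single fixed-size codebook, so that bitrate equals tokens per second times bits per token. The codebook sizes, the LoRA rank of 8, and the 4096 x 4096 layer shape are illustrative assumptions, not values taken from the talk; only the 0.293 kbps figure comes from the summary above.

```python
import math


def bitrate_bps(tokens_per_second: float, codebook_size: int) -> float:
    """Bits per second of a token stream drawn from one fixed-size codebook."""
    return tokens_per_second * math.log2(codebook_size)


def tokens_per_second_for(target_bps: float, codebook_size: int) -> float:
    """Token rate that hits a target bitrate for a given codebook size."""
    return target_bps / math.log2(codebook_size)


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """LoRA trains two low-rank factors, B (d_out x rank) and A (rank x d_in),
    instead of the full d_out x d_in weight matrix."""
    return rank * (d_out + d_in)


# 0.293 kbps from the summary, written directly as bits per second.
TARGET_BPS = 293.0

if __name__ == "__main__":
    # Hypothetical codebook sizes: how many tokens per second of audio
    # would each one allow at the stated bitrate?
    for size in (256, 1024, 4096):
        rate = tokens_per_second_for(TARGET_BPS, size)
        print(f"codebook={size}: about {rate:.1f} tokens/s")

    # Hypothetical single projection layer with an assumed rank of 8.
    full = 4096 * 4096
    lora = lora_trainable_params(4096, 4096, rank=8)
    print(f"full layer: {full} params, LoRA: {lora} params "
          f"({100 * lora / full:.2f}% of full)")
```

Under these assumptions, a bitrate this low means each second of audio costs the language model only a few dozen tokens, which is what makes treating audio as a discrete "language" practical, and the low-rank factors are a small fraction of the weights they adapt.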