LoRA

Introducing Our Approach to Design Document Review Using Business-Specific Large Language Models

Hitachi's Financial Business Unit developed a specialized LLM to automate the review of system design documents, addressing the inadequacy of general-purpose AI for mission-critical systems. This presentation details the model's development using Continued Pre-training and LoRA on proprietary data, its integration into a multi-agent architecture, and the use of Weights & Biases for MLOps, which led to a 70% reduction in manual review workload.

Post-training best-in-class models in 2025

An expert overview of post-training techniques for language models, covering the entire workflow from data generation and curation to advanced algorithms like Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Reinforcement Learning (RL), along with practical advice on evaluation and iteration.
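As a reference point for the DPO step mentioned above, here is a minimal sketch of the standard per-example DPO loss. It is not code from the talk. Each input is assumed to be the summed log-probability of a full response under either the trainable policy or a frozen reference model. The value `beta=0.1` is an illustrative default, and the helper `softplus` is introduced here only for numerical stability.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).

    Each argument is the summed log-probability of one complete response,
    under either the trainable policy or the frozen reference model.
    """
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_log_ratio - rejected_log_ratio)
    # -log(sigmoid(m)) == log(1 + exp(-m)) == softplus(-m)
    return softplus(-margin)


# Policy identical to the reference: margin is 0, so the loss is log 2.
print(round(dpo_loss(-12.0, -15.0, -12.0, -15.0), 4))  # 0.6931
```

The loss drops below log 2 when the policy raises the chosen response's likelihood relative to the reference more than it raises the rejected one. No separate reward model is trained.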

Memory in LLMs: Weights and Activations - Jack Morris, Cornell

This talk explores the limitations of current methods for providing knowledge to LLMs, such as large context windows and Retrieval-Augmented Generation (RAG). The speaker argues that the future lies in training knowledge directly into the model's weights. This is achieved through a combination of generating large synthetic datasets from small amounts of source material and using parameter-efficient fine-tuning (PEFT) techniques like LoRA to avoid catastrophic forgetting. The goal is to create more capable, personalized, and efficient models by fundamentally altering how they store and access information.
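The parameter-efficient part of this argument can be sketched with a single LoRA-adapted linear layer. This is a minimal stdlib illustration, not code from the talk. The base weight `W` stays frozen, and only two small matrices, `A` (r × d_in) and `B` (d_out × r), are trained. The original weights are therefore never overwritten, which is the property the talk relies on to avoid catastrophic forgetting. The rank `r`, scale `alpha`, and initialization (Gaussian `A`, zero `B`, as in the original LoRA recipe) are illustrative choices.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen linear layer W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight: Matrix, r: int = 4, alpha: float = 8.0, seed: int = 0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen: never updated during fine-tuning
        # A starts small and random, B starts at zero, so B @ A == 0 at init
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [h + self.scale * d for h, d in zip(base, delta)]

    def trainable_parameters(self) -> int:
        # Only A (r x d_in) and B (d_out x r) would receive gradients
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


W = [[1.0, 0.0, 2.0], [0.5, -1.0, 0.0]]
layer = LoRALinear(W, r=2, alpha=4.0)
print(layer.forward([1.0, 2.0, 3.0]))  # [7.0, -1.5]
print(layer.trainable_parameters())    # 10
```

At initialization, the adapted layer reproduces the base layer exactly. The savings grow with size: for a 4096 × 4096 projection with r = 8, only 2 × 8 × 4096 = 65,536 parameters are trained, compared with 16,777,216 in the full matrix.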

Streamline evaluation, monitoring, optimization of AI data flywheel with NVIDIA and Weights & Biases

A walkthrough of the NVIDIA Data Flywheel Blueprint, demonstrating how to use production data and Weights & Biases to systematically fine-tune AI agents. This process enhances model accuracy and efficiency by creating a continuous improvement cycle, moving beyond the limitations of prompt engineering.

Serving Voice AI at $1/hr: Open-source, LoRAs, Latency, Load Balancing - Neil Dwyer, Gabber

An in-depth look at Gabber's experience deploying the Orpheus text-to-speech model to production, covering latency optimization, high-fidelity LoRA-based voice cloning, and a cost-effective inference stack using vLLM and a consistent hash ring for load balancing.
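A consistent hash ring maps each request key to a point on a ring and routes it to the first server point clockwise from there. Adding or removing a server therefore only remaps the keys in that server's arcs. The sketch below is a generic stdlib implementation with virtual nodes, not Gabber's code. Several details are assumptions: using a voice or LoRA adapter ID as the routing key (so repeated requests for one voice reach a worker that already has that adapter loaded), the worker names, MD5 as the ring hash, and 100 virtual nodes per worker.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


class HashRing:
    """Consistent hash ring with virtual nodes for smoother load spreading."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 100):
        self.vnodes = vnodes
        self._points: List[int] = []        # sorted positions on the ring
        self._owner: Dict[int, str] = {}    # ring position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(s: str) -> int:
        # First 8 bytes of MD5 as a 64-bit ring position (not for security)
        return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            p = self._hash(f"{node}#{i}")
            bisect.insort(self._points, p)
            self._owner[p] = node

    def remove(self, node: str) -> None:
        for i in range(self.vnodes):
            p = self._hash(f"{node}#{i}")
            del self._points[bisect.bisect_left(self._points, p)]
            del self._owner[p]

    def get(self, key: str) -> str:
        # First ring point strictly after the key's hash, wrapping around
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["gpu-0", "gpu-1", "gpu-2"])  # hypothetical vLLM workers
print(ring.get("voice-42"))  # same worker every time for this voice ID
```

Removing `gpu-1` moves only the keys it owned. Every other voice keeps its worker and any adapter already cached there.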

Make some noise: Teaching the language of audio to an LLM using sound tokens

Shivam Mehta from KTH presents a method for teaching Large Language Models (LLMs) to understand and generate audio by treating it as a discrete language. The approach involves a two-step process: first, creating an ultra-low bitrate (0.293 kbps) audio representation using a causal variational autoencoder, and second, fine-tuning a Llama 7B model with these audio tokens using LoRA.
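One common way to let a text LLM read and write such tokens is to append the discrete audio codes to its vocabulary as new token IDs. The sketch below shows only that ID mapping, assuming the encoder's output has already been quantized to integer codes. The text vocabulary size of 32,000, the codebook size of 1,024, and the begin/end-of-audio markers are all hypothetical, and the talk's actual tokenization scheme may differ.

```python
from typing import List

TEXT_VOCAB_SIZE = 32_000      # assumption: size of the base model's text vocabulary
AUDIO_CODEBOOK_SIZE = 1_024   # hypothetical number of discrete audio codes
BOA_ID = TEXT_VOCAB_SIZE + AUDIO_CODEBOOK_SIZE  # hypothetical begin-of-audio marker
EOA_ID = BOA_ID + 1                             # hypothetical end-of-audio marker
NEW_VOCAB_SIZE = EOA_ID + 1


def audio_to_token_ids(codes: List[int]) -> List[int]:
    """Shift audio codes past the text vocabulary and wrap them in markers."""
    for c in codes:
        if not 0 <= c < AUDIO_CODEBOOK_SIZE:
            raise ValueError(f"audio code {c} outside codebook")
    return [BOA_ID] + [TEXT_VOCAB_SIZE + c for c in codes] + [EOA_ID]


def token_ids_to_audio(ids: List[int]) -> List[int]:
    """Recover audio codes from a generated sequence, ignoring text and markers."""
    return [i - TEXT_VOCAB_SIZE for i in ids
            if TEXT_VOCAB_SIZE <= i < TEXT_VOCAB_SIZE + AUDIO_CODEBOOK_SIZE]


ids = audio_to_token_ids([0, 5, 1023])
print(ids)                      # [33024, 32000, 32005, 33023, 33025]
print(token_ids_to_audio(ids))  # [0, 5, 1023]
```

Before fine-tuning, the model's embedding and output layers would need to grow to `NEW_VOCAB_SIZE` rows. In Hugging Face Transformers, this is `model.resize_token_embeddings(NEW_VOCAB_SIZE)`.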