Fine-tuning

Task Fidelity Scaling Laws — Kobie Crawford, Snorkel

An experiment by Snorkel AI reveals that in agentic AI training, the quality of tasks is paramount. Using the same model and compute, fine-tuning on high-quality tasks yielded a 6% performance improvement, a 5x greater uplift compared to the 1% gain from low-quality tasks. The key difference lies in the nature of the tasks: high-quality tasks are genuinely harder, featuring more tool calls and cleaner failure modes that provide a meaningful learning signal. In contrast, low-quality tasks often fail due to ambiguity and environmental noise, hindering effective model improvement.

Your Coding Agent Should Do AI System Engineering — Ben Burtenshaw, Hugging Face

Ben Burtenshaw from Hugging Face demonstrates how coding agents are tackling complex AI systems engineering tasks. He outlines a three-tiered approach: interactively writing CUDA kernels, autonomously fine-tuning LLMs, and deploying a multi-agent research lab (AutoLab) to parallelize experiments, all powered by file-based "skills" and open primitives on the Hugging Face Hub.

Your Agent Can Now Train Models — Merve Noyan, Hugging Face

Merve Noyan from Hugging Face discusses how open-source models have achieved parity with closed-source counterparts, highlighting the Hugging Face ecosystem built to support this shift. She covers tools for model selection, local agent deployment, and the transformative "Hugging Face Skills" that allow agents to automate complex ML engineering tasks like fine-tuning models with a single prompt.

Lessons from Trillion Token Deployments at Fortune 500s — Alessandro Cappelli, Adaptive ML

95% of GenAI pilots fail due to feedback integration issues, not deployment challenges. Alessandro Cappelli argues that Reinforcement Learning (RL) provides the only systematic way to incorporate business metrics and production signals to continuously improve models, especially for complex agent-based systems.

TLMs: Tiny LLMs and Agents on Edge Devices with LiteRT-LM — Cormac Brick, Google

Cormac Brick from Google's AI Edge team details the dual trends of on-device AI: large, system-level models like Gemma 4 enabling complex agent skills, and fine-tuned tiny LLMs for high-performance, in-app tasks. The summary covers the architecture of on-device function calling, the engineering trade-offs for edge deployment, and the practical workflow for fine-tuning and deploying models under 1B parameters on platforms like Android and iOS.

Gemma, DeepMind's Family of Open Models — Omar Sanseviero, Google DeepMind

A deep dive into Google DeepMind's Gemma 4, the latest family of open models. This summary covers new architectural features such as per-layer embeddings, on-device agentic capabilities, multimodal features, and the growing ecosystem of fine-tuned applications, from medicine to sovereign AI.