Model evaluation

Fully Connected 2025 kickoff: The rise (and the challenges) of the agentic era

Robin Bordoli of Weights & Biases traces AI's exponential growth, from past achievements to the current agentic landscape. He discusses the rise of reinforcement learning and the challenge of productionizing reliable agents, and he highlights how foundational issues in AI development persist even as model capabilities soar.

Beyond Chatbots: How to build Agentic AI systems with Google Gemini // Philipp Schmid

A deep dive into the evolution from static chatbots to dynamic, agentic AI systems. Philipp Schmid of Google DeepMind explores how to design, build, and evaluate AI agents that leverage structured outputs, function calling, and workflow orchestration with Google Gemini, covering key agentic patterns and the future of AI development.

Traditional vs LLM Recommender Systems: Are They Worth It?

This summary explores Arpita Vats's insights on how Large Language Models (LLMs) are revolutionizing recommender systems. It contrasts the traditional feature-engineering-heavy approach with the contextual understanding of LLMs, which shifts the focus to prompt engineering. Key challenges like inference latency and cost are discussed, along with practical solutions such as lightweight models, knowledge distillation, and hybrid architectures. The conversation also touches on advanced applications like sequential recommendation and the future potential of agentic AI.

OpenAI Researchers Break Down GPT-5

OpenAI researchers discuss the step change in GPT-5's capabilities, from coding and reasoning to creative writing. They detail the data-centric training process, the shift toward asynchronous agentic workflows, and the future of AI development and its impact on the startup ecosystem.

The 2025 AI Engineering Report — Barr Yaron, Amplify

Barr Yaron of Amplify Partners presents early findings from the 2025 State of AI Engineering survey, covering LLM usage, customization techniques such as RAG and fine-tuning, the state of AI agents, key challenges such as evaluation, and community perspectives on the future of AI.

Prompt Engineering for Generative AI • James Phoenix, Mike Taylor & Phil Winder

Authors James Phoenix and Mike Taylor discuss the evolution of prompt engineering from a creative art to a rigorous engineering discipline. They cover the core principles of prompting, the importance of programmatic evaluation, the role of agents, and how to manage application lifecycles as models evolve.