Ci cd

Evals Are Not Unit Tests — Ido Pesok, Vercel v0

Evals Are Not Unit Tests — Ido Pesok, Vercel v0

Ido Pesok from Vercel explains why LLM-based applications often fail in production despite successful demos, and presents a systematic framework for building reliable AI systems using application-layer evaluations ("evals").

The Cloud Native Attitude • Anne Currie & Sarah Wells

The Cloud Native Attitude • Anne Currie & Sarah Wells

Authors Anne Currie and Sarah Wells discuss the core principles of "The Cloud Native Attitude", defining it not as a specific technology stack but as a cultural mindset focused on removing bottlenecks and enabling rapid, iterative change. The summary covers the primacy of CI/CD, the evolution of orchestrators like Kubernetes, and how a cloud native approach is a critical enabler for building sustainable, green software.

Ship Agents that Ship: A Hands-On Workshop - Kyle Penfound, Jeremy Adams, Dagger

Ship Agents that Ship: A Hands-On Workshop - Kyle Penfound, Jeremy Adams, Dagger

A detailed summary of a workshop on building and deploying production-minded AI coding agents using Dagger. The session covers creating controlled, observable, and test-driven agent workflows and integrating them into CI/CD systems like GitHub Actions for automated, reliable software development.

Strategies for LLM Evals (GuideLLM, lm-eval-harness, OpenAI Evals Workshop) — Taylor Jordan Smith

Strategies for LLM Evals (GuideLLM, lm-eval-harness, OpenAI Evals Workshop) — Taylor Jordan Smith

Traditional benchmarks and leaderboards are insufficient for production AI. This summary details a practical, multi-layered evaluation strategy, moving from foundational system performance to factual accuracy and finally to safety and bias, using open-source tools like GuideLLM, lm-eval-harness, and Promptfoo.