CI/CD

CI/CD Evolution: From Pipelines to AI-Powered DevOps • Olaf Molenveld & Julian Wood

CircleCI's Olaf Molenveld and AWS's Julian Wood explore the evolution of CI/CD, drawing parallels between managing production code and the "factory" that builds it. They cover the shift to microservices pipelines, optimization strategies, platform engineering trends, and how AI is set to reshape DevOps by acting as an expert system for developers.

Evals Are Not Unit Tests — Ido Pesok, Vercel v0

Ido Pesok from Vercel explains why LLM-based applications often fail in production despite successful demos, and presents a systematic framework for building reliable AI systems using application-layer evaluations ("evals").
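
A minimal sketch of what an application-layer eval might look like in Python, under assumptions not taken from the talk: a hypothetical `generate` application entry point and a simple rubric-based `grade` function. It illustrates the core distinction: unlike a unit test, an eval scores a pass rate over many cases and gates on an aggregate threshold rather than asserting a single binary outcome.

```python
# Hypothetical application-layer eval (illustrative, not Vercel's code).
from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str
    expected_topic: str  # rubric criterion, not an exact-match answer

def generate(prompt: str) -> str:
    # Stand-in for the real LLM application under test.
    return f"a stub answer about {prompt}"

def grade(response: str, case: EvalCase) -> bool:
    # Stand-in grader; in practice this might be an LLM-as-judge call.
    return case.expected_topic.lower() in response.lower()

def run_eval(cases: list[EvalCase], threshold: float = 0.9) -> None:
    passed = sum(grade(generate(c.prompt), c) for c in cases)
    rate = passed / len(cases)
    print(f"pass rate: {rate:.0%} ({passed}/{len(cases)})")
    # Gate on an aggregate threshold rather than requiring every case to pass.
    assert rate >= threshold, "eval regression: pass rate below threshold"

run_eval([EvalCase("CDN caching", "caching"), EvalCase("edge functions", "edge")])
```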

The Cloud Native Attitude • Anne Currie & Sarah Wells

Authors Anne Currie and Sarah Wells discuss the core principles of "The Cloud Native Attitude", defining it not as a specific technology stack but as a cultural mindset focused on removing bottlenecks and enabling rapid, iterative change. The summary covers the primacy of CI/CD, the evolution of orchestrators like Kubernetes, and how a cloud native approach is a critical enabler for building sustainable, green software.

Ship Agents that Ship: A Hands-On Workshop - Kyle Penfound, Jeremy Adams, Dagger

A detailed summary of a workshop on building and deploying production-minded AI coding agents using Dagger. The session covers creating controlled, observable, and test-driven agent workflows and integrating them into CI/CD systems like GitHub Actions for automated, reliable software development.
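
As a hedged illustration of the pattern (not the workshop's actual code), here is what a containerized, test-gated step might look like with Dagger's Python SDK. `dagger.Connection`, `dagger.Config`, and the container-chaining calls follow the published SDK; the repo path, base image, and test commands are assumptions:

```python
# Sketch: run an agent-produced change through tests inside a container,
# so the same gate works locally and in CI (e.g. GitHub Actions).
import sys
import anyio
import dagger

async def test_change(repo_dir: str) -> str:
    async with dagger.Connection(dagger.Config(log_output=sys.stderr)) as client:
        src = client.host().directory(repo_dir)
        return await (
            client.container()
            .from_("python:3.11-slim")
            .with_directory("/src", src)
            .with_workdir("/src")
            .with_exec(["pip", "install", "-e", ".[test]"])  # assumed project layout
            .with_exec(["pytest", "-q"])  # the gate: agent output must pass tests
            .stdout()
        )

if __name__ == "__main__":
    print(anyio.run(test_change, "."))
```

Because the whole step runs inside a container, the agent's output is checked reproducibly and observably, in CI exactly as on a laptop.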

Strategies for LLM Evals (GuideLLM, lm-eval-harness, OpenAI Evals Workshop) — Taylor Jordan Smith

Traditional benchmarks and leaderboards are insufficient for production AI. This summary details a practical, multi-layered evaluation strategy, moving from foundational system performance to factual accuracy and finally to safety and bias, using open-source tools like GuideLLM, lm-eval-harness, and Promptfoo.
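
For the factual-accuracy layer, lm-eval-harness exposes a Python entry point alongside its CLI. The call below uses the library's `simple_evaluate` function; the model and task names are illustrative placeholders, not recommendations from the talk:

```python
# Sketch: scoring a Hugging Face model on a benchmark task with lm-eval-harness.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                   # Hugging Face backend
    model_args="pretrained=EleutherAI/pythia-160m",
    tasks=["lambada_openai"],                     # swap in domain-relevant tasks
    num_fewshot=0,
)

# Per-task metrics (accuracy, perplexity, ...) live under results["results"].
for task, metrics in results["results"].items():
    print(task, metrics)
```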