SRE

Platform Engineering: From Theory to Practice • Liz Fong-Jones & Lesley Cordero

Liz Fong-Jones and Lesley Cordero explore the evolution of platform engineering from its DevOps and SRE roots, discussing the challenges of building effective developer platforms, the importance of psychological safety, the complexities of open source sustainability, and the delicate balance between centralized platform teams and developer autonomy.

Reliability Engineering Mindset • Alex Ewerlöf & Charity Majors • GOTO 2025

Alex Ewerlöf, author of "Reliability Engineering Mindset," discusses the significant gap between Google's idealized SRE practices and the resource-constrained reality of most companies. The conversation focuses on making Service Level Objectives (SLOs) practical by tying Service Level Indicators (SLIs) directly to business impact, using them as a data-driven communication tool to negotiate reliability costs, and moving from a "best practice" to a "fit practice" mindset.

AI Agents: Transforming Anomaly Detection & Resolution

Martin Keen explores how agentic AI can significantly reduce IT downtime and Mean Time To Repair (MTTR) by moving beyond naive data dumps and embracing context-aware analysis. The key lies in using topology-aware correlation to curate relevant data for an AI agent, which can then systematically identify the root cause, provide explainable insights, and generate actionable remediation steps, ultimately augmenting human SREs rather than replacing them.

From DevOps ‘Heart Attacks’ to AI-Powered Diagnostics With Traversal’s AI Agents

Anish Agarwal and Raj Agrawal, co-founders of Traversal, discuss how their AI agents automate root cause analysis (RCA) for critical system failures. They detail the agents' architecture, which uses causal inference and large-scale computation to systematically find the root cause in minutes, and argue that the rise of AI-generated code makes AI-powered debugging an essential capability for modern software engineering.