LLM

A2A: The Agent-to-Agent Protocol

Heiko Hotz and Sokratis Kartakis of Google Cloud introduce the Agent-to-Agent (A2A) protocol, a new open standard for enabling stateful, secure, and asynchronous collaboration between AI agents built on different frameworks. They contrast it with tool-use protocols like MCP and discuss its microservices-like architectural benefits.
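
A minimal sketch of what a client-side A2A task submission might look like, assuming a JSON-RPC-over-HTTP transport; the endpoint URL, the "tasks/send" method name, and the payload shape below are illustrative assumptions, not a normative example from the talk.

```python
# Hypothetical client sending a task to a remote agent over an A2A-style
# JSON-RPC interface. Method name and payload shape are assumptions.
import uuid

import requests

AGENT_URL = "https://agents.example.com/a2a"  # hypothetical remote agent endpoint


def send_task(user_text: str) -> dict:
    """Submit a task to the remote agent and return its JSON-RPC response."""
    payload = {
        "jsonrpc": "2.0",
        "id": str(uuid.uuid4()),
        "method": "tasks/send",
        "params": {
            "id": str(uuid.uuid4()),  # task id, usable later to poll for status
            "message": {
                "role": "user",
                "parts": [{"type": "text", "text": user_text}],
            },
        },
    }
    resp = requests.post(AGENT_URL, json=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()


if __name__ == "__main__":
    print(send_task("Summarise last quarter's incident reports."))
```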

Orchestrating Complex AI Workflows with AI Agents & LLMs

Eric Pritchett, President and COO of Terzo, explains the transformative impact of AI agents and LLMs on workflow orchestration. He contrasts the goal-oriented, flexible nature of AI agents with the limitations of traditional RPA, and illustrates how a multi-agent system can automate complex processes such as quote generation, a shift he frames as a new paradigm in automation.
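
To make the contrast concrete, here is a toy sketch of a multi-agent quote-generation pipeline in the spirit described above; the agent roles, prompts, and call_llm() helper are hypothetical placeholders, not Terzo's implementation.

```python
# Toy multi-agent pipeline for quote generation. Each step is delegated to a
# goal-oriented agent rather than encoded as a fixed RPA script. The roles,
# prompts, and call_llm() stub are hypothetical.
from dataclasses import dataclass


def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call."""
    return f"[LLM output for: {prompt[:40]}...]"


@dataclass
class Agent:
    role: str
    instructions: str

    def run(self, task: str) -> str:
        return call_llm(f"You are the {self.role}. {self.instructions}\nTask: {task}")


def generate_quote(request: str) -> str:
    pricing = Agent("pricing analyst", "Estimate line-item costs.").run(request)
    terms = Agent("contracts agent", "Draft payment and delivery terms.").run(pricing)
    return Agent("reviewer", "Check consistency and summarise the quote.").run(terms)


if __name__ == "__main__":
    print(generate_quote("500 seats of our analytics product, 12-month term"))
```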

Columbia CS Professor: Why LLMs Can’t Discover New Science

Professor Vishal Misra of Columbia University introduces a formal, information-theoretic model of how Large Language Models (LLMs) work. He describes LLM reasoning as navigation over "Bayesian manifolds", uses token entropy to explain the mechanics of chain-of-thought, and defines true AGI as the ability to create new manifolds rather than merely explore existing ones.
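
As a small illustration of one quantity the talk leans on, the sketch below computes token entropy, the Shannon entropy of a model's next-token distribution; the probability vectors are made up for illustration.

```python
# Token entropy: the Shannon entropy of the next-token distribution.
# Low entropy means the model is confident about the next step of a chain
# of thought; high entropy means many continuations are plausible.
import math


def token_entropy(probs: list[float]) -> float:
    """H = -sum(p * log2(p)) over the next-token distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


confident = [0.90, 0.05, 0.03, 0.02]  # model nearly certain of the next token
uncertain = [0.25, 0.25, 0.25, 0.25]  # four equally likely continuations

print(f"confident step: {token_entropy(confident):.2f} bits")  # ~0.62
print(f"uncertain step: {token_entropy(uncertain):.2f} bits")  # 2.00
```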

MCP vs gRPC: How AI Agents & LLMs Connect to Tools & Data

A deep dive into how AI agents connect to external tools, comparing the AI-native Model Context Protocol (MCP) with the high-performance gRPC framework. The summary explores their respective architectures, discovery mechanisms, and performance trade-offs, concluding with a vision for their complementary roles in future AI systems.
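
To ground the comparison, the sketch below shows the MCP side: an AI-native client discovering tools at runtime and invoking one by name, with the transport abstracted behind a hypothetical send_jsonrpc() helper. A gRPC client would instead call strongly typed stubs generated ahead of time from a .proto service definition.

```python
# Illustrative MCP-style interaction: tools are discovered at runtime via
# "tools/list" and invoked via "tools/call" with JSON arguments. The
# send_jsonrpc() helper and the tool name are placeholders; a real client
# would speak over stdio or HTTP to an actual MCP server.
import json


def send_jsonrpc(method: str, params: dict | None = None) -> dict:
    """Hypothetical transport helper; prints the request instead of sending it."""
    request = {"jsonrpc": "2.0", "id": 1, "method": method, "params": params or {}}
    print("->", json.dumps(request))
    return {}  # a real client would return the server's response here


# 1. Ask the server what tools it exposes (names plus JSON schemas come back).
tools = send_jsonrpc("tools/list")

# 2. Invoke a discovered tool by name.
result = send_jsonrpc("tools/call", {
    "name": "search_documents",  # hypothetical tool name
    "arguments": {"query": "renewal terms"},
})
```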

Evals Aren't Useful? Really?

A deep dive into the critical importance of robust evaluation for building reliable AI agents. The summary covers bootstrapping evaluation sets, advanced testing techniques like multi-turn simulations and red teaming, and the necessity of integrating traditional software engineering and MLOps practices into the agent development lifecycle.
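
As one concrete, heavily simplified example of the multi-turn simulation idea, the sketch below drives an agent with scripted user turns and grades the transcript with a trivial rubric; agent_reply(), the turns, and the rubric are all placeholders.

```python
# Minimal multi-turn simulation eval: a scripted "user" drives the agent for
# several turns, then a rubric checks the transcript. All pieces are stubs.
def agent_reply(history: list[dict]) -> str:
    """Stand-in for the agent under test."""
    return "Sure - the subscription is cancelled and a confirmation email is on its way."


def run_simulation(user_turns: list[str]) -> list[dict]:
    history: list[dict] = []
    for turn in user_turns:
        history.append({"role": "user", "content": turn})
        history.append({"role": "assistant", "content": agent_reply(history)})
    return history


def grade(history: list[dict], must_mention: str) -> bool:
    """Trivial rubric: did the agent ever mention the required action?"""
    return any(must_mention in m["content"].lower()
               for m in history if m["role"] == "assistant")


transcript = run_simulation([
    "I want to cancel my subscription.",
    "Yes, cancel it effective today.",
])
print("PASS" if grade(transcript, "cancel") else "FAIL")
```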

Evals in Action: From Frontier Research to Production Applications

An overview of OpenAI's approach to AI evaluation, covering the GDP-val benchmark for frontier models and the practical tools available for developers to evaluate their own custom agents and applications.
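
For flavour, here is a generic sketch of a dataset-driven eval of a custom application: each case pairs an input with a simple check, standing in for the model-graded rubrics such tooling typically supports. run_app() and the cases are illustrative placeholders, not OpenAI's actual tooling.

```python
# Generic dataset-driven eval loop: run the application on each case and
# score the output against a simple per-case check. run_app() and the
# cases are illustrative placeholders.
cases = [
    {"input": "Summarise this clause: payment is due within 30 days.",
     "must_include": "30 days"},
    {"input": "Extract the renewal date from: renews on 2025-01-01.",
     "must_include": "2025-01-01"},
]


def run_app(prompt: str) -> str:
    """Stand-in for the agent or application being evaluated."""
    return "Answer referencing " + prompt.split(":", 1)[-1].strip()


passed = 0
for case in cases:
    output = run_app(case["input"])
    ok = case["must_include"] in output
    passed += int(ok)
    print(f"{'PASS' if ok else 'FAIL'}: {case['input'][:40]}...")

print(f"score: {passed}/{len(cases)}")
```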