The Protocol Stack Nobody Audited: Four Agent Protocols and the Cross-Protocol Audit Gap

KellerAI

Section 01

Abstract

A production enterprise agent rarely speaks a single protocol. It composes four complementary ones: the Model Context Protocol for tools, Agent-to-Agent for agent coordination, the Agent Communication Protocol for lightweight messaging, and the agentic-commerce layer for transactions. Vendors describe these four as a single layered stack, each protocol a complement to the others. ¹ ²

This paper makes a different argument from the rest of the KellerAI catalog. Where the companion work on the MCP supply chain locates a governance problem inside one protocol, this paper locates it between protocols. Each of the four was designed in isolation, by a different team, with its own authentication model and its own decision about what to record. MCP's 2026 roadmap lists structured audit trails as future work. ³ A2A delegates credential acquisition to an out-of-band process outside its own scope. ⁴ ACP is minimal by design and documents no audit standard. ⁵ The composition of four reasonable designs produces an accountability void no single design document addresses.

The method is explicit and self-limiting. This is a design-document analysis and a first-principles audit-gap inference, not an incident reconstruction. No public breach has been attributed to a cross-protocol audit failure as of 2026-05-30. The paper proceeds in three movements: it lays out the four protocols and their auth and audit models from primary sources; it traces a single request across all four layers to expose the discontinuity; and it maps the regulatory exposure and sketches what a cross-protocol audit surface would require. An Honest Limits section closes the argument by stating plainly what it does not establish.

For the leadership-level version of this argument (the four protocols, the confused-deputy anchor, and the four-line checklist), read the companion brief, The Protocol Stack Nobody Audited. ²⁴

Each protocol documents what it audits. None documents what falls between them. The seam between two complete audit logs is where accountability disappears.

The thesis

Section 02

The Four-Protocol Stack

The Model Context Protocol is the agent-to-tool layer. Introduced by Anthropic, with tens of millions of reported SDK downloads across a large public ecosystem, it is the most widely deployed of the four and the one most enterprises adopt first. ⁶ It defines how an agent host discovers and invokes the tools a server exposes, and it is the layer on which the catalog's companion work on the MCP supply chain concentrates.

Agent-to-Agent is the agent-to-agent coordination layer. Google contributed A2A to the Linux Foundation in mid-2025, and it reached v1.0 in April 2026 with the backing of more than a hundred and fifty organizations, among them AWS, Cisco, Microsoft, Salesforce, SAP, and ServiceNow. ⁷ ²² A2A introduces signed Agent Cards in its v1.0 specification and explicitly positions itself as complementary to MCP rather than as a competitor. ²

The Agent Communication Protocol is the lightweight messaging layer. Originating at IBM Research and likewise donated to the Linux Foundation in early 2025, ACP is deliberately HTTP-native and minimal, built for low-ceremony REST messaging between services rather than for heavyweight orchestration. ⁵ Its minimalism is a design virtue at its own layer and, as we will see, an audit liability at the seam.

The agentic-commerce layer — UCP — is Google's protocol for cart, catalog, and checkout when an agent transacts on a user's behalf, with cart and catalog flows appearing in early 2026. ⁸ ²³ UCP is the narrowest of the four, confined to commerce, and we treat it accordingly: the cross-protocol audit argument is strongest for the MCP, A2A, and ACP triad, and we flag every place it leans on UCP.

A scoping honesty note belongs here. Some vendor guides put concrete adoption figures on this stack; one, for instance, claims that a large share of Fortune 500 organizations already run MCP. We treat such figures as illustrative vendor estimates rather than measured facts and do not build any argument on a specific adoption percentage. ⁹ The argument that follows depends only on the uncontroversial observation that enterprises increasingly run more than one of these protocols at once.

Four protocols, contributed by four organizations, each one reasonable at its own boundary. The stack is the sum; the audit surface is not.

The composition

Section 03

Each Protocol's Auth and Audit Model

The factual core of this paper is a side-by-side reading of how each protocol authenticates a caller and what each protocol records. Read together, the four answers do not compose into a single coherent audit model, and the gaps line up at the boundaries.

MCP. The specification provides for OAuth 2.1, but the deployed population tells a different story. A March 2026 study of more than five thousand public servers found OAuth adoption at roughly 8.5%, with the overwhelming majority relying on static API keys and a meaningful fraction exposing no source at all. ¹⁰ The protocol's own 2026 roadmap names four enterprise-readiness gaps as future work: structured audit trails and SIEM integration, single-sign-on authentication, gateway authorization propagation, and configuration portability. ³ In other words, the protocol's maintainers agree that the audit and authorization primitives are not yet shipped.

A2A. The specification supports OAuth2, API keys, and mutual TLS, and v1.0 adds signed Agent Cards. But its security model contains a sentence that matters enormously for audit: credentials are “obtained through an out-of-band process outside the scope of A2A,” and authorization is delegated to the remote agent. ⁴ A protocol that records who was called but not how the caller obtained the right to call has a hole exactly where an auditor needs a record.

ACP. ACP is minimal HTTP by design and, in its published material, documents no audit standard at all. ⁵ This is a strong-absence claim, and we mark it as such: we are citing the lack of a documented standard, not asserting that no implementation ever logs. The point is that the protocol provides no contract for what a conformant ACP message should record.

UCP. The commerce layer uses request signatures and idempotency keys, but these are transaction-scoped by design. ⁸ A transaction-scoped signature proves that a single checkout was authentic; it says nothing about the agentic chain that decided to make the purchase, which is precisely the chain an auditor of an autonomous agent must reconstruct.
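
To make the scope limit concrete, the sketch below signs a single checkout with HMAC-SHA256 over the request body plus an idempotency key. This is an assumption for illustration only: UCP's actual signing scheme, header names, and key handling are not described here, and the shared secret and field names (`cart_id`, `total_cents`) are hypothetical. The point is structural: verification covers exactly the bytes that were signed, and nothing in those bytes names the agent chain that decided to buy.

```python
import hashlib
import hmac
import json

# Hypothetical stand-in for UCP signing: HMAC-SHA256 with a shared merchant key.
# UCP's real signature scheme, headers, and key management may differ.
SECRET = b"demo-merchant-key"


def canonical(body: dict) -> bytes:
    # Sort keys so the same logical body always serializes to the same bytes.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def sign_checkout(body: dict, idempotency_key: str) -> str:
    # The signature covers this checkout body and its idempotency key, nothing more.
    message = idempotency_key.encode() + b"." + canonical(body)
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def verify_checkout(body: dict, idempotency_key: str, signature: str) -> bool:
    expected = sign_checkout(body, idempotency_key)
    return hmac.compare_digest(expected, signature)


checkout = {"cart_id": "cart-42", "total_cents": 12999, "currency": "USD"}
sig = sign_checkout(checkout, "idem-7f3a")

print(verify_checkout(checkout, "idem-7f3a", sig))  # True: this checkout is authentic
print("delegation_chain" in checkout)               # False: no record of who decided to buy
```

A tampered total, or the same signature presented under a different idempotency key, fails verification, which is what transaction scoping is for. The same check is silent about every agent upstream of the checkout.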

Laid out as a comparison — protocol, authentication model, documented audit standard, and spec-acknowledged gaps — the four rows do not share a column. MCP has a roadmap of named gaps; A2A has an out-of-band credential hole; ACP has no documented audit standard; UCP has a transaction-bounded one. There is no row for the composition, because no protocol owns the composition.

MCP: 8.5% OAuth, audit trails on the roadmap. A2A: credentials obtained out-of-band, outside scope. ACP: no documented audit standard. UCP: transaction-scoped only. No shared column.

The factual core

Section 04

The Confused-Deputy Problem

The closest empirical anchor to the cross-protocol gap is a problem documented within a single protocol: the confused deputy in MCP. At RSAC 2026, the Coalition for Secure AI's “Securing MCP” work posed the question every agentic audit must answer — who authorized this action, through what chain, and with what scope? ¹¹ The confused deputy arises when a component acts on a principal's authority without preserving the record of how that authority was delegated. A token passed downstream and reused lets a service act as though it were the original principal, and the audit log shows the service, not the principal who first authorized the chain.
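
One common form of the pattern can be sketched in a few lines, assuming a plain bearer-token check and a tool that logs whoever presents the token. All names here (`Token`, `billing_tool`, `report_service`, and the scope strings) are hypothetical. The user lacks the scope to issue a refund, yet the refund happens through the deputy, and the log names only the deputy.

```python
from dataclasses import dataclass

audit_log: list[dict] = []


@dataclass(frozen=True)
class Token:
    holder: str             # identity of whoever presents the token
    scopes: frozenset[str]  # permissions the token grants


def billing_tool(token: Token, action: str) -> None:
    # The tool checks only the presented token and logs only its holder.
    if "billing:write" not in token.scopes:
        raise PermissionError(f"{token.holder} may not {action}")
    audit_log.append({"actor": token.holder, "action": action})


# Hypothetical scopes: the user herself holds read access only.
USER_SCOPES = {"alice": frozenset({"billing:read"})}


def report_service(user: str, action: str) -> None:
    # The deputy acts for `user` but presents its own broader token, so neither
    # her identity nor her narrower scope travels downstream.
    service_token = Token(holder="report-service", scopes=frozenset({"billing:write"}))
    billing_tool(service_token, action)


report_service(user="alice", action="refund")
print(audit_log)  # [{'actor': 'report-service', 'action': 'refund'}]

# Presented with her own token, the same request is refused.
try:
    billing_tool(Token("alice", USER_SCOPES["alice"]), "refund")
except PermissionError as err:
    print(err)  # alice may not refund
```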

The Coalition's response, an Agentic IAM Framework published in April 2026, prescribes signed manifests, continuous authorization, and on-behalf-of tokens that carry the delegation chain explicitly. ¹² These are the right primitives, and they directly address the confused deputy — within the MCP layer. The framework is scoped to intra-MCP identity and access management; it does not define how a delegation that originates in A2A and terminates in an MCP tool call is recorded as one chain across the boundary. ¹²
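
The on-behalf-of idea can be sketched with the nested `act` (actor) claim defined in OAuth 2.0 Token Exchange (RFC 8693), in which the outermost `act` names the current actor and each nested `act` names an earlier one. Whether the Coalition's framework uses exactly this encoding is an assumption, and the claim values below are hypothetical. Reading the chain back out recovers who acted for whom, the record the confused deputy otherwise loses.

```python
def delegation_chain(claims: dict) -> list[str]:
    """Return [subject, earliest actor, ..., current actor] from nested `act` claims."""
    actors = []
    act = claims.get("act")
    while act is not None:
        actors.append(act["sub"])
        act = act.get("act")
    # RFC 8693: the outermost `act` is the current actor; nested ones acted earlier.
    return [claims["sub"], *reversed(actors)]


# Hypothetical decoded token claims for an MCP tool call made on Alice's behalf.
claims = {
    "sub": "alice@example.com",               # principal the token is about
    "scope": "crm:read",                      # what the chain is allowed to do
    "act": {
        "sub": "specialist-agent",            # current actor presenting the token
        "act": {"sub": "coordinator-agent"},  # earlier actor in the chain
    },
}

print(" -> ".join(delegation_chain(claims)))
# alice@example.com -> coordinator-agent -> specialist-agent
```

A chain like this is only as complete as the hops that add to it. If the coordinator's right to act came from a credential obtained out-of-band, as A2A permits, nothing in these claims records that grant.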

That the intra-layer problem is real and not hypothetical is established by the CVE record. The mcp-remote flaw, CVE-2026-30624, carried a CVSS score of 9.6 in a package downloaded on the order of hundreds of thousands of times, and it was one of seven MCP-related CVEs reported across a twelve-month window; researchers also reported tens of thousands of agent instances exposed directly to the public internet. ¹³ These are MCP-layer findings, and we present them as exactly that: a demonstration that the single-protocol audit problem already has teeth.

The honest framing follows directly. The confused deputy is a documented intra-MCP problem with a documented intra-MCP remedy. Carrying it across protocol boundaries — an A2A delegation whose downstream MCP tool call inherits an authority no single log records in full — is an analytical extension of a known pattern, not a reconstructed cross-protocol incident. No such incident is public as of 2026-05-30, and the rest of this paper is built on inference clearly labeled as inference.

The confused deputy is documented inside MCP and addressed inside MCP. The cross-protocol version is the same pattern carried across a seam no framework yet spans — an inference, stated as one.

The anchor and its limit

Section 05

The Cross-Protocol Audit Gap

This section is the core of the paper, and it is explicitly analytical rather than incident-based. No public breach has been attributed to a cross-protocol audit failure as of 2026-05-30. The argument that follows is a first-principles inference from the four protocols' documented designs, and we mark every inferential step.

Consider a single piece of agentic work as it traverses the stack. A coordinating agent receives a task and delegates a sub-task over A2A to a specialist agent. The specialist, to complete the sub-task, makes several MCP tool calls against connected servers. One of those tools emits an ACP message to a downstream service. And the workflow concludes with a UCP checkout that commits a transaction. ¹ This is not an exotic path; it is the “complete enterprise stack” the vendors describe, exercised once. ²

Now ask what each layer recorded. A2A logged a delegation, but the credential that authorized it was obtained out-of-band, outside A2A's scope, so the granting step is invisible in A2A's own record. ⁴ MCP logged its tool calls, to the extent the deployment configured logging at all, but MCP's structured audit trail is on the roadmap, not in the box. ³ ACP carried the message, but documents no audit standard that would tie the message back to the tool call that emitted it. ⁵ UCP signed the transaction, but its signature is scoped to that checkout, not to the delegation chain that decided on it. ⁸
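
The walk-through can be made concrete with four toy log records, one per layer. Every field name and identifier below is hypothetical; the only property taken from the text is that each layer keys its record to its own local identifier. A reconstruction that looks for a value present in all four records finds none, which is the missing through-line.

```python
# One hypothetical record per layer, each keyed by its own layer-local identifier.
a2a_log = {"task_id": "t-81", "from": "coordinator", "to": "specialist"}
mcp_log = {"session_id": "s-19", "tool": "crm.lookup", "caller": "specialist"}
acp_log = {"message_id": "m-502", "sender": "crm-server", "route": "/orders"}
ucp_log = {"checkout_id": "c-7", "idempotency_key": "idem-7f3a", "signed": True}

layers = {"A2A": a2a_log, "MCP": mcp_log, "ACP": acp_log, "UCP": ucp_log}


def shared_values(records: dict[str, dict]) -> set:
    # Values that appear in every record passed in: candidate join keys.
    value_sets = [set(r.values()) for r in records.values()]
    return set.intersection(*value_sets)


print(shared_values(layers))  # set()

# Adjacent layers may overlap on a role name ("specialist"), but a role recurs
# across unrelated requests, so it cannot bind these records to one piece of work.
print(shared_values({"A2A": a2a_log, "MCP": mcp_log}))  # {'specialist'}
```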

Each layer's log is complete within its own boundary, and that completeness is the trap. The auditor who wants to answer the Coalition's question — who authorized this action, through what chain, and with what scope — finds four complete logs and no through-line connecting them. ¹¹ The A2A out-of-band credential is the first discontinuity; the absence of a shared correlation identifier across the four layers is the second; and the confused deputy, when it appears, compounds across the boundary rather than within it, so the impersonation that the MCP IAM framework would catch inside MCP slips through at the A2A-to-MCP seam. ¹²

The regulatory consequence is the sharpest way to see the gap. The Govern function of NIST's AI Risk Management Framework and the EU AI Act's quality-management (including record-keeping) and post-market monitoring obligations in Articles 17 and 72 ask for a system-level reconstruction of what an automated system did and on whose authority. ¹⁴ ¹⁵ The per-layer logs cannot produce that reconstruction, because the system spans layers and no layer is responsible for the span. The obligation is system-level; the evidence is layer-level; the difference is the gap.

Each protocol's audit log is complete within its own boundary. The boundary is not where the attacker stops — and it is not where the regulator's question ends, either.

The core claim

Section 06

What the Designers Say

A fair reading of the protocols' designers strengthens the argument rather than weakening it, because the designers are right about their own layers. A2A describes itself as complementary to MCP, not a replacement, and the ecosystem literature treats the four protocols as a coherent layered stack precisely because each was built to do one job well. ² ¹ The complementarity is real. The gap is not a failure of any one design; it is the predictable consequence of four good designs that were never required to share an audit contract.

The MCP maintainers, to their credit, name the missing primitives themselves. The 2026 roadmap's inclusion of structured audit trails and SIEM integration is an admission of intent, and intent is not shipment. ³ ²¹ We phrase the status carefully throughout: these items are on the roadmap as of 2026-05-30, which means an operator who needs them today cannot assume they exist.

The market has produced partial answers at the MCP layer. Third-party gateways advertise MCP-layer SIEM export and policy enforcement, turning the single protocol's tool calls into auditable events. ¹⁶ These are genuinely useful, and an operator should run one. But they audit the MCP layer alone: in our survey of the available tooling we found no gateway that correlates an A2A delegation, the MCP tool calls it spawns, the ACP message that follows, and the concluding UCP transaction into a single cross-protocol audit record. ¹⁶ That absence is a strong-absence claim, offered as the state of the tooling we could find rather than as proof that no such system could exist.

The steel-man, then, is that the designers did exactly what they set out to do, the market is filling the most acute single-layer gap, and the stewardship of two of the four protocols under one foundation creates at least the institutional possibility of a shared standard. The argument of this paper survives all of that, because none of it produces, today, the cross-protocol correlation a system-level audit requires.

Section 07

Regulatory Exposure Map

A regulated organization does not need any of these protocols named in a statute to be obligated to audit the system it runs on them. The framing throughout this section is deliberate and inferential: these obligations apply to systems that compose these protocols, not to the protocols as regulated objects, and we flag each mapping as one to verify against current text rather than as a settled determination.

The NIST AI Risk Management Framework's Map and Govern functions ask an organization to understand and govern the risks of the system it deploys, including risks introduced by third-party components, and every protocol in a composed stack is such a component. ¹⁴ For financial institutions, the Federal Reserve and OCC's SR 11-7 guidance on model risk management expects an institution to understand and document a model's end-to-end behavior; to the extent an agent that acts across four protocols falls within SR 11-7's notion of a model, its behavior must be reconstructable. ¹⁷

For systems in scope of the EU AI Act, the quality-management obligation of Article 17, which includes record-keeping procedures, and the post-market monitoring obligation of Article 72 bear directly on a system that cannot reconstruct its own cross-layer actions. ¹⁵ ISO/IEC 42001, the AI management-system standard, similarly expects documented operational control over the AI system as a whole, not over its protocols one at a time. ¹⁸

Sector rules sharpen the same point. The HIPAA Security Rule's audit-controls standard at 45 CFR 164.312(b) requires mechanisms to record and examine activity in systems that handle protected health information, a requirement that an agent touching PHI across four protocols can satisfy only at the system level. ¹⁹ Under SOC 2, the change-management criterion CC8.1 expects controls over changes to the system, and an organization cannot demonstrate those controls if it cannot reconstruct what the composed system did. ²⁰ In every case the obligation is system-level reconstruction, and in every case the per-protocol logs fall short of it for the same structural reason.

Every framework that governs an AI system asks for a system-level record. A stack of four protocols, each auditing only itself, cannot produce one — not because any protocol failed, but because no protocol owns the system.

The exposure

Section 08

What a Cross-Protocol Audit Surface Would Require

This section is proposed rather than reported. No standard for a cross-protocol audit surface has shipped as of 2026-05-30, and the four requirements below describe what such a surface would need rather than what any product delivers. They follow directly from the discontinuities named in Section 5.

01. A correlation identifier across boundaries. A single identifier minted at the first action (the A2A delegation) and propagated through every MCP tool call, ACP message, and UCP transaction it spawns. The correlation ID is the through-line that lets four complete logs be reassembled into one request, and it is the one primitive whose absence makes every other audit control local.
02. A unified event schema. A shared record shape (actor, action, resource, and the authorization that permitted it) emitted at every layer so that a cross-layer query can join events. Without a common schema, even logs that happen to share a correlation ID cannot be compared, because they describe the same event in four incompatible vocabularies.
03. SIEM-exportable, system-level events. Events that leave the protocol layer and land in a system-level store an auditor can query, rather than logs trapped inside each protocol's own tooling. The MCP roadmap names SIEM integration as future work for one layer; a cross-protocol surface needs it for all four, normalized into one stream.
04. An authorization-chain trace. A record that carries the delegation chain explicitly (who authorized whom, on whose behalf, with what scope) so that the A2A out-of-band credential and the confused-deputy hand-off both become visible rather than implicit. This is the cross-boundary generalization of the on-behalf-of token the MCP IAM work already prescribes within MCP.
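
The four requirements can be illustrated together as one record type that every layer would emit. This is a minimal sketch under stated assumptions, not a proposed standard: the `AuditEvent` class, its fields, and the choice of JSON lines as the export format are all hypothetical. Comments mark which requirement each piece stands for.

```python
import json
import uuid
from dataclasses import asdict, dataclass


# Hypothetical schema for illustration; no such cross-protocol standard exists.
@dataclass(frozen=True)
class AuditEvent:
    correlation_id: str          # 01: minted once, carried by every layer
    layer: str                   # "A2A", "MCP", "ACP", or "UCP"
    actor: str                   # 02: the shared schema fields
    action: str
    resource: str
    auth_chain: tuple[str, ...]  # 04: principal first, current actor last


def to_siem_line(event: AuditEvent) -> str:
    # 03: one JSON object per line, shippable to a system-level store.
    return json.dumps(asdict(event), sort_keys=True)


cid = str(uuid.uuid4())  # minted at the A2A delegation
base = ("alice", "coordinator")
events = [
    AuditEvent(cid, "A2A", "coordinator", "delegate", "specialist", base),
    AuditEvent(cid, "MCP", "specialist", "call", "crm.lookup", base + ("specialist",)),
    AuditEvent(cid, "ACP", "crm-server", "send", "/orders", base + ("specialist", "crm-server")),
    AuditEvent(cid, "UCP", "specialist", "checkout", "cart-42", base + ("specialist",)),
]
stream = [to_siem_line(e) for e in events]

# One filter on the shared ID reassembles the request across all four layers.
parsed = [json.loads(line) for line in stream]
request = [r for r in parsed if r["correlation_id"] == cid]
for r in request:
    print(r["layer"], " -> ".join(r["auth_chain"]))
```

The design choice worth noting is that the authorization chain is stored on every event rather than only at the first hop, so an auditor inspecting any single layer's event can still see on whose authority it ran.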

The nearest existing building block is the Coalition for Secure AI's work on signed manifests and on-behalf-of tokens. Those primitives were designed to carry an authorization chain explicitly, and a cross-protocol audit surface is, in part, the generalization of that idea from inside MCP to across the stack. ¹² But generalizing it requires the protocols to agree on a shared identifier and a shared schema, and no such agreement exists today. We name the requirement and decline to name a vendor, because there is no standard to endorse and no product that spans the four layers.
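
Agreeing on a shared identifier is largely a propagation discipline: each hop copies the identifier it received into the request it sends onward. The sketch below borrows the format of the W3C Trace Context `traceparent` header (`version-traceid-parentid-flags`) purely as an example of such a convention; the paper does not propose it, and nothing here implies that A2A, MCP, ACP, or UCP carries it today. The hop names are hypothetical.

```python
import secrets


def mint_traceparent() -> str:
    # version 00, 16-byte trace-id, 8-byte parent-id, flags 01 (sampled)
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def child_of(traceparent: str) -> str:
    # Keep the trace-id (the correlation ID); replace only the parent-id for this hop.
    version, trace_id, _parent_id, flags = traceparent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


def trace_id(traceparent: str) -> str:
    return traceparent.split("-")[1]


# Hypothetical hops: the A2A delegation mints the ID; every later hop derives from it.
hops = {"a2a_delegate": mint_traceparent()}
hops["mcp_tool_call"] = child_of(hops["a2a_delegate"])
hops["acp_message"] = child_of(hops["mcp_tool_call"])
hops["ucp_checkout"] = child_of(hops["acp_message"])

for name, header in hops.items():
    print(f"{name:14} {header}")
```

Every header shares one trace-id, so layer logs that record it can be joined on it, while the parent-id changes at each hop. The code is the easy part; the requirement this section names is that every layer agree to carry and record the value.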

One temptation to resist is a precise figure for how much of this work is already done in the field. We have seen claims — for example, that a large majority of agent deployments keep a human in the loop — offered as evidence that the gap is mitigated in practice. We could not locate a primary source for such figures and therefore do not rely on them; a human in the loop is, in any case, not a substitute for a cross-protocol audit record, since the human, too, needs the reconstruction this section describes.

The posture this paper recommends is therefore the same in spirit as the governance posture its companion recommends for the MCP supply chain: route the composition through a control point that can see across it, rather than trusting that four well-audited layers add up to one audited system. The MCP supply chain needed governance rather than a patch; the agent protocol stack needs an audit of its composition rather than four audits of its parts.

A correlation ID across boundaries, a unified event schema, SIEM-exportable system-level events, and an explicit authorization-chain trace. None has shipped. All four follow from the gap.

The proposal

Section 09

Honest Limits

A paper that argued for cross-protocol audit while overstating its own evidence would fail its own thesis. This section states plainly what the paper does not establish.

First and most important, the central argument is analytical, not empirical. No public breach has been attributed to a cross-protocol audit failure as of 2026-05-30, and we have not reconstructed one. The confused-deputy problem is a documented intra-MCP phenomenon; carrying it across protocol boundaries is a first-principles inference from the protocols' published designs, and we have labeled it as inference at every step rather than dressing it as incident. ¹¹ ¹³

Second, the evidence base is thinner than for a documented-incident paper, and the sources vary in strength. The MCP and A2A design facts rest on primary specifications and well-sourced reporting; the ACP audit claim is a strong-absence claim built on sparse primary material and secondary sources, and we have marked it as the absence of a documented standard rather than proof that no implementation logs. ⁵ We sought primary sources for the Coalition framework and the protocol-adoption figures and have flagged where only secondary or vendor material was available. ⁹ ¹²

Third, the MCP roadmap items are intent, not shipment. Structured audit trails, SIEM integration, and single-sign-on appear on the roadmap as of 2026-05-30, and an operator who needs them today cannot assume they exist; if and when they ship, the MCP-layer portion of this paper's argument weakens accordingly, and that is as it should be. ³

Fourth, UCP is commerce-scoped, and the cross-protocol argument is strongest for the MCP, A2A, and ACP triad. Where the argument reaches into UCP, it leans on the transaction-scoped nature of UCP's signatures rather than on a broad commerce audit gap, and a reader who excludes UCP entirely loses none of the core claim. ⁸

Fifth, the regulatory mappings are inferential. We have framed each as an obligation that applies to systems composing these protocols, not as a determination that any framework names any protocol or that any specific deployment is in scope. Whether SR 11-7, the EU AI Act's Articles 17 and 72, ISO/IEC 42001, HIPAA's audit-controls standard, or SOC 2's CC8.1 applies to a particular system is a fact-specific question requiring verification against current text and the system's own classification. ¹⁴ ¹⁵ ¹⁹

Finally, the proposed audit surface in Section 8 is a requirements sketch, not a specification or a product. No standard for cross-protocol correlation has shipped, and we name no vendor as the remedy. The contribution of this paper is the diagnosis — that four well-designed protocols compose into a system whose audit surface is less than the sum of its parts — and the claim that the seam, not the part, is where the accountability work now belongs.

For the short, leadership-level version of this argument (the four protocols, the confused-deputy anchor, and the Correlate / Unify / Locate / Map checklist), read the companion brief, The Protocol Stack Nobody Audited.

Not that a breach has happened — none has been attributed — but that four complete audit logs do not compose into one, and the system the regulator audits is the composition, not the part.

The honest promise
