kellerai.blog

Assure the action, not the model

Assurance for an AI agent must attach to what it does, not to the model it runs.

KellerAI White Paper · Regulation & Compliance · Jun 2026

Context

Autonomous LLM agents now take irreversible actions in production: moving money, deleting records, deploying code, sending communications on a firm's behalf. The assurance regimes built for the last generation of software target the wrong unit: you cannot certify a trillion-parameter model that is non-deterministic, is revised every few weeks, and has no enumerable specification. Banking's SR 26-2 (the 2026 successor to SR 11-7), aviation's DO-178C, and the autonomous-systems standard UL 4600 each solved a piece of the assurance problem for its own domain, but none gates the individual runtime action an agent is about to commit.

The Finding

The LLM-Agent Assurance Standard (LAAS) attaches assurance to the action and makes consequence gate-derived: the gate, never the agent, computes an action's consequence tier from its observed effect surface (the tool, scope, and amount it actually invokes), so an agent cannot self-classify its way to a lower bar. A transfer that the agent labels routine but that lands on a high-consequence surface is forced to the highest tier and blocked. LAAS ships as two coupled layers, normative prose plus a machine-evaluable policy bundle sharing one source of truth, so the standard is executable by the very agents it governs, and it comes with a runnable reference implementation.

Tags: AI Assurance, Agent Governance, Model Risk Management, Regulatory Synthesis

Paper Details
Category: Regulation & Compliance
Audience: AI risk, model-risk, compliance, and assurance leaders, and AI-platform engineers, in banking, healthcare, and aviation.
Method: Synthesis of three mature assurance regimes (DO-178C, SR 26-2, and UL 4600) applied to LLM-agent runtime actions.
Length: ~865 words · 4 min
Sections: 5
Date: Jun 2026
Authors: KellerAI

Section 01

The Wrong Unit of Assurance

Almost every attempt to govern agentic AI begins by trying to certify the model. This is a category error. A frontier model is non-deterministic, so the same prompt need not yield the same output twice; it is revised on a cadence measured in weeks, so any certificate is stale on arrival; and it has no enumerable specification, so there is nothing finite to test it against. You cannot issue a meaningful assurance claim over an object with those three properties.

What can be assured is far smaller and far more concrete: the single action the agent is about to take in the world. An agent that drafts a thousand sentences and then calls one external transfer has done exactly one thing that matters for assurance — the transfer. Payments, deletions, deployments, and outbound communications are discrete, observable, and consequential in a way the model's internal reasoning is not. Assurance must attach there, to the act of committing an effect, because that is the only place where being wrong actually costs anything.

Section 02

Consequence Is Gate-Derived

The sharp move in LAAS is who computes the stakes. An agent cannot self-classify its way to a lower bar. The gate — a mechanism outside the agent's control — derives the consequence tier from the observed effect surface of the action: the actual tool it invokes, the scope it touches, and the amount or blast radius involved. The agent may propose; only the gate's tier stands. A self-reported tier lower than the gate's is not merely ignored — it is flagged as a signal that something is trying to grade its own homework.

Make it concrete. An agent issues a $250,000 transfer to an external counterparty. The effect surface reads irreversible (no programmatic undo), public in scope (money leaving for an outside party), and high in value. The gate computes the highest-consequence tier and requires the full battery of checks — independent verification, human approval, complete evidence — regardless of whether the agent labeled the action “routine.” The default direction is unforgiving by design: any attribute the gate cannot positively determine is treated as the worst case. An unknown reversibility is irreversible; an unknown scope is public; an unknown consequence is high.
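The derivation above can be sketched in a few lines. This is a minimal illustration, not the reference implementation: the names `EffectSurface`, `GateDecision`, and `derive_tier`, the attribute encodings, and the mapping from adverse attributes to CT0–CT4 are all assumptions made for the example. Only the behaviour it demonstrates comes from the text: the gate computes the tier, unknown attributes resolve to the worst case, and a lower self-reported tier is flagged rather than honoured.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    """Consequence tiers; CT4 is the highest."""
    CT0 = 0
    CT1 = 1
    CT2 = 2
    CT3 = 3
    CT4 = 4


@dataclass(frozen=True)
class EffectSurface:
    """What the gate observed about an action. None means 'could not determine'."""
    reversible: Optional[bool]
    scope: Optional[str]   # hypothetical encoding: "internal" or "public"
    value: Optional[str]   # hypothetical encoding: "low" or "high"


@dataclass(frozen=True)
class GateDecision:
    tier: Tier
    self_report_flagged: bool


# Hypothetical mapping from the number of adverse attributes to a tier.
_TIER_BY_ADVERSE_COUNT = {0: Tier.CT0, 1: Tier.CT2, 2: Tier.CT3, 3: Tier.CT4}


def derive_tier(surface: EffectSurface,
                self_reported: Optional[Tier] = None) -> GateDecision:
    # Worst-case defaults: anything not positively known to be safe is adverse.
    irreversible = surface.reversible is not True
    public = surface.scope != "internal"
    high_value = surface.value != "low"

    tier = _TIER_BY_ADVERSE_COUNT[sum((irreversible, public, high_value))]

    # The agent may propose a tier, but a proposal below the gate's tier
    # is never applied; it is recorded as a signal.
    flagged = self_reported is not None and self_reported < tier
    return GateDecision(tier=tier, self_report_flagged=flagged)


# The $250,000 external transfer, self-labelled as the lowest tier.
transfer = EffectSurface(reversible=False, scope="public", value="high")
decision = derive_tier(transfer, self_reported=Tier.CT0)
# decision.tier is Tier.CT4 and decision.self_report_flagged is True
```

Note that the self-reported tier never enters the tier computation at all. It is compared against the result only after the gate has decided, which is what keeps the agent from influencing its own bar.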

The governing invariant: zero-trust on both the model's output and the apparatus that gates it.

Section 03

Three Mature Regimes Already Solved the Hard Part

None of this is invented from nothing. Three regulated disciplines have spent decades assuring systems no one can fully specify, and each got most of the way there — while leaving the same gap open.

SR 26-2, the US interagency model-risk standard (which superseded SR 11-7 in 2026), subjects every model to credible independent challenge and ongoing back-testing — nothing grades its own work, and outcomes are measured against reality over time. But it governs the model and its lifecycle; it never reaches down to gate the single runtime action an agent is about to commit.

DO-178C, aviation software assurance, scales rigor to consequence with formal assurance levels and demands verification independent of the author — the exact shape LAAS needs. But it assumes a frozen, enumerable specification to verify against, which an open-ended agent does not have.

UL 4600, the autonomous-systems standard, runs a standard-of-care safety case bounded by an operating domain, tracking residual risk through performance indicators. But it monitors reactively: it watches for trouble rather than blocking the irreversible action before it commits.

Each regime solved a piece; each left the autonomous agent's individual action ungoverned.

Section 04

A Standard Agents Can Execute

LAAS is zero-trust pushed inward. It distrusts the model's output, the agent's self-classification of its own action, a verifier's claimed independence absent evidence, and the integrity of the gate itself. Every obligation in the standard is an instance of that one principle: if a design would let the constrained party tier, grade, or gate itself, it is malformed.

The standard ships as two coupled layers that share a single source of truth: human-readable normative prose and a machine-evaluable policy bundle. The two cannot drift apart, because they are generated from the same definitions. Every action that passes through the gate emits a tamper-evident decision trace recording the derived tier, the verifier, the verdict, and the evidence behind it. The test of the design is literal: a fresh agent handed only the bundle and a task must be able to derive the tier, run the required checks, and emit a verdict of pass, fail, or abstain with no human standing by to explain the rules.
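One common way to make a decision trace tamper-evident is a hash chain, in which each record commits to the hash of the one before it. The sketch below assumes that technique; the brief does not say how LAAS implements its trace, and the function names and record layout here are hypothetical. The four recorded fields (tier, verifier, verdict, evidence) and the three verdicts are taken from the text.

```python
import hashlib
import json
from typing import List

GENESIS = "0" * 64  # placeholder "previous hash" for the first record
VERDICTS = ("pass", "fail", "abstain")


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same body always hashes the same way.
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_record(trace: List[dict], tier: str, verifier: str,
                  verdict: str, evidence: List[str]) -> dict:
    """Append one gate decision, chained to the previous record's hash."""
    if verdict not in VERDICTS:
        raise ValueError(f"verdict must be one of {VERDICTS}")
    prev = trace[-1]["hash"] if trace else GENESIS
    body = {"tier": tier, "verifier": verifier, "verdict": verdict,
            "evidence": evidence, "prev": prev}
    record = dict(body, hash=_digest(body))
    trace.append(record)
    return record


def verify_trace(trace: List[dict]) -> bool:
    """Return True only if every record is intact and correctly chained."""
    prev = GENESIS
    for record in trace:
        body = {k: v for k, v in record.items() if k != "hash"}
        if body.get("prev") != prev or _digest(body) != record.get("hash"):
            return False
        prev = record["hash"]
    return True


trace: List[dict] = []
append_record(trace, "CT4", "verifier-a", "fail", ["approval-missing"])
append_record(trace, "CT1", "verifier-b", "pass", ["policy-check-ok"])
intact = verify_trace(trace)      # chain verifies as written

trace[0]["verdict"] = "pass"      # an after-the-fact edit to the first record
tampered = verify_trace(trace)    # the edit breaks the chain
```

A chain like this only detects edits if the most recent hash is also held somewhere the writer cannot reach; otherwise the whole chain could be recomputed. That fits the brief's point that the gate's own integrity is not taken on trust.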

Financial services is the launch vertical. SR 26-2 already pulls institutions toward independent verification and traceable evidence, and the irreversibility of a payment maps cleanly onto the tier lattice — an external transfer is exactly the kind of high-consequence, hard-to-reverse action the standard is built to gate before it commits.

Section 05

A Standard of Care, Not a Guarantee

LAAS does not promise correctness, and it should not pretend to. A shallow check, a missed action class, or a poorly assembled evaluation set can still let an error through. What the standard does is bound and evidence the residual risk: it caps how often an undetected error can reach a consequential action, scales that bound to blast radius, and leaves a trace an examiner can reconstruct. It is a standard of care in the sense the mature regimes mean it — defensible, auditable, and honest about what remains — not a certificate of perfection.
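The idea of a bound that tightens with blast radius can be illustrated with a simple per-tier check. Everything specific in this sketch is an assumption: the cap values are invented placeholders, `conforms` is a hypothetical name, and a plain observed-rate comparison stands in for whatever statistical test the full paper's escape-rate conformance check actually uses.

```python
from typing import Dict

# Hypothetical caps on the undetected-error ("escape") rate per tier.
# The values are placeholders; the only property taken from the text is
# that the cap tightens as consequence rises.
ESCAPE_RATE_CAP: Dict[str, float] = {
    "CT0": 0.05,
    "CT1": 0.02,
    "CT2": 0.01,
    "CT3": 0.001,
    "CT4": 0.0001,
}


def conforms(tier: str, actions_gated: int, escapes_found: int) -> bool:
    """Return True if the observed escape rate is within the tier's cap."""
    # An unrecognised tier gets the strictest cap, mirroring the
    # worst-case default used elsewhere in the standard.
    cap = ESCAPE_RATE_CAP.get(tier, min(ESCAPE_RATE_CAP.values()))

    # No evidence, or inconsistent counts, is treated as non-conforming.
    if actions_gated <= 0 or not 0 <= escapes_found <= actions_gated:
        return False
    return escapes_found / actions_gated <= cap


low_tier_ok = conforms("CT0", actions_gated=1000, escapes_found=1)
high_tier_ok = conforms("CT4", actions_gated=1000, escapes_found=1)
```

With these placeholder caps, the same observed rate of one escape per thousand actions conforms at CT0 and fails at CT4. A production check would more plausibly compare an upper confidence bound on the rate, not the raw ratio, so that small samples cannot pass by luck.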

For the full synthesis (the CT0–CT4 lattice and its ungameable proof, the escape-rate conformance check, verifier independence and enforcement-plane integrity, and the adoption path), with every claim cited and a runnable reference implementation, read the companion technical white paper, The LLM-Agent Assurance Standard: In-Depth.
