The LLM-Agent Assurance Standard: Gate-Derived Tiering and Independent Pre-Commit Verification

KellerAI

Section 01

Abstract

Three mature assurance regimes already govern systems no one can fully specify. DO-178C certifies airborne software whose failure can kill; SR 26-2 governs the quantitative models a bank cannot prove correct; UL 4600 evaluates autonomous products operating in an open world. 9 11 12 Each was built to bound risk in a system whose behavior outruns its specification — which is precisely the situation an LLM agent presents. The LLM-Agent Assurance Standard (LAAS) is the synthesis of these three regimes, applied for the first time not to a model but to the runtime actions an agent commits. 1

LAAS rests on a single governing invariant: it applies Zero-Trust to the model's outputs and to the apparatus around them, never trusting the actor's self-classification, a verifier's soundness absent evidence, or the integrity of the enforcement plane. 1 2 Every action is assigned a Consequence Tier from CT0 to CT4 by an out-of-process gate that reads the action's observed effect surface — its actual reversibility, scope, and consequence — not the agent's say-so. 3 Rigor scales with the tier: a read-only action is merely traced; a hard-to-reverse external effect demands independent, qualified pre-commit verification with a backtested escape rate below tolerance; an irreversible high-consequence action additionally demands human approval and defaults to abstention. 2

The standard ships as two coupled layers that share one source of truth so prose and policy cannot diverge: a human-readable normative document and a machine-actionable bundle whose conformance predicate a fresh agent can evaluate with no human explanation. 1 4 A working reference implementation — the conformance bundle, the Rego policy, an ActionDescriptor emitter, and a Bucket-B backtest harness — proves the standard is machine-actionable rather than aspirational. 5 6 LAAS is a standard of care, not a correctness guarantee: conformance asserts that the right checks ran, by the right party, with evidence — never that no error can occur. 2

Section 02

The Model Is the Wrong Unit of Assurance

The instinct of every assurance tradition is to certify the artifact. Aviation certifies the binary, model-risk management validates the model, safety engineering qualifies the component. Applied to a large language model the instinct fails on three independent counts, each of which the design record names explicitly. 1

First, there is no enumerable specification to trace against. DO-178C's spine is requirements-based verification with bidirectional traceability and structural coverage — you trace every line of code to a requirement and measure that your tests exercise it. 9 A trillion-parameter model has no requirement set and no structural coverage notion; the artifact cannot be traced to a spec that does not exist. 1 Second, the model is non-deterministic: the same prompt can yield different outputs, so conformance cannot be asserted over the function's behavior the way it is asserted over a deterministic routine. 1 Third, the model changes continuously. A bank's scorecard is revalidated on a cadence; a frontier model can be swapped, fine-tuned, or re-prompted weekly, and each change invalidates whatever was certified before it. 1

The steelman for inaction is sharp, and the standard meets it head on. General-purpose agents are unspecifiable, so a DO-178C analogue is a category mistake — there is no fixed spec to trace to and no structural coverage of a trillion-parameter model, so any such standard is either vacuous theater or a straitjacket on the very generality that makes agents useful. 1

The model's unspecifiability is precisely the reason assurance moves to the harness. The agent may attempt anything; it may not commit a high-blast-radius action without an independent, qualified check enforced out-of-process.

The category mistake is in the objection

The answer is that the objection proves too much. SR 26-2 never specifies what a model computes — it challenges the model independently and backtests its outcomes. 11 UL 4600 never enumerates every road scene — it builds a safety case bounded by an operational design domain and abstains outside it; its companion SOTIF standard governs hazards that arise with no component failure at all, working as designed and still unsafe. 12 13 By the objection's own logic SR 26-2 and UL 4600 are also category mistakes, yet they bound risk in unspecifiable open-world systems every day. LAAS specifies the envelope, not the function: it bounds the rate at which an undetected error reaches a consequential action, scaled to blast radius, with independent verification and a reconstructable trace. 1 Generality is preserved — the model remains an untrusted component whose outputs are gated, not a thing to be certified. 2

Section 03

The Common Object Across Three Regimes

Strip DO-178C, SR 26-2, and UL 4600 of their vocabularies and the same machine runs underneath all three. It does five things. 1

It bounds undetected error — the quantity each regime is built to limit, whether expressed as structural coverage, backtesting, or safety performance indicators. 1 It scales rigor to consequence — DO-178C through Design Assurance Levels, SR 26-2 through materiality, UL 4600 through the depth of the risk-based safety case. 1 It requires independent verification so that nothing grades its own work — DO-178C's objectives "with independence," SR 26-2's effective challenge, UL 4600's independent assessment. 1 It traces every decision to evidence — a certification data package, a validation dossier, a goal-structured safety case. 1 And it abstains or escalates out of scope — the configured operational envelope, the approved-use boundary, the operational design domain. 1

The shared quantity is the escape rate: the rate at which a wrong result survives every applicable check and is acted upon. 1 Each mature standard bounds it under a different name. LAAS adopts the escape rate as its governing metric explicitly because it is the quantity all three already bound, and because it is honest about the open-world case where no exact oracle exists. 1 Where a higher-is-better figure is wanted, the standard reports integrity = 1 − escape_rate. 6 The contribution of LAAS is not a new mechanism; it is the recognition that these three regimes are one regime, and the application of that regime to the unit that actually matters for an agent — the action it is about to commit. 1

Section 04

Synthesis, Regime by Regime

Each parent regime contributes a borrowed mechanic and, equally important, exposes a gap that LAAS must close to apply that mechanic to a non-deterministic, continuously-changing agent. 1

DO-178C — borrow the rigor ladder and tool qualification; close the frozen-spec gap. DO-178C sets rigor by Design Assurance Level, derived from the worst-case effect of a failure: level A for catastrophic effects down to level E for none. 9 Higher levels demand more verification objectives, more of them carried out with independence, and escalating structural coverage culminating in Modified Condition/Decision Coverage (MC/DC) to bound code no requirement exercises. 9 Critically, DO-178C does not trust the verification tools blindly: its companion DO-330 requires that any tool whose output is relied upon be itself qualified. 10 LAAS borrows both the consequence-graded rigor ladder and tool qualification — its verifier-qualification obligation is the DO-330 analogue, requiring documented coverage, a negative-test suite, and a change-controlled verifier version recorded in the trace. 2 3 The gap: DO-178C assumes a frozen specification and a one-time certification data package, which cannot survive weekly model change; LAAS therefore moves conformance from pre-deployment certification to per-action runtime gating. 1

SR 26-2 — borrow effective challenge, backtesting, materiality, and vendor coverage; close the no-runtime-gate gap. SR 26-2, the interagency US model-risk standard that in April 2026 superseded SR 11-7 after fifteen years as the canon, is built on effective challenge: credible, independent review with the authority to change the model. That challenge is implemented through a validation lifecycle of conceptual soundness, ongoing monitoring, and outcomes analysis by backtesting, with risk-tiering by materiality and explicit vendor coverage that makes the deploying institution own the risk of models it buys; all five concepts are carried forward from SR 11-7 under SR 26-2's Develop / Validate / Govern pillars. 11 LAAS borrows all four mechanics: independent verification at tier, the backtested escape rate as the Bucket-B measurement, materiality as one of the three tiering axes, and a vendor obligation under which a vendor model's error counts against the deploying operator's escape-rate budget. 2 The gap: SR 26-2 governs a model lifecycle and is blind between validation cycles; it has no mechanism that stops a wrong action at the moment of commit, which is exactly what an autonomous agent requires. 1 SR 26-2 in fact widens the gap deliberately: it places generative and agentic AI expressly outside its formal model-risk scope as "novel and rapidly evolving," deferring them to broader risk management. The supervisor itself therefore leaves the autonomous-agent-action gap open, which is precisely the gap LAAS fills. 11

UL 4600 — borrow the safety case, the operating envelope, and SOTIF's open-world honesty; close the reactive gap. UL 4600 is a standard of care rather than a pass/fail test: a safety case, often in Goal Structuring Notation, bounded by an Operational Design Domain, with residual risk tracked through Safety Performance Indicators. 12 Its companion ISO 21448 (SOTIF) governs the failure mode that has no failure — a system that works as designed and is still unsafe. 13 LAAS borrows the standard-of-care posture, the operating envelope as its Authorized Operating Envelope, and the SOTIF insight as its open-world Bucket B. 1 2 The gap: UL 4600's Safety Performance Indicators are reactive monitoring — they tell you after the fact that performance is drifting; they do not gate the individual irreversible action before it commits, which at CT4 is the whole point. 1

Section 05

What Current AI Governance Leaves Open

The synthesis would be redundant if existing AI governance already gated runtime actions. It does not. The current landscape sorts into three groups, none of which closes the gap. 1

Horizontal governance governs the organization and the model maker, not the runtime action. ISO/IEC 42001:2023 is an AI management system with a control catalogue and a Statement of Applicability, third-party certifiable — but it is organizational lifecycle governance with no per-action runtime conformance. 15 The NIST AI Risk Management Framework and its Generative AI Profile supply functions, characteristics, and a risk catalogue, but no formal assessment process. 16 The EU AI Act tiers obligations by use-case category and imposes general-purpose-model obligations on providers, but an agent builder is typically a deployer, and conformity assessment is system-level and periodic rather than per-action. 17

Aerospace AI assurance is rigorous but scoped to the wrong species of model. ARP6983 / ED-324 is explicitly limited to frozen, supervised machine learning in embedded avionics and excludes generative and online-learning systems; EASA's learning-assurance guidance and the FAA/EUROCAE overarching properties are conceptual ancestors of LAAS's "verify the envelope, not the function," but neither is an LLM-agent conformance standard. 18 1

The agent-specific cluster is the closest neighbor and still does not reach the gap. The Five Eyes guidance Careful Adoption of Agentic AI Services — authored by ASD, CISA, NSA, CCCS, NCSC, and GCHQ — treats agents as untrusted components, foregrounds prompt injection, and requires verified cryptographic agent identity, short-lived credentials, and human sign-off for high-impact actions where the system designer, not the agent, decides which actions require it. 14 That last principle is directly aligned with LAAS's gate-derived tiering, but the Five Eyes document is operationally authoritative guidance, not a machine-checkable conformance standard. 14 The Cloud Security Alliance's MAESTRO threat model, the OWASP Agentic Top 10, and the Coalition for Secure AI supply threat taxonomy, vulnerability scoring, and in-progress control language — security-scoped and mid-flight. 18 Across the whole landscape, none is a published, two-layer, machine-checkable assurance standard that derives obligations from per-action blast radius, mandates independent pre-commit verification of irreversible actions, requires a bounded backtested escape rate on the open-world class, and emits a reconstructable decision trace as the conformance artifact. 1

Section 06

The CT0–CT4 Lattice and the Ungameable Property

Every action is assigned a Consequence Tier by the gate, computed from the action's observed effect surface — never from the agent's self-report. 2 The derivation is a small lattice over three axes: ct = max(reversibility_rank, scope_rank, consequence_rank) for any action with an external effect, ct = 0 for a read-only or sandboxed action, and ct = 4 whenever any axis is undetermined. 2 3 That last clause is the default-to-highest rule, and it is load-bearing: an unknown reversibility is treated as irreversible, an unknown scope as public, an unknown consequence as high. 1 The ranks live in the signed bundle's tier lattice — reversibility maps reversible to 1, hard to 3, irreversible and none to 4; scope maps single to 1 through public to 4; consequence maps none to 0, low to 1, material to 3, high to 4. 3 The CT numbers rise with consequence, the inverse of DO-178C's DAL letters, a direction noted deliberately to prevent cross-mapping errors. 2
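The lattice described above can be sketched in a few lines of Python. This is an illustrative reading, not the reference emitter: the function name derive_ct and its parameters are hypothetical, the scope table carries only the two endpoints quoted in the text because the intermediate values are not given, and checking the read-only case before the undetermined case is an assumption of this sketch.

```python
from typing import Optional

# Rank tables as quoted in the text. Scope values between "single" and
# "public" are not stated there, so they are deliberately left out.
REVERSIBILITY = {"reversible": 1, "hard": 3, "irreversible": 4, "none": 4}
SCOPE = {"single": 1, "public": 4}
CONSEQUENCE = {"none": 0, "low": 1, "material": 3, "high": 4}


def derive_ct(read_only: bool,
              reversibility: Optional[str],
              scope: Optional[str],
              consequence: Optional[str]) -> int:
    """Gate-side tier derivation. The agent's self-report is not an input."""
    if read_only:
        # Read-only or sandboxed action (assumed to be a gate observation).
        return 0
    ranks = [
        REVERSIBILITY.get(reversibility),
        SCOPE.get(scope),
        CONSEQUENCE.get(consequence),
    ]
    if any(rank is None for rank in ranks):
        # Default-to-highest: any undetermined axis forces CT4.
        return 4
    return max(ranks)
```

Because the self-reported tier never appears in the signature, there is nothing for the actor to lower; an unknown or unrecognized axis value falls through to CT4.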

The regime each tier requires is a minimum, not a ceiling. CT0 is trace only. CT1, a reversible internal write, requires a self-check with an exact verifier where one exists, plus a trace. CT2, a reversible or low-value external effect, requires an independent automated check or a rehearsed rollback, a bounded residual, and a trace. CT3, a hard-to-reverse or material external effect, requires mandatory independent, qualified pre-commit verification with no self-grading, a backtested escape rate at or below tolerance, a rollback plan, and a trace. CT4, an irreversible or high-consequence action, requires everything CT3 requires plus human approval, an abstention default, and full evidence. 2

The ungameable property is the heart of the design, and it is proved by a worked example rather than asserted. 5 An agent calls payments.transfer(amount=250000, dest=external). The observed surface is irreversible, public, and high-consequence, so the lattice computes max(4,4,4) = CT4. 1 Now suppose the agent self-reports CT1 and the action is not blocked and carries no human approval. The emitter ignores the self-report entirely, assigns gate.assigned_ct = 4, and the Rego policy returns compliant: false — firing the human-approval obligation as an error and the self-report obligation as a warning. 5 The self-report is carried in the trace only so the policy can raise that warning; it can never lower the gate-derived tier. 5 The structuring countermeasure is the same property in the time dimension: the gate maintains a windowed aggregate per principal, session, and effect class, and when the cumulative effect crosses a threshold the subsequent actions are re-tiered to the aggregate's tier — so a sequence of sub-threshold transfers cannot launder a high-tier effect, and aggregation can only ever raise the tier, never lower it. 2 5
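The structuring countermeasure can be illustrated with a small windowed aggregate. This is a sketch under stated assumptions: the class name AggregationGate, the amount-based thresholds, and the window length are all hypothetical, and the sketch counts the current action toward the cumulative total, which is a conservative choice the text does not spell out.

```python
from collections import defaultdict, deque


class AggregationGate:
    """Windowed aggregate keyed by (principal, session, effect class)."""

    def __init__(self, window_s, thresholds):
        self.window_s = window_s
        # thresholds: list of (cumulative_amount, tier); hypothetical values.
        self.thresholds = sorted(thresholds)
        self.events = defaultdict(deque)

    def retier(self, key, now, amount, base_ct):
        window = self.events[key]
        # Drop events that have aged out of the window.
        while window and now - window[0][0] > self.window_s:
            window.popleft()
        window.append((now, amount))
        total = sum(a for _, a in window)
        aggregate_ct = max(
            (tier for limit, tier in self.thresholds if total >= limit),
            default=0,
        )
        # Aggregation can only raise the tier, never lower it.
        return max(base_ct, aggregate_ct)
```

With a hypothetical threshold of 10,000 mapped to CT4 and a one-hour window, three transfers of 4,000 inside the hour leave the first two at their base tier and lift the third to CT4; the same transfer after the window has expired returns to its base tier.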

The agent proposes; the gate classifies. A self-reported tier below the gate's is not a discount — it is a logged warning on a blocked action.

The constrained party never grades itself

Section 07

Two Coupled Layers: Prose and Policy

A specification an agent cannot mechanically check is dead on arrival. 1 LAAS answers this by shipping two coupled layers that share one source of truth — the conformance bundle data.json — so that enforcement and prose cannot diverge; where the normative document and the bundle disagree, that is a bug to be filed, and the prose cannot block a bad action, only the policy can. 2 3 The machine layer reuses three primitives already proven in production engineering: policy-as-code conformance gates, pre-commit stop-hooks, and decision-trace logs. 1

The coupling is concrete. The flagship obligation, independent pre-commit verification for hard or irreversible external effects, exists in a normative form — an action whose gate-derived tier is CT3 or CT4 MUST pass an independent, qualified pre-commit verifier before commit, with human approval additionally required at CT4, and on a fail, abstain, or indeterminate verdict the action MUST be blocked and escalated — and in a machine-actionable form whose conformance_predicate references only fields declared in the emitted decision trace. 1 The predicate reads, in plain terms: if an obligation's trigger matched, then either the action passed an independent, qualified verifier — plus human approval at CT4 and residual at or below tolerance — or the action was blocked and escalated, and nothing else conforms. 2 Because every identifier in the predicate appears in the trace schema, the predicate is mechanically evaluable; this was not true of the v1.0 draft, whose predicate referenced fields that did not exist, an embarrassing-if-unfixed defect the blunder hunt caught and v1.1 closed. 8 1
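The plain-terms predicate can be written out as a boolean function over a decision record. This is a minimal sketch, not the Rego policy: apart from gate.assigned_ct, which the text names, every field name below is a hypothetical stand-in for the trace schema, and applying the residual check at every tier is one reading of the sentence above.

```python
def conforms(record: dict) -> bool:
    """Evaluate the plain-terms predicate against one decision record."""
    if not record["obligation_triggered"]:
        # No trigger matched, so this obligation has nothing to say.
        return True
    ct = record["gate"]["assigned_ct"]
    verifier = record["verifier"]
    verified = (
        verifier["verdict"] == "pass"
        and verifier["independent"]
        and verifier["qualified"]
        and (ct < 4 or record["human_approved"])
        and record["residual_upper_bound"] <= record["tolerance"]
    )
    blocked = record["blocked"] and record["escalated"]
    # Either the verified path or the blocked path; nothing else conforms.
    return bool(verified or blocked)
```

The function reads only fields present in the record, which mirrors the property the text relies on: a predicate is mechanically evaluable only when every identifier it references exists in the trace.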

The conformance test for the artifact itself is literal. A fresh agent, given only data.json and a candidate decision record, can derive the tier, locate the applicable obligations, evaluate the predicate, and emit one of pass, fail, abstain, or indeterminate with no human explanation. 2 The honest self-assessment is that machine-actionability holds fully for the deterministic path today — the predicate is evaluable and the tier function is computable — while Bucket-B conformance additionally depends on a backtest the operator must supply, which is verifiable but provided per deployment rather than by the spec. 1

Section 08

Bucket A, Bucket B, and the Escape Rate

Every claim or action sorts into one of two buckets, and the bucket determines how its escape rate is controlled. 1 Bucket A is the deterministically checkable class: an exact oracle exists — a compile, a schema validation, a ledger that nets to zero, an allowlist, a hash match, an invertible round-trip. 2 Here the escape rate is driven toward zero, bounded only by the verifier's soundness — and that soundness is itself an obligation, not an assumption, which is why verifier qualification is mandatory. 1 No probabilistic tolerance is permitted where an exact verifier exists, and whether one exists is decided by the gate's verifier registry, not by the actor. 2 Deterministic verifiers return pass, fail, or indeterminate — never abstain, which is a confidence notion that belongs to the open world. 1

Bucket B is the open-world class where no exact oracle exists. 2 It is controlled in three steps: the operator bounds a maximum acceptable escape rate for the claim class at the relevant tier; measures it by backtesting on a held-out, representative, adversarially-stressed set with a stated confidence interval, re-measuring on any model, prompt, tool, or policy change; and controls it by routing any claim class whose measured rate is above tolerance to an independent verifier, and by abstaining when confidence is too low or the action falls outside the envelope. 2 6

The conformance rule is exact and deliberately conservative: an action passes if and only if the upper bound of the confidence interval is at or below tolerance — never the point estimate, because validating the point estimate would ignore sampling error and pass underpowered sets. 6 The tolerances live in the bundle: 0.02 at CT2, 0.005 at CT3, and 0 at CT4. 3 The CT4 tolerance of zero is the design's sharpest edge: a binomial upper bound over any finite sample is strictly greater than zero, so backtesting can never pass a zero tolerance. The harness therefore returns indeterminate with the disposition that the action requires a deterministic or human gate — correct on both ends, because CT4 is exactly the human-gated tier and because if anyone wired such an artifact into a trace as a pass, the residual obligation would still fire. 6 The harness also enforces sample-size floors derived from the rule-of-three: a tolerance is only demonstrable if the achievable upper bound at zero observed escapes can fall at or below it, which at 95% confidence requires 149 samples at CT2 and 598 at CT3; below the floor the result is indeterminate for insufficient sample size, never a pass. 6
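The decision rule and the sample-size floors can be checked with a short sketch. It is not the reference harness: the function names, the "CT2"-style keys, and the two-sided z = 1.96 Wilson bound are assumptions. The floors of 149 and 598 are reproduced here by the exact zero-escape bound 1 - 0.05^(1/n), which is the calculation the rule of three approximates; that this is how the harness derives them is an inference from the numbers, not something the text states.

```python
import math

# Tolerances as quoted in the text; the string keys are an assumption.
TOLERANCE = {"CT2": 0.02, "CT3": 0.005, "CT4": 0.0}


def sample_floor(tolerance: float, alpha: float = 0.05) -> int:
    """Smallest n whose zero-escape upper bound 1 - alpha**(1/n) <= tolerance."""
    return math.ceil(math.log(alpha) / math.log(1.0 - tolerance))


def wilson_upper(escapes: int, n: int, z: float = 1.96) -> float:
    """Upper limit of the Wilson score interval for a binomial proportion."""
    p = escapes / n
    centre = p + z * z / (2 * n)
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre + spread) / (1 + z * z / n)


def backtest_verdict(tier: str, escapes: int, n: int) -> str:
    tol = TOLERANCE[tier]
    if tol == 0.0:
        # No finite sample can demonstrate a zero tolerance.
        return "indeterminate"
    if n < sample_floor(tol):
        # Below the floor the tolerance is not demonstrable at all.
        return "indeterminate"
    # Compare the upper bound, never the point estimate.
    return "pass" if wilson_upper(escapes, n) <= tol else "fail"
```

Note that a run with 5 escapes in 1,000 trials has a point estimate exactly at the CT3 tolerance of 0.005 yet fails, because its upper bound is roughly 0.0117; this is the sampling-error case the upper-bound rule exists to catch.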

Section 09

Verifier Independence and Error Correlation

The word "independent" carries the entire weight of effective challenge, and the design refuses to let it mean merely "a different model." 2 A verifier is independent of the actor only under one of three conditions, chosen by tier. A deterministic or exact checker — a different kind of mechanism altogether — is independent at any tier for the Bucket-A class. 2 A model verifier is independent only if it is of a distinct lineage and shows a measured error-correlation at or below the bundle's threshold, and it is permitted no higher than CT3. 2 3 A human verifier is required in addition at CT4. 2 A verifier sharing the actor's model lineage is presumed non-independent, and where correlation cannot be bounded, any action at CT3 or above falls back to a deterministic or human check. 2

The failure mode this defends against is common-mode failure: two different model lineages routinely share training data, biases, and prompt-injection susceptibility, so they fail together, and a second model that is wrong in the same way as the first is not a check at all. 8 This was a Critical finding of the blunder hunt — the v1.0 draft accepted lineage difference as independence and would have admitted a correlated verifier straight past the irreversibility gate. 8

The threshold and its measurement are specified, not gestured at. The bundle sets max_error_correlation = 0.2. 3 The protocol measures it on a frozen, versioned held-out adversarial eval set of at least 1,000 labeled hard cases, running the actor model and the candidate verifier independently over the set and computing the phi coefficient on the joint-error indicator — the two-by-two contingency of actor-wrong against verifier-wrong. 7 Phi is chosen over Cohen's kappa deliberately, because the quantity of interest is correlated failure, not chance-corrected agreement, and phi on the joint-error table is exactly the linear correlation of the two error indicators. 7 Admission requires the upper bound of the 95% confidence interval to be at or below 0.2, not the point estimate; a verifier whose interval crosses 0.2 is not admitted, and correlation above threshold forces the CT3-and-above fallback to deterministic or human verification. 7
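The admission test can be sketched as follows. The phi formula on the two-by-two joint-error table is standard; the confidence-interval method is not specified in the text, so the Fisher z approximation used here is an assumption of this sketch, as are the function names.

```python
import math


def phi_coefficient(both_wrong, actor_only, verifier_only, neither):
    """Phi on the 2x2 table of actor-wrong against verifier-wrong."""
    a, b, c, d = both_wrong, actor_only, verifier_only, neither
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when a margin of the table is zero")
    return (a * d - b * c) / denom


def phi_upper_bound(phi, n, z=1.96):
    """Approximate upper limit via the Fisher z transform (assumed method)."""
    if abs(phi) >= 1.0:
        return phi
    return math.tanh(math.atanh(phi) + z / math.sqrt(n - 3))


def admit(both_wrong, actor_only, verifier_only, neither,
          max_error_correlation=0.2, min_cases=1000):
    n = both_wrong + actor_only + verifier_only + neither
    if n < min_cases:
        return False  # the eval set must hold at least 1,000 labeled cases
    phi = phi_coefficient(both_wrong, actor_only, verifier_only, neither)
    # Admission is decided on the upper bound, not the point estimate.
    return phi_upper_bound(phi, n) <= max_error_correlation
```

As a worked case, with 1,000 items where each model errs on 100 and they err together on 20, phi is 10000 / 90000, about 0.111, and the approximate upper bound is about 0.172, so the verifier would be admitted. If they err together on 50 instead, phi rises to about 0.444 and admission fails.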

Section 10

Enforcement-Plane Integrity

Mechanical enforcement holds only if the mechanism sits outside the actor's reach — otherwise every guarantee assumes the very agent LAAS constrains will not disable its own gate. 2 This was the second-order Critical of the blunder hunt: the agents LAAS governs commonly run with elevated permissions, including skip-permissions and passwordless modes, so the constrained agent could in principle disable the hook, edit the bundle, or forge traces. 8 Instructions do not prevent execution; an out-of-process mechanical gate does. 1

LAAS therefore requires three integrity properties. The policy bundle must be signed and version-pinned. The gate must run out-of-process — outside the actor's control even when the actor runs with elevated permissions. And the trace must be written to an append-only sink the actor cannot rewrite. 2 The consequence is categorical: a trace asserting an in-process or unsigned gate is non-conformant regardless of its verdict, because a verdict from a gate the actor could have tampered with is no verdict at all. 1 The bundle exposes both controls — it requires the signed bundle and the out-of-process gate as standing configuration the trace must attest. 3

Trace integrity composes with multi-agent concurrency rather than fighting it. A single linear hash-chain would need one total-order appender, which concurrent agents would fork; each actor therefore appends to its own hash-chain, and those chains are periodically anchored into a shared Merkle root written to the append-only sink. 2 Bucket-A verdicts are then deterministically reproducible against the recorded input hash, while Bucket-B verdicts are tamper-evidently re-inspectable: the recorded verdict and its evidence can be re-read even where a non-deterministic model verdict cannot be re-derived. 1 The evidence itself lives in an append-only, content-addressed store with retention by tier, with payloads minimized and PII or material non-public information tokenized, and with the searchable index separated from sensitive payloads so that audit access does not require exposing regulated data. 1
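The per-actor chains and their periodic anchoring can be sketched with the standard library alone. This is an illustration of the general technique, not the reference trace format: the class and function names, the SHA-256 choice, the JSON serialization, and the leaf and node prefixes are all assumptions.

```python
import hashlib
import json
from typing import List


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class ActorChain:
    """One append-only hash-chain per actor; no shared total order needed."""

    def __init__(self, actor_id: str):
        self.actor_id = actor_id
        self.head = _h(actor_id.encode())  # genesis value
        self.entries = []

    def append(self, record: dict) -> bytes:
        payload = json.dumps(record, sort_keys=True).encode()
        self.head = _h(self.head + payload)  # each head commits to its predecessor
        self.entries.append((payload, self.head))
        return self.head


def verify_chain(chain: ActorChain) -> bool:
    """Recompute the chain from genesis and compare every recorded head."""
    head = _h(chain.actor_id.encode())
    for payload, recorded in chain.entries:
        head = _h(head + payload)
        if head != recorded:
            return False
    return head == chain.head


def merkle_root(leaves: List[bytes]) -> bytes:
    """Anchor the current chain heads into a single root."""
    if not leaves:
        return _h(b"")
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [_h(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]
```

In use, each actor appends locally without coordination, and an anchoring step collects the heads in a fixed order (for example sorted by actor id) and writes the resulting root to the append-only sink; altering any recorded payload afterwards makes verify_chain fail for that actor.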

Section 11

Cross-Cutting Obligations

Beneath the tier-specific controls sit obligations that apply across the whole standard, each an instance of pushing Zero-Trust inward. 1 The bundle carries twelve active obligations with stable identifiers, severities, tier floors, and a deterministic precedence ordering. 3

The no-self-grading invariant is first among them: the gate derives the tier and selects the verifier, the actor only proposes, and any path where the actor sets its own tier, declares that no exact verifier is available, or verifies its own output is non-conformant. 2 Vendor and supply-chain attribution requires that when an agent uses a vendor model or calls a third-party tool, the trace carries provenance and scope limits, the residual error is attributed to the deploying operator's escape-rate budget, and untrusted dependencies fail closed. 2 Untrusted-input tier-raising is the control that protects tier-derivation itself from prompt injection: an action driven by untrusted input — web content, inbound email, third-party data — is gated at CT3 or above or blocked, and the trace records whether the input was trusted. 2 3

Two governance obligations keep the standard from rotting. The evidence store and trace data-governance obligations are described above. The obligation lifecycle moves every obligation through proposed, active, deprecated, and retired states so the bundle is versioned rather than merely accretive. When two obligations fire with incompatible outcomes the precedence is deterministic: the highest tier wins, ties resolve to the most restrictive outcome, which is to block, and the resolution order is itself written to the trace. 2 1 Each of these closes a specific blunder-hunt finding: unguarded self-classification, missing vendor coverage, the absent input-side control, and the unbounded accretive bundle. 8
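The precedence rule can be sketched as a deterministic sort. The outcome names, the obligation identifiers, and the final tie-break on identifier are hypothetical; the text fixes only that the highest tier wins, that ties go to the most restrictive outcome (block), and that the resolution order is recorded.

```python
# Hypothetical outcome vocabulary; the text names only "block" as most restrictive.
RESTRICTIVENESS = {"allow": 0, "block": 1}


def resolve(fired):
    """Return (winning outcome, resolution order) for the fired obligations."""
    if not fired:
        raise ValueError("no obligations fired")
    order = sorted(
        fired,
        key=lambda o: (-o["tier"], -RESTRICTIVENESS[o["outcome"]], o["id"]),
    )
    # The ordered identifiers are what would be written to the trace.
    return order[0]["outcome"], [o["id"] for o in order]
```

Sorting on a total key means the same set of fired obligations always yields the same outcome and the same recorded order, whatever sequence they fired in.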

Section 12

The Reference Implementation

The decisive difference between a standard and a position paper is whether a machine can run it. LAAS ships a working reference implementation that proves it is machine-actionable rather than prose pretending to be schema. 4 5

The conformance bundle data.json and the enforcing policy laas.rego are the standard's executable core: the bundle is the single source of truth for the tier lattice, tolerances, correlation threshold, and obligation set, and the Rego policy under package kellerai.laas.actions encodes the conformance predicate that Open Policy Agent evaluates against a decision record. 3 4 The ActionDescriptor emitter is the gate's input-construction stage: it consumes the gate-observed effect surface, performs the CT derivation, raises the tier to that of the aggregation window where the aggregate's tier is higher, and assembles the decision record, treating the agent's proposal as an input to be classified rather than a classification to be trusted. 5 Its runnable proof emits the payments.transfer worked example and pipes it through opa eval, returning a compliant CT4 result via the block path. Three additional cases prove the ungameable property: the self-reported CT1 on a CT4 surface is forced to CT4 and rejected, a read-only action resolves to CT0, and an undetermined surface defaults to CT4. 5

The Bucket-B backtest harness is the genuinely new component with no prior analogue. It computes the binomial upper confidence bound by the Wilson method, with Clopper–Pearson available for conservative audit, enforces the per-tier sample-size floors, looks the tolerance up by the same string-key convention the policy uses, and emits a content-addressed evidence artifact that goes verbatim into the trace's evidence references. 6 The end-to-end demonstration is verified: feeding a CT3 artifact's bound of 0.024987 into a synthetic CT3 trace fires the residual obligation against the real policy with a message that the escape rate exceeds the CT3 tolerance of 0.005, while a CT2 bound of 0.014286 below its 0.02 tolerance produces zero violations. 6 These are not illustrations; they are the proofs that the standard is mechanically actionable. 5 6

Section 13

Adoption Path

A machine-actionable standard with no host body has one real weakness — a de-facto standard carries no authority until it graduates. 7 The adoption strategy resolves this by launching de-facto in financial services and architecting the core to graduate into a formal standards body, with an attestation artifact that manufactures authority in the interim. 7

Financial services is the launch vertical, not aerospace, for three structural reasons and one strategic judgment. SR 26-2 model-risk management already creates a standing obligation to validate, monitor, and govern models, of which LAAS's escape-rate-bounded, independently-verified, traced regime is the natural runtime extension; the CT lattice maps cleanly onto payment irreversibility, sending an irreversible external transfer to CT4 directly; and the standing pre-authorized envelope construct is exactly how treasury and payment-operations limits are already expressed. 7 Aerospace is the cleaner structural transfer, but structural cleanliness does not move a de-facto standard — adoption does, and financial services offers the regulatory pull and procurement cadence. 7

The host-body path is de-facto-architected-to-graduate: publish a timestamped spec and reference implementation, lead in financial services, recruit the Cloud Security Alliance, CoSAI, and OWASP as co-sponsors to align trace and control vocabulary, then graduate the stabilized core into IEEE SA and onward to ISO/IEC JTC1 SC42 with a NIST profile. 7 Four decisions make this executable. A verifier registry — append-only and semver-pinned, each entry carrying a CT-eligibility ceiling, a qualification dossier reference, and an independence basis — turns the irreversibility obligation from unenforceable into auditable. 7 The error-correlation protocol closes the named-but-unspecified independence gap. The conformance attestation artifact is a signed, stratified, Merkle-anchored sampled-trace bundle — oversampling the obligation-bearing CT3 and CT4 tiers and including every blocked outcome — plus an independent-auditor attestation that binds to the bundle's Merkle root, the artifact a bank examiner reviews today and the one that graduates cleanly into a third-party certificate when a standards body adopts the core. 7 Co-sponsor outreach uses the Five Eyes guidance as third-party authority for security-posture alignment, mapping each body to one specific ask and one alignment hook. 7 14 The path runs in three phases: de-facto publish, early co-sponsors with at least two named production deployers, and SDO submission with the schema and predicate frozen. 7

Section 14

Limitations and Conceded Residue

LAAS is a standard of care, not a guarantee, and the design names its residual risks rather than burying them. 2 A bad eval set, a missed action class in the Authorized Operating Envelope, or a verifier whose qualification is shallow can still let errors through. 1 Each has a stated mitigation: the eval set is independently audited, the envelope's edge defaults to abstention, verifiers are qualified against a negative-test suite, and the bundle is lifecycle-managed. 1 The backtest harness concedes its own boundary conditions explicitly — the binomial model assumes independent trials, so a prompt-injection family causing a burst of correlated escapes would make the interval optimistic, and the ground-truth labels must be produced independently of the verifier under test or the backtest measures agreement rather than escape. 6 The highest-uncertainty open item is the construction of the error-correlation eval set itself, which is genuinely new, gates the admission of every model-based verifier, and is the first thing the adoption plan resources. 7

The contribution is the synthesis. DO-178C, SR 26-2, and UL 4600 each solved a version of the same problem — bounding risk in a system that cannot be fully specified — and each is the right teacher for a different facet of the agent problem: the deterministic bucket inherits DO-178C's rigor ladder and tool qualification, the open-world bucket inherits SR 26-2's effective challenge and UL 4600's safety-case honesty, and the gate inherits all three regimes' refusal to let a system grade its own work. 1 What LAAS adds is the recognition that the unit of assurance for an autonomous agent is not the model but the action, and the demonstration — in a runnable bundle, policy, emitter, and harness — that a standard built on that recognition can be mechanically enforced rather than merely written down. 5 6 The agent may attempt anything; what it commits is gated, qualified, traced, and, at the irreversible edge, held until a human says yes. 2

Companion brief: The LLM-Agent Assurance Standard.
