The Fallback Is the Feature: Minimal Risk Conditions and the UL 4600 Safety Case for Autonomous Systems

KellerAI

Section 01

The Maneuver That Licenses the Autonomy

A driving automation system is permitted to operate unsupervised only because, at every instant, it retains a pre-computed maneuver that brings the vehicle to a stable, stopped, low-risk state. SAE J3016 names that target the minimal risk condition — the state to which an automated driving system brings the vehicle after performing the dynamic-driving-task fallback, in order to reduce the risk of a crash when a trip cannot or should not be completed; the 2021 revision of the standard sharpened it to a stable, stopped condition. 1 2 The maneuver that drives the vehicle there — the minimal-risk maneuver, in industry usage — is, by definition, the response invoked after a performance-relevant failure or upon exit from the operational design domain. It is outside the dynamic driving task because it exists precisely for the moment the dynamic driving task can no longer be carried out. The vocabulary is not Cruise's or Waymo's alone: NHTSA adopted the J3016 minimal-risk-condition framing in its automated-driving-system guidance, and the same term appears in federal AV-safety material such as the NIST treatment of automated driving system safety. 16 When a regulator writes a rule against the minimal risk condition, it is writing the rule against the fallback — which is the clearest signal that the fallback, not the headline capability, is the thing the license is actually about.

This is the same inversion the aviation sibling of this paper draws from ETOPS: range is not an input to the calculation but an output of it, derived from where a reachable, adequate safe harbour sits in the worst-case envelope. The minimal risk condition is the road's version of the diversion airport. The legal corridor a twin may fly is constructed inward from the runways it can still reach on one engine; the operational design domain an automated vehicle may operate inside is, in the same way, the region where the minimal-risk maneuver remains reachable from the degraded state the system would actually be in. The capability to drive the route is not the question. The question is whether the fallback that catches a failed route is still in the envelope at every step.

Read this way, the pull-over is not the system conceding defeat. It is the system exercising the very mechanism that earns it the right to drive at all. An autonomous vehicle that could not, at any instant, bring itself to a stable stop would not be a more ambitious autonomous vehicle — it would be an unlicensable one, because the envelope it could be granted would collapse to zero. The minimal risk condition is load-bearing in the literal sense: remove it and the entire structure of permission that lets the vehicle leave the depot comes down with it. The fallback is what holds the autonomy up.

The minimal-risk maneuver is not the moment autonomy fails. It is the precondition that licenses autonomy at all — the operating envelope is a derived quantity that falls out of where the fallback stays reachable, never an absolute capability claim.

The inversion

This paper is the AV-framed sibling of Always a Runway, which proved the rule from aviation — the diversion airport, the point of safe return, the forecast-versus-observation gap, and conformal abstention. 26 27 Where that paper read the doctrine off the runway, this one reads it off the road, and adds the material the road contributes that the runway did not: a structured safety case, written down as an argument, continuously updated from the field record of every fallback the fleet has ever fired. The minimal risk condition is the diversion airport; the structured safety case is the new instrument this sibling brings to the stack.

Section 02

MRC vs. Reachable: Two Words That Carry the Doctrine

Two terms do most of the work, and conflating them is the most common way a fallback that looks sound turns out not to be. The minimal risk condition is the end-state: a stable, stopped, low-risk configuration the vehicle is brought to. 1 2 The minimal-risk maneuver is the dynamic-driving-task fallback that drives the vehicle there — the steering, braking, and lane choice that get the vehicle from the failure to the stop. A system can possess a well-defined minimal risk condition on paper and still fail catastrophically if the maneuver that reaches it is wrong for the state the vehicle is actually in.

The decisive caveat is one Koopman makes explicit in the J3016 user guide: the standard permits flexibility in how the minimal risk condition is achieved — for instance, "stop within its current travel path" — but it explicitly does not guarantee actual safety. 2 A minimal risk condition is a defined target state, not a proof that the state reached is safe. "Come to a stop" is a correct maneuver in a world with no one under the vehicle and a lethal one in a world where there is. The standard names the target; it does not, and cannot, certify that the target is benign in every arrival state.

This is exactly the adequate-versus-reachable distinction the aviation sibling draws, transposed to the road. 26 A fallback that merely exists is not the same as a fallback adequate to the state the vehicle will actually be in. Adequacy is a property of the arrival state: the minimal risk condition must be safe for the world the vehicle finds itself in after the failure — the post-collision world, the degraded-sensing world, the world with a vulnerable road user in an unexpected position — not for the clean world the planner assumed. Reachability is a property of the envelope: the maneuver must complete inside the worst-case budget of latency, sensing degradation, and time-to-stop. A minimal-risk maneuver that needs more perception confidence, more time, or more healthy actuators than the vehicle will have at the moment of failure is an airport outside the single-engine envelope — present on the map, foreclosed in fact.

There is no canonical published number for time-to-minimal-risk-condition, and none should be invented. The honest framing is conceptual: time-to-MRC is the reachability budget — the worst-case time and conditions under which the maneuver can still reach a stable stop, the road analogue of the one-engine-inoperative diversion-time circle. The discipline is the same as the dispatcher's: size the fallback to the degraded arrival, verify it against the world the system will be in, and never let a maneuver adequate for the assumed state stand in for one adequate to the observed state. The next two sections show what each half of that discipline looks like when it is done as an argument, and what it costs when it is not.

It is worth dwelling on why both properties are required and why neither substitutes for the other, because the substitution is the subtle error. A maneuver that is reachable but inadequate completes on time into an unsafe state — the vehicle reaches a stable stop, but the stop is on top of a person. A maneuver that is adequate but unreachable would be safe if it could finish, but the worst-case envelope forecloses it — the perception confidence needed to confirm the path is clear is exactly the confidence the degraded sensor stack has lost. The discipline is to verify the conjunction at the moment of action, not the design-time abstraction of it. A fallback set audited only against the happy path, against the clean world the planner assumed, is the road's version of evaluating diversion airports at dispatch-time conditions rather than at the conditions forecast for the moment of arrival — the precise gap that turns a fallback that looks robust on a chart into one that cannot catch the failure when the failure comes.
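The conjunction described above can be sketched in a few lines of Python. This is a minimal illustration, not a planner: the `Maneuver` and `ObservedState` types, their fields, and every number are hypothetical, and `hold_stationary` assumes the vehicle has already come to a stop. The point is only that reachability and adequacy are separate predicates, both evaluated against the observed state at the moment of action.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Maneuver:
    name: str
    time_needed_s: float        # worst-case time to reach the stable stop
    min_perception_conf: float  # confidence needed to confirm the path is clear
    moves_vehicle: bool         # True if the vehicle must travel to get there


@dataclass(frozen=True)
class ObservedState:
    time_budget_s: float        # worst-case time still available
    perception_conf: float      # confidence the degraded stack actually has
    person_may_be_under_vehicle: bool


def reachable(m: Maneuver, s: ObservedState) -> bool:
    """Envelope property: the maneuver completes inside the degraded budget."""
    return (m.time_needed_s <= s.time_budget_s
            and m.min_perception_conf <= s.perception_conf)


def adequate(m: Maneuver, s: ObservedState) -> bool:
    """Arrival-state property: the end state is safe for the observed world."""
    return not (m.moves_vehicle and s.person_may_be_under_vehicle)


def usable_fallbacks(maneuvers, s: ObservedState):
    """Keep only maneuvers that pass both tests against the observed state."""
    return [m.name for m in maneuvers if reachable(m, s) and adequate(m, s)]


# Illustrative values only; none of these numbers come from a standard.
HOLD = Maneuver("hold_stationary", 0.0, 0.0, moves_vehicle=False)
PULL_OVER = Maneuver("pull_over", 8.0, 0.6, moves_vehicle=True)

clean_exit = ObservedState(10.0, 0.9, person_may_be_under_vehicle=False)
degraded = ObservedState(10.0, 0.3, person_may_be_under_vehicle=False)
post_collision = ObservedState(10.0, 0.9, person_may_be_under_vehicle=True)

print(usable_fallbacks([HOLD, PULL_OVER], clean_exit))
print(usable_fallbacks([HOLD, PULL_OVER], degraded))
print(usable_fallbacks([HOLD, PULL_OVER], post_collision))
```

In the post-collision state the pull-over is reachable but not adequate, and in the degraded state it is adequate but not reachable; in both cases only holding position survives the conjunction.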

Section 03

Safety Is an Argument, Not a Checklist: UL 4600

UL 4600, the Standard for Safety for the Evaluation of Autonomous Products, was first published on 1 April 2020, with a second edition in March 2022; its development was led by Prof. Philip Koopman. It does not ask whether a fixed checklist was completed. 3 4 It asks whether the developer can present a structured safety case: a set of specific goals (claims), each supported by an evidence-based argument that the system meets that goal, with the evidence attached. The standard is deliberately goal-based rather than prescriptive: it specifies what a safety case must address rather than dictating a particular engineering approach. The case is typically expressed in a structured notation such as Goal Structuring Notation, organized around the claims-arguments-evidence triad familiar from aviation and rail assurance. 5

The distinction from a checklist is not cosmetic. A checklist is a static list of conditions; tick them and you are done. A safety case is an argument that can be attacked. Each claim has a burden of proof; the argument can be challenged for a gap; the evidence can be shown to be stale or unrepresentative. UL 4600 is, in the vocabulary of the corpus assurance paper, a standard of care, not a pass/fail correctness test: conformance asserts that the right argument was made, by the right means, with evidence — never that no harm can occur. 5 The same posture governs the LLM-Agent Assurance Standard, which borrows UL 4600's safety case and operating-envelope honesty directly and applies it to the actions an agent commits at runtime rather than to the model.

The property that makes the safety case the right instrument for an open-world system is that it is built to be maintained. UL 4600 tracks residual risk through Safety Performance Indicators and is designed for the safety case to be updated as field experience and operational data accumulate — a lifecycle discipline, not a one-time certificate. 5 An autonomous vehicle operates in a world it cannot fully enumerate; a safety case frozen at launch is an argument about a world that no longer exists by the second week. The standard's answer is to make the case a living document whose claims are continuously re-evidenced against what the fleet actually encountered. The right to keep operating is therefore a standing claim about a measured record, not a framed certificate on the wall.

The mechanics of an argument-shaped standard reward a moment of attention, because they are what make the difference operational rather than rhetorical. A claim — "the vehicle reaches a safe state after a perception failure" — is decomposed into sub-claims, each carrying its own argument and its own evidence: that the failure is detected, that the maneuver is selected correctly for the detected state, that the maneuver completes inside the reachability budget, that the state reached is genuinely low-risk for the arrival world. A reviewer attacks the weakest link: is the failure always detected, or only when the sensor degrades in the modeled way? Is the maneuver correct for a post-collision world, or only for a clean exit from the operational design domain? Each unsupported link is a defect in the argument that a checklist would never surface, because a checklist has no notion of an argument that can be incomplete — only of a box that is or is not ticked. The Safety Performance Indicators are the instrument that keeps the evidence honest over time: they are the leading and lagging measures the developer commits to monitor, so that a claim whose supporting evidence has drifted is flagged before the drift becomes a crash rather than after. 5
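The decomposition above can be pictured as a small tree that a reviewer walks looking for the weakest link. The sketch below is an assumption-laden illustration, not UL 4600's data model: the `Claim` and `Evidence` types, the staleness window, and the dates are all hypothetical. It flags leaf claims that have no evidence or whose newest evidence is older than a chosen age, which is the kind of defect a ticked box cannot represent.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Evidence:
    description: str
    collected: date


@dataclass
class Claim:
    text: str
    evidence: list = field(default_factory=list)
    subclaims: list = field(default_factory=list)


def defects(claim: Claim, today: date, max_age: timedelta):
    """Return (claim text, reason) for every leaf claim that is unsupported or stale."""
    if claim.subclaims:
        found = []
        for sub in claim.subclaims:
            found.extend(defects(sub, today, max_age))
        return found
    if not claim.evidence:
        return [(claim.text, "no evidence")]
    newest = max(e.collected for e in claim.evidence)
    if today - newest > max_age:
        return [(claim.text, "stale evidence")]
    return []


# Hypothetical safety-case fragment; dates and evidence names are made up.
case = Claim(
    "vehicle reaches a safe state after a perception failure",
    subclaims=[
        Claim("failure is detected",
              [Evidence("fault-injection run", date(2024, 5, 1))]),
        Claim("maneuver is selected correctly for the detected state",
              [Evidence("scenario replay", date(2023, 1, 10))]),
        Claim("maneuver completes inside the reachability budget",
              [Evidence("track test", date(2024, 4, 20))]),
        Claim("state reached is low-risk for the arrival world"),
    ],
)

found = defects(case, today=date(2024, 6, 1), max_age=timedelta(days=180))
for text, reason in found:
    print(f"{reason}: {text}")
```

A real safety case would also weigh whether evidence is representative, not just recent; age is used here only because it is the simplest drift signal to compute.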

Treat safety as an argument that can be attacked, not a checklist that was ticked. A claim with stale evidence is a claim that has quietly become false — and the only way to know is to keep re-evidencing the argument against the field record.

The rule to learn

Two things about UL 4600 are deliberately left to the rest of this stack. Its companion hazard — the failure mode that arises with no component failure, a system working as designed and still unsafe — is governed by SOTIF (ISO 21448) and is owned by the sibling article on the species of failure where the hardware is correct and the answer is wrong. And the operating envelope itself — the operational design domain as the region of permitted operation — is owned by the first article in this series, on autonomy as an envelope. This paper takes the safety case as its subject and treats those two as the boundary it is bounded by and the hazard class it inherits, not as material to re-derive.

Section 04

When the Fallback Fires and Is Still Wrong: The Cruise Mishap

The cautionary anchor is precise because the fallback was not missing. It fired. On the evening of 2 October 2023, at Market and Fifth Streets in San Francisco, a human-driven vehicle in an adjacent lane struck a pedestrian and propelled her into the immediate path of a driverless Cruise robotaxi. The Cruise vehicle biased rightward and braked hard but still struck her. 6 9 13 To that point the automated vehicle had done roughly what a careful driver could; the initial collision was set up by a human driver in another lane. What happened next is the lesson.

After the impact, the Cruise automated driving system, per the NHTSA Office of Defects Investigation resume for Preliminary Evaluation PE23-018, inaccurately characterized the collision as a lateral collision and commanded the vehicle to attempt to pull over out of traffic — a pull-over minimal-risk maneuver — "pulling the individual forward, rather than remaining stationary." 9 The vehicle dragged the pedestrian approximately 20 feet and came to rest with a rear wheel on her legs. 6 9 12 The minimal risk condition the system was driving toward — pull out of the active lane, come to a stable stop — was a textbook, defensible target state. It was correct for the world the system believed it was in: a lateral collision, the path ahead clear. It was catastrophic for the world it was actually in: a person pinned beneath the vehicle.

Koopman's analysis names the mechanism in terms worth quoting for their precision. The programming "caused the robotaxi to lose tracking of and then in essence forget a pedestrian who was hit by an adjacent vehicle, and forget that the robotaxi had just run over a presumed pedestrian when beginning a subsequent repositioning maneuver. The computer driver was unable to detect the pedestrian being dragged even though her legs were partially in view of a robotaxi camera." 6 The fallback did not fail to fire. It fired on an internal model of the world that had silently diverged from the world the sensors could, in part, still see. A fallback that fires on a wrong world-model is worse than no fallback, because it converts a stationary vehicle — itself a minimal risk condition — into a moving hazard.

The regulatory consequences followed quickly and are worth keeping distinct. On 24 October 2023 the California DMV suspended Cruise's autonomous-vehicle deployment and driverless-testing permits, effective immediately, citing 13 CCR §§228.20(b)(6), 228.20(b)(3), 227.42(b)(5), and 227.42(c) — vehicles not safe for public operation, misrepresentation of safety information, and unreasonable risk to the public — and stating that Cruise had failed to disclose the vehicle's attempt to pull over after striking the pedestrian. 11 12 On 7 November 2023 Cruise filed NHTSA Recall No. 23E-086, covering 950 automated-driving-system units for a defect in post-collision behavior. 7 10 Two separate penalties followed: a $1.5 million NHTSA civil penalty under a consent order for the reporting failure, 8 and a distinct $500,000 settlement for the false report that omitted the dragging. 14 A third-party investigation commissioned from Quinn Emanuel Urquhart & Sullivan concluded that Cruise's leadership had failed to disclose the dragging to regulators, a finding that preceded a wave of executive departures. 15 The maneuver fired; the safety case was argued against the wrong world; and the organization compounded the engineering failure with a disclosure failure that the regulators treated as separately culpable.

Two readings of the incident must be held apart, because they belong to different articles in this stack and conflating them obscures the lesson. Seen from one side, the dragging is a SOTIF failure — a safety-of-the-intended-functionality hazard in which no component broke. The sensors worked; the software executed; the vehicle did precisely what its logic commanded. The hazard arose because the intended function — classify the collision, select a minimal-risk maneuver, execute it — was unsafe in a scenario its designers had not anticipated. That analysis belongs to the sibling article on the case where the hardware is correct and the answer is wrong, and this paper hands it there rather than re-deriving it. Seen from the other side — this paper's side — the dragging is the SOTIF failure mode viewed from the fallback: not a broken sensor but a correctly-executing maneuver argued against the wrong world-model. The same event reads as a perception-intent hazard to one article and as a fallback-adequacy failure to this one, and both readings are true. The contribution of the minimal-risk-condition framing is to make visible that even a flawless perception stack would not have saved the maneuver if the maneuver itself was permitted to fire on an unverified model past the point where the system could still check.

The investigation's arc is itself instructive about how a living safety record is supposed to behave. NHTSA's Office of Defects Investigation opened PE23-018 into Cruise vehicles "not exercising appropriate caution around pedestrians," and the resume captured the post-collision mechanism: the lateral-collision mis-classification, the pull-over, the drag. 9 The office later closed PE23-018 following the recall and data showing reduced hard-braking incidents, while continuing a separate line of inquiry into pedestrian encroachment. 9 That is the regulator behaving as a safety case is meant to behave: the claim that the post-collision behavior was defective was raised, evidenced, and acted on through the 950-unit recall, 7 10 and the residual concern was kept open rather than treated as closed merely because the adjacent defect had been addressed. The failure was never that a fallback was absent; it was that the fallback was licensed to fire on a world-model the system had no warrant to trust, and that the organization then withheld the very evidence the safety case needed to be corrected.

Section 05

The Assumed-vs-Observed Gap as the Real Failure Mode

Generalize the Cruise mishap and a clean failure mode emerges, distinct from any single coding defect. The minimal-risk maneuver fired on an internal model of the world-state — no obstruction, a lateral collision, the path clear — that diverged from the observed world — a pedestrian pinned beneath the vehicle, her legs partially in camera view. 6 The maneuver was correct for the assumed state and lethal for the real one. The error was not that a fallback was absent; it was that the fallback was executed against a world-model the system had no warrant to trust, and the divergence between that model and the observation was never treated as a reason to stop and reconsider.

This is the road's face of the forecast-versus-observation gap that the aviation sibling identifies as the load-bearing error. 26 There, an aircraft committed past its point of safe return on a forecast of conditions that the observation later contradicted, with the designed fallback already foreclosed. Here, a vehicle committed an irreversible repositioning maneuver on an assumed world-state that the available observation contradicted. The shared signature is a system acting on a belief about its situation that the world had quietly invalidated, past the point where it still checked. The danger is never merely that the model was wrong; models are sometimes wrong. The danger is that the system held a belief about its safe harbour — "the path ahead is clear; pulling forward reaches a stable stop" — that the world had already falsified, and acted on the belief without a gate that forced the divergence to surface.

The corrective is structural and conservative, and it is the same across aviation, the road, and agents: treat divergence between the predicted environment and the observed environment as a first-class abstain-or-divert trigger, and refuse to commit an irreversible maneuver on a predicted state when the observation is available to contradict it. Koopman's own counterfactual for the Cruise case is exactly this posture: "more conservative operational approaches could have avoided the dragging portion of the mishap entirely, such as waiting for remote confirmation before moving after a crash with a pedestrian." 6 The repositioning maneuver was the irreversible step; the conservative posture was to hold at the already-reached minimal risk condition — stopped — and escalate for human confirmation before moving again. The system instead pressed forward on the strength of an unverified model. The statistical-learning literature formalizes the sound move as selective prediction: when the evidence supporting an assertion is below threshold, abstain rather than assert. 26 In a stopped robotaxi after a collision, abstention is staying stopped.
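As a rough sketch of that posture, the gate below refuses an irreversible move whenever an observation contradicts the predicted state, and also when an assumption is simply unverified and no remote confirmation has arrived. The function name, the dictionary representation of world-state, and the key names are hypothetical; this is not how any fielded system represents its world-model.

```python
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"
    HOLD_AND_ESCALATE = "hold_and_escalate"


def gate_irreversible_move(predicted: dict, observed: dict,
                           remote_confirmed: bool = False) -> Decision:
    """Allow an irreversible maneuver only when the observation backs the prediction."""
    # Assumptions that an available observation directly contradicts.
    contradicted = [k for k, v in predicted.items()
                    if k in observed and observed[k] != v]
    # Assumptions with no observation at all behind them.
    unverified = [k for k in predicted if k not in observed]

    if contradicted:
        # Divergence is a first-class trigger: stay at the stopped state.
        return Decision.HOLD_AND_ESCALATE
    if unverified and not remote_confirmed:
        # Nothing contradicts the model, but nothing supports it either.
        return Decision.HOLD_AND_ESCALATE
    return Decision.PROCEED


assumed = {"collision_type": "lateral", "path_clear": True}

# An available observation contradicts the assumed clear path.
print(gate_irreversible_move(assumed, {"path_clear": False}))
# No observation at all, and no human has confirmed.
print(gate_irreversible_move(assumed, {}))
# Every assumption is matched by an observation.
print(gate_irreversible_move(assumed, {"collision_type": "lateral", "path_clear": True}))
```

The design choice worth noting is that a contradiction holds the vehicle even if a confirmation flag is set: in this sketch, resolving a contradicted model is left to the human out of band rather than overridden by a boolean.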

Section 06

Earning the Envelope: ODD Expansion by Measured Reliability

The discipline done right looks different, and it has a public exemplar. Waymo publishes a Safety Case Approach that defines a safety case as "a documented body of evidence that provides a compelling, comprehensible and valid argument that a system is safe for a given application in a given environment," structured as claims, arguments, and evidence under a top-level goal of Absence of Unreasonable Risk. 17 This is the same structured-argument approach that UL 4600 standardizes — the safety-case definition the field uses traces to UK MoD Defence Standard 00-56 and was subsequently adopted into UL 4600 — and Waymo aligns its framework with that external best practice in briefings to regulatory working groups. 17 18 The honest caveat is that the published Safety Case Approach document is not the place to read an in-text "UL 4600" name-check; the linkage is solid at the standards and field level, where UL 4600 codifies precisely the structured-argument method Waymo publishes.

The argument is only as good as the evidence under it, and this is where the operating envelope stops being asserted and starts being earned. Through the end of October 2023 — contemporaneous with the Cruise mishap — Waymo had accumulated 7.14 million rider-only miles across Phoenix, San Francisco, and Los Angeles, and reported in December 2023 an 85% reduction in any-injury-reported crash rate and a 57% reduction in police-reported crash rate relative to human benchmarks over the same road miles. 19 The figures were subsequently published in the peer-reviewed literature, 20 and the record kept accumulating: a follow-up peer-reviewed study reported crash rates by crash type at 56.7 million rider-only miles in 2025. 21 Waymo's safety-readiness methodology gates operational expansion on demonstrated safety performance — the operating envelope grows as the field-data argument supports it, not as ambition outpaces it. 22

The envelope is therefore a renewable lease against a measured record, not a one-time grant. That is the through-line to the reliability sibling of this series, which argues earned autonomy by measured failure-rate directly: just as a twin earns a higher diversion-time tier by demonstrating a low, stable world-fleet in-flight-shutdown rate, an autonomous vehicle earns a wider operational design domain by demonstrating a low, stable, and independently scrutinized field-safety record. The grant contracts on drift, narrowing automatically when the observed rate deteriorates, because the right to operate is a standing claim about the evidence, not a certificate immune to the field. The operating envelope itself (the operational design domain that is the subject of the first article in this series) is earned, not asserted, and the minimal-risk maneuver is what makes operating up to its edge responsible: the safety case argues over the domain, and the fallback defends its boundary.
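One way to picture a lease that narrows on drift is to re-derive the granted envelope from the field record each period. The sketch below is purely illustrative: the tier names, the rate thresholds, and the choice of a score-type upper bound on a Poisson event rate are all assumptions of this example, not anything published by a regulator or an operator.

```python
import math

Z = 1.96  # assumed normal quantile for a roughly 95% two-sided interval


def rate_upper_bound(events: int, exposure: float) -> float:
    """Score-interval upper bound on a Poisson rate (events per unit exposure)."""
    mean_upper = events + Z * Z / 2 + Z * math.sqrt(events + Z * Z / 4)
    return mean_upper / exposure


def granted_tier(events: int, exposure: float, tiers):
    """Widest tier whose threshold the upper bound still meets.

    `tiers` is ordered narrowest to widest, with thresholds that tighten
    as the envelope widens. Returns None when even the narrowest tier fails.
    """
    upper = rate_upper_bound(events, exposure)
    granted = None
    for name, max_rate in tiers:
        if upper <= max_rate:
            granted = name
        else:
            break
    return granted


# Hypothetical envelope tiers: (name, tolerated events per million miles).
TIERS = [("depot_only", 50.0), ("daytime_urban", 10.0), ("all_hours_urban", 4.0)]

print(granted_tier(10, 5.0, TIERS))   # 10 events over 5 million miles
print(granted_tier(30, 5.0, TIERS))   # the record deteriorates
print(granted_tier(300, 5.0, TIERS))  # the record collapses
```

Because the grant is recomputed from the record rather than stored, a deteriorating rate narrows the envelope without anyone having to decide to revoke it.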

The regulated ledger that makes this auditable in California is the annual Autonomous Vehicle Disengagement Report, in which every testing permit-holder files miles driven and disengagement events with the DMV. 23 For the 2023 reporting year Waymo reported 17,311 miles per disengagement over 3,669,962 autonomous miles, against an industry-wide total exceeding nine million test miles in the period. 24 25 The honest qualification belongs in the argument, not in a footnote: the disengagement rate is a widely criticized, partially gameable metric — it depends on operator-defined disengagement criteria, the difficulty of the operational design domain driven, and reporting discretion, so it is necessary-but-insufficient evidence. 24 It mirrors the reliability sibling's point exactly: the rate that earns the envelope must be measured, stable, and independently validated, not merely reported. An envelope expanded on a self-graded metric is an envelope asserted, not earned.

Section 07

Mapping to AI Agents: Abstention, Rollback, and the Living Trace

The translation to autonomous software agents is exact, and it binds to the five obligations the LLM-Agent Assurance Standard (LAAS) specifies for the actions an agent commits at runtime. Each AV mechanic maps to an agent mechanic and to one obligation, and the binding is to the mechanic, not merely the label.

Abstention and rollback are "always a runway." The minimal-risk maneuver is the vehicle's reachable safe harbour; the agent equivalent is a reachable rollback, abstention, or escalation at the moment of action. Every consequential action must carry a pre-identified undo, abandon, or escalate path that remains executable from the state the agent will be in when it fails — adequate to the worst-case arrival state and reachable inside the worst-case step budget. An agent with no abstention mechanism at its rollback horizon is a classifier forced to answer every query, the failure mode the selective-prediction literature exists to prevent. 26 27 28

Never commit past the rollback horizon on a predicted state. Cruise executed the pull-over on an assumed world-state that diverged from the observed one. The agent rule is the direct extension: do not commit an irreversible side effect on a predicted environment; treat predicted-versus-observed divergence as a first-class abstain-or-divert trigger. The bounded-risk implementation is conformal abstention — calibrate a threshold that holds the rate of committed-and-wrong actions at or below a chosen tolerance, and divert whenever confidence falls under it. 27 This is where OBL-INP-001, untrusted-input tier-raising, binds: LAAS requires that an action driven by untrusted input be gated at CT3 or above or blocked, and that the trace record whether the input was trusted. A degraded or self-contradicted world-model — the mis-classified collision is precisely such an event — is an untrusted input, and the action it drives must be tier-raised toward abstention, not executed on the unverified model.
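A minimal sketch of the threshold calibration, written in the style of conformal risk control, might look like the following. It assumes a held-out calibration set of (confidence, was-wrong) pairs that is exchangeable with deployment data; under that assumption the finite-sample correction `(errors + 1) / (n + 1)` targets an expected committed-and-wrong rate of at most `alpha`. The function names and the data are hypothetical.

```python
import math


def calibrate_threshold(confidences, wrong, alpha):
    """Smallest threshold t such that committing whenever confidence >= t
    keeps (errors + 1) / (n + 1) <= alpha on the calibration set."""
    n = len(confidences)
    # Raising t can only remove committed errors, so the first t that
    # qualifies in ascending order is the least conservative valid choice.
    for t in sorted(set(confidences)):
        errors = sum(1 for c, w in zip(confidences, wrong) if c >= t and w)
        if (errors + 1) / (n + 1) <= alpha:
            return t
    return math.inf  # no threshold qualifies: always abstain


def decide(confidence, threshold):
    """Commit only at or above the calibrated threshold; otherwise divert."""
    return "commit" if confidence >= threshold else "abstain"


# Made-up calibration data: 19 past actions with confidences 0.05 .. 0.95.
conf = [i / 20 for i in range(1, 20)]
was_wrong = [i in {1, 2, 3, 4, 14} for i in range(1, 20)]

threshold = calibrate_threshold(conf, was_wrong, alpha=0.12)
print(threshold, decide(0.2, threshold), decide(0.9, threshold))
```

Note the failure direction: when the calibration set is too small or too error-prone to support the tolerance, the function returns infinity and every action abstains, which is the conservative default the text argues for.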

The post-collision maneuver was a CT4 action. Under LAAS, an irreversible or high-consequence action — Consequence Tier 4 — requires everything CT3 requires plus human approval and defaults to abstention. OBL-IRR-001 is exactly this: the repositioning maneuver was irreversible, high-consequence, and adjacent to a vulnerable human; its correct default was abstain-and-await-remote-confirmation — Koopman's "wait for remote confirmation before moving" — not autonomous execution. 6 The maneuver violated the abstention default. OBL-HUM-001 sharpens the point: at CT4 a human verifier is required in addition, and that human must be reachable in the failure window, not merely staffed on an org chart. The adequate-versus-reachable test applies to the human exactly as it applies to a runway — a remote operator who cannot be reached in the seconds before the wheels turn is an airport with no firefighting cover.
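The tier logic in the last two paragraphs can be sketched as a small gate. Only the rules stated in the text are encoded: untrusted input raises an action to at least CT3, and CT4 defaults to abstention unless a human who is reachable in the failure window has approved. The `Action` type, its field names, and the returned strings are hypothetical, and what a CT3 or lower gate actually checks is left out because the text does not specify it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    tier: int                                # consequence tier claimed for the action, 1..4
    input_trusted: bool
    human_approved: bool = False
    human_reachable_in_window: bool = False  # reachable in time, not merely staffed


def effective_tier(a: Action) -> int:
    """Untrusted input raises the action to at least CT3."""
    return a.tier if a.input_trusted else max(a.tier, 3)


def decide(a: Action) -> str:
    tier = effective_tier(a)
    if tier >= 4:
        # CT4 defaults to abstention; approval only counts if the human
        # could actually be reached inside the failure window.
        if a.human_reachable_in_window and a.human_approved:
            return "execute"
        return "abstain"
    # Lower tiers are handed to their own gate, not modeled in this sketch.
    return f"gate_at_CT{tier}"


reposition = Action("reposition_after_collision", tier=4, input_trusted=False)
print(decide(reposition))
```

Modeling reachability as a field separate from approval keeps the adequate-versus-reachable test explicit: an approval recorded by someone who could not have responded in time does not open the gate.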

The append-only decision trace is the living safety case. UL 4600 requires the safety case to be continuously updated from field data; OBL-TRC-001 is the agent realization. LAAS requires the trace to be written to an append-only sink the actor cannot rewrite, with per-actor hash-chains periodically anchored into a shared Merkle root — the enforcement-plane design carried over from the supervisor sibling of this stack, which owns those trace mechanics. Each invoked fallback, each abstention, and each observed-versus-predicted divergence is appended as evidence. The trace is not a log about the safety case; it is the safety case, evidenced action by action. A safety case that lives in a slide deck is a checklist; a safety case that lives in a tamper-evident, continuously-appended trace is an argument that can be re-attacked at any time against what actually happened.
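To make the tamper-evidence idea concrete, here is a minimal hash-chain sketch using only the Python standard library. It is not the LAAS trace format: the record fields, the genesis value, and the Merkle construction (duplicating the last node on odd levels) are assumptions of this example. An in-process Python object also cannot enforce append-only storage; a real sink would live outside the actor's control, and this sketch only shows why rewriting history is detectable.

```python
import copy
import hashlib
import json

GENESIS = "0" * 64


def _digest(prev_hash: str, record: dict) -> str:
    """Hash of the previous entry's hash plus a canonical encoding of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class Trace:
    """Per-actor chain: each entry commits to everything before it."""

    def __init__(self):
        self._entries = []

    def append(self, record: dict) -> str:
        prev = self.head()
        entry_hash = _digest(prev, record)
        self._entries.append({"record": record, "prev": prev, "hash": entry_hash})
        return entry_hash

    def head(self) -> str:
        return self._entries[-1]["hash"] if self._entries else GENESIS

    def entries(self):
        return copy.deepcopy(self._entries)


def verify(entries) -> bool:
    """Recompute the chain; any edited, dropped, or reordered entry breaks it."""
    prev = GENESIS
    for e in entries:
        if e["prev"] != prev or e["hash"] != _digest(prev, e["record"]):
            return False
        prev = e["hash"]
    return True


def merkle_root(heads) -> str:
    """Fold per-actor chain heads into one root for periodic anchoring."""
    if not heads:
        return GENESIS
    level = [bytes.fromhex(h) for h in heads]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hex()


trace = Trace()
trace.append({"kind": "fallback_invoked", "action": "hold"})
trace.append({"kind": "divergence", "predicted": "path_clear", "observed": "blocked"})
trace.append({"kind": "abstention", "reason": "awaiting confirmation"})
print(verify(trace.entries()), merkle_root([trace.head()]))
```

Anchoring the root somewhere the actor cannot write is what turns local verification into something an auditor can rely on.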

The equivalence is worth stating sharply because it is the load this paper adds to the agent stack. UL 4600's requirement that the safety case be maintained from field experience is, in software terms, a requirement that the argument be backed by an append-only record of what the system actually did — every fallback it invoked, every action it abstained from, every moment its observation diverged from its prediction and it diverted. LAAS's decision trace is exactly that record, given an enforcement-plane that the constrained actor cannot edit: the gate runs out-of-process, the bundle is signed and version-pinned, and a trace asserting an in-process or unsigned gate is non-conformant regardless of its verdict. The consequence is that the safety case and the trace stop being two artifacts. The trace, evaluated against the conformance predicate, is the evidence layer of the safety case; the obligations are its claims; the per-action verdicts are its arguments. A regulator or an auditor does not read a document that describes what the agent is supposed to do — they re-evaluate the trace against the policy and read what it did, action by action, at the tier the gate assigned rather than the tier the actor claimed.

Residual tolerance is per-tier, and it scales with blast radius. UL 4600 tracks residual risk through Safety Performance Indicators; OBL-RES-001 sets the per-tier escape-rate tolerances — 0.02 at CT2, 0.005 at CT3, and 0 at CT4 — and validates them against the upper bound of the confidence interval, never the point estimate. The tolerated rate of committed-and-wrong actions tightens with consequence. The road analogue is unforgiving: the residual tolerated near a vulnerable road user is effectively zero, which is why a CT4 maneuver demands a deterministic or human gate rather than a probabilistic threshold. And the envelope itself — longer horizons, more consequential actions, fewer human gates — is a derived quantity earned by a measured, stable, independently-validated undetected-failure rate, exactly as the operational design domain is earned by the disengagement-and-miles ledger. The autonomy is the output; the measured reliability is the input.
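The difference between validating against the point estimate and against the upper bound is easy to see numerically. The sketch below uses the per-tier tolerances quoted above; the choice of a Wilson score interval and a 95% confidence level is an assumption of this example, since the text does not say which interval method is used.

```python
import math
from statistics import NormalDist

# Per-tier escape-rate tolerances as quoted in the text.
TOLERANCE = {"CT2": 0.02, "CT3": 0.005, "CT4": 0.0}


def wilson_upper(escapes: int, actions: int, confidence: float = 0.95) -> float:
    """Upper end of the two-sided Wilson score interval for a proportion."""
    if actions <= 0:
        return 1.0  # no evidence at all: assume the worst
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = escapes / actions
    denom = 1 + z * z / actions
    centre = (p + z * z / (2 * actions)) / denom
    half = z * math.sqrt(p * (1 - p) / actions + z * z / (4 * actions ** 2)) / denom
    return centre + half


def conforms(tier: str, escapes: int, actions: int) -> bool:
    """Compare the interval's upper bound, not the point estimate, to the tolerance."""
    return wilson_upper(escapes, actions) <= TOLERANCE[tier]


# 2 escapes in 1000 actions: the point estimate 0.002 is under the CT3
# tolerance, but the upper bound is not.
print(2 / 1000 <= TOLERANCE["CT3"], conforms("CT3", 2, 1000))
# A zero tolerance can never be met by a finite sample's upper bound.
print(conforms("CT4", 0, 1_000_000))
```

The second result is the arithmetic behind the claim that a CT4 action needs a deterministic or human gate: no amount of clean history drives a statistical upper bound to exactly zero.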

Section 08

The Inversion Pays Off: Fallback Discipline Enables Wider Autonomy

The counter-intuitive payoff is that the reachable minimal-risk maneuver and the field-updated safety case are what unlock unsupervised operation and envelope growth — not a tax on them. This is the same inversion the aviation sibling draws from ETOPS: a stricter safe-harbour requirement, rigorously discharged, buys a less constrained trajectory, because the constraint the discipline removes — the conservative posture flown out of uncertainty about safe harbours — is far more expensive than the discipline of proving them. 26 The diversion-adequacy framework let twins fly the direct route; the minimal-risk-maneuver discipline and the structured safety case are what let an automated vehicle drive unsupervised at all.

Waymo's expansion is the inversion made concrete. The operating envelope grew — more cities, more conditions, fewer gates — because the field-data argument supported each step, the disengagement ledger accumulated, and the safety case was continuously re-evidenced. 19 22 24 The reachable fallback and the living argument were the precondition for the wider grant, not a brake on it. The contrast is the organization that refuses to pre-compute the minimal-risk maneuver and refuses to maintain the safety case as a living argument: it is stuck shipping a safety driver forever, because it can never assemble the evidence that would license the next expansion. It either over-constrains its system into commercial uselessness or — worse — lets it operate unsupervised without the safe harbours that would make unsupervised operation responsible, which is the posture that produced a vehicle dragging a pedestrian 20 feet on a mis-classified collision. 6

The AI corollary follows without strain, and it is the operating thesis of this entire stack. You grant an agent wider autonomy — longer horizons, more consequential actions, less human gating — precisely because you can prove a reachable, adequate fallback is always in the worst-case envelope and the trace substantiates it. Fallback rigor is what converts a capable agent into autonomy you can actually grant, exactly as diversion-adequacy doctrine converts a capable airframe into a profitable route. The fallback is not the cap on autonomy. It is the feature that licenses it.

The economics of the inversion are not incidental to the safety argument; they are why the discipline survives contact with a product roadmap. An organization that treats the minimal-risk maneuver and the safety case as cost centers — overhead to be minimized before launch — will under-invest in exactly the machinery that would let it widen the envelope, and will find itself permanently unable to remove the safety driver or expand the operational design domain, because it never accumulated the evidence that licenses the next step. The organization that treats them as the enabling investment they are spends the same effort and gets a wider envelope in return, because each rung of expansion is pre-justified by the field record the discipline produces. This is the same shape the reliability sibling draws from the in-flight-shutdown gate and the banking backtest: the bound is not asserted, it is operated under measurement, and the measurement is what unlocks the next tier. The cost of the discipline is real and front-loaded; the cost of skipping it is a suspended permit, a recall, and a disclosure scandal, paid later and at interest.

Section 09

The Posture: Pre-Compute the Point of Safe Return; Make the Trace the Safety Case

The posture that falls out of the doctrine reduces to three before-acting commitments, each with an honest limit.

First, pre-compute the point of safe return for every action. Before any consequential or irreversible action, identify the reachable minimal-risk maneuver — the abstain, rollback, or escalate path — and confirm it is adequate to the worst-case arrival state and reachable inside the worst-case envelope of latency, degraded sensing, and the post-event world. 1 2 The minimal risk condition must be verified against the world the system will actually be in, not the world it assumes. Cruise's lesson is the whole of this commitment in one sentence: a minimal-risk maneuver computed against a wrong world-model is a maneuver that fires and harms. 6

Second, never commit past the rollback horizon on a predicted world-state. Treat divergence between the predicted environment and the observed environment as a first-class abstain-or-divert trigger; for irreversible high-consequence actions, default to abstention and human approval. "Wait for remote confirmation before moving" is the posture; 6 "execute the maneuver and hope the model was right" is the anti-posture that the regulators priced at a suspended permit, a 950-unit recall, and two distinct penalties. 7 11 8 14 The bounded-risk implementation is conformal abstention: hold the committed-and-wrong rate below tolerance, and divert whenever a commitment would breach it. 27
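One way to read conformal abstention is as split conformal prediction with a commit-only-on-singleton rule. The sketch below assumes a model that outputs a probability per candidate action and a held-out calibration set scored by `1 - p(true action)`; the function names and the scoring choice are illustrative assumptions, not the method of the cited reference.

```python
import math


def calibrate(cal_scores: list[float], alpha: float) -> float:
    """Return the split-conformal threshold for miscoverage level alpha.

    cal_scores holds the nonconformity score of the TRUE action on each
    held-out calibration case (here assumed to be 1 - p(true action)).
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too little calibration data: always abstain
    return sorted(cal_scores)[k - 1]


def decide(probs: dict[str, float], threshold: float):
    """Commit only when exactly one action survives; otherwise abstain."""
    plausible = [a for a, p in probs.items() if 1 - p <= threshold]
    return plausible[0] if len(plausible) == 1 else None  # None = abstain
```

A committed-and-wrong action requires the true action to fall outside the plausible set, so under the usual exchangeability assumption its rate is bounded by `alpha`. When calibration data is too thin the threshold becomes infinite, every action stays plausible, and the sketch abstains, which matches the default-to-abstention posture.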

Third, make the trace the safety case. Maintain an append-only, tamper-evident decision trace that records every fallback invoked, every abstention, every predicted-versus-observed divergence, and the field-reliability evidence that earned the operating envelope. Safety is the continuously-updated argument this trace constitutes, in the UL 4600 sense — not a checklist signed once at launch. 5 The right to keep operating is a standing claim about the measured record, contracted on drift; the trace is what makes that claim auditable rather than asserted.
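A hash chain is one common way to make an append-only record tamper-evident. The sketch below is a minimal illustration under that assumption: the class name, the event fields, and the use of SHA-256 over canonical JSON are choices made here, not the trace format this paper's stack specifies.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


class DecisionTrace:
    """Append-only log in which each entry commits to its predecessor."""

    def __init__(self) -> None:
        self._entries: list[dict] = []

    def append(self, event: dict) -> str:
        """Record an event and return the hash that seals it."""
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        body = json.dumps(event, sort_keys=True, separators=(",", ":"))
        digest = hashlib.sha256((prev + body).encode("utf-8")).hexdigest()
        self._entries.append({"prev": prev, "event": body, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = GENESIS
        for entry in self._entries:
            expected = hashlib.sha256(
                (prev + entry["event"]).encode("utf-8")
            ).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True
```

A chain like this only detects tampering if the latest hash is also held somewhere the actor cannot write, such as the out-of-process gate; an actor able to rewrite the whole log could otherwise recompute every hash.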

The honest limits deserve the same prominence as the commitments, and the Cruise case supplies most of them. A minimal-risk maneuver can be mis-specified — the target state can be defensible on paper and lethal in the arrival state, which is exactly what J3016's "does not guarantee actual safety" caveat warns of and exactly what the dragging realized. 2 6 A safety case can be argued against the wrong world-model: an argument is only as sound as the world it assumes, and a case evidenced against a clean-world abstraction is a confident summary that papers over the gap. And the residual migrates upstream — to the analyst who decided what "minimal risk" means for a given arrival state, which fallbacks belong in the set, and how untrusted a degraded world-model must be before it raises the tier. This is not a defect of the doctrine but its honest cost: it converts diffuse, in-the-moment failure into concentrated, nameable, fixable failure in the design layer — which is precisely the relocation that makes the residual auditable, and precisely why the design layer must be held to the standard the doctrine implies.

None of these limits is a reason to abandon the discipline; they are its own statement of scope, which is what makes it trustworthy. The closing line is the one the road earned the hard way and the agent field can adopt without repeating the lesson at a vulnerable road user's expense: the pull-over is not autonomy giving up. The always-reachable fallback is what licenses the autonomy, and the trace of fallbacks fired — re-evidenced against the world that was actually observed — is the safety case. The fallback is the feature.

The brief companion to this paper, The Fallback Is the Feature, introduces the inversion in a shorter form. This is the AV-framed sibling of Always a Runway, which proves the diversion-airport doctrine from aviation; Reliability You Can Bank develops earned autonomy by measured failure-rate; and The LLM-Agent Assurance Standard specifies the CT0–CT4 lattice and the obligations bound here.

End of paper

Fallback Doctrine Unlocks Autonomy

Context

The Finding