The Fallback Is the Feature

KellerAI

Section 01

The maneuver that licenses the autonomy.

A driving automation system is permitted to operate unsupervised only because, at every instant, it holds a pre-computed maneuver that brings the vehicle to a stable, stopped, low-risk state. SAE's standard for driving automation names that target the minimal risk condition — the state the system reaches after performing its fallback when a trip cannot or should not be completed; the 2021 revision sharpened it to a stable, stopped condition. The maneuver that drives the vehicle there is, by definition, the response invoked after a failure or upon exit from the region the system was cleared to operate in. It exists precisely for the moment ordinary driving can no longer be carried out.

This is the same inversion Always a Runway draws from aviation. ETOPS does not grant a twin permission to fly farther from a runway; it defines the maximum time the jet may ever be from a runway it can actually reach with one engine out. Range is an output of the calculation, not an input — derived from where a reachable, adequate safe harbour sits in the worst-case envelope. The minimal risk condition is the road's version of the diversion airport. The operating envelope a vehicle is granted — its operational design domain, the subject of the sibling article on autonomy as an envelope — is the region where the fallback stays reachable from the degraded state the system would really be in. The capability to drive the route is not the question. The question is whether the fallback that catches a failed route is still inside the envelope at every step.
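The idea that the envelope is an output rather than an input can be made concrete with a toy calculation. The sketch below is illustrative only: it treats a route as points along a line, and the function name `reachable_envelope`, the harbour positions, the degraded speed, and the time budget are all hypothetical values chosen for the example, not figures from ETOPS or SAE J3016.

```python
from typing import List


def reachable_envelope(route_km: List[float],
                       harbours_km: List[float],
                       degraded_speed_kmh: float,
                       time_budget_min: float) -> List[float]:
    """Return the route points from which some safe harbour is still reachable.

    The envelope is derived: it is whatever part of the route lies within
    the distance the degraded vehicle can cover inside the time budget.
    """
    max_reach_km = degraded_speed_kmh * time_budget_min / 60.0
    return [p for p in route_km
            if any(abs(p - h) <= max_reach_km for h in harbours_km)]


# Hypothetical route with safe stopping places only at each end.
route = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
harbours = [0.0, 50.0]
envelope = reachable_envelope(route, harbours,
                              degraded_speed_kmh=30.0, time_budget_min=20.0)
print(envelope)  # the middle of the route falls outside the envelope
```

Under these assumed numbers the vehicle can cover 10 km in its degraded state, so the points at 20 km and 30 km are excluded even though the vehicle is perfectly capable of driving through them. Adding a harbour mid-route, not improving the driving, is what would widen the envelope.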

Read this way, the pull-over is not the system conceding defeat. It is the system exercising the very mechanism that earns it the right to drive at all. A vehicle that could not, at any instant, bring itself to a stable stop would not be a more ambitious autonomous vehicle; it would be an unlicensable one, because the envelope it could be granted would collapse to zero. When a regulator writes a rule in terms of the minimal risk condition, it is writing the rule around the fallback, which is the clearest signal that the fallback, not the headline capability, is the thing the license is actually about.

The minimal-risk maneuver is not the moment autonomy fails. It is the precondition that licenses autonomy at all — the operating envelope is a derived quantity that falls out of where the fallback stays reachable, never an absolute capability claim.

The inversion

Section 02

Adequate, and actually reachable.

Two words carry the doctrine, and conflating them is the most common way a fallback that looks sound turns out not to be. The minimal risk condition is the end-state: a stable, stopped configuration. The minimal-risk maneuver is the fallback that drives the vehicle there — the steering, braking, and lane choice that get it from the failure to the stop. A system can possess a well-defined target on paper and still fail catastrophically if the maneuver that reaches it is wrong for the state the vehicle is actually in. The standard names the target; it explicitly does not guarantee actual safety. "Come to a stop" is correct in a world with no one under the vehicle and lethal in a world where there is.

A fallback that merely exists is not the same as a fallback adequate to the state the vehicle will be in. Adequacy is a property of the arrival state: the stop must be safe for the world the vehicle finds after the failure — the post-collision world, the degraded-sensing world, the world with a vulnerable road user in an unexpected position — not the clean world the planner assumed. Reachability is a property of the envelope: the maneuver must complete inside the worst-case budget of latency, sensing degradation, and time-to-stop. A maneuver that needs more perception confidence, more time, or more healthy actuators than the vehicle will have at the moment of failure is an airport outside the single-engine envelope — present on the map, foreclosed in fact. Both properties are required, and neither substitutes for the other: a maneuver that is reachable but inadequate completes on time into an unsafe state; one that is adequate but unreachable would be safe if it could finish, but the worst-case envelope forecloses it.
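The two tests above can be sketched as separate predicates that a candidate maneuver must both pass. This is a minimal illustration under stated assumptions: the types `DegradedState` and `Maneuver`, their fields, and every number are hypothetical and do not come from SAE J3016 or any production planner.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DegradedState:
    """Assumed worst-case condition at the moment of failure."""
    detection_latency_s: float    # time needed to notice the failure
    perception_confidence: float  # 0..1, after sensing degradation
    time_budget_s: float          # time available before the hazard materializes
    person_under_vehicle: bool    # hazard present in the arrival state


@dataclass(frozen=True)
class Maneuver:
    name: str
    time_to_stop_s: float
    min_perception_confidence: float
    moves_vehicle: bool           # True if the maneuver repositions the vehicle


def is_reachable(m: Maneuver, s: DegradedState) -> bool:
    """Reachability: the maneuver completes inside the worst-case budget."""
    fits_time = s.detection_latency_s + m.time_to_stop_s <= s.time_budget_s
    fits_sensing = s.perception_confidence >= m.min_perception_confidence
    return fits_time and fits_sensing


def is_adequate(m: Maneuver, s: DegradedState) -> bool:
    """Adequacy: the arrival state is safe for the world actually present."""
    return not (m.moves_vehicle and s.person_under_vehicle)


def select_fallback(candidates: List[Maneuver],
                    s: DegradedState) -> Optional[Maneuver]:
    """Return the first candidate that is both reachable and adequate."""
    for m in candidates:
        if is_reachable(m, s) and is_adequate(m, s):
            return m
    return None


state = DegradedState(detection_latency_s=0.5, perception_confidence=0.4,
                      time_budget_s=6.0, person_under_vehicle=True)
pull_over = Maneuver("pull_over", time_to_stop_s=5.0,
                     min_perception_confidence=0.7, moves_vehicle=True)
hold = Maneuver("hold_position", time_to_stop_s=0.0,
                min_perception_confidence=0.0, moves_vehicle=False)
chosen = select_fallback([pull_over, hold], state)
print(chosen.name if chosen else "no fallback available")
```

Keeping the two predicates separate is the point of the sketch: a maneuver can fail either one independently, and a `None` result signals that the envelope has already been exceeded rather than that a weaker fallback should be accepted.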

A fallback that cannot catch the failure you will actually be in is not a fallback. It is a dot on the chart that looks like a runway — adequate on the happy path, foreclosed in the worst case.

The two words

Section 03

Safety is an argument, not a checklist.

The second half of the doctrine is that the proof of safety is not a list of boxes. UL 4600 — the Standard for Safety for the Evaluation of Autonomous Products, first published in 2020 — does not ask whether a fixed checklist was completed. It asks whether the developer can present a structured safety case: a set of goals (claims), each supported by an evidence-based argument, with the evidence attached. It is deliberately goal-based rather than prescriptive — it specifies what a safety case must address, not which engineering approach to take.

The distinction is not cosmetic. A checklist is static; tick the boxes and you are done. A safety case is an argument that can be attacked. Each claim carries a burden of proof; the argument can be challenged for a gap; the evidence can be shown to be stale or unrepresentative. UL 4600 is a standard of care, not a pass/fail correctness test — conformance asserts that the right argument was made, with evidence, never that no harm can occur. And the property that makes it the right instrument for an open-world system is that it is built to be maintained: residual risk is tracked through performance indicators, and the case is updated as field data accumulates. An autonomous vehicle operates in a world it cannot fully enumerate; a safety case frozen at launch is an argument about a world that no longer exists by the second week. The right to keep operating is a standing claim about a measured record, not a framed certificate on the wall.
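One way to picture a safety case that can be attacked is as a tree of claims whose leaves must carry current evidence. The sketch below is an assumption-laden illustration, not a structure taken from UL 4600: the `Claim` and `Evidence` types, the `open_challenges` function, and the 90-day freshness window are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List


@dataclass
class Evidence:
    description: str
    collected_on: date


@dataclass
class Claim:
    statement: str
    argument: str
    evidence: List[Evidence] = field(default_factory=list)
    subclaims: List["Claim"] = field(default_factory=list)


def open_challenges(claim: Claim, today: date,
                    max_age: timedelta) -> List[str]:
    """List leaf claims with no evidence newer than max_age.

    A claim with subclaims is treated as supported through them; a leaf
    claim must carry at least one fresh piece of evidence itself.
    """
    problems: List[str] = []
    fresh = [e for e in claim.evidence if today - e.collected_on <= max_age]
    if not claim.subclaims and not fresh:
        problems.append(claim.statement)
    for sub in claim.subclaims:
        problems.extend(open_challenges(sub, today, max_age))
    return problems


stale_leaf = Claim("Fallback completes within the time budget",
                   "Closed-course stop tests",
                   [Evidence("stop-test campaign", date(2023, 1, 1))])
fresh_leaf = Claim("Perception degrades gracefully in rain",
                   "Field performance indicators",
                   [Evidence("fleet indicator report", date(2024, 5, 1))])
top = Claim("Operation poses no unreasonable risk",
            "Decomposed into fallback and perception claims",
            subclaims=[stale_leaf, fresh_leaf])

print(open_challenges(top, today=date(2024, 6, 1),
                      max_age=timedelta(days=90)))
```

Running the check on a schedule, rather than once at launch, is what distinguishes this from a checklist: the same claim can pass one month and be open to challenge the next without any change to the system.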

Treat safety as an argument that can be attacked, not a checklist that was ticked. A claim with stale evidence is a claim that has quietly become false — and the only way to know is to keep re-evidencing the argument against the field record.

The rule to learn

One companion hazard is deliberately left to the rest of the stack. The failure mode that arises with no component failure — a system working as designed and still unsafe — is the province of SOTIF, owned by the sibling article on the fault-free hazard, where the hardware is correct and the answer is wrong. This paper takes the safety case as its subject and inherits that hazard class rather than re-deriving it.

Section 04

When the fallback fires and is still wrong.

The cautionary anchor is precise because the fallback was not missing. It fired. On the evening of 2 October 2023 in San Francisco, a human-driven car struck a pedestrian and propelled her into the path of a driverless Cruise robotaxi. The Cruise vehicle braked hard but still struck her. To that point it had done roughly what a careful driver could; the initial collision was set up by another driver entirely. What happened next is the lesson. The automated system mis-classified the event as a lateral collision and commanded the vehicle to pull over out of traffic — a textbook minimal-risk maneuver — "pulling the individual forward, rather than remaining stationary." It dragged the pedestrian roughly 20 feet and came to rest with a wheel on her legs.

The maneuver was correct for the world the system believed it was in: a lateral collision, the path ahead clear. It was catastrophic for the world it was actually in: a person pinned beneath the vehicle. The system, in the analysis of UL 4600's own author, lost track of and in essence forgot a pedestrian whose legs were partially in camera view, then fired a repositioning maneuver on that diverged model. A fallback that fires on a wrong world-model is worse than no fallback, because it converts a stationary vehicle, itself a minimal risk condition, into a moving hazard.

The regulatory consequences were swift and worth keeping distinct. The California DMV suspended Cruise's deployment and driverless-testing permits effective immediately, citing vehicles not safe for public operation and misrepresentation of safety information. Cruise filed a recall covering 950 automated-driving units for the post-collision defect. Two separate penalties followed — a $1.5 million civil penalty for the reporting failure and a distinct $500,000 settlement for the false report that omitted the dragging — and a commissioned third-party investigation found that leadership had failed to disclose the dragging to regulators, a finding that preceded a wave of executive departures. The maneuver fired; the safety case was argued against the wrong world; and the organization compounded the engineering failure with a disclosure failure the regulators treated as separately culpable.

Generalize the mishap and a clean failure mode emerges, distinct from any single coding defect. The minimal-risk maneuver fired on an internal model of the world that diverged from the observed world. This is the road's face of the forecast-versus-observation gap Always a Runway names as the load-bearing error: an aircraft committing past its point of safe return on a forecast the observation later contradicted, with the designed fallback already foreclosed. The shared signature is a system acting on a belief about its situation that the world had quietly invalidated, after it had stopped checking that belief against observation. The corrective is structural and conservative: treat divergence between the predicted environment and the observed environment as a first-class abstain-or-divert trigger, and refuse to commit an irreversible maneuver on a predicted state when the observation is available to contradict it. The author's own counterfactual is exactly this posture: wait for remote confirmation before moving after a crash with a pedestrian. In a stopped robotaxi after a collision, abstention is staying stopped.
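The corrective described above can be sketched as a gate placed in front of any irreversible maneuver. This is a hypothetical illustration of the posture, not a reconstruction of Cruise's software: the `WorldBelief` and `Observation` types, their fields, and the divergence rule are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"
    ABSTAIN = "abstain"


@dataclass(frozen=True)
class WorldBelief:
    collision_type: str       # what the internal model concluded, e.g. "lateral"
    tracked_pedestrians: int  # pedestrians the model currently accounts for


@dataclass(frozen=True)
class Observation:
    pedestrians_in_view: int
    pedestrians_recently_lost: int  # tracks that vanished without leaving the scene


def diverges(belief: WorldBelief, obs: Observation) -> bool:
    """True when observation implies more people nearby than the model holds."""
    observed = obs.pedestrians_in_view + obs.pedestrians_recently_lost
    return observed > belief.tracked_pedestrians


def gate_maneuver(belief: WorldBelief, obs: Observation,
                  irreversible: bool) -> Decision:
    """Refuse an irreversible maneuver when belief and observation disagree."""
    if irreversible and diverges(belief, obs):
        return Decision.ABSTAIN
    return Decision.PROCEED


belief = WorldBelief(collision_type="lateral", tracked_pedestrians=0)
lost_track = Observation(pedestrians_in_view=0, pedestrians_recently_lost=1)
print(gate_maneuver(belief, lost_track, irreversible=True).value)
```

The design choice worth noting is that the gate does not try to decide which of the two views is correct. Disagreement alone is enough to withhold the maneuver, which in a stopped vehicle means staying stopped until the divergence is resolved.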

Section 05

Earning the envelope — and the agent that inherits the doctrine.

The discipline done right has a public exemplar. Waymo publishes a safety-case approach built as claims, arguments, and evidence under the top-level goal of absence of unreasonable risk, the same structured-argument method UL 4600 standardizes. And the argument is only as good as the evidence under it. Through the end of October 2023, contemporaneous with the Cruise mishap, Waymo had accumulated 7.14 million rider-only miles and reported an 85% reduction in any-injury-reported crash rate and a 57% reduction in police-reported crash rate relative to human benchmarks over the same roads; the figures were later published in peer-reviewed literature, and the record kept accumulating to 56.7 million rider-only miles. The operating envelope grew (more cities, more conditions, fewer gates) because the field-data argument supported each step, not because ambition ran ahead of the evidence.

The envelope is a renewable lease against a measured record, not a one-time grant — the through-line to the reliability sibling of this series, which argues earned autonomy by measured failure-rate directly. The honest qualification belongs in the argument: California's annual disengagement ledger, in which Waymo reported 17,311 miles per disengagement over 3,669,962 autonomous miles for 2023, is a widely criticized, partially gameable metric. It depends on operator-defined criteria and reporting discretion, so it is necessary-but-insufficient evidence. The rate that earns the envelope must be measured, stable, and independently validated — not merely reported.
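A short calculation shows both what the reported figures imply and why the metric is sensitive to reporting discretion. The two reported numbers are taken from the paragraph above; the reclassification scenario and the helper name `miles_per_event` are hypothetical, included only to illustrate the sensitivity.

```python
def miles_per_event(miles: float, events: int) -> float:
    """Miles driven per counted event."""
    return miles / events


reported_miles = 3_669_962
reported_miles_per_disengagement = 17_311

# The two reported figures together imply the underlying event count.
implied_events = reported_miles / reported_miles_per_disengagement
print(round(implied_events))  # about 212 disengagements

# Hypothetical: if operator-defined criteria excluded 20 of those events,
# the headline rate would improve with no change in how the fleet drove.
reclassified = miles_per_event(reported_miles, round(implied_events) - 20)
print(round(reclassified))
```

The second figure moves by roughly ten percent on a change in bookkeeping alone, which is the practical meaning of "necessary-but-insufficient": the rate is informative only alongside an independent check on what was counted.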

The translation to autonomous software agents is exact. The minimal-risk maneuver is the vehicle's reachable safe harbour; the agent equivalent is a reachable rollback, abstention, or escalation at the moment of action. Every consequential action must carry a pre-identified undo, abandon, or escalate path that is executable from the state the agent will be in when it fails — adequate to the worst-case arrival state and reachable inside the worst-case step budget. Never commit past the rollback horizon on a predicted state: treat predicted-versus-observed divergence as a first-class abstain-or-divert trigger, exactly as Cruise should have held at the stop it had already reached. An irreversible, high-consequence action adjacent to a human defaults to abstain-and-await-human — and the human must be reachable in the failure window, not merely staffed on an org chart, the adequate-versus-reachable test applied to people.
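The agent-side rule can be sketched as a decision made before each consequential action. The sketch is a minimal illustration under assumptions: the `Action` type, the `decide` function, the step budget, and the timing values are all hypothetical and not drawn from any particular agent framework.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    name: str
    rollback: Optional[Callable[[], None]]  # None means the action cannot be undone
    rollback_steps: int                     # worst-case steps the undo would need


def decide(action: Action, step_budget: int,
           human_response_s: float, failure_window_s: float) -> str:
    """Return "execute", "escalate", or "abstain" for one consequential action.

    Execute only when an undo exists and fits the worst-case step budget.
    Otherwise escalate, but only to a human who can respond inside the
    failure window; if no such human is reachable, abstain.
    """
    undo_reachable = (action.rollback is not None
                      and action.rollback_steps <= step_budget)
    if undo_reachable:
        return "execute"
    if human_response_s <= failure_window_s:
        return "escalate"
    return "abstain"


edit = Action("edit_draft", rollback=lambda: None, rollback_steps=2)
wire = Action("send_wire", rollback=None, rollback_steps=0)

print(decide(edit, step_budget=5, human_response_s=600.0, failure_window_s=60.0))
print(decide(wire, step_budget=5, human_response_s=30.0, failure_window_s=60.0))
print(decide(wire, step_budget=5, human_response_s=600.0, failure_window_s=60.0))
```

The third call is the adequate-versus-reachable test applied to people: a reviewer exists, but cannot respond inside the window, so the escalation path is treated as foreclosed and the agent abstains.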

The last move is the one this paper adds to the agent stack. UL 4600 requires the safety case to be maintained from field experience; in software terms, that is a requirement that the argument be backed by an append-only, tamper-evident decision trace of what the system actually did — every fallback invoked, every abstention, every moment its observation diverged from its prediction and it diverted. The trace is not a log about the safety case. It is the safety case, evidenced action by action. A safety case that lives in a slide deck is a checklist; one that lives in a continuously-appended, un-rewritable trace is an argument that can be re-attacked at any time against what actually happened.
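A hash chain is one common way to make a trace tamper-evident, and the sketch below shows the idea using only Python's standard `hashlib` and `json` modules. The `DecisionTrace` class and the event fields are hypothetical. Note the limits of the technique: chaining makes an edit detectable, not impossible, and a real deployment would also need write-once storage or external anchoring of the latest hash, which this sketch does not attempt.

```python
import hashlib
import json
from typing import List

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


class DecisionTrace:
    """In-memory, hash-chained record of what the system actually did."""

    def __init__(self) -> None:
        self._entries: List[dict] = []

    @staticmethod
    def _digest(prev_hash: str, event: dict) -> str:
        payload = json.dumps(event, sort_keys=True)
        return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

    def append(self, event: dict) -> str:
        """Add an event linked to the hash of the previous entry."""
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        stored = json.loads(json.dumps(event))  # copy so callers cannot mutate it
        digest = self._digest(prev, stored)
        self._entries.append({"event": stored, "prev": prev, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute every link; any edited or reordered entry breaks the chain."""
        prev = GENESIS
        for entry in self._entries:
            if entry["prev"] != prev:
                return False
            if entry["hash"] != self._digest(prev, entry["event"]):
                return False
            prev = entry["hash"]
        return True


trace = DecisionTrace()
trace.append({"step": 1, "kind": "fallback_invoked", "reason": "sensor_fault"})
trace.append({"step": 2, "kind": "abstained", "reason": "observation_diverged"})
print(trace.verify())
```

Because each hash covers the previous one, rewriting an early entry invalidates every later link, so a reviewer who holds only the most recent hash can detect a retroactive edit anywhere in the record.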

The payoff is the inversion paying off. Fallback rigor is not the price you pay for autonomy — it is the thing that lets you grant autonomy at all. You let an agent operate over a longer horizon, with less supervision, across more irreversible actions, because you can prove a reachable, adequate fallback is always inside the worst-case envelope and the trace substantiates it. The team that treats the minimal-risk maneuver and the safety case as cost centers stays stuck shipping a safety driver forever, because it never accumulates the evidence that licenses the next expansion. The team that engineers the fallback set as carefully as ETOPS engineers a diversion airport flies the direct route. The pull-over is not autonomy giving up. The always-reachable fallback is what licenses the autonomy, and the trace of fallbacks fired — re-evidenced against the world that was actually observed — is the safety case.

Fallback doctrine is not the tax on autonomy. It is the runway that lets you fly the direct route. You go farther because a safe harbour is provably always reachable — not despite the discipline, but because of it.

The inversion pays off

The in-depth companion develops the full argument: the precise SAE J3016 vocabulary and the adequate-versus-reachable budget, the complete anatomy of the Cruise mishap and the assumed-versus-observed failure mode, the UL 4600 safety case decomposed into claims-arguments-evidence, Waymo's field-earned envelope and its honest metric caveats, and the mapping to agent abstention, rollback, the consequence-tier lattice, and the append-only trace as the living safety case. Read it at The Fallback Is the Feature: Minimal Risk Conditions and the UL 4600 Safety Case for Autonomous Systems.
