Autonomy Is an Envelope, Not a Capability: The Operational Design Domain for Autonomous Systems

KellerAI

Section 01

The Load-Bearing Rule: Autonomy Is the ODD, Not the Car

Ask how autonomous a vehicle is and the instinctive answer is a number on a spec sheet — a level, a feature name, a fraction of the road it can drive. That answer mistakes the output for the input. SAE J3016, the international taxonomy for driving automation, defines every level of automation below full relative to an Operational Design Domain: the operating conditions under which the system was specifically designed to function. 1 2 The level number is not a capability score. It is an allocation of the dynamic driving task between human and machine, valid only inside a declared envelope. Autonomy is therefore a derived quantity — it falls out of where the envelope is — and it is never honestly asserted in the absolute. A car that "drives itself" on a mapped, geofenced, fair-weather route is not more or less autonomous than the spec sheet says; it is exactly as autonomous as its ODD is wide, and not one meter wider.

The discipline lives at the boundary, not the center. Inside the envelope, the interesting questions are about performance: how smoothly the system tracks the lane, how early it brakes, how it handles the merge. Those questions matter, but they are not where autonomy is governed. Autonomy is governed at the edge — the line past which the system was never validated — because the edge is where the only two honest behaviors are to hand back control or to stop. J3016 names this precisely: leaving the ODD triggers the DDT fallback toward a minimal risk condition. 2 The envelope is not a soft preference the system optimizes against; it is a hard line whose crossing is a designed-for transition to not-operating. A system that keeps operating past its edge has not exceeded its capability — it has violated its design.

This inverts the intuition that capability sets reach. The intuitive model says: build a more capable model, and it can safely do more. The envelope model says: a system may do only what its validated envelope covers, and capability that outruns the envelope is not headroom — it is unaccounted risk. The two models diverge most sharply exactly where it matters: at the moment the world presents a condition the system was not designed for. The capability model asks "can it cope?" and gambles on the answer. The envelope model asks "is this inside the validated conditions?" and, if the answer is no, refuses to gamble at all. The whole of the discipline is in preferring the second question to the first.

Autonomy is not a property of the system; it is the envelope. An agent with no stated, validated envelope has no defensible autonomy — only ungoverned reach. The only honest statement of how autonomous a system is, is the statement of where its envelope ends.

The inversion

The reframe for AI agents is the same single move the diversion-airport companion paper makes for range. 25 There, an aircraft's legal range is not asserted from the airframe's capability; it is derived from where the reachable, adequate alternate airports are. Here, an agent's autonomy is a derived quantity that falls out of where its validated effect-surface ends. The unit of governance is therefore never "the agent" in the abstract — it is the (agent, task-class) pair: the agent operating within a specific, bounded envelope of consequence. And outside that envelope, the default is to refuse or escalate, not to try. The rest of this paper is the elaboration of that one sentence into a standard, an incident, a runtime mechanism, and a posture.

Section 02

DDT, ODD, and the Fallback: The Four Words That Carry the Doctrine

Four defined terms in J3016 do nearly all the work, and they are worth quoting by their section numbers because the precision is the point. The first is the dynamic driving task (DDT), §3.10: "all of the real-time operational and tactical functions required to operate a vehicle in on-road traffic, excluding the strategic functions such as trip scheduling and selection of destinations and waypoints." 2 The DDT is the moment-to-moment work of driving — steering, speed control, watching the road, responding to what appears on it — and it is what gets allocated between human and machine. It explicitly excludes the strategic layer; deciding where to go is not part of the task being automated.

The second is the Operational Design Domain (ODD), §3.21: "operating conditions under which a given driving automation system or feature thereof is specifically designed to function, including, but not limited to, environmental, geographical, and time-of-day restrictions, and/or the requisite presence or absence of certain traffic or roadway characteristics." 2 The ODD is the envelope. It is declared in advance, it is a property of the design, and J3016 establishes that Level 1 through Level 4 features operate exclusively within their respective ODDs. The envelope is not discovered after a crash; it is a design-time commitment that says, concretely, here are the conditions I was built for and validated against.

The third is the most load-bearing fact in the standard for this paper's purposes. The DDT fallback, §3.12, is "the response by the user to either perform the DDT or achieve a minimal risk condition (1) after occurrence of a DDT performance-relevant system failure(s) or (2) upon operational design domain (ODD) exit, or the response by an ADS to achieve minimal risk condition, given the same circumstances." 2 Read clause (2) carefully: ODD exit is itself a fallback trigger, on equal footing with a system failure. Leaving the envelope is not an error state the system stumbles into; it is a named, designed-for transition to not-operating. The standard treats "I have left the conditions I was validated for" with exactly the same gravity as "a component has failed."

The fourth is the minimal risk condition (MRC), §3.16: "a stable, stopped condition to which a user or an ADS may bring a vehicle after performing the DDT fallback in order to reduce the risk of a crash when a given trip cannot or should not be continued." 2 The 2021 revision tightened the wording to "a stable, stopped condition"; earlier editions said merely "a condition." The MRC is the correct behavior outside the envelope made concrete: not a degraded version of driving, but a stop. It is the subject of the third paper in this stack — The Fallback Is the Feature — so this paper names it and hands it off rather than developing it. What matters here is the chain the four words form: the DDT is what gets allocated; the ODD is the envelope that allocation is valid within; ODD exit triggers the fallback; and the fallback's terminus is a stop. Out-of-ODD is not a failure to be patched over. It is a transition the system is supposed to make.
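The chain the four terms form can be sketched as a small state machine. This is a minimal illustration, not anything J3016 specifies as code: the `Mode` names and the `next_mode` function are hypothetical, the sketch models only the ADS branch of the fallback (the branch in which a human user resumes the DDT is omitted), and the choice that re-entering the ODD does not cancel a fallback already in progress is a simplifying assumption.

```python
from enum import Enum, auto


class Mode(Enum):
    ENGAGED = auto()       # the system is performing its allocated share of the DDT
    FALLBACK = auto()      # the DDT fallback is in progress
    MINIMAL_RISK = auto()  # stable, stopped condition (terminal in this sketch)


def next_mode(mode: Mode, inside_odd: bool, system_failure: bool,
              stopped: bool) -> Mode:
    """Advance the hypothetical mode machine by one step."""
    if mode is Mode.ENGAGED:
        # ODD exit and system failure are treated as equal-weight triggers.
        if system_failure or not inside_odd:
            return Mode.FALLBACK
        return Mode.ENGAGED
    if mode is Mode.FALLBACK:
        # Assumption: once started, the fallback runs to a stop even if
        # the vehicle happens to re-enter its ODD on the way.
        return Mode.MINIMAL_RISK if stopped else Mode.FALLBACK
    # MINIMAL_RISK is terminal here; resuming a trip is out of scope.
    return Mode.MINIMAL_RISK
```

Note that `inside_odd=False` with no failure at all produces the same transition as a component failure, which is the point of clause (2).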

Section 03

Levels Are an Allocation, Not a Ladder of Power

The six levels are the most quoted and least understood part of J3016. They are routinely read as a power ladder — Level 4 is more capable than Level 2, Level 5 is the top — and that reading is the capability fallacy in its purest form. The levels are an allocation of the dynamic driving task between human and machine, paired with how restricted the ODD is. They are not a score. 1 3 The taxonomy is not merely an industry convention: NHTSA adopted the SAE J3016 levels as federal reference policy in Automated Driving Systems 2.0: A Vision for Safety, which defines an Automated Driving System as SAE Levels 3 through 5, 13 14 and the successor guidance, Automated Vehicles 3.0, built on rather than replaced that framing. 15

L0 · No Driving Automation — human performs the entire DDT.
L1 · Driver Assistance — system does part of the DDT (lateral or longitudinal); human does the rest and supervises.
L2 · Partial Driving Automation — system does lateral and longitudinal; human supervises and is responsible for monitoring.
L3 · Conditional Driving Automation — ADS performs the entire DDT when engaged; a fallback-ready human takes over on request.
L4 · High Driving Automation — ADS performs the entire DDT and the fallback within its ODD; no human fallback needed in-ODD.
L5 · Full Driving Automation — ADS performs the entire DDT under all driver-manageable conditions: an unlimited ODD.

The decisive discontinuity is the seam between Level 2 and Level 3, and it is a discontinuity in who performs the task, not in how good the system is. J3016 reserves the term Automated Driving System (ADS) for "a Level 3, 4, or 5 driving automation system," defined as "collectively capable of performing the entire DDT on a sustained basis." 2 Below the seam — Levels 1 and 2 — the features "are capable of performing only part of the DDT, and thus require a driver to perform the remainder of the DDT, as well as to supervise the feature's performance while engaged." 2 A Level 2 system that steers and brakes beautifully is still a Level 2 system: the human is performing and supervising the DDT. Crossing to Level 3 is not "the system got better"; it is "the system now performs the entire task when engaged, and the human's role changes from supervisor to fallback." The seam is an allocation boundary, and confusing it for a quality boundary is exactly the error that the next section's fatality turns on.

Higher level, then, means two things together — a wider, less restricted ODD, and more of the DDT allocated to the machine — and it means nothing about raw capability. This is sharpest at the top of the ladder. Level 5 is defined as the level that operates under all driver-manageable conditions: it is precisely the unlimited-ODD level. 2 That is why Level 5 is special in a way that has nothing to do with being "the best": it is the only level with no envelope edge to detect, because there is no condition outside its ODD. Every other level — including the formidable Level 4 robotaxi — is defined by the edge of its envelope, and is therefore defined by the discipline of detecting that edge and stopping at it. For every real system short of the unlimited case, the level is a statement about where the envelope ends and who catches the car when it does.
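One way to make the "allocation, not score" reading concrete is to encode each level as a record of who does what, with no capability field at all. The sketch below is illustrative only: the `LevelAllocation` type, its string values, and the helper names are hypothetical, and marking Level 0's ODD as "n/a" is an encoding choice made here for the case where no feature is engaged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelAllocation:
    name: str
    ddt: str       # who performs the DDT while engaged: "human", "shared", "system"
    fallback: str  # who performs the DDT fallback: "human" or "system"
    odd: str       # "n/a", "limited", or "unlimited"


LEVELS = {
    0: LevelAllocation("No Driving Automation", "human", "human", "n/a"),
    1: LevelAllocation("Driver Assistance", "shared", "human", "limited"),
    2: LevelAllocation("Partial Driving Automation", "shared", "human", "limited"),
    3: LevelAllocation("Conditional Driving Automation", "system", "human", "limited"),
    4: LevelAllocation("High Driving Automation", "system", "system", "limited"),
    5: LevelAllocation("Full Driving Automation", "system", "system", "unlimited"),
}


def is_ads(level: int) -> bool:
    """True above the Level 2 / Level 3 seam: the system performs the entire DDT."""
    return LEVELS[level].ddt == "system"


def has_envelope_edge(level: int) -> bool:
    """True when the feature must detect and respect an ODD boundary."""
    return LEVELS[level].odd == "limited"
```

Nothing in the record says how well a system steers or brakes. The Level 2 to Level 3 seam shows up only as a change in the `ddt` field, and Level 5 differs from Level 4 only in the `odd` field.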

Section 04

Runtime ODD Monitoring: Detecting the Edge While Moving

A declared envelope is worthless if the system cannot tell, at runtime, that it has reached the edge. The ODD is a design-time artifact; the edge is crossed at sixty miles an hour, in traffic, in conditions that change while the wheels are turning. The active discipline that closes this gap is runtime ODD monitoring: the system must continuously verify that it is still inside its validated conditions and trigger the fallback the moment it is not. J3016 makes this obligation structural by tying the DDT fallback to ODD exit (§3.12) — the system is required to respond to leaving the envelope, which presupposes it can detect the leaving. 2

The crucial property of this monitor is the instant against which it checks. The edge must be detected against observed conditions — what the sensors and the world are actually presenting right now — not against the conditions assumed at the start of the trip. This is the structural counterpart to aviation's forecast-versus-observation gate analyzed in the diversion-airport companion: an alternate that the dispatch-time forecast said would be open, but the in-flight observation finds fogged in, is a safe harbour you believe you have and do not. 25 A vehicle that assumes, from the route it was dispatched on, that it remains inside its ODD — while the road ahead has quietly changed character — is making the same mistake. Runtime ODD monitoring exists precisely to replace the assumed world-state with the observed one, continuously, so that the envelope edge is detected against reality rather than against a dispatch-time prediction.
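A runtime check of observed conditions against a declared envelope might look like the following. This is a sketch under stated assumptions: the `DeclaredODD` and `Observation` fields, the road-type labels, and the thresholds are all hypothetical and are not drawn from any real system or from J3016. The one design choice worth noticing is that an unobservable condition (`None`) counts as a violation, so the monitor never falls back on a dispatch-time assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DeclaredODD:
    road_types: frozenset      # road categories the feature was validated on
    max_rain_mm_per_h: float   # heaviest rain covered by validation
    min_visibility_m: float    # shortest visibility covered by validation


@dataclass(frozen=True)
class Observation:
    # None means the condition could not be observed right now.
    road_type: Optional[str]
    rain_mm_per_h: Optional[float]
    visibility_m: Optional[float]


def odd_violations(odd: DeclaredODD, obs: Observation) -> List[str]:
    """Return the names of conditions that are outside the ODD or unobservable."""
    violations = []
    if obs.road_type is None or obs.road_type not in odd.road_types:
        violations.append("road_type")
    if obs.rain_mm_per_h is None or obs.rain_mm_per_h > odd.max_rain_mm_per_h:
        violations.append("rain")
    if obs.visibility_m is None or obs.visibility_m < odd.min_visibility_m:
        violations.append("visibility")
    return violations


def must_fall_back(odd: DeclaredODD, obs: Observation) -> bool:
    """Any violation, including a missing observation, triggers the fallback."""
    return bool(odd_violations(odd, obs))
```

For example, with an ODD declared as `DeclaredODD(frozenset({"limited_access_freeway"}), 2.0, 200.0)`, an observation of a hypothetical `"divided_highway_at_grade"` road type in clear weather yields a single `road_type` violation, and `must_fall_back` returns `True`.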

The industry already measures the rate at which this edge is hit, even if it does not always describe it that way. California's autonomous-vehicle regulation, 13 CCR §227.50, defines a "disengagement" as "a deactivation of the autonomous mode when a failure of the autonomous technology is detected or when the safe operation of the vehicle requires that the autonomous vehicle test driver disengage the autonomous mode and take immediate manual control of the vehicle." 17 Read in the vocabulary of this paper, a disengagement is the AV-native form of "runtime ODD monitoring fired and handed control back." The scale of that measurement is large: permit holders logged more than 9 million autonomous test miles in California in the December 1, 2024 to November 30, 2025 reporting period. 18 Representative figures from earlier filings put the rate in concrete terms: one operator reported roughly 17,311 miles per disengagement (3,669,962 miles across 212 disengagements, December 2022 to November 2023). 19 The federal layer measures edge-hits from the other direction: NHTSA's Standing General Order 2021-01 requires manufacturers to report crashes in which an ADS or an SAE Level 2 ADAS was engaged within 30 seconds of the crash, a reporting boundary that, in effect, records the moments the envelope edge coincided with a collision. 16

But miles-per-disengagement must be handled honestly, because it is a contested and gameable metric — it varies with where testing happens, with conservative-versus-aggressive disengagement policy, and with the breadth of the ODD itself. The DMV explicitly states that the reports "are not designed for comparative analysis across companies." 18 The figure is not a cross-vendor ranking, and it is not used as one here. Its value is narrower and more useful: it is evidence that the field measures the rate at which the envelope edge is encountered. A higher disengagement rate is not simply "worse" — it can mean a harder operating environment or a more conservative monitor that hands back control sooner rather than gambling. The metric is a window onto the monitor's behavior, not a verdict on it. What matters for the doctrine is that the monitor exists, fires on observed conditions, and produces a clean hand-back when it does.
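The arithmetic behind the quoted miles-per-disengagement figure is a single division, shown here so the number can be checked against the mileage and disengagement counts cited above. The function name is hypothetical; the inputs are the figures already quoted in the text.

```python
def miles_per_disengagement(miles: float, disengagements: int) -> float:
    """Average distance driven in autonomous mode between disengagements."""
    if disengagements <= 0:
        raise ValueError("need at least one disengagement to form the ratio")
    return miles / disengagements


rate = miles_per_disengagement(3_669_962, 212)
print(f"{rate:,.0f} miles per disengagement")  # prints "17,311 miles per disengagement"
```

The ratio says nothing about why each disengagement occurred or how broad the operating environment was, which is why it works as a window onto monitor behavior rather than as a ranking.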

Section 05

Anatomy of Williston: A Level-2 System Used Outside Its Envelope

The cost of confusing capability for envelope is best read from the first fatality involving a driver-assistance system engaged at the moment of a crash. The NTSB recorded it precisely: "At 4:36 p.m. eastern daylight time on Saturday, May 7, 2016, a 2015 Tesla Model S 70D car, traveling eastbound on US Highway 27A (US-27A), west of Williston, Florida, struck a refrigerated semitrailer powered by a 2014 Freightliner Cascadia truck-tractor. The driver and sole occupant of the car died in the crash; the commercial truck driver was not injured." 9 The investigation is NTSB/HAR-17/02, case HWY16FH018, adopted September 12, 2017. 10 The driver — anonymized as "the car driver" in the report body and identified as Joshua Brown in the public docket and contemporaneous press — had the vehicle's Autopilot engaged. 11 12

The geometry is the first half of the lesson. "At the crash location, US-27A is a four-lane divided highway with two through lanes in each direction separated by a 60-foot-wide paved median." 9 The truck was turning left across the eastbound through lanes — at-grade cross traffic — on a divided highway, not a limited-access freeway. This is the kind of intersecting traffic a highway-driving assistant is not designed to resolve. The NTSB framed the entire investigation around this category mismatch: §2.4 of the report is titled "Operational Design Domains for Level 2 Vehicle Automation," and the report characterizes the system directly — "According to these definitions, the Tesla car involved in the Williston crash was equipped with a Level 2 automated driving system. When operating a Level 2 vehicle, the driver is responsible for monitoring the driving environment." 9 A Level 2 system. The human is the monitor. The system was engaged in conditions outside the environment it was designed to handle.

The second half of the lesson is that the system operated as designed. NHTSA's parallel defects investigation, PE 16-007, "did not identify any defects in the design or performance of the AEB or Autopilot systems, nor any incidents in which the systems did not perform as designed." 11 The car was not broken. This is the part that separates the envelope discipline from a reliability discipline: a system can be entirely non-defective and still be operating catastrophically outside the conditions it was validated for. The NTSB's Finding 3 establishes the envelope edge concretely — "The Tesla's automated vehicle control system was not designed to, and did not, identify the truck crossing the car's path or recognize the impending crash." 9 The crossing truck was outside the envelope. The system did exactly what it was built to do, which did not include seeing it.

The probable cause must be quoted with care, because the envelope-limiting language is easy to misattribute. The NTSB determined "that the probable cause of the Williston, Florida, crash was the truck driver's failure to yield the right of way to the car, combined with the car driver's inattention due to overreliance on vehicle automation, which resulted in the car driver's lack of reaction to the presence of the truck. Contributing to the car driver's overreliance on the vehicle automation was its operational design, which permitted his prolonged disengagement from the driving task and his use of the automation in ways inconsistent with guidance and warnings from the manufacturer." 9 The cause names over-reliance and a design that permitted use inconsistent with the manufacturer's guidance. The sharper envelope-limiting wording lives not in the cause but in Finding 5: "If automated vehicle control systems do not automatically restrict their own operation to those conditions for which they were designed and are appropriate, the risk of driver misuse remains." 9 The recommendations carried it into action — H-17-38 directed NHTSA to develop a method to verify that manufacturers "incorporate system safeguards that limit the use of automated vehicle control systems to those conditions for which they were designed," and H-17-41 directed six named Level 2 manufacturers to incorporate exactly those safeguards. 9

The failure was upstream of the moment. A non-defective system, operating exactly as designed, was permitted to run outside the envelope it was validated for — with no mechanism to detect the edge and stop, and a human who assumed a competence the system did not have.

The Williston shape

Read at the level of doctrine, the observed-versus-assumed lesson is stark. The human assumed — forecast — a competence the system did not have; the system was, in fact, observed to be outside its envelope; and there was no out-of-process mechanism reading the observed conditions and forcing the system to stop. The NTSB also found, in Finding 6, that the steering-wheel torque the vehicle used to gauge driver engagement "provides a poor surrogate means of determining the driver's degree of engagement" — the monitor of the human was itself measuring the wrong thing. 9 It is the road-going twin of the diversion-airport companion's core failure: committed past the point of safe return on a forecast rather than an observation, with the designed fallback foreclosed. 25 The difference is that here the fallback was a human who had been allowed to stop being a fallback.

Section 06

Mapping to AI Agents: The (Agent, Task-Class) Envelope

The translation to AI agents is one move, stated in the units the LLM-Agent Assurance Standard (LAAS) already defines. 24 J3016 defines the ODD as "operating conditions under which a given driving automation system or feature thereof is specifically designed to function" (§3.21). 2 In the agent setting it becomes the (agent, task-class) effect-surface envelope: the bounded region of the effect surface, measured along three axes (reversibility, scope, and consequence), within which a given agent on a given task-class has been validated to operate. The governance unit is never "the agent" in the abstract, exactly as a car's autonomy is never stated without its ODD. It is always the agent operating within a specific, bounded, validated envelope of effect. "How capable is the agent" is the wrong axis (it is the diversion-airport "how far can the twin fly" error 25); the right axis is "what is the validated envelope, and is this action inside it."

Runtime ODD monitoring becomes a runtime gate on the observed effect surface. J3016 ties the DDT fallback to ODD exit (§3.12), requiring the system to detect, at runtime, that it has left its envelope. 2 The AI form is the LAAS gate that reads an action's observed effect surface — never the agent's self-report — and computes a consequence tier: ct = max(reversibility, scope, consequence) for any action with an external effect. 24 The gate is the runtime ODD monitor: per action, it checks whether the observed surface is inside the validated envelope. This is the observed-versus-assumed discipline rendered in software — the edge is detected against the observed effect surface, not against the agent's forecast of its own behavior. Williston is precisely an assumed-competence operation past the observed edge, and the gate is the mechanism whose absence made Williston possible.

Out-of-ODD becomes default to the highest tier. The correct behavior on ODD exit is to not operate autonomously — to perform the DDT fallback toward a minimal risk condition (§3.12, §3.16). 2 The AI form is the LAAS default-to-highest rule: ct = 4 whenever any axis is undetermined — an unknown reversibility is treated as irreversible, an unknown scope as public, an unknown consequence as high. 24 When the effect surface is undetermined — when the action is outside or at the unverified edge of the validated envelope — the gate defaults the action to CT4, which under the LAAS lattice requires human approval, an abstention default, and full evidence. Undetermined surface, assume the worst tier, refuse or escalate. This is exactly "outside the ODD, the correct behavior is to not operate," and it is the same instinct the selective-prediction and conformal-abstention literatures formalize: when the evidence supporting an assertion falls below threshold, the sound move is to abstain rather than to commit and hope. 20 23 The bounded-risk version of that abstention is the conformal-prediction construction — distribution-free guarantees on the rate of error among the actions the system does commit to 21 22 — which is the formal counterpart of an ODD edge calibrated to a chosen tolerance: refuse whenever observed confidence in being inside the envelope falls below the line.
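The tier computation and its default can be sketched in a few lines. This is an illustration of the rule as described in the text, not the LAAS reference implementation: the integer scale of 1 to 4 per axis is an assumption (the text fixes only that CT4 is the strictest tier), `None` is used here to represent an undetermined axis, and treating an out-of-range score as undetermined is a design choice of this sketch.

```python
from typing import Optional

CT_MAX = 4  # strictest tier named in the text (CT4)
CT_MIN = 1  # assumed lowest tier; the text does not state it


def consequence_tier(reversibility: Optional[int],
                     scope: Optional[int],
                     consequence: Optional[int]) -> int:
    """Derive ct = max(reversibility, scope, consequence) from the observed surface.

    Any axis that is undetermined (None) forces the strictest tier.
    """
    axes = (reversibility, scope, consequence)
    if any(axis is None for axis in axes):
        return CT_MAX
    # Sketch-level choice: a score outside the assumed scale is not trusted.
    if any(not CT_MIN <= axis <= CT_MAX for axis in axes):
        return CT_MAX
    return max(axes)
```

With every axis observed, the worst axis decides: `consequence_tier(1, 2, 1)` is 2. With any axis unknown, the result is 4 regardless of how benign the other axes look.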

These three moves bind to a single conformance obligation. The LAAS bundle's first cross-cutting invariant is gate-derived tiering: the gate derives the tier and selects the verifier; the actor only proposes; any path where the actor sets its own tier is non-conformant, and a self-reported tier below the gate's is not a discount but a logged warning on a blocked action. 24 The article's envelope obligation, OBL-TIER-001, binds to that mechanic directly: the consequence tier of an (agent, task-class) action is derived by the out-of-process gate from the observed effect surface, defaulting to CT4 (expected_ct := 4) whenever any axis of that surface is undetermined. This is the AV envelope made into a runtime conformance obligation. The agent never gets to assert it is inside its ODD; the gate observes whether it is, and where the gate cannot tell, the action is treated as out-of-ODD and held at the strictest tier. The Williston shape is exactly the absence of OBL-TIER-001: a system that self-classified as in-envelope — the human assumed so, the design permitted it — with no out-of-process gate reading the observed surface and forcing the strict-tier default.
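The gate-derived tiering invariant can be sketched as a decision function that takes the gate's tier as authoritative and treats the actor's proposal as advisory at most. The names `GateDecision` and `apply_gate` are hypothetical. The text does not say how a proposal at or above the gate's tier is handled, so this sketch assumes it is simply ignored in favor of the gate's value.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("gate")


@dataclass(frozen=True)
class GateDecision:
    tier: int                   # always the gate-derived tier, never the actor's
    blocked: bool               # True when the actor under-reported its tier
    needs_human_approval: bool  # CT4 requires human approval per the text


def apply_gate(gate_ct: int, actor_ct: Optional[int]) -> GateDecision:
    """Combine the gate-derived tier with the actor's optional self-report."""
    under_reported = actor_ct is not None and actor_ct < gate_ct
    if under_reported:
        # A lower self-reported tier earns no discount: log it and block.
        logger.warning("actor proposed CT%d below gate-derived CT%d; action blocked",
                       actor_ct, gate_ct)
    return GateDecision(tier=gate_ct,
                        blocked=under_reported,
                        needs_human_approval=(gate_ct == 4))
```

There is no code path in which `actor_ct` ends up in `GateDecision.tier`, which is the invariant expressed structurally.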

Section 07

The Cross-Stack Cross-Reference: Where SOTIF, MRC, and the Aviation Parent Sit

This article defines the envelope. It is one of three that together form a single autonomous-driving stack, and the boundaries between them are worth drawing precisely so the three read as one argument rather than three overlapping ones. Begin with the companion that governs what happens inside a working envelope: Correct Hardware, Wrong Answer, built on ISO 21448 — the Safety Of The Intended Functionality (SOTIF). SOTIF is the standard for "the absence of unreasonable risk due to hazards resulting from functional insufficiencies of the intended functionality." 4 The system performs exactly as built, the hardware is correct, and the answer is still wrong because the intended function itself is insufficient for the scenario. The bridge to this paper: a validated ODD bounds where SOTIF must reason about functional insufficiency, and narrowing the envelope is SOTIF's primary risk-reduction lever. Where this paper says "detect the edge and stop," the SOTIF article says "even well inside the edge, the correct hardware can produce the wrong answer." The envelope is necessary but not sufficient; what happens inside it is the SOTIF article's subject. (SOTIF has a fault-present complement, ISO 26262, which governs hazards from malfunctioning E/E systems via the ASIL A–D integrity levels; 7 8 it locates the boundary of SOTIF's no-fault domain and is otherwise outside this stack's scope.)

Next, the companion that develops the act of stopping: The Fallback Is the Feature, built on the minimal risk condition and UL 4600. This paper establishes that the correct behavior outside the ODD is to stop operating; that article develops the stop itself as the load-bearing feature. J3016's minimal risk condition, "a stable, stopped condition to which a user or an ADS may bring a vehicle after performing the DDT fallback" (§3.16), 2 is the AV-native object UL 4600 builds its safety case around: claim, argument, evidence, validated in the field through Safety Performance Indicators, for products running without human supervision. 5 6 The bridge: this paper hands the edge-detection problem to the fallback; that article argues the fallback is not a degraded mode but the primary safety feature, the road-going equivalent of the diversion-airport rule's "the fallback rule was the runway that let the twin fly the direct route." 25 An ODD without a designed, validated fallback is an envelope with no exit.

Finally, the parent the whole stack descends from is Governed Like Aviation, Audited Like Banking, whose load-bearing concept is integrity: bounding the rate of undetected false assertions, what aviation calls Hazardously Misleading Information (HMI), meaning wrong information delivered without a warning flag. The point is not raw accuracy but the suppression of confident, unflagged error. Williston is HMI in physical form: a Level-2 system that was wrong (it did not detect the crossing truck) and did not say so. The human had no warning flag, trusted the automation, and did not react. 9 Runtime ODD monitoring is the AV form of the parent's integrity monitor: it exists to ensure the system flags when it has left the conditions under which its outputs can be trusted, rather than silently continuing to assert competence it no longer has. The parent's "abstain when you cannot certify" is this stack's "perform the DDT fallback on ODD exit." Three papers, one discipline: define the envelope (here), govern the no-fault failure inside it (SOTIF), and make the exit safe (the fallback).

Section 08

The Cost Inversion: The Envelope Enables Autonomy, It Does Not Cap It

The most counter-intuitive payoff of the envelope discipline is that a validated, monitored ODD is not a restriction on autonomy — it is the precondition for granting wider autonomy responsibly. The instinct runs the other way: surely a system hemmed in by a declared envelope, required to stop at its edge, can do less than one allowed to press on. The instinct is exactly backwards, and the reason is the same one the diversion-airport companion finds in aviation, where a stricter safe-harbour requirement, rigorously discharged, is what let twin-engine aircraft fly the direct ocean routes they had been forced to dogleg around. 25 The discipline did not make the aircraft fly less; it made them flyable farther, accountably.

Consider the two systems that lack the discipline. The first is the undisciplined-but-gated system: because no one can certify where its envelope ends, it is over-gated into uselessness — a human must approve nearly everything, because nothing can be trusted to stop on its own. The second is the undisciplined-and-ungated system: it is allowed to operate past an edge it cannot detect. That second system is the Williston shape — a capable system, operating exactly as designed, permitted to run outside the conditions it was validated for, with no mechanism to catch the exit. 9 Neither system can be granted wide autonomy honestly. The first is too expensive to be autonomous; the second is too dangerous to be.

The disciplined system can be trusted to operate unsupervised precisely because it will stop at its edge. Validating the envelope is what converts a capable agent into autonomy you can actually grant — the envelope is the ground you are allowed to stand on, not the cap on how far you may reach.

The payoff

The disciplined system is the one that can be let go furthest. Because its envelope is declared and its edge is monitored against observed conditions, it can be trusted to operate unsupervised inside that envelope — and the envelope can then be widened, deliberately and with evidence, exactly as the diversion-airport tiers were widened only after the demonstrated reliability data justified each rung. 25 The mechanism of the saving is the same in both fields: the discipline does not buy autonomy by making the system more capable; it buys autonomy by certifying the conditions under which the existing capability is safe to grant. An agent whose (agent, task-class) envelope is validated and runtime-monitored is one you can let run longer, touch more consequential actions, and operate with less human gating — not despite the envelope but because of it. The organization that refuses to specify envelopes is the one stuck choosing between uselessly over-gated agents and dangerously ungated ones. The envelope is what dissolves that false choice.

Section 09

The Posture: Define the Envelope, Detect Its Edge, Refuse Outside It

The posture that falls out of the envelope doctrine reduces to three commitments, each with an AV precedent and an honest limit. The commitments are sequential: you cannot detect an edge you have not defined, and you cannot refuse outside an envelope whose edge you cannot detect.

First, define the envelope. State the (agent, task-class) operating envelope explicitly before granting autonomy — the bounded effect surface (reversibility, scope, consequence) within which the agent has been validated to operate unsupervised. The AV precedent is J3016 §3.21: an ODD is declared, not discovered after a crash. 2 An agent with no stated envelope has no defensible autonomy, only ungoverned reach. The honest limit: the envelope can be mis-specified, and the residual risk then migrates to the envelope-definition layer — exactly as the diversion-airport paper's near-miss showed the residual migrating to the dispatch risk assessment that populated the alternate set. 25 The doctrine does not eliminate the failure; it relocates it to a layer where it is nameable and fixable, and that layer must then be held to the standard the doctrine implies.

Second, detect its edge at runtime. Monitor, per action, against the observed effect surface — not the agent's self-report — whether the action is inside the validated envelope. The AV precedent is twofold: J3016 §3.12 ties the DDT fallback to ODD exit, and Williston's NTSB Finding 5 demands that systems "automatically restrict their own operation to those conditions for which they were designed." 2 9 The LAAS form is an out-of-process gate that computes ct = max(reversibility, scope, consequence) from the observed surface (OBL-TIER-001). 24 The honest limit: runtime monitoring can fail to detect the edge, and a monitor that shares the agent's blind spots — built from the same lineage, trained on the same data — will miss the same exits. The parent paper's monitor-independence requirement applies: the thing that watches for the edge must not be the thing that drove past it.

Third, refuse outside it. On ODD exit, or whenever the surface is undetermined, default to the strictest tier (expected_ct := 4) and refuse or escalate, exactly as the correct behavior outside the ODD is to perform the DDT fallback toward a minimal risk condition rather than to keep operating. The AV precedent is J3016 §3.16 (the MRC); the AI form is the LAAS default-to-highest rule, the conformal-abstention construction made operational. 2 23 24 The competent system stops at its edge. The honest limit: refuse-on-uncertainty has a real coverage cost. An agent that escalates whenever its surface is undetermined accomplishes less than one that presses on, and the trade between coverage and safety is a governance decision, not a technical one. 20 The selective-prediction literature names this trade exactly: lower the confidence threshold below which the system abstains and you answer more queries but commit more errors; raise it and you are safer but cover less. 20 Where to set it is a choice about acceptable risk, and it must be made by someone accountable, not defaulted into by the system.
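The coverage-versus-safety trade named in the third commitment can be made concrete with a small calculation. This is a sketch, not a method from the cited literature: the function name is hypothetical, each record is assumed to be a (confidence, was_correct) pair from some held-out evaluation, and the sample data in the usage note is invented purely for illustration.

```python
from typing import List, Tuple


def coverage_and_risk(records: List[Tuple[float, bool]],
                      threshold: float) -> Tuple[float, float]:
    """Return (coverage, error rate among answered) for a confidence threshold.

    The system answers when confidence >= threshold and abstains otherwise.
    """
    if not records:
        raise ValueError("records must not be empty")
    answered = [correct for confidence, correct in records
                if confidence >= threshold]
    coverage = len(answered) / len(records)
    if not answered:
        return coverage, 0.0  # nothing committed to, so no committed errors
    errors = sum(1 for correct in answered if not correct)
    return coverage, errors / len(answered)
```

On an invented six-record sample such as `[(0.95, True), (0.90, True), (0.80, True), (0.70, False), (0.60, True), (0.40, False)]`, a threshold of 0.5 answers five of the six with one error among them, while a threshold of 0.75 answers only three and commits none. The function reports the trade; it does not choose the threshold, which remains the accountable party's decision.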

None of these limits is a reason to abandon the doctrine; they are the doctrine's own statement of its scope, which is what makes it trustworthy. Williston is the standing proof of the alternative: a capable, non-defective system, permitted to operate outside the envelope it was validated for, with no mechanism to detect the edge and stop. 9 The correction the field made afterward was not to make the cars more capable; it was to demand that systems restrict their own operation to the conditions they were designed for. 9 That is the whole of the posture, carried from the road to the agent. The ODD did not make the car drive less — it made autonomy grantable. The envelope is not the cap on autonomy; it is the validated ground on which you are allowed to stand.

The brief companion to this paper, Autonomy Is an Envelope, introduces the core argument in a shorter form. This is the first of three papers in the autonomous-driving stack: it defines the envelope; Correct Hardware, Wrong Answer governs the no-fault failure mode inside a working envelope (SOTIF / ISO 21448); and The Fallback Is the Feature develops the act of stopping itself as the load-bearing safety feature (the minimal risk condition / UL 4600). The stack descends from the aviation-and-banking parent, Governed Like Aviation, Audited Like Banking, and runs parallel to the diversion-airport doctrine of Always a Runway.

End of paper
