Autonomy Is an Envelope, Not a Capability

KellerAI

Section 01

The rule is the envelope, not the car.

Ask how autonomous a vehicle is and the instinctive answer is a number on a spec sheet: a level, a feature name, a fraction of the road it can drive. That answer mistakes the output for the input. SAE J3016, the international taxonomy for driving automation, defines every level of driving automation short of full automation relative to an Operational Design Domain: the operating conditions under which the system was specifically designed to function, including its environmental, geographical, and time-of-day limits and the road and traffic characteristics it was built for. The level number is not a capability score. It is an allocation of the driving task between human and machine, valid only inside a declared envelope.

Autonomy is therefore a derived quantity. It falls out of where the envelope is, and it is never honestly asserted in the absolute. A car that drives itself on a mapped, geofenced, fair-weather route is exactly as autonomous as its envelope is wide, and not one meter wider. The discipline lives at the boundary, not the center. Inside the envelope the questions are about performance — how smoothly the system tracks the lane, how early it brakes. Those matter, but they are not where autonomy is governed. Autonomy is governed at the edge, because the edge is where the only two honest behaviors are to hand back control or to stop.
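A minimal Python sketch of what a declared envelope can look like: the four J3016 condition categories captured as a frozen record, plus a containment check that passes only when every declared condition holds. The class names, field names, and values (a mapped, geofenced, fair-weather daytime route) are hypothetical illustrations, not taken from any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingConditions:
    """Conditions observed at one moment (hypothetical fields)."""
    weather: str
    zone: str        # identifier of the mapped area the vehicle is in
    hour: int        # local hour of day, 0-23
    road_type: str


@dataclass(frozen=True)
class OperationalDesignDomain:
    """A declared envelope, fixed at design time.

    The fields mirror the J3016 categories (environmental, geographical,
    time-of-day, road/traffic); the concrete values are assumptions.
    """
    allowed_weather: frozenset[str]
    geofence: frozenset[str]
    hours: range
    road_types: frozenset[str]

    def contains(self, c: OperatingConditions) -> bool:
        # Inside the envelope only if *every* declared condition holds.
        return (
            c.weather in self.allowed_weather
            and c.zone in self.geofence
            and c.hour in self.hours
            and c.road_type in self.road_types
        )


# A mapped, geofenced, fair-weather, daytime route (hypothetical values).
odd = OperationalDesignDomain(
    allowed_weather=frozenset({"clear", "overcast"}),
    geofence=frozenset({"route-7-mapped"}),
    hours=range(7, 19),
    road_types=frozenset({"divided_highway"}),
)

print(odd.contains(OperatingConditions("clear", "route-7-mapped", 10, "divided_highway")))  # True
print(odd.contains(OperatingConditions("snow", "route-7-mapped", 10, "divided_highway")))   # False
```

Nothing in the check asks how well the system would cope with snow; the envelope answers only whether snow was designed for.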

This inverts the intuition that capability sets reach. The intuitive model says: build a more capable system, and it can safely do more. The envelope model says: a system may do only what its validated envelope covers, and capability that outruns the envelope is not headroom — it is unaccounted risk. The two diverge most sharply exactly where it matters: at the moment the world presents a condition the system was never designed for. The capability model asks "can it cope?" and gambles on the answer. The envelope model asks "is this inside the validated conditions?" and, if the answer is no, refuses to gamble at all.

Autonomy is not a property of the system; it is the envelope. An agent with no stated, validated envelope has no defensible autonomy — only ungoverned reach.

The inversion

This is the same move the sibling diversion-airport paper makes for an aircraft's range: range is not asserted from the airframe's endurance, it is derived from where the reachable, adequate alternate runways are. You may fly only as far as you can still turn back. Here, an agent's autonomy is derived from where its validated effect-surface ends — and outside that, the default is to refuse, not to try.

Section 02

The standard makes leaving the envelope a designed-for transition.

J3016 does not treat the envelope as a soft preference the system optimizes against. It treats crossing the edge as a named, designed-for transition to not-operating. The standard ties four defined terms together into a chain. The dynamic driving task is the moment-to-moment work of driving — steering, speed control, watching the road — and it is what gets allocated between human and machine. The Operational Design Domain is the envelope that allocation is valid within: declared in advance, a property of the design, not discovered after a crash.

The load-bearing fact is the third term. The DDT fallback, the response by the user or the system to either perform the driving task or bring the vehicle to a minimal risk condition, is triggered by two things on equal footing: a system failure, or exit from the Operational Design Domain. Leaving the envelope is not an error state the system stumbles into; it is treated with exactly the same gravity as a component failing. The fallback's terminus, the fourth term, is the minimal risk condition: a stable, stopped condition. The 2021 revision tightened the wording to specify a stopped condition where earlier editions said merely "a condition." Out-of-envelope is not a failure to be patched over; it is a transition the system is supposed to make.
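The chain of terms can be read as a small state machine. The sketch below is a hypothetical model, not J3016 text: the mode names and boolean inputs are assumptions. What it takes from the standard is that a system failure and an ODD exit trigger the same transition, and that the fallback ends either with a human resuming the task or in a stable, stopped minimal risk condition.

```python
from enum import Enum, auto


class Mode(Enum):
    OPERATING = auto()     # system performing the driving task inside its ODD
    FALLBACK = auto()      # DDT fallback in progress
    HANDED_BACK = auto()   # a human has resumed the driving task
    MINIMAL_RISK = auto()  # stable, stopped condition


def next_mode(mode: Mode, *, system_failure: bool, inside_odd: bool,
              human_took_over: bool = False, vehicle_stopped: bool = False) -> Mode:
    """Advance a hypothetical fallback state machine by one step."""
    if mode is Mode.OPERATING:
        # Failure and ODD exit are handled on equal footing.
        if system_failure or not inside_odd:
            return Mode.FALLBACK
        return Mode.OPERATING
    if mode is Mode.FALLBACK:
        if human_took_over:
            return Mode.HANDED_BACK
        if vehicle_stopped:
            return Mode.MINIMAL_RISK
        return Mode.FALLBACK  # still working toward a stop
    return mode  # terminal modes are left unchanged in this sketch


print(next_mode(Mode.OPERATING, system_failure=True, inside_odd=True))    # Mode.FALLBACK
print(next_mode(Mode.OPERATING, system_failure=False, inside_odd=False))  # Mode.FALLBACK
```

There is no branch in which an ODD exit leaves the system in `OPERATING`; that absence is the design choice the standard makes.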

The six levels are the most quoted and least understood part of the standard. They are read as a power ladder, with Level 4 more capable than Level 2 and Level 5 at the top, and that reading is the capability fallacy in its purest form. The levels are an allocation of the driving task paired with whether the envelope is limited. A higher level means more of the task, and eventually the fallback itself, is allocated to the machine; the envelope stays limited all the way through Level 4, and a Level 4 system's envelope can be narrower than a Level 2 system's. The level says nothing about raw capability. The decisive discontinuity, the seam between Level 2 and Level 3, is a change in who performs the task: at Level 2 the human supervises and is responsible for monitoring; at Level 3 the system performs the entire task when engaged. A Level 2 system that steers and brakes beautifully is still a Level 2 system. Confusing that allocation seam for a quality boundary is exactly the error the next section turns on. Only Level 5, the unlimited-envelope level that operates under all driver-manageable conditions, has no edge to detect. Every real system short of that is defined by the discipline of detecting its edge and stopping at it.
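One way to see that the levels are an allocation rather than a ladder is to write J3016's summary table as data. The sketch below records who performs sustained vehicle control, object and event detection and response, and the fallback at each level, plus whether the ODD is limited. The field names and helper functions are our own hypothetical labels.

```python
from typing import NamedTuple


class Allocation(NamedTuple):
    sustained_control: str  # who performs sustained lateral and longitudinal control
    monitoring: str         # who performs object and event detection and response
    fallback: str           # who performs the DDT fallback
    odd: str                # "n/a", "limited", or "unlimited"


# Summary of the J3016 level table; no column is a capability score.
LEVELS = {
    0: Allocation("driver", "driver", "driver", "n/a"),
    1: Allocation("driver and system", "driver", "driver", "limited"),
    2: Allocation("system", "driver", "driver", "limited"),
    3: Allocation("system", "system", "fallback-ready user", "limited"),
    4: Allocation("system", "system", "system", "limited"),
    5: Allocation("system", "system", "system", "unlimited"),
}


def has_edge_to_detect(level: int) -> bool:
    """Levels 1-4 have a limited ODD; Level 0 has no automation, Level 5 no edge."""
    return LEVELS[level].odd == "limited"


def seam(a: int, b: int) -> list[str]:
    """Which allocations change between two levels."""
    return [f for f in Allocation._fields
            if getattr(LEVELS[a], f) != getattr(LEVELS[b], f)]


print(seam(2, 3))  # ['monitoring', 'fallback']
```

The Level 2 to Level 3 diff names only who monitors and who falls back; nothing in the table measures how well the system steers or brakes.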

Section 03

Williston: a non-defective system, used outside its envelope.

The cost of confusing capability for envelope was paid on 7 May 2016, near Williston, Florida, when a Tesla Model S travelling on a divided highway struck a tractor-semitrailer turning left across its path. The driver, with Autopilot engaged, died; the truck driver was not injured. The NTSB investigation characterized the system precisely: a Level 2 driving automation system, under which the human is responsible for monitoring the driving environment. The truck was at-grade cross traffic on a divided highway, exactly the kind of intersecting traffic a highway-driving assistant is not designed to resolve. The investigation built an entire section around that category mismatch, titled "Operational Design Domains for Level 2 Vehicle Automation."

Here is the part that separates an envelope discipline from a reliability discipline: the system operated as designed. NHTSA's parallel defects investigation identified no defects in the design or performance of the automatic emergency braking or Autopilot systems, and no incident in which they failed to perform as designed. The car was not broken. The NTSB's findings made the envelope edge concrete: the system was not designed to, and did not, identify the crossing truck or recognize the impending crash. The crossing truck was simply outside the envelope. The system did exactly what it was built to do, which did not include seeing it.

The shape of the failure is the whole lesson: a non-defective system, operating exactly as designed, was permitted to run outside the envelope it was validated for, with no mechanism to detect the edge and stop, and a human who assumed a competence the system did not have. The probable cause named the truck driver's failure to yield and the car driver's inattention due to overreliance on the automation; contributing to that overreliance was an operational design that permitted prolonged disengagement and use inconsistent with the manufacturer's guidance. The sharper envelope-limiting language lived in the findings: if automated control systems do not restrict their own operation to the conditions they were designed for, the risk of misuse remains. The board's recommendations carried it into action, directing manufacturers to incorporate safeguards that limit operation to the conditions for which the systems were designed. The correction the board called for was not to make the cars more capable. It was to demand that systems restrict themselves to their envelope, which in practice means detecting the edge while moving.

That detection is the active discipline: runtime monitoring against the conditions observed right now, not the conditions assumed at the start of the trip. It is the road-going twin of the diversion-airport failure — committing past the point of safe return on a forecast rather than an observation, with the designed fallback already foreclosed. The difference at Williston is only that the fallback was a human who had been allowed to stop being one.
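The difference between checking the envelope once and monitoring it at runtime fits in a few lines. This is a hypothetical simulation, not a model of any real vehicle: observations are plain dictionaries, the envelope predicate is invented for illustration, and the simulator's own ground truth (`inside`) is used only to count exits that a check-once policy never sees.

```python
from typing import Callable, Sequence

Observation = dict  # hypothetical: a snapshot of observed conditions


def run_trip(observations: Sequence[Observation],
             in_envelope: Callable[[Observation], bool],
             check_every_step: bool) -> tuple[str, int, int]:
    """Return (outcome, step, unnoticed_exits) for a simulated trip.

    check_every_step=False checks the envelope once, against the conditions
    at the start of the trip. check_every_step=True checks each observation
    and triggers the fallback at the first one outside the envelope.
    """
    unnoticed_exits = 0
    for step, obs in enumerate(observations):
        inside = in_envelope(obs)  # simulator ground truth
        if check_every_step or step == 0:
            if not inside:
                return "fallback", step, unnoticed_exits
        elif not inside:
            unnoticed_exits += 1   # the world left the envelope; nothing looked
    return "completed", len(observations), unnoticed_exits


def no_cross_traffic(obs: Observation) -> bool:
    # Hypothetical envelope: highway driving without at-grade cross traffic.
    return not obs["cross_traffic"]


trip = [{"cross_traffic": False}] * 3 + [{"cross_traffic": True}, {"cross_traffic": False}]
print(run_trip(trip, no_cross_traffic, check_every_step=False))  # ('completed', 5, 1)
print(run_trip(trip, no_cross_traffic, check_every_step=True))   # ('fallback', 3, 0)
```

On the same trip, the check-once policy "completes" the route having spent one step outside its envelope unnoticed, while the per-step monitor hands off to the fallback at the step where cross traffic appears.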

Section 04

The AI translation: the (agent, task-class) envelope.

The translation to AI agents is one move. The Operational Design Domain, the operating conditions a system is specifically designed to function within, becomes the (agent, task-class) effect-surface envelope: the bounded set of effect conditions, measured along reversibility, scope, and consequence, within which a given agent on a given task-class has been validated to operate. The governance unit is never "the agent" in the abstract, exactly as a car's autonomy is never stated without its envelope. It is always the agent operating within a specific, bounded, validated envelope of effect. "How capable is the agent?" is the wrong axis. The right axis is "What is the validated envelope, and is this action inside it?"
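A minimal sketch of the governance unit as a data structure, assuming a hypothetical integer tier scale from 0 to 4 where higher means worse. The registry is keyed by (agent, task_class) pairs, and there is intentionally no lookup by agent alone, so validation on one task class grants nothing on another. The agent and task-class names are illustrative.

```python
from typing import Optional

# Hypothetical registry. Keys are (agent, task_class); values are the highest
# consequence tier (0-4, higher = worse) that the pair has been validated for.
VALIDATED_ENVELOPES: dict[tuple[str, str], int] = {
    ("support-agent", "draft_reply"): 1,
    ("support-agent", "issue_refund"): 2,
}


def validated_max_tier(agent: str, task_class: str) -> Optional[int]:
    """Look up the envelope for one (agent, task_class) pair.

    There is deliberately no per-agent default: an agent validated for one
    task class inherits nothing for another. None means "no envelope",
    which the caller must treat as "no autonomy".
    """
    return VALIDATED_ENVELOPES.get((agent, task_class))


print(validated_max_tier("support-agent", "issue_refund"))    # 2
print(validated_max_tier("support-agent", "delete_account"))  # None
```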

Runtime envelope monitoring becomes a runtime gate on the observed effect surface, never the agent's self-report. For any action with an external effect, the gate reads the action's observed surface and computes a consequence tier as the maximum across reversibility, scope, and consequence. The gate is the runtime monitor: per action, it checks whether the observed surface is inside the validated envelope. Williston was precisely an operation past the edge on assumed competence, with no mechanism to detect where the edge was, and the gate is the mechanism whose absence made Williston possible.
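The tier computation and the per-action check can be sketched as follows. The 0 to 4 scale, the `ObservedSurface` record, and the `gate` function are hypothetical; what the sketch takes from the text is that the tier is the maximum across the three axes and that the surface comes from the gate's own observation rather than from the agent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedSurface:
    """Effect surface as observed by the gate, never as reported by the agent.

    Each axis is a tier on a hypothetical 0-4 scale (higher = worse):
    less reversible, wider scope, graver consequence.
    """
    reversibility: int
    scope: int
    consequence: int


def consequence_tier(s: ObservedSurface) -> int:
    # The worst axis sets the tier; a severe axis cannot be averaged away.
    return max(s.reversibility, s.scope, s.consequence)


def gate(s: ObservedSurface, validated_max_tier: int) -> bool:
    """Per-action runtime check: True only if the tier is inside the envelope."""
    return consequence_tier(s) <= validated_max_tier


surface = ObservedSurface(reversibility=3, scope=1, consequence=1)
print(consequence_tier(surface))            # 3
print(gate(surface, validated_max_tier=2))  # False
```

Taking the maximum rather than an average is the point: an action that is hard to reverse stays high-tier even when its scope and consequence are small.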

Out-of-envelope becomes a default to the highest tier. The correct behavior on exit is to not operate autonomously — to perform the fallback toward a stop. The AI form is the default-to-highest rule: when any axis of the effect surface is undetermined, set expected_ct := 4 — treat an unknown reversibility as irreversible, an unknown scope as public, an unknown consequence as high — and refuse or escalate. Undetermined surface, assume the worst tier. The agent never gets to assert it is inside its envelope; the gate observes whether it is, and where the gate cannot tell, the action is treated as out-of-envelope and held at the strictest tier. This is the same instinct the selective-prediction literature formalizes: when the evidence supporting an assertion falls below threshold, the sound move is to abstain rather than to commit and hope.
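The default-to-highest rule extends the same computation to undetermined axes. In this hypothetical sketch an axis the gate could not determine is `None`, and any `None` forces the tier to 4. It also assumes, as one reading of "held at the strictest tier," that the top tier is never granted autonomously, so an undetermined surface is escalated even against a permissive envelope. Function names and return values are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

HIGHEST_TIER = 4  # corresponds to expected_ct := 4 in the text


@dataclass(frozen=True)
class ObservedSurface:
    """Axes as observed by the gate; None means the gate could not determine it."""
    reversibility: Optional[int]
    scope: Optional[int]
    consequence: Optional[int]


def expected_ct(s: ObservedSurface) -> int:
    axes = (s.reversibility, s.scope, s.consequence)
    known = [a for a in axes if a is not None]
    if len(known) < len(axes):
        # Unknown reversibility -> irreversible, unknown scope -> public,
        # unknown consequence -> high: assume the worst tier.
        return HIGHEST_TIER
    return max(known)


def decide(s: ObservedSurface, validated_max: Optional[int]) -> str:
    """Return "proceed" or "escalate"; the agent's self-report is not an input."""
    if validated_max is None:
        return "escalate"  # no validated envelope for this (agent, task_class)
    ct = expected_ct(s)
    # Assumption: the strictest tier is never inside an autonomous envelope.
    if ct >= HIGHEST_TIER or ct > validated_max:
        return "escalate"  # a deployment may refuse instead of escalating
    return "proceed"


print(decide(ObservedSurface(1, 1, 2), validated_max=2))     # proceed
print(decide(ObservedSurface(1, None, 1), validated_max=2))  # escalate
print(decide(ObservedSurface(1, None, 1), validated_max=4))  # escalate
```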

Define the envelope before granting autonomy; detect its edge at runtime against the observed surface, not the agent's self-report; and outside it, default to the strictest tier and refuse. The agent does not get to certify that it is inside its own envelope.

The learnable rule

This stack does not stand alone. Inside a working envelope, a correct system can still produce the wrong answer — the no-fault functional-insufficiency hazard the SOTIF companion governs; the envelope is necessary but not sufficient. And the act of stopping itself — the minimal risk condition — is the subject of the fallback companion, which argues the fallback is not a degraded mode but the primary safety feature, the road-going equivalent of the runway that lets the aircraft fly the direct route. Three papers, one discipline: define the envelope, govern the no-fault failure inside it, and make the exit safe.

Section 05

The envelope enables autonomy — it does not cap it.

The most counter-intuitive payoff is that a validated, monitored envelope is not a restriction on autonomy. It is the precondition for granting wider autonomy responsibly. The instinct runs the other way: surely a system hemmed in by a declared envelope, required to stop at its edge, can do less than one allowed to press on. The instinct is exactly backwards, for the same reason that a stricter aviation safe-harbour requirement, rigorously discharged, is what let twin-engine aircraft fly the direct ocean routes they had previously been forced to dogleg around. The discipline did not make the aircraft fly less; it let them fly farther, accountably.

Consider the two systems that lack the discipline. The first is over-gated into uselessness: because no one can certify where its envelope ends, a human must approve nearly everything, since nothing can be trusted to stop on its own. The second is the Williston shape — a capable system, operating exactly as designed, permitted to run past an edge it cannot detect. Neither can be granted wide autonomy honestly. The first is too expensive to be autonomous; the second is too dangerous to be. The envelope is what dissolves that false choice.

The disciplined system can be trusted to operate unsupervised precisely because it will stop at its edge. Validating the envelope is what converts a capable agent into autonomy you can actually grant.

The payoff

An agent whose (agent, task-class) envelope is validated and runtime-monitored is one you can let run longer, touch more consequential actions, and operate with less human gating — not despite the envelope but because of it. The envelope can then be widened, deliberately and with evidence, exactly as aviation tiers were widened only after demonstrated reliability data justified each rung. The discipline does not buy autonomy by making the system more capable; it buys autonomy by certifying the conditions under which the existing capability is safe to grant. The organization that refuses to specify envelopes is the one stuck choosing between uselessly over-gated agents and dangerously ungated ones. The envelope is not the cap on autonomy. It is the validated ground on which you are allowed to stand.

The in-depth companion develops the full argument: the precise J3016 definitions and the four words that carry the doctrine, the complete anatomy of Williston and its observed-versus-assumed failure mode, the runtime-monitoring discipline and how the field measures the rate at which the edge is hit, the full mapping to the gate-derived tiering and the OBL-TIER-001 conformance obligation, and the honest limits of each commitment. Read it in Autonomy Is an Envelope, Not a Capability: The Operational Design Domain for Autonomous Systems.

End of paper
