kellerai.blog

Intended Use Is the Envelope

A regulatory clearance authorizes one use, not a model — and renders every other use unvalidated by construction.

KellerAI White Paper · Engineering Discipline & Verification · Jun 2026

Context

A regulatory clearance is often read as a certificate that a model is safe and good. It is the opposite: a regulator authorizes one use, on one population, for one decision, and by the literal construction of the authorization leaves every other use unvalidated. The first autonomous diagnostic AI was licensed because its envelope was narrow, not because its model was trusted.

The Finding

Off-label autonomous action is unvalidated by definition, not merely under-tested. The operator's deliverable is the envelope — define the indication, detect off-label and out-of-distribution input, and refuse outside it — not the model.

Tags:
AI Governance · Intended Use · FDA SaMD · Autonomous Agents · Consequence Tiering · Clinical AI
Paper Details
Category: Engineering Discipline & Verification
Audience: Engineering and governance leads deploying autonomous AI agents in regulated or high-consequence settings.
Method: Doctrine read-across (FDA SaMD intended-use scope mapped to consequence-tiered autonomous-agent governance).
Length: ~2,000 words · 8 min
Sections: 5
Date: Jun 2026
Authors: KellerAI
Section 01

The Model Is Never the Thing That Is Cleared

The unit of FDA authorization is not the software. It is the indication for use — the short statement, recorded on a standard form, that names what a device is for, in whom, and under what conditions. The statement is brief by design, because it is load-bearing: it is the legal boundary of the authorization. What a device is authorized to do is what its labeling claims it does, and nothing else. A use the labeling does not claim is not an authorized use. It is off-label, and off-label is a term of art for outside the envelope the regulator examined.

This matters more for software than for a scalpel, because software invites the opposite intuition. A scalpel does one thing; no one imagines that clearing it to incise tissue also clears it to perform anesthesia. But a trained model is general by construction — the same weights that read a retinal image can be prompted to read a chest film or triage a symptom list — and the generality seduces the operator into reading a clearance as a competence certificate. The FDA's framework for software as a medical device refuses that reading at the root: a SaMD is defined by the medical purpose it serves, not by the code it contains. A change of purpose is a change of device. The same binary, pointed at a different decision, is a different regulated article requiring its own evaluation. The framework even makes the grade granular — crossing the significance of the information to the decision against the seriousness of the situation — so the category is a property of the use, not the model, and moving the model to a new decision moves it to a new category whose evaluation it has not earned.
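The two-axis grading described above can be sketched as a lookup. This is a minimal illustration that assumes the grid in question is the IMDRF SaMD risk categorization (categories I to IV); the key strings and the function name `samd_category` are hypothetical, chosen only for this example.

```python
# Assumed grid: the IMDRF SaMD risk categorization (I is lowest, IV is highest).
# Keys are (state of the healthcare situation, significance of the information).
CATEGORY = {
    ("critical", "treat_or_diagnose"): "IV",
    ("critical", "drive_management"): "III",
    ("critical", "inform_management"): "II",
    ("serious", "treat_or_diagnose"): "III",
    ("serious", "drive_management"): "II",
    ("serious", "inform_management"): "I",
    ("non_serious", "treat_or_diagnose"): "II",
    ("non_serious", "drive_management"): "I",
    ("non_serious", "inform_management"): "I",
}


def samd_category(situation: str, significance: str) -> str:
    """Return the category for a use. Nothing about the model appears here."""
    return CATEGORY[(situation, significance)]


# The same model, pointed at two different decisions, lands in two categories.
use_a = samd_category("serious", "inform_management")
use_b = samd_category("critical", "treat_or_diagnose")
```

The function takes no argument describing the software, which is the point of the paragraph: the category is computed from the use alone.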

A clearance does not certify that a model is good. It authorizes one use, on one population, for one decision — and renders every other use unvalidated by construction. The envelope is the authorization; the model is merely what runs inside it.

The inversion
Section 02

IDx-DR: Autonomy Licensed Because the Envelope Was Narrow

In 2018 the FDA permitted marketing of IDx-DR, the first device authorized to use artificial intelligence to autonomously detect a condition — to deliver a diagnostic result that does not require a specialist to interpret the underlying image. It is the landmark case for autonomous diagnostic AI, and the temptation is to read it as a triumph of the model. The correct reading is the opposite: the autonomy was licensed not because the model was powerful but because the envelope was narrow.

The cleared indication is a single sentence doing an enormous amount of work. IDx-DR was indicated to detect more-than-mild diabetic retinopathy — a specific severity threshold, not retinopathy in general — in adults with diabetes not previously diagnosed with retinopathy, a precisely bounded population, using one named fundus camera, in the primary-care setting where that question is screened. Each clause is a wall of the envelope. One disease, at one threshold, in one population, on one camera. The system was not cleared to read retinas. It was cleared to answer a single binary question. The pivotal trial that licensed it reported its sensitivity and specificity on that population, that camera, that decision — numbers that are a property of the use, not the model. They say nothing about how the same software would perform reading a different camera, screening a child, or detecting a different pathology, and the authorization makes no claim that it would.

IDx-DR was permitted to act without a human reading the image not despite its narrow indication but because of it. The walls of the envelope — one disease, one threshold, one population, one camera — are what made the unattended decision defensible. Widen any wall and the evidence that licensed the autonomy no longer reaches.

Why the autonomy was grantable

The most instructive feature of the design is what the system does at the edge of its envelope. IDx-DR does not always answer. When it detects retinopathy it does not treat or prescribe; it refers the patient to an eye care professional — it hands the consequential decision up to a human. When the image quality is insufficient for confidence, it does not guess; it declines to return a result and routes the patient to imaging it can stand behind. The off-envelope behavior is defined: refer-or-refuse, both forms of the same move — abstain from the autonomous action and escalate. The autonomy and the abstention are not in tension. The defined abstention is the precondition of the autonomy.
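The refer-or-refuse pattern can be sketched in a few lines. This is an illustration of the pattern only, not IDx-DR's implementation: the `Reading` fields, both thresholds, and the `screen` function are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    NEGATIVE = "negative"  # the only result delivered without escalation
    REFER = "refer"        # hand the consequential decision to a human
    REFUSE = "refuse"      # decline to return a result at all


@dataclass(frozen=True)
class Reading:
    image_quality: float  # hypothetical quality score in [0, 1]
    disease_score: float  # hypothetical model output in [0, 1]


# Hypothetical thresholds, for illustration only.
MIN_QUALITY = 0.8
DISEASE_THRESHOLD = 0.5


def screen(reading: Reading) -> Outcome:
    # Abstention is checked first: an input outside the quality envelope
    # never reaches the diagnostic decision.
    if reading.image_quality < MIN_QUALITY:
        return Outcome.REFUSE
    # A positive finding is escalated, never acted on.
    if reading.disease_score >= DISEASE_THRESHOLD:
        return Outcome.REFER
    return Outcome.NEGATIVE
```

The ordering carries the design: the abstention check runs before the model's answer is consulted, so a confident score on an unusable image is never returned.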

Section 03

Off-Label Autonomy Is Unvalidated by Definition

The phrase doing the heaviest lifting here is by definition. The claim is not merely that off-label autonomous action is risky or under-tested — both true but weaker. The claim is that it is unvalidated by the construction of the authorization itself. A clearance is a statement of the form: this use, on this population, for this decision, supported by this evidence, is authorized. An action outside the indication is not an action the statement permits-but-warns-about; it is an action the statement does not address, because the evidence it rests on was never gathered there. Validation is a relation between a use and the evidence for that use. Where there is no evidence for the use, there is no validation — not weak validation, no validation.

Off-label use is a familiar and often legitimate clinical practice: a licensed physician may prescribe an approved drug for an indication the label does not carry, exercising judgment and accepting liability for the departure. But what makes that legitimate is a credentialed human in the loop, deciding deliberately to act beyond the evidence and owning the consequence. Transpose it to an autonomous agent and that human is gone. An agent acting off-label is not a physician exercising judgment beyond the label; it is the label acting on its own, outside the only evidence that ever authorized it, with no one having decided to make the departure — which is the precise sense in which autonomous off-label action is categorically worse than its clinical namesake.

This is why the autonomy of IDx-DR was confinable and the autonomy of a general agent is not, absent a deliberately constructed envelope. A single decision is small enough to trial. A general-purpose model has no such envelope by default. It will answer any question, read any image, take any action its tools permit — and for the overwhelming majority of those uses there is no trial, no measured performance, no evidence at all. To grant such a model autonomy on a use is therefore to grant autonomy off-label by default: outside any indication for which evidence exists. The generality that makes the model useful is the same property that makes its autonomy unvalidated everywhere the operator has not done the work of drawing and evidencing an envelope.

Section 04

The AI Mapping: Tiering Bounded by the Effect Surface

The clinical envelope translates into AI agent governance through a concrete unit, and the translation should be mechanical rather than metaphorical. Every action an agent proposes is assigned a Consequence Tier on a lattice from CT0 to CT4, computed by an out-of-process gate from the action's observed effect surface — its actual reversibility, scope, and consequence — never from the agent's self-report. The effect surface is the software analogue of the indication: the bounded region of real-world consequence the action actually touches, and the thing the governance must be scaled to.

The load-bearing rule is what the gate does when the effect surface is undetermined, and it is the precise mechanism that encodes "off-label is unvalidated." The obligation OBL-TIER-001 (tier derivation) takes the tier as the maximum over the reversibility, scope, and consequence ranks, and defaults the expected tier to CT4 the moment any axis is undetermined. An unknown reversibility is treated as irreversible, an unknown scope as public, an unknown consequence as high. Map that onto the clinical frame and it is the digital form of refer-or-refuse: when the agent cannot confirm that an action falls inside an evidenced envelope, it does not assume the action benign and proceed. It assumes the strictest tier and escalates. The default is not optimism; the default is the highest consequence, exactly as an off-label use is presumed unvalidated rather than presumed fine.
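The derivation rule can be sketched as follows. It is a minimal sketch that assumes each axis is ranked on the same CT0 to CT4 scale and that an undetermined axis is represented as `None`; the `Tier` enum and `derive_tier` function are hypothetical names, not part of any published gate.

```python
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    CT0 = 0
    CT1 = 1
    CT2 = 2
    CT3 = 3
    CT4 = 4


def derive_tier(
    reversibility: Optional[Tier],
    scope: Optional[Tier],
    consequence: Optional[Tier],
) -> Tier:
    """Maximum over the three axis ranks; any undetermined axis forces CT4."""
    ranks = (reversibility, scope, consequence)
    # Assumption: None means the gate could not observe that axis.
    if any(rank is None for rank in ranks):
        return Tier.CT4
    return max(rank for rank in ranks if rank is not None)
```

For example, `derive_tier(Tier.CT0, Tier.CT2, Tier.CT1)` yields `Tier.CT2`, while `derive_tier(Tier.CT0, None, Tier.CT0)` yields `Tier.CT4` even though both known axes are at the lowest rank.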

When the agent cannot place an action inside an evidenced envelope, the gate does not give it the benefit of the doubt. It assigns the strictest tier and demands a human. Default-to-highest is the software form of treating the off-label action as unvalidated — because it is.

The default is the discipline

The second obligation governs the input side, and it closes the route by which an envelope is most easily breached without anyone noticing. OBL-INP-001 (out-of-distribution input raises the tier or blocks) requires that any action driven by untrusted or out-of-distribution input (web content, inbound email, a population the system was not calibrated on) be gated at the higher tier or blocked, with the input's trust status written to the trace. An out-of-distribution input is the software version of a patient outside the validated population: the chest film handed to the retinal model. IDx-DR's insufficient-image-quality refusal is an out-of-distribution detector wired to abstention; OBL-INP-001 generalizes that reflex, so the boundary crossing is auditable rather than silent.
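A sketch of the input-side gate, under stated assumptions: the one-step raise, the choice to block when the action is already at CT4, and every name here (`Trust`, `Decision`, `gate_input`) are hypothetical. The text requires only that the tier is raised or the action blocked, and that the trust status is recorded.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum
from typing import Dict


class Tier(IntEnum):
    CT0 = 0
    CT1 = 1
    CT2 = 2
    CT3 = 3
    CT4 = 4


class Trust(Enum):
    TRUSTED = "trusted"
    UNTRUSTED = "untrusted"
    OUT_OF_DISTRIBUTION = "out_of_distribution"


@dataclass(frozen=True)
class Decision:
    tier: Tier
    blocked: bool
    trace: Dict[str, str]


def gate_input(base_tier: Tier, trust: Trust) -> Decision:
    # The trust status is always written to the trace, even when nothing changes.
    trace = {"input_trust": trust.value, "base_tier": base_tier.name}
    if trust is Trust.TRUSTED:
        return Decision(base_tier, False, trace)
    # Hypothetical policy: no higher tier is left to raise to, so block.
    if base_tier is Tier.CT4:
        trace["action"] = "blocked"
        return Decision(base_tier, True, trace)
    raised = Tier(base_tier + 1)
    trace["action"] = "raised_to_" + raised.name
    return Decision(raised, False, trace)
```

Writing the trace on the trusted path as well is deliberate in this sketch: an auditor can then distinguish "input was checked and trusted" from "input was never checked".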

Two further units complete the mapping. The governance unit is not the model, nor even the agent, but the (agent, task-class) pair: this agent, on this class of task, is evidenced to a certain tier; the same agent on a different task-class is a different governance object with its own evidence requirement, exactly as the same model on a different indication is a different regulated device. And the envelope is revocable — granted on evidence, held only while the evidence holds, and revoked when drift, distribution shift, or a post-market signal pulls the measured behavior outside it.
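The (agent, task-class) unit and its revocability can be sketched as a small registry. This is one possible shape, assuming tiers are represented as integers 0 to 4; `Grant`, `EnvelopeRegistry`, and all identifiers in it are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Grant:
    max_tier: int       # highest tier evidenced for unattended action (0..4)
    evidence_ref: str   # pointer to the evidence the grant rests on
    revoked: bool = False
    revoke_reason: str = ""


class EnvelopeRegistry:
    """Grants are keyed by the (agent, task_class) pair, never by agent alone."""

    def __init__(self) -> None:
        self._grants: Dict[Tuple[str, str], Grant] = {}

    def grant(self, agent: str, task_class: str, max_tier: int, evidence_ref: str) -> None:
        self._grants[(agent, task_class)] = Grant(max_tier, evidence_ref)

    def revoke(self, agent: str, task_class: str, reason: str) -> None:
        # Revocation on drift, distribution shift, or a post-market signal.
        entry = self._grants[(agent, task_class)]
        entry.revoked = True
        entry.revoke_reason = reason

    def may_act_unattended(self, agent: str, task_class: str, action_tier: int) -> bool:
        entry = self._grants.get((agent, task_class))
        # No grant for this exact pair means no envelope, so no autonomy.
        if entry is None or entry.revoked:
            return False
        return action_tier <= entry.max_tier
```

Because the lookup key is the pair, evidence gathered for one task class cannot leak into another: the same agent on an ungranted task class gets the same answer as an unknown agent.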

Section 05

The Operator Posture: Define, Detect, Refuse

The discipline this brief asks of an operator is a posture adopted before the agent acts, not a review conducted after, and it has three moves. The first is to define the indication. Before granting an agent any autonomous authority on a task class, write down the envelope: the population of inputs it is validated for, the decisions it is permitted to make unattended, the tier each sits at, and the evidence that supports the grant. An agent with no written indication is not a broadly-capable agent; it is an agent operating off-label everywhere, because there is no envelope for any of its actions to be inside.
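A written indication can be as simple as a record with the four fields named above. The sketch below is one possible shape; the `Indication` class, its field names, and the example values are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Mapping, Tuple


@dataclass(frozen=True)
class Indication:
    task_class: str
    input_population: str                    # what inputs it is validated for
    unattended_decisions: Mapping[str, str]  # permitted decision -> tier label
    evidence_refs: Tuple[str, ...]           # what supports the grant

    def __post_init__(self) -> None:
        # A grant with no evidence behind it is not an envelope.
        if not self.evidence_refs:
            raise ValueError("indication requires at least one evidence reference")

    def permits(self, decision: str) -> bool:
        # Anything not written down is outside the envelope.
        return decision in self.unattended_decisions


# Hypothetical example values, for illustration only.
invoice_triage = Indication(
    task_class="invoice-triage",
    input_population="PDF invoices from onboarded vendors",
    unattended_decisions={"route_to_queue": "CT1", "flag_duplicate": "CT0"},
    evidence_refs=("eval-report-placeholder",),
)
```

Here `invoice_triage.permits("issue_refund")` is `False` without any explicit deny rule, because the record enumerates what is inside the envelope rather than what is outside it.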

The second move is to detect off-label and out-of-distribution. The envelope is worthless if the system cannot tell when it is leaving it. The gate must compute, per action, whether the action falls inside an evidenced indication — and when it cannot establish that, default to the strictest tier rather than assume the action benign. A system that cannot recognize an input it was not validated for will act confidently on exactly the inputs where it is most likely to be silently wrong.

The third move is to refuse outside the envelope. Detection without refusal is theater. When an action falls outside the evidenced indication, the system must do what IDx-DR does at its boundary — refer to a human or decline to act — rather than proceed on the borrowed authority of a clearance that does not reach the action. The refusal is not a degradation of the autonomy; it is the condition that makes the autonomy grantable in the first place.

Define the indication; detect off-label and out-of-distribution; refuse outside it. The envelope is the deliverable, not the model.

The engineering posture before acting

A model is never cleared; a use is. The instinct to read an authorization as a competence certificate for the software is the same instinct that reads an impressive demo as a license for autonomous deployment, and it fails for the same reason: it mistakes a property of one bounded use for a property of the general system. The first autonomous diagnostic AI was licensed not because its model was trusted but because its envelope was narrow and its exits were defined — and that is the entire transferable lesson. Build the envelope. Evidence it. Detect departure from it. Refuse outside it.

This brief is the short version. The in-depth edition of "Intended Use Is the Envelope" carries the full FDA SaMD and IDx-DR record, the consequence-tier machinery, the statutory assistive/autonomous line, and the citations. It is Article 1 of a three-article clinical governance stack; read it with "Risk Is Measured in Harm, Not Accuracy" and "The Clinician Is the Diversion Airport".
