kellerai.blog

The Clinician Is the Diversion Airport

Clinical-AI autonomy is not the removal of the clinician — it is the guaranteed-reachable fallback that licenses the autonomy.

KellerAI White Paper · In-Depth · Engineering Discipline & Verification · Jun 2026 · ~27 min read

Context

Clinical-AI autonomy is misread as removing the clinician — engineering the human out once the model is accurate enough. The reality runs the opposite way: autonomy is licensed by a guaranteed-reachable fallback for every action above a harm threshold, and by proof of safety produced continuously after deployment rather than asserted once at clearance. It is aviation's ETOPS diversion-airport doctrine, ported to the clinic.

The Finding

The fallback is what licenses the autonomy. Keep a clinician reachable for high-harm actions, make the append-only decision trace your post-market surveillance, and contract authority the moment harm signals rise.

Tags: Clinical AI Governance · Human Oversight · Post-Market Surveillance · FDA PCCP · ISO 14971 · Consequence-Scaled Autonomy
Cite this paper

KellerAI. (2026, June 21). The Clinician Is the Diversion Airport: Human Oversight, Post-Market Surveillance, and the PCCP. KellerAI. https://kellerai.blog/the-clinician-is-the-diversion-airport-in-depth

Paper Details
Category: Engineering Discipline & Verification
Audience: Clinical-AI engineering leads, medical-device regulatory and quality teams, and health-system AI governance owners.
Method: Doctrine read-across — FDA CDS/PCCP, ISO 14971 post-market surveillance, and IEC 62304 lifecycle mapped to human-reachable, continuously-monitored autonomous-agent governance, with primary-source citations.
Length: ~6,500 · ~27 min
Reading level: Technical
Sections: 9
References: 23
Version: v1.0 · Updated Jun 2026
Published: Jun 2026
Key Takeaways
  • Autonomy in medicine is not the removal of the clinician but the guaranteed reachability of one for every action above a harm threshold — the clinician is the ETOPS diversion airport, the pre-positioned fallback whose reachability is what licenses the autonomy in between.
  • Safety is a lifecycle obligation, not a launch gate: ISO 14971's post-production loop, 21 CFR Part 803 / MAUDE adverse-event reporting, and the append-only decision trace (OBL-TRC-001) are one continuous-evidence substrate, because a model validated at clearance degrades silently under dataset shift.
  • The FDA's December 2024 Predetermined Change Control Plan pre-bounds, monitors, and revokes the authority to change a model rather than re-asserting it at every update — the regulatory twin of a runtime envelope that contracts automatically (OBL-AGG-001) the moment the measured escape rate rises.
Section 01

The Fallback Is What Licenses the Autonomy

Start with the aviation doctrine, because it states the principle in its purest form. A twin-engine airliner is not permitted to fly a transoceanic route because a regulator judged it broadly safe. It is permitted because, at every point on the route, an adequate alternate airport stays within a defined diversion budget — the maximum time the aircraft could be from a runway with a single engine inoperative, in still air. 11 The tier names, ETOPS-120 through ETOPS-370, are exactly those budgets in minutes. The autonomy to fly far from land is sold by the minute, and the price is the guaranteed reachability of the fallback. Take the diversion airport away and the autonomy evaporates with it: the same aircraft, the same engines, the same crew, are barred from the route the instant no runway is reachable within budget. The fallback does not merely improve the operation. It is the thing that authorizes it.

A companion paper in this series develops that doctrine — the always-a-runway discipline — for AI autonomy in domains where being confidently wrong is expensive. 20 This paper is its clinical instantiation, and the substitution is exact. In medicine the diversion airport is a reachable clinician, or more generally a reachable human escalation path, for any action whose consequence crosses a harm threshold. An autonomous clinical system is licensed to act on its own precisely as far as a qualified human stays reachable when its recommendation reaches the harm tier where a wrong call cannot be cheaply undone. The envelope of unsupervised action is bounded by the reachability of the fallback, exactly as the over-water envelope is bounded by the reachability of the runway.

This reframes the human-in-the-loop debate. The question is not whether a clinician is involved at all — a binary that invites the lazy answer of removing the human as the model improves. The question is the same one aviation asks of the runway: is the fallback reachable when it is needed, and at what consequence tier does it become mandatory? A read-only surfacing of a suggestion the clinician will weigh before acting needs no standing escalation, because the human is already the actor. An irreversible, high-consequence action — a dosing instruction that will be administered, a stop-treatment call — needs the clinician reachable as a hard precondition, because the action crosses the threshold past which the model's error cannot be recalled. Between those poles sits a lattice of consequence, and the reachability requirement scales up it.

A twin flies far from land only because a runway stays reachable the whole way. A clinical AI acts on its own only as far as a clinician stays reachable when the action crosses the harm tier. The fallback is not the limit on autonomy — it is the license for it.

The diversion-airport doctrine, ported

The two papers that precede this one in the stack supply the bounds the license sits inside. The first establishes that a model's intended use is its operating envelope — the indication inside which any safety claim is even meaningful, and outside which the system has no business acting. 19 The second establishes the governing variable inside that envelope: not accuracy, but harm-weighted integrity — the bounded rate at which a wrong claim reaches a decision-maker without a warning, weighted by what the wrong claim costs. 18 This paper takes the bounded envelope and the harm-weighted variable as given and asks the operational question they leave open: what makes that bounded autonomy continuously defensible once the system is in production? The answer has two halves — a reachable fallback for high-harm actions, and continuous proof of safety after deployment — and the rest of the paper is those two halves and the regulatory machinery that already implements both.

Section 02

The Carve-Out That Already Encodes Reachability

Congress wrote the reachable-clinician principle into law before the current wave of clinical AI existed, and it is worth reading the statute closely because it draws the exact boundary this paper argues for. Section 3060 of the 21st Century Cures Act, enacted in December 2016, amended the Federal Food, Drug, and Cosmetic Act to add §520(o)(1)(E), which excludes certain clinical decision support software from the definition of a medical device — and therefore from device regulation — provided it meets four criteria. 3 The first three concern what the software consumes and produces: it must not acquire or analyze a medical image or a signal from a diagnostic device; it must display or analyze medical information normally communicated between clinicians; and it must offer recommendations to a healthcare professional rather than a directive output. 3 The fourth criterion is the one that matters here, and it is precisely the reachability principle in statutory form.

Non-device CDS must provide the healthcare professional with sufficient information about the basis of its recommendation that the professional can independently review it, and must not be intended for the professional to rely primarily on the recommendation in making a clinical decision. 3 4 Read that as an autonomy boundary. Software that explains itself well enough for a clinician to check the reasoning — and that is positioned as an input the clinician weighs rather than a verdict the clinician defers to — stays on the assistive side of the line and outside device regulation. The instant the software makes the call, or makes its basis unreviewable so that the clinician must rely primarily on it, it crosses into being a regulated device. The FDA's final Clinical Decision Support Software guidance, issued in September 2022, interprets exactly this fourth criterion: the independent-review requirement is what separates a tool a clinician supervises from a tool that supervises the clinician. 4

The structural insight is that the law has already located the diversion airport. The reviewable basis is the reachability of the human override. A recommendation whose rationale a clinician can independently inspect keeps the clinician in the position to divert — to reject, to modify, to escalate — because the clinician retains the information needed to land the decision somewhere other than where the model pointed. A recommendation the clinician must take on faith has foreclosed the diversion: there is no runway, because there is no basis on which to choose a different one. The Cures Act carve-out, in other words, regulates exactly the property this paper calls reachability, and it does so at exactly the right boundary — the line past which the human can no longer independently arrive at a different answer.

This maps cleanly onto the consequence lattice the framework paper in this stack defines. 17 Assistive CDS with a reviewable basis lives in the lower tiers, where the human is the actor and the model is an input; autonomous diagnosis lives at the top, where the model's output is the decision and the law treats it as a device subject to the full risk apparatus. The carve-out is the regulatory expression of consequence-scaled oversight: the more the software substitutes for the clinician's judgment rather than informing it, the more regulation attaches — because the fallback has receded. The shape of the deployed field reflects this boundary: of the many hundreds of AI-enabled devices the FDA has authorized, the overwhelming majority were cleared as assistive software through the 510(k) pathway rather than as autonomous diagnostic devices — the market has clustered, so far, on the side of the line where the clinician stays the decider. 23 An engineering team building clinical AI can read the four criteria as a design specification, not merely a classification test: keep the basis reviewable, keep the clinician the decider above the harm threshold, and the autonomy you ship is autonomy with a reachable runway built into its interface.

A recommendation a clinician can independently check leaves the clinician able to divert. A recommendation taken on faith has foreclosed the diversion. The Cures Act drew the device boundary at exactly the line past which the human can no longer reach a different answer.

The reviewable basis is the runway

One honest qualification belongs here, because the override has to be real and not nominal. A clinician who is technically able to review a recommendation but who, in practice, reflexively dismisses or rubber-stamps it has a fallback in form but not in function. The literature is blunt about this failure mode: a pooled analysis of drug-drug-interaction alerts found physicians overriding ninety percent of them, with acceptance falling further with each additional alert in an encounter. 15 A diversion airport that exists on the chart but is fogged in is not a diversion airport. Designing for genuine reachability means designing against alert fatigue and automation bias as much as for the presence of a review step — a point Section 6 returns to, because it is where the clinical fallback most often fails silently.

Section 03

Safety Is a Lifecycle, Not a Launch Gate

The second half of the license is continuous proof, and the device world codified it long before clinical AI arrived. ISO 14971, now in its 2019 third edition, is the standard that governs risk management for every device sold into a regulated market, and its most underappreciated feature is that it does not stop at clearance. 5 The standard mandates a production and post-production phase: once a device is on the market, the manufacturer must actively collect information from the field — adverse events, performance data, the actual experience of use — and feed it back into the risk-management file, re-evaluating whether the risks estimated before launch still hold and whether new ones have appeared. 5 Risk management under ISO 14971 is a closed loop that runs for the life of the device, not a dossier assembled once and filed.

This is the conceptual core of the diversion-airport doctrine applied to evidence rather than to escalation. ETOPS approval is not a certificate earned once; it is contingent on a continuing reporting system that tracks the world fleet's reliability on a rolling basis and can contract the authority if the measured failure rate drifts upward. 11 20 ISO 14971's post-production loop is the same instrument: a standing obligation to keep watching, with the authority to revise the safety case when the field data diverges from the launch assumptions. A clinical AI whose risk file was completed at clearance and never reopened is the medical equivalent of an airline that flew its ETOPS hours once, framed the certificate, and stopped tracking the fleet rate. The certificate is not the safety. The continuing measurement is.

The companion process standard supplies the change-control machinery the loop needs. IEC 62304, the medical-device software lifecycle standard, defines both a software maintenance process and a problem-resolution process: every change to released software is handled under a controlled procedure with documentation, verification, and configuration management, so that a fix or an update cannot silently alter the safety posture of the deployed system. 6 For conventional software this is housekeeping. For a learning system whose behavior can shift with a retrain, a re-prompt, or a swapped dependency, it is the backbone that makes a change auditable rather than invisible — and it is exactly the backbone the Predetermined Change Control Plan builds on, as Section 5 develops.

The reframe ISO 14971 forces onto clinical AI is the one this stack's second paper drew for the variable being measured, now extended along the time axis. 18 A model's safety is not a property established at validation and thereafter assumed; it is a measured quantity that decays. The mechanism of decay has a name in the clinical literature — dataset shift, the gradual divergence between the population a model was validated on and the population it now serves, as case mix, documentation practice, and care patterns move underneath it. 12 A model validated at one health system can degrade silently at another, or at the same one a year later, with not a single weight changing. 12 ISO 14971's post-production loop exists precisely because safety established at a point in time does not stay established, and the only defense is to keep measuring. The launch gate proves the system was safe once. The lifecycle proves it still is.
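One way to make that decay visible is to keep a rolling field error rate next to the rate measured at validation. The sketch below is illustrative only: the class name `DriftMonitor`, the window size, and the tolerance are assumptions rather than anything ISO 14971 or the cited literature prescribes, and it presumes each field prediction is eventually adjudicated as right or wrong.

```python
from collections import deque
from typing import Optional


class DriftMonitor:
    """Compare a rolling field error rate with the error rate seen at validation."""

    def __init__(self, baseline_error_rate: float, tolerance: float, window: int):
        self.baseline = baseline_error_rate   # error rate measured at validation
        self.tolerance = tolerance            # hypothetical allowed absolute increase
        self.outcomes = deque(maxlen=window)  # 1 = model was wrong, 0 = model was right

    def record(self, was_wrong: bool) -> None:
        """Add one adjudicated field outcome; the oldest one falls out of the window."""
        self.outcomes.append(1 if was_wrong else 0)

    def field_error_rate(self) -> Optional[float]:
        """Return the rolling error rate, or None until the window is full."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def has_drifted(self) -> bool:
        """True when the field rate exceeds the validation rate plus the tolerance."""
        rate = self.field_error_rate()
        return rate is not None and rate > self.baseline + self.tolerance
```

Nothing about the model changes between the two readings; only the population underneath it does, which is why the comparison has to run continuously rather than once.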

Section 04

The Continuous Evidence Stream

A lifecycle obligation is empty without a mechanism to feed it, and the device world built one decades ago. Title 21 of the Code of Federal Regulations, Part 803 — Medical Device Reporting — mandates that manufacturers, importers, and device user facilities report adverse events: deaths and serious injuries to which a device may have contributed, and malfunctions that would be likely to cause death or serious injury if they recurred, on defined timelines. 7 Those reports flow into the Manufacturer and User Facility Device Experience database — MAUDE — the public repository that turns scattered field incidents into a queryable, standing record of how deployed devices actually fail. 8 MDR and MAUDE are the post-market surveillance plumbing: the continuous evidence stream that the ISO 14971 lifecycle loop consumes.

The structural lesson for clinical AI is that post-market surveillance requires a substrate — an append-only, queryable record of what the system did and what happened next — and that this substrate is an engineering artifact, not a compliance afterthought. The framework paper in this stack names exactly such a substrate as a cross-cutting obligation: OBL-TRC-001 — the append-only, tamper-evident decision trace. 17 Every consequential action an agent commits is written to an append-only sink the actor cannot rewrite, gated out-of-process so the system cannot suppress its own record, with the evidence bundled and signed. 17 Read MDR and MAUDE as the regulatory precedent and OBL-TRC-001 as the runtime mechanism: both exist so that the question "how is this system failing in the field?" has an evidentiary answer rather than an anecdotal one.

The decision trace is, in the clinical setting, the post-market surveillance database for the AI itself. Where MAUDE records device adverse events after a human files a report, the decision trace records every action at the moment it is committed — the tier the gate assigned, the verifier that checked it, the verdict, the evidence, and whether a human approved it. 17 That makes two things possible that a passive reporting system alone cannot. First, it makes the escape rate measurable: the rate at which a wrong, consequential output reached a decision-maker without a warning can be computed from the trace and tracked over time, which is the harm-weighted integrity variable the second paper insisted on, now instrumented. 18 Second, it makes the surveillance active rather than reactive — MAUDE's own documented limitation is that it is a passive system with reporting biases, capturing only what someone chose to file 8 — whereas a trace captures every action by construction, including the ones no one would have thought to report.
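As a rough illustration of how the escape rate could be read off such a trace, the sketch below counts outputs that were wrong and carried no warning. The record fields (`tier`, `wrong`, `warned`, `harm_weight`) and the weighting scheme are assumptions made for the example; the framework paper's actual schema and weights are not reproduced here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TraceRecord:
    tier: int           # consequence tier the gate assigned (0..4)
    wrong: bool         # output later adjudicated as incorrect
    warned: bool        # a warning or abstention accompanied the output
    harm_weight: float  # hypothetical cost weight for an error at this tier


def escape_rate(records: Sequence[TraceRecord]) -> float:
    """Share of outputs that were wrong and reached the decision-maker unwarned."""
    if not records:
        return 0.0
    escapes = sum(1 for r in records if r.wrong and not r.warned)
    return escapes / len(records)


def harm_weighted_escape_rate(records: Sequence[TraceRecord]) -> float:
    """Same ratio, with every record counted in proportion to its harm weight."""
    total = sum(r.harm_weight for r in records)
    if total == 0:
        return 0.0
    escaped = sum(r.harm_weight for r in records if r.wrong and not r.warned)
    return escaped / total
```

A wrong output that arrived with a warning does not count as an escape, which is why the warning flag has to be written to the trace at commit time rather than reconstructed afterwards.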

ISO 14971 demands a post-market loop. MDR and MAUDE feed it for devices. For a clinical AI the substrate is the append-only decision trace — OBL-TRC-001 — which makes the field failure rate a measured number instead of a filed anecdote.

Surveillance needs a substrate

There is a reason the trace must be append-only and out of the system's own reach, and it is the same reason MAUDE is a public external database rather than a log the manufacturer keeps privately. A surveillance record the watched system can edit is not surveillance; it is a system grading its own field performance. The framework paper makes enforcement-plane integrity a hard requirement for exactly this reason: a trace asserting an in-process or unsigned gate is non-conformant regardless of its verdict, because a record the actor could have tampered with is no record at all. 17 The independence that bank supervision calls effective challenge and that ISO 14971 implies in its demand for an auditable file is, in the runtime, the property that the evidence stream is produced by a mechanism the agent cannot suppress. 21 Continuous proof of safety is only proof if the proof cannot be quietly rewritten by the thing it indicts.
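Tamper evidence can be illustrated with a hash chain, in which each entry commits to the hash of the one before it. This is a minimal sketch of that general technique, not the framework paper's mechanism: the entry layout and function names are hypothetical, and it omits the signing and the out-of-process sink that the obligation also requires.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list, payload: dict) -> list:
    """Return a new chain with one more entry linked to the current tail."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"payload": payload, "prev": prev_hash, "hash": _digest(prev_hash, payload)}
    return chain + [entry]


def verify(chain: list) -> bool:
    """Recompute every link; any edited payload or broken link fails the check."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain like this only detects edits. Preventing the actor from rewriting the whole chain still depends on the sink living outside the actor's reach, which is the point of the enforcement-plane requirement.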

Section 05

The PCCP: Pre-Authorized, Bounded, Monitored Change

The hardest problem in governing a learning system is change. A model that is validated once and frozen is governable by the classical apparatus, but the entire value of a learning system is that it improves — and every improvement is a change that invalidates the validation that preceded it. The naive regulatory answer is to re-clear the device on every update, which is so slow it would either freeze the model or push the changes outside the regulatory perimeter entirely. The FDA's answer, finalized in December 2024, is the centerpiece of this paper because it is the exact mechanism the diversion-airport doctrine implies: the Predetermined Change Control Plan. 1

The PCCP guidance — its full title is Marketing Submission Recommendations for a Predetermined Change Control Plan for Artificial Intelligence-Enabled Device Software Functions, issued December 4, 2024 — lets a manufacturer specify, in advance and as part of the original authorization, the modifications it intends to make to a model after clearance, together with the protocol by which those modifications will be developed, validated, and rolled out, and an assessment of their impact. 1 A PCCP has three required components: a Description of Modifications that bounds what may change; a Modification Protocol that fixes how each change will be made and verified; and an Impact Assessment that analyzes the risk the changes introduce. 1 The components are the lineal descendants of the SaMD Pre-Specifications and Algorithm Change Protocol the FDA floated in its 2019 discussion paper and 2021 AI/ML Action Plan — the Description of Modifications and Modification Protocol are the renamed and matured SPS and ACP. 2 The effect is decisive: changes that fall inside the authorized PCCP can be implemented without a new marketing submission, while changes outside it still require one. 1

Read the structure carefully, because it is the same object as the ETOPS envelope and the LAAS consequence lattice. The authority to change the model is pre-bounded: the Description of Modifications draws the envelope of permitted change before any change is made. It is monitored: the Modification Protocol specifies the verification each change must pass and the performance it must maintain, and the post-market obligations keep watching. And it contracts: a change that drifts outside the envelope, or a performance signal that breaches the protocol's bounds, forces the change back into full review rather than letting it pass on the pre-authorization. 1 The PCCP does not grant a model the right to change itself however it likes. It grants a bounded, monitored envelope of change and reserves the authority to pull it back. Authority that is pre-authorized, bounded, and revocable on drift is the diversion-airport doctrine applied to the act of modification itself.
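The pre-bounded, monitored, contracting structure can be sketched as a routing check. Everything named here is an assumption for illustration: the guidance does not define a data model, and a real Modification Protocol would carry far more than two performance floors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEnvelope:
    permitted_modifications: frozenset  # stands in for the Description of Modifications
    min_sensitivity: float              # stands in for Modification Protocol acceptance bounds
    min_specificity: float


@dataclass(frozen=True)
class ProposedChange:
    modification: str   # hypothetical label for the kind of change
    sensitivity: float  # performance measured under the protocol
    specificity: float


def route_change(envelope: ChangeEnvelope, change: ProposedChange) -> str:
    """Return 'within_plan' only if the change is both permitted and within bounds."""
    if change.modification not in envelope.permitted_modifications:
        return "new_submission"  # outside the pre-authorized envelope
    if change.sensitivity < envelope.min_sensitivity:
        return "new_submission"  # performance breached the protocol's bounds
    if change.specificity < envelope.min_specificity:
        return "new_submission"
    return "within_plan"
```

The default in this sketch is the conservative route: a change has to earn the pre-authorized path on both counts, and anything else falls back to full review.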

The PCCP fixes the envelope of permitted change once, monitors every change against it, and contracts the authority the moment a change or a performance signal leaves the envelope. The right to change the model is pre-authorized and revocable — never re-litigated update by update, never granted without bounds.

Authority pre-bounded, not re-asserted

The PCCP is the regulatory twin of the assurance standard's pre-authorized envelope, and the correspondence runs deeper than analogy. The framework paper describes a standing, pre-authorized operating envelope inside which an agent acts without re-asserting authority at every step, and an out-of-process gate that contracts that envelope as the measured escape rate rises. 17 The obligation that bounds correlated change in that framework is OBL-AGG-001, aggregation and error-correlation bounding. It requires verifier independence, measured by a phi coefficient at or below 0.2 on the upper bound of a 95% confidence interval, and it contracts the envelope as the measured escape rate climbs. That makes it the runtime expression of exactly the PCCP discipline: a bounded envelope whose authority tightens automatically as the safety signal degrades. 17 The PCCP bounds change at the regulatory layer over a release cadence; OBL-AGG-001 bounds it at the runtime layer over an action stream. Both refuse the two failure modes at the extremes — re-asserting authority on every change, which is unworkable, and granting unbounded authority once, which is unsafe — and both land on the same middle: a pre-specified envelope, continuously monitored, that contracts when the evidence turns.

This is also where post-market surveillance stops being a courtesy and becomes the load-bearing input to the change authority. A PCCP's Modification Protocol is only as good as the field data that tells you whether a change held its performance, and that data is the MDR/MAUDE stream and the decision trace from the previous sections. 7 8 17 The machinery composes into one loop: the envelope of permitted change is fixed in advance; each change is verified against the protocol; the deployed system's performance is surveilled continuously through the trace and the adverse-event stream; and when the surveillance shows drift, the authority contracts and the change returns to review. The diversion airport, the lifecycle risk file, the evidence stream, and the change envelope are not four separate requirements. They are one doctrine seen from four angles.

Section 06

Mapping to Concrete Units

The doctrine is only useful if it reduces to units an engineering team can build. It does, and the framework paper supplies the vocabulary: a consequence lattice from CT0 to CT4, an append-only trace, a bounded and backtested escape rate, and a set of cross-cutting obligations. 17 Mapping the clinical diversion-airport doctrine onto those units is direct, and it makes the abstract requirements of the preceding sections concrete enough to specify.

Human approval at the top tier — OBL-HUM-001, human approval required at CT4. The irreversible, high-consequence clinical action — a dosing instruction that will be administered, a stop-treatment call, a discharge — sits at CT4, the top of the lattice, where the framework requires human approval as a hard precondition before the action commits. 17 This is the reachable clinician made mechanical: the diversion airport is not a hope that someone will review the call but a gate that will not let the call through until a qualified human says yes. The CDS carve-out's fourth criterion and OBL-HUM-001 are the same requirement at two layers — the law says a clinician must be able to independently review the basis; the runtime says the action does not commit until the clinician does. 3 17

Abstention and rollback — OBL-IRR-001, irreversibility handling. When the reversibility of an action is unknown, the framework treats it as irreversible, and at the edge of the operating envelope the default is to abstain rather than to act. 17 This is the always-a-runway rule encoded as a precondition: a clinical AI that cannot determine whether its recommendation can be taken back does not get to assume it can — it inherits the most conservative tier and abstains, escalating to the clinician. 20 The missing move in the cautionary cases this stack anatomizes was exactly this: a system that emitted a confident output on every input, with no mechanism to decline when its evidence was thin or out of distribution. 9 Abstention is the runway the system keeps for itself — the principled refusal to commit when the certificate is absent and the fallback is the only safe destination.
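The approval gate and the irreversibility rule described in the two items above can be sketched together as one precondition check. The names `Tier`, `Decision`, and `gate` are hypothetical, and raising every action not known to be reversible to CT4 is an assumption of this sketch; the framework paper's exact tiering rules are not reproduced here.

```python
from enum import Enum, IntEnum
from typing import Optional


class Tier(IntEnum):
    CT0 = 0
    CT1 = 1
    CT2 = 2
    CT3 = 3
    CT4 = 4


class Decision(Enum):
    COMMIT = "commit"
    ESCALATE = "escalate"  # hold the action until a clinician approves it
    ABSTAIN = "abstain"    # decline to act at all


def effective_tier(assigned: Tier, reversible: Optional[bool]) -> Tier:
    """Treat unknown reversibility (None) exactly like known irreversibility."""
    if reversible is not True:
        return Tier.CT4  # assumption: most conservative tier
    return assigned


def gate(assigned: Tier, reversible: Optional[bool],
         inside_envelope: bool, clinician_approved: bool) -> Decision:
    """Decide whether an action may commit, must wait for a human, or is declined."""
    if not inside_envelope:
        return Decision.ABSTAIN  # outside the intended-use envelope
    if effective_tier(assigned, reversible) == Tier.CT4 and not clinician_approved:
        return Decision.ESCALATE
    return Decision.COMMIT
```

For instance, `gate(Tier.CT1, None, True, False)` escalates even though the assigned tier is low, because the action's reversibility was never established.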

The decision trace as the surveillance substrate — OBL-TRC-001, the append-only tamper-evident trace. As Section 4 developed, the trace is the post-market surveillance database for the AI, gated out-of-process and written to an append-only sink so the system cannot edit its own field record, with each action's evidence bundled and signed. 17 It is the substrate the ISO 14971 lifecycle loop consumes and the instrument that makes the escape rate a measured quantity rather than an asserted one. 5 18

The contracting envelope — OBL-AGG-001, aggregation and error-correlation bounding. Verifier independence is enforced by a measured error-correlation — a phi coefficient required at or below 0.2 on the upper bound of a 95% confidence interval, so that a second model wrong in the same way as the first does not count as a check — and the operating envelope contracts automatically as the measured escape rate, or the adverse-event rate, rises. 17 This is the PCCP and the ETOPS reporting rule in runtime form: authority that tightens on its own as the safety signal degrades, with no human needing to notice first. 1 11
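The independence test can be sketched from two verifiers' per-case error indicators. The phi coefficient below is the standard 2×2 formula; the confidence bound uses a Fisher z-transform approximation, which is an assumption of this sketch, since the source does not say how the interval is computed. Function names are hypothetical.

```python
import math
from typing import Sequence


def phi_coefficient(errors_a: Sequence[bool], errors_b: Sequence[bool]) -> float:
    """Phi (binary Pearson) correlation between two verifiers' error indicators."""
    if len(errors_a) != len(errors_b):
        raise ValueError("error vectors must cover the same cases")
    pairs = list(zip(errors_a, errors_b))
    n11 = sum(1 for a, b in pairs if a and b)          # both wrong
    n10 = sum(1 for a, b in pairs if a and not b)      # only A wrong
    n01 = sum(1 for a, b in pairs if not a and b)      # only B wrong
    n00 = sum(1 for a, b in pairs if not a and not b)  # both right
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        raise ValueError("phi is undefined when a verifier never or always errs")
    return (n11 * n00 - n10 * n01) / denom


def phi_upper_bound(phi: float, n: int, z_crit: float = 1.96) -> float:
    """Approximate upper 95% bound via the Fisher z-transform (an assumption)."""
    if n <= 3:
        raise ValueError("need more than 3 cases for the approximation")
    clipped = max(min(phi, 0.999999), -0.999999)  # keep atanh finite
    return math.tanh(math.atanh(clipped) + z_crit / math.sqrt(n - 3))


def verifiers_independent(errors_a: Sequence[bool], errors_b: Sequence[bool],
                          threshold: float = 0.2) -> bool:
    """Pass only if the upper bound on phi is at or below the threshold."""
    phi = phi_coefficient(errors_a, errors_b)
    return phi_upper_bound(phi, len(errors_a)) <= threshold
```

Testing the upper bound rather than the point estimate means a small sample cannot pass on a lucky low correlation: with few cases the interval is wide and the check fails until more evidence accumulates.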

Model and vendor attribution — OBL-VEN-001, vendor and supply-chain attribution. Every action carries the provenance of the model and any third-party tool that produced it; the residual error of a vendor model is charged to the deploying operator's escape-rate budget, and untrusted dependencies fail closed. 17 This is the clinical analogue of the device world's refusal to let a buyer outsource the duty to govern — the deploying hospital owns the risk of the model it runs, regardless of who built it, exactly as bank supervision makes the deploying institution own the risk of a vendor model. 21 When a proprietary model is embedded across hundreds of hospitals, attribution is what makes its field failure rate traceable to a responsible operator rather than diffused into the vendor's marketing claims.
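A fail-closed admission check is one simple way to picture the attribution rule. The `Provenance` fields and the trusted-component registry are hypothetical, introduced only for this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Provenance:
    model_id: str                # identifier of the model that produced the output
    tools: Tuple[str, ...] = ()  # identifiers of any third-party tools involved


def admit(provenance: Optional[Provenance], trusted: frozenset) -> bool:
    """Fail closed: missing provenance or any unrecognized component blocks the action."""
    if provenance is None:
        return False
    components = (provenance.model_id,) + tuple(provenance.tools)
    return all(component in trusted for component in components)
```

The check denies by default, so a dependency added without being registered blocks the action rather than passing through unattributed.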

One caution belongs in this mapping, because it is where the clinical fallback fails most often and most quietly. A reachable clinician who is buried under low-yield alerts is not, in function, reachable — the documented ninety-percent override rate on drug-interaction alerts is what happens when the fallback is technically present but practically saturated. 15 The contracting-envelope obligation is part of the defense: a system whose alert volume drives the escape rate up — because the clinician has stopped reading — should see its autonomy contract, not expand, because the reachability it depends on has degraded. The harm-weighted second paper made the same point about alert fatigue as a harm in its own right. 18 Designing the diversion airport means designing the clinician's attention budget, not just the existence of a review step.
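The contraction itself can be pictured as a step function from the measured escape rate to the highest tier the system may act on without approval. The ceilings below are placeholders chosen for the example, not values from the framework paper.

```python
from typing import Sequence, Tuple

# Hypothetical (escape-rate ceiling, highest autonomous tier) pairs, strictest first.
DEFAULT_BOUNDS: Tuple[Tuple[float, int], ...] = ((0.01, 3), (0.02, 2), (0.05, 1))


def max_autonomous_tier(escape_rate: float,
                        bounds: Sequence[Tuple[float, int]] = DEFAULT_BOUNDS) -> int:
    """Return the highest tier the agent may commit unaided at this escape rate."""
    for ceiling, tier in bounds:
        if escape_rate <= ceiling:
            return tier
    return 0  # above every ceiling: only the lowest tier stays autonomous
```

Because the mapping never increases as the escape rate rises, an alert stream that saturates the clinician and pushes the measured rate up can only shrink the autonomous envelope.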

Section 07

The Forecast and the Foreclosed Fallback

The doctrine has a precise failure shape, and a named case anatomizes it. On 7 October 2013, a Royal New Zealand Air Force Boeing 757, callsign NZ7571, flew from Christchurch toward Pegasus Field on the Ross Ice Shelf in Antarctica with 130 people aboard. 16 The aircraft could not return to Christchurch without refueling at Pegasus, so a point of safe return was computed before departure — the last position from which it retained the fuel to turn around and reach its origin. The designed fallback was a return to Christchurch, and that fallback had an expiry. As the flight approached the point of safe return, forecasters assured the crew the weather at Pegasus would improve and cleared it to continue. Roughly twenty minutes after the aircraft crossed the point — its fallback now foreclosed by fuel and range — a fog bank enveloped the runway in near-whiteout. 16 The forecast on which the irreversible commitment had been made diverged from the reality the aircraft then had to fly into. The crew flew three approaches and landed safely below minima; the inquiry, AO-2013-009, faulted not their airmanship but the upstream risk assessment, which had committed the flight past an irreversible point on a forecast, with an under-specified set of fallbacks. 16

The structure of the NZ7571 failure is the structure of every clinical-AI failure this doctrine is built to prevent, and it has three conditions. There is an irreversible commit point — the point of safe return, past which the designed fallback was gone. There is a decision to cross it on a forecast — a prediction of improving weather — rather than on an observation. And there is an under-specified fallback set — no alternate the 757 could use, so that once Pegasus fogged in the option space had collapsed to a single runway. 16 Any system that commits irreversibly on a prediction, without a reachable alternative when the prediction fails, inherits this exact risk profile.

A clinical AI that commits an irreversible action — an order placed and administered, a treatment stopped — on the model's predicted assessment of the patient rather than on a verified observation, with no reachable clinician when the prediction is wrong, is NZ7571 rendered in software. The model's confident output is the favorable forecast. The administered order is the crossed point of safe return. The absent or saturated clinician is the foreclosed diversion airport. The corrective is the same one the aviation inquiry implies and the framework paper encodes: abstain at the commit point when the model's assessment cannot be verified against observation and a reachable fallback cannot be confirmed. 16 17 OBL-IRR-001 is that corrective stated as a precondition on irreversible action, and the reachable-clinician requirement of OBL-HUM-001 is the guarantee that the diversion airport is not fogged in when the forecast fails. 17

The aircraft was committed past its point of safe return on a forecast, and the runway it counted on was already gone when the forecast failed. A clinical AI that acts irreversibly on a prediction, with no reachable clinician when the prediction is wrong, has flown the same approach.
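
The abstention rule above can be sketched as a small pre-commit gate. This is a minimal illustration under stated assumptions, not the framework paper's implementation: the names `CommitContext` and `gate_commit` are hypothetical, and the gate follows the literal reading of the corrective (abstain when the assessment is unverified and no reachable clinician is confirmed). Unknown reversibility is treated as irreversible, as the doctrine requires.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    PROCEED = "proceed"
    ABSTAIN = "abstain"


@dataclass(frozen=True)
class CommitContext:
    # Hypothetical fields, named for illustration only.
    irreversible: Optional[bool]   # None means reversibility is unknown
    observation_verified: bool     # assessment checked against a verified observation
    clinician_reachable: bool      # a qualified clinician is confirmed reachable now


def gate_commit(ctx: CommitContext) -> Decision:
    """Abstain at an irreversible commit point that rests on a bare forecast."""
    # Unknown reversibility is treated as irreversible.
    irreversible = ctx.irreversible is not False
    if not irreversible:
        return Decision.PROCEED
    # The NZ7571 shape: irreversible, unverified, and no reachable fallback.
    if not ctx.observation_verified and not ctx.clinician_reachable:
        return Decision.ABSTAIN
    return Decision.PROCEED
```

A stricter policy, for example one that requires both a verified observation and a reachable clinician at the top harm tier, is a one-line change to the condition; which reading applies is a tiering decision the operator has to record.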

The lesson NZ7571 teaches the clinic

The observed-versus-assumed reading is the point. NZ7571's risk assessment assumed an acceptable operation: the forecast would hold, the weather would improve, the fallback would not be needed. The observed reality was a foreclosed fallback and a runway in whiteout, and only the quality of the crew supplied the margin the plan had left out. 16 The cautionary clinical cases this stack examines have the identical shape — a high assumed performance from the development data, an observed field performance that was a different and more dangerous model, and no abstention to catch the gap. 9 10 18 The sepsis case sharpens what the foreclosed fallback costs in this domain specifically: each additional hour to antibiotic treatment is associated with higher in-hospital mortality across tens of thousands of patients, so a missed alert is not a deferred inconvenience but a runway that closes by the hour. 22 Post-market surveillance is the discipline of forcing the observed onto the ledger before the foreclosed-fallback day arrives. The risk assessment that flatters the forecast is the one that kills; the one that prices the day the forecast is wrong and the runway is gone is the one that keeps the diversion airport reachable.

Section 08

The Honest Limits

A doctrine that only flattered itself would not survive a clinician's or a regulator's scrutiny, and four limits bound it. The first is that a reachable clinician is a contested resource, not an infinite one. The diversion-airport analogy can mislead if it implies the fallback is always available at no cost: a clinician's attention is finite, and a system that escalates every borderline case to a human does not achieve safe autonomy — it achieves a bottleneck that itself becomes a harm, as deferred decisions queue behind an overwhelmed reviewer. The consequence lattice is the discipline that prevents this, reserving mandatory human approval for the top tier rather than spending it on every action, but drawing the tier boundaries correctly is a clinical judgment the framework cannot make on a hospital's behalf. 17 The doctrine relocates that judgment from a hidden assumption to an explicit, recorded tiering decision; it does not eliminate it.
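
The bottleneck described above has a simple arithmetic form. The sketch below is illustrative only: `reviewer_utilization` is a hypothetical helper, and the figures in the usage note are invented inputs, not data from any deployment. It computes the fraction of available clinician review time that escalations consume; at or above 1.0 the queue of deferred decisions grows without bound.

```python
def reviewer_utilization(escalations_per_hour: float,
                         minutes_per_review: float,
                         reviewers: int) -> float:
    """Fraction of reviewer capacity consumed by escalated cases.

    A value at or above 1.0 means cases arrive faster than they can be
    reviewed, so the backlog of deferred decisions keeps growing.
    """
    if reviewers <= 0:
        raise ValueError("at least one reviewer is required")
    demand_minutes = escalations_per_hour * minutes_per_review
    capacity_minutes = 60.0 * reviewers
    return demand_minutes / capacity_minutes
```

With two reviewers and six minutes per review, escalating 30 cases an hour gives a utilization of 1.5 (a growing queue), while escalating only 4 top-tier cases an hour gives 0.2. Where the tier boundary sits decides which of those two regimes the hospital is in.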

The second limit is that post-market surveillance is only as good as the signal reaching it, and MAUDE's own design concedes the problem: it is a passive system with reporting biases, capturing what someone chose to file and undercounting what no one recognized. 8 The decision trace improves on this by capturing every action by construction, but it inherits a subtler version of the same limit — it records what the system did, not always what happened to the patient as a result, and closing that loop requires linking the trace to clinical outcomes that arrive later and elsewhere. 12 17 A failure mode absent from the surveillance data — a rare presentation, an underrepresented subpopulation — is a tail the monitoring never samples, exactly the NZ7571 gap of an unrepresented fallback failure. 16 Continuous surveillance bounds the risk it can see; it does not abolish the unsampled tail.
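
One way to keep the trace-to-outcome gap visible is to measure it. The sketch below assumes, hypothetically, that each traced action has an identifier and that outcomes arrive later as a mapping keyed by that identifier; `linkage_coverage` is an illustrative name, not part of any cited standard.

```python
from typing import Iterable, List, Mapping, Tuple


def linkage_coverage(action_ids: Iterable[str],
                     outcome_by_action: Mapping[str, object]) -> Tuple[float, List[str]]:
    """Return the share of traced actions with a linked outcome, plus the unlinked ids."""
    ids = list(action_ids)
    if not ids:
        # No traced actions means no evidence of coverage.
        return 0.0, []
    unlinked = [a for a in ids if a not in outcome_by_action]
    coverage = (len(ids) - len(unlinked)) / len(ids)
    return coverage, unlinked
```

A coverage figure well below 1.0 says the trace is recording what the system did without yet knowing what happened to the patient, which is the limit this paragraph names.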

The third limit is that the PCCP bounds change; it does not, on its own, validate a change's safety. A Predetermined Change Control Plan pre-authorizes an envelope of modification, but the Modification Protocol inside it is only as rigorous as the validation it specifies, and an under-specified protocol can pre-authorize a change that should have triggered full review. 1 The envelope is a governance instrument, not a correctness guarantee; drawing it too wide is the failure mode, and the discipline of drawing it correctly is the work the guidance leaves to the manufacturer. The runtime analogue carries the same caveat — a contracting envelope is only as protective as the escape-rate measurement that drives the contraction, and a backtest on an unrepresentative set will contract too late. 17
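
A pre-authorized envelope can be pictured as a check that routes each proposed modification either to the pre-authorized path or to full review. The sketch below is an assumption-laden illustration: the field names, the modification kinds, and the sensitivity and specificity floors are all hypothetical and are not drawn from the FDA guidance, which leaves the content of the Modification Protocol to the manufacturer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEnvelope:
    # Hypothetical envelope: which kinds of change are pre-authorized,
    # and the validated performance floors a change must meet.
    allowed_kinds: frozenset
    min_sensitivity: float
    min_specificity: float


@dataclass(frozen=True)
class ProposedChange:
    kind: str
    validated_sensitivity: Optional[float]  # None means not yet validated
    validated_specificity: Optional[float]


def route_change(env: ChangeEnvelope, change: ProposedChange) -> str:
    """Return 'pre_authorized' only when the change sits inside the envelope."""
    if change.kind not in env.allowed_kinds:
        return "full_review"
    if change.validated_sensitivity is None or change.validated_specificity is None:
        # Missing validation evidence is treated as outside the envelope.
        return "full_review"
    if (change.validated_sensitivity >= env.min_sensitivity
            and change.validated_specificity >= env.min_specificity):
        return "pre_authorized"
    return "full_review"
```

The check is only as strong as the floors and kinds written into it: an envelope with loose floors will return `pre_authorized` for changes that deserved review, which is the failure mode described above.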

The fourth limit is jurisdictional and worth stating plainly so the stack is not read as overclaiming. The regimes assembled here — the Cures Act CDS carve-out, ISO 14971, IEC 62304, 21 CFR Part 803, the PCCP, and the EU AI Act's human-oversight and post-market articles — are real and binding, but they bind devices and their manufacturers under specific authorities and thresholds, and not every clinical AI is a regulated device. 3 13 The EU AI Act classifies AI that is a medical device or its safety component as high-risk and attaches human-oversight and post-market monitoring obligations, with the medical-device provisions phasing in through 2027 14 — but an assistive tool sitting inside the CDS carve-out may face a lighter regulatory touch than the harm it can cause would justify. The argument of this paper is therefore an engineering argument, not merely a compliance one: the diversion-airport doctrine is the right design whether or not a given system is captured by a given rule, because the fallback is what licenses the autonomy regardless of which regulator is watching. The regulations are evidence that mature fields converged on the doctrine; they are not the whole of the case for it.

These limits do not weaken the doctrine; they locate it. The reachable-clinician-and-continuous-proof view does not claim to make a clinical AI infallible. It claims to put the autonomy on a ledger — a fallback whose reachability is guaranteed at the harm tiers that need it, a safety case that is re-proven continuously rather than assumed from clearance, an evidence stream the system cannot suppress, and a change envelope that is bounded and revocable. The residual is relocated to the specification, where it is visible and fixable, rather than hidden in the confident assumption that an accurate model needs no runway.

Section 09

The Engineering Posture Before Acting

Before deploying an autonomous clinical AI, the operator's posture should be set by the diversion-airport doctrine, and it collapses to three commitments. Keep the clinician reachable for high-harm actions. Tier every action by consequence, make a qualified human a hard precondition at the irreversible, high-consequence top of the lattice, and treat unknown reversibility as irreversible — OBL-HUM-001 and OBL-IRR-001, the reachable runway and the principled abstention. 17 The Cures Act already drew this boundary in law: keep the basis of every recommendation reviewable, so the clinician can always reach a different answer. 3 Reachability has to be genuine, not nominal — design against alert fatigue, because a saturated clinician is a fogged-in runway. 15

Make the decision trace your post-market surveillance. Write every consequential action to an append-only, tamper-evident record the system cannot rewrite, gated out-of-process — OBL-TRC-001 — and use it as the substrate that the ISO 14971 lifecycle loop consumes, turning the field escape rate into a measured quantity instead of a filed anecdote. 5 7 17 The device world built MDR and MAUDE for exactly this; the runtime analogue is active rather than passive, capturing what would never have been reported. 8 Safety established at clearance does not stay established as the population shifts underneath the model; only continuous measurement keeps the proof current. 12
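
A hash chain is one common way to make an append-only record tamper-evident, and the sketch below uses it as an assumption; the source does not say how OBL-TRC-001 is implemented. The class `DecisionTrace` and its payload fields are hypothetical. The sketch runs in-process for brevity, so it shows the tamper-evidence property only, not the out-of-process gating the paragraph above also requires.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that anchors the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    # Canonical JSON so the same payload always hashes the same way.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class DecisionTrace:
    """Append-only log in which each entry commits to the one before it."""

    def __init__(self) -> None:
        self._entries = []

    def append(self, payload: dict) -> str:
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        entry = {"prev": prev, "payload": dict(payload), "hash": _digest(prev, payload)}
        self._entries.append(entry)
        return entry["hash"]

    def verify(self) -> bool:
        # Recompute every link; any edited or reordered entry breaks the chain.
        prev = GENESIS
        for entry in self._entries:
            if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["payload"]):
                return False
            prev = entry["hash"]
        return True
```

Because each hash covers the previous one, rewriting an early entry invalidates every later link, so a verifier holding only the latest hash can detect the rewrite. Keeping that verifier and the storage outside the agent's own process is what stops the system from simply recomputing the chain.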

Contract authority the moment harm signals rise. Pre-bound the envelope of permitted change the way a PCCP does — fixed in advance, monitored against its protocol, revocable on drift — and let the runtime envelope contract automatically as the measured escape rate climbs, through OBL-AGG-001's independence bound and consequence-scaled tolerances. 1 17 Attribute every action to its model and vendor so the field failure rate has a responsible owner — OBL-VEN-001 — exactly as the device and banking regimes refuse to let the deploying institution outsource the duty to govern. 17 21 The right to keep operating autonomously is earned continuously and surrendered the moment the evidence turns, the way an ETOPS tier is contracted on a rising fleet rate. 11 20
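
The contraction rule can be sketched as a lookup from the measured escape rate to the highest tier still allowed to act without a human. Everything numeric here is a hypothetical placeholder: the three tiers and their tolerances are invented for illustration and are not the framework paper's values, and a production version would likely use a conservative upper bound on the rate rather than the raw point estimate.

```python
# Hypothetical tolerances: the highest escape rate at which each tier may
# still act autonomously. Higher tiers carry more harm, so they tolerate less.
TOLERANCE_BY_TIER = {0: 0.05, 1: 0.01, 2: 0.001}


def max_autonomous_tier(escapes: int, actions: int) -> int:
    """Return the highest tier permitted to act without a human, or -1 for none."""
    if actions <= 0:
        # No field evidence yet: autonomy has not been earned at any tier.
        return -1
    rate = escapes / actions
    allowed = -1
    for tier in sorted(TOLERANCE_BY_TIER):
        if rate <= TOLERANCE_BY_TIER[tier]:
            allowed = tier
        else:
            break  # tolerances tighten with tier, so higher tiers fail too
    return allowed
```

As the escape rate measured from the trace rises, the returned tier falls, and actions above it route to a clinician. No one has to decide to revoke the authority, because the envelope contracts on the evidence.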

This paper is the third and final article in a stack on clinical-AI governance, and the three compose into one argument. The first, Intended Use Is the Envelope, established that a model's indication is its operating boundary — the scope inside which any safety claim is meaningful. 19 The second, Risk Is Measured in Harm, Not Accuracy, established the governing variable inside that envelope: harm-weighted integrity, not accuracy, and anatomized the Epic Sepsis Model and Watson for Oncology as the observed-versus-assumed cases. 18 This third article explains what licenses the autonomy those two bounded — the reachable fallback and the continuous proof — and shows that the Cures Act carve-out, ISO 14971, 21 CFR Part 803, and the December 2024 PCCP are one doctrine the regulators reached by the same route aviation reached the diversion airport. 1 3 5 7 It is the clinical instantiation of the always-a-runway sibling, Priced in Failure-Rate Data, which prices the tail and contracts authority on drift in domains where being confidently wrong is expensive rather than lethal — and it draws its runtime units from the framework paper, The LLM-Agent Assurance Standard, whose obligations make the diversion airport mechanical. 17 20

Medicine is the sharpest test of consequence-scaled, bounded, human-reachable autonomy because it is the domain where being confidently wrong is lethal and irreversible — where the administered drug cannot be unadministered and the missed deterioration cannot be un-missed. That is exactly why it is the right place to insist that the autonomy be licensed by its fallback rather than excused by its accuracy. The wrong question is how good the model has to be before the clinician can be removed. The right question is threefold: is the clinician reachable when the action crosses the harm threshold, is the system still proving itself safe in the field, and does the authority contract the moment the evidence turns? Keep the diversion airport reachable, make the decision trace your surveillance, and pre-bound the authority to change — and the autonomy you grant a clinical AI is autonomy you can defend at the bedside, on the day the forecast is wrong and the runway is the only thing that matters.

References
  1. U.S. Food and Drug Administration (2024). Marketing Submission Recommendations for a Predetermined Change Control Plan for Artificial Intelligence-Enabled Device Software Functions — Final Guidance, issued December 4, 2024. A PCCP comprises three required components: a Description of Modifications, a Modification Protocol, and an Impact Assessment. fda.gov/regulatory-information/search-fda-guidance-documents/marketing-submission-recommendations-predetermined-change-control-plan-artificial-intelligence (accessed 2026-06-21).
  2. U.S. Food and Drug Administration (2021). Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) Action Plan, January 12, 2021; and the 2019 discussion paper 'Proposed Regulatory Framework for Modifications to AI/ML-Based SaMD' (April 2, 2019), which introduced the SaMD Pre-Specifications (SPS) and the Algorithm Change Protocol (ACP) — the conceptual predecessors of the 2024 PCCP's Description of Modifications and Modification Protocol. fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-software-medical-device (accessed 2026-06-21).
  3. Federal Food, Drug, and Cosmetic Act §520(o)(1)(E), as amended by §3060 of the 21st Century Cures Act (Public Law 114-255, enacted December 13, 2016). Excludes from the medical-device definition clinical decision support software that meets four criteria, the fourth being that it provides sufficient basis for the recommendation that the healthcare professional can independently review it and is not intended to rely primarily on it. govinfo.gov/app/details/PLAW-114publ255 (accessed 2026-06-21).
  4. U.S. Food and Drug Administration (2022). Clinical Decision Support Software — Final Guidance, issued September 28, 2022. Interprets the §520(o)(1)(E) Non-Device CDS criteria, including the independent-review requirement (Criterion 4). fda.gov/regulatory-information/search-fda-guidance-documents/clinical-decision-support-software (accessed 2026-06-21).
  5. International Organization for Standardization (2019). ISO 14971:2019 — Medical devices — Application of risk management to medical devices (3rd ed.). Risk management is a total-product-lifecycle obligation: the standard requires a production and post-production phase in which information from the deployed device is actively collected and fed back into the risk-management file. iso.org/standard/72704.html (accessed 2026-06-21).
  6. International Electrotechnical Commission (2006, amended 2015). IEC 62304:2006/AMD1:2015 — Medical device software — Software life cycle processes. Defines the software maintenance process and the change/problem-resolution process: every change to released medical-device software is handled under a controlled procedure with documentation, verification, and configuration management. iso.org/standard/64686.html; webstore.iec.ch/en/publication/22794 (accessed 2026-06-21).
  7. U.S. Code of Federal Regulations, Title 21, Part 803 — Medical Device Reporting (MDR). Mandates adverse-event reporting by manufacturers, importers, and device user facilities: deaths and serious injuries to which a device may have contributed, and malfunctions likely to cause death or serious injury on recurrence, on defined timelines. govinfo.gov/link/cfr/21/803 (accessed 2026-06-21).
  8. U.S. Food and Drug Administration. Manufacturer and User Facility Device Experience (MAUDE) database — the public repository of mandatory and voluntary medical-device adverse-event reports filed under 21 CFR Part 803, updated monthly. A passive surveillance system with known reporting biases. fda.gov/medical-devices/mandatory-reporting-requirements-manufacturers-importers-and-device-user-facilities/about-manufacturer-and-user-facility-device-experience-maude-database (accessed 2026-06-21).
  9. Wong, A., Otles, E., Donnelly, J. P., et al. (2021). External Validation of a Widely Implemented Proprietary Sepsis Prediction Model in Hospitalized Patients. JAMA Internal Medicine, 181(8), 1065–1070. doi:10.1001/jamainternmed.2021.2626. jamanetwork.com/journals/jamainternalmedicine/fullarticle/2781307 (accessed 2026-06-21).
  10. Habib, A. R., Lin, A. L., & Grant, R. W. (2021). The Epic Sepsis Model Falls Short — The Importance of External Validation. JAMA Internal Medicine, 181(8), 1040–1041. doi:10.1001/jamainternmed.2021.3333. pubmed.ncbi.nlm.nih.gov/34152360 (accessed 2026-06-21).
  11. Federal Aviation Administration (2007). 14 CFR Part 121 and Advisory Circular 120-42B — Extended Operations (ETOPS). Diversion time is a worst-credible-case reachability budget: the maximum time an aircraft may be from an adequate alternate airport, single engine inoperative, in still air. The diversion airport is the pre-positioned fallback that licenses the extended-range operation. faa.gov; cfr.gov (accessed 2026-06-21).
  12. Finlayson, S. G., Subbaswamy, A., Singh, K., et al. (2021). The Clinician and Dataset Shift in Artificial Intelligence. New England Journal of Medicine, 385, 283–286 — distribution shift degrades a deployed clinical model's measured performance over time, making post-deployment monitoring a safety requirement rather than a courtesy. doi:10.1056/NEJMc2104626 (accessed 2026-06-21).
  13. World Health Organization (2021). Ethics and Governance of Artificial Intelligence for Health. Names human oversight, transparency, and continuous post-deployment monitoring as preconditions for the safe clinical use of AI; human autonomy over health decisions is the first of its six core principles. who.int/publications/i/item/9789240029200 (accessed 2026-06-21).
  14. European Union (2024). Regulation (EU) 2024/1689 (EU AI Act), Article 14 (human oversight) and Article 72 (post-market monitoring), with Article 6 / Annex I classifying AI that is a medical device or its safety component as high-risk. High-risk obligations apply from August 2026, with an extended transition to August 2027 for AI embedded in regulated medical devices. artificialintelligenceact.eu/article/14/; artificialintelligenceact.eu/article/72/ (accessed 2026-06-21).
  15. Felisberto, M., dos Santos Lima, G., Celuppi, I. C., et al. (2024). Override rate of drug-drug interaction alerts in clinical decision support systems: a brief systematic review and meta-analysis. Health Informatics Journal, 30(3) — pooled physician override of drug-drug-interaction alerts of 90% (95% CI 85–95%). A reachable clinician who reflexively dismisses every alert is not, in practice, reachable. doi:10.1177/14604582241263242 (accessed 2026-06-21).
  16. Transport Accident Investigation Commission, New Zealand (2013). Inquiry AO-2013-009: Boeing 757, NZ7571, weather-related landing below minima, Pegasus Field, Antarctica, 7 October 2013. The flight was committed past its point of safe return on a forecast, with an under-specified fallback set — the canonical case of an irreversible commitment made on a prediction rather than an observation. taic.org.nz (accessed 2026-06-21).
  17. KellerAI (2026). The LLM-Agent Assurance Standard: Gate-Derived Tiering and Independent Pre-Commit Verification (in-depth). The CT0–CT4 consequence lattice (§06), Bucket A / Bucket B and the escape rate (§08), verifier independence and the phi-coefficient error-correlation threshold (§09), append-only trace and enforcement-plane integrity (§10), and the cross-cutting obligations including human approval, irreversibility handling, vendor attribution, and the append-only trace (§11). kellerai.blog/llm-agent-assurance-standard-in-depth.
  18. KellerAI (2026). Risk Is Measured in Harm, Not Accuracy (in-depth) — Article 2 of this clinical-AI governance stack. Harm-weighted integrity over accuracy; the Epic Sepsis Model and Watson for Oncology as the observed-versus-assumed cases. kellerai.blog/harm-not-accuracy-in-depth.
  19. KellerAI (2026). Intended Use Is the Envelope (in-depth) — Article 1 of this clinical-AI governance stack. The FDA Software-as-a-Medical-Device principle that a model's indication is its operating boundary. kellerai.blog/intended-use-is-the-envelope-in-depth.
  20. KellerAI (2026). Priced in Failure-Rate Data: The Reliability Accounting Behind Earned Autonomy (in-depth) — the ETOPS 'always-a-runway' sibling; consequence-scaled tolerances earned over accumulated operating experience, and authority contracted on drift. kellerai.blog/reliability-you-can-bank-in-depth.
  21. Board of Governors of the Federal Reserve System, Office of the Comptroller of the Currency, and Federal Deposit Insurance Corporation (2026). SR 26-2: Revised Guidance on Model Risk Management (effective 2026-04-17; supersedes SR 11-7). Effective challenge — independent review with the authority to change a model — and outcomes analysis by backtesting on a continuing cadence. federalreserve.gov/supervisionreg/srletters/SR2602.htm (accessed 2026-06-21).
  22. Seymour, C. W., Gesten, F., Prescott, H. C., et al. (2017). Time to Treatment and Mortality during Mandated Emergency Care for Sepsis. New England Journal of Medicine, 376, 2235–2244 — each additional hour to antibiotic administration was associated with higher in-hospital mortality across roughly 35,000 patients. The harm of a missed sepsis alert is measured in hours, and the hours in lives. doi:10.1056/NEJMoa1703058 (accessed 2026-06-21).
  23. U.S. Food and Drug Administration (2024). Artificial Intelligence-Enabled Medical Devices list. The number of AI/ML-enabled devices the FDA has authorized has grown into the many hundreds, the overwhelming majority cleared through the 510(k) pathway as assistive software rather than autonomous diagnostic devices. fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-enabled-medical-devices (accessed 2026-06-21).