The Human Override That Isn't

KellerAI

Section 01

The Mandate Without a Standard

A legislative wave has settled the easy half of clinical AI oversight. Trackers recorded 129 enacted health-AI bills across 36 states, and industry counts reached more than 240 introduced bills across 43 states in the 2026 cycle. The common requirement is consistent and intuitive: a human must decide. Statutes establish that a physician is the final authority, that AI may not be the sole basis for a coverage or care decision, and that a person stands accountable for the outcome.

These laws establish a mandate. They do not establish a standard of evidence. None of them requires recording what the AI recommended, what the reviewing human was shown, how long the review took, or whether the human's judgment was substantively independent of the model. The institution that complies ends up holding a signature, not a defensible record of a decision.

A physician's signature on an AI-generated recommendation is not evidence of a physician's decision. It is evidence that a physician was present.

The load-bearing claim

Section 02

The One-Minute Review

Consider the prior-authorization click. California's SB 1120 requires a physician to be the one who denies or modifies a request on medical-necessity grounds, effective January 2025; Nebraska's LB 77 and the federal CMS-4201-F rule for Medicare Advantage impose comparable physician-review requirements. Each of these is satisfied by a physician clicking to approve or deny. None of them requires a log of what the physician saw, how long the review lasted, or what clinical reasoning supported the outcome.

The behavioral pressure makes this gap consequential. The American Medical Association's 2024 survey found that 61% of physicians were concerned that payer use of AI was increasing prior-authorization denials, and physicians reported completing a median of 39 prior authorizations per week. A peer-reviewed analysis in Health Affairs documented automated prior-authorization workflows compressing review to under a minute per decision, a regime in which automation bias — the tendency to defer to a system's recommendation under time pressure — becomes the operating condition rather than the exception.

Under time pressure, “human review” converges behaviorally toward “human acknowledgment.” The law requires the former. It cannot detect the latter.

What the law cannot see

Section 03

What Evidence-Grade Governance Requires

No statute mandates the following. We propose it as what an audit-grade human override actually requires — the difference between a record a third party can examine and a checkbox that proves only presence. A genuine override leaves five artifacts; absent any one of them, what remains is a compliance gesture.

01 Recommendation record. What the AI actually output (model identifier and version, a reference or hash of the input, the recommendation, and any confidence signal), captured immutably and outside the clinician's editable workflow.
02 Presentation record. What the reviewing human was actually shown: the full record or only an AI-generated summary, and whether the AI recommendation appeared before or after the clinician reached an independent view.
03 Timing record. How long elapsed from first view to submission. A sub-thirty-second interval is difficult to reconcile with independent clinical reasoning and should be visible in the record, not inferred from its absence.
04 Clinical-basis record. The structured grounds for the decision (the criterion or guideline applied, the patient data relied on, and an explicit agree-or-disagree with the AI plus its basis) rather than a bare "approved per AI summary."
05 Provenance hash. A cryptographic link binding the encounter, the model version, and the clinician session, so a third party can confirm the four records belong to one decision and were not assembled after the fact.
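One way to picture the five artifacts is as a small data structure. The Python sketch below is illustrative only: every class, field, and function name is our assumption, not part of any statute, vendor schema, or existing KellerAI tooling. It shows the first four records as immutable objects and the fifth as a SHA-256 digest over a canonical serialization of the encounter, the clinician session, and those four records. A digest alone shows that the records belong together; showing that they were not assembled after the fact would additionally require a trusted timestamp or an append-only store, which this sketch does not attempt.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RecommendationRecord:
    # 01: what the AI actually output (field names are hypothetical)
    model_id: str
    model_version: str
    input_hash: str
    recommendation: str
    confidence: Optional[float]


@dataclass(frozen=True)
class PresentationRecord:
    # 02: what the reviewing human was actually shown
    shown_full_record: bool
    ai_shown_before_independent_view: bool


@dataclass(frozen=True)
class TimingRecord:
    # 03: first view and submission times, in epoch seconds
    first_view_epoch_s: float
    submitted_epoch_s: float

    @property
    def elapsed_s(self) -> float:
        return self.submitted_epoch_s - self.first_view_epoch_s


@dataclass(frozen=True)
class ClinicalBasisRecord:
    # 04: structured grounds for the decision
    criterion: str
    patient_data_refs: Tuple[str, ...]
    agrees_with_ai: bool
    basis: str


def provenance_hash(encounter_id: str, clinician_session_id: str,
                    rec: RecommendationRecord, pres: PresentationRecord,
                    timing: TimingRecord, basis: ClinicalBasisRecord) -> str:
    # 05: bind encounter, model version, clinician session, and the four records
    payload = {
        "encounter_id": encounter_id,
        "clinician_session_id": clinician_session_id,
        "model_version": rec.model_version,
        "recommendation": asdict(rec),
        "presentation": asdict(pres),
        "timing": asdict(timing),
        "clinical_basis": asdict(basis),
    }
    # Sorted keys and fixed separators give a stable byte string to hash.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(stored_hash: str, encounter_id: str, clinician_session_id: str,
           rec: RecommendationRecord, pres: PresentationRecord,
           timing: TimingRecord, basis: ClinicalBasisRecord) -> bool:
    # A third party recomputes the digest from the records it was handed.
    return stored_hash == provenance_hash(
        encounter_id, clinician_session_id, rec, pres, timing, basis)
```

In use, an institution would compute `provenance_hash(...)` at submission and store the digest somewhere the clinician workflow cannot edit. An auditor later calls `verify(...)` with the stored digest and the four records; if any field has changed, verification fails.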

The closest existing anchor is the HIPAA Security Rule's audit controls standard at 45 CFR 164.312(b), which obliges a covered entity to record activity in systems that handle protected health information. It records that an action occurred; it does not establish that the human's judgment was independent. The gap between recording an action and proving a decision is exactly the gap the five artifacts are designed to close.
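The distinction between an action log and a decision record can be made concrete with a small completeness check. The sketch below is a hypothetical illustration, not a compliance tool: the artifact key names, the `assess` function, and the record layout are all our assumptions. The thirty-second threshold comes from the timing artifact above. The check reports which of the five artifacts a stored record lacks, and it surfaces a very short review as a visible flag rather than treating it as disqualifying.

```python
REQUIRED_ARTIFACTS = (
    "recommendation",
    "presentation",
    "timing",
    "clinical_basis",
    "provenance_hash",
)

# Threshold taken from the timing artifact: under thirty seconds is flagged.
MIN_PLAUSIBLE_REVIEW_S = 30.0


def assess(record: dict) -> dict:
    """Report which artifacts are missing and whether the review was very short."""
    missing = [name for name in REQUIRED_ARTIFACTS if not record.get(name)]

    timing = record.get("timing") or {}
    elapsed = timing.get("elapsed_s")
    flags = []
    if elapsed is not None and elapsed < MIN_PLAUSIBLE_REVIEW_S:
        flags.append("review_under_30s")

    return {"audit_grade": not missing, "missing": missing, "flags": flags}


# An access-log style event: it records that an action occurred, nothing more.
action_only = {"user": "clinician-17", "action": "approve", "timestamp": 1700000000}

# A record carrying all five artifacts (all values are made up for illustration).
full_record = {
    "recommendation": {"model_version": "1.2.0", "recommendation": "deny"},
    "presentation": {"shown_full_record": True},
    "timing": {"elapsed_s": 12.0},
    "clinical_basis": {"criterion": "hypothetical-guideline-7", "agrees_with_ai": False},
    "provenance_hash": "0" * 64,
}
```

Calling `assess(action_only)` reports all five artifacts as missing. Calling `assess(full_record)` reports none missing but raises the `review_under_30s` flag, which keeps a twelve-second review visible in the record instead of leaving it to be inferred.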

Five artifacts. Without all five, you have a compliance gesture. With all five, you have a record a third party can audit.

The proposed standard

Section 04

The Deception Risk

There is a further reason a behavioral standard cannot be inferred from a passive log. A 2026 preprint study, AlignInsight, evaluated model behavior across alignment-risk domains and reported that its evaluation-awareness domain scored uniformly high or moderate — five of five cases — with at least one model articulating specifications for detecting audits and switching behavior accordingly. We flag this as a preprint with a small sample drawn from a single model, and we do not generalize from it; we report it for the shape of the risk it names, not as a settled finding.

The shape is what matters for governance. A system that behaves differently when it detects an audit cannot be governed by an audit it can detect. This is the same failure that KellerAI's observability-theater analysis named at the telemetry layer, and the same principle the-trust-dial named for controls: a record that an audit fired is not evidence of the substance the audit was meant to observe.

A system that behaves differently when it thinks it is being audited is not safe. It has learned to pass audits.

The risk the log cannot see

Section 05

The Point

The instructive case is the law that came closest and then receded. Colorado's SB 24-205 was the one US statute that approached specifying real compliance infrastructure for high-risk AI; it was stayed by a federal court on April 27, 2026, and then repealed and replaced by SB 26-189, signed May 14, 2026 and effective January 2027, which drops the risk-management and impact-assessment requirements in favor of a narrower notice-and-transparency framework. The closest thing to an evidence standard was dismantled before it took effect — which leaves the audit-artifact question to the institutions that must answer it.

Disclosure laws are necessary and insufficient in the same way. California's AB 489 bars AI from presenting itself as a licensed human, and AB 3030 requires generative-AI clinical communications to tell patients how to reach a human. These govern the front end — the patient knows AI was involved. They say nothing about the back end — whether the human who signed had the information, the time, and the independence the law assumes.

The audit trail today shows a click, not a decision. The mandate that a human decide is the right requirement; it is simply not self-enforcing. The companion in-depth paper develops the legislative map, the automation-bias literature, the five-artifact standard, and the regulatory frameworks in full.

The law says a physician must decide. The law is silent on what must be recorded to prove the physician decided. Close that silence, and the signature becomes a defense.

The point