kellerai.blog

The System Passed Every Test, Then Killed Someone

SOTIF and the fault-free hazard: a system can execute its specification perfectly and still be lethally wrong.

KellerAI White Paper · Engineering Discipline & Verification · Jun 2026

Context

Engineering assumes danger means a bug — that an unsafe output traces back to something broken, and that fixing the defect removes the hazard. The whole discipline of testing, from unit tests to fault injection, rests on that premise. But a modern perception model can detect, track, classify, and plan around an object while executing every line of its specification faithfully on faultless hardware, and still deliver an answer that is lethally wrong.

The Finding

The hazard is not a malfunction but a performance limitation of the intended function operating at the edge of its competence, and no test suite built to find bugs will find it. Automotive safety has a name for this class: SOTIF, Safety Of The Intended Functionality. The discipline that governs it is to make sure the system never delivers a wrong answer without a warning flag.

Tags: Fault-free hazards · Functional insufficiency · Integrity architecture

Paper Details
Category: Engineering Discipline & Verification
Audience: Engineering, platform, and AI-governance leaders
Method: Cross-discipline analysis · functional-insufficiency framing
Length: ~2,100 · 9 min
Sections: 5
Date: Jun 2026
Authors: KellerAI

Section 01

No Bug, Still Dangerous

An engineer trained on conventional safety reasons backward from a hazard to a cause, and the cause is presumed to be a defect: a failed sensor, a corrupted input, a logic error, a race condition. Find the defect, remove it, and the hazard is gone. The whole discipline of testing — unit tests, integration tests, fault injection, coverage analysis — rests on the premise that an unsafe output is evidence of something broken. The premise is wrong, and it is wrong in exactly the place that matters most for machine-learning systems.

A modern perception model can detect an object, track it, classify it, and plan around it — executing every line of its specification faithfully, on hardware that reports no fault — and the answer it produces can still be lethally wrong. Nothing broke. The model performed its function; the function was simply insufficient for the situation it met. The hazard is not a defect in the execution of the intended function but a limitation of that function operating at the edge of its competence.

This is hallucination's exact shape. When a language model emits a false assertion, it has not malfunctioned. It has performed its function — next-token prediction over a learned distribution — flawlessly, and the wrongness is a property of that function operating where its training distribution thins out, not of a defect in its execution. The model is doing what it was built to do, and what it was built to do is sometimes insufficient — confidently, fluently, and without any signal that this is one of those times.

So the danger is not that the system is sometimes wrong. Systems are sometimes wrong; no amount of engineering drives the error rate of an open-world perception or generation function to zero. The danger is that the system is wrong and does not say so — that the wrong answer arrives carrying the same confidence and the same interface as a right one, with nothing to distinguish it. A more accurate model that is still silently wrong in the cases it gets wrong has improved the wrong number.

A system that passes every test and runs on healthy hardware can still be lethally wrong, because the hazard is a performance limitation of the intended function — not a defect in its execution. No test suite built to find bugs was built to find it.

The fault-free hazard

Section 02

Two Standards, Two Failure Worlds

Automotive safety engineering draws the line this argument depends on with two standards that govern two different worlds. ISO 26262 is the functional-safety standard for road-vehicle electrical and electronic systems. It governs malfunctioning behaviour: hazards caused by a fault, whether a random hardware failure, a failed sensor, or a systematic design or software defect. Its question is the conventional one: did a component fail, and if so, how often, and how badly? It answers with a risk classification and a probability budget: the Automotive Safety Integrity Level, ASIL A through ASIL D, is derived from severity, exposure, and controllability, and the most demanding functions are held to random-hardware-failure targets on the order of one in a hundred million operating hours (10^-8 per hour). Every one of those numbers presumes a fault and budgets its frequency.
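As a rough illustration of how severity, exposure, and controllability combine into an ASIL, the sketch below uses the additive shorthand often taught for the ISO 26262 determination table: sum the three class indices and map the total to a level. The function name `asil` and the integer encoding of the classes are assumptions made for this example, and the standard's own table remains the authority.

```python
def asil(severity: int, exposure: int, controllability: int) -> str:
    """Return the integrity level for classes S1-S3, E1-E4, C1-C3.

    Assumption: classes are passed as plain integers, and the additive
    shorthand (sum of the three indices) stands in for the lookup table.
    """
    if severity not in (1, 2, 3):
        raise ValueError("severity must be 1..3 (S1-S3)")
    if exposure not in (1, 2, 3, 4):
        raise ValueError("exposure must be 1..4 (E1-E4)")
    if controllability not in (1, 2, 3):
        raise ValueError("controllability must be 1..3 (C1-C3)")

    total = severity + exposure + controllability
    # Totals below 7 fall to QM (quality management, no ASIL assigned).
    levels = {7: "ASIL A", 8: "ASIL B", 9: "ASIL C", 10: "ASIL D"}
    return levels.get(total, "QM")


if __name__ == "__main__":
    print(asil(3, 4, 3))  # worst case on all three axes
    print(asil(1, 1, 1))  # mildest case on all three axes
```

None of the three inputs describes a function that works as specified and is still unsafe, which is why this classification cannot express the fault-free hazard.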

That apparatus is a triumph — and it has nothing to say about a perception function that never broke. There is no failures-in-time figure for a classifier that oscillates between labels while executing its specification correctly, because no failure occurred. The ASIL machinery would certify the hardware as healthy, and it would be right. The hazard is not in the hardware; it is in the answer the healthy hardware computed. The same is true of a language model whose every weight and every operation executed without error to produce a fabricated citation: a fault audit finds nothing, because there is nothing of that kind to find.

ISO 21448 — Safety Of The Intended Functionality, SOTIF — governs the world where there is no fault to budget. Published as a full International Standard in 2022, it defines safety as the absence of unreasonable risk due to functional insufficiencies: hazards that arise when sensors, algorithms, and system logic operate exactly as designed but produce unsafe outcomes through performance limitations or incomplete scenario coverage. No component fault is required, and no fault-probability budget can describe the hazard, because there is no fault whose frequency to bound.

The contrast is the spine of the matter. ISO 26262 asks "did a component fail?" and assigns a probability budget to the failure. ISO 21448 asks "is the function, working perfectly, still unsafe in this scenario?", a question no fault-probability budget can answer. The fault-free hazard lives entirely in SOTIF's world, and that is the world a machine-learning perception model occupies by default, the same way a language model does. A test suite tuned to ISO 26262's question, "did anything break?", will pass the system that produces the fault-free hazard, every time.

Section 03

Tempe: A Hazard With No Fault

The canonical fault-free hazard has a name, a date, and a federal investigation. On the night of 18 March 2018, a developmental automated driving system operated by Uber's Advanced Technologies Group struck and killed Elaine Herzberg as she walked a bicycle across a road in Tempe, Arizona. The National Transportation Safety Board investigated and did not identify a software bug, a sensor failure, or a hardware malfunction as the cause of the perception failure. The system worked as designed. That is what makes it the case worth examining.

The perception system detected Herzberg 5.6 seconds before impact — roughly 350 feet ahead — and tracked her continuously to impact. There was no sensing fault: the object was seen, and seen with substantial lead time. What failed was downstream, in a function that ran exactly as its logic specified. The system never settled on a stable classification; it cycled between unknown object, vehicle, and bicycle — and with each reclassification it treated her as a brand-new object, discarding the location history that would have let it predict where she was going. A continuously observed object was repeatedly handled as a first-time observation. The hardware was correct. The answer was wrong.
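The cost of discarding history on reclassification can be made concrete with a toy track. The sketch below is not the Uber ATG design; the `Track` class, the constant-velocity prediction, and all positions and times are hypothetical values chosen for illustration. It only shows that the same observations support a prediction when history survives a label change and support none when it does not.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# (time in s, lateral offset from the vehicle's path in m); hypothetical units
Observation = Tuple[float, float]


@dataclass
class Track:
    label: str
    history: List[Observation] = field(default_factory=list)

    def observe(self, t: float, offset_m: float) -> None:
        self.history.append((t, offset_m))

    def reclassify(self, label: str, keep_history: bool) -> None:
        self.label = label
        if not keep_history:
            # The behaviour described above: a new label starts a new object.
            self.history = []

    def predict(self, t: float) -> Optional[float]:
        """Constant-velocity extrapolation; None when history is too short."""
        if len(self.history) < 2:
            return None
        (t0, p0), (t1, p1) = self.history[0], self.history[-1]
        velocity = (p1 - p0) / (t1 - t0)
        return p1 + velocity * (t - t1)


def run(keep_history: bool) -> Optional[float]:
    """Feed four observations with an oscillating label, then predict at t=4."""
    track = Track(label="unknown")
    for step, label in enumerate(["unknown", "vehicle", "bicycle", "unknown"]):
        if label != track.label:
            track.reclassify(label, keep_history)
        # Object moves steadily toward the vehicle's path (offset 0).
        track.observe(float(step), -6.0 + 1.5 * step)
    return track.predict(4.0)


if __name__ == "__main__":
    print("history kept:     ", run(keep_history=True))
    print("history discarded:", run(keep_history=False))
```

With history kept, the extrapolation places the object on the vehicle's path one second later; with history discarded, the track never holds more than one observation and no prediction is possible.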

Each of those behaviours is a functional insufficiency in the precise SOTIF sense: a limitation of the intended function, working as designed, meeting a triggering condition it could not handle. Here the triggering condition was an out-of-distribution object, a person walking a bicycle across an unlit mid-block road at night. The classification oscillation is a textbook performance insufficiency. The discarded history is a specification insufficiency. An open-world language model that re-derives its understanding of a conversation on every turn, throwing away what it established earlier, fails in the identical shape: the information is present, and the function discards it by design.

Crucially, there was no integrity mechanism. At no point did the system flag "I do not know what this is, and I have not known for several seconds: act conservatively." The oscillation between labels was consumed silently, the instability invisible to the planning logic that depended on a stable track. This is the failure aviation calls Hazardously Misleading Information: not simply wrong information, but wrong information delivered without a warning flag, a confident, continuously wrong picture handed to a consumer that has no reason to distrust it. The system was wrong, and the architecture ensured it did not say so.

One further decision turned a hazard into a fatality. The system did eventually recognize the danger — it determined a collision was imminent 1.3 seconds before impact, late but not too late. Then it waited: the design suppressed hard braking for a full second after detecting a situation that required it, the stated rationale being to avoid acting on a false alarm and to let the human safety driver intervene. A full second of the 1.3 available was spent not acting on a hazard already identified — while the factory automatic emergency braking, Volvo's independent vendor-certified safety layer, had been switched off entirely. Tempe is two fault-free hazards stacked: a perception function confidently wrong with no flag, and an action architecture that detected the residual danger and silenced its own response. Neither was a bug. Both were the system doing what it was designed to do.

Section 04

Govern Integrity, Not Accuracy

The reframe Tempe forces is the one aviation and banking made before automotive did. Accuracy is the raw rate of correct outputs — a useful summary statistic. Integrity is harder to achieve and more important to guarantee: it bounds the rate of errors that escape detection, the rate of assertions that are false and that the monitoring layer failed to catch and flag. The two are not the same number, and they do not move together. A system can be highly accurate and have low integrity if its monitor is poorly calibrated; it can be only modestly accurate and have high integrity if it reliably flags the cases it gets wrong. The practical achievement in both aviation and banking was never the elimination of error — it was the elimination of undetected error, at a tolerance scaled to the consequence of a miss.
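To see that accuracy and integrity are separate numbers, consider a minimal sketch that scores two hypothetical systems on both. The outcome counts are invented for illustration, and the function names are assumptions, not an established metric API. Each outcome records whether the answer was correct and whether the monitor flagged it.

```python
from typing import List, Tuple

Outcome = Tuple[bool, bool]  # (answer was correct, monitor raised a flag)


def accuracy(outcomes: List[Outcome]) -> float:
    """Raw rate of correct outputs."""
    return sum(1 for correct, _ in outcomes if correct) / len(outcomes)


def undetected_error_rate(outcomes: List[Outcome]) -> float:
    """Rate of outputs that were wrong AND carried no flag (the escapes)."""
    escapes = sum(1 for correct, flagged in outcomes if not correct and not flagged)
    return escapes / len(outcomes)


# Hypothetical system A: very accurate, no monitor at all.
system_a = [(True, False)] * 980 + [(False, False)] * 20

# Hypothetical system B: less accurate, but flags most of its own errors.
system_b = [(True, False)] * 900 + [(False, True)] * 95 + [(False, False)] * 5

if __name__ == "__main__":
    for name, outcomes in (("A", system_a), ("B", system_b)):
        print(name, accuracy(outcomes), undetected_error_rate(outcomes))
```

System A wins on accuracy and loses on integrity: every one of its errors escapes. System B is wrong five times as often, yet a quarter as many wrong answers reach the consumer unflagged.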

A perception model that is, say, 98% accurate but has no integrity mechanism is the Tempe pattern: usually right, and when wrong, silently wrong. The fix is not a higher accuracy figure; it is an integrity architecture, and it has three parts. First, for claims a check can settle exactly (schema validity, arithmetic, citation resolution, a policy or allowlist, a track that must persist), run a sound, default-deny verification gate that drives the escape rate toward zero. The Tempe system violated checkable invariants silently: a tracked object cannot teleport; a reclassification cannot reset the kinematic history. A deterministic monitor would have flagged exactly that.
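The two invariants named above can be checked deterministically. The following is a minimal sketch under stated assumptions: the `Frame` fields, the speed bound `MAX_SPEED_M_S`, and the rule that history must grow across a label change (which presumes an uncapped history buffer) are all hypothetical choices for this example, not parameters from any real perception stack.

```python
import math
from dataclasses import dataclass
from typing import List

MAX_SPEED_M_S = 15.0  # hypothetical physical bound for the tracked object class


@dataclass(frozen=True)
class Frame:
    t: float            # timestamp in seconds
    x: float            # position in metres
    y: float
    label: str          # classification reported for this frame
    history_len: int    # observations the tracker holds for this object


def check_track(frames: List[Frame]) -> List[str]:
    """Return every invariant violation found; an empty list means verified."""
    if len(frames) < 2:
        # Default-deny: with nothing to compare, the track is not verified.
        return ["insufficient frames: cannot verify"]
    violations = []
    for prev, cur in zip(frames, frames[1:]):
        dt = cur.t - prev.t
        if dt <= 0:
            violations.append(f"t={cur.t}: timestamps not increasing")
            continue
        speed = math.hypot(cur.x - prev.x, cur.y - prev.y) / dt
        if speed > MAX_SPEED_M_S:
            violations.append(f"t={cur.t}: teleport, implied speed {speed:.1f} m/s")
        if cur.label != prev.label and cur.history_len <= prev.history_len:
            violations.append(f"t={cur.t}: history reset on reclassification")
    return violations


def gate(frames: List[Frame]) -> bool:
    """Pass only when every check succeeded."""
    return not check_track(frames)


if __name__ == "__main__":
    oscillating = [
        Frame(0.0, 0.0, -6.0, "unknown", 1),
        Frame(1.0, 0.0, -4.5, "vehicle", 1),   # label changed, history restarted
        Frame(2.0, 0.0, -3.0, "bicycle", 1),
    ]
    print(gate(oscillating), check_track(oscillating))
```

The monitor makes no judgment about which label is right. It only refuses to let a label change erase what the tracker already knew, and it surfaces the refusal instead of consuming it silently.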

Second, for the open-world class that no exact check can settle (open-world perception, open-world factual assertion), you cannot eliminate the error, only bound it. Set a consequence-scaled tolerance and hold the undetected-error rate below it through calibrated abstention, validated against the upper bound of a confidence interval rather than the observed point estimate. That is the difference between "we saw no escapes in our sample" and "we can state, with confidence, that the true rate is at or below tolerance." The Tempe system had no budget and no abstention: it could not say "unknown, brake conservatively."
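The gap between a point estimate and an upper confidence bound is easy to compute. The sketch below uses the exact one-sided binomial (Clopper-Pearson) upper bound, found by bisection with only the standard library. The choice of this particular interval, the 95% default, and the function names are assumptions for illustration; the text above does not prescribe a specific method.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed in log space for stability."""
    total = 0.0
    for i in range(k + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


def upper_bound(escapes: int, samples: int, confidence: float = 0.95) -> float:
    """One-sided upper confidence bound on the true undetected-error rate."""
    if escapes >= samples:
        return 1.0
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        # The CDF falls as p rises; find the p at which it equals alpha.
        if binom_cdf(escapes, samples, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


def meets_tolerance(escapes: int, samples: int, tolerance: float) -> bool:
    """Validate against the bound, not against the observed rate."""
    return upper_bound(escapes, samples) <= tolerance


if __name__ == "__main__":
    # Zero escapes observed in both cases, so the point estimate is 0 twice.
    print(upper_bound(0, 300), meets_tolerance(0, 300, tolerance=0.001))
    print(upper_bound(0, 3000), meets_tolerance(0, 3000, tolerance=0.001))
```

Both samples show an observed escape rate of zero, but only the larger one supports a claim at a tolerance of one in a thousand. With zero escapes the bound has the closed form 1 - alpha^(1/n), which gives the familiar "rule of three" approximation of roughly 3/n at 95% confidence.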

Third, the gate must run out-of-process and be signed. Integrity is worthless if the actor can disable its own check. The suppression logic at Tempe was the system grading its own emergency determination: deciding, in-process, that its own hazard signal might be false and choosing not to act. That is exactly the self-grading a safety architecture must forbid, and the one independent, qualified, vendor-certified integrity layer was switched off. Verify the checkable to a zero escape rate, bound the open-world residual at a stated confidence level, abstain on the rest, and run the check where the failing component cannot suppress its own warning.
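One way to picture the signed gate is a verdict envelope that the consumer refuses to act on unless its signature verifies. The sketch below uses an HMAC from the Python standard library. It is an illustration under assumptions: process isolation is presumed rather than shown, the key is assumed to be held by the gate and the consumer but not by the component being checked, and the field names and action strings are hypothetical. A production design would more likely use asymmetric signatures so the consumer cannot mint verdicts either.

```python
import hashlib
import hmac
import json
from typing import Optional

SAFE_ACTION = "brake_conservatively"  # hypothetical conservative default


def sign_verdict(key: bytes, verdict: dict) -> dict:
    """Gate side: serialise the verdict and attach an HMAC-SHA256 tag."""
    payload = json.dumps(verdict, sort_keys=True)
    tag = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def accept(key: bytes, envelope: Optional[dict]) -> str:
    """Consumer side: proceed only on a present, authentic, positive verdict."""
    if not isinstance(envelope, dict):
        return SAFE_ACTION  # missing verdict: default-deny
    payload, tag = envelope.get("payload"), envelope.get("tag")
    if not isinstance(payload, str) or not isinstance(tag, str):
        return SAFE_ACTION  # malformed envelope
    expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag):
        return SAFE_ACTION  # altered or forged verdict
    verdict = json.loads(payload)
    return "proceed" if verdict.get("status") == "verified" else SAFE_ACTION


if __name__ == "__main__":
    key = b"demo-key-not-for-production"
    good = sign_verdict(key, {"status": "verified", "track_id": 7})
    print(accept(key, good))
    print(accept(key, None))  # the check was skipped or suppressed
```

The design choice that matters is the direction of the default: a suppressed, missing, or tampered verdict yields the conservative action, so silencing the check cannot also silence the response.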

The single most important property is not being right more often; it is never being silently wrong. Verify the checkable to a zero escape rate, bound the open-world residual at a stated confidence level, abstain on the rest, and run the check out-of-process, so the failing component cannot suppress its own warning.

The integrity rule

Section 05

The Threat, the Envelope, and the Runway

The fault-free hazard does not stand alone. It is the threat in a three-part stack, and it is load-bearing for the other two. SOTIF and the operational design domain — the declared envelope of conditions under which a function's performance has been validated — are two faces of the same honesty. The envelope says where the guarantee holds; SOTIF's whole project is to shrink the unknown-unsafe region inside and at the edges of that envelope. The Tempe scenario — an unlit, mid-block crossing by a pedestrian walking a bicycle — was a triggering condition near the envelope's edge, where the perception function's performance silently degraded. You cannot budget a fault-free hazard you have not bounded the operating domain for; where the declared envelope ends, abstention — not extrapolation — is the only sound move.
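The rule that abstention, not extrapolation, applies outside the envelope can be sketched as a guard in front of the model's output. Everything here is hypothetical: the `Envelope` fields, the thresholds, and the action strings are invented for illustration, and a real operational design domain has many more dimensions than three.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Envelope:
    """Declared conditions under which performance has been validated."""
    min_illuminance_lux: float
    max_speed_m_s: float
    validated_classes: FrozenSet[str]

    def contains(self, illuminance_lux: float, speed_m_s: float, object_class: str) -> bool:
        return (
            illuminance_lux >= self.min_illuminance_lux
            and speed_m_s <= self.max_speed_m_s
            and object_class in self.validated_classes
        )


def decide(envelope: Envelope, illuminance_lux: float, speed_m_s: float,
           object_class: str, model_plan: str) -> str:
    """Use the model's plan inside the envelope; abstain everywhere else."""
    if not envelope.contains(illuminance_lux, speed_m_s, object_class):
        return "abstain: outside validated envelope"
    return model_plan


if __name__ == "__main__":
    odd = Envelope(
        min_illuminance_lux=10.0,
        max_speed_m_s=20.0,
        validated_classes=frozenset({"vehicle", "pedestrian", "bicycle"}),
    )
    # Daylight, known class: the model's plan is used.
    print(decide(odd, 5000.0, 15.0, "pedestrian", "continue"))
    # Dark road, unrecognised object: the guard abstains.
    print(decide(odd, 1.0, 15.0, "unknown", "continue"))
```

The guard does not need to know whether the model's plan is right. It only needs to know whether the conditions are ones in which the plan has earned trust.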

Tempe is also a fallback failure, and that binds it to the third part of the stack. The suppression design treated the human safety driver as the fallback — an inadequate, unreachable one: nominally present, a human in the seat, but not actually catching the failure state, given roughly a second to react to a hazard the system had silenced. This brief establishes why you need a fallback — the fault-free hazard is irreducible, and no test pass eliminates it. Its companion in the stack establishes what makes a fallback adequate and reachable: a minimal risk condition the system can always reach, the way an aircraft is never permitted past the last point from which it can still turn to a usable runway. The fault-free hazard is the threat; the envelope says where you are safe; the runway is what you turn to when you are not.

That three industries reached the same object independently is the strongest part of the argument. Aviation named it Hazardously Misleading Information (HMI) and built independent integrity monitoring that abstains when integrity cannot be confirmed. Banking reached the same discipline through model-risk governance and backtesting. Automotive engineering named the hazard SOTIF and is still building its equivalent. HMI in aviation, undetected model error in banking, and the SOTIF performance-limitation hazard in automotive are the same object in three regulated fields, and a hallucinating model is that object again. That convergence is strong evidence that the answer is forced by the structure of the problem, confident undetected error reaching a consequential decision, rather than by any one field's conventions.

The closing argument is the one automotive engineering wrote into a standard after a death it could not attribute to any fault. Danger does not mean a bug. The most dangerous system is not the one that crashes; it is the one that runs perfectly, passes every test, and is confidently, silently wrong at the edge of its competence. Correct hardware can still give the wrong answer. The discipline is to make sure it never gives the wrong answer without saying so — budget the failure you cannot eliminate, detect it with an independent check, and signal it before you act.

The most dangerous system is not the one that crashes. It is the one that runs perfectly and is confidently, silently wrong at the edge of its competence. Make sure correct hardware never gives the wrong answer without saying so.

The closing rule

The in-depth companion develops the full argument: the precise ISO 21448-versus-ISO 26262 framing and the ASIL machinery, the complete SOTIF vocabulary of functional insufficiency and triggering condition, the full NTSB anatomy of the Tempe collision and its action-suppression decision, the integrity-versus-accuracy axis and Hazardously Misleading Information in the aviation register, and the three-part architecture stated in concrete units (deterministic verification, a confidence-bounded probabilistic budget with abstention, and a signed out-of-process gate). Read it at Correct Hardware, Wrong Answer: SOTIF and the Fault-Free Hazard.
