
The Hazard With No Fault

How ISO 21448 names the hazard with no fault, what the Tempe collision reveals about two stacked fault-free hazards, and why the only sound governance move is integrity — not accuracy.

KellerAI White Paper · In-Depth · Engineering Discipline & Verification · Jun 2026 · ~25 min read

Context

Automotive safety reasoning assumes danger traces to a defect — a failed sensor, corrupted input, logic error — and that finding and removing it ends the hazard. That assumption is wrong in exactly the place that matters most for machine-learning systems. A modern perception model can execute every line of its specification faithfully on hardware working exactly as designed and still produce an answer that is lethally wrong.

The Finding

The hazard is not a defect in the execution of the intended function but a limitation of that function operating at the edge of its competence. ISO 21448 names it — Safety Of The Intended Functionality — because the fault-based apparatus has no language for a hazard with no fault. The only sound posture is to budget the failure you cannot eliminate, detect it with an independent check, and signal it before you act.

Tags:
Fault-free hazards & functional insufficiencies · ISO 21448 (SOTIF) vs ISO 26262 · Integrity architecture & conformal abstention
Cite this paper

KellerAI. (2026, June 20). Correct Hardware, Wrong Answer: SOTIF and the Fault-Free Hazard. KellerAI. https://kellerai.blog/correct-hardware-wrong-answer-in-depth

Paper Details
Category: Engineering Discipline & Verification
Audience: Engineering, platform, safety, and AI-governance leaders deploying machine-learning systems
Method: Cross-discipline analysis · evidence-based · regulatory-source grounded
Length: ~6,000 · ~25 min
Reading level: Technical
Sections: 9
References: 26
Version: v1.0 · Updated Jun 2026
Published: Jun 2026
Key Takeaways
  • ISO 26262 asks "did a component fail?" and assigns a probability budget to the failure; ISO 21448 asks "is the function, working perfectly, still unsafe in this scenario?" — a question no fault-probability budget can answer — and the fault-free hazard lives entirely in SOTIF's world, the world a machine-learning perception model occupies by default.
  • The Tempe collision (18 March 2018, NTSB case HWY18MH010) resulted from two stacked fault-free hazards: a perception function that oscillated between classifications and discarded location history by design, producing confidently wrong information with no flag to the planning layer; and an action architecture that detected imminent collision 1.3 seconds before impact but suppressed hard braking for a full second by design while the factory Volvo emergency-braking system had been disabled entirely.
  • The integrity architecture requires three parts: deterministic verification of checkable claims — schema validity, arithmetic, citation resolution, kinematic invariants — to a zero escape rate; a bounded probabilistic error budget with conformal abstention for open-world claims, validated as the upper bound of a confidence interval rather than the observed point estimate; and an out-of-process, signed verification gate so the failing component cannot suppress its own warning.
Section 01

The Reframe: No Bug, Still Dangerous

Begin with the assumption and watch it break. An engineer trained on conventional safety reasons backward from a hazard to a cause, and the cause is presumed to be a defect: a failed sensor, a corrupted input, a logic error, a race condition. Find the defect, remove it, and the hazard is gone. The whole discipline of testing — unit tests, integration tests, fault injection, coverage analysis — is built on the premise that an unsafe output is evidence of something broken, and that a system which passes its tests and runs on healthy hardware is, by construction, safe. The premise is wrong, and it is wrong in exactly the place that matters most for machine-learning systems.

A modern perception model can detect an object, track it, classify it, and plan around it — executing every line of its specification faithfully, on hardware that reports no fault — and the answer it produces can still be lethally wrong. Nothing broke. The model performed its function; the function was insufficient for the situation it met. The hazard is not a defect in the execution of the intended function but a limitation of the intended function operating at the edge of its competence. ISO 21448 calls this a functional insufficiency, and the field it governs — Safety Of The Intended Functionality — exists precisely because fault-based safety has no name for a hazard with no fault. 14

This is hallucination's exact shape. When a language model emits a false assertion, it has not malfunctioned. It has performed its function — next-token prediction over a learned distribution — flawlessly, and the wrongness is a property of that function operating where its training distribution thins out, not of a defect in its execution. The parent paper in this series makes the same move from the other direction: the generation-step inevitability theorems locate an irreducible error rate inside the generation function, and say nothing about a system that wraps generation in a certification layer. 25 The model is not broken. The model is doing what it was built to do, and what it was built to do is sometimes insufficient — confidently, fluently, and without any signal that this is one of those times.

The danger is not that the system is sometimes wrong. Systems are sometimes wrong; no amount of engineering drives the error rate of an open-world perception or generation function to zero. The danger is that the system is wrong and does not say so — that the wrong answer arrives carrying the same confidence and the same interface as a right one, with nothing to distinguish it. That is the property the rest of this paper governs, and it is why the central commitment is not accuracy but integrity. A more accurate model that is still silently wrong in the cases it gets wrong has improved the wrong number.

A system that passes every test and runs on healthy hardware can still be lethally wrong, because the hazard is a performance limitation of the intended function — not a defect in its execution. That is hallucination's exact shape, and no test suite built to find bugs was built to find it.

The fault-free hazard
Section 02

Two Standards, Two Failure Worlds: ISO 26262 vs. ISO 21448

Automotive safety engineering draws the line this paper depends on with two standards that govern two different worlds. The first, ISO 26262, is the functional-safety standard for road-vehicle electrical and electronic systems — first published in 2011, substantially expanded in its second edition of December 2018 across twelve parts. 7 8 It governs malfunctioning behaviour: hazards caused by a fault — a random hardware failure, a failed sensor, a systematic design or software defect. Its question is the conventional one: did a component fail, and if so, how often, and how badly?

ISO 26262 answers that question with a probability budget. It assigns each safety function an Automotive Safety Integrity Level, from ASIL A (lowest) through ASIL D (highest), plus QM for functions with no safety requirement, derived from three factors: Severity (S0–S3, how bad the harm), Exposure (E0–E4, how often the operational situation arises), and Controllability (C0–C3, whether a driver can manage the situation). 8 10 The level then sets the rigour required. ASIL D, applied to functions like electric power steering and automatic emergency braking, maps to a target for the probabilistic metric for random hardware failures (PMHF) on the order of 10⁻⁸ failures per hour, or ten failures-in-time, alongside hardware architectural-coverage thresholds: the single-point fault metric must reach 99% at ASIL D, 97% at ASIL C, and 90% at ASIL B, and the latent-fault metric 90%, 80%, and 60% respectively. 11 12 Every one of these numbers presumes a fault and budgets its frequency.

It is worth dwelling on what those numbers are, and what they are not. A PMHF target of 10⁻⁸ per hour and a single-point fault metric of 99% are statements about a population of physical components and the random hardware failures they exhibit over operating time. The fault classification beneath them (single-point, residual, latent, and safe faults) is a taxonomy of the ways a component can break and a budget for how often each kind is permitted to occur. 12 The whole apparatus is a triumph of fault-based safety, and it is precisely the apparatus that has nothing to say about a perception function that never broke. There is no failures-in-time figure for a classifier that oscillates between labels while executing its specification correctly, because no failure occurred. The ASIL machinery would certify the hardware as healthy, and it would be right. The hazard is not in the hardware; it is in the answer the healthy hardware computed, and that is a region the standard does not map. The same is true of a language model whose every weight, every kernel, and every floating-point operation executed without error to produce a fabricated citation: a fault audit finds nothing, because there is nothing of that kind to find.
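The ASIL determination and the failures-in-time conversion described above can be sketched in a few lines. This is an illustration only, not a reproduction of the standard: it assumes the widely used shortcut of summing the three class indices, which is commonly described as matching the determination table in ISO 26262-3 and should be checked against the standard before any real use. The function names `asil` and `per_hour_to_fit` are hypothetical.

```python
def asil(severity: int, exposure: int, controllability: int) -> str:
    """Return 'QM' or 'A'..'D' for classes S0-S3, E0-E4, C0-C3.

    Assumption: the sum-of-indices shortcut (S + E + C) stands in for the
    normative lookup table.
    """
    if not (0 <= severity <= 3 and 0 <= exposure <= 4 and 0 <= controllability <= 3):
        raise ValueError("class index out of range")
    # Simplification: any class of 0 is treated as carrying no ASIL.
    if 0 in (severity, exposure, controllability):
        return "QM"
    total = severity + exposure + controllability
    return {7: "A", 8: "B", 9: "C", 10: "D"}.get(total, "QM")


def per_hour_to_fit(rate_per_hour: float) -> float:
    """Convert failures per hour to FIT (failures per 10^9 device-hours)."""
    return rate_per_hour * 1e9


print(asil(3, 4, 3))                    # worst case on all three axes
print(round(per_hour_to_fit(1e-8), 6))  # the ASIL D PMHF order of magnitude
```

Note what the function takes as input: a harm, a situation frequency, and a driver's ability to recover. Nothing in its signature describes whether the function's output was right, which is the gap the rest of this section describes.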

The second standard, ISO 21448, governs the world where there is no fault to budget. Published as a full International Standard in 2022 — superseding the 2019 Publicly Available Specification — it defines the Safety Of The Intended Functionality as the absence of unreasonable risk due to a hazard caused by functional insufficiencies: hazards that arise when sensors, algorithms, and system logic operate exactly as designed but produce unsafe outcomes because of performance limitations or incomplete scenario coverage. 12 No component fault is required, and no fault-probability budget can describe the hazard, because there is no fault whose frequency to bound.

The contrast is the spine of the paper. ISO 26262 asks "did a component fail?" and assigns a probability budget to the failure. ISO 21448 asks "is the function, working perfectly, still unsafe in this scenario?", a question no fault-probability budget can answer, because there is no fault. 3 9 The fault-free hazard lives entirely in SOTIF's world. And SOTIF's world is the one a machine-learning perception model occupies by default, the same way a language model does: the dangerous output is not the trace of a broken part but the function meeting a situation outside the region where its performance was validated. A test suite tuned to ISO 26262's question (did anything break?) will pass the system that produces the fault-free hazard, every time.

Section 03

The Vocabulary of the Fault-Free Hazard

A hazard with no fault needs its own vocabulary, because the fault-based vocabulary cannot name it. ISO 21448 supplies four terms that, taken together, describe how a perfectly executing function becomes dangerous — and each maps directly onto the behaviour of a generative model at the edge of its competence.

The first is the functional insufficiency itself, which SOTIF divides into two kinds. An insufficiency of specification is a gap in what the intended function was asked to do: the specification did not anticipate the situation, so the function's correct execution of an incomplete specification is unsafe. A performance insufficiency is a limit in how well the function — or the element implementing it — performs the task it was specified to do: the sensor cannot resolve the object, the classifier cannot maintain a stable label, the model cannot tell a true claim from a plausible-sounding false one. 1 4 Both are insufficiencies of the function, not failures of a component. The AI analogue is exact: a hallucinated citation is a performance insufficiency (the model cannot distinguish a real source from a fabricated one); a model asked to answer a question its safety policy never contemplated is a specification insufficiency.

The second term is the triggering condition — the specific circumstance of a scenario that turns a latent functional insufficiency into actual hazardous behaviour. 15 A functional insufficiency is not hazardous everywhere; it produces a hazard only when the situation that exposes it occurs. An unlit road, an out-of-distribution object, an unusual pose, a reasonably foreseeable misuse — these are triggering conditions, the circumstances under which the function's limitation becomes consequential. For a model, the triggering condition is the prompt, context, or input distribution that drives the model into the region where its performance silently degrades. The hazard is the product of an insufficiency and its trigger; you cannot eliminate it by listing triggers, because the space of triggers in an open world is not enumerable.

The third element is the performance limitation framing of the standard itself: ISO 21448 studies the performance limitations and insufficient situational awareness of the intended function, with or without reasonably foreseeable misuse. 3 4 The hazard is a limitation of the function, not a malfunction of it: the same distinction drawn in Section 2, now stated as the standard's explicit object of study. The fourth is the scenario decomposition: SOTIF organizes the space of driving scenarios along two axes (known versus unknown, and hazardous versus non-hazardous) and frames the engineering goal as shrinking the two unsafe regions, known-unsafe and unknown-unsafe, by exploration, analysis, and validation. 16 The unknown-unsafe region is the one that matters most and the one fault-based safety cannot reach: hazards you have not yet discovered, in scenarios you have not yet enumerated, produced by a function that is working as designed. Open-world perception and open-world language generation share this structure exactly: the region of inputs where the function is confidently, silently wrong is unbounded and not fully knowable in advance, which is why the discipline is to bound and detect the residual rather than to pretend it can be enumerated away.

The geometry of that goal is what separates a fault-based mindset from a SOTIF one. Fault-based safety works inward from a list of known failure modes, each assigned a probability; its competence ends where the list ends. SOTIF works outward from the acknowledgement that the list is incomplete and will stay incomplete, and so its central activity is not enumerating failures but shrinking the unknown-unsafe region through scenario exploration, field data, and analysis — and, crucially, declaring the boundary of the validated region so the system can behave conservatively at its edge. 16 The two postures lead to opposite reflexes at the moment of uncertainty. The fault-based system, finding no fault, proceeds — because the absence of a fault is, to it, the signal that all is well. The SOTIF-aware system, finding itself near the edge of its validated region, abstains — because the absence of a fault is exactly when the fault-free hazard is most likely to be present. A model that answers confidently on an input far outside its training distribution is making the fault-based system's mistake; a model that recognises the input as out-of-distribution and declines, or hedges, is making the SOTIF-aware system's move. The vocabulary exists so that the second behaviour can be specified, required, and verified rather than hoped for.

Section 04

Integrity vs. Accuracy, and Hazardously Misleading Information

The parent paper in this series defines the axis SOTIF needs but does not name in those words, and the definition transfers without modification. Accuracy is the raw rate of correct outputs — a useful summary statistic. Integrity is something harder to achieve and more important to guarantee: it bounds the rate of errors that escape detection, the rate of assertions that are false and that the monitoring layer failed to catch and flag. 25 The relationship between the two is not fixed; it depends on the monitoring architecture. A system can have high accuracy and low integrity if its monitor is poorly calibrated, and modest accuracy with high integrity if its monitor reliably flags the cases it gets wrong. The practical achievement in both aviation and banking was never the elimination of error — it was the elimination of undetected error, at a tolerance scaled to the consequence of a miss.
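The difference between the two axes is easy to see when both are computed from the same records. The sketch below is a minimal illustration; the `Outcome` record and both function names are hypothetical, and the two example systems are invented to show the contrast, not drawn from any measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    correct: bool   # ground truth: was the output right?
    flagged: bool   # did the monitor raise a warning on this output?


def accuracy(outcomes):
    """Raw rate of correct outputs."""
    return sum(o.correct for o in outcomes) / len(outcomes)


def escape_rate(outcomes):
    """Rate of outputs that were wrong AND carried no flag (undetected errors)."""
    escapes = sum((not o.correct) and (not o.flagged) for o in outcomes)
    return escapes / len(outcomes)


# System A: high accuracy, no monitor. Every error escapes.
system_a = [Outcome(True, False)] * 98 + [Outcome(False, False)] * 2
# System B: lower accuracy, but the monitor flags every error.
system_b = [Outcome(True, False)] * 90 + [Outcome(False, True)] * 10

print(accuracy(system_a), escape_rate(system_a))
print(accuracy(system_b), escape_rate(system_b))
```

System A wins on accuracy and loses on integrity; system B is the reverse. A tolerance set on `escape_rate` rather than on `accuracy` is the integrity requirement in its simplest form.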

Aviation names the failure mode precisely: Hazardously Misleading Information. HMI is not simply wrong information; it is wrong information delivered without a warning flag, information the operator has no reason to distrust. A navigation system that reports the aircraft's position as 0.1 nautical miles left of the approach path when it is actually 1.1 nautical miles left, without triggering an integrity alert, has produced HMI: the pilot, trusting the display, descends toward terrain. 25 24 The failure is not that the system was wrong. The failure is that the system was wrong and did not say so. Required Navigation Performance makes the integrity requirement numerical: the probability of being outside the containment value without a timely alert must be held below a tolerance orders of magnitude tighter than the accuracy requirement, so that a system may be right 95% of the time but, when it is wrong in a way that matters, must almost never fail to say so. 24

SOTIF's fault-free hazard is HMI at the perception layer. A perception function that produces a wrong classification without a signal that the classification is unreliable has delivered hazardously misleading information to the planner downstream, exactly as a navigation system delivers it to a pilot. The planner, trusting the perception output, acts on it. The two industries reached the same object independently: aviation called it HMI and built RAIM — an independent consistency check that abstains and alerts when integrity cannot be confirmed — while automotive engineering called the hazard SOTIF and is still building its equivalent. 24 A hallucinating language model is the third instance: a false assertion delivered with the same fluency as a true one, without a flag, to a user or an agent loop that will act on it.

This reframes the engineering target. The question is no longer "how do we make the perception model, or the language model, more accurate?" but "how do we ensure it is never silently wrong?" The first question chases a number that can never reach one in an open world. The second is answerable: it asks for an integrity mechanism, an independent check that can say "I do not know what this is" and force a conservative action, and that mechanism, not a higher accuracy figure, is what the fault-free hazard demands. The remaining sections show what that mechanism costs in concrete units, and what happens when it is absent.

Section 05

Incident Anatomy: Uber ATG, Tempe, 18 March 2018

The canonical fault-free hazard has a name, a date, and a federal investigation. At approximately 9:58 p.m. on 18 March 2018, a modified 2017 Volvo XC90 operated by the Advanced Technologies Group of Uber Technologies, running a proprietary developmental automated driving system, struck and killed Elaine Herzberg as she walked a bicycle across the northbound lanes of North Mill Avenue in Tempe, Arizona. The National Transportation Safety Board investigated as case HWY18MH010 and published its findings as Highway Accident Report NTSB/HAR-19/03, adopted 19 November 2019. 13 14 The report does not find a software bug, a sensor failure, or a hardware malfunction as the perception cause. The system worked as designed. That is what makes it the case this paper turns on.

The perception system detected Herzberg early and tracked her continuously. NTSB data show that the automated system first detected the pedestrian 5.6 seconds before impact, when she was about 350 feet north of the vehicle, and continued to track her until impact. 13 There was no sensing fault: the object was seen, and seen with substantial lead time. What failed was downstream of sensing, in a function that ran exactly as its logic specified. According to the NTSB, the system never correctly classified the pedestrian; it changed her classification several times, alternating between unknown object, vehicle, bicycle, and other. Critically, with each change in classification the system perceived the pedestrian as a new object without considering its location history. 13 17 Every reclassification reset the track. A continuously observed object was repeatedly treated as a first-time observation.

The consequence was a prediction failure that no fault explains. The NTSB attributes the inability to predict Herzberg's path to three design properties operating together: the system could not correctly identify her as a pedestrian; the design did not rely on tracking history for objects whose classification had changed, so a reclassified object carried no path prediction forward; and the system lacked the functionality to assign a goal of jaywalking. 13 Each of these is a functional insufficiency in the precise SOTIF sense — a limitation of the intended function, working as designed, meeting a triggering condition it could not handle: an out-of-distribution object, a person walking a bicycle across an unlit mid-block road at night. The classification oscillation is a textbook performance insufficiency. The discarded location history is a specification insufficiency. The hardware was correct. The answer was wrong.

The timeline makes the point unforgiving. The system had 5.6 seconds of continuous observation — an eternity at urban speeds, with the pedestrian first acquired roughly 350 feet ahead. 13 A fault-based account would look for the moment the sensing pipeline lost her; there is no such moment, because it never lost her. The data it needed to predict her path existed the entire time: a continuous track, a consistent trajectory across the lanes. What the system lacked was not data but the integrity to use it — the design discarded the very location history that would have made the path predictable, every time it changed its mind about what she was. 17 The failure is therefore not legible as a defect. It is legible only as a functional insufficiency: a perception-and-prediction function that, working exactly as specified, could not maintain a stable identity for an object it could plainly see. An open-world model that re-derives its understanding of a conversation on every turn, discarding what it established earlier, fails in the identical shape — the information is present, and the function throws it away by design.

There was no integrity mechanism. At no point did the system flag "I do not know what this is, and I have not known for several seconds; act conservatively." The oscillation between labels was not surfaced as a signal of low confidence that should narrow the operating envelope; it was consumed silently, each new label overwriting the last, the instability invisible to the planning logic that depended on a stable track. This is HMI at the perception layer in its purest form: the perception function delivered a confident, continuously updated, and continuously wrong picture to the planner, with no flag that the picture could not be trusted, which is exactly the failure aviation's integrity monitoring exists to suppress. 16 24 The system was wrong, and the architecture ensured it did not say so.
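What surfacing the oscillation could look like is sketched below: a track that keeps its position history across relabeling and exposes label churn as an explicit integrity signal. This is a hypothetical illustration, not a description of any production tracker; the `Track` class, the `window` size, and the `max_label_changes` tolerance are all assumed values chosen for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Track:
    max_label_changes: int = 2   # hypothetical tolerance for label churn
    window: int = 10             # number of recent frames inspected
    positions: list = field(default_factory=list)
    labels: deque = field(default_factory=deque)

    def update(self, position, label):
        # Position history is appended unconditionally: a new label never
        # resets the kinematic record.
        self.positions.append(position)
        self.labels.append(label)
        if len(self.labels) > self.window:
            self.labels.popleft()

    def label_changes(self):
        """Count label transitions inside the recent window."""
        recent = list(self.labels)
        return sum(a != b for a, b in zip(recent, recent[1:]))

    def integrity_ok(self):
        """False means: tell the planner the classification is unreliable."""
        return self.label_changes() <= self.max_label_changes


track = Track()
for i, label in enumerate(["unknown", "vehicle", "bicycle", "other", "bicycle"]):
    track.update((0.0, float(i)), label)

print(len(track.positions), track.label_changes(), track.integrity_ok())
```

The design point is that the flag travels with the track. A planner that receives `integrity_ok() == False` can narrow its envelope without needing to know which label, if any, is correct.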

Section 06

The Design Decision That Suppressed the Signal

If the perception failure was a fault-free hazard, the next design decision is what turned a hazard into a fatality, and it is an integrity failure of a different and more deliberate kind. The NTSB reports that the system did, eventually, recognize the danger: it detected an emergency situation, determining that a collision was imminent, 1.3 seconds before impact. 13 17 Late, but not too late: 1.3 seconds is enough for braking to change the outcome. The system knew. The question is what it did with the knowledge.

It was designed to wait. In the NTSB's words, the ATG system was designed to suppress braking for one second after it detected a hazardous situation when hard braking (greater than 0.71 g) was required to prevent a collision. 13 The stated rationale was to allow the system to abort severe maneuvers if a hazardous situation resolved itself or was deemed false, and to allow the vehicle operator to take control if the situation was truly hazardous. 13 16 A full second of the 1.3 available was spent not acting on a hazard the system had already identified. The corrective action existed. The design suppressed it.
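A rough sense of what the suppressed second cost comes from constant-deceleration kinematics. The sketch below assumes braking at exactly the 0.71 g threshold quoted above, with no actuation delay and no tyre or road effects. Those are simplifying assumptions for illustration, not figures from the NTSB report, and the function name `speed_shed` is hypothetical.

```python
G = 9.80665  # standard gravity, m/s^2


def speed_shed(braking_seconds: float, decel_g: float = 0.71) -> float:
    """Speed removed (m/s) by constant deceleration over the given time."""
    return decel_g * G * braking_seconds


MPS_TO_MPH = 2.23694

full = speed_shed(1.3)      # braking from the moment of detection
delayed = speed_shed(0.3)   # braking only after one second of suppression

print(f"1.3 s of braking sheds about {full:.1f} m/s ({full * MPS_TO_MPH:.0f} mph)")
print(f"0.3 s of braking sheds about {delayed:.1f} m/s ({delayed * MPS_TO_MPH:.0f} mph)")
```

Under these assumptions the suppression window discards roughly three quarters of the speed reduction that was available, which is the quantitative content of "the corrective action existed."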

This is the integrity failure made architectural. The suppression logic was, in effect, the system grading its own emergency determination — deciding, in-process, that its own hazard signal might be false and choosing not to act on it. It is self-grading of exactly the kind a safety architecture must forbid: the component whose perception was failing was also the component deciding whether to trust its own alarm, with no independent check and no out-of-process gate. And the fallback it relied on instead — the human operator — was not a reachable, adequate fallback in the terms the companion paper in this series develops: the safety driver, on whom the suppression design depended to catch the residual, redirected her gaze to the road only about one second before impact. 16 17 The fallback was nominally present and not actually catching the failure state.

One further decision completes the picture. To avoid possible radar-signal interference, ATG had disabled the Volvo's factory-equipped forward-collision-warning and automatic-emergency-braking systems while the ADS was in control, and Volvo's own simulation suggested the standard systems might have prevented or at least mitigated the crash. 13 15 The independent, qualified, vendor-certified integrity layer was switched off, leaving only the in-process system whose perception was failing and whose own emergency signal it then suppressed. Read as a SOTIF case, Tempe is two fault-free hazards stacked: a perception function that was confidently wrong with no flag, and an action architecture that detected the residual danger and silenced its own response. Neither was a bug. Both were the system doing what it was designed to do, and that is the precise shape of the integrity failure this paper governs.

Section 07

AI Mapping in Concrete Units

The central engineering claim is the one Section 4 set up: govern integrity, not accuracy. A 98%-accurate perception model with no integrity mechanism is the Tempe automated driving system: usually right, and when wrong, silently wrong. The fix is the same three-part architecture the parent series and the LLM-Agent Assurance Standard (LAAS) specify, stated here in concrete units rather than in principle. 25 26

First: deterministic verification of checkable claims, to drive the escape rate toward zero. Claims for which an inference-time oracle exists — schema validity, arithmetic, citation resolution, policy and allowlist conformance, a ledger that must net to zero, a hash or round-trip that must match — go to a sound, default-deny gate. The false-assertion rate on this class is zero, bounded only by the verifier's soundness, which is itself an obligation rather than an assumption. For perception, the deterministic plausibility check has a direct form: a tracked object cannot teleport, a classification change cannot reset the kinematic history, an object on a collision path must persist. The Tempe system violated each of these silently — precisely the class of checkable invariant a deterministic integrity monitor would have flagged.
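The perception invariants named above lend themselves to a default-deny check between consecutive frames. The following is a minimal sketch under stated assumptions: the `TrackState` fields, the 60 m/s speed ceiling, and the function names are hypothetical, and only identity persistence, the no-teleport rule, and the no-history-reset rule are covered.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackState:
    track_id: int
    x: float            # metres
    y: float            # metres
    label: str
    history_len: int    # frames of retained kinematic history


def violations(prev, curr, dt, max_speed=60.0):
    """List every invariant broken between two consecutive frames."""
    found = []
    if curr.track_id != prev.track_id:
        found.append("identity_changed")
    implied_speed = math.hypot(curr.x - prev.x, curr.y - prev.y) / dt
    if implied_speed > max_speed:          # assumed physical ceiling, m/s
        found.append("teleport")
    if curr.history_len <= prev.history_len:
        # History must grow every frame, whatever happens to the label.
        found.append("history_reset")
    return found


def gate(prev, curr, dt):
    """Default-deny: pass only when no invariant is violated."""
    return not violations(prev, curr, dt)


prev = TrackState(7, 0.0, 0.0, "bicycle", 12)
relabeled_ok = TrackState(7, 0.5, 0.0, "pedestrian", 13)
relabeled_reset = TrackState(7, 0.5, 0.0, "vehicle", 1)

print(gate(prev, relabeled_ok, dt=0.1))
print(violations(prev, relabeled_reset, dt=0.1))
```

A label change alone passes the gate; a label change that discards history does not. The check needs no model and no probability, which is what places it in the deterministic class.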

Second: a bounded probabilistic error budget plus abstention, for the open-world class. Errors on claims with no exact oracle (open-world perception, open-world factual assertion) cannot be eliminated, only bounded. Set a consequence-scaled tolerance and hold the undetected-false-assertion rate below it using selective prediction and conformal abstention, validated as the upper bound of a confidence interval, never the point estimate. The selective-classification framework bounds selective risk (error among answered queries) at a user-set level with high probability by abstaining; 18 19 conformal abstention gives a distribution-free, finite-sample bound on the hallucination rate among answered queries to a pre-specified tolerance; 20 conformal factuality extends the same machinery to language-model outputs; 21 and the foundational conformal-prediction result supplies the distribution-free, finite-sample validity all of these rest on. 22 The LAAS makes the budget numerical: escape-rate tolerances tighten with consequence tier, validated as the binomial upper bound at 95% confidence rather than the observed point estimate. 26 The Tempe system had no budget and no abstention: it could not say "unknown, brake conservatively."
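A simplified version of the abstention mechanism is sketched below: pick a confidence threshold on a held-out calibration set so that the exact binomial upper bound on the error rate among answered items stays at or below tolerance, and abstain below that threshold. This is a threshold-based illustration, not the full conformal or selective-classification procedure from the cited work; in particular it omits the multiple-comparison correction that scanning several thresholds on the same data requires. All function names are hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); adequate for modest error counts."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def risk_upper_bound(errors, n, alpha=0.05):
    """One-sided exact (Clopper-Pearson) upper bound, found by bisection."""
    if n == 0 or errors >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_cdf(errors, n, mid) > alpha:
            lo = mid   # bound lies above mid
        else:
            hi = mid
    return hi


def pick_threshold(calibration, tolerance, alpha=0.05):
    """calibration: list of (confidence, was_correct) pairs.

    Returns the lowest threshold whose answered subset passes, or None.
    """
    best = None
    for t in sorted({c for c, _ in calibration}, reverse=True):
        answered = [ok for c, ok in calibration if c >= t]
        errors = sum(not ok for ok in answered)
        if risk_upper_bound(errors, len(answered), alpha) <= tolerance:
            best = t
    return best


def decide(confidence, threshold):
    """Answer only above a validated threshold; otherwise abstain."""
    if threshold is not None and confidence >= threshold:
        return "answer"
    return "abstain"


# Invented calibration data: a reliable high-confidence band, an unreliable low one.
calibration = [(0.9, True)] * 200 + [(0.5, True)] * 25 + [(0.5, False)] * 25
threshold = pick_threshold(calibration, tolerance=0.02)
print(threshold, decide(0.95, threshold), decide(0.6, threshold))
```

When no threshold passes, `pick_threshold` returns `None` and `decide` abstains on everything. That is the conservative default the text asks for: a system that cannot demonstrate its bound does not get to answer.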

The numbers are worth stating, because they are what turns "govern integrity" from a slogan into an engineering obligation. The LAAS sets escape-rate ceilings that tighten by consequence tier (on the order of 0.02 at a mid tier, 0.005 at a higher tier, and 0 where the action is irreversible and severe) and demands that conformance be judged against the upper bound of a binomial confidence interval at 95% confidence, not the observed rate. 26 That distinction is the difference between "we saw no escapes in our sample" and "we can state, with 95% confidence, that the true escape rate is at or below tolerance," and it carries a sample-size consequence: with zero escapes observed, the exact binomial bound requires at least 149 representative, adversarially-stressed trials to claim a 0.02 ceiling at that confidence, and at least 598 to claim a 0.005 ceiling; the rule of three (n ≈ 3/p) gives the familiar approximations of 150 and 600. 26 These are the analogues of ISO 26262's FIT budgets and aviation's per-approach integrity bounds: a measured frequency the system is held to, not a claim it asserts. Tempe had neither the bound nor the measurement: there was no held-out backtest of the out-of-distribution-pedestrian failure mode, and no tolerance the system had to clear before it was permitted to drive.
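The sample sizes quoted above follow from the zero-failure binomial bound: with no escapes in n trials, the one-sided upper bound on the true rate at confidence 1 − α is 1 − α^(1/n), and the smallest n that brings it to a tolerance p is ⌈ln α / ln(1 − p)⌉. The rule of three, n ≈ 3/p, is the usual approximation. The sketch below uses hypothetical function names and assumes independent, representative trials.

```python
import math


def trials_for_zero_escapes(tolerance: float, confidence: float = 0.95) -> int:
    """Smallest n such that zero escapes in n trials bounds the rate at tolerance."""
    alpha = 1 - confidence
    return math.ceil(math.log(alpha) / math.log(1 - tolerance))


def upper_bound_zero_escapes(n: int, confidence: float = 0.95) -> float:
    """One-sided upper bound on the escape rate after n trials with no escapes."""
    return 1 - (1 - confidence) ** (1 / n)


def rule_of_three(tolerance: float) -> float:
    """Approximate trial count from the rule of three."""
    return 3 / tolerance


print(trials_for_zero_escapes(0.02))    # → 149
print(trials_for_zero_escapes(0.005))   # → 598
print(upper_bound_zero_escapes(149) <= 0.02, upper_bound_zero_escapes(148) <= 0.02)
```

A single observed escape invalidates the zero-failure shortcut; the bound then has to be recomputed from the full binomial interval, and the required trial count rises.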

Third: the verification gate runs out-of-process and is signed. Integrity is worthless if the actor can disable its own check. The gate must run outside the actor's control even under elevated permissions; the policy bundle must be signed and version-pinned; the trace must be append-only and tamper-evident. 26 The Tempe analogue is exact: action suppression was an in-process design choice by the same system whose perception was failing — the self-grading the discipline forbids — and the independent vendor integrity layer was switched off entirely.
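Two of the three enforcement properties, the signed bundle and the tamper-evident trace, can be illustrated with the standard library. This is a sketch under loud assumptions: it uses a symmetric HMAC where a real deployment would more likely use asymmetric signatures with the key held outside the actor's reach, it runs in one process where the real gate must not, and every name (`sign_bundle`, `verify_bundle`, `Trace`) is hypothetical.

```python
import hashlib
import hmac
import json


def sign_bundle(bundle: dict, key: bytes) -> str:
    """Signature over a canonical JSON encoding of the policy bundle."""
    payload = json.dumps(bundle, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_bundle(bundle: dict, signature: str, key: bytes) -> bool:
    """Constant-time comparison; any edit to the bundle fails verification."""
    return hmac.compare_digest(sign_bundle(bundle, key), signature)


class Trace:
    """Append-only log in which each entry commits to the one before it."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(prev: str, event: dict) -> str:
        body = json.dumps(event, sort_keys=True)
        return hashlib.sha256((prev + body).encode()).hexdigest()

    def append(self, event: dict) -> None:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        self.entries.append({"prev": prev, "event": event,
                             "hash": self._digest(prev, event)})

    def verify(self) -> bool:
        prev = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or self._digest(prev, entry["event"]) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


key = b"held-by-the-gate-not-the-actor"      # illustrative key only
bundle = {"version": "1.0.0", "max_brake_suppression_s": 0.0}
signature = sign_bundle(bundle, key)

trace = Trace()
trace.append({"verdict": "deny", "reason": "history_reset"})
trace.append({"verdict": "allow", "reason": "invariants_hold"})
print(verify_bundle(bundle, signature, key), trace.verify())
```

An actor that rewrites a policy value or edits an earlier verdict changes a hash it cannot recompute without the key or without breaking the chain, so the tampering shows up at verification time.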

The LAAS states these as obligations, and each binds a specific Tempe failure.

  • Independent verification (no self-grading) requires that the constrained party never grade itself: the gate derives the tier and selects the verifier; the actor only proposes; a model verifier counts as independent only if it is of distinct lineage and shows measured error-correlation below a tight bound. 26 At Tempe the system's own confidence drove the suppression decision, which is self-grading, with no independent integrity monitor cross-checking track persistence.
  • Verifier qualification requires that a relied-upon verifier carry documented coverage, a negative-test suite, and a change-controlled, versioned entry in an append-only registry with a stated independence basis: the DO-178C-family discipline applied to checks. 23 26 The Tempe "verifier" (the action-suppression heuristic) was an unqualified, undocumented design parameter, not a qualified independent check.
  • Residual escape-rate tolerance bounded at confidence requires bound-then-measure-then-control: bound a maximum acceptable escape rate at the tier, measure it by backtesting on a held-out, adversarially-stressed set with a stated confidence interval, and pass only if the CI upper bound is at or below tolerance, re-measuring on any model, prompt, tool, or policy change. 26 Tempe had no backtested residual on the misclassification-of-out-of-distribution-pedestrian failure mode and no tolerance the system was held to.
  • Enforcement-plane integrity requires the three properties above (signed bundle, out-of-process gate, append-only trace) because a verdict from a gate the actor could have tampered with is no verdict at all. 26 Suppression ran in-process inside the same system: the antithesis of an out-of-process signed gate.

The single most important property is not being right more often; it is never being silently wrong. Verify the checkable to a zero escape rate, bound the open-world residual at confidence, abstain on the rest — and run the check out-of-process so the failing component cannot suppress its own warning.

The integrity rule
Section 08

Cross-Stack & Cross-Discipline Cross-Reference

This paper is the second of three on the autonomous-driving stack, and it is load-bearing for the other two. SOTIF and the Operational Design Domain — the subject of the companion paper on the operating envelope — are two faces of the same honesty. The ODD declares the conditions under which the function's performance has been validated; SOTIF's whole project is to shrink the unknown-unsafe region inside and at the edges of that envelope. The Tempe scenario — an unlit, mid-block crossing by a pedestrian walking a bicycle — was a triggering condition near the envelope's edge, where the perception function's performance silently degraded. 6 You cannot budget a fault-free hazard you have not bounded the operating domain for; where the declared envelope ends, abstention — not extrapolation — is the only sound move. The envelope is the precondition for everything this paper specifies.

Tempe is also a fallback failure, which binds this pair to the third paper, on the minimal risk condition and UL 4600. The action-suppression design treated the human safety driver as the fallback — an inadequate, unreachable fallback in the parent series' diversion-doctrine terms: nominally present (a human in the seat) but not actually catching the failure state (not watching, given roughly one second to react). 16 This paper establishes why you need a fallback — the fault-free hazard is irreducible — and the companion establishes what makes a fallback adequate and reachable: the minimal risk condition, and UL 4600's open-world safety-case discipline. The fault-free hazard is the threat; the minimal risk condition is the runway. SOTIF and UL 4600 are companions in the assurance literature for exactly this reason, and the LAAS already pairs them. 26

And this article is the automotive instantiation of its parent, Governed Like Aviation, Audited Like Banking. The parent defines integrity versus accuracy and Hazardously Misleading Information in the aviation register — RNP, RAIM, DO-178C — and shows that banking reached the same discipline through model-risk management and backtesting. 25 23 This paper shows that automotive safety engineering reached the same distinction independently and named the fault-free hazard SOTIF. HMI in aviation, the SOTIF performance-limitation hazard, and a hallucinating model are the same object in three industries. The parent's three-part guarantee — assure the deterministic, bound the probabilistic, abstain on the rest — is the architecture Section 7 binds to the automotive failure. Three heavily regulated and consequential fields converged on the same answer, which is strong evidence that the answer is forced by the structure of the problem — confident, undetected error reaching a consequential decision — rather than by any one field's conventions.

Section 09

Posture: Budget the Failure You Cannot Eliminate

The before-acting engineering posture reduces to three commitments, each with a standard behind it and an honest limit in front of it.

First, budget the irreducible residual; do not pretend it away. Some hazard is a performance limitation of the function, not a fault, and no test suite, no accuracy improvement, and no bug fix can drive it to zero — because there is no bug to fix. Treat the fault-free hazard the way ISO 21448 and the selective-prediction literature do: assign it a consequence-scaled error budget, validated as the upper bound of a confidence interval at the relevant tier, and re-measured on every model, prompt, tool, or policy change. 1 20 A residual you have named and bounded is auditable; a residual you have denied is the Tempe crash.
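A short worked calculation shows what "validated as the upper bound of a confidence interval" costs in evidence. It is a sketch under assumptions the paper does not state: backtest trials are treated as independent and representative, the bound is the one-sided exact binomial bound for the zero-escape case, and the symbols and the numeric tolerance are illustrative only.

```latex
% n independent backtest trials, zero escapes observed,
% one-sided confidence level 1 - \alpha, tolerance \tau at the tier.
\begin{align}
  (1 - p_U)^n = \alpha
    \quad &\Longrightarrow \quad
    p_U = 1 - \alpha^{1/n} \approx \frac{-\ln \alpha}{n}, \\
  p_U \le \tau
    \quad &\Longleftrightarrow \quad
    n \ge \frac{\ln \alpha}{\ln(1 - \tau)} \approx \frac{-\ln \alpha}{\tau}.
\end{align}
% Illustrative numbers (assumed, not from the paper):
% \alpha = 0.05, \tau = 10^{-3} gives
% n \ge \ln(0.05) / \ln(0.999) \approx 2994.2, i.e. n = 2995 clean trials.
```

At 95% confidence this reduces to the familiar "rule of three": zero escapes in n trials supports a bound of roughly 3/n. Because the budget must be re-measured on every model, prompt, tool, or policy change, that evidence cost recurs with each change. The budget is therefore an ongoing operating cost, not a one-time certification.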

Second, govern integrity, not accuracy — verify the checkable, abstain on the rest. Drive the escape rate to zero on the deterministically checkable class through a sound, default-deny gate, and bound it on the open-world class through calibrated abstention. 19 21 The single most important property is not being right more often; it is never being silently wrong — flagging "I do not know what this is" and acting conservatively, which is precisely the integrity mechanism the Tempe system lacked.
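Calibrated abstention on the open-world class can be sketched with split conformal prediction, one of the techniques in the cited literature. The example is a simplification under assumptions: calibration scores are taken to be exchangeable with deployment data, the nonconformity score is one minus the model's probability for the true label, and the function names and class labels are hypothetical.

```python
import math


def conformal_threshold(cal_scores: list, alpha: float) -> float:
    """Split-conformal quantile of calibration nonconformity scores.

    Each score is 1 - (model probability assigned to the true label).
    """
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))  # 1-indexed order statistic
    if rank > n:
        return float("inf")  # too few calibration points: no label can be excluded
    return sorted(cal_scores)[rank - 1]


def prediction_set(probs: dict, threshold: float) -> set:
    """All labels whose nonconformity score is within the calibrated threshold."""
    return {label for label, p in probs.items() if 1.0 - p <= threshold}


def classify_or_abstain(probs: dict, threshold: float):
    """Return a label only when exactly one survives; otherwise abstain (None)."""
    labels = prediction_set(probs, threshold)
    if len(labels) == 1:
        return next(iter(labels))
    return None  # empty or ambiguous set: "I do not know what this is"
```

A caveat on the guarantee: split conformal prediction bounds how often the prediction set misses the true label, marginally over exchangeable data. It does not by itself bound the error rate among answered cases, which is the quantity the selective-classification and conformal-abstention papers target with additional machinery. The sketch also says nothing about inputs far outside the calibration distribution, where exchangeability fails.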

Third, detect and signal before commit — through an independent, out-of-process, signed gate. The verification that decides whether to commit a consequential action must be independent of the actor (no self-grading), qualified, and enforced out-of-process with a signed policy bundle and an append-only trace, so the failing component cannot suppress its own warning the way action suppression did. 26 Detect the fault-free hazard, signal it, and hold the commit until the signal is resolved or a qualified party — a deterministic check, or a human at the irreversible edge — clears it.

The honest limits deserve the same prominence as the commitments. Routing soundness is load-bearing: if a probabilistic claim is misclassified as a deterministic one and sent to the zero-error path, the guarantee is violated — and the architectural response is default-deny on routing ambiguity, treating the claim conservatively when the routing decision is uncertain. 25 The deterministic guarantee is correctness relative to the verifier's specification, not relative to ground truth: an incomplete or incorrect oracle relocates the error to the requirements layer, where it is at least named and auditable rather than diffused into opaque behaviour. And the residual migrates upstream, to the quality of the scenario analysis that decides what counts as the worst case and which insufficiencies belong in the budget — exactly the place the NTSB located the Tempe failure, in the design and the safety culture rather than in a broken part. 16 6 None of these limits is a reason to abandon the discipline; they are its own statement of its scope, which is what makes it trustworthy.
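Default-deny on routing ambiguity can be sketched as a gate in which only an explicit allow-list sends a claim to the deterministic path, and every unclear case resolves to a hold. All names here are hypothetical, including the claim kinds, the registry, and the threshold parameter. The sketch shows the routing shape only and is not the architecture of the parent paper or the LAAS.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional


class Verdict(Enum):
    COMMIT = "commit"
    HOLD = "hold"  # withhold the action and signal for qualified review


@dataclass(frozen=True)
class Claim:
    kind: str                    # hypothetical claim-type label
    payload: dict
    confidence: Optional[float]  # calibrated score; None if unavailable


# Hypothetical registry: only kinds listed here reach the deterministic path.
DETERMINISTIC_CHECKS: Dict[str, Callable[[dict], bool]] = {
    "speed_within_limit": lambda p: p["speed_kph"] <= p["limit_kph"],
}


def gate(claim: Claim, abstain_below: float) -> Verdict:
    """Default-deny: COMMIT only on an explicit pass; everything else holds."""
    check = DETERMINISTIC_CHECKS.get(claim.kind)
    if check is not None:
        try:
            return Verdict.COMMIT if check(claim.payload) else Verdict.HOLD
        except (KeyError, TypeError):
            return Verdict.HOLD  # a malformed payload is ambiguity, so hold
    # Unregistered kinds never reach the zero-error path; they are treated as
    # probabilistic and must clear the abstention threshold to proceed.
    if claim.confidence is None or claim.confidence < abstain_below:
        return Verdict.HOLD
    return Verdict.COMMIT
```

The choice that matters is the direction of failure. A claim kind missing from the registry, a payload the check cannot read, or a missing confidence score all resolve to `HOLD`. No branch upgrades an uncertain claim into the deterministic guarantee.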

The closing argument is the one automotive engineering wrote into a standard after a death it could not attribute to any fault. Danger does not mean a bug. The most dangerous system is not the one that crashes; it is the one that runs perfectly, passes every test, and is confidently, silently wrong at the edge of its competence — and the only sound response is to budget that failure, detect it with an independent check, and signal it before you act on it. Correct hardware can still give the wrong answer. The discipline is to make sure it never gives the wrong answer without saying so.

The brief companion to this paper — Correct Hardware, Wrong Answer — introduces the fault-free hazard in a shorter form. This is the second of three papers on the autonomous-driving stack: Autonomy Is an Envelope develops the Operational Design Domain that bounds where the integrity guarantee holds; and The Fallback Is the Feature develops the minimal risk condition and UL 4600 — what makes the fallback the Tempe design lacked adequate and reachable. The parent cross-discipline paper, Governed Like Aviation, Audited Like Banking, defines the integrity-versus-accuracy axis this paper extends to SOTIF.

End of paper


References
  1. ISO 21448:2022 — Road vehicles: Safety of the intended functionality. International Organization for Standardization (2022). The SOTIF standard: defines safety of the intended functionality, functional insufficiency (specification and performance), the triggering condition, and the performance-limitation hazard. iso.org/standard/77490.html.
  2. ISO 21448 — ISO catalogue record. International Organization for Standardization (2022). Authoritative publication metadata for ISO 21448:2022, superseding ISO/PAS 21448:2019. iso.org/standard/77490.html.
  3. ISO 26262 vs ISO 21448 (SOTIF) for Autonomous Driving. PatSnap (n.d.). The fault versus fault-free-hazard contrast: ISO 26262 governs malfunctions and faults, ISO 21448 governs hazards from functional insufficiencies with no component fault. patsnap.com/resources/blog/articles/iso-26262-vs-iso-21448-sotif-for-autonomous-driving/.
  4. What Is SOTIF? ISO 21448 Guide for ADAS. Jama Software (n.d.). Definitions of functional insufficiency (specification versus performance) and triggering condition in plain text. jamasoftware.com/requirements-management-guide/automotive-engineering/sotif/.
  5. SOTIF Scenario Annotation: Triggering Conditions and Functional Deficiencies. Labelvisor (n.d.). Triggering-condition definition and the scenario-coverage problem under ISO 21448. labelvisor.com/sotif-scenario-annotation-triggering-conditions-and-functional-deficiencies-tagging-for-iso-21448/.
  6. Redefining Safety for Autonomous Vehicles. Koopman, P., et al. arXiv:2404.16768 (2024). Academic framing of SOTIF, functional insufficiencies, and the open-world safety problem; the known/unknown by safe/unsafe scenario quadrants and the limits of fault-based safety. arxiv.org/pdf/2404.16768.
  7. ISO 26262:2018 — Road vehicles: Functional safety (Parts 1–12). International Organization for Standardization (2018). The functional-safety (fault) standard; 2nd edition, December 2018, twelve parts; governs malfunctioning behaviour from random hardware and systematic faults. iso.org/standard/68383.html.
  8. ISO 26262. Wikipedia contributors (accessed 2026). ISO 26262 structure, the 2018 second edition, and the ASIL concept and its determination from Severity, Exposure, and Controllability. en.wikipedia.org/wiki/ISO_26262.
  9. What is ISO 26262? — Automotive Functional Safety guide. autosar.io (n.d.). ASIL A–D examples, fault classification, and the boundary between ISO 26262 faults and SOTIF functional insufficiencies. autosar.io/en/insights/iso26262-guide.
  10. Automotive Safety Integrity Level. Wikipedia contributors (accessed 2026). ASIL A–D plus QM; the Severity (S0–S3), Exposure (E0–E4), Controllability (C0–C3) determination. en.wikipedia.org/wiki/Automotive_Safety_Integrity_Level.
  11. Functional Safety According to ISO 26262 ASIL-A/B/C/D. Axivion / Qt (whitepaper, n.d.). ASIL hardware architectural metrics — single-point fault metric (SPFM) and latent-fault metric (LFM), B ≥ 60%, C ≥ 80%, D ≥ 90% — and the PMHF framing for random hardware failures. qt.io/hubfs/Axivion_documents_EN/Axivion_Functional%20Safety%20Guide.pdf.
  12. What ISO 26262 says about Fault Classification. BYHON (n.d.). Fault classification (single-point, residual, latent, safe faults), SPFM/LFM, and PMHF — supporting the ASIL-D 10 FIT (≈ 10⁻⁸ failures/hour) target. byhon.it/what-iso-26262-says-about-fault-classification/.
  13. Collision Between Vehicle Controlled by Developmental Automated Driving System and Pedestrian, Tempe, Arizona, March 18, 2018. National Transportation Safety Board, Highway Accident Report NTSB/HAR-19/03 (PB2019-101402), adopted November 19, 2019, reissued June 26, 2020. The primary incident source: 5.6 s first detection, classification oscillation discarding location history, 1.3 s emergency determination, 1.0 s action suppression above 0.71 g, disabled factory Volvo AEB, findings, and probable cause. ntsb.gov/investigations/AccidentReports/Reports/HAR1903.pdf.
  14. HWY18MH010 investigation page. National Transportation Safety Board (2019). Case landing page: docket, report links, board-meeting materials, and recommendations for the Tempe collision. ntsb.gov/investigations/Pages/HWY18MH010.aspx.
  15. NTSB Vehicle Automation Report — HWY18MH010. National Transportation Safety Board docket (2019). Factual report on the ADS perception and planning behaviour — classification, tracking, and suppression — the engineering-level docket source behind the HAR. data.ntsb.gov/Docket/Document/docBLOB?ID=40477724&FileExtension=.PDF.
  16. NTSB: Uber Self-Driving Car Had Disabled Emergency Brake System Before Fatal Crash. NPR, The Two-Way (May 24, 2018). Action suppression and the disabled factory emergency braking; control handed to the human operator. npr.org/sections/thetwo-way/2018/05/24/614200117/.
  17. NTSB Investigation Into Deadly Uber Self-Driving Car Crash Reveals Lax Attitude Toward Safety. IEEE Spectrum (2019). Classification oscillation, the 1.3 s emergency determination, the 1-second suppression, and safety-culture findings. spectrum.ieee.org/ntsb-investigation-into-deadly-uber-selfdriving-car-crash-reveals-lax-attitude-toward-safety.
  18. Selective Classification for Deep Neural Networks. Geifman, Y. & El-Yaniv, R. NeurIPS 2017, arXiv:1705.08500. The risk–coverage framework: a selective classifier that bounds selective risk at a user-set level with high probability by abstaining. arxiv.org/abs/1705.08500.
  19. SelectiveNet: A Deep Neural Network with an Integrated Reject Option. Geifman, Y. & El-Yaniv, R. ICML 2019, arXiv:1901.09192. A probabilistically calibrated selective classifier with an integrated reject option — the abstention mechanism made architectural. arxiv.org/abs/1901.09192.
  20. Mitigating LLM Hallucinations via Conformal Abstention. Abbasi-Yadkori, Y., Kuzborskij, I., Stutz, D., György, A., Fisch, A., Doucet, A., et al. NeurIPS 2024, arXiv:2405.01563. Conformal abstention with rigorous guarantees bounding the hallucination rate (error among answered queries) to a pre-specified tolerance. arxiv.org/abs/2405.01563.
  21. Language Models with Conformal Factuality Guarantees. Mohri, C. & Hashimoto, T. ICML 2024, arXiv:2402.10978. Distribution-free, finite-sample, high-probability correctness guarantees for language-model outputs via conformal prediction over entailment sets. arxiv.org/abs/2402.10978.
  22. Algorithmic Learning in a Random World. Vovk, V., Gammerman, A. & Shafer, G. Springer (2005; 2nd ed. 2022). Foundational conformal-prediction text: distribution-free, finite-sample validity. link.springer.com/book/10.1007/978-3-031-06649-8.
  23. RTCA DO-178C — Software Considerations in Airborne Systems and Equipment Certification. RTCA, Inc. (2011). Aviation software design-assurance standard; the design-assurance-level ladder and requirements-based verification underlying the systematic-fault discipline contrasted here. rtca.org.
  24. Vision-Aided Receiver Autonomous Integrity Monitoring (RAIM): GNSS integrity and Hazardously Misleading Information. NCBI PMC4610444 (2015). The HMI definition — position error exceeds the alert limit without a timely warning — and the integrity-versus-accuracy distinction; RAIM as the independent consistency check that abstains. ncbi.nlm.nih.gov/pmc/articles/PMC4610444/.
  25. Governed Like Aviation, Audited Like Banking: The Integrity Discipline AI Is Rediscovering. KellerAI (2026). The parent paper: the definitive integrity-versus-accuracy and Hazardously Misleading Information definitions this article extends to SOTIF; the deterministic/probabilistic bucket decomposition and the three-part guarantee. kellerai.blog/aviation-and-banking-solved-this-in-depth.
  26. The LLM-Agent Assurance Standard: Gate-Derived Tiering and Independent Pre-Commit Verification. KellerAI (2026). The assurance standard whose obligations are bound in Section 7: independent verification (no self-grading), verifier qualification, backtested escape-rate residual at confidence, and enforcement-plane integrity (signed, out-of-process, append-only). kellerai.blog/llm-agent-assurance-standard-in-depth.