The honeymoon and what it costs
There is a pattern at the start of every team’s AI adoption. The model in the editor calls the code elegant. It calls it defensive. It calls it production-grade. The engineer feels twice as productive — and in a real sense they are producing twice the code — but the review function has gone quiet. Six weeks later the on-call engineer finds out which of those adjectives were doing the work.
The companion critique paper in this same release walks the artifacts: a polling-loop retry where the SDK offered a native interrupt; a cache key that forgot the model version and silently served stale verdicts after a snapshot bump; a row-lock mutex that holds for the UPDATE and not for the LLM call it was supposed to guard. None of it is bad engineering by the conventional definition. All of it ships. The bill always comes.
This pattern is not new. Commercial aviation lived through an isomorphic problem in the 1970s — competent pilots, capable aircraft, a confident cockpit culture, and a fatality rate that the industry could no longer accept. What aviation built in response is the thing AI engineering can now adopt wholesale, instead of relearning the lessons one production incident at a time.
What aviation actually built
What took commercial flight from one crash per 200,000 departures in the 1950s to one per ten to twenty million today was not better pilots. It was a stack of verification structures around the pilots.
After United Airlines 173 ran out of fuel over Portland in 1978 (a fully functioning aircraft, a captain fixated on a landing-gear indicator, a crew that did not effectively challenge his attention), NASA convened the workshop that produced Crew Resource Management. The cockpit stopped being a hierarchy of command and became a team of redundant observers, each obligated to call out an anomaly regardless of rank.
James Reason’s 1990 Human Error introduced the Swiss Cheese Model: no single safety layer is perfect; accidents happen when the holes in successive layers align. The insight was structural: however powerful a new layer is, it still has holes, so adding it does not remove the need for any of the others.
NASA’s Aviation Safety Reporting System, established in 1976 after the TWA Dulles crash, made confidentiality and immunity engineering requirements rather than cultural niceties. Over a million reports later, the industry knows things it would never have learned from accident investigation alone.
Atul Gawande tells the origin of the checklist in The Checklist Manifesto: Boeing’s Model 299, the prototype of the B-17, crashed on its 1935 demonstration flight because the aircraft was too complex for memory alone; a pilot forgot to release the gust lock. The response, in Gawande’s account the work of a group of test pilots, was not to simplify the aircraft but to invent the pre-flight checklist. The aircraft went on to fly 1.8 million miles without an accident. A WHO surgical adaptation of the same idea cut the death rate after surgery from 1.5% to 0.8% in its eight-hospital study.
The mandatory go-around is the doctrine that says: when the approach is unstabilised, you abort the landing and climb away. KellerAI’s internal voice puts it plainly:
“The approach phase is where most accidents happen. The ground gets close, fatigue is high, and the temptation to force the landing kills more pilots than weather ever did. Go around if you need to. The runway will be there.”
What “Trust but Verify” means here
AI is a tool, an amplifier, and — with diligence — a semi-trusted partner. Until it is not. Always the human in the loop.
The right operating model is borrowed directly from CRM: the engineer is Pilot-in-Command. The LLM is the co-pilot: capable, observant, often right, structurally subordinate. The co-pilot has the obligation to call out anomalies; the engineer has the obligation to respond to them; and the PIC signs for the aircraft. Accountability for what reaches production sits with the human, not the model.
The same internal voice articulates the discipline this requires:
“The Pilot trusts instruments over instincts. In darkness, when the inner ear says ‘level’ and the altimeter says ‘descending,’ the Pilot believes the altimeter. Tests over intuition. Evidence over confidence.”
A model that praises code as enterprise-grade is the inner ear. The failing test, the missing field in the cache key, the unhandled SIGKILL — those are the altimeter.
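To make the altimeter concrete, here is a minimal sketch of a cache key that includes the model version, paired with the test that would fail if that field were ever dropped. The function name verdict_cache_key, the field names, and the snapshot labels are illustrative assumptions, not the code from the companion paper.

```python
import hashlib
import json


def verdict_cache_key(prompt: str, model_version: str) -> str:
    """Build a cache key that changes whenever the model snapshot changes."""
    # Serialize both fields together so neither can be silently left out.
    payload = json.dumps(
        {"prompt": prompt, "model_version": model_version},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def test_key_changes_with_model_version() -> None:
    """The instrument: fails if the model version ever leaves the key."""
    # Hypothetical snapshot labels, used only for illustration.
    old = verdict_cache_key("is this diff safe?", "snapshot-a")
    new = verdict_cache_key("is this diff safe?", "snapshot-b")
    assert old != new


test_key_changes_with_model_version()
```

The test is deliberately small. Its value is that it keeps reporting the same thing no matter how confident the review comments sound.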
The five moves that translate aviation to AI engineering
Five practices, drawn from KellerAI’s own operating rules, that move the aviation discipline into the daily loop.
One. Verify before you act.
Never invent CLI flags, schema fields, or API signatures from memory. Read the help, fetch the docs, or run the validator first. “I remember how this works from training” is never an acceptable reason to skip verification — for the engineer or for the LLM working alongside them.
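One way to put this rule into the loop is to check intended flags against the tool's own help output before running anything. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the example help text belongs to an imaginary tool, and it presumes the real tool responds to --help, which not every CLI does.

```python
import re
import subprocess


def read_help(command: list[str]) -> str:
    """Ask the tool itself for its help text instead of trusting memory."""
    done = subprocess.run(
        command + ["--help"], capture_output=True, text=True, check=False
    )
    # Some tools print help to stderr, so keep both streams.
    return done.stdout + done.stderr


def documented_flags(help_text: str) -> set[str]:
    """Collect every long option (such as --dry-run) the help text mentions."""
    return set(re.findall(r"(?<![\w-])--[A-Za-z0-9][\w-]*", help_text))


def unverified_flags(help_text: str, wanted: list[str]) -> list[str]:
    """Return the flags we intend to pass that the help text never mentions."""
    known = documented_flags(help_text)
    return [flag for flag in wanted if flag not in known]


# Hypothetical help output for an imaginary tool, used only for illustration.
EXAMPLE_HELP = """usage: deploy [--dry-run] [--region REGION]
  --dry-run        show what would change without applying it
  --region REGION  target region
"""

missing = unverified_flags(EXAMPLE_HELP, ["--dry-run", "--force"])
print(missing)  # ['--force']
```

In real use the help text would come from read_help(["some-tool"]) rather than a literal string. A non-empty result means stop and read the documentation, not guess.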
Two. Found it, own it.
If the tooling reports an error in a file you just touched, fix it before commit — regardless of who introduced it or whether the violation is on your lines. “Pre-existing” is not an escape hatch. The co-pilot who notices a failing instrument does not note it in a log and move on.
Three. Delegate, then check in.
Main attention is finite and user-facing. Specialist work is dispatched to specialists; the orchestrator’s job is the orchestration, not the inline narration. This is the sterile-cockpit principle in operating form — protect the critical phase from cognitive load that does not belong to it.
Four. Zero dirty state.
Before any task is declared done: status is clean, tests pass, lint is clean, changes are committed and pushed. A professional never leaves errors, dirty code, or failing tests for someone else. This is the aviation sign-off applied to the engineering loop — the aircraft does not leave the gate with the log dirty.
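The sign-off can be written as an executable checklist. The following is a sketch under stated assumptions: it presumes a git repository with an upstream branch configured, and it uses pytest and ruff as stand-ins for whatever test and lint commands the project really runs. The function names are hypothetical.

```python
import subprocess
from typing import Callable

# A runner takes a command and returns (exit code, stdout).
Runner = Callable[[list[str]], tuple[int, str]]


def run(command: list[str]) -> tuple[int, str]:
    """Run a command; report a missing executable as a failure, not a crash."""
    try:
        done = subprocess.run(command, capture_output=True, text=True, check=False)
    except FileNotFoundError:
        return 127, ""
    return done.returncode, done.stdout


def dirty_state(runner: Runner = run) -> list[str]:
    """Return the reasons the task cannot be declared done (empty means clean)."""
    problems: list[str] = []

    # Any output from --porcelain means uncommitted or untracked changes.
    code, out = runner(["git", "status", "--porcelain"])
    if code != 0 or out.strip():
        problems.append("working tree is not clean")

    # Assumed project commands; substitute the repository's real tools.
    for label, command in (
        ("tests fail", ["pytest", "-q"]),
        ("lint fails", ["ruff", "check", "."]),
    ):
        code, _ = runner(command)
        if code != 0:
            problems.append(label)

    # Count commits that exist locally but not on the upstream branch.
    code, out = runner(["git", "rev-list", "--count", "@{u}..HEAD"])
    if code != 0 or out.strip() != "0":
        problems.append("commits are not pushed")

    return problems
```

Calling dirty_state() with no arguments runs the real commands. The runner parameter exists so the checklist itself can be tested without touching a repository.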
Five. Cite every claim.
Every factual claim in a generated document must be cited. This rule binds the LLM as tightly as the engineer. A model that calls a pattern “best practice” without a source is making an unverified claim; the engineer’s job is to ask which file, which line, which doc — and to refuse the claim until the source is on the table.
The discipline this buys you
Adopt this from day one and you skip the moment of realizing, too late, that the warning signs had been piling up all along. You do not need to live through the eighteen-day silent failure, the cache that served verdicts from a model that no longer exists, the rollback path that nobody tested. You go straight to using AI as a skilled practitioner uses any powerful instrument: with the discipline that makes the instrument useful and the verification that keeps it honest.
Near-misses are gifts. The runway will be there. The instruments do not lie. Always the human in the loop.
For the file-and-line walkthrough of the aviation source material (CRM, Swiss Cheese, ASRS, sterile cockpit, two-person integrity, the checklist, the mandatory go-around, and the KellerAI eighteen-day incident as a Swiss Cheese postmortem) and the full set of engineering practices that translate from each, read the companion technical whitepaper, Trust but Verify: Aviation Safety Principles Applied to AI-Assisted Engineering. For the cautionary tale this methodology is built to prevent (the praise that should have been a warning, the deferrals that turn into the design, the AI-shaped code missing its AI-shaped defenses), read the companion critique paper, The Bill Always Comes: Why “Enterprise-Grade” AI Code Often Isn’t.