Citations or Guesses: The Five-Pass Rule and the Standard Behind It

KellerAI

Audience: engineering leaders deciding what gates a merge, and reviewers who have already noticed that “the LLM said it looked good” is not, in fact, a review
Scope: why a single charitable LLM pass misses the smoking gun, what KellerAI's adversarial review and spec workflows do instead, and the verbatim standard the firm enforces at the boundary
Companion: the in-depth technical paper walks through the five 5pass passes and the fourteen-phase spec workflow with file-and-line citations

Section 01

One charitable read

Three reasoning agents read the same pull request. Each one steelmanned the code in front of it (the state machine, the Pydantic schema, the try/finally, the database lock), and each one came back with the same verdict the LLM in the editor had already given: defensive, production-grade, well-structured. None of them mentioned the polling loop where a native SDK primitive should have been. None of them mentioned the cache key missing model_version. None of them mentioned the conditional UPDATE mutex that holds the row lock for the UPDATE itself and not for the seconds-to-minutes of LLM inference that follow.

That is what a charitable LLM review looks like in production. It is not malice and it is not laziness. It is geometry. A single read of a target document by a single model is a stylistic match against the corpus the model was trained on — a corpus that does not contain this codebase, this schema, this SDK version, this deferral. The model finds the patterns it has seen. The patterns it has not seen are invisible to it, in exactly the way a fish does not notice water.

The Series 1 critique on this site is the bill that comes when that geometry is the only review structure in place. This paper is the methodology answer.
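The cache-key omission from the opening scenario is concrete enough to show directly. The following is a minimal Python sketch, assuming a hypothetical response cache keyed by a SHA-256 digest; the function names and version strings are illustrative and are not taken from the reviewed codebase. It shows why a key built from the prompt alone keeps serving a stale answer after a model upgrade.

```python
import hashlib
import json


def cache_key_without_version(prompt: str) -> str:
    # The flawed shape: only the prompt is hashed, so every model version
    # that sees this prompt maps to the same cache entry.
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def cache_key(prompt: str, model_version: str) -> str:
    # model_version is part of the hashed payload, so a model upgrade
    # produces a new key instead of a stale hit.
    payload = json.dumps(
        {"model_version": model_version, "prompt": prompt}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    prompt = "Summarize the incident report."
    # True: there is no version in the key to differ on.
    print(cache_key_without_version(prompt) == cache_key_without_version(prompt))
    # False: the two versions hash to different keys.
    print(cache_key(prompt, "model-v1") == cache_key(prompt, "model-v2"))
```

Hashing a canonical JSON payload rather than concatenating strings is a choice made for this sketch; it avoids ambiguous joins such as a prompt that happens to end with version-like text.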

Section 02

Why 5pass exists

The /5pass skill (~/.claude/skills/5pass/SKILL.md) was written for the exact failure mode above: a charitable read by one lens misses the issues a different lens would have caught instantly. The skill fires five passes against a target document in a fixed order, where each pass operates under a lens orthogonal to the others, and where Pass 3 reads the prior passes to surface contradictions between them.

The five lenses, verbatim from the skill:

Red-Team: assume the target is wrong somewhere; find blunders, attack vectors, plan-killers; run live-state Bash checks against the real repo wherever the target makes claims about it.

Errors of Omission: hunt absence, not presence; what is not in the target that should be? what preconditions are assumed but never verified? what rollback paths are missing?

Sloppy Thinking and Logical Flaws: internal contradictions, fabricated convergence claims (“all reviewers agreed” when they did not), undefended assumptions, recommendation-label hygiene; this pass also reads the other four and flags contradictions between them.

Misconceptions About Ground Truth: for every factual claim in the target, run the minimal Bash command that confirms or refutes it; record claim, command, output, verdict.

Synthesis and Revised Target: produce what survives, what must change with replacement prose, what is unresolved, and a confidence verdict of SAFE, NEEDS-MORE-WORK, or FUNDAMENTALLY-BROKEN.
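The Misconceptions About Ground Truth pass records four fields per claim. A minimal Python sketch of that record follows, assuming the check is a single command whose exit status decides the verdict; the GroundTruthRecord type, the CONFIRMED/REFUTED/UNVERIFIED vocabulary, and the sample target file are hypothetical, not the skill's actual format.

```python
import os
import subprocess
import sys
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundTruthRecord:
    claim: str
    command: tuple
    output: str
    verdict: str  # assumed vocabulary: CONFIRMED, REFUTED, UNVERIFIED


def check_claim(claim: str, command: list, expect_exit: int = 0) -> GroundTruthRecord:
    """Run the minimal command for a claim and record claim, command, output, verdict."""
    try:
        result = subprocess.run(command, capture_output=True, text=True)
    except FileNotFoundError as exc:
        # A command that cannot run proves nothing either way.
        return GroundTruthRecord(claim, tuple(command), str(exc), "UNVERIFIED")
    verdict = "CONFIRMED" if result.returncode == expect_exit else "REFUTED"
    output = (result.stdout + result.stderr).strip()
    return GroundTruthRecord(claim, tuple(command), output, verdict)


if __name__ == "__main__":
    # Hypothetical target file standing in for a real cache module.
    fd, path = tempfile.mkstemp(suffix=".py")
    with os.fdopen(fd, "w") as f:
        f.write("def cache_key(prompt):\n    return sha256(prompt)\n")
    # grep exits 0 on a match and 1 on no match, so this claim is refuted.
    print(check_claim("cache key includes model_version",
                      ["grep", "-n", "model_version", path]))
    # sys.executable keeps this second check portable across machines.
    print(check_claim("review runner has a Python interpreter",
                      [sys.executable, "--version"]))
    os.remove(path)
```

Treating a command that cannot run as UNVERIFIED, rather than REFUTED, keeps a missing tool from masquerading as evidence.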

Each pass writes a durable file to disk before any orchestration message goes out. Each finding cites a file and a line, or a Bash command and its output. A pass that returns “looks good to me” is not a 5pass output; it is a failed pass.
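The acceptance rule in that paragraph can be sketched as a gate on each pass's findings. This is a minimal Python sketch, assuming findings are dictionaries carrying either a citation of the form path:line or a command plus its captured output, and assuming an empty findings list is how a “looks good to me” pass shows up; all names here are hypothetical.

```python
import json
import re
from pathlib import Path

# Assumed citation form: "path/to/file.ext:LINE"
FILE_LINE = re.compile(r"^[\w./-]+:\d+$")


def is_cited(finding: dict) -> bool:
    """A finding counts as cited if it names a file and line, or a command and its output."""
    if FILE_LINE.match(finding.get("citation") or ""):
        return True
    # An empty output is still evidence (e.g. grep finding nothing).
    return bool(finding.get("command")) and "output" in finding


def accept_pass(findings: list) -> bool:
    # An empty pass is treated as "looks good to me" and fails.
    return bool(findings) and all(is_cited(f) for f in findings)


def finish_pass(out_file: Path, findings: list) -> bool:
    # Write the durable artifact first; only then report acceptance.
    out_file.write_text(json.dumps(findings, indent=2), encoding="utf-8")
    return accept_pass(findings)
```

Because finish_pass writes before it judges, even a rejected pass leaves a file on disk for the orchestrator to inspect.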

Section 03

From 5pass to spec-init — fourteen phases, four model tracks

/spec-init (~/.claude/plugins/cache/kellerai-dev-marketplace/kellerai-feature-spec/0.18.0/skills/spec-init/SKILL.md) extends the same discipline to the upstream side of the work: the specification, before the code is written. The default phase manifest defines fourteen phases, grouped into seven stages, with cross-model selected as the default mode.

The fourteen phases, verbatim from the manifest:

1.1 Codebase Reconnaissance · 1.2 Deepen with Ambition · 1.3 Inversion Analysis
2.1 OODA Loop · 2.2 Red Team · 2.3 Steelman
3.1 Error Scrub (five iterations)
4.1 Background Sections · 4.2 TDD Anchoring
5.1 Second Error Scrub · 5.2 De-slopify
6.1 Validate Beads Format · 6.2 Import and Wire Dependencies
7.1 OODA Closure Loop

The three Stage-2 phases (OODA Loop, Red Team, and Steelman) run as type: cross-model, and each dispatches four parallel tracks (claude, codex, grok, and gemini), with synthesis blocked until all tracks complete. That is twelve adversarial reads per spec across three lenses and four models, before a single line of code is written.
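The fan-out and the synthesis barrier can be sketched with asyncio. This is a minimal sketch, assuming a hypothetical run_track coroutine in place of real model calls and assuming the three Stage-2 phases run one after another in manifest order; only the four track names and the three phase names come from the workflow itself.

```python
import asyncio

TRACKS = ("claude", "codex", "grok", "gemini")
STAGE_TWO = ("2.1 OODA Loop", "2.2 Red Team", "2.3 Steelman")


async def run_track(phase: str, model: str, spec: str) -> dict:
    # Hypothetical placeholder for one model's adversarial read of the spec.
    await asyncio.sleep(0)
    return {"phase": phase, "model": model, "findings": []}


def synthesize(phase: str, results: list) -> dict:
    # Refuse to merge unless every track produced a result.
    failed = [r for r in results if isinstance(r, BaseException)]
    if failed:
        raise RuntimeError(f"{phase}: {len(failed)} track(s) failed; synthesis blocked")
    return {
        "phase": phase,
        "tracks": [r["model"] for r in results],
        "findings": [f for r in results for f in r["findings"]],
    }


async def run_cross_model_phase(phase: str, spec: str) -> dict:
    # gather() waits for all four tracks; with return_exceptions=True a failed
    # track comes back as an exception object instead of ending the wait early.
    results = await asyncio.gather(
        *(run_track(phase, model, spec) for model in TRACKS),
        return_exceptions=True,
    )
    return synthesize(phase, list(results))


async def run_stage_two(spec: str) -> list:
    # Phases in manifest order; tracks within each phase run concurrently.
    return [await run_cross_model_phase(phase, spec) for phase in STAGE_TWO]


if __name__ == "__main__":
    phases = asyncio.run(run_stage_two("draft spec"))
    print(sum(len(p["tracks"]) for p in phases))  # 12
```

Without return_exceptions=True, gather raises as soon as the first track fails, before the others have finished; with it, every track completes and synthesize then refuses to merge an incomplete set rather than quietly synthesizing three of four reads.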

The point of running the same target through more than one model is not that any model is “better.” It is that each model fails differently. A spec that survives OODA, Red Team, and Steelman across four model families with no surfaced contradiction is a spec where the most obvious failure modes of any one model have been canceled out by the others. A spec where one track flags an issue the others missed is exactly the case the workflow is built to catch.
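Surfacing a finding that only some tracks raised is a set operation. A minimal Python sketch, assuming each track's findings have already been normalized to comparable strings (real findings would need deduplication first); the sample findings are hypothetical.

```python
def divergent_findings(by_track: dict) -> dict:
    """Map each finding to the tracks that raised it, keeping only findings some track missed."""
    all_tracks = set(by_track)
    raised_by = {}
    for track, findings in by_track.items():
        for finding in findings:
            raised_by.setdefault(finding, set()).add(track)
    return {f: tracks for f, tracks in raised_by.items() if tracks != all_tracks}


if __name__ == "__main__":
    tracks = {
        "claude": {"cost meter reads zero"},
        "codex": {"cost meter reads zero"},
        "grok": {"cost meter reads zero", "lock released before inference"},
        "gemini": {"cost meter reads zero"},
    }
    print(divergent_findings(tracks))  # {'lock released before inference': {'grok'}}
```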

Section 04

Citations or Guesses

The standard KellerAI applies at the boundary is one sentence, verbatim from ~/.claude/rules/core/citations.md:

EVERY factual claim in a generated document MUST be cited. No exceptions.

The rule enumerates what requires a citation: any security finding, any compliance assessment, any DORA metric, any architectural claim (“uses STI”, “no audit log”), any reference to a file or function or configuration value, any metric or count or measurement. It then forbids three bare phrasings that read like findings but point at nothing: never write “no audit log exists” (cite the absence by name); never write “plaintext storage” (cite the schema migration with a line number); never write “disabled in config” (quote the commented-out line).
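A crude check for the three bare phrasings the rule forbids might look like the following. This is a minimal Python sketch, assuming a citation takes the form path/file.ext:LINE; it catches only the listed phrases and is not the rule's actual enforcement mechanism.

```python
import re

# Assumed citation form: path/to/file.ext:LINE
CITATION = re.compile(r"[\w./-]+\.\w+:\d+")
# Bare phrasings from the rule; the list is illustrative, not exhaustive.
BARE_CLAIMS = ("no audit log", "plaintext storage", "disabled in config")


def uncited_claims(text: str) -> list:
    """Return sentences that use a forbidden phrasing without a path:line citation."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    flagged = []
    for sentence in sentences:
        lowered = sentence.lower()
        if any(p in lowered for p in BARE_CLAIMS) and not CITATION.search(sentence):
            flagged.append(sentence)
    return flagged


if __name__ == "__main__":
    report = ("No audit log exists. "
              "Passwords use plaintext storage per db/migrate/001_users.rb:14.")
    print(uncited_claims(report))  # ['No audit log exists.']
```

A keyword list only catches phrasings someone thought to list; it is a floor, not a substitute for the review passes.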

Stacked next to this rule is the decision trace. A finding without a citation is unverifiable. A decision without a trace — the prior options considered, the evidence weighed, the reason for the choice — is unrepeatable. Both are the same failure under different names: a claim with no audit path back to the artifact that grounds it.
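A decision trace can be held to the same kind of mechanical check as a citation. A minimal Python sketch, assuming a hypothetical DecisionTrace record with the three parts the paragraph names (options considered, evidence weighed, reason for the choice); requiring at least two options is an assumption of this sketch, on the view that a single option is not a choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionTrace:
    decision: str
    options_considered: tuple
    evidence: tuple  # citations, e.g. "app/worker.py:40" (hypothetical)
    rationale: str

    def problems(self) -> list:
        """Return the reasons this trace would be rejected at the boundary."""
        issues = []
        if len(self.options_considered) < 2:
            issues.append("no alternatives recorded")
        if self.decision not in self.options_considered:
            issues.append("decision is not among the options considered")
        if not self.evidence:
            issues.append("no evidence cited")
        if not self.rationale.strip():
            issues.append("no reason given for the choice")
        return issues


if __name__ == "__main__":
    trace = DecisionTrace(
        decision="native SDK primitive",
        options_considered=("file-poll retry loop", "native SDK primitive"),
        evidence=("app/worker.py:40",),
        rationale="removes the polling interval and its retry logic",
    )
    print(trace.problems())  # []
```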

That is the line the firm draws:

KellerAI only accepts Citations and Decision Traces. Anything else is just a guess.

A non-cited finding is a guess wearing the costume of a review. A plan without a decision trace is a guess wearing the costume of a plan. The standard is to reject both at the boundary, before they become commitments, because a commitment built on a guess is how the bill described in Series 1 always comes due.

Section 05

What this prevents

Walk back to the five “joints” the Series 1 brief named.

The file-poll retry loop where the native SDK primitive should have been: exactly the kind of finding a 5pass Errors of Omission pass surfaces, because the question that pass forces (“what is not in the target that should be?”) becomes, for this design, “why is the SDK's native primitive not here?”

The cache key missing model_version: exactly the kind of finding a 5pass Misconceptions About Ground Truth pass surfaces, because a Bash check against the cache key definition reveals the absence.

The conditional-UPDATE mutex whose lock scope is the UPDATE statement and not the LLM call: exactly the kind of finding a spec-init Red Team phase across four model tracks would flag, because at least one of the four tracks would attack the lock-scope claim.

The deferred-operations stack (“we can loop back on that”): exactly the kind of finding an Inversion Analysis phase surfaces by asking what the design looks like if every deferral is permanent.

The zeroed cost meter: exactly the kind of finding an OODA Closure Loop flags as an unresolved question that should not pass the gate.
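The conditional-UPDATE joint is easy to misread, so here is a minimal sketch using Python's built-in sqlite3 module, assuming a hypothetical jobs table and a hypothetical fake_infer function standing in for the LLM call. SQLite locks the database file rather than a single row, so this illustrates the scope of the lock, not the exact behavior of the reviewed system's database. Inside fake_infer, a second connection opened with timeout=0 writes to the same row; it would raise sqlite3.OperationalError immediately if any lock were still held.

```python
import os
import sqlite3
import tempfile
from typing import Callable, Optional


def claim_job(conn: sqlite3.Connection, job_id: int) -> bool:
    # The conditional UPDATE is the whole "mutex": only one worker can move
    # the row from 'pending' to 'running'.
    with conn:  # the transaction commits when this block exits
        cur = conn.execute(
            "UPDATE jobs SET status = 'running' WHERE id = ? AND status = 'pending'",
            (job_id,),
        )
    # The commit has already happened, so the write lock is already released.
    return cur.rowcount == 1


def run_job(conn: sqlite3.Connection, job_id: int,
            infer: Callable[[int], str]) -> Optional[str]:
    if not claim_job(conn, job_id):
        return None
    result = infer(job_id)  # seconds to minutes in production; no lock held here
    with conn:
        conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
    return result


def demo() -> tuple:
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        conn = sqlite3.connect(path)
        with conn:
            conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT, note TEXT)")
            conn.execute("INSERT INTO jobs (id, status) VALUES (1, 'pending')")

        def fake_infer(job_id: int) -> str:
            # A second writer touches the same row mid-inference and succeeds.
            other = sqlite3.connect(path, timeout=0)
            with other:
                other.execute(
                    "UPDATE jobs SET note = 'written mid-inference' WHERE id = ?",
                    (job_id,),
                )
            other.close()
            return "inference finished"

        result = run_job(conn, 1, fake_infer)
        status, note = conn.execute(
            "SELECT status, note FROM jobs WHERE id = 1"
        ).fetchone()
        conn.close()
        return result, status, note
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The second writer succeeds because nothing is locked during inference; what actually keeps other workers away is the 'running' status value, which is the claim a Red Team track would attack.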

None of these are exotic. All of them are visible to a structured adversarial review with citations and decision traces as its acceptance criterion. None of them are visible to a charitable single-pass read by a single model — which is why the bill always comes when the charitable read is the only review structure on the merge path.

For the file-and-line walkthrough of the /5pass skill, the /spec-init phase manifest, the cross-model track templates, and the decision-trace artifacts that close the loop, read the companion technical whitepaper, Citations or Guesses: The Five-Pass Rule, In Depth. For the Series 1 critique this methodology answers, see The Bill Always Comes. For the Series 2 sibling methodology on aviation-grade verification posture, see Trust but Verify.
