The Application You Can Audit: Hidden Instructions, Honest Submissions, and the Screening Signal

KellerAI

Section 01

The Stance, Stated First

We will state our position before we argue for it, because a posture you have to read to the end to discover is not a posture you can be held to.

KellerAI's standing policy on every application, proposal, and submission we make is this: the materials we send read identically to a human reader and to a machine reader. We embed no instructions addressed to evaluator systems, and there is no text in any file we submit that a person reviewing it would not also see. Nothing in what we send speaks privately to a model. Where a screening process permits it, we make a visible, plain-text request for human review, in language the reviewer and the screener both read. And we keep what we send auditable, so the document on file is provably the document we wrote and the document every party evaluates.

We publish this so that the stance is citable. A proposal of ours, a profile, a cover memo, can point at this paper, and what it points to is a commitment made in public that we can be measured against. That is the entire mechanism. A signal of integrity is worth nothing if it cannot be verified, so we have made ours a matter of record.

This is the technical layer beneath a shorter companion piece. The brief, The Application You Can Audit, states the same posture in plain terms for a general reader. This paper substantiates it: the threat model, the prevalence data that makes the manipulation channel real, the integrity and detection arguments in full, the incentive structure for organizations running screens, and the connection to the design values we already publish in our specifications. We are fluent in defending against the tactic this paper declines, and that fluency is precisely what makes declining it meaningful rather than naive.

A posture you have to read to the end to discover is not a posture you can be held to. So we state ours first, and spend the rest of the paper earning it.

Section 02

The Pipeline That Reads Before a Human Does

A machine reads your application before a person does. That sentence is now true across most of the contexts where applications are evaluated at volume, and it is the precondition for everything that follows.

The adoption data is unambiguous. A 2024 survey of business leaders found that a majority of companies already use AI somewhere in hiring, that more than four in five use AI specifically for resume review, and, most consequentially, that roughly one in five permit the AI to reject a candidate with no human review at all. ¹² That last figure is the one that matters here. It means there exist screening processes in which the only reader of an application is a model, and the model's verdict is final. When the only reader is a machine, a channel opens: text can be addressed to that machine alone, and no human will ever see what was said to it.

This is not a hypothetical funnel. It is the operating reality of high-volume hiring, and the same architecture is spreading to grant review, vendor selection, and academic peer review. Where an automated system ranks, filters, and forwards, the human who finally decides sees only what survived the machine. The screen is the first reader, and increasingly the deciding one.

The same survey that documented this adoption also recorded that most of the leaders deploying these systems believe the systems are biased. ¹² That tension, organizations relying on tools they themselves distrust, is the backdrop against which a manipulation tactic and a defensive posture both make sense. The screen is powerful, consequential, and imperfect, and people respond to that combination in two ways. Some try to game it. We argue, and practice, the other response.

Section 03

The Anti-Pattern, Conceptually

The tactic we are describing is the application-layer instance of a vulnerability class that security research named years ago. We will describe it only at the level of concept, because the value of this paper is in naming and neutralizing the tactic, never in performing it.

Indirect prompt injection was established as an attack class in early 2023, when researchers showed that adversaries could compromise LLM-integrated applications by planting hostile instructions inside the external data the model was asked to process. ¹ The model, unable to separate the instructions it was given from the content it was handed, treats injected text as a command. The OWASP Top 10 for LLM Applications now ranks prompt injection, direct and indirect, as the single highest risk for these systems, and states the root cause plainly: current models cannot reliably distinguish trusted instructions from untrusted content. ² The screening manipulation is that exact move pointed at a hiring funnel. The application material is the data the screener processes. The hidden instruction is the injection. The conceptual statement is enough: text present in a submission that a human reviewer will not see but a parsing model will read, addressed to the evaluator system rather than to the person.

The lineage is older than the language models, though. The crude ancestor is keyword stuffing in applicant tracking systems, documented as far back as the early 2000s and revived in the AI era as white-on-white text: keywords rendered invisible to the eye but legible to the parser. ⁵ A 2023 security commentary traced exactly this continuity, from the early ATS gaming to the new font tricks, and was already skeptical that the edge would last. ⁶ What changed with capable evaluator models is that the hidden text stopped being a list of keywords and became an instruction: not a denser keyword profile, but a message asking the evaluator to rate the candidate higher or to disregard its own criteria. ⁷ ⁸

The same pattern surfaced in a parallel domain, which is instructive because the stakes there are scholarly rather than commercial. Researchers found concealed instructions planted in arXiv manuscripts, aimed at AI systems assisting human peer reviewers, with reported success rates above ninety-eight percent against undefended review pipelines. ¹⁰ A companion line of work studied in-paper injection against AI reviewers directly, finding that hidden prompts frequently induced full evaluation scores, and that detection-based defenses reduce but never eliminate the effect. ¹¹ And in the hiring domain itself, a December 2025 benchmark study of LLM resume screeners measured several attack vectors and reported success rates exceeding eighty percent for some of them against undefended systems, while also measuring that layered defenses cut combined attack success substantially. ⁹ The class works against screeners that do nothing to resist it. That is precisely why a defensive posture is worth publishing, and why an undefended screen is a liability for the organization that runs it.

Section 04

Why It Fails: The Integrity Argument

Whether or not the tactic works is the wrong question to start with. It fails on integrity grounds before detection ever enters the picture. That failure stands regardless of effectiveness.

A document that reads one way to a human and another way to a machine is a falsified record. There is no gentler description that survives scrutiny. The reviewer believes they are evaluating the same artifact the system evaluated; they are not. Two parties are looking at two different documents that happen to share a visible surface. Whatever the hidden text requests, however modest, the act of planting it misrepresents what was submitted. The submission is no longer a single honest artifact. It is a public face and a private message, and the gap between them is the deception.

The industry that screens applications has begun to treat this for what it is. One widely cited framing places hidden prompts and white-text stuffing on a single integrity spectrum that runs through to outright fabrication, and notes that candidates caught at any point on it risk being flagged or blacklisted. ²¹ Employment-law practitioners go further, framing AI-enabled manipulation of application materials as a form of misrepresentation that can rise to fraud and create real liability, in an environment where the broader problem of fabricated and synthetic applicants is growing fast. ²⁰ The legal frame is not decorative. It means the planted instruction is not merely frowned upon; it is the kind of artifact that, surfaced later, supports a finding of intent to deceive.

That durability is the heart of the integrity problem. Whatever screening edge the tactic might buy is transient, but the artifact is permanent. The submission is stored, indexed, and retained, often for years, and it carries inside it the evidence of how it was constructed. A modest request, planted once, becomes durable proof that the applicant chose to say one thing to a person and another to a machine. You cannot make that artifact honest after the fact, and you do not control where it sits.

Section 05

Why It Fails: The Detection Argument

The integrity argument is sufficient on its own. The detection argument makes the trade not merely wrong but plainly bad, because the asymmetry runs entirely against the manipulator.

Detection is already happening at scale, not in a laboratory. One large staffing group reports finding hidden text in roughly one hundred thousand resumes a year, on the order of a tenth of those it scans, and a major hiring platform reported finding hidden white-text messages in one percent of all resumes it processed in the first half of 2025, against a base of three hundred million. ⁷ These are not projected capabilities. They are current operating numbers from the systems that read applications. The mechanics are mundane and robust: applicant tracking parsers strip formatting as a routine step, which surfaces hidden text directly to human reviewers, and discovery typically disqualifies the applicant regardless of their actual qualifications. ⁸

Above that operational baseline sits principled detection research. A 2025 detector for hidden LLM prompts in structured documents works by comparing the text extracted from a file against the text a human would actually see, rendered and read by OCR; divergence between the two is the signal. Evaluated across more than three thousand documents including resumes, it reported a false-positive rate near one tenth of one percent. ¹⁸ The detection principle is exactly the inverse of the tactic: the tactic depends on extracted text and rendered text disagreeing, and the detector looks for precisely that disagreement. On the model side, defenses are improving too. A late-2025 report on prompt-injection defenses for an agentic system described a three-pronged approach, adversarial training, classifier scanning of untrusted content, and continuous red-teaming, and reported a one-percent attack success rate against adaptive attackers for the strongest model tested. ¹⁹

Two honest caveats belong here, and they sharpen rather than soften the argument. First, the class may never be fully mitigable. The UK national cyber authority has warned that prompt injection may never be eliminated the way SQL injection eventually was, and has called for resilience by design rather than a hoped-for total fix. ³ The strongest vendor result still describes significant progress, not a solved problem. ¹⁹ Second, none of this saves the manipulator, because the imperfection of detection runs the wrong way for them. Detection improves monotonically over time, while the artifact sits unchanged in storage.

Detection improves monotonically. The artifact does not change. A tactic that slips past a screener in 2026 is re-scannable evidence in 2028, against the very same stored file, and you do not get to retract it.

That is the asymmetry stated as a trade. A small, temporary advantage in one screening round is purchased with a permanent, re-scannable record of a falsified submission. The screener that missed the planted instruction this year may catch it next year against the identical file, and the better detector arrives whether or not the applicant consents. There is no honest accounting in which that is a good exchange.

Section 06

What Rewarding It Selects For

Turn the problem around to the organization running the screen, because its response to detection is not neutral. It sets an incentive, and incentives compound.

Consider what an organization optimizes for if it advances the applicants who successfully inject instructions into its screener. It is selecting, by construction, for willingness to falsify a record when the perceived odds of detection are low and the only witness is a machine. That is the precise condition under which the manipulation is attempted: a moment when the applicant believes no human is watching. An intake process that rewards success under those conditions is a filter tuned to find people who will deceive when they think they can get away with it, and it will reliably find them. Worse, the trait does not stay at the door. The applicant who games the machine to get in carries that same disposition into the role, into the systems they will be trusted with, into the next moment when they believe no one is watching.

This is the inversion of the screen's purpose. A screen exists to find parties an organization can trust with things that matter. A screen that rewards hidden manipulation finds the opposite of that and labels it a top candidate, and it trains its own applicant pool, over time, that the process pays for deception. The signal it broadcasts is that the way in is to lie to the machine convincingly.

The corrective is a single policy choice: flag, do not reward. When a screen detects an injected instruction, the correct reading is that it has surfaced a screening signal worth recording, not a flash of cleverness worth advancing. The applicant who hid a message to the model has disclosed something genuinely useful about their behavior under exactly the conditions an organization most needs to understand. Treated as integrity data, surfaced to a human, and weighed like any other integrity signal, that detection makes the screen more discerning. Treated as ingenuity, it makes the screen actively counterproductive.

Section 07

The Defender's Brief

For an organization that wants to run a screen worth trusting, the governance is concrete, and most of it is already implied by law and by the detection research. What follows is the defensive program, not a payload.

Scan submissions for hidden-content divergence. The detection principle is established and inexpensive in concept: compare the text a parser extracts from a file against the text a human actually sees rendered, and treat disagreement as the signal. ¹⁸ This is the same extracted-versus-rendered comparison that the principled detection research formalizes, and it is robust precisely because the tactic depends on that divergence existing.
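The extracted-versus-rendered comparison can be sketched in a few lines. This is a minimal illustration, not the cited detector: it assumes two strings are already available upstream (the text a parser extracted from the file, and the text recovered by OCR from a rendering of the same file), and the function names and the 0.05 threshold are hypothetical choices made for this example.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; punctuation and layout are ignored."""
    return re.findall(r"[a-z0-9]+", text.lower())


def hidden_token_ratio(extracted_text: str, rendered_text: str) -> float:
    """Fraction of extracted tokens with no counterpart in the rendered text.

    Assumption: both strings come from the same file, one via a parser
    and one via OCR of the rendered pages.
    """
    extracted = Counter(tokenize(extracted_text))
    rendered = Counter(tokenize(rendered_text))
    total = sum(extracted.values())
    if total == 0:
        return 0.0
    # Counter subtraction keeps only tokens the parser saw more often
    # than a human reader would.
    unseen = extracted - rendered
    return sum(unseen.values()) / total


def diverges(extracted_text: str, rendered_text: str, threshold: float = 0.05) -> bool:
    """Flag a document when the unseen share exceeds a tolerance.

    The threshold is a hypothetical allowance for OCR noise and would
    need tuning against real documents.
    """
    return hidden_token_ratio(extracted_text, rendered_text) > threshold
```

A document whose parser output and rendered output agree scores 0.0. A document where two of five extracted tokens never appear on the rendered page scores 0.4 and is flagged. A production detector would also need to handle OCR errors, hyphenation, and text carried in images, none of which this sketch attempts.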

Treat detection as a recorded signal with human adjudication, never as silent auto-rejection. Two reasons converge on this. The practical one is false-positive discipline: even a detector with a false-positive rate near a tenth of a percent will occasionally flag an innocent document, and silent rejection turns a rare false positive into an invisible, unappealable harm. ¹⁸ The regulatory one is that human oversight of consequential AI decisions is increasingly a legal duty, not a courtesy. The EU AI Act classifies systems used to filter applications and evaluate candidates as high-risk, with explicit obligations for human oversight, documentation, and transparency. ¹⁴ A detected injection should raise a flag to a person, not trip a hidden trapdoor.
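One way to make "flag, never silently reject" structural is to give the routing step no rejection outcome at all. The sketch below is illustrative: the `Route` values, `ScreeningDecision`, and `route_submission` are hypothetical names, and the helper at the end only restates the false-positive arithmetic from the paragraph above.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    # Deliberately no rejection route: a detector flag can change who
    # reads the submission, not whether it gets read.
    STANDARD_REVIEW = "standard_review"
    HUMAN_ADJUDICATION = "human_adjudication"


@dataclass(frozen=True)
class ScreeningDecision:
    submission_id: str
    route: Route
    reason: str


def route_submission(submission_id: str, divergence_flagged: bool) -> ScreeningDecision:
    """Send flagged submissions to a person; pass the rest through unchanged."""
    if divergence_flagged:
        return ScreeningDecision(
            submission_id,
            Route.HUMAN_ADJUDICATION,
            "hidden-content divergence detected; awaiting human review",
        )
    return ScreeningDecision(submission_id, Route.STANDARD_REVIEW, "no divergence detected")


def expected_false_flags(clean_submissions: int, false_positive_rate: float) -> float:
    """Expected number of innocent documents flagged at a given rate."""
    return clean_submissions * false_positive_rate
```

At a false-positive rate of 0.001, a hypothetical pool of 100,000 clean submissions would still be expected to produce about 100 flags on innocent documents, which is the practical reason each flag needs a human reader.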

Log what the detector found and how the human adjudicated it. The screen then becomes a process someone can inspect.
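A log of this kind is only inspectable if it resists quiet edits. The following sketch records the detector finding and the human outcome together, and chains each entry to the previous one by hash. The chaining is an assumption added for illustration, not something the paper prescribes, and every field and function name here is hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def _hash_body(body: dict) -> str:
    """Stable SHA-256 over the entry contents (keys sorted for determinism)."""
    serialized = json.dumps(body, sort_keys=True)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


def make_log_entry(submission_id: str, file_bytes: bytes, detector_finding: str,
                   adjudicator: str, outcome: str, previous_entry_hash: str) -> dict:
    """Build one adjudication record linked to the entry before it."""
    entry = {
        "submission_id": submission_id,
        "file_sha256": hashlib.sha256(file_bytes).hexdigest(),
        "detector_finding": detector_finding,
        "adjudicator": adjudicator,
        "outcome": outcome,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "previous_entry_hash": previous_entry_hash,
    }
    entry["entry_hash"] = _hash_body(entry)
    return entry


def verify_chain(entries: list[dict]) -> bool:
    """Return True only if no entry was altered, removed, or reordered."""
    previous = ""  # the first entry is expected to link to an empty hash
    for entry in entries:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if entry["previous_entry_hash"] != previous:
            return False
        if _hash_body(body) != entry["entry_hash"]:
            return False
        previous = entry["entry_hash"]
    return True
```

Changing any field of a stored entry after the fact, including the adjudication outcome, causes `verify_chain` to return `False`, so an auditor can tell a complete log from an edited one.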

All of this sits inside an existing legal frame that already requires accountable, bias-audited, human-overseen screening. Beyond New York City's Local Law 144 and the EU AI Act, federal anti-discrimination doctrine has been applied to AI hiring tools, with the employer bearing liability for adverse impact even when a vendor's tool produced the outcome, though that particular guidance was withdrawn from the issuing agency's site in early 2025 following an executive order and should be cited with that caveat. ¹⁵ State law is moving in the same direction, with broad legislation on AI in employment decisions now enacted in more than one state. ¹⁷ The standards bodies frame the technical risk to match: the US standards institute classifies direct and indirect prompt injection as an information-security risk to be governed, mapped, measured, and managed, ⁴ and the UK cyber authority frames injection as a standing risk to be managed by design rather than eliminated. ³ The defender's brief, then, is not exotic. It is detection, human adjudication, logging, and bias-audited accountability, assembled into a screen that can itself be audited.

Section 08

The Trustworthy Alternative

The applicant-side posture is the mirror image of the defender's brief, and it is the one KellerAI practices.

Send materials that read identically to every reader. The whole pathology of the tactic is the gap between what a human sees and what a machine sees; close that gap, and the integrity problem cannot arise. There is no private channel because there is no second document. Disclose visibly rather than hiding. If there is something an applicant wants the evaluating system to know, the honest form is plain text that the human reviewer reads too. Where the process permits, make a stated, polite request for human review, openly, so that the request is part of the visible argument rather than a message smuggled past the person. And keep the submission auditable: what you sent is what every party can later verify you sent, which is the same property that makes the defender's logged screen trustworthy, viewed from the other side.
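On the applicant side, one simple way to keep a submission auditable is to record a cryptographic digest of the exact file at the moment it is sent. This is a sketch of that idea under the assumption that a SHA-256 digest is an acceptable fingerprint; the function names are hypothetical and this is not a description of KellerAI's actual tooling.

```python
import hashlib


def submission_digest(file_bytes: bytes) -> str:
    """SHA-256 fingerprint of the exact bytes that were submitted."""
    return hashlib.sha256(file_bytes).hexdigest()


def matches_record(file_bytes: bytes, recorded_digest: str) -> bool:
    """Check a file on hand against the digest recorded at submission time."""
    return submission_digest(file_bytes) == recorded_digest
```

Any party holding the file and the recorded digest can run the same check. A single changed byte produces a different digest, so a match indicates the document on file is the document that was sent.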

This is not a posture we adopted for this paper. It is the same design value that runs through KellerAI's published specifications, applied to our own submissions. grounded-rag-spec makes a system's answers traceable to the evidence they rest on, so a reader can inspect the ground beneath a claim instead of accepting the claim on faith. ai-provenance-spec makes the origin of generated material inspectable, so anyone downstream can see where something came from rather than guessing. Both specifications rest on one conviction: the thing being evaluated should be exactly the thing it appears to be, open to inspection by every party. An application that reads identically to human and machine is that conviction wearing different clothes. We made provenance and grounding inspectable in the systems we build, and we hold our own application materials to the same standard.

Transparency, in this frame, is not a handicap we accept reluctantly while wishing we could compete on the manipulator's terms. It is the defensible position, the one that holds up when someone looks closely, and once artifacts are permanent and re-scannable, holding up under close inspection is the only test that matters.

Section 09

How This Connects

This paper sits inside a small set of KellerAI pieces that argue, from different angles, that trust is earned through inspectability rather than asserted.

The Audit You Can Audit makes the case that an audit becomes worth acting on not when its analysis gets sharper but when its process becomes inspectable. An application earns trust the same way. The reason a transparent submission is defensible is identical to the reason a self-accountable audit is: the artifact is exactly what it appears to be, and any party can check.

Trust but Verify is about building verification structures around trusted actors rather than relying on blind trust. The applicant who welcomes verification, whose materials are identical to all readers and auditable after the fact, is the human-side counterpart to that argument. They are not asking to be trusted; they are making themselves verifiable.

The Trust Dial argues for infrastructure that does not depend on any party being honest, where autonomy and trust are earned through a clean, inspectable history rather than granted on assertion. The transparent submission is the smallest version of that principle: a record clean enough that nobody has to take your word for what is in it, because the document itself settles the question.

Section 10

Close: The Application You Can Audit

The stance is the signal. The most reliable indicator that a party will behave well inside systems that matter is how it behaves when a machine is the only thing watching, and our answer is on the record: we submit what we mean, in the open, to human and machine alike.

What makes that refusal meaningful is fluency in the attack it declines. A party that does not understand hidden-instruction manipulation and abstains from it has merely failed to think of it. A party that defends production systems against indirect prompt injection, understands exactly how the screening variant works, and publishes a commitment not to use it, is making a choice that carries information. The choice is the offer. We decline the tactic not because we cannot perform it but because the artifact it produces is durable evidence of deception, and because an organization that rewards it is selecting for the trait it should least want. The transparent submission is the one that holds up when someone looks closely, which, once the record is permanent, is every submission.

For the shorter argument and the plain statement of the posture, read the companion brief, The Application You Can Audit.
