The Application You Can Audit

KellerAI

Abstract

A machine reads your application before a person does, and a class of tactics has grown up around that fact: text addressed to the screener that the human reviewer never sees. It is the application-layer cousin of indirect prompt injection, and KellerAI declines to use it. The tactic fails on integrity (a document that reads differently to human and machine is a falsified record) and on detection (the artifact is permanent and re-scannable, so a fleeting edge is bought with a durable liability). Our posture is one sentence: what we send is what every party evaluates.

Section 01

The Tactic, Conceptually

A machine reads your application before a person does.

That is the new shape of screening. A resume, a cover memo, a proposal, a profile: the first reader of any of these is increasingly an automated system that ranks, filters, and forwards. The human who decides only sees what survived. This is not a future condition. It is how most high-volume hiring, grant review, and vendor selection already works.

A class of tactics has grown up around that fact. The idea is simple to state and we will state it only at the level of concept: you place text in the application that a human reviewer will not see but a model will read, addressed to the machine rather than to the person. The instruction is not part of the argument you are making to a human. It is a private message to the screener, asking it to rate you higher, advance you, or ignore its own criteria. Some variants borrow from steganography, hiding one message inside another so that only the machine ever receives it.

We at KellerAI recognize this family immediately, because it is the application-layer cousin of indirect prompt injection. In a deployed AI system, indirect prompt injection means hostile instructions arrive inside the data the model is asked to process, and the model mistakes them for commands. The screening version is the same move pointed at a hiring funnel. The application material is the data. The hidden instruction is the injection. We defend production systems against exactly this, and that fluency is precisely why we decline to use it.

Section 02

Why It Fails

It fails on two grounds at once, and either one alone would be enough.

The first is integrity. A document that reads one way to a human and another way to a machine is a falsified record. There is no softer description available. The reviewer believes they are evaluating the same artifact the system evaluated, and they are not. Two parties are looking at two different documents that happen to share a surface. Whatever the hidden text asks for, the act of planting it is a misrepresentation of what you submitted. That is true even if the request inside it is modest.

The second is detection, and detection is where the trade becomes plainly bad. Application artifacts are durable. They are stored, indexed, and retained, often for years, and they can be re-scanned at any time. Detection of injected instructions is improving, not stalling, and a document that slipped past a screener in 2026 can be flagged by a better screener in 2028 against the very same stored file. You do not get to retract it.

The asymmetry is the whole argument. A small, temporary edge in one screening round, weighed against permanent evidence of a falsified submission that any future system can surface. That is a bad trade at any honest accounting.

The asymmetry

So the tactic offers a fleeting advantage purchased with a durable liability. It is not that the technique never works. It is that when it works it buys very little, and what it costs you sits on a record you do not control and cannot revise.

Section 03

The Posture We Publish

Our posture is one sentence: what we send is what every party evaluates.

In practice that means three concrete commitments. We submit materials that read identically to a human and to a machine, with nothing addressed to the screener that the reviewer cannot also see. Where the process permits it, we visibly request human review, in plain text the screener and the person both read. And we keep the submission auditable, so that the document on file is the document we wrote and the document everyone judges.

This is not a new ethic for us. It is the same posture as two of KellerAI's published specifications, applied to a different surface. grounded-rag-spec makes a system's answers traceable to the evidence they rest on, so a reader can inspect the ground beneath a claim instead of taking the claim on faith. ai-provenance-spec makes the origin of generated material inspectable, so anyone downstream can see where something came from. Both specs are built on one conviction: the thing being evaluated should be exactly the thing it appears to be, open to inspection by every party. An application that reads identically to human and machine is that same conviction wearing different clothes.

Transparency here is not a handicap we accept reluctantly. It is the defensible position, the one that holds up when someone looks closely, which is the only test that matters once artifacts are permanent and re-scannable.

Section 04

For the Organization Running the Screen

If you operate the screen, you face the mirror image of this problem, and your response sets an incentive.

When your system detects injected instructions in a submission, the correct reading is that you have found a screening signal worth flagging, not a flash of cleverness worth rewarding. The candidate who hid a message to your model has told you something useful about how they behave when they think no human is watching. Treat it as data about integrity. Surface it to a human reviewer who can weigh it alongside the rest of the candidate's record, the way any other integrity signal would.

Rewarding it does the opposite of what you want. Advance the applicants who game the machine and you have built a filter that selects for willingness to falsify a record under automation. You will get more of exactly the trait you least want in the people and vendors you bring inside, and you will have trained them that your process pays for it. The screen is supposed to find people you can trust with things that matter. A screen that rewards hidden manipulation finds the precise opposite and calls it a top candidate.

Section 05

The Stance Is the Signal

We are fluent in defending against hidden-instruction manipulation, and we publicly decline to use it. That refusal is not a constraint on our work. It is part of what we are offering. The most reliable signal that a party will behave well inside your systems is how they behave when a machine is the only one watching, and our answer is on the record: we submit what we mean, in the open, to human and machine alike.

We publish this stance so that it can be cited. A cover memo, a profile, or a proposal of ours can point here, and what it points to is a posture we have committed to in public and will be held to. The stance is the signal, and a signal is only worth something when it is verifiable.

For the full argument, the threat model behind it, the integrity-versus-detection analysis in depth, and how this posture composes with our published specifications, read the companion technical whitepaper, The Application You Can Audit: In Depth .

How this connects

The Audit You Can Audit
: a process you can trust is a process you can inspect.
Trust but Verify
: verification structures built around trusted actors, not blind trust.
Why Self-Improving AI Needs a Trust Dial
: infrastructure that does not depend on any party being honest.