The Unsupervised Allocator: Agent-to-Agent Capital Allocation and the Supervision Gap It Previews

KellerAI

Section 01

BLUF

Agent-to-agent capital allocation has arrived, and the allocation layer shipped while the supervision layer did not. A founder's AI agent now submits to a fund's AI agent, an autonomous scorer ranks the pitch against fund theses, and a decision is returned in forty-eight hours — with the internal scoring held confidential. The applicant receives a verdict and never the reasoning that produced it.

This paper takes one live system, Pitch Protocol (pitchprotocol.vc, operated by Growth Factory Ventures, LLC), as a working exhibit and applies to it three failure modes the KellerAI corpus has already documented: the robustness illusion, observability theater, and the supervisor's mirror. The payoff is regulatory. If the same opaque scorer produced an adverse credit decision inside a bank, SR 26-2 model-risk governance and the Equal Credit Opportunity Act's adverse-action explainability requirements would forbid it. Venture capital is unregulated relative to credit, so the black box ships there first. Agent-native VC is a preview of the supervision gap that regulated sectors (Banking, Healthcare, Aviation) must refuse to import as agent-to-agent decisioning spreads.

Section 02

The Exhibit

Pitch Protocol is an agent-native deal-flow service. Its operator is named in the Terms of Service as Growth Factory Ventures, LLC, with the agreement governed by California law and disputes resolved in Sacramento County; the terms were last updated June 2, 2026. ¹ The submission path is the novel part: rather than a form or a warm introduction, "the MCP server sets up in seconds and your agent gains the Pitch Protocol tools immediately ready to package context, submit, and respond." ² The submitter is frequently not a human at all — the terms require an AI agent to declare its submitterType and include a governance block per the application schema. ¹ Capital is matched by a scorer: "Our agent scores your pitch against every fund's investment thesis. Strong matches surface immediately, investors commit within 48 hours." ² The headline cadence is stated as "48hr Pitch To Decision." ²

The decision is a verdict without a record. Section 5 of the terms is explicit: "Internal research notes, scoring, and partner deliberations are confidential to Pitch Protocol and the relevant funds. You'll receive a decision and, where applicable, structured feedback — but not the internal record." ¹ The liability posture matches the opacity. Section 8 caps total liability at one hundred dollars: "To the maximum extent permitted by law, Pitch Protocol's total liability for any claim arising from your use of the Service is limited to USD $100." ¹ Indemnification runs one way — the founder indemnifies the platform. ¹ And the service disclaims the regulatory identities that would attach explainability duties: "We are not an investment adviser, broker-dealer, placement agent, or fiduciary." ¹

The marketing surface deserves a careful look, because the gap between what the site displays and what its source markup conceals is itself an exhibit. Every figure below is taken from the operator's own copy as it stood on a June 19, 2026 fetch of the live page. The hero renders "Ready To Deploy" capital symbolically, as a literal "$$" rather than any dollar figure, beside a stated count of "13" Investment Teams. ² The roster the page actually displays, under a "Capital Partners" heading reading "Featured funds actively reviewing," is nine funds: AirAngels, CerraCap Impact Venture Capital, Fiat Ventures, Growth Factory Ventures, Moneta Ventures, Pulsar Ventures, The Bond Fund, The Veteran Fund, and XIAOXIAO Fund. This is small, regional, and angel-scale capital, with check sizes the copy puts at roughly ten thousand to five million dollars. ² Two facts in that displayed roster are worth naming precisely. First, the operator lists itself: Growth Factory Ventures, LLC, the entity that operates Pitch Protocol, appears as one of the nine funds "actively reviewing" in its own network. ² Second, the surface is internally inconsistent on its own terms: nine funds are shown against the hero's claim of "13 Investment Teams." ²

The marquee, tier-1 names a reader would expect appear nowhere on the live page. Andreessen Horowitz, Lightspeed, Sequoia Capital, Founders Fund, and a longer list of others exist only in commented-out HTML in the page source — a disabled grid annotated, in the markup, "12 funds. $1.4B ready to deploy." ² That is the precise shape of the claim, and it is why this paper asserts no tier-1 partnership as fact: the impressive "$1.4 billion" figure and the recognizable roster are hidden template content the page does not render, while the live display is nine small funds, one of which is the operator. The gap between the commented grid ("12 funds, $1.4B") and the live surface (nine funds, "$$," "13 teams") is a measurement of how far the presented scale runs ahead of the operating one — three different counts (nine, twelve, thirteen) and a headline figure that is symbolic where it is shown and aspirational where it is concealed. The argument does not need the discrepancy resolved either way. It rests on the published mechanics — agent submission, agent scoring, a 48-hour decision, a confidential internal record, a $100 liability cap, and the adviser/broker-dealer disclaimer — every one of which is quoted above from the operator's own terms and homepage.

The exhibit in one line

The applicant submits through an agent, is scored by an agent, and receives a decision in forty-eight hours whose internal record is, by the operator's own terms, confidential. The allocation is automated end to end. The supervision is not present anywhere in the loop.

Section 03

Lens One — The Robustness Illusion

The first KellerAI lens is the gap between "does not crash" and "works correctly." A prior paper argues that production systems accumulate a hierarchy of error-suppression patterns under the banner of robustness, and that "the illusion is that a system which does not crash is a system that works correctly. The two propositions are not equivalent, and the gap between them is where most production incidents live." ³ Its sharpest formulation is about a security boundary that failed open rather than closed: "a graceful fallback that bypasses authentication is not graceful. It is a security primitive operating in reverse, dressed in the surface mannerisms of robustness." ³

The auto-scorer is exactly the kind of pipeline that can fail open. Scoring a pitch against "every fund's investment thesis" ² is a multi-stage computation: parse the submission, retrieve the theses, evaluate fit, rank, and emit a decision. Each stage can fail. The robustness-illusion question is what the pipeline does when a stage fails — and a confidential, non-reconstructable decision is precisely the condition under which a fail-open path is invisible. If thesis retrieval returns stale data, if an evaluation step silently catches an exception and continues, if a default score is emitted from an uninitialized branch, the output is still a well-formed "approved" or "declined." The decision looks like a decision. Nothing in the forty-eight-hour verdict distinguishes a score computed from complete state from one emitted past a swallowed failure.

The corpus names this mechanism precisely: an error path repurposed as a fallback path, where "a single line of code is doing both jobs" — the audit and the action — "and the action wins. The audit is read by no one." ³ Applied to allocation, the consequence is not a stale UI field. It is a capital decision rendered against an applicant who has no way to know whether the model that judged them ran to completion. Fail-safe defaults — the Saltzer-Schroeder principle that a check which cannot complete must default to the denying outcome, not the permitting one ³ — have no analog here, because the applicant cannot observe whether any check completed at all.
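The difference between the two failure postures can be made concrete with a minimal sketch. Everything here is an assumption for illustration: the stage names, the `StageFailure` exception, the scoring rule, and the `no_decision` verdict are hypothetical and do not describe Pitch Protocol's actual pipeline, which is not public. The sketch only shows how a swallowed failure still yields a well-formed verdict, and how a fail-closed variant refuses to emit one.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


class StageFailure(Exception):
    """A pipeline stage could not complete."""


@dataclass
class Decision:
    verdict: str  # "approved", "declined", or "no_decision"
    completed_stages: List[str] = field(default_factory=list)
    failed_stage: str = ""


def retrieve_theses(state: Dict) -> Dict:
    # Hypothetical stage that simulates an outage in thesis retrieval.
    raise StageFailure("thesis store unavailable")


def evaluate_fit(state: Dict) -> Dict:
    # Hypothetical scoring rule: one point per retrieved thesis.
    return {**state, "score": len(state.get("theses", []))}


STAGES: List[Tuple[str, Callable[[Dict], Dict]]] = [
    ("retrieve_theses", retrieve_theses),
    ("evaluate_fit", evaluate_fit),
]


def _verdict(state: Dict, threshold: int) -> str:
    return "approved" if state.get("score", 0) >= threshold else "declined"


def score_fail_open(pitch: Dict, threshold: int = 1) -> Decision:
    """Swallows stage failures and still emits a well-formed verdict."""
    state, done = dict(pitch), []
    for name, stage in STAGES:
        try:
            state = stage(state)
            done.append(name)
        except StageFailure:
            continue  # the failure is absorbed; nothing downstream sees it
    return Decision(_verdict(state, threshold), done)


def score_fail_closed(pitch: Dict, threshold: int = 1) -> Decision:
    """Refuses to commit a verdict when any stage fails."""
    state, done = dict(pitch), []
    for name, stage in STAGES:
        try:
            state = stage(state)
        except StageFailure:
            return Decision("no_decision", done, failed_stage=name)
        done.append(name)
    return Decision(_verdict(state, threshold), done)


pitch = {"company": "ExampleCo"}
print(score_fail_open(pitch))
print(score_fail_closed(pitch))
```

With the simulated outage, the fail-open path returns a verdict of "declined" computed from zero theses, while the fail-closed path returns "no_decision" and names the stage that failed. An applicant who sees only the verdict string cannot tell the first case from a decline computed on complete state.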

Section 04

Lens Two — Observability Theater

The second lens is telemetry that is structurally present but semantically hollow. The corpus paper on the pattern argues that "structured telemetry with permanently-empty key fields trains operators to trust signals that carry no information," and that the failure is worse than emitting nothing "because it produces false confidence in monitoring coverage that does not exist." ⁴ Its compliance formulation is the one that transfers directly: "The audit trail is present. The audit evidence is absent" — "a receipt for a transaction whose line items were never filled in." ⁴

A confidential forty-eight-hour decision is that receipt. The terms guarantee the applicant "a decision and, where applicable, structured feedback — but not the internal record." ¹ The decision is the field that is structurally present; the reasoning is the line item that was never filled in. The founder cannot audit the score, cannot contest it on its merits, and cannot reconstruct how the model weighed their submission against the theses it claims to have checked.

The human-factors result underneath the corpus paper sharpens why this is not merely unsatisfying but corrosive. Parasuraman and Manzey's work on automation-induced complacency establishes that operators reduce scrutiny of automated systems that reliably confirm an expected state. ⁵ An allocation market in which every decision arrives as an unexplained verdict trains its entire population of applicants, and the funds consuming the scorer's output, to treat the absence of reasoning as normal. The scorer's "approved" becomes a signal everyone accepts and no one can interrogate. The inversion the corpus paper warns about applies at the market level: on the day the scorer is wrong in a way that matters, no record will exist against which anyone could have scrutinized it.
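One way to make "structurally present but semantically hollow" measurable is to compute, per field, how often a stream of decision records actually carries a value. The sketch below is illustrative only: the field names (`decision`, `reasons`, `inputs_ref`, `model_version`) and the sample records are hypothetical, not drawn from any real scorer's telemetry.

```python
from typing import Dict, Iterable, List

# Hypothetical fields a decision record might be expected to carry.
REQUIRED_FIELDS = ["decision", "reasons", "inputs_ref", "model_version"]


def is_empty(value) -> bool:
    """True for None and for zero-length strings, lists, and dicts."""
    if value is None:
        return True
    return hasattr(value, "__len__") and len(value) == 0


def fill_rates(events: Iterable[Dict], fields: List[str]) -> Dict[str, float]:
    """Fraction of events in which each field holds a non-empty value."""
    events = list(events)
    if not events:
        return {f: 0.0 for f in fields}
    return {
        f: sum(not is_empty(e.get(f)) for e in events) / len(events)
        for f in fields
    }


def hollow_fields(events: Iterable[Dict], fields: List[str]) -> List[str]:
    """Fields that exist in the schema but are never populated."""
    rates = fill_rates(events, fields)
    return [f for f in fields if rates[f] == 0.0]


# Sample records: the verdict is always present, the reasoning never is.
events = [
    {"decision": "declined", "reasons": [], "inputs_ref": "", "model_version": "v1"},
    {"decision": "approved", "reasons": [], "inputs_ref": None, "model_version": "v1"},
]
print(hollow_fields(events, REQUIRED_FIELDS))
```

For the sample records the check reports `reasons` and `inputs_ref` as hollow: every record validates against the schema, yet the fields that would let anyone reconstruct the decision are empty in all of them.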

The receipt with no line items

A decision you cannot reconstruct is not an explained decision rendered briefly. It is an unexplained decision rendered permanently. The audit trail is the verdict; the audit evidence is the reasoning the terms reserve.

Section 05

Lens Three — The Supervisor's Mirror

The third lens is the schema constraint: you cannot govern what your schema cannot record. The corpus paper on the Fed's AI Use Case Inventory shows that a regulator's disclosure schema is, field for field, a working template for the model inventory it requires of banks — and that the inventory's AI classification field "explicitly distinguish[es] classical or predictive machine learning, natural-language processing, and generative AI." ⁶ The same paper is candid about where a running system fails to populate that field: its reference implementation's "decision logging records the intent of a request, not a model family or an is-generative flag, so the inventory's classification distinction cannot be reproduced from current trace data without a schema extension." ⁶

The unsupervised allocator inherits that exact gap, and amplifies it. A scorer that judges founders' pitches is, increasingly, judging founders' AI — the generative systems those companies are built on. To govern such a portfolio of decisions, the schema behind the scorer must be able to record the model class it is evaluating: is the company's product a classical predictive model, an NLP system, or a generative one? If the scoring schema carries no is-generative or model-family flag, it is scoring a category it cannot name. The supervisor's-mirror paper makes the structural point that "a bank that cannot populate those fields, per model, on demand does not have a model inventory in the sense SR 26-2 means. It has a list." ⁶ An allocation system that cannot classify the model class it judges does not have a governable decision record. It has a stream of verdicts.
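A schema that can name the category it judges might look like the following sketch. The three enum members mirror the classification distinction quoted above from the inventory schema; the record type `ScoredDecision`, its field names, and the helper `count_by_family` are hypothetical names introduced for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable


class ModelFamily(Enum):
    CLASSICAL_ML = "classical_or_predictive_ml"
    NLP = "natural_language_processing"
    GENERATIVE = "generative_ai"


@dataclass(frozen=True)
class ScoredDecision:
    pitch_id: str
    verdict: str
    subject_model_family: ModelFamily  # the class of AI being judged

    def __post_init__(self) -> None:
        # Reject records that leave the classification unset or free-text.
        if not isinstance(self.subject_model_family, ModelFamily):
            raise ValueError("subject_model_family must be a ModelFamily")

    @property
    def is_generative(self) -> bool:
        return self.subject_model_family is ModelFamily.GENERATIVE


def count_by_family(decisions: Iterable[ScoredDecision]) -> Dict[ModelFamily, int]:
    """Answer 'how many decisions concerned each model class?' from records."""
    counts = Counter(d.subject_model_family for d in decisions)
    return {family: counts.get(family, 0) for family in ModelFamily}


decisions = [
    ScoredDecision("p-001", "declined", ModelFamily.GENERATIVE),
    ScoredDecision("p-002", "approved", ModelFamily.CLASSICAL_ML),
    ScoredDecision("p-003", "declined", ModelFamily.GENERATIVE),
]
print(count_by_family(decisions))
```

The design choice worth noting is that the classification is a required, typed field rather than an optional annotation: a record without it cannot be constructed, so the question "how many generative cases did the scorer judge?" is answerable from the records themselves.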

This lens also closes the loop to the regulatory turn, because the same paper documents the generative carve-out that makes the gap acute: SR 26-2 places generative and agentic AI outside the formal scope of model-risk management — "Generative AI and agentic AI models are novel and rapidly evolving. As such, they are not within the scope of this guidance" ⁷ — while the Fed's own inventory schema flags generative AI as a first-class category. ⁶ The fastest-moving category is the one the formal framework defers and the one an unsupervised allocator is least equipped to record.

Section 06

The Regulated-Industry Turn

Here is the line that makes this a KellerAI paper rather than a hot take. Take the identical architecture — agent submission, autonomous scoring, a fast confidential decision, a liability cap, no explanation — and point it at a credit decision inside a bank. It is not merely ill-advised. It is prohibited.

Two regimes forbid it. The first is model-risk governance. On April 17, 2026, the Federal Reserve, the OCC, and the FDIC jointly issued SR 26-2, Revised Guidance on Model Risk Management, which supersedes the 2011 SR 11-7 that built the canon of validation, ongoing monitoring, effective challenge, and a model inventory. ⁷ SR 26-2 preserves the three-pillar develop/validate/govern architecture and, through OCC Bulletin 2026-13, names the fields a comprehensive model inventory must carry — model type, purpose, owner, inputs and assumptions, outputs, risk-classification level, validation status and date, independent-review completion, and known limitations. ⁸ A confidential scorer whose internal record is reserved from the subject of the decision cannot evidence validation status, cannot support the independent "effective challenge" the framework requires, and cannot produce the replayable change history an inventory exists to provide. The corpus supervisor's-mirror paper develops precisely this evidence discipline: the difference "between a description of what a system does and evidence of what it did," produced "from a durable, tamper-evident record rather than from recollection." ⁶ The Pitch Protocol terms reserve exactly that record. ¹

The second regime is adverse-action explainability. The Equal Credit Opportunity Act and its implementing Regulation B require that a creditor who takes adverse action against an applicant provide a statement of the specific principal reasons for that action. ⁹ The Consumer Financial Protection Bureau has made the AI application explicit: in 2023 guidance it stated that creditors using "complex algorithms" must still provide accurate, specific reasons, and that the technology's complexity is not a defense — a creditor "cannot justify noncompliance with ECOA based on the mere fact that the technology" used "is too complicated, too opaque in its decision-making, or too new." ¹⁰ A confidential scoring engine that returns a decision "but not the internal record" ¹ is, transplanted into lending, non-compliant on its face. The black box is illegal exactly where the decision is most consequential.
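To show that specific reasons can be produced without disclosing the internals, here is a minimal sketch under a strong simplifying assumption: the scorer is additive, so each factor's contribution relative to a reference value can be computed directly. The factor names, weights, reference values, and reason wording are all hypothetical, and the sketch illustrates the idea of ranking adverse factors rather than any regulator-endorsed method.

```python
from typing import Dict, List

# Hypothetical weights and reference values; not from any real scorer.
WEIGHTS = {"revenue_growth": 3.0, "thesis_overlap": 5.0, "team_experience": 2.0}
REFERENCE = {"revenue_growth": 0.5, "thesis_overlap": 0.6, "team_experience": 0.5}
REASON_TEXT = {
    "revenue_growth": "Revenue growth below the reference level",
    "thesis_overlap": "Limited overlap with the stated thesis",
    "team_experience": "Team experience below the reference level",
}


def contributions(features: Dict[str, float]) -> Dict[str, float]:
    """Per-factor contribution to the score, relative to the reference."""
    return {k: WEIGHTS[k] * (features[k] - REFERENCE[k]) for k in WEIGHTS}


def principal_reasons(features: Dict[str, float], top_n: int = 2) -> List[str]:
    """The factors that lowered the score most, as plain statements."""
    contrib = contributions(features)
    # Sort ascending so the most negative contribution comes first.
    negatives = sorted((c, k) for k, c in contrib.items() if c < 0)
    return [REASON_TEXT[k] for _, k in negatives[:top_n]]


applicant = {"revenue_growth": 0.4, "thesis_overlap": 0.2, "team_experience": 0.9}
for reason in principal_reasons(applicant):
    print(reason)
```

For the sample applicant the output lists thesis overlap first and revenue growth second, and omits team experience because it raised the score. The statements name which factors drove the outcome while the numeric weights stay internal.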

So the black box ships where it is legal first. Venture capital is unregulated relative to credit: Pitch Protocol's own terms disclaim being "an investment adviser, broker-dealer, placement agent, or fiduciary," ¹ and no adverse-action statute attaches to a declined pitch. That is not a loophole to celebrate; it is a preview to study. The agent-to-agent decisioning pattern is being validated, scaled, and normalized in the one capital market that imposes no explanation duty. As the same architecture diffuses toward regulated decisioning — credit, insurance underwriting, clinical triage, the go/no-go calls aviation governs through independent verification — the supervision gap it normalizes is the thing regulated sectors must not import.

The asymmetry

The same opaque allocator is a product in venture capital and a violation in consumer credit. The architecture did not change between the two settings. The only thing that changed is whether a regulator was watching.

Section 07

What Correct Supervision Would Require

The three lenses are not only a critique; together they specify the remediation. Correct supervision of an automated allocator is the inverse of each failure mode, and the KellerAI governance approach — decision tracing plus classification-as-enforcement — supplies each piece.

Against the robustness illusion: fail closed, on the record. The scorer must not be able to emit a decision from incomplete or stale state. A consequential decision should pass through an independent pre-commit checkpoint that refuses to commit when a stage failed, rather than absorbing the failure and continuing — the architectural analogue of effective challenge that the supervisor's-mirror paper grounds in an append-only decision-trace store and a Stop-hook validation gate "invoked before a decision is committed." ⁶ A check that blocks rather than merely observes is the difference between governance and theater. ⁴

Against observability theater: a reconstructable record, not a verdict. The decision must be backed by a durable, tamper-evident trace whose inputs are carried as citations rather than free text, so that "the inventory field says this was approved; the trace says by whom, when, and against what evidence." ⁶ Confidentiality of a fund's thesis weighting is a legitimate commercial interest; the unreviewable disappearance of the entire decision basis is not the same thing, and the two should not be conflated. The applicant in a governed setting is owed the specific principal reasons, not the proprietary weights. ¹⁰
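A durable, tamper-evident trace can be sketched as an append-only log in which each entry commits to the hash of the one before it. This is an illustrative assumption about how such a store might be built, not a description of the corpus paper's reference implementation; the class name `DecisionTrace`, the payload fields, and the citation identifiers are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(prev_hash: str, payload: Dict) -> str:
    # Canonical JSON so the same payload always hashes the same way.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class DecisionTrace:
    """Append-only log; each entry commits to the entry before it."""

    def __init__(self) -> None:
        self._entries: List[Dict] = []

    def append(self, payload: Dict) -> str:
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        stored = json.loads(json.dumps(payload))  # detach from caller's dict
        entry_hash = _digest(prev, stored)
        self._entries.append({"prev": prev, "payload": stored, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = GENESIS
        for entry in self._entries:
            if entry["prev"] != prev:
                return False
            if entry["hash"] != _digest(prev, entry["payload"]):
                return False
            prev = entry["hash"]
        return True


trace = DecisionTrace()
trace.append({
    "pitch_id": "p-001",
    "verdict": "declined",
    "decided_by": "scorer-v1",
    # Inputs carried as citations (hypothetical identifiers), not free text.
    "evidence": ["submission:p-001", "thesis:fund-a:2026-06"],
})
trace.append({"pitch_id": "p-002", "verdict": "approved",
              "decided_by": "scorer-v1", "evidence": ["submission:p-002"]})
print(trace.verify())
```

Verification passes on the untouched log and fails if any stored payload is later altered, because the recomputed hash no longer matches the recorded one. The chain shows that the record was not edited after the fact; it does not by itself show that the recorded evidence was sufficient.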

Against the supervisor's mirror: a schema that records the model class. The decision schema must carry the model-family / is-generative flag the inventory regimes already enumerate, so the system can classify the AI it is judging and bring generative cases inside the same governance perimeter SR 26-2 formally defers. ⁶ What you cannot record, you cannot govern; the remediation is a named, bounded schema extension, not a confident summary that papers the gap over.

The schema is free — a regulator has already published the fields. ⁶ The liability cap, the confidentiality clause, and the absent reasoning are choices, not constraints. An allocator that wanted to be supervisable could be, using mechanisms that already exist in the KellerAI corpus. The unsupervised allocator is unsupervised by design, and it is shipping in the market where that design carries no legal cost — which is exactly why it is worth watching before the design migrates to markets where it would.

Section 08

Conclusion

Agent-to-agent capital allocation is real, public, and operating today. Its allocation layer is complete and its supervision layer is absent, and the three KellerAI lenses explain why that combination is dangerous rather than merely incomplete: a scorer that can fail open without anyone knowing, a decision that is a receipt with no line items, and a schema that cannot name the model class it judges. None of that is illegal in venture capital. All of it would be illegal in consumer credit. The distance between those two facts is the entire warning.

The regulated sectors KellerAI writes for (Banking, Healthcare, Aviation) will be offered this architecture, because it is fast, it is cheap, and it will arrive wrapped in the surface mannerisms of rigor. The work is to insist on the supervision layer before the allocation layer, not after: fail closed, keep a reconstructable record, and record the model class. The supervisor's mirror already shows the fields. What remains is filling them with evidence that survives independent challenge, and refusing to deploy the decision until it does.

The Finding

The allocation layer shipped. The supervision layer did not.