kellerai.blog

You Cannot Outsource the Obligation to Govern

The vendor ran the eval. You still own the governance.

KellerAI White Paper · Engineering Discipline & Verification · Jun 2026

Context

Deployers of frontier models routinely treat the vendor's benchmark suite, model card, and enterprise indemnity as a de facto discharge of their governance obligation. SR 26-2 and two decades of model-risk management say otherwise.

The Finding

Buying the model does not buy the accountability. The vendor's eval is an artifact-level measurement; the deployer's governance is a deployment-level relationship that no quantity of upstream diligence can constitute.

Tags:
model risk management · vendor accountability · SR 26-2 · agent governance · audit trail
Paper Details
Category: Engineering Discipline & Verification
Audience: Risk officers, AI governance leads, and engineering leaders deploying third-party models in regulated or high-consequence contexts.
Method: Analytical · evidence-based
Length: ~500 · 2 min
Sections: 4
Date: Jun 2026
Authors: KellerAI
Section 01

The Eval You Bought Is Not the Governance You Owe

The pitch is seductive: the foundation-model vendor ran a benchmark suite, published a model card, scored a safety eval, and signed an enterprise agreement — so the hard part of governance is done, and what remains is integration. That description is true, and it misses the point entirely. The vendor's eval is the vendor's evidence about the vendor's artifact under the vendor's conditions. It is not your governance, because governance is not a property of the model. It is a property of the deploying institution that lets the model commit actions in its name.

Buying the model does not buy the accountability. When a bought agent moves money, sends an email, files a ticket, or amends a record, the consequence lands on the deployer — not on the lab that trained the weights. The deployer owns the residual risk of every model it runs, including the ones it did not build, and especially the ones it cannot inspect.

Section 02

Banking Already Litigated This

Model-risk management settled the question fifteen years ago and re-settled it in 2026. The interagency US standard SR 26-2 — which superseded SR 11-7 in April 2026 — places vendor and third-party models explicitly inside its scope. A model bought from outside is still the deploying institution's model risk to own. The validation lifecycle, the ongoing monitoring, the outcomes analysis: the bank performs them on the vendor's model, because the regulator holds the bank, not the vendor, accountable for what the model is allowed to do.

The doctrine has a name and a shape. Rigor is risk-tiered by materiality, so scrutiny is proportional to consequence, and the institution must keep an audit trail sufficient to reconstruct each decision. You do not get to point at a supplier's certificate when the model is wrong. You get to explain, from your own records, what it did and why you let it.
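Here is a minimal Python sketch of risk-tiering by materiality, applied to a single agent action as Section 04 does. Every name and number in it is a hypothetical illustration, not taken from SR 26-2 or any vendor API. The `Tier` levels, the `Action` fields, the dollar thresholds, and the check names are all assumptions, and a real institution would set them in its own policy. The sketch shows the shape of the doctrine: consequence determines the tier, and the tier determines how much review the action requires.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Action:
    kind: str          # illustrative label, e.g. "move_money" or "send_email"
    amount_usd: float  # monetary exposure; 0 if none
    reversible: bool   # can the effect be cheaply undone?
    external: bool     # does the effect leave the institution's boundary?


# Hypothetical materiality thresholds; real values belong in institutional policy.
HIGH_EXPOSURE_USD = 10_000
MEDIUM_EXPOSURE_USD = 500

# Review required at each tier: rigor scales with consequence.
REQUIRED_CHECKS = {
    Tier.LOW: ("policy_check",),
    Tier.MEDIUM: ("policy_check", "independent_review"),
    Tier.HIGH: ("policy_check", "independent_review", "human_approval"),
}


def derive_tier(action: Action) -> Tier:
    """Tier computed by the gate from consequence, never self-reported by the agent."""
    if action.amount_usd >= HIGH_EXPOSURE_USD or (not action.reversible and action.external):
        return Tier.HIGH
    if action.amount_usd >= MEDIUM_EXPOSURE_USD or not action.reversible or action.external:
        return Tier.MEDIUM
    return Tier.LOW


if __name__ == "__main__":
    transfer = Action("move_money", 25_000, reversible=False, external=True)
    tier = derive_tier(transfer)
    print(tier.name, REQUIRED_CHECKS[tier])
```

Under these assumed thresholds, an irreversible $25,000 external transfer lands in `Tier.HIGH` and needs human approval. A reversible internal ticket with no monetary exposure stays at `Tier.LOW`. The design choice that matters is keeping `derive_tier` on the gate side, outside the agent.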

Section 03

The Ratings That Were AAA Until They Weren't

The cleanest enforcement anchor is not an enforcement action at all — it is the 2007–08 reliance on external credit-rating-agency models. Banks and investors treated AAA ratings from Moody's and S&P on structured products as a substitute for their own model-risk assessment. The ratings were the outsourced judgment. They were catastrophically wrong, and the losses landed on the deployers who had leaned on them, not on the agencies that issued them.

An external party's assessment of a model never transferred the obligation. The deployer that relied on someone else's eval still owned the failure when the eval was wrong.

The load-bearing lesson

That is the whole of it. SR 11-7's explicit vendor-model clause was, in part, written to foreclose exactly this move — the move of treating a supplier's score as a discharge of your own duty to govern.

Section 04

What This Means for a Bought Agent

The agent you license is, governance-wise, a vendor model that takes actions. Three disciplines follow directly. First, vendor attribution: when the agent uses a vendor model or a third-party tool, its errors count against your escape-rate budget (the rate of errors you tolerate getting past your own controls), not against the vendor's reputation. Second, a reconstructable trace: every gated action emits an append-only, tamper-evident record sufficient to reconstruct what happened, at what tier, and who checked it. Third, consequence-scaled rigor: the gate derives the tier from the action's blast radius and prices scrutiny to it. The agent proposes; it never grades its own consequence.
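The trace and attribution disciplines can be sketched in Python as a hash-chained log. This is an illustration under stated assumptions, not a reference design. `TraceLog`, its field names, and the `escaped` flag are all hypothetical. Storage is an in-memory list rather than durable write-once media, and SHA-256 over sorted-key JSON is one common chaining choice, not a prescribed format. For brevity, the sketch records the outcome at append time; in practice an escape would surface later and be logged as its own record. Each record carries its predecessor's hash, so altering or reordering any record makes `verify()` fail. The `vendor` field attributes the action, yet `escape_rate()` counts every escaped error against the deployer, whether a vendor was involved or not.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List, Optional

GENESIS_HASH = "0" * 64  # stand-in "previous hash" for the first record


def _digest(body: dict) -> str:
    """SHA-256 over canonical JSON (sorted keys) so the hash is reproducible."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class TraceLog:
    """Hypothetical append-only, hash-chained trace of gated agent actions."""
    records: List[dict] = field(default_factory=list)

    def append(self, action: str, tier: str, checked_by: str,
               vendor: Optional[str], escaped: bool) -> dict:
        body = {
            "seq": len(self.records),
            "action": action,
            "tier": tier,              # tier as derived by the gate, not the agent
            "checked_by": checked_by,  # who or what approved the action
            "vendor": vendor,          # third-party model or tool involved, if any
            "escaped": escaped,        # assumed flag: an error that got past the gate
            "prev_hash": self.records[-1]["hash"] if self.records else GENESIS_HASH,
        }
        record = {**body, "hash": _digest(body)}
        self.records.append(record)
        return record

    def verify(self) -> bool:
        """Recompute the chain; any edited, inserted, or reordered record breaks it."""
        prev_hash = GENESIS_HASH
        for i, rec in enumerate(self.records):
            body = {k: v for k, v in rec.items() if k != "hash"}
            if body.get("seq") != i or body.get("prev_hash") != prev_hash:
                return False
            if _digest(body) != rec["hash"]:
                return False
            prev_hash = rec["hash"]
        return True

    def escape_rate(self) -> float:
        """Share of actions with escaped errors; vendor involvement exempts nothing."""
        if not self.records:
            return 0.0
        return sum(1 for r in self.records if r["escaped"]) / len(self.records)
```

One limit is worth stating. The chain alone does not reveal records silently dropped from the end of the log. Deployments therefore often also publish or externally anchor the latest hash.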

None of this is the vendor's to provide. The vendor cannot trace your decisions, cannot attribute its errors to your budget, and cannot set your tiers. Accountability is non-delegable downward, to a supplier, for the same reason it is non-delegable inward, to the builder. You cannot outsource the obligation to govern.

The in-depth companion develops the full argument — vendor attribution to the deployer's escape budget, the append-only hash-chained trace as the conformance artifact, and consequence-scaled rigor where the gate derives the tier and the actor only proposes.

