The Supervisor's Mirror: The Fed's AI Inventory as a Model-Risk Template

KellerAI

Section 01

The Schema the Supervisor Already Published

The 2025 AI Use Case Inventory is a federal-agency disclosure obligation, not a piece of bank supervision. Under OMB Memorandum M-25-21, federal agencies must inventory and publicly report their own AI use cases. 5 The Federal Reserve publishes its inventory separately from the government-wide rollup; the figures in this paper are taken directly from the Fed's own data file as it stood on the January 28, 2026 snapshot. 1 A point of arithmetic hygiene first, because the public summaries invite an error: the government-wide rollup lists roughly fifty-six participating agencies, and that number is sometimes mistaken for a use-case count. 6 A second path to the same error runs through the identifiers themselves: the inventory's Use Case IDs run from FRB-0001 to FRB-0056, so the maximum ID is exactly fifty-six, and a reader who infers the row count from the highest identifier lands on the wrong number. The IDs are not contiguous — eighteen are skipped — and the true count is the rows that are present. The Fed's own inventory records thirty-eight individual AI use cases, verified row by row against the published data file. 2 Every count below is date-stamped to that 2026-01-28 snapshot.
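The identifier trap is easy to reproduce. The Python sketch below makes the point with a made-up list of IDs, not the Fed's actual file. The helper name `summarize_ids` and the sample IDs are hypothetical. Whenever the numbering has gaps, the highest identifier and the row count diverge:

```python
import re


def summarize_ids(ids: list[str]) -> dict[str, int]:
    """Compare the row count with the highest numeric suffix in IDs like 'FRB-0056'."""
    numbers = []
    for use_case_id in ids:
        match = re.fullmatch(r"[A-Z]+-(\d+)", use_case_id)
        if match is None:
            raise ValueError(f"unexpected ID format: {use_case_id!r}")
        numbers.append(int(match.group(1)))
    if len(set(numbers)) != len(numbers):
        raise ValueError("duplicate IDs: each row should carry a unique identifier")
    max_id = max(numbers)
    return {
        "row_count": len(numbers),         # the true use-case count
        "max_id": max_id,                  # the number a careless reader reports
        "skipped": max_id - len(numbers),  # gaps, assuming numbering starts at 1
    }


# Hypothetical sample: five rows whose identifiers top out at 0008.
sample = ["FRB-0001", "FRB-0002", "FRB-0004", "FRB-0007", "FRB-0008"]
print(summarize_ids(sample))  # {'row_count': 5, 'max_id': 8, 'skipped': 3}
```

Applied to the published file, the same arithmetic gives thirty-eight rows, a maximum identifier of fifty-six, and eighteen skipped identifiers.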

Of the thirty-eight, sixteen are recorded as deployed, seven as pilot, and fifteen as pre-deployment. 2 By AI classification, twenty-one are classical or predictive machine learning, fifteen are natural-language processing, and two are generative AI. 2 The two generative cases are FRB-0021, a Body Worn Cameras Data Management System in pilot under the Inspector General, and FRB-0054, a Virtual Benefits Assistant deployed in Management/HR. 2 Zero use cases are flagged as high-impact. 2 That last figure is a screening outcome, not a proof of absence of risk, and we return to its honest reading in Section 7. The point for now is structural: the inventory does not merely count systems, it characterizes each one against a fixed set of fields.
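Each breakdown partitions the same rows, so each one should sum to thirty-eight. A small Python consistency check turns that expectation into a test. It assumes every use case carries exactly one stage, one classification, and a yes-or-no high-impact flag. The helper name is our own:

```python
TOTAL_USE_CASES = 38

# Breakdowns as reported for the 2026-01-28 snapshot (figures from this paper).
BREAKDOWNS = {
    "stage": {"deployed": 16, "pilot": 7, "pre-deployment": 15},
    "classification": {"classical/predictive ML": 21, "NLP": 15, "generative": 2},
    "high_impact": {"flagged": 0, "not flagged": 38},
}


def check_breakdowns(total: int, breakdowns: dict[str, dict[str, int]]) -> list[str]:
    """Return the names of breakdowns whose categories do not sum to the total."""
    return [name for name, counts in breakdowns.items() if sum(counts.values()) != total]


print(check_breakdowns(TOTAL_USE_CASES, BREAKDOWNS))  # []
```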

Those fields are the contribution. The inventory schema carries twenty-six metadata columns per use case. 2 Read in order, the principal fields record a use case identifier and name; the owning agency and component; whether the entry is withheld from public reporting; the stage of development; whether the use case is high-impact and, if not, the justification; the topic area; the AI classification, which explicitly distinguishes classical or predictive machine learning, natural-language processing, and generative AI; the problem the system is intended to solve; the expected benefits; a description of the system's outputs; the operational or pilot start date; whether the system was purchased from a vendor, developed under contract, or built in-house; the vendor name; whether an Authorization to Operate exists; the system name; the data used to train, fine-tune, or evaluate the model; a link to any Federal Data Catalog entry; whether the use case involves personally identifiable information; a link to any Privacy Impact Assessment; and whether demographic variables are used as model features. 2 The remaining columns include the custom-code and open-source-link fields that reappear in the Section 3 mapping.
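One way to make the schema concrete is to express it as a typed record. The Python sketch below covers only the fields enumerated above. The field names, enum values, and single validation rule are our own illustrative choices, not the Fed's column headers:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Stage(Enum):
    PRE_DEPLOYMENT = "pre-deployment"
    PILOT = "pilot"
    DEPLOYED = "deployed"


class AIClassification(Enum):
    CLASSICAL_PREDICTIVE_ML = "classical/predictive machine learning"
    NLP = "natural-language processing"
    GENERATIVE = "generative AI"


class Sourcing(Enum):
    VENDOR = "purchased from a vendor"
    CONTRACT = "developed under contract"
    IN_HOUSE = "built in-house"


@dataclass(frozen=True)
class InventoryRecord:
    """Illustrative record covering the inventory fields enumerated in the text."""
    use_case_id: str
    name: str
    agency: str
    component: str
    withheld: bool
    stage: Stage
    high_impact: bool
    high_impact_justification: Optional[str]  # required when high_impact is False
    topic_area: str
    classification: AIClassification
    problem_solved: str
    expected_benefits: str
    outputs: str
    start_date: Optional[str]  # ISO 8601 date; assumed absent before operation
    sourcing: Sourcing
    vendor_name: Optional[str]
    has_ato: bool
    system_name: Optional[str]
    training_eval_data: str
    data_catalog_link: Optional[str]
    involves_pii: bool
    pia_link: Optional[str]
    uses_demographic_features: bool

    def __post_init__(self) -> None:
        # Mirrors the schema's rule: an entry screened out of high-impact must say why.
        if not self.high_impact and not self.high_impact_justification:
            raise ValueError(f"{self.use_case_id}: non-high-impact entry lacks a justification")
```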

Each of those fields exists because it answers a governance question someone will eventually ask. Stage of development answers: is this in production, and since when? AI classification answers: what kind of model is this, and does it fall in a category that warrants extra scrutiny? The training-and-evaluation-data field answers: what went into it, and can its provenance be traced? The vendor-versus-in-house field answers: who built it, and does a third-party documentation chain exist? Authorization to Operate answers: was it approved to go live, and by whom? The high-impact screen answers: how material is it? A regulator did not choose these fields to satisfy curiosity. It chose them because they are the minimum facts required to govern a portfolio of models, which is precisely the task it sets for the banks it supervises.

A regulator has already written down, and made public, the exact fields its supervised institutions must be able to fill. The form is free. The discipline of filling it is the work.

The disclosure schema is a governance schema

Section 02

What a Model-Risk Program Must Evidence

On April 17, 2026, the Federal Reserve, the Office of the Comptroller of the Currency, and the FDIC jointly issued SR 26-2, Revised Guidance on Model Risk Management, which supersedes and replaces both SR 11-7 from 2011 and the SR 21-8 BSA/AML statement. 7 SR 11-7 built the canon (validation, ongoing monitoring, effective challenge, and a model inventory 9). We name it here only as the rescinded predecessor; the argument rests entirely on SR 26-2 and its companion OCC Bulletin 2026-13. 8 The revised guidance is principles-based and risk-proportionate: overall model risk is inherent risk assessed in the context of materiality, where materiality is a function of exposure and purpose. 10 It is primarily directed at institutions above thirty billion dollars in total assets but scales by model risk. It is also non-binding guidance rather than a rule: non-compliance does not, by itself, result in supervisory criticism. 10

SR 26-2 preserves the three-pillar architecture of its predecessor while adding a dedicated vendor and third-party section. 7 The first pillar, develop, covers model development, implementation, and use: design documentation, data inputs and provenance, assumptions, the intended-use scope and prohibited uses, pre-deployment testing, third-party documentation, and change control. 10 The second pillar, validate, covers conceptual soundness, outcomes analysis through back-testing, and ongoing monitoring on a risk-based cadence, including drift detection; under SR 26-2 the independence of validation is framed in terms of rigor rather than reporting-line structure. 10 The third pillar, govern, covers board and senior-management accountability, a written model-risk policy, effective challenge, the model inventory itself, and internal audit. Under the revised guidance, internal audit must not duplicate validation; it must instead assess whether the model-risk program is rigorous and effective. 8

Governance assigns those pillars to three lines of defense, following the IIA's 2020 Three Lines Model. 11 The first line is the model owners, developers, and business units who build and run models. The second line is independent model-risk management and validation. The third line is internal audit. The arrangement is the banking analogue of an independent checkpoint. Aviation reaches the same structural pattern through independent verification and validation, arrived at separately because the problem forces it.

The spine of the whole apparatus is the model inventory. OCC Bulletin 2026-13 names the fields a comprehensive inventory is expected to carry: model name, model type, model purpose, model owner, the organizational unit responsible, development methodology, key inputs and assumptions, outputs produced, risk-classification level, validation status, validation date, independent-review completion, and known limitations or exceptions. 8 Industry practice extends the list with version, developer source, materiality, monitoring thresholds and cadence, change history, production status, and a vendor or black-box indicator. 12 A bank that cannot populate those fields, per model, on demand does not have a model inventory in the sense SR 26-2 means. It has a list.
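The test SR 26-2 implies is mechanical: for every model, can each required field be filled? The minimal Python sketch below uses snake_case keys of our own in place of the OCC Bulletin 2026-13 field names, and it treats a missing or blank value as a failure. An explicit `False` for independent review counts as populated, because "not yet reviewed" is itself a recorded fact:

```python
OCC_INVENTORY_FIELDS = (
    "model_name", "model_type", "model_purpose", "model_owner",
    "responsible_unit", "development_methodology", "key_inputs_and_assumptions",
    "outputs_produced", "risk_classification", "validation_status",
    "validation_date", "independent_review_complete", "known_limitations",
)


def missing_fields(record: dict) -> list[str]:
    """Fields that are absent, None, or blank in one model's inventory entry."""
    missing = []
    for name in OCC_INVENTORY_FIELDS:
        value = record.get(name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(name)
    return missing


def is_inventory(records: list[dict]) -> bool:
    """True only if every model's entry is fully populated: an inventory, not a list."""
    return all(not missing_fields(r) for r in records)
```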

Section 03

The Read-Across

Set the Fed inventory's twenty-six fields beside the SR 26-2 model inventory's required fields and the correspondence is not approximate — it is field-level. This is the paper's spine: the supervisor's public disclosure schema is a working template for the model inventory it requires of banks. The mapping below pairs each inventory field with the SR 26-2 pillar it serves and the bank model-inventory field it instantiates. The frame stays constructive throughout: the value is the isomorphism of fields, which we develop here, not a claim that the two regimes share a mandate, which Section 6 explicitly disclaims.

Fed inventory field	SR 26-2 pillar	Bank model-inventory analog
Use Case ID / Name	Govern	Model ID and name in the inventory
Stage of Development	Develop	Development stage / production status
AI Classification (incl. generative) ‡	Develop	Model type
Problem solved / Outputs description	Develop	Model purpose / outputs produced
Training / fine-tune / eval data description	Develop	Key inputs / data sources / provenance
Vendor vs in-house / Vendor name	Develop (vendor section)	Developer source / vendor
Authorization to Operate (ATO)	Validate / Govern	Validation and approval status
Is high-impact? + justification	Validate	Risk classification / materiality
PII involved? / PIA link	Validate (impact assessment)	Impact assessment / data sensitivity
Custom code / open-source link	Develop	Implementation platform / code provenance
Human oversight (M-25-21 minimum practice) †	Validate	Human-oversight posture
Ongoing monitoring (M-25-21 minimum practice) †	Govern	Ongoing monitoring

† The final two rows are not among the inventory's twenty-six schema columns. They derive from the M-25-21 minimum risk-management practices the inventory presupposes — human oversight and ongoing monitoring — rather than from a disclosed column, and are included here because they complete the govern-pillar mapping. 3 The field-by-field correspondence claim applies to the schema columns above the marked rows.

‡ The AI-classification field is a model-type tag the present reference implementation does not natively capture: its decision logging records the intent of a request, not a model family or an is-generative flag, so the inventory's classification distinction cannot be reproduced from current trace data without a schema extension. This gap is stated in full in Section 7; the row is retained because the bank model-inventory field exists and is the right target — the evidence backing is what is missing.

Read column by column, the table says something specific. The develop pillar is the densest, because the Fed inventory was built to characterize how a system came to be: its classification, its purpose, its data, its provenance, and its code are all develop-pillar facts, and the inventory captures each one. The validate pillar is carried by the high-impact screen and its justification — a materiality call by another name — together with the PII-and-PIA pairing, which is the inventory's impact-assessment hook. The govern pillar is carried by the identity fields and by the M-25-21 minimum practices the inventory presupposes: human oversight and ongoing monitoring, which the Fed requires of its own high-impact AI and which map directly onto a bank's oversight posture and monitoring obligation. 3
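Encoding the table as data lets a reader check the "densest pillar" observation directly. The Python sketch below is our own transcription of the table above. The ATO row serves two pillars and is counted under both. The two † rows are flagged so they can be excluded from the field-by-field claim:

```python
from collections import defaultdict

# (Fed inventory field, SR 26-2 pillars, bank analog, is a published schema column)
READ_ACROSS = [
    ("Use Case ID / Name", ("Govern",), "Model ID and name", True),
    ("Stage of Development", ("Develop",), "Development stage / production status", True),
    ("AI Classification", ("Develop",), "Model type", True),
    ("Problem solved / Outputs", ("Develop",), "Model purpose / outputs produced", True),
    ("Training / fine-tune / eval data", ("Develop",), "Key inputs / provenance", True),
    ("Vendor vs in-house / Vendor name", ("Develop",), "Developer source / vendor", True),
    ("Authorization to Operate", ("Validate", "Govern"), "Validation and approval status", True),
    ("Is high-impact? + justification", ("Validate",), "Risk classification / materiality", True),
    ("PII involved? / PIA link", ("Validate",), "Impact assessment / data sensitivity", True),
    ("Custom code / open-source link", ("Develop",), "Implementation / code provenance", True),
    ("Human oversight", ("Validate",), "Human-oversight posture", False),  # † row
    ("Ongoing monitoring", ("Govern",), "Ongoing monitoring", False),      # † row
]


def fields_by_pillar(rows, schema_columns_only: bool = False) -> dict[str, list[str]]:
    """Group Fed fields under each pillar they serve, optionally dropping † rows."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for fed_field, pillars, _analog, is_column in rows:
        if schema_columns_only and not is_column:
            continue
        for pillar in pillars:
            grouped[pillar].append(fed_field)
    return dict(grouped)


for pillar, names in fields_by_pillar(READ_ACROSS).items():
    print(pillar, len(names))
```

On this transcription, develop carries six rows, validate four, and govern three, with the ATO row counted in both of the latter. Dropping the two † rows leaves develop at six and reduces validate to three and govern to two.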

The correspondence is not merely conceptual; it is reproducible in a running system. In the keller-platform reference implementation, the identity and lifecycle fields are backed by an append-only decision trace store (src/server/src/agents/kai/integrations/decision_tracing/store.py) and a project lifecycle whose stages are typed in src/server/src/database/types.py. As a result, the inventory's Use Case ID and Stage of Development fields are not free-text cells but the keys of a durable record. The training-and-evaluation-data field maps to observation citations carried on each decision (src/server/src/agents/kai/integrations/decision_tracing/tools.py), the vendor-versus-in-house field maps to a governance-pack source attribute (src/server/src/database/types.py), and the Authorization-to-Operate field maps to project-approval events (src/server/src/database/types.py). The point is not that this particular platform is required. It is that each inventory field has a concrete evidence backing, which Section 5 develops, and Section 7 states plainly where that backing does not yet exist.
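A bank can keep the same kind of map for its own inventory. The Python sketch below transcribes the backings this paper names for keller-platform; the dictionary keys are our own labels. The fields that Section 7 identifies as unbacked are marked `None`, so the gap list is derived from the map rather than maintained by hand:

```python
from typing import Optional

# Evidence source per inventory field in the reference implementation, as this
# paper describes it; None marks a field with no native backing today.
EVIDENCE_BACKING: dict[str, Optional[str]] = {
    "use_case_id": "src/server/src/agents/kai/integrations/decision_tracing/store.py",
    "stage_of_development": "src/server/src/database/types.py",
    "training_eval_data": "src/server/src/agents/kai/integrations/decision_tracing/tools.py",
    "vendor_vs_in_house": "src/server/src/database/types.py",
    "authorization_to_operate": "src/server/src/database/types.py",
    "ai_classification": None,
    "involves_pii": None,
    "demographic_features": None,
}


def unbacked_fields(backing: dict[str, Optional[str]]) -> list[str]:
    """Inventory fields that cannot yet be populated from recorded evidence."""
    return sorted(name for name, source in backing.items() if source is None)
```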

The deeper reading is that the regulator and the supervised institution face the same problem and the same decomposition. A portfolio of models has to be characterized, classified by consequence, traced to its inputs, and approved before use — and the facts that decomposition needs are a small, fixed set. The Fed enumerated them for its own disclosure. A bank can lift the enumeration wholesale. The two related KellerAI papers treat the upstream evidence problem — what makes a decision trail auditable in the first place 15 and how classification becomes enforcement rather than dashboard theater 16 — and this read-across stands on both.

Section 04

The Generative Carve-Out

SR 26-2 does something its 2011 predecessor never had to: it draws a boundary around generative and agentic AI. The revised guidance places those models outside the formal scope of model-risk management in plain terms — “Generative AI and agentic AI models are novel and rapidly evolving. As such, they are not within the scope of this guidance” — and directs banks to rely on broader risk-management practices for those systems instead. 7 8 The reasoning is defensible — these architectures do not fit the back-testing-and-benchmark mold the guidance was built around — but it leaves a conspicuous gap. The fastest-moving category of AI a bank is likely to deploy is the one the model-risk framework explicitly declines to govern in its own terms.

The Fed's inventory schema does not have that gap. Its AI classification field already distinguishes generative AI as a first-class category, and two of the Fed's thirty-eight use cases are recorded under it. 2 The supervisor, in other words, has built a disclosure field for precisely the AI category its supervisory guidance carves out. That is not a contradiction; the two documents serve different purposes. But it is an opportunity. A bank told by SR 26-2 to handle generative AI under broader risk management still has to handle it somehow, and the inventory schema is a ready structure for doing so: classify the system as generative, record its purpose and outputs, trace its training and evaluation data, capture its vendor provenance, and run it through the same impact screen as everything else.

The guidance carves generative AI out of formal model-risk scope. The supervisor's own inventory already flags it. The disclosure field is a ready template for evidencing exactly the category the framework defers.

The carve-out and the field

This matters more as the deployed category grows. The serving-model substitution problem (a request addressed to one model is answered by another inside a single call) is a live governance question for generative systems. A companion KellerAI paper treats it as a model change that existing model-risk vocabulary already knows how to handle. 14 The read-across reinforces that conclusion from the regulator's side: the inventory field for generative classification, paired with the data-provenance and output-description fields, is the minimum record a bank needs to bring a generative system inside the same governance perimeter as its predictive models, even while SR 26-2 leaves the formal validation methodology to broader practice. The supervisor wrote the field. The framework deferred the methodology. The bank can use the first to discharge the second.
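If substitution is a model change, it can be recorded as one. The minimal Python sketch below assumes the response carries an identifier for the model that actually answered; the class and function names are hypothetical. When the requested and served models differ, it emits an explicit change event instead of leaving the mismatch as a silent detail of the call:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelCall:
    request_id: str
    requested_model: str  # model the caller addressed
    served_model: str     # model identifier reported back with the response


@dataclass(frozen=True)
class ModelChangeEvent:
    request_id: str
    from_model: str
    to_model: str


def detect_substitution(call: ModelCall) -> Optional[ModelChangeEvent]:
    """Treat a served model that differs from the requested one as a model change."""
    if call.served_model != call.requested_model:
        return ModelChangeEvent(call.request_id, call.requested_model, call.served_model)
    return None
```

Exact string comparison is a deliberate simplification. A real check would need to normalize aliases and version suffixes before deciding that two identifiers name different models.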

Section 05

From Schema to Evidence

A field is only as good as the trace behind it. A model inventory whose cells are typed in by hand, once, and never reconciled against the running system is a compliance artifact, not a governance one. The discipline that makes the read-across worth anything is the production of each field from a durable, tamper-evident record rather than from recollection — the difference between a description of what a system does and evidence of what it did. The principles that make such a trail genuinely auditable are the subject of a companion paper; 15 here we map the three SR 26-2 pillars onto concrete evidence mechanisms, grounded in the keller-platform reference implementation.
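Tamper evidence has a standard minimal form: hash chaining. The Python sketch below illustrates the technique; it does not describe how keller-platform's store is built. Each entry's SHA-256 hash covers the previous entry's hash, so editing any past record breaks verification from that point forward:

```python
import hashlib
import json

GENESIS = "0" * 64


def _digest(prev_hash: str, payload: dict) -> str:
    # Canonical JSON so that logically equal payloads hash identically.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class HashChainedLog:
    """Append-only log in which each entry commits to its predecessor's hash."""

    def __init__(self) -> None:
        self._entries = []  # list of (payload, hash) pairs, oldest first

    def append(self, payload: dict) -> str:
        entry_hash = _digest(self.head(), payload)
        self._entries.append((dict(payload), entry_hash))
        return entry_hash

    def head(self) -> str:
        """Hash of the latest entry; anchor this outside the writer's reach."""
        return self._entries[-1][1] if self._entries else GENESIS

    def verify(self) -> bool:
        prev = GENESIS
        for payload, stored in self._entries:
            if _digest(prev, payload) != stored:
                return False
            prev = stored
        return True
```

Chaining only detects edits if the latest hash is also kept somewhere the writer cannot rewrite. Otherwise, someone who can alter the log can recompute the whole chain.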

Develop. The develop pillar asks what a model is, what it consumes, and how it changes. In the reference implementation, the inputs behind each decision are carried as observation citations rather than as free text (src/server/src/agents/kai/integrations/decision_tracing/tools.py). Change control is enforced by write-validation and restricted-execution hooks (src/server/src/agents/kai/hooks.py) that refuse unauthorized mutations rather than logging them after the fact. Provenance becomes a property of the record, not a recollection appended to it. One develop-pillar field has no native backing today: the inventory's AI-classification tag (the model family, including the generative distinction) is not captured in the current trace schema. Section 7 states that gap in full rather than papering over it with a citation that would not bear it.
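A citation counts as provenance only if the record refuses to exist without it. The Python sketch below is illustrative: the `Decision` type, the `ProvenanceError` exception, and the known-observation set are assumptions, not keller-platform's API. It rejects a decision that cites nothing, or that cites an observation the store has never seen, rather than accepting the decision and flagging it later:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    decision_id: str
    summary: str
    cited_observation_ids: tuple[str, ...]


class ProvenanceError(ValueError):
    """Raised when a decision's inputs cannot be traced to recorded observations."""


def validate_citations(decision: Decision, known_observations: set[str]) -> None:
    """Refuse a decision with no citations or with citations to unknown observations."""
    if not decision.cited_observation_ids:
        raise ProvenanceError(f"{decision.decision_id}: no observations cited")
    unknown = [o for o in decision.cited_observation_ids if o not in known_observations]
    if unknown:
        raise ProvenanceError(f"{decision.decision_id}: unknown observations {unknown}")
```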

Validate. The validate pillar asks whether a model was checked before use and is watched after. The reference implementation gates consequential decisions through a Stop-hook validation gate (src/server/src/agents/kai/integrations/decision_tracing/integration.py, lines 72–84) — an independent checkpoint invoked before a decision is committed, the architectural analogue of effective challenge and the four-eyes requirement. This is the classification-to-enforcement move a companion paper develops at length: a check that blocks rather than merely observes is the difference between governance and observability theater. 16 Ongoing validation is carried by compliance drift scanning whose results land as append-only finding events (src/server/src/services/drift/orchestrator.py; src/server/src/database/types.py), so a drift finding is a fact in the record, not a transient dashboard reading.
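The difference between a gate and a monitor fits in a few lines. The Python sketch below uses hypothetical function, exception, and validator names and is not the code at the cited lines. Its validators run before the commit, and any objection prevents the write. A monitor, by contrast, would append first and report afterward:

```python
from typing import Callable, Optional

# A validator inspects a proposed decision and returns a reason to block it, or None.
Validator = Callable[[dict], Optional[str]]


class BlockedDecision(Exception):
    """Raised when a pre-commit check objects; the decision is never written."""


def commit_with_gate(decision: dict, validators: list[Validator], ledger: list[dict]) -> None:
    objections = []
    for validate in validators:
        reason = validate(decision)
        if reason is not None:
            objections.append(reason)
    if objections:
        raise BlockedDecision("; ".join(objections))
    ledger.append(decision)  # reached only when every check has passed


def requires_owner(decision: dict) -> Optional[str]:
    """Example check: every committed decision needs a named accountable owner."""
    return None if decision.get("owner") else "no accountable owner"
```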

Govern. The govern pillar asks who is accountable and whether the inventory is real. The identity and approval fields are backed by the append-only decision trace store (src/server/src/agents/kai/integrations/decision_tracing/store.py) and by project-approval events (src/server/src/database/types.py), so that the Authorization-to-Operate analog is an event with an actor and a timestamp rather than a checkbox. Because the underlying stores are append-only, the govern-pillar record supports the one thing a model inventory most needs and most often lacks: a change history an independent reviewer can replay. The inventory field says this was approved; the trace says by whom, when, and against what evidence.
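Replaying an append-only event stream is what turns "this was approved" into "approved by whom, when, and against what evidence." The minimal Python sketch below assumes approvals and revocations are recorded as timestamped events; the event shape is hypothetical. It reconstructs the approval in force for a model at any past moment:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ApprovalEvent:
    model_id: str
    action: str        # "approved" or "revoked"
    actor: str
    at: datetime
    evidence_ref: str  # pointer to the validation evidence behind the action


def approval_state_at(
    events: list[ApprovalEvent], model_id: str, when: datetime
) -> Optional[ApprovalEvent]:
    """Replay events in time order; return the approval in force at `when`, if any."""
    current = None
    for event in sorted(events, key=lambda e: e.at):
        if event.model_id != model_id or event.at > when:
            continue
        current = event if event.action == "approved" else None
    return current
```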

The mapping is deliberately concrete because the failure mode is concrete. Banks already maintain model inventories; what they frequently lack is the evidence chain that makes each field defensible under independent challenge. The Fed's schema tells a bank which fields to carry. The evidence discipline tells it how to carry each one so that the field survives an auditor who asks, “show me.” The schema is the form; the trace is the proof; neither is sufficient alone.

Section 06

Not a Gotcha

It would be easy, and wrong, to read this paper as an accusation: the Fed demands of banks a discipline it exempts itself from. The record does not support that reading, and we will not make it. The Fed governs its own AI under OMB Memorandum M-25-21, the federal-agency AI governance mandate, with named structures: a Chief AI Officer, Anderson Monken, who holds approval authority for high-impact AI; an AI Program Team that administers policy and maintains the inventory; an AI Enablement Working Group; and a Technology Oversight Committee that reviews enterprise AI investment. 3 4 For high-impact AI, the Fed's published compliance plan specifies minimum risk-management practices: impact assessments, explicit CAIO approval, human-in-the-loop oversight, ongoing monitoring, annual waiver recertification, technical controls to detect and terminate noncompliant use, and documentation and validation. 3 That is a model-risk discipline applied to the Fed's own systems.

The obvious rebuttal to the read-across is therefore that M-25-21 and SR 26-2 serve different mandates, and the rebuttal is correct. M-25-21 is an executive-branch directive governing how federal agencies use their own AI. 5 SR 26-2 is interagency supervisory guidance addressed to the banks the agencies oversee. 7 One is an agency governing itself; the other is a supervisor guiding the supervised. They rest on different legal authorities, pursue different objectives, and bind different parties. Nothing in this paper claims otherwise, and any argument that depended on their equivalence would collapse on contact with that distinction.

The contribution survives the distinction precisely because it never depended on equivalence. The claim is schema isomorphism, not mandate equivalence. Two regimes with entirely different legal foundations converged on the same small set of fields because both face the same underlying task: characterizing a portfolio of AI systems well enough to govern it. That convergence is evidence that the fields are not arbitrary. They appear to be a recurring core of the metadata that governing a model portfolio requires, at least across the regimes examined here. A bank does not adopt the Fed's schema because the Fed is bound by the rules it applies to banks. It adopts the schema because the Fed, solving its own version of the same problem, already did the work of enumerating the fields, and published the result.

Section 07

The Honest Gaps

A read-across that only flattered both schemas would not be worth printing. Two sets of limits bound the argument, and stating them is part of the discipline. The first set concerns what the inventory schema does and does not capture. The second concerns what the reference implementation does not yet evidence.

The inventory schema has real boundaries. It carries no separate rights-impacting or safety-impacting flag of the kind some agency inventories maintain; impact is collapsed into the single high-impact screen and its justification. 2 And the zero-high-impact count is a screening outcome, not a proof that no use case carries material risk. A screen returns zero when no entry trips its threshold; that is a statement about the threshold and the entries, not a guarantee about the world. A bank lifting this schema inherits the same limit: the high-impact field is only as protective as the materiality criterion behind it, and SR 26-2's materiality call — exposure and purpose — is exactly where the judgment lives. 10 The schema tells you to make the call. It does not make it for you.
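The point that a zero is a fact about the threshold is easy to demonstrate. In the Python sketch below, the ordinal scores, the `max` combination rule, and the thresholds are all assumptions for illustration. The same hypothetical portfolio screens to zero or to two, depending only on where the line is drawn:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    exposure: int  # hypothetical ordinal score, 1 (low) to 5 (high)
    purpose: int   # hypothetical ordinal score, 1 (low) to 5 (high)


def materiality(entry: Entry) -> int:
    # Assumption: take the higher of the two scores. SR 26-2 names exposure and
    # purpose as inputs but this sketch does not claim it prescribes any formula.
    return max(entry.exposure, entry.purpose)


def high_impact_count(entries: list[Entry], threshold: int) -> int:
    """Number of entries whose materiality meets or exceeds the threshold."""
    return sum(1 for e in entries if materiality(e) >= threshold)


portfolio = [Entry("a", 2, 3), Entry("b", 3, 1), Entry("c", 1, 2)]
print(high_impact_count(portfolio, threshold=4))  # 0
print(high_impact_count(portfolio, threshold=3))  # 2
```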

The reference implementation has gaps of its own, and they map onto specific inventory fields. There is no explicit PII field in the decision-trace schema — no pii_involved boolean or equivalent exists in the trace types — so the inventory's PII-and-PIA pairing cannot be reproduced from the current record without a schema extension. There is no demographic-variable or fair-lending feature flag, so the inventory's demographic-variables field — demographic variables used as model features — has no analog in the present implementation; for a lending model that omission is material, and we name it rather than paper over it. The AI-classification logging does not carry a model-family or is-generative flag (src/server/src/agents/kai/agent.py), so the inventory's generative distinction — the one Section 4 leans on — cannot be reconstructed from current trace data without extending the schema. And, to forestall a conflation we have seen elsewhere: there is no OPA/Rego policy engine inside keller-platform. The Rego conformance gates referenced in KellerAI's governance writing live in the whitepaper build pipeline, not in keller-platform; the two should not be merged in the reader's mind.
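The extensions are small, but one design choice matters. In the Python sketch below, every new field defaults to `None` rather than `False`. The field names are hypothetical, not keller-platform's types. With these defaults, a legacy trace loads as "not captured" instead of silently asserting that no PII or demographic feature was involved:

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class TraceRecordExtension:
    """Hypothetical additions to a decision-trace record. None means 'not captured',
    which is deliberately distinct from False ('captured, and absent')."""
    pii_involved: Optional[bool] = None
    demographic_features: Optional[tuple[str, ...]] = None
    model_family: Optional[str] = None
    is_generative: Optional[bool] = None


def from_legacy(raw: dict) -> TraceRecordExtension:
    """Load an older trace that predates the extension without inventing values."""
    known = {f.name for f in fields(TraceRecordExtension)}
    return TraceRecordExtension(**{k: v for k, v in raw.items() if k in known})


def uncaptured(ext: TraceRecordExtension) -> list[str]:
    """Extension fields for which this trace holds no evidence either way."""
    return [f.name for f in fields(ext) if getattr(ext, f.name) is None]
```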

These gaps do not weaken the read-across; they locate it. The schema isomorphism is a claim about fields, and it holds at the field level. Whether a given system can populate every field is a separate, implementation-specific question, and the honest answer for the reference implementation is: most of them, today, and the rest require named, bounded schema extensions. That is the correct shape for a governance claim — the residual is relocated to the specification, where it is visible and fixable, rather than hidden in a confident summary.

Section 08

Adopting the Mirror

The practical recommendation is direct: lift the twenty-six-field schema as the skeleton of a model-inventory template. The fields are already public, already enumerated by a regulator, and already shown to characterize a real portfolio of thirty-eight systems. 1 A bank starting from a blank model-inventory specification can begin from the Fed's columns rather than from first principles. It can then extend them with the SR 26-2 and OCC Bulletin 2026-13 inventory fields the disclosure schema does not carry: risk-classification level, validation date, independent-review completion, and known limitations. 8 The union of the two lists is close to a complete model-inventory specification.

Each field should then be tagged to a pillar and wired to an evidence source, exactly as Section 5 maps it. Tag the classification, purpose, data, and provenance fields to develop, and source them from the decision-and-citation record. Tag the high-impact screen, the impact-assessment pairing, and the human-oversight field to validate, and source them from the pre-commit checkpoint and the screening criterion. Tag the identity, approval, and monitoring fields to govern, and source them from append-only approval and finding events. The mapping turns a static form into a living inventory whose every cell has a provenance an auditor can follow.

Seen this way, the Fed inventory takes its place beside the other regulator-authored evidence schemas a governed AI program already contends with. The EU AI Act's Annex IV specifies the technical documentation a high-risk system must maintain, a second regulator-authored schema for the same underlying problem. 13 A companion KellerAI paper treats it as exactly that. 17 The Fed's inventory is a third such schema, authored by a different regulator under a different mandate, converging on much the same fields. When three regulatory regimes enumerate substantially the same metadata, that metadata is unlikely to be a compliance artifact peculiar to one rule. This holds even though the three are not fully independent samples, since the Federal Reserve both publishes the inventory and co-issued SR 26-2. The shared metadata reads as a recurring core of what governing an AI system requires, and the convergence looks structural rather than coincidental.

The supervisor published the mirror. It enumerated, for its own disclosure, the exact fields by which an AI portfolio is characterized, classified, traced, and approved — and made the enumeration public. A regulated bank does not need to ask what its model-risk inventory should contain. It can look into the supervisor's mirror and read the fields back. The form is free; the discipline of filling each field with evidence that survives independent challenge is the work that remains — and it is the only part that was ever the point.

Related KellerAI papers: The EU AI Act's August Enforcement treats Annex IV as a parallel regulator-authored evidence schema; The Audit You Can Audit sets out the principles that make a decision trail auditable; and From Observability to Action develops the classification-to-enforcement move this paper's evidence discipline depends on.
