The agent cannot gate itself.
Every production agent operates inside an authorization gap. The model knows what it was asked to do. It does not know what it is permitted to do — not with the rigor a financial or infrastructure control requires. Prompt instructions are not access policies. A system prompt that says "only process refunds under $500" is a suggestion the model can misread, misapply, or be manipulated into ignoring.
The gap matters most at the edges: high-value transactions, irreversible operations, cross-tenant actions. These are the moments when an authorization error is most costly, and also the moments when a reasoning model facing adversarial or ambiguous input is least reliable as its own enforcer.
This is not an indictment of the model. It is an architectural observation. The model is the wrong layer for authorization. The right layer is an out-of-process gate that the model cannot override — one that evaluates identity, permission scope, and resource context before any external action is executed.
Three enforcement layers, composed.
A working access-control regime for agents does not look like a single permission flag. It looks like three enforcement layers that must all pass before an action reaches the external world.
Identity. Every agent call carries a verifiable identity — not a session token the model generated, but a cryptographically bound credential issued at provisioning time. The identity determines which capability tier the agent operates in and which workspaces it can touch.
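A minimal sketch of the identity check, assuming an HMAC-signed credential as a stand-in for whatever cryptographic binding a real deployment uses (asymmetric signatures or mutual TLS would be equally plausible). The key, the claim fields, and the function names are all hypothetical. The point it illustrates is that the gate, not the model, decides whether the claimed tier and workspaces are genuine.

```python
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical secret shared by the provisioning service and the gate.
# The model never sees it, so it cannot mint or alter a credential.
PROVISIONING_KEY = b"demo-key-not-for-production"


def _sign(claims: dict) -> str:
    """Compute the HMAC-SHA256 tag over a canonical JSON encoding of the claims."""
    payload = json.dumps(claims, sort_keys=True).encode()
    return hmac.new(PROVISIONING_KEY, payload, hashlib.sha256).hexdigest()


def issue_credential(agent_id: str, tier: str, workspaces: list[str]) -> dict:
    """Issue a credential at provisioning time (illustrative format only)."""
    claims = {"agent_id": agent_id, "tier": tier, "workspaces": sorted(workspaces)}
    return {"claims": claims, "tag": _sign(claims)}


def verify_credential(credential: dict) -> Optional[dict]:
    """Return the claims if the tag matches them, otherwise None."""
    try:
        claims = credential["claims"]
        tag = credential["tag"]
        expected = _sign(claims)
    except (KeyError, TypeError):
        return None
    if not isinstance(tag, str):
        return None
    # Constant-time comparison over bytes avoids leaking how much of the tag matched.
    if not hmac.compare_digest(expected.encode(), tag.encode()):
        return None
    return claims
```

A credential whose tier has been edited after issuance fails verification, because the tag no longer matches the claims.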
Permission. Capabilities are not granted individually. They compose through a conjunctive lattice: an agent that holds payments:write and accounts:read can initiate a refund, but only if the target account is in its assigned workspace. No single permission grants cross-workspace reach. The conjunction is enforced by the gate, not asserted by the model.
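The conjunction described above can be sketched as a set-containment check plus a workspace check. This is an illustration under assumptions: the `AgentGrant` type, the `REQUIRED` table, and the `refund` action name are hypothetical, and the real lattice is not specified here. Only the permission names `payments:write` and `accounts:read` come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentGrant:
    """What the gate knows about an agent (hypothetical shape)."""
    permissions: frozenset[str]
    workspaces: frozenset[str]


# Hypothetical table: each action requires ALL of the listed permissions.
REQUIRED: dict[str, frozenset[str]] = {
    "refund": frozenset({"payments:write", "accounts:read"}),
}


def permission_allows(grant: AgentGrant, action: str, target_workspace: str) -> bool:
    """Allow only if every required permission is held AND the target is in scope."""
    required = REQUIRED.get(action)
    if required is None:
        return False  # unknown actions are denied by default
    holds_all = required <= grant.permissions  # subset test is the conjunction
    in_scope = target_workspace in grant.workspaces
    return holds_all and in_scope
```

Holding both permissions is necessary but not sufficient: a refund against an account in another workspace is still refused, which is the "no single permission grants cross-workspace reach" property.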
Resource. Even a permitted action on a permitted resource is gated by context — transaction ceiling, rate limit, reversibility class. A DELETE on a soft-deletable record clears. The same call on a hard-delete path escalates to a human confirmation queue before execution.
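A sketch of the resource layer as a three-way decision. The field names, the `Decision` values, and the choice to deny (rather than escalate) an over-ceiling or over-rate call are assumptions made for illustration. The one behavior taken from the text is that an irreversible operation escalates to human confirmation instead of executing.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ESCALATE = "escalate"  # route to a human confirmation queue


@dataclass(frozen=True)
class ResourceContext:
    """Context the gate evaluates for one call (hypothetical fields)."""
    amount: float         # transaction value; 0 for non-monetary actions
    ceiling: float        # maximum value this tier may move in one call
    calls_in_window: int  # calls already made in the current rate window
    rate_limit: int       # maximum calls allowed per window
    reversible: bool      # True for soft delete, False for hard delete


def resource_decision(ctx: ResourceContext) -> Decision:
    """Apply rate limit, then ceiling, then reversibility class."""
    if ctx.calls_in_window >= ctx.rate_limit:
        return Decision.DENY
    if ctx.amount > ctx.ceiling:
        return Decision.DENY
    if not ctx.reversible:
        return Decision.ESCALATE
    return Decision.ALLOW
```

The ordering is a design choice in this sketch: hard limits are checked first so that a call already over its ceiling never reaches a human reviewer's queue.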
Every action that clears all three layers is written to an immutable trace. Not a log the agent writes — a trace the gate writes, independently, before returning the response. The trace is the audit artifact SR 26-2 governance requires.
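One way to approximate a gate-written trace is a hash chain, where each entry commits to the one before it. This is a sketch under assumptions: the `ActionTrace` class and its record fields are hypothetical, and an in-memory list is only tamper-evident, not immutable. A real deployment would need write-once storage outside the agent's reach.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


class ActionTrace:
    """Append-only, hash-chained trace owned by the gate (illustrative)."""

    def __init__(self) -> None:
        self._entries: list[dict] = []

    def append(self, record: dict) -> str:
        """Store a record and return its hash, which commits to all prior entries."""
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        body = json.dumps(record, sort_keys=True)
        digest = hashlib.sha256((prev + body).encode()).hexdigest()
        self._entries.append({"prev": prev, "record": body, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = GENESIS
        for entry in self._entries:
            expected = hashlib.sha256((prev + entry["record"]).encode()).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True
```

In this arrangement the gate would call `append` before returning its response to the agent, so no permitted action can complete without a trace entry.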
Silent scope expansion is the quiet failure mode.
The failure mode that access engineering is specifically designed to prevent is not dramatic. It does not announce itself. It looks like this: a stronger model replaces a weaker one, inheriting the same capability tier. The new model is better at reasoning, better at tool use, better at recovering from ambiguous instructions. It begins reaching further — not because its permissions changed, but because it is more capable of using the permissions it already has.
Scope expansion through model upgrade is not a security breach. No access control was violated. Every action the new model took was permitted. The problem is that the risk profile of the tier shifted without any deliberate human decision to expand it. The access regime was designed for a less capable actor and was never recalibrated.
This is the gap that today's deployments have not closed: there is no mechanism to measure the effective escape rate of a capability tier across model versions, and no backtest harness to simulate what a new model would have done with the same permissions against historical inputs. The in-depth companion documents both where these controls are implemented and where they are not yet in place.
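To make the missing backtest concrete, here is one hypothetical shape it could take. It does not define or compute the escape rate mentioned above. Instead it uses a simpler proxy, called "reach delta" here for illustration: the set of permitted actions a new model proposes on historical inputs that the old model never proposed. The `propose` callables stand in for model runs and `gate_allows` for a dry-run of the gate. All names are assumptions.

```python
from typing import Callable, Iterable

Action = tuple[str, str]  # (operation, resource); hypothetical shape
Proposer = Callable[[str], Iterable[Action]]


def reach(propose: Proposer,
          gate_allows: Callable[[Action], bool],
          inputs: Iterable[str]) -> set[Action]:
    """Distinct permitted actions a model proposes across historical inputs."""
    reached: set[Action] = set()
    for item in inputs:
        for action in propose(item):
            if gate_allows(action):  # dry run: nothing is executed
                reached.add(action)
    return reached


def reach_delta(old_propose: Proposer,
                new_propose: Proposer,
                gate_allows: Callable[[Action], bool],
                inputs: Iterable[str]) -> set[Action]:
    """Permitted actions the new model reaches that the old model never did."""
    history = list(inputs)  # materialize so both models see the same inputs
    return reach(new_propose, gate_allows, history) - reach(old_propose, gate_allows, history)
```

A non-empty delta means the tier's practical reach grew under unchanged permissions, which is the condition a human would need to review before promoting the new model.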
The in-depth: what the architecture actually looks like.
The patterns described here are not theoretical. They are drawn from a deployed system operating across payment, deployment, and data-deletion surfaces. The in-depth companion works through the full architecture: the 16-permission lattice, workspace isolation mechanics, action-trace schema, capability tier assignment, and the honest accounting of what is and is not yet instrumented.
It also covers the regulatory frame — where SR 26-2 model-risk governance intersects with agentic authorization, what a model-risk examiner would ask to see, and which of those artifacts the current architecture produces versus which remain gaps. If your organization is evaluating agent deployment in a supervised financial or healthcare context, the in-depth is the operative document.
Read the in-depth companion →