kellerai.blog

Who gets the unrestricted model?

KellerAI White Paper · Model Governance & Upgrades · Jun 2026

Context

In April 2026, Anthropic restricted its Mythos model to a vetted partner cohort, citing the damage it could do in the wrong hands. Two months later it shipped an equally powerful sibling, Claude Fable 5, to the general public — safe, by the vendor's account, because classifiers re-route risky requests — while Mythos 5 stayed gated behind Project Glasswing.

Anthropic's Responsible Scaling Policy names the mechanism separating the two products: a vetting process that acts as a compensating control when a partner's use case requires adjusting safeguards. When the safeguard can be lifted for the right customer, the safeguard is no longer the control. The vetting is.

The Finding

The vendor publishes the vetting program's outcomes — Anthropic reports roughly 50 initial partners found more than 10,000 high- or critical-severity flaws — but not the criteria that gate access, which it describes only as "our security requirements." The load-bearing safety control is asserted, not auditable.

The brief hands an executive the mirror-image discipline: a capability tier is a new data-sensitivity axis, governed with the access machinery your organization already runs — a tier registry that includes the fallback model you never chose, role mapping for people and service principals, a change-managed approval gate for tier escalation, and an incident path for tier changes you did not initiate.

Tags: Tiered model access, AI access governance, Frontier model safeguards

Paper Details
Category: Model Governance & Upgrades
Audience: Executives accountable for AI risk who will not read the technical companion
Method: Executive distillation of the in-depth companion's argument; all factual claims attributed in prose to their sources (Anthropic announcements and system card, CNBC, OpenAI program pages) and substantiated with full citations in the companion whitepaper.
Length: ~1,400 · 6 min
Sections: 5
Date: Jun 2026
Authors: KellerAI

KellerAI Brief · June 2026 · Frontier Tier Governance

When Access Is the Safeguard

Who gets the unrestricted model?

The frontier now ships in two boxes. On June 9, 2026, Anthropic released what it describes as one model in two configurations: Claude Fable 5, generally available with safeguards that block high-risk work, and Claude Mythos 5, the same reported capabilities with relevant safeguards lifted, restricted to vetted partners. Same reported model, same price. What separates the two products is not capability but vetting — which means access control just became the safety mechanism, and your side of that boundary is yours to govern.

Section 01

Two Boxes, One Frontier

On June 9, 2026, Anthropic released its newest frontier model as two products. The company describes them as two configurations of one model: Claude Fable 5, for general use, carrying safeguards that block tasks in high-risk domains such as biology and cybersecurity; and Claude Mythos 5, with the relevant safeguards lifted, available only to a small number of trusted partners. Anthropic states that the model's capabilities exceed those of any model it has ever made generally available.

Look at what does not separate the two products. Anthropic's platform documentation lists both at the same price ($10 per million input tokens, $50 per million output tokens) with the same context window. The same-underlying-model claim is Anthropic's own and cannot be inspected from outside, but the conclusion holds either way: the only thing the vendor sells differently is the safeguard posture, and the only thing separating the customers is a vetting decision.

The public tier's safeguards are disclosed, not hidden. Anthropic states that when Fable 5's classifiers detect a request touching cybersecurity, biology and chemistry, or model distillation, the response is handled by Claude Opus 4.8 instead, and that users are informed whenever this occurs. In the developer API the default is a block with a structured refusal; routing to the older model is an explicit opt-in. One safeguard in the same release works differently: Anthropic's system card reports a category covering frontier-LLM development that degrades capability with no fallback and no notification, at a reported ~0.03% of traffic — a figure no outside party can measure. That spectrum, from disclosed to invisible, is the subject of this series' first pair. This brief is about the boundary all of it sits inside.

When the same reported model is safe enough for everyone in one box and too dangerous for almost everyone in the other, the difference is not the model. It is the list of people allowed to open the second box.

Section 02

The Two-Month Flip

Here is the one story to hold onto. In April 2026, CNBC reports, Anthropic unveiled a model called Mythos that excelled at finding security flaws in software — and restricted it to a select group, explicitly because of concerns about the model's potential to do damage in the wrong hands. Access ran through Project Glasswing, Anthropic's program for vetted security partners. Two months later, the company shipped an equally powerful model to the general public as Fable 5, safe — by its own account — because classifiers re-route the risky requests.

The capability did not change in those eight weeks. The vendor's risk determination did. And the policy behind that determination is explicit about what carries the weight: Anthropic's Responsible Scaling Policy describes an enhanced due-diligence process for partners and states that this vetting acts as a compensating control when a use case requires adjusting safeguards. Compensating control is auditor language. It means the primary control is absent and something else is standing in for it. For the unrestricted tier, the something else is the vetting.

When the safeguard can be lifted for the right customer, the safeguard is no longer the control. The vetting is.

The claim

Now ask the auditor's next question: vetted against what? Anthropic reports the program's outcomes with precision — roughly 50 initial partners who found more than 10,000 high- or critical-severity security flaws, and an expansion announced June 2 to roughly 150 new organizations across more than 15 countries. The criteria those organizations passed are described, in full, as "our security requirements." They are not enumerated in any public document we could find. The outcome metric is published; the gate is not. When vetting substitutes for safeguards, the vetting criteria are the safety documentation — and for this tier, that documentation is one phrase long.

Section 03

You Already Run This Discipline

If this sounds like a novel governance problem, it is not. Your organization already decides who may touch production, who may see regulated data, who may approve a payment above a threshold. Each of those is an access boundary where the control is not the system's behavior but the list of people cleared to use it. A model capability tier is the same decision on a new axis. Deciding who may invoke a frontier model with safeguards lifted is structurally the decision you already make about who may read the files marked restricted.

And the tier structure is the industry's direction, not one vendor's experiment. OpenAI announced Trusted Access for Cyber in February 2026 — an identity- and trust-based framework for placing enhanced cyber capability in the right hands — and in April it shipped GPT-5.4-Cyber, a deliberately more permissive variant, to its highest tiers. Two vendors converged on identity-based tiered access within four months. One difference deserves executive attention: OpenAI states its vetting uses clear, objective criteria such as identity verification, while Anthropic's security requirements remain unenumerated. If your organization consumes frontier capability from both vendors, you are now subject to two differently shaped vetting regimes for the same class of capability — and only your own records will tell you which regime vetted which credential.

Section 04

The Four Questions

What should you commission? Four questions, each answerable with machinery your organization already runs. They are the discipline we recommend, not tooling anyone ships today — no published framework yet maps access control onto model capability tiers — which is exactly why the questions belong on your desk rather than in a procurement checklist.

1. The tier registry. Which model tiers does your organization consume? This one release yields three capability surfaces in a single vendor relationship: the public safeguarded tier (claude-fable-5), the restricted unsafeguarded tier (claude-mythos-5), and the fallback model that answers re-routed requests, a tier the vendor chose for you. A registry that omits that last entry is a list of the tiers you picked, not the tiers you run.

2. The role mapping. Who — person or service principal — may invoke each tier, and who granted it? A credential for the unrestricted model passed the vendor's due diligence; the internal grant of that credential to a team or an agent pipeline is your mirror of that vetting, and it deserves a named grantor and a review cycle.

3. The approval workflow. What change-management gate stands between a team and a tier escalation? Moving a workload from the safeguarded tier to the unrestricted one is a change in every sense your change-control process already recognizes. The gate exists; it needs the new column.

4. The incident path. When a workload is answered by a model or a safeguard posture you did not approve — the disclosed fallback, or the degradation Anthropic reports it cannot make visible — who is paged? A tier change you did not initiate is an incident, and an incident with no owner is a finding.
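The four questions above can be sketched as a small data model. What follows is a minimal illustration under stated assumptions, not tooling any vendor ships: the class and field names, the principal, grantor, and ticket values, and the fallback identifier `claude-opus-4-8` are all hypothetical (the source names the fallback only as Claude Opus 4.8). It shows a registry that includes the vendor-chosen fallback, grants that require a named grantor and a change ticket, and a check that flags any response served by a model other than the one requested.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Tier:
    model_id: str
    posture: str        # "safeguarded", "unrestricted", or "fallback"
    chosen_by: str      # "us" or "vendor"


@dataclass(frozen=True)
class Grant:
    principal: str      # person or service principal (question 2)
    model_id: str
    grantor: str        # named approver of the internal grant
    change_ticket: str  # change-management record (question 3)
    review_due: date    # last day the grant is valid before review


@dataclass
class TierGovernance:
    tiers: dict = field(default_factory=dict)
    grants: list = field(default_factory=list)

    def register(self, tier: Tier) -> None:
        # Question 1: every tier you run, including ones you did not pick.
        self.tiers[tier.model_id] = tier

    def grant(self, g: Grant) -> None:
        # Refuse grants for unknown tiers or without an owner and a ticket.
        if g.model_id not in self.tiers:
            raise ValueError(f"unregistered tier: {g.model_id}")
        if not g.grantor or not g.change_ticket:
            raise ValueError("a grant needs a named grantor and a change ticket")
        self.grants.append(g)

    def may_invoke(self, principal: str, model_id: str, today: date) -> bool:
        # A grant counts only while it is inside its review cycle.
        return any(
            g.principal == principal
            and g.model_id == model_id
            and today <= g.review_due
            for g in self.grants
        )

    def incident_for(self, requested: str, served: str) -> Optional[str]:
        # Question 4: a tier change you did not initiate is an incident.
        if served not in self.tiers:
            return f"served by unregistered model {served!r}"
        if served != requested:
            return f"requested {requested!r}, served by {served!r}"
        return None


gov = TierGovernance()
gov.register(Tier("claude-fable-5", "safeguarded", "us"))
gov.register(Tier("claude-mythos-5", "unrestricted", "us"))
gov.register(Tier("claude-opus-4-8", "fallback", "vendor"))  # hypothetical ID

# Hypothetical grant: a service principal cleared for the unrestricted tier.
gov.grant(Grant("svc-triage-agent", "claude-mythos-5",
                "j.doe", "CHG-0001", date(2026, 12, 31)))
```

In this sketch, `incident_for("claude-fable-5", "claude-opus-4-8")` returns a description rather than `None`, which is the hook for paging an owner. How the served model is learned from a real response is left out on purpose, since that depends on vendor fields this brief does not document.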

Section 05

Govern Your Side of the Boundary

Anthropic describes the two-tier split as temporary, and its own launch-week statements disagree on the timeline — one report relays "coming weeks," another that no timeline exists. Treat the disagreement as the lesson. Boundaries move; this one moved twice in two months. But the obligations a boundary creates are standing. Someone in your organization can now hold credentials to a safeguards-lifted frontier model, and the vendor can move the boundary without asking you. Both facts persist whether this particular boundary lasts three weeks or three years.

The vendor's vetting governs their side of the boundary. Nothing but your own access discipline governs yours. One note on method: this brief was written with the model family it describes, which is why every Anthropic claim above is attributed rather than asserted. The four questions are how you apply the same skepticism to your own side of the ledger.

For the full argument (the vetting programs and policy text, the OpenAI comparison, the regulatory overlay, and the complete enterprise framework with full citations), read the companion technical whitepaper, Tiered Access Governance.
