The Cost Tier of Frontier Access: Tier-Stratified Rates, Fallback Billing, and Routing Economics as a Governance Artifact

KellerAI

KellerAI White Paper · Frontier Tier Governance · June 2026

In-Depth · 8 sections · 28 references · ~4,000 words · 2026-06-09

Section 01

Abstract

Anthropic released Claude Fable 5 and Claude Mythos 5 on June 9, 2026, priced identically at $10 per million input tokens and $50 per million output tokens. 1 Claude Opus 4.8, released twelve days earlier, stayed at $5 and $25, which Anthropic states is unchanged from Opus 4.7. 2 On every pricing dimension Anthropic publishes (base input, both cache-write tiers, cache reads, and output), the frontier rate card is exactly 2.0x the Opus 4.8 rate card. 2 The multiplier is not approximate and not mixed. It is uniform.

Our previous paper in this lane, The Token You Didn't Count, established three cost vectors: tokenizer opacity, the end of flat rate, and agentic loop amplification. It argued that effective cost per outcome, not the headline price, is the governance artifact — and it assumed, everywhere, one published rate per model. 21 Fable 5 ends that assumption. The rate card itself is now a function of access tier, and a tier-stratified rate is a fourth cost vector the three-vector taxonomy does not cover.

This paper does three things the predecessor could not. It treats the tier premium as a routing decision: frontier access pays exactly when your measured capability delta per outcome beats 2.0x. It documents the tier era's first new billing primitive, the fallback credit, and the billing rule that ships with it — each attempt in a fallback chain bills at the rates of the model that ran it. 5 And it extends the predecessor's cost-observability design with a tier dimension, so cost per outcome can be computed per tier, fallback-served tokens included.

Two boundaries govern everything below. First, cost is a performance dimension, not a safety property; the safeguard mechanism, the access boundary, and the disclosure metric are governed in this series' companion papers. 21 Second, the fallback at the center of the billing story is disclosed, not silent: Anthropic states that users are informed whenever a fallback occurs 9, and in the Messages API the default is a structured refusal, with server-side fallback as an explicit opt-in. 10

Section 02

The Fourth Cost Vector: A Rate That Is a Function of the Tier

Anthropic prices both configurations identically: $10 per million input tokens, $50 per million output. 1 The announcement anchors that number in one direction only — “less than half the price of Claude Mythos Preview.” 1 Mythos Preview's price appears in no public Anthropic document we could locate; the announcement, the pricing page, and the model-introduction page are all silent on it. 1 The figure exists secondhand: llm-stats reports Mythos Preview at $25 per million input and $125 per million output, which would make Fable 5 exactly 0.4x its predecessor. 6 The same report notes the Preview was limited to 12 Glasswing partner organizations plus 40 vetted organizations, so the comparison price was a limited-access tier price, not a market price. 6

Independent coverage uniformly chose the other anchor. TechCrunch: “double the price of Opus 4.8.” 18 The Decoder: the price almost doubles compared to Opus, and on Claude.ai plans the new models draw usage credits at 2x. 20 VentureBeat's independent pricing comparison places Fable 5 as the most expensive major AI model available globally. 19 Both anchors are arithmetically true. The difference is presentation: the vendor compares downward from a predecessor almost nobody could buy, the press compares upward from the model you already run. We read this as a presentation finding, not deception — and it is exactly why the premium should be stated plainly.

So state it plainly. Anthropic's own pricing table lists Opus 4.8 at $5 base input, $6.25 five-minute cache write, $10 one-hour cache write, $0.50 cache read, and $25 output, per million tokens. 2 The same table lists Fable 5 at $10, $12.50, $20, $1, and $50. 2 Every dimension is exactly 2.0x. The exactness is structural rather than coincidental: cache prices are uniform multipliers of base input across all models — 1.25x for five-minute writes, 2x for one-hour writes, 0.1x for reads — so doubling the base doubles the entire row. 3 That makes the tier premium a clean governance quantity. There is no blur to argue about. The premium is one number, 2.0x, and it survives every cache configuration.
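The uniformity claim can be checked mechanically. The following is a minimal sketch that uses only the per-million-token figures quoted above; the dictionary keys and the function name are our own labels, not Anthropic field names.

```python
from fractions import Fraction

# USD per million tokens, as quoted in the text above.
# Key names are illustrative labels, not pricing-page or API field names.
OPUS_4_8 = {
    "base_input": "5", "cache_write_5m": "6.25", "cache_write_1h": "10",
    "cache_read": "0.50", "output": "25",
}
FABLE_5 = {
    "base_input": "10", "cache_write_5m": "12.50", "cache_write_1h": "20",
    "cache_read": "1", "output": "50",
}

def tier_ratios(frontier, workhorse):
    """Return the exact frontier/workhorse price ratio for each dimension."""
    return {k: Fraction(frontier[k]) / Fraction(workhorse[k]) for k in workhorse}

ratios = tier_ratios(FABLE_5, OPUS_4_8)
for dimension, ratio in ratios.items():
    print(f"{dimension}: {float(ratio):.2f}x")

# The premium is a single clean number only if every dimension agrees.
uniform = len(set(ratios.values())) == 1
print("uniform premium:", uniform)
```

Exact fractions are used rather than floats so that "exactly 2.0x" is tested as an equality, not as an approximation.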

The last paper was about a bill that grew while the price stood still. This one is about a price that depends on which tier of the frontier you are allowed to buy.

The fourth cost vector

A premium band above the workhorse rate is not new. Anthropic's pricing page still lists the deprecated Claude Opus 4 and 4.1 at $15 and $75 — three times today's Opus rate. 8 OpenAI maintains its own band: GPT-5.4 Pro at $30/$180 against o3 at $10/$40. 8 And the band persists against a falling market: average frontier output-token pricing fell roughly 94.5% from March 2023 to April 2026. 8 What is new is the coupling. The old premium bands priced different models. Fable 5 and Mythos 5 are, Anthropic states, two configurations of one model, sold at one price — within the frontier tier, access is governed by vetting, not by the rate card. 1 The pricing page makes the point typographically: Mythos 5 sits in the same $10/$50 row, marked limited availability — the restricted tier and the public tier share one rate card, and what the vetting gates is the safeguard configuration, not a discount. 1 The premium buys the tier itself.

The predecessor's entire apparatus — effective cost as count times rate — assumed the rate was one published number per model. 21 Tiering breaks the assumption without touching the arithmetic. The decoupling thesis survives and gains a variable: the bill is now a function of token counts you cannot fully predict, billed at a rate determined by which tier of the frontier you chose.

Section 03

Agentic Amplification at the Doubled Rate

The predecessor's strongest claim was about interaction: a code-heavy agentic workload on a newly repriced, newly metered platform experiences not the sum of the three effects but their product. 21 The tier premium is a new factor in that product. An agent loops — plan, act, observe, revise — and every turn is billable whether or not it advanced the task. 21 At the frontier tier the product becomes rate-of-tier times token count times turns, and the rate term just doubled.

The incident base for loop economics is established; we inherit it rather than re-derive it. Uber expanded AI coding tools to thousands of engineers and exhausted its full-year 2026 AI budget by April. 23 Per-developer agentic spend varies by an order of magnitude for nominally similar work. 23 One large vendor pulled back most internal agent-coding licenses for an engineering division rather than absorb open-ended consumption. 23 The FinOps Foundation's 2026 survey found AI spend management moved from a minority practice to near-universal in a single year, with the spend described as functionally hard to forecast. 23 Every one of those incidents happened at workhorse rates. The frontier tier replays the same mechanics at 2.0x.

The predecessor's second pre-migration check — cost per completed task, not per call — has a frontier-tier version. 21 The old trap was an upgrade cheaper per call that quietly takes more turns. The new trap inverts it: a model capable enough to need fewer turns, at twice the rate per turn. Whether that is a trap or a bargain turns on one ratio — does the turn reduction beat the rate doubling for your workload? No rate card answers that question. Cost per completed task, measured by tier, does.

The shape of that ratio is worth one illustration, with the assumptions stated. Hold tokens per turn equal across tiers, price all input at the base input rate, and take a loop averaging 50,000 effective input tokens and 2,000 output tokens per turn. At Opus 4.8 rates that turn bills $0.25 of input plus $0.05 of output, or $0.30; at Fable 5 rates, $0.50 plus $0.10, or $0.60. 2 A 40-turn task then costs $12 at the workhorse tier and $24 at the frontier tier, so for the frontier tier to win on cost alone it must finish the same task in fewer than 20 turns: a better-than-half turn reduction at equal tokens per turn. 2 Nothing in the vendor-reported benchmark deltas tells you whether your workload clears that bar. 15 The illustration is not a forecast. It is the bar your per-tier measurement has to beat.
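The same bar can be computed for any loop shape. This is a minimal sketch under the illustration's assumptions (equal tokens per turn across tiers, all input priced at the base rate); the tier labels and function names are ours.

```python
# USD per million tokens as (input, output). Because the rates are per
# million tokens, tokens * rate yields micro-dollars, kept as integers.
RATES = {"opus-4.8": (5, 25), "fable-5": (10, 50)}

def turn_cost_microusd(tier, input_tokens, output_tokens):
    """Cost of one loop turn in micro-dollars at the given tier's rates."""
    rate_in, rate_out = RATES[tier]
    return input_tokens * rate_in + output_tokens * rate_out

def max_cheaper_frontier_turns(workhorse_turns, input_tokens, output_tokens):
    """Largest frontier turn count whose total stays strictly below the workhorse total."""
    budget = workhorse_turns * turn_cost_microusd("opus-4.8", input_tokens, output_tokens)
    per_turn = turn_cost_microusd("fable-5", input_tokens, output_tokens)
    return (budget - 1) // per_turn

opus_turn = turn_cost_microusd("opus-4.8", 50_000, 2_000)
fable_turn = turn_cost_microusd("fable-5", 50_000, 2_000)
bar = max_cheaper_frontier_turns(40, 50_000, 2_000)

print(f"Opus 4.8 turn: ${opus_turn / 1e6:.2f}")
print(f"Fable 5 turn:  ${fable_turn / 1e6:.2f}")
print(f"Frontier wins on cost only at {bar} turns or fewer")
```

Swapping in your own measured tokens per turn and turn counts per tier is the point; the figures here only reproduce the illustration.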

The capability side of the ratio is, today, vendor-reported. Anthropic's harness shows Fable 5 at 80.3 on SWE-bench Pro against 69.2 for Opus 4.8, and 29.3% on Cognition's FrontierCode Diamond against 13.4%. 15 No independent reproduction existed at launch; Artificial Analysis's GDPval-AA ranking is partial corroboration at best. 15 Treat those deltas as the hypothesis your per-tier measurement will test, not as inputs to a budget.

One inherited caution closes the section. The predecessor documented that the Opus 4.7 tokenizer change produces 1x to 1.35x the tokens for identical input — the mechanism that made token counts untrustworthy as a planning quantity. 22 Whether Fable 5 shares that tokenizer is stated nowhere we could locate. The mechanism is inherited; the number is not. Run a count_tokens audit on a representative sample of your own traffic before projecting frontier-tier costs from workhorse-tier counts. 22

Section 04

Cache Economics at the Frontier Tier

Caching remains the highest-leverage offset at the new tier. The Fable 5 schedule: $10 base input, $12.50 five-minute cache write, $20 one-hour cache write, $1 cache read, $50 output, per million tokens. 3 Anthropic states the breakeven directly: caching pays off after one cache read on the five-minute tier and after two reads on the one-hour tier. 3
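The stated breakeven points follow from the multipliers alone. The sketch below assumes a simple comparison that is ours, not Anthropic's published derivation: without caching, a reused prefix is billed at 1x base input on the first request and on every reuse; with caching, it is billed once at the write multiplier and then at the read multiplier on each reuse.

```python
from fractions import Fraction

# Multipliers of the base input price, as stated in this paper.
WRITE_5M = Fraction(5, 4)   # 1.25x, five-minute cache write
WRITE_1H = Fraction(2)      # 2x, one-hour cache write
READ = Fraction(1, 10)      # 0.1x, cache read

def breakeven_reads(write_multiplier, read_multiplier=READ, max_reads=100):
    """Smallest number of cache reads after which caching is strictly cheaper.

    Assumed model: uncached cost is (1 + n) base units for n reuses;
    cached cost is one write plus n reads.
    """
    for n in range(1, max_reads + 1):
        uncached = 1 + n
        cached = write_multiplier + n * read_multiplier
        if cached < uncached:
            return n
    return None

print("five-minute tier pays off after", breakeven_reads(WRITE_5M), "read(s)")
print("one-hour tier pays off after", breakeven_reads(WRITE_1H), "read(s)")
```

Because the multipliers are the same on both tiers, the breakeven read counts are identical for Fable 5 and Opus 4.8; only the dollars at stake differ.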

The TTL mechanics decide whether an agent harness actually collects the offset. The default cache lifetime is five minutes, refreshed at no additional cost on every hit, so an active loop stays warm indefinitely. 4 An idle gap longer than the TTL forces a full re-write at the write premium. 4 Claude Code defaults to the five-minute tier on per-token billing, with the one-hour tier an explicit opt-in via ENABLE_PROMPT_CACHING_1H=1. 4
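To see how idle gaps turn into re-writes, the following sketch replays a hypothetical session against two TTLs. It is a simplified model: the first turn writes the cache, every later turn within the TTL is a hit that refreshes it, and any longer gap forces a full re-write. The timestamps are invented for illustration, and the behavior at exactly the TTL boundary is our assumption.

```python
def count_cache_writes(turn_times_s, ttl_s):
    """Count full prefix writes for a sequence of turn timestamps in seconds.

    Simplified model: a gap longer than ttl_s since the previous turn
    means the cache has expired and the prefix is written again.
    """
    writes = 0
    last = None
    for t in turn_times_s:
        if last is None or t - last > ttl_s:
            writes += 1
        last = t  # every turn, hit or write, restarts the TTL clock
    return writes

# Hypothetical session: four quick turns, a 20-minute pause, three more turns.
turns = [0, 60, 120, 180, 1380, 1440, 1500]

five_min = count_cache_writes(turns, ttl_s=5 * 60)
one_hour = count_cache_writes(turns, ttl_s=60 * 60)
print("writes at 5-minute TTL:", five_min)
print("writes at 1-hour TTL:  ", one_hour)
```

Running the same replay over real harness logs, per tier, gives a rough count of how many write premiums a TTL choice costs a given workload.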

Here is the arithmetic that matters for the tier decision, shown once. Take a turn whose input is 90% cache reads and 10% fresh tokens. On Fable 5 that bills 0.9 × $1 + 0.1 × $10 = $1.90 per million effective input tokens. 3 On Opus 4.8 the same mix bills 0.9 × $0.50 + 0.1 × $5 = $0.95. 3 Still exactly 2.0x. Because the multipliers are uniform, the tier premium is invariant under cache mix: the cache lowers the absolute bill and leaves the ratio untouched. 3 You cannot cache your way out of the tier decision. You can only change its stakes.
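The invariance holds for any token mix, not just the 90/10 split. A short derivation, using our own notation and assuming the same mix on both tiers:

```latex
% b_T : base input price on tier T (USD per million tokens)
% m_i : multiplier for billing class i (fresh = 1, read = 0.1, writes = 1.25 or 2)
% f_i : share of input tokens billed in class i, with \sum_i f_i = 1
\[
  p_T(f) = b_T \sum_i f_i\, m_i ,
  \qquad
  \frac{p_{\mathrm{Fable}}(f)}{p_{\mathrm{Opus}}(f)}
  = \frac{b_{\mathrm{Fable}} \sum_i f_i\, m_i}{b_{\mathrm{Opus}} \sum_i f_i\, m_i}
  = \frac{b_{\mathrm{Fable}}}{b_{\mathrm{Opus}}}
  = \frac{10}{5} = 2 .
\]
% Worked instance from the text: 90% cache reads, 10% fresh tokens
\[
  p_{\mathrm{Fable}} = 10\,(0.9 \cdot 0.1 + 0.1 \cdot 1) = 1.90 ,
  \qquad
  p_{\mathrm{Opus}} = 5\,(0.9 \cdot 0.1 + 0.1 \cdot 1) = 0.95 .
\]
```

The mix-dependent sum cancels, which is why the ratio depends only on the two base prices.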

The offset is still worth governing, because it is large. Cache reads bill at roughly a tenth of fresh input, and one widely reported figure has Claude Code saving on the order of 300 million tokens per week through caching. 23 At frontier rates, the same swing in hit rate moves twice as many dollars.

The batch lane behaves the same way, and it is the one place the levels line up usefully. Anthropic's Batch API bills both tiers at a 50% discount — Fable 5 batch traffic at $5 input and $25 output per million tokens, Opus 4.8 batch traffic at $2.50 and $12.50. 1 Two things follow. The ratio survives another discount: batched frontier traffic still costs exactly 2.0x batched workhorse traffic. 2 And the levels produce a usable coincidence — a batched Fable 5 token costs precisely what an interactive Opus 4.8 token costs, so a workload that tolerates asynchronous turnaround can buy the frontier tier at the workhorse tier's rate card. 1 The discount changes what you pay. It never changes which tier decision you face.

And the cache's terms are not yours. Anthropic introduced a one-hour Claude Code cache around February 1, 2026, then reverted to a five-minute TTL around March 7 — observed independently across two machines and two accounts, with no user opt-in and no setting to pin it. 7 Nothing in your contract froze the cache schedule. A TTL change is a price change for any workload with idle gaps, and it arrives without any price field moving. The predecessor's cache-hit-rate check therefore gains a tier dimension: measure hit rate per tier, and treat a drop as a first-class billing event, not a footnote. 21

Section 05

Fallback Billing: The Tier Era's First New Primitive

Everything in this section concerns a disclosed mechanism. Anthropic states that when Fable's classifiers detect a request related to cybersecurity, biology and chemistry, or distillation, the response is handled by Claude Opus 4.8 instead, and that “users will be informed whenever this occurs.” 9 In the Messages API there is no automatic substitution: the default is a structured refusal — HTTP 200, stop_reason: "refusal", a category naming the policy area — and server-side fallback is an explicit opt-in, reflected in the response object. 10 A developer opts in by naming up to three fallback models in a fallbacks parameter under the server-side-fallback-2026-06-01 beta header, or by configuring the SDK middleware once on the client. 11

The billing rules are documented, and they are tier-aware in a way no previous rate card needed to be. A request Fable 5 refuses before generating any output is not billed at all and does not count against rate limits; a mid-stream refusal bills the input and whatever output already streamed, at normal rates. 5 When a fallback chain runs, Anthropic's rule is explicit: “You pay for the model that actually serves the request. Each attempt is billed separately, at the rates of the model that ran it.” 5 A fallback-served answer therefore bills at Opus 4.8 rates, not Fable 5 rates, and the tier premium is charged only when the frontier model actually serves. We flag this plainly because our own earlier research notes recorded the fallback billing rate as undocumented; the launch-day documentation answers it, and Section 08 records the correction. 5
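The per-attempt rule can be expressed as a small pricing function. This is a sketch of the rule as quoted above, not of any Anthropic client library: the `Attempt` record, its field names, and the model labels are hypothetical.

```python
from dataclasses import dataclass

# USD per million tokens as (input, output), from the rate cards in this paper.
RATES = {"fable-5": (10, 50), "opus-4.8": (5, 25)}

@dataclass
class Attempt:
    """One attempt in a fallback chain (hypothetical record shape)."""
    model: str
    input_tokens: int
    output_tokens: int
    refused_before_output: bool = False

def chain_cost_microusd(attempts):
    """Sum the chain's cost in micro-dollars, each attempt at its own model's rates."""
    total = 0
    for a in attempts:
        if a.refused_before_output:
            continue  # a refusal before any output is not billed
        rate_in, rate_out = RATES[a.model]
        # A mid-stream refusal falls through here: input plus streamed output.
        total += a.input_tokens * rate_in + a.output_tokens * rate_out
    return total

chain = [
    Attempt("fable-5", 20_000, 0, refused_before_output=True),
    Attempt("opus-4.8", 20_000, 1_500),
]
cost = chain_cost_microusd(chain)
print(f"fallback-served request: ${cost / 1e6:.4f}")
```

In this example the request is billed entirely at Opus 4.8 rates, half of what the same token counts would cost if Fable 5 had served them.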

The genuinely new primitive sits one layer down, in the cache. Prompt caches are per-model, so a conversation cached for Fable 5 must be written from scratch into the fallback model's cache, and cache writes cost more than cache reads. 5 The fallback credit removes that switching cost: the refusal carries a credit token, the retry echoes it under the fallback-credit-2026-06-01 beta header, and the retry is billed as though the conversation had been on the new model all along. 5 The token expires after five minutes, and the refund is visible in the retry's usage fields as cache writes repriced to reads. 5 As far as we can determine, this is the first billing primitive whose entire purpose is to make crossing a tier boundary cost-neutral at the cache layer.
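The size of the switching cost that the credit removes can be estimated from the rate card. The sketch below rests on two assumptions of ours: that the uncredited retry would write the prefix at the Opus 4.8 five-minute write rate, and that the credit reprices the whole prefix from write to read. The prefix size is invented for illustration.

```python
from decimal import Decimal

# Opus 4.8 cache prices in USD per million tokens, from the pricing table above.
OPUS_WRITE_5M = Decimal("6.25")
OPUS_READ = Decimal("0.50")

def switch_cost_usd(prefix_tokens, credited):
    """Cost of landing a cached conversation prefix on the fallback model.

    Assumption: without the credit the prefix is billed as a five-minute
    cache write; with it, the same tokens are billed as cache reads.
    """
    rate = OPUS_READ if credited else OPUS_WRITE_5M
    return Decimal(prefix_tokens) * rate / Decimal(1_000_000)

prefix = 200_000  # hypothetical conversation prefix, in tokens
without_credit = switch_cost_usd(prefix, credited=False)
with_credit = switch_cost_usd(prefix, credited=True)
print("without credit: $", without_credit)
print("with credit:    $", with_credit)
print("saved:          $", without_credit - with_credit)
```

Under these assumptions the credit cuts the crossing cost by a factor of 12.5, the ratio of the write multiplier to the read multiplier.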

Agent harnesses pay the boundary differently. In Claude Code a safety fallback is a model switch, and each model has its own cache, so the fallback turn reprocesses the entire conversation history at uncached rates. 5 A long session that trips one classifier late pays a one-time full-history re-read. The deeper the session, the bigger the toll.

Both surfaces then make the switch sticky. On consumer surfaces, the model picker stays on Opus for the rest of the conversation after a switch. 13 On the API, sticky routing serves later turns of a fallen-back conversation directly from the fallback model for roughly an hour, best-effort, precisely to avoid re-billing a predictably declined attempt on every turn. 12 One trigger can convert a session's remaining turns to the workhorse tier. Your workload's effective tier is not a setting. It is an emergent property of your content and the classifiers.

The audit trail for all of this exists. The response's top-level model field names the model that produced it, a fallback content block marks each model boundary, and usage.iterations records every attempt with its own token counts. 12 That is a per-request record of which tier served what — sufficient, if you capture it, to compute your effective tier mix and your true blended rate. Capturing it is the work of the next section.

Section 06

Routing Economics: When Does Frontier Pay

Every request now has a tier choice: Fable 5 at 2.0x, or Opus 4.8 directly. The rate card cannot make the choice, because the rate card only knows the premium. The decision needs two quantities the rate card does not carry — the capability delta per outcome for your workload, and the fallback leakage of your workload.

On leakage, the published number is a global bound. Anthropic reports that “more than 95% of Fable sessions involve no fallback at all,” from pre-launch early data, with sessions rather than requests as the unit — and nothing published at launch lets anyone outside Anthropic verify it. 14 A one-sided global bound tells you almost nothing about your tenant, because triggers concentrate by domain. SANS reported routine incident-response, detection, and basic forensics workflows auto-routed to Opus 4.8 in initial testing, and Anthropic describes the tuning as intentionally conservative. 17 A security or life-sciences workload can live on the wrong side of the average.

Launch commentary drew the budget conclusion early: a workload heavy in safeguarded territory “may pay the Fable 5 premium while receiving Opus 4.8 answers, in which case routing directly to Opus 4.8 is both cheaper and equivalent.” 16 The documented billing rule narrows that claim on metered surfaces, since per-token billing follows the serving model and the premium is not charged on tokens Opus serves. 5 But the conclusion survives where it counts. On subscription surfaces, The Decoder reports frontier turns draw usage credits at 2x, with no stated carve-out for fallback-served turns. 20 In harnesses, every late trigger costs a full-history cache re-read. 5 On streaming requests, time to first byte includes the declined attempt. 12 For a fallback-heavy workload, routing directly to the workhorse tier remains both cheaper and equivalent — now for documented reasons rather than assumed ones.

The routing rule we recommend is the predecessor's artifact with a tier dimension. Route a workload to the frontier tier only while its measured cost per outcome at Fable 5 is lower than its measured cost per outcome at Opus 4.8 — that is, only while the outcome delta beats the 2.0x premium net of fallback leakage. Anthropic's benchmark deltas are the hypothesis. 15 Your per-tier outcome data is the test. 21
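Stated as code, the rule is a single comparison. This is a minimal sketch with hypothetical inputs; it assumes spend has already been measured at the serving model's actual rates, so fallback leakage is inside the numbers rather than a separate adjustment.

```python
def cost_per_outcome(spend_usd, outcomes):
    """Measured spend divided by completed outcomes; infinite if nothing completed."""
    if outcomes == 0:
        return float("inf")
    return spend_usd / outcomes

def route_to_frontier(frontier_spend, frontier_outcomes,
                      workhorse_spend, workhorse_outcomes):
    """True only while the frontier tier is strictly cheaper per completed outcome."""
    return (cost_per_outcome(frontier_spend, frontier_outcomes)
            < cost_per_outcome(workhorse_spend, workhorse_outcomes))

# Hypothetical measurement windows for one workload.
# Frontier: $480 over 30 completed tasks. Workhorse: $360 over 20.
print(route_to_frontier(480.0, 30, 360.0, 20))
# Same workhorse window, but the frontier window cost $600.
print(route_to_frontier(600.0, 30, 360.0, 20))
```

The strict inequality is a design choice: at parity, the sketch keeps the workload on the workhorse tier, since the frontier premium has not paid for itself.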

Instrumenting the rule extends the predecessor's three-layer design, and we propose it as a design, not a product. Layer one, per-request capture: log the serving model and tier for every request; usage.iterations and the model field already carry what you need. 12 Layer two, per-tier normalization: compute effective cost per request at the serving model's actual rates — fallback-served tokens, cache reads, and cache writes included. 21 Layer three, outcome linkage by tier: attach spend to resolved tickets, completed tasks, and shipped features, and report cost per outcome per tier with the routing rule as a standing alert. 21 None of this is shipped tooling. All of it is buildable from fields the API already returns. 12
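The three layers can be sketched in a few lines. Everything about the record shape here is hypothetical: the field names are ours, not the API's, and a single five-minute cache-write rate stands in for both write tiers. Layer one is assumed to have produced one record per attempt.

```python
from collections import defaultdict
from fractions import Fraction

# USD per million tokens, from the rate cards quoted in this paper.
RATES = {
    "fable-5":  {"input": "10", "cache_write": "12.50", "cache_read": "1",    "output": "50"},
    "opus-4.8": {"input": "5",  "cache_write": "6.25",  "cache_read": "0.50", "output": "25"},
}

def request_cost(record):
    """Layer two: price one attempt record at the serving model's rates, in USD."""
    rates = RATES[record["serving_model"]]
    micro = sum(Fraction(rates[k]) * record.get(k + "_tokens", 0) for k in rates)
    return micro / 1_000_000

def cost_per_outcome_by_tier(records, outcomes_by_tier):
    """Layer three: spend grouped by requested tier, divided by completed outcomes."""
    spend = defaultdict(Fraction)
    for r in records:
        spend[r["requested_tier"]] += request_cost(r)
    return {tier: spend[tier] / n for tier, n in outcomes_by_tier.items() if n}

# Layer one output (hypothetical): the second record is a fallback-served
# attempt that had to write the prefix into the fallback model's cache.
records = [
    {"requested_tier": "frontier", "serving_model": "fable-5",
     "input_tokens": 10_000, "cache_read_tokens": 90_000, "output_tokens": 2_000},
    {"requested_tier": "frontier", "serving_model": "opus-4.8",
     "input_tokens": 10_000, "cache_write_tokens": 90_000, "output_tokens": 2_000},
    {"requested_tier": "workhorse", "serving_model": "opus-4.8",
     "input_tokens": 10_000, "cache_read_tokens": 90_000, "output_tokens": 2_000},
]
result = cost_per_outcome_by_tier(records, {"frontier": 1, "workhorse": 1})
for tier, cost in result.items():
    print(f"{tier}: ${float(cost):.4f} per outcome")
```

Grouping by requested tier while pricing by serving model is what keeps fallback-served tokens inside the frontier tier's cost per outcome.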

One response field makes the design sharper than its predecessor could be. A refusal names its policy area: stop_details.category returns cyber, bio, or reasoning_extraction, so fallback leakage can be measured per request family, not just per tenant. 12 That turns the tier choice from one decision into a portfolio. Families whose measured leakage is low compete for the frontier tier on the cost-per-outcome rule; families that predictably trip a classifier route straight to Opus 4.8 — the same model that would have served them anyway, with no declined attempt ahead of the first byte and no cross-model cache toll in a harness. 5 A global session percentage cannot make that split. Your per-category capture can. 14
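Per-family leakage is a grouped ratio over captured responses. In this sketch the record shape, the family names, and the 0.5 threshold are all hypothetical; `category` stands for whatever value was captured from the refusal, or `None` when the request was served without one.

```python
from collections import defaultdict

def leakage_by_family(records):
    """Share of requests in each family that tripped a classifier."""
    totals = defaultdict(int)
    tripped = defaultdict(int)
    for r in records:
        totals[r["family"]] += 1
        if r["category"] is not None:
            tripped[r["family"]] += 1
    return {family: tripped[family] / totals[family] for family in totals}

def split_portfolio(leakage, threshold):
    """Families at or above the threshold go straight to the workhorse tier."""
    direct = sorted(f for f, rate in leakage.items() if rate >= threshold)
    candidates = sorted(f for f, rate in leakage.items() if rate < threshold)
    return direct, candidates

# Hypothetical capture: 4 incident-response requests, 5 code-review requests.
records = (
    [{"family": "incident-response", "category": "cyber"}] * 3
    + [{"family": "incident-response", "category": None}]
    + [{"family": "code-review", "category": None}] * 5
)
leakage = leakage_by_family(records)
direct, candidates = split_portfolio(leakage, threshold=0.5)
print("leakage:", leakage)
print("route directly to workhorse:", direct)
print("frontier candidates:", candidates)
```

Families on the candidate list still have to pass the cost-per-outcome rule; low leakage makes them eligible for the frontier tier, not entitled to it.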

Section 07

The Regulated Overlay

None of this requires new regulation to matter. The frameworks that govern model risk already have a slot for a tier change; the work is filing it. The predecessor's framing carries over unchanged: these are obligations to verify against your facts, and the discipline is to document the analysis rather than assume it away. 21

The Federal Reserve's model-risk guidance — SR 11-7, revised in April 2026 as SR 26-2 — makes ongoing monitoring, independent validation, and effective challenge core obligations for models in use. 24 Selecting a capability tier is a modeling decision, and a workload whose effective tier drifts through fallback is a model change happening to you. The per-tier cost-per-outcome record from Section 06 is the natural ongoing-monitoring artifact: it demonstrates, with numbers, that someone is watching what the tier decision does. 24

NIST's AI Risk Management Framework states the same discipline in AI-native terms. The MEASURE function expects systems tested “before their deployment and regularly while in operation.” 25 MANAGE 4.1 expects implemented post-deployment monitoring plans, including appeal and override mechanisms and change management, and MANAGE 3.2 expects pre-trained models monitored as part of regular system maintenance. 25 Cost sits under Measure as a performance dimension; a tier migration is a Manage-stage change event.

ISO/IEC 42001, the AI management-system standard, builds conformance around documented change controls across the AI system life cycle. 26 Three changes from this paper belong in that change record: a tier change you make, a fallback policy you configure, and a cache-TTL change the vendor makes for you — Section 04's incident is exactly the kind of unannounced operational change a change record exists to catch. 7 SOC 2's CC8.1 asks whether the entity “authorizes, designs, develops or acquires, configures, documents, tests, approves and implements changes” to its infrastructure and software; per-tier normalization is audit evidence that the change process saw the tier. 27

The EU AI Act's Article 72 requires providers of high-risk AI systems to “actively and systematically collect, document and analyse relevant data” on system performance throughout the lifetime, under a documented post-market monitoring plan. 28 The obligation binds high-risk-system providers, not general-purpose model vendors as such; it reaches a Fable 5 deployment by analogy and through the downstream deployers whose systems embed it. 28 Keep the predecessor's careful scoping here. 21 Its open question — whether a pricing-relevant model change is a “substantial modification” under the Act — also acquires a tier analogue: whether a forced tier migration is one. 21 We do not assert an answer. The defensible posture is the same as before: document the analysis. 21

Section 08

Honest Limits

The 2.0x figure rests on Anthropic's own pricing and caching tables, re-confirmed live against the published pages on June 9, 2026. 2 Those tables are vendor-controlled and can change without notice; the claim is dated, not eternal. 3

Mythos Preview's $25/$125 price is secondary-sourced only. No Anthropic primary discloses it, and the 0.4x comparison should be read as reported by llm-stats, never as established fact. 6

The historical premium band is scoped deliberately. benchlm.ai's pricing ladder conflicts with Anthropic's pricing page on the recent Opus rows and omits Opus 4.8 and Fable 5 entirely, so we cite the band only where the vendor's own page carries it: the deprecated Opus 4 and 4.1 rows at $15/$75. 8

One claim in our research base did not survive verification, and we corrected it rather than shipping it. The salvaged extracts behind this paper recorded the fallback billing rate as answered nowhere. The launch-day documentation answers it: each attempt bills at the rates of the model that ran it. 5 A paper that had asserted the absence would have been wrong about its own headline novelty. The discipline that caught the error — re-verify every claim against the primary source on the day you publish — is the same discipline this paper asks of your cost dashboards.

Whether Fable 5 shares the Opus 4.7 tokenizer is undocumented. The 1x–1.35x token-count figure is cited here only as the predecessor mechanism that made counts untrustworthy, never as a Fable 5 property. 22

Per-tier agentic amplification data does not yet exist in public sources, and Section 06's routing framework is a proposed design: unbuilt, unbenchmarked, and labeled as such. Section 03's worked example assumes equal tokens per turn across tiers — an illustrative assumption, not a measured one, and real workloads will differ in both directions. The capability deltas a router would act on are Anthropic-run, with no independent reproduction at launch. 15

Last, the disclosure this series owes you: these papers are written about the model family that runs the pipeline drafting them. That is why every Anthropic claim above is attributed rather than asserted, and why vendor benchmarks are labeled the hypothesis rather than the evidence. The structural claim survives every limit on this list — the rate is now a function of the tier — because it rests on the mechanism, not on any one magnitude. 21

A short, executive version of this argument is published as the companion brief, When the Rate Card Has Tiers. It states the stakes, the 2.0x premium, and the routing rule in five sections, without the citations carried here.

