kellerai.blog

One budget, one rule

Token spend and usage limits are not two problems. They are one budget, governed by one rule, and a frontier-tier model makes every term in it bigger.

KellerAI White Paper · AI Economics · Jun 2026

Context

Nobody running agents pays per token. You pay per request, times how many agents you fan out, times the tier multiplier you set once and then forget. Fable 5 doubles the per-token rate — but the real multiplier is the one already baked into your architecture.

The Finding

Token spend and usage limits are the same number viewed from two directions. The five levers that move the bill — model tier, fan-out, request length, cache hit rate, and batch lane — are the same levers that push you toward the usage wall. Adjust one, you move both. We know because our own research run for this paper hit the wall twice in 24 hours.

Tags: Token economy & usage limits · Agentic cost governance · Frontier model fan-out economics
Paper Details
Category: AI Economics
Audience: Engineers and FinOps leads who run agentic workloads and need the one-rule framing for Fable 5 token economics without the full technical treatment.
Method: Executive distillation of the companion whitepaper, which live-verified every claim against Anthropic's launch-day documentation and the authors' own usage-limit incidents during the research run. The brief carries no citations by corpus convention; the companion substantiates every claim.
Length: ~800 · 3 min
Sections: 5
Date: Jun 2026
Authors: KellerAI

KellerAI Executive Brief · June 2026 · Frontier Tier Governance

The Fable 5 Token Economy

On June 9, 2026, Anthropic released Claude Fable 5 at twice the per-token price of the workhorse model you already run. But nobody running agents pays per token. You pay per request, times how many agents you fan out, times the tier multiplier you set once and then forget. This brief gives you the one rule that governs that bill, the five levers that move it, and the reason the usage wall is something you can schedule rather than something that ends your day. We know the rule holds because our own research run for this paper hit the wall twice in 24 hours.

Section 01

Two Walls in 24 Hours

On the evening of June 9, a Fable 5 session authoring this series invoked one research skill. That single call fanned out four concurrent research harnesses and climbed to roughly 261 active subagents at its peak. The session hit its usage limit mid-run, stalled in flight, and would not move again until the reset clock turned over at 8:20pm. One command spent a session's entire remaining headroom in minutes.

The next day, the wall came back — and this time it hit the research run for this very paper. The harness fanned out again and burned millions of tokens before the 3pm reset stopped it cold. Two walls, same cause, 24 hours apart, both produced by a fan-out nobody had capped and a tier nobody had chosen. We are publishing our own anti-pattern because it is the cleanest evidence we have.

Section 02

The Rule That Compounds

Anthropic prices Fable 5 at exactly twice the per-token rate of Claude Opus 4.8, and that is the smallest part of your bill. The per-token rate is one factor in a product, and the other two are the ones that bite. An agent loops (plan, act, observe, revise), and every turn is billable whether or not it advanced the task. Fan out that loop across many subagents and the turns multiply. So the real cost of agentic work is per-request cost, times fan-out, times the tier multiplier.

Multiplication is why a small mistake becomes a session-ending one. In our run, every one of those 261 subagents inherited the session model (Fable 5, the most expensive tier available) because no dispatch set an override. Commodity work, such as fetching pages and writing files from supplied text, billed at frontier rates. The tier multiplier did not add a line to the bill. It doubled the whole bill, and the same doubling applies to limit consumption: subscription usage is weighted by model choice, so frontier tokens drain your headroom at a multiple of cheaper ones. The fan-out was the spark. The tier was the accelerant.
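
As a rough illustration of the rule, the sketch below multiplies the three terms for one fanned-out wave. The per-request cost of 5 units is a made-up figure, and the function and constant names are ours for illustration; only the 2x tier ratio and the 261-subagent peak come from this brief.

```python
def run_cost(per_request_cost: float, fan_out: int, tier_multiplier: float) -> float:
    """Cost of one fanned-out wave: per-request cost x fan-out x tier multiplier."""
    return per_request_cost * fan_out * tier_multiplier


BASELINE = 1.0  # workhorse tier, used as the reference price
FRONTIER = 2.0  # frontier tier at twice the per-token rate, per the brief

PER_REQUEST_COST = 5.0  # hypothetical cost units per request at the baseline tier
FAN_OUT = 261           # peak subagent count from the June 9 run

# Every subagent inherits the frontier session model (no override set).
inherited = run_cost(PER_REQUEST_COST, FAN_OUT, FRONTIER)

# Same fan-out, but each dispatch overrides the subagent to the baseline tier.
overridden = run_cost(PER_REQUEST_COST, FAN_OUT, BASELINE)

print(f"inherited frontier tier: {inherited:.0f} units")
print(f"overridden to baseline:  {overridden:.0f} units")
print(f"ratio: {inherited / overridden:.1f}x")
```

Because the terms multiply, the tier override scales the entire wave rather than one line item, which is why it is worth setting per dispatch instead of per session.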

The bill is per-request cost times fan-out times the tier multiplier. A frontier model you forgot to override makes every subagent a frontier subagent.

The rule

Section 03

Five Levers

Each term in the rule has a lever, and each lever multiplies a different part of the product. Pull them in order; the early ones move the most.

  • Tier selection. The single largest lever. Subagents inherit the session model unless you override them, so default subagents to the cheaper tiers and reserve the frontier for work that truly needs frontier reasoning.

  • Caching. Re-reading the same context every turn is the easiest spend to eliminate. Prompt caching turns repeated context into a fractional read — but idle gaps and context churn break the cache, and a broken cache bills full price.

  • Context discipline. Re-sent context is the majority of a long session's bill, and it grows roughly with the square of the number of turns, because each turn re-sends everything the earlier turns accumulated. Stage large payloads to files and pass paths; inlining them re-bills the whole payload on every turn.

  • Batching. Work that can wait costs half. The batch lane discounts both tiers, and it stacks with caching — the cheapest token you can buy is a cached, batched one.

  • Fan-out caps. The multiplier you control most directly. A pipeline that hands work down a chain spends far less than a barrier that launches everything at once, and a cap on concurrent subagents is the difference between a run that finishes and one that hits the wall at 261 agents deep.
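
A fan-out cap can be sketched with a semaphore, as below. This is a minimal illustration under our own assumptions, not the harness used in the research run: `run_capped`, `fake_subagent`, and the cap of 8 are hypothetical names and values, and the subagent is a stand-in that just sleeps.

```python
import asyncio


async def run_capped(factories, max_concurrent):
    """Run coroutine factories with at most max_concurrent in flight at once.

    Returns the results in input order plus the peak concurrency observed.
    """
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def guarded(factory):
        nonlocal in_flight, peak
        async with semaphore:  # blocks here once the cap is reached
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await factory()
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(f) for f in factories))
    return results, peak


async def fake_subagent(i):
    """Hypothetical stand-in for a subagent call; a real one would call a model."""
    await asyncio.sleep(0.01)
    return i * 2


# 40 queued jobs, but never more than 8 running at the same time.
jobs = [lambda i=i: fake_subagent(i) for i in range(40)]
results, peak = asyncio.run(run_capped(jobs, max_concurrent=8))
print(f"completed {len(results)} jobs, peak concurrency {peak}")
```

The cap does not reduce total work; it bounds how fast headroom drains, which leaves room to notice a runaway fan-out before the session limit does.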

Section 04

The Wall Is Schedulable

Not every wall is the same wall, and the difference decides what it costs you. A rate limit throttles and clears in seconds. An entitlement wall fails fast and cheap: when our session switched accounts mid-authoring, every fresh request failed in under a second with zero tokens spent — “usage credits required for 1M context” — and the fix was flipping to standard context, no waiting and no spend. The session limit is the dangerous one: it kills work in flight and will not lift until a clock turns over.

But a fixed reset clock is a schedule, and a schedule is something you plan around. Both times the session limit stopped us, the work resumed and completed the moment capacity returned, because the intermediate artifacts had been checkpointed to files. No research was redone. The wall was a pause, not a loss. Checkpoint your work and the session limit stops being an outage and becomes a coffee break with a known end time.
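
The checkpoint-and-resume pattern described above might look like the sketch below. It is an assumption-laden illustration rather than our actual tooling: `run_with_checkpoints`, `UsageLimitReached`, and `BudgetedWorker` are hypothetical names, and the "wall" is simulated by a worker that raises once its budget is spent.

```python
import json
import tempfile
from pathlib import Path


class UsageLimitReached(Exception):
    """Hypothetical signal that the session limit stopped the run."""


def run_with_checkpoints(task_ids, do_task, checkpoint_dir):
    """Run each task once, reusing any result already checkpointed to disk."""
    checkpoint_dir = Path(checkpoint_dir)
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    for task_id in task_ids:
        path = checkpoint_dir / f"{task_id}.json"
        if path.exists():  # finished before the wall: load, do not redo
            results[task_id] = json.loads(path.read_text())
            continue
        result = do_task(task_id)  # may raise UsageLimitReached
        tmp = path.with_suffix(".tmp")
        tmp.write_text(json.dumps(result))
        tmp.replace(path)  # rename so a partial write never looks complete
        results[task_id] = result
    return results


class BudgetedWorker:
    """Simulated worker that hits the wall after `budget` tasks."""

    def __init__(self, budget):
        self.budget = budget
        self.calls = []

    def __call__(self, task_id):
        if self.budget == 0:
            raise UsageLimitReached(task_id)
        self.budget -= 1
        self.calls.append(task_id)
        return {"task": task_id, "summary": f"notes for {task_id}"}


tasks = ["a", "b", "c", "d"]
with tempfile.TemporaryDirectory() as tmp_dir:
    before_reset = BudgetedWorker(budget=2)
    try:
        run_with_checkpoints(tasks, before_reset, tmp_dir)
    except UsageLimitReached:
        pass  # wall hit mid-run; finished tasks are already on disk

    after_reset = BudgetedWorker(budget=10)
    final = run_with_checkpoints(tasks, after_reset, tmp_dir)

print("done before the wall:", before_reset.calls)
print("done after the reset:", after_reset.calls)
```

Only the tasks that had not finished are run after the reset, so the interruption costs waiting time rather than repeated spend.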

Section 05

The Meter Is Part of the Architecture

On a frontier-tier model, the meter is no longer a billing detail you reconcile at month-end. It is a property of your system, set by the same design choices that decide how your agents fan out, what they carry between turns, and which tier serves each request. The five levers are how you govern it.

For the full argument (the price ladder verified against the rate card, the inheritance trap and the 2.54M-token retrospective, the caching breakeven math, the fan-out amplification factors, the two kinds of wall and the March-to-May limit-change timeline, and the disclosed fallback's economics), read the companion technical whitepaper, Operating a Frontier Model Without Hitting the Wall.

End of brief