Skip to main content
kellerai.blog

The Bug You Fix Twice

Resource-lifecycle bugs don't have a fix — they have a fixpoint.

KellerAI White Paper · Code Quality & Architecture · May 2026

Context

Every engineer knows the bug that comes back. You patch it, the incident closes, and a month later it returns wearing a slightly different shape. You patch it again. The honest conclusion is not that you fixed it badly the first time; it is that the thing you were fixing was never a single bug. It was a system quietly settling into a state it was always going to reach — and your patch only blocked one of the paths into that state. A fixpoint is the name for that state: the place a system ends up under its own dynamics when no contract is steering it elsewhere.

The Finding

A fix patches the symptom that surfaced. A fixpoint is what you have to name to stop every symptom that has not surfaced yet. The practical payoff is a habit you can use at code-review time: for every new await that touches a resource, ask if the system is left in a terminal state on purpose or just where it happens to land. For every new state transition, ask which predicate must hold & where it is enforced. For every subscription, ask who owns it & when they close it. If the answer is 'I don't know,' you have found a fixpoint the system will discover on its own, on its own schedule, in production.

Tags:
Contract SpecificationResource LifecycleAsync Invariants
Paper Details
CategoryCode Quality & Architecture
AudienceEngineering teams, platform architects, & code reviewers responsible for concurrent & distributed systems
MethodProduction incident analysis (5 resource-lifecycle anti-patterns from KellerAI platform) + field study with commit history & remediation saga documentation
Length~980 · 4 min
Sections6
DateMay 2026
AuthorsKellerAI
Read the full paper
Related
Placeholder — pending analytics
Section 01

The bug you fix twice

Every engineer knows the bug that comes back. You patch it, the incident closes, and a month later it returns wearing a slightly different shape. You patch it again. The honest conclusion is not that you fixed it badly the first time. It is that the thing you were fixing was never a single bug. It was a system quietly settling into a state it was always going to reach — and your patch only blocked one of the paths into that state. A fixpoint is the name for that state: the place a system ends up under its own dynamics when no contract is steering it elsewhere. A fix patches the symptom that surfaced. A fixpoint is what you have to name to stop every symptom that has not surfaced yet.

Section 02

Five shapes of the same mistake

The companion paper studies five resource-lifecycle anti-patterns from a single production codebase, and they turn out to be one mistake wearing five costumes. A background task spawned with asyncio.create_task raises an exception, escapes to the event loop, and the database row it was supposed to advance stays stuck at “initialized” forever. A semaphore meant to cap concurrency is held across a wait for user input — a user closes their laptop, and two abandoned sessions deadlock the whole pool for ninety-three minutes. A browser event-stream connection is never closed, so the connections pile up until the page freezes. A service publishes an event before it subscribes, so the answer arrives in the gap and is lost, and the code waits forever for an event that already fired. And an invariant — “this status always means a row exists” — lives only in a comment, enforced by nothing. Different resources, different failures, but every one of them is a system finding its own attractor because the engineer forgot to name one.

Section 03

The contract nobody wrote down

What ties the five together is an absence. In each case there was a rule the system depended on — a task must reach a terminal status, a lock must be released before an unbounded wait, a handle must be closed by its owner, a subscriber must register before the publisher writes, a state must imply its companion row — and that rule was never written into anything the runtime could check. It lived in the engineer's head, or in a comment, or in the shape of code that happened to work the day it was reviewed. The code passed review precisely because the missing contract was invisible. There was nothing on the screen that said this is wrong; there was only something not on the screen that should have been.

Section 04

Why the cost compounds

An unwritten contract is debt, and the interest is not linear. A contract missing for one day costs you a hotfix. A contract missing for six months costs you a hotfix, then a second commit for the edge case the hotfix missed, then a backfill script to repair the rows that already drifted, then an observability dashboard so you can see it next time, then a runbook entry, then four engineers who each tell a different war story about how it surfaced. The debt is not the missing contract itself — it is the saga that has to be re-performed every time that contract resurfaces in a new shape. The longer it stays unnamed, the more shapes it gets to wear, and each one is a fresh incident.

Section 05

Naming the fixpoint

Naming the fixpoint means moving the contract out of your head and into something that runs. The tools are old and unglamorous. A database CHECK constraint turns “this status implies a row” from a comment into a rule the database refuses to break. A structured-concurrency boundary — a task group or nursery — makes it impossible for a child task to escape its parent without its outcome being observed. A timeout on every wait that crosses a process or a person bounds how long a lock can be held hostage. A type that tracks a handle's lifecycle state makes calling a method on a closed connection a compile error. An eager subscription, opened before the publish, closes the race by construction. None of this is novel. The shift is only this: the contract now lives where the runtime can see it, not where only the reader can.

Section 06

The question to ask

The practical payoff is a habit you can use at code-review time, before any incident exists. For every new await that touches a resource, ask: if this line throws, times out, or is cancelled, what state is the system left in — and is that state terminal on purpose, or just where it happens to land? For every new state transition, ask which predicate must hold and where it is enforced. For every subscription, ask who owns it and when they close it. If the answer is “I don't know,” you have found a fixpoint the system will discover on its own, on its own schedule, in production. The questions are mechanical. The answers, written down where a machine can check them, are the contracts — and a contract is the only thing that turns a bug you fix twice into a bug that cannot happen.

This brief is the short version. For the full field study — five production sagas with their pre-fix code, the commit series that resolved each, the debt calculus, and the five principles in full — read the companion technical paper, Fixpoints, Not Fixes: In Depth .