Observability Theater: In Depth

KellerAI

Section 01

Executive Summary

A governance telemetry event in keller-platform fires on every Kai run. It is named governance.pack.applied. It emits a field called obligations_referenced, and the field is permanently, structurally empty. The code that assigns it reads "obligations_referenced": [] at kai_workspace.py:197 — a literal empty list, hard-coded, never computed. The surrounding docstring acknowledges the gap at lines 159–168: the value is a "v1 placeholder," and the wiring required to populate it "does not yet exist." Downstream dashboards consume that field to display which compliance obligations a run verified. They always display zero. Operators have been trained to accept zero as the normal state.

This is not a bug awaiting a fix. It is a pattern this paper calls observability theater, following Majors, Fong-Jones, and Miranda ²⁰ , who use the phrase to describe the broader dysfunction in which the performance of monitoring substitutes for its substance. Their treatment centers on dashboard proliferation as ritual; we extend the term to its field-level instance — a named telemetry field that ships, emits on every event, satisfies schema validation, and carries no semantic content.

When a structured field emits a constant value on every event, operators stop reading it as a signal and begin reading it as background. A permanently empty field in a compliance-automation product is worse: it is an audit trail that passes structural validation while carrying no audit content. NIST SP 800-92 ¹⁵ requires that logs contain "sufficient detail for after-the-fact investigation." An always-empty obligations_referenced field fails that requirement by construction, on every event, silently.

The thesis

Structured telemetry with permanently empty key fields trains operators to trust signals that carry no information. That is worse than emitting nothing, because it produces false confidence in monitoring coverage that does not exist.

The paper proceeds from the specific instance, keller-platform's governance.pack.applied event, outward to the general anti-pattern of placeholder telemetry in compliance-sensitive systems. It then turns to the human-factors literature on signal normalization ¹¹,¹²,¹⁸, and finally to remediation strategies grounded in schema-data contract enforcement ⁷,⁸. Section 02 establishes what observability promises. Sections 03–05 dissect how the placeholder breaks that promise and how schema lock keeps it broken. Sections 06–07 explain why empty fields are not neutral. Section 08 shows how a parallel error-suppression pattern amplifies the damage. Sections 09–11 cover detection, remediation, and migration. Sections 12–14 cover related work, conclusions, and references.

Section 02

The Observability Promise

Observability is the property of a system that allows an operator to understand its internal state from external outputs alone ⁵ . That definition contains an implicit contract: the emitted outputs must be interpretable as evidence of what the system did. Traces, metrics, and logs each carry that contract. The OpenTelemetry Semantic Conventions ¹ formalize it directly — emitted attributes "MUST carry defined meaning," and the stability guarantees published by the project attach to the meaning of each field, not merely to its syntactic presence in the payload.

The four golden signals popularized by the Google SRE book ⁶ (latency, traffic, errors, and saturation) are actionable only when the underlying telemetry is populated. A governance dashboard that consumes obligations_referenced applies the same logic to a domain-specific signal, a count of verified compliance obligations, and treats it as a proxy for system health. If the field reads [] on every event, the dashboard operates on a signal with zero entropy: it tells the operator nothing while appearing to tell them something. The dashboard is doing exactly what it was wired to do; the signal it consumes simply has no information content.

Majors, writing about structured events, argues that the unit of observability is "one arbitrarily wide, densely populated event per request" ⁹,¹⁰. Density matters as much as structure. A sparse event, one whose named fields are always empty, is syntactically structured but semantically hollow. Majors makes the distinction explicit: syntactic structure (key-value formatting) is not semantic signal (fields that carry actionable information) ¹⁰. In keller-platform, governance.pack.applied achieves the former while failing the latter on the obligations_referenced dimension.

Niedermaier and colleagues, interviewing 28 software professionals across industry, identified missing semantic understanding of telemetry as a top practitioner pain point ³ . The obligations_referenced case is a more subtle failure than the one their study describes. The semantic understanding exists in the codebase — the docstring at lines 159–168 explains exactly what the field is intended to carry. What does not exist is the wiring that would populate it. The field is named with intent; it is emitted without content. That gap between documented intent and actual emission is the observability promise broken. Burgess's promise-theoretic treatment ⁴ makes this precise: a field that names a promise it never keeps has zero promise-keeping capacity, regardless of how reliably it fires. Operator observability — the human-interpretive side of telemetry — is the consumption-side expression of that promise-keeping capacity; when the capacity is zero at the source, the operator's interpretive work has nothing to engage with.

Section 03

The Placeholder Anti-Pattern

The governance.pack.applied event is emitted by emit_governance_pack_applied, defined at kai_workspace.py:147. Its parameter list closes at line 150 with decision_summary: str = "kai_run_complete" — itself flagged in the docstring as a v1 placeholder. The function iterates over ctx.loaded_packs (line 181) and calls logger.info (line 184) with an extra={} dict. Inside that dict, at line 197, sits the literal "obligations_referenced": []. The value is not computed. It is not derived from any pack object, run context, or post-evaluation artifact. It is a constant, embedded directly in the emission call (abridged — see kai_workspace.py:147–202 for full source):

emit_governance_pack_applied (abridged)

```python
def emit_governance_pack_applied(
    ctx: KaiWorkspaceContext,
    *,
    decision_summary: str = "kai_run_complete",
) -> None:
    """
    ...
    TODO: The default ``decision_summary='kai_run_complete'``
    is a v1 placeholder, and ``obligations_referenced`` is currently emitted
    as an empty list. The ``obligations_referenced`` field on
    ``governance.pack.applied`` is intended to carry real signal — specifically,
    which packs were cited and which obligations were satisfied during the run.
    That wiring does not yet exist: it requires post-run introspection of Kai's
    ...
    """
    for pack in ctx.loaded_packs:
        try:
            logger.info(
                "governance.pack.applied",
                extra={
                    # ... (other fields elided)
                    "obligations_referenced": [],
                    # ... (other fields elided)
                },
            )
        except Exception:
            logger.exception(...)
```

The docstring at lines 159–168 is unusually honest about the gap. The phrase "That wiring does not yet exist" appears at line 164. The author named the placeholder, marked it v1, and identified the missing component (post-run introspection of Kai's evaluation results). The TODO is embedded in documentation, not in an issue tracker, not in a sprint backlog, and not in any migration plan visible at the call site. Documentation labels do not constrain lifetime.

The same function carries an in-code comment at lines 194–198 that goes further than the docstring. It articulates a positive design rationale for emitting the empty list: the field is "emitted as an empty list so downstream consumers can rely on the field being present and list-typed even before real extraction lands." This is the schema-stability defense in the author's own words. It is a coherent argument that deserves engagement on its own terms, and §04 returns to it as the strongest counter-position to this paper's thesis.

Two superficially similar placeholder states need to be kept apart. A placeholder with an owner, a target version, and a visible debt counter is a tracked migration, the same pattern as OpenTelemetry's experimental → stable semantic-convention promotion flow. A placeholder with none of those (no linked issue, no due date, no instrument that surfaces its rate of emission) is the anti-pattern. The defect in obligations_referenced is not that it began as a placeholder. It is that the placeholder shipped without any of the constraints that would force its eventual completion.
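The tracked-versus-untracked distinction can be checked mechanically. The sketch below assumes a hypothetical in-repo registry of placeholder fields. PlaceholderField, classify, and the three labels are illustrative names, not keller-platform code. An entry that lacks any of the constraints named above is classified as the anti-pattern.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PlaceholderField:
    """Hypothetical registry entry for a telemetry field shipped as a placeholder."""

    event: str
    field: str
    owner: Optional[str] = None           # accountable team or person
    issue: Optional[str] = None           # tracker reference; None means untracked
    target_version: Optional[str] = None  # release in which real data must land
    due: Optional[date] = None            # exit criterion expressed as a date


def classify(entry: PlaceholderField, today: date) -> str:
    """Label a placeholder 'tracked', 'overdue', or 'anti-pattern'."""
    constraints = (entry.owner, entry.issue, entry.target_version, entry.due)
    if not all(constraints):
        # Missing any constraint that would force completion: untracked debt.
        return "anti-pattern"
    if today > entry.due:
        # Tracked, but past its exit date: still debt, now visibly late.
        return "overdue"
    return "tracked"


legacy = PlaceholderField("governance.pack.applied", "obligations_referenced")
print(classify(legacy, date(2025, 1, 1)))  # anti-pattern
```

Run against every entry in CI, the "overdue" and "anti-pattern" counts become the visible debt counter that the untracked placeholder lacks.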

The function also illustrates, in the same scope, the correct alternative pattern. At line 176, when ctx.loaded_packs is empty, the function emits governance.pack.emit_skipped, a warning event with a populated reason code. That event carries information about the condition under which it fires. The same author, in the same function, knew how to emit a content-carrying event when the design called for one. The contrast between governance.pack.emit_skipped (populated on condition, semantically active) and obligations_referenced: [] (named but empty, permanently inert) is the distinction this paper draws.

Shkuro and colleagues, in the canonical schema-first telemetry paper ⁷, describe the platform-level version of this failure mode. Telemetry platforms that treat dimensions as strings without semantic metadata cannot support compile-time validation or privacy enforcement, and locking schemas before the signals are real freezes that gap in place. The field name obligations_referenced is stable. The schema accepting an array is stable. The value never varies and never carries data. No schema tooling will flag the discrepancy, because the structural contract (an array under a named field) is satisfied.

AlSayyad and colleagues formalize structured logs as schema-compliant transformations L(S:E:C)→R with four required properties ⁸ . One of those properties is fidelity: the log record must faithfully represent the system event. An event that emits obligations_referenced: [] regardless of which packs were applied or which obligations were evaluated fails fidelity by construction. The log record does not represent the system event. It represents a placeholder that substitutes for the system event.

The placeholder pattern has a lifecycle. It begins as a known, named, explicitly labeled shortcut. But labeling a placeholder is not the same as constraining its lifetime. Once the event reaches downstream consumers (dashboards, alerting rules, audit pipelines), the placeholder is no longer a private implementation detail. It becomes a contract. Fixing it then requires coordinating with every consumer, so the cost of remediation rises while the visible cost of inaction stays at zero. The placeholder persists not because anyone wants it to, but because nothing in the system makes its persistence visible.

Section 04

Why Empty Fields Are Worse Than No Fields

The intuitive assumption is that an empty field is neutral — it costs nothing, carries nothing, means nothing. The human-factors literature shows this assumption is wrong. Parasuraman and Manzey, in a paper with more than 2,100 citations, demonstrate that humans overtrust automated systems that consistently confirm expected states ¹⁸ . The mechanism is automation-induced complacency: when an automated system reports a stable output reliably, operators reduce their scrutiny of that output. The output stops registering as a signal and starts functioning as background.

Applied to telemetry: a field that always reads [] trains operators to treat [] as the expected state. After enough exposures, the operator no longer questions whether [] is correct, whether it indicates missing data, or whether it might one day carry real content. The field becomes transparent — visually present but cognitively dismissed. This is not a failure of operator attention. It is the predictable outcome of how human attention responds to invariant stimuli ¹⁸ .

The alarm-fatigue research reviewed next concerns loud false positives, while obligations_referenced is a silent false constant. The bridge between the two is the same mechanism, not a metaphor. Parasuraman and Manzey establish that the controlling variable is the invariance of the stimulus, not its loudness. What conditions an operator to disengage scrutiny is the stable confirmation of an expected state. That state may be reported by a chime that never indicates real danger or by a field that always reads []. The cognitive system economizes on attention to invariant inputs regardless of modality. This is why alarm-fatigue findings from clinical and security contexts transfer to silent-field fatigue in compliance telemetry: the perceptual loudness differs, but the underlying attention dynamic does not.

The clinical analog is close. Lewandowska and colleagues reviewed seven studies covering 389 ICU nurses ¹¹. They found that 85–99% of alarms in intensive care environments are clinically insignificant, and that 79–100% of nurses report that nuisance alarms reduce confidence in monitoring systems as a whole. The governance telemetry case is structurally parallel, with one important inversion. A nuisance alarm fires loudly and wrongly; an empty field is silent and wrong. The empty field produces no alert to dismiss. It simply reads zero, on every event, indefinitely. Operators have nothing to respond to and therefore nothing to flag as a problem.

Tariq and colleagues document the same dynamic in security operations centers ¹². High false-positive rates produce desensitization, decreased attention to detail, and increased risk of overlooking genuine incidents. The extension this paper draws is that desensitization applies to false-constant signals as much as to false-positive signals. Consider an SOC analyst who sees "0 critical alerts" every hour on a dashboard that is structurally incapable of producing non-zero output. That analyst has been trained to treat "0 critical alerts" as a healthy baseline, including in the case where the monitoring system itself is broken.

The complacency mechanism

An empty field is not neutral. It is a vote cast on every event for the proposition that zero is normal. After enough votes, zero becomes the baseline. Any future deviation, even a true positive once the wiring is finally built, then reads as an anomaly that requires justification rather than action.

Three counter-arguments deserve explicit treatment because each captures a half-truth.

First, the schema-stability defense: emitting a stable [] lets downstream consumers wire parsing logic against a guaranteed-present, list-typed field before producers populate it, as the inline comment at lines 194–198 argues. This is a real engineering benefit, and it is the strongest defense of the current code. The same benefit is deliverable through a status: placeholder schema annotation that consumers can read and surface — exactly the soft-deprecation path §10 proposes — without committing the audit-trail pathology of an indistinguishable empty value. Schema stability for unborn consumers is a valid goal; it does not license permanent semantic hollowness in a compliance product.

Second, the machine-consumer defense: many compliance pipelines are unattended, and a typed empty list is a value a downstream attestation processor can branch on without any cognitive cost. As far as operator habituation goes, this objection is correct: machine consumers do not get fatigued. The structural-vs-semantic gap from §05 still applies, because the field cannot discriminate "no obligations evaluated" from "wiring does not exist." But the worse-than-nothing claim in this paper is scoped specifically to operator-mediated consumption: dashboards, alerting rules, and audit-review workflows read by humans. Where governance telemetry flows only into machine pipelines that explicitly treat empty as a sentinel, the cognitive harm is absent. The audit-trail harm remains.

Third, the phased-rollout defense: OpenTelemetry's experimental → stable semantic-convention lifecycle ships fields before populators exist, as a deliberate accommodation of staged migrations. This is correct, and it is why §03 introduced the distinction between tracked placeholders (owner, due date, debt counter) and the version this paper calls the anti-pattern (none of those). A placeholder inside a planned migration is not theater. A placeholder that has been emitting on every event since first deploy, with no owner and no exit criterion, has stopped being a placeholder in the OTel sense and become a permanent emission with a misleading name.

A note on what this critique is not. Best-effort telemetry — non-blocking logging that swallows emission errors so that the application can continue — is a valid engineering choice. The complaint here is not about non-blocking emission policy. The complaint is about hollow schema commitments: named fields that promise specific semantic content and emit none. A try/except around logger.info is a policy decision; an "obligations_referenced": [] literal in the payload is a contract violation. The two are independent, and conflating them obscures the real issue.

Section 05

The Schema Lock Problem

Once a telemetry schema reaches downstream consumers, it is effectively frozen. This is the schema lock problem. The structure of the event is stable; the semantics of the fields are not — because one of them was never populated to begin with. Shkuro and colleagues identify premature schema stabilization as a first-class risk: locking schemas before signals are real creates situations where the schema cannot evolve without breaking consumers ⁷ . Burgess's promise-theoretic framework ⁴ sharpens the point at the field level: a frozen schema that declares promises its producers cannot keep transmits a contract whose terms cannot be satisfied, and downstream interpretation accumulates against that empty contract over the entire production lifetime of the field.

The OpenTelemetry specification ² addresses structural evolution through telemetry schemas. Producers attach a Schema URL identifying the schema version their telemetry follows. The schema file published at that URL declares transformations, such as attribute renames, that let consumers convert data between versions. Schema versioning handles renames and similar structural changes. It does not handle semantic drift. Consider a field that keeps the same name, the same type (array), and the same structural position, but goes from "placeholder empty" to "populated with real obligations." It is structurally identical across versions while being semantically different. No versioning machinery in the OTel ecosystem detects this kind of change.

The result is that schema lock in the governance.pack.applied case is worse than the tooling anticipates. OTel Weaver, the CLI tool for validating semantic convention compliance, will not flag obligations_referenced: [] as a violation. The schema declares the field as an array; the emitted value is a valid empty array; structural validation passes. The semantic contract — that the field carries the obligations referenced during the run — is permanently violated, but no automated check sees it.

AgentTrace ⁸ proposes a consistency property: log records for the same event type must be structurally and semantically consistent. An event that is structurally consistent (always an array) but semantically inconsistent (sometimes empty because no obligations were checked, sometimes empty because the wiring does not exist) fails the consistency test. The deeper problem is that the current implementation cannot distinguish between these two states. [] means "no obligations were referenced" and it also means "the wiring to reference obligations does not yet exist." These are different system states with the same telemetry representation. A consumer cannot tell them apart. An auditor cannot tell them apart. The field has lost the ability to discriminate between two materially different facts.
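A minimal sketch of the discrimination loss, under stated assumptions. v1_payload mimics the current literal. explicit_payload is a hypothetical alternative that carries the run state in an obligations_status attribute. Neither function, nor that attribute, exists in keller-platform. The point is only that the two states collapse under the v1 payload and separate once the state is carried explicitly.

```python
from typing import Optional, Sequence


def v1_payload(evaluated: Optional[Sequence[str]]) -> dict:
    """Mimic the current emission: a literal [] whatever the run did.

    `evaluated` is None when the wiring does not exist, otherwise the (possibly
    empty) obligation ids the run evaluated. v1 ignores it, as the literal does.
    """
    return {"obligations_referenced": []}


def explicit_payload(evaluated: Optional[Sequence[str]]) -> dict:
    """Hypothetical alternative that carries the run state explicitly."""
    if evaluated is None:
        return {"obligations_referenced": [], "obligations_status": "not_wired"}
    return {"obligations_referenced": list(evaluated), "obligations_status": "evaluated"}


# "No obligations were evaluated" vs "the wiring does not exist":
print(v1_payload([]) == v1_payload(None))              # True: indistinguishable
print(explicit_payload([]) == explicit_payload(None))  # False: discriminated
```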

Section 06

Alarm Fatigue and Signal Normalization

The human-factors research on alarm fatigue supplies the theoretical backbone for the claim that hollow telemetry is worse than absent telemetry. Lewandowska and colleagues show that clinically insignificant alarms do not merely waste attention ¹¹; they actively damage the monitoring relationship. Nurses who experience high rates of nuisance alarms develop reduced confidence in the monitoring system itself. When a real alarm fires after a long stretch of insignificant ones, responses are slower and vigilance is lower than they would be against a hypothetical baseline of fewer but accurate alarms.

The obligations_referenced: [] pattern creates a structurally analogous condition. A compliance engineer who monitors governance events sees obligations_referenced: [] on every event. Over time, this is internalized as: governance events do not carry obligation data. When obligation data eventually becomes available — either because the wiring is built or because a future change accidentally populates the field — the change may not be noticed. The field has been filtered from conscious attention. The operator's mental model no longer includes the possibility that the field could carry information.

Tariq and colleagues trace the progression in security operations: high false-positive rates lead to desensitization, desensitization to decreased attention, and decreased attention to missed genuine incidents ¹² . The governance telemetry case compresses this progression into a degenerate form. The false-positive rate is effectively 100% — every event is misleading on the obligations_referenced dimension. Desensitization is immediate because the constancy is total. The missed genuine incident is the one that occurs the day after the wiring is built, when the field finally carries real data that operators have been trained to ignore.

Sarter, Woods, and Billings define "automation surprises" as breakdowns in mode awareness — operators discover that an automated system behaved contrary to their mental model, but only after the fact ¹⁹ . The governance telemetry case produces an automation surprise in reverse. The system has been behaving contrary to its documented intent since the event was first shipped. Operators have not noticed because the constant output gave them no signal of the discrepancy. The surprise arrives during an audit, not during operations. By then, the question is no longer "what is the system doing now" but "what has the system not been telling us for the entire production lifetime of this event."

Section 07

Compliance Theater

In a compliance-automation product, governance telemetry is not optional instrumentation. It is the audit trail. NIST SP 800-92 establishes that logs must contain "sufficient detail for after-the-fact investigation and regulatory compliance" ¹⁵. The 2023 draft revision, SP 800-92 Rev. 1, extends this requirement to cloud-native architectures while preserving the semantic completeness standard ¹⁶. A governance event that permanently emits obligations_referenced: [] fails that requirement on every event, for every Kai run, since the event was first shipped to production.

The SOC 2 criteria are more specific ¹⁷ . CC4.1 and CC4.2 govern continuous monitoring controls; CC7.1 through CC7.5 require that monitoring detect and respond to security events. A compliance-automation product that emits governance events to satisfy continuous monitoring obligations must ensure those events carry the semantic content the criteria demand. An audit of the governance.pack.applied event stream would find an event that fires reliably — satisfying the structural expectation — and carries no obligation data — failing the semantic expectation. The audit trail is present. The audit evidence is absent.

The phrase "compliance theater" captures this precisely. It is the performance of compliance monitoring without the substance of compliance monitoring. Vaughan's account of the Challenger decision ¹³ anchors the social dynamic: deviations from a safety norm, once labeled and shipped, become rationalized as the new baseline; the audit trail of memos documenting the deviation does not, by itself, prevent the deviation from being normalized as acceptable. A dashboard populated with obligations_referenced counts displays zeros and is interpreted as "no obligations were violated." The correct interpretation is "no obligations were referenced, because the field has never been wired to anything." These two interpretations have radically different implications for audit readiness. The event stream provides no way to distinguish between them.

Compliance theater

The audit trail exists. The audit evidence does not. Every governance.pack.applied event is a receipt for a transaction whose line items were never filled in.

A reasonable objection is that internal engineering teams know the field is a placeholder. That defense holds only as long as the team that knows is the team that reads the events. The day an external auditor, a new compliance hire, or a downstream regulator queries the event stream, the placeholder presents as audit evidence. The internal context that would identify it as theater is invisible at the point of consumption. A docstring at line 164 of kai_workspace.py is not a control documented in a SOC 2 audit binder.

Section 08

The Error-Suppression Amplifier

The placeholder does not live in isolation. It sits inside an error-suppression pattern that amplifies the damage. The try/except policy itself is not the anti-pattern; the amplification arises only because the placeholder leaves no other signal. The emit_governance_pack_applied function iterates over ctx.loaded_packs (line 181) and wraps each pack's emission in a try/except Exception: block that closes at line 202. If the logger.info call at line 184 raises any exception, the exception is logged and the loop continues to the next pack. The governance event for that pack is dropped, and nothing in the governance.pack.applied stream records which packs lost their telemetry.

This is log-and-continue, and in isolation it is a defensible policy. A telemetry emission failure should not crash a compliance run. In combination with a permanently empty obligations_referenced, however, the policy creates a second layer of information loss. Operators already receive zero obligation data per event. Events can begin dropping when the except Exception: swallows a serialization error or a logging backend timeout. The event count then drops too, and nothing in the event stream marks the gap. Operators have no baseline for how many events to expect, because the field that would have anchored such an expectation has always been empty.
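Log-and-continue does not have to leave the loss invisible. The sketch below keeps the best-effort policy but counts what it absorbed. emit_per_pack, the emit callable, and the governance.pack.emit_summary event name are hypothetical. The emit callable stands in for the per-pack logger.info call.

```python
import logging
from typing import Callable, Iterable, List

logger = logging.getLogger("governance")


def emit_per_pack(pack_ids: Iterable[str], emit: Callable[[str], None]) -> List[str]:
    """Emit one event per pack, log-and-continue on failure, and report losses."""
    attempted: List[str] = []
    dropped: List[str] = []
    for pack_id in pack_ids:
        attempted.append(pack_id)
        try:
            emit(pack_id)
        except Exception:
            # Same best-effort policy as the original loop: never crash the run.
            logger.exception("governance.pack.applied emission failed")
            dropped.append(pack_id)
    # One extra event turns the absorbed failures into a countable quantity.
    logger.info(
        "governance.pack.emit_summary",
        extra={"packs_attempted": len(attempted), "packs_dropped": dropped},
    )
    return dropped


def flaky_emit(pack_id: str) -> None:
    if pack_id == "pack-b":
        raise ValueError("serialization failed")


print(emit_per_pack(["pack-a", "pack-b", "pack-c"], flaky_emit))  # ['pack-b']
```

The summary event gives operators the missing baseline: attempted versus dropped, per run.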

A related but importantly different pattern appears elsewhere in the same file, and the difference is worth being careful about. _download_grc_documents, defined at line 457, runs a per-document fetch loop starting at line 494. Each iteration is wrapped in a try/except that logs a governance.pack.error event and continues, spanning lines 506–522. The inline comment at lines 500–505 makes the policy deliberate: fetch failures are tolerated and the loop continues, while write failures further down the function are re-raised and abort preparation, on the explicit reasoning that a half-empty GRC context is unsafe to operate on. This is not indiscriminate suppression; it is a defended distinction between recoverable and non-recoverable failure modes at the data-acquisition layer.

The narrower claim of this paper is therefore not that the file's error policy is broken. It is that the obligations_referenced field, were it populated, would be the natural place to record which documents were actually resolved against a pack — and that the absence of population converts the deliberate fetch-failure tolerance into an invisible amplifier. A pack reported as "applied" in governance.pack.applied may have been applied against an incomplete document set, and the event stream provides no way to detect this from the outside.

A third instance appears at lines 335–341, where mark_synced is wrapped in a try/except Exception: whose log message at line 339 reads "clone succeeded, continuing." The comment acknowledges that the state record may be incorrect while execution proceeds. Across these three cases, errors are absorbed, execution continues, and the telemetry emitted after absorption carries no record of what was absorbed. The obligations_referenced: [] literal is co-located with, but logically independent of, the broader try/except policy. The policy is defensible; the placeholder is the defect; the policy becomes load-bearing for the defect only because the placeholder leaves no other signal that an absorption ever occurred.

Section 09

Detection Patterns

Finding this anti-pattern in a production codebase requires looking for the intersection of three conditions: a telemetry field with a meaningful semantic name, a constant or always-empty value at the call site, and an accompanying comment, TODO, or docstring acknowledging that the value should be dynamic.

The third condition is the most reliable signal. Production code that carries a comment like "v1 placeholder," "not yet implemented," or "wiring does not yet exist" in proximity to a telemetry emission call is a strong indicator. The keller-platform docstring at lines 159–168 is unusually clear. Subtler variants include inline comments such as "# TODO: populate from run context" or "# placeholder until post-run introspection available".

Structural detection complements comment search. A field in a telemetry event that exhibits zero variance across thousands of events is a candidate for inspection. A cardinality analysis of the governance.pack.applied event stream would surface obligations_referenced as a single-valued field: its cardinality is one, because every event carries the identical value. Single-valued fields in structured events are either genuinely invariant (a schema version, for instance) or placeholder constants. The distinction is made by reading the code at the emission site.
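A cardinality pass over a sample of events needs only the standard library. The sketch assumes events are available as dictionaries, for example decoded from the warehouse. The function names and the 1,000-event minimum are illustrative.

```python
import json
from collections import defaultdict
from typing import Dict, List, Mapping, Set


def field_cardinality(events: List[Mapping[str, object]]) -> Dict[str, int]:
    """Number of distinct values each field takes across a sample of events."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for event in events:
        for name, value in event.items():
            # Lists and dicts are unhashable, so compare a canonical JSON rendering.
            seen[name].add(json.dumps(value, sort_keys=True, default=str))
    return {name: len(values) for name, values in seen.items()}


def single_valued_fields(events: List[Mapping[str, object]], min_events: int = 1000) -> Set[str]:
    """Fields with cardinality one over a large enough sample: candidates to inspect."""
    if len(events) < min_events:
        return set()  # too few events to call any field invariant
    return {name for name, n in field_cardinality(events).items() if n == 1}


sample = [
    {"event": "governance.pack.applied", "pack_id": f"pack-{i % 7}", "obligations_referenced": []}
    for i in range(1000)
]
print(sorted(single_valued_fields(sample)))  # ['event', 'obligations_referenced']
```

The sketch also flags event, which is genuinely invariant. That is expected: cardinality narrows the candidates, and the code read at the emission site makes the call.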

The Burgess promise-theoretic frame ⁴ provides a formal lens: a field that never carries information has zero promise-keeping capacity. Programmatic detection follows: compute the Shannon entropy of each field's value distribution over a sliding window of events. Fields whose entropy falls below a threshold and whose names imply variable content are candidates for audit. For compliance products specifically, the NIST SP 800-92 requirements ¹⁵ supply a per-event checklist. Each governance event should record what action was taken, by whom, on what resource, with what outcome, and against which policy or obligation. obligations_referenced: [] fails the last criterion on every event.
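The sliding-window entropy check can be sketched as follows. The window size and the 0.01-bit threshold are assumptions to tune per stream, and the class name is hypothetical. The name-based filter (names that imply variable content) is left to the reviewer.

```python
import json
import math
from collections import Counter, defaultdict, deque
from typing import Deque, Dict, Iterable, Mapping


def shannon_entropy(values: Iterable[str]) -> float:
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return sum((-(n / total) * math.log2(n / total) for n in counts.values()), 0.0)


class EntropyMonitor:
    """Per-field entropy over the most recent `window` events."""

    def __init__(self, window: int = 1000, threshold_bits: float = 0.01) -> None:
        self.window = window
        self.threshold_bits = threshold_bits
        self._history: Dict[str, Deque[str]] = defaultdict(lambda: deque(maxlen=self.window))

    def observe(self, event: Mapping[str, object]) -> None:
        for name, value in event.items():
            # Canonical JSON so list- and dict-valued fields compare by content.
            self._history[name].append(json.dumps(value, sort_keys=True, default=str))

    def low_entropy_fields(self) -> Dict[str, float]:
        flagged: Dict[str, float] = {}
        for name, buf in self._history.items():
            if len(buf) < self.window:
                continue  # window not yet full: too early to judge
            h = shannon_entropy(buf)
            if h < self.threshold_bits:
                flagged[name] = h
        return flagged


monitor = EntropyMonitor(window=500)
for i in range(500):
    monitor.observe({"pack_id": f"pack-{i % 4}", "obligations_referenced": []})
print(sorted(monitor.low_entropy_fields()))  # ['obligations_referenced']
```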

Symptom	Smell-test	Confirmation step
Field name suggests rich content, displays as zero on dashboards	Operators describe the field as "always empty, ignore it"	Read the emission call site for a literal [], None, or ""
Comment near telemetry emission contains "TODO", "v1", "placeholder", "not yet"	Comment exists but no linked issue or sprint owns it	Grep for the field name across the repo; check whether any producer assigns a non-empty value
Field has a single distinct value across many events	Shannon entropy of the value distribution is zero	Sample 1,000 events from the warehouse; if all values match, confirm the placeholder via a code read
Schema declares the field but no consumer validates non-emptiness	Schema validator and OTel Weaver both pass with the empty payload	Add a runtime assertion or pre-emit hook that flags an empty value under conditions where the schema expects it to be populated
Field appears in compliance dashboards as a count or list	Dashboard reads zero indefinitely; no one has filed a regression	Trace dashboard query back to event field; compare displayed value to expected business meaning

The table is not exhaustive, but each row corresponds to an evidence type encountered in the keller-platform case. Any one row is suggestive; two or more in combination move the case from "possible placeholder" to "almost certainly placeholder."

Section 10

The Remedy

The fundamental remedy is a field-level liveness contract: a schema-data contract that asserts not only the type and structure of a field but the conditions under which the field must be non-empty. For obligations_referenced, the contract reads: this field MUST contain at least one obligation identifier when a governance pack has been applied and the run completed evaluation. Emitting an empty list when the run evaluated obligations is a contract violation, not valid data.
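One way to make the contract executable is to pair the field name with the condition under which it must be populated. In this sketch, LivenessContract is a hypothetical type, and the event attributes pack_applied and evaluation_completed are assumed stand-ins for whatever keller-platform actually records about pack application and evaluation. A missing field is treated the same as an empty one.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping


@dataclass(frozen=True)
class LivenessContract:
    """A field plus the condition under which it MUST be non-empty."""

    field: str
    condition: Callable[[Mapping[str, Any]], bool]
    description: str

    def violated_by(self, event: Mapping[str, Any]) -> bool:
        # Missing and empty are treated alike: neither carries audit content.
        return self.condition(event) and not event.get(self.field)


# Hypothetical attribute names; substitute whatever the product records.
OBLIGATIONS_CONTRACT = LivenessContract(
    field="obligations_referenced",
    condition=lambda e: bool(e.get("pack_applied")) and bool(e.get("evaluation_completed")),
    description="MUST contain at least one obligation id when a pack was applied "
    "and the run completed evaluation",
)
```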

Implementing the contract has two components. First, the wiring. emit_governance_pack_applied must receive, as a parameter, the list of obligations evaluated during the run. The docstring at lines 159–168 already specifies what this requires: post-run introspection of Kai's evaluation results. The current function signature at lines 147–150 accepts no parameter for that data. The target state is a new parameter that carries actual run data into the v2 field, with no default-empty escape hatch. The default-empty literal is the proximate cause of the placeholder's persistence; removing it from the v2 emission forces the wiring decision into the open. This target applies to the new field and is not a same-day breaking change to the legacy emission; §11 sets out the migration bridge in which the legacy obligations_referenced continues to emit [] while the v2 field carries the real data.
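A sketch of the v2 signature, assuming a keyword-only design. build_pack_applied_event, pack_id, and run_id are hypothetical names (the real parameters at lines 147–150 are not reproduced here). The point is that obligations_evaluated has no default, so a caller that has not done the wiring fails with a TypeError instead of silently emitting [].

```python
from typing import Any, Dict, Sequence


def build_pack_applied_event(
    *,
    pack_id: str,
    run_id: str,
    obligations_evaluated: Sequence[str],  # required, keyword-only, no default
) -> Dict[str, Any]:
    """Hypothetical payload builder that emit_governance_pack_applied could call."""
    return {
        "event": "governance.pack.applied",
        "pack_id": pack_id,
        "run_id": run_id,
        # Legacy field keeps its current value until consumers migrate (see §11).
        "obligations_referenced": [],
        # v2 field: real run data, deduplicated and sorted for deterministic output.
        "obligations_referenced_v2": sorted(set(obligations_evaluated)),
    }
```

Deduplicating and sorting the identifiers is a design choice that keeps the emitted list stable across otherwise identical runs; a product that needs evaluation order would keep the list as given.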

Second, the validation. At the point of emission, assert that the list is non-empty when a governance pack has been applied and the run completed evaluation. In development and test environments, the assertion can fail hard. In production, it can log a warning event that consumers can subscribe to. Shkuro and colleagues argue that compile-time validation of telemetry fields is achievable when schemas carry semantic metadata ⁷ . A field annotated as required_non_empty_on_condition: pack_applied can be checked by a pre-emit hook before the event leaves the application. The OpenTelemetry Weaver tooling provides the infrastructure for this kind of extended semantic validation, though the validation rule itself needs to be added at the schema layer.
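A minimal pre-emit hook following that split between environments. APP_ENV, LivenessViolation, pre_emit_check, and the event attribute names are assumptions for illustration, and the checked field defaults to the v2 field that carries the contract. In production the sketch writes a standard-library logging warning, where a real deployment would more likely emit a structured warning event that consumers can subscribe to.

```python
import logging
import os
from typing import Any, Mapping, Optional

logger = logging.getLogger("governance.telemetry")


class LivenessViolation(Exception):
    """Raised in development and test when a liveness contract is broken."""


def pre_emit_check(
    event: Mapping[str, Any],
    field: str = "obligations_referenced_v2",
    env: Optional[str] = None,
) -> bool:
    """Return True if the contract holds.

    Outside production a violation raises; in production it is logged,
    False is returned, and the caller still emits the event.
    """
    env = env or os.environ.get("APP_ENV", "production")  # APP_ENV is an assumed variable
    must_be_populated = bool(event.get("pack_applied")) and bool(event.get("evaluation_completed"))
    if must_be_populated and not event.get(field):
        message = f"liveness violation: {field} empty for run {event.get('run_id')}"
        if env in ("development", "test"):
            raise LivenessViolation(message)
        logger.warning(message)  # production: warn, but do not drop the event
        return False
    return True
```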

The field-level liveness check has value before the wiring is complete. Introducing it as a monitoring instrument — a counter that increments whenever obligations_referenced: [] is emitted — converts a silent failure into a visible debt counter. Once the rate of empty emissions is visible on a dashboard, the business case for completing the wiring becomes tractable. Engineering leadership can see how many events per day are emitted as theater. Compliance leadership can see how many audit records per day fail the semantic completeness requirement of SP 800-92. The numbers make the cost of inaction legible for the first time since the placeholder shipped.
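The debt counter needs nothing more than a per-day tally of empty and total emissions. This sketch uses collections.Counter to stay self-contained; in a live system the same two increments would typically go to a metrics backend (an OpenTelemetry counter, for example). The class name, the report format, and the day-based bucketing are assumptions.

```python
from collections import Counter
from datetime import date
from typing import Any, Mapping


class EmptyEmissionCounter:
    """Hypothetical debt counter: empty vs. total emissions of one field, per day."""

    def __init__(self, field: str = "obligations_referenced") -> None:
        self.field = field
        self.empty: Counter = Counter()
        self.total: Counter = Counter()

    def record(self, event: Mapping[str, Any], day: date) -> None:
        self.total[day] += 1
        if not event.get(self.field):  # missing, [], None, and "" all count as empty
            self.empty[day] += 1

    def report(self, day: date) -> str:
        return f"{day.isoformat()}: {self.empty[day]} of {self.total[day]} {self.field} emissions empty"
```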

A complementary remedy is the soft-deprecation path for placeholders that already exist in production schemas. When a field is known to be empty, the schema can carry an annotation — status: placeholder or liveness: pending — that consumers can read and surface in dashboards. A dashboard panel rendering obligations_referenced could display a warning badge until the field's status changes to live. This is a workaround, not a substitute for the wiring, but it makes the placeholder visible to consumers who otherwise have no way to know.
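A sketch of how a dashboard layer might read such an annotation. The annotation keys mirror the status: placeholder and liveness: pending examples above; how annotations reach the dashboard (schema registry, config file) is left open, and panel_badge and the pack_id entry are hypothetical.

```python
from typing import Mapping, Optional


def panel_badge(field: str, annotations: Mapping[str, Mapping[str, str]]) -> Optional[str]:
    """Return a warning badge for a dashboard panel, or None if no placeholder status is declared."""
    meta = annotations.get(field, {})
    if meta.get("status") == "placeholder" or meta.get("liveness") == "pending":
        return f"Placeholder: {field} is not yet wired; zero here carries no audit content"
    return None


# Illustrative annotations; the keys mirror the examples in the text.
ANNOTATIONS = {
    "obligations_referenced": {"status": "placeholder", "liveness": "pending"},
    "pack_id": {"liveness": "live"},
}
```

Unannotated fields get no badge here; a stricter policy would treat a missing annotation as unknown and badge it as well.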

Section 11

Migration Path

Migrating from hollow telemetry to real signal without breaking consumers requires careful sequencing. The risk is not purely technical. Downstream dashboards and alerting rules have been calibrated, implicitly or explicitly, to [] as the expected value. When the field is populated for the first time, consumers may interpret non-empty arrays as anomalies.

01

Step 1 — Audit consumers.

Before changing the emission, enumerate every system that reads obligations_referenced from governance.pack.applied events. This includes dashboards, alerting rules, compliance report generators, and audit pipeline processors. The consumer audit establishes the blast radius and identifies which teams need notification before the field changes meaning. Keller-platform's specific consumer graph was outside the scope of the code evidence available for this paper; the blast-radius analysis here is architectural inference, not verified inventory.
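Part of the consumer audit can be mechanized by searching exported dashboard, alert, and pipeline definitions for the field name. The sketch assumes those definitions have been exported to files under one directory, which is itself an assumption: consumers whose configuration lives only inside a vendor UI will not show up. find_consumers and its default suffix list are hypothetical; the word-boundary match keeps obligations_referenced_v2 from being counted as a legacy consumer.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, List, Union


def find_consumers(
    root: Union[str, Path],
    field: str = "obligations_referenced",
    suffixes: Iterable[str] = (".json", ".yaml", ".yml", ".sql", ".py"),
) -> Dict[str, List[int]]:
    """Map each exported definition file under `root` to the lines that mention `field`."""
    root = Path(root)
    pattern = re.compile(rf"\b{re.escape(field)}\b")  # excludes obligations_referenced_v2
    wanted = set(suffixes)
    hits: Dict[str, List[int]] = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in wanted:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        lines = [n for n, line in enumerate(text.splitlines(), start=1) if pattern.search(line)]
        if lines:
            hits[path.relative_to(root).as_posix()] = lines
    return hits
```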

02

Step 2 — Introduce a parallel field as the migration bridge.

Emit obligations_referenced_v2 alongside the existing obligations_referenced: []. The §10 target state (no default-empty escape hatch) applies to the v2 field. The legacy field continues to emit [] for backward compatibility, precisely so that consumers do not break on the day the wiring lands; it is permitted to remain hollow only for the bounded period in which its consumers have not yet migrated. Consumers move to the v2 field incrementally, on their own schedule. Once all consumers have migrated, the legacy field can be removed in a versioned schema update. The OpenTelemetry Telemetry Schemas mechanism supports this kind of transition through schema migration rules that map field names across versions.
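On the consumer side, the migration is safest when a reader distinguishes "no obligations" from "not wired". A hedged sketch, assuming events arrive as plain mappings: read_obligations (a hypothetical helper) prefers the v2 field and returns None for legacy-only events, because a legacy [] carries no information and must not be counted as zero obligations.

```python
from typing import Any, List, Mapping, Optional


def read_obligations(event: Mapping[str, Any]) -> Optional[List[str]]:
    """Obligation ids for a migrated event, or None when the event predates the v2 wiring."""
    if "obligations_referenced_v2" in event:
        return list(event["obligations_referenced_v2"])
    # Legacy-only event: [] means "not wired", not "no obligations", so report unknown.
    return None
```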

03

Step 3 — Emit field-level liveness metrics during and after the migration.

A metric that counts events in which obligations_referenced is empty has value in every phase. Before migration, it confirms the baseline: every event is empty. During migration, the falling share of empty events traces the rate at which real data is becoming available. After migration, any renewed rise in empty emissions signals a regression in the wiring. The same instrument that surfaces the original problem becomes the regression guard once the problem is fixed.
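The regression guard in the last phase reduces to comparing the latest empty share against a trailing baseline. The seven-day window and five-point tolerance below are illustrative values, not recommendations, and both function names are hypothetical. During migration the share falls, so the check stays quiet; after migration, any rise above the baseline trips it.

```python
from statistics import mean
from typing import Any, Mapping, Optional, Sequence


def empty_share(events: Sequence[Mapping[str, Any]], field: str) -> Optional[float]:
    """Fraction of events in which `field` is missing or empty; None for no events."""
    if not events:
        return None
    return sum(1 for e in events if not e.get(field)) / len(events)


def is_regression(daily_shares: Sequence[float], window: int = 7, tolerance: float = 0.05) -> bool:
    """True if the latest day's empty share exceeds the trailing-window mean by more than `tolerance`."""
    if len(daily_shares) <= window:
        return False  # not enough history for a baseline
    baseline = mean(daily_shares[-window - 1:-1])
    return daily_shares[-1] > baseline + tolerance
```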

04

Step 4 — Update the schema stability annotation.

Once the field is populated, obligations_referenced transitions from undeclared placeholder to semantically active field. The schema annotation moves to required_non_empty for events where a pack was evaluated, making the contract machine-readable and enabling automated compliance checks for any future change that might silently re-empty the field.

The Dekker drift model ¹⁴ implies that migration cannot be treated as a purely technical exercise. The operators who have internalized [] as normal need explicit communication that the field is now populated and what non-empty values mean. Without that communication, the normalization of [] will persist even after the wiring is fixed. Operators will treat the first non-empty value as a monitoring anomaly requiring investigation rather than a successful migration to real signal. A migration plan that fixes the code and skips the human handoff replaces one form of theater with another.

Section 12

Related Work

The literature most directly relevant to this paper clusters in three areas.

The schema-first telemetry literature is anchored by Shkuro and colleagues ⁷ , whose positional paper for ACM SIGOPS makes the case for declaring semantic metadata at schema definition time rather than inferring it from field values at query time. AlSayyad and colleagues ⁸ , addressing agent system observability, formalize the fidelity and consistency properties that a placeholder event violates. The Burgess promise-theoretic treatment ⁴ supplies a more abstract foundation: observability is a property of promise-keeping, and a telemetry field that never carries data has zero promise-keeping capacity regardless of how reliably it fires. Practitioner work by Majors ⁹,¹⁰ anchors the same argument in industrial experience, distinguishing syntactic structure from semantic signal. The Sridharan practitioner reference ⁵ frames observability itself as the property that external outputs are sufficient to reason about internal state, a baseline expectation that an always-empty field structurally violates.

The alarm fatigue and normalization-of-deviance literature is the human-factors spine. Vaughan coined "normalization of deviance" in her study of the Challenger disaster — repeated exposure to a known deviation from safety norms leads to rationalization of the deviation as acceptable ¹³ . The keller-platform docstring is the engineering equivalent of the Challenger O-ring memos: the deviation is documented, named, acknowledged, and shipped. Dekker provides the mechanism ¹⁴ : gradual drift, without a discrete decision point, from "known placeholder" to "invisible baseline." Parasuraman and Manzey supply the cognitive science ¹⁸ : automation-induced complacency explains why operators stop scrutinizing constant-value outputs even when those outputs are nominally important. Lewandowska and colleagues ¹¹ and Tariq and colleagues ¹² document the operational consequences in clinical and security settings respectively.

The compliance telemetry literature is thinner but authoritative. NIST SP 800-92 ¹⁵ and its 2023 revision ¹⁶ establish the federal standard for log semantic completeness. The AICPA SOC 2 criteria ¹⁷ provide the audit framework that makes an always-empty obligations_referenced field a material finding rather than a technical debt item. Together, the NIST and AICPA sources establish that the governance telemetry failure described here is not an engineering quality issue. It is a compliance posture issue with audit consequences.

A gap is worth naming. The OpenTelemetry Semantic Conventions ¹ do not currently define conventions specific to compliance and audit events — obligations, regulatory frameworks, policy identifiers. Compliance products build bespoke schemas without community-standardized field names, which means each product invents its own placeholder traps independently. To our knowledge, no prior work has named the "placeholder anti-pattern" in telemetry as a distinct failure mode or analyzed its interaction with human-factors normalization dynamics. The contribution of this paper is to connect the code-level observation — a specific [] literal in a compliance product — to the broader pattern, the cognitive mechanism, and the compliance implications.

Section 13

Conclusion

The governance.pack.applied event in keller-platform is a case study in how good engineering intentions can still produce a system that is worse than no monitoring at all. The author documented the placeholder. The author named the TODO. The author structured the event correctly and shipped it through the proper logging interface. And the compliance audit trail it populates has been structurally hollow on every event since first deploy.

The broader pattern — placeholder telemetry fields in compliance-sensitive products — is likely more common than documented, precisely because the failure mode is silent. No alarm fires when obligations_referenced is empty — at least, not until the field-level liveness counter described in §10 is in place. No test fails. No schema validator complains. The only signal is the absence of signal: a field that never carries any information, slowly training operators to expect nothing from it.

The remedy is not only technical. Wiring the obligations_referenced field requires code changes — a new parameter on emit_governance_pack_applied, post-run introspection of Kai's evaluation results, an end to the default-empty escape hatch. Reversing the normalization of [] as a baseline requires operator communication, consumer migration, and explicit schema contracts that make the non-empty expectation machine-readable and machine-enforced. The technical fix without the human handoff replaces one form of theater with another.

Structured telemetry is a promise. A field named obligations_referenced promises to tell its consumers which obligations were referenced. When that field emits [] on every event, the promise is made and broken simultaneously, on every run, indefinitely — until someone notices that the dashboard has been counting zero for a very long time.

The closing argument

The work ahead is to make that noticing the default, not the exception.

Empty Fields and the Liveness Contract

Context

The Finding