
Fleet metabolism

Fleet metabolism is BMO’s runtime metabolism: the live state of the FleetController and FleetRegulationObserver, the sticky-degraded health flag, the bounded action × outcome histogram, and the KTD-5-redacted recent-event ring.

It is distinct from bmo metabolism inspect, which reads the static Adaptive Decision Manifest (ADM) spine and is a design-time surface. Fleet metabolism is wholly read-only and covers the running fleet. The two surfaces share neither code path nor read model.

Hand-drawn sketch: health signals feed a Healthy → Degraded → Recovering lifecycle with a KTD-5-redacted recent-event ring

A single fleetmetabolism.PostureSnapshot aggregates:

  • FleetController state — present/absent, mode (disabled, observe_only, active, paused), pause reason if any, leader flag, gate reason. Sourced from App.FleetControllerManager().FleetControllerStatus.
  • FleetRegulationObserver state — enabled-in-config flag, configured mode (disabled, observe_only, recommend_only, auto_apply), API error-rate count, synthetic-monitor count and critical count, last observed timestamp.
  • Sticky-degraded flag: StickyDegraded plus DegradedSince and a closed-set Reasons vocabulary (see below). The flag persists across at least one quiescence interval after the last fault; the next observed success emits metabolism.action degraded_recovered and clears the flag.
  • Action × outcome histogram — bounded count matrix over the closed MetabolismAction × Outcome enums.
  • Recent-event ring — fixed-capacity (32) FIFO of metadata-only RecentMetabolismEvent records, KTD-5 hashed identifiers only.

flowchart TB
  subgraph inputs [Signal ingress]
    healthSignals["POST /v1/health-signals"]
  end
  subgraph controller [FleetController]
    fleetCtrl["Mode, cycles, actuations"]
  end
  subgraph observer [FleetRegulationObserver]
    fleetObs["Rules, aggregates, recommendations"]
  end
  subgraph snapshot [PostureSnapshot]
    sticky["Sticky-degraded flag"]
    histogram["Action x outcome histogram"]
    ring["Recent-event ring cap 32"]
  end
  healthSignals --> fleetObs
  fleetObs --> fleetCtrl
  fleetCtrl --> snapshot
  fleetObs --> snapshot

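To make the shape of the aggregate concrete, here is a minimal Python sketch of a snapshot with a fixed-capacity recent-event ring. It is an illustration only: the field names and the RecentEvent record are assumptions, not the actual fleetmetabolism types, and only the capacity of 32 and the FIFO behavior come from the description above.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional

RING_CAPACITY = 32  # fixed capacity stated in the text


@dataclass(frozen=True)
class RecentEvent:
    """Metadata-only record (hypothetical fields); identifiers arrive pre-hashed."""
    action: str
    outcome: str
    session_id_hash: str


@dataclass
class PostureSnapshot:
    """Illustrative aggregate; the real fleetmetabolism type may differ."""
    controller_mode: str = "disabled"
    observer_mode: str = "disabled"
    sticky_degraded: bool = False
    degraded_since: Optional[str] = None
    reasons: List[str] = field(default_factory=list)
    recent: Deque[RecentEvent] = field(
        default_factory=lambda: deque(maxlen=RING_CAPACITY)
    )

    def push(self, event: RecentEvent) -> None:
        # deque(maxlen=...) evicts the oldest entry once the ring is full,
        # which gives the bounded FIFO behavior described above.
        self.recent.append(event)


snap = PostureSnapshot(controller_mode="observe_only", observer_mode="observe_only")
for i in range(40):
    snap.push(RecentEvent("observe", "ok", f"h{i:02d}"))
```

After 40 pushes the ring still holds 32 records, and the eight oldest have been dropped.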
| Surface | Scope | Use for |
| --- | --- | --- |
| bmo metabolism inspect (existing) | ADM manifest + decision spine (static) | Reviewing decision genealogy, ADM patches, gene rollups |
| bmo config show-fleet-metabolism (this feature) | FleetController + observer + ring (live) | Operator runtime posture, sticky-degraded triage, gate reasons |

bmo metabolism inspect does not read fleetmetabolism.PostureSnapshot; the two trees are intentionally disjoint to keep design-time review separate from runtime observability.

Reasons is a closed PostureReason enum:

| Reason | Meaning |
| --- | --- |
| controller_unreachable | FleetController manager is nil or returned a non-success status; runtime cannot read live mode/pause state |
| config_absent | [options.fleet_regulation] is missing or the regulation section did not load; observer cannot fire rules |
| policy_denied | A configured rule predicate failed the runtime policy gate (e.g. mode=observe_only with an auto_apply request) |
| persistence_unhealthy | Controller or observer state-store write failed; sticky-degraded persists until a successful op clears it |
| auto_apply_skipped_pending | An auto-apply rule matched but a prior actuation is still pending evaluation; the patch was deliberately skipped |
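The sticky-degraded lifecycle can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the class and method names are hypothetical, the quiescence interval is not modeled, and clearing Reasons on recovery is an assumption rather than documented behavior. Only the closed reason set and the degraded_recovered event come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Closed PostureReason vocabulary, copied from the table above.
POSTURE_REASONS = frozenset({
    "controller_unreachable",
    "config_absent",
    "policy_denied",
    "persistence_unhealthy",
    "auto_apply_skipped_pending",
})


@dataclass
class StickyDegraded:
    """Hypothetical model of the flag; not the real implementation."""
    degraded: bool = False
    degraded_since: Optional[float] = None
    reasons: List[str] = field(default_factory=list)

    def mark(self, reason: str, now: float) -> None:
        if reason not in POSTURE_REASONS:
            raise ValueError(f"unknown posture reason: {reason}")
        if not self.degraded:
            self.degraded = True
            self.degraded_since = now  # keep the timestamp of the first fault
        if reason not in self.reasons:
            self.reasons.append(reason)

    def on_success(self) -> Optional[dict]:
        """Clear the flag and return the recovery event to emit, if any."""
        if not self.degraded:
            return None
        self.degraded = False
        self.degraded_since = None
        self.reasons.clear()  # assumption: reasons reset on recovery
        return {"msg": "metabolism.action",
                "action": "degraded_recovered",
                "outcome": "ok"}


flag = StickyDegraded()
flag.mark("persistence_unhealthy", now=100.0)
flag.mark("policy_denied", now=105.0)
recovery = flag.on_success()
```

Rejecting any reason outside the closed set keeps the vocabulary from drifting, which is the point of making PostureReason an enum.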

MetabolismAction is bounded to: observe, recommend, auto_apply, pause, unpause, mode_change, cycle, degraded_recovered.

Outcome is bounded to: ok, failed, timeout, canceled, skipped.

ErrorCategory is bounded to: rule_predicate_failed, patch_apply_failed, patch_rejected, rate_limited, controller_missing, status_unavailable, context_done, config_absent, policy_denied, pending (plus the empty sentinel for ok outcomes).
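Because both enums are closed, the histogram has a fixed number of cells (8 actions × 5 outcomes = 40). The following Python sketch shows one way to enforce that bound; the class name and methods are hypothetical, and only the enum values are taken from the lists above.

```python
from enum import Enum
from itertools import product


class MetabolismAction(str, Enum):
    OBSERVE = "observe"
    RECOMMEND = "recommend"
    AUTO_APPLY = "auto_apply"
    PAUSE = "pause"
    UNPAUSE = "unpause"
    MODE_CHANGE = "mode_change"
    CYCLE = "cycle"
    DEGRADED_RECOVERED = "degraded_recovered"


class Outcome(str, Enum):
    OK = "ok"
    FAILED = "failed"
    TIMEOUT = "timeout"
    CANCELED = "canceled"
    SKIPPED = "skipped"


class ActionOutcomeHistogram:
    """Hypothetical bounded count matrix over the two closed enums."""

    def __init__(self) -> None:
        # Pre-allocate every cell so the matrix can never grow.
        self._counts = {cell: 0 for cell in product(MetabolismAction, Outcome)}

    def record(self, action: str, outcome: str) -> None:
        # Enum lookup raises ValueError for values outside the closed sets.
        self._counts[(MetabolismAction(action), Outcome(outcome))] += 1

    def count(self, action: str, outcome: str) -> int:
        return self._counts[(MetabolismAction(action), Outcome(outcome))]

    def cells(self) -> int:
        return len(self._counts)


hist = ActionOutcomeHistogram()
hist.record("auto_apply", "skipped")
hist.record("auto_apply", "skipped")
hist.record("observe", "ok")
```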

| Surface | Purpose |
| --- | --- |
| bmo config show-fleet-metabolism (--format=text\|json) | Posture summary for the live fleet runtime; same RenderPosture envelope as every other surface |
| /fleet-metabolism (TUI slash; aliases /fleet_metabolism, /metabolism-posture, /metabolism_posture) | Inline RenderPosture text in the chat transcript, sharing the CLI formatter |
| /metabolism (TUI slash) | Existing Metabolism inspector dialog: interactive cycle/recommendation viewer; complements the inline posture report |
| Metabolism sidebar chip | Live status badge derived from PostureSnapshot.State |
| metabolism_posture (native agent tool) | Read-only posture as JSON for in-process agents |
| bmo_metabolism_posture (MCP parity tool) | Same JSON envelope exposed through MCP for cross-process agents |
| GET /v1/metabolism/posture | Same JSON envelope over HTTP, gated by the existing requireAuth helper (KTD-7) |
| fleetmetabolism.BuildSnapshot | Library accessor used by every surface above |

The same fleetmetabolism.PostureSnapshot flows through CLI, slash, sidebar, agent-tool, MCP, and HTTP layers. A Pattern-4 hub composition smoke (TestFleetMetabolismHubSmoke_AllSurfacesEmitTheSameJSON) asserts JSON equality modulo captured_at, preventing the formatter-divergence failure mode called out in the maturity-iteration plan.
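The "JSON equality modulo captured_at" check can be illustrated with a short Python sketch. This is not the actual smoke test: the function names and sample payloads are hypothetical, and it assumes captured_at is a top-level key of the envelope.

```python
import json
from typing import Dict


def normalize(payload: str) -> str:
    """Drop the volatile timestamp and canonicalize key order."""
    doc = json.loads(payload)
    doc.pop("captured_at", None)  # assumption: captured_at is top-level
    return json.dumps(doc, sort_keys=True, separators=(",", ":"))


def surfaces_agree(payloads: Dict[str, str]) -> bool:
    """True when every surface emits the same JSON once captured_at is removed."""
    return len({normalize(p) for p in payloads.values()}) == 1


# Made-up payloads standing in for three of the surfaces.
payloads = {
    "cli": '{"state":"healthy","captured_at":"2024-01-01T00:00:00Z","reasons":[]}',
    "http": '{"captured_at":"2024-01-01T00:00:01Z","reasons":[],"state":"healthy"}',
    "mcp": '{"reasons":[],"state":"healthy","captured_at":"2024-01-01T00:00:02Z"}',
}
agree = surfaces_agree(payloads)

drifted = dict(payloads, http='{"state":"degraded","reasons":["policy_denied"]}')
diverged = not surfaces_agree(drifted)
```

Canonicalizing with sorted keys means the comparison ignores field order, so only a real content difference between renderers fails the check.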

The fleet-metabolism observer is off by default. Enable in bmo.toml:

[options.fleet_regulation]
enabled = true
mode = "observe_only" # or "recommend_only", "auto_apply"

[[options.fleet_regulation.recommend_rules]]
id = "high-error-rate-single-topology"

[options.fleet_regulation.recommend_rules.when]
api_error_rate_at_least = 0.15

[options.fleet_regulation.recommend_rules.recommend]
gene = "topology"
value = "single"

[options.fleet_controller]
mode = "observe_only" # disabled, observe_only, active, paused
interval_seconds = 30
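As a rough illustration of how a recommend rule like the one above might be evaluated, the Python sketch below matches an observed API error rate against the rule's threshold. Reading api_error_rate_at_least as a greater-than-or-equal comparison is an assumption based on the key name, and evaluate_rules is a hypothetical helper, not part of BMO.

```python
from typing import Dict, List

# The rule from the config above, expressed as plain data.
RULES: List[Dict] = [
    {
        "id": "high-error-rate-single-topology",
        "when": {"api_error_rate_at_least": 0.15},
        "recommend": {"gene": "topology", "value": "single"},
    }
]


def evaluate_rules(rules: List[Dict], api_error_rate: float) -> List[Dict]:
    """Return the recommendation of every rule whose predicate matches."""
    matched = []
    for rule in rules:
        threshold = rule["when"]["api_error_rate_at_least"]
        # Assumption: "at_least" means the rule fires at or above the threshold.
        if api_error_rate >= threshold:
            matched.append({"rule_id": rule["id"], **rule["recommend"]})
    return matched


quiet = evaluate_rules(RULES, api_error_rate=0.05)
noisy = evaluate_rules(RULES, api_error_rate=0.20)
```

In observe_only mode a match like this would only be recorded; acting on it depends on the configured mode and on the runtime policy gate.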

Mutate the live posture through the existing runtime control routes: POST /v1/metabolism/mode, POST /v1/metabolism/pause, POST /v1/metabolism/unpause, and the agent-tool equivalents (metabolism_set_mode, metabolism_pause, metabolism_unpause). This iteration adds no new mutation routes (Scope Boundaries).

The runtime-side fail-closed matrix (REQ-MAT-004) guarantees that the following arms never fire auto_apply_* and never advance the histogram into a misleading ok outcome:

stateDiagram-v2
  [*] --> healthy: nominal observe cycle
  healthy --> configAbsent: regulation section missing
  healthy --> controllerUnreachable: manager nil or status fail
  healthy --> policyDenied: auto_apply blocked by mode
  healthy --> persistenceUnhealthy: state-store write fail
  healthy --> pendingActuation: prior actuation still pending
  configAbsent --> stickyDegraded: Reasons += config_absent
  controllerUnreachable --> stickyDegraded: Reasons += controller_unreachable
  policyDenied --> stickyDegraded: Reasons += policy_denied
  persistenceUnhealthy --> stickyDegraded: Reasons += persistence_unhealthy
  pendingActuation --> stickyDegraded: Reasons += auto_apply_skipped_pending
  stickyDegraded --> healthy: degraded_recovered on next success
  state configAbsent:::blocked
  state controllerUnreachable:::blocked
  state policyDenied:::blocked
  state persistenceUnhealthy:::blocked
  state pendingActuation:::blocked
  classDef blocked stroke-dasharray:4 2

| Arm | Behavior |
| --- | --- |
| [options.fleet_regulation] missing | observer is a no-op; emits metabolism.action observe outcome=skipped error_category=config_absent; posture Reasons includes config_absent |
| FleetController manager nil / unavailable | controller posture reports present=false; posture Reasons includes controller_unreachable; sticky-degraded set |
| Mode is disabled or observe_only, request is auto_apply | observer emits metabolism.action auto_apply outcome=skipped error_category=policy_denied; no patch applied; posture Reasons includes policy_denied |
| Persistence write fails | sticky-degraded set with Reasons += persistence_unhealthy; clears on next success via metabolism.action degraded_recovered |
| Rule predicate matched but prior actuation pending | observer emits metabolism.action auto_apply outcome=skipped error_category=pending; posture Reasons includes auto_apply_skipped_pending |
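The five arms can be read as a single fail-closed gate in front of auto-apply. The Python sketch below returns the posture reason that blocks an auto_apply request, or None when nothing blocks it. The function name, its parameters, and the order in which the arms are checked are all assumptions; only the arm conditions and reason strings come from the table.

```python
from typing import Optional


def blocking_reason(
    *,
    regulation_configured: bool,
    controller_present: bool,
    mode: str,
    persistence_ok: bool,
    prior_actuation_pending: bool,
) -> Optional[str]:
    """Return the PostureReason that blocks auto_apply, or None if allowed.

    Hypothetical helper: the check order is an assumption, chosen so that
    the most fundamental failure is reported first.
    """
    if not regulation_configured:
        return "config_absent"
    if not controller_present:
        return "controller_unreachable"
    if mode in ("disabled", "observe_only"):
        return "policy_denied"
    if not persistence_ok:
        return "persistence_unhealthy"
    if prior_actuation_pending:
        return "auto_apply_skipped_pending"
    return None


def may_auto_apply(**state) -> bool:
    # Fail closed: any blocking reason means no patch is applied.
    return blocking_reason(**state) is None


healthy = dict(
    regulation_configured=True,
    controller_present=True,
    mode="active",
    persistence_ok=True,
    prior_actuation_pending=False,
)
```

Every path that is not explicitly healthy returns a reason, so a new failure condition added later has to be placed before the final return None to stay fail-closed.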

Every gating decision lands on the recent ring (cap 32) with KTD-5-hashed session_id_hash, rule_id_hash, and gene_hash. Raw session, request, or rule identifiers never reach the log layer or the posture surfaces.

Walk a fleet-metabolism cycle end-to-end without leaving the operator seat:

sequenceDiagram
  participant Operator
  participant TUI as TUI /fleet-metabolism
  participant CLI as bmo config show-fleet-metabolism
  participant Logs as bmo logs --tail
  participant Ring as Recent event ring
  Operator->>TUI: confirm controller and observer posture
  Operator->>CLI: same RenderPosture envelope headless
  Operator->>Logs: jq filter metabolism.* events
  Logs->>Ring: bounded FIFO metadata only
  Operator->>Logs: pair metabolism.fired with metabolism.action
  Note over Ring: session_id_hash correlates lifecycle pairs

  1. /fleet-metabolism (TUI) or bmo config show-fleet-metabolism — confirm controller/observer state and sticky-degraded flag.
  2. Tail recent events:
    bmo logs --tail 1000 | jq -c 'select(.msg|startswith("metabolism."))'
  3. Filter the failure arms only:
    bmo logs --tail 1000 \
    | jq -c 'select(.msg=="metabolism.action" and (.outcome=="failed" or .outcome=="timeout"))'
  4. Pair metabolism.fired with metabolism.action by session_id_hash for the full action lifecycle:
    bmo logs --tail 2000 \
    | jq -c 'select(.msg|startswith("metabolism."))' \
    | jq -s 'group_by(.session_id_hash) | map({session_id_hash: .[0].session_id_hash, events: map({msg, action, outcome, error_category, rule_id_hash, gene_hash, latency_ms})})'
  5. If Reasons includes persistence_unhealthy, the next successful observe/cycle will emit metabolism.action action=degraded_recovered outcome=ok and clear the sticky flag automatically.
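For operators who prefer a script to a jq pipeline, the grouping in step 4 can be sketched in Python. This is an illustrative equivalent, not a shipped tool: the function names are hypothetical and the sample log lines are made up. It assumes one JSON object per log line with msg and session_id_hash fields, as the jq filters above imply.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def group_metabolism_events(lines: Iterable[str]) -> Dict[str, List[dict]]:
    """Group metabolism.* log records by session_id_hash."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or non-JSON lines
        if not isinstance(rec, dict):
            continue
        if not str(rec.get("msg", "")).startswith("metabolism."):
            continue
        groups[rec.get("session_id_hash", "")].append(rec)
    return dict(groups)


def unpaired_sessions(groups: Dict[str, List[dict]]) -> List[str]:
    """Sessions that logged metabolism.fired but no metabolism.action."""
    missing = []
    for session, events in groups.items():
        msgs = {event["msg"] for event in events}
        if "metabolism.fired" in msgs and "metabolism.action" not in msgs:
            missing.append(session)
    return sorted(missing)


# Hypothetical log lines; the hashes and fields are invented for illustration.
sample = [
    '{"msg":"metabolism.fired","session_id_hash":"a1","rule_id_hash":"r9"}',
    '{"msg":"metabolism.action","session_id_hash":"a1","action":"auto_apply","outcome":"skipped"}',
    '{"msg":"metabolism.fired","session_id_hash":"b2","rule_id_hash":"r9"}',
    '{"msg":"http.request","session_id_hash":"a1"}',
    "not json at all",
    "",
]
groups = group_metabolism_events(sample)
orphans = unpaired_sessions(groups)
```

A session that appears in orphans fired a rule but never recorded the matching action, which is a useful starting point when triaging a timeout or a dropped actuation.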

The full recipe lives in agent tracing and is fixture-validated in the regression suite.

Feature #27 is currently Experimental tier. Graduation to Yellow/B requires:

  • Hub composition smoke (TestFleetMetabolismHubSmoke_*) green for ≥60 days with no Sev-1 incidents on the surfaces above.
  • Fail-closed matrix (REQ-MAT-004) green across all five arms.
  • The fleet_metabolism row of the surface-parity matrix stays at parity, with no divergent renderers.

Default-on autoscaler / auto-apply promotion is Deferred — this iteration delivers graduation readiness, not the flip itself.