Run observability parity

Run observability is the parity family behind BMO’s bounded run-inspection surfaces. It does not replace the Agent Debugger; it complements it with cheaper machine-readable views for when you need a bounded summary, next-step cueing, or a tool-call-only lens instead of a full event replay.

Maturity: Maintainer-facing reference surface. Use it for diagnostics, support scripts, and parity checks across CLI, TUI, HTTP, native-tool, and MCP access. It is not a replacement for the interactive run investigation flow in the Agent Debugger.

Surfaces

The family is split into three sibling views:

View	Scope	Live surfaces
Session summary	session family	`GET /v1/sessions/{id}/observability`, `session_observability`, `bmo_get_session_observability`, `/run-observability`
Cue ledger	session family	`GET /v1/sessions/{id}/run-cue-ledger`, `run_cue_ledger`, `bmo_get_run_cue_ledger`, `/run-observability`, `/run_cue_ledger`
Trace lens	one run in one session family	`GET /v1/agent-runs/{run_id}/trace`, `inspect_run_trace`, `bmo_inspect_run_trace`, `/run-observability`
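As an illustration of how a support script might address the three HTTP routes in the table, here is a minimal Go sketch that only builds the URLs. The base address and the helper names are assumptions made for this example; only the route shapes and the recent_runs_limit query parameter come from this page.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// sessionObservabilityURL builds the session-summary route for one session.
// baseURL is a hypothetical listener address. A limit <= 0 omits the
// recent_runs_limit query parameter so the server default applies.
func sessionObservabilityURL(baseURL, sessionID string, limit int) string {
	u := baseURL + "/v1/sessions/" + url.PathEscape(sessionID) + "/observability"
	if limit > 0 {
		q := url.Values{}
		q.Set("recent_runs_limit", strconv.Itoa(limit))
		u += "?" + q.Encode()
	}
	return u
}

// runCueLedgerURL builds the cue-ledger route for one session.
func runCueLedgerURL(baseURL, sessionID string) string {
	return baseURL + "/v1/sessions/" + url.PathEscape(sessionID) + "/run-cue-ledger"
}

// runTraceURL builds the trace-lens route for one run.
func runTraceURL(baseURL, runID string) string {
	return baseURL + "/v1/agent-runs/" + url.PathEscape(runID) + "/trace"
}

func main() {
	base := "http://127.0.0.1:8080" // assumed address, not taken from BMO
	fmt.Println(sessionObservabilityURL(base, "sess-1", 20))
	fmt.Println(runCueLedgerURL(base, "sess-1"))
	fmt.Println(runTraceURL(base, "run-42"))
}
```

Escaping the path segment keeps ids that contain spaces or slashes from changing the route shape.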

All live surfaces read app-owned builders in internal/app:

SessionObservabilityPayload
RunCueLedgerPayload
RunTracePayload

The config-only shell discoverability view is:

bmo config show-run-observability

That CLI command reads configuration only; it intentionally makes no claim about the state of a live process.

Session-summary fields

The session-summary payload includes:

session_id
session_state
- prompt_tokens
- completion_tokens
- accumulated_cost_usd
recent_run_usage
- run_count
- provider_call_count
- provider_event_limit_per_run
- provider_events_truncated
- provider_events_sampled
- sampled_provider_event_count
- omitted_provider_event_count
- truncated_run_count
- total_tokens
- input_tokens
- output_tokens
- cache_read_tokens
- cache_create_tokens
- metadata_decode_errors
- metadata_numeric_parse_errors
- provider_models
recent_run_performance
- run_count
- completed_run_count
- failed_run_count
- canceled_run_count
- active_run_count
- duration_sample_count
- min_duration_ms
- p50_duration_ms
- p90_duration_ms
- p95_duration_ms
- max_duration_ms
- average_duration_ms
- throughput_runs_per_hour
- started_by_day
turn_intent
- state
- phase
- source
- session_mode
- workflow_phase
- prompt_handoff
- fallback
session_mode_status
- scope
- current_mode
- available_modes
- override
- deferred
session_posture_status
- scope
- desired_posture
- committed_state
- conditions
- notes
interruptibility
- state
- action
- source
- target_type
- scope
- queue_policy
- child_policy
- background_job_policy
- reason
adaptive_orchestration
warnings: non-fatal notices for optional sections that could not be loaded
recent_runs: raw run-ledger rows in the same session family as the Agent Debugger run list. A row belongs to the family when its session_id matches the requested id or its parent_session_id matches it, so spawned work stored under a child session_id is included when you query the parent session.
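The session-family rule for recent_runs can be sketched as a simple predicate. The runRow type below is a hypothetical, trimmed-down stand-in for a run-ledger row, not the real internal/app type.

```go
package main

import "fmt"

// runRow is a hypothetical stand-in for a run-ledger row.
type runRow struct {
	RunID           string
	SessionID       string
	ParentSessionID string
}

// inSessionFamily reports whether a row belongs to the family of requestedID:
// either the row's own session matches, or its parent session matches.
func inSessionFamily(row runRow, requestedID string) bool {
	return row.SessionID == requestedID || row.ParentSessionID == requestedID
}

// familyRuns keeps only the rows in the requested session family.
func familyRuns(rows []runRow, requestedID string) []runRow {
	var out []runRow
	for _, r := range rows {
		if inSessionFamily(r, requestedID) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	rows := []runRow{
		{RunID: "r1", SessionID: "parent"},
		{RunID: "r2", SessionID: "child", ParentSessionID: "parent"},
		{RunID: "r3", SessionID: "other"},
	}
	for _, r := range familyRuns(rows, "parent") {
		fmt.Println(r.RunID)
	}
}
```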

recent_run_usage is bounded twice: by the recent-run window and by a per-run provider-event cap. When one or more runs exceed that cap, the payload sets provider_events_truncated = true and provider_events_sampled = true, reports the cap in provider_event_limit_per_run, and counts the affected runs in truncated_run_count. sampled_provider_event_count and omitted_provider_event_count report how many provider events were included and omitted. Token and call totals are computed only from the included provider events.
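A minimal sketch of the per-run cap follows. It makes several assumptions that this page does not confirm: that the included events are the first N of each run, that sampled_provider_event_count covers every run in the window, that each provider event is one provider call, and that total_tokens is the sum of input and output tokens. The types are hypothetical.

```go
package main

import "fmt"

// providerEvent is a hypothetical, simplified provider event.
type providerEvent struct {
	InputTokens, OutputTokens int
}

// usage mirrors a subset of the recent_run_usage field names.
type usage struct {
	RunCount                  int
	ProviderCallCount         int
	ProviderEventLimitPerRun  int
	ProviderEventsTruncated   bool
	ProviderEventsSampled     bool
	SampledProviderEventCount int
	OmittedProviderEventCount int
	TruncatedRunCount         int
	InputTokens               int
	OutputTokens              int
	TotalTokens               int
}

// aggregate applies the per-run cap and totals only the included events.
func aggregate(runs [][]providerEvent, limitPerRun int) usage {
	if limitPerRun < 0 {
		limitPerRun = 0
	}
	u := usage{RunCount: len(runs), ProviderEventLimitPerRun: limitPerRun}
	for _, events := range runs {
		included := events
		if len(events) > limitPerRun {
			included = events[:limitPerRun] // assumption: keep the first N events
			u.ProviderEventsTruncated = true
			u.ProviderEventsSampled = true
			u.TruncatedRunCount++
			u.OmittedProviderEventCount += len(events) - limitPerRun
		}
		u.SampledProviderEventCount += len(included)
		for _, e := range included {
			u.ProviderCallCount++
			u.InputTokens += e.InputTokens
			u.OutputTokens += e.OutputTokens
		}
	}
	u.TotalTokens = u.InputTokens + u.OutputTokens
	return u
}

func main() {
	runs := [][]providerEvent{
		{{1, 1}, {1, 1}, {1, 1}}, // exceeds a cap of 2
		{{2, 3}},
	}
	fmt.Printf("%+v\n", aggregate(runs, 2))
}
```

A consumer that sees provider_events_truncated = true should treat the token totals as a lower bound for the window.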

metadata_decode_errors and metadata_numeric_parse_errors are non-fatal aggregation counters. Decode errors skip the affected provider event; numeric parse errors zero only the affected token counter. Aggregation continues for the rest of the window.

warnings records optional sections that could not be loaded. For example, continuity or historian summaries can be omitted with a warning while session state, run usage, recent runs, and adaptive orchestration remain available.

recent_run_performance is computed from BMO’s durable agent_runs timestamps for the same bounded recent-run window. It does not parse host transcript logs. Runs that lack either started_at_ms or a later completed_at_ms still count toward status counts and throughput, but are excluded from the duration percentiles.
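The duration rule can be sketched as follows. The percentile method used by BMO is not stated on this page; the example uses the nearest-rank method as an assumption, treats a zero timestamp as "missing", and uses a hypothetical run type.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// run is a hypothetical stand-in for an agent_runs row.
type run struct {
	Status        string
	StartedAtMs   int64
	CompletedAtMs int64
}

// durationSamples returns sorted durations for runs that have a start
// timestamp and a strictly later completion timestamp. Other runs are
// skipped here but would still count toward status and throughput.
func durationSamples(runs []run) []int64 {
	var d []int64
	for _, r := range runs {
		if r.StartedAtMs > 0 && r.CompletedAtMs > r.StartedAtMs {
			d = append(d, r.CompletedAtMs-r.StartedAtMs)
		}
	}
	sort.Slice(d, func(i, j int) bool { return d[i] < d[j] })
	return d
}

// percentile returns the nearest-rank percentile of a sorted sample,
// or 0 when there are no samples.
func percentile(sorted []int64, p float64) int64 {
	if len(sorted) == 0 {
		return 0
	}
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

func main() {
	runs := []run{
		{"completed", 1000, 1500},
		{"completed", 1000, 3000},
		{"active", 2000, 0}, // no completion yet: no duration sample
		{"failed", 0, 0},    // no timestamps: no duration sample
	}
	d := durationSamples(runs)
	fmt.Println("run_count:", len(runs), "duration_sample_count:", len(d))
	fmt.Println("p50:", percentile(d, 50), "p90:", percentile(d, 90))
}
```

Comparing run_count with duration_sample_count shows how many runs the percentiles actually describe.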

adaptive_orchestration is the same parity object documented in Adaptive orchestration parity, including decision_trace when present.

session_mode_status is the same shared status family used by /mode and the session_mode tool. On the detached session-observability surface it is explicitly session-backed and makes no claim about live TUI-local pending state for a different session shell. The bounded fields answer four questions: what the current mode is, which modes are reachable, whether a mode switch is deferred in the live coordinator queue, and what the latest durable autoselect or manual-override fact is that the app can observe.

session_posture_status is the canonical session-regulation status family used by get_session_posture_status, bmo_get_session_posture_status, and GET /v1/sessions/{id}/posture/status. It distinguishes the genome-owned desired posture from the choreography-owned committed state, including desired and observed generations, pending-transition conditions, and session-scoped operator locks.

turn_intent is a metadata-only status snapshot for the current conversation turn. It combines the latest durable run/phase signal, current session mode, current staged-workflow phase when available, and bounded prompt handoff segment metadata. It does not include prompt bodies, tool arguments, file paths, or raw pending-submit ids. The TUI may overlay a local pending_submit or stale_rejected state before durable run evidence exists; HTTP, native-tool, and MCP surfaces read the app-owned live/runtime snapshot.

interruptibility is a metadata-only status snapshot for cancellation and recovery posture. Durable agent_runs rows provide the base state for active, canceled, terminal, and child-agent runs; accepted live cancels and TUI foreground cancels can overlay cancel_requested or canceling until durable run status catches up. It does not include prompts, tool arguments, child-agent output, or job output.

Durable terminal evidence wins over non-terminal local cancel evidence for the same target, so a completed or canceled run does not regress to canceling because process-local evidence is still present. Repeated cancel pressure remains bounded: the snapshot keeps one target, scope, queue policy, child policy, and background-job policy instead of accumulating duplicate cancel events.
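The precedence rule above can be sketched as a small resolver. The evidence type, the function names, and the exact set of terminal state strings are assumptions for illustration; only the rule itself (durable terminal evidence beats non-terminal local cancel evidence for the same target) comes from this page.

```go
package main

import "fmt"

// evidence is a hypothetical pairing of a target with an observed state.
type evidence struct {
	Target string
	State  string
}

// isTerminal reports whether a durable state is final.
// The state names here are assumed, not taken from the BMO schema.
func isTerminal(state string) bool {
	switch state {
	case "completed", "failed", "canceled":
		return true
	}
	return false
}

// resolveState merges durable run evidence with an optional process-local
// cancel overlay. The overlay applies only to the same target and only
// while the durable state is still non-terminal.
func resolveState(durable evidence, local *evidence) string {
	if local == nil || local.Target != durable.Target {
		return durable.State
	}
	if isTerminal(durable.State) {
		return durable.State
	}
	return local.State
}

func main() {
	overlay := &evidence{Target: "run-1", State: "canceling"}
	fmt.Println(resolveState(evidence{"run-1", "active"}, overlay))
	fmt.Println(resolveState(evidence{"run-1", "completed"}, overlay))
}
```

Keeping a single overlay per target, rather than a list of cancel events, is one way to keep repeated cancel pressure bounded.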

Cue-ledger fields

The cue-ledger payload is the same session-family projection used by the TUI cue dialog. It answers:

which actors are active
which cue is waiting, running, recovering, or blocked
which action is expected next
which evidence should confirm that move
which recovery or control reference is available

It is built by internal/app/run_cue_ledger.go from the existing session-family run ledger and event rows. It does not add a second store.

Trace-lens fields

The trace-lens payload is intentionally narrow:

run_id
ordered events
- only kind=tool rows
- success/failure outcome
- duration metadata
- execution order by sequence number

It is not a replacement for get_agent_run_events or the debugger timeline.
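The narrowing performed by the trace lens can be sketched as a filter followed by a sort. The event type and its field names are hypothetical; the sketch only mirrors the two rules listed above: keep kind=tool rows and order them by sequence number.

```go
package main

import (
	"fmt"
	"sort"
)

// event is a hypothetical stand-in for a run event row.
type event struct {
	Seq        int64
	Kind       string
	Name       string
	Succeeded  bool
	DurationMs int64
}

// toolTrace keeps only kind=tool rows and orders them by sequence number.
func toolTrace(events []event) []event {
	var out []event
	for _, e := range events {
		if e.Kind == "tool" {
			out = append(out, e)
		}
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Seq < out[j].Seq })
	return out
}

func main() {
	events := []event{
		{Seq: 3, Kind: "tool", Name: "grep", Succeeded: true, DurationMs: 12},
		{Seq: 1, Kind: "message", Name: "user"},
		{Seq: 2, Kind: "tool", Name: "read_file", Succeeded: false, DurationMs: 4},
	}
	for _, e := range toolTrace(events) {
		fmt.Println(e.Seq, e.Name, e.Succeeded, e.DurationMs)
	}
}
```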

Request controls

The family accepts bounded selectors:

session summary:
- HTTP query param: recent_runs_limit
- tool/MCP param: recent_runs_limit
cue ledger:
- tool/MCP param: limit
trace lens:
- tool/MCP param: run_id
- HTTP route param: {run_id}

HTTP validation: If recent_runs_limit is present but is not a base-10 integer, the route returns 400 with the message invalid recent_runs_limit. Omitted or numeric values are normalized server-side: non-positive limits fall back to the default window, and large values are capped before run-ledger records are read.
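The validation and normalization rules can be sketched as one function. The default window and the cap are not stated on this page, so the constants below are placeholders, and the function name is hypothetical. How the real route treats integers too large for the platform int is also not documented; this sketch rejects them because strconv.Atoi reports a range error.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

const (
	defaultRecentRuns = 20  // placeholder: the real default is not documented here
	maxRecentRuns     = 200 // placeholder: the real cap is not documented here
)

// errInvalidLimit maps to the 400 response described above.
var errInvalidLimit = errors.New("invalid recent_runs_limit")

// normalizeRecentRunsLimit applies the documented rules: an omitted value
// uses the default, a non-integer is rejected, non-positive values fall
// back to the default, and large values are capped.
func normalizeRecentRunsLimit(raw string, present bool) (int, error) {
	if !present {
		return defaultRecentRuns, nil
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return 0, errInvalidLimit
	}
	if n <= 0 {
		return defaultRecentRuns, nil
	}
	if n > maxRecentRuns {
		return maxRecentRuns, nil
	}
	return n, nil
}

func main() {
	for _, raw := range []string{"5", "0", "100000", "abc"} {
		n, err := normalizeRecentRunsLimit(raw, true)
		fmt.Println(raw, n, err)
	}
}
```

Normalizing before the read keeps a caller from forcing an unbounded scan of the run ledger.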

What each view is for

Use session summary when you want:

a compact session usage summary
per-provider/model token breakdown for recent runs
run latency percentiles and recent throughput from the run ledger
the current adaptive parity snapshot without replaying the full run ledger
a stable operator payload for scripts, diagnostics, and support flows

Use cue ledger when you need to answer which actor acts next, which cue is waiting or recovering, and which evidence or control should confirm the next move.

Use trace lens when you need a bounded tool-call-only read for one run.

Use the Agent Debugger or get_agent_run_events when you need full ordered run and event history.

Relationship to runtime features

runtime_features distinguishes the live access paths:

session_observability_api
session_observability_tool
session_posture_status_api
session_posture_status_tool
run_cue_ledger_api
run_cue_ledger_tool
inspect_run_trace_api
inspect_run_trace_tool

That lets operators tell whether the HTTP routes, native tools, or both are being exercised on a live instance.