Run observability parity

Run observability is the parity family behind BMO’s bounded run-inspection surfaces. It does not replace the Agent Debugger; it complements it with cheaper machine-readable views for when you need a bounded summary, next-step cueing, or a tool-call-only lens instead of a full event replay.

Maturity: Maintainer-facing reference surface. Use it for diagnostics, support scripts, and parity checks across CLI, TUI, HTTP, native-tool, and MCP access. It is not a replacement for the interactive run investigation flow in Agent Debugger.

The family is split into three sibling views:

| View | Scope | Live surfaces |
| --- | --- | --- |
| Session summary | session family | GET /v1/sessions/{id}/observability, session_observability, bmo_get_session_observability, /run-observability |
| Cue ledger | session family | GET /v1/sessions/{id}/run-cue-ledger, run_cue_ledger, bmo_get_run_cue_ledger, /run-observability, /run_cue_ledger |
| Trace lens | one run in one session family | GET /v1/agent-runs/{run_id}/trace, inspect_run_trace, bmo_inspect_run_trace, /run-observability |
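
For support scripts, the three HTTP routes listed above can be assembled from a base URL and an id. The following is a minimal sketch, not BMO code: the route templates come from the table, while the base URL, the `ROUTES` mapping, and the `build_url` helper are hypothetical names introduced for illustration.

```python
from urllib.parse import quote

# Route templates copied from the table above.
ROUTES = {
    "session_summary": "/v1/sessions/{id}/observability",
    "cue_ledger": "/v1/sessions/{id}/run-cue-ledger",
    "trace_lens": "/v1/agent-runs/{run_id}/trace",
}


def build_url(base_url: str, view: str, identifier: str) -> str:
    """Return the full URL for one view (hypothetical helper).

    `identifier` is a session id for the session-family views and a
    run id for the trace lens. It is percent-encoded before substitution.
    """
    template = ROUTES[view]
    placeholder = "{run_id}" if view == "trace_lens" else "{id}"
    path = template.replace(placeholder, quote(identifier, safe=""))
    return base_url.rstrip("/") + path


if __name__ == "__main__":
    # The host and port here are assumptions, not documented defaults.
    print(build_url("http://localhost:8080", "session_summary", "sess-1"))
```

Keeping the templates in one mapping makes it easy to run the same parity check against each view in turn.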

All live surfaces read app-owned builders in internal/app:

  • SessionObservabilityPayload
  • RunCueLedgerPayload
  • RunTracePayload

The config-only shell discoverability view is:

  • bmo config show-run-observability

That CLI command intentionally does not claim live process visibility.

The session-summary payload includes:

  • session_id
  • session_state
    • prompt_tokens
    • completion_tokens
    • accumulated_cost_usd
  • recent_run_usage
    • run_count
    • provider_call_count
    • provider_event_limit_per_run
    • provider_events_truncated
    • provider_events_sampled
    • sampled_provider_event_count
    • omitted_provider_event_count
    • truncated_run_count
    • total_tokens
    • input_tokens
    • output_tokens
    • cache_read_tokens
    • cache_create_tokens
    • metadata_decode_errors
    • metadata_numeric_parse_errors
    • provider_models
  • recent_run_performance
    • run_count
    • completed_run_count
    • failed_run_count
    • canceled_run_count
    • active_run_count
    • duration_sample_count
    • min_duration_ms
    • p50_duration_ms
    • p90_duration_ms
    • p95_duration_ms
    • max_duration_ms
    • average_duration_ms
    • throughput_runs_per_hour
    • started_by_day
  • turn_intent
    • state
    • phase
    • source
    • session_mode
    • workflow_phase
    • prompt_handoff
    • fallback
  • session_mode_status
    • scope
    • current_mode
    • available_modes
    • override
    • deferred
  • interruptibility
    • state
    • action
    • source
    • target_type
    • scope
    • queue_policy
    • child_policy
    • background_job_policy
    • reason
  • adaptive_orchestration
  • warnings — non-fatal notices for optional sections that could not be loaded
  • recent_runs — raw run-ledger rows in the same session family as Agent Debugger run list: session_id matches the requested id, or parent_session_id matches (so spawned work stored under a child session_id is included when you query the parent session)
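
The session-family rule for recent_runs can be expressed as a small predicate. This is an illustrative sketch only: the `session_id` and `parent_session_id` field names come from the description above, while the dictionary row shape and the `in_session_family` helper are assumptions.

```python
def in_session_family(run: dict, requested_id: str) -> bool:
    """True when a run-ledger row belongs to the requested session family.

    A row matches if it was stored under the requested session itself,
    or under a child session whose parent is the requested session.
    """
    return (
        run.get("session_id") == requested_id
        or run.get("parent_session_id") == requested_id
    )


# Hypothetical run-ledger rows for illustration.
rows = [
    {"run_id": "r1", "session_id": "parent", "parent_session_id": None},
    {"run_id": "r2", "session_id": "child", "parent_session_id": "parent"},
    {"run_id": "r3", "session_id": "other", "parent_session_id": None},
]
family = [row["run_id"] for row in rows if in_session_family(row, "parent")]
```

Here `family` contains the parent's own run and the spawned child run, but not the unrelated session's run.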

recent_run_usage is bounded twice: by the recent-run window and by a per-run provider-event cap. When one or more runs exceed that cap, the payload sets provider_events_truncated = true, reports the cap in provider_event_limit_per_run, sets provider_events_sampled = true, increments truncated_run_count, and reports how many provider events were included or omitted through sampled_provider_event_count and omitted_provider_event_count. Token and call totals are computed from the included provider events.
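
To make the truncation counters concrete, the sketch below shows one way a per-run cap could produce them. It is not the app's builder: it assumes the first `cap` events of each run are the ones kept (the sampling strategy is not documented here), it always reports the cap, and `summarize_provider_events` is a hypothetical name.

```python
def summarize_provider_events(events_by_run: dict, cap: int) -> dict:
    """Apply a per-run provider-event cap and report the bounded counters."""
    included = 0
    omitted = 0
    truncated_runs = 0
    for events in events_by_run.values():
        kept = events[:cap]  # assumption: keep the first `cap` events
        dropped = len(events) - len(kept)
        included += len(kept)
        omitted += dropped
        if dropped > 0:
            truncated_runs += 1
    truncated = truncated_runs > 0
    return {
        "provider_event_limit_per_run": cap,
        "provider_events_truncated": truncated,
        "provider_events_sampled": truncated,
        "sampled_provider_event_count": included,
        "omitted_provider_event_count": omitted,
        "truncated_run_count": truncated_runs,
    }


# Two runs, one of which exceeds a cap of 3 events.
summary = summarize_provider_events({"a": [{}] * 5, "b": [{}] * 2}, cap=3)
```

A script reading the real payload can treat `provider_events_truncated = true` as a signal that token and call totals undercount the window.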

metadata_decode_errors and metadata_numeric_parse_errors are non-fatal aggregation counters. Decode errors skip the affected provider event; numeric parse errors zero only the affected token counter. Aggregation continues for the rest of the window.
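
The skip-versus-zero behavior can be sketched as follows. This assumes provider-event metadata is a JSON object string carrying token counters under the payload's field names; that storage format, the two-field subset, and the `aggregate_tokens` helper are assumptions for illustration rather than the app's implementation.

```python
import json

# Subset of the token counters, for brevity.
TOKEN_FIELDS = ("input_tokens", "output_tokens")


def aggregate_tokens(raw_events: list) -> dict:
    """Sum token counters while counting, not raising on, bad metadata."""
    totals = {field: 0 for field in TOKEN_FIELDS}
    totals["metadata_decode_errors"] = 0
    totals["metadata_numeric_parse_errors"] = 0
    for raw in raw_events:
        try:
            metadata = json.loads(raw)
        except json.JSONDecodeError:
            metadata = None
        if not isinstance(metadata, dict):
            # Decode error: skip the whole event.
            totals["metadata_decode_errors"] += 1
            continue
        for field in TOKEN_FIELDS:
            try:
                value = int(metadata.get(field, 0))
            except (TypeError, ValueError):
                # Numeric parse error: zero only this counter.
                totals["metadata_numeric_parse_errors"] += 1
                value = 0
            totals[field] += value
    return totals


totals = aggregate_tokens([
    '{"input_tokens": 10, "output_tokens": 5}',
    "not json",
    '{"input_tokens": "oops", "output_tokens": 7}',
])
```

In this example the second event is skipped entirely, while the third still contributes its valid `output_tokens` value.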

warnings records optional sections that could not be loaded. For example, continuity or historian summaries can be omitted with a warning while session state, run usage, recent runs, and adaptive orchestration remain available.

recent_run_performance is computed from BMO’s durable agent_runs timestamps for the same bounded recent-run window. It does not parse host transcript logs. Runs that lack started_at_ms or a later completed_at_ms still count toward status and throughput, but not toward duration percentiles.
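
The duration rule can be illustrated with a short sketch. The timestamp field names come from the text above; the nearest-rank percentile method is an assumption (the method BMO uses is not stated here), and `nearest_rank` and `duration_stats` are hypothetical helpers.

```python
import math


def nearest_rank(sorted_values: list, pct: float):
    """Nearest-rank percentile over an ascending, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def duration_stats(runs: list) -> dict:
    """Compute duration fields from runs that have both usable timestamps."""
    durations = sorted(
        run["completed_at_ms"] - run["started_at_ms"]
        for run in runs
        if run.get("started_at_ms") is not None
        and run.get("completed_at_ms") is not None
        and run["completed_at_ms"] > run["started_at_ms"]
    )
    # Every run counts toward run_count, even without a usable duration.
    stats = {"run_count": len(runs), "duration_sample_count": len(durations)}
    if durations:
        stats.update(
            min_duration_ms=durations[0],
            p50_duration_ms=nearest_rank(durations, 50),
            p90_duration_ms=nearest_rank(durations, 90),
            p95_duration_ms=nearest_rank(durations, 95),
            max_duration_ms=durations[-1],
            average_duration_ms=sum(durations) / len(durations),
        )
    return stats


stats = duration_stats([
    {"started_at_ms": 0, "completed_at_ms": 100},
    {"started_at_ms": 0, "completed_at_ms": 300},
    {"started_at_ms": 0, "completed_at_ms": 200},
    {"started_at_ms": 50, "completed_at_ms": None},   # still active
    {"started_at_ms": 500, "completed_at_ms": 400},   # completion not later
])
```

This explains why duration_sample_count can be smaller than run_count in a real payload.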

adaptive_orchestration is the same parity object documented in Adaptive orchestration parity, including decision_trace when present.

session_mode_status is the same shared status family used by /mode and the session_mode tool. On the detached session-observability surface it is explicitly session-backed: it makes no claim about live TUI-local pending state in a different session shell. The bounded fields answer four questions: the current mode, the reachable mode set, whether a mode switch is deferred in the live coordinator queue, and the latest durable autoselect or manual-override fact the app can observe.

turn_intent is a metadata-only status snapshot for the current conversation turn. It combines the latest durable run/phase signal, current session mode, current staged-workflow phase when available, and bounded prompt handoff segment metadata. It does not include prompt bodies, tool arguments, file paths, or raw pending-submit ids. The TUI may overlay a local pending_submit or stale_rejected state before durable run evidence exists; HTTP, native-tool, and MCP surfaces read the app-owned live/runtime snapshot.

interruptibility is a metadata-only status snapshot for cancellation and recovery posture. Durable agent_runs rows provide the base state for active, canceled, terminal, and child-agent runs; accepted live cancels and TUI foreground cancels can overlay cancel_requested or canceling until durable run status catches up. It does not include prompts, tool arguments, child-agent output, or job output.

The cue-ledger payload is the same session-family projection used by the TUI cue dialog. It answers:

  • which actors are active
  • which cue is waiting, running, recovering, or blocked
  • which action is expected next
  • which evidence should confirm that move
  • which recovery or control reference is available

It is built by internal/app/run_cue_ledger.go from the existing session-family run ledger and event rows. It does not add a second store.

The trace-lens payload is intentionally narrow:

  • run_id
  • ordered events
    • only kind=tool rows
    • success/failure outcome
    • duration metadata
    • execution order by sequence number

It is not a replacement for get_agent_run_events or the debugger timeline.
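
The narrowing that the trace lens performs can be sketched as a filter and a sort. This is an illustration, not the app's builder: `kind` with the value `tool` comes from the list above, while the `sequence` field name, the event dictionaries, and the `tool_trace` helper are assumptions.

```python
def tool_trace(events: list) -> list:
    """Keep only tool rows and order them by sequence number."""
    tool_rows = [event for event in events if event.get("kind") == "tool"]
    return sorted(tool_rows, key=lambda event: event["sequence"])


# Hypothetical event rows for one run, deliberately out of order.
events = [
    {"sequence": 3, "kind": "tool", "name": "write"},
    {"sequence": 1, "kind": "message", "name": "user"},
    {"sequence": 2, "kind": "tool", "name": "read"},
]
trace = [event["name"] for event in tool_trace(events)]
```

Non-tool rows are dropped, so anything that depends on messages or other event kinds still needs the full event history.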

The family accepts bounded selectors:

  • session summary:
    • HTTP query param: recent_runs_limit
    • tool/MCP param: recent_runs_limit
  • cue ledger:
    • tool/MCP param: limit
  • trace lens:
    • tool/MCP param: run_id
    • HTTP route param: {run_id}

HTTP validation: If recent_runs_limit is present but not a base-10 integer, the route returns 400 with invalid recent_runs_limit. Omitted or numeric values are normalized server-side: non-positive limits fall back to the default window, and large values are capped before reading run-ledger records.
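
The validation and normalization rules can be sketched as a single function. The default and maximum window sizes below are placeholders, since the real values are not stated here, and `normalize_recent_runs_limit` is a hypothetical helper rather than the route handler. Whether the server accepts a leading sign is also an assumption.

```python
import re

DEFAULT_LIMIT = 20   # hypothetical default window
MAX_LIMIT = 200      # hypothetical cap

_BASE10 = re.compile(r"[+-]?[0-9]+")


def normalize_recent_runs_limit(raw):
    """Return (status, value): a normalized limit, or a 400 error message.

    `raw` is the query-string value, or None when the param is omitted.
    """
    if raw is None:
        return 200, DEFAULT_LIMIT
    if not _BASE10.fullmatch(raw):
        return 400, "invalid recent_runs_limit"
    value = int(raw)
    if value <= 0:
        # Non-positive limits fall back to the default window.
        return 200, DEFAULT_LIMIT
    # Large values are capped before any run-ledger read.
    return 200, min(value, MAX_LIMIT)
```

For example, `normalize_recent_runs_limit("abc")` yields the 400 case, while `normalize_recent_runs_limit("-3")` falls back to the default window instead of failing.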

Use session summary when you want:

  • a compact session usage summary
  • per-provider/model token breakdown for recent runs
  • run latency percentiles and recent throughput from the run ledger
  • the current adaptive parity snapshot without replaying the full run ledger
  • a stable operator payload for scripts, diagnostics, and support flows

Use cue ledger when you need to answer which actor acts next, which cue is waiting or recovering, and which evidence or control should confirm the next move.

Use trace lens when you need a bounded tool-call-only read for one run.

Use the Agent Debugger or get_agent_run_events when you need full ordered run and event history.

runtime_features distinguishes the live access paths:

  • session_observability_api
  • session_observability_tool
  • run_cue_ledger_api
  • run_cue_ledger_tool
  • inspect_run_trace_api
  • inspect_run_trace_tool

That lets operators tell whether the HTTP routes, native tools, or both are being exercised on a live instance.