Agent Debugger

Agent Debugger is a contributor-facing replay view for BMO runs. It shows what the agent did in order, including status changes, tool activity, assistant output, and linked artifacts such as checkpoints and file activity.

Use it when a long run went wrong and you want evidence, not guesswork.

Maturity: Maintainer-facing observability surface. Operators can use it for support and recovery, but it is a replay/audit tool rather than the normal first-run path.

When a run fails, stalls, or produces surprising output, chat text is not enough to explain the sequence. The debugger gives you a persisted event timeline so you can find the first bad transition, inspect linked artifacts, and decide whether to retry or fork from a checkpoint.

In the TUI, open the command picker and run:

/debugger

Alias:

/runs

The dialog scopes to the current session first, so you can inspect recent runs without leaving the conversation you were working in.

For the compact “who acts next?” view, run:

/cue-ledger

That dialog uses the same session-family run rows as the debugger, but projects them into active actors, cue status, expected action, expected evidence, and available recovery controls.

The debugger has two levels:

  • Run list — recent runs in the current session family: rows where session_id or parent_session_id matches the parent session (so the parent’s own run rows and any spawned child run rows appear together in one list)
  • Run timeline — ordered events for a selected run
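
The session-family scope used by the run list can be sketched as a plain filter. This is an illustration only, not BMO's implementation: the `RunRow` shape and the `family_runs` helper are hypothetical, and only the `session_id` / `parent_session_id` matching rule comes from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunRow:
    # Hypothetical row shape; only the two session fields come from the docs.
    run_id: str
    session_id: str
    parent_session_id: Optional[str] = None


def family_runs(rows: list[RunRow], parent_session: str) -> list[RunRow]:
    """Keep rows whose session_id or parent_session_id matches the parent session."""
    return [
        row
        for row in rows
        if row.session_id == parent_session
        or row.parent_session_id == parent_session
    ]


rows = [
    RunRow("run-1", session_id="sess-parent"),
    RunRow("run-2", session_id="sess-child", parent_session_id="sess-parent"),
    RunRow("run-3", session_id="sess-other"),
]
family = family_runs(rows, "sess-parent")
print([row.run_id for row in family])  # ['run-1', 'run-2']
```

The parent's own run and the spawned child's run land in one list, while the unrelated session is excluded.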

A run can include:

  • prompt assembly
  • status/lifecycle changes
  • tool calls and tool results
  • provider usage
  • assistant text deltas
  • file read/write summaries
  • checkpoints created during the run
  • branch/fork lineage

This is a persisted replay view, not a live-only stream. You can inspect the timeline after the run is over.

A typical debugging workflow:

  1. Open /debugger
  2. Pick the run that failed or behaved unexpectedly
  3. Step through the timeline to find the first bad transition
  4. Inspect any linked checkpoint or file activity
  5. Fork from a checkpoint-backed step if you want to retry from there
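
If you have exported a run's events, the "find the first bad transition" step can be approximated in a few lines. This is a minimal sketch under stated assumptions: the event fields (`seq`, `type`, `status`) and the set of "bad" status values are hypothetical and may not match the real event payloads.

```python
from typing import Optional

# Hypothetical status values that count as a "bad" transition; adjust these
# to whatever statuses your run events actually carry.
BAD_STATUSES = {"error", "failed", "timeout"}


def first_bad_transition(events: list[dict]) -> Optional[dict]:
    """Return the earliest event (by the assumed 'seq' field) with a bad status."""
    for event in sorted(events, key=lambda e: e["seq"]):
        if event.get("status") in BAD_STATUSES:
            return event
    return None


timeline = [
    {"seq": 1, "type": "prompt_assembly", "status": "ok"},
    {"seq": 2, "type": "tool_call", "status": "ok"},
    {"seq": 3, "type": "tool_result", "status": "error"},
    {"seq": 4, "type": "status_change", "status": "failed"},
]
bad = first_bad_transition(timeline)
print(bad["seq"], bad["type"])  # 3 tool_result
```

The final `failed` status at step 4 is only a symptom here; the tool result at step 3 is where the run first went wrong.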

This is especially useful for:

  • long tool-heavy runs
  • sub-agent failures
  • regressions that only appear after a sequence of edits
  • understanding how a branch session was created
  • inspecting quality orchestration runs (candidate set, winner, judge outcome)
  • inspecting Quality Gates decisions such as warn, send_back, retry results, rubric failures, and evidence refs
  • inspecting Patch Proposals through the run and workstream events that produced or reviewed them

graph TB
  A["Agent run completes"] --> B["Run events persisted<br/>to agent_runs table"]
  B --> C["Event timeline"]
  B --> D["Observability snapshot"]
  C --> E["TUI /debugger<br/>step-by-step replay"]
  C --> F["HTTP /v1/agent-runs/events<br/>ordered event stream"]
  C --> G["In-agent<br/>get_agent_run_events tool"]
  D --> H["Run observability family<br/>summary, cue, trace"]
  H --> I["HTTP /v1/sessions/id/observability<br/>session summary"]
  H --> L["HTTP /v1/sessions/id/run-cue-ledger<br/>operator cue book"]
  H --> M["HTTP /v1/agent-runs/id/trace<br/>tool-call trace lens"]
  H --> N["Tools: session_observability,<br/>run_cue_ledger, inspect_run_trace"]
  E -->|forkable step| J["Create branch session<br/>from checkpoint"]
  H -->|current state| K["Adaptive parity,<br/>usage, memory"]

The diagram shows the split between the two surfaces: the debugger answers “what happened, step by step?” through the event timeline, while the Run Observability family answers bounded read questions about the session summary, next-step cueing, and the tool-call trace for a single run.

When a selected step is backed by a usable session checkpoint, the debugger marks it as forkable.

Press:

f

to create a new branch session from the nearest valid checkpoint. The new session keeps lineage back to the original run, but the restore point is best-effort rather than perfect time travel.

If a step is inspectable but not forkable, the debugger will say so instead of pretending it can recreate state exactly.
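
One way to picture "nearest valid checkpoint" is a backwards walk from the selected step. This sketch rests on assumptions that the text does not confirm: that "nearest" means the latest usable checkpoint at or before the selected step, and that steps carry a `checkpoint` entry with a `usable` flag. Both field names are hypothetical.

```python
from typing import Optional


def nearest_checkpoint(steps: list[dict], selected_index: int) -> Optional[dict]:
    """Walk backwards from the selected step to the closest usable checkpoint.

    Assumption: "nearest" means the latest usable checkpoint at or before the
    selected step. Returns None when nothing usable is found, which models a
    step that is inspectable but not forkable.
    """
    for step in reversed(steps[: selected_index + 1]):
        checkpoint = step.get("checkpoint")
        if checkpoint and checkpoint.get("usable"):
            return checkpoint
    return None


steps = [
    {"name": "prompt_assembly"},
    {"name": "file_write", "checkpoint": {"id": "cp-1", "usable": True}},
    {"name": "tool_call"},
    {"name": "file_write", "checkpoint": {"id": "cp-2", "usable": False}},
]
print(nearest_checkpoint(steps, 3))  # {'id': 'cp-1', 'usable': True}
print(nearest_checkpoint(steps, 0))  # None
```

Returning `None` rather than guessing mirrors the debugger's behavior of saying a step is not forkable instead of pretending it can recreate state exactly.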

Debugger continuity recording
A recording-backed demo shows BMO preserving enough run evidence and workspace trail context for an operator to recover from interruption safely.

| Key | Action |
| --- | --- |
| up / down | Move through runs or steps |
| enter | Open the selected run |
| left / backspace | Go back to the run list |
| ctrl+r | Refresh the current view |
| f | Fork from the selected checkpoint-backed step |
| esc | Close the debugger |

The debugger does not try to provide:

  • full terminal recording
  • exact token-by-token replay
  • automatic rerun of the original process
  • perfect reconstruction of every in-memory agent state transition

The goal is narrower: help you understand what happened, in order, with enough evidence to debug or branch safely.

Every agent run is listable and inspectable for audit and control. You can use the TUI debugger above, the server API, or in-agent tools; no separate tracking product is needed.

When using BMO over HTTP, the same persisted data is available through authenticated endpoints:

  • GET /v1/agent-runs — List runs. Optional query params: session_id, family_session_id (parent session + child rows: session_id or parent_session_id match; use for a full tree in one list), run_id, parent_run_id, orchestration_run_id, agent_type, status, limit. If both session_id and family_session_id are set, family_session_id wins for the session filter.
  • GET /v1/agent-runs/{run_id} — Get a single run plus derived artifacts (e.g. branch name from the run record).
  • GET /v1/agent-runs/{run_id}/events — List ordered events for a run (timeline data).

When the agent run store is not configured, the list and events endpoints return 200 with an empty array, while GET /v1/agent-runs/{run_id} returns 503 (ledger unavailable).
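
A client-side sketch of these rules might look like the following. It only builds the request URL and classifies a response; it does not perform the request. The base URL and the helper names `list_runs_url` and `interpret` are hypothetical. Authentication is deliberately left out because the scheme is not described here.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical base URL; replace it with your BMO server address.
BASE_URL = "http://localhost:8080"


def list_runs_url(session_id: Optional[str] = None,
                  family_session_id: Optional[str] = None,
                  status: Optional[str] = None,
                  limit: Optional[int] = None) -> str:
    """Build a GET /v1/agent-runs URL from a subset of the documented params."""
    params = {}
    # The server lets family_session_id win when both session filters are set,
    # so this sketch sends only one of them to keep the request unambiguous.
    if family_session_id is not None:
        params["family_session_id"] = family_session_id
    elif session_id is not None:
        params["session_id"] = session_id
    if status is not None:
        params["status"] = status
    if limit is not None:
        params["limit"] = str(limit)
    query = urlencode(params)
    return f"{BASE_URL}/v1/agent-runs" + (f"?{query}" if query else "")


def interpret(status_code: int, body: object) -> str:
    """Classify a response using the unconfigured-store behavior described above."""
    if status_code == 503:
        return "ledger unavailable"
    if status_code == 200 and body == []:
        # Ambiguous: either no rows matched or the run store is not configured.
        return "empty"
    if status_code == 200:
        return "ok"
    return "unexpected"


print(list_runs_url(session_id="s1", family_session_id="fam", limit=20))
print(interpret(503, None))
```

Because an empty array is also what an unconfigured store returns, a tool that needs to tell the two cases apart can probe GET /v1/agent-runs/{run_id} and check for 503.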

These are intended for operator and contributor tooling rather than normal chat clients.

When the agent run store is configured, the agent can inspect runs the same way as the debugger:

  • list_agent_runs — List recent runs; use family_session_id to mirror the /debugger combined parent+child list, or session_id for a single session id.
  • get_agent_run_events — Get the event stream for a given run_id (use after list_agent_runs for details).
  • session_observability — Read usage and adaptive parity for the current session; recent run aggregation uses the same parent+child “family” scope as the debugger run list (not only session_id = parent), so child run rows count toward the bounded window.
  • run_cue_ledger — Read the same session-family evidence as a cue book: active actors, cue states, next actor, expected actions, expected evidence, and control references.
  • inspect_run_trace — Read the bounded tool-call-only lens for one run; use this when you want the ordered tools without the broader event stream.

That gives the agent parity with the TUI for auditing and debugging runs.
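
The relationship between get_agent_run_events and inspect_run_trace can be illustrated by reducing a full event stream to its tool calls. This is a conceptual sketch, not how inspect_run_trace is implemented: the event `type` labels, the `tool` field, and the tool names are all hypothetical.

```python
def tool_call_trace(events: list[dict]) -> list[str]:
    """Reduce a full event stream to the ordered names of the tools that ran."""
    return [
        event["tool"]
        for event in events
        if event.get("type") == "tool_call"  # hypothetical event type label
    ]


events = [
    {"type": "prompt_assembly"},
    {"type": "tool_call", "tool": "read_file"},
    {"type": "tool_result", "tool": "read_file"},
    {"type": "assistant_text_delta"},
    {"type": "tool_call", "tool": "write_file"},
]
print(tool_call_trace(events))  # ['read_file', 'write_file']
```

The trace keeps the order of the calls and drops everything else, which is the trade the lens makes: less context, faster reading.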

When you need the bounded run-observability family instead of the full run/event replay, use:

  • GET /v1/sessions/{id}/observability — HTTP snapshot for one session
  • GET /v1/sessions/{id}/run-cue-ledger — HTTP cue-oriented next-step ledger
  • GET /v1/agent-runs/{run_id}/trace — HTTP tool-call-only trace lens
  • session_observability — In-agent summary tool for the same bounded snapshot
  • run_cue_ledger — In-agent cue-oriented next-step tool
  • inspect_run_trace — In-agent trace-lens tool

This surface is complementary to the debugger:

  • the debugger answers “what happened, step by step?”
  • session observability answers “what is the current bounded summary for this session?”
  • run cue ledger answers “who acts next, with what expected evidence?”
  • inspect run trace answers “which tools ran in this one run?”

See Run Observability and Session observability parity.

For scheduled recipe runs, use the CLI to audit history: bmo schedule list and bmo schedule runs <id> (alias sessions) show jobs and their recent runs. See CLI reference and Automation & Headless.

Implementation reference: agent-run-ledger-sessions.md — how the spawn registry, transcript, and SQLite agent_runs differ; session_id vs parent_session_id; and which surfaces use the family run list.

Debugger timeline anatomy

  • Prompt assembly: Context, system prompt, mode, tools, and provider metadata.
  • Model response: Reasoning state, text output, and requested tool calls.
  • Tool event: Inputs, outputs, permission state, timing, and errors.
  • Follow-on state: Auto-debug, pruning, compaction, reflection, or schedule signals.
  • Final result: Terminal status, output, error, and recovery entrypoints.

Use the debugger to locate the first bad transition, not just the final error.

Session snapshot

A session snapshot is the bounded state BMO can use to explain what it knows now without replaying the whole transcript.