Performance

BMO performance is about two related outcomes:

- Time to first visible progress, so the operator sees honest movement early.
- Total turn cost, so long sessions, tools, and providers stay bounded.

The public docs focus on the operator-facing signals for those outcomes.

Current runtime posture

| Area | Current behavior | Why it matters |
| --- | --- | --- |
| Session history | Prompt assembly uses a bounded prompt-history window. Unsummarized sessions load only the recent tail, while summarized sessions load the summary checkpoint plus post-summary messages. | Deep sessions do not reread the whole transcript on every turn. |
| Summary reuse | Summary messages carry hidden provenance metadata that lets the runtime frame the summarized prefix without rehydrating it on every turn. | Long-running sessions stay responsive after compaction. |
| Prompt assembly reuse | The session agent keeps a local prompt-history packet cache and invalidates it when message create/update/delete events land. | Repeated turns avoid rebuilding identical framed history. |
| Runtime evidence | `bmo runtime latency` separates input preparation, provider, tool, data-path, and render timing instead of collapsing them into one total. | Operators can tell whether a slow turn is provider-bound, context-bound, or UI-bound. |
| Bench evidence | Focused benchmark suites cover prompt history, tool-output shaping, choreography reads, and other hot-path helpers. | Regressions can be caught with targeted evidence instead of anecdotal slowness. |
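
The session-history and prompt-assembly-reuse rows can be pictured with a small sketch. This is not BMO's implementation: `Message`, `is_summary`, `select_history`, `PromptHistoryCache`, and the `tail_size` parameter are all hypothetical names chosen for illustration. It only shows the general shape of a bounded history window combined with a cache that is dropped whenever a message event lands.

```python
from dataclasses import dataclass


@dataclass
class Message:
    id: int
    text: str
    is_summary: bool = False  # hypothetical flag marking a summary checkpoint


def select_history(messages: list[Message], tail_size: int) -> list[Message]:
    """Return the bounded slice of a session used for prompt assembly."""
    # Find the most recent summary checkpoint, if any.
    last_summary = None
    for index, message in enumerate(messages):
        if message.is_summary:
            last_summary = index
    if last_summary is None:
        # Unsummarized session: load only the recent tail.
        return messages[-tail_size:] if tail_size > 0 else []
    # Summarized session: summary checkpoint plus post-summary messages.
    return messages[last_summary:]


class PromptHistoryCache:
    """Caches the selected history per session until a message event lands."""

    def __init__(self, tail_size: int) -> None:
        self.tail_size = tail_size
        self._packets: dict[str, list[Message]] = {}
        self.builds = 0  # counts rebuilds so reuse is observable

    def get(self, session_id: str, messages: list[Message]) -> list[Message]:
        if session_id not in self._packets:
            self._packets[session_id] = select_history(messages, self.tail_size)
            self.builds += 1
        return self._packets[session_id]

    def on_message_event(self, session_id: str) -> None:
        # Any create/update/delete event invalidates the cached packet.
        self._packets.pop(session_id, None)
```

In this sketch, two consecutive `get` calls for the same session build the history once, and a call to `on_message_event` forces the next `get` to rebuild it.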

Which evidence to use

| Need | Surface |
| --- | --- |
| Per-turn latency attribution | `bmo runtime latency` |
| Cold-vs-warm startup comparison | `bmo startup proof` and `bmo runtime latency --startup-proof ...` |
| Prompt-cache timing comparison | `bmo runtime latency --cache-proof ...` |
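
As a rough illustration of per-turn latency attribution, the sketch below takes one duration per phase and reports which phase dominates the turn. It does not parse real `bmo runtime latency` output, whose format is not described here. The function name `dominant_phase`, the dictionary keys, and the numbers are all assumptions made up for the example.

```python
def dominant_phase(timings_ms: dict[str, float]) -> tuple[str, float]:
    """Return the slowest phase and its share of the total turn time."""
    total = sum(timings_ms.values())
    if total <= 0:
        raise ValueError("timings must sum to a positive duration")
    phase = max(timings_ms, key=timings_ms.get)
    return phase, timings_ms[phase] / total


# Made-up numbers for illustration only, not real `bmo` output.
example = {
    "input_preparation": 120.0,
    "provider": 1800.0,
    "tool": 300.0,
    "data_path": 40.0,
    "render": 140.0,
}
phase, share = dominant_phase(example)
print(f"{phase} accounts for {share:.0%} of the turn")
# prints "provider accounts for 75% of the turn"
```

Keeping the phases separate is what makes this kind of reading possible: a single total of 2400 ms would not show that the provider call, rather than context preparation or rendering, is where the time went.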