
Performance

BMO performance is about two related outcomes:

  • Time to first visible progress, so the operator sees honest movement early.
  • Total turn cost, so long sessions, tools, and providers stay bounded.

The public docs focus on the operator-facing signals for those outcomes.

| Area | Current behavior | Why it matters |
| --- | --- | --- |
| Session history | Prompt assembly uses a bounded prompt-history window. Unsummarized sessions load only the recent tail, while summarized sessions load the summary checkpoint plus post-summary messages. | Deep sessions do not reread the whole transcript on every turn. |
| Summary reuse | Summary messages carry hidden provenance metadata that lets the runtime frame the summarized prefix without rehydrating it on every turn. | Long-running sessions stay responsive after compaction. |
| Prompt assembly reuse | The session agent keeps a local prompt-history packet cache and invalidates it when message create/update/delete events land. | Repeated turns avoid rebuilding identical framed history. |
| Runtime evidence | bmo runtime latency separates input preparation, provider, tool, data-path, and render timing instead of collapsing them into one total. | Operators can tell whether a slow turn is provider-bound, context-bound, or UI-bound. |
| Bench evidence | Focused benchmark suites cover prompt history, tool-output shaping, choreography reads, and other hot-path helpers. | Regressions can be caught with targeted evidence instead of anecdotal slowness. |
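
The session-history and prompt-assembly reuse behavior described above can be illustrated with a minimal sketch. This is not BMO's implementation: the names Message, select_prompt_window, PromptHistoryCache, and the tail_size parameter are all hypothetical, and the sketch only assumes that a summary checkpoint can be recognized by a flag on the message.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Message:
    """Hypothetical message record; BMO's real schema is not shown in the docs."""
    id: int
    text: str
    is_summary: bool = False  # assumed flag marking a summary checkpoint


def select_prompt_window(history: list[Message], tail_size: int = 4) -> list[Message]:
    """Pick the bounded slice of history that would be framed into the prompt."""
    if tail_size <= 0:
        raise ValueError("tail_size must be positive")
    # Locate the most recent summary checkpoint, if the session has one.
    last_summary: Optional[int] = None
    for index, message in enumerate(history):
        if message.is_summary:
            last_summary = index
    if last_summary is not None:
        # Summarized session: the checkpoint plus every post-summary message.
        return history[last_summary:]
    # Unsummarized session: only the recent tail.
    return history[-tail_size:]


class PromptHistoryCache:
    """Keeps the last selected window until a message event invalidates it."""

    def __init__(self, tail_size: int = 4) -> None:
        self._tail_size = tail_size
        self._packet: Optional[list[Message]] = None
        self.rebuilds = 0  # counts how often the window was actually rebuilt

    def on_message_event(self) -> None:
        # Call this when a message create, update, or delete event lands.
        self._packet = None

    def get(self, history: list[Message]) -> list[Message]:
        if self._packet is None:
            self._packet = select_prompt_window(history, self._tail_size)
            self.rebuilds += 1
        return self._packet
```

In this sketch, repeated calls to get between message events reuse the cached window, so the rebuild cost is paid once per change rather than once per turn.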
| Need | Surface |
| --- | --- |
| Representative benchmark numbers | Benchmark Results |
| Per-turn latency attribution | bmo runtime latency |
| Cold-vs-warm startup comparison | bmo startup proof and bmo runtime latency --startup-proof ... |
| Prompt-cache timing comparison | bmo runtime latency --cache-proof ... |
| Historical benchmark numbers | Benchmark Results |
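
To show why separated phase timings are more useful than a single total, here is a small sketch of per-turn attribution. It does not parse or reproduce the output of bmo runtime latency, whose format is not documented here. The phase keys mirror the five timings named above, while the dictionary shape and the attribute_turn function are assumptions made for illustration.

```python
from __future__ import annotations

# Phase names follow the timings listed in the docs; the key spelling is assumed.
PHASES = ("input_preparation", "provider", "tool", "data_path", "render")


def attribute_turn(timings_ms: dict[str, float]) -> tuple[str, float]:
    """Return the dominant phase of a turn and its share of the measured total."""
    total = sum(timings_ms.get(phase, 0.0) for phase in PHASES)
    if total <= 0:
        raise ValueError("no positive timings to attribute")
    # The phase with the largest timing is treated as the bottleneck.
    dominant = max(PHASES, key=lambda phase: timings_ms.get(phase, 0.0))
    return dominant, timings_ms.get(dominant, 0.0) / total


if __name__ == "__main__":
    # Made-up sample values, not measured BMO numbers.
    sample = {
        "input_preparation": 100.0,
        "provider": 700.0,
        "tool": 100.0,
        "data_path": 50.0,
        "render": 50.0,
    }
    phase, share = attribute_turn(sample)
    print(f"{phase} accounts for {share:.0%} of the turn")
```

A single total of 1000 ms would hide the fact that, in this made-up sample, most of the turn is spent waiting on the provider rather than on context preparation or rendering.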