Performance
BMO performance is about two related outcomes:
- Time to first visible progress, so the operator sees honest movement early.
- Total turn cost, so long sessions, tools, and providers stay bounded.
The public docs focus on the operator-facing signals for those outcomes.
Current runtime posture
| Area | Current behavior | Why it matters |
|---|---|---|
| Session history | Prompt assembly uses a bounded prompt-history window. Unsummarized sessions load only the recent tail, while summarized sessions load the summary checkpoint plus post-summary messages. | Deep sessions do not reread the whole transcript on every turn. |
| Summary reuse | Summary messages carry hidden provenance metadata that lets the runtime frame the summarized prefix without rehydrating it on every turn. | Long-running sessions stay responsive after compaction. |
| Prompt assembly reuse | The session agent keeps a local prompt-history packet cache and invalidates it when message create/update/delete events land. | Repeated turns avoid rebuilding identical framed history. |
| Runtime evidence | `bmo runtime latency` separates input preparation, provider, tool, data-path, and render timing instead of collapsing them into one total. | Operators can tell whether a slow turn is provider-bound, context-bound, or UI-bound. |
| Bench evidence | Focused benchmark suites cover prompt history, tool-output shaping, choreography reads, and other hot-path helpers. | Regressions can be caught with targeted evidence instead of anecdotal slowness. |
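
The session-history and prompt-assembly rows above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not BMO's implementation: `Message`, `is_summary`, `select_prompt_history`, `PromptHistoryCache`, and `tail_size` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Message:
    id: int
    text: str
    is_summary: bool = False  # hypothetical flag marking a summary checkpoint


def select_prompt_history(messages: List[Message], tail_size: int) -> List[Message]:
    """Return the bounded slice of a session used for prompt assembly."""
    # Find the most recent summary checkpoint, if any.
    last_summary: Optional[int] = None
    for index, message in enumerate(messages):
        if message.is_summary:
            last_summary = index
    if last_summary is not None:
        # Summarized session: the checkpoint plus everything after it.
        return messages[last_summary:]
    # Unsummarized session: only the recent tail.
    return messages[-tail_size:] if tail_size > 0 else []


class PromptHistoryCache:
    """Keeps the selected history until a message event invalidates it."""

    def __init__(self, tail_size: int) -> None:
        self.tail_size = tail_size
        self._packet: Optional[List[Message]] = None
        self.rebuilds = 0  # counts how often the packet was recomputed

    def get(self, messages: List[Message]) -> List[Message]:
        if self._packet is None:
            self._packet = select_prompt_history(messages, self.tail_size)
            self.rebuilds += 1
        return self._packet

    def on_message_event(self) -> None:
        # Assumed to be called on create/update/delete; drops the cached packet.
        self._packet = None
```

In this sketch, repeated calls to `get` between message events reuse the same packet, which is the property the "Prompt assembly reuse" row describes. How BMO actually stores or frames the packet is not specified here.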
Which evidence to use
| Need | Surface |
|---|---|
| Representative benchmark numbers | Benchmark Results |
| Per-turn latency attribution | `bmo runtime latency` |
| Cold-vs-warm startup comparison | `bmo startup proof` and `bmo runtime latency --startup-proof ...` |
| Prompt-cache timing comparison | `bmo runtime latency --cache-proof ...` |
| Historical benchmark numbers | Benchmark Results |
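
To illustrate why per-phase timing is more useful than a single total, here is a small sketch that takes phase timings for one turn and reports which phase dominates. The phase keys, the `dominant_phase` and `phase_shares` helpers, and the sample numbers are all hypothetical; this does not reflect the actual output format of `bmo runtime latency`.

```python
from typing import Dict

# Hypothetical phase keys, mirroring the categories named on this page.
PHASES = ("input_preparation", "provider", "tool", "data_path", "render")


def phase_shares(timings_ms: Dict[str, float]) -> Dict[str, float]:
    """Return each phase's fraction of the summed phase time."""
    total = sum(timings_ms.get(name, 0.0) for name in PHASES)
    if total == 0:
        return {name: 0.0 for name in PHASES}
    return {name: timings_ms.get(name, 0.0) / total for name in PHASES}


def dominant_phase(timings_ms: Dict[str, float]) -> str:
    """Return the phase with the largest share of the turn."""
    shares = phase_shares(timings_ms)
    return max(shares, key=shares.get)


# Made-up sample timings for a single turn, in milliseconds.
sample_turn = {
    "input_preparation": 120.0,
    "provider": 900.0,
    "tool": 300.0,
    "data_path": 40.0,
    "render": 60.0,
}
```

With these sample values, `dominant_phase(sample_turn)` points at the provider phase, which is the kind of conclusion a single total would hide.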