OpenAI Compat API

This page is the authoritative reference for the OpenAI-compatible HTTP routes BMO exposes. Shapes here are pinned by TestOpenAICompatGoldenContract in internal/server/openai_compat_golden_contract_test.go; any drift is a CI failure.

For an integration-level walkthrough see OpenAI-compatible API. For known divergences from the OpenAI specification see OpenAI-compatible API gaps.

All routes require Authorization: Bearer <token> when an auth token is configured on the HTTP/SSE hub. The token value comes from BMO’s HTTP server config (options.http_server.auth_token or environment); see HTTP/SSE server for setup.
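As a minimal sketch of the rule above, a client can attach the bearer header only when a token is available. The helper name `build_request` is hypothetical, and reading the token from a `BMO_TOKEN` environment variable simply mirrors the curl examples on this page; it is not a documented BMO convention.

```python
import os
import urllib.request
from typing import Optional


def build_request(url: str, token: Optional[str] = None,
                  body: Optional[bytes] = None) -> urllib.request.Request:
    """Build a request, adding the bearer header only when a token is set."""
    headers = {}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    if body is not None:
        headers["Content-Type"] = "application/json"
    # A body implies POST (chat completions); otherwise use GET.
    method = "POST" if body is not None else "GET"
    return urllib.request.Request(url, data=body, headers=headers, method=method)


# Build (but do not send) a request for the model list.
req = build_request("http://localhost:9000/v1/models", os.environ.get("BMO_TOKEN"))
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without a live server.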

| Method | Path | Purpose | Gated by openai_compat.enabled |
| --- | --- | --- | --- |
| GET | /v1/models | List BMO-backed model entries | no (always registered) |
| POST | /v1/chat/completions | Chat completions (streaming and non-streaming) | yes |
| GET | /v1/openai-compat/posture | Summary posture snapshot (state, bounded route counts, top client UAs) | no (always registered) |
| GET | /v1/openai-compat/runs | Paginated run-ledger list | no (always registered) |
| GET | /v1/openai-compat/runs/{request_id}/events | Per-run event stream | no (always registered) |

GET /v1/models

Returns the list of model entries BMO can route to.

curl -sS \
-H "Authorization: Bearer $BMO_TOKEN" \
http://localhost:9000/v1/models

Response envelope (200 OK, application/json):

{
  "object": "list",
  "data": [
    {
      "id": "copilot/claude-sonnet-4.6",
      "object": "model",
      "created": 1716969600
    }
  ]
}

Each row carries id (the model identifier), object (always "model"), and created (a Unix timestamp). Additional fields may be present; clients should treat them as non-breaking.
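A client that only needs the identifiers can read `id` from each row and ignore everything else, which keeps it tolerant of added fields. The sketch below assumes the envelope shown above; the function name `model_ids` and the extra `owned_by` field in the sample are hypothetical and exist only to show that unknown fields are skipped.

```python
import json
from typing import List


def model_ids(body: str) -> List[str]:
    """Return the id of every model row, ignoring any unknown fields."""
    payload = json.loads(body)
    if payload.get("object") != "list":
        raise ValueError("unexpected envelope: object is not 'list'")
    # Treat a missing or null data array as an empty list.
    return [row["id"] for row in payload.get("data") or []]


# Sample body; "owned_by" is a made-up extra field for illustration.
sample = json.dumps({
    "object": "list",
    "data": [{
        "id": "copilot/claude-sonnet-4.6",
        "object": "model",
        "created": 1716969600,
        "owned_by": "example",
    }],
})
print(model_ids(sample))
```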

POST /v1/chat/completions

Accepts the OpenAI chat completions request envelope and routes the prompt into the BMO coordinator.

Minimal non-streaming request:

curl -sS \
-H "Authorization: Bearer $BMO_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "copilot/claude-sonnet-4.6",
"messages": [{"role":"user","content":"hello"}]
}' \
http://localhost:9000/v1/chat/completions

Streaming request (setting stream: true switches the response to SSE, with Content-Type: text/event-stream):

curl -sS -N \
-H "Authorization: Bearer $BMO_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"model": "copilot/claude-sonnet-4.6",
"messages": [{"role":"user","content":"hello"}],
"stream": true
}' \
http://localhost:9000/v1/chat/completions

The 200 response shapes (openAIChatCompletionResponse for non-streaming, openAIChatCompletionChunk per SSE frame for streaming) are pinned by struct unmarshaling across routes_openai_compat_test.go — any field rename or removal fails those tests in lockstep.
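This page does not list the fields of openAIChatCompletionChunk, so the following is only a sketch under the assumption that each SSE frame follows the usual OpenAI chunk layout, with text in `choices[].delta.content`, and that the stream ends with a `[DONE]` data line. The function names are hypothetical, and the sample frames are hand-written rather than captured from BMO.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the payload of every 'data:' line in an SSE stream."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            yield line[len("data:"):].strip()


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate delta content until the [DONE] marker is seen."""
    parts = []
    for data in iter_sse_data(lines):
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        # Assumed layout: choices[].delta.content, as in OpenAI chunks.
        for choice in chunk.get("choices") or []:
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


# Hand-written frames for illustration only.
frames = [
    'data: {"choices":[{"delta":{"content":"hel"}}]}', "",
    'data: {"choices":[{"delta":{"content":"lo"}}]}', "",
    "data: [DONE]", "",
]
print(collect_text(frames))
```

In practice `lines` would be the decoded lines of the HTTP response body read incrementally, for example from the `curl -N` invocation above.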

Failure cases:

  • openai_compat.enabled = false → route returns 404 (not registered).
  • Empty messages → 400 with the flat error envelope.
  • Unknown model with no resolver fallback → 400 with the same flat envelope.
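The cases above can be turned into a small client-side diagnostic. This is a sketch, not BMO code: the function name `diagnose` and the wording of the returned hints are hypothetical, and a 404 can of course have other causes (such as a wrong base URL) that this helper cannot distinguish.

```python
import json


def diagnose(status: int, body: str) -> str:
    """Map a failed chat-completions response to a human-readable hint."""
    if status == 404:
        # The route is not registered when openai_compat.enabled is false.
        return "route not registered: check that openai_compat.enabled is true"
    if status == 400:
        try:
            # Flat envelope: {"error": "<message>", "code": 400}
            return f"bad request: {json.loads(body)['error']}"
        except (ValueError, KeyError, TypeError):
            return "bad request (unparseable error body)"
    return f"unexpected status {status}"


print(diagnose(400, '{"error": "messages array must not be empty", "code": 400}'))
```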

GET /v1/openai-compat/runs

Paginated list of recorded compat requests from the SQLite openai_compat_runs ledger.

curl -sS \
-H "Authorization: Bearer $BMO_TOKEN" \
"http://localhost:9000/v1/openai-compat/runs?limit=10"

Response envelope on an empty ledger:

{
  "generated_at": 1716969600,
  "total": 0,
  "runs": null
}

Notes:

  • runs is JSON null (not []) when no rows are returned; see OpenAI-compatible API gaps for the planned migration to [].
  • total is the total matching count, not the page size.
  • Query parameters: limit (default and max bounded server-side), offset, and ledger filters. See openAICompatRunsListHandler in internal/server for the live filter set.
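A paging loop has to cope with both points in the notes: `runs` may be null, and `total` counts all matches rather than the current page. The sketch below assumes only the `limit` and `offset` parameters and the envelope shown above. `iter_runs`, the `fetch_page` callback, and the in-memory `fake_fetch` are hypothetical stand-ins for a real HTTP call, and the row contents are placeholders.

```python
from typing import Callable, Iterator


def iter_runs(fetch_page: Callable[[int, int], dict],
              limit: int = 10) -> Iterator[dict]:
    """Yield every ledger row by walking offset until total is reached."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        runs = page.get("runs") or []  # null on an empty page
        yield from runs
        # Advance by what was actually returned, in case the server
        # clamps limit to a smaller maximum.
        offset += len(runs)
        if not runs or offset >= page.get("total", 0):
            return


# In-memory stand-in for GET /v1/openai-compat/runs?limit=..&offset=..
ledger = [{"request_id": f"req-{i}"} for i in range(25)]


def fake_fetch(limit: int, offset: int) -> dict:
    rows = ledger[offset:offset + limit]
    return {"generated_at": 0, "total": len(ledger), "runs": rows or None}


print(len(list(iter_runs(fake_fetch, limit=10))))
```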

GET /v1/openai-compat/runs/{request_id}/events

Per-run event stream for a single ledger row.

curl -sS \
-H "Authorization: Bearer $BMO_TOKEN" \
"http://localhost:9000/v1/openai-compat/runs/abc-123/events"

Response envelope:

{
  "generated_at": 1716969600,
  "request_id": "abc-123",
  "total": 0,
  "events": null
}

An unknown request_id returns 200 with total: 0 and empty events, not 404. This is intentional (it avoids revealing whether a given request_id exists in the ledger) and is pinned by the golden contract.

Non-streaming errors on every compat route use BMO’s flat error shape:

{
  "error": "messages array must not be empty",
  "code": 400
}

The error field is a plain string and code echoes the HTTP status.

This diverges from the OpenAI canonical shape ({"error": {"message": "...", "type": "..."}}). Streaming errors do use the canonical shape via writeSSEError. See OpenAI-compatible API gaps for the divergence and the canonicalization plan.

Streaming errors on POST /v1/chat/completions (when stream: true) emit the OpenAI canonical envelope as the final SSE event before the [DONE] marker:

event: error
data: {"error":{"message":"…"}}
data: [DONE]
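Because the two shapes differ, a client that handles both streaming and non-streaming calls may want one accessor for the message. The following is a sketch based only on the two shapes shown on this page; the function name `error_message` is hypothetical, and it returns no code for the canonical shape because the streaming example above carries none.

```python
from typing import Optional, Tuple


def error_message(payload: dict) -> Tuple[str, Optional[int]]:
    """Return (message, code) from either the flat or canonical shape."""
    err = payload.get("error")
    if isinstance(err, str):
        # Flat shape: {"error": "<message>", "code": <http status>}
        return err, payload.get("code")
    if isinstance(err, dict):
        # Canonical shape: {"error": {"message": "<message>", ...}}
        return err.get("message", ""), None
    raise ValueError("payload has no recognizable error field")


print(error_message({"error": "messages array must not be empty", "code": 400}))
print(error_message({"error": {"message": "upstream failed"}}))
```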

Every request emits two slog records on the default logger:

  • openai_compat.fired at request entry, with request_id, route, model, and bounded client_ua.
  • openai_compat.action when the request reaches its terminal state (success or failure), with outcome, duration_ms, and contextual fields.
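If the default logger is configured with slog's JSON handler, each record is one JSON object per line with the record name under the `msg` key. Whether a given BMO deployment logs in that format is an assumption here, as are the function names and the sample `outcome` values; the sketch only shows how the two records could be picked out of a mixed log and tallied.

```python
import json
from collections import Counter
from typing import Iterable, Iterator

COMPAT_MESSAGES = {"openai_compat.fired", "openai_compat.action"}


def compat_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed compat records, skipping unrelated or non-JSON lines."""
    for line in lines:
        try:
            record = json.loads(line)
        except ValueError:
            continue
        if isinstance(record, dict) and record.get("msg") in COMPAT_MESSAGES:
            yield record


def outcome_counts(lines: Iterable[str]) -> Counter:
    """Count openai_compat.action records by their outcome field."""
    return Counter(
        r.get("outcome")
        for r in compat_records(lines)
        if r["msg"] == "openai_compat.action"
    )


# Hand-written sample lines; field values are placeholders.
log_lines = [
    '{"msg":"openai_compat.fired","request_id":"abc-123","route":"/v1/chat/completions"}',
    '{"msg":"openai_compat.action","outcome":"ok","duration_ms":12}',
    "plain text line from another component",
    '{"msg":"openai_compat.action","outcome":"error","duration_ms":3}',
]
print(outcome_counts(log_lines))
```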

For copy-paste filter recipes see agent tracing — OpenAI-compat per-route filters.

The same fields drive:

  • /openai-compat slash command (TUI live posture)
  • GET /v1/openai-compat/posture (live summary posture)
  • get_openai_compat_status / bmo_get_openai_compat_status (agent-native summary posture)
  • bmo config show-openai-compat (CLI snapshot)
  • The run-ledger HTTP routes above