OpenAI-compatible API reference

This page is the authoritative reference for the OpenAI-compatible HTTP routes BMO exposes. Shapes here are pinned by TestOpenAICompatGoldenContract in internal/server/openai_compat_golden_contract_test.go; any drift is a CI failure.

For an integration-level walkthrough see OpenAI-compatible API. For known divergences from the OpenAI specification see OpenAI-compatible API gaps.

Authentication

All routes require Authorization: Bearer <token> when an auth token is configured on the HTTP/SSE hub. The token value comes from BMO’s HTTP service boundary (services.http.token, services.autopilot.token, or a bmo service start http|autopilot --token override); see HTTP/SSE server for setup.
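
As a minimal client-side sketch of the rule above, the helper below attaches the Bearer header only when a token is available. The function name `build_headers` is hypothetical and not part of BMO; how the token is obtained is left to the caller.

```python
from typing import Dict, Optional


def build_headers(token: Optional[str]) -> Dict[str, str]:
    """Return request headers, adding the Bearer token only when one is set."""
    headers: Dict[str, str] = {}
    if token:
        # Matches the documented form: Authorization: Bearer <token>
        headers["Authorization"] = f"Bearer {token}"
    return headers


print(build_headers("example-token"))  # {'Authorization': 'Bearer example-token'}
print(build_headers(None))             # {}
```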

Routes

Method	Path	Purpose	Gated by `openai_compat.enabled`
`GET`	`/v1/models`	List BMO-backed model entries	no — always registered
`POST`	`/v1/chat/completions`	Chat completions (streaming and non-streaming)	yes
`GET`	`/v1/openai-compat/posture`	Summary posture snapshot (state, bounded provider/auth pressure cohorts, recovery signal, bounded route counts, top client UAs)	no — always registered
`GET`	`/v1/openai-compat/runs`	Paginated run-ledger list	no — always registered
`GET`	`/v1/openai-compat/runs/{request_id}/events`	Per-run event stream	no — always registered

`GET /v1/models`

Returns the list of model entries BMO can route to.

curl -sS \
  -H "Authorization: Bearer $BMO_TOKEN" \
  http://localhost:9000/v1/models

Response envelope (200 OK, application/json):

{
  "object": "list",
  "data": [
    {
      "id": "copilot/claude-sonnet-4.6",
      "object": "model",
      "created": 1716969600
    }
  ]
}

Each row carries id, object, and created: object is always "model" and created is a Unix timestamp. Additional fields may be present and are non-breaking.
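
A client can read this envelope defensively, ignoring any additional fields. The sketch below is one way to extract the model ids; the function name `model_ids` is hypothetical, and the sample body is the one shown above.

```python
import json
from typing import List


def model_ids(body: str) -> List[str]:
    """Extract model ids from a /v1/models response body."""
    payload = json.loads(body)
    if payload.get("object") != "list":
        raise ValueError("unexpected envelope: object is not 'list'")
    # Unknown extra fields on each row are ignored, since they are non-breaking.
    return [
        row["id"]
        for row in payload.get("data") or []
        if row.get("object") == "model"
    ]


sample = """
{
  "object": "list",
  "data": [
    {"id": "copilot/claude-sonnet-4.6", "object": "model", "created": 1716969600}
  ]
}
"""
print(model_ids(sample))  # ['copilot/claude-sonnet-4.6']
```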

`POST /v1/chat/completions`

Accepts the OpenAI chat completions request envelope and routes the prompt into the BMO coordinator.

Minimal non-streaming request:

curl -sS \
  -H "Authorization: Bearer $BMO_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "copilot/claude-sonnet-4.6",
    "messages": [{"role":"user","content":"hello"}]
  }' \
  http://localhost:9000/v1/chat/completions

Streaming request (stream: true switches the response to SSE, served with Content-Type: text/event-stream):

curl -sS -N \
  -H "Authorization: Bearer $BMO_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "copilot/claude-sonnet-4.6",
    "messages": [{"role":"user","content":"hello"}],
    "stream": true
  }' \
  http://localhost:9000/v1/chat/completions
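
The same two requests can be built from Python with the standard library. This is a sketch rather than a client: `chat_request` is a hypothetical helper, the base URL and token are placeholders, and the request is only constructed here, not sent.

```python
import json
import urllib.request
from typing import Dict, List


def chat_request(base_url: str, token: str, model: str,
                 messages: List[Dict[str, str]],
                 stream: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/chat/completions request."""
    body = {"model": model, "messages": messages}
    if stream:
        # stream: true asks the server to answer with SSE frames.
        body["stream"] = True
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = chat_request(
    "http://localhost:9000", "example-token",
    "copilot/claude-sonnet-4.6",
    [{"role": "user", "content": "hello"}],
    stream=True,
)
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen` would send it; for a streaming request the response body is then read line by line.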

The 200 response shapes (openAIChatCompletionResponse for non-streaming, openAIChatCompletionChunk per SSE frame for streaming) are pinned by struct unmarshaling throughout routes_openai_compat_test.go, so any field rename or removal fails those tests.

Request gates

openai_compat.enabled = false → route returns 404 (not registered).
Empty messages → 400 with the flat error envelope.
Unknown model with no resolver fallback → 400 with the same flat envelope.
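
A caller can turn these gates into readable diagnostics. The sketch below maps a status code and body to a hint; `explain_gate` is a hypothetical name, and the 404 hint is only a likely cause, since a mistyped path would also produce a 404.

```python
import json
from typing import Optional


def explain_gate(status: int, body: str) -> Optional[str]:
    """Return a short hint for a gated response, or None if no gate applies."""
    if status == 404:
        return "route not registered; openai_compat.enabled may be false"
    if status == 400:
        try:
            payload = json.loads(body)
        except ValueError:
            return "400 with a non-JSON body"
        error = payload.get("error") if isinstance(payload, dict) else None
        if isinstance(error, str):
            # Flat envelope: "error" is a plain string.
            return "request rejected: " + error
        return "400 with an unrecognized envelope"
    return None


print(explain_gate(400, '{"error": "messages array must not be empty", "code": 400}'))
# request rejected: messages array must not be empty
```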

`GET /v1/openai-compat/runs`

Paginated list of recorded compat requests from the SQLite openai_compat_runs ledger.

curl -sS \
  -H "Authorization: Bearer $BMO_TOKEN" \
  "http://localhost:9000/v1/openai-compat/runs?limit=10"

Response envelope on an empty ledger:

{
  "generated_at": 1716969600,
  "total": 0,
  "runs": null
}

Notes:

runs is JSON null on empty (not []); see gaps for the planned migration to [].
total is the total matching count, not the page size.
Query parameters: limit (default and max bounded server-side), offset, and ledger filters. See openAICompatRunsListHandler in internal/server for the live filter set.
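
The notes above suggest a paging loop that treats null as an empty page and stops once total rows have been seen. The sketch below assumes offset is a zero-based row offset, which this page does not state; `iter_runs` and the `fetch_page` callable are hypothetical, and the callable stands in for the HTTP request so the loop can be exercised without a server.

```python
from typing import Any, Callable, Dict, Iterator

Page = Dict[str, Any]


def iter_runs(fetch_page: Callable[[int, int], Page],
              limit: int = 10) -> Iterator[Dict[str, Any]]:
    """Yield every run by walking pages with limit/offset."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        runs = page.get("runs") or []  # runs is JSON null on an empty page
        for run in runs:
            yield run
        # Advance by what was actually returned, since the server may cap limit.
        offset += len(runs)
        if not runs or offset >= page.get("total", 0):
            break


# A stand-in for the server: five fake ledger rows served in pages.
ledger = [{"request_id": f"req-{i}"} for i in range(5)]


def fake_fetch(limit: int, offset: int) -> Page:
    rows = ledger[offset:offset + limit]
    return {"generated_at": 1716969600, "total": len(ledger), "runs": rows or None}


print([run["request_id"] for run in iter_runs(fake_fetch, limit=2)])
# ['req-0', 'req-1', 'req-2', 'req-3', 'req-4']
```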

`GET /v1/openai-compat/runs/{request_id}/events`

Per-run event stream for a single ledger row.

curl -sS \
  -H "Authorization: Bearer $BMO_TOKEN" \
  "http://localhost:9000/v1/openai-compat/runs/abc-123/events"

Response envelope:

{
  "generated_at": 1716969600,
  "request_id": "abc-123",
  "total": 0,
  "events": null
}

Behavior on unknown request_id: the route returns 200 with total: 0 and empty events, not 404. This is intentional (it avoids leaking whether a ledger row exists) and is pinned by the golden contract.
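
One consequence for clients is that an unknown request_id and a known run with no events look the same. A minimal sketch of reading the envelope with that in mind (`read_events` is a hypothetical name):

```python
import json
from typing import Any, Dict, List, Tuple


def read_events(body: str) -> Tuple[str, List[Dict[str, Any]]]:
    """Return (request_id, events), treating a null events field as empty."""
    payload = json.loads(body)
    events = payload.get("events") or []
    # total == 0 cannot distinguish "unknown id" from "no events recorded",
    # so this helper deliberately does not raise a not-found error.
    return payload.get("request_id", ""), events


body = '{"generated_at": 1716969600, "request_id": "abc-123", "total": 0, "events": null}'
print(read_events(body))  # ('abc-123', [])
```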

Error envelopes (non-streaming)

Non-streaming errors on every compat route use BMO’s flat error shape:

{
  "error": "messages array must not be empty",
  "code": 400
}

The error field is a plain string and code echoes the HTTP status.

This diverges from the OpenAI canonical shape ({"error": {"message": "...", "type": "..."}}). Streaming errors do use the canonical shape via writeSSEError. See OpenAI-compatible API gaps for the divergence and the canonicalization plan.
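
Because the two shapes coexist, a client may want one code path that accepts either. The sketch below normalizes both into a small dict; `normalize_error` is a hypothetical helper, and falling back to the HTTP status when code is absent is an assumption of this example.

```python
from typing import Any, Dict, Optional


def normalize_error(payload: Dict[str, Any],
                    http_status: int) -> Optional[Dict[str, Any]]:
    """Map the flat or canonical error envelope to {'message', 'code'}."""
    error = payload.get("error")
    if isinstance(error, str):
        # BMO flat shape: {"error": "...", "code": 400}
        return {"message": error, "code": payload.get("code", http_status)}
    if isinstance(error, dict):
        # OpenAI canonical shape: {"error": {"message": "...", ...}}
        return {"message": error.get("message", ""), "code": http_status}
    return None


print(normalize_error({"error": "messages array must not be empty", "code": 400}, 400))
# {'message': 'messages array must not be empty', 'code': 400}
```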

Error envelopes (streaming)

Streaming errors on POST /v1/chat/completions (when stream: true) emit the OpenAI canonical envelope as the final SSE event before the [DONE] marker:

event: error
data: {"error":{"message":"…"}}

data: [DONE]
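
A client reading the stream therefore needs to track the event name as well as the data line. The sketch below is a small SSE reader under that assumption: it collects decoded data frames, raises on an event: error frame, and stops at [DONE]. The names `sse_events` and `consume` are hypothetical, and the chunk contents are treated as opaque JSON because this page does not list the fields of openAIChatCompletionChunk.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Group SSE lines into (event_name, data) pairs, split on blank lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data.append(value[1:] if value.startswith(" ") else value)
    if data:  # flush a final frame that lacks a trailing blank line
        yield event, "\n".join(data)


def consume(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Return decoded chunks; raise RuntimeError on a streamed error frame."""
    chunks = []
    for event, data in sse_events(lines):
        if data == "[DONE]":
            break
        body = json.loads(data)
        if event == "error":
            err = body.get("error")
            message = err.get("message", "") if isinstance(err, dict) else str(err)
            raise RuntimeError(message)
        chunks.append(body)
    return chunks


failed = ["event: error", 'data: {"error":{"message":"upstream failed"}}', "", "data: [DONE]", ""]
try:
    consume(failed)
except RuntimeError as exc:
    print("stream error:", exc)  # stream error: upstream failed
```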

Observability

Every request emits two slog records on the default logger:

openai_compat.fired at request entry, with request_id, route, model, and bounded client_ua.
openai_compat.action at terminal arm (success or failure), with outcome, duration_ms, and contextual fields.

For copy-paste filter recipes see agent tracing — OpenAI-compat per-route filters.
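
As a rough illustration only, the sketch below filters these two records out of a log stream. It assumes the logger writes one JSON object per line with the message under a "msg" key, as Go's slog JSON handler does; whether BMO is configured that way is not stated here, and the sample lines are made up.

```python
import json
from collections import Counter
from typing import Any, Dict, Iterable, Iterator

COMPAT_MESSAGES = ("openai_compat.fired", "openai_compat.action")


def compat_records(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield only the compat slog records, skipping non-JSON lines."""
    for line in lines:
        try:
            record = json.loads(line)
        except ValueError:
            continue  # not a JSON log line
        if isinstance(record, dict) and record.get("msg") in COMPAT_MESSAGES:
            yield record


# Hypothetical log lines for illustration.
log = [
    '{"level":"INFO","msg":"openai_compat.fired","request_id":"abc-123"}',
    '{"level":"INFO","msg":"unrelated"}',
    "plain text line",
    '{"level":"INFO","msg":"openai_compat.action","outcome":"success","duration_ms":42}',
]
print(Counter(record["msg"] for record in compat_records(log)))
```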

The same fields drive:

/openai-compat slash command (TUI live posture)
GET /v1/openai-compat/posture (live summary posture)
get_openai_compat_status / bmo_get_openai_compat_status (agent-native summary posture)
bmo config show-openai-compat (CLI snapshot)
The run-ledger HTTP routes above

The summary posture remains metadata-only. It surfaces bounded provider_pressure_count, auth_rejection_count, and recovery_observed fields so operators can tell whether the current issue is downstream-provider pressure, auth-boundary churn, or a recently recovered compat surface without opening raw ledger rows first.
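
The triage described above could be sketched as follows. This is an illustration, not BMO's logic: it assumes the three fields appear as top-level keys of the posture JSON with integer counts and a boolean flag, and the precedence (recovery first, then the larger count) is an arbitrary choice made for the example. The name `triage` is hypothetical.

```python
from typing import Any, Dict


def triage(posture: Dict[str, Any]) -> str:
    """Give a first-guess label from the bounded posture fields."""
    if posture.get("recovery_observed"):
        return "recently recovered"
    provider = int(posture.get("provider_pressure_count") or 0)
    auth = int(posture.get("auth_rejection_count") or 0)
    if provider == 0 and auth == 0:
        return "no pressure signal"
    # Assumption for this sketch: the larger count names the likely cause.
    return "downstream-provider pressure" if provider >= auth else "auth-boundary churn"


print(triage({"provider_pressure_count": 3, "auth_rejection_count": 0,
              "recovery_observed": False}))
# downstream-provider pressure
```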