Skip to content

Openai Compat Gaps

This page is the honest scope statement for BMO’s OpenAI-compatible HTTP surface. It exists so adopters know what to expect before they wire up a client and hit a silent ignore.

For the route table and shapes BMO does implement, see OpenAI-compatible API reference. For the integration-level walkthrough see OpenAI-compatible API.

BMO’s chat completions endpoint accepts the same request envelope as OpenAI’s POST /v1/chat/completions, but only a subset of the fields on that envelope are actually wired into the BMO coordinator. The tables below classify every notable spec field into one of three buckets:

  1. Accepted and honored — BMO parses the field and routes it.
  2. Accepted but ignored — BMO parses the JSON (or silently drops the field via Go’s JSON decoder) without acting on it. Clients should not rely on the behavior described by the OpenAI spec.
  3. Rejected — BMO returns 400 when the field is present.

Source of truth: internal/server/routes_openai_compat.go (openAIChatCompletionRequest struct and the chat completions handler).

FieldBMO behavior
modelRouted to BMO’s model resolver. Bad/unknown models without a fallback resolver return 400.
messagesRequired, must be non-empty (400 otherwise). Routed into the BMO session/coordinator.
streamWhen true, response is SSE (text/event-stream) with [DONE] terminator. When false, single JSON envelope.
stream_options.include_usageWhen set with stream: true, BMO emits a final usage chunk before [DONE].

Within a messages entry: role, content, and tool_calls ({id, type, function:{name, arguments}}) are read by BMO. Tool calls flow into the BMO tool-invocation path subject to options.openai_compat.tool_policy.

These fields are part of the OpenAI Chat Completions spec but are not present on BMO’s request struct. Go’s JSON decoder drops them silently — clients can send them, but BMO does not act on them.

FieldOpenAI behaviorBMO behavior today
temperatureSampling temperature.Silently ignored. BMO uses provider/agent defaults.
top_pNucleus sampling.Silently ignored.
nMultiple completions per prompt.Silently ignored — BMO returns one choice.
max_tokens / max_completion_tokensOutput cap.Silently ignored. BMO uses provider/agent defaults and BMO’s own context-budget controls.
presence_penalty, frequency_penaltyPenalty knobs.Silently ignored.
stopStop sequences.Silently ignored.
seedDeterministic-sampling seed.Silently ignored.
logprobs, top_logprobsToken log-probabilities.Silently ignored — BMO does not surface logprobs.
logit_biasPer-token bias.Silently ignored.
response_format (incl. type: "json_schema")Forces a JSON schema.Silently ignored — BMO returns whatever the underlying model produces. Use BMO’s structured-output features instead.
tools (request-side declaration)Client-declared tool catalog.Silently ignored as a client-supplied catalog — BMO uses its own tool registry from the active agent’s policy. The spec-shaped tool_calls come from BMO regardless of what the client declared.
tool_choiceForce a specific tool / "none" / "auto".Silently ignored — tool gating comes from BMO’s tool_policy.
parallel_tool_callsToggle parallel tool calling.Silently ignored.
userPer-user tracing tag.Silently ignored — BMO uses its own session and request_id.
service_tierOpenAI-side priority hint.Silently ignored — has no analogue in BMO.
predictionPredicted-output speculative decoding.Silently ignored.
audio, modalities, metadata, storeMisc OpenAI-side knobs.Silently ignored.

A future iteration may begin honoring temperature, top_p, and max_tokens end-to-end. When that lands, those rows move into the Accepted and honored table above and a release note calls out the behavior change. Until then, treat sampling knobs as inert.

BMO does not reject any chat-completions field outright today. The only request-level 400 conditions are:

  • Empty messages.
  • Unknown model with no resolver fallback configured.
  • Malformed JSON body.

Adjacent OpenAI surfaces — Assistants, Files, Vector Stores, Realtime, Audio, Images, Embeddings, Fine-tuning — are not implemented at all (404, route not registered). They are explicitly out of BMO’s compat scope; BMO’s compat surface is /v1/models + /v1/chat/completions

  • the BMO-specific /v1/openai-compat/runs* extensions only.

Wire-shape divergences from the OpenAI spec

Section titled “Wire-shape divergences from the OpenAI spec”

These are differences in BMO’s response envelopes that adopters need to know about. All are pinned by TestOpenAICompatGoldenContract so any drift fails CI in lockstep with this doc.

Non-streaming error envelope is flat, not OpenAI-canonical

Section titled “Non-streaming error envelope is flat, not OpenAI-canonical”

BMO emits:

{ "error": "messages array must not be empty", "code": 400 }

OpenAI emits:

{ "error": { "message": "...", "type": "invalid_request_error" } }

This affects every non-streaming error on every compat route. Streaming errors on /v1/chat/completions (when stream: true) do use the OpenAI-canonical shape via writeSSEError, so the shape an adopter sees depends on whether the request was streaming.

A future iteration will canonicalize the non-streaming shape to match OpenAI; this doc and the golden contract test will both flip in the same change.

Empty list envelopes use JSON null, not []

Section titled “Empty list envelopes use JSON null, not []”

GET /v1/openai-compat/runs and GET /v1/openai-compat/runs/{id}/events return null for runs / events on an empty result set, not []. This is a BMO-specific extension surface (not part of the OpenAI spec) but is worth flagging because most JSON consumers expect arrays. A follow-up will migrate to [].

GET /v1/openai-compat/runs/{request_id}/events returns 200 with total: 0 and empty events for an unknown request_id rather than 404. This is intentional — it avoids leaking ledger existence to unauthenticated probes — and is pinned by the golden contract.

These are decisions, not gaps:

  • BMO’s compat surface is one-way: BMO appears as an OpenAI server to clients. BMO is not, and will not be, a translating proxy that lets OpenAI clients drive other providers (Anthropic, Cohere, etc.) via BMO. Use a dedicated proxy (LiteLLM, Helicone, …) for that.
  • BMO does not author SDKs in other languages — adopters use the existing OpenAI SDKs (openai-python, openai-node, etc.) directly.
  • The Assistants / Files / Vector Stores / Realtime / Audio / Images / Embeddings / Fine-tuning surfaces are out of scope. They are separate OpenAI APIs; BMO’s compat surface is chat completions + the BMO ledger extensions only.