OpenAI-compatible API

BMO exposes an OpenAI-compatible HTTP surface so any tool that speaks the OpenAI API can drive BMO sessions and the agent loop. This page covers the generic entry point. Open WebUI is the most comprehensive integration scenario, but the same surface works for the official OpenAI Go SDK, Continue, custom HTTP clients, and any other OpenAI-compatible tool.

Prerequisite: the headless HTTP/SSE hub (bmo service start http or bmo service start autopilot) must be running and reachable.

Quick start (Go `openai-go` SDK)

package main

import (
  "context"
  "fmt"

  "github.com/openai/openai-go"
  "github.com/openai/openai-go/option"
)

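func main() {
  // Point the SDK at BMO instead of api.openai.com. The API key is sent
  // as the "Authorization: Bearer <token>" header.
  client := openai.NewClient(
    option.WithBaseURL("http://localhost:9000/v1/"),
    option.WithAPIKey("<your-bmo-auth-token>"),
  )

  // Model must be one of the entries returned by GET /v1/models.
  resp, err := client.Chat.Completions.New(
    context.Background(),
    openai.ChatCompletionNewParams{
      Model: "copilot/claude-sonnet-4.6",
      Messages: []openai.ChatCompletionMessageParamUnion{
        openai.UserMessage("hello"),
      },
    },
  )
  if err != nil {
    panic(err)
  }
  // Guard against an empty choice list before indexing into it.
  if len(resp.Choices) == 0 {
    panic("no choices returned")
  }

  fmt.Println(resp.Choices[0].Message.Content)
}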

The Model value must be one of the entries returned by GET /v1/models. The token passed to option.WithAPIKey is sent as the Authorization: Bearer … header, which BMO requires when an auth token is configured.

Quick start (Continue)

Add an OpenAI-compatible model to ~/.continue/config.json:

{
  "models": [
    {
      "title": "BMO",
      "provider": "openai",
      "model": "copilot/claude-sonnet-4.6",
      "apiBase": "http://localhost:9000/v1",
      "apiKey": "<your-bmo-auth-token>"
    }
  ]
}

Continue treats BMO as a regular OpenAI-compatible backend; chat, streaming, and completion all flow through BMO’s coordinator and tools.

What works today

Chat completions — POST /v1/chat/completions, both streaming (stream: true, SSE) and non-streaming. BMO’s full coordinator, tool invocation, and persistence run behind the OpenAI envelope.
Model listing — GET /v1/models returns the model entries BMO can route to.
Persisted run ledger — every compat request is recorded in the SQLite openai_compat_runs table with its route, status, timing, and client User-Agent. The /openai-compat slash command, GET /v1/openai-compat/posture, get_openai_compat_status, bmo_get_openai_compat_status, bmo config show-openai-compat, and the HTTP runs/events routes (GET /v1/openai-compat/runs, GET /v1/openai-compat/runs/{id}/events) all read from it.

What’s not implemented

The OpenAI-compatible surface aims at the chat-completions API; several spec corners are intentionally out of scope today (function-call shape divergences, response-format JSON-schema mode, fine-tuning, embeddings, audio, images, etc.). See the OpenAI-compatible API gaps reference for the current list.

Operating it

Live posture — TUI /openai-compat, GET /v1/openai-compat/posture, get_openai_compat_status, and bmo_get_openai_compat_status share one live summary snapshot: configuration state, recent runs, persistence-degradation status, bounded provider/auth pressure cohorts, recovery signal, per-route counts, and the top-5 client User-Agents pulled from the run ledger.
CLI snapshot — bmo config show-openai-compat prints the merged configuration and points at the live posture family for runtime data.
Tracing recipes — bounded slog records under openai_compat.fired and openai_compat.action, with per-route filters and top-UA queries: see the agent tracing recipes.
Run ledger HTTP API — GET /v1/openai-compat/runs returns the paginated run list, and GET /v1/openai-compat/runs/{request_id}/events returns the per-run event stream. The authoritative response shape is pinned by the TestOpenAICompatGoldenContract test.
Reference — the OpenAI-compatible API reference carries the route table, request/response shapes, and minimal curl examples derived from the golden contract test fixtures.
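For a quick look at the posture and run ledger routes from a script, the sketch below fetches both and re-indents whatever JSON comes back. It deliberately does not model the response schema, since the authoritative shape lives in the golden contract test. The localhost:9000 address is carried over from the quick start, and BMO_TOKEN is a hypothetical environment variable name.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
)

// prettyJSON re-indents a raw JSON document without assuming its schema.
func prettyJSON(raw []byte) (string, error) {
	var buf bytes.Buffer
	if err := json.Indent(&buf, raw, "", "  "); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// fetch performs a GET with an optional bearer token and returns the raw body.
func fetch(url, token string) ([]byte, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("GET %s: unexpected status %s", url, resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	const base = "http://localhost:9000" // assumed address, as in the quick start
	// BMO_TOKEN is a hypothetical variable name used only by this sketch.
	token := os.Getenv("BMO_TOKEN")
	routes := []string{"/v1/openai-compat/posture", "/v1/openai-compat/runs"}
	for _, route := range routes {
		raw, err := fetch(base+route, token)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		out, err := prettyJSON(raw)
		if err != nil {
			fmt.Fprintln(os.Stderr, "invalid JSON from", route+":", err)
			os.Exit(1)
		}
		fmt.Printf("== %s ==\n%s\n", route, out)
	}
}
```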

Edit Predictions (`POST /v1/completions`)

Zed Edit Predictions use the legacy OpenAI completions API, not chat completions. BMO exposes an opt-in, thin fill-in-the-middle (FIM) proxy at POST /v1/completions, active when enabled = true is set under [options.openai_compat.edit_predictions]. The handler forwards Zed’s pre-formatted prompt to a configured fast model (for example ollama/qwen2.5-coder:7b) and does not enter the coordinator agent loop.

Configure the lane in bmo.toml, verify with bmo config show-openai-compat, then point Zed edit_predictions.open_ai_compatible_api at http://127.0.0.1:8080/v1/completions. Full operator recipe: Zed integration.

Open WebUI

Open WebUI is the most exercised OpenAI-compatible client against BMO. It surfaces lifecycle/infra agents, MCP-backed read-only deployments, and quality-orchestration features that the generic SDK clients above do not exercise. See the Open WebUI integration deep-dive for that scenario.