Semantic Code Search

Semantic Code Search lets the agent find code by meaning rather than by literal text patterns. Instead of writing a regex for grep, the agent describes what it is looking for in natural language and receives results ranked by vector similarity.

Tools

Tool	Description
`embedding_status()`	Read-only posture check for semantic search readiness, model cache, sidecar state, and index health.
`semantic_search(query, [directory], [top_k])`	Natural language code search. “Find authentication handling code” returns the most semantically relevant chunks across the codebase.
`find_similar_code(file, line, [top_k], [threshold])`	Clone and near-duplicate detection. Given a reference location, returns code chunks ranked by cosine similarity. Scores above 0.9 indicate near-duplicates; scores above 0.8 indicate structurally similar code.
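
The score bands quoted for `find_similar_code` can be illustrated with a small sketch. It assumes plain Python lists as embedding vectors; the helper names `cosine_similarity` and `classify_score`, and the label for scores at or below 0.8, are hypothetical and not part of the tool.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


def classify_score(score):
    """Map a similarity score onto the bands described in the table."""
    if score > 0.9:
        return "near-duplicate"
    if score > 0.8:
        return "structurally similar"
    # Hypothetical label: the document does not name this band.
    return "below documented bands"
```

For example, `classify_score(cosine_similarity(u, v))` on two chunk embeddings gives a rough reading of how a result should be interpreted.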

Configuration

[options.embedding]
enabled = true
max_chunks = 100000

Option	Default	Description
`enabled`	`true`	Enable the embedding service and semantic search tools
`max_chunks`	`100000`	Safety limit on total indexed chunks per directory

First-use checklist

Enable `[options.embedding]` and set `max_chunks` high enough for the directory.
Run `bmo embedding download` to pre-cache the default model while the machine has network access.
Run `bmo embedding status --json` to inspect sidecar, model-cache, and index posture without starting the sidecar.
Ask the agent to call `embedding_status` before `semantic_search` if readiness is uncertain.
Run the first `semantic_search`; the index is built lazily for the requested directory.
If the posture is disabled, unavailable, stale, degraded, or empty, use `grep` and `view` while fixing the model cache, sidecar, scope, or index.
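
The fallback rule in the last step can be sketched as a small decision function. This is a hedged illustration only: the posture strings follow the names used in this document, but how the real tool spells them in its output is an assumption, and `choose_search_tool` and the `recheck_status` result are hypothetical.

```python
# Postures for which the checklist recommends lexical tools.
FALLBACK_STATES = {"disabled", "unavailable", "stale", "degraded", "empty"}
# Transitional postures; waiting and re-checking is an assumption here.
TRANSITIONAL_STATES = {"initializing", "indexing"}


def choose_search_tool(posture):
    """Pick a tool for the next search based on the reported posture."""
    words = posture.strip().lower().split()
    # "stale index" and "empty index" reduce to "stale" and "empty".
    state = words[0] if words else ""
    if state in {"ready", "uninitialized"}:
        # An uninitialized index is built lazily by the first search.
        return "semantic_search"
    if state in TRANSITIONAL_STATES:
        return "recheck_status"
    # Known fallback states and anything unrecognized use lexical search.
    return "grep"
```

Unknown postures deliberately fall through to `grep`, since lexical search does not depend on the sidecar or the index.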

GPU acceleration

The embedding service automatically detects available hardware at startup:

Platform	Detection	Provider
macOS (Apple Silicon)	Architecture plus `machdep.cpu.brand_string`	CoreML / Metal
Linux (NVIDIA)	`nvidia-smi`	CUDA
Other / no GPU	Automatic fallback	CPU

No configuration is needed. The service picks the fastest available provider and logs the selected device on first use.
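
The selection order in the table can be approximated as follows. This is a simplified sketch, not the service's actual detection code: it checks only the CPU architecture on macOS (the table also mentions `machdep.cpu.brand_string`), and it treats the presence of `nvidia-smi` on the PATH as a stand-in for a working NVIDIA GPU. The function name `pick_provider` and its parameters are hypothetical.

```python
import platform
import shutil


def pick_provider(system=None, machine=None, has_nvidia_smi=None):
    """Return the provider label from the table for the given host facts.

    Arguments left as None are probed from the current machine.
    """
    if system is None:
        system = platform.system()
    if machine is None:
        machine = platform.machine()
    if has_nvidia_smi is None:
        # Assumption: finding the binary is enough to try CUDA.
        has_nvidia_smi = shutil.which("nvidia-smi") is not None

    if system == "Darwin" and machine == "arm64":
        return "CoreML / Metal"
    if system == "Linux" and has_nvidia_smi:
        return "CUDA"
    # Any other platform, or no usable GPU, falls back to the CPU.
    return "CPU"
```

Passing the host facts explicitly, as in `pick_provider("Linux", "x86_64", True)`, makes the fallback order easy to check without the matching hardware.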

How indexing works

Indexing is lazy: the first `semantic_search` call on a directory triggers a full index build. Subsequent searches reuse the in-memory index.
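
A minimal sketch of this build-on-first-use pattern, assuming one in-memory index per directory. The class `LazyIndexCache` and the `build_fn` callback are hypothetical names used for illustration; the real service's internals are not described here.

```python
class LazyIndexCache:
    """Build an index the first time a directory is searched, then reuse it."""

    def __init__(self, build_fn):
        # build_fn(directory) performs the full (expensive) index build.
        self._build_fn = build_fn
        self._indexes = {}
        self.build_count = 0

    def get(self, directory):
        """Return the index for a directory, building it only if missing."""
        if directory not in self._indexes:
            self._indexes[directory] = self._build_fn(directory)
            self.build_count += 1
        return self._indexes[directory]
```

Because the cache lives in memory, a restarted process would start from an empty cache and pay the build cost again on its first search.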

Gitignore-aware: uses the project’s `.gitignore` rules to skip ignored files and directories.
PathScope-confined: when a PathScope is active, only files within the allowed scope are indexed.
Text-only: binary files and files larger than 1 MB are skipped automatically.
Chunked: source files are split into overlapping ~512-token windows so that large files produce multiple searchable chunks.
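
The text-only and chunking rules above can be sketched as two small helpers. Several details are assumptions made for illustration: the overlap size (64 tokens), the NUL-byte heuristic for detecting binary content, and reading 1 MB as 1024 * 1024 bytes are not stated in this document, and the function names are hypothetical. Tokens are treated as an already-tokenized list; the real tokenizer depends on the embedding model.

```python
MAX_FILE_BYTES = 1024 * 1024  # Assumption: 1 MB read as 1 MiB.


def is_indexable(data, max_bytes=MAX_FILE_BYTES):
    """Return True for content that looks like text and is within the size cap."""
    if len(data) > max_bytes:
        return False
    # Heuristic: a NUL byte near the start suggests binary content.
    return b"\x00" not in data[:8192]


def chunk_tokens(tokens, window=512, overlap=64):
    """Split a token list into overlapping fixed-size windows."""
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # This window already reached the end of the file.
        start += step
    return chunks
```

With these defaults, consecutive chunks share their boundary tokens, so a function that straddles a window edge still appears whole in at least one chunk if it is shorter than the overlap.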

Indexing typically takes 10-30 seconds depending on codebase size and GPU availability.

Status and troubleshooting

`bmo embedding status` and the `embedding_status` tool report a shared posture: disabled, uninitialized, initializing, ready, unavailable, indexing, degraded, stale index, or empty index. The TUI `/embedding` inspector and `/v1/runtime/features` expose the same state classes.

Common failure classes:

Model not cached: run `bmo embedding download`, or use lexical tools until the cache is available.
Missing or incompatible sidecar: install or build `bmo-embedding` with ONNX support; the main `bmo` binary stays CGO-free.
Empty index: no searchable text chunks were indexed for that directory, or scope rules excluded them.
Stale or degraded index: fix the changed, deleted, capped, or corrupt index inputs, then run semantic search again to rebuild the index.
Path-scope rejection: narrow the directory to the allowed workspace path.
Sidecar restart exhaustion: inspect the logs and fall back to `grep` and `view` until the sidecar can start cleanly.

Usage examples

Find code related to a concept:

semantic_search(query: "retry logic with exponential backoff")

Scope the search to a subdirectory:

semantic_search(query: "database connection pooling", directory: "internal/db")

Find code similar to a specific location (clone detection):

find_similar_code(file: "internal/auth/handler.go", line: 42, threshold: 0.8)

Persistent Memory uses a separate embedding system for memory search. Semantic Code Search operates on source files in the working tree, not on stored memories.