Semantic Code Search
Semantic Code Search lets the agent find code by meaning rather than by literal text patterns. Instead of writing regex for grep, the agent describes what it is looking for in natural language and receives ranked results based on vector similarity.
| Tool | Description |
|---|---|
| `embedding_status()` | Read-only posture check for semantic search readiness, model cache, sidecar state, and index health. |
| `semantic_search(query, [directory], [top_k])` | Natural language code search. For example, “Find authentication handling code” returns the most semantically relevant chunks across the codebase. |
| `find_similar_code(file, line, [top_k], [threshold])` | Clone and near-duplicate detection. Given a reference location, returns code chunks ranked by cosine similarity. Scores above 0.9 indicate near-duplicates; scores above 0.8 indicate structurally similar code. |
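
To make the score bands concrete, here is a minimal sketch of cosine-similarity ranking with a `top_k` and `threshold` cut. It is an illustration only, not the tool's implementation: the function names, the default values, and the tiny two-dimensional vectors are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def rank_similar(reference, chunks, top_k=5, threshold=0.8):
    """Return up to top_k (name, score) pairs at or above threshold, best first.

    The default top_k and threshold are hypothetical, chosen for the example.
    """
    scored = [(name, cosine_similarity(reference, vec)) for name, vec in chunks.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


def label(score):
    """Map a score to the bands described in the table above."""
    if score > 0.9:
        return "near-duplicate"
    if score > 0.8:
        return "structurally similar"
    return "below both bands"


if __name__ == "__main__":
    reference = [1.0, 0.0]
    chunks = {"a": [1.0, 0.0], "b": [1.0, 0.5], "c": [0.0, 1.0]}
    for name, score in rank_similar(reference, chunks):
        print(name, round(score, 3), label(score))
```

Real embeddings have hundreds of dimensions, but the ranking and thresholding logic is the same shape.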
Configuration
```toml
[options.embedding]
enabled = true
max_chunks = 100000
```

| Option | Default | Description |
|---|---|---|
| `enabled` | `true` | Enable the embedding service and semantic search tools |
| `max_chunks` | `100000` | Safety limit on total indexed chunks per directory |
First-use checklist
- Enable `[options.embedding]` and set `max_chunks` high enough for the directory.
- Run `bmo embedding download` to pre-cache the default model when the machine has network access.
- Run `bmo embedding status --json` to inspect sidecar, model-cache, and index posture without starting the sidecar.
- Ask the agent to call `embedding_status` before `semantic_search` if readiness is uncertain.
- Run the first `semantic_search`; the index is built lazily for the requested directory.
- If posture is disabled, unavailable, stale, degraded, or empty, use `grep` and `view` while fixing the model cache, sidecar, scope, or index.
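
The last checklist item amounts to a small decision rule. The sketch below expresses it as a pure function over the posture names used on this page. The function name, the returned tool lists, and the choice to treat an unrecognized posture as a fallback case are assumptions, not documented behavior.

```python
# Postures for which the checklist recommends lexical tools.
FALLBACK_POSTURES = {"disabled", "unavailable", "stale index", "degraded", "empty index"}

# Remaining postures named on this page; semantic search may be attempted.
SEARCHABLE_POSTURES = {"uninitialized", "initializing", "ready", "indexing"}


def choose_tools(posture):
    """Return the tool names to prefer for a given posture string."""
    normalized = posture.strip().lower()
    if normalized in SEARCHABLE_POSTURES:
        return ["semantic_search"]
    # Assumption: anything listed as a fallback posture, or not recognized
    # at all, is handled with lexical tools until the posture is fixed.
    return ["grep", "view"]


if __name__ == "__main__":
    for state in ("ready", "stale index", "unavailable"):
        print(state, "->", choose_tools(state))
```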
GPU acceleration
The embedding service automatically detects available hardware at startup:
| Platform | Detection | Provider |
|---|---|---|
| macOS (Apple Silicon) | Architecture plus machdep.cpu.brand_string | CoreML / Metal |
| Linux (NVIDIA) | nvidia-smi | CUDA |
| Other / no GPU | Automatic fallback | CPU |
No configuration is needed. The service picks the fastest available provider and logs the selected device on first use.
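
The detection table can be read as a three-way selection. The following sketch mirrors it in Python, assuming that Apple Silicon is recognized by an `arm64` architecture plus a brand string beginning with "Apple", and that NVIDIA support is inferred from `nvidia-smi` being on the `PATH`. These checks and all function names are illustrative guesses; the service's actual probing logic is not shown on this page.

```python
import platform
import shutil
import subprocess


def select_provider(system, machine, cpu_brand="", has_nvidia_smi=False):
    """Pick a provider name from already-collected facts about the host."""
    if system == "Darwin" and machine == "arm64" and cpu_brand.startswith("Apple"):
        return "CoreML / Metal"
    if system == "Linux" and has_nvidia_smi:
        return "CUDA"
    return "CPU"  # automatic fallback


def detect_provider():
    """Collect host facts and delegate to select_provider."""
    system = platform.system()
    machine = platform.machine()
    cpu_brand = ""
    if system == "Darwin":
        # sysctl exposes the CPU brand string on macOS.
        result = subprocess.run(
            ["sysctl", "-n", "machdep.cpu.brand_string"],
            capture_output=True, text=True, check=False,
        )
        cpu_brand = result.stdout.strip()
    has_nvidia_smi = shutil.which("nvidia-smi") is not None
    return select_provider(system, machine, cpu_brand, has_nvidia_smi)


if __name__ == "__main__":
    print(detect_provider())
```

Keeping the selection separate from the probing makes the rule easy to test without the hardware present.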
How indexing works
Indexing is lazy: the first `semantic_search` call on a directory triggers a full index build. Subsequent searches reuse the in-memory index.
- Gitignore-aware: uses the project’s `.gitignore` rules to skip ignored files and directories.
- PathScope-confined: when a PathScope is active, only files within the allowed scope are indexed.
- Text-only: binary files and files larger than 1 MB are skipped automatically.
- Chunked: source files are split into overlapping ~512-token windows so that large files produce multiple searchable chunks.
Indexing typically takes 10-30 seconds depending on codebase size and GPU availability.
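
The chunking step described above can be sketched as a sliding window over a token list. The 512-token window comes from this page; the 64-token overlap, the function name, and the use of a plain list in place of a real tokenizer are assumptions for illustration.

```python
def chunk_tokens(tokens, window=512, overlap=64):
    """Split tokens into windows of at most `window` items.

    Consecutive windows share `overlap` tokens. The overlap size is a
    hypothetical value; the page only says the windows overlap.
    """
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    step = window - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


if __name__ == "__main__":
    tokens = list(range(1000))  # stand-in for tokenizer output
    for chunk in chunk_tokens(tokens):
        print(chunk[0], chunk[-1], len(chunk))
```

The overlap means a function that straddles a window boundary still appears whole in at least one chunk, provided it is shorter than the overlap.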
Status and troubleshooting
`bmo embedding status` and the `embedding_status` tool report a shared posture: `disabled`, `uninitialized`, `initializing`, `ready`, `unavailable`, `indexing`, `degraded`, `stale index`, or `empty index`. The TUI `/embedding` inspector and `/v1/runtime/features` expose the same state classes.
Common failure classes:
- Model not cached: run `bmo embedding download` or use lexical tools until the cache is available.
- Missing or incompatible sidecar: install/build `bmo-embedding` with ONNX support; the main `bmo` binary stays CGO-free.
- Empty index: no searchable text chunks were indexed for that directory, or scope rules excluded them.
- Stale or degraded index: rebuild by running semantic search again after fixing the changed, deleted, capped, or corrupt index inputs.
- Path-scope rejection: narrow the directory to the allowed workspace path.
- Sidecar restart exhaustion: inspect logs and fall back to `grep`/`view` until the sidecar can start cleanly.
Usage examples
Find code related to a concept:

```
semantic_search(query: "retry logic with exponential backoff")
```

Scope the search to a subdirectory:

```
semantic_search(query: "database connection pooling", directory: "internal/db")
```

Find code similar to a specific location (clone detection):

```
find_similar_code(file: "internal/auth/handler.go", line: 42, threshold: 0.8)
```

Related
Persistent Memory uses a separate embedding system for memory search. Semantic Code Search operates on source files in the working tree, not on stored memories.