Workspace overview
ohara is a Cargo workspace of seven library/binary crates plus an
out-of-band perf harness. Crate boundaries follow one rule: keep the
core git-free. The only crates that depend on git2 or
tree-sitter are the adapters (ohara-git, ohara-parse); everything
else talks to them through async_trait ports defined in
ohara-core.
Dependency direction
                ┌────────────┐
                │ ohara-cli  │  ohara binary (commands/*)
                └─────┬──────┘
                      │ uses
       ┌──────────────┼───────────────┐
       ▼              ▼               ▼
  ┌──────────┐   ┌──────────┐   ┌───────────┐
  │ohara-mcp │   │ohara-git │   │ohara-parse│
  └────┬─────┘   └────┬─────┘   └─────┬─────┘
       │              │   adapters    │
       │              ▼               ▼
       │        ┌───────────────────────────┐
       ├────►   │        ohara-core         │  traits + orchestration
       │        └─────────┬─────────────────┘
       │                  │ uses
       │                  ▼
       │        ┌──────────────────┐  ┌────────────┐
       └────►   │  ohara-storage   │  │ ohara-embed│
                │ (sqlite + vec +  │  │ (fastembed)│
                │   fts5)          │  │            │
                └──────────────────┘  └────────────┘
ohara-mcp is the second binary (ohara-mcp); it composes the same
core + storage + embed + git + parse stack the CLI uses.
Crates
ohara-core
The orchestration layer. Defines Indexer, Retriever, the
explain_change orchestrator, and the async_trait ports they talk
through (Storage, EmbeddingProvider, RerankProvider,
CommitSource, SymbolSource, BlameSource, CommitsBehind,
ProgressSink). Knows nothing about git or tree-sitter — those are
hidden behind the traits, which keeps the core unit-testable with
in-memory fakes.
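To make the port/fake idea concrete, here is a minimal sketch of how a core function can depend only on traits and be exercised with in-memory fakes. It is an illustration, not ohara's actual code: the method signatures, the `Commit` struct, `index_all`, `FakeCommits`, and `FakeStorage` are all hypothetical, and the traits are synchronous here to keep the example std-only, whereas the real ports are async_trait traits.

```rust
use std::collections::HashMap;

/// Hypothetical commit record; the real type in ohara-core may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Commit {
    pub sha: String,
    pub message: String,
}

/// Port owned by the core. Assumed signature; the real `CommitSource`
/// is an async_trait trait.
pub trait CommitSource {
    fn commits(&self) -> Vec<Commit>;
}

/// Second port, standing in for the real `Storage` trait.
pub trait Storage {
    fn put_commit(&mut self, commit: Commit);
    fn commit_count(&self) -> usize;
}

/// Core logic: sees only the traits, never git2 or SQLite.
/// Returns how many commits were handed to storage.
pub fn index_all(source: &dyn CommitSource, storage: &mut dyn Storage) -> usize {
    let mut written = 0;
    for commit in source.commits() {
        storage.put_commit(commit);
        written += 1;
    }
    written
}

/// In-memory fake commit source for unit tests.
pub struct FakeCommits(pub Vec<Commit>);

impl CommitSource for FakeCommits {
    fn commits(&self) -> Vec<Commit> {
        self.0.clone()
    }
}

/// In-memory fake storage keyed by commit SHA.
#[derive(Default)]
pub struct FakeStorage {
    by_sha: HashMap<String, Commit>,
}

impl Storage for FakeStorage {
    fn put_commit(&mut self, commit: Commit) {
        self.by_sha.insert(commit.sha.clone(), commit);
    }

    fn commit_count(&self) -> usize {
        self.by_sha.len()
    }
}
```

A test would build a `FakeCommits` with a couple of commits, call `index_all` against a `FakeStorage::default()`, and assert on the returned count. No repository or database is needed.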
ohara-storage
SQLite + sqlite-vec +
FTS5 backend. Owns the on-disk index format, the refinery migrations
under migrations/V*.sql, and the implementations of every storage
trait the core declares. See Storage schema.
ohara-embed
Local embedding + cross-encoder reranker via
fastembed-rs. Wraps the
BGE-small embedding model (~80 MB, 384-dim) and bge-reranker-base
(~110 MB) behind the EmbeddingProvider and RerankProvider traits.
First call downloads the model; subsequent calls hit the local cache.
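The "pay once, then reuse" behaviour can be sketched with lazy one-time initialization. This is an in-process analogy only: the real download and on-disk cache are handled by fastembed-rs, and `LazyEmbedder`, `Model`, and the toy `embed` function below are hypothetical names that compute no real embedding.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Stand-in for a loaded model. A real one would hold fastembed state.
pub struct Model {
    pub dim: usize,
}

/// Hypothetical wrapper that loads its model on first use only.
pub struct LazyEmbedder {
    model: OnceLock<Model>,
    loads: AtomicUsize,
}

impl LazyEmbedder {
    pub fn new() -> Self {
        Self {
            model: OnceLock::new(),
            loads: AtomicUsize::new(0),
        }
    }

    /// Runs the (simulated) expensive load at most once.
    fn model(&self) -> &Model {
        self.model.get_or_init(|| {
            self.loads.fetch_add(1, Ordering::SeqCst);
            Model { dim: 384 } // 384 matches the BGE-small dimension
        })
    }

    /// Toy embedding: folds bytes into a fixed-size vector.
    /// It only demonstrates the output shape, not a real model.
    pub fn embed(&self, text: &str) -> Vec<f32> {
        let dim = self.model().dim;
        let mut vector = vec![0.0f32; dim];
        for (i, byte) in text.bytes().enumerate() {
            vector[i % dim] += byte as f32;
        }
        vector
    }

    /// How many times the load closure actually ran.
    pub fn load_count(&self) -> usize {
        self.loads.load(Ordering::SeqCst)
    }
}
```

Calling `embed` twice on the same `LazyEmbedder` leaves `load_count()` at 1, which mirrors the first-call cost described above.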
ohara-git
git2-backed implementations of CommitSource, BlameSource, and
CommitsBehind. The only crate (besides the CLI’s repo-discovery
helper) that opens a real git repo. Walks history, extracts hunks,
runs blame_file.
ohara-parse
tree-sitter extractors for Rust, Python, Java, and Kotlin.
Implements SymbolSource for HEAD-snapshot symbol extraction and
applies the AST sibling-merge chunker (Plan 3 / Track C) to keep
chunks under a 500-token budget. See
Language support.
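As a rough illustration of sibling merging under a token budget, the sketch below greedily folds adjacent siblings into one chunk while the running total stays within the budget. The real chunker's algorithm is not described here, so treat this as one plausible strategy; `Span` and `merge_siblings` are hypothetical names.

```rust
/// A top-level AST sibling with a precomputed token count (assumed shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Span {
    pub start_line: usize,
    pub end_line: usize,
    pub tokens: usize,
}

/// Greedily merge adjacent siblings while the merged chunk stays
/// within `budget` tokens. A sibling that exceeds the budget on its
/// own is emitted as a single oversized chunk; splitting it further
/// is out of scope for this sketch.
pub fn merge_siblings(siblings: &[Span], budget: usize) -> Vec<Span> {
    let mut chunks: Vec<Span> = Vec::new();
    for sibling in siblings {
        // Does this sibling fit into the chunk currently being built?
        let fits = chunks
            .last()
            .map_or(false, |last| last.tokens + sibling.tokens <= budget);
        if fits {
            let last = chunks.last_mut().expect("fits implies a last chunk");
            last.end_line = sibling.end_line;
            last.tokens += sibling.tokens;
        } else {
            chunks.push(sibling.clone());
        }
    }
    chunks
}
```

With a budget of 500 and siblings of 200, 250, 100, 450, and 30 tokens, this yields three chunks of 450, 100, and 480 tokens: small neighbours are combined, and no merged chunk crosses the budget.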
ohara-cli
The ohara binary. One subcommand per file under
src/commands/: init, index,
query, explain,
status, update. Each
command builds the same core/storage/embed/git/parse stack.
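The one-subcommand-per-file layout can be pictured as an enum with one variant per command. The sketch below is std-only and purely illustrative: the actual CLI may use an argument-parsing crate, and the `Command` enum, its `FromStr` impl, and the `.rs` file names returned by `module_path` are assumptions rather than the real source layout.

```rust
use std::str::FromStr;

/// One variant per subcommand listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Command {
    Init,
    Index,
    Query,
    Explain,
    Status,
    Update,
}

impl FromStr for Command {
    type Err = String;

    /// Map the first CLI argument to a subcommand.
    fn from_str(name: &str) -> Result<Self, Self::Err> {
        match name {
            "init" => Ok(Command::Init),
            "index" => Ok(Command::Index),
            "query" => Ok(Command::Query),
            "explain" => Ok(Command::Explain),
            "status" => Ok(Command::Status),
            "update" => Ok(Command::Update),
            other => Err(format!("unknown subcommand: {other}")),
        }
    }
}

impl Command {
    /// Assumed file for each subcommand under src/commands/.
    pub fn module_path(self) -> &'static str {
        match self {
            Command::Init => "src/commands/init.rs",
            Command::Index => "src/commands/index.rs",
            Command::Query => "src/commands/query.rs",
            Command::Explain => "src/commands/explain.rs",
            Command::Status => "src/commands/status.rs",
            Command::Update => "src/commands/update.rs",
        }
    }
}
```

For example, `"index".parse::<Command>()` returns `Ok(Command::Index)`, while an unrecognised name returns an error string.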
ohara-mcp
The ohara-mcp binary. Hosts the OharaService with the
find_pattern and explain_change tools (see
MCP tool reference) and serves them over
stdio via rmcp.
tests/perf
Out-of-band perf harness — a workspace member but not a published
crate. Used to capture the v0.6 baseline numbers (see
docs/perf/v0.6-baseline.md) and to A/B retrieval-quality tweaks
without polluting the main crates’ [dev-dependencies].
Reading order
If you’re new to the codebase, follow the data flow:
- Indexing & abort-resume — what ohara index does end-to-end.
- Storage schema — what lands in SQLite.
- Retrieval pipeline — how a query becomes a ranked list of hits.
- Language support — how symbols get extracted.