Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

ohara index

Walk a repo’s git history, embed every commit’s diff hunks, and extract HEAD-snapshot symbols into the local SQLite index. Idempotent and abort-safe — see Indexing & abort-resume for the full state machine.

Usage

ohara index [PATH] [--incremental] [--force] [--rebuild --yes] \
            [--commit-batch N] [--threads N] [--no-progress] \
            [--profile] [--embed-provider {auto,cpu,coreml,cuda}] \
            [--resources {auto,conservative,aggressive}]
FlagDefaultDescription
PATH (positional).Path to the repo.
--incrementaloffSkip the indexer (and embedder init) when the storage watermark already points at HEAD. Used by the post-commit hook to make no-op re-indexes nearly free.
--forceoffClear existing HEAD symbol rows and re-extract from scratch. Used after upgrades that change the AST chunker. Wins over --incremental if both are set; commit/hunk history is untouched.
--rebuildoffDestructive. Delete the entire index for this repo and rebuild from scratch. Stronger than --force (which only refreshes HEAD-symbol rows). Used when ohara status reports compatibility: needs rebuild (the binary’s embedder dimension or model differs from what the index was built with). Requires --yes to confirm; conflicts with --incremental and --force.
--yesoffConfirm a destructive operation. Currently only valid alongside --rebuild.
--commit-batchfrom --resourcesCommits per storage transaction. Smaller = less peak RAM and more frequent fsyncs; larger = faster but uses more memory. When unset, --resources picks a value from host core count.
--threadsfrom --resourcesCap the embedder’s ONNX runtime to this many threads (0 = let ort decide, typically CPU count). When unset, --resources picks a value from host core count.
--no-progressoffDisable the progress bar even when stderr is a TTY. Structured tracing::info! events still fire every 100 commits.
--profileoffEmit a single-line JSON PhaseTimings blob on stdout after the run finishes (per-phase wall time + hunk-text inflation). Used by the v0.6 throughput baseline.
--embed-providerfrom --resourcesONNX execution provider for the embedder: auto (default — CoreML on Apple silicon, CUDA when CUDA_VISIBLE_DEVICES is set, else CPU), cpu, coreml, or cuda. CoreML / CUDA require a feature-flagged build; see Install → hardware acceleration.
--resourcesautoResource intensity policy. auto picks --commit-batch / --threads / --embed-provider from host core count. conservative halves the picked batch + thread count; aggressive doubles them. Explicit flags always override the picked plan.

Examples

First-time index of the current repo:

ohara index

Hook-style re-index — fast no-op when HEAD is already indexed:

ohara index --incremental

Force a HEAD-symbol rebuild after upgrading to a new ohara that changed the chunker:

ohara index --force

Full rebuild after an embedder/dimension change (status says needs rebuild):

ohara index --rebuild --yes

Cap embedder threads on a shared box, larger batches for speed:

ohara index --threads 4 --commit-batch 1024

Capture per-phase timings for performance work:

ohara index --profile | tail -1 | jq .

Run the indexer with hardware acceleration on Apple silicon (requires a --features coreml build):

ohara index --embed-provider coreml

Trade off resource intensity against the rest of the box — conservative halves batch + threads, aggressive doubles them:

ohara index --resources conservative
ohara index --resources aggressive --commit-batch 1024   # explicit flag still wins

Output

A summary line on stdout:

indexed: 132 new commits, 487 hunks, 1204 HEAD symbols

Plus structured tracing events on stderr (drive verbosity with RUST_LOG, e.g. RUST_LOG=info). With --profile, a JSON line follows the summary:

{"commit_walk_ms":42,"diff_extract_ms":318,"tree_sitter_parse_ms":0,"embed_ms":1820,"storage_write_ms":210,"fts_insert_ms":0,"head_symbols_ms":540,"total_diff_bytes":482312,"total_added_lines":1842}

Resume safety

Killed mid-walk? The watermark advances every 100 commits inside the indexer. Worst case on resume is re-doing ~100 commits — put_hunks clears any previously-written hunks for those SHAs first, so duplicates never accumulate. See Indexing & abort-resume.