bf_tree migration away from diskann-providers#1020
Conversation
b94f4df to
44be934
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1020 +/- ##
=========================================
Coverage 89.51% 89.52%
=========================================
Files 461 476 +15
Lines 85920 90193 +4273
=========================================
+ Hits 76912 80742 +3830
- Misses 9008 9451 +443
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
ca413d4 to
1051152
Compare
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Migrates the async bf_tree provider away from diskann-providers by extracting it into a dedicated diskann-bf_tree-provider crate, while also modernizing quantization (PQ → spherical) and simplifying deletion semantics (hard deletes).
Changes:
- Extracts bf_tree + caching into
diskann-bf_tree-providerand updates workspace/CI accordingly. - Replaces PQ-based quant vector store with spherical quantization (
Poly<dyn Quantizer>) + updated serialization. - Simplifies delete handling by removing the delete-provider generic and switching to hard deletes.
Reviewed changes
Copilot reviewed 18 out of 20 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| diskann-providers/src/model/graph/provider/async_/table_delete_provider.rs | Makes delete-table operations public and removes bf_tree-specific bitmap (de)serialization helpers/tests. |
| diskann-providers/src/model/graph/provider/async_/mod.rs | Adjusts module exports, removes bf_tree/caching modules, re-exports postprocess utilities. |
| diskann-providers/src/model/graph/provider/async_/caching/utils.rs | Updates import path and minor formatting in tests. |
| diskann-providers/src/model/graph/provider/async_/caching/provider.rs | Refactors imports/bounds formatting; adds Future import. |
| diskann-providers/src/model/graph/provider/async_/caching/mod.rs | Removes caching module root from diskann-providers. |
| diskann-providers/src/model/graph/provider/async_/caching/example.rs | Updates test imports to use diskann_providers / local module paths. |
| diskann-providers/src/model/graph/provider/async_/caching/bf_cache.rs | Updates ConfigError import path and test formatting. |
| diskann-providers/src/model/graph/provider/async_/bf_tree/vector_provider.rs | Adds hard-delete helper and adjusts TestCallCount assertions. |
| diskann-providers/src/model/graph/provider/async_/bf_tree/quant_vector_provider.rs | Switches PQ → spherical quantization + new query/distance computer adapter + test updates. |
| diskann-providers/src/model/graph/provider/async_/bf_tree/provider.rs | Removes delete-provider generic, implements hard delete, switches postprocess to CopyIds, updates quantizer serialization. |
| diskann-providers/src/model/graph/provider/async_/bf_tree/neighbor_provider.rs | Updates TestCallCount import path. |
| diskann-providers/src/model/graph/provider/async_/bf_tree/mod.rs | Switches re-exports to new bf_tree_provider module and removes QuantAccessor re-export. |
| diskann-providers/Cargo.toml | Removes bf_tree feature/deps from diskann-providers. |
| diskann-bf_tree-provider/src/lib.rs | Introduces new crate entry point with provider + caching modules. |
| diskann-bf_tree-provider/src/caching/mod.rs | Adds caching module root (example tests disabled with TODO). |
| diskann-bf_tree-provider/Cargo.toml | Adds new crate manifest + dependencies for bf_tree provider and spherical quantization. |
| Cargo.toml | Adds diskann-bf_tree-provider to workspace members. |
| .github/workflows/ci.yml | Removes bf_tree from explicitly tested feature list. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
1219bcc to
cbea923
Compare
|
@copilot would you mind giving this a review |
Reviewed the current branch at |
Add bf_tree insert benchmarks (Criterion + IAI)
Add pre-migration bf_tree provider benchmarks mirroring the existing
inmem insert benchmark. Both Criterion (wall-clock) and IAI-Callgrind
(instruction-count) benchmarks use the same sift-256 dataset and PQ
training as the inmem benchmarks for apples-to-apples comparison.
Gated behind #[cfg(feature = "bf_tree")] so existing benchmarks
continue to work without the feature flag.
Run with: cargo bench --bench bench_main --features virtual_storage,bf_tree
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add bf_tree search benchmarks for migration parity
Add search benchmarks (Criterion + IAI-Callgrind) alongside existing
insert benchmarks. Search exercises the read path through
BfTreeProvider (vector lookups, neighbor access, greedy search
traversal) which is critical for detecting regressions post-migration.
The search benchmark builds the index once during setup, then measures
10 KNN queries (k=10, l=20) against the populated index.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Extract bf_tree provider into diskann-bf_tree-provider crate
Move the bf_tree provider and caching modules from diskann-providers
into the dedicated diskann-bf_tree-provider crate. This makes the
bf-tree dependency explicit at the crate level rather than hidden
behind a feature flag.
Changes:
- Move bf_tree/ and caching/ modules to diskann-bf_tree-provider/src/
- Set up Cargo.toml with proper dependencies (bf-tree, diskann,
diskann-providers, diskann-utils, diskann-vector, etc.)
- Convert all crate:: imports to diskann_providers:: imports
- Bump visibility of postprocess module, AsDeletionCheck/DeletionCheck
traits, and TableDeleteProviderAsync methods to pub
- Remove bf_tree feature flag and optional deps from diskann-providers
- Remove cfg(feature = "bf_tree") gates from table_delete_provider.rs
- Rename provider/provider.rs to provider/bf_tree_provider.rs (clippy)
- Disable caching example.rs tests (cross-crate cfg(test) limitation)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move bf_tree benchmarks to diskann-bf_tree-provider
Migrate Criterion and IAI-Callgrind benchmark targets from
diskann-providers to the new diskann-bf_tree-provider crate.
Update imports to use the new crate's module paths. Remove
bf_tree benchmark remnants (cfg gates, modules) from
diskann-providers benches.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
cargo.lock got missed by last commit
Move delete bitmap serialization to diskann-bf_tree-provider
Move to_bytes/from_bytes logic out of TableDeleteProviderAsync (where
it was bf_tree-scoped) into a dedicated delete_bitmap_serde module in
the new crate. The reimplementation works through the public API and
produces byte-identical output (LE u32 words). Revert the methods in
diskann-providers to pub(crate) + #[cfg(test)].
Adds format-level compatibility tests including known fixtures and
padding bit rejection.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move DeletionCheck traits and RemoveDeletedIdsAndCopy to diskann::graph::glue
These are core to the inplace delete algorithm's search post-processing
and belong in the diskann crate alongside SearchPostProcess, CopyIds,
FilterStartPoints, and Pipeline. This removes the need to widen
diskann-providers' postprocess module to pub for external crates.
Also removes the now-dead to_bytes/from_bytes methods and their tests
from TableDeleteProviderAsync in diskann-providers, since that
serialization logic has been moved to diskann-bf_tree-provider.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use workspace_root() for benchmark test data paths
Replace env!("CARGO_MANIFEST_DIR")-relative paths with
diskann_utils::workspace_root() so benchmarks find test_data/
without requiring symlinks in each crate directory.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The bf_tree feature was removed from diskann-providers when the provider was extracted into its own crate (diskann-bf_tree-provider). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Rewrite QuantVectorProvider to use Arc<Poly<dyn Quantizer>> - Add local HybridComputer and QuantQueryComputer wrapper - Update CreateQuantProvider impl for Poly<dyn Quantizer> - Update serialization to use Quantizer::serialize/try_deserialize - Rewrite all tests and doc examples for spherical quantizer - Update benchmarks to use SphericalQuantizer::train with ReciprocalMeanNorm - Move rand to regular dependencies (needed by doc tests) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Delete HybridAccessor, HybridComputer, and all Hybrid strategy impls (Search, PostProcess, Prune, Insert, MultiInsert, InplaceDelete) - Delete Rerank post-processor (only used by Hybrid) - Remove QuantAccessor (only constructed by Hybrid search path) - Move QuantQueryComputer from hybrid_computer.rs to quant_vector_provider.rs - Delete hybrid_computer.rs - Remove metric field from QuantVectorProvider (dead after Hybrid removal) - Add SaveWith/LoadWith serialization impls back (accidentally lost during D removal) - Fix test compilation: remove deleted_ids references, fix turbofish generics, insert vectors before status checks, skip hard-deleted IDs in verification loops - Update doc examples to remove NoDeletes/TableBasedDeletes from constructors - Clean up unused imports (distances::pq, Hybrid, DistanceFunction, etc.) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ree impl now uses hard deletion)
Restore postprocess.rs in diskann-providers with AsDeletionCheck, DeletionCheck, and RemoveDeletedIdsAndCopy. These were temporarily moved into diskann/src/graph/glue.rs during bf_tree extraction but belong alongside the inmem providers that use them. - Create diskann-providers/src/model/graph/provider/async_/postprocess.rs - Remove traits and struct from diskann/src/graph/glue.rs - Update all inmem provider imports to use local postprocess module - Update DeletionCheck impls on NoDeletes and TableDeleteProviderAsync Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Restore QuantAccessor with Opaque-based element types for spherical quantization compatibility - Add Quantized strategy impls: SearchStrategy, DefaultPostProcessor, PruneStrategy, InsertStrategy, MultiInsertStrategy, InplaceDeleteStrategy - Use PassThrough working set pattern (matching inmem spherical approach) - Add OwnedOpaque newtype for workingset::View (bf_tree copies vs zero-copy) - Add BuildDistanceComputer impl using UnwrapErr<DistanceComputer> - Remove #[cfg(test)] from get_vector_into and get_vector_sync Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Hybrid searches in quantized space (same as Quantized), then reranks final candidates using full-precision distances. Pruning stays in quantized space. This replaces the old HybridComputer with its 4 distance modes (full×full, quant×quant, cross-type) with a simpler search-quant → rerank-full → prune-quant pipeline. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Reverts accidental pub visibility on TableDeleteProviderAsync methods (is_deleted, delete, undelete, clear) and removes unused RemoveDeletedIdsAndCopy re-export. Nothing outside diskann-providers uses these, so widening the public API was unnecessary. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fix FullAccessor::on_elements_unordered to silently skip deleted entries - Fix QuantAccessor::on_elements_unordered to silently skip deleted entries - Add create_quant_index and create_full_precision_index test helpers - Add tests: quantized insert+search, multi_insert, delete+search - Add tests: hybrid insert+search, hybrid delete+search - Add tests: full precision insert+search, full precision delete+search - Add doc comments for OwnedOpaque, strategy descriptions, and BfTreeProvider Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Rerank::post_process used .expect() on get_vector_sync, which could panic if a concurrent delete removed a vector between the quantized search phase and the reranking phase. Replace with .filter_map() to silently skip unreadable entries, consistent with the pattern used in QuantAccessor::on_elements_unordered and FullAccessor. Caught during local code review. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
0ad39b8 to
83e6231
Compare
Migrate bf_tree provider to dedicated crate and modernize
Motivation
The long-term goal is to remove
diskann-providersentirely. The bf_tree provider was one of several components keeping that crate alive — it depended ondiskann-providersfor generic delete infrastructure (DeletionCheck/AsDeletionCheck/RemoveDeletedIdsAndCopy) and PQ quantization types. This PR extracts bf_tree into a standalonediskann-bf_treecrate, cuts those dependencies, simplifies the generics, and switches from PQ to spherical quantization — all to reduce the surface area ofdiskann-providersand move closer to removing it.What Changed
1. Crate Extraction (
bf7d33fd)Moved the bf_tree provider from
diskann-providers/src/model/graph/provider/async_/bf_tree/into a standalonediskann-bf_treecrate. Updated workspaceCargo.tomland CI configuration.2. PQ → Spherical Quantization (
74149bbc)QuantVectorProvider: StoresArc<Poly<dyn Quantizer>>(spherical) instead ofArc<FixedChunkPQTable>.QuantQueryComputer: Newtype wrapper adapting spherical'sOpaque-based API toPreprocessedDistanceFunction<&[u8], f32>.Quantizer::serialize/iface::try_deserialize(flatbuffers format).CreateQuantProvider: Now implemented forPoly<dyn Quantizer>instead ofFixedChunkPQTable.3. Remove
D(Delete Provider) Parameter (14f504e8)BfTreeProvider<T, Q, D>→BfTreeProvider<T, Q>.delete(key)API replace bitmap-based soft deletes.delete()removes from all three trees (neighbors, full_vectors, quant_vectors).status_by_internal_id()checks vector presence (Ok = valid, Err = deleted).delete_bitmap_serde.rs,NoDeletes/TableBasedDeletesfrom constructors.4. Simplify to Core Strategies, Then Re-add (
02d355fb→c598d71d)First removed the old Hybrid/QuantAccessor/hybrid_computer code to start fresh, then re-added clean implementations:
QuantAccessor(e1ac1011): New accessor that reads quantized vectors from the bf_tree and usesQuantQueryComputerfor distance. Implements allQuantizedstrategy traits.Hybridstrategies withRerank(c598d71d): Re-introducedHybridas a struct withmax_fp_vecs_per_prune: Option<usize>.Rerankpost-processor re-ranks quantized search results using full-precision vectors. ImplementsSearchPostProcess,SearchPreprocess,InsertPreprocess, andExpandBeamforHybrid.benches/directory).5. Use
CopyIdsInstead ofRemoveDeletedIdsAndCopy(8a3b6742)Since bf_tree uses hard deletes, deleted IDs never appear in neighbor lists. The
AsDeletionCheck/DeletionCheck/RemoveDeletedIdsAndCopymachinery is unnecessary. Replaced withglue::CopyIdsand removed theAsDeletionCheckimpl fromFullAccessor.6. Move
DeletionCheckTraits Back todiskann-providers(38f1e614)These traits were temporarily moved into
diskann/src/graph/glue.rsduring the extraction as a stop-gap. Now that bf_tree no longer depends on them, they are restored to their original location indiskann-providers/src/model/graph/provider/async_/postprocess.rswithpub(crate)visibility.7. Scope
diskann-providersAPI Back topub(crate)(ee595e2c)Reverted methods on
TableDeleteProviderAsync(is_deleted,delete,undelete,clear) frompubback topub(crate). Removed unusedRemoveDeletedIdsAndCopyre-export.8. Tests and Delete-Panic Fix (
cbea923f)FullAccessor::on_elements_unorderedandQuantAccessor::on_elements_unorderednow silently skip deleted entries instead of propagating hard errors. bf_tree does hard deletes but graph edges still reference deleted nodes — when search traverses those edges,get_vector_intoreturns anANNError(always-escalate), which panicked inexpand_beam. The fix skips entries that fail to read.test_quantized_index_search,test_quantized_index_multi_insert_search,test_quantized_delete_and_search,test_hybrid_index_search,test_hybrid_delete_and_search,test_full_precision_index_search,test_full_precision_delete_and_search.OwnedOpaqueand indexing strategy descriptions onBfTreeProvider.Poly<dyn Quantizer>instead ofFixedChunkPQTable.Dparameter removed, noNoDeletes/TableBasedDeletesneeded.Net Impact
diskann-bf_treeis now a self-contained crate with two generic parameters (T,Q) instead of threeFullPrecision,Quantized,Hybrid(withRerankpost-processor)DeletionCheck/AsDeletionCheck/RemoveDeletedIdsAndCopydiskann-providersAPI surface reduced (methods scoped back topub(crate))