PQ: tighten dim contract; right-size scratch buffer #1044
Follow-up to @hildebrandmw's review on #960.

- `PQScratch::rotated_query` is sized by `PQData::get_dim()` (PQ logical dim) instead of `graph_header.metadata().dims` (slot byte count, which exceeds the logical dim for `MinMaxElement` due to trailing min/max metadata).
- PQ entries take `&[f32]`, accept `len >= dim`, and slice `[..dim]`. Callers decode via `VectorRepr::as_f32` once at the boundary; the PQ subtree is f32-only internally.
- Kernels (`preprocess_query`, `populate_chunk_distances_impl`, `direct_distance_impl`) `debug_assert_eq!` on entry, matching `pq_dist_lookup_single`. The two `_impl` helpers become private.
- `DirectCosine::populate` uses `copy_from_slice` (the previous zip silently truncated, no longer applicable).
- Drop redundant `Copy` and `U: Into<f32>` bounds on touched fns.
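The boundary/kernel split in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the crate's code: `DimError`, `slice_to_dim`, and `squared_norm` are hypothetical names, and the real entry points report errors through `ANNResult`.

```rust
/// Hypothetical error type for this sketch (the real crate uses `ANNResult`).
#[derive(Debug, PartialEq)]
pub struct DimError {
    pub expected: usize,
    pub actual: usize,
}

/// Boundary check: accept `len >= dim` and hand only the logical prefix
/// `[..dim]` to the internals. Undersized input is reported as an error.
pub fn slice_to_dim(query: &[f32], dim: usize) -> Result<&[f32], DimError> {
    query.get(..dim).ok_or(DimError {
        expected: dim,
        actual: query.len(),
    })
}

/// Stand-in for an inner kernel: it trusts its caller and only checks the
/// dimension in debug builds.
pub fn squared_norm(query: &[f32], dim: usize) -> f32 {
    debug_assert_eq!(query.len(), dim);
    query.iter().map(|x| x * x).sum()
}
```

A caller holding a padded buffer would run `slice_to_dim` once and pass the resulting slice to every kernel, so release builds pay for the check only at the boundary.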
Codecov Report❌ Patch coverage is
Additional details and impacted files

Coverage Diff

| | main | #1044 | +/- |
|---|---|---|---|
| Coverage | 90.60% | 89.48% | -1.13% |
| Files | 461 | 461 | |
| Lines | 85559 | 85742 | +183 |
| Hits | 77525 | 76723 | -802 |
| Misses | 8034 | 9019 | +985 |
PR moved entries to &[f32] so test_X_inner helpers no longer need to generate per-T data — drop the type parameter, generate Vec<f32> directly. Removes turbofish at all call sites and the rstest values parameterization.
Pull request overview
This PR tightens and clarifies the PQ query-dimension contract by moving conversion/validation to boundary APIs, making PQ internals f32-only, and sizing scratch buffers to the PQ table’s logical dimension (rather than alignment-padded lengths). It also hides previously public PQ internals that were effectively low-level/FFI-oriented.
Changes:
- Switch PQ “entry points” (e.g., `QueryComputer`/`MultiQueryComputer`) to accept `&[f32]`, validate `len >= dim`, and internally slice to `[..dim]`.
- Add debug-only dimension assertions inside inner kernels and right-size PQ scratch/query buffers to the PQ table’s logical dimension.
- Remove public exposure of some PQ internals (`direct_distance_impl`, `populate_chunk_distances_impl`, `inner_product_raw`) and adjust callers/tests accordingly.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| diskann-providers/src/model/pq/mod.rs | Stops re-exporting direct_distance_impl from the PQ module surface. |
| diskann-providers/src/model/pq/fixed_chunk_pq_table.rs | Makes several PQ internals private and adds debug-only dim assertions in kernels. |
| diskann-providers/src/model/pq/distance/test_utils.rs | Updates PQ distance test helpers to generate/use f32 queries directly. |
| diskann-providers/src/model/pq/distance/multi.rs | Updates MultiQueryComputer::new to take &[f32] and adjusts tests. |
| diskann-providers/src/model/pq/distance/l2.rs | Changes TableL2 constructor/populate path to accept &[f32]; updates tests. |
| diskann-providers/src/model/pq/distance/innerproduct.rs | Changes TableIP constructor/populate path to accept &[f32]; updates tests. |
| diskann-providers/src/model/pq/distance/dynamic.rs | Enforces boundary dim checks/slicing in QueryComputer::new; slices FP input in DistanceComputer. Adds tests for undersized query handling. |
| diskann-providers/src/model/pq/distance/cosine.rs | Changes cosine query handling to copy from &[f32]; updates tests. |
| diskann-providers/src/model/mod.rs | Removes re-export of direct_distance_impl from the top-level model API. |
| diskann-providers/src/model/graph/provider/async_/memory_quant_vector_provider.rs | Moves query decoding to the provider boundary via VectorRepr::as_f32. |
| diskann-providers/src/model/graph/provider/async_/fast_memory_quant_vector_provider.rs | Aligns query constraint to T: VectorRepr (dropping Copy). |
| diskann-providers/src/model/graph/provider/async_/experimental/multi_pq_async.rs | Decodes queries to f32 at the boundary and passes &[f32] into PQ code. |
| diskann-providers/src/model/graph/provider/async_/bf_tree/quant_vector_provider.rs | Aligns query constraint to T: VectorRepr (dropping Copy). |
| diskann-disk/src/storage/quant/pq/pq_dataset.rs | Adds PQData::get_dim() to expose PQ logical dimension for sizing buffers. |
| diskann-disk/src/search/provider/disk_provider.rs | Uses PQ logical dim and decodes query to f32 before PQScratch::set. |
| diskann-disk/src/search/pq/quantizer_preprocess.rs | Removes manual query slicing and relies on right-sized rotated_query. |
| diskann-disk/src/search/pq/pq_scratch.rs | Updates PQScratch::set to accept &[f32] and copy exactly dim elements into a right-sized buffer. |
Comments suppressed due to low confidence (1)
diskann-providers/src/model/pq/distance/multi.rs:377
`MultiQueryComputer::new` now requires `&[f32]` rather than `&[U: Into<f32> + Copy]`, which is a breaking change for downstreams passing non-f32 query types directly. If this is intended, consider a transition path (deprecated overload or helper) consistent with the new boundary decode pattern.
/// Construct a new `MultiQueryComputer` with the requested metric and query.
pub fn new(table: MultiTable<T, I>, metric: Metric, query: &[f32]) -> ANNResult<Self> {
let s = match table {
MultiTable::One { table, version } => Self::One {
computer: { QueryComputer::new(table, metric, query, None)? },
version,
Per #1044 review: callers passing oversized queries to inmem PQ entries was a bug worth surfacing. QueryComputer::new (and via delegation MultiQueryComputer::new) return Result::Err on mismatch; DistanceComputer::evaluate_similarity asserts equality since the trait method has no Result return. PQScratch::set kept on >= dim tolerance for now — disk-side surface.
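The two failure modes named in this commit (a fallible constructor versus a trait method that has no `Result` to return) can be sketched as below. `SketchComputer` and `Similarity` are hypothetical stand-ins chosen for illustration; they are not the signatures of `QueryComputer` or `DistanceComputer`.

```rust
/// Hypothetical stand-in for an in-memory PQ query computer.
pub struct SketchComputer {
    query: Vec<f32>,
}

impl SketchComputer {
    /// Fallible constructor: any length mismatch (over or under) is an error
    /// the caller can handle.
    pub fn new(dim: usize, query: &[f32]) -> Result<Self, String> {
        if query.len() != dim {
            return Err(format!(
                "query has {} elements, expected {}",
                query.len(),
                dim
            ));
        }
        Ok(Self { query: query.to_vec() })
    }
}

/// Hypothetical trait whose method signature has no `Result`.
pub trait Similarity {
    fn evaluate(&self, other: &[f32]) -> f32;
}

impl Similarity for SketchComputer {
    fn evaluate(&self, other: &[f32]) -> f32 {
        // No way to return an error here, so a mismatch panics instead.
        assert_eq!(other.len(), self.query.len(), "dimension mismatch");
        self.query
            .iter()
            .zip(other)
            .map(|(a, b)| (a - b) * (a - b))
            .sum()
    }
}
```

In this sketch an oversized query is rejected at construction rather than silently truncated, which is the behaviour the commit message describes for the in-memory entries.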
…late-contract # Conflicts: # diskann-providers/src/model/mod.rs
…ratch OPQ is gone, so 'rotated query' is a misnomer — preprocess_query only subtracts the corpus centroid. Rename PQScratch.rotated_query to query_scratch, drop the stale 'Applies the OPQ transformation matrix' comment in TableL2::populate, update preprocess_query docstring, and sweep parameter names (rotated_query_vec → query).
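With the rotation gone, the preprocessing step reduces to centering the query. A minimal sketch of that idea, assuming the centroid and the scratch buffer both have the PQ logical dimension; `center_query` is a hypothetical name, not the signature of `preprocess_query`:

```rust
/// Copy `query` into `scratch` and subtract the corpus centroid in place.
/// All three slices are assumed to have the same (logical) dimension.
pub fn center_query(query: &[f32], centroid: &[f32], scratch: &mut [f32]) {
    debug_assert_eq!(query.len(), centroid.len());
    debug_assert_eq!(scratch.len(), centroid.len());
    // `copy_from_slice` panics on a length mismatch instead of truncating.
    scratch.copy_from_slice(query);
    for (s, c) in scratch.iter_mut().zip(centroid) {
        *s -= *c;
    }
}
```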
# DiskANN v0.52.0 Release Notes

## Breaking Changes

An AI generated, human reviewed list of changes is summarized below.

### `get_degree_stats` signature changed ([#998](#998))

`DiskANNIndex::get_degree_stats` now takes an explicit iterator of IDs instead of requiring the data provider to implement `IntoIterator`.

```rust
// Before: provider had to impl IntoIterator
index.get_degree_stats(&mut accessor)?;

// After: caller supplies the ID iterator
index.get_degree_stats(&mut accessor, id_iter)?;
```

### PQ dimension contract tightened; entries now `&[f32]` only ([#1044](#1044))

With `AlignedBoxWithSlice` removed from the PQ path, the dimension handling has been refactored into a three-layer contract:

| Layer | Where | Contract |
|---|---|---|
| **Boundary (inmem)** | `QueryComputer::new`, `MultiQueryComputer::new`, `DistanceComputer::evaluate_similarity` | `len == dim` (returns `Err` on mismatch) |
| **Boundary (disk)** | `PQScratch::set` | `len >= dim`, slices to `[..dim]` |
| **Internal** | `TableL2/IP/Cosine::{new, populate}` | Trusted, no re-validation |

**Other changes:**

- PQ table populate/distance methods now accept `&[f32]` instead of `<U: Into<f32>>`. Callers must pre-decode quantized vectors via `VectorRepr::as_f32`.
- Generic trampoline impls (`&Vec<u8>`, `&&[u8]`) on `QueryComputer` / `DistanceComputer` have been removed.

### `calculate_chunk_offsets` relocated to `ChunkOffsets` constructors ([#976](#976))

The free functions `calculate_chunk_offsets` and `calculate_chunk_offsets_auto` have been moved into constructors on `ChunkOffsets` / `ChunkOffsetsView` in `diskann-quantization::views`.

```rust
// Before
let offsets = calculate_chunk_offsets(dim, num_chunks);

// After (allocating)
let offsets = ChunkOffsets::partition(dim, num_chunks)?;

// After (zero-alloc, borrows caller-owned scratch)
let view = ChunkOffsetsView::partition_into(dim, &mut scratch)?;
```

Additionally, `get_chunk_from_training_data` has been moved from public API.

### `CachingProvider` removed ([#1052](#1052))

The entire `diskann_providers::model::graph::provider::async_::caching` module has been deleted.

**Why:** The `CachingProvider` was an experiment in transparent caching over `DataProvider`. In practice it required double monomorphization of the indexing code, didn't save integration work for bulk methods like `on_elements_unordered`/`distances_unordered`, and was complex to maintain. An internal user who …migrated off it removed ~1,000 lines of code, improved compile times by ~20%, and substantially reduced complexity.

**Upgrade:** Manage caching directly in your `DataProvider` implementation.

## New Features

### AVX-512 4-bit distance kernels ([#1045](#1045))

Native V4 (AVX-512) specializations for 4-bit packed vector distance computations:

- **`SquaredL2`**: 16 × `u32` lanes per iteration via `_mm512_madd_epi16`.
- **`InnerProduct`**: AVX-512 VNNI (`_mm512_dpbusd_epi32`) over `u8x64` / `i8x64` operands.

Previously, V4 hardware fell back to two AVX2 (V3) kernel invocations per 512-bit chunk. The native kernels double per-instruction throughput. No API changes: existing code benefits automatically on AVX-512 capable hardware.

## Merged PRs

* Deprecate 32-bit targets by @suhasjs in #1022
* Add a fast path to `Map::prepare`. by @hildebrandmw in #1023
* Add boundary checks in gen_associated_data_from_range() by @Copilot in #847
* [deps] Don't pull `rayon` as a dependency of `diskann`. by @hildebrandmw in #1024
* Bump openssl from 0.10.78 to 0.10.79 by @dependabot[bot] in #1026
* Cleaning up test work and changing the get_degree_stats signature. by @JordanMaples in #998
* Reduce scalar-quantization benchmark monomorphization by @suri-kumkaran in #1041
* [diskann-vector] Support truly unaligned distances. by @hildebrandmw in #981
* rename spherical.json to graph index with spherical quantization by @harsha-simhadri in #1042
* [PQ Cleanup] Part 2: Consolidate `calculate_chunk_offsets*` by @arkrishn94 in #976
* PQ: tighten dim contract; right-size scratch buffer by @wuw92 in #1044
* Add v4 distance kernels (4-bit SquaredL2 / InnerProduct) by @m3hm3t in #1045
* Remove the Caching Provider by @hildebrandmw in #1052

## New Contributors

* @suhasjs made their first contribution in #1022
* @m3hm3t made their first contribution in #1045

**Full Changelog**: v0.51.0...v0.52.0

Co-authored-by: Mark Hildebrand <mhildebrand@microsoft.com>
## Background

Historically, query buffers came from `AlignedBoxWithSlice`, which silently rounded length up to a multiple of 8 for SIMD alignment. Downstream populate functions therefore had to accept `query.len() >= dim` instead of `query.len() == dim`, as a pre-PR comment in `TableL2::populate` noted.

With #960 removing `AlignedBoxWithSlice` from the PQ path, the subtree can refactor dim handling along the boundary convert / internal trust idiom.

## Three-layer dim contract

| Layer | Where | Contract |
|---|---|---|
| Boundary (inmem) | `QueryComputer::new`, `MultiQueryComputer::new`, `DistanceComputer::evaluate_similarity` | `len == dim`; `Result::Err` / `assert_eq!` on mismatch |
| Boundary (disk) | `PQScratch::set` | `len >= dim`, slice `[..dim]`; `Result::Err` on undersize |
| Internal | `TableL2/IP/Cosine::{new, populate}` | Trusted, no re-validation |
| Kernel | `preprocess_query`, `populate_chunk_distances_impl`, `direct_distance_impl` | `debug_assert_eq!` on entry |

Boundary methods take `&[f32]`. Quantized inputs are decoded via `VectorRepr::as_f32` once at the caller boundary; the PQ subtree is f32-only internally.

## Why entries take `&[f32]`

`Into<f32>` is per-element. `MinMaxElement<8>` is a single byte that can't decode without the full slice's trailing metadata, so it cannot implement `Into<f32>`. Production callers supporting MinMax were therefore always pre-decoding via `VectorRepr::as_f32` and passing `&[f32]` upstream; the previous `<U: Into<f32>>` generic on PQ entries was effectively orphan-only.
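To make the per-element limitation concrete, here is a small sketch of a slice-level decoder. The byte layout is an assumption made up for illustration (`dim` one-byte codes followed by `min` and `max` as little-endian `f32`) and is not the actual `MinMaxElement` encoding; `decode_min_max` is a hypothetical helper standing in for the role `VectorRepr::as_f32` plays at the boundary.

```rust
/// Decode a slot under an ASSUMED layout: `dim` one-byte codes, then `min`
/// and `max` stored as little-endian f32 (8 trailing bytes in total).
/// Returns `None` if the slot is too short for that layout.
pub fn decode_min_max(slot: &[u8], dim: usize) -> Option<Vec<f32>> {
    let codes = slot.get(..dim)?;
    let meta = slot.get(dim..dim + 8)?;
    let min = f32::from_le_bytes([meta[0], meta[1], meta[2], meta[3]]);
    let max = f32::from_le_bytes([meta[4], meta[5], meta[6], meta[7]]);
    // Each code needs `min` and `max`, which live outside the code itself;
    // this is why a per-element `Into<f32>` cannot express the decode.
    Some(
        codes
            .iter()
            .map(|&c| min + (c as f32 / 255.0) * (max - min))
            .collect(),
    )
}
```

Under this assumed layout the slot holds `dim + 8` bytes while the decoded vector holds `dim` floats, which also shows why a scratch buffer sized by slot byte count would be larger than the logical dimension.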