
PQ: tighten dim contract; right-size scratch buffer#1044

Merged
wuw92 merged 7 commits into main from wewu2/tighten-pq-populate-contract on May 12, 2026


wuw92 commented on May 8, 2026

Background

Historically, query buffers came from AlignedBoxWithSlice, which silently rounded the length up to a multiple of 8 for SIMD alignment. Downstream populate functions therefore had to accept query.len() >= dim instead of query.len() == dim. The pre-PR comment in TableL2::populate read:

Alignment means that the size of query gets increased ...
This makes it VERY hard to do error checking on dimension propagation.
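A minimal sketch of why the old contract could only be `>=`, assuming the allocator rounded the requested length up to a multiple of 8 as described above. `padded_len`, `accepts_padded`, and `accepts_exact` are hypothetical helpers for illustration, not the removed `AlignedBoxWithSlice` API:

```rust
/// Hypothetical helper: round `dim` up to the next multiple of `align`,
/// mimicking how an alignment-padded allocation grows past the logical dim.
fn padded_len(dim: usize, align: usize) -> usize {
    assert!(align > 0, "alignment must be non-zero");
    (dim + align - 1) / align * align
}

/// Old-style check: with padding, only a lower bound is enforceable,
/// because the trailing `padded_len - dim` slots are alignment filler.
fn accepts_padded(query: &[f32], dim: usize) -> bool {
    query.len() >= dim
}

/// New-style check: without padding, the length must match exactly.
fn accepts_exact(query: &[f32], dim: usize) -> bool {
    query.len() == dim
}
```

With dim = 100 the padded buffer holds 104 floats, so a caller that wrongly believes the dim is 101 still passes the `>=` check; only the exact check catches that kind of propagation bug.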

With #960 removing AlignedBoxWithSlice from the PQ path, the PQ subtree's dim handling can now be refactored along the "convert at the boundary, trust internally" idiom.

Three-layer dim contract

| Layer | Where | Action | Failure |
|---|---|---|---|
| Boundary (inmem) | QueryComputer::new, MultiQueryComputer::new, DistanceComputer::evaluate_similarity | Validate len == dim | Result::Err / assert_eq! |
| Boundary (disk) | PQScratch::set | Validate len >= dim, slice [..dim] | Result::Err on undersize |
| Internal | TableL2/IP/Cosine::{new, populate} | Trusted, no re-validation | |
| Inner kernel | preprocess_query, populate_chunk_distances_impl, direct_distance_impl | debug_assert_eq! at the contract boundary | Panics in dev/CI, zero-cost in release; also fails loudly for direct OSS callers |
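The layering can be sketched in Rust as follows. This is an illustrative sketch, not the crate's code: `DimMismatch`, `boundary_new`, `internal_populate`, and `kernel` are hypothetical stand-ins for the `ANNResult` error path, `QueryComputer::new`, `TableL2::populate`, and the inner kernels respectively.

```rust
/// Hypothetical error standing in for the crate's `ANNResult` error.
#[derive(Debug, PartialEq)]
pub struct DimMismatch {
    pub expected: usize,
    pub actual: usize,
}

/// Boundary layer (hypothetical): validate once, report a recoverable error.
pub fn boundary_new(dim: usize, query: &[f32]) -> Result<Vec<f32>, DimMismatch> {
    if query.len() != dim {
        return Err(DimMismatch { expected: dim, actual: query.len() });
    }
    Ok(internal_populate(dim, query))
}

/// Internal layer (hypothetical): trusts the boundary, no re-validation.
fn internal_populate(dim: usize, query: &[f32]) -> Vec<f32> {
    let mut out = vec![0.0f32; dim];
    kernel(query, &mut out);
    out
}

/// Inner kernel (hypothetical): `debug_assert_eq!` documents the contract.
/// It panics in debug/CI builds and compiles away in release builds.
fn kernel(query: &[f32], out: &mut [f32]) {
    debug_assert_eq!(query.len(), out.len(), "dim contract violated");
    for (o, q) in out.iter_mut().zip(query) {
        *o = q * q; // placeholder per-element work
    }
}
```

Only the boundary pays for a runtime check in release builds; the internal and kernel layers rely on it having run.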

Boundary methods take &[f32]. Quantized inputs are decoded via VectorRepr::as_f32 once at the caller boundary; the PQ subtree is f32-only internally.

Why entries take &[f32]

Into<f32> is per-element. MinMaxElement<8> is a single byte that can't be decoded without the full slice's trailing metadata, so it cannot implement Into<f32>. Production callers supporting MinMax were therefore always pre-decoding via VectorRepr::as_f32 and passing &[f32] to the PQ entries; the previous <U: Into<f32>> generic on those entries was effectively orphaned, with no production caller relying on it.
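To illustrate why a whole-slice decode is needed, here is a sketch of a hypothetical min/max-quantized vector. The struct, its linear dequantization formula, and the `decode_to_f32` method are assumptions for illustration; the real `MinMaxElement` encoding and `VectorRepr::as_f32` may differ.

```rust
/// Hypothetical min/max-quantized vector: one `u8` code per element plus
/// slice-level `min`/`max` metadata. Not the crate's actual layout.
pub struct MinMaxVec {
    pub codes: Vec<u8>,
    pub min: f32,
    pub max: f32,
}

impl MinMaxVec {
    /// Decode the whole slice at once. A lone `u8` code has no meaning
    /// without `min` and `max`, which is why a per-element `Into<f32>`
    /// impl is impossible for this kind of representation.
    pub fn decode_to_f32(&self) -> Vec<f32> {
        let scale = (self.max - self.min) / 255.0;
        self.codes.iter().map(|&c| self.min + c as f32 * scale).collect()
    }
}
```

The decoded `Vec<f32>` is what an f32-only PQ entry would then receive as `&[f32]`.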

Follow-up to @hildebrandmw's review on #960.

- `PQScratch::rotated_query` is sized by `PQData::get_dim()` (PQ
  logical dim) instead of `graph_header.metadata().dims` (slot byte
  count, exceeds logical dim for `MinMaxElement` due to trailing
  min/max metadata).
- PQ entries take `&[f32]`, accept `len >= dim`, slice `[..dim]`.
  Callers decode via `VectorRepr::as_f32` once at the boundary;
  PQ subtree is f32-only internally.
- Kernels (`preprocess_query`, `populate_chunk_distances_impl`,
  `direct_distance_impl`) `debug_assert_eq!` on entry, matching
  `pq_dist_lookup_single`. The two `_impl` helpers become private.
- `DirectCosine::populate` uses `copy_from_slice` (the previous zip
  silently truncated mismatched lengths, which no longer applies).
- Drop redundant `Copy` and `U: Into<f32>` bounds on touched fns.
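A minimal sketch of the disk-side boundary described above: the buffer is allocated from the logical dim, the input may be longer, and exactly `dim` elements are copied. `QueryScratch` and `Undersized` are hypothetical names; the real `PQScratch::set` signature and error type may differ.

```rust
/// Hypothetical scratch buffer sized by the PQ logical dimension rather
/// than the on-disk slot byte count (which can include trailing metadata).
pub struct QueryScratch {
    buf: Vec<f32>,
}

/// Hypothetical error for a query shorter than the logical dim.
#[derive(Debug, PartialEq)]
pub struct Undersized {
    pub needed: usize,
    pub got: usize,
}

impl QueryScratch {
    pub fn new(logical_dim: usize) -> Self {
        Self { buf: vec![0.0; logical_dim] }
    }

    /// Disk-side boundary: accept `len >= dim`, copy exactly `dim` elements.
    pub fn set(&mut self, query: &[f32]) -> Result<(), Undersized> {
        let dim = self.buf.len();
        if query.len() < dim {
            return Err(Undersized { needed: dim, got: query.len() });
        }
        // `copy_from_slice` panics on a length mismatch instead of silently
        // truncating like a zip; slicing to `[..dim]` makes lengths equal.
        self.buf.copy_from_slice(&query[..dim]);
        Ok(())
    }

    pub fn as_slice(&self) -> &[f32] {
        &self.buf
    }
}
```

Sizing the buffer from the logical dim keeps `dim` and the buffer length identical, so the slice-then-copy step needs no further bookkeeping.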
codecov-commenter commented on May 8, 2026

Codecov Report

❌ Patch coverage is 99.30070% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 89.48%. Comparing base (d516da1) to head (6b8ade0).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| diskann-disk/src/search/pq/quantizer_preprocess.rs | 80.00% | 1 Missing ⚠️ |
Additional details and impacted files


```diff
@@            Coverage Diff             @@
##             main    #1044      +/-   ##
==========================================
- Coverage   90.60%   89.48%   -1.13%     
==========================================
  Files         461      461              
  Lines       85559    85742     +183     
==========================================
- Hits        77525    76723     -802     
- Misses       8034     9019     +985     
```
| Flag | Coverage Δ |
|---|---|
| miri | 89.48% <99.30%> (-1.13%) ⬇️ |
| unittests | 89.32% <99.30%> (-1.25%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| diskann-disk/src/search/pq/pq_scratch.rs | 89.18% <100.00%> (+9.64%) ⬆️ |
| diskann-disk/src/search/provider/disk_provider.rs | 90.88% <100.00%> (-0.01%) ⬇️ |
| diskann-disk/src/storage/quant/pq/pq_dataset.rs | 97.33% <100.00%> (+0.19%) ⬆️ |
| ...aph/provider/async_/experimental/multi_pq_async.rs | 96.40% <100.00%> (ø) |
| ...ovider/async_/fast_memory_quant_vector_provider.rs | 98.46% <100.00%> (ø) |
| ...ph/provider/async_/memory_quant_vector_provider.rs | 98.26% <100.00%> (ø) |
| diskann-providers/src/model/pq/distance/cosine.rs | 98.76% <100.00%> (-1.24%) ⬇️ |
| diskann-providers/src/model/pq/distance/dynamic.rs | 96.81% <100.00%> (+2.70%) ⬆️ |
| ...nn-providers/src/model/pq/distance/innerproduct.rs | 100.00% <100.00%> (ø) |
| diskann-providers/src/model/pq/distance/l2.rs | 100.00% <100.00%> (ø) |

... and 4 more

... and 39 files with indirect coverage changes


wuw92 changed the title from "PQ: size scratch by logical dim, formalize dim contract" to "PQ: tighten dim contract; right-size scratch buffer" on May 8, 2026
PR moved entries to &[f32] so test_X_inner helpers no longer need to
generate per-T data — drop the type parameter, generate Vec<f32>
directly. Removes turbofish at all call sites and the rstest values
parameterization.
wuw92 marked this pull request as ready for review on May 8, 2026, 08:06
wuw92 requested review from a team and Copilot on May 8, 2026, 08:06

Copilot AI left a comment


Pull request overview

This PR tightens and clarifies the PQ query-dimension contract by moving conversion/validation to boundary APIs, making PQ internals f32-only, and sizing scratch buffers to the PQ table’s logical dimension (rather than alignment-padded lengths). It also hides previously public PQ internals that were effectively low-level/FFI-oriented.

Changes:

  • Switch PQ “entry points” (e.g., QueryComputer / MultiQueryComputer) to accept &[f32], validate len >= dim, and internally slice to [..dim].
  • Add debug-only dimension assertions inside inner kernels and right-size PQ scratch/query buffers to the PQ table’s logical dimension.
  • Remove public exposure of some PQ internals (direct_distance_impl, populate_chunk_distances_impl, inner_product_raw) and adjust callers/tests accordingly.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| diskann-providers/src/model/pq/mod.rs | Stops re-exporting direct_distance_impl from the PQ module surface. |
| diskann-providers/src/model/pq/fixed_chunk_pq_table.rs | Makes several PQ internals private and adds debug-only dim assertions in kernels. |
| diskann-providers/src/model/pq/distance/test_utils.rs | Updates PQ distance test helpers to generate/use f32 queries directly. |
| diskann-providers/src/model/pq/distance/multi.rs | Updates MultiQueryComputer::new to take &[f32] and adjusts tests. |
| diskann-providers/src/model/pq/distance/l2.rs | Changes TableL2 constructor/populate path to accept &[f32]; updates tests. |
| diskann-providers/src/model/pq/distance/innerproduct.rs | Changes TableIP constructor/populate path to accept &[f32]; updates tests. |
| diskann-providers/src/model/pq/distance/dynamic.rs | Enforces boundary dim checks/slicing in QueryComputer::new; slices FP input in DistanceComputer. Adds tests for undersized query handling. |
| diskann-providers/src/model/pq/distance/cosine.rs | Changes cosine query handling to copy from &[f32]; updates tests. |
| diskann-providers/src/model/mod.rs | Removes re-export of direct_distance_impl from the top-level model API. |
| diskann-providers/src/model/graph/provider/async_/memory_quant_vector_provider.rs | Moves query decoding to the provider boundary via VectorRepr::as_f32. |
| diskann-providers/src/model/graph/provider/async_/fast_memory_quant_vector_provider.rs | Aligns query constraint to T: VectorRepr (dropping Copy). |
| diskann-providers/src/model/graph/provider/async_/experimental/multi_pq_async.rs | Decodes queries to f32 at the boundary and passes &[f32] into PQ code. |
| diskann-providers/src/model/graph/provider/async_/bf_tree/quant_vector_provider.rs | Aligns query constraint to T: VectorRepr (dropping Copy). |
| diskann-disk/src/storage/quant/pq/pq_dataset.rs | Adds PQData::get_dim() to expose PQ logical dimension for sizing buffers. |
| diskann-disk/src/search/provider/disk_provider.rs | Uses PQ logical dim and decodes query to f32 before PQScratch::set. |
| diskann-disk/src/search/pq/quantizer_preprocess.rs | Removes manual query slicing and relies on right-sized rotated_query. |
| diskann-disk/src/search/pq/pq_scratch.rs | Updates PQScratch::set to accept &[f32] and copy exactly dim elements into a right-sized buffer. |
Comments suppressed due to low confidence (1)

diskann-providers/src/model/pq/distance/multi.rs:377

  • MultiQueryComputer::new now requires &[f32] rather than &[U: Into<f32> + Copy], which is a breaking change for downstreams passing non-f32 query types directly. If this is intended, consider a transition path (deprecated overload or helper) consistent with the new boundary decode pattern.
```rust
/// Construct a new `MultiQueryComputer` with the requested metric and query.
pub fn new(table: MultiTable<T, I>, metric: Metric, query: &[f32]) -> ANNResult<Self> {
    let s = match table {
        MultiTable::One { table, version } => Self::One {
            computer: { QueryComputer::new(table, metric, query, None)? },
            version,
        // ... (review excerpt ends here)
```
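One possible shape for the transition helper the comment suggests is sketched below. `decode_query` is hypothetical and not part of the crate; it only covers element types with a per-element `Into<f32>`, so it cannot handle `MinMaxElement`, which still needs a whole-slice decode via `VectorRepr::as_f32`.

```rust
/// Hypothetical transition helper: decode any per-element-convertible query
/// into an owned `Vec<f32>` so callers can reach the new `&[f32]` entries.
pub fn decode_query<U>(query: &[U]) -> Vec<f32>
where
    U: Copy + Into<f32>,
{
    query.iter().map(|&x| x.into()).collect()
}
```

A caller holding, say, `&[u8]` would then write something like `QueryComputer::new(table, metric, &decode_query(query), None)`, keeping the decode visible at the call site.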


Comment thread diskann-providers/src/model/mod.rs Outdated
Comment thread diskann-providers/src/model/pq/mod.rs
Comment thread diskann-providers/src/model/pq/fixed_chunk_pq_table.rs
Comment thread diskann-providers/src/model/pq/fixed_chunk_pq_table.rs
Comment thread diskann-providers/src/model/pq/fixed_chunk_pq_table.rs
Comment thread diskann-providers/src/model/pq/distance/dynamic.rs
Comment thread diskann-providers/src/model/pq/distance/dynamic.rs
Comment thread diskann-providers/src/model/pq/distance/dynamic.rs
Per #1044 review: a caller passing an oversized query to the inmem PQ
entries is a bug worth surfacing. QueryComputer::new (and, via
delegation, MultiQueryComputer::new) returns Result::Err on mismatch;
DistanceComputer::evaluate_similarity asserts equality because the
trait method has no Result return.

PQScratch::set keeps the >= dim tolerance for now, since it is a
disk-side surface.
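A sketch of why the trait method uses `assert_eq!` rather than an error: its return type leaves no room for a `Result`. `SimilarityEval` and `L2Eval` are hypothetical; the real `DistanceComputer::evaluate_similarity` signature may differ.

```rust
/// Hypothetical trait with a bare `f32` return, so there is no `Result`
/// available to carry a dimension-mismatch error.
pub trait SimilarityEval {
    fn evaluate_similarity(&self, a: &[f32], b: &[f32]) -> f32;
}

/// Hypothetical squared-L2 evaluator bound to a fixed dimension.
pub struct L2Eval {
    pub dim: usize,
}

impl SimilarityEval for L2Eval {
    fn evaluate_similarity(&self, a: &[f32], b: &[f32]) -> f32 {
        // Without a `Result`, a mismatch must fail loudly. `assert_eq!`
        // (unlike `debug_assert_eq!`) keeps the check in release builds.
        assert_eq!(a.len(), self.dim, "query dim mismatch");
        assert_eq!(b.len(), self.dim, "vector dim mismatch");
        a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
    }
}
```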
…late-contract

# Conflicts:
#	diskann-providers/src/model/mod.rs
Comment thread diskann-disk/src/search/pq/pq_scratch.rs Outdated
wuw92 added 2 commits May 12, 2026 15:30
…ratch

OPQ is gone, so 'rotated query' is a misnomer — preprocess_query only
subtracts the corpus centroid. Rename PQScratch.rotated_query to
query_scratch, drop the stale 'Applies the OPQ transformation matrix'
comment in TableL2::populate, update preprocess_query docstring, and
sweep parameter names (rotated_query_vec → query).
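Per the commit message, the remaining preprocessing is a centroid subtraction with no rotation. A minimal sketch, with `center_query` as a hypothetical name and signature rather than the crate's `preprocess_query`:

```rust
/// Hypothetical centroid-only preprocessing: with OPQ gone there is no
/// rotation matrix, so the query is simply shifted by the corpus centroid.
pub fn center_query(query: &mut [f32], centroid: &[f32]) {
    // Kernel-level contract check: free in release, loud in debug/CI.
    debug_assert_eq!(query.len(), centroid.len(), "dim contract violated");
    for (q, c) in query.iter_mut().zip(centroid) {
        *q -= c;
    }
}
```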
wuw92 merged commit 2c5775d into main on May 12, 2026
26 checks passed
wuw92 deleted the wewu2/tighten-pq-populate-contract branch on May 12, 2026, 07:51
hildebrandmw added a commit that referenced this pull request May 12, 2026
# DiskANN v0.52.0 Release Notes

## Breaking Changes

An AI generated, human reviewed list of changes is summarized below.

### `get_degree_stats` signature changed
([#998](#998))

`DiskANNIndex::get_degree_stats` now takes an explicit iterator of IDs
instead of requiring the data provider to implement `IntoIterator`.

```rust
// Before — provider had to impl IntoIterator
index.get_degree_stats(&mut accessor)?;

// After — caller supplies the ID iterator
index.get_degree_stats(&mut accessor, id_iter)?;
```
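A sketch of the design change, assuming a plain `HashMap` adjacency list in place of the real accessor: the caller passes the IDs, so the storage type no longer needs to implement `IntoIterator`. `DegreeStats` and `degree_stats` are hypothetical, and the real return type of `get_degree_stats` may differ.

```rust
use std::collections::HashMap;

/// Hypothetical degree summary over the requested IDs.
#[derive(Debug, PartialEq)]
pub struct DegreeStats {
    pub min: usize,
    pub max: usize,
    pub count: usize,
    pub total: usize,
}

/// Hypothetical: the caller chooses which IDs to inspect, so the adjacency
/// source itself does not need to be iterable. IDs missing from the map
/// count as degree 0. Returns `None` for an empty ID iterator.
pub fn degree_stats<I>(adjacency: &HashMap<u32, Vec<u32>>, ids: I) -> Option<DegreeStats>
where
    I: IntoIterator<Item = u32>,
{
    let (mut min, mut max, mut count, mut total) = (usize::MAX, 0usize, 0usize, 0usize);
    for id in ids {
        let d = adjacency.get(&id).map_or(0, |nbrs| nbrs.len());
        min = min.min(d);
        max = max.max(d);
        count += 1;
        total += d;
    }
    (count > 0).then(|| DegreeStats { min, max, count, total })
}
```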

### PQ dimension contract tightened; entries now `&[f32]` only
([#1044](#1044))

With `AlignedBoxWithSlice` removed from the PQ path, the dimension
handling has been refactored into a three-layer contract:

| Layer | Where | Contract |
|---|---|---|
| **Boundary (inmem)** | `QueryComputer::new`, `MultiQueryComputer::new`, `DistanceComputer::evaluate_similarity` | `len == dim` (constructors return `Err` on mismatch; `evaluate_similarity` panics via `assert_eq!`) |
| **Boundary (disk)** | `PQScratch::set` | `len >= dim`, slices to `[..dim]` (returns `Err` if undersized) |
| **Internal** | `TableL2/IP/Cosine::{new, populate}` | Trusted, no re-validation |

**Other changes:**
- PQ table populate/distance methods now accept `&[f32]` instead of `<U: Into<f32>>`. Callers must pre-decode quantized vectors via `VectorRepr::as_f32`.
- Generic trampoline impls (`&Vec<u8>`, `&&[u8]`) on `QueryComputer` / `DistanceComputer` have been removed.

### `calculate_chunk_offsets` relocated to `ChunkOffsets` constructors
([#976](#976))

The free functions `calculate_chunk_offsets` and
`calculate_chunk_offsets_auto` have been moved into constructors on
`ChunkOffsets` / `ChunkOffsetsView` in `diskann-quantization::views`.

```rust
// Before
let offsets = calculate_chunk_offsets(dim, num_chunks);

// After (allocating)
let offsets = ChunkOffsets::partition(dim, num_chunks)?;

// After (zero-alloc, borrows caller-owned scratch)
let view = ChunkOffsetsView::partition_into(dim, &mut scratch)?;
```
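For intuition, one common way to partition `dim` into `num_chunks` contiguous chunks is an even split that spreads the remainder over the leading chunks. The sketch below is an assumption about the strategy, not the implementation of `ChunkOffsets::partition`; `partition` and `PartitionError` are hypothetical.

```rust
/// Hypothetical errors for an invalid partition request.
#[derive(Debug, PartialEq)]
pub enum PartitionError {
    ZeroChunks,
    TooManyChunks { dim: usize, num_chunks: usize },
}

/// Hypothetical even partition of `dim` into `num_chunks` contiguous chunks.
/// Returns `num_chunks + 1` offsets; chunk `i` covers `offsets[i]..offsets[i + 1]`.
/// The first `dim % num_chunks` chunks each receive one extra dimension.
pub fn partition(dim: usize, num_chunks: usize) -> Result<Vec<usize>, PartitionError> {
    if num_chunks == 0 {
        return Err(PartitionError::ZeroChunks);
    }
    if num_chunks > dim {
        return Err(PartitionError::TooManyChunks { dim, num_chunks });
    }
    let base = dim / num_chunks;
    let extra = dim % num_chunks;
    let mut offsets = Vec::with_capacity(num_chunks + 1);
    offsets.push(0);
    for i in 0..num_chunks {
        let len = base + usize::from(i < extra);
        offsets.push(offsets[i] + len);
    }
    Ok(offsets)
}
```

A zero-allocation variant in the spirit of `partition_into` would write the same offsets into a caller-provided `&mut [usize]` of length `num_chunks + 1` instead of returning a `Vec`.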

Additionally, `get_chunk_from_training_data` has been removed from the public
API.

### `CachingProvider` removed
([#1052](#1052))

The entire `diskann_providers::model::graph::provider::async_::caching`
module has been deleted.

**Why:** The `CachingProvider` was an experiment in transparent caching
over `DataProvider`. In practice it required double monomorphization of
the indexing code, didn't save integration work for bulk methods like
`on_elements_unordered`/`distances_unordered`, and was complex to
maintain. An internal user who …migrated off it removed ~1,000 lines of
code, improved compile times by ~20%, and substantially reduced
complexity.

**Upgrade:** Manage caching directly in your `DataProvider`
implementation.
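A minimal sketch of provider-owned caching, assuming a synchronous, simplified provider; `Backing` and `CachedProvider` are hypothetical and do not implement the real async `DataProvider` trait.

```rust
use std::collections::HashMap;

/// Hypothetical slow backing store (e.g. disk), counting physical reads.
pub struct Backing {
    pub vectors: Vec<Vec<f32>>,
    pub reads: usize,
}

impl Backing {
    fn read(&mut self, id: usize) -> Option<Vec<f32>> {
        self.reads += 1;
        self.vectors.get(id).cloned()
    }
}

/// Hypothetical provider that owns its cache directly instead of being
/// wrapped in a generic caching layer.
pub struct CachedProvider {
    backing: Backing,
    cache: HashMap<usize, Vec<f32>>,
}

impl CachedProvider {
    pub fn new(backing: Backing) -> Self {
        Self { backing, cache: HashMap::new() }
    }

    /// Serve from the cache; on a miss, read once from the backing store.
    pub fn get(&mut self, id: usize) -> Option<&[f32]> {
        if !self.cache.contains_key(&id) {
            let v = self.backing.read(id)?;
            self.cache.insert(id, v);
        }
        self.cache.get(&id).map(|v| v.as_slice())
    }

    pub fn backing_reads(&self) -> usize {
        self.backing.reads
    }
}
```

Keeping the cache inside the concrete provider means the indexing code is monomorphized once, for that provider only.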

## New Features

### AVX-512 4-bit distance kernels
([#1045](#1045))

Native V4 (AVX-512) specializations for 4-bit packed vector distance
computations:

- **`SquaredL2`** — 16 × `u32` lanes per iteration via
`_mm512_madd_epi16`.
- **`InnerProduct`** — AVX-512 VNNI (`_mm512_dpbusd_epi32`) over `u8x64`
/ `i8x64` operands.

Previously, V4 hardware fell back to two AVX2 (V3) kernel invocations
per 512-bit chunk. The native kernels double per-instruction throughput.
No API changes — existing code benefits automatically on AVX-512 capable
hardware.
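A scalar reference is a convenient oracle for checking what vectorized kernels like these should compute. The sketch below assumes two unsigned 4-bit values per byte, low nibble first; the crate's packing and the signed `i8x64` operand handling in the VNNI path may differ, and these function names are hypothetical.

```rust
/// Hypothetical packing: two unsigned 4-bit values per byte, low nibble first.
fn unpack(byte: u8) -> [u32; 2] {
    [u32::from(byte & 0x0F), u32::from(byte >> 4)]
}

/// Scalar reference for squared L2 over 4-bit packed vectors.
pub fn squared_l2_4bit(a: &[u8], b: &[u8]) -> u32 {
    assert_eq!(a.len(), b.len(), "packed lengths must match");
    let mut acc = 0u32;
    for (&x, &y) in a.iter().zip(b) {
        let (ux, uy) = (unpack(x), unpack(y));
        for i in 0..2 {
            let d = ux[i].abs_diff(uy[i]);
            acc += d * d;
        }
    }
    acc
}

/// Scalar reference for inner product over 4-bit packed vectors.
pub fn inner_product_4bit(a: &[u8], b: &[u8]) -> u32 {
    assert_eq!(a.len(), b.len(), "packed lengths must match");
    let mut acc = 0u32;
    for (&x, &y) in a.iter().zip(b) {
        let (ux, uy) = (unpack(x), unpack(y));
        for i in 0..2 {
            acc += ux[i] * uy[i];
        }
    }
    acc
}
```

For example, `0x21` unpacks to `[1, 2]` and `0x43` to `[3, 4]`, giving a squared L2 of 8 and an inner product of 11.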

## Merged PRs
* Deprecate 32-bit targets by @suhasjs in
#1022
* Add a fast path to `Map::prepare`. by @hildebrandmw in
#1023
* Add boundary checks in gen_associated_data_from_range() by @Copilot in
#847
* [deps] Don't pull `rayon` as a dependency of `diskann`. by
@hildebrandmw in #1024
* Bump openssl from 0.10.78 to 0.10.79 by @dependabot[bot] in
#1026
* Cleaning up test work and changing the get_degree_stats signature. by
@JordanMaples in #998
* Reduce scalar-quantization benchmark monomorphization by
@suri-kumkaran in #1041
* [diskann-vector] Support truly unaligned distances. by @hildebrandmw
in #981
* rename spherical.json to graph index with spherical quantization by
@harsha-simhadri in #1042
* [PQ Cleanup] Part 2: Consolidate `calculate_chunk_offsets*` by
@arkrishn94 in #976
* PQ: tighten dim contract; right-size scratch buffer by @wuw92 in
#1044
* Add v4 distance kernels (4-bit SquaredL2 / InnerProduct) by @m3hm3t in
#1045
* Remove the Caching Provider by @hildebrandmw in
#1052

## New Contributors
* @suhasjs made their first contribution in
#1022
* @m3hm3t made their first contribution in
#1045

**Full Changelog**:
v0.51.0...v0.52.0

Co-authored-by: Mark Hildebrand <mhildebrand@microsoft.com>

6 participants