refactor(mem_wal): redesign FTS mem index for single-writer multi-reader#6726
Open
touch-of-grey wants to merge 5 commits into
Open
refactor(mem_wal): redesign FTS mem index for single-writer multi-reader#6726touch-of-grey wants to merge 5 commits into
touch-of-grey wants to merge 5 commits into
Conversation
Replace the SkipMap<(token, row_position)> postings layout with per-term ArcSwap<TermSlice> slices and an ArcSwap<Snapshot>-published per-batch visibility watermark. Readers grab a snapshot, walk per-term chunks filtered by chunk.batch_position < snapshot.visible_count, and score with snapshot-coupled BM25 stats. Writer publishes the snapshot only after every term chunk for a batch is linked, so readers never observe a partial document or BM25 stats out of sync with the postings. Also: tokenize-time tokenizer ownership moves out of a shared Mutex into a tokenizer pool plus a writer-dedicated slot, so search calls do not serialize against the writer; add Utf8View to the supported text types; add memory_usage() for size-based flush triggers; reuse lance-index InvertedIndex's TokenSet/DocSet/PostingListBuilder for the flush path.
Boolean and Boost queries called search_query recursively for each sub-clause, and every leaf (search / search_phrase / search_fuzzy) took its own self.snapshot.load_full(). A writer publishing between sub-queries left the compound result mixing BM25 stats from different snapshots — n, avgdl, and df disagreed across leaves of the same query, so the summed score wasn't valid for any point in time. Snapshot once at search_query / search_with_options and pass the Arc<Snapshot> through every leaf and every recursive call. Public search* entry points keep their signatures and snapshot internally. Also: filter entry_count() by visibility; correct stale doc-comments on Snapshot::batches, TermChunk::positions, and Snapshot::batch_for; hoist test-body 'use std::sync::Arc;' to the test module's top.
New rust/lance/benches/mem_wal_fineweb_fts.rs covers three metrics across the 12 configs in the design doc: - write throughput at memtable sizes 100k / 500k / 1M - MemTable FTS query latency (avg/p50/p95) over 100 high-frequency tokens + 50 sampled phrases - consistency: |memtable_top10 ∩ post_flush_disk_top10| / |union| as a user-approved replacement for recall@k The bench downloads HuggingFaceFW/fineweb sample/10BT shards, caches them, and is fully env-driven so a single binary handles every config. Driver script bench/run_fineweb_fts.sh loops the 12 configs, uploads each result.json to S3, and prints a summary. Also: make `dataset::mem_wal::index` public so the bench can call `FtsMemIndex::search_with_options` directly to time the MemTable read path.
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
The previous spin on max_indexed_batch_position never terminated when max_memtable_rows triggers auto-flushes during ingest: the counter is reset on each new active memtable, so target_batch_pos = total_batches - 1 is unreachable from the active generation. Close the writer instead — it drains the final WAL flush and any outstanding memtable flush; the inline sync_indexed_write covers the per-put index updates. close() time is included in the measured elapsed so configs with different flush cadences are compared apples-to-apples.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Replaces the
SkipMap<(token, row_position)>postings layout inrust/lance/src/dataset/mem_wal/index/fts.rswith per-termArcSwap<TermSlice>slices and anArcSwap<Snapshot>-publishedper-batch visibility watermark. Readers grab the snapshot, walk per-term
chunks filtered by
chunk.batch_position < snapshot.visible_count, andscore with snapshot-coupled BM25 stats.
The writer publishes the snapshot only after every term chunk for a
batch is linked, so readers never observe a partial document or BM25
stats out of sync with the postings — per-batch monotonic visibility.
Also in this PR:
Mutexinto a small reader pool plus a writer-dedicated slot, so search
calls no longer serialize against the writer.
Utf8Viewis added to the supported text types alongsideUtf8/LargeUtf8; non-string columns now produce a clear error instead ofbeing silently skipped.
FtsMemIndex::memory_usage()for size-based flush triggers.Boolean,Boost) snapshot once atsearch_queryand thread the same
Arc<Snapshot>through every leaf, so acompound query never mixes BM25 stats from different snapshots.
lance_index::scalar::invertedtypes(
TokenSet/DocSet/PostingListBuilder); the on-disk format isunchanged. The stale "flush is not yet implemented" header doc is
corrected.
Public API (
FtsMemIndexconstructors,FtsQueryExpr,SearchOptions,BooleanQueryBuilder,FtsIndexConfig,to_index_builder_reversed)is preserved. No callers of the removed internal
FtsKey/PostingValuetypes existed outside the file.cc @jackye1995 for review.