Support inner-product distance metric in IVF-RaBitQ #2291
Draft
jamxia155 wants to merge 2 commits into
IVF-RaBitQ previously supported L2 only. This adds InnerProduct support,
initially only for the bitwise (QUANT4/QUANT8) search paths. Support for the
LUT16 and LUT32 search modes is a natural follow-up.
Approach: rather than re-deriving the tuned RaBitQ per-vector factors, the
existing squared-L2 estimator is reused as-is and the inner product is
recovered via the identity <q,x> = (||q||^2 + ||x||^2 - ||q-x||^2) / 2. The
kernels emit the negated inner product (a "pseudo-distance"), so the existing
min-selection, block-sort queue, and per-query threshold are reused unchanged;
the result is negated after select_k. Probe selection for InnerProduct picks
clusters by argmax <q,c> while the centroid-distance buffer still carries
||q-c||^2 for the estimator (mirroring ivf_pq); k-means clustering stays L2.
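As an illustration of the identity and the negated "pseudo-distance" described above, here is a minimal host-side C++ sketch. It is not the PR's kernel code: the exact squared L2 distance stands in for the RaBitQ estimate of ||q-x||^2, `std::partial_sort` stands in for select_k, and every name (`sq_norm`, `sq_l2`, `ip_pseudo_distance`, `top_k_inner_product`) is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<float>;

// Squared L2 norm ||v||^2.
inline float sq_norm(const Vec& v) {
  float s = 0.0f;
  for (float e : v) s += e * e;
  return s;
}

// Exact ||q - x||^2. Assumption: this stands in for the RaBitQ estimator.
inline float sq_l2(const Vec& q, const Vec& x) {
  assert(q.size() == x.size());
  float s = 0.0f;
  for (std::size_t i = 0; i < q.size(); ++i) {
    const float d = q[i] - x[i];
    s += d * d;
  }
  return s;
}

// Negated inner product recovered from the squared L2 distance:
// -<q,x> = -(||q||^2 + ||x||^2 - ||q-x||^2) / 2. Smaller means more similar.
inline float ip_pseudo_distance(float q_sq, float x_sq, float dist_sq) {
  return -0.5f * (q_sq + x_sq - dist_sq);
}

// Min-selection over pseudo-distances, then negation to report inner products.
inline std::vector<std::pair<float, std::size_t>> top_k_inner_product(
    const Vec& q, const std::vector<Vec>& xs, std::size_t k) {
  const float q_sq = sq_norm(q);
  std::vector<std::pair<float, std::size_t>> cand;
  cand.reserve(xs.size());
  for (std::size_t i = 0; i < xs.size(); ++i) {
    cand.emplace_back(ip_pseudo_distance(q_sq, sq_norm(xs[i]), sq_l2(q, xs[i])), i);
  }
  k = std::min(k, cand.size());
  std::partial_sort(cand.begin(), cand.begin() + static_cast<std::ptrdiff_t>(k), cand.end());
  cand.resize(k);
  for (auto& c : cand) c.first = -c.first;  // undo the negation after selection
  return cand;
}
```

Because the selection step only ever sees values where smaller is better, it does not need to know which metric is in use; only the final negation is metric-specific.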
The signed-distance behavior (the pseudo-distance transform plus a sign-aware
atomic threshold min) is selected at compile time via a new Signed axis on the
bitwise JIT-LTO fragments, so the L2 path's codegen is unchanged and only the
small entrypoint/emit fragments fan out.
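To make the compile-time Signed axis and the sign-aware threshold min more concrete, here is a host-side C++17 sketch. It is an assumption-laden illustration, not the PR's CUDA code: the description does not say how the sign-aware atomic min is implemented, so this uses one common technique (an order-preserving integer key for IEEE-754 floats plus a compare-and-swap loop). All names (`to_key`, `from_key`, `atomic_threshold_min`) are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

inline std::uint32_t float_bits(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

inline float bits_float(std::uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// Map a float to an unsigned key whose integer order matches the float order.
// Signed = false: values are assumed non-negative, so the raw bits already sort
// correctly and the key is the identity (the unchanged L2-style path).
// Signed = true: negative values have all bits flipped, non-negative values get
// the top bit set, so negatives sort below positives.
template <bool Signed>
inline std::uint32_t to_key(float f) {
  const std::uint32_t u = float_bits(f);
  if constexpr (Signed) {
    return (u & 0x80000000u) ? ~u : (u | 0x80000000u);
  } else {
    assert((u & 0x80000000u) == 0u);  // non-negative distances only
    return u;
  }
}

// Inverse of to_key.
template <bool Signed>
inline float from_key(std::uint32_t k) {
  if constexpr (Signed) {
    return bits_float((k & 0x80000000u) ? (k & 0x7fffffffu) : ~k);
  } else {
    return bits_float(k);
  }
}

// Lower the shared threshold to `candidate` if it is smaller (CAS loop).
template <bool Signed>
inline void atomic_threshold_min(std::atomic<std::uint32_t>& threshold, float candidate) {
  const std::uint32_t key = to_key<Signed>(candidate);
  std::uint32_t cur = threshold.load(std::memory_order_relaxed);
  while (key < cur &&
         !threshold.compare_exchange_weak(cur, key, std::memory_order_relaxed)) {
  }
}
```

The `Signed = false` instantiation compiles down to a plain integer minimum on the raw bits, which is why a compile-time switch of this kind can leave the non-negative path untouched. Raw bit patterns of negative floats sort in reverse (the bits of -2.0f compare greater than those of -1.0f), which is the reason a signed pseudo-distance needs different handling.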
Details:
- metric plumbed through index_params -> index<IdxT> -> IVFGPU (with accessor);
build validates {L2Expanded, InnerProduct}; LUT search modes reject IP.
- per-vector ||x||^2 stored (InnerProduct only) in cluster-permuted order,
computed in the quantizer and passed to the search kernels.
- serialization gains a leading version field and the metric enum, plus the
per-vector norm blob for InnerProduct (clean break, validated on load).
- tests: var_metric() adds InnerProduct cases over QUANT4/QUANT8 x
block-sort/non-block-sort x with_ex/no_ex under a dedicated
IvfRabitqInnerProduct instantiation; a signed_data flag feeds mean-zero data
to those cases so inner products take both signs and exercise the sign-aware
atomic threshold. Existing cases are unchanged.
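The serialization bullet above (a leading version field, the metric enum, and a norm blob written only for InnerProduct, all validated on load) could look roughly like the following C++ sketch. The field widths, the version number, the enum values, and the names `Metric`, `IndexHeader`, `save_header`, and `load_header` are all hypothetical; the actual on-disk layout is not given in this description.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical enum values and format version, for illustration only.
enum class Metric : std::int32_t { L2Expanded = 0, InnerProduct = 1 };
constexpr std::int32_t kFormatVersion = 1;

struct IndexHeader {
  Metric metric;
  std::vector<float> norms;  // per-vector ||x||^2, used for InnerProduct only
};

template <typename T>
inline void write_pod(std::ostream& os, const T& v) {
  os.write(reinterpret_cast<const char*>(&v), sizeof(T));
}

template <typename T>
inline T read_pod(std::istream& is) {
  T v{};
  is.read(reinterpret_cast<char*>(&v), sizeof(T));
  if (!is) throw std::runtime_error("truncated index stream");
  return v;
}

inline void save_header(std::ostream& os, const IndexHeader& h) {
  assert(h.metric == Metric::InnerProduct || h.norms.empty());
  write_pod(os, kFormatVersion);                       // leading version field
  write_pod(os, static_cast<std::int32_t>(h.metric));  // metric enum
  if (h.metric == Metric::InnerProduct) {              // norm blob only for IP
    write_pod(os, static_cast<std::uint64_t>(h.norms.size()));
    os.write(reinterpret_cast<const char*>(h.norms.data()),
             static_cast<std::streamsize>(h.norms.size() * sizeof(float)));
  }
}

inline IndexHeader load_header(std::istream& is) {
  if (read_pod<std::int32_t>(is) != kFormatVersion) {
    throw std::runtime_error("unsupported index format version");
  }
  const auto m = read_pod<std::int32_t>(is);
  if (m != 0 && m != 1) throw std::runtime_error("unsupported metric");
  IndexHeader h{static_cast<Metric>(m), {}};
  if (h.metric == Metric::InnerProduct) {
    const auto n = read_pod<std::uint64_t>(is);
    h.norms.resize(n);
    is.read(reinterpret_cast<char*>(h.norms.data()),
            static_cast<std::streamsize>(n * sizeof(float)));
    if (!is) throw std::runtime_error("truncated norm blob");
  }
  return h;
}
```

Checking the version before anything else is what makes a clean break workable: a stream that does not start with the expected version is rejected up front instead of being misread.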
Follow-up (not in this change):
- LUT16 and LUT32 search modes do not yet support InnerProduct; only the
bitwise QUANT4/QUANT8 path is implemented, and search() rejects IP for the
LUT modes. Extending them means adding the same compile-time Signed axis to
the LUT emit fragments.
IVF-RaBitQ previously supported L2 only. This PR adds InnerProduct support. It is currently a work in progress and is enabled only in the bitwise (QUANT4/QUANT8) search paths; the LUT16 and LUT32 search modes are tracked as a TODO item.
Approach: rather than re-deriving the tuned RaBitQ per-vector factors, the existing squared-L2 estimator is reused as-is and the inner product is recovered via the identity
<q,x> = (||q||^2 + ||x||^2 - ||q-x||^2) / 2. The kernels emit the negated inner product (a "pseudo-distance"), so the existing min-selection, block-sort queue, and per-query threshold are reused unchanged; the result is negated after select_k. Probe selection for InnerProduct picks clusters by argmax <q,c> while the centroid-distance buffer still carries ||q-c||^2 for the estimator (mirroring ivf_pq); k-means clustering stays L2.
The signed-distance behavior (the pseudo-distance transform plus a sign-aware atomic threshold min) is selected at compile time via a new Signed axis on the bitwise JIT-LTO fragments, so the L2 path's codegen is unchanged and only the small entrypoint/emit fragments fan out.
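The probe-selection rule described above (rank clusters by <q,c>, but keep ||q-c||^2 for the estimator) can be sketched on the host as follows. This is a simplified illustration under stated assumptions, not the PR's GPU code: `Probe` and `select_probes_ip` are hypothetical names, and a `std::partial_sort` replaces whatever selection the real implementation uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// One selected cluster: its id plus the squared L2 distance from the query to
// the centroid, which the distance estimator still consumes.
struct Probe {
  std::size_t cluster;
  float sq_dist;
};

inline std::vector<Probe> select_probes_ip(const std::vector<float>& q,
                                           const std::vector<std::vector<float>>& centroids,
                                           std::size_t n_probes) {
  const std::size_t n = centroids.size();
  std::vector<float> ip(n, 0.0f), sq(n, 0.0f);
  for (std::size_t c = 0; c < n; ++c) {
    assert(centroids[c].size() == q.size());
    for (std::size_t i = 0; i < q.size(); ++i) {
      const float d = q[i] - centroids[c][i];
      ip[c] += q[i] * centroids[c][i];  // ranking key: <q,c>
      sq[c] += d * d;                   // payload: ||q-c||^2
    }
  }
  std::vector<std::size_t> order(n);
  std::iota(order.begin(), order.end(), std::size_t{0});
  n_probes = std::min(n_probes, n);
  // Largest inner product first.
  std::partial_sort(order.begin(), order.begin() + static_cast<std::ptrdiff_t>(n_probes),
                    order.end(),
                    [&](std::size_t a, std::size_t b) { return ip[a] > ip[b]; });
  std::vector<Probe> out;
  out.reserve(n_probes);
  for (std::size_t i = 0; i < n_probes; ++i) out.push_back({order[i], sq[order[i]]});
  return out;
}
```

For q = (1, 0) and centroids (0.5, 0), (3, 0), (-1, 0), the cluster with the largest inner product is the second one even though the first is nearest in L2, so the ranking key and the stored distance genuinely differ.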
Details:
- per-vector ||x||^2 stored (InnerProduct only) in cluster-permuted order, computed in the quantizer and passed to the search kernels.
TODO: