Support inner-product distance metric in IVF-RaBitQ #2291

Draft
jamxia155 wants to merge 2 commits into NVIDIA:main from jamxia155:ivf-rabitq-ip-distance
Conversation

@jamxia155

Contributor

IVF-RaBitQ previously supported L2 only. This PR adds InnerProduct support. It is currently a work in progress: InnerProduct is enabled only in the bitwise (QUANT4/QUANT8) search paths, and support in the LUT16 and LUT32 search modes is tracked as a TODO item.

Approach: rather than re-deriving the tuned RaBitQ per-vector factors, the existing squared-L2 estimator is reused as-is and the inner product is recovered via the identity <q,x> = (||q||^2 + ||x||^2 - ||q-x||^2) / 2. The kernels emit the negated inner product (a "pseudo-distance"), so the existing min-selection, block-sort queue, and per-query threshold are reused unchanged; the result is negated after select_k. Probe selection for InnerProduct picks clusters by argmax <q,c> while the centroid-distance buffer still carries ||q-c||^2 for the estimator (mirroring ivf_pq); k-means clustering stays L2.
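The recovery step described above can be sketched on the host as follows. This is only an illustration under stated assumptions: `sq_norm`, `sq_l2`, `ip_pseudo_distance`, and `top_k_ip` are hypothetical names, the exact squared distance stands in for the RaBitQ estimator's output, and `std::partial_sort` stands in for select_k.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: squared L2 norm ||v||^2.
inline float sq_norm(const std::vector<float>& v) {
  float s = 0.0f;
  for (float x : v) s += x * x;
  return s;
}

// Hypothetical stand-in for the squared-L2 estimator: exact ||a - b||^2.
inline float sq_l2(const std::vector<float>& a, const std::vector<float>& b) {
  float s = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const float d = a[i] - b[i];
    s += d * d;
  }
  return s;
}

// Pseudo-distance: the negated inner product recovered from
// <q,x> = (||q||^2 + ||x||^2 - ||q-x||^2) / 2.
inline float ip_pseudo_distance(float q_norm2, float x_norm2, float l2_sq) {
  return -0.5f * (q_norm2 + x_norm2 - l2_sq);
}

// Selects the k vectors with the LARGEST inner product by running an ordinary
// min-selection on the pseudo-distance, then negating the selected values.
inline std::vector<std::pair<std::size_t, float>> top_k_ip(
    const std::vector<float>& q,
    const std::vector<std::vector<float>>& xs,
    std::size_t k) {
  const float qn = sq_norm(q);
  std::vector<std::pair<float, std::size_t>> cand;
  cand.reserve(xs.size());
  for (std::size_t i = 0; i < xs.size(); ++i) {
    cand.emplace_back(ip_pseudo_distance(qn, sq_norm(xs[i]), sq_l2(q, xs[i])), i);
  }
  k = std::min(k, cand.size());
  // Min-selection: the smallest pseudo-distances come first.
  std::partial_sort(cand.begin(),
                    cand.begin() + static_cast<std::ptrdiff_t>(k),
                    cand.end());
  std::vector<std::pair<std::size_t, float>> out;
  out.reserve(k);
  for (std::size_t i = 0; i < k; ++i) {
    out.emplace_back(cand[i].second, -cand[i].first);  // negate after selection
  }
  return out;
}
```

Because the selection only ever sees "smaller is better" values, nothing in the selection code needs to know which metric is active; the sign flip is confined to the emit step and the final negation.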

The signed-distance behavior (the pseudo-distance transform plus a sign-aware atomic threshold min) is selected at compile time via a new Signed axis on the bitwise JIT-LTO fragments, so the L2 path's codegen is unchanged and only the small entrypoint/emit fragments fan out.
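To illustrate why a sign-aware threshold min is needed and how a compile-time boolean can select it, here is a host-side sketch. It is not the PR's kernel code: `to_key`, `from_key`, and `atomic_threshold_min` are hypothetical names, and the order-preserving key mapping is one common technique, assumed here for illustration. The raw bit pattern of a float orders correctly as an unsigned integer only when all values are non-negative (as with squared L2), so the signed variant remaps the bits first.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Maps a float to an unsigned key whose integer order matches the float order.
// Signed == false assumes non-negative inputs, where the raw bits already sort.
template <bool Signed>
inline std::uint32_t to_key(float v) {
  std::uint32_t bits;
  std::memcpy(&bits, &v, sizeof(bits));
  if (Signed) {
    // Negative values: invert all bits so a more negative value gives a smaller key.
    // Non-negative values: set the top bit so they sort above every negative value.
    return (bits & 0x80000000u) ? ~bits : (bits | 0x80000000u);
  }
  return bits;
}

// Inverse of to_key.
template <bool Signed>
inline float from_key(std::uint32_t key) {
  std::uint32_t bits = key;
  if (Signed) {
    bits = (key & 0x80000000u) ? (key & 0x7FFFFFFFu) : ~key;
  }
  float v;
  std::memcpy(&v, &bits, sizeof(v));
  return v;
}

// Lowers the shared threshold to `candidate` if it is smaller, using a CAS loop.
template <bool Signed>
inline void atomic_threshold_min(std::atomic<std::uint32_t>& slot, float candidate) {
  const std::uint32_t key = to_key<Signed>(candidate);
  std::uint32_t cur = slot.load(std::memory_order_relaxed);
  while (key < cur &&
         !slot.compare_exchange_weak(cur, key, std::memory_order_relaxed)) {
    // On failure `cur` holds the latest value; retry while we are still smaller.
  }
}
```

With `Signed` as a template parameter, the `false` instantiation contains no remapping at all, which matches the stated goal of leaving the L2 path's generated code unchanged.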

Details:

  • metric plumbed through index_params -> index<IdxT> -> IVFGPU (with accessor); build validates {L2Expanded, InnerProduct}; LUT search modes reject IP.
  • per-vector ||x||^2 stored (InnerProduct only) in cluster-permuted order, computed in the quantizer and passed to the search kernels.
  • serialization gains a leading version field and the metric enum, plus the per-vector norm blob for InnerProduct (clean break, validated on load).
  • tests: var_metric() adds InnerProduct cases over QUANT4/QUANT8 x block-sort/non-block-sort x with_ex/no_ex under a dedicated IvfRabitqInnerProduct instantiation; a signed_data flag feeds mean-zero data to those cases so inner products take both signs and exercise the sign-aware atomic threshold. Existing cases are unchanged.
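The serialization item above (leading version field, metric enum, norm blob only for InnerProduct, validation on load) could look roughly like the sketch below. The layout, the `Metric` enum values, `kVersion`, `IndexBlob`, `save`, and `load` are all assumptions made for illustration and do not reflect the actual on-disk format.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical enum values and version number, for illustration only.
enum class Metric : std::int32_t { L2Expanded = 0, InnerProduct = 1 };
constexpr std::int32_t kVersion = 1;

// Hypothetical container: only the fields relevant to this sketch.
struct IndexBlob {
  Metric metric;
  std::vector<float> norms;  // per-vector ||x||^2, used for InnerProduct only
};

inline void save(std::ostream& os, const IndexBlob& b) {
  // Leading version field, then the metric.
  os.write(reinterpret_cast<const char*>(&kVersion), sizeof(kVersion));
  const std::int32_t m = static_cast<std::int32_t>(b.metric);
  os.write(reinterpret_cast<const char*>(&m), sizeof(m));
  // The norm blob is written only when the metric needs it.
  if (b.metric == Metric::InnerProduct) {
    const std::uint64_t n = b.norms.size();
    os.write(reinterpret_cast<const char*>(&n), sizeof(n));
    os.write(reinterpret_cast<const char*>(b.norms.data()),
             static_cast<std::streamsize>(n * sizeof(float)));
  }
}

inline IndexBlob load(std::istream& is) {
  std::int32_t version = 0;
  is.read(reinterpret_cast<char*>(&version), sizeof(version));
  if (!is || version != kVersion) {
    throw std::runtime_error("unsupported index version");
  }
  std::int32_t m = -1;
  is.read(reinterpret_cast<char*>(&m), sizeof(m));
  if (!is || (m != 0 && m != 1)) {
    throw std::runtime_error("unsupported metric");
  }
  IndexBlob b{static_cast<Metric>(m), {}};
  if (b.metric == Metric::InnerProduct) {
    std::uint64_t n = 0;
    is.read(reinterpret_cast<char*>(&n), sizeof(n));
    if (!is) throw std::runtime_error("truncated norm header");
    b.norms.resize(n);
    is.read(reinterpret_cast<char*>(b.norms.data()),
            static_cast<std::streamsize>(n * sizeof(float)));
    if (!is) throw std::runtime_error("truncated norm blob");
  }
  return b;
}
```

Rejecting an unknown version up front is what makes a format break safe: a file written in an older layout fails loudly on load instead of being misread.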

TODO:

  • LUT16 and LUT32 search modes do not yet support InnerProduct; only the bitwise QUANT4/QUANT8 path is implemented, and search() rejects IP for the LUT modes. Extending them means adding the same compile-time Signed axis to the LUT emit fragments.

jamxia155 added 2 commits July 2, 2026 06:59
IVF-RaBitQ previously supported L2 only. This adds InnerProduct support,
initially only for the bitwise (QUANT4/QUANT8) search paths. Support for the
LUT16 and LUT32 search modes is a natural follow-up.

Approach: rather than re-deriving the tuned RaBitQ per-vector factors, the
existing squared-L2 estimator is reused as-is and the inner product is
recovered via the identity <q,x> = (||q||^2 + ||x||^2 - ||q-x||^2) / 2. The
kernels emit the negated inner product (a "pseudo-distance"), so the existing
min-selection, block-sort queue, and per-query threshold are reused unchanged;
the result is negated after select_k. Probe selection for InnerProduct picks
clusters by argmax <q,c> while the centroid-distance buffer still carries
||q-c||^2 for the estimator (mirroring ivf_pq); k-means clustering stays L2.

The signed-distance behavior (the pseudo-distance transform plus a sign-aware
atomic threshold min) is selected at compile time via a new Signed axis on the
bitwise JIT-LTO fragments, so the L2 path's codegen is unchanged and only the
small entrypoint/emit fragments fan out.

Details:
- metric plumbed through index_params -> index<IdxT> -> IVFGPU (with accessor);
  build validates {L2Expanded, InnerProduct}; LUT search modes reject IP.
- per-vector ||x||^2 stored (InnerProduct only) in cluster-permuted order,
  computed in the quantizer and passed to the search kernels.
- serialization gains a leading version field and the metric enum, plus the
  per-vector norm blob for InnerProduct (clean break, validated on load).
- tests: var_metric() adds InnerProduct cases over QUANT4/QUANT8 x
  block-sort/non-block-sort x with_ex/no_ex under a dedicated
  IvfRabitqInnerProduct instantiation; a signed_data flag feeds mean-zero data
  to those cases so inner products take both signs and exercise the sign-aware
  atomic threshold. Existing cases are unchanged.

Follow-up (not in this change):
- LUT16 and LUT32 search modes do not yet support InnerProduct; only the
  bitwise QUANT4/QUANT8 path is implemented, and search() rejects IP for the
  LUT modes. Extending them means adding the same compile-time Signed axis to
  the LUT emit fragments.
@copy-pr-bot

copy-pr-bot Bot commented Jul 2, 2026


Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@jamxia155 added the "feature request", "non-breaking", and "C++" labels Jul 2, 2026
@jamxia155 moved this to In Progress in Unstructured Data Processing Jul 2, 2026
Labels

C++, feature request, non-breaking

Projects

Status: In Progress

1 participant