Skip to content

Fix(geotransolver): fail fast when include_local_features=True without geometry#1766

Open
ShubhJain007 wants to merge 1 commit into
NVIDIA:mainfrom
ShubhJain007:fix/geotransolver-local-features-guard
Open

Fix(geotransolver): fail fast when include_local_features=True without geometry#1766
ShubhJain007 wants to merge 1 commit into
NVIDIA:mainfrom
ShubhJain007:fix/geotransolver-local-features-guard

Conversation

@ShubhJain007

@ShubhJain007 ShubhJain007 commented Jun 30, 2026

Copy link
Copy Markdown

PR: GeoTransolver — fail fast when include_local_features=True without geometry

Repo: NVIDIA/physicsnemo
Area: physicsnemo/experimental/models/geotransolver/
Related: issue #1606 (also an input-validation gap on the GeoTransolver local path; this PR is a distinct fix — missing-geometry guard, not the ball_query positions-dim check).

Problem

With include_local_features=True, the GALE blocks are sized for
effective_hidden = n_hidden + n_hidden_local * len(radii), but the extra
(local multi-scale) channels are only produced when a geometry tensor is
supplied. If geometry_dim/geometry is omitted, the concat is silently skipped
and the token stream stays at n_hidden, producing a confusing downstream error:

RuntimeError: Given normalized_shape=[192], expected input with shape [*, 192],
but got input of size [1, N, 128]

Fix

Fail fast with an actionable message, in two places:

  1. Constructor (GeoTransolver.__init__): include_local_features=True requires
    geometry_dim to be set.
  2. GlobalContextBuilder.build_context: if local extractors are active, geometry
    must be passed at forward time.

( 2 small guards, no behavior change for valid usage.)

…t geometry

GeoTransolver sizes its GALE blocks for
effective_hidden = n_hidden + n_hidden_local * len(radii) whenever
include_local_features=True, but the local multi-scale features are only
produced when a geometry tensor is supplied. Omitting geometry/geometry_dim
silently skipped the concat and surfaced as a confusing downstream
LayerNorm shape mismatch (e.g. normalized_shape=[192] vs input 128).

Add two fail-fast guards with actionable messages:
  - GeoTransolver.__init__: include_local_features=True requires geometry_dim.
  - GlobalContextBuilder.build_context: geometry is required at forward time
    when local features are enabled.

Add a regression test covering both guards and the working path.

Signed-off-by: Shubh Jain <shubhjain10102003@gmail.com>
@copy-pr-bot

copy-pr-bot Bot commented Jun 30, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@greptile-apps

greptile-apps Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds two fail-fast input-validation guards to GeoTransolver to surface a previously silent misconfiguration — using include_local_features=True without providing the required geometry inputs — as an actionable ValueError rather than a cryptic downstream LayerNorm shape mismatch.

  • geotransolver.py: Constructor-time guard raises immediately when include_local_features=True but geometry_dim is None, before any sub-module is constructed.
  • context_projector.py: Forward-time guard in GlobalContextBuilder.build_context raises when local_extractors are active but geometry is None at call time, with the and geometry is not None check on the extraction block now technically redundant.
  • test_geotransolver.py: Regression test covers all three scenarios — construction without geometry_dim, forward without geometry, and the valid path.

Important Files Changed

Filename Overview
physicsnemo/experimental/models/geotransolver/context_projector.py Adds a fail-fast guard in GlobalContextBuilder.build_context: raises ValueError when local_extractors is active but geometry is None, preventing a silent skip of the local-feature concat. The subsequent extraction condition retains a now-redundant and geometry is not None check that can be cleaned up.
physicsnemo/experimental/models/geotransolver/geotransolver.py Adds a construction-time guard: raises ValueError when include_local_features=True but geometry_dim is None, preventing a confusing downstream LayerNorm shape mismatch. Guard is correctly placed before the n_hidden % n_head check and before any sub-module construction.
test/models/geotransolver/test_geotransolver.py Adds a regression test covering all three paths: (A) ValueError at construction with missing geometry_dim, (B) ValueError at forward with missing geometry, and (C) successful forward when geometry is provided. Missing @requires_module("warp") but the torch fallback in radius_search makes this safe.

Reviews (1): Last reviewed commit: "Fix(geotransolver): fail fast when inclu..." | Re-trigger Greptile

Comment on lines 1039 to 1040
# Extract multi-scale features if enabled
if self.local_extractors is not None and geometry is not None:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The and geometry is not None guard in this condition is now logically redundant. The new guard added above already raises ValueError whenever self.local_extractors is not None and geometry is None, so by the time execution reaches this line, geometry is guaranteed to be non-None whenever local_extractors is set. Simplifying the condition removes the dead branch and makes the intent clearer.

Suggested change
# Extract multi-scale features if enabled
if self.local_extractors is not None and geometry is not None:
# Extract multi-scale features if enabled
if self.local_extractors is not None:

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

assert not torch.isnan(compiled_out).any()


def test_geotransolver_local_features_requires_geometry(device):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The similar test test_geotransolver_local_features_compile directly above is decorated with @requires_module("warp"). The new test omits that decorator, which is fine for cases (A) and (B) since they raise before any BQWarp call, and radius_search does have a pure-PyTorch fallback (rank=1, baseline). However, if CI enforces the same guard for all local-features tests for consistency, adding the decorator (or a comment explaining the intentional omission) would prevent future confusion.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant