Conversation

@sjsrey
Member

@sjsrey sjsrey commented Jun 3, 2025

This is responding to the discussion in #790.

The default normalize=True preserves the current behavior of the Gaussian kernel weights (heavily used in spreg):
$$K(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} z^2}$$

Setting normalize=False provides an option to use unnormalized Gaussian weights:
$$K(z) = e^{-\frac{1}{2} z^2}$$
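As a minimal sketch of the two forms (a standalone function for illustration; the name and placement are not the actual PR implementation, only the `normalize` keyword is from this PR):

```python
import numpy as np

def gaussian(z, normalize=True):
    """Gaussian kernel of the standardized distance z = d / h.

    normalize=True (the default) divides by sqrt(2*pi), preserving the
    current behavior; normalize=False returns the bare exponential.
    """
    k = np.exp(-0.5 * np.asarray(z, dtype=float) ** 2)
    return k / np.sqrt(2 * np.pi) if normalize else k
```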

@sjsrey sjsrey requested review from ljwolf and martinfleis June 3, 2025 19:33
@codecov

codecov bot commented Jun 3, 2025

Codecov Report

❌ Patch coverage is 98.55072% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 85.4%. Comparing base (44ede54) to head (8f8b704).
⚠️ Report is 28 commits behind head on main.

Files with missing lines Patch % Lines
libpysal/kernels.py 97.6% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##            main    #791     +/-   ##
=======================================
+ Coverage   85.4%   85.4%   +0.1%     
=======================================
  Files        150     151      +1     
  Lines      15971   15995     +24     
=======================================
+ Hits       13632   13661     +29     
+ Misses      2339    2334      -5     
Files with missing lines Coverage Δ
libpysal/graph/_kernel.py 79.3% <100.0%> (-3.1%) ⬇️
libpysal/graph/base.py 96.5% <ø> (-0.2%) ⬇️
libpysal/graph/tests/test_builders.py 100.0% <100.0%> (ø)
libpysal/graph/tests/test_kernel.py 99.1% <100.0%> (+<0.1%) ⬆️
libpysal/graph/tests/test_triangulation.py 98.8% <ø> (ø)
libpysal/weights/distance.py 85.6% <100.0%> (+0.2%) ⬆️
libpysal/weights/tests/test_distance.py 100.0% <100.0%> (ø)
libpysal/kernels.py 97.6% <97.6%> (ø)

... and 7 files with indirect coverage changes


@martinfleis
Member

Can we get it to Graph? And I suppose that it could be generalised to kernels other than the Gaussian as well.

@sjsrey
Member Author

sjsrey commented Jun 25, 2025

Kernels in PySAL

This note collects issues, background, and proposals for refactoring the treatment of kernels in PySAL, especially in relation to bandwidth handling, standardization, and cross-module reuse.

Background


Conceptual Overview

In nonparametric statistics, kernels are generally used for smoothing. At a given focal point, a weighted average of surrounding observations is computed, with weights that decrease with distance from the focal point. The kernel function defines the distance decay pattern of these weights.

In spatial analysis, kernels serve multiple purposes, which can be classified into three conceptual cases:

  1. Volume Standardization
    Some applications require the kernel to be a proper probability density function (pdf)—i.e., it integrates to 1 over its support. We refer to this as volume_standardized (also called normalization).

  2. Unnormalized Decay Weights
    In other use cases, kernels define distance decay weights without requiring normalization. The kernel might be a proper pdf or not; sometimes the pdf’s normalization factor is omitted.

  3. Standardized Peak Value
    Some use cases require the kernel’s maximum value at zero distance to equal 1, i.e., $K(0) = 1$. We refer to this as zero_standardized.
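The three cases can be illustrated with the Epanechnikov kernel (a hypothetical standalone function; the keyword names mirror the terms above, not any library API):

```python
import numpy as np

def epanechnikov(z, volume_standardized=True, zero_standardized=False):
    z = np.minimum(np.abs(np.asarray(z, dtype=float)), 1.0)
    k = 1.0 - z**2            # case 2: bare decay weights, K(0) = 1
    if volume_standardized:
        k = 0.75 * k          # case 1: proper pdf, integrates to 1
    if zero_standardized:
        # case 3: rescale so K(0) = 1 regardless of the other option
        k = k / (0.75 if volume_standardized else 1.0)
    return k
```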


Current Implementation

Location: libpysal/weights/distance.py

Notation

  • $h_i$: bandwidth at focal unit $i$
  • $z_{i,j} = d_{i,j}/h_i$
  • $d_{i,j}$: distance between points $i$ and $j$

Kernel Table

Kernel table (normalized forms, with support $|z| \le 1$ except the Gaussian):

  • Triangular: $K(z) = 1 - |z|$
  • Uniform: $K(z) = \frac{1}{2}$
  • Quadratic (Epanechnikov): $K(z) = \frac{3}{4}(1 - z^2)$
  • Quartic (Biweight): $K(z) = \frac{15}{16}(1 - z^2)^2$
  • Gaussian: $K(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$

A diagonal boolean flag in current code sets the kernel value at $d = 0$ to 1 when True, but this does not scale the function at $d > 0$. Thus, diagonal=True implies:

  • volume_standardized = False
  • zero_standardized = True
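In effect, diagonal=True only overwrites the self-weight, leaving all other weights unscaled (illustrative values for a normalized Gaussian):

```python
import numpy as np

# normalized Gaussian weights at standardized distances z = 0, 1, 2
z = np.array([0.0, 1.0, 2.0])
w = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)

# diagonal=True replaces only the d = 0 entry with 1; the rest are untouched
w_diag = w.copy()
w_diag[0] = 1.0
```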

Derivations: Do Kernels Integrate to 1?

Triangular

$$ \int_{-1}^{1} (1 - |z|) \, dz = 2 \int_0^1 (1 - z) \, dz = 2 \left[ z - \frac{z^2}{2} \right]_0^1 = 2 \left(1 - \frac{1}{2} \right) = 1 $$

Integrates to 1


Uniform

$$ \int_{-1}^{1} \frac{1}{2} \, dz = \left[ \frac{1}{2} z \right]_{-1}^{1} = \frac{1}{2}(1) - \frac{1}{2}(-1) = 1 $$

Integrates to 1


Quadratic (Epanechnikov)

$$ \int_{-1}^{1} \frac{3}{4}(1 - z^2) \, dz = 2 \cdot \frac{3}{4} \int_0^1 (1 - z^2) \, dz $$

$$ = \frac{3}{2} \left[ z - \frac{z^3}{3} \right]_0^1 = \frac{3}{2} \left(1 - \frac{1}{3} \right) = \frac{3}{2} \cdot \frac{2}{3} = 1 $$

Integrates to 1


Quartic (Biweight)

$$ \int_{-1}^{1} \frac{15}{16}(1 - z^2)^2 \, dz = 2 \cdot \frac{15}{16} \int_0^1 (1 - 2z^2 + z^4) \, dz $$

$$ = \frac{15}{8} \left[ z - \frac{2z^3}{3} + \frac{z^5}{5} \right]_0^1 = \frac{15}{8} \left(1 - \frac{2}{3} + \frac{1}{5} \right) $$

$$ = \frac{15}{8} \cdot \frac{15 - 10 + 3}{15} = \frac{15}{8} \cdot \frac{8}{15} = 1 $$

Integrates to 1


Gaussian

$$ \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-z^2/2} \, dz = 1 $$

Integrates to 1 (by definition of the standard normal distribution)
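The derivations above can be checked numerically with a simple midpoint quadrature (a quick sanity check, not library code):

```python
import numpy as np

def midpoint_integral(k, lo=-1.0, hi=1.0, n=200_000):
    """Midpoint-rule approximation of the integral of kernel k over [lo, hi]."""
    edges = np.linspace(lo, hi, n + 1)
    mid = (edges[:-1] + edges[1:]) / 2
    return float(np.sum(k(mid)) * (hi - lo) / n)

areas = {
    "triangular": midpoint_integral(lambda z: 1 - np.abs(z)),
    "uniform": midpoint_integral(lambda z: np.full_like(z, 0.5)),
    "epanechnikov": midpoint_integral(lambda z: 0.75 * (1 - z**2)),
    "biweight": midpoint_integral(lambda z: (15 / 16) * (1 - z**2) ** 2),
    # infinite support: a wide finite range captures essentially all the mass
    "gaussian": midpoint_integral(
        lambda z: np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi), -10.0, 10.0
    ),
}
```

Each entry of `areas` comes out equal to 1 up to quadrature error.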


Why Set $K(0) = 1$?

Although not required, setting $K(0) = 1$ is often useful:

  1. Maximum Influence at Center: Ensures the focal point has the highest weight.
  2. Similarity Interpretation: Aligns kernel values with similarity measures.
  3. Conventional Form: Simplifies kernel expressions in some cases.
  4. Ease of Comparison: Facilitates comparison across kernels.

Examples:

  • Triangular: $K(0) = 1$
  • Epanechnikov: $K(0) = \frac{3}{4}$
  • Biweight: $K(0) = \frac{15}{16}$
  • Gaussian: $K(0) = \frac{1}{\sqrt{2\pi}} \approx 0.3989$

Note: Most kernels peak at 0, but their peak value varies unless explicitly standardized.
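One way to zero-standardize any kernel is to divide by its value at zero (a hypothetical helper for illustration, not the library API):

```python
import numpy as np

def zero_standardize(kernel):
    """Return a version of `kernel` rescaled so that K(0) = 1."""
    k0 = kernel(0.0)
    return lambda z: kernel(z) / k0

gaussian = lambda z: np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
g = zero_standardize(gaussian)  # g(0) == 1.0
```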


Possible Refactorings

  • New kernel.py module
    Consolidate kernel logic into a standalone module consumable by both weights and graph.

  • Keyword arguments
    Support:

    • volume_standardized (aka normalize)
    • zero_standardized (aka scale)

    Ensure legacy behavior is preserved. Clarify the interpretation of kernel values across contexts.

  • Bandwidth Strategies
    Add options for:

    • Fixed bandwidth: e.g., max of minimum KNN distances
    • $k$ can vary (not necessarily 1)
  • K-Nearest Neighbors (KNN)

    • Clarify how $k$ is computed and allow user overrides.
    • $k = \lfloor n^{1/3} \rfloor$
  • Gaussian Bandwidth Truncation

    • PySAL: no truncation
    • GeoDa: truncates kernel at bandwidth
    • Suggest: expose this as an option
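A rough sketch of how the two keywords and the default $k$ could fit together (all names here are proposals, not the merged implementation; only the Gaussian branch is shown):

```python
import numpy as np

def kernel(z, name="gaussian", volume_standardized=True, zero_standardized=False):
    # hypothetical consolidated kernels.py entry point
    if name != "gaussian":
        raise NotImplementedError(name)
    k = np.exp(-0.5 * np.asarray(z, dtype=float) ** 2)
    if volume_standardized:
        k = k / np.sqrt(2 * np.pi)  # proper pdf
    if zero_standardized:
        # divide by K(0) of the current form so the peak equals 1
        k0 = 1 / np.sqrt(2 * np.pi) if volume_standardized else 1.0
        k = k / k0
    return k

def default_k(n):
    """Proposed default number of neighbors, k = floor(n^(1/3))."""
    return int(np.floor(n ** (1 / 3)))
```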

Next Steps

  • Draft kernel.py scaffolding
  • Add tests for both normalization and zero standardization behavior
  • Evaluate current usages in weights and graph for migration
  • Decide on default behavior for legacy compatibility

@martinfleis
Member

One thing we need to figure out is handling of diagonal filling as a user typically does not know the value of $K(0)$ for any other than zero standardised kernel. That is what started my investigation into all of this. One way is exposing a keyword in the kernel builder that deals with the self weight at build time but that solves only part of the use cases. Given we, by design, do not store information on the origin of the Graph, it is hard to keep around the expected value.

@ljwolf
Member

ljwolf commented Jun 26, 2025

100% w/ the kernel.py file. To propagate this through to the Graph objects, would it make sense to handle this using some kind of class composition? I wonder because we allow the user to apply a kernel to basically any weight type. I think this would address @martinfleis's concern?

In this pattern, when a weight is built & kernel calculated, we set the .kernel attribute to be some Kernel() class object that allows you to calculate new values and store the parameterization of the kernel itself. This would also let you set up your own custom kernels and pass them down and apply them to spatial data during construction. If no kernel is used, we set .kernel to some default Identity() kernel.

graph = Graph.build_delaunay(coordinates, kernel='parabolic')
graph.kernel(0) # .75
graph.set_self_weight(1.0) # warns if kernel is not Identity()
graph.add_self_weight() # equivalent to graph.set_self_weight(graph.kernel(0)) w/ no warning
graph.kernel.bandwidth # what is the length-scale of this kernel? (sometimes bandwidth, theta)
graph.kernel.taper # past what distance is this kernel zero? (sometimes bandwidth also)

@ljwolf
Member

ljwolf commented Jun 26, 2025

We've decided against this .kernel class composition strategy. It maintains too much state.

@ljwolf
Member

ljwolf commented Jun 26, 2025

No matter what we do, we need to fix this, so that when distances are calculated, we populate the self-weight with K(0).

@ljwolf ljwolf marked this pull request as draft July 10, 2025 16:21
@ljwolf ljwolf mentioned this pull request Jul 24, 2025
@sjsrey sjsrey marked this pull request as ready for review October 9, 2025 16:29
@sjsrey
Member Author

sjsrey commented Oct 9, 2025

@jGaboardi do we want to add pyproj to the deps in ci/313-min to address these failing tests?

@martinfleis
Member

No, min should have the minimal set of deps to test our ability to work only with required deps. We should conditionally skip if pyproj is not available.

@sjsrey sjsrey removed the discussion label Dec 11, 2025
Member

@martinfleis martinfleis left a comment

We have a weird situation here right now.

The kernels.kernel function supports decay and taper but these are not exposed from within graph builders that use kernels. In graph._kernel we do use the keyword taper but don't pass it to the actual kernel.

I am fine with merging this as is (except that one note in the notebook) to get it in, but then we should ensure this propagates to the actual builders.

Member

For the Gaussian kernel, a decay parameter controls how quickly weights fall off with distance. A larger decay implies a faster decrease of weights as distance increases.

This implies that decay can be a numeric value but it is a bool. And controls whether the kernel is interpreted as a distance-decay function or not.

Member Author

Good catch. I meant to revise that but missed it.
Let me revise.

@sjsrey
Copy link
Member Author

sjsrey commented Dec 13, 2025

We have a weird situation here right now.

The kernels.kernel function supports decay and taper but these are not exposed from within graph builders that use kernels. In graph._kernel we do use the keyword taper but don't pass it to the actual kernel.

I am fine with merging this as is (except that one note in the notebook) to get it in, but then we should ensure this propagates to the actual builders.

These have now been propagated to the builders.
In the process I think we had some inconsistencies before that have now been corrected.
For example, if bandwidth was not set, then the graph kernel was using the 25th percentile distance to set the bw. However, if k=3 was also set, it is possible that taper (which was True by default) would drop i-j relations when j was a knn neighbor of i but the distance > bw.
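For instance (hypothetical distances; `bw` standing in for the percentile-derived default):

```python
import numpy as np

# hypothetical knn (k=3) distances from unit i to its three nearest neighbors
distances = np.array([0.2, 0.5, 1.4])
bw = 1.0  # bandwidth derived from, e.g., a percentile of the distances

# with taper=True, links beyond the bandwidth are dropped even though
# j is one of i's k nearest neighbors
kept = distances <= bw
```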

Comment on lines 1015 to 1023
def build_kernel(
cls,
data,
kernel="gaussian",
k=None,
bandwidth=None,
metric="euclidean",
p=2,
coplanar="raise",
Member

Should we expose decay and taper also in here? Because this is the public API.

Member

The same applies to build_distance_band, build_knn, build_triangulation, and build_travel_cost, which also allow kernels.

But again, happy to keep this for a follow up. However, without these being exposed directly in class methods, they are useful only internally.

Member Author

Thanks for the close review and catching these. Let me add these and ensure we have all kernel related changes incorporated before we merge.
