@codeflash-ai codeflash-ai bot commented Nov 13, 2025

📄 19% (0.19x) speedup for calculate_smoothed_npmi in mlflow/store/analytics/trace_correlation.py

⏱️ Runtime : 226 microseconds → 190 microseconds (best of 91 runs)

📝 Explanation and details

The optimized code achieves a 19% speedup through several micro-optimizations that reduce function call overhead and improve computational efficiency:

Key optimizations in _calculate_npmi_core:

  • Local function reference: log = math.log eliminates repeated attribute lookups to math.log, creating a faster local reference
  • Simplified denominator calculation: Changed -(log_n11 - log_N) to log_N - log_n11 to avoid the negation operation
  • Optimized clamping logic: Replaced max(-1.0, min(1.0, npmi)) with explicit conditional checks, eliminating two function calls and their associated overhead

Key optimization in calculate_smoothed_npmi:

  • Direct chained comparison: Replaced min(n11, n10, n01, n00) < 0 with n11 < 0 or n10 < 0 or n01 < 0 or n00 < 0, avoiding tuple creation and the min() function call
  • Inline constant: Used smoothing: float = 0.5 directly instead of referencing JEFFREYS_PRIOR, eliminating a module-level attribute lookup
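A minimal sketch of the rewritten entry-point guard (the function name is hypothetical, and the real implementation performs further validity checks before invoking the core):

```python
import math


def smoothed_npmi_guard_sketch(n11, n10, n01, n00, smoothing: float = 0.5):
    """Illustrative guard only: chained comparisons replace
    min(n11, n10, n01, n00) < 0, which would build a 4-tuple and call
    min() on every invocation; the default smoothing is the literal 0.5.
    """
    if n11 < 0 or n10 < 0 or n01 < 0 or n00 < 0:
        return float("nan")  # invalid counts -> NaN, as in the edge tests below
    return 0.0  # placeholder for the actual smoothed-NPMI computation
```

The chained `or` short-circuits on the first negative count, so the common all-valid path pays only four cheap comparisons.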

Performance impact by test case:

  • Perfect co-occurrence cases see moderate gains (10-18%) as they bypass the computationally intensive log calculations
  • General computation cases achieve the best speedups (15-29%) where all optimizations apply, especially the math function optimizations
  • Edge cases with early returns show smaller but consistent improvements (3-8%) from the streamlined validation logic

These optimizations are particularly effective because they target the most frequently executed code paths in statistical calculations, where even small reductions in overhead compound significantly across many invocations.
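The local-reference pattern is easy to measure in isolation; a rough micro-benchmark (absolute numbers vary by machine and interpreter):

```python
import timeit

# Time math.log via a repeated attribute lookup vs. a pre-bound local name.
setup = "import math; xs = [float(i) for i in range(1, 1001)]"
attr_lookup = timeit.timeit("for x in xs: math.log(x)", setup=setup, number=1000)
local_ref = timeit.timeit(
    "log = math.log\nfor x in xs: log(x)", setup=setup, number=1000
)
print(f"attribute lookup: {attr_lookup:.4f}s  local reference: {local_ref:.4f}s")
```

On CPython the local binding is typically measurably faster because each `math.log` resolves a global name and then a module attribute, while `log(...)` is a single fast local load.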

Correctness verification report:

Test                            Status
⚙️ Existing Unit Tests          🔘 None Found
🌀 Generated Regression Tests   121 Passed
⏪ Replay Tests                 🔘 None Found
🔎 Concolic Coverage Tests      🔘 None Found
📊 Tests Coverage               100.0%
🌀 Generated Regression Tests and Runtime
import math

# imports
import pytest
from mlflow.store.analytics.trace_correlation import calculate_smoothed_npmi

# function to test (imported above); module-level constant pasted from the
# implementation for reference
JEFFREYS_PRIOR = 0.5

# unit tests

# Basic Test Cases

def test_basic_perfect_cooccurrence():
    # Both events always occur together, NPMI should be 1.0
    codeflash_output = calculate_smoothed_npmi(10, 10, 10, 10) # 2.48μs -> 2.20μs (12.7% faster)

def test_basic_no_cooccurrence():
    # No joint occurrences, but both events occur separately
    # NPMI should be -1.0 (minimal association)
    codeflash_output = calculate_smoothed_npmi(0, 5, 5, 10); result = codeflash_output # 4.46μs -> 3.69μs (20.9% faster)

def test_basic_independent_events():
    # Events occur independently, NPMI should be close to 0
    # For 2 events, each occurs in half the cases, joint occurs in 1/4
    # n11=2, f1=4, f2=4, total=8
    codeflash_output = calculate_smoothed_npmi(2, 4, 4, 8); result = codeflash_output # 4.28μs -> 3.67μs (16.9% faster)

def test_basic_smoothing_parameter_effect():
    # Check that increasing smoothing moves NPMI toward 0
    # Use no smoothing and large smoothing for same input
    codeflash_output = calculate_smoothed_npmi(2, 4, 4, 8, smoothing=0); raw = codeflash_output # 4.38μs -> 3.56μs (22.8% faster)
    codeflash_output = calculate_smoothed_npmi(2, 4, 4, 8, smoothing=10); smoothed = codeflash_output # 1.75μs -> 1.36μs (29.0% faster)

def test_basic_return_type_and_range():
    # NPMI should always be float and in [-1, 1] or NaN
    codeflash_output = calculate_smoothed_npmi(3, 7, 5, 10); result = codeflash_output # 4.37μs -> 3.68μs (18.9% faster)

# Edge Test Cases

def test_edge_zero_total_count():
    # total_count=0 is undefined, should return NaN
    codeflash_output = calculate_smoothed_npmi(0, 0, 0, 0); result = codeflash_output # 1.18μs -> 1.13μs (4.61% faster)

def test_edge_perfect_anti_cooccurrence():
    # Events never co-occur, but always occur separately
    # n11=0, n10=10, n01=10, n00=0, total=20
    codeflash_output = calculate_smoothed_npmi(0, 10, 10, 20); result = codeflash_output # 4.49μs -> 3.76μs (19.2% faster)

def test_edge_large_smoothing_for_zero_counts():
    # Smoothing should allow calculation even when all counts are zero
    codeflash_output = calculate_smoothed_npmi(0, 0, 0, 10, smoothing=1.0); result = codeflash_output # 4.98μs -> 4.23μs (17.7% faster)

# Large Scale Test Cases

def test_large_scale_balanced_counts():
    # Large, balanced counts, events occur independently
    # n11=250, f1=500, f2=500, total=1000
    codeflash_output = calculate_smoothed_npmi(250, 500, 500, 1000); result = codeflash_output # 4.41μs -> 3.68μs (19.7% faster)

def test_large_scale_perfect_cooccurrence():
    # Large perfect co-occurrence
    codeflash_output = calculate_smoothed_npmi(500, 500, 500, 500) # 2.47μs -> 2.09μs (18.3% faster)

def test_large_scale_no_cooccurrence():
    # Large no co-occurrence
    codeflash_output = calculate_smoothed_npmi(0, 500, 500, 1000); result = codeflash_output # 4.39μs -> 3.90μs (12.7% faster)

def test_large_scale_sparse_joint():
    # Large counts, but joint occurrence is rare
    codeflash_output = calculate_smoothed_npmi(5, 500, 500, 1000); result = codeflash_output # 4.37μs -> 3.72μs (17.6% faster)

def test_large_scale_high_smoothing():
    # Large counts, high smoothing parameter
    codeflash_output = calculate_smoothed_npmi(250, 500, 500, 1000, smoothing=100); result = codeflash_output # 4.52μs -> 3.87μs (16.7% faster)

def test_large_scale_edge_case_maximum_counts():
    # All counts at maximum, events always present
    codeflash_output = calculate_smoothed_npmi(1000, 1000, 1000, 1000); result = codeflash_output # 2.43μs -> 2.21μs (9.94% faster)

def test_large_scale_edge_case_all_zero():
    # All counts zero, should return NaN
    codeflash_output = calculate_smoothed_npmi(0, 0, 0, 0); result = codeflash_output # 1.17μs -> 1.13μs (3.19% faster)

def test_large_scale_randomized_consistency():
    # Test a variety of random but valid large inputs for determinism and range
    for joint_count in [0, 50, 250, 500, 999]:
        for filter1_count in [joint_count, 500, 1000]:
            for filter2_count in [joint_count, 500, 1000]:
                total_count = max(filter1_count, filter2_count)
                total_count = max(total_count, joint_count)
                total_count = max(total_count, filter1_count + filter2_count - joint_count)
                # Ensure valid counts
                codeflash_output = calculate_smoothed_npmi(joint_count, filter1_count, filter2_count, total_count); result = codeflash_output
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import math

# imports
import pytest
from mlflow.store.analytics.trace_correlation import calculate_smoothed_npmi

# function to test (imported above); module-level constant pasted from the
# implementation for reference
JEFFREYS_PRIOR = 0.5

# unit tests

# ---- Basic Test Cases ----

def test_basic_independent_events():
    # Two events occur independently: joint_count = 25, filter1 = 50, filter2 = 50, total = 100
    # Expect NPMI near 0 (independent)
    codeflash_output = calculate_smoothed_npmi(25, 50, 50, 100); result = codeflash_output # 4.31μs -> 3.72μs (15.8% faster)

def test_basic_perfect_cooccurrence():
    # Both events always occur together: joint_count = filter1 = filter2 = total
    codeflash_output = calculate_smoothed_npmi(100, 100, 100, 100); result = codeflash_output # 2.48μs -> 2.11μs (17.6% faster)

def test_basic_never_cooccur():
    # Events never co-occur: joint_count = 0, filter1 = 50, filter2 = 50, total = 100
    codeflash_output = calculate_smoothed_npmi(0, 50, 50, 100); result = codeflash_output # 4.47μs -> 3.90μs (14.8% faster)

def test_basic_partial_overlap():
    # Events overlap partially: joint_count = 10, filter1 = 50, filter2 = 30, total = 100
    codeflash_output = calculate_smoothed_npmi(10, 50, 30, 100); result = codeflash_output # 4.34μs -> 3.71μs (17.3% faster)

def test_basic_no_overlap():
    # Events do not overlap: joint_count = 0, filter1 = 20, filter2 = 30, total = 100
    codeflash_output = calculate_smoothed_npmi(0, 20, 30, 100); result = codeflash_output # 4.36μs -> 3.79μs (15.2% faster)

def test_basic_all_zero_counts():
    # All counts zero (degenerate)
    codeflash_output = calculate_smoothed_npmi(0, 0, 0, 0); result = codeflash_output # 1.17μs -> 1.10μs (6.75% faster)

def test_basic_smoothing_parameter_effect():
    # Check that increasing smoothing moves NPMI closer to 0
    codeflash_output = calculate_smoothed_npmi(0, 20, 30, 100, smoothing=0); result_no_smooth = codeflash_output # 2.94μs -> 2.49μs (17.9% faster)
    codeflash_output = calculate_smoothed_npmi(0, 20, 30, 100, smoothing=10); result_smooth = codeflash_output # 3.30μs -> 2.70μs (21.9% faster)

# ---- Edge Test Cases ----

def test_edge_negative_counts():
    # Negative joint count (invalid)
    codeflash_output = calculate_smoothed_npmi(-1, 10, 10, 20); result = codeflash_output # 1.92μs -> 1.49μs (28.4% faster)

def test_edge_negative_total_count():
    # Negative total count (invalid)
    codeflash_output = calculate_smoothed_npmi(5, 10, 10, -10); result = codeflash_output # 1.16μs -> 1.07μs (8.49% faster)

def test_edge_filter1_less_than_joint():
    # filter1 < joint_count (invalid)
    codeflash_output = calculate_smoothed_npmi(11, 10, 10, 20); result = codeflash_output # 1.88μs -> 1.50μs (24.6% faster)

def test_edge_filter2_less_than_joint():
    # filter2 < joint_count (invalid)
    codeflash_output = calculate_smoothed_npmi(11, 10, 10, 20); result = codeflash_output # 1.90μs -> 1.48μs (28.1% faster)

def test_edge_total_less_than_filters():
    # total < filter1 + filter2 - joint_count (invalid)
    codeflash_output = calculate_smoothed_npmi(5, 10, 10, 10); result = codeflash_output # 2.03μs -> 1.63μs (24.2% faster)

def test_edge_zero_joint_count_with_smoothing():
    # Zero joint count but smoothing should make NPMI defined
    codeflash_output = calculate_smoothed_npmi(0, 10, 10, 20, smoothing=1); result = codeflash_output # 4.67μs -> 3.91μs (19.5% faster)

def test_edge_zero_joint_count_no_smoothing():
    # Zero joint count and no smoothing: should be undefined (NaN)
    codeflash_output = calculate_smoothed_npmi(0, 10, 10, 20, smoothing=0); result = codeflash_output # 2.99μs -> 2.65μs (12.7% faster)

def test_edge_joint_count_equals_total():
    # joint_count == total, filter1 == filter2 == total
    codeflash_output = calculate_smoothed_npmi(20, 20, 20, 20); result = codeflash_output # 2.48μs -> 2.24μs (10.7% faster)

def test_edge_joint_count_greater_than_total():
    # joint_count > total (invalid)
    codeflash_output = calculate_smoothed_npmi(21, 20, 20, 20); result = codeflash_output # 1.98μs -> 1.65μs (20.0% faster)

def test_edge_all_counts_zero_but_total_positive():
    # All counts zero except total
    codeflash_output = calculate_smoothed_npmi(0, 0, 0, 10); result = codeflash_output # 4.58μs -> 3.98μs (15.1% faster)

def test_edge_extreme_smoothing():
    # Large smoothing parameter should yield NPMI close to zero
    codeflash_output = calculate_smoothed_npmi(0, 0, 0, 10, smoothing=1000); result = codeflash_output # 4.75μs -> 3.97μs (19.7% faster)

def test_edge_float_inputs():
    # Accept float inputs (should work as ints are cast to float internally)
    codeflash_output = calculate_smoothed_npmi(5.0, 10.0, 10.0, 20.0); result = codeflash_output # 4.07μs -> 3.39μs (20.2% faster)

def test_edge_large_joint_count_small_total():
    # joint_count == total, but filters much larger (invalid)
    codeflash_output = calculate_smoothed_npmi(10, 20, 20, 10); result = codeflash_output # 2.07μs -> 1.75μs (17.8% faster)

# ---- Large Scale Test Cases ----

def test_large_scale_balanced():
    # Large balanced dataset: joint_count = 500, filter1 = 1000, filter2 = 1000, total = 2000
    codeflash_output = calculate_smoothed_npmi(500, 1000, 1000, 2000); result = codeflash_output # 4.47μs -> 3.83μs (16.8% faster)

def test_large_scale_perfect_cooccurrence():
    # All events co-occur: joint_count = filter1 = filter2 = total = 1000
    codeflash_output = calculate_smoothed_npmi(1000, 1000, 1000, 1000); result = codeflash_output # 2.47μs -> 2.18μs (13.3% faster)

def test_large_scale_never_cooccur():
    # Events never co-occur: joint_count = 0, filter1 = 500, filter2 = 500, total = 1000
    codeflash_output = calculate_smoothed_npmi(0, 500, 500, 1000); result = codeflash_output # 4.60μs -> 3.94μs (16.5% faster)

def test_large_scale_partial_overlap():
    # Partial overlap: joint_count = 100, filter1 = 600, filter2 = 400, total = 1000
    codeflash_output = calculate_smoothed_npmi(100, 600, 400, 1000); result = codeflash_output # 4.49μs -> 3.84μs (17.0% faster)

def test_large_scale_sparse_events():
    # Sparse events: joint_count = 1, filter1 = 10, filter2 = 10, total = 1000
    codeflash_output = calculate_smoothed_npmi(1, 10, 10, 1000); result = codeflash_output # 4.43μs -> 3.82μs (16.0% faster)

def test_large_scale_extreme_smoothing():
    # Large smoothing parameter
    codeflash_output = calculate_smoothed_npmi(0, 0, 0, 1000, smoothing=1000); result = codeflash_output # 4.70μs -> 4.00μs (17.6% faster)

def test_large_scale_randomized_counts():
    # Random valid counts, should always return value in [-1, 1]
    for joint_count in range(0, 1000, 250):
        for filter1_count in range(joint_count, 1000, 250):
            for filter2_count in range(joint_count, 1000, 250):
                total_count = max(filter1_count, filter2_count) + 100
                codeflash_output = calculate_smoothed_npmi(joint_count, filter1_count, filter2_count, total_count); result = codeflash_output

def test_large_scale_invalid_counts():
    # Large invalid counts should return NaN
    codeflash_output = calculate_smoothed_npmi(1001, 1000, 1000, 1000); result = codeflash_output # 2.06μs -> 1.59μs (29.1% faster)

def test_large_scale_zero_total():
    # Large scale zero total
    codeflash_output = calculate_smoothed_npmi(0, 0, 0, 0); result = codeflash_output # 1.20μs -> 1.11μs (8.05% faster)

def test_large_scale_joint_count_equals_total():
    # joint_count == total, filter1 == filter2 == total
    codeflash_output = calculate_smoothed_npmi(999, 999, 999, 999); result = codeflash_output # 2.45μs -> 2.23μs (10.0% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from mlflow.store.analytics.trace_correlation import calculate_smoothed_npmi

def test_calculate_smoothed_npmi():
    calculate_smoothed_npmi(1, 2, 2, 3, smoothing=0.0)

def test_calculate_smoothed_npmi_2():
    calculate_smoothed_npmi(0, 2, 0, 1, smoothing=0.0)

def test_calculate_smoothed_npmi_3():
    calculate_smoothed_npmi(0, 0, 0, 0, smoothing=0.0)

To edit these changes, run git checkout codeflash/optimize-calculate_smoothed_npmi-mhx22r5p and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 13, 2025 06:36
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: Medium Optimization Quality according to Codeflash labels Nov 13, 2025