@codeflash-ai codeflash-ai bot commented Nov 13, 2025

📄 17% (0.17x) speedup for _get_indentation_of_key in mlflow/utils/docstring_utils.py

⏱️ Runtime : 76.2 microseconds → 64.8 microseconds (best of 65 runs)

📝 Explanation and details

The optimization replaces a conditional expression with an early return pattern and swaps the operand order in string multiplication. Here's why it's faster:

Key Optimizations:

  1. Early return for common cases: The optimized version immediately returns "" when index <= 0 (covering both not-found cases where index == -1 and zero-indentation cases where index == 0). This avoids the overhead of string multiplication entirely for these frequent scenarios.

  2. Operand order: Changed from index * " " to " " * index. The gain from this swap is not isolated by the per-test timings in this report. In CPython both orders end up in the same string-repetition routine, so any difference is likely small next to the early return.
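
The two changes can be sketched side by side. The diff itself is not shown in this description, so both bodies below are reconstructions from the text above: the names `indentation_before` and `indentation_after` are hypothetical, and the `index != -1` condition in the first function is an assumption about what the original conditional expression looked like.

```python
def indentation_before(line: str, placeholder: str) -> str:
    # Assumed original shape: a conditional expression using int * str.
    index = line.find(placeholder)
    return (index * " ") if index != -1 else ""


def indentation_after(line: str, placeholder: str) -> str:
    # Shape described in the explanation: early return, then str * int.
    index = line.find(placeholder)
    if index <= 0:
        # Covers "not found" (-1) and "placeholder at column 0" (0).
        return ""
    return " " * index


# Both variants should agree on every input, since 0 * " " is also "".
for line, key in [("key: v", "key"), ("    key: v", "key"), ("no match", "key")]:
    assert indentation_before(line, key) == indentation_after(line, key)
```

Under these assumptions the early return changes only how the `index == 0` result is produced, not what is returned.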

Performance Impact by Test Case:

  • Best gains (roughly 23-45% faster): Cases where the placeholder is at index 0 (empty placeholders, placeholders at start). Not-found cases gain less in the tests below, ranging from about 3% slower to 19% faster.
  • Moderate gains (10-20% faster): Cases requiring actual string multiplication
  • Minimal impact: Very long strings where find() dominates runtime
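
A rough way to reproduce this kind of per-case comparison locally is a small `timeit` harness. This is only a sketch: the two function bodies are stand-ins reconstructed from the description (not imported from MLflow), the helper `measure` and the `CASES` table are hypothetical, and absolute numbers will differ by machine and Python version.

```python
import timeit


def with_conditional(line, placeholder):
    # Assumed pre-change shape.
    index = line.find(placeholder)
    return (index * " ") if index != -1 else ""


def with_early_return(line, placeholder):
    # Shape described in the explanation.
    index = line.find(placeholder)
    if index <= 0:
        return ""
    return " " * index


CASES = {
    "index 0": ("key: value", "key"),
    "indented": ("    key: value", "key"),
    "not found": ("some other text", "key"),
    "long line": (" " * 997 + "key", "key"),
}


def measure(func, args, number=100_000, repeat=3):
    # Best-of-N time per call in nanoseconds (includes lambda call overhead).
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e9


if __name__ == "__main__":
    for name, args in CASES.items():
        before = measure(with_conditional, args)
        after = measure(with_early_return, args)
        print(f"{name:10s} {before:8.1f} ns -> {after:8.1f} ns")
```

Taking the minimum over repeats mirrors the "best of N runs" convention used in the runtime figure above.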

Hot Path Context:
Based on function_references, this function is called from _replace_all() in docstring processing, where it runs in a loop over multiple replacements. The optimization is particularly valuable here because:

  • Docstring processing likely encounters many placeholders at the start of lines (common indentation patterns)
  • The function gets called repeatedly during template replacement operations
  • Early returns reduce cumulative overhead across multiple calls
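
To illustrate why a per-call saving can add up, here is a simplified stand-in for a replacement loop. It is not MLflow's `_replace_all`: the function `replace_all_sketch`, its line-by-line strategy, and the `{{ x }}` placeholder syntax are all assumptions made for the example.

```python
def get_indentation_of_key(line: str, placeholder: str) -> str:
    # Local stand-in following the description above.
    index = line.find(placeholder)
    if index <= 0:
        return ""
    return " " * index


def replace_all_sketch(text: str, replacements: dict) -> str:
    # Hypothetical loop: one indentation lookup per (line, matching key) pair.
    out = []
    for line in text.splitlines():
        for key, value in replacements.items():
            if key in line:
                indent = get_indentation_of_key(line, key)
                parts = value.splitlines() or [""]
                # Continuation lines get the placeholder's indentation.
                indented = "\n".join([parts[0]] + [indent + p for p in parts[1:]])
                line = line.replace(key, indented)
        out.append(line)
    return "\n".join(out)


doc = "Args:\n    {{ x }}"
print(replace_all_sketch(doc, {"{{ x }}": "x: first line\nsecond line"}))
```

In a loop like this the helper runs once per matching placeholder, so the number of calls grows with the number of placeholders in the docstring.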

The 17% overall speedup makes this optimization worthwhile for MLflow's docstring processing pipeline, especially given the function's usage in text replacement workflows.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 64 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import pytest  # used for our unit tests
from mlflow.utils.docstring_utils import _get_indentation_of_key

# unit tests

# 1. Basic Test Cases

def test_basic_no_indentation():
    # No leading spaces, placeholder at index 0
    codeflash_output = _get_indentation_of_key("key: value", "key") # 1.25μs -> 932ns (34.7% faster)

def test_basic_single_space_indentation():
    # One leading space before placeholder
    codeflash_output = _get_indentation_of_key(" key: value", "key") # 1.20μs -> 1.03μs (15.8% faster)

def test_basic_multiple_spaces_indentation():
    # Multiple spaces before placeholder
    codeflash_output = _get_indentation_of_key("    key: value", "key") # 1.17μs -> 1.06μs (10.2% faster)

def test_basic_placeholder_not_found():
    # Placeholder not present in line
    codeflash_output = _get_indentation_of_key("some other text", "key") # 1.05μs -> 927ns (13.7% faster)

def test_basic_placeholder_in_middle():
    # Placeholder not at start, but after some text
    codeflash_output = _get_indentation_of_key("abc key: value", "key") # 1.25μs -> 1.07μs (16.5% faster)

def test_basic_placeholder_at_end():
    # Placeholder at end of line
    codeflash_output = _get_indentation_of_key("value is key", "key") # 1.23μs -> 1.08μs (14.2% faster)

def test_basic_multiple_occurrences():
    # Multiple occurrences, should use the first
    codeflash_output = _get_indentation_of_key("  key and key", "key") # 1.20μs -> 1.04μs (14.7% faster)

# 2. Edge Test Cases

def test_edge_empty_line():
    # Empty line, placeholder can't be found
    codeflash_output = _get_indentation_of_key("", "key") # 837ns -> 823ns (1.70% faster)

def test_edge_empty_placeholder():
    # Empty placeholder, should always match at index 0
    codeflash_output = _get_indentation_of_key("anything", "") # 1.13μs -> 819ns (37.9% faster)

def test_edge_placeholder_is_space():
    # Placeholder is a space, should find first space
    codeflash_output = _get_indentation_of_key("  key", " ") # 1.11μs -> 804ns (37.7% faster)

def test_edge_placeholder_is_tab():
    # Placeholder is a tab character
    codeflash_output = _get_indentation_of_key("\tkey", "\t") # 1.11μs -> 797ns (39.1% faster)

def test_edge_line_is_only_placeholder():
    # Line is exactly the placeholder
    codeflash_output = _get_indentation_of_key("key", "key") # 1.24μs -> 885ns (39.8% faster)

def test_edge_placeholder_longer_than_line():
    # Placeholder longer than line
    codeflash_output = _get_indentation_of_key("ab", "abc") # 906ns -> 782ns (15.9% faster)

def test_edge_unicode_characters():
    # Unicode characters in line and placeholder
    codeflash_output = _get_indentation_of_key("   κλειδί: value", "κλειδί") # 1.35μs -> 1.17μs (15.0% faster)

def test_edge_placeholder_at_start_with_leading_tabs():
    # Leading tabs before placeholder
    codeflash_output = _get_indentation_of_key("\t\tkey", "key") # 1.25μs -> 1.06μs (17.2% faster)

def test_edge_placeholder_is_newline():
    # Placeholder is a newline character
    codeflash_output = _get_indentation_of_key("abc\nkey", "\n") # 1.13μs -> 933ns (20.7% faster)

def test_edge_placeholder_multiple_characters():
    # Placeholder is multiple characters and appears in the middle
    codeflash_output = _get_indentation_of_key("   foo_bar_baz", "bar") # 1.24μs -> 1.03μs (19.3% faster)

def test_edge_line_is_only_spaces():
    # Line is only spaces, placeholder not found
    codeflash_output = _get_indentation_of_key("     ", "key") # 1.01μs -> 896ns (12.6% faster)

def test_edge_placeholder_is_empty_and_line_is_empty():
    # Both line and placeholder are empty
    codeflash_output = _get_indentation_of_key("", "") # 1.12μs -> 833ns (34.9% faster)

def test_edge_placeholder_is_whitespace():
    # Placeholder is whitespace, should match first whitespace
    codeflash_output = _get_indentation_of_key("   key", " ") # 1.12μs -> 790ns (42.4% faster)

def test_edge_placeholder_is_special_char():
    # Placeholder is a special character
    codeflash_output = _get_indentation_of_key("abc$def", "$") # 1.05μs -> 902ns (16.1% faster)

def test_edge_placeholder_not_found_in_long_line():
    # Long line, placeholder not present
    codeflash_output = _get_indentation_of_key("a" * 100, "zzz") # 1.11μs -> 998ns (11.6% faster)

# 3. Large Scale Test Cases

def test_large_line_with_placeholder_at_start():
    # Large line, placeholder at start
    line = "key" + " " * 997
    codeflash_output = _get_indentation_of_key(line, "key") # 1.22μs -> 919ns (33.0% faster)

def test_large_line_with_placeholder_at_end():
    # Large line, placeholder at end
    line = " " * 997 + "key"
    codeflash_output = _get_indentation_of_key(line, "key") # 1.75μs -> 1.71μs (2.10% faster)

def test_large_line_with_placeholder_in_middle():
    # Large line, placeholder in the middle
    line = " " * 500 + "key" + " " * 497
    codeflash_output = _get_indentation_of_key(line, "key") # 1.47μs -> 1.33μs (10.6% faster)

def test_large_line_with_multiple_placeholders():
    # Large line, multiple occurrences, should use the first
    line = " " * 100 + "key" + " " * 100 + "key"
    codeflash_output = _get_indentation_of_key(line, "key") # 1.27μs -> 1.12μs (13.9% faster)

def test_large_line_no_placeholder():
    # Large line, no placeholder
    line = " " * 1000
    codeflash_output = _get_indentation_of_key(line, "key") # 1.19μs -> 1.10μs (7.24% faster)

def test_large_placeholder_at_various_positions():
    # Test placeholder at several positions in large line
    for idx in [0, 250, 500, 999]:
        line = " " * idx + "key" + " " * (1000 - idx - 3)
        codeflash_output = _get_indentation_of_key(line, "key") # 2.94μs -> 2.59μs (13.5% faster)

def test_large_line_with_non_ascii_placeholder():
    # Large line, unicode placeholder
    line = " " * 500 + "κλειδί" + " " * 494
    codeflash_output = _get_indentation_of_key(line, "κλειδί") # 1.63μs -> 1.46μs (11.8% faster)

def test_large_line_placeholder_not_found():
    # Large line, placeholder not present
    line = "a" * 1000
    codeflash_output = _get_indentation_of_key(line, "key") # 1.26μs -> 1.15μs (9.45% faster)

def test_large_line_placeholder_is_empty():
    # Large line, empty placeholder, should match at index 0
    line = " " * 1000
    codeflash_output = _get_indentation_of_key(line, "") # 1.13μs -> 826ns (37.2% faster)

def test_large_line_placeholder_is_space():
    # Large line, placeholder is a space, should match at index 0
    line = " " * 1000
    codeflash_output = _get_indentation_of_key(line, " ") # 1.15μs -> 846ns (35.9% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import pytest  # used for our unit tests
from mlflow.utils.docstring_utils import _get_indentation_of_key

# unit tests

# -------------------------------
# 1. Basic Test Cases
# -------------------------------

def test_basic_indentation_at_start():
    # Placeholder at the start, so zero indentation
    codeflash_output = _get_indentation_of_key("PLACEHOLDER rest of line", "PLACEHOLDER") # 1.23μs -> 995ns (23.1% faster)

def test_basic_indentation_with_spaces():
    # Placeholder after 4 spaces
    codeflash_output = _get_indentation_of_key("    PLACEHOLDER rest of line", "PLACEHOLDER") # 1.30μs -> 1.10μs (18.2% faster)

def test_basic_indentation_with_tabs():
    # Each tab counts as one character, so two tabs give an index of 2
    codeflash_output = _get_indentation_of_key("\t\tPLACEHOLDER", "PLACEHOLDER") # 1.26μs -> 1.09μs (15.7% faster)

def test_basic_indentation_middle_of_line():
    # Placeholder not at the start, but after some text
    codeflash_output = _get_indentation_of_key("abc PLACEHOLDER xyz", "PLACEHOLDER") # 1.24μs -> 1.11μs (11.1% faster)

def test_basic_no_placeholder():
    # Placeholder not found, so no indentation
    codeflash_output = _get_indentation_of_key("    No placeholder here", "PLACEHOLDER") # 1.04μs -> 929ns (12.4% faster)

def test_basic_empty_line():
    # Empty line, placeholder can't be found
    codeflash_output = _get_indentation_of_key("", "PLACEHOLDER") # 915ns -> 769ns (19.0% faster)

def test_basic_empty_placeholder():
    # Empty placeholder, always found at start
    codeflash_output = _get_indentation_of_key("abc", "") # 1.12μs -> 851ns (31.6% faster)

def test_basic_placeholder_at_end():
    # Placeholder at the end, indentation up to that point
    codeflash_output = _get_indentation_of_key("    something PLACEHOLDER", "PLACEHOLDER") # 1.25μs -> 1.10μs (13.9% faster)

def test_basic_multiple_occurrences():
    # Only first occurrence matters
    codeflash_output = _get_indentation_of_key("  PLACEHOLDER and PLACEHOLDER again", "PLACEHOLDER") # 1.27μs -> 1.05μs (20.8% faster)

# -------------------------------
# 2. Edge Test Cases
# -------------------------------

def test_edge_placeholder_is_space():
    # Placeholder is a space, so indentation is up to first space
    codeflash_output = _get_indentation_of_key("    abc", " ") # 1.07μs -> 821ns (29.8% faster)

def test_edge_placeholder_is_empty_and_line_is_empty():
    # Both empty, so placeholder found at start
    codeflash_output = _get_indentation_of_key("", "") # 1.15μs -> 795ns (45.0% faster)

def test_edge_placeholder_longer_than_line():
    # Placeholder longer than line, can't be found
    codeflash_output = _get_indentation_of_key("abc", "abcdef") # 867ns -> 828ns (4.71% faster)

def test_edge_line_all_spaces():
    # Line is all spaces, placeholder not found
    codeflash_output = _get_indentation_of_key("     ", "PLACEHOLDER") # 922ns -> 812ns (13.5% faster)

def test_edge_line_is_placeholder_only():
    # Line is exactly the placeholder, so indentation is zero
    codeflash_output = _get_indentation_of_key("PLACEHOLDER", "PLACEHOLDER") # 1.21μs -> 940ns (28.2% faster)

def test_edge_placeholder_not_ascii():
    # Unicode placeholder
    codeflash_output = _get_indentation_of_key("  你好 rest", "你好") # 1.24μs -> 1.06μs (16.6% faster)

def test_edge_placeholder_overlapping():
    # Two adjacent occurrences of the placeholder; only the first matters
    codeflash_output = _get_indentation_of_key("xxxPLACEHOLDERPLACEHOLDER", "PLACEHOLDER") # 1.28μs -> 1.10μs (16.2% faster)

def test_edge_placeholder_at_index_zero():
    # Placeholder at very start
    codeflash_output = _get_indentation_of_key("PLACEHOLDERabc", "PLACEHOLDER") # 1.24μs -> 949ns (30.6% faster)

def test_edge_placeholder_is_newline():
    # Placeholder is newline, so indentation is up to first newline
    codeflash_output = _get_indentation_of_key("abc\ndef", "\n") # 1.09μs -> 948ns (14.6% faster)

def test_edge_placeholder_is_special_characters():
    # Placeholder is special characters
    codeflash_output = _get_indentation_of_key("##@@!!PLACEHOLDER", "PLACEHOLDER") # 1.25μs -> 1.09μs (14.5% faster)

# -------------------------------
# 3. Large Scale Test Cases
# -------------------------------

def test_large_scale_long_line_with_placeholder_at_start():
    # Line with 1000 characters, placeholder at start
    line = "PLACEHOLDER" + "x" * 990
    codeflash_output = _get_indentation_of_key(line, "PLACEHOLDER") # 1.25μs -> 935ns (33.2% faster)

def test_large_scale_long_line_with_placeholder_at_end():
    # Line with 1000 characters, placeholder at end
    line = "x" * 990 + "PLACEHOLDER"
    codeflash_output = _get_indentation_of_key(line, "PLACEHOLDER") # 1.76μs -> 1.60μs (10.1% faster)

def test_large_scale_long_line_with_placeholder_in_middle():
    # Placeholder at position 500
    line = "x" * 500 + "PLACEHOLDER" + "y" * 490
    codeflash_output = _get_indentation_of_key(line, "PLACEHOLDER") # 1.50μs -> 1.44μs (4.30% faster)

def test_large_scale_many_spaces_before_placeholder():
    # 999 spaces before placeholder
    line = " " * 999 + "PLACEHOLDER"
    codeflash_output = _get_indentation_of_key(line, "PLACEHOLDER") # 1.71μs -> 1.57μs (9.24% faster)

def test_large_scale_no_placeholder_in_long_line():
    # Long line, no placeholder
    line = "x" * 1000
    codeflash_output = _get_indentation_of_key(line, "PLACEHOLDER") # 1.11μs -> 1.14μs (2.98% slower)

def test_large_scale_placeholder_is_long_string():
    # Placeholder is 100 characters, found at position 50
    placeholder = "A" * 100
    line = "x" * 50 + placeholder + "y" * 850
    codeflash_output = _get_indentation_of_key(line, placeholder) # 1.56μs -> 1.38μs (13.0% faster)

def test_large_scale_multiple_placeholders():
    # Multiple placeholders, only first matters
    line = " " * 100 + "PLACEHOLDER" + " " * 200 + "PLACEHOLDER"
    codeflash_output = _get_indentation_of_key(line, "PLACEHOLDER") # 1.28μs -> 1.14μs (12.3% faster)

def test_large_scale_placeholder_not_found():
    # Placeholder not found in a large line
    line = "a" * 999 + "b"
    codeflash_output = _get_indentation_of_key(line, "PLACEHOLDER") # 1.16μs -> 1.04μs (11.1% faster)

def test_large_scale_placeholder_at_last_possible_index():
    # Placeholder at index 999-10=989
    line = "x" * 989 + "PLACEHOLDER"
    codeflash_output = _get_indentation_of_key(line, "PLACEHOLDER") # 1.71μs -> 1.65μs (3.21% faster)

def test_large_scale_line_is_placeholder_repeated():
    # Line is placeholder repeated 100 times
    placeholder = "P"
    line = placeholder * 100
    # First occurrence at index 0
    codeflash_output = _get_indentation_of_key(line, placeholder) # 1.12μs -> 829ns (35.7% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from mlflow.utils.docstring_utils import _get_indentation_of_key

def test__get_indentation_of_key():
    _get_indentation_of_key('', '')

def test__get_indentation_of_key_2():
    _get_indentation_of_key('', '\x00')
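
The generated tests above show only the calls; the comparison of outputs happens in the harness via `codeflash_output`. For readers who want explicit expectations, here is a minimal sketch with a few of the same inputs. The function is a local stand-in that follows the behaviour described in the explanation (it is not imported from MLflow), and the `EXPECTED` table is hypothetical.

```python
def get_indentation_of_key(line: str, placeholder: str) -> str:
    # Local stand-in: "" when not found or at column 0, else index spaces.
    index = line.find(placeholder)
    if index <= 0:
        return ""
    return " " * index


# (line, placeholder, expected indentation)
EXPECTED = [
    ("key: value", "key", ""),            # placeholder at index 0
    (" key: value", "key", " "),          # one leading space
    ("    key: value", "key", "    "),    # four leading spaces
    ("some other text", "key", ""),       # not found
    ("abc key: value", "key", "    "),    # index 4, preceded by text
    ("\t\tkey", "key", "  "),             # two tabs -> two spaces
    ("", "", ""),                         # empty placeholder matches at 0
    (" " * 997 + "key", "key", " " * 997),
]


def test_expected_indentation():
    for line, placeholder, expected in EXPECTED:
        assert get_indentation_of_key(line, placeholder) == expected


test_expected_indentation()
```

Note that the result is always made of spaces, even when the characters before the placeholder are tabs or other text.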

To edit these changes, run git checkout codeflash/optimize-_get_indentation_of_key-mhx7axwz and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 13, 2025 09:02
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Nov 13, 2025