[fix](be) Avoid local runtime filter merge deadlock #65102

Open
BiteTheDDDDt wants to merge 1 commit into apache:branch-4.1 from BiteTheDDDDt:codex/pick-64866-branch-4.1

@BiteTheDDDDt

Contributor

What problem does this PR solve?

Issue Number: None

Related PR: #64866

Problem Summary: Backport #64866 to branch-4.1. Local runtime filter merge can deadlock because the old local merge context lock protected both the merger and the producer list, allowing lock inversion with the producer and merger locks. This backport snapshots the local merge context and its producers under the RuntimeFilterMgr lock, then performs the merge, size, and debug work after releasing it.

Release note

None

Check List (For Author)

  • Test: Unit Test
    • ./run-be-ut.sh --run --filter=RuntimeFilterMgrTest.*
    • ./run-be-ut.sh --run --filter=RuntimeFilterMergerTest.*
    • git diff --check upstream/branch-4.1..HEAD
  • Behavior changed: No
  • Does this need documentation: No

Issue Number: None

Related PR: None

Problem Summary: Local runtime filter merge can deadlock when one join
build instance publishes a local-merge runtime filter while another
instance sends its runtime filter size. The old local merge context lock
protected both the merger and the producer list, so one path could hold
a producer runtime filter lock and then wait for the context lock while
another path held the context lock and then waited for a producer lock.

This change gives RuntimeFilterMerger its own internal synchronization
and makes LocalMergeContext expose a snapshot of the merger and
producers. Publish, send-size, and sync-size paths take the context lock
only while copying that snapshot, then merge filters or update producer
sizes outside the context lock. RuntimeFilterMerger returns the ready
transition from merge_from directly, removing the separate unlocked
ready check.
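
The snapshot approach described above can be sketched roughly as follows. This is a minimal illustration, not the Doris implementation: `Producer`, `Merger`, `Snapshot`, `publish`, and `sync_size` are hypothetical stand-ins, the "filter" is reduced to an integer sum, and only the names `LocalMergeContext` and `merge_from` are borrowed from the description. It assumes each object guards its own state with its own mutex, so the context lock is never held while another lock is acquired.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical producer: guards its own size with its own lock.
struct Producer {
    void set_size(int s) {
        std::lock_guard<std::mutex> guard(_mtx);
        _size = s;
    }
    int get_size() {
        std::lock_guard<std::mutex> guard(_mtx);
        return _size;
    }

private:
    std::mutex _mtx;
    int _size = 0;
};

// Hypothetical merger with internal synchronization.
class Merger {
public:
    explicit Merger(int expected) : _expected(expected) {}

    // Returns true only for the call that completes the merge, so the
    // caller does not need a separate (unlocked) ready check afterwards.
    bool merge_from(int value) {
        std::lock_guard<std::mutex> guard(_mtx);
        _sum += value;
        ++_received;
        return _received == _expected;
    }

    int sum() {
        std::lock_guard<std::mutex> guard(_mtx);
        return _sum;
    }

private:
    std::mutex _mtx;
    int _expected;
    int _received = 0;
    int _sum = 0;
};

// Copy of the shared pointers, safe to use after the context lock is released.
struct Snapshot {
    std::shared_ptr<Merger> merger;
    std::vector<std::shared_ptr<Producer>> producers;
};

class LocalMergeContext {
public:
    explicit LocalMergeContext(int expected)
            : _merger(std::make_shared<Merger>(expected)) {}

    void add_producer(std::shared_ptr<Producer> p) {
        std::lock_guard<std::mutex> guard(_mtx);
        _producers.push_back(std::move(p));
    }

    // The context lock is held only while copying the pointers.
    Snapshot snapshot() {
        std::lock_guard<std::mutex> guard(_mtx);
        return Snapshot{_merger, _producers};
    }

private:
    std::mutex _mtx;
    std::shared_ptr<Merger> _merger;
    std::vector<std::shared_ptr<Producer>> _producers;
};

// Publish path: merge outside the context lock.
bool publish(LocalMergeContext& ctx, int value) {
    Snapshot snap = ctx.snapshot();
    return snap.merger->merge_from(value);
}

// Sync-size path: update producers outside the context lock.
void sync_size(LocalMergeContext& ctx, int size) {
    Snapshot snap = ctx.snapshot();
    for (auto& producer : snap.producers) {
        producer->set_size(size);
    }
}
```

In this sketch no code path holds the context mutex while taking a producer or merger mutex, so the two lock orders that caused the inversion cannot both occur. The trade-off is that a snapshot may be slightly stale by the time it is used, which is acceptable here because the shared pointers keep the merger and producers alive.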

Release note: None

- Test: Unit Test
    - build-support/clang-format.sh be/src/exec/runtime_filter/runtime_filter_merger.h be/src/exec/runtime_filter/runtime_filter_mgr.cpp be/src/exec/runtime_filter/runtime_filter_mgr.h be/src/exec/runtime_filter/runtime_filter_producer.cpp be/test/exec/runtime_filter/runtime_filter_merger_test.cpp be/test/exec/runtime_filter/runtime_filter_mgr_test.cpp
    - git diff --cached --check
    - ./run-be-ut.sh --run --filter=RuntimeFilterMgrTest.*
    - ./run-be-ut.sh --run --filter=RuntimeFilterMergerTest.*
- Behavior changed: No
- Does this need documentation: No

(cherry picked from commit 9d7d3a2)

@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@BiteTheDDDDt

Contributor Author

run buildall

@BiteTheDDDDt BiteTheDDDDt marked this pull request as ready for review July 1, 2026 09:36
@BiteTheDDDDt BiteTheDDDDt requested a review from yiguolei as a code owner July 1, 2026 09:36
@hello-stephen

Contributor

BE UT Coverage Report

Increment line coverage 70.49% (86/122) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 54.71% (20878/38163)
Line Coverage 38.17% (199199/521897)
Region Coverage 34.63% (156372/451523)
Branch Coverage 35.59% (68209/191668)

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 86.89% (106/122) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 72.01% (26844/37280)
Line Coverage 55.10% (286168/519358)
Region Coverage 52.62% (239189/454592)
Branch Coverage 53.73% (103140/191953)
