Conversation

@RocMarshal
Contributor

@RocMarshal RocMarshal commented Jan 20, 2026

What is the purpose of the change

[FLINK-38943][runtime] Support Adaptive Partition Selection for RescalePartitioner and RebalancePartitioner

Brief change log

Introduce the following:

  • config options
    • taskmanager.network.adaptive-partitioner.enabled
    • taskmanager.network.adaptive-partitioner.max-traverse-size
  • AdaptiveLoadBasedRecordWriter.java for adaptive partition selection
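
For readers unfamiliar with the idea, here is a minimal sketch of load-based channel selection with a bounded traversal. It is not the code in this PR: the class `AdaptiveSelectorSketch`, the method `selectChannel`, and the `queuedBytes` map are hypothetical names, and it assumes that max-traverse-size bounds how many channels are inspected per record, starting from a round-robin position.

```java
import java.util.Map;

/** Hypothetical sketch; not the AdaptiveLoadBasedRecordWriter from this PR. */
final class AdaptiveSelectorSketch {
    private final int numberOfChannels;
    private final int maxTraverseSize;
    private int nextChannel = 0; // round-robin starting point for the next record

    AdaptiveSelectorSketch(int numberOfChannels, int maxTraverseSize) {
        if (numberOfChannels < 1 || maxTraverseSize < 1) {
            throw new IllegalArgumentException("both arguments must be >= 1");
        }
        this.numberOfChannels = numberOfChannels;
        // Inspecting more channels than exist would only revisit candidates.
        this.maxTraverseSize = Math.min(maxTraverseSize, numberOfChannels);
    }

    /**
     * Inspects up to maxTraverseSize channels, starting at the round-robin
     * position, and returns the one with the fewest queued bytes.
     * Channels missing from the map are treated as having zero queued bytes.
     */
    int selectChannel(Map<Integer, Long> queuedBytes) {
        int best = nextChannel;
        long bestBytes = queuedBytes.getOrDefault(best, 0L);
        // Stop early once an empty channel is found; nothing can beat it.
        for (int i = 1; i < maxTraverseSize && bestBytes > 0; i++) {
            int candidate = (nextChannel + i) % numberOfChannels;
            long bytes = queuedBytes.getOrDefault(candidate, 0L);
            if (bytes < bestBytes) {
                best = candidate;
                bestBytes = bytes;
            }
        }
        // Advance the starting point so ties are still spread round-robin.
        nextChannel = (nextChannel + 1) % numberOfChannels;
        return best;
    }
}
```

With three channels holding 5, 3, and 1 queued bytes, a selector built with a traverse size of 3 picks channel 2, while a traverse size of 1 degenerates to plain round-robin and picks channel 0 first.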

Verifying this change

This change added tests and can be verified as follows:

  • AdaptiveLoadBasedRecordWriterTest.java

The benchmark about it is here

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@flinkbot
Collaborator

flinkbot commented Jan 20, 2026

CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

@RocMarshal
Contributor Author

Hi @davidradl @X-czh, could you help take a look? Thanks a lot.

@RocMarshal
Contributor Author

@flinkbot run azure

@X-czh
Contributor

X-czh commented Jan 20, 2026

@RocMarshal Thanks for the quick contribution. I'll take a look later this week.

Contributor Author

@RocMarshal RocMarshal left a comment

Thanks @davidradl for the review.
I updated the related lines based on your comments.
PTAL ~

@github-actions github-actions bot added the community-reviewed PR has been reviewed by the community. label Jan 21, 2026
@RocMarshal RocMarshal requested a review from davidradl January 22, 2026 12:36
public void broadcastEmit(T record) throws IOException {
checkErroneous();

// Emitting to all channels in a for loop can be better than calling
Contributor

When you say it can be better, when is it not?
I am curious what the overhead of ResultPartitionWriter#broadcastRecord is. Since we are in a method called broadcastEmit, I was expecting a broadcast implementation.

Contributor Author

Hi @davidradl, as the comment line before the method says:
/** Copy from {@link ChannelSelectorRecordWriter#broadcastEmit}. */

I kept the original lines as much as possible.

bytesPerPartition.put(1, 3L);
bytesPerPartition.put(2, 1L);
assertThat(adaptiveLoadBasedRecordWriter.getTheIdlestChannelIndex()).isEqualTo(2);
}
Contributor

Looking at test coverage, could we have a test for

  • the zero bytes case
  • setting different maxTraverseSizes in the config.
  • flush all vs not flush all
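
To make the first two cases concrete, here is a rough sketch of what they might exercise. It is written against a hypothetical stateless helper, `IdlestChannelCases.idlest`, not against the PR's `getTheIdlestChannelIndex`, and it assumes that ties keep the first channel inspected.

```java
/** Hypothetical helper for illustrating the edge cases; not code from this PR. */
final class IdlestChannelCases {

    /**
     * Scans up to maxTraverseSize channels starting at {@code start} (wrapping
     * around) and returns the index with the fewest queued bytes. On a tie the
     * channel seen first wins, because only a strictly smaller value replaces it.
     */
    static int idlest(long[] queuedBytes, int start, int maxTraverseSize) {
        int n = queuedBytes.length;
        // Clamp the traversal to the range [1, n].
        int steps = Math.min(Math.max(maxTraverseSize, 1), n);
        int best = start % n;
        for (int i = 1; i < steps; i++) {
            int candidate = (start + i) % n;
            if (queuedBytes[candidate] < queuedBytes[best]) {
                best = candidate;
            }
        }
        return best;
    }
}
```

Under these assumptions, the zero-bytes case `idlest(new long[] {0, 0, 0}, 1, 3)` stays on the starting channel 1, and for queued bytes `{5, 3, 1}` starting at channel 0, traverse sizes 1, 2, and 3 select channels 0, 1, and 2 respectively.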

Contributor Author

@RocMarshal RocMarshal Jan 25, 2026

the zero bytes case
setting different maxTraverseSizes in the config.

Sounds good to me. Updated.

flush all vs not flush all

I left this test case out to avoid redundant testing, because the current change is not about flush performance. It is about channel selection before flushing.
So I introduced benchmark testing in the related sub-JIRA.

WDYT?

}

@VisibleForTesting
int getTheIdlestChannelIndex() {
Contributor

@davidradl davidradl Jan 23, 2026

I suggest not having The in the method name as it is superfluous to the meaning. If you like you could call it getMostIdle; just a suggestion.

}

public RecordWriterBuilder<T> setMaxTraverseSize(int maxTraverseSize) {
Preconditions.checkArgument(
Contributor Author

FYI: I reverted this change for testing purposes.
