Member

@coderzc coderzc commented Dec 2, 2025

Motivation

2025-10-29T12:55:33,009+0000 [pulsar-modular-load-manager-37-1] INFO  org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl - Successfully split namespace bundle cc/billing/0x80000000_0xc0000000
2025-10-29T12:55:33,009+0000 [pulsar-ordered-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.namespace.OwnedBundle - Disabling ownership: xx/billing/0x80000000_0x93c58e36
2025-10-29T12:55:33,009+0000 [pulsar-ordered-OrderedExecutor-0-0] INFO  org.eclipse.jetty.server.RequestLog - 10.247.8.118 - - [29/Oct/2025:12:55:32 +0000] "PUT /admin/v2/namespaces/xx/billing/0x80000000_0xc0000000/split?unload=true HTTP/1.1" 204 0 "-" "Pulsar-Java-v4.0.4.1" 20
2025-10-29T12:55:33,009+0000 [pulsar-ordered-OrderedExecutor-0-0] INFO  org.apache.pulsar.broker.admin.v2.Namespaces - [admin] Successfully split namespace bundle 0x80000000_0xc0000000


2025-10-29T12:55:33,008+0000 [pulsar-2-1] INFO  org.apache.pulsar.broker.service.BrokerService - [xx/billing] updating with Policies(auth_policies=AuthPoliciesImpl(namespaceAuthentication={ccuser=[produce, consume]}, topicAuthentication={persistent://xx/billing/sim-aggr-data={cdr=[produce]}}, subscriptionAuthentication={}), replication_clusters=[pulsar-sn-platform], allowed_clusters=[], bundles=BundlesDataImpl(boundaries=[0x00000000, 0x40000000, 0x80000000, 0xc0000000, 0xcd19337b, 0xd90f1766, 0xddc47076, 0xec49f116, 0xffffffff], numBundles=8), backlog_quota_map={}, clusterDispatchRate={}, topicDispatchRate={}, subscriptionDispatchRate={}, replicatorDispatchRate={}, clusterSubscribeRate={}, persistence=null, deduplicationEnabled=null, autoTopicCreationOverride=null, autoSubscriptionCreationOverride=null, publishMaxMessageRate={}, latency_stats_sample_rate={}, message_ttl_in_seconds=null, subscription_expiration_time_minutes=null, retention_policies=RetentionPolicies{retentionTimeInMinutes=0, retentionSizeInMB=0}, deleted=false, encryption_required=false, delayed_delivery_policies=null, inactive_topic_policies=null, subscription_auth_mode=None, max_producers_per_topic=null, max_consumers_per_topic=null, max_consumers_per_subscription=null, max_unacked_messages_per_consumer=null, max_unacked_messages_per_subscription=null, max_subscriptions_per_topic=null, compaction_threshold=null, offload_threshold=107374182400, offload_threshold_in_seconds=432000, offload_deletion_lag_ms=null, max_topics_per_namespace=null, schema_auto_update_compatibility_strategy=AlwaysCompatible, schema_compatibility_strategy=UNDEFINED, is_allow_auto_update_schema=true, schema_validation_enforced=false, offload_policies=OffloadPoliciesImpl(offloadersDirectory=./offloaders, managedLedgerOffloadDriver=null, managedLedgerOffloadMaxThreads=2, managedLedgerOffloadPrefetchRounds=1, managedLedgerOffloadThresholdInSeconds=432000, managedLedgerOffloadThresholdInBytes=107374182400, 
managedLedgerOffloadDeletionLagInMillis=null, managedLedgerOffloadedReadPriority=tiered-storage-first, managedLedgerExtraConfigurations={}, s3ManagedLedgerOffloadRegion=null, s3ManagedLedgerOffloadBucket=null, s3ManagedLedgerOffloadServiceEndpoint=null, s3ManagedLedgerOffloadMaxBlockSizeInBytes=67108864, s3ManagedLedgerOffloadReadBufferSizeInBytes=1048576, s3ManagedLedgerOffloadCredentialId=null, s3ManagedLedgerOffloadCredentialSecret=******** s3ManagedLedgerOffloadRole=null, s3ManagedLedgerOffloadRoleSessionName=pulsar-s3-offload, gcsManagedLedgerOffloadRegion=null, gcsManagedLedgerOffloadBucket=null, gcsManagedLedgerOffloadMaxBlockSizeInBytes=134217728, gcsManagedLedgerOffloadReadBufferSizeInBytes=1048576, gcsManagedLedgerOffloadServiceAccountKeyFile=null, fileSystemProfilePath=null, fileSystemURI=null, managedLedgerOffloadBucket=null, managedLedgerOffloadRegion=null, managedLedgerOffloadServiceEndpoint=null, managedLedgerOffloadMaxBlockSizeInBytes=null, managedLedgerOffloadReadBufferSizeInBytes=null), deduplicationSnapshotIntervalSeconds=null, subscription_types_enabled=[], properties={}, resource_group_name=null, migrated=false, dispatcherPauseOnAckStatePersistentEnabled=null, entryFilters=null)
2025-10-29T12:55:33,008+0000 [configuration-metadata-store-13-1] INFO  org.apache.pulsar.broker.resourcegroup.ResourceGroupNamespaceConfigListener - Metadata store notification: Path /admin/policies/xx/billing, Type Modified
2025-10-29T12:55:33,008+0000 [pulsar-2-4] INFO  org.apache.pulsar.broker.service.BrokerService - [xx/billing] updating with LocalPolicies(bundles=BundlesDataImpl(boundaries=[0x00000000, 0x40000000, 0x80000000, 0x93c58e36, 0xc0000000, 0xcd19337b, 0xd90f1766, 0xddc47076, 0xec49f116, 0xffffffff], numBundles=9), bookieAffinityGroup=null, namespaceAntiAffinityGroup=null, migrated=false)
2025-10-29T12:55:33,008+0000 [pulsar-2-3] INFO  org.apache.pulsar.broker.service.BrokerService - [xx/billing] updating with LocalPolicies(bundles=BundlesDataImpl(boundaries=[0x00000000, 0x40000000, 0x80000000, 0x93c58e36, 0xc0000000, 0xcd19337b, 0xd90f1766, 0xddc47076, 0xec49f116, 0xffffffff], numBundles=9), bookieAffinityGroup=null, namespaceAntiAffinityGroup=null, migrated=false)

...

2025-10-29T12:55:50,341+0000 [pulsar-web-46-6] INFO  org.apache.pulsar.broker.admin.impl.NamespacesBase - [admin] Split namespace bundle xx/billing/0x80000000_0xc0000000
2025-10-29T12:55:50,339+0000 [pulsar-modular-load-manager-37-1] INFO  org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl - Load-manager splitting bundle xx/billing/0x80000000_0xc0000000 and unloading true
2025-10-29T12:55:50,334+0000 [pulsar-modular-load-manager-37-1] INFO  org.apache.pulsar.metadata.impl.AbstractMetadataStore - Deleting path: /loadbalance/bundle-data/xx/billing/0x80000000_0xc0000000 (v. Optional.empty)
...
2025-10-29T12:56:32,969+0000 [pulsar-load-manager-1-1] INFO  org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl - Writing local data to metadata store because maximum change 1788.8082864051726% exceeded threshold 10%; time since last report written is 59.999 seconds
Caused by: javax.ws.rs.ClientErrorException: HTTP 412 {"reason":"Invalid upper boundary for bundle xx/billing/0x80000000_0xc0000000. Expected upper boundary of xx/billing/0x80000000_0x93c58e36"}
...
org.apache.pulsar.client.admin.PulsarAdminException$PreconditionFailedException: Invalid upper boundary for bundle xx/billing/0x93c58e36_0xc0000000. Expected upper boundary of xx/billing/0x93c58e36_0xab7838eb
	at org.apache.pulsar.client.admin.PulsarAdminException.wrap(PulsarAdminException.java:252) ~[io.streamnative-pulsar-client-admin-api-4.0.4.1.jar:4.0.4.1]
	at org.apache.pulsar.client.admin.internal.BaseResource.sync(BaseResource.java:352) ~[io.streamnative-pulsar-client-admin-original-4.0.4.1.jar:4.0.4.1]
	at org.apache.pulsar.client.admin.internal.NamespacesImpl.splitNamespaceBundle(NamespacesImpl.java:877) ~[io.streamnative-pulsar-client-admin-original-4.0.4.1.jar:4.0.4.1]
	at org.apache.pulsar.client.admin.internal.NamespacesImpl.splitNamespaceBundle(NamespacesImpl.java:864) ~[io.streamnative-pulsar-client-admin-original-4.0.4.1.jar:4.0.4.1]
	at org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl.checkNamespaceBundleSplit(ModularLoadManagerImpl.java:776) ~[io.streamnative-pulsar-broker-4.0.4.1.jar:4.0.4.1]
	at org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl.updateAll(ModularLoadManagerImpl.java:476) ~[io.streamnative-pulsar-broker-4.0.4.1.jar:4.0.4.1]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.119.Final.jar:4.1.119.Final]
	at java.base/java.lang.Thread.run(Unknown Source) [?:?]

As the log shows, after 0x80000000_0xc0000000 was split successfully, a later call to updateAll() tried to split it a second time. That second split failed because the old bundle had already been deleted.

Modifications

Skip splitting bundles that no longer exist in NamespaceBundles.
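
To illustrate the idea, here is a minimal standalone sketch of such an existence check. It is not the code in this PR and does not use Pulsar's NamespaceBundles API; the class `BundleRangeCheck` and its methods are hypothetical. It assumes a bundle range is written as `lower_upper` with `0x`-prefixed hex boundaries, and treats a range as existing only when its two ends are adjacent entries in the sorted boundary list.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the existence check; not Pulsar's NamespaceBundles API. */
final class BundleRangeCheck {

    /** Parses a boundary such as "0x80000000" (assumes the "0x" prefix) into a long. */
    static long parseBoundary(String hex) {
        return Long.parseLong(hex.substring(2), 16);
    }

    /**
     * Returns true only if "lower_upper" names a bundle whose two ends are
     * adjacent entries in the sorted boundary array.
     */
    static boolean bundleExists(long[] sortedBoundaries, String bundleRange) {
        String[] parts = bundleRange.split("_");
        if (parts.length != 2) {
            return false;
        }
        long lower = parseBoundary(parts[0]);
        long upper = parseBoundary(parts[1]);
        int idx = Arrays.binarySearch(sortedBoundaries, lower);
        return idx >= 0
                && idx + 1 < sortedBoundaries.length
                && sortedBoundaries[idx + 1] == upper;
    }

    public static void main(String[] args) {
        // Boundaries after the first split, copied from the LocalPolicies log line above.
        long[] boundaries = {
            0x00000000L, 0x40000000L, 0x80000000L, 0x93c58e36L, 0xc0000000L,
            0xcd19337bL, 0xd90f1766L, 0xddc47076L, 0xec49f116L, 0xffffffffL
        };
        // Stale range: 0x93c58e36 now sits between the two ends, so this prints false.
        System.out.println(bundleExists(boundaries, "0x80000000_0xc0000000"));
        // Current bundle produced by the split, so this prints true.
        System.out.println(bundleExists(boundaries, "0x80000000_0x93c58e36"));
    }
}
```

With a check of this shape, a load manager holding stale load data for 0x80000000_0xc0000000 could skip the split instead of issuing a request that fails with HTTP 412.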

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@coderzc coderzc changed the title from "[fix][load] Avoid get old loadData during split bundle" to "[fix][broker] Avoid get old loadData during split bundle" on Dec 2, 2025

github-actions bot commented Dec 2, 2025

@coderzc Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

@github-actions github-actions bot added the doc label (Your PR contains doc changes, no matter whether the changes are in markdown or code files.) and removed the doc-label-missing label on Dec 2, 2025
@coderzc coderzc closed this Dec 2, 2025
@coderzc coderzc reopened this Dec 2, 2025
@github-actions github-actions bot added the doc-not-needed label (Your PR changes do not impact docs) and removed the doc label (Your PR contains doc changes, no matter whether the changes are in markdown or code files.) on Dec 5, 2025
@coderzc coderzc force-pushed the fix_split_old_bundle branch 3 times, most recently from e2b577f to 2da2096 on December 9, 2025 11:30
@coderzc coderzc force-pushed the fix_split_old_bundle branch from 2da2096 to d582d6b on December 9, 2025 11:35

codecov-commenter commented Dec 9, 2025

Codecov Report

❌ Patch coverage is 80.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.47%. Comparing base (a8b41b9) to head (ef61623).
⚠️ Report is 32 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...roker/loadbalance/impl/ModularLoadManagerImpl.java | 80.00% | 1 Missing and 1 partial ⚠️ |

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #25031      +/-   ##
============================================
+ Coverage     74.35%   74.47%   +0.11%     
+ Complexity    34102    34037      -65     
============================================
  Files          1920     1899      -21     
  Lines        150313   149641     -672     
  Branches      17459    17397      -62     
============================================
- Hits         111771   111448     -323     
+ Misses        29635    29320     -315     
+ Partials       8907     8873      -34     

| Flag | Coverage Δ |
|---|---|
| inttests | 26.38% <0.00%> (+0.18%) ⬆️ |
| systests | 23.01% <0.00%> (+0.16%) ⬆️ |
| unittests | 74.01% <80.00%> (+0.11%) ⬆️ |

| Files with missing lines | Coverage Δ |
|---|---|
| .../apache/pulsar/common/naming/NamespaceBundles.java | 80.48% <ø> (ø) |
| ...roker/loadbalance/impl/ModularLoadManagerImpl.java | 84.40% <80.00%> (-0.13%) ⬇️ |

... and 128 files with indirect coverage changes


@coderzc coderzc changed the title from "[fix][broker] Avoid get old loadData during split bundle" to "[fix][broker] Avoid split non-existent bundle" on Dec 10, 2025
return future;
}

public boolean checkBundleDataExistInNamespaceBundles(NamespaceBundles namespaceBundles, String bundleRange) {
Contributor

@coderzc What is the difference between the behavior here and org.apache.pulsar.common.naming.NamespaceBundles#validateBundle?

By the way, shouldn't this be a private method?

Member Author

The behavior is almost the same, but validateBundle directly throws an exception instead of returning a result. I modified it to call validateBundle in checkBundleDataExistInNamespaceBundles.
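
The delegation described here might look roughly like the sketch below. It is only an illustration under stated assumptions: `ValidateWrapperSketch` and both of its methods are hypothetical stand-ins, the validator works on a plain list of range strings rather than on NamespaceBundles, and the only detail taken from this PR is that the validator signals failure with an IllegalArgumentException.

```java
import java.util.List;

/** Hypothetical sketch: turning a throwing validator into a boolean check. */
final class ValidateWrapperSketch {

    /** Stand-in validator: throws when the range is not one of the current bundles. */
    static void validateBundle(List<String> currentRanges, String bundleRange) {
        if (!currentRanges.contains(bundleRange)) {
            throw new IllegalArgumentException("Invalid bundle range: " + bundleRange);
        }
    }

    /** Boolean view of the same rule, so callers can skip instead of handling an exception. */
    static boolean bundleExists(List<String> currentRanges, String bundleRange) {
        try {
            validateBundle(currentRanges, bundleRange);
            return true;
        } catch (IllegalArgumentException e) {
            // Only the validator's own failure type is treated as "does not exist".
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> current = List.of("0x80000000_0x93c58e36", "0x93c58e36_0xc0000000");
        System.out.println(bundleExists(current, "0x80000000_0xc0000000")); // prints false
        System.out.println(bundleExists(current, "0x93c58e36_0xc0000000")); // prints true
    }
}
```

Catching the specific exception type rather than a generic Exception keeps unrelated failures from being silently read as "bundle does not exist".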

@coderzc coderzc force-pushed the fix_split_old_bundle branch from f4ddc6f to ef61623 on December 17, 2025 05:15
}
} finally {
lock.unlock();
// lock.unlock();
Contributor

This should not be modified. Small problem.

Copilot AI left a comment

Pull request overview

This PR fixes a race condition where namespace bundles could be split multiple times, causing failures on subsequent split attempts after the bundle has already been removed. The fix adds validation to check if a bundle still exists in the current NamespaceBundles before attempting to split it.

Key Changes:

  • Added validation logic to skip splitting bundles that no longer exist in NamespaceBundles
  • Changed validateBundle exception signature from generic Exception to more specific IllegalArgumentException
  • Added test case to verify repeated split attempts don't cause errors

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| ModularLoadManagerImpl.java | Added checkBundleDataExistInNamespaceBundles method to validate bundle existence before split; commented out lock statements in writeBrokerDataOnZooKeeper |
| NamespaceBundles.java | Changed validateBundle to throw IllegalArgumentException instead of generic Exception for better type safety |
| ModularLoadManagerImplTest.java | Added testRepeatSplitBundle test to verify the fix for repeated bundle split attempts |

Comment on lines +1159 to +1160
pulsarClient.newConsumer().topic(topicNameI)
.subscriptionName("my-subscriber-name2").subscribe();
Copilot AI Jan 7, 2026

The test creates consumers but never closes them, which can lead to resource leaks. Each consumer created in the loop should be closed after use, or stored in a list and closed in a cleanup step.
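
One way to apply the "store in a list and close in a cleanup step" suggestion is sketched below. This is a generic illustration, not the test from this PR: `FakeConsumer` is a hypothetical AutoCloseable stand-in for a real consumer, and the open-instance counter exists only so the sketch can show that nothing is left open.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of collecting resources in a loop and closing them in a finally block. */
final class CloseConsumersSketch {

    /** Stand-in for a consumer; tracks how many instances are still open. */
    static final class FakeConsumer implements AutoCloseable {
        static int open = 0;

        FakeConsumer() {
            open++;
        }

        @Override
        public void close() {
            open--;
        }
    }

    /** Creates {@code count} consumers, then closes all of them; returns how many remain open. */
    static int runAndCleanUp(int count) {
        List<FakeConsumer> consumers = new ArrayList<>();
        try {
            for (int i = 0; i < count; i++) {
                consumers.add(new FakeConsumer());
            }
            // ... exercise the code under test here ...
        } finally {
            // Runs even if the body above throws, so no consumer is leaked.
            for (FakeConsumer consumer : consumers) {
                consumer.close();
            }
        }
        return FakeConsumer.open;
    }

    public static void main(String[] args) {
        System.out.println(runAndCleanUp(3)); // prints 0
    }
}
```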

Comment on lines +1173 to +1176
primaryLoadManager.updateAll();

primaryLoadManager.updateAll();
Assert.assertFalse(loadData.getBundleData().containsKey(bundleKey));
Copilot AI Jan 7, 2026

The test name 'testRepeatSplitBundle' and the setup with updateAll() called twice suggests this test should verify that repeated splits don't cause errors. However, the test only asserts that the bundle data was removed (line 1176), but doesn't verify that no exceptions were thrown during the second updateAll() call. Consider adding assertions to verify that both updateAll() calls complete successfully without throwing the PreconditionFailedException mentioned in the PR description.


Labels

doc-not-needed (Your PR changes do not impact docs), ready-to-test
