Skip to content

Conversation

@junegunn
Copy link
Member

@junegunn junegunn commented Dec 29, 2025

This pull request is for exploring possible fixes for HBASE-29144.

  • The first commit adds a test class that currently fails and reproduces the issue.
  • The second commit applies the solution suggested by @NihalJain, which makes the test pass.

However, I'm concerned about mutating a configuration object that could be reused in unintended ways later (e.g., being cached by a singleton, as in this case). So I also explored an alternative approach.

  • The third commit reverts the previous fix, and the fourth commit removes the noAuthConf. And as expected, the test fails again.
  • The final commit introduces a scheme, where we avoid using SASL when the default cluster ID is specified, effectively preserving the original behavior without relying on noAuthConf.

@junegunn junegunn self-assigned this Dec 29, 2025
@Apache9
Copy link
Contributor

Apache9 commented Dec 29, 2025

As I said on the jira issue, we need to find out why we need to use a singleton instance...

This PR just reverts the code change in HBASE-25051...

@Apache-HBase

This comment has been minimized.

Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed contacting masters after 1 attempts.
Exceptions:
java.io.IOException: Call to address=192.168.35.34:57912 failed on local exception: java.io.IOException: Authentication provider class org.apache.hadoop.hbase.security.provider.SimpleSaslClientAuthenticationProvider returned a null SaslClient
…Jain"

This reverts commit 8b46519ca6dbc2a05a5b083e182383a070763daa.
@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@junegunn
Copy link
Member Author

@Apache9 Hi, thanks for taking a look.

This PR just reverts the code change in HBASE-25051...

I don't think this accurately describes the patch. Could you please take another look? This change does not revert what was done in HBASE-25051. Instead, it only changes how we instruct the RPC connection for obtaining the cluster ID to not use SASL. Previously, we used to achieve that by passing a modified Configuration object. With this patch, we explicitly choose not to use SASL when a valid cluster ID is not provided as a parameter. To my understanding, this only happens when obtaining the cluster ID.

After all, this change doesn't break any existing tests.

That said, I understand that you may want to take a different approach. However, I still believe it's preferable to avoid mutating the configuration object, as doing so can lead to unintended side effects elsewhere in the codebase.

@Apache9
Copy link
Contributor

Apache9 commented Dec 29, 2025

We do not want any authentications when fetching cluster id, so we create a new configuration instance and use it for the rpc connnection, this is a valid solution and should not have any side effect, as I do not modify the existing configuration object right? If this is not a valid usage, then how do our users try connecting to different hbase clusters in a single client process, where one cluster enables sasl, the other does not?

As I said on the jira issue, the problem here is we should not have a singleton SaslProviders, as its intialization needs a Configuration object, but we allow our users to initialize different rpc connections with different Configurations right?

This is what we need to fix here, I think.

Thanks.

@junegunn
Copy link
Member Author

how do our users try connecting to different hbase clusters in a single client process, where one cluster enables sasl, the other does not?

You make a valid point. I initially thought that not modifying the original Configuration would free us from having to worry about how it's used downstream. But on second thought, that would only mask fundamental bugs like this, which can't really be considered a good thing.

@Apache-HBase

This comment has been minimized.

@junegunn junegunn marked this pull request as draft January 1, 2026 03:28
@junegunn junegunn marked this pull request as ready for review January 5, 2026 01:24
@junegunn
Copy link
Member Author

junegunn commented Jan 5, 2026

In the last commit, I extended the interface to accept a Configuration object and use it when selecting an AuthenticationProvider. This new method comes with a default implementation which uses the cached object as before, so that the existing implementations doesn't need to be updated.

Alternatively, we could just remove the "singleton-ness" of SaslClientAuthenticationProviders. However, it is possible that existing custom AuthenticationProviderSelectors might rely on this single-instantiation behavior and perform costly operations in their configure method.

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@junegunn
Copy link
Member Author

junegunn commented Jan 5, 2026

org.apache.hadoop.hbase.security.provider.TestCustomSaslAuthenticationProviderNettyRpcServer

The test failures are related. This is because the test-defined class InMemoryProviderSelector extends BuiltInProviderSelector but only overrides the legacy version of the method that does not take a Configuration parameter. So in ecc9c1c, I updated the code to retry using the legacy method signature. This approach is not ideal, but it preserves backward compatibility.

@Apache9
Copy link
Contributor

Apache9 commented Jan 5, 2026

Seems you misunderstand my point above...

I mean we should just remove the getInstance method, initialize Providers everytime when creating a new RpcClient...

Or at least, we should not use getInstance in our code base, but keep the method there for compatibility since the class is IA.LimitedPrivate.

@Apache-HBase

This comment has been minimized.

@junegunn
Copy link
Member Author

junegunn commented Jan 5, 2026

I mean we should just remove the getInstance method

I explained why I didn't take the approach above.

However, it is possible that existing custom AuthenticationProviderSelectors might rely on this single-instantiation behavior and perform costly operations in their configure method.

If I'm not mistaken, the motivation behind HBASE-23347 was to allow users to implement their own AuthenticationProviderSelector. My concern was that some implementations out there may rely on the documented "exactly once per implementation" property, assume that configure is called only once, and do some heavy stuff in there.

/**
* Initializes the implementation with configuration and a set of providers available. This method
* should be called exactly once per implementation prior to calling
* {@link #selectProvider(String, User)}.
*/
void configure(Configuration conf,

That said, I can't decide if we should be concerned about the possibility.

@Apache9
Copy link
Contributor

Apache9 commented Jan 5, 2026

I mean we should just remove the getInstance method

I explained why I didn't take the approach above.

However, it is possible that existing custom AuthenticationProviderSelectors might rely on this single-instantiation behavior and perform costly operations in their configure method.

If I'm not mistaken, the motivation behind HBASE-23347 was to allow users to implement their own AuthenticationProviderSelector. My concern was that some implementations out there may rely on the documented "exactly once per implementation" property, assume that configure is called only once, and do some heavy stuff in there.

/**
* Initializes the implementation with configuration and a set of providers available. This method
* should be called exactly once per implementation prior to calling
* {@link #selectProvider(String, User)}.
*/
void configure(Configuration conf,

That said, I can't decide if we should be concerned about the possibility.

If user just initialize one RpcClient in their program, then removing the singleton pattern does not break anything, we still only initialize the provider once.

And the old behavior is incorrect if users choose to connect to different hbase clusters in a single process, so it does not make sense to keep the old behavior and try to fix only one of the possible problems with a very hacky way and leave lots of potential problems out there...

I prefer we just remove the singleton pattern, add release note and update the ref guide about this behavior change, and make new releases.

About the test, please extend TestBasicReadWriteWithDifferentConnectionRegistries instead of TestRpcConnectionRegistry, as TestRpcConnectionRegistry does not have any table read write requests, it just tests connection registry APIs...

@junegunn
Copy link
Member Author

junegunn commented Jan 5, 2026

If user just initialize one RpcClient in their program, then removing the singleton pattern does not break anything, we still only initialize the provider once.

We currently use SaslClientAuthenticationProviders in RpcConnection, so the configure will be called for each region server, right? I mean, if the user has a custom implementation of configure that takes 10 seconds, they will experience the delay multiple times. Or even worse, their configure might not be idempotent and break if called multiple times. That's what I was worried about, but we can move the instantiation to the RpcClient layer.

@junegunn
Copy link
Member Author

junegunn commented Jan 5, 2026

Please let me know if the latest commits align with your vision.

@Apache-HBase

This comment has been minimized.

@Apache9
Copy link
Contributor

Apache9 commented Jan 5, 2026

@junegunn See #7588 , mostly same with the approach here. We could retain the getInstance method and mark it as deprecated to notice users that you should not use it any more.

And on the tests, you need to call providers.reset otherwise the test can pass without any fix...

@Apache-HBase
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 28s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+0 🆗 mvndep 0m 16s Maven dependency ordering for branch
+1 💚 mvninstall 3m 31s master passed
+1 💚 compile 4m 59s master passed
+1 💚 checkstyle 1m 32s master passed
+1 💚 spotbugs 2m 57s master passed
+1 💚 spotless 0m 53s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 13s Maven dependency ordering for patch
-1 ❌ mvninstall 1m 45s /patch-mvninstall-root.txt root in the patch failed.
-1 ❌ compile 1m 50s /patch-compile-hbase-server.txt hbase-server in the patch failed.
-0 ⚠️ javac 1m 50s /patch-compile-hbase-server.txt hbase-server in the patch failed.
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 27s the patch passed
-1 ❌ spotbugs 0m 49s /patch-spotbugs-hbase-server.txt hbase-server in the patch failed.
-1 ❌ hadoopcheck 2m 6s The patch causes 36 errors with Hadoop v3.3.6.
-1 ❌ hadoopcheck 4m 12s The patch causes 36 errors with Hadoop v3.4.1.
+1 💚 spotless 0m 45s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 23s The patch does not generate ASF License warnings.
30m 51s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7580/5/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #7580
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux 131d7139fa29 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 181b4ac
Default Java Eclipse Adoptium-17.0.11+9
hadoopcheck https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7580/5/artifact/yetus-general-check/output/patch-javac-3.3.6.txt
hadoopcheck https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7580/5/artifact/yetus-general-check/output/patch-javac-3.4.1.txt
Max. process+thread count 84 (vs. ulimit of 30000)
modules C: hbase-client hbase-server hbase-mapreduce U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7580/5/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 28s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+0 🆗 mvndep 0m 32s Maven dependency ordering for branch
+1 💚 mvninstall 3m 36s master passed
+1 💚 compile 1m 43s master passed
+1 💚 javadoc 1m 1s master passed
+1 💚 shadedjars 6m 21s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 14s Maven dependency ordering for patch
-1 ❌ mvninstall 1m 45s /patch-mvninstall-root.txt root in the patch failed.
-1 ❌ compile 0m 58s /patch-compile-hbase-server.txt hbase-server in the patch failed.
-0 ⚠️ javac 0m 58s /patch-compile-hbase-server.txt hbase-server in the patch failed.
+1 💚 javadoc 1m 0s the patch passed
-1 ❌ shadedjars 4m 46s patch has 36 errors when building our shaded downstream artifacts.
_ Other Tests _
+1 💚 unit 1m 32s hbase-client in the patch passed.
-1 ❌ unit 0m 58s /patch-unit-hbase-server.txt hbase-server in the patch failed.
+1 💚 unit 24m 5s hbase-mapreduce in the patch passed.
51m 26s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7580/5/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #7580
Optional Tests javac javadoc unit compile shadedjars
uname Linux 71677b740b59 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 181b4ac
Default Java Eclipse Adoptium-17.0.11+9
shadedjars https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7580/5/artifact/yetus-jdk17-hadoop3-check/output/patch-shadedjars.txt
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7580/5/testReport/
Max. process+thread count 3314 (vs. ulimit of 30000)
modules C: hbase-client hbase-server hbase-mapreduce U: .
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7580/5/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@junegunn
Copy link
Member Author

junegunn commented Jan 5, 2026

And on the tests, you need to call providers.reset otherwise the test can pass without any fix...

Yeah, I did that in the first commit, but I later removed it in the second commit as it became pointless.

We could retain the getInstance method and mark it as deprecated to notice users that you should not use it any more.

I also considered that, but I decided it was unnecessary because SaslClientAuthenticationProviders is mostly an internal class and I couldn't imagine users directly calling the methods of that class.

Anyway, I'll fine with you taking over from here. I think we just need to mention that AuthenticationProvider is going to created for "each connection" (which should be acceptable in 99.9% of cases) in the release note.

@junegunn junegunn closed this Jan 5, 2026
@Apache9
Copy link
Contributor

Apache9 commented Jan 5, 2026

And on the tests, you need to call providers.reset otherwise the test can pass without any fix...

Yeah, I did that in the first commit, but I later removed it in the second commit as it became pointless.

We could retain the getInstance method and mark it as deprecated to notice users that you should not use it any more.

I also considered that, but I decided it was unnecessary because SaslClientAuthenticationProviders is mostly an internal class and I couldn't imagine users directly calling the methods of that class.

Anyway, I'll fine with you taking over from here. I think we just need to mention that AuthenticationProvider is going to created for "each connection" (which should be acceptable in 99.9% of cases) in the release note.

It is a Connection instance per HBase cluster, not a RpcConnection per region server, which should be OK I assume :)

@junegunn
Copy link
Member Author

junegunn commented Jan 5, 2026

Yeah, that's what I meant by "connection" from the user's point of view. We still should mention the change though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants