server: validate advertise-address is bound locally at startup and lease renewal#69561
server: validate advertise-address is bound locally at startup and lease renewal#69561sunbains wants to merge 1 commit into
Conversation
…ase renewal Add validation to ensure the advertise-address parameter is bound to a local network interface. This prevents TiDB from registering stale entries in etcd when the host IP address changes. Changes: - Add IsIPBoundLocally() utility function to check if an IP is bound locally - Fail server startup if advertise-address is not bound to any local interface - Validate advertise-address before etcd session renewal in owner manager - Validate advertise-address before server info session renewal - Add comprehensive unit tests for IP validation logic The runtime validation during lease renewal is the key improvement - when the host IP changes, TiDB will refuse to renew its etcd lease, allowing stale entries to naturally expire instead of being kept alive indefinitely.
|
Hi @sunbains. Thanks for your PR. I'm waiting for a pingcap member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
Welcome @sunbains! |
|
|
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
📝 WalkthroughWalkthroughAdds an ChangesAdvertise-address local binding validation
Estimated code review effort: 2 (Simple) | ~15 minutes Sequence Diagram(s)sequenceDiagram
participant OwnerManager as "Owner Manager"
participant ServerInfoSyncer as "Server Info Syncer"
participant Config as "Global Config"
participant IsIPBoundLocally as "util.IsIPBoundLocally"
participant EtcdSession as "Etcd Session"
OwnerManager->>Config: read AdvertiseAddress
OwnerManager->>IsIPBoundLocally: check(AdvertiseAddress)
IsIPBoundLocally-->>OwnerManager: bound true/false
alt not bound
OwnerManager-->>OwnerManager: log warning, return error
else bound
OwnerManager->>EtcdSession: closeSession + renew
end
ServerInfoSyncer->>Config: read AdvertiseAddress
ServerInfoSyncer->>IsIPBoundLocally: check(AdvertiseAddress)
IsIPBoundLocally-->>ServerInfoSyncer: bound true/false
alt not bound
ServerInfoSyncer-->>ServerInfoSyncer: log warning, return error
else bound
ServerInfoSyncer->>EtcdSession: create session, store server info
end
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[FORMAT CHECKER NOTIFICATION] Notice: To remove the 📖 For more info, you can check the "Contribute Code" section in the development guide. Notice: To remove the For example:
📖 For more info, you can check the "Contribute Code" section in the development guide. |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
pkg/util/misc_test.go (1)
195-202: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick winConsider adding a case for non-canonical bind-all forms.
E.g.
IsIPBoundLocally("0:0:0:0:0:0:0:0")orIsIPBoundLocally("::0")— these are semantically the same "bind all" address but currently bypass the string-equality special case inIsIPBoundLocally(see comment onpkg/util/misc.go).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/util/misc_test.go` around lines 195 - 202, The IsIPBoundLocally test coverage only checks the canonical bind-all strings, so add a case in misc_test for non-canonical bind-all IPv6 forms like 0:0:0:0:0:0:0:0 or ::0. Update the tests around IsIPBoundLocally to assert these are treated the same as 0.0.0.0 and ::, matching the special-case behavior in pkg/util/misc.go.cmd/tidb-server/main.go (1)
732-742: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winRedundant wildcard pre-check duplicated across three call sites.
IsIPBoundLocallyalready special-cases"0.0.0.0"/"::"internally, so this outercfg.AdvertiseAddress != "0.0.0.0" && cfg.AdvertiseAddress != "::"guard is redundant here — and it's copy-pasted identically inpkg/owner/manager.goandpkg/domain/serverinfo/syncer.go. Simplify toif len(cfg.AdvertiseAddress) > 0 && !util.IsIPBoundLocally(cfg.AdvertiseAddress) { ... }in all three places once the helper reliably normalizes bind-all addresses.Note: the startup-failure behavior of this block is also affected by the local-binding assumption raised on
pkg/util/misc.go(Docker/NAT deployments).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/tidb-server/main.go` around lines 732 - 742, The wildcard address pre-check is duplicated and unnecessary because IsIPBoundLocally already handles "0.0.0.0" and "::" internally. Remove the extra cfg.AdvertiseAddress comparisons in the advertise-address validation block and keep only the len(cfg.AdvertiseAddress) > 0 check before calling util.IsIPBoundLocally. Apply the same simplification at the matching checks in pkg/owner/manager.go and pkg/domain/serverinfo/syncer.go so all three call sites rely on the helper consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/util/misc.go`:
- Around line 557-560: The bind-all check in IsIPBoundLocally is using raw
string comparisons for "0.0.0.0" and "::", so equivalent IPv6 wildcard forms can
be missed. Update the logic in misc.go’s IsIPBoundLocally to parse ipAddr first
and use the parsed IP’s IsUnspecified() check before any interface/local-address
comparison, so all wildcard representations are handled consistently and callers
don’t need to duplicate this special-case check.
- Around line 556-589: Keep IsIPBoundLocally from requiring the
advertise-address to exist on a current network interface; instead, adjust the
local-IP check to compare against the originally resolved local IP rather than
iterating net.InterfaceAddrs in pkg/util/misc.go. Update the call sites that
rely on this guard in startup and renewal flows (cmd/tidb-server/main.go,
pkg/owner/manager.go, pkg/domain/serverinfo/syncer.go) so Docker/NAT/Kubernetes
deployments with reachable but non-local advertised IPs do not fail. Preserve
the special-case handling for bind-all addresses and invalid IP parsing.
---
Nitpick comments:
In `@cmd/tidb-server/main.go`:
- Around line 732-742: The wildcard address pre-check is duplicated and
unnecessary because IsIPBoundLocally already handles "0.0.0.0" and "::"
internally. Remove the extra cfg.AdvertiseAddress comparisons in the
advertise-address validation block and keep only the len(cfg.AdvertiseAddress) >
0 check before calling util.IsIPBoundLocally. Apply the same simplification at
the matching checks in pkg/owner/manager.go and pkg/domain/serverinfo/syncer.go
so all three call sites rely on the helper consistently.
In `@pkg/util/misc_test.go`:
- Around line 195-202: The IsIPBoundLocally test coverage only checks the
canonical bind-all strings, so add a case in misc_test for non-canonical
bind-all IPv6 forms like 0:0:0:0:0:0:0:0 or ::0. Update the tests around
IsIPBoundLocally to assert these are treated the same as 0.0.0.0 and ::,
matching the special-case behavior in pkg/util/misc.go.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 6b469802-c3b3-4133-a3b4-31642a8d0c85
📒 Files selected for processing (5)
cmd/tidb-server/main.gopkg/domain/serverinfo/syncer.gopkg/owner/manager.gopkg/util/misc.gopkg/util/misc_test.go
| func IsIPBoundLocally(ipAddr string) bool { | ||
| // Handle bind-all addresses | ||
| if ipAddr == "0.0.0.0" || ipAddr == "::" { | ||
| return true | ||
| } | ||
|
|
||
| // Parse the input IP | ||
| targetIP := net.ParseIP(ipAddr) | ||
| if targetIP == nil { | ||
| // Invalid IP format | ||
| return false | ||
| } | ||
|
|
||
| // Get all network interfaces | ||
| addrs, err := net.InterfaceAddrs() | ||
| if err != nil { | ||
| logutil.BgLogger().Warn("failed to get network interfaces", zap.Error(err)) | ||
| return false | ||
| } | ||
|
|
||
| // Check each interface address | ||
| for _, addr := range addrs { | ||
| ipnet, ok := addr.(*net.IPNet) | ||
| if !ok { | ||
| continue | ||
| } | ||
| // Compare the IP addresses | ||
| if ipnet.IP.Equal(targetIP) { | ||
| return true | ||
| } | ||
| } | ||
|
|
||
| return false | ||
| } |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🔴 Critical | 🏗️ Heavy lift
🧩 Analysis chain
🌐 Web query:
Does TiDB tidb-server support running behind Docker/NAT where advertise-address differs from any local interface IP?
💡 Result:
Yes, TiDB (specifically the tidb-server component) supports running behind Docker or NAT where the advertise-address differs from any local interface IP [1][2][3]. The advertise-address is intended specifically to be the address that external clients and other cluster components use to reach the TiDB server, rather than the address the server binds to locally [4][5]. Key technical points: 1. Separation of Concerns: You configure host (or -host) to tell tidb-server which local interface to bind to (e.g., 0.0.0.0 for all interfaces or a specific container-internal IP) [4][3]. The advertise-address is the value registered in the cluster or provided to clients for connection purposes [1][5]. 2. NAT/Docker Scenarios: In Docker or NAT environments, you can bind to the local container IP (or 0.0.0.0) while setting advertise-address to the host's public/reachable IP and port [1][3]. This allows the server to correctly advertise its accessible endpoint even though it does not technically "own" that IP on its own network interfaces [1][4]. 3. Requirement: The address provided to advertise-address must be reachable by the rest of the TiDB cluster and the external clients [1][2]. This behavior is standard practice for distributed components in TiDB (similar to TiKV's configuration patterns) to facilitate deployment in containerized and cloud environments [6].
Citations:
- 1: https://docs.pingcap.com/tidb/v6.1/command-line-flags-for-tidb-configuration/
- 2: https://docs.pingcap.com/tidb/v8.5/command-line-flags-for-tidb-configuration
- 3: https://docs.pingcap.com/tidb/stable/command-line-flags-for-tidb-configuration/
- 4: start with multi-ip addresses
--advertise-address=192.168.7.210 xxx.xxx.18.210couldn't connect server success #21448 - 5: Add
advertise-addressto config #7078 - 6: https://docs.pingcap.com/tidb/stable/command-line-flags-for-tikv-configuration/
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Map the function and its callers.
printf '\n== pkg/util/misc.go outline ==\n'
ast-grep outline pkg/util/misc.go --view expanded | sed -n '1,220p'
printf '\n== main.go occurrences ==\n'
rg -n "IsIPBoundLocally|advertise-address|MustNil|host" cmd/tidb-server/main.go pkg/owner/manager.go pkg/domain/serverinfo/syncer.go pkg/util/misc.go
printf '\n== relevant file line counts ==\n'
wc -l pkg/util/misc.go cmd/tidb-server/main.go pkg/owner/manager.go pkg/domain/serverinfo/syncer.goRepository: pingcap/tidb
Length of output: 7896
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Read only the relevant slices around the callers and the function.
for f in pkg/util/misc.go cmd/tidb-server/main.go pkg/owner/manager.go pkg/domain/serverinfo/syncer.go; do
echo "===== $f ====="
case "$f" in
pkg/util/misc.go) sed -n '520,620p' "$f" ;;
cmd/tidb-server/main.go) sed -n '1,220p' "$f" ;;
pkg/owner/manager.go) sed -n '1,260p' "$f" ;;
pkg/domain/serverinfo/syncer.go) sed -n '1,260p' "$f" ;;
esac
echo
doneRepository: pingcap/tidb
Length of output: 28000
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Read the startup validation and the exact callers around the renew path.
sed -n '700,760p' cmd/tidb-server/main.go
printf '\n===== owner renew path =====\n'
sed -n '430,470p' pkg/owner/manager.go
printf '\n===== serverinfo renew path =====\n'
sed -n '105,130p' pkg/domain/serverinfo/syncer.goRepository: pingcap/tidb
Length of output: 5007
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python3 - <<'PY'
import socket
tests = ["0.0.0.0", "::", "::0", "0:0:0:0:0:0:0:0", "127.0.0.1", "2001:db8::1"]
for t in tests:
for fam, name in [(socket.AF_INET, "v4"), (socket.AF_INET6, "v6")]:
try:
packed = socket.inet_pton(fam, t)
print(f"{t!r} -> {name} len={len(packed)} hex={packed.hex()}")
except OSError:
pass
PYRepository: pingcap/tidb
Length of output: 465
Keep advertise-address from requiring a local interface.
This guard runs in startup and session-renewal paths (cmd/tidb-server/main.go, pkg/owner/manager.go, pkg/domain/serverinfo/syncer.go), so any Docker/NAT/Kubernetes setup that advertises a reachable host/LB IP not present on the container’s interfaces will fail to start or keep renewing. If the goal is to detect host-IP changes, compare against the originally resolved local IP instead of enforcing universal interface membership.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/util/misc.go` around lines 556 - 589, Keep IsIPBoundLocally from
requiring the advertise-address to exist on a current network interface;
instead, adjust the local-IP check to compare against the originally resolved
local IP rather than iterating net.InterfaceAddrs in pkg/util/misc.go. Update
the call sites that rely on this guard in startup and renewal flows
(cmd/tidb-server/main.go, pkg/owner/manager.go, pkg/domain/serverinfo/syncer.go)
so Docker/NAT/Kubernetes deployments with reachable but non-local advertised IPs
do not fail. Preserve the special-case handling for bind-all addresses and
invalid IP parsing.
| // Handle bind-all addresses | ||
| if ipAddr == "0.0.0.0" || ipAddr == "::" { | ||
| return true | ||
| } |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Bind-all detection misses non-canonical IPv6 representations.
The bind-all check is a raw string comparison against "0.0.0.0"/"::". Equivalent but differently-formatted addresses (e.g. "0:0:0:0:0:0:0:0", "::0", "0000:0000:0000:0000:0000:0000:0000:0000") will bypass this special case, fall through to net.ParseIP/interface comparison, and be incorrectly rejected as "not bound locally" even though they mean the same address.
Parsing first and checking IsUnspecified() avoids this entirely and also removes the need for callers to duplicate this same wildcard string check before invoking IsIPBoundLocally.
🐛 Proposed fix
func IsIPBoundLocally(ipAddr string) bool {
- // Handle bind-all addresses
- if ipAddr == "0.0.0.0" || ipAddr == "::" {
- return true
- }
-
// Parse the input IP
targetIP := net.ParseIP(ipAddr)
if targetIP == nil {
// Invalid IP format
return false
}
+
+ // Handle bind-all addresses (covers all equivalent textual forms of 0.0.0.0/::)
+ if targetIP.IsUnspecified() {
+ return true
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Handle bind-all addresses | |
| if ipAddr == "0.0.0.0" || ipAddr == "::" { | |
| return true | |
| } | |
| // Parse the input IP | |
| targetIP := net.ParseIP(ipAddr) | |
| if targetIP == nil { | |
| // Invalid IP format | |
| return false | |
| } | |
| // Handle bind-all addresses (covers all equivalent textual forms of 0.0.0.0/::) | |
| if targetIP.IsUnspecified() { | |
| return true | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/util/misc.go` around lines 557 - 560, The bind-all check in
IsIPBoundLocally is using raw string comparisons for "0.0.0.0" and "::", so
equivalent IPv6 wildcard forms can be missed. Update the logic in misc.go’s
IsIPBoundLocally to parse ipAddr first and use the parsed IP’s IsUnspecified()
check before any interface/local-address comparison, so all wildcard
representations are handled consistently and callers don’t need to duplicate
this special-case check.
What problem does this PR solve?
TiDB can register stale entries in etcd when the
--advertise-addressparameter doesn't match the actual IP address bound to local network interfaces. This happens when:When this occurs, stale etcd entries persist indefinitely because TiDB keeps renewing the lease, even though the advertised IP is no longer valid.
What is changed and how it works?
This PR adds validation at two critical points:
1. Startup Validation
--advertise-addressis bound to a local network interface before server starts2. Runtime Validation During Lease Renewal
Implementation Details
IsIPBoundLocally()utility function inpkg/util/misc.go0.0.0.0,::)Check List
Tests:
Related changes
Release notes
Summary by CodeRabbit
Bug Fixes
Tests