Skip to content

server: validate advertise-address is bound locally at startup and lease renewal#69561

Open
sunbains wants to merge 1 commit into
pingcap:masterfrom
sunbains:validate-advertise-address
Open

server: validate advertise-address is bound locally at startup and lease renewal#69561
sunbains wants to merge 1 commit into
pingcap:masterfrom
sunbains:validate-advertise-address

Conversation

@sunbains

@sunbains sunbains commented Jul 1, 2026

Copy link
Copy Markdown

What problem does this PR solve?

TiDB can register stale entries in etcd when the --advertise-address parameter doesn't match the actual IP address bound to local network interfaces. This happens when:

  1. The advertised address is misconfigured at startup
  2. The host IP address changes after TiDB has started (e.g., DHCP reassignment, interface reconfiguration)

When this occurs, stale etcd entries persist indefinitely because TiDB keeps renewing the lease, even though the advertised IP is no longer valid.

What is changed and how it works?

This PR adds validation at two critical points:

1. Startup Validation

  • Validates that --advertise-address is bound to a local network interface before server starts
  • Fails fast with a clear error message if the IP is not bound locally
  • Helps catch configuration errors early

2. Runtime Validation During Lease Renewal

  • Checks if advertise-address is still bound locally before renewing etcd sessions
  • Applied to:
    • Owner manager session refresh (DDL owner, auto-ID service, etc.)
    • Server info syncer session refresh
  • If validation fails, lease renewal is refused and the etcd entry naturally expires

Implementation Details

  • Added IsIPBoundLocally() utility function in pkg/util/misc.go
  • Special handling for bind-all addresses (0.0.0.0, ::)
  • Clear error messages and warning logs when validation fails
  • Comprehensive unit tests covering various scenarios

Check List

Tests:

  • Unit test
  • Integration test
  • Manual test (TiUP playground)

Related changes

  • N/A

Release notes

server: Validate that advertise-address is bound to local network interfaces at startup and during lease renewal to prevent stale etcd entries

Summary by CodeRabbit

  • Bug Fixes

    • Added checks to prevent server sessions from renewing when the configured advertised IP is no longer available on the host.
    • Validates advertised addresses at startup and skips validation for wildcard bind addresses.
    • Helps avoid stale server information when a machine’s network address changes.
  • Tests

    • Added coverage for local IP detection across wildcard, loopback, invalid, and externally bound addresses.

…ase renewal

Add validation to ensure the advertise-address parameter is bound to a local
network interface. This prevents TiDB from registering stale entries in etcd
when the host IP address changes.

Changes:
- Add IsIPBoundLocally() utility function to check if an IP is bound locally
- Fail server startup if advertise-address is not bound to any local interface
- Validate advertise-address before etcd session renewal in owner manager
- Validate advertise-address before server info session renewal
- Add comprehensive unit tests for IP validation logic

The runtime validation during lease renewal is the key improvement - when the
host IP changes, TiDB will refuse to renew its etcd lease, allowing stale
entries to naturally expire instead of being kept alive indefinitely.
@ti-chi-bot ti-chi-bot Bot added do-not-merge/needs-linked-issue do-not-merge/needs-tests-checked release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. contribution This PR is from a community contributor. needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. labels Jul 1, 2026
@ti-chi-bot

ti-chi-bot Bot commented Jul 1, 2026

Copy link
Copy Markdown

Hi @sunbains. Thanks for your PR.

I'm waiting for a pingcap member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot Bot added the first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. label Jul 1, 2026
@ti-chi-bot

ti-chi-bot Bot commented Jul 1, 2026

Copy link
Copy Markdown

Welcome @sunbains!

It looks like this is your first PR to pingcap/tidb 🎉.

I'm the bot to help you request reviewers, add labels and more, See available commands.

We want to make sure your contribution gets all the attention it needs!



Thank you, and welcome to pingcap/tidb. 😃

@pingcap-cla-assistant

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@ti-chi-bot

ti-chi-bot Bot commented Jul 1, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign djshow832 for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented Jul 1, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Adds an IsIPBoundLocally utility to check whether an IP address is bound to a local network interface, and integrates it into three locations: server startup (overrideConfig), owner manager session refresh, and server info syncer, refusing operations when the configured advertise-address is no longer locally bound.

Changes

Advertise-address local binding validation

Layer / File(s) Summary
IsIPBoundLocally helper and tests
pkg/util/misc.go, pkg/util/misc_test.go
New exported function checks bind-all addresses, parses/validates IPs, and enumerates local interface addresses to determine local binding; tests cover bind-all, invalid IPs, loopback, external IPs, and actual local interfaces.
Startup validation
cmd/tidb-server/main.go
overrideConfig validates a non-wildcard AdvertiseAddress is locally bound, terminating startup via terror.MustNil if not.
Owner manager session refresh check
pkg/owner/manager.go
refreshSession loads config, checks AdvertiseAddress binding, and returns an error to refuse etcd session renewal if the address is unbound.
Server info syncer session check
pkg/domain/serverinfo/syncer.go
NewSessionAndStoreServerInfo validates AdvertiseAddress binding before creating a new etcd session, returning an error and logging a warning if invalid.

Estimated code review effort: 2 (Simple) | ~15 minutes

Sequence Diagram(s)

sequenceDiagram
  participant OwnerManager as "Owner Manager"
  participant ServerInfoSyncer as "Server Info Syncer"
  participant Config as "Global Config"
  participant IsIPBoundLocally as "util.IsIPBoundLocally"
  participant EtcdSession as "Etcd Session"

  OwnerManager->>Config: read AdvertiseAddress
  OwnerManager->>IsIPBoundLocally: check(AdvertiseAddress)
  IsIPBoundLocally-->>OwnerManager: bound true/false
  alt not bound
    OwnerManager-->>OwnerManager: log warning, return error
  else bound
    OwnerManager->>EtcdSession: closeSession + renew
  end

  ServerInfoSyncer->>Config: read AdvertiseAddress
  ServerInfoSyncer->>IsIPBoundLocally: check(AdvertiseAddress)
  IsIPBoundLocally-->>ServerInfoSyncer: bound true/false
  alt not bound
    ServerInfoSyncer-->>ServerInfoSyncer: log warning, return error
  else bound
    ServerInfoSyncer->>EtcdSession: create session, store server info
  end
Loading

Poem

A rabbit checks each hopping IP,
"Is this address truly mine to keep?"
If bound at home, I renew with glee,
If wandered off, no session for thee!
🐇📡 Local hops only, that's the decree.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description covers the problem, changes, tests, and release note, but it omits the required Issue Number line. Add an "Issue Number: close #xxx" line and link the related issue to match the repository template.
Docstring Coverage ⚠️ Warning Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: validating advertise-address locally at startup and renewal.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@ti-chi-bot

ti-chi-bot Bot commented Jul 1, 2026

Copy link
Copy Markdown

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-linked-issue label, please provide the linked issue number on one line in the PR body, for example: Issue Number: close #123 or Issue Number: ref #456.

📖 For more info, you can check the "Contribute Code" section in the development guide.


Notice: To remove the do-not-merge/needs-tests-checked label, please finished the tests then check the finished items in description.

For example:

Tests <!-- At least one of them must be included. -->

- [x] Unit test
- [ ] Integration test
- [ ] Manual test (add detailed scripts or steps below)
- [ ] No code

‼️ Must keep the HTML comments <!-- At least one of them must be included. -->

📖 For more info, you can check the "Contribute Code" section in the development guide.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
pkg/util/misc_test.go (1)

195-202: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Consider adding a case for non-canonical bind-all forms.

E.g. IsIPBoundLocally("0:0:0:0:0:0:0:0") or IsIPBoundLocally("::0") — these are semantically the same "bind all" address but currently bypass the string-equality special case in IsIPBoundLocally (see comment on pkg/util/misc.go).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/util/misc_test.go` around lines 195 - 202, The IsIPBoundLocally test
coverage only checks the canonical bind-all strings, so add a case in misc_test
for non-canonical bind-all IPv6 forms like 0:0:0:0:0:0:0:0 or ::0. Update the
tests around IsIPBoundLocally to assert these are treated the same as 0.0.0.0
and ::, matching the special-case behavior in pkg/util/misc.go.
cmd/tidb-server/main.go (1)

732-742: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Redundant wildcard pre-check duplicated across three call sites.

IsIPBoundLocally already special-cases "0.0.0.0"/"::" internally, so this outer cfg.AdvertiseAddress != "0.0.0.0" && cfg.AdvertiseAddress != "::" guard is redundant here — and it's copy-pasted identically in pkg/owner/manager.go and pkg/domain/serverinfo/syncer.go. Simplify to if len(cfg.AdvertiseAddress) > 0 && !util.IsIPBoundLocally(cfg.AdvertiseAddress) { ... } in all three places once the helper reliably normalizes bind-all addresses.

Note: the startup-failure behavior of this block is also affected by the local-binding assumption raised on pkg/util/misc.go (Docker/NAT deployments).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/tidb-server/main.go` around lines 732 - 742, The wildcard address
pre-check is duplicated and unnecessary because IsIPBoundLocally already handles
"0.0.0.0" and "::" internally. Remove the extra cfg.AdvertiseAddress comparisons
in the advertise-address validation block and keep only the
len(cfg.AdvertiseAddress) > 0 check before calling util.IsIPBoundLocally. Apply
the same simplification at the matching checks in pkg/owner/manager.go and
pkg/domain/serverinfo/syncer.go so all three call sites rely on the helper
consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/util/misc.go`:
- Around line 557-560: The bind-all check in IsIPBoundLocally is using raw
string comparisons for "0.0.0.0" and "::", so equivalent IPv6 wildcard forms can
be missed. Update the logic in misc.go’s IsIPBoundLocally to parse ipAddr first
and use the parsed IP’s IsUnspecified() check before any interface/local-address
comparison, so all wildcard representations are handled consistently and callers
don’t need to duplicate this special-case check.
- Around line 556-589: Keep IsIPBoundLocally from requiring the
advertise-address to exist on a current network interface; instead, adjust the
local-IP check to compare against the originally resolved local IP rather than
iterating net.InterfaceAddrs in pkg/util/misc.go. Update the call sites that
rely on this guard in startup and renewal flows (cmd/tidb-server/main.go,
pkg/owner/manager.go, pkg/domain/serverinfo/syncer.go) so Docker/NAT/Kubernetes
deployments with reachable but non-local advertised IPs do not fail. Preserve
the special-case handling for bind-all addresses and invalid IP parsing.

---

Nitpick comments:
In `@cmd/tidb-server/main.go`:
- Around line 732-742: The wildcard address pre-check is duplicated and
unnecessary because IsIPBoundLocally already handles "0.0.0.0" and "::"
internally. Remove the extra cfg.AdvertiseAddress comparisons in the
advertise-address validation block and keep only the len(cfg.AdvertiseAddress) >
0 check before calling util.IsIPBoundLocally. Apply the same simplification at
the matching checks in pkg/owner/manager.go and pkg/domain/serverinfo/syncer.go
so all three call sites rely on the helper consistently.

In `@pkg/util/misc_test.go`:
- Around line 195-202: The IsIPBoundLocally test coverage only checks the
canonical bind-all strings, so add a case in misc_test for non-canonical
bind-all IPv6 forms like 0:0:0:0:0:0:0:0 or ::0. Update the tests around
IsIPBoundLocally to assert these are treated the same as 0.0.0.0 and ::,
matching the special-case behavior in pkg/util/misc.go.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 6b469802-c3b3-4133-a3b4-31642a8d0c85

📥 Commits

Reviewing files that changed from the base of the PR and between 693e52c and d5f688e.

📒 Files selected for processing (5)
  • cmd/tidb-server/main.go
  • pkg/domain/serverinfo/syncer.go
  • pkg/owner/manager.go
  • pkg/util/misc.go
  • pkg/util/misc_test.go

Comment thread pkg/util/misc.go
Comment on lines +556 to +589
func IsIPBoundLocally(ipAddr string) bool {
// Handle bind-all addresses
if ipAddr == "0.0.0.0" || ipAddr == "::" {
return true
}

// Parse the input IP
targetIP := net.ParseIP(ipAddr)
if targetIP == nil {
// Invalid IP format
return false
}

// Get all network interfaces
addrs, err := net.InterfaceAddrs()
if err != nil {
logutil.BgLogger().Warn("failed to get network interfaces", zap.Error(err))
return false
}

// Check each interface address
for _, addr := range addrs {
ipnet, ok := addr.(*net.IPNet)
if !ok {
continue
}
// Compare the IP addresses
if ipnet.IP.Equal(targetIP) {
return true
}
}

return false
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🔴 Critical | 🏗️ Heavy lift

🧩 Analysis chain

🌐 Web query:

Does TiDB tidb-server support running behind Docker/NAT where advertise-address differs from any local interface IP?

💡 Result:

Yes, TiDB (specifically the tidb-server component) supports running behind Docker or NAT where the advertise-address differs from any local interface IP [1][2][3]. The advertise-address is intended specifically to be the address that external clients and other cluster components use to reach the TiDB server, rather than the address the server binds to locally [4][5]. Key technical points: 1. Separation of Concerns: You configure host (or -host) to tell tidb-server which local interface to bind to (e.g., 0.0.0.0 for all interfaces or a specific container-internal IP) [4][3]. The advertise-address is the value registered in the cluster or provided to clients for connection purposes [1][5]. 2. NAT/Docker Scenarios: In Docker or NAT environments, you can bind to the local container IP (or 0.0.0.0) while setting advertise-address to the host's public/reachable IP and port [1][3]. This allows the server to correctly advertise its accessible endpoint even though it does not technically "own" that IP on its own network interfaces [1][4]. 3. Requirement: The address provided to advertise-address must be reachable by the rest of the TiDB cluster and the external clients [1][2]. This behavior is standard practice for distributed components in TiDB (similar to TiKV's configuration patterns) to facilitate deployment in containerized and cloud environments [6].

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Map the function and its callers.
printf '\n== pkg/util/misc.go outline ==\n'
ast-grep outline pkg/util/misc.go --view expanded | sed -n '1,220p'

printf '\n== main.go occurrences ==\n'
rg -n "IsIPBoundLocally|advertise-address|MustNil|host" cmd/tidb-server/main.go pkg/owner/manager.go pkg/domain/serverinfo/syncer.go pkg/util/misc.go

printf '\n== relevant file line counts ==\n'
wc -l pkg/util/misc.go cmd/tidb-server/main.go pkg/owner/manager.go pkg/domain/serverinfo/syncer.go

Repository: pingcap/tidb

Length of output: 7896


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Read only the relevant slices around the callers and the function.
for f in pkg/util/misc.go cmd/tidb-server/main.go pkg/owner/manager.go pkg/domain/serverinfo/syncer.go; do
  echo "===== $f ====="
  case "$f" in
    pkg/util/misc.go) sed -n '520,620p' "$f" ;;
    cmd/tidb-server/main.go) sed -n '1,220p' "$f" ;;
    pkg/owner/manager.go) sed -n '1,260p' "$f" ;;
    pkg/domain/serverinfo/syncer.go) sed -n '1,260p' "$f" ;;
  esac
  echo
done

Repository: pingcap/tidb

Length of output: 28000


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Read the startup validation and the exact callers around the renew path.
sed -n '700,760p' cmd/tidb-server/main.go
printf '\n===== owner renew path =====\n'
sed -n '430,470p' pkg/owner/manager.go
printf '\n===== serverinfo renew path =====\n'
sed -n '105,130p' pkg/domain/serverinfo/syncer.go

Repository: pingcap/tidb

Length of output: 5007


🏁 Script executed:

#!/bin/bash
set -euo pipefail

python3 - <<'PY'
import socket

tests = ["0.0.0.0", "::", "::0", "0:0:0:0:0:0:0:0", "127.0.0.1", "2001:db8::1"]
for t in tests:
    for fam, name in [(socket.AF_INET, "v4"), (socket.AF_INET6, "v6")]:
        try:
            packed = socket.inet_pton(fam, t)
            print(f"{t!r} -> {name} len={len(packed)} hex={packed.hex()}")
        except OSError:
            pass
PY

Repository: pingcap/tidb

Length of output: 465


Keep advertise-address from requiring a local interface.

This guard runs in startup and session-renewal paths (cmd/tidb-server/main.go, pkg/owner/manager.go, pkg/domain/serverinfo/syncer.go), so any Docker/NAT/Kubernetes setup that advertises a reachable host/LB IP not present on the container’s interfaces will fail to start or keep renewing. If the goal is to detect host-IP changes, compare against the originally resolved local IP instead of enforcing universal interface membership.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/util/misc.go` around lines 556 - 589, Keep IsIPBoundLocally from
requiring the advertise-address to exist on a current network interface;
instead, adjust the local-IP check to compare against the originally resolved
local IP rather than iterating net.InterfaceAddrs in pkg/util/misc.go. Update
the call sites that rely on this guard in startup and renewal flows
(cmd/tidb-server/main.go, pkg/owner/manager.go, pkg/domain/serverinfo/syncer.go)
so Docker/NAT/Kubernetes deployments with reachable but non-local advertised IPs
do not fail. Preserve the special-case handling for bind-all addresses and
invalid IP parsing.

Comment thread pkg/util/misc.go
Comment on lines +557 to +560
// Handle bind-all addresses
if ipAddr == "0.0.0.0" || ipAddr == "::" {
return true
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Bind-all detection misses non-canonical IPv6 representations.

The bind-all check is a raw string comparison against "0.0.0.0"/"::". Equivalent but differently-formatted addresses (e.g. "0:0:0:0:0:0:0:0", "::0", "0000:0000:0000:0000:0000:0000:0000:0000") will bypass this special case, fall through to net.ParseIP/interface comparison, and be incorrectly rejected as "not bound locally" even though they mean the same address.

Parsing first and checking IsUnspecified() avoids this entirely and also removes the need for callers to duplicate this same wildcard string check before invoking IsIPBoundLocally.

🐛 Proposed fix
 func IsIPBoundLocally(ipAddr string) bool {
-	// Handle bind-all addresses
-	if ipAddr == "0.0.0.0" || ipAddr == "::" {
-		return true
-	}
-
 	// Parse the input IP
 	targetIP := net.ParseIP(ipAddr)
 	if targetIP == nil {
 		// Invalid IP format
 		return false
 	}
+
+	// Handle bind-all addresses (covers all equivalent textual forms of 0.0.0.0/::)
+	if targetIP.IsUnspecified() {
+		return true
+	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Handle bind-all addresses
if ipAddr == "0.0.0.0" || ipAddr == "::" {
return true
}
// Parse the input IP
targetIP := net.ParseIP(ipAddr)
if targetIP == nil {
// Invalid IP format
return false
}
// Handle bind-all addresses (covers all equivalent textual forms of 0.0.0.0/::)
if targetIP.IsUnspecified() {
return true
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/util/misc.go` around lines 557 - 560, The bind-all check in
IsIPBoundLocally is using raw string comparisons for "0.0.0.0" and "::", so
equivalent IPv6 wildcard forms can be missed. Update the logic in misc.go’s
IsIPBoundLocally to parse ipAddr first and use the parsed IP’s IsUnspecified()
check before any interface/local-address comparison, so all wildcard
representations are handled consistently and callers don’t need to duplicate
this special-case check.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contribution This PR is from a community contributor. do-not-merge/needs-linked-issue do-not-merge/needs-tests-checked first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant