br: force-rebase autoid service after PiTR log replay for AUTO_ID_CAC… by vansh-py04 · Pull Request #69506 · pingcap/tidb

vansh-py04 · 2026-06-28T20:17:31Z

What problem does this PR solve?

Issue Number: close #69485

Problem Summary:
After PiTR (point-in-time recovery) restore, tables with AUTO_ID_CACHE=1 can produce
duplicate key errors on INSERT because the autoid service's in-memory allocator is stale.

Root cause: PiTR log replay writes the correct auto-increment ID (e.g., 8000) directly to
TiKV via raw KV writes. However, the autoid service may have already cached the value from
the full snapshot restore (e.g., 4000) and batch-allocated a range [4001, 8000]. After log
replay, the service still thinks it owns that range and allocates IDs that collide with
restored rows.

What changed and how does it work?

After PiTR log replay completes and schema reload finishes, iterate all restored tables
with SepAutoInc() == true (AUTO_ID_CACHE=1). For each, read the persisted IID directly
from TiKV (bypassing the service cache) and call ForceRebase to synchronize the service's
in-memory state.

This is safe because:

ForceRebase is idempotent when the persisted value already matches
It only affects AUTO_ID_CACHE=1 tables; traditional allocators are range-based and
re-read from TiKV when their local cache is exhausted
It runs after schema reload ensures the info schema is current, and before table mode
transitions allow writes

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No need to test
- I checked and no code files have been changed.

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Summary by CodeRabbit

Bug Fixes
- Improved log restore handling for tables using separate auto-increment IDs, helping restored tables continue with the correct next ID.
- Added a post-restore step to align in-memory auto-increment state with persisted values before cleanup, reducing the chance of ID conflicts after recovery.

…HE=1 tables

ti-chi-bot · 2026-06-28T20:17:40Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign yujuncen for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

br/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot · 2026-06-28T20:17:42Z

Hi @vansh-py04. Thanks for your PR.

I'm waiting for a pingcap member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

pingcap-cla-assistant · 2026-06-28T20:17:45Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

coderabbitai · 2026-06-28T20:17:56Z

📝 Walkthrough

Walkthrough

Adds RebaseAutoIDAllocatorsForSepAutoIncTables to LogClient, which reads persisted auto-increment end values from TiKV for tables with SepAutoInc() enabled and force-rebases their in-memory allocators. This method is called in the log restore flow after schema reload to fix duplicate-key errors for AUTO_ID_CACHE=1 tables after PiTR.

Changes

PiTR AUTO_ID_CACHE=1 allocator rebase fix

Layer / File(s)	Summary
RebaseAutoIDAllocatorsForSepAutoIncTables implementation `br/pkg/restore/log_client/client.go`	Imports `meta/autoid` and adds the new method that iterates `schemaReplace` db/table entries, skips filtered tables, reads the persisted `currentEnd` from TiKV via a transaction, and calls `ForceRebase(currentEnd)` on each `AUTO_INCREMENT` allocator; logs warnings on failures.
Log restore call site `br/pkg/task/stream.go`	Calls `RebaseAutoIDAllocatorsForSepAutoIncTables` after schema reload completes and before `CleanUpKVFiles`; traces and returns any error.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

approved, lgtm

Suggested reviewers

YuJuncen
Leavrth

Poem

🐇 Hops through the TiKV store with glee,
Reads the currentEnd so carefully,
ForceRebase called, no dupes in sight,
AUTO_ID_CACHE=1 restored just right!
The allocator state is fixed — hooray!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main change: rebasing autoid service state after PiTR log replay for AUTO_ID_CACHE-related tables.
Description check	✅ Passed	The description follows the template well, includes the issue number, problem summary, change details, and release note section.
Linked Issues check	✅ Passed	The PR directly addresses `#69485` by rebasing AUTO_ID_CACHE=1 allocators after restore so inserts use the restored auto-increment value.
Out of Scope Changes check	✅ Passed	The changes are limited to the restore flow and autoid rebasing logic, with no obvious unrelated scope added.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

ti-chi-bot · 2026-06-28T20:23:10Z

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-tests-checked label, please finished the tests then check the finished items in description.

For example:

Tests <!-- At least one of them must be included. -->

- [x] Unit test
- [ ] Integration test
- [ ] Manual test (add detailed scripts or steps below)
- [ ] No code

‼️ Must keep the HTML comments 

_{📖 For more info, you can check the "Contribute Code" section in the development guide.}

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@br/pkg/restore/log_client/client.go`:
- Around line 1592-1594: The persisted-end check in the restore client’s
allocator update path is too broad and skips valid unsigned auto-increment ends
when `ForceRebase` interprets `int64` values as `uint64`. In the `client.go`
logic that filters `currentEnd`, change the guard so only the uninitialized zero
value is skipped, and allow negative values to flow through to `ForceRebase` for
proper unsigned handling and validation.
- Around line 1588-1602: The rebase path in restoreStream is swallowing read and
ForceRebase failures, so the caller cannot tell when a table was not actually
rebased. Update the loop around the persisted auto-increment read and
alloc.ForceRebase in client.go to collect and return an error instead of only
logging and continuing, using the existing log and allocator flow to identify
the failing table. Keep the error contextual and actionable by annotating it
with the table/allocator information, or aggregate per-table failures if
multiple tables can fail, so restoreStream can fail fast when rebase does not
succeed.

In `@br/pkg/task/stream.go`:
- Around line 1911-1915: The AUTO_ID_CACHE=1 allocator rebase in stream.go is
happening after explicit-filter restores have already called
SetTableModeToNormal, which can make tables writable before the allocator state
is refreshed. Update the restore flow around the rebase call and
SetTableModeToNormal so the rebase in the restore path completes first, then
switch tables back to normal, or otherwise keep writes blocked until
RebaseAutoIDAllocatorsForSepAutoIncTables finishes.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: b0ad2c27-f502-48b5-954b-b52035769edd

📥 Commits

Reviewing files that changed from the base of the PR and between 1d16fde and 8a8076e.

📒 Files selected for processing (2)

br/pkg/restore/log_client/client.go
br/pkg/task/stream.go

coderabbitai · 2026-06-28T20:25:02Z

+            if err != nil {
+                log.Warn("failed to read persisted auto-increment ID for rebase", ...)
+                continue
+            }
+            if currentEnd <= 0 {
+                continue
+            }
+            allocs := tbl.Allocators(nil)
+            for _, alloc := range allocs.Allocs {
+                if alloc.GetType() != autoid.AutoIncrementType {
+                    continue
+                }
+                if err := alloc.ForceRebase(currentEnd); err != nil {
+                    log.Warn("failed to force rebase auto-increment allocator after PiTR", ...)
+                    continue


🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Return rebase failures instead of only logging them.

restoreStream treats this method as a hard gate, but read/ForceRebase failures are swallowed and the method still returns nil, so restore can complete with stale AUTO_ID_CACHE=1 allocators and the original duplicate-key risk. Return an annotated error, or aggregate per-table errors, when a target table cannot be rebased.

Suggested direction

- if err != nil { - log.Warn("failed to read persisted auto-increment ID for rebase", ...) - continue - } + if err != nil { + return errors.Annotatef(err, + "failed to read persisted auto-increment ID for rebase, schemaID=%d, tableID=%d, tableName=%s", + dbReplace.DbID, tableReplace.TableID, tableReplace.Name) + } ... - if err := alloc.ForceRebase(currentEnd); err != nil { - log.Warn("failed to force rebase auto-increment allocator after PiTR", ...) - continue - } + if err := alloc.ForceRebase(currentEnd); err != nil { + return errors.Annotatef(err, + "failed to force rebase auto-increment allocator after PiTR, schemaID=%d, tableID=%d, tableName=%s, currentEnd=%d", + dbReplace.DbID, tableReplace.TableID, tableReplace.Name, currentEnd) + }

As per coding guidelines, Go code should keep error handling actionable and contextual and avoid silently swallowing errors.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if err != nil {

log.Warn("failed to read persisted auto-increment ID for rebase", ...)

continue

}

if currentEnd <= 0 {

continue

}

allocs := tbl.Allocators(nil)

for _, alloc := range allocs.Allocs {

if alloc.GetType() != autoid.AutoIncrementType {

continue

}

if err := alloc.ForceRebase(currentEnd); err != nil {

log.Warn("failed to force rebase auto-increment allocator after PiTR", ...)

continue

if err != nil {

return errors.Annotatef(err,

"failed to read persisted auto-increment ID for rebase, schemaID=%d, tableID=%d, tableName=%s",

dbReplace.DbID, tableReplace.TableID, tableReplace.Name)

}

if currentEnd <= 0 {

continue

}

allocs := tbl.Allocators(nil)

for _, alloc := range allocs.Allocs {

if alloc.GetType() != autoid.AutoIncrementType {

continue

}

if err := alloc.ForceRebase(currentEnd); err != nil {

return errors.Annotatef(err,

"failed to force rebase auto-increment allocator after PiTR, schemaID=%d, tableID=%d, tableName=%s, currentEnd=%d",

dbReplace.DbID, tableReplace.TableID, tableReplace.Name, currentEnd)

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@br/pkg/restore/log_client/client.go` around lines 1588 - 1602, The rebase path in restoreStream is swallowing read and ForceRebase failures, so the caller cannot tell when a table was not actually rebased. Update the loop around the persisted auto-increment read and alloc.ForceRebase in client.go to collect and return an error instead of only logging and continuing, using the existing log and allocator flow to identify the failing table. Keep the error contextual and actionable by annotating it with the table/allocator information, or aggregate per-table failures if multiple tables can fail, so restoreStream can fail fast when rebase does not succeed.

Source: Coding guidelines

coderabbitai · 2026-06-28T20:25:02Z

+            if currentEnd <= 0 {
+                continue
+            }


🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Do not skip negative persisted ends for unsigned auto-increment tables.

ForceRebase has an unsigned path that interprets int64 values through uint64, so negative currentEnd values can represent valid unsigned allocator ends above math.MaxInt64. Skipping all <= 0 leaves those tables stale; only skip the uninitialized zero value and let ForceRebase report invalid sentinels.

Suggested fix

- if currentEnd <= 0 { + if currentEnd == 0 { continue }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if currentEnd <= 0 {

continue

}

if currentEnd == 0 {

continue

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@br/pkg/restore/log_client/client.go` around lines 1592 - 1594, The persisted-end check in the restore client’s allocator update path is too broad and skips valid unsigned auto-increment ends when `ForceRebase` interprets `int64` values as `uint64`. In the `client.go` logic that filters `currentEnd`, change the guard so only the uninitialized zero value is skipped, and allow negative values to flow through to `ForceRebase` for proper unsigned handling and validation.

coderabbitai · 2026-06-28T20:25:02Z

+	// Force-rebase autoid service allocators for AUTO_ID_CACHE=1 tables so
+	// their in-memory state reflects the PiTR-replayed persisted IID value.
+	if err = client.RebaseAutoIDAllocatorsForSepAutoIncTables(ctx, schemasReplace); err != nil {
+    	return errors.Trace(err)
+	}


🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift

Rebase before tables are made writable again.

For explicit-filter restores, SetTableModeToNormal runs before this block, so a table can become normal while its AUTO_ID_CACHE=1 allocator is still stale. Move the rebase before the mode switch, or otherwise keep writes blocked until after this step completes.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@br/pkg/task/stream.go` around lines 1911 - 1915, The AUTO_ID_CACHE=1 allocator rebase in stream.go is happening after explicit-filter restores have already called SetTableModeToNormal, which can make tables writable before the allocator state is refreshed. Update the restore flow around the rebase call and SetTableModeToNormal so the rebase in the restore path completes first, then switch tables back to normal, or otherwise keep writes blocked until RebaseAutoIDAllocatorsForSepAutoIncTables finishes.

br: force-rebase autoid service after PiTR log replay for AUTO_ID_CAC…

8a8076e

…HE=1 tables

ti-chi-bot Bot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/needs-triage-completed do-not-merge/needs-tests-checked labels Jun 28, 2026

ti-chi-bot Bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. contribution This PR is from a community contributor. needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. labels Jun 28, 2026

coderabbitai Bot reviewed Jun 28, 2026

View reviewed changes

vldmit mentioned this pull request Jul 2, 2026

br: rebase autoid service after PiTR log replay for AUTO_ID_CACHE=1 tables #69573

Open

13 tasks

ti-chi-bot Bot removed the do-not-merge/needs-triage-completed label Jul 2, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

br: force-rebase autoid service after PiTR log replay for AUTO_ID_CAC…#69506

br: force-rebase autoid service after PiTR log replay for AUTO_ID_CAC…#69506
vansh-py04 wants to merge 1 commit into
pingcap:masterfrom
vansh-py04:fix/PiTR-to-restore-auto-incID

vansh-py04 commented Jun 28, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

ti-chi-bot Bot commented Jun 28, 2026

Uh oh!

ti-chi-bot Bot commented Jun 28, 2026

Uh oh!

pingcap-cla-assistant Bot commented Jun 28, 2026

Uh oh!

coderabbitai Bot commented Jun 28, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

Poem

Uh oh!

ti-chi-bot Bot commented Jun 28, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Uh oh!

coderabbitai Bot Jun 28, 2026

Uh oh!

coderabbitai Bot Jun 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

vansh-py04 commented Jun 28, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What problem does this PR solve?

What changed and how does it work?

Check List

Release note

Summary by CodeRabbit

Uh oh!

ti-chi-bot Bot commented Jun 28, 2026

Uh oh!

ti-chi-bot Bot commented Jun 28, 2026

Uh oh!

pingcap-cla-assistant Bot commented Jun 28, 2026

Uh oh!

coderabbitai Bot commented Jun 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

Poem

Uh oh!

ti-chi-bot Bot commented Jun 28, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vansh-py04 commented Jun 28, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 28, 2026 •

edited

Loading