Skip to content

br: force-rebase autoid service after PiTR log replay for AUTO_ID_CAC…#69506

Open
vansh-py04 wants to merge 1 commit into
pingcap:masterfrom
vansh-py04:fix/PiTR-to-restore-auto-incID
Open

br: force-rebase autoid service after PiTR log replay for AUTO_ID_CAC…#69506
vansh-py04 wants to merge 1 commit into
pingcap:masterfrom
vansh-py04:fix/PiTR-to-restore-auto-incID

Conversation

@vansh-py04

@vansh-py04 vansh-py04 commented Jun 28, 2026

Copy link
Copy Markdown

What problem does this PR solve?

Issue Number: close #69485

Problem Summary:
After PiTR (point-in-time recovery) restore, tables with AUTO_ID_CACHE=1 can produce
duplicate key errors on INSERT because the autoid service's in-memory allocator is stale.

Root cause: PiTR log replay writes the correct auto-increment ID (e.g., 8000) directly to
TiKV via raw KV writes. However, the autoid service may have already cached the value from
the full snapshot restore (e.g., 4000) and batch-allocated a range [4001, 8000]. After log
replay, the service still thinks it owns that range and allocates IDs that collide with
restored rows.

What changed and how does it work?

After PiTR log replay completes and schema reload finishes, iterate all restored tables
with SepAutoInc() == true (AUTO_ID_CACHE=1). For each, read the persisted IID directly
from TiKV (bypassing the service cache) and call ForceRebase to synchronize the service's
in-memory state.

This is safe because:

  • ForceRebase is idempotent when the persisted value already matches
  • It only affects AUTO_ID_CACHE=1 tables; traditional allocators are range-based and
    re-read from TiKV when their local cache is exhausted
  • It runs after schema reload ensures the info schema is current, and before table mode
    transitions allow writes

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Summary by CodeRabbit

  • Bug Fixes
    • Improved log restore handling for tables using separate auto-increment IDs, helping restored tables continue with the correct next ID.
    • Added a post-restore step to align in-memory auto-increment state with persisted values before cleanup, reducing the chance of ID conflicts after recovery.

@ti-chi-bot

ti-chi-bot Bot commented Jun 28, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign yujuncen for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. contribution This PR is from a community contributor. needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. labels Jun 28, 2026
@ti-chi-bot

ti-chi-bot Bot commented Jun 28, 2026

Copy link
Copy Markdown

Hi @vansh-py04. Thanks for your PR.

I'm waiting for a pingcap member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@pingcap-cla-assistant

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@coderabbitai

coderabbitai Bot commented Jun 28, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Adds RebaseAutoIDAllocatorsForSepAutoIncTables to LogClient, which reads persisted auto-increment end values from TiKV for tables with SepAutoInc() enabled and force-rebases their in-memory allocators. This method is called in the log restore flow after schema reload to fix duplicate-key errors for AUTO_ID_CACHE=1 tables after PiTR.

Changes

PiTR AUTO_ID_CACHE=1 allocator rebase fix

Layer / File(s) Summary
RebaseAutoIDAllocatorsForSepAutoIncTables implementation
br/pkg/restore/log_client/client.go
Imports meta/autoid and adds the new method that iterates schemaReplace db/table entries, skips filtered tables, reads the persisted currentEnd from TiKV via a transaction, and calls ForceRebase(currentEnd) on each AUTO_INCREMENT allocator; logs warnings on failures.
Log restore call site
br/pkg/task/stream.go
Calls RebaseAutoIDAllocatorsForSepAutoIncTables after schema reload completes and before CleanUpKVFiles; traces and returns any error.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

approved, lgtm

Suggested reviewers

  • YuJuncen
  • Leavrth

Poem

🐇 Hops through the TiKV store with glee,
Reads the currentEnd so carefully,
ForceRebase called, no dupes in sight,
AUTO_ID_CACHE=1 restored just right!
The allocator state is fixed — hooray!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: rebasing autoid service state after PiTR log replay for AUTO_ID_CACHE-related tables.
Description check ✅ Passed The description follows the template well, includes the issue number, problem summary, change details, and release note section.
Linked Issues check ✅ Passed The PR directly addresses #69485 by rebasing AUTO_ID_CACHE=1 allocators after restore so inserts use the restored auto-increment value.
Out of Scope Changes check ✅ Passed The changes are limited to the restore flow and autoid rebasing logic, with no obvious unrelated scope added.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@ti-chi-bot

ti-chi-bot Bot commented Jun 28, 2026

Copy link
Copy Markdown

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-tests-checked label, please finished the tests then check the finished items in description.

For example:

Tests <!-- At least one of them must be included. -->

- [x] Unit test
- [ ] Integration test
- [ ] Manual test (add detailed scripts or steps below)
- [ ] No code

‼️ Must keep the HTML comments <!-- At least one of them must be included. -->

📖 For more info, you can check the "Contribute Code" section in the development guide.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@br/pkg/restore/log_client/client.go`:
- Around line 1592-1594: The persisted-end check in the restore client’s
allocator update path is too broad and skips valid unsigned auto-increment ends
when `ForceRebase` interprets `int64` values as `uint64`. In the `client.go`
logic that filters `currentEnd`, change the guard so only the uninitialized zero
value is skipped, and allow negative values to flow through to `ForceRebase` for
proper unsigned handling and validation.
- Around line 1588-1602: The rebase path in restoreStream is swallowing read and
ForceRebase failures, so the caller cannot tell when a table was not actually
rebased. Update the loop around the persisted auto-increment read and
alloc.ForceRebase in client.go to collect and return an error instead of only
logging and continuing, using the existing log and allocator flow to identify
the failing table. Keep the error contextual and actionable by annotating it
with the table/allocator information, or aggregate per-table failures if
multiple tables can fail, so restoreStream can fail fast when rebase does not
succeed.

In `@br/pkg/task/stream.go`:
- Around line 1911-1915: The AUTO_ID_CACHE=1 allocator rebase in stream.go is
happening after explicit-filter restores have already called
SetTableModeToNormal, which can make tables writable before the allocator state
is refreshed. Update the restore flow around the rebase call and
SetTableModeToNormal so the rebase in the restore path completes first, then
switch tables back to normal, or otherwise keep writes blocked until
RebaseAutoIDAllocatorsForSepAutoIncTables finishes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: b0ad2c27-f502-48b5-954b-b52035769edd

📥 Commits

Reviewing files that changed from the base of the PR and between 1d16fde and 8a8076e.

📒 Files selected for processing (2)
  • br/pkg/restore/log_client/client.go
  • br/pkg/task/stream.go

Comment on lines +1588 to +1602
if err != nil {
log.Warn("failed to read persisted auto-increment ID for rebase", ...)
continue
}
if currentEnd <= 0 {
continue
}
allocs := tbl.Allocators(nil)
for _, alloc := range allocs.Allocs {
if alloc.GetType() != autoid.AutoIncrementType {
continue
}
if err := alloc.ForceRebase(currentEnd); err != nil {
log.Warn("failed to force rebase auto-increment allocator after PiTR", ...)
continue

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Return rebase failures instead of only logging them.

restoreStream treats this method as a hard gate, but read/ForceRebase failures are swallowed and the method still returns nil, so restore can complete with stale AUTO_ID_CACHE=1 allocators and the original duplicate-key risk. Return an annotated error, or aggregate per-table errors, when a target table cannot be rebased.

Suggested direction
-            if err != nil {
-                log.Warn("failed to read persisted auto-increment ID for rebase", ...)
-                continue
-            }
+            if err != nil {
+                return errors.Annotatef(err,
+                    "failed to read persisted auto-increment ID for rebase, schemaID=%d, tableID=%d, tableName=%s",
+                    dbReplace.DbID, tableReplace.TableID, tableReplace.Name)
+            }
...
-                if err := alloc.ForceRebase(currentEnd); err != nil {
-                    log.Warn("failed to force rebase auto-increment allocator after PiTR", ...)
-                    continue
-                }
+                if err := alloc.ForceRebase(currentEnd); err != nil {
+                    return errors.Annotatef(err,
+                        "failed to force rebase auto-increment allocator after PiTR, schemaID=%d, tableID=%d, tableName=%s, currentEnd=%d",
+                        dbReplace.DbID, tableReplace.TableID, tableReplace.Name, currentEnd)
+                }

As per coding guidelines, Go code should keep error handling actionable and contextual and avoid silently swallowing errors.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if err != nil {
log.Warn("failed to read persisted auto-increment ID for rebase", ...)
continue
}
if currentEnd <= 0 {
continue
}
allocs := tbl.Allocators(nil)
for _, alloc := range allocs.Allocs {
if alloc.GetType() != autoid.AutoIncrementType {
continue
}
if err := alloc.ForceRebase(currentEnd); err != nil {
log.Warn("failed to force rebase auto-increment allocator after PiTR", ...)
continue
if err != nil {
return errors.Annotatef(err,
"failed to read persisted auto-increment ID for rebase, schemaID=%d, tableID=%d, tableName=%s",
dbReplace.DbID, tableReplace.TableID, tableReplace.Name)
}
if currentEnd <= 0 {
continue
}
allocs := tbl.Allocators(nil)
for _, alloc := range allocs.Allocs {
if alloc.GetType() != autoid.AutoIncrementType {
continue
}
if err := alloc.ForceRebase(currentEnd); err != nil {
return errors.Annotatef(err,
"failed to force rebase auto-increment allocator after PiTR, schemaID=%d, tableID=%d, tableName=%s, currentEnd=%d",
dbReplace.DbID, tableReplace.TableID, tableReplace.Name, currentEnd)
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@br/pkg/restore/log_client/client.go` around lines 1588 - 1602, The rebase
path in restoreStream is swallowing read and ForceRebase failures, so the caller
cannot tell when a table was not actually rebased. Update the loop around the
persisted auto-increment read and alloc.ForceRebase in client.go to collect and
return an error instead of only logging and continuing, using the existing log
and allocator flow to identify the failing table. Keep the error contextual and
actionable by annotating it with the table/allocator information, or aggregate
per-table failures if multiple tables can fail, so restoreStream can fail fast
when rebase does not succeed.

Source: Coding guidelines

Comment on lines +1592 to +1594
if currentEnd <= 0 {
continue
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Do not skip negative persisted ends for unsigned auto-increment tables.

ForceRebase has an unsigned path that interprets int64 values through uint64, so negative currentEnd values can represent valid unsigned allocator ends above math.MaxInt64. Skipping all <= 0 leaves those tables stale; only skip the uninitialized zero value and let ForceRebase report invalid sentinels.

Suggested fix
-            if currentEnd <= 0 {
+            if currentEnd == 0 {
                 continue
             }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if currentEnd <= 0 {
continue
}
if currentEnd == 0 {
continue
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@br/pkg/restore/log_client/client.go` around lines 1592 - 1594, The
persisted-end check in the restore client’s allocator update path is too broad
and skips valid unsigned auto-increment ends when `ForceRebase` interprets
`int64` values as `uint64`. In the `client.go` logic that filters `currentEnd`,
change the guard so only the uninitialized zero value is skipped, and allow
negative values to flow through to `ForceRebase` for proper unsigned handling
and validation.

Comment thread br/pkg/task/stream.go
Comment on lines +1911 to +1915
// Force-rebase autoid service allocators for AUTO_ID_CACHE=1 tables so
// their in-memory state reflects the PiTR-replayed persisted IID value.
if err = client.RebaseAutoIDAllocatorsForSepAutoIncTables(ctx, schemasReplace); err != nil {
return errors.Trace(err)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift

Rebase before tables are made writable again.

For explicit-filter restores, SetTableModeToNormal runs before this block, so a table can become normal while its AUTO_ID_CACHE=1 allocator is still stale. Move the rebase before the mode switch, or otherwise keep writes blocked until after this step completes.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@br/pkg/task/stream.go` around lines 1911 - 1915, The AUTO_ID_CACHE=1
allocator rebase in stream.go is happening after explicit-filter restores have
already called SetTableModeToNormal, which can make tables writable before the
allocator state is refreshed. Update the restore flow around the rebase call and
SetTableModeToNormal so the rebase in the restore path completes first, then
switch tables back to normal, or otherwise keep writes blocked until
RebaseAutoIDAllocatorsForSepAutoIncTables finishes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contribution This PR is from a community contributor. do-not-merge/needs-tests-checked needs-ok-to-test Indicates a PR created by contributors and need ORG member send '/ok-to-test' to start testing. release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PiTR does not restore AUTO_ID_CACHE=1 table's auto-inc ID correctly.

1 participant