br: force-rebase autoid service after PiTR log replay for AUTO_ID_CAC…#69506
br: force-rebase autoid service after PiTR log replay for AUTO_ID_CAC…#69506vansh-py04 wants to merge 1 commit into
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Hi @vansh-py04. Thanks for your PR. I'm waiting for a pingcap member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
|
📝 WalkthroughWalkthroughAdds ChangesPiTR AUTO_ID_CACHE=1 allocator rebase fix
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[FORMAT CHECKER NOTIFICATION] Notice: To remove the For example:
📖 For more info, you can check the "Contribute Code" section in the development guide. |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@br/pkg/restore/log_client/client.go`:
- Around line 1592-1594: The persisted-end check in the restore client’s
allocator update path is too broad and skips valid unsigned auto-increment ends
when `ForceRebase` interprets `int64` values as `uint64`. In the `client.go`
logic that filters `currentEnd`, change the guard so only the uninitialized zero
value is skipped, and allow negative values to flow through to `ForceRebase` for
proper unsigned handling and validation.
- Around line 1588-1602: The rebase path in restoreStream is swallowing read and
ForceRebase failures, so the caller cannot tell when a table was not actually
rebased. Update the loop around the persisted auto-increment read and
alloc.ForceRebase in client.go to collect and return an error instead of only
logging and continuing, using the existing log and allocator flow to identify
the failing table. Keep the error contextual and actionable by annotating it
with the table/allocator information, or aggregate per-table failures if
multiple tables can fail, so restoreStream can fail fast when rebase does not
succeed.
In `@br/pkg/task/stream.go`:
- Around line 1911-1915: The AUTO_ID_CACHE=1 allocator rebase in stream.go is
happening after explicit-filter restores have already called
SetTableModeToNormal, which can make tables writable before the allocator state
is refreshed. Update the restore flow around the rebase call and
SetTableModeToNormal so the rebase in the restore path completes first, then
switch tables back to normal, or otherwise keep writes blocked until
RebaseAutoIDAllocatorsForSepAutoIncTables finishes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: b0ad2c27-f502-48b5-954b-b52035769edd
📒 Files selected for processing (2)
br/pkg/restore/log_client/client.gobr/pkg/task/stream.go
| if err != nil { | ||
| log.Warn("failed to read persisted auto-increment ID for rebase", ...) | ||
| continue | ||
| } | ||
| if currentEnd <= 0 { | ||
| continue | ||
| } | ||
| allocs := tbl.Allocators(nil) | ||
| for _, alloc := range allocs.Allocs { | ||
| if alloc.GetType() != autoid.AutoIncrementType { | ||
| continue | ||
| } | ||
| if err := alloc.ForceRebase(currentEnd); err != nil { | ||
| log.Warn("failed to force rebase auto-increment allocator after PiTR", ...) | ||
| continue |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Return rebase failures instead of only logging them.
restoreStream treats this method as a hard gate, but read/ForceRebase failures are swallowed and the method still returns nil, so restore can complete with stale AUTO_ID_CACHE=1 allocators and the original duplicate-key risk. Return an annotated error, or aggregate per-table errors, when a target table cannot be rebased.
Suggested direction
- if err != nil {
- log.Warn("failed to read persisted auto-increment ID for rebase", ...)
- continue
- }
+ if err != nil {
+ return errors.Annotatef(err,
+ "failed to read persisted auto-increment ID for rebase, schemaID=%d, tableID=%d, tableName=%s",
+ dbReplace.DbID, tableReplace.TableID, tableReplace.Name)
+ }
...
- if err := alloc.ForceRebase(currentEnd); err != nil {
- log.Warn("failed to force rebase auto-increment allocator after PiTR", ...)
- continue
- }
+ if err := alloc.ForceRebase(currentEnd); err != nil {
+ return errors.Annotatef(err,
+ "failed to force rebase auto-increment allocator after PiTR, schemaID=%d, tableID=%d, tableName=%s, currentEnd=%d",
+ dbReplace.DbID, tableReplace.TableID, tableReplace.Name, currentEnd)
+ }As per coding guidelines, Go code should keep error handling actionable and contextual and avoid silently swallowing errors.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if err != nil { | |
| log.Warn("failed to read persisted auto-increment ID for rebase", ...) | |
| continue | |
| } | |
| if currentEnd <= 0 { | |
| continue | |
| } | |
| allocs := tbl.Allocators(nil) | |
| for _, alloc := range allocs.Allocs { | |
| if alloc.GetType() != autoid.AutoIncrementType { | |
| continue | |
| } | |
| if err := alloc.ForceRebase(currentEnd); err != nil { | |
| log.Warn("failed to force rebase auto-increment allocator after PiTR", ...) | |
| continue | |
| if err != nil { | |
| return errors.Annotatef(err, | |
| "failed to read persisted auto-increment ID for rebase, schemaID=%d, tableID=%d, tableName=%s", | |
| dbReplace.DbID, tableReplace.TableID, tableReplace.Name) | |
| } | |
| if currentEnd <= 0 { | |
| continue | |
| } | |
| allocs := tbl.Allocators(nil) | |
| for _, alloc := range allocs.Allocs { | |
| if alloc.GetType() != autoid.AutoIncrementType { | |
| continue | |
| } | |
| if err := alloc.ForceRebase(currentEnd); err != nil { | |
| return errors.Annotatef(err, | |
| "failed to force rebase auto-increment allocator after PiTR, schemaID=%d, tableID=%d, tableName=%s, currentEnd=%d", | |
| dbReplace.DbID, tableReplace.TableID, tableReplace.Name, currentEnd) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@br/pkg/restore/log_client/client.go` around lines 1588 - 1602, The rebase
path in restoreStream is swallowing read and ForceRebase failures, so the caller
cannot tell when a table was not actually rebased. Update the loop around the
persisted auto-increment read and alloc.ForceRebase in client.go to collect and
return an error instead of only logging and continuing, using the existing log
and allocator flow to identify the failing table. Keep the error contextual and
actionable by annotating it with the table/allocator information, or aggregate
per-table failures if multiple tables can fail, so restoreStream can fail fast
when rebase does not succeed.
Source: Coding guidelines
| if currentEnd <= 0 { | ||
| continue | ||
| } |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Do not skip negative persisted ends for unsigned auto-increment tables.
ForceRebase has an unsigned path that interprets int64 values through uint64, so negative currentEnd values can represent valid unsigned allocator ends above math.MaxInt64. Skipping all <= 0 leaves those tables stale; only skip the uninitialized zero value and let ForceRebase report invalid sentinels.
Suggested fix
- if currentEnd <= 0 {
+ if currentEnd == 0 {
continue
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if currentEnd <= 0 { | |
| continue | |
| } | |
| if currentEnd == 0 { | |
| continue | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@br/pkg/restore/log_client/client.go` around lines 1592 - 1594, The
persisted-end check in the restore client’s allocator update path is too broad
and skips valid unsigned auto-increment ends when `ForceRebase` interprets
`int64` values as `uint64`. In the `client.go` logic that filters `currentEnd`,
change the guard so only the uninitialized zero value is skipped, and allow
negative values to flow through to `ForceRebase` for proper unsigned handling
and validation.
| // Force-rebase autoid service allocators for AUTO_ID_CACHE=1 tables so | ||
| // their in-memory state reflects the PiTR-replayed persisted IID value. | ||
| if err = client.RebaseAutoIDAllocatorsForSepAutoIncTables(ctx, schemasReplace); err != nil { | ||
| return errors.Trace(err) | ||
| } |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift
Rebase before tables are made writable again.
For explicit-filter restores, SetTableModeToNormal runs before this block, so a table can become normal while its AUTO_ID_CACHE=1 allocator is still stale. Move the rebase before the mode switch, or otherwise keep writes blocked until after this step completes.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@br/pkg/task/stream.go` around lines 1911 - 1915, The AUTO_ID_CACHE=1
allocator rebase in stream.go is happening after explicit-filter restores have
already called SetTableModeToNormal, which can make tables writable before the
allocator state is refreshed. Update the restore flow around the rebase call and
SetTableModeToNormal so the rebase in the restore path completes first, then
switch tables back to normal, or otherwise keep writes blocked until
RebaseAutoIDAllocatorsForSepAutoIncTables finishes.
What problem does this PR solve?
Issue Number: close #69485
Problem Summary:
After PiTR (point-in-time recovery) restore, tables with
AUTO_ID_CACHE=1can produceduplicate key errors on INSERT because the autoid service's in-memory allocator is stale.
Root cause: PiTR log replay writes the correct auto-increment ID (e.g., 8000) directly to
TiKV via raw KV writes. However, the autoid service may have already cached the value from
the full snapshot restore (e.g., 4000) and batch-allocated a range [4001, 8000]. After log
replay, the service still thinks it owns that range and allocates IDs that collide with
restored rows.
What changed and how does it work?
After PiTR log replay completes and schema reload finishes, iterate all restored tables
with
SepAutoInc() == true(AUTO_ID_CACHE=1). For each, read the persisted IID directlyfrom TiKV (bypassing the service cache) and call
ForceRebaseto synchronize the service'sin-memory state.
This is safe because:
ForceRebaseis idempotent when the persisted value already matchesre-read from TiKV when their local cache is exhausted
transitions allow writes
Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
Summary by CodeRabbit