Skip to content

Conversation

@timvaillancourt
Copy link
Contributor

@timvaillancourt timvaillancourt commented Dec 1, 2025

Description

This PR adds a postponed_by_in_order_completions field to SHOW-like outputs from OnlineDDL, to make it clear to users how many migrations must execute in order, before a given migration (if any)

Originally I approached this improvement by simply adding a logline to the OnlineDDL executor, but I felt this was sub-optimal for these reasons:

  1. This information is hidden from users of vtctldclient OnlineDDL <db> show and show vitess_migrations, which to me doesn't address the main concern - visibility
  2. This logline becomes repeated/noisy
  3. A logline is produced by a single vitess component, and the user may not know which component produces this log

This change raises a valid question: what if you have a LOT of dependent migrations? The dependent_migrations text-field could grow quite large. If this is a concern perhaps we need a different approach, such as a count or just-a-bool, eg: dependent_migrations # or is_postponed_in_order 0/1 (or something)

Another valid question: could we add unit tests? Yes, however the existing code I'm modifying has no tests and my goal here was to make a small improvement. If we'd like to tackle implementing unit tests for the greater logic in this PR, that is possible 👍

cc @shlomi-noach for thoughts, as we originally discussed a simpler approach

show vitess_migrations like '<uuid>':
TBD

vtctldclient OnlineDDL commerce show:
TBD

Draft documentation PR: vitessio/website#2032

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Deployment Notes

AI Disclosure

@timvaillancourt timvaillancourt self-assigned this Dec 1, 2025
@timvaillancourt timvaillancourt added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: Online DDL Online DDL (vitess/native/gh-ost/pt-osc) labels Dec 1, 2025
@vitess-bot
Copy link
Contributor

vitess-bot bot commented Dec 1, 2025

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added NeedsBackportReason If backport labels have been applied to a PR, a justification is required NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Dec 1, 2025
@github-actions github-actions bot added this to the v24.0.0 milestone Dec 1, 2025
@timvaillancourt timvaillancourt removed NeedsIssue A linked issue is missing for this Pull Request NeedsBackportReason If backport labels have been applied to a PR, a justification is required labels Dec 1, 2025
Signed-off-by: Tim Vaillancourt <[email protected]>
@timvaillancourt timvaillancourt removed the NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work label Dec 1, 2025
@timvaillancourt timvaillancourt marked this pull request as ready for review December 1, 2025 13:20
@promptless
Copy link

promptless bot commented Dec 1, 2025

📝 Documentation updates detected!

New suggestion: Document new dependent_migrations field in OnlineDDL output

@timvaillancourt
Copy link
Contributor Author

This change raises a valid question: what if you have a LOT of dependent migrations? The dependent_migrations text-field could grow quite large. If this is a concern perhaps we need a different approach, such as a count or just-a-bool, eg: dependent_migrations # or is_postponed_in_order 0/1 (or something)

I'm starting to think this should just be a integer field like dependent_migrations int unsigned 🤔. It tells us less, but won't become a problem for displaying

Your thoughts are appreciated 🙇

@timvaillancourt
Copy link
Contributor Author

timvaillancourt commented Dec 1, 2025

Adding doc PR by promptless: vitessio/website#2032

Copy link
Contributor

@shlomi-noach shlomi-noach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have an issue with reviewMigrationDependencies() being called on every tick, which means the migration row will be updated at about every minute, or even more frequently.

Alternatively, we can choose to only update it at this section:

				if strategySetting.IsInOrderCompletion() {
					if len(pendingMigrationsUUIDs) > 0 && pendingMigrationsUUIDs[0] != onlineDDL.UUID {
						// wait for earlier pending migrations to complete
						return nil
					}
				}

found on https://github.com/timvaillancourt/vitess/blob/19d763ddf877c9d8ec3d2aadb6a32e2914adfb46/go/vt/vttablet/onlineddl/executor.go#L2903-L2908

this means the field will only be populated once the migration is ready to complete, not before (and then, repeatedly so).

On top of that, we can populate the field once right upon submission, or right when the migration is about to start running.

I'd avoid the current recurring update.

@shlomi-noach
Copy link
Contributor

Testing-wise, go/test/endtoend/onlineddl/scheduler/onlineddl_scheduler_test.go is the place to go.

@timvaillancourt
Copy link
Contributor Author

@shlomi-noach I've made some updates to the PR:

  1. Dependents are added during .SubmitMigration(...).
  2. If --in-order-migration, dependents are updated once-per-tick when the migration is postponed (return nil) for any reason - in order to get an up-to-date view
  3. Dependents are cleared on complete, failed or cancelled migrations
  4. Dependents are re-added on .RetryMigration(...)

Upon testing everything is working as expected. Let me know how that looks!

An open question: should we just use an integer of dependent migrations (less info/more compact) or keep the more-verbose CSV list 🤔?

Signed-off-by: Tim Vaillancourt <[email protected]>
Copy link
Member

@mattlord mattlord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will be a nice addition! We can't merge this, however, without any tests. I would like to see unit tests, I know there's no framework in place for it but Claude is quite good for that kind of thing.

At the very least, we should add to the existing "in order completion" end to end tests to confirm that this is behaving as expected/correctly. I'm happy to help with that if you like.

@timvaillancourt
Copy link
Contributor Author

timvaillancourt commented Dec 3, 2025

After some more thought on this PR, I'm going to simplify the approach pretty significantly

Because we know everything "ahead" of us is required to cut-over when --in-order-completion is used, I'm going to move this field from a text CSV-list -> just a int unsigned count of migrations we are waiting for. If the user wants to know what those migrations are, they can read the queue in order. This simpler approach will also reduce the number of times we are updating the internal table

I will also rename the field dependent_migrations -> postponed_completion_migrations, or similar. This new name re-uses existing terminology that should explain how many migrations are the cause of a postponed completion

@timvaillancourt timvaillancourt changed the title OnlineDDL: add dependent migrations to show output OnlineDDL: add in-order-completion count to show outputs Dec 3, 2025
@timvaillancourt
Copy link
Contributor Author

@Promptless please refresh the PR you created to consider commit: 580b714

Signed-off-by: Tim Vaillancourt <[email protected]>
Signed-off-by: Tim Vaillancourt <[email protected]>
@timvaillancourt
Copy link
Contributor Author

timvaillancourt commented Dec 3, 2025

The PR is now updated to just a count of migrations we will wait for in-order-completion, if any. It is refreshed each loop we are in "ready" state + waiting, when --in-order-completion is in use

Also the count is updated when we discover we were waiting, but we are no longer waiting

Outstanding:

  • More testing
  • Adding e2e test

Signed-off-by: Tim Vaillancourt <[email protected]>
Copy link
Member

@mattlord mattlord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the one comment/concern, which may just be due to my ignorance. 🙂 Otherwise it LGTM. Let me know what you think.

Copy link
Contributor

@shlomi-noach shlomi-noach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on board with changing this to be an integer. Truly all it takes is to make a human realize "oh, this migration is not completing because there's this other migration that our migration is waiting for". Whether a UUID or an int is not that important.

How would you feel about the name prior_in_order_pending_count? I think it's more descriptive than postponed_in_order_completions.

Would you like some guidance with e2e testing?

sm.VitessLivenessIndicator = row.AsInt64("vitess_liveness_indicator", 0)
sm.UserThrottleRatio = float32(row.AsFloat64("user_throttle_ratio", 0))
sm.SpecialPlan = row.AsString("special_plan", "")
sm.PostponedByInOrderCompletions = row.AsUint64("postponed_by_in_order_completions", 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in schema_migrations.sql the column is named postponed_in_order_completions, which is incompatible with the name used here.

Signed-off-by: Tim Vaillancourt <[email protected]>
Signed-off-by: Tim Vaillancourt <[email protected]>
Copy link
Member

@mattlord mattlord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a more thorough review this time and it LGTM.

We still need endtoend test additions/updates. IMO it's not ready for review w/o that.

@timvaillancourt
Copy link
Contributor Author

I did a more thorough review this time and it LGTM.

We still need endtoend test additions/updates. IMO it's not ready for review w/o that.

@mattlord / @shlomi-noach thanks. I've updated to a new column name and incorporated the PR suggestions

Unfortunately that change has broken some logic in unobvious ways - the count is now always zero 🤔. Still digging into that problem, then I'll add an end-to-end test

Copy link
Contributor

@shlomi-noach shlomi-noach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor change; I can't see why the latest two commits would break anything. Let's do the other way around: add a e2e validation in a few places, see what happens.

timvaillancourt and others added 3 commits December 9, 2025 14:12
Co-authored-by: Shlomi Noach <[email protected]>
Signed-off-by: Tim Vaillancourt <[email protected]>
Co-authored-by: Shlomi Noach <[email protected]>
Signed-off-by: Tim Vaillancourt <[email protected]>
Signed-off-by: Tim Vaillancourt <[email protected]>
@mattlord mattlord self-requested a review December 9, 2025 14:05
Signed-off-by: Tim Vaillancourt <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Component: Online DDL Online DDL (vitess/native/gh-ost/pt-osc) Type: Enhancement Logical improvement (somewhere between a bug and feature)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature Request: show --in-order-completion status in OnlineDDL outputs

3 participants