cluster_link: Schema Registry API-mode replication failover via paused #30984
Open
bartoszpiekny-redpanda wants to merge 6 commits into
Add a user-settable `paused` field to SchemaRegistrySyncOptions, matching the sibling sync options (topic-metadata, consumer-offset, security, role). Pausing stops the Schema Registry sync task and, for API-mode shadowing, lifts the per-context client write protection; it is also set when the link's Schema Registry replication is failed over. Regenerates the ducktape Python bindings.
Add a `paused` bool to schema_registry_sync_config and wire the admin converter both directions (proto get_paused/set_paused), mirroring the sibling sync configs. The serde envelope is bumped to v2 with a version-gated read so pre-v2 records default to not-paused. No behavior change yet: later commits consume `paused` to lift the API-mode client write block and pause the sync task.
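The version-gated read described above can be illustrated with a minimal Python model. This is only a sketch: the one-byte version header and field layout are a toy format and not Redpanda's actual serde wire encoding, and `SchemaRegistrySyncConfig`, `encode`, and `decode` are hypothetical names chosen for illustration.

```python
import struct
from dataclasses import dataclass


@dataclass
class SchemaRegistrySyncConfig:
    # Hypothetical stand-in for schema_registry_sync_config; only two fields modeled.
    api_mode: bool = False
    paused: bool = False


def encode(cfg, version=2):
    # Toy envelope: 1-byte version, then the fields that exist at that version.
    payload = struct.pack("<B?", version, cfg.api_mode)
    if version >= 2:
        # `paused` is only written from v2 onwards.
        payload += struct.pack("<?", cfg.paused)
    return payload


def decode(buf):
    version, api_mode = struct.unpack_from("<B?", buf, 0)
    paused = False  # default for pre-v2 records, which carry no `paused` byte
    if version >= 2:
        (paused,) = struct.unpack_from("<?", buf, 2)
    return SchemaRegistrySyncConfig(api_mode=api_mode, paused=paused)
```

In this model a record written at v1 decodes with `paused` set to false regardless of what the writer held in memory, which is the property that lets old records be read after an upgrade.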
link_disables_client_writes now returns "not blocked" for an API-mode Schema Registry link whose config is paused: replication has stopped, so the contexts the link owned are handed back to clients. Topic-mode and non-paused API-mode links are unchanged, and the any_of across links still blocks a context owned by any non-paused link.
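As a rough illustration of the rule above, the following Python sketch models only API-mode links; topic-mode handling is deliberately left out. `ApiModeLink`, `link_blocks_context`, `context_is_write_blocked`, and the context names are assumptions made for the example, not identifiers from the codebase.

```python
from dataclasses import dataclass, field


@dataclass
class ApiModeLink:
    # Hypothetical model of an API-mode Schema Registry link.
    paused: bool
    contexts: set = field(default_factory=set)  # contexts the link owns


def link_blocks_context(link, context):
    # A paused link has stopped replicating, so it no longer blocks client writes.
    if link.paused:
        return False
    return context in link.contexts


def context_is_write_blocked(links, context):
    # A context stays blocked if any non-paused link still owns it.
    return any(link_blocks_context(link, context) for link in links)
```

With one paused link owning `ctx1` and `ctx2` and a second, non-paused link owning `ctx2`, this model reports `ctx1` as writable and `ctx2` as still blocked.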
mirroring_task::is_enabled() now also requires the config to not be paused. A paused API-mode link disables the task, so the base reconciler pauses it (should_pause = !is_enabled && should_start_impl) while the shard still leads _schemas/0, and resumes it when un-paused.
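The predicate above can be sketched in Python as follows. This is a simplified model under stated assumptions: `other_conditions_met` stands in for whatever `is_enabled()` already required, `leads_schemas_partition` stands in for `should_start_impl`, and the `"stopped"` state is an assumed name for the case where the task should not run at all.

```python
def is_enabled(other_conditions_met, paused):
    # The task is enabled only if its prior conditions hold and the config is not paused.
    return other_conditions_met and not paused


def should_pause(enabled, should_start_impl):
    # Mirrors the expression quoted in the commit message.
    return (not enabled) and should_start_impl


def desired_state(other_conditions_met, paused, leads_schemas_partition):
    enabled = is_enabled(other_conditions_met, paused)
    if should_pause(enabled, leads_schemas_partition):
        return "paused"
    if enabled and leads_schemas_partition:
        return "active"
    return "stopped"  # assumed state name; not taken from the source
```

Flipping `paused` from true back to false while the shard keeps leadership moves the model from `"paused"` to `"active"`, matching the resume behavior described in the commit message.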
A full-link failover (empty shadow_topic_name) of an API-mode Schema Registry link now also pauses its SR config via a get-modify-write config update, so replication stops and the client write protection on the link's contexts is lifted. The update is idempotent (skipped if already paused) and reuses update_cluster_link, so no dedicated command is needed. Topic-mode SR failover is unchanged (handled by failover_link_topics). Verified end to end by ducktape (see the SR write-blocking suite).
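The idempotent get-modify-write flow can be sketched as below. `FakeLinkStore` is a hypothetical in-memory stand-in for the link's stored configuration, the dictionary layout is invented for the example, and `pause_sr_on_full_failover` is an illustrative name; only `update_cluster_link` and `shadow_topic_name` come from the commit message.

```python
import copy


class FakeLinkStore:
    # Hypothetical in-memory stand-in for the link's stored configuration.
    def __init__(self, config):
        self._config = copy.deepcopy(config)
        self.updates = 0  # counts how many config updates were issued

    def get(self):
        return copy.deepcopy(self._config)

    def update_cluster_link(self, config):
        self._config = copy.deepcopy(config)
        self.updates += 1


def pause_sr_on_full_failover(store, shadow_topic_name):
    # Only a full-link failover (empty topic name) touches the SR config.
    if shadow_topic_name:
        return False
    cfg = store.get()  # get
    sr = cfg["schema_registry_sync"]
    if sr["mode"] != "api" or sr["paused"]:
        return False  # topic-mode, or already paused: skip the update
    sr["paused"] = True  # modify
    store.update_cluster_link(cfg)  # write
    return True
```

Calling the function twice on the same store issues a single update in this model, which is the idempotence property the commit message describes.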
Add end-to-end coverage to the SR write-blocking suite for the `paused` flag introduced across this stack:
- full-link failover (`rpk shadow failover --all`) sets `paused` and lifts client write protection on the link's owned contexts;
- `paused` is user-settable via `update_shadow_link` and toggles the block;
- the `paused` flag survives a full target-cluster restart.
7a99b20 to
d6c84ab
Pull request overview
Adds a user-controlled paused flag for API-mode Schema Registry replication on cluster links, and uses it to implement failover behavior that both stops the SR sync task and lifts per-context client write blocking on the target cluster.
Changes:
- Extend Schema Registry sync config (model + admin API/proto) with a durable `paused` flag, including serde v2 read/write behavior.
- Update API-mode SR sync task enablement and write-blocking logic to respect `paused`, and set `paused` automatically on full-link failover.
- Add/extend unit + ducktape coverage for pause/unpause, failover behavior, and durability across restart.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/rptest/tests/cluster_linking_schema_registry_write_blocking_test.py | Adds ducktape coverage for API-mode SR failover pausing + write-unblocking, manual pause toggling, and persistence across restart. |
| tests/rptest/clients/admin/proto/redpanda/core/admin/v2/shadow_link_pb2.pyi | Updates Python protobuf typings for new paused field. |
| tests/rptest/clients/admin/proto/redpanda/core/admin/v2/shadow_link_pb2.py | Updates generated Python protobuf code for new paused field. |
| src/v/redpanda/admin/services/shadow_link/tests/converter_test.cc | Verifies admin <-> metadata conversion includes paused for API-mode SR configs. |
| src/v/redpanda/admin/services/shadow_link/shadow_link.cc | Sets paused on full-link failover for API-mode SR links (ends replication + lifts blocking). |
| src/v/redpanda/admin/services/shadow_link/converter.cc | Converts paused between admin proto and internal model config. |
| src/v/cluster/cluster_link/tests/shadow_link_write_blocking.cc | Adds unit tests verifying paused lifts client write blocking for API-mode contexts and doesn’t override other links’ ownership. |
| src/v/cluster/cluster_link/frontend.cc | Makes API-mode client write-blocking logic return “allowed” when the link is paused. |
| src/v/cluster_link/schema_registry_sync/tests/mirroring_task_test.cc | Adds unit test asserting paused config transitions SR mirroring task to paused and back to active. |
| src/v/cluster_link/schema_registry_sync/mirroring_task.cc | Disables SR mirroring task when config is paused. |
| src/v/cluster_link/model/types.h | Bumps schema registry sync config serde envelope to v2 and adds paused field. |
| src/v/cluster_link/model/types.cc | Copies/serializes/deserializes paused and includes it in formatting. |
| src/v/cluster_link/model/tests/test_model.cc | Adds serde/copy round-trip coverage for paused, including defaulting for older versions. |
| proto/redpanda/core/admin/v2/shadow_link.proto | Adds paused to SchemaRegistrySyncOptions in the admin API. |
CI test results on build#86593
Adds failover for API-mode SR replication. Topic-mode SR already fails over via the _schemas mirror status; API-mode had no equivalent, so after failover the sync task kept running and owned contexts stayed write-blocked.
A single user-settable `paused` field on the SR sync config drives both effects: it stops the sync task and lifts the per-context client write block. `rpk shadow failover --all` sets it for API-mode links; topic-mode is unchanged (`paused` is inert there).
No new controller command or feature bit is needed. `paused` rides the existing config-update command (the serde envelope goes from v1 to v2 with a version-gated read, so pre-v2 records default to not-paused), so there is no rolling-upgrade hazard. API-mode SR is already gated by the existing `shadow_link_sr_api_sync` feature, and since `paused` only matters on an API-mode link, it needs no gate of its own. `feature_table` is untouched.
Tests: unit (serde/copy, converter, write-block incl. multi-link, task pause/resume) + ducktape (failover unblocks & pauses, manual toggle, survives restart).
Backports Required
Release Notes