fix(ci): gate Stage_3 deploy on MCR image availability instead of Stage_2 completion #1727
Open
zanejohnson-azure wants to merge 1 commit into
…ge_2 completion

Stage_2's Ev2 SDP rollout does not complete for ~24h due to the bake/monitoring window, but the ama-logs images are published to MCR early in the rollout. Decouple Stage_3 (dependsOn: []) and add a WaitForMCRImages gate job that polls MCR for the new Linux and Windows image tags, so cluster deploys start as soon as the images are available. Mirrors the ama-metrics release pipeline.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Problem
`Stage_3` (Deploy ama-logs to CI AKS Prod Clusters) depends on `Stage_2`. But `Stage_2`'s Ev2 SDP rollout does not "complete" for ~24h because of the bake/monitoring window, even though the ama-logs images are published to MCR early in that rollout. As a result, `Stage_3` waits ~24h before it can even start deploying.

Change

Decouple `Stage_3` from `Stage_2` and gate the deploys on actual image availability on MCR instead, mirroring the ama-metrics release pipeline.

- `Stage_3`: `dependsOn: Stage_2` → `dependsOn: []`; dropped `eq(dependencies.Stage_2.result, 'Succeeded')` from the condition (other guards retained: not-PR, main branch, non-empty tag).
- New `WaitForMCRImages` gate job that polls MCR until both the Linux (`$(AgentImageTagSuffix)`) and Windows (`win-$(AgentImageTagSuffix)`) tags exist under `mcr.microsoft.com/azuremonitor/containerinsights/ciprod` (up to 24h, 1440 × 60s).
- Deploy jobs: `dependsOn: ['WaitForMCRImages']`, so Helm only runs once the images are confirmed on MCR.

Note: the check queries the manifest endpoint directly (HTTP 200 = exists) rather than grepping `/tags/list`. The `ciprod` repo currently has ~494 tags, and a substring grep would false-match (e.g. `3.4.0` inside `3.4.01` or `win-3.4.0`). Manifest lookup is exact.

Local verification
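The substring false match described in the note can be reproduced offline. The sketch below is illustrative only: the tag list is a hypothetical stand-in for a `/tags/list` response, and `has_tag_substring` / `has_tag_exact` are made-up helper names, not functions from the pipeline.

```shell
# Hypothetical tag list standing in for a /tags/list response.
# Note that the exact tag "3.4.0" is NOT present.
tags='3.4.01
win-3.4.0
3.3.9'

# Substring (regex) match: succeeds whenever the pattern appears anywhere in a line.
has_tag_substring() {
  printf '%s\n' "$tags" | grep -q -- "$1"
}

# Fixed-string, whole-line match: succeeds only when a line equals the tag exactly.
has_tag_exact() {
  printf '%s\n' "$tags" | grep -Fxq -- "$1"
}

if has_tag_substring '3.4.0'; then
  echo 'substring grep: match (false positive)'
fi
if ! has_tag_exact '3.4.0'; then
  echo 'exact match: tag not found'
fi
```

The pipeline avoids the question altogether by asking the registry for the manifest of one specific tag, which is exact by construction.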
The `check_tag` function and retry loop were tested locally against live MCR using GNU bash 5.2.21.

Test script (core `check_tag`/loop is verbatim from the pipeline; the rest is the harness)

Output (live MCR):
Result: ✅ Succeeds only when both Linux and Windows tags exist on MCR; correctly fails on a missing tag; exact manifest match avoids substring false positives.
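
For readers without access to the pipeline, the following is a minimal sketch of what a manifest-based `check_tag` and retry loop could look like. It is not the script from this PR: `http_status`, `wait_for_tags`, and the `REGISTRY`/`REPO` variables are hypothetical names, and the URL assumes MCR serves the standard registry v2 path `/v2/<repo>/manifests/<tag>`.

```shell
# Sketch only; names and URL layout are assumptions, not the pipeline's code.
REGISTRY="${REGISTRY:-https://mcr.microsoft.com}"
REPO="${REPO:-azuremonitor/containerinsights/ciprod}"

# Print the HTTP status code for a URL (curl prints 000 if the request fails).
# Kept as its own function so it can be replaced by a stub in tests.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' \
    -H 'Accept: application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.oci.image.index.v1+json' \
    "$1"
}

# Succeed only if the registry returns HTTP 200 for this exact tag's manifest.
check_tag() {
  [ "$(http_status "${REGISTRY}/v2/${REPO}/manifests/$1")" = "200" ]
}

# Poll until every listed tag exists, or give up after max_attempts.
# Usage: wait_for_tags <max_attempts> <sleep_seconds> <tag>...
wait_for_tags() {
  local max_attempts="$1"
  local sleep_seconds="$2"
  shift 2
  local attempt=1
  local tag missing
  while [ "$attempt" -le "$max_attempts" ]; do
    missing=0
    for tag in "$@"; do
      check_tag "$tag" || missing=1
    done
    if [ "$missing" -eq 0 ]; then
      echo "all tags present after ${attempt} attempt(s)"
      return 0
    fi
    # Sleep between attempts, but not after the final one.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$sleep_seconds"
    fi
    attempt=$((attempt + 1))
  done
  echo "timed out after ${max_attempts} attempt(s)" >&2
  return 1
}
```

With these definitions, `wait_for_tags 1440 60 "$tag" "win-$tag"` would correspond to the 24h budget (1440 × 60s) described above. Splitting out `http_status` lets the loop be exercised against a stub instead of live MCR.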