dxf: classify data errors in finished task metric#69507
Conversation
|
Skipping CI for Draft Pull Request. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThe scheduler now classifies finished-task metrics with Task-finished metric classification
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
/cherry-pick release-nextgen-20251011 |
|
@D3Hunter: once the present PR merges, I will cherry-pick it on top of release-nextgen-20251011/release-nextgen-202603 in the new PR and assign it to you. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #69507 +/- ##
================================================
- Coverage 76.3257% 74.4279% -1.8979%
================================================
Files 2041 2045 +4
Lines 561045 574870 +13825
================================================
- Hits 428222 427864 -358
- Misses 131922 146725 +14803
+ Partials 901 281 -620
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
🔍 Starting code review for this PR... |
|
/retest |
ingress-bot
left a comment
There was a problem hiding this comment.
This review was generated by AI and should be verified by a human reviewer.
Manual follow-up is recommended before merge.
Summary
- Total findings: 4
- Inline comments: 4
- Summary-only findings (no inline anchor): 0
Findings (highest risk first)
🟡 [Minor] (2)
- New
"data-error"metric label value breaks the single-word naming convention of the counter's existing labels (pkg/dxf/framework/scheduler/scheduler.go:54) - Reverted data-error tasks leave the
failedmetric series, silently lowering existing failure dashboards/alerts (pkg/dxf/framework/scheduler/scheduler.go:786, pkg/dxf/framework/dxfmetric/metric.go:70)
🧹 [Nit] (2)
- Test assertions use raw string literals for metric label constants defined in the same package (pkg/dxf/framework/scheduler/scheduler.go:52, pkg/dxf/framework/scheduler/scheduler_nokit_test.go:765)
- Implicit future-work intent in
isDataErrorForMetriccomment lacks a tracking reference (pkg/dxf/framework/scheduler/scheduler.go:800)
|
/retest |
|
/retest |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: joechenrh The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
[LGTM Timeline notifier]Timeline:
|
|
/hold |
What problem does this PR solve?
Issue Number: ref #61702
Problem Summary:
The DXF finished task metric currently classifies reverted tasks with user data errors as
failed, which can trigger failure alerts even when the underlying cause is bad input data. Import Into Lightning encode/value conversion errors and add-index duplicate-key errors should be separated from infrastructure or framework failures.What changed and how does it work?
Extract the metric label selection from
onTaskFinishedintogetMetricState.Add a new
data-errormetric state for reverted tasks whose error text matches known user-data failures:ErrEncodeKVandTruncated incorrect.[kv:1062]andDuplicate entry.The matching is intentionally string-based so the DXF scheduler does not depend on Lightning error definitions. A code comment records the real Lightning error example and the add-index duplicate-entry shape for future refactoring when the error definitions can be split out.
Check List
Tests
add index
import into
Unit and local validation commands:
Side effects
Documentation
The affected behavior is the label value emitted by the DXF finished task metric for the matched user-data error cases.
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
Summary by CodeRabbit