[Bug]: fix: update incorrect metrics in cache #1814
base: main
Conversation
Summary of Changes

Hello @googs1025, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request fixes a bug where the system failed to correctly update metrics in the cache because different versions of the underlying inference engines expose the same metric under different names. By introducing a flexible mapping that supports multiple candidate metric names, the changes let the system reliably fetch and process metrics, improving compatibility and data accuracy.
Code Review
This pull request introduces a significant improvement by allowing multiple candidate names for a single metric, which is crucial for maintaining compatibility with different versions of inference engines that may use different metric names. The core change from map[string]string to map[string][]string for EngineMetricsNameMapping is well-implemented across the codebase, including the necessary logic updates in the metric fetching and parsing functions. The changes are logical and well-documented, especially in pkg/metrics/types.go.
My review has identified a couple of areas for improvement:

- In pkg/metrics/engine_fetcher.go, the error propagation can be enhanced to provide more context when metric fetching fails after all retries.
- In pkg/metrics/metrics.go, there are a couple of redundant inline comments that can be removed to improve code clarity.
Overall, this is a solid contribution that improves the robustness of metric collection.
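To make the shape of that change concrete, here is a minimal, runnable sketch of the old and new mapping types (the metric name is taken from the diffs below; everything else is illustrative):

```go
package main

import "fmt"

func main() {
	// Before: exactly one raw metric name per engine.
	oldMapping := map[string]string{
		"vllm": "vllm:generation_tokens_total",
	}

	// After: an ordered candidate list per engine, tried first to last,
	// so a renamed metric can keep its deprecated name as a fallback.
	newMapping := map[string][]string{
		"vllm": {"vllm:generation_tokens_total"},
	}

	fmt.Println(oldMapping["vllm"], newMapping["vllm"])
}
```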
```diff
+	// Try each candidate until one exists and can be parsed
+	var lastErr error
+	for _, rawMetricName := range candidates {
+		if _, ok := allMetrics[rawMetricName]; !ok {
+			continue // skip if not present
+		}
+
+		metricValue, err := ef.parseMetricFromFamily(allMetrics, rawMetricName, metricDef)
+		if err != nil {
+			lastErr = err
+			klog.V(5).InfoS("Failed to parse candidate metric", "candidate", rawMetricName, "error", err)
+			continue
+		}
+
+		klog.V(4).InfoS("Successfully fetched typed metric from engine endpoint",
+			"identifier", identifier, "metric", metricName, "rawMetric", rawMetricName, "value", metricValue, "attempt", attempt+1)
+		return metricValue, nil
+	}
-	klog.V(4).InfoS("Successfully fetched typed metric from engine endpoint",
-		"identifier", identifier, "metric", metricName, "value", metricValue, "attempt", attempt+1)
-	return metricValue, nil
+	// If we get here, none of the candidates worked
+	errMsg := "none of the candidate metric names found or parsed successfully"
+	if lastErr != nil {
+		errMsg += ": " + lastErr.Error()
+	}
+	klog.V(4).InfoS("Failed to find valid metric among candidates",
+		"candidates", candidates, "identifier", identifier, "metric", metricName)
+	// Continue to next retry if any
 }
```
The lastErr variable, which captures parsing failures for candidate metrics, is scoped within each attempt of the retry loop. Consequently, if the final attempt fails due to a parsing error, this valuable information is lost, and a generic error message is returned at the end of the function. This makes debugging failures more difficult. To improve error reporting, consider preserving the last error across retries and including it in the final error message returned by the function.
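A minimal sketch of what that could look like, with lastErr hoisted above the retry loop (fetchWithRetries, tryCandidate, and maxRetries are stand-ins for the real code in pkg/metrics/engine_fetcher.go, not the actual aibrix signatures):

```go
package main

import (
	"errors"
	"fmt"
)

// fetchWithRetries is a stand-in for the retry loop; tryCandidate
// abstracts the existence check plus parse of one raw metric name.
func fetchWithRetries(candidates []string, maxRetries int,
	tryCandidate func(string) (float64, error)) (float64, error) {
	// Declared outside the retry loop so a parse failure on the
	// final attempt still reaches the error returned below.
	var lastErr error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		for _, name := range candidates {
			value, err := tryCandidate(name)
			if err != nil {
				lastErr = err // preserved across attempts
				continue
			}
			return value, nil
		}
	}
	if lastErr != nil {
		return 0, fmt.Errorf("all candidate metrics failed: %w", lastErr)
	}
	return 0, errors.New("no matching candidate metrics found")
}

func main() {
	_, err := fetchWithRetries([]string{"vllm:num_requests_running"}, 2,
		func(string) (float64, error) { return 0, errors.New("parse error") })
	fmt.Println(err) // all candidate metrics failed: parse error
}
```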
done
pkg/metrics/metrics.go
Outdated
```go
// This metric is no longer emitted by vLLM >= v0.7.0.
// TODO: Remove the deprecated vLLM metric names from EngineMetricsNameMapping
// once we confirm no deployments rely on vLLM < v0.7.0.
"vllm": {"vllm:avg_prompt_throughput_toks_per_s"}, //deprecated metrics https://github.com/vllm-project/vllm/pull/12383
```
The block comment above (lines 128-132) already provides a detailed explanation about the deprecated vLLM metric, including a link to the relevant pull request. This inline comment is redundant and can be removed to improve code clarity.
| "vllm": {"vllm:avg_prompt_throughput_toks_per_s"}, //deprecated metrics https://github.com/vllm-project/vllm/pull/12383 | |
| "vllm": {"vllm:avg_prompt_throughput_toks_per_s"}, |
done
pkg/metrics/metrics.go
Outdated
```go
// This metric is no longer emitted by vLLM >= v0.7.0.
// TODO: Remove the deprecated vLLM metric names from EngineMetricsNameMapping
// once we confirm no deployments rely on vLLM < v0.7.0.
"vllm": {"vllm:avg_generation_throughput_toks_per_s"}, // deprecated metrics https://github.com/vllm-project/vllm/pull/12383
```
Similar to the comment on AvgPromptThroughputToksPerS, the block comment on lines 144-148 already explains in detail that this metric is deprecated. This inline comment is redundant and can be removed for better readability.
| "vllm": {"vllm:avg_generation_throughput_toks_per_s"}, // deprecated metrics https://github.com/vllm-project/vllm/pull/12383 | |
| "vllm": {"vllm:avg_generation_throughput_toks_per_s"}, |
done
Force-pushed f8caf24 to d3e74e9
Pull request overview
This PR fixes incorrect metric configurations in the cache layer by updating the metric name mapping structure and correcting metric type classifications. The primary change converts EngineMetricsNameMapping from map[string]string to map[string][]string to support fallback metric names across different versions of inference engines (e.g., vLLM).
Key Changes:
- Introduced fallback mechanism for metric names to handle version differences in inference engines
- Added comprehensive documentation explaining the fallback ordering strategy
- Updated metric type classifications for several metrics
- Added support for SGLang queue_time_seconds metric mapping
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| pkg/metrics/types.go | Changed EngineMetricsNameMapping from map[string]string to map[string][]string with detailed documentation explaining the fallback mechanism |
| pkg/metrics/metrics.go | Updated all metric definitions to use slice-based mapping, added vLLM version compatibility comments, reorganized metrics by type, and added SGLang support for RequestQueueTimeSeconds |
| pkg/metrics/engine_fetcher.go | Implemented candidate iteration logic to try multiple metric names in order, improved error handling with lastErr tracking |
| pkg/cache/cache_metrics.go | Updated fetchMetrics to iterate through candidate metric names instead of single lookup |
| pkg/metrics/engine_fetcher_test.go | Updated test fixtures to use new slice-based mapping structure |
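The cache-side change in pkg/cache/cache_metrics.go presumably follows the same first-match-wins pattern as the fetcher; a sketch under that assumption (the real code iterates Prometheus metric families, not a plain map of floats, and these names are simplified):

```go
package main

import "fmt"

// pickMetric returns the value of the first candidate name present in
// the scraped metrics. Illustrative only: types and names are simplified.
func pickMetric(scraped map[string]float64, candidates []string) (float64, bool) {
	for _, name := range candidates {
		if v, ok := scraped[name]; ok {
			return v, true
		}
	}
	return 0, false
}

func main() {
	scraped := map[string]float64{"vllm:num_requests_running": 3}
	if v, ok := pickMetric(scraped, []string{"vllm:num_requests_running"}); ok {
		fmt.Println("value:", v) // value: 3
	}
}
```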
```go
return nil, fmt.Errorf("failed to fetch typed metric %s from engine endpoint %s after %d attempts: %w",
	metricName, identifier, ef.config.MaxRetries+1, lastErr)
```
Copilot AI (Dec 2, 2025)
Potential nil error wrapping in error message
If none of the candidate metrics are found in the endpoint response (all candidates fail the existence check at line 135), lastErr will remain nil. Using %w with a nil error will result in a confusing error message like "... after 3 attempts: %!w()".
Suggested fix:
```go
if lastErr != nil {
	return nil, fmt.Errorf("failed to fetch typed metric %s from engine endpoint %s after %d attempts: %w",
		metricName, identifier, ef.config.MaxRetries+1, lastErr)
}
return nil, fmt.Errorf("failed to fetch typed metric %s from engine endpoint %s after %d attempts: no matching candidates found",
	metricName, identifier, ef.config.MaxRetries+1)
```
| "sglang": "sglang:gen_throughput", | ||
| EngineMetricsNameMapping: map[string][]string{ | ||
| // vLLM deprecated "vllm:avg_generation_throughput_toks_per_s" in v0.7.0. | ||
| // It was removed entirely in PR #12383(https://github.com/vllm-project/vllm/pull/12383). |
Copilot AI (Dec 2, 2025)
[nitpick] Missing space in PR reference
There's a missing space between the PR number and the opening parenthesis. For consistency with the comment on line 129, this should be:
// It was removed entirely in PR #12383 (https://github.com/vllm-project/vllm/pull/12383).
```diff
-// It was removed entirely in PR #12383(https://github.com/vllm-project/vllm/pull/12383).
+// It was removed entirely in PR #12383 (https://github.com/vllm-project/vllm/pull/12383).
```
```diff
 NumRequestsSwapped: {
 	MetricScope:  PodModelMetricScope,
 	MetricSource: PodRawMetrics,
 	MetricType: MetricType{
 		Raw: Counter,
 	},
-	EngineMetricsNameMapping: map[string]string{
-		"vllm":   "vllm:num_requests_running",
-		"sglang": "sglang:num_running_reqs",
+	EngineMetricsNameMapping: map[string][]string{
+		"vllm": {"vllm:num_requests_swapped"},
 	},
-	Description: "Number of running requests",
+	Description: "Number of swapped requests",
 },
```
Copilot AI (Dec 2, 2025)
Incorrect metric type for NumRequestsSwapped
NumRequestsSwapped is marked as a Counter, but it should be a Gauge. According to the vLLM Prometheus metrics specification and the mock server implementation (development/app/app.py line 924), num_requests_swapped represents the current number of swapped requests at any given time, which is a point-in-time measurement that can go up or down.
Counter vs Gauge:
- Counter: Monotonically increasing value (e.g., total requests processed)
- Gauge: Value that can increase or decrease (e.g., current number of requests)
The number of swapped requests fluctuates as requests are swapped in and out, making it a Gauge metric, not a Counter.
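To illustrate the distinction with the Go Prometheus client (purely illustrative; aibrix scrapes these metrics from the engines rather than emitting them):

```go
package main

import "github.com/prometheus/client_golang/prometheus"

func main() {
	// Gauge: a point-in-time value that can move both ways,
	// e.g. the number of currently swapped requests.
	swapped := prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "vllm:num_requests_swapped",
		Help: "Number of swapped requests",
	})
	swapped.Inc() // a request is swapped out
	swapped.Dec() // and swapped back in: gauges may decrease

	// Counter: a monotonically increasing total,
	// e.g. cumulative generated tokens.
	tokens := prometheus.NewCounter(prometheus.CounterOpts{
		Name: "vllm:generation_tokens_total",
		Help: "Total generation tokens",
	})
	tokens.Add(128) // counters only ever go up
}
```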
pkg/metrics/metrics.go
Outdated
```diff
 GenerationTokenTotal: {
 	MetricScope:  PodModelMetricScope,
 	MetricSource: PodRawMetrics,
 	MetricType: MetricType{
 		Raw: Gauge,
 	},
-	EngineMetricsNameMapping: map[string]string{
-		"vllm": "vllm:generation_tokens_total",
+	EngineMetricsNameMapping: map[string][]string{
+		"vllm": {"vllm:generation_tokens_total"},
 	},
 	Description: "Total generation tokens",
 },
```
Copilot AI (Dec 2, 2025)
Incorrect metric type for GenerationTokenTotal
GenerationTokenTotal is marked as a Gauge, but it should be a Counter. According to the vLLM Prometheus metrics specification and the mock server implementation (development/app/app.py line 906), generation_tokens_total represents the cumulative count of generation tokens processed, which is a monotonically increasing value.
Counter vs Gauge:
- Counter: Monotonically increasing value (e.g., total tokens processed) ✓ Correct for this metric
- Gauge: Value that can increase or decrease (e.g., current requests)
Total token counts accumulate over time and never decrease, making this a Counter metric, not a Gauge.
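If accepted, the fix would presumably be a one-line flip of the raw type (a sketch in the same suggestion style as above, not a confirmed change from this PR):

```diff
 	MetricType: MetricType{
-		Raw: Gauge,
+		Raw: Counter, // cumulative token total: monotonic, never decreases
 	},
```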
Force-pushed d3e74e9 to 5267cf3
```go
MetricType MetricType
PromQL     string // Optional: Only applicable for PromQL-based metrics
LabelKey   string // Optional: Only applicable for QueryLabel-based metrics
// EngineMetricsNameMapping defines a fallback list of raw metric names for each engine type.
```
This adds a fallback mechanism here, so we stay compatible with metric names across different engine versions.
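As a hypothetical illustration of that ordering (both names below are made up; the mappings in this PR mostly carry a single candidate per engine):

```go
package main

import "fmt"

func main() {
	// Hypothetical candidate list: the current name is tried first,
	// and a deprecated pre-v0.7.0 name serves as the fallback.
	mapping := map[string][]string{
		"vllm": {
			"vllm:current_metric_name",    // hypothetical current name
			"vllm:deprecated_metric_name", // hypothetical deprecated fallback
		},
	}
	for i, name := range mapping["vllm"] {
		fmt.Printf("candidate %d: %s\n", i, name)
	}
}
```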
Force-pushed 5267cf3 to 0cb5ecc

Force-pushed 0cb5ecc to 8c2a16d
Signed-off-by: CYJiang <[email protected]>
Force-pushed 8c2a16d to e756409
Pull Request Description
[Please provide a clear and concise description of your changes here]
Related Issues
Resolves: #[Insert issue number(s)]
part of: #1815
Important: Before submitting, please complete the description above and review the checklist below.
Contribution Guidelines (Expand for Details)
We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:
Pull Request Title Format
Your PR title should start with one of these prefixes to indicate the nature of the change:
- [Bug]: Corrections to existing functionality
- [CI]: Changes to build process or CI pipeline
- [Docs]: Updates or additions to documentation
- [API]: Modifications to aibrix's API or interface
- [CLI]: Changes or additions to the Command Line Interface
- [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.
Submission Checklist
By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.