Conversation

@googs1025 (Collaborator) commented Dec 2, 2025

Pull Request Description

[Please provide a clear and concise description of your changes here]

Related Issues

Resolves: #[Insert issue number(s)]

part of: #1815

Important: Before submitting, please complete the description above and review the checklist below.


Contribution Guidelines

We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:

Pull Request Title Format

Your PR title should start with one of these prefixes to indicate the nature of the change:

  • [Bug]: Corrections to existing functionality
  • [CI]: Changes to build process or CI pipeline
  • [Docs]: Updates or additions to documentation
  • [API]: Modifications to aibrix's API or interface
  • [CLI]: Changes or additions to the Command Line Interface
  • [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.

Submission Checklist

  • PR title includes appropriate prefix(es)
  • Changes are clearly explained in the PR description
  • New and existing tests pass successfully
  • Code adheres to project style and best practices
  • Documentation updated to reflect changes (if applicable)
  • Thorough testing completed, no regressions introduced

By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.

@gemini-code-assist (Contributor)

Summary of Changes

Hello @googs1025, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements a fix for a bug where the system failed to correctly update metrics in the cache due to inconsistencies in metric names provided by different versions of underlying inference engines. By introducing a flexible mapping that supports multiple candidate metric names, the changes enhance the system's ability to reliably fetch and process metrics, improving overall compatibility and data accuracy.

Highlights

  • Flexible Metric Mapping: The EngineMetricsNameMapping structure has been updated from a single string to a slice of strings (map[string]string to map[string][]string). This allows multiple potential metric names for a given engine type, providing a fallback mechanism (see the sketch after this list).
  • Enhanced Metric Fetching Logic: The logic for fetching metrics in pkg/cache/cache_metrics.go and pkg/metrics/engine_fetcher.go has been revised to iterate through the list of candidate metric names. This ensures that the system can successfully retrieve metrics even if their names change across different versions of inference engines.
  • Improved Compatibility: This change directly addresses issues arising from metric name deprecations or changes in external libraries (e.g., vLLM), ensuring better backward compatibility and robustness in metric collection.
  • Updated Metric Definitions: All metric definitions in pkg/metrics/metrics.go have been adjusted to use the new slice-based mapping, with specific comments added for deprecated vLLM metrics to guide future maintenance.
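
To make the new shape concrete, here is a minimal, self-contained sketch of the fallback lookup. The EngineMetricsNameMapping name and the map[string][]string shape come from this PR; the lookupMetric helper, the scraped map, and the sample entries are illustrative assumptions rather than the actual aibrix code.

package main

import "fmt"

// Sketch of the new mapping shape: each engine maps to an ordered
// list of candidate raw metric names (illustrative entries).
var engineMetricsNameMapping = map[string][]string{
    "vllm":   {"vllm:generation_tokens_total"},
    "sglang": {"sglang:gen_throughput"},
}

// lookupMetric is a hypothetical helper: it tries each candidate in
// order and returns the first one present in the scraped metrics.
func lookupMetric(engine string, scraped map[string]float64) (float64, error) {
    for _, candidate := range engineMetricsNameMapping[engine] {
        if v, ok := scraped[candidate]; ok {
            return v, nil
        }
    }
    return 0, fmt.Errorf("no candidate metric found for engine %q", engine)
}

func main() {
    scraped := map[string]float64{"vllm:generation_tokens_total": 1234}
    v, err := lookupMetric("vllm", scraped)
    fmt.Println(v, err) // 1234 <nil>
}

The slice order carries the compatibility story: in a scheme like this, the preferred current name would come first, with deprecated names after it as fallbacks for older engine versions.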
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@googs1025 googs1025 marked this pull request as draft December 2, 2025 12:27

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant improvement by allowing multiple candidate names for a single metric, which is crucial for maintaining compatibility with different versions of inference engines that may use different metric names. The core change from map[string]string to map[string][]string for EngineMetricsNameMapping is well-implemented across the codebase, including the necessary logic updates in the metric fetching and parsing functions. The changes are logical and well-documented, especially in pkg/metrics/types.go.

My review has identified a couple of areas for improvement:

  • In pkg/metrics/engine_fetcher.go, the error propagation can be enhanced to provide more context when metric fetching fails after all retries.
  • In pkg/metrics/metrics.go, there are a couple of redundant inline comments that can be removed to improve code clarity.

Overall, this is a solid contribution that improves the robustness of metric collection.

Comment on lines 133 to 154
    // Try each candidate until one exists and can be parsed
    var lastErr error
    for _, rawMetricName := range candidates {
        if _, ok := allMetrics[rawMetricName]; !ok {
            continue // skip if not present
        }

        metricValue, err := ef.parseMetricFromFamily(allMetrics, rawMetricName, metricDef)
        if err != nil {
            lastErr = err
            klog.V(5).InfoS("Failed to parse candidate metric", "candidate", rawMetricName, "error", err)
            continue
        }

        klog.V(4).InfoS("Successfully fetched typed metric from engine endpoint",
            "identifier", identifier, "metric", metricName, "rawMetric", rawMetricName, "value", metricValue, "attempt", attempt+1)
        return metricValue, nil
    }

    // If we get here, none of the candidates worked
    errMsg := "none of the candidate metric names found or parsed successfully"
    if lastErr != nil {
        errMsg += ": " + lastErr.Error()
    }
    klog.V(4).InfoS("Failed to find valid metric among candidates",
        "candidates", candidates, "identifier", identifier, "metric", metricName)
    // Continue to next retry if any
}
Contributor

medium

The lastErr variable, which captures parsing failures for candidate metrics, is scoped within each attempt of the retry loop. Consequently, if the final attempt fails due to a parsing error, this valuable information is lost, and a generic error message is returned at the end of the function. This makes debugging failures more difficult. To improve error reporting, consider preserving the last error across retries and including it in the final error message returned by the function.
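
One way to do this, sketched below with placeholder names (fetchOnce, fetchWithRetries, and maxRetries are illustrative stand-ins, not the aibrix API): declare lastErr once, above the retry loop, so the final wrapped error still carries the most recent parse failure.

package main

import (
    "errors"
    "fmt"
)

// fetchOnce stands in for the real per-candidate fetch/parse step.
func fetchOnce(name string) (float64, error) {
    return 0, fmt.Errorf("metric %q not found", name)
}

func fetchWithRetries(candidates []string, maxRetries int) (float64, error) {
    var lastErr error // hoisted above the retry loop so it survives every attempt
    for attempt := 0; attempt <= maxRetries; attempt++ {
        for _, name := range candidates {
            v, err := fetchOnce(name)
            if err != nil {
                lastErr = err // remember the most recent failure for the final error
                continue
            }
            return v, nil
        }
    }
    if lastErr != nil {
        return 0, fmt.Errorf("all candidates failed after %d attempts: %w", maxRetries+1, lastErr)
    }
    return 0, errors.New("no matching candidates found")
}

func main() {
    _, err := fetchWithRetries([]string{"vllm:generation_tokens_total"}, 2)
    fmt.Println(err)
}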

Collaborator Author

done

// This metric is no longer emitted by vLLM >= v0.7.0.
// TODO: Remove the deprecated vLLM metric names from EngineMetricsNameMapping
// once we confirm no deployments rely on vLLM < v0.7.0.
"vllm": {"vllm:avg_prompt_throughput_toks_per_s"}, //deprecated metrics https://github.com/vllm-project/vllm/pull/12383
Contributor

medium

The block comment above (lines 128-132) already provides a detailed explanation about the deprecated vLLM metric, including a link to the relevant pull request. This inline comment is redundant and can be removed to improve code clarity.

Suggested change
"vllm": {"vllm:avg_prompt_throughput_toks_per_s"}, //deprecated metrics https://github.com/vllm-project/vllm/pull/12383
"vllm": {"vllm:avg_prompt_throughput_toks_per_s"},

Collaborator Author

done

// This metric is no longer emitted by vLLM >= v0.7.0.
// TODO: Remove the deprecated vLLM metric names from EngineMetricsNameMapping
// once we confirm no deployments rely on vLLM < v0.7.0.
"vllm": {"vllm:avg_generation_throughput_toks_per_s"}, // deprecated metrics https://github.com/vllm-project/vllm/pull/12383
Contributor

medium

Similar to the comment on AvgPromptThroughputToksPerS, the block comment on lines 144-148 already explains in detail that this metric is deprecated. This inline comment is redundant and can be removed for better readability.

Suggested change
"vllm": {"vllm:avg_generation_throughput_toks_per_s"}, // deprecated metrics https://github.com/vllm-project/vllm/pull/12383
"vllm": {"vllm:avg_generation_throughput_toks_per_s"},

Collaborator Author

done

Contributor

Copilot AI left a comment


Pull request overview

This PR fixes incorrect metric configurations in the cache layer by updating the metric name mapping structure and correcting metric type classifications. The primary change converts EngineMetricsNameMapping from map[string]string to map[string][]string to support fallback metric names across different versions of inference engines (e.g., vLLM).

Key Changes:

  • Introduced fallback mechanism for metric names to handle version differences in inference engines
  • Added comprehensive documentation explaining the fallback ordering strategy
  • Updated metric type classifications for several metrics
  • Added support for SGLang queue_time_seconds metric mapping

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Summary per file:

  • pkg/metrics/types.go: Changed EngineMetricsNameMapping from map[string]string to map[string][]string with detailed documentation explaining the fallback mechanism
  • pkg/metrics/metrics.go: Updated all metric definitions to use slice-based mapping, added vLLM version compatibility comments, reorganized metrics by type, and added SGLang support for RequestQueueTimeSeconds
  • pkg/metrics/engine_fetcher.go: Implemented candidate iteration logic to try multiple metric names in order, improved error handling with lastErr tracking
  • pkg/cache/cache_metrics.go: Updated fetchMetrics to iterate through candidate metric names instead of single lookup
  • pkg/metrics/engine_fetcher_test.go: Updated test fixtures to use new slice-based mapping structure


Comment on lines +156 to +157
return nil, fmt.Errorf("failed to fetch typed metric %s from engine endpoint %s after %d attempts: %w",
metricName, identifier, ef.config.MaxRetries+1, lastErr)

Copilot AI Dec 2, 2025


Potential nil error wrapping in error message

If none of the candidate metrics are found in the endpoint response (all candidates fail the existence check at line 135), lastErr will remain nil. Using %w with a nil error will result in a confusing error message like "... after 3 attempts: %!w(<nil>)".

Suggested fix:

if lastErr != nil {
    return nil, fmt.Errorf("failed to fetch typed metric %s from engine endpoint %s after %d attempts: %w",
        metricName, identifier, ef.config.MaxRetries+1, lastErr)
}
return nil, fmt.Errorf("failed to fetch typed metric %s from engine endpoint %s after %d attempts: no matching candidates found",
    metricName, identifier, ef.config.MaxRetries+1)

"sglang": "sglang:gen_throughput",
EngineMetricsNameMapping: map[string][]string{
// vLLM deprecated "vllm:avg_generation_throughput_toks_per_s" in v0.7.0.
// It was removed entirely in PR #12383(https://github.com/vllm-project/vllm/pull/12383).

Copilot AI Dec 2, 2025


[nitpick] Missing space in PR reference

There's a missing space between the PR number and the opening parenthesis. For consistency with the comment on line 129, this should be:

// It was removed entirely in PR #12383 (https://github.com/vllm-project/vllm/pull/12383).
Suggested change
// It was removed entirely in PR #12383(https://github.com/vllm-project/vllm/pull/12383).
// It was removed entirely in PR #12383 (https://github.com/vllm-project/vllm/pull/12383).

Comment on lines +63 to 73
NumRequestsSwapped: {
    MetricScope:  PodModelMetricScope,
    MetricSource: PodRawMetrics,
    MetricType: MetricType{
        Raw: Counter,
    },
    EngineMetricsNameMapping: map[string][]string{
        "vllm": {"vllm:num_requests_swapped"},
    },
    Description: "Number of swapped requests",
},

Copilot AI Dec 2, 2025


Incorrect metric type for NumRequestsSwapped

NumRequestsSwapped is marked as a Counter, but it should be a Gauge. According to the vLLM Prometheus metrics specification and the mock server implementation (development/app/app.py line 924), num_requests_swapped represents the current number of swapped requests at any given time, which is a point-in-time measurement that can go up or down.

Counter vs Gauge:

  • Counter: Monotonically increasing value (e.g., total requests processed)
  • Gauge: Value that can increase or decrease (e.g., current number of requests)

The number of swapped requests fluctuates as requests are swapped in and out, making it a Gauge metric, not a Counter.
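
To make the distinction concrete, here is a small sketch using prometheus/client_golang; the metric names are illustrative, not the exact vLLM names, and the same reasoning applies in mirror image to the GenerationTokenTotal comment below.

package main

import "github.com/prometheus/client_golang/prometheus"

func main() {
    // Gauge: a point-in-time value that can move in both directions,
    // like the number of currently swapped requests.
    swapped := prometheus.NewGauge(prometheus.GaugeOpts{
        Name: "num_requests_swapped", // illustrative name
        Help: "Current number of swapped requests.",
    })
    swapped.Inc() // a request is swapped out
    swapped.Dec() // and swapped back in: gauges may decrease

    // Counter: a cumulative value that only ever increases,
    // like the total number of generation tokens produced.
    tokens := prometheus.NewCounter(prometheus.CounterOpts{
        Name: "generation_tokens_total", // illustrative name
        Help: "Cumulative generation tokens.",
    })
    tokens.Add(42) // counters can only go up; Add panics on a negative value
}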

Comment on lines 110 to 120
GenerationTokenTotal: {
    MetricScope:  PodModelMetricScope,
    MetricSource: PodRawMetrics,
    MetricType: MetricType{
        Raw: Gauge,
    },
    EngineMetricsNameMapping: map[string][]string{
        "vllm": {"vllm:generation_tokens_total"},
    },
    Description: "Total generation tokens",
},

Copilot AI Dec 2, 2025


Incorrect metric type for GenerationTokenTotal

GenerationTokenTotal is marked as a Gauge, but it should be a Counter. According to the vLLM Prometheus metrics specification and the mock server implementation (development/app/app.py line 906), generation_tokens_total represents the cumulative count of generation tokens processed, which is a monotonically increasing value.

Counter vs Gauge:

  • Counter: Monotonically increasing value (e.g., total tokens processed) ✓ Correct for this metric
  • Gauge: Value that can increase or decrease (e.g., current requests)

Total token counts accumulate over time and never decrease, making this a Counter metric, not a Gauge.

MetricType MetricType
PromQL     string // Optional: Only applicable for PromQL-based metrics
LabelKey   string // Optional: Only applicable for QueryLabel-based metrics
// EngineMetricsNameMapping defines a fallback list of raw metric names for each engine type.
Collaborator Author

There is a fallback mechanism here, so we stay compatible with metric names across different engine versions.
