Conversation

@googs1025 (Collaborator) commented Dec 2, 2025

Pull Request Description

[Please provide a clear and concise description of your changes here]

Related Issues

Resolves: #[Insert issue number(s)]

part of: #1815

Important: Before submitting, please complete the description above and review the checklist below.


Contribution Guidelines

We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:

Pull Request Title Format

Your PR title should start with one of these prefixes to indicate the nature of the change:

  • [Bug]: Corrections to existing functionality
  • [CI]: Changes to build process or CI pipeline
  • [Docs]: Updates or additions to documentation
  • [API]: Modifications to aibrix's API or interface
  • [CLI]: Changes or additions to the Command Line Interface
  • [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.

Submission Checklist

  • PR title includes appropriate prefix(es)
  • Changes are clearly explained in the PR description
  • New and existing tests pass successfully
  • Code adheres to project style and best practices
  • Documentation updated to reflect changes (if applicable)
  • Thorough testing completed, no regressions introduced

By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.

@gemini-code-assist (Contributor)

Summary of Changes

Hello @googs1025, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements a fix for a bug where the system failed to correctly update metrics in the cache due to inconsistencies in metric names provided by different versions of underlying inference engines. By introducing a flexible mapping that supports multiple candidate metric names, the changes enhance the system's ability to reliably fetch and process metrics, improving overall compatibility and data accuracy.

Highlights

  • Flexible Metric Mapping: The EngineMetricsNameMapping structure has been updated from a single string to a slice of strings (map[string]string to map[string][]string). This allows multiple potential metric names for a given engine type, providing a fallback mechanism (see the sketch after this list).
  • Enhanced Metric Fetching Logic: The logic for fetching metrics in pkg/cache/cache_metrics.go and pkg/metrics/engine_fetcher.go has been revised to iterate through the list of candidate metric names. This ensures that the system can successfully retrieve metrics even if their names change across different versions of inference engines.
  • Improved Compatibility: This change directly addresses issues arising from metric name deprecations or changes in external libraries (e.g., vLLM), ensuring better backward compatibility and robustness in metric collection.
  • Updated Metric Definitions: All metric definitions in pkg/metrics/metrics.go have been adjusted to use the new slice-based mapping, with specific comments added for deprecated vLLM metrics to guide future maintenance.
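
To make the new shape concrete, here is a minimal, self-contained sketch of the fallback lookup. The EngineMetricsNameMapping name and the map[string][]string shape come from this PR; the lookupMetric helper, the scraped map, and the sample entries are illustrative assumptions rather than the actual aibrix code.

package main

import "fmt"

// Sketch of the new mapping shape: each engine maps to an ordered
// list of candidate raw metric names (illustrative entries).
var engineMetricsNameMapping = map[string][]string{
    "vllm":   {"vllm:generation_tokens_total"},
    "sglang": {"sglang:gen_throughput"},
}

// lookupMetric is a hypothetical helper: it tries each candidate in
// order and returns the first one present in the scraped metrics.
func lookupMetric(engine string, scraped map[string]float64) (float64, error) {
    for _, candidate := range engineMetricsNameMapping[engine] {
        if v, ok := scraped[candidate]; ok {
            return v, nil
        }
    }
    return 0, fmt.Errorf("no candidate metric found for engine %q", engine)
}

func main() {
    scraped := map[string]float64{"vllm:generation_tokens_total": 1234}
    v, err := lookupMetric("vllm", scraped)
    fmt.Println(v, err) // 1234 <nil>
}

The slice order carries the compatibility story: in a scheme like this, the preferred current name would come first, with deprecated names after it as fallbacks for older engine versions.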
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@googs1025 googs1025 marked this pull request as draft December 2, 2025 12:27

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant improvement by allowing multiple candidate names for a single metric, which is crucial for maintaining compatibility with different versions of inference engines that may use different metric names. The core change from map[string]string to map[string][]string for EngineMetricsNameMapping is well-implemented across the codebase, including the necessary logic updates in the metric fetching and parsing functions. The changes are logical and well-documented, especially in pkg/metrics/types.go.

My review has identified a couple of areas for improvement:

  • In pkg/metrics/engine_fetcher.go, the error propagation can be enhanced to provide more context when metric fetching fails after all retries.
  • In pkg/metrics/metrics.go, there are a couple of redundant inline comments that can be removed to improve code clarity.

Overall, this is a solid contribution that improves the robustness of metric collection.

Comment on lines 133 to 154
    // Try each candidate until one exists and can be parsed
    var lastErr error
    for _, rawMetricName := range candidates {
        if _, ok := allMetrics[rawMetricName]; !ok {
            continue // skip if not present
        }

        metricValue, err := ef.parseMetricFromFamily(allMetrics, rawMetricName, metricDef)
        if err != nil {
            lastErr = err
            klog.V(5).InfoS("Failed to parse candidate metric", "candidate", rawMetricName, "error", err)
            continue
        }

        klog.V(4).InfoS("Successfully fetched typed metric from engine endpoint",
            "identifier", identifier, "metric", metricName, "rawMetric", rawMetricName, "value", metricValue, "attempt", attempt+1)
        return metricValue, nil
    }

    // If we get here, none of the candidates worked
    errMsg := "none of the candidate metric names found or parsed successfully"
    if lastErr != nil {
        errMsg += ": " + lastErr.Error()
    }
    klog.V(4).InfoS("Failed to find valid metric among candidates",
        "candidates", candidates, "identifier", identifier, "metric", metricName)
    // Continue to next retry if any
}
Contributor

medium

The lastErr variable, which captures parsing failures for candidate metrics, is scoped within each attempt of the retry loop. Consequently, if the final attempt fails due to a parsing error, this valuable information is lost, and a generic error message is returned at the end of the function. This makes debugging failures more difficult. To improve error reporting, consider preserving the last error across retries and including it in the final error message returned by the function.
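
One way to do this, sketched below with placeholder names (fetchOnce, fetchWithRetries, and maxRetries are illustrative stand-ins, not the aibrix API): declare lastErr once, above the retry loop, so the final wrapped error still carries the most recent parse failure.

package main

import (
    "errors"
    "fmt"
)

// fetchOnce stands in for the real per-candidate fetch/parse step.
func fetchOnce(name string) (float64, error) {
    return 0, fmt.Errorf("metric %q not found", name)
}

func fetchWithRetries(candidates []string, maxRetries int) (float64, error) {
    var lastErr error // hoisted above the retry loop so it survives every attempt
    for attempt := 0; attempt <= maxRetries; attempt++ {
        for _, name := range candidates {
            v, err := fetchOnce(name)
            if err != nil {
                lastErr = err // remember the most recent failure for the final error
                continue
            }
            return v, nil
        }
    }
    if lastErr != nil {
        return 0, fmt.Errorf("all candidates failed after %d attempts: %w", maxRetries+1, lastErr)
    }
    return 0, errors.New("no matching candidates found")
}

func main() {
    _, err := fetchWithRetries([]string{"vllm:generation_tokens_total"}, 2)
    fmt.Println(err)
}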

Collaborator Author

done

// This metric is no longer emitted by vLLM >= v0.7.0.
// TODO: Remove the deprecated vLLM metric names from EngineMetricsNameMapping
// once we confirm no deployments rely on vLLM < v0.7.0.
"vllm": {"vllm:avg_prompt_throughput_toks_per_s"}, //deprecated metrics https://github.com/vllm-project/vllm/pull/12383
Contributor

medium

The block comment above (lines 128-132) already provides a detailed explanation about the deprecated vLLM metric, including a link to the relevant pull request. This inline comment is redundant and can be removed to improve code clarity.

Suggested change
"vllm": {"vllm:avg_prompt_throughput_toks_per_s"}, //deprecated metrics https://github.com/vllm-project/vllm/pull/12383
"vllm": {"vllm:avg_prompt_throughput_toks_per_s"},

Collaborator Author

done

// This metric is no longer emitted by vLLM >= v0.7.0.
// TODO: Remove the deprecated vLLM metric names from EngineMetricsNameMapping
// once we confirm no deployments rely on vLLM < v0.7.0.
"vllm": {"vllm:avg_generation_throughput_toks_per_s"}, // deprecated metrics https://github.com/vllm-project/vllm/pull/12383
Contributor

medium

Similar to the comment on AvgPromptThroughputToksPerS, the block comment on lines 144-148 already explains in detail that this metric is deprecated. This inline comment is redundant and can be removed for better readability.

Suggested change
"vllm": {"vllm:avg_generation_throughput_toks_per_s"}, // deprecated metrics https://github.com/vllm-project/vllm/pull/12383
"vllm": {"vllm:avg_generation_throughput_toks_per_s"},

Collaborator Author

done

Contributor

Copilot AI left a comment


Pull request overview

This PR fixes incorrect metric configurations in the cache layer by updating the metric name mapping structure and correcting metric type classifications. The primary change converts EngineMetricsNameMapping from map[string]string to map[string][]string to support fallback metric names across different versions of inference engines (e.g., vLLM).

Key Changes:

  • Introduced fallback mechanism for metric names to handle version differences in inference engines
  • Added comprehensive documentation explaining the fallback ordering strategy
  • Updated metric type classifications for several metrics
  • Added support for SGLang queue_time_seconds metric mapping

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Summary per file:

  • pkg/metrics/types.go: Changed EngineMetricsNameMapping from map[string]string to map[string][]string with detailed documentation explaining the fallback mechanism
  • pkg/metrics/metrics.go: Updated all metric definitions to use slice-based mapping, added vLLM version compatibility comments, reorganized metrics by type, and added SGLang support for RequestQueueTimeSeconds
  • pkg/metrics/engine_fetcher.go: Implemented candidate iteration logic to try multiple metric names in order, improved error handling with lastErr tracking
  • pkg/cache/cache_metrics.go: Updated fetchMetrics to iterate through candidate metric names instead of single lookup
  • pkg/metrics/engine_fetcher_test.go: Updated test fixtures to use new slice-based mapping structure


Comment on lines +156 to +157
return nil, fmt.Errorf("failed to fetch typed metric %s from engine endpoint %s after %d attempts: %w",
metricName, identifier, ef.config.MaxRetries+1, lastErr)

Copilot AI Dec 2, 2025


Potential nil error wrapping in error message

If none of the candidate metrics are found in the endpoint response (all candidates fail the existence check at line 135), lastErr will remain nil. Using %w with a nil error will result in a confusing error message like "... after 3 attempts: %!w(<nil>)".

Suggested fix:

if lastErr != nil {
    return nil, fmt.Errorf("failed to fetch typed metric %s from engine endpoint %s after %d attempts: %w",
        metricName, identifier, ef.config.MaxRetries+1, lastErr)
}
return nil, fmt.Errorf("failed to fetch typed metric %s from engine endpoint %s after %d attempts: no matching candidates found",
    metricName, identifier, ef.config.MaxRetries+1)

"sglang": "sglang:gen_throughput",
EngineMetricsNameMapping: map[string][]string{
// vLLM deprecated "vllm:avg_generation_throughput_toks_per_s" in v0.7.0.
// It was removed entirely in PR #12383(https://github.com/vllm-project/vllm/pull/12383).

Copilot AI Dec 2, 2025


[nitpick] Missing space in PR reference

There's a missing space between the PR number and the opening parenthesis. For consistency with the comment on line 129, this should be:

// It was removed entirely in PR #12383 (https://github.com/vllm-project/vllm/pull/12383).
Suggested change
// It was removed entirely in PR #12383(https://github.com/vllm-project/vllm/pull/12383).
// It was removed entirely in PR #12383 (https://github.com/vllm-project/vllm/pull/12383).

Comment on lines +63 to 73
NumRequestsSwapped: {
    MetricScope:  PodModelMetricScope,
    MetricSource: PodRawMetrics,
    MetricType: MetricType{
        Raw: Counter,
    },
    EngineMetricsNameMapping: map[string][]string{
        "vllm": {"vllm:num_requests_swapped"},
    },
    Description: "Number of swapped requests",
},

Copilot AI Dec 2, 2025


Incorrect metric type for NumRequestsSwapped

NumRequestsSwapped is marked as a Counter, but it should be a Gauge. According to the vLLM Prometheus metrics specification and the mock server implementation (development/app/app.py line 924), num_requests_swapped represents the current number of swapped requests at any given time, which is a point-in-time measurement that can go up or down.

Counter vs Gauge:

  • Counter: Monotonically increasing value (e.g., total requests processed)
  • Gauge: Value that can increase or decrease (e.g., current number of requests)

The number of swapped requests fluctuates as requests are swapped in and out, making it a Gauge metric, not a Counter.
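
To make the distinction concrete, here is a small sketch using prometheus/client_golang; the metric names are illustrative, not the exact vLLM names, and the same reasoning applies in mirror image to the GenerationTokenTotal comment below.

package main

import "github.com/prometheus/client_golang/prometheus"

func main() {
    // Gauge: a point-in-time value that can move in both directions,
    // like the number of currently swapped requests.
    swapped := prometheus.NewGauge(prometheus.GaugeOpts{
        Name: "num_requests_swapped", // illustrative name
        Help: "Current number of swapped requests.",
    })
    swapped.Inc() // a request is swapped out
    swapped.Dec() // and swapped back in: gauges may decrease

    // Counter: a cumulative value that only ever increases,
    // like the total number of generation tokens produced.
    tokens := prometheus.NewCounter(prometheus.CounterOpts{
        Name: "generation_tokens_total", // illustrative name
        Help: "Cumulative generation tokens.",
    })
    tokens.Add(42) // counters can only go up; Add panics on a negative value
}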

Comment on lines 110 to 120
GenerationTokenTotal: {
    MetricScope:  PodModelMetricScope,
    MetricSource: PodRawMetrics,
    MetricType: MetricType{
        Raw: Gauge,
    },
    EngineMetricsNameMapping: map[string][]string{
        "vllm": {"vllm:generation_tokens_total"},
    },
    Description: "Total generation tokens",
},

Copilot AI Dec 2, 2025


Incorrect metric type for GenerationTokenTotal

GenerationTokenTotal is marked as a Gauge, but it should be a Counter. According to the vLLM Prometheus metrics specification and the mock server implementation (development/app/app.py line 906), generation_tokens_total represents the cumulative count of generation tokens processed, which is a monotonically increasing value.

Counter vs Gauge:

  • Counter: Monotonically increasing value (e.g., total tokens processed) ✓ Correct for this metric
  • Gauge: Value that can increase or decrease (e.g., current requests)

Total token counts accumulate over time and never decrease, making this a Counter metric, not a Gauge.

MetricType MetricType
PromQL     string // Optional: Only applicable for PromQL-based metrics
LabelKey   string // Optional: Only applicable for QueryLabel-based metrics
// EngineMetricsNameMapping defines a fallback list of raw metric names for each engine type.
Collaborator Author

There is a fallback mechanism here, so we stay compatible with metric names across different engine versions.
