
Conversation

@Susan9001
Contributor


This PR is a follow-up to #4229 and makes DashScope Qwen more robust as a GEval judge model when used via LiteLLMChatModel.

Currently, when a model advertises logprobs and top_logprobs support, GEval enables the logprobs-aware scoring path. For DashScope Qwen this can occasionally lead to MetricComputationError("Failed to calculate g-eval score") because the returned logprobs do not always match the OpenAI-style format expected by the parser.

This PR treats DashScope Qwen as not logprobs-supported in this context, so GEval falls back to the standard text/JSON-based parsing path instead of relying on logprobs.
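
For illustration, the core idea is roughly the following (a minimal sketch, not the actual diff: the helper name _supports_logprobs and the substring check are assumptions, and the real logic in LiteLLMChatModel may differ):

    import litellm

    def _supports_logprobs(model_name: str) -> bool:
        # DashScope Qwen is served through an OpenAI-compatible endpoint, but its
        # logprobs payload does not always match the OpenAI-style format that
        # GEval's parser expects, so report it as unsupported and let GEval fall
        # back to the text/JSON parsing path.
        if "dashscope" in model_name.lower():
            return False
        supported = litellm.get_supported_openai_params(model=model_name) or []
        return "logprobs" in supported and "top_logprobs" in supported

With this check returning False for DashScope Qwen, GEval never enters the logprobs-aware scoring path for these models.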

Change checklist

  • User facing
  • Documentation update

Issues

Testing

Locally:

  • pytest tests/unit/evaluation/models/test_litellm_chat_model.py
  • Ran additional examples with dashscope/qwen-flash as the judge model, created with the following snippet:
          # assumed context: "import os" and "from opik.evaluation import models" at module level
          self.judge_model = models.LiteLLMChatModel(
              model_name=judge_model_name,
              api_base="https://dashscope.aliyuncs.com/compatible-mode/v1",
              api_key=os.getenv("DASHSCOPE_API_KEY"),
          )
    All samples now score successfully without "Failed to calculate g-eval score" errors; an end-to-end sketch follows below.
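
For reference, an end-to-end check looks roughly like this (a sketch under assumptions: the criteria text is invented, and passing the judge model to GEval via model= follows Opik's metric API):

    import os
    from opik.evaluation import models
    from opik.evaluation.metrics import GEval

    # Assumes DASHSCOPE_API_KEY is set in the environment.
    judge = models.LiteLLMChatModel(
        model_name="dashscope/qwen-flash",
        api_base="https://dashscope.aliyuncs.com/compatible-mode/v1",
        api_key=os.getenv("DASHSCOPE_API_KEY"),
    )

    metric = GEval(
        task_introduction="Evaluate whether the OUTPUT is faithful to the CONTEXT.",
        evaluation_criteria="The OUTPUT must not contradict or add facts beyond the CONTEXT.",
        model=judge,
    )
    print(metric.score(output="CONTEXT: ...\nOUTPUT: ..."))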

@Susan9001 Susan9001 requested a review from a team as a code owner November 28, 2025 20:37
@yaricom
Contributor

yaricom commented Nov 30, 2025

Hi @Susan9001! Thank you for the contribution! Please fix the merge conflicts with the current branch.

Cheers,
Iaroslav



Development

Successfully merging this pull request may close these issues.

[Bug]: GEval LiteLLMChatModel with DashScope Qwen sometimes fails to calculate g-eval score
