[BugFix] reschedule_preempt_task append waiting & PREEMPTED blocksize #5506
Thanks for your contribution!
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #5506   +/-   ##
==========================================
  Coverage           ?   60.74%
==========================================
  Files              ?      329
  Lines              ?    41142
  Branches           ?     6271
==========================================
  Hits               ?    24992
  Misses             ?    14259
  Partials           ?     1891
Pull request overview
This PR fixes a bug in the request rescheduling logic where preempted requests could be repeatedly rescheduled due to under-allocation of blocks. The fix tracks the block size at preemption time and ensures that rescheduled requests receive at least as many blocks as they had before preemption.
Key Changes:
- Records `last_preempted_blocksize` when a request is preempted to remember the previous block allocation
- Adds logic to ensure rescheduled requests receive at least the same number of blocks as before to prevent repeated preemption cycles
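
The two changes above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the project's actual scheduler code: the `Request` class, `blocks_needed`, `preempt`, and `blocks_for_reschedule` are hypothetical names introduced here, and the block size of 64 is an arbitrary assumption. Only the field name `last_preempted_blocksize` and the final comparison come from the PR.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Request:
    """Hypothetical stand-in for the scheduler's request object."""
    num_tokens: int                                  # tokens held so far (prompt + generated)
    block_tables: List[int] = field(default_factory=list)  # ids of allocated blocks
    last_preempted_blocksize: int = 0                # field name taken from the PR


def blocks_needed(num_tokens: int, block_size: int) -> int:
    """Blocks required to hold num_tokens (ceiling division)."""
    return (num_tokens + block_size - 1) // block_size


def preempt(request: Request) -> None:
    """Remember how many blocks the request held, then release them."""
    request.last_preempted_blocksize = len(request.block_tables)
    request.block_tables = []


def blocks_for_reschedule(request: Request, block_size: int) -> int:
    """Token-based block count, raised to the pre-preemption allocation if smaller."""
    num_new_block = blocks_needed(request.num_tokens, block_size)
    if num_new_block < request.last_preempted_blocksize:
        num_new_block = request.last_preempted_blocksize
    return num_new_block


# Assumed scenario: 130 tokens need 3 blocks of 64, but the request held 5
# blocks before preemption (3 for tokens plus 2 reserved for decoding).
req = Request(num_tokens=130, block_tables=[0, 1, 2, 3, 4])
preempt(req)
print(blocks_needed(req.num_tokens, 64))   # prints 3
print(blocks_for_reschedule(req, 64))      # prints 5
```

Without the comparison, the rescheduled request in this scenario would get 3 blocks instead of 5, which is the under-allocation the overview describes.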
# If num_new_block is less than the last preempted block size, use the last preempted block size instead.
# For normal requests, when allocating blocks, we reserve two extra blocks for decoding.
# In the request rescheduling scenario, we currently only consider the number of tokens already generated,
# which might lead to allocating fewer blocks than the previous allocation, causing repeated rescheduling.
# This adjustment ensures we at least allocate as many blocks as before to avoid this issue.
if num_new_block < request.last_preempted_blocksize:
    num_new_block = request.last_preempted_blocksize
Copilot AI, Dec 12, 2025
The PR description lacks essential information about the motivation and the specific problem being solved. While there is an image reference in the description, there is no text explanation of:
- What bug is being fixed
- Why these modifications are necessary
- What problem scenario triggers the repeated rescheduling issue
Please provide a detailed description explaining the bug scenario, reproduction steps, and how these changes resolve the issue.
Co-authored-by: Copilot <[email protected]>
Motivation
Modifications
Usage or Command
Accuracy Tests
Checklist
- Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- … `pre-commit` before commit.
- … `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.