[Model] tp+ep support v1_loader #5465
base: develop
@@ -25,6 +25,7 @@
 from fastdeploy.model_executor.layers.quantization.quant_base import QuantMethodBase
 from fastdeploy.model_executor.utils import (
     default_weight_loader,
+    fd_cast,
     h2d_copy,
     process_weight_transpose,
     set_weight_attrs,
@@ -878,6 +879,57 @@ def __init__(
        if self.with_bias and self.tp_size > 1 and self.reduce_results:
            set_weight_attrs(self.bias, {"tp_row_bias": True})

    def weight_loader(self, param, loaded_weight, loaded_shard_id: Optional[str] = None):
        # In some senerio such as tsp, weight and bias of this layer will not be split in specific module.
Suggested change:
-        # In some senerio such as tsp, weight and bias of this layer will not be split in specific module.
+        # In some scenario such as tsp, weight and bias of this layer will not be split in specific module.
Copilot AI commented on Dec 9, 2025:
The variable name shard_size is misleading. It's actually used as the end index (absolute position), not a size. Consider renaming to shard_end for clarity, consistent with how slice_fn uses start and end parameters.
Suggested change:
-        shard_size = (self.fd_config.parallel_config.tensor_parallel_rank + 1) * block_size
-        # when use_sequence_parallel_moe, we don't split.
-        if layer_in_white_list:
-            pass
-        else:
-            loaded_weight = slice_fn(loaded_weight, output_dim, shard_offset, shard_size)
+        shard_end = (self.fd_config.parallel_config.tensor_parallel_rank + 1) * block_size
+        # when use_sequence_parallel_moe, we don't split.
+        if layer_in_white_list:
+            pass
+        else:
+            loaded_weight = slice_fn(loaded_weight, output_dim, shard_offset, shard_end)
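To make the end-index point concrete, here is a minimal standalone sketch. The helper below is hypothetical and only mirrors the arithmetic in the diff; block_size and tensor_parallel_rank come from the original code.

```python
# Hypothetical helper illustrating why the value is an absolute end index,
# not a size: rank r owns rows [r * block_size, (r + 1) * block_size).
def shard_bounds(block_size: int, tp_rank: int) -> tuple[int, int]:
    shard_offset = tp_rank * block_size        # absolute start index
    shard_end = (tp_rank + 1) * block_size     # absolute end index, not a length
    return shard_offset, shard_end

# Example: block_size = 1024, 4 tensor-parallel ranks.
for rank in range(4):
    print(rank, shard_bounds(1024, rank))      # rank 1 -> (1024, 2048), etc.
```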
Copilot AI commented on Dec 9, 2025:
For consistency with the rest of the codebase, use named parameters when calling slice_fn. Change to:

    loaded_weight = slice_fn(loaded_weight, output_dim, start=shard_offset, end=shard_size)

This matches the pattern used in lines 543, 696, and other weight_loader implementations.
Suggested change:
-            loaded_weight = slice_fn(loaded_weight, output_dim, shard_offset, shard_size)
+            loaded_weight = slice_fn(loaded_weight, output_dim, start=shard_offset, end=shard_size)
Copilot AI commented on Dec 9, 2025:
[nitpick] The empty pass statements in the whitelist check reduce code readability. Consider refactoring to a more explicit pattern:

    if not layer_in_white_list:
        loaded_weight = slice_fn(loaded_weight, output_dim, start=shard_offset, end=shard_size)

This eliminates the unnecessary pass statements and makes the control flow clearer.
Suggested change:
-        if layer_in_white_list:
-            pass
-        else:
-            loaded_weight = slice_fn(loaded_weight, output_dim, shard_offset, shard_size)
-        tp_row_bias = getattr(param, "tp_row_bias", None)
-        if layer_in_white_list:
-            pass
-        else:
-            if tp_row_bias:
-                loaded_weight = loaded_weight / self.fd_config.parallel_config.tensor_parallel_size
+        if not layer_in_white_list:
+            loaded_weight = slice_fn(loaded_weight, output_dim, shard_offset, shard_size)
+        tp_row_bias = getattr(param, "tp_row_bias", None)
+        if not layer_in_white_list and tp_row_bias:
+            loaded_weight = loaded_weight / self.fd_config.parallel_config.tensor_parallel_size
Copilot AI commented on Dec 9, 2025:
[nitpick] Similar to above, the empty pass statements for the whitelist check reduce readability. Consider refactoring to:

    if tp_row_bias and not layer_in_white_list:
        loaded_weight = loaded_weight / self.fd_config.parallel_config.tensor_parallel_size

Suggested change:
-        if layer_in_white_list:
-            pass
-        else:
-            loaded_weight = slice_fn(loaded_weight, output_dim, shard_offset, shard_size)
-        tp_row_bias = getattr(param, "tp_row_bias", None)
-        if layer_in_white_list:
-            pass
-        else:
-            if tp_row_bias:
-                loaded_weight = loaded_weight / self.fd_config.parallel_config.tensor_parallel_size
+        if not layer_in_white_list:
+            loaded_weight = slice_fn(loaded_weight, output_dim, shard_offset, shard_size)
+        tp_row_bias = getattr(param, "tp_row_bias", None)
+        if tp_row_bias and not layer_in_white_list:
+            loaded_weight = loaded_weight / self.fd_config.parallel_config.tensor_parallel_size
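Taken together, the suggestions above (the shard_end rename, keyword arguments for slice_fn, and removal of the empty pass branches) describe loading logic roughly like the standalone sketch below. It is a non-authoritative reconstruction: slice_fn is stubbed out, and the function signature and the NumPy demo at the end are assumptions, not FastDeploy's actual weight_loader.

```python
import numpy as np

def slice_fn(weight, dim, start, end):
    # Minimal stand-in for FastDeploy's slice_fn: slice `weight` along `dim`.
    index = [slice(None)] * weight.ndim
    index[dim] = slice(start, end)
    return weight[tuple(index)]

def load_tp_shard(loaded_weight, tp_rank, tp_size, layer_in_white_list,
                  output_dim=None, tp_row_bias=False):
    # Split along output_dim unless the layer is in the sequence-parallel
    # whitelist (use_sequence_parallel_moe keeps it whole on every rank).
    if output_dim is not None and not layer_in_white_list:
        block_size = loaded_weight.shape[output_dim] // tp_size
        shard_offset = tp_rank * block_size
        shard_end = (tp_rank + 1) * block_size  # absolute end index
        loaded_weight = slice_fn(loaded_weight, output_dim, start=shard_offset, end=shard_end)
    # A row-parallel bias is added on every rank before the all-reduce,
    # so each rank loads only 1/tp_size of its value.
    if tp_row_bias and not layer_in_white_list:
        loaded_weight = loaded_weight / tp_size
    return loaded_weight

# Example: rank 1 of 4 takes the second quarter of a 16-column weight.
w = np.arange(32.0).reshape(2, 16)
print(load_tp_shard(w, tp_rank=1, tp_size=4, layer_in_white_list=False, output_dim=1))
```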
@@ -86,6 +86,8 @@ def __init__(
        )
        if self.tp_size > 1:
            set_weight_attrs(self.linear.weight, {"output_dim": True})
            set_weight_attrs(self.linear.bias, {"output_dim": True})
Suggested change:
-            set_weight_attrs(self.linear.bias, {"output_dim": True})
+            if self.bias_key is not None:
+                set_weight_attrs(self.linear.bias, {"output_dim": True})
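The guard presumably protects the case where the layer is built without a bias (bias_key is None), in which case self.linear.bias is None and attaching attributes to it would fail. A tiny self-contained sketch of that failure mode, using a simplified stand-in for set_weight_attrs (the helper and the _Linear-style namespace below are hypothetical):

```python
from types import SimpleNamespace

def set_weight_attrs(weight, attrs):
    # Simplified stand-in for FastDeploy's helper: attach loader metadata.
    for key, value in attrs.items():
        setattr(weight, key, value)

# A layer created without a bias (bias_key is None) has bias = None.
linear = SimpleNamespace(weight=SimpleNamespace(), bias=None)

set_weight_attrs(linear.weight, {"output_dim": True})
if linear.bias is not None:  # mirrors the suggested bias_key guard
    set_weight_attrs(linear.bias, {"output_dim": True})
# Without the guard, setattr(None, "output_dim", True) would raise AttributeError.
```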
Type conversion inconsistency: the weight_loader casts to paddle.get_default_dtype() at line 232, but the q_norm and k_norm weights are defined with dtype "float32" (lines 203, 211) and explicitly cast to float32 in load_state_dict (lines 224-225). Consider casting to float32 for consistency:
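The suggested snippet was cut off in the capture. A hypothetical sketch of what such a cast could look like (the function name and signature below are assumptions, not the reviewer's original suggestion):

```python
import paddle

def norm_weight_loader(param, loaded_weight):
    # Hypothetical loader for q_norm / k_norm: the parameters are created as
    # float32, so cast the checkpoint tensor to float32 rather than to
    # paddle.get_default_dtype() (which is typically bfloat16/float16).
    loaded_weight = paddle.cast(loaded_weight, "float32")
    assert param.shape == loaded_weight.shape
    param.set_value(loaded_weight)
```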