
Conversation

@Wanglongzhi2001
Collaborator

Motivation

Support v1_loader when tensor parallelism (tp) and expert parallelism (ep) are used together.

💡 If this PR is a cherry-pick, the PR title needs to follow the required format: add the [Cherry-Pick] label at the very beginning and append the original PR ID at the end, for example [Cherry-Pick][CI] Add check trigger and logic (#5191).


Modifications

Add the weight_loader support required for v1_loader to work under tp+ep.

Usage or Command

--load-choices "default_v1"
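
For example, the loader can be selected by appending the flag to an existing FastDeploy launch command. The command below is only illustrative: the entrypoint, model path, and tensor-parallel flag are assumptions about a typical deployment, and only --load-choices "default_v1" comes from this PR.

python -m fastdeploy.entrypoints.openai.api_server \
    --model /path/to/model \
    --tensor-parallel-size 4 \
    --load-choices "default_v1"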

Accuracy Tests

Does not affect model outputs.

Checklist

  • Add at least one tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If no unit tests are added, please explain the reason in this PR.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

Copilot AI review requested due to automatic review settings December 9, 2025 13:02
@paddle-bot

paddle-bot bot commented Dec 9, 2025

Thanks for your contribution!

Contributor

Copilot AI left a comment


Pull request overview

This PR adds support for v1_loader with tensor parallelism (tp) and expert parallelism (ep) by implementing weight_loader methods across multiple layer types and ensuring proper weight attribute configuration for parallel computation.

  • Adds weight_loader methods to normalization, linear, and attention layers for v1_loader compatibility
  • Configures output_dim attributes on bias parameters for tensor parallel layers
  • Implements whitelist-based conditional splitting for specific layer patterns (e.g., shared_experts); a rough sketch of this pattern follows below
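
To make the whitelist-based splitting concrete, here is a rough, framework-free sketch of the pattern. Names such as slice_fn, layer_in_white_list, load_split_weight, and the whitelist patterns mirror the PR description but are simplified stand-ins, not the actual FastDeploy implementation.

import numpy as np

# Assumed whitelist patterns; the PR mentions shared_experts as an example.
WHITELIST_PATTERNS = ("shared_experts",)

def slice_fn(weight, dim, start, end):
    # Slice `weight` along `dim` from `start` (inclusive) to `end` (exclusive).
    index = [slice(None)] * weight.ndim
    index[dim] = slice(start, end)
    return weight[tuple(index)]

def load_split_weight(loaded_weight, prefix, output_dim, tp_rank, tp_size):
    # Layers matching a whitelist pattern (e.g. shared_experts) keep the full
    # tensor; everything else gets the shard for this tensor-parallel rank.
    layer_in_white_list = any(pattern in prefix for pattern in WHITELIST_PATTERNS)
    if layer_in_white_list:
        return loaded_weight
    block_size = loaded_weight.shape[output_dim] // tp_size
    shard_offset = tp_rank * block_size
    shard_end = (tp_rank + 1) * block_size
    return slice_fn(loaded_weight, output_dim, shard_offset, shard_end)

full = np.arange(16, dtype=np.float32).reshape(4, 4)
print(load_split_weight(full, "layers.0.mlp.down_proj.weight", 0, tp_rank=1, tp_size=2))                 # 2x4 shard
print(load_split_weight(full, "layers.0.mlp.shared_experts.down_proj.weight", 0, tp_rank=1, tp_size=2))  # full 4x4 tensor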

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

Summary per file:
  • fastdeploy/model_executor/layers/normalization.py: Adds a weight_loader method to RMSNorm for proper dtype conversion during weight loading.
  • fastdeploy/model_executor/layers/mtp_linear.py: Sets the output_dim attribute on the bias for tensor parallel support (has a bug: missing conditional check).
  • fastdeploy/model_executor/layers/lm_head.py: Conditionally sets the output_dim attribute on the bias when a bias is present.
  • fastdeploy/model_executor/layers/linear.py: Implements a comprehensive weight_loader for RowParallelLinear with whitelist-based splitting logic and dtype casting.
  • fastdeploy/model_executor/layers/attention/attention.py: Adds an early-return path in weight_loader for qk_norm weights (has a dtype inconsistency issue).
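
The normalization.py and lm_head.py entries above boil down to two small patterns; below is a minimal sketch under assumed simplifications (TinyRMSNorm, TinyParallelLMHead, and the local set_weight_attrs helper are illustrative stand-ins, not the actual FastDeploy classes).

import paddle

def set_weight_attrs(param, attrs):
    # Stand-in for FastDeploy's helper: attach loader metadata to a parameter.
    for key, value in attrs.items():
        setattr(param, key, value)

class TinyRMSNorm(paddle.nn.Layer):
    def __init__(self, hidden_size):
        super().__init__()
        self.weight = self.create_parameter(shape=[hidden_size])

    def weight_loader(self, param, loaded_weight):
        # Cast the checkpoint tensor to the runtime default dtype before copying,
        # mirroring the dtype conversion described for normalization.py.
        loaded_weight = paddle.to_tensor(loaded_weight).cast(paddle.get_default_dtype())
        param.copy_(loaded_weight, False)

class TinyParallelLMHead(paddle.nn.Layer):
    def __init__(self, hidden_size, vocab_size, with_bias, tp_size):
        super().__init__()
        self.linear = paddle.nn.Linear(hidden_size, vocab_size, bias_attr=None if with_bias else False)
        if tp_size > 1:
            set_weight_attrs(self.linear.weight, {"output_dim": True})
            if with_bias:
                # Only tag the bias when it actually exists: the conditional behavior
                # described for lm_head.py (and flagged as missing in mtp_linear.py).
                set_weight_attrs(self.linear.bias, {"output_dim": True})

head = TinyParallelLMHead(hidden_size=8, vocab_size=32, with_bias=False, tp_size=2)
print(getattr(head.linear.weight, "output_dim", None))  # True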

Comment on lines +906 to +912
shard_size = (self.fd_config.parallel_config.tensor_parallel_rank + 1) * block_size

# when use_sequence_parallel_moe, we don't split.
if layer_in_white_list:
    pass
else:
    loaded_weight = slice_fn(loaded_weight, output_dim, shard_offset, shard_size)

Copilot AI Dec 9, 2025


The variable name shard_size is misleading. It's actually used as the end index (absolute position), not a size. Consider renaming to shard_end for clarity, consistent with how slice_fn uses start and end parameters.

Suggested change
-shard_size = (self.fd_config.parallel_config.tensor_parallel_rank + 1) * block_size
+shard_end = (self.fd_config.parallel_config.tensor_parallel_rank + 1) * block_size
 # when use_sequence_parallel_moe, we don't split.
 if layer_in_white_list:
     pass
 else:
-    loaded_weight = slice_fn(loaded_weight, output_dim, shard_offset, shard_size)
+    loaded_weight = slice_fn(loaded_weight, output_dim, shard_offset, shard_end)

if layer_in_white_list:
    pass
else:
    loaded_weight = slice_fn(loaded_weight, output_dim, shard_offset, shard_size)

Copilot AI Dec 9, 2025


For consistency with the rest of the codebase, use named parameters when calling slice_fn. Change to:

loaded_weight = slice_fn(loaded_weight, output_dim, start=shard_offset, end=shard_size)

This matches the pattern used in lines 543, 696, and other weight_loader implementations.

Suggested change
-loaded_weight = slice_fn(loaded_weight, output_dim, shard_offset, shard_size)
+loaded_weight = slice_fn(loaded_weight, output_dim, start=shard_offset, end=shard_size)

Comment on lines +909 to +919
if layer_in_white_list:
    pass
else:
    loaded_weight = slice_fn(loaded_weight, output_dim, shard_offset, shard_size)

tp_row_bias = getattr(param, "tp_row_bias", None)
if layer_in_white_list:
    pass
else:
    if tp_row_bias:
        loaded_weight = loaded_weight / self.fd_config.parallel_config.tensor_parallel_size

Copilot AI Dec 9, 2025


[nitpick] The empty pass statements in the whitelist check reduce code readability. Consider refactoring to a more explicit pattern:

if not layer_in_white_list:
    loaded_weight = slice_fn(loaded_weight, output_dim, start=shard_offset, end=shard_size)

This eliminates the unnecessary pass statements and makes the control flow clearer.

Suggested change
-if layer_in_white_list:
-    pass
-else:
-    loaded_weight = slice_fn(loaded_weight, output_dim, shard_offset, shard_size)
+if not layer_in_white_list:
+    loaded_weight = slice_fn(loaded_weight, output_dim, shard_offset, shard_size)
 tp_row_bias = getattr(param, "tp_row_bias", None)
-if layer_in_white_list:
-    pass
-else:
-    if tp_row_bias:
-        loaded_weight = loaded_weight / self.fd_config.parallel_config.tensor_parallel_size
+if not layer_in_white_list and tp_row_bias:
+    loaded_weight = loaded_weight / self.fd_config.parallel_config.tensor_parallel_size

Comment on lines +909 to +919
if layer_in_white_list:
    pass
else:
    loaded_weight = slice_fn(loaded_weight, output_dim, shard_offset, shard_size)

tp_row_bias = getattr(param, "tp_row_bias", None)
if layer_in_white_list:
    pass
else:
    if tp_row_bias:
        loaded_weight = loaded_weight / self.fd_config.parallel_config.tensor_parallel_size

Copilot AI Dec 9, 2025


[nitpick] Similar to above, the empty pass statements for the whitelist check reduce readability. Consider refactoring to:

if tp_row_bias and not layer_in_white_list:
    loaded_weight = loaded_weight / self.fd_config.parallel_config.tensor_parallel_size
Suggested change
-if layer_in_white_list:
-    pass
-else:
-    loaded_weight = slice_fn(loaded_weight, output_dim, shard_offset, shard_size)
+if not layer_in_white_list:
+    loaded_weight = slice_fn(loaded_weight, output_dim, shard_offset, shard_size)
 tp_row_bias = getattr(param, "tp_row_bias", None)
-if layer_in_white_list:
-    pass
-else:
-    if tp_row_bias:
-        loaded_weight = loaded_weight / self.fd_config.parallel_config.tensor_parallel_size
+if tp_row_bias and not layer_in_white_list:
+    loaded_weight = loaded_weight / self.fd_config.parallel_config.tensor_parallel_size

)
if self.tp_size > 1:
    set_weight_attrs(self.linear.weight, {"output_dim": True})
    set_weight_attrs(self.linear.bias, {"output_dim": True})

Copilot AI Dec 9, 2025


Potential bug: Setting attributes on self.linear.bias without checking if bias exists. When with_bias=False, self.linear.bias may not exist, which would cause an AttributeError.

Add a conditional check:

if self.tp_size > 1:
    set_weight_attrs(self.linear.weight, {"output_dim": True})
    if self.bias_key is not None:
        set_weight_attrs(self.linear.bias, {"output_dim": True})
Suggested change
-set_weight_attrs(self.linear.bias, {"output_dim": True})
+if self.bias_key is not None:
+    set_weight_attrs(self.linear.bias, {"output_dim": True})

def weight_loader(self, param, loaded_weight, loaded_shard_id: Optional[str] = None):
    loaded_weight = get_tensor(loaded_weight).cast(paddle.get_default_dtype())

    if self.use_qk_norm and ("q_norm" in param.name or "k_norm" in param.name):

Copilot AI Dec 9, 2025


Type conversion inconsistency: The weight_loader casts to paddle.get_default_dtype() at line 232, but q_norm and k_norm weights are defined with dtype "float32" (lines 203, 211) and explicitly cast to float32 in load_state_dict (lines 224-225).

Consider casting to float32 for consistency:

if self.use_qk_norm and ("q_norm" in param.name or "k_norm" in param.name):
    loaded_weight = get_tensor(loaded_weight).astype("float32")
    param.copy_(loaded_weight, False)
    return
Suggested change
 if self.use_qk_norm and ("q_norm" in param.name or "k_norm" in param.name):
+    loaded_weight = loaded_weight.astype("float32")

        set_weight_attrs(self.bias, {"tp_row_bias": True})

    def weight_loader(self, param, loaded_weight, loaded_shard_id: Optional[str] = None):
        # In some senerio such as tsp, weight and bias of this layer will not be split in specific module.

Copilot AI Dec 9, 2025


Typo in comment: "senerio" should be "scenario".

Suggested change
-# In some senerio such as tsp, weight and bias of this layer will not be split in specific module.
+# In some scenario such as tsp, weight and bias of this layer will not be split in specific module.
