[Model] tp+ep support v1_loader #5465
Conversation
Thanks for your contribution!
Pull request overview
This PR adds support for v1_loader with tensor parallelism (tp) and expert parallelism (ep) by implementing weight_loader methods across multiple layer types and ensuring proper weight attribute configuration for parallel computation.
- Adds weight_loader methods to normalization, linear, and attention layers for v1_loader compatibility
- Configures output_dim attributes on bias parameters for tensor parallel layers
- Implements whitelist-based conditional splitting for specific layer patterns (e.g., shared_experts); a minimal sketch of this rule follows below
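To make the whitelist rule above concrete, here is a minimal sketch in the spirit of the PR, not its actual code; the helper name maybe_shard and its signature are illustrative, and only the per-rank offset arithmetic and the "don't split whitelisted layers" behaviour come from the diff reviewed below.

```python
import paddle

def maybe_shard(loaded_weight, output_dim, tp_rank, tp_size, layer_in_white_list):
    # Whitelisted layers (e.g. shared_experts when use_sequence_parallel_moe is on)
    # keep the full checkpoint tensor on every rank instead of a per-rank shard.
    if layer_in_white_list:
        return loaded_weight
    # Otherwise take this rank's contiguous block along the sharded dimension.
    block_size = loaded_weight.shape[output_dim] // tp_size
    shard_offset = tp_rank * block_size
    shard_end = (tp_rank + 1) * block_size
    return paddle.slice(loaded_weight, axes=[output_dim], starts=[shard_offset], ends=[shard_end])
```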
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| fastdeploy/model_executor/layers/normalization.py | Adds weight_loader method to RMSNorm for proper dtype conversion during weight loading |
| fastdeploy/model_executor/layers/mtp_linear.py | Sets output_dim attribute on bias for tensor parallel support (has a bug - missing conditional check) |
| fastdeploy/model_executor/layers/lm_head.py | Conditionally sets output_dim attribute on bias when bias is present |
| fastdeploy/model_executor/layers/linear.py | Implements comprehensive weight_loader for RowParallelLinear with whitelist-based splitting logic and dtype casting |
| fastdeploy/model_executor/layers/attention/attention.py | Adds an early return path in weight_loader for qk_norm weights (has a dtype inconsistency issue) |
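For orientation, the normalization.py change summarized above boils down to a loader that casts the checkpoint tensor to the model's runtime dtype before copying it into the parameter. The sketch below is an approximation of that behaviour based on the summary, not the file's exact code (the PR itself uses the project's get_tensor helper, as seen in the attention.py hunk further down).

```python
import paddle

class RMSNormLoaderSketch:
    """Stub class showing only the loader behaviour described in the table above."""

    def weight_loader(self, param, loaded_weight, loaded_shard_id=None):
        # Match the model's runtime dtype (e.g. bfloat16) before copying the
        # checkpoint tensor into the norm weight parameter.
        loaded_weight = paddle.to_tensor(loaded_weight).cast(paddle.get_default_dtype())
        assert param.shape == loaded_weight.shape, "checkpoint/parameter shape mismatch"
        param.copy_(loaded_weight, False)
```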
shard_size = (self.fd_config.parallel_config.tensor_parallel_rank + 1) * block_size

# when use_sequence_parallel_moe, we don't split.
if layer_in_white_list:
    pass
else:
    loaded_weight = slice_fn(loaded_weight, output_dim, shard_offset, shard_size)
Copilot AI · Dec 9, 2025
The variable name shard_size is misleading. It's actually used as the end index (absolute position), not a size. Consider renaming to shard_end for clarity, consistent with how slice_fn uses start and end parameters.
Suggested change:
- shard_size = (self.fd_config.parallel_config.tensor_parallel_rank + 1) * block_size
- # when use_sequence_parallel_moe, we don't split.
- if layer_in_white_list:
-     pass
- else:
-     loaded_weight = slice_fn(loaded_weight, output_dim, shard_offset, shard_size)
+ shard_end = (self.fd_config.parallel_config.tensor_parallel_rank + 1) * block_size
+ # when use_sequence_parallel_moe, we don't split.
+ if layer_in_white_list:
+     pass
+ else:
+     loaded_weight = slice_fn(loaded_weight, output_dim, shard_offset, shard_end)
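For readers unfamiliar with the helper, slice_fn is used here as an along-axis slice that takes absolute start and end indices, which is why the reviewer reads the fourth argument as an end position rather than a size. The toy definition below is an assumption inferred from the call sites in this hunk, not the project's implementation.

```python
def slice_fn(tensor, axis, start, end):
    # Toy stand-in: keep elements [start, end) along `axis`, leaving every
    # other dimension untouched. Works for paddle tensors and numpy arrays.
    index = [slice(None)] * tensor.ndim
    index[axis] = slice(start, end)
    return tensor[tuple(index)]
```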
if layer_in_white_list:
    pass
else:
    loaded_weight = slice_fn(loaded_weight, output_dim, shard_offset, shard_size)
Copilot AI · Dec 9, 2025
For consistency with the rest of the codebase, use named parameters when calling slice_fn. Change to:
loaded_weight = slice_fn(loaded_weight, output_dim, start=shard_offset, end=shard_size)

This matches the pattern used in lines 543, 696, and other weight_loader implementations.

Suggested change:
- loaded_weight = slice_fn(loaded_weight, output_dim, shard_offset, shard_size)
+ loaded_weight = slice_fn(loaded_weight, output_dim, start=shard_offset, end=shard_size)
if layer_in_white_list:
    pass
else:
    loaded_weight = slice_fn(loaded_weight, output_dim, shard_offset, shard_size)

tp_row_bias = getattr(param, "tp_row_bias", None)
if layer_in_white_list:
    pass
else:
    if tp_row_bias:
        loaded_weight = loaded_weight / self.fd_config.parallel_config.tensor_parallel_size
Copilot AI · Dec 9, 2025
[nitpick] The empty pass statements in the whitelist check reduce code readability. Consider refactoring to a more explicit pattern:
if not layer_in_white_list:
    loaded_weight = slice_fn(loaded_weight, output_dim, start=shard_offset, end=shard_size)

This eliminates the unnecessary pass statements and makes the control flow clearer.
Suggested change:
- if layer_in_white_list:
-     pass
- else:
-     loaded_weight = slice_fn(loaded_weight, output_dim, shard_offset, shard_size)
- tp_row_bias = getattr(param, "tp_row_bias", None)
- if layer_in_white_list:
-     pass
- else:
-     if tp_row_bias:
-         loaded_weight = loaded_weight / self.fd_config.parallel_config.tensor_parallel_size
+ if not layer_in_white_list:
+     loaded_weight = slice_fn(loaded_weight, output_dim, shard_offset, shard_size)
+ tp_row_bias = getattr(param, "tp_row_bias", None)
+ if not layer_in_white_list and tp_row_bias:
+     loaded_weight = loaded_weight / self.fd_config.parallel_config.tensor_parallel_size
if layer_in_white_list:
    pass
else:
    loaded_weight = slice_fn(loaded_weight, output_dim, shard_offset, shard_size)

tp_row_bias = getattr(param, "tp_row_bias", None)
if layer_in_white_list:
    pass
else:
    if tp_row_bias:
        loaded_weight = loaded_weight / self.fd_config.parallel_config.tensor_parallel_size
Copilot AI · Dec 9, 2025
[nitpick] Similar to above, the empty pass statements for the whitelist check reduce readability. Consider refactoring to:
if tp_row_bias and not layer_in_white_list:
    loaded_weight = loaded_weight / self.fd_config.parallel_config.tensor_parallel_size

Suggested change:
- if layer_in_white_list:
-     pass
- else:
-     loaded_weight = slice_fn(loaded_weight, output_dim, shard_offset, shard_size)
- tp_row_bias = getattr(param, "tp_row_bias", None)
- if layer_in_white_list:
-     pass
- else:
-     if tp_row_bias:
-         loaded_weight = loaded_weight / self.fd_config.parallel_config.tensor_parallel_size
+ if not layer_in_white_list:
+     loaded_weight = slice_fn(loaded_weight, output_dim, shard_offset, shard_size)
+ tp_row_bias = getattr(param, "tp_row_bias", None)
+ if tp_row_bias and not layer_in_white_list:
+     loaded_weight = loaded_weight / self.fd_config.parallel_config.tensor_parallel_size
)
if self.tp_size > 1:
    set_weight_attrs(self.linear.weight, {"output_dim": True})
    set_weight_attrs(self.linear.bias, {"output_dim": True})
Copilot AI · Dec 9, 2025
Potential bug: Setting attributes on self.linear.bias without checking if bias exists. When with_bias=False, self.linear.bias may not exist, which would cause an AttributeError.
Add a conditional check:
if self.tp_size > 1:
    set_weight_attrs(self.linear.weight, {"output_dim": True})
    if self.bias_key is not None:
        set_weight_attrs(self.linear.bias, {"output_dim": True})

Suggested change:
- set_weight_attrs(self.linear.bias, {"output_dim": True})
+ if self.bias_key is not None:
+     set_weight_attrs(self.linear.bias, {"output_dim": True})
def weight_loader(self, param, loaded_weight, loaded_shard_id: Optional[str] = None):
    loaded_weight = get_tensor(loaded_weight).cast(paddle.get_default_dtype())

    if self.use_qk_norm and ("q_norm" in param.name or "k_norm" in param.name):
Copilot AI · Dec 9, 2025
Type conversion inconsistency: The weight_loader casts to paddle.get_default_dtype() at line 232, but q_norm and k_norm weights are defined with dtype "float32" (lines 203, 211) and explicitly cast to float32 in load_state_dict (lines 224-225).
Consider casting to float32 for consistency:
if self.use_qk_norm and ("q_norm" in param.name or "k_norm" in param.name):
    loaded_weight = get_tensor(loaded_weight).astype("float32")
    param.copy_(loaded_weight, False)
    return

Suggested change:
- if self.use_qk_norm and ("q_norm" in param.name or "k_norm" in param.name):
+ if self.use_qk_norm and ("q_norm" in param.name or "k_norm" in param.name):
+     loaded_weight = loaded_weight.astype("float32")
set_weight_attrs(self.bias, {"tp_row_bias": True})

def weight_loader(self, param, loaded_weight, loaded_shard_id: Optional[str] = None):
    # In some senerio such as tsp, weight and bias of this layer will not be split in specific module.
Copilot AI · Dec 9, 2025
Typo in comment: "senerio" should be "scenario".
Suggested change:
- # In some senerio such as tsp, weight and bias of this layer will not be split in specific module.
+ # In some scenario such as tsp, weight and bias of this layer will not be split in specific module.
Motivation
Support the v1 loader when tensor parallelism (tp) and expert parallelism (ep) are used together.
Modifications
Add weight_loader methods and weight attributes (e.g. output_dim, tp_row_bias) to the affected layers so that weights are loaded and split correctly under tp+ep with v1_loader.
Usage or Command
--load-choices "default_v1"
Accuracy Tests
The changes do not affect model outputs.
Checklist
- Add at least one tag to the PR title from: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run pre-commit before commit.
- For a PR targeting the release branch, make sure it has been submitted to the develop branch first, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.