How to train a Diffusion Model with online RL? #342

carlulxy · 2025-07-30T08:33:56Z

carlulxy
Jul 30, 2025

I'm trying to use online RL to optimize a diffusion model pre-trained by supervised learning. I want to know what should i do to make it.
And I'm wondering if I'm doing the right thing. In case of PPO, i split the shared model into two model. One is diffusion policy to be the policy model. The other is a mlp to be the value model. The relevant part of the code is as follows：

class DiffusionPolicy(GaussianMixin, Model):
    def __init__(self):
        # init
        self.diffusion_model = ...
        self.noise_scheduler = ...
        self.log_std_parameter = nn.Parameter(torch.zeros(self.num_actions))

    def act(self, inputs, role):
        return GaussianMixin.act(self, inputs, role)

    def compute(self, inputs, role):
        return self.generate_next_action_by_model(inputs["states"]), self.log_std_parameter, {}
        
    def generate_next_action_by_model(self, current_obs):
        # generate the action from the observation by denoising
        action = self.denoise()
        return action


class MlpValue(DeterministicMixin, Model):
    def __init__(self, observation_space, action_space, device, clip_actions=False):
        Model.__init__(self, observation_space, action_space, device)
        DeterministicMixin.__init__(self, clip_actions)

        self.net = ...

        self.log_std_parameter = nn.Parameter(torch.zeros(self.num_actions))

    def act(self, inputs, role):
        return DeterministicMixin.act(self, inputs, role)

    def compute(self, inputs, role):
        return self.net(inputs["states"]), {}

env = ...

models = {}
models["policy"] = DiffusionPolicy(env.observation_space, env.action_space, device)
models["value"] = MlpValue(env.observation_space, env.action_space, device)

agent = PPO(models=models,
            ...
            )

trainer = SequentialTrainer(
    agents=agent,
    ...
)

trainer.train()

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

How to train a Diffusion Model with online RL? #342

Uh oh!

{{title}}

Uh oh!

Replies: 0 comments

Select a reply

Uh oh!

How to train a Diffusion Model with online RL? #342

Uh oh!

carlulxy Jul 30, 2025

Replies: 0 comments

carlulxy
Jul 30, 2025