newtonkwan commented on Jul 5, 2022

Adds LOLA (Learning with Opponent-Learning Awareness) to the set of strategies.
Goal: LOLA plays against LOLA and shapes its opponent's learning.
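For reference, the first-order LOLA update from the LOLA paper (Foerster et al.) is shown below. The notation is ours and may not match this PR's code: $V^i$ is agent $i$'s expected return, $\eta$ is the step size agent 1 assumes for its naive opponent, and $\delta$ is agent 1's own step size. Agent 1 evaluates its return after one anticipated gradient step of agent 2, linearises it, and differentiates only through $\Delta\theta^2$. That is where the opponent-shaping term comes from.

```math
\begin{aligned}
\Delta\theta^2 &= \eta\, \nabla_{\theta^2} V^2(\theta^1, \theta^2) \\
V^1(\theta^1, \theta^2 + \Delta\theta^2) &\approx V^1(\theta^1, \theta^2) + (\Delta\theta^2)^\top \nabla_{\theta^2} V^1(\theta^1, \theta^2) \\
\Delta\theta^1 &= \delta\, \nabla_{\theta^1} V^1 + \delta\eta\, \big(\nabla_{\theta^2} V^1\big)^\top \nabla_{\theta^1}\nabla_{\theta^2} V^2
\end{aligned}
```

LOLA-DiCE keeps this objective but estimates the first and second derivatives from sampled trajectories using the DiCE estimator, rather than from exact values.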

  • Add an agent_states argument to the update() function of PPO, PPO_gru, and DQN.
  • Add an offline actor-critic naive learner (NL) trained with policy gradient (Foerster 2017), with a simple experience replay buffer. EDIT: our NL uses the advantage instead of a baseline.
  • Naive learners learn to defect against one another, reproducing the findings of the LOLA paper.
  • Add a LOLA-DiCE implementation.
  • Pull in the new runner and implement the refactored LOLA.
  • Get LOLA to shape the NL.
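The naive-learner items above can be illustrated with a minimal, self-contained sketch. It is not this PR's implementation, and several assumptions are made to keep it short. The game is a one-shot Prisoner's Dilemma with the LOLA paper's payoffs rather than the iterated game. Each agent has a single cooperation logit. The critic is a single scalar, so the advantage r - V acts as a learned baseline here; in the iterated game the critic would be state-dependent. The experience buffer is cleared after every update so that the policy gradient stays on-policy. All names (NaiveLearner, ReplayBuffer, train, PAYOFFS) are hypothetical.

```python
import math
import random
from dataclasses import dataclass, field

# Assumed one-shot Prisoner's Dilemma payoffs (agent 1, agent 2), as in the LOLA paper.
# Action 0 = cooperate, 1 = defect.
PAYOFFS = {
    (0, 0): (-1.0, -1.0),
    (0, 1): (-3.0, 0.0),
    (1, 0): (0.0, -3.0),
    (1, 1): (-2.0, -2.0),
}


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


@dataclass
class ReplayBuffer:
    """Hypothetical minimal buffer of (action, reward) pairs, emptied on each update."""
    items: list = field(default_factory=list)

    def push(self, action: int, reward: float) -> None:
        self.items.append((action, reward))

    def drain(self) -> list:
        batch, self.items = self.items, []
        return batch


@dataclass
class NaiveLearner:
    """Hypothetical actor-critic naive learner: one cooperation logit plus a scalar critic."""
    logit: float = 0.0      # actor parameter: P(cooperate) = sigmoid(logit)
    value: float = 0.0      # critic: estimate of the expected reward
    actor_lr: float = 0.5
    critic_lr: float = 0.5
    buffer: ReplayBuffer = field(default_factory=ReplayBuffer)

    def p_cooperate(self) -> float:
        return sigmoid(self.logit)

    def act(self, rng: random.Random) -> int:
        return 0 if rng.random() < self.p_cooperate() else 1

    def update(self) -> None:
        batch = self.buffer.drain()
        if not batch:
            return
        p = self.p_cooperate()
        actor_grad = 0.0
        critic_grad = 0.0
        for action, reward in batch:
            advantage = reward - self.value
            # Score function d/dlogit log pi(action): (1 - p) if cooperate, -p if defect.
            score = (1.0 - p) if action == 0 else -p
            actor_grad += score * advantage
            critic_grad += advantage
        n = len(batch)
        # Policy-gradient ascent on expected reward, weighted by the advantage.
        self.logit += self.actor_lr * actor_grad / n
        # Gradient descent on 0.5 * (reward - value)^2 moves the critic toward the mean reward.
        self.value += self.critic_lr * critic_grad / n


def train(steps: int = 200, batch_size: int = 64, seed: int = 0):
    """Two naive learners play each other; returns their final P(cooperate)."""
    rng = random.Random(seed)
    agents = (NaiveLearner(), NaiveLearner())
    for _ in range(steps):
        for _ in range(batch_size):
            a1, a2 = agents[0].act(rng), agents[1].act(rng)
            r1, r2 = PAYOFFS[(a1, a2)]
            agents[0].buffer.push(a1, r1)
            agents[1].buffer.push(a2, r2)
        for agent in agents:
            agent.update()
    return agents[0].p_cooperate(), agents[1].p_cooperate()


if __name__ == "__main__":
    p1, p2 = train()
    print(f"P(cooperate): agent 1 = {p1:.3f}, agent 2 = {p2:.3f}")
```

Both cooperation probabilities fall toward zero, which is the mutual defection the PR reports for naive learners. In this one-shot simplification, that outcome is expected because defection is the dominant strategy. The iterated game studied in the LOLA paper is the more interesting case.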

newtonkwan requested a review from akbir on July 5, 2022 15:35
3 participants