Add lola #11

newtonkwan · 2022-07-05T15:32:32Z

Adds LOLA to the set of strategies.
Goal: LOLA plays against LOLA and shapes the opponent to learn.

Add an `agent_states` argument to the `update()` function of PPO, PPO_gru, and DQN.
Add an offline actor-critic naive learner (NL) with policy gradient (Foerster 2017) and a simple experience replay buffer. EDIT: Our NL uses an advantage instead of a baseline.
Naive learners learn to defect against one another, which reproduces the findings of the LOLA paper.
Add a LOLA-DiCE implementation.
Pull in the new runner and implement the refactored LOLA.
Get LOLA to shape NL.
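For readers unfamiliar with the DiCE estimator mentioned in the list above, here is a minimal single-agent sketch in JAX. It is not the code in this PR: the names `magic_box` and `dice_objective`, the array shapes, and the omission of discounting and of the opponent's log-probabilities are all assumptions made for illustration.

```python
import jax
import jax.numpy as jnp


def magic_box(x):
    # Hypothetical helper: evaluates to 1 in the forward pass, while its
    # gradient with respect to x equals the gradient of x itself.
    return jnp.exp(x - jax.lax.stop_gradient(x))


def dice_objective(logits, actions, rewards):
    # Assumed shapes: logits [T, num_actions], actions [T] (int), rewards [T].
    log_probs = jax.nn.log_softmax(logits)
    taken = jnp.take_along_axis(log_probs, actions[:, None], axis=1)[:, 0]
    # Each reward is weighted by the log-probabilities of every action
    # taken up to and including its own timestep.
    cumulative = jnp.cumsum(taken)
    return jnp.sum(magic_box(cumulative) * rewards)


logits = jnp.zeros((3, 2))
actions = jnp.array([0, 1, 0])
rewards = jnp.array([1.0, 2.0, 3.0])

value = dice_objective(logits, actions, rewards)
grads = jax.grad(dice_objective)(logits, actions, rewards)
```

The forward value is just the undiscounted return, and differentiating the objective gives the score-function gradient. Because the objective stays differentiable after one gradient step, it suits the higher-order terms that opponent shaping needs. A two-agent version would, under the same assumptions, sum both agents' log-probabilities inside `cumulative`.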

newtonkwan added 4 commits June 27, 2022 16:02

begin adding centralized learning

2d5ed96

first commit. begin adding centralized training for LOLA

ec3ce6a

add base lola

2664eea

Merge branch 'main' into add_lola

461f772

newtonkwan requested a review from akbir July 5, 2022 15:35

newtonkwan added 4 commits July 5, 2022 16:39

add centralized learner

bcb4833

resolve merge conflict. add centralized learning

39de443

add lola machinery to experiments.py

6c17d02

fix entropy annealing

8fab167

newtonkwan commented Jul 6, 2022

pax/watchers.py (outdated)

newtonkwan commented Jul 6, 2022

pax/ppo/ppo_gru.py

newtonkwan commented Jul 6, 2022

pax/watchers.py (outdated)

newtonkwan commented Jul 6, 2022

pax/ppo/networks.py

akbir reviewed Jul 6, 2022

pax/ppo/ppo.py (outdated)

newtonkwan added 16 commits July 7, 2022 02:07

fix done condition in additional rollout step in PPO

21dc4fd

minor changes to lola

6fe1d02

merge main with add_lola

ac3a7f1

minor bug fix

b409692

add changes to buffer

8f42170

merge recent main updates Merge branch 'main' into add_lola

acc514a

update confs

fce83a4

add naive learner

d1be0c5

pull changes from main

dbe82c3

lazy commit. committing to add naive learner PR

a855c4b

merge main

f11dce8

add logic for lola (still debugging)

bb7b03b

add lola (doesn't quite work yet)

c169c75

compiling lola...

1a00280

working lola

be33bc8

update configs

3cedf4e

newtonkwan added 12 commits July 28, 2022 15:24

tidy up

b37aa91

pull in main

a2eb9e2

add working lola with new runner using lax.scan

99c7906

tidy up watchers, fix naive learner, LOLA getting exploited hard ....

dbb3937

tidy up watchers, fix naive learner, LOLA getting exploited hard ....

11b98c6

lola compiles, move TrainingState to utils

aeb6426

latest lola

c9cd40c

fix axis

dd29be0

fix axis

78c8196

similar lola

c4b2e72

half working lola

2f30284

temporary lola

4ad1b76

3 participants
