Add P-Tuning LSTM experiment with 50 virtual tokens to MetaMathQA benchmark #3356

Open
Akashsinghbhadoriya wants to merge 14 commits into huggingface:main from Akashsinghbhadoriya:main

@Akashsinghbhadoriya

Description

Added a P-Tuning experiment for the MetaMathQA benchmark, as discussed in #2310.

P-tuning uses a prompt encoder (LSTM or MLP) to generate virtual tokens prepended to the input. This experiment tests the LSTM variant (encoder_reparameterization_type=LSTM) with 50 virtual tokens, complementing the existing MLP-based experiment.
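
As a rough illustration of what such a configuration involves, the sketch below builds the core P-tuning fields and serializes them as JSON. The field names are those of PEFT's `PromptEncoderConfig`; this is not the `adapter_config.json` added in this PR, which contains further fields that are omitted here.

```python
import json

# Minimal sketch of the core P-tuning settings described above.
# Only the encoder type and the virtual-token count come from this PR;
# task_type is an assumption for a causal language model benchmark.
ptuning_config = {
    "peft_type": "P_TUNING",
    "task_type": "CAUSAL_LM",  # assumption
    "num_virtual_tokens": 50,
    "encoder_reparameterization_type": "LSTM",  # the existing experiment uses "MLP"
}

config_json = json.dumps(ptuning_config, indent=2)
print(config_json)
```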

Changes

  • Added method_comparison/MetaMathQA/experiments/ptuning/llama-3.2-3B-vt50-LSTM/adapter_config.json
  • Added method_comparison/MetaMathQA/results/ptuning--llama-3.2-3B-vt50-LSTM.json

Results

The experiment was run on an NVIDIA RTX 4090 (48GB) using the default training parameters.

| Metric | P-tuning LSTM (vt=50) | P-tuning MLP (vt=20) |
|---|---|---|
| Test accuracy | 0.3495 | 0.3821 |
| Train time | 1584.99s | 959.73s |
| Memory max | 27,438 MB | 19,980 MB |
| Trainable params | 434,371,584 | 28,382,208 |
| File size | 0.59 MB | 0.24 MB |
| Virtual tokens | 50 | 20 |
| Encoder type | LSTM | MLP |
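
To put the two columns side by side, the following sketch computes the accuracy difference and the cost ratios directly from the table. The variable names are illustrative only.

```python
# Values copied from the results table above.
lstm = {"test_accuracy": 0.3495, "train_time_s": 1584.99,
        "memory_max_mb": 27438, "trainable_params": 434_371_584}
mlp = {"test_accuracy": 0.3821, "train_time_s": 959.73,
       "memory_max_mb": 19980, "trainable_params": 28_382_208}

# Absolute accuracy difference (LSTM minus MLP).
accuracy_delta = lstm["test_accuracy"] - mlp["test_accuracy"]

# Cost of the LSTM run relative to the MLP run.
ratios = {key: lstm[key] / mlp[key]
          for key in ("train_time_s", "memory_max_mb", "trainable_params")}

print(f"accuracy delta: {accuracy_delta:+.4f}")
for key, value in ratios.items():
    print(f"{key}: {value:.2f}x")
```

On these numbers the LSTM run is about 3.3 accuracy points lower while using roughly 1.65x the training time, 1.37x the peak memory, and 15.3x the trainable parameters.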

@Akashsinghbhadoriya

Author

@BenjaminBossan, can you review the PR?

@BenjaminBossan

Member

Thanks for working on this P-Tuning experiment. It looks like the results are worse and it requires more memory compared to the existing default settings. Do you have the opportunity to run more experiments to see if you can get better results? Some possible further hyper-parameters to test would be learning rate and num_virtual_tokens.

@Akashsinghbhadoriya

Author

> Thanks for working on this P-Tuning experiment. It looks like the results are worse and it requires more memory compared to the existing default settings. Do you have the opportunity to run more experiments to see if you can get better results? Some possible further hyper-parameters to test would be learning rate and num_virtual_tokens.

I increased num_virtual_tokens from 20 to 50 and used LSTM as the encoder instead of MLP. This is the only experiment I have run so far. Do you have any suggestions? Should I decrease num_virtual_tokens to 20 and test with LSTM, or try a different learning rate? The memory usage increased because of the increase in virtual tokens.

@BenjaminBossan

Member

The idea when trying to optimize hyper-parameters is to try different combinations to see what works best. So in this case, you could try LSTM and MLP with different num_virtual_tokens, as well as changing the learning rate. If you see that a specific parameter leads to an improvement (say, increasing the learning rate), you could try changing that parameter even more in that direction to check if there is more of an improvement.
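
The procedure described above can be sketched as a simple one-parameter search. `search_learning_rate` and `evaluate` are hypothetical names; `evaluate` stands in for a full training and evaluation run that returns test accuracy, and the toy objective at the bottom is only there to make the sketch runnable.

```python
import math
from typing import Callable, Tuple

def search_learning_rate(evaluate: Callable[[float], float],
                         start_lr: float,
                         factor: float = 10.0,
                         max_steps: int = 5) -> Tuple[float, float]:
    """Try larger and smaller learning rates, continuing in a direction
    only while the score keeps improving over the best seen so far."""
    best_lr, best_score = start_lr, evaluate(start_lr)
    for step in (factor, 1.0 / factor):  # first increase, then decrease
        lr = start_lr
        for _ in range(max_steps):
            lr *= step
            score = evaluate(lr)
            if score <= best_score:
                break  # no further improvement in this direction
            best_lr, best_score = lr, score
    return best_lr, best_score

# Toy stand-in for a training run: the score peaks at lr = 1e-3.
def toy_evaluate(lr: float) -> float:
    return -(math.log10(lr) + 3.0) ** 2

found_lr, found_score = search_learning_rate(toy_evaluate, start_lr=1e-4)
print(found_lr, found_score)
```

The same loop could be repeated for each value of num_virtual_tokens, at the cost of one training run per evaluated setting.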

@Akashsinghbhadoriya

Author

> The idea when trying to optimize hyper-parameters is to try different combinations to see what works best. So in this case, you could try LSTM and MLP with different num_virtual_tokens, as well as changing the learning rate. If you see that a specific parameter leads to an improvement (say, increasing the learning rate), you could try changing that parameter even more in that direction to check if there is more of an improvement.

| Metric | MLP vt=20 lr=1e-4 | MLP vt=20 lr=5e-4 | MLP vt=50 lr=1e-4 | LSTM vt=50 lr=1e-4 | LSTM vt=20 lr=1e-4 |
|---|---|---|---|---|---|
| Test accuracy | 0.3821 | 0.3525 | 0.3419 | 0.3495 | 0.3381 |
| Train time (s) | 959.73 | 928.15 | 1010.87 | 1584.99 | 1322.59 |
| Memory max (MB) | 19,989 | 19,953 | 21,181 | 27,438 | 27,449 |
| Trainable params | 28,382,208 | 28,382,208 | 28,474,368 | 434,371,584 | 434,279,424 |
| File size (MB) | 0.23 | 0.23 | 0.59 | 0.59 | 0.23 |
| Encoder | MLP | MLP | MLP | LSTM | LSTM |
| Virtual tokens | 20 | 20 | 50 | 50 | 20 |
| LR | 1e-4 | 5e-4 | 1e-4 | 1e-4 | 1e-4 |

The default config is still the best-performing one among the experiments I have run so far.
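
As a sanity check on the trainable-parameter row, the counts can be reproduced with a little arithmetic. This is a sketch under assumptions not stated in the thread: a token dimension and encoder hidden size of 3072, an MLP encoder made of three linear layers with biases, and an LSTM encoder made of a 2-layer bidirectional LSTM followed by two linear layers. Under those assumptions the numbers match the table exactly.

```python
HIDDEN = 3072  # assumption: token dimension and encoder hidden size

def linear(n_in: int, n_out: int) -> int:
    """Weights plus bias of one fully connected layer."""
    return n_in * n_out + n_out

def embedding(vt: int) -> int:
    """One HIDDEN-sized vector per virtual token."""
    return vt * HIDDEN

def mlp_encoder(vt: int) -> int:
    # Assumed layout: Linear -> ReLU -> Linear -> ReLU -> Linear, all HIDDEN wide.
    return embedding(vt) + 3 * linear(HIDDEN, HIDDEN)

def lstm_direction(n_in: int, n_hidden: int) -> int:
    """One direction of one LSTM layer: 4 gates, input and recurrent
    weights, plus two bias vectors of size 4 * n_hidden."""
    return 4 * n_hidden * (n_in + n_hidden) + 8 * n_hidden

def lstm_encoder(vt: int) -> int:
    # Assumed layout: 2-layer bidirectional LSTM, then Linear -> ReLU -> Linear.
    lstm = 2 * lstm_direction(HIDDEN, HIDDEN) + 2 * lstm_direction(2 * HIDDEN, HIDDEN)
    head = linear(2 * HIDDEN, 2 * HIDDEN) + linear(2 * HIDDEN, HIDDEN)
    return embedding(vt) + lstm + head

for name, count in [("MLP vt=20", mlp_encoder(20)), ("MLP vt=50", mlp_encoder(50)),
                    ("LSTM vt=20", lstm_encoder(20)), ("LSTM vt=50", lstm_encoder(50))]:
    print(f"{name}: {count:,}")
```

Read this way, each additional virtual token adds only 3072 parameters, so almost all of the gap between the LSTM and MLP rows comes from the encoder itself rather than from the number of virtual tokens.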

@BenjaminBossan

Member

Thanks for reporting these new experiments. When you tried vt=50, did you also check lower and higher learning rates?

@Akashsinghbhadoriya

Author

> Thanks for reporting these new experiments. When you tried vt=50, did you also check lower and higher learning rates?

No, for vt=50 I only used the default learning rate.

@BenjaminBossan

Member

Is it something that you could give a try?

@Akashsinghbhadoriya

Author

> Is it something that you could give a try?

Yeah, sure. I guess it would be better to try it with MLP as the encoder instead of LSTM, since MLP seems to give better results than LSTM. What do you suggest?

@BenjaminBossan

Member

> Yeah, sure. I guess it would be better to try it with MLP as the encoder instead of LSTM, since MLP seems to give better results than LSTM.

I agree.

> What do you suggest?

I would vary the vt (let's start with 50) and then check if either increasing or decreasing the learning rate helps. If one of them does, try increasing/decreasing even more, until there is no more improvement. Ideally, this way you can find a setting that beats the current default.

@Akashsinghbhadoriya

Author

> > Yeah, sure. I guess it would be better to try it with MLP as the encoder instead of LSTM, since MLP seems to give better results than LSTM.
>
> I agree.
>
> > What do you suggest?
>
> I would vary the vt (let's start with 50) and then check if either increasing or decreasing the learning rate helps. If one of them does, try increasing/decreasing even more, until there is no more improvement. Ideally, this way you can find a setting that beats the current default.

| Metric | MLP vt=20 lr=1e-4 (default) | MLP vt=20 lr=5e-4 | MLP vt=50 lr=5e-5 | MLP vt=50 lr=1e-4 | MLP vt=50 lr=1e-3 | MLP vt=50 lr=5e-3 |
|---|---|---|---|---|---|---|
| Test accuracy | 0.3821 | 0.3525 | 0.3055 | 0.3419 | 0.3669 | 0.3548 |
| Train time (s) | 959.73 | 928.15 | 1113.60 | 1010.87 | 1036.85 | 1029.78 |
| Memory max (MB) | 19,989 | 19,953 | 21,181 | 21,181 | 21,181 | 21,181 |
| LR | 1e-4 | 5e-4 | 5e-5 | 1e-4 | 1e-3 | 5e-3 |
| Virtual tokens | 20 | 20 | 50 | 50 | 50 | 50 |

For vt=50 there is no improvement over the default, whether we increase or decrease the learning rate. Since vt=50 is not improving the results, it would be better to decrease vt: we can try values between 20 and 50 and then test different learning rates.
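
The comparison can be made explicit with a short sketch that picks the best vt=50 learning rate from the table and measures its gap to the default. The variable names are illustrative.

```python
# Test accuracies copied from the table above.
default_accuracy = 0.3821  # MLP, vt=20, lr=1e-4
vt50_accuracy_by_lr = {5e-5: 0.3055, 1e-4: 0.3419, 1e-3: 0.3669, 5e-3: 0.3548}

# Learning rate with the highest test accuracy among the vt=50 runs.
best_lr = max(vt50_accuracy_by_lr, key=vt50_accuracy_by_lr.get)
gap_to_default = default_accuracy - vt50_accuracy_by_lr[best_lr]

print(f"best vt=50 lr: {best_lr:g}, accuracy: {vt50_accuracy_by_lr[best_lr]:.4f}")
print(f"gap to default: {gap_to_default:.4f}")
```

Accuracy at vt=50 rises from lr=5e-5 up to lr=1e-3 and drops again at 5e-3, so the best vt=50 run still trails the default by about 1.5 points.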

@BenjaminBossan

Member

Thanks a lot for running these tests. Interesting that higher vt doesn't seem to help at all.

> It would be better to decrease vt

If you could try that, it would be great. Starting with 30 would be a good number IMO. Maybe it's also worth trying to decrease vt below 20, like 10 just to give it a try.

@Akashsinghbhadoriya

Author

> If you could try that, it would be great. Starting with 30 would be a good number IMO. Maybe it's also worth trying to decrease vt below 20, like 10 just to give it a try.

| Metric | MLP vt=10 lr=1e-4 | MLP vt=20 lr=1e-4 (default) | MLP vt=30 lr=1e-4 |
|---|---|---|---|
| Test accuracy | 0.3313 | 0.3821 | 0.3419 |
| Train time (s) | 1028.57 | 959.73 | 1194.91 |
| Memory max (MB) | 19,893 | 19,989 | 20,470 |
| Trainable params | 28,351,488 | 28,382,208 | 28,412,928 |
| File size (MB) | 0.12 | 0.23 | 0.35 |
| Virtual tokens | 10 | 20 | 30 |

vt=20 is the best config as of now. Let me know if there is anything other than P-tuning that I can take up.
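
The file-size row scales linearly with vt, which can be checked with a quick calculation. The sketch assumes that the saved file holds only the virtual-token embeddings, stored as float32 with a token dimension of 3072, and that MB here means 2**20 bytes; none of this is stated in the thread, but the reported sizes are consistent with it.

```python
TOKEN_DIM = 3072      # assumption: embedding size per virtual token
BYTES_PER_VALUE = 4   # assumption: float32 storage

def checkpoint_size_mb(num_virtual_tokens: int) -> float:
    """Approximate size of the saved prompt embeddings in MB (2**20 bytes)."""
    return num_virtual_tokens * TOKEN_DIM * BYTES_PER_VALUE / 2**20

sizes = {vt: round(checkpoint_size_mb(vt), 2) for vt in (10, 20, 30, 50)}
print(sizes)
```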
