Add P-Tuning LSTM experiment with 50 virtual tokens to MetaMathQA benchmark#3356
Akashsinghbhadoriya wants to merge 14 commits into
@BenjaminBossan can you review the PR?
Thanks for working on this P-Tuning experiment. It looks like the results are worse and it requires more memory compared to the existing default settings. Do you have the opportunity to run more experiments to see if you can get better results? Some possible further hyper-parameters to test would be learning rate and
I increased num_virtual_tokens from 20 to 50 and used LSTM as the encoder instead of MLP. This is the only experiment I ran. Do you have any suggestions? Should I decrease num_virtual_tokens to 20 and test LSTM, or try a different learning rate? The memory usage increased because of the additional virtual tokens.
The idea when trying to optimize hyper-parameters is to try different combinations to see what works best. So in this case, you could try LSTM and MLP with different
bd8fca6 to 1b09b2c
1b09b2c to f8d0501
The default config is still the best-performing one so far. I tried running several different experiments.
Thanks for reporting these new experiments. When you tried vt=50, did you also check lower and higher learning rates?
No, for vt=50 I only used the default learning rate.
Is it something that you could give a try?
Yeah, sure. I guess it would be better to try it with MLP as the encoder instead of LSTM, since MLP seems to give better results than LSTM. What do you suggest?
I agree.
I would vary the vt (let's start with 50) and then check if either increasing or decreasing the learning rate helps. If one of them does, try increasing/decreasing even more, until there is no more improvement. Ideally, this way you can find a setting that beats the current default.
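The search procedure described in this comment could be sketched roughly as follows. This is only an illustration, not code from the benchmark: `search_learning_rate`, the `evaluate` callback, the multiplicative `factor`, and the toy scoring function are all hypothetical stand-ins for actually training a configuration and reading its test score.

```python
import math
from typing import Callable, Tuple


def search_learning_rate(
    evaluate: Callable[[int, float], float],
    num_virtual_tokens: int,
    base_lr: float,
    factor: float = 2.0,
    max_steps: int = 5,
) -> Tuple[float, float]:
    """Greedy search over the learning rate for a fixed number of virtual tokens.

    `evaluate` is a hypothetical callback that trains one configuration and
    returns a score where higher is better (for example, test accuracy).
    """
    best_lr = base_lr
    best_score = evaluate(num_virtual_tokens, base_lr)

    # Walk away from the base learning rate in both directions:
    # first towards larger values, then towards smaller ones.
    for step in (factor, 1.0 / factor):
        lr = base_lr
        for _ in range(max_steps):
            lr *= step
            score = evaluate(num_virtual_tokens, lr)
            if score <= best_score:
                break  # no more improvement in this direction
            best_lr, best_score = lr, score

    return best_lr, best_score


def toy_score(num_virtual_tokens: int, lr: float) -> float:
    """Made-up score that peaks at lr=4e-3; it replaces a real training run."""
    return -abs(math.log10(lr) - math.log10(4e-3))


best_lr, best_score = search_learning_rate(toy_score, num_virtual_tokens=50, base_lr=1e-3)
print(f"best lr for vt=50: {best_lr:g}")
```

The same function can then be called once per vt value of interest, and the best result compared against the current default.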
For vt=50 there is no improvement over the default, whether we increase or decrease the learning rate. Since vt=50 is not improving the results, it would be better to decrease vt. We could try vt values between 20 and 50 and then test different learning rates.
Thanks a lot for running these tests. Interesting that higher vt doesn't seem to help at all.
If you could try that, it would be great. Starting with 30 would be a good number IMO. Maybe it's also worth trying to decrease vt below 20, like 10 just to give it a try.
vt=20 is the best config as of now. Let me know if there is anything other than P-Tuning that I can take up.
Description
Added a P-Tuning experiment for MetaMathQA benchmark as discussed in #2310.
P-tuning uses a prompt encoder (LSTM or MLP) to generate virtual tokens prepended to the input. This experiment tests the LSTM variant (encoder_reparameterization_type=LSTM) with 50 virtual tokens, complementing the existing MLP-based experiment.
Changes
method_comparison/MetaMathQA/experiments/ptuning/llama-3.2-3B-vt50-LSTM/adapter_config.json
method_comparison/MetaMathQA/results/ptuning--llama-3.2-3B-vt50-LSTM.json
Results
Experiment was run on NVIDIA RTX 4090 (48GB) using default training params.
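For illustration, the core of the configuration described in the Description (LSTM encoder, 50 virtual tokens) could look roughly like the fragment built below. This is a partial sketch and not the contents of the actual adapter_config.json in this PR: the helper names `make_ptuning_config` and `experiment_name` are hypothetical, the `task_type` value is an assumption, and all other fields of the real file are omitted.

```python
import json

# Encoder types mentioned in this PR for the P-Tuning prompt encoder.
VALID_ENCODERS = {"MLP", "LSTM"}


def make_ptuning_config(num_virtual_tokens: int, encoder: str) -> dict:
    """Build a partial P-Tuning adapter config (hypothetical helper)."""
    if encoder not in VALID_ENCODERS:
        raise ValueError(f"encoder must be one of {sorted(VALID_ENCODERS)}, got {encoder!r}")
    if num_virtual_tokens <= 0:
        raise ValueError("num_virtual_tokens must be positive")
    return {
        "peft_type": "P_TUNING",
        "task_type": "CAUSAL_LM",  # assumption: causal language modeling
        "num_virtual_tokens": num_virtual_tokens,
        "encoder_reparameterization_type": encoder,
    }


def experiment_name(num_virtual_tokens: int, encoder: str) -> str:
    """Follow the directory naming pattern used for the experiment in this PR."""
    return f"llama-3.2-3B-vt{num_virtual_tokens}-{encoder}"


config = make_ptuning_config(50, "LSTM")
print(experiment_name(50, "LSTM"))
print(json.dumps(config, indent=2))
```

Switching `encoder` to "MLP" or changing `num_virtual_tokens` gives the other variants discussed in the conversation.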