Hello, I would like to run this project's code on a CPU.
My graphics card is a 4060 Ti, but even with the lightest options (minimum batch size, the 1.5B model, etc.) I could not run the project because of insufficient memory.
So I want to move the project to the CPU and see the results, even if it takes some time.
However, although I have checked all the settings and the code, the flash attention backend is still selected automatically, and I have not been able to resolve the resulting error.
Is it impossible to run this project on a CPU by changing only the vLLM settings?
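For reference, here is a minimal sketch of the environment I would expect a CPU-only run to need. It rests on assumptions, not on anything confirmed for this project: that a CPU build of vLLM is installed (as far as I know the CPU backend is a separate build, not a runtime switch on the CUDA wheel), and that the project does not hard-code a GPU device or the flash attention backend elsewhere. `build_cpu_env` is a hypothetical helper name. `CUDA_VISIBLE_DEVICES`, `VLLM_CPU_KVCACHE_SPACE`, and `VLLM_ATTENTION_BACKEND` are environment variables read by CUDA and vLLM respectively.

```python
import os


def build_cpu_env(base_env=None, kv_cache_gib=4):
    """Return a copy of the environment adjusted for a CPU-only vLLM run.

    Hypothetical helper: it only prepares environment variables and does
    not import or configure vLLM itself.
    """
    env = dict(os.environ if base_env is None else base_env)

    # Hide all CUDA devices so nothing in the process picks the GPU.
    env["CUDA_VISIBLE_DEVICES"] = ""

    # KV cache size for vLLM's CPU backend, in GiB.
    env["VLLM_CPU_KVCACHE_SPACE"] = str(kv_cache_gib)

    # Drop any inherited override that forces a specific attention
    # backend, so the CPU build can choose its own default.
    env.pop("VLLM_ATTENTION_BACKEND", None)

    return env
```

The returned dictionary could be passed as `env=` to `subprocess.run` when launching the project's entry script. If the flash attention backend is still selected after this, the choice is probably made by the installed vLLM build or by the project code, not by these settings.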