The model I used is llama3-8B.
The only difference in the quantization process is the library version: 0.2.7 vs 0.2.5. My GPU has 16 GB of memory, and I found that version 0.2.7 runs out of memory, while 0.2.5 quantizes the model without any problems.
Has anyone else had similar problems?
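For context, the run that hits this is roughly the standard AutoAWQ quantization flow. A minimal sketch is below; the exact model path and quant_config are assumptions for illustration, since the report only says "llama3-8B":

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Assumed checkpoint and config for illustration; the report only mentions llama3-8B.
model_path = "meta-llama/Meta-Llama-3-8B"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# On a 16 GB GPU this completes under autoawq==0.2.5 but runs out of memory under 0.2.7.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized("llama3-8b-awq")
tokenizer.save_pretrained("llama3-8b-awq")
```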
GodHforever changed the title from "GPU memory used is much more in v0.2.7 than v0.2.5 while quantizing models." to "[BUG] GPU memory used is much more in v0.2.7 than v0.2.5 while quantizing models." on Dec 20, 2024.
After debugging the new version's code, I found that some memory held by module2inspect is not released. Calling clear_memory() there may help with this problem.
The relevant code is awq/quantize/quantizer.py: _compute_best_scale.
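To make the idea concrete, here is a minimal sketch of the pattern, not the actual AutoAWQ source: a clear_memory-style helper (AutoAWQ ships a similar one in awq.utils.utils, though the exact signature here is an assumption) called at the end of each iteration of the candidate-scale grid search, so the forward-pass output of module2inspect is released before the next candidate is evaluated:

```python
import gc
import torch

def clear_memory():
    # Force Python garbage collection, then hand cached CUDA blocks back
    # to the allocator so the next forward pass starts from a smaller footprint.
    gc.collect()
    torch.cuda.empty_cache()

@torch.no_grad()
def search_best_scale_sketch(module2inspect, x, fp16_output, x_mean, w_mean,
                             n_grid=20, **kwargs):
    """Hypothetical stand-in for _compute_best_scale: grid-search a scale
    ratio and free the intermediates on every iteration."""
    best_loss, best_scales = float("inf"), None
    for i in range(n_grid):
        ratio = i / n_grid
        scales = (x_mean.pow(ratio) / w_mean.pow(1 - ratio)).clamp(min=1e-4)
        # (the real code quantizes the scaled weights here before the forward pass)
        int_w_output = module2inspect(x, **kwargs)
        if isinstance(int_w_output, tuple):
            int_w_output = int_w_output[0]
        loss = (fp16_output - int_w_output).float().pow(2).mean().item()
        if loss < best_loss:
            best_loss, best_scales = loss, scales.clone()
        # Drop the forward-pass output before the next candidate so activations
        # from every grid point do not pile up on the GPU.
        del int_w_output
        clear_memory()
    return best_scales
```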