[BUG] GPU memory used is much more in v0.2.7 than v0.2.5 while quantizing models. #247

GodHforever · 2024-12-18T08:15:10Z

The model I used is llama3-8B.
The only difference between the two quantisation runs is the library version: 0.2.7 versus 0.2.5.
My GPU has 16 GB of memory. With version 0.2.7 the memory fills up during quantisation, while 0.2.5 quantises the same model without any problems.
Has anyone else run into this?
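
To make the comparison between the two versions concrete, peak GPU memory can be recorded around the quantisation call. The following is a minimal sketch, not part of the original report: the helper name `peak_gpu_memory` and the `report` dictionary are hypothetical, and only standard PyTorch memory-statistics calls are used. It falls back to `None` values when PyTorch or CUDA is unavailable.

```python
import contextlib

try:
    import torch
except ImportError:  # let the sketch load on machines without PyTorch
    torch = None


@contextlib.contextmanager
def peak_gpu_memory(report):
    """Store peak CUDA memory (GiB) used inside the `with` block in `report`.

    `peak_gpu_memory` and `report` are hypothetical names for illustration.
    """
    cuda_ok = torch is not None and torch.cuda.is_available()
    if cuda_ok:
        torch.cuda.synchronize()
        torch.cuda.reset_peak_memory_stats()  # start counting from here
    try:
        yield report
    finally:
        if cuda_ok:
            torch.cuda.synchronize()
            gib = 1024 ** 3
            # Memory held by live tensors at the peak.
            report["peak_allocated_gib"] = torch.cuda.max_memory_allocated() / gib
            # Memory held by the caching allocator at the peak (closer to what nvidia-smi shows).
            report["peak_reserved_gib"] = torch.cuda.max_memory_reserved() / gib
        else:
            report["peak_allocated_gib"] = None
            report["peak_reserved_gib"] = None


# Usage (assumption: run once per installed version and compare the numbers):
# stats = {}
# with peak_gpu_memory(stats):
#     ...  # run the quantisation step here
# print(stats)
```

Running the same script under 0.2.5 and 0.2.7 and comparing the two reports would show how much of the difference is live tensors and how much is allocator cache.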

GodHforever · 2024-12-20T08:24:00Z

After debugging the new version's code, I found that some memory is not released after the forward pass through module2inspect.
Calling clear_memory() helps with this problem.
The relevant code is in awq/quantize/quantizer.py, in _compute_best_scale:

int_w_output = self._module_forward(x, module2inspect, kwargs)
clear_memory()

Calling clear_memory() at this point reduces memory consumption by about half.
Will someone fix this, or should I just submit a patch?
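
For readers unfamiliar with this kind of helper, here is a minimal sketch of what a memory-clearing function typically does in PyTorch code. It is an illustration under assumptions, not the library's actual implementation: `clear_memory_sketch` is a hypothetical name, and the real clear_memory() may differ in signature and behaviour.

```python
import gc

try:
    import torch
except ImportError:  # let the sketch load on machines without PyTorch
    torch = None


def clear_memory_sketch():
    """Hypothetical stand-in for a clear_memory()-style helper.

    Returns the number of unreachable objects found by the garbage collector.
    """
    # Collect unreachable Python objects, including tensors kept alive only
    # by reference cycles, so their GPU storage becomes reusable.
    collected = gc.collect()
    if torch is not None and torch.cuda.is_available():
        # Return cached but unused blocks from PyTorch's allocator to the driver.
        # This does not free tensors that are still referenced.
        torch.cuda.empty_cache()
    return collected
```

Note that a helper like this can only release memory that nothing references any more, so where it is called matters: placing it directly after the forward pass, as in the two lines quoted above, gives temporaries from that pass a chance to be reclaimed before the next allocation.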

wrsIt · 2025-01-08T09:17:26Z

Hello, I’ve encountered a similar issue. Could you please elaborate on the solution? I wasn’t able to locate the code you mentioned. (T^T)

GodHforever changed the title ~~GPU memory used is much more in v0.2.7 than v0.2.5 while quantizing models.~~ [BUG] GPU memory used is much more in v0.2.7 than v0.2.5 while quantizing models. Dec 20, 2024
