Support for DDP with multi-gpus #55

Open
seongjunyun opened this issue Jul 8, 2024 · 0 comments
Comments

@seongjunyun

Hi, thanks for this great work!
I have a question about using GaLore with DDP. I was trying to use GaLore to train a 7B model with DDP (multi-GPU).
However, I noticed that with DDP the memory usage doubles because of DDP's gradient-synchronization buffers, so a 7B model in bf16 already requires around 28 GB even before GaLore is applied. As a result, I hit OOM when using GaLore to train the 7B model on GPUs with 32 GB of memory.
I was wondering if you've encountered the same issue; any suggestions would be appreciated. Thanks!
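For anyone hitting the same limit, here is a minimal sketch of the memory arithmetic and of one standard PyTorch mitigation, DDP's `gradient_as_bucket_view=True`, which stores gradients as views into the communication buckets so the extra gradient copy is not allocated (`build_model` and the launch setup are hypothetical placeholders, not part of GaLore):

```python
import os
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Memory arithmetic behind the 28 GB figure (bf16 = 2 bytes per parameter):
#   weights:   7e9 params * 2 B = 14 GB
#   gradients: 7e9 params * 2 B = 14 GB  (DDP keeps bucket buffers for all-reduce)
#   total:     ~28 GB before any optimizer state -- already tight on a 32 GB GPU
params, bytes_per_param = 7e9, 2
print(f"weights + grads: {2 * params * bytes_per_param / 1e9:.0f} GB")

# One possible mitigation: let DDP use its bucket buffers as the gradient
# storage itself, instead of keeping a second copy in each param.grad.
# `build_model` is a hypothetical placeholder for however the 7B model is built.
dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
model = build_model().to(f"cuda:{local_rank}")
model = DDP(
    model,
    device_ids=[local_rank],
    gradient_as_bucket_view=True,  # saves roughly the full gradient size (~14 GB here)
)
```

Per the PyTorch docs, the memory saved is about the total gradient size, which would remove the duplication described above; whether this interacts cleanly with GaLore's per-parameter optimizer state is a separate question that this sketch does not verify.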
