Hi, thanks for this great work!

I have a question about using GaLore with DDP. I was trying to use GaLore to train a 7B model with DDP (multi-GPU).

However, I noticed that with DDP the memory roughly doubles because of the buffers DDP allocates for gradient synchronization, so a 7B model in bf16 already requires around 28 GB before GaLore comes into play. As a result, I hit OOM when using GaLore to train the 7B model on GPUs with 32 GB of memory.

I was wondering whether you've encountered the same issue; any suggestions would be appreciated. Thanks!
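For concreteness, here is a quick back-of-envelope sketch of the arithmetic above. The numbers are my own assumptions (7e9 parameters, bf16 weights, and one extra full-size gradient-synchronization buffer held by DDP), not anything measured from the GaLore code:

```python
# Rough memory estimate for a 7B bf16 model under DDP, before optimizer states.
# All figures are assumptions for illustration, not measurements.
N_PARAMS = 7e9        # assumed parameter count
BF16_BYTES = 2        # bytes per bf16 value

weights_gb = N_PARAMS * BF16_BYTES / 1e9  # ~14 GB of parameters
ddp_buffer_gb = weights_gb                # DDP gradient sync buffers, same size/dtype

total_gb = weights_gb + ddp_buffer_gb     # ~28 GB before GaLore's low-rank optimizer states
print(f"weights: {weights_gb:.0f} GB, DDP grad buffers: {ddp_buffer_gb:.0f} GB, "
      f"total: {total_gb:.0f} GB")
```

As a possibly related aside (an assumption on my part, not something confirmed in this thread): PyTorch's `DistributedDataParallel` accepts `gradient_as_bucket_view=True`, which lets `param.grad` alias the communication buckets instead of keeping a separate gradient copy, and might reduce the duplication described above.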