Hi, I was reading the GaLore paper and noticed that the "ground truth" baseline seems to be pure BF16 training with round-to-nearest. It is generally accepted that pure BF16 training with round-to-nearest does not converge to the same point as FP32 or BF16/FP32 mixed-precision training. Does GaLore only match pure BF16, or does it match FP32 training as well?
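
For context on why the choice of baseline matters, here is a minimal sketch of the stagnation effect. It is not GaLore code and does not reproduce the paper's setup. The helper `to_bf16`, the function `simulate`, and the constants (weight 1.0, update 1e-3, 1000 steps) are assumptions made up for illustration, and a Python float stands in for the FP32 master copy.

```python
import struct

def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even).

    Illustrative helper: goes through float32 first and ignores NaN/inf.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round-to-nearest-even on the 16 low bits that bfloat16 drops.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]

def simulate(steps: int = 1000, update: float = 1e-3):
    w_bf16 = 1.0    # weight stored and updated purely in BF16
    w_master = 1.0  # higher-precision master copy (stand-in for FP32)
    for _ in range(steps):
        # Pure BF16: the sum is rounded back to BF16 after every step.
        w_bf16 = to_bf16(w_bf16 + update)
        # Mixed precision: accumulate in the master copy, cast only for use.
        w_master += update
    return w_bf16, w_master

pure_bf16, master = simulate()
print("pure BF16 weight:", pure_bf16)
print("master weight   :", master, "-> cast to BF16:", to_bf16(master))
```

Near 1.0 the BF16 spacing is 2^-7, so an update of 1e-3 is less than half a step and is rounded away every time, while the master copy accumulates all of the updates. That is the kind of gap I have in mind between a pure BF16 baseline and an FP32 or mixed-precision one.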