Galore unstable on Llama 7B beyond 20K steps #43

kyleliang919 · 2024-05-02T20:04:49Z

To replicate the above results, run the command from the README. Machine configuration: A100 80GB, CUDA 11.8. The rest of the environment was installed following the recommendations in the repo.

# LLaMA-7B, 8-bit GaLore-Adam, single GPU, activation checkpointing
# bsz=16, 22.8G, 
torchrun --standalone --nproc_per_node 1 torchrun_main.py \
    --model_config configs/llama_7b.json \
    --lr 0.005 \
    --galore_scale 0.25 \
    --rank 1024 \
    --update_proj_gap 500 \
    --batch_size 16 \
    --total_batch_size 512 \
    --activation_checkpointing \
    --num_training_steps 150000 \
    --warmup_steps 15000 \
    --weight_decay 0 \
    --grad_clipping 1.0 \
    --dtype bfloat16 \
    --eval_every 1000 \
    --single_gpu \
    --optimizer galore_adamw8bit_per_layer
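
For reference, the quantities implied by these flags can be worked out with a short sketch. The flag values are copied from the command above; the formula for gradient accumulation (total batch size divided by per-device batch size times number of GPUs) and the reading of "steps" as optimizer updates are assumptions about how the training script interprets them, not something confirmed here.

```python
# Values copied from the command above.
batch_size = 16               # --batch_size
total_batch_size = 512        # --total_batch_size
num_gpus = 1                  # --nproc_per_node 1
num_training_steps = 150_000  # --num_training_steps
warmup_steps = 15_000         # --warmup_steps
update_proj_gap = 500         # --update_proj_gap
reported_step = 20_000        # step count from the issue title

# Assumption: micro-batches accumulated per optimizer update.
grad_accum_steps = total_batch_size // (batch_size * num_gpus)

# Fraction of the run spent in learning-rate warmup.
warmup_fraction = warmup_steps / num_training_steps

# How far past the end of warmup the reported step lies.
steps_after_warmup = reported_step - warmup_steps

# Assumption: the projection is refreshed once every update_proj_gap steps.
proj_refreshes = reported_step // update_proj_gap

print(f"gradient accumulation steps: {grad_accum_steps}")
print(f"warmup fraction of run:      {warmup_fraction:.0%}")
print(f"steps past end of warmup:    {steps_after_warmup}")
print(f"projection refreshes so far: {proj_refreshes}")
```

Under these assumptions, the reported step count falls 5,000 steps after warmup ends, which is when the learning rate is at or near its peak of 0.005.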


bhavnicksm · 2024-05-09T22:13:03Z

@kyleliang919
This may be related to the issue I just posted (#45).
