I'm new to this area of language models. In my use case, I want to fine-tune the SQLCoder model on the Spider dataset using this code base, since this repo worked for me when I followed the instructions in the README.
I was able to start training the StarCoder model with the ArmelR/stack-exchange-instruction dataset.
I replaced the model path and the dataset name in the python command:
!python finetune/finetune.py --model_path="defog/sqlcoder-7b" --dataset_name="spider" --subset="data/finetune" --split="train" --size_valid_set 1000 --streaming --seq_length 1024 --max_steps 1000 --batch_size 1 --input_column_name="question" --output_column_name="query" --gradient_accumulation_steps 16 --learning_rate 1e-4 --lr_scheduler_type="cosine" --num_warmup_steps 100 --weight_decay 0.05 --output_dir="./checkpoints"
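For reference, a minimal check that the column flags line up with the Spider schema (this assumes the public spider dataset on the Hugging Face Hub, which should expose question and query columns):

```python
from datasets import load_dataset

# Sanity check: the column names here must match --input_column_name ("question")
# and --output_column_name ("query") passed to finetune.py.
spider = load_dataset("spider", split="train")
print(spider.column_names)    # expected to include 'question' and 'query'
print(spider[0]["question"])  # natural-language question
print(spider[0]["query"])     # gold SQL query
```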
I'm facing an issue with the attention mask shape when training starts.
I know that just changing the model path isn't enough to start training directly, so please give me some suggestions on getting training started. I'm providing a link to my Kaggle notebook here: https://www.kaggle.com/code/bhrt16/notebookb5fd138c63
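Since the finetune script in this repo was written around StarCoder, it may also help to confirm what architecture the new checkpoint actually uses before training. A quick sketch (only assuming the model config can be read from the Hub):

```python
from transformers import AutoConfig

# Inspect the checkpoint's architecture before reusing the StarCoder finetune script.
# The traceback below goes through modeling_mistral.py, so defog/sqlcoder-7b appears
# to be Mistral-based rather than GPTBigCode (StarCoder).
config = AutoConfig.from_pretrained("defog/sqlcoder-7b")
print(config.model_type)                        # e.g. 'mistral' vs. 'gpt_bigcode'
print(config.max_position_embeddings)           # maximum context length
print(getattr(config, "sliding_window", None))  # Mistral-specific attention window
```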
This is the error log:
/opt/conda/lib/python3.10/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.24.3
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
/opt/conda/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py:691: FutureWarning: The use_auth_token argument is deprecated and will be removed in v5 of Transformers. Please use token instead.
warnings.warn(
tokenizer_config.json: 100%|███████████████████| 915/915 [00:00<00:00, 4.98MB/s]
tokenizer.model: 100%|███████████████████████| 493k/493k [00:00<00:00, 1.11MB/s]
tokenizer.json: 100%|██████████████████████| 1.80M/1.80M [00:00<00:00, 51.6MB/s]
special_tokens_map.json: 100%|████████████████| 72.0/72.0 [00:00<00:00, 448kB/s]
/opt/conda/lib/python3.10/site-packages/datasets/load.py:2088: FutureWarning: 'use_auth_token' was deprecated in favor of 'token' in version 2.14.0 and will be removed in 3.0.0.
You can remove this warning by passing 'token=<use_auth_token>' instead.
warnings.warn(
Loading the dataset in streaming mode
100%|████████████████████████████████████████| 400/400 [00:03<00:00, 110.05it/s]
The character to token ratio of the dataset is: 3.16
Loading the model
/opt/conda/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py:472: FutureWarning: The use_auth_token argument is deprecated and will be removed in v5 of Transformers. Please use token instead.
warnings.warn(
Loading checkpoint shards: 100%|██████████████████| 2/2 [01:07<00:00, 33.92s/it]
/opt/conda/lib/python3.10/site-packages/peft/utils/other.py:141: FutureWarning: prepare_model_for_int8_training is deprecated and will be removed in a future version. Use prepare_model_for_kbit_training instead.
warnings.warn(
trainable params: 41943040 || all params: 3794014208 || trainable%: 1.1055056122762943
Starting main loop
Training...
wandb: Currently logged in as: bhrt95. Use wandb login --relogin to force relogin
wandb: Tracking run with wandb version 0.16.1
wandb: Run data is saved locally in /kaggle/working/starcoder/wandb/run-20231213_114310-6pzqbs68
wandb: Run wandb offline to turn off syncing.
wandb: Syncing run StarCoder-finetuned
wandb: ⭐️ View project at https://wandb.ai/bhrt95/huggingface
wandb: 🚀 View run at https://wandb.ai/bhrt95/huggingface/runs/6pzqbs68
/opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
Traceback (most recent call last):
File "/kaggle/working/starcoder/finetune/finetune.py", line 326, in
main(args)
File "/kaggle/working/starcoder/finetune/finetune.py", line 315, in main
run_training(args, train_dataset, eval_dataset)
File "/kaggle/working/starcoder/finetune/finetune.py", line 306, in run_training
trainer.train()
File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1540, in train
return inner_training_loop(
File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1857, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 2735, in training_step
self.accelerator.backward(loss)
File "/opt/conda/lib/python3.10/site-packages/accelerate/accelerator.py", line 1905, in backward
loss.backward(**kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/_tensor.py", line 492, in backward
torch.autograd.backward(
File "/opt/conda/lib/python3.10/site-packages/torch/autograd/init.py", line 251, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/opt/conda/lib/python3.10/site-packages/torch/autograd/function.py", line 288, in apply
return user_fn(self, *args)
File "/opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 271, in backward
outputs = ctx.run_function(*detached_inputs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = module._old_forward(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 654, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = module._old_forward(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 293, in forward
raise ValueError(
ValueError: Attention mask should be of size (1, 1, 1024, 2048), but is torch.Size([1, 1, 1024, 1024])
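Reading the shapes in the error: the expected key/value length of 2048 looks like 1024 current tokens plus 1024 cached ones, so my guess (unconfirmed) is that the KV cache is still enabled while gradient checkpointing re-runs the attention forward pass. A sketch of the kind of change I mean, using the model variable as loaded in finetune.py:

```python
# Hypothetical workaround, not a confirmed fix: disable the KV cache so the
# recomputed forward pass during gradient checkpointing does not see 1024
# extra cached key/value positions (1024 + 1024 = 2048 in the error above).
model.config.use_cache = False
model.gradient_checkpointing_enable()  # keep gradient checkpointing for memory
```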