
Error when training #4

Open
XinyuLyu opened this issue Jul 12, 2019 · 1 comment

Comments

@XinyuLyu

Thanks for sharing the code. I got an error when training with the command "python main.py --num_abots 3 --num_qbots 1 --scratch --outf save/temp_dir". My environment is Python 3.6 with PyTorch 0.4.0. Could you help me with this?

```
(community) lvxinyu@12315:~$ CUDA_VISIBLE_DEVICES=4 python /home/lvxinyu/code/visualdialog-pytorch/main.py --num_abots 3 --num_qbots 1 --scratch --outf /home/lvxinyu/code/visualdialog-pytorch/data/v09/save/temp_dir
DataLoader loading: train
Loading image feature from data/vdl_img_vgg.h5
train number of data: 82783
Loading txt from data/visdial_data.h5
Vocab Size: 8964
DataLoader loading: test
Loading image feature from data/vdl_img_vgg.h5
test number of data: 40504
Loading txt from data/visdial_data.h5
Vocab Size: 8964
Initializing A-Bot and Q-Bot...
/home/lvxinyu/.local/lib/python3.6/site-packages/torch/nn/modules/rnn.py:38: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
"num_layers={}".format(dropout, num_layers))
Starting Epoch: 1 | K: 10
Done with Batch # 20 | Av. Time Per Batch: 1.090s
Done with Batch # 40 | Av. Time Per Batch: 1.015s
Done with Batch # 60 | Av. Time Per Batch: 0.998s
[... identical progress lines for every 20th batch, average time steady around 1.0-1.1s ...]
Done with Batch # 1080 | Av. Time Per Batch: 1.056s
Done with Batch # 1100 | Av. Time Per Batch: 1.070s
Traceback (most recent call last):
  File "/home/lvxinyu/code/visualdialog-pytorch/main.py", line 624, in <module>
    im_loss_epoch_n = train(epoch,k_curr)
  File "/home/lvxinyu/code/visualdialog-pytorch/main.py", line 132, in train
    lm_loss.backward()
  File "/home/lvxinyu/.local/lib/python3.6/site-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/lvxinyu/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 89, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: The expanded size of the tensor (75) must match the existing size (58) at non-singleton dimension 0
```
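
The numbers in the error line up with the dataset size reported above: with 82783 training samples and a batch size of 75, the final batch holds 82783 − 1103 × 75 = 58 samples, and the run dies right where that batch would be reached (shortly after batch 1100). Below is a minimal sketch (not the repository's code) showing how a tensor preallocated to the configured batch size produces this exact message when it meets the short final batch:

```python
import torch

batch_size = 75
num_train = 82783                    # "train number of data" from the log above
last_batch = num_train % batch_size  # 82783 - 1103 * 75 = 58

x = torch.randn(last_batch, 10, requires_grad=True)  # the short final batch
template = torch.zeros(batch_size, 10)               # buffer sized to batch_size

# Raises: "The expanded size of the tensor (75) must match the existing
# size (58) at non-singleton dimension 0". The same shape check can fire
# inside backward() when a tensor saved during the forward pass was sized
# with the configured batch size rather than the actual batch.
y = x.expand_as(template)
```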

@XinyuLyu XinyuLyu reopened this Jul 12, 2019
@XinyuLyu
Author

I changed the batch size from 75 to 58, but it still fails with the same error. I think it may be caused by the PyTorch version. How did you solve this? Many thanks.
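
If the short final batch is indeed the culprit, changing the batch size only helps when it divides the dataset evenly; 82783 % 58 = 17, so a short batch remains and the same mismatch occurs. A hedged workaround sketch, assuming the repository builds its loader with torch.utils.data.DataLoader (the train_dataset name below is hypothetical):

```python
from torch.utils.data import DataLoader

# drop_last is a standard DataLoader argument (available in PyTorch 0.4):
# it discards the trailing short batch so every batch seen by the model
# has exactly batch_size samples.
train_loader = DataLoader(
    train_dataset,   # hypothetical name for the repository's Dataset object
    batch_size=75,
    shuffle=True,
    drop_last=True,
)
```

Alternatively, the training loop could take the batch dimension from the tensors themselves (e.g. images.size(0)) instead of the configured batch size, which handles the short batch without discarding data.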
