RuntimeError: EncoderDecoder: VisionMambaSeg: shape '[-1, 14, 14, 192]' is invalid for input of size 37824 #108

VickyHuang1113 · 2024-07-26T19:47:37Z

I tried to fine-tune the segmentation model using the pretrained Vim-T, but encountered the following issue while executing bash scripts/ft_vim_tiny_upernet.sh:

Position interpolate from 14x14 to 32x32   
Traceback (most recent call last):
  File "/home/vic1113/miniconda3/envs/vim_seg/lib/python3.9/site-packages/mmcv/utils/registry.py", line 69, in build_from_cfg
    return obj_cls(**args)
  File "/home/vic1113/PrMamba/seg/backbone/vim.py", line 89, in __init__
    self.init_weights(pretrained)
  File "/home/vic1113/PrMamba/seg/backbone/vim.py", line 143, in init_weights
    interpolate_pos_embed(self, state_dict_model)
  File "/home/vic1113/PrMamba/vim/utils.py", line 258, in interpolate_pos_embed
    pos_tokens = pos_tokens.reshape(-1, orig_size, orig_size, embedding_size).permute(0, 3, 1, 2)
RuntimeError: shape '[-1, 14, 14, 192]' is invalid for input of size 37824

This error is propagated through multiple functions, resulting in the final error:
RuntimeError: EncoderDecoder: VisionMambaSeg: shape '[-1, 14, 14, 192]' is invalid for input of size 37824.

The pretrained weight I used was vim_t_midclstok_76p1acc.pth, which seems to be the correct one. If not, there should be an error while loading, such as size mismatch for norm_f.weight: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([384]), but I didn't get this error.

So, I guess there might be an issue with the model settings, but I’m not sure. 37824 = (14*14 + 1) * 192, and the "+1" is the part that leads to the error. If the "+1" part is for mid cls token, should I just drop it for the segmentation model?
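To make the arithmetic above easy to re-check, here is a minimal sketch in plain Python. The helper name `describe_pos_embed` is hypothetical and not part of the Vim code base; it only decomposes an element count into a token count, the largest square grid that fits, and the number of leftover tokens.

```python
import math


def describe_pos_embed(numel: int, embed_dim: int) -> tuple[int, int, int]:
    """Split a flat position-embedding size into (tokens, grid_side, extra_tokens).

    Hypothetical helper for illustration only.
    """
    if numel % embed_dim != 0:
        raise ValueError("numel is not a multiple of embed_dim")
    tokens = numel // embed_dim          # total positions stored in the table
    grid = math.isqrt(tokens)            # largest square grid that fits
    extra = tokens - grid * grid         # positions that are not patch tokens
    return tokens, grid, extra


tokens, grid, extra = describe_pos_embed(37824, 192)
print(tokens, grid, extra)  # 197 14 1
```

A nonzero `extra` means the table cannot be reshaped to `[-1, grid, grid, embed_dim]` until those extra positions are separated out.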

Has anyone encountered this problem, or successfully fine-tuned a segmentation model?

Thank you very much!


GIT-HYQ · 2024-07-30T14:24:38Z

Same issue. Have you fixed it yet?

VickyHuang1113 · 2024-07-30T15:06:50Z

No, I can't apply the pretrained weights to the segmentation model.
It seems the shapes of the backbones are different, and we might need to retrain it.
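Before retraining, it may be worth testing whether removing the one extra position makes the reshape valid. The sketch below is untested against the actual repository and rests on two assumptions: that the checkpoint's `pos_embed` has shape `(1, 14*14 + 1, 192)`, and that the extra position sits in the middle of the sequence (index `num_patches // 2`), as the "midclstok" name suggests. The function names `drop_mid_cls_pos_embed` and `resize_pos_embed` are hypothetical, and a random tensor stands in for the real checkpoint entry.

```python
import torch
import torch.nn.functional as F


def drop_mid_cls_pos_embed(pos_embed: torch.Tensor, grid: int) -> torch.Tensor:
    """Remove one extra position from a (1, grid*grid + 1, C) table.

    Assumption: the extra position is at index (grid*grid) // 2.
    """
    num_patches = grid * grid
    if pos_embed.shape[1] != num_patches + 1:
        raise ValueError("expected exactly one extra position")
    mid = num_patches // 2
    return torch.cat([pos_embed[:, :mid], pos_embed[:, mid + 1:]], dim=1)


def resize_pos_embed(pos_embed: torch.Tensor, old_grid: int, new_grid: int) -> torch.Tensor:
    """Bicubically resize a (1, old_grid**2, C) table to (1, new_grid**2, C)."""
    channels = pos_embed.shape[-1]
    grid = pos_embed.reshape(1, old_grid, old_grid, channels).permute(0, 3, 1, 2)
    grid = F.interpolate(grid, size=(new_grid, new_grid), mode="bicubic", align_corners=False)
    return grid.permute(0, 2, 3, 1).flatten(1, 2)


# Stand-in for state_dict["pos_embed"] from the classification checkpoint.
ckpt_pos_embed = torch.randn(1, 14 * 14 + 1, 192)
patch_only = drop_mid_cls_pos_embed(ckpt_pos_embed, grid=14)
resized = resize_pos_embed(patch_only, old_grid=14, new_grid=32)
print(patch_only.shape, resized.shape)
```

If the segmentation backbone really has no class token, the resized table could then replace `pos_embed` in the state dict before loading. Whether the remaining weights transfer cleanly is a separate question.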

TiSgrc2002 · 2024-12-16T06:13:20Z

Is VisionMambaSeg a pretrained model? I loaded the pretrained Vim-small+ weights (26M, 81.6, 95.4) from https://huggingface.co/hustvl/Vim-small-midclstok, but there are many mismatches. For example, the downloaded checkpoint does not contain the checkpoint['meta'] entry that mmcv uses.
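A quick way to see which top-level entries a downloaded checkpoint has is to compare its keys against the ones the loader is expected to read. The sketch below is plain Python; the helper name `missing_checkpoint_keys` is hypothetical, the expected keys `state_dict` and `meta` are an assumption about what the mmcv-based pipeline looks for, and the example dict merely imitates a checkpoint that stores its weights under `model`.

```python
def missing_checkpoint_keys(checkpoint: dict, expected=("state_dict", "meta")) -> list[str]:
    """Return the expected top-level keys that are absent from a checkpoint dict.

    Hypothetical helper; the default expected keys are an assumption.
    """
    return [key for key in expected if key not in checkpoint]


# Stand-in for the dict returned by torch.load(path, map_location="cpu").
downloaded = {"model": {"pos_embed": "..."}, "epoch": 299}

print(sorted(downloaded))                    # ['epoch', 'model']
print(missing_checkpoint_keys(downloaded))   # ['state_dict', 'meta']
```

If only the wrapper keys differ, re-saving the weights under the expected key may be enough. Shape mismatches inside the weights are a different problem.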
