FileNotFoundError for tokenizer_config.json During Experiment Reproduction #2

Open
ZackAXue opened this issue Jan 6, 2025 · 1 comment

ZackAXue commented Jan 6, 2025

Description

Hi, thanks for your nice work! I ran into an issue while reproducing the experiment from this repository. The script fails because the tokenizer_config.json file is missing. The error is:

FileNotFoundError: [Errno 2] No such file or directory: 'output/3sat9/mdm-alpha0.25-gamma1-bs1024-lr3e-4-ep600-T20-20250106-162926/tokenizer_config.json'

Steps to Reproduce

  1. Clone the repository and set up the environment as per the provided instructions.
  2. Run the script using the following command:
    bash /path/to/scripts/3-sat/train-mdm.sh
    

I do appreciate the effort that has gone into this repository and would be grateful for your help on resolving this issue. Thank you for your time and assistance!

@jiacheng-ye
Contributor

Hi Zhiwei, thanks for your interest!
It seems the tokenizer was not saved. A likely reason is that, when creating a Trainer object, newer versions of transformers use processing_class instead of tokenizer (I'm using an older version, transformers 4.37.2):

https://github.com/huggingface/transformers/blob/9895f7df81aaf21b4fcc3a70054d3ac3d5894879/src/transformers/trainer.py#L348

Solution:

  • Check your transformers version and switch to 4.37.2 if it differs.
  • Or copy tokenizer_config.json from the model_config_tiny dir into the output directory named in the error.
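
The two options above could be scripted roughly as follows. This is only a minimal sketch, not code from the repository: the helper names `installed_transformers_version` and `ensure_tokenizer_config` are hypothetical, and the actual locations of the output directory and of model_config_tiny have to be filled in by hand.

```python
import shutil
from importlib import metadata
from pathlib import Path


def installed_transformers_version():
    """Return the installed transformers version string, or None if it is not installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


def ensure_tokenizer_config(output_dir, fallback_dir, name="tokenizer_config.json"):
    """Copy `name` from fallback_dir into output_dir if it is missing there.

    Hypothetical helper: returns True if a copy was made,
    False if the file was already present in output_dir.
    """
    target = Path(output_dir) / name
    if target.exists():
        return False
    source = Path(fallback_dir) / name
    if not source.is_file():
        raise FileNotFoundError(f"fallback file not found: {source}")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)  # copy the file contents and metadata
    return True
```

Usage would be something like `ensure_tokenizer_config("<your run's output dir>", "<path to model_config_tiny>")` before the step that loads the tokenizer, after comparing `installed_transformers_version()` against 4.37.2 to see whether the version mismatch applies.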
