
Failing to reproduce the paper result on videomme #33

Open
joslefaure opened this issue Nov 16, 2024 · 2 comments

Comments

@joslefaure

I use the same script as in the Readme:

accelerate launch --num_processes 8 --main_process_port 12345 -m lmms_eval \
    --model longva \
    --model_args pretrained=lmms-lab/LongVA-7B,conv_template=qwen_1_5,video_decode_backend=decord,max_frames_num=32,model_name=llava_qwen \
    --tasks videomme \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix videomme_longva \
    --output_path ./logs/ 

With the latest commit of lmms_eval (main branch): bcbdc493

I get the following results:

| Tasks    | Version | Filter | n-shot | Metric                    | Value   | Stderr |
|----------|---------|--------|--------|---------------------------|---------|--------|
| videomme | Yaml    | none   | 0      | videomme_perception_score | 23.5185 | ± N/A  |

Could you please advise on what I am doing wrong? Thanks

@jzhang38
Collaborator

We recently reran the evaluation, and the score is actually a bit higher than what is reported in the paper, because some bugs were fixed for the videomme data in lmms-eval.

Can you check the log output by lmms-eval and see if there is anything unusual?

@joslefaure
Author

Thanks for your reply. Upon inspecting the logged samples, I found an alarming number of generated responses that are just runs of `!` under the `resps` key, e.g. `"resps": [["!!!!!!!!!!!!!!!!"]]`. Do you have an idea what the issue might be?

I first installed lmms-eval and then installed longva following the official instructions for both projects.
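For anyone hitting the same symptom: all-`!` outputs are often a sign of numerical overflow during fp16 generation (in byte-level BPE vocabularies such as Qwen's, `!` is token id 0, which greedy decoding can collapse to when logits become NaN/inf) — that diagnosis is an assumption here, not something confirmed in this thread. A minimal sketch for counting such degenerate responses; the inline `samples` list stands in for the JSON that `--log_samples` writes under `./logs/`, so load your actual samples file instead:

```python
# Count degenerate "!"-only responses among logged samples.
# The inline list below mimics the per-sample structure shown above
# ("resps" is a list of lists of strings); replace it with json.load()
# on the samples file that --log_samples produced.
samples = [
    {"resps": [["!!!!!!!!!!!!!!!!"]]},  # degenerate output
    {"resps": [["B"]]},                 # normal multiple-choice answer
]

def is_degenerate(resp: str) -> bool:
    # True if the response is non-empty and consists solely of '!'.
    return bool(resp) and set(resp) == {"!"}

bad = sum(
    1
    for sample in samples
    for group in sample["resps"]
    for resp in group
    if is_degenerate(resp)
)
print(f"{bad}/{len(samples)} degenerate responses")  # → 1/2 degenerate responses
```

If a large fraction of responses are degenerate, it is worth checking whether the model is being loaded in fp16 on hardware where bf16 or fp32 would be more stable.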
