
Failing to reproduce the paper result on videomme #33

Open
joslefaure opened this issue Nov 16, 2024 · 2 comments

Comments

@joslefaure

I use the same script as in the Readme:

accelerate launch --num_processes 8 --main_process_port 12345 -m lmms_eval \
    --model longva \
    --model_args pretrained=lmms-lab/LongVA-7B,conv_template=qwen_1_5,video_decode_backend=decord,max_frames_num=32,model_name=llava_qwen \
    --tasks videomme \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix videomme_longva \
    --output_path ./logs/ 

With the latest commit of lmms_eval (main branch): bcbdc493

I get the following results:

| Tasks    | Version | Filter | n-shot | Metric                    | Value   | Stderr |
|----------|---------|--------|--------|---------------------------|---------|--------|
| videomme | Yaml    | none   | 0      | videomme_perception_score | 23.5185 | ± N/A  |

Could you please advise on what I am doing wrong? Thanks

@jzhang38
Collaborator

We recently reran the evaluation, and the score is actually a bit higher than what is reported in the paper, because some bugs were fixed for the videomme data in lmms-eval.

Can you check the log output by lmms-eval and see if there is anything unusual?

@joslefaure
Author

Thanks for your reply. Upon inspecting the logged samples, I found an alarming number of generated responses that are just runs of `!` under the `resps` key, e.g. `"resps": [["!!!!!!!!!!!!!!!!"]]`. Do you have an idea what the issue might be?

I first installed lmms-eval and then installed longva following the official instructions for both projects.
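For anyone hitting the same symptom: all-`!` outputs are often a sign of numerical overflow during fp16 generation (in byte-level BPE vocabularies such as Qwen's, `!` is token id 0, which greedy decoding can collapse to when logits become NaN/inf) — that diagnosis is an assumption here, not something confirmed in this thread. A minimal sketch for counting such degenerate responses; the inline `samples` list stands in for the JSON that `--log_samples` writes under `./logs/`, so load your actual samples file instead:

```python
# Count degenerate "!"-only responses among logged samples.
# The inline list below mimics the per-sample structure shown above
# ("resps" is a list of lists of strings); replace it with json.load()
# on the samples file that --log_samples produced.
samples = [
    {"resps": [["!!!!!!!!!!!!!!!!"]]},  # degenerate output
    {"resps": [["B"]]},                 # normal multiple-choice answer
]

def is_degenerate(resp: str) -> bool:
    # True if the response is non-empty and consists solely of '!'.
    return bool(resp) and set(resp) == {"!"}

bad = sum(
    1
    for sample in samples
    for group in sample["resps"]
    for resp in group
    if is_degenerate(resp)
)
print(f"{bad}/{len(samples)} degenerate responses")  # → 1/2 degenerate responses
```

If a large fraction of responses are degenerate, it is worth checking whether the model is being loaded in fp16 on hardware where bf16 or fp32 would be more stable.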
