This repository has been archived by the owner on Dec 16, 2022. It is now read-only.
New language + vision tasks! #5292
jacob-morrison started this conversation in Show and tell
Great work @jacob-morrison!
Hi everyone!
We’ve added three new language + vision tasks to AllenNLP:
These join our previously added implementations for SNLI-VE, VQA and GQA.
Some notes about these models:
These are the scores this implementation achieves out of the box:
These scores are well below the state of the art. We believe this is due to our strategy for extracting image features. We extract features with a Faster R-CNN model that has a ResNet-50-FPN backbone and was pre-trained on COCO train2017. Conveniently, this model ships with torchvision under the name `fasterrcnn_resnet50_fpn`, so you probably have it installed already. Unfortunately, these features are not good enough to achieve state-of-the-art scores on these datasets. We invite you to improve on this and see whether different features (like these) can help these models match or exceed the scores from the ViLBERT multitask training paper.