This repository contains the implementation of Image Captioning through Image Transformer (to appear in ACCV 2020).
This repo is not complete yet.
- Python 3.6
- Java 1.8.0
- PyTorch 1.0
- cider (already added as a submodule)
- coco-caption (already added as a submodule)
- tensorboardX
Or install the full requirements by running:
pip install -r requirements.txt
- instructions to prepare the dataset
- remove all unnecessary files
- add a link to download our pre-trained model
- clean up the code, including comments
- instructions for training
- instructions for evaluation
- We use the preprocessed features from the bottom-up-attention work; the adaptive features are the ones used in our work. Please refer to their repo for more information (a feature-loading sketch follows this list).
- Prepare the hierarchy information (denoted as flag in the code) by running compute_nb_h.py. Please modify the file path in Line 58 and the save path in Line 63.
- You should also preprocess the dataset and build the cache for calculating the CIDEr score for SCST:
$ python scripts/prepro_ngrams.py --input_json data/dataset_coco.json --dict_json data/cocotalk.json --output_pkl data/coco-train --split train
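For orientation, here is a minimal sketch of loading one image's preprocessed bottom-up features. The directory layout and the "feat" key follow the self-critical.pytorch convention and are assumptions here, not this repo's confirmed layout:

```python
# Minimal sketch: loading one image's bottom-up features.
# Paths and the "feat" key follow the self-critical.pytorch convention
# and are assumptions here, not this repo's confirmed layout.
import numpy as np

image_id = 391895  # hypothetical COCO image id

# Adaptive region features: shape (num_boxes, 2048), num_boxes varies per image.
att_feats = np.load("data/cocobu_att/{}.npz".format(image_id))["feat"]
# Global average-pooled feature: shape (2048,).
fc_feat = np.load("data/cocobu_fc/{}.npy".format(image_id))

print(att_feats.shape, fc_feat.shape)
```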
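For context, SCST rewards a sampled caption by how much its CIDEr score beats a greedy-decoding baseline, and the cache built above supplies the n-gram statistics for that scorer. Below is a minimal sketch of the reward, with a hypothetical cider_score helper standing in for the CiderD scorer from the cider submodule:

```python
# Minimal sketch of the self-critical (SCST) reward. `cider_score` is a
# hypothetical helper wrapping the CiderD scorer built from the n-gram
# cache produced above.
import torch

def scst_reward(sampled_caps, greedy_caps, refs, cider_score):
    # Score both decodings against the ground-truth references...
    sample_scores = cider_score(sampled_caps, refs)  # list of floats
    greedy_scores = cider_score(greedy_caps, refs)
    # ...and reward each sample by its margin over the greedy baseline.
    # This reward weights the log-probs of the sampled words.
    return torch.tensor(sample_scores) - torch.tensor(greedy_scores)
```

With the data prepared, train the model by running: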
$ CUDA_VISIBLE_DEVICES=0 sh train_v3d1.sh
See opts.py for the options. (You can download the pretrained models from here.)
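If you want to inspect a downloaded checkpoint before evaluating, here is a minimal sketch following the eval.py pattern of self-critical.pytorch-style repos (the pickle layout is an assumption for this repo):

```python
# Minimal sketch: peeking into a checkpoint and its infos pickle.
# The pickle layout (training opts, vocab, ...) follows
# self-critical.pytorch-style repos and is an assumption here.
import pickle
import torch

with open("log/log_aoanet_rl/infos_aoanet.pkl", "rb") as f:
    infos = pickle.load(f)  # typically holds the training opts and vocab

state_dict = torch.load("log/log_aoanet_rl/model.pth", map_location="cpu")
print(sorted(infos.keys()))
print(len(state_dict), "parameter tensors in the checkpoint")
```

Then evaluate on the test split by running: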
$ CUDA_VISIBLE_DEVICES=0 python eval.py --model log/log_aoanet_rl/model.pth --infos_path log/log_aoanet_rl/infos_aoanet.pkl --dump_images 0 --dump_json 1 --num_images -1 --language_eval 1 --beam_size 2 --batch_size 100 --split test
You can download our trained model from our OneDrive repo.
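Since the evaluation command above passes --dump_json 1, the generated captions are dumped to disk; in AoANet / self-critical.pytorch the file is vis/vis.json (an assumption for this repo). A quick way to eyeball a few predictions:

```python
# Quick look at the dumped predictions. The vis/vis.json path follows the
# AoANet / self-critical.pytorch eval.py convention (assumed here).
import json

with open("vis/vis.json") as f:
    preds = json.load(f)

for p in preds[:5]:
    print(p["image_id"], p["caption"])
```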
You will get scores close to the following after training under cross-entropy (XE) loss for 37 epochs:
{'Bleu_1': 0.776, 'Bleu_2': 0.619, 'Bleu_3': 0.484, 'Bleu_4': 0.378, 'METEOR': 0.285, 'ROUGE_L': 0.575, 'CIDEr': 1.91, 'SPICE': 0.215}
(Note: you can enlarge --max_epochs in train.sh to train the model for more epochs and improve the scores.)
After training under SCST loss for another 26 epochs, you will get:
{'Bleu_1': 0.807, 'Bleu_2': 0.653, 'Bleu_3': 0.510, 'Bleu_4': 0.392, 'METEOR': 0.291, 'ROUGE_L': 0.590, 'CIDEr': 1.308, 'SPICE': 0.228}
If you find this repo helpful, please consider citing:
@inproceedings{he2020image,
title={Image Captioning through Image Transformer},
author={He, Sen and Liao, Wentong and Tavakoli, Hamed R. and Yang, Michael and Rosenhahn, Bodo and Pugeault, Nicolas},
booktitle={Asian Conference on Computer Vision},
year={2020}
}
This repository is based on self-critical.pytorch and heavily borrows from AoANet. You may refer to them for more details about the code.