Official code for CVPR 2020 paper 'Towards Photo-Realistic Virtual Try-On by Adaptively Generating↔Preserving Image Content'. We rearrange the VITON dataset for easy access.
[Sample Try-on Video] [Checkpoints]
[Dataset_Test] [Dataset_Train]
python test.py
Note that the results of our pretrained model are guaranteed only on the VITON dataset; you should re-train the pipeline to get good results on other datasets.
The results used for computing IS and SSIM are same-clothes reconstruction results.
By default, the code generates random clothes-model pairs, so you need to modify ACGPN_inference/data/aligned_dataset.py to generate the reconstructed results.
We also provide the reconstructed results on the test set of the VITON dataset, produced by running inference with this GitHub repo: [Precomputed Evaluation Results]. These results can be used directly to compute the IS and SSIM evaluations, and you can obtain identical results using this GitHub repo.
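The pairing change described above can be sketched roughly as follows. This is a minimal illustration, not the code in `aligned_dataset.py`: the function names `matching_cloth_name` and `make_pairs` are hypothetical, and the sketch assumes the common VITON naming convention in which a person image `<id>_0.jpg` belongs to the clothing image `<id>_1.jpg`. Please check that assumption against your copy of the dataset.

```python
import random


def matching_cloth_name(person_name):
    """Return the clothing file assumed to belong to a person image.

    Assumption: VITON-style names, person '<id>_0.jpg', clothing '<id>_1.jpg'.
    """
    stem, ext = person_name.rsplit(".", 1)
    if not stem.endswith("_0"):
        raise ValueError(f"unexpected person image name: {person_name}")
    return f"{stem[:-2]}_1.{ext}"


def make_pairs(person_names, reconstruct=True, seed=0):
    """Build (person, clothing) pairs.

    reconstruct=True  -> each person is paired with their own clothing
                         (the setting used for IS/SSIM evaluation).
    reconstruct=False -> clothing items are shuffled across persons
                         (random try-on pairs).
    """
    cloths = [matching_cloth_name(p) for p in person_names]
    if not reconstruct:
        rng = random.Random(seed)  # seeded so the shuffle is repeatable
        rng.shuffle(cloths)
    return list(zip(person_names, cloths))
```

With `reconstruct=True` every model is paired with the clothes they already wear, which is the same-clothes setting the IS and SSIM numbers refer to.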
- Use the PyTorch SSIM repo: https://github.com/Po-Hsun-Su/pytorch-ssim
- Normalize the images to [0,1] and reshape them correctly. If they are not normalized correctly, the results differ a lot.
- Compute the score. The SSIM score should be 0.8664, which is higher than the score reported in the paper because the released checkpoint is a better one.
- Use the PyTorch inception score repo: https://github.com/sbarratt/inception-score-pytorch
- Normalize the images to [-1,1] and reshape them correctly. Please strictly follow the procedure given in that repo.
- Compute the score. The number of splits also changes the results. We use splits = 1 to compute the results.
- Note that the released checkpoint produces an IS score of 2.82, which is slightly lower (but still SOTA) than the score in the paper, since it is a different checkpoint with better SSIM performance. The paper's results can be reproduced by re-training with a different number of training epochs.
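The two normalization ranges in the steps above are easy to mix up, so here is a small sketch of the arithmetic. It assumes 8-bit input pixels (0 to 255) and uses a flat list of values as a stand-in for an image tensor; the helper names are hypothetical and are not part of either linked repo. The same formulas apply element-wise to real tensors.

```python
def to_unit_range(pixels):
    """Map 8-bit pixel values (0..255) to [0, 1], the range used for SSIM."""
    return [p / 255.0 for p in pixels]


def to_signed_range(pixels):
    """Map 8-bit pixel values (0..255) to [-1, 1], the range used for IS."""
    return [p / 127.5 - 1.0 for p in pixels]


def check_range(values, low, high):
    """Raise if any value lies outside [low, high]; catches mis-scaled inputs."""
    lo, hi = min(values), max(values)
    if lo < low or hi > high:
        raise ValueError(f"values span [{lo}, {hi}], expected [{low}, {high}]")
    return values


# Example: validate inputs before handing them to a metric.
ssim_input = check_range(to_unit_range([0, 128, 255]), 0.0, 1.0)
is_input = check_range(to_signed_range([0, 128, 255]), -1.0, 1.0)
```

A range check like `check_range` is cheap and fails loudly if raw 0..255 values are passed to a metric by mistake.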
We use the pose map to calculate the difficulty level of a try-on. The key motivation is that the more complex the occlusions and layouts in the clothing area are, the harder the try-on will be. The formula is given below. Manual selection is also involved to improve the difficulty partition.
where t is a certain key point, Mp' is the set of key points we take into consideration, and N is the size of the set.
0 -> Background
1 -> Hair
4 -> Upclothes
5 -> Left-shoe
6 -> Right-shoe
7 -> Noise
8 -> Pants
9 -> Left_leg
10 -> Right_leg
11 -> Left_arm
12 -> Face
13 -> Right_arm
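As an illustration of how this mapping might be used, the sketch below stores the labels in a dictionary and extracts a binary mask for one class from a parsing map. The parsing map is represented as a nested list of label indices for simplicity, and `LABELS` and `label_mask` are hypothetical names, not identifiers from this repo.

```python
# Label indices exactly as listed above; indices 2 and 3 are not listed there,
# so they are left out here rather than guessed.
LABELS = {
    0: "Background",
    1: "Hair",
    4: "Upclothes",
    5: "Left-shoe",
    6: "Right-shoe",
    7: "Noise",
    8: "Pants",
    9: "Left_leg",
    10: "Right_leg",
    11: "Left_arm",
    12: "Face",
    13: "Right_arm",
}


def label_mask(parsing, label_name):
    """Return a 0/1 mask marking the pixels whose label equals label_name."""
    name_to_id = {name: idx for idx, name in LABELS.items()}
    if label_name not in name_to_id:
        raise KeyError(f"unknown label: {label_name}")
    target = name_to_id[label_name]
    return [[1 if value == target else 0 for value in row] for row in parsing]


# Tiny 2x3 parsing map: background, upper clothes, and the right arm.
parsing = [[0, 4, 4],
           [13, 4, 0]]
clothes_mask = label_mask(parsing, "Upclothes")
```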
For better inference performance, models G and G2 should be trained for 200 epochs, while model G1 and U net should be trained for 20 epochs.
The use of this software is RESTRICTED to non-commercial research and educational purposes.
If you use our code or models in your research, please cite with:
@InProceedings{Yang_2020_CVPR,
author = {Yang, Han and Zhang, Ruimao and Guo, Xiaobao and Liu, Wei and Zuo, Wangmeng and Luo, Ping},
title = {Towards Photo-Realistic Virtual Try-On by Adaptively Generating-Preserving Image Content},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
VITON Dataset: This dataset was introduced in VITON and contains 19,000 image pairs, each of which includes a front-view woman image and a top clothing image. Removing the invalid image pairs yields 16,253 pairs, which are further split into a training set of 14,221 pairs and a testing set of 2,032 pairs.