A deep learning based Android application that takes an image as input and automatically generates a caption for it. Caption generation is a challenging task in Artificial Intelligence, where a textual description of an image must be generated.
The model is based on an attention-based CNN-RNN network, which allows it to focus on selective regions of the image while generating the description, much like the way humans perceive the visual world.
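As a rough illustration of this idea, below is a minimal sketch of an additive (Bahdanau-style) attention layer of the kind used in such CNN-RNN captioning models; the layer names, sizes, and TensorFlow framing are assumptions, not necessarily the exact implementation used here.

```python
import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    """Additive attention over CNN feature vectors (sketch; sizes are assumptions)."""
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)   # projects image features
        self.W2 = tf.keras.layers.Dense(units)   # projects the decoder hidden state
        self.V = tf.keras.layers.Dense(1)        # scores each image region

    def call(self, features, hidden):
        # features: (batch, num_regions, feature_dim) from the CNN encoder
        # hidden:   (batch, units) previous RNN decoder state
        hidden_with_time = tf.expand_dims(hidden, 1)
        score = tf.nn.tanh(self.W1(features) + self.W2(hidden_with_time))
        attention_weights = tf.nn.softmax(self.V(score), axis=1)  # (batch, num_regions, 1)
        context_vector = tf.reduce_sum(attention_weights * features, axis=1)
        return context_vector, attention_weights
```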
Dataset used: The model is trained on the MS-COCO dataset, a standard benchmark for object detection, segmentation, and image captioning.
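For context, MS-COCO distributes its captions as a JSON annotations file. A minimal sketch of pairing captions with their image files could look like the following; the file names, directory layout, and `<start>`/`<end>` tokens are assumptions.

```python
import json
import collections

# Sketch: group MS-COCO captions by image (paths/file names are assumptions).
with open('annotations/captions_train2017.json', 'r') as f:
    annotations = json.load(f)

image_to_captions = collections.defaultdict(list)
for ann in annotations['annotations']:
    # COCO 2017 images are named by zero-padded image id, e.g. 000000391895.jpg
    image_path = f"train2017/{ann['image_id']:012d}.jpg"
    image_to_captions[image_path].append('<start> ' + ann['caption'] + ' <end>')

print(len(image_to_captions), 'images with captions loaded')
```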
Flow of the process:
- Loading the dataset
- Preprocessing the images
- Preprocessing and tokenizing the captions and defining the vocabulary
- Choosing a pre-trained model for image feature extraction (see the sketch after this list)
- Splitting the data into training and testing sets
- Defining the model architecture
- Training the model
- Testing the model
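A minimal sketch of the image-preprocessing, pre-trained encoder, and caption-tokenization steps is shown below; InceptionV3, the vocabulary size, and the tokenizer settings are assumptions rather than the exact choices made in this project.

```python
import tensorflow as tf

# --- Image preprocessing for a pre-trained CNN encoder (InceptionV3 is an assumed choice) ---
def load_image(image_path):
    img = tf.io.read_file(image_path)
    img = tf.io.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, (299, 299))  # InceptionV3 expects 299x299 input
    img = tf.keras.applications.inception_v3.preprocess_input(img)
    return img, image_path

# Reuse InceptionV3 without its classification head as a fixed feature extractor.
base = tf.keras.applications.InceptionV3(include_top=False, weights='imagenet')
feature_extractor = tf.keras.Model(base.input, base.layers[-1].output)

# --- Caption tokenization and vocabulary definition (sizes/special tokens are assumptions) ---
captions = ['<start> a man riding a wave on a surfboard <end>']  # placeholder caption list
tokenizer = tf.keras.preprocessing.text.Tokenizer(
    num_words=5000, oov_token='<unk>',
    filters='!"#$%&()*+.,-/:;=?@[\\]^_`{|}~ ')
tokenizer.fit_on_texts(captions)
sequences = tokenizer.texts_to_sequences(captions)
padded = tf.keras.preprocessing.sequence.pad_sequences(sequences, padding='post')
```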
The results on the validation dataset were:
The unseen image given was:
The results obtained from the model on the unseen image are shown below: