As the number of layers in a deep neural network increases, training becomes increasingly difficult, leading to issues such as vanishing gradients, accuracy degradation, and longer training times.
These problems eventually led to the development of ResNet, which allows much deeper models to be trained while being far less susceptible to these issues.
A residual block computes y = F(x, {Wi}) + x.
Here, x and y are the input and output vectors of the given layer, and F(x, {Wi}) represents the residual mapping to be learned.
The dimensions of F and x must match so that they can be added; otherwise, a linear projection (such as a 1x1 convolution) is applied to the shortcut to match the dimensions.
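As a rough sketch of the idea (not the authors' exact code), a basic residual block in PyTorch might look like the following; the two 3x3 convolutions with batch normalization follow the paper's basic block, and the 1x1 projection handles the case where the dimensions of F(x) and x differ:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: y = F(x) + x, with a projection shortcut
    when the shapes of F(x) and x do not match."""

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # F(x): two 3x3 convolutions with batch norm
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

        # Shortcut: identity if shapes match, otherwise a 1x1 projection
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1,
                          stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + self.shortcut(x)  # y = F(x) + x
        return self.relu(out)
```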
ResNet is not a single fixed architecture; rather, it is built by stacking residual modules like the one shown above.
That said, the authors of the paper did build several models from residual blocks and published results for them, evaluated on the ImageNet 2012 dataset, which consists of 1000 classes.
The following image shows how residual (shortcut) connections can be added to a plain network, keeping its layers while making it easier to train.
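To make the stacking concrete, here is a toy network that reuses the ResidualBlock sketched above; the layer counts and channel widths are illustrative only, not one of the published variants:

```python
import torch
import torch.nn as nn

class TinyResNet(nn.Module):
    """Toy network built by stacking the ResidualBlock defined earlier."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
        )
        # Two residual stages; the second halves the spatial size
        self.layer1 = nn.Sequential(ResidualBlock(64, 64), ResidualBlock(64, 64))
        self.layer2 = nn.Sequential(ResidualBlock(64, 128, stride=2), ResidualBlock(128, 128))
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(128, num_classes)

    def forward(self, x):
        x = self.stem(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.pool(x).flatten(1)
        return self.fc(x)

# Quick shape check on a 32x32 RGB input
model = TinyResNet()
print(model(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 10])
```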
Here are some architectures built using residual blocks.
They trained models with different numbers of layers (18, 34, 50, 101, and 152) and published the results in the original paper.
This is a training curve comparing plain and residual networks with the same number of layers.
Here is a comparison of the results reported in the paper for the different models.
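If you want to try these published depths directly, torchvision ships reference implementations of them; a minimal usage example (randomly initialized weights, recent torchvision assumed) looks like this:

```python
import torch
from torchvision import models

# The published depths are available as ready-made constructors.
resnet18 = models.resnet18()    # 18 layers
resnet50 = models.resnet50()    # 50 layers
resnet152 = models.resnet152()  # 152 layers

# Each expects 224x224 RGB input and outputs 1000 ImageNet class scores.
x = torch.randn(1, 3, 224, 224)
print(resnet50(x).shape)  # torch.Size([1, 1000])
```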
You can find the original paper, "Deep Residual Learning for Image Recognition" (He et al., 2015), here.
Here is the link to a repo containing the PyTorch implementation.
Here is a link to a simple ResNet model I have implemented myself: Basic Image Classifier