Defining a Residual Block
- The classical building block in ResNet is the residual block
- ResNet introduces a so-called identity shortcut connection
- This connection skips one or more layers
- We can define a residual block as the following:
  - z[l+1] = W[l+1] a[l] + b[l+1] and a[l+1] = g(z[l+1])
  - z[l+2] = W[l+2] a[l+1] + b[l+2] and a[l+2] = g(z[l+2] + a[l])
- We can simplify the above to look like the following:
  - a[l+2] = g(W[l+2] a[l+1] + b[l+2] + a[l])
- We can visualize the chain of operations as the following:
  - a[l] → linear → relu → a[l+1] → linear → add a[l] → relu → a[l+2]
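The chain of operations above can be sketched as a forward pass in NumPy. This is a minimal illustration, not a full implementation: the helper names (`relu`, `residual_block`) and the layer dimensions are assumptions made for the example.

```python
import numpy as np

def relu(x):
    # g(z): the ReLU activation used after each linear step
    return np.maximum(0.0, x)

def residual_block(a_l, W1, b1, W2, b2):
    """Two-layer residual block: a[l+2] = g(z[l+2] + a[l])."""
    a_l1 = relu(W1 @ a_l + b1)   # a[l+1] = g(z[l+1])
    z_l2 = W2 @ a_l1 + b2        # z[l+2]
    return relu(z_l2 + a_l)      # shortcut adds a[l] before the activation

# Illustrative shapes: a 4-unit block with random weights
rng = np.random.default_rng(0)
n = 4
a_l = rng.standard_normal(n)
W1, b1 = rng.standard_normal((n, n)), np.zeros(n)
W2, b2 = rng.standard_normal((n, n)), np.zeros(n)
out = residual_block(a_l, W1, b1, W2, b2)
print(out.shape)  # (4,)
```

Note that the shortcut is added to z[l+2] before the final activation, not after it, which matches the simplified form above.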
Benefit of ResNet
- Theoretically, the training error should continue to decrease as we increase the number of layers in a plain network
- Realistically, the training error begins to increase as the number of layers reaches a certain point
- This is an issue caused by the vanishing gradient problem
- ResNet is able to avoid this problem
- Specifically, ResNet is able to increase accuracy as the number of layers increases
Why Do ResNets Work?
- Suppose we begin to observe the vanishing gradient problem during training
- In this case, the parameters W[l+2] and b[l+2] shrink toward 0, so z[l+2] vanishes
- Therefore, the ResNet will observe the following: a[l+2] = g(z[l+2] + a[l]) ≈ g(a[l]) = a[l], since a[l] is non-negative after ReLU
- Each layer will still learn something from the identity mapping even in the worst-case scenario
- In other words, accuracy will generally improve even if some layers learn nothing beyond the identity function
- This helps prevent the vanishing gradient problem to some degree
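The worst-case behavior is easy to check numerically. In this sketch (same assumed `relu` and `residual_block` helpers as before), setting the second layer's weights and bias to zero makes the block fall back to the identity on its non-negative input:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(a_l, W1, b1, W2, b2):
    # a[l+2] = g(W[l+2] a[l+1] + b[l+2] + a[l])
    a_l1 = relu(W1 @ a_l + b1)
    return relu(W2 @ a_l1 + b2 + a_l)

n = 4
a_l = np.array([0.5, 1.2, 0.0, 3.1])    # non-negative, as after a ReLU
W1, b1 = np.ones((n, n)), np.zeros(n)   # arbitrary first layer
W2, b2 = np.zeros((n, n)), np.zeros(n)  # "collapsed" second layer: W, b → 0
out = residual_block(a_l, W1, b1, W2, b2)
print(np.allclose(out, a_l))  # True: the block reduces to the identity
```

A plain (shortcut-free) block with collapsed weights would instead output g(0) = 0 and destroy the signal, which is the contrast the argument above relies on.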
tl;dr
- The classical building block in ResNet is the residual block
- ResNet introduces a so-called identity shortcut connection
- This connection skips one or more layers
- Theoretically, the training error should continue to decrease as we increase the number of layers in a plain network
- Realistically, the training error begins to increase as the number of layers reaches a certain point
- This is an issue caused by the vanishing gradient problem
- ResNet is able to avoid this problem
- Specifically, ResNet is able to increase accuracy as the number of layers increases