Motivating the Dying ReLU Problem
- ReLU doesn't suffer from the vanishing gradient problem as much as other activation functions
- However, the relu function still has a vanishing gradient problem, but only on the negative side
- Because the problem is one-sided, it goes by a different name: the dying relu problem
Describing the Dying ReLU Problem
- Since the relu function returns 0 for all negative inputs, the gradient for negative weighted sums is also 0
- This means a neuron stops learning once its weighted sum becomes negative
- This usually happens because of a very large negative bias
- Since the gradient is always 0 for negative inputs, the neuron is unlikely to recover
- As a result, gradient descent will never adjust the neuron's weights
- This is fine if we happen to be at a global minimum
- However, we'll frequently get stuck at local minima and plateaus because of the dying relu problem (see the sketch below)
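A minimal numpy sketch of a single dying neuron; the weight, bias, input, target, and learning rate below are made-up values for illustration, not from the notes above:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    # Derivative of relu: 1 for positive inputs, 0 otherwise
    return (z > 0).astype(float)

# Hypothetical single neuron: y = relu(w * x + b), squared-error loss
w, b = 0.5, -10.0    # a very large negative bias
x, target = 1.0, 1.0
lr = 0.1

for step in range(5):
    z = w * x + b                             # weighted sum is -9.5, i.e. negative
    y = relu(z)                               # output is 0
    dw = 2 * (y - target) * relu_grad(z) * x
    db = 2 * (y - target) * relu_grad(z)
    w -= lr * dw                              # dw and db are always 0,
    b -= lr * db                              # so the neuron never recovers
    print(f"step {step}: z={z}, dw={dw}, db={db}")
```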
The Vanishing Gradient and Dying ReLU Problem
- The derivatives of many activation functions (e.g. tanh, sigmoid, etc.) are very close to 0 for inputs far from 0
- In other words, the smaller the gradient becomes, the slower and harder it is for a neuron to return to the good zone
- Roughly speaking, this demonstrates a major effect of the vanishing gradient problem
- The gradient of the relu function doesn't become smaller in the positive direction; it stays at 1
- Therefore, the relu function doesn't suffer from the vanishing gradient problem for positive inputs (compare the derivatives in the sketch below)
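A quick numpy comparison of these derivatives; the sample inputs are arbitrary illustrative values:

```python
import numpy as np

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

def tanh_grad(z):
    return 1.0 - np.tanh(z) ** 2

def relu_grad(z):
    return (z > 0).astype(float)

zs = np.array([0.0, 2.0, 5.0, 10.0])
print(sigmoid_grad(zs))  # [0.25, 0.105, 0.0066, 0.000045] -> shrinks toward 0
print(tanh_grad(zs))     # [1.0, 0.0707, 0.00018, ~0.0]    -> shrinks even faster
print(relu_grad(zs))     # [0.0, 1.0, 1.0, 1.0]            -> stays 1 for every positive input
```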
Introducing the Leaky ReLU
- The leaky relu attempts to solve the dying relu problem
- Specifically, the leaky relu does this by providing a very small gradient for negative values
- This represents an attempt to allow neurons to recover
- We can define the leaky relu function as the following: f(x) = x if x > 0, else αx, where α is a small constant such as 0.01 (see the sketch after this list)
- Unfortunately, the leaky relu doesn't consistently perform better than the relu
- In most circumstances, it provides little or no accuracy boost
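A minimal numpy sketch of the leaky relu and its gradient, assuming the common default slope α = 0.01:

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    # Pass positive inputs through; scale negative inputs by a small slope alpha
    return np.where(z > 0, z, alpha * z)

def leaky_relu_grad(z, alpha=0.01):
    # The gradient is alpha (not 0) for negative inputs,
    # so a "dead" neuron still receives a small update signal
    return np.where(z > 0, 1.0, alpha)

z = np.array([-9.5, -1.0, 0.5, 3.0])
print(leaky_relu(z))       # [-0.095, -0.01, 0.5, 3.0]
print(leaky_relu_grad(z))  # [0.01, 0.01, 1.0, 1.0]
```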
tldr
- ReLU doesn't suffer from the vanishing gradient problem as much as other activation functions
- However, the relu function still has a vanishing gradient problem, but only on the negative side
- Because the problem is one-sided, it goes by a different name: the dying relu problem
- The leaky relu attempts to solve the dying relu problem by giving negative inputs a small non-zero gradient