Motivating Numerical Approximation of Gradients
- Sometimes we want to verify that our implementation of backward propagation is correct
- For these situations, we can perform gradient checking
- Gradient checking requires us to have numerical approximations of our gradients
- Therefore, we first need a way to numerically approximate our gradients
Why are Approximations Needed?
- The first question that comes to mind is: why do we need to approximate derivatives at all?
- Shouldn't we know how to analytically differentiate functions?
- The following are some reasons:
- We usually don't know the exact underlying function
- We may be interested in studying changes in the data
- We have an exact formula available, but it's too complicated
Defining Gradient Approximation
- When approximating changes in our cost function J with respect to any parameter θᵢ, we should do the following:
- Nudge it by a tiny amount ε in each direction
- Calculate the following change: dθᵢ ≈ (J(θᵢ + ε) − J(θᵢ − ε)) / 2ε
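The two-sided (central) difference above can be sketched in a few lines. This is a minimal illustration, assuming a scalar cost function; the name `approx_derivative` is mine, not from the notes:

```python
def approx_derivative(J, theta, eps=1e-7):
    """Two-sided (central) difference approximation of dJ/dtheta:
    nudge theta by eps in each direction and divide the change in J by 2*eps."""
    return (J(theta + eps) - J(theta - eps)) / (2 * eps)

# Example: J(theta) = theta**2 has the analytic derivative 2*theta
J = lambda theta: theta ** 2
print(approx_derivative(J, 3.0))  # close to 6.0
```

The two-sided version is preferred over the one-sided (J(θ + ε) − J(θ)) / ε because its approximation error shrinks like ε² rather than ε.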
Defining Gradient Checking
- Reformat the parameters W^[l] and b^[l] into vectors
- Where each matrix W^[l] becomes a vector
- Where each vector b^[l] remains a vector
- Reformat the derivatives dW^[l] and db^[l] into vectors
- Where each matrix dW^[l] becomes a vector
- Where each vector db^[l] remains a vector
- Concatenate the new parameter vectors into one big vector θ
- Where the vectorized W^[l] and b^[l] are concatenated into one big vector θ
- Concatenate the new derivative vectors into one big vector dθ
- Where the vectorized dW^[l] and db^[l] are concatenated into one big vector dθ
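The reshape-and-concatenate steps above can be sketched as follows. This is an illustrative sketch, assuming the parameters live in a dictionary of NumPy arrays; the layer shapes and the helper name `dict_to_vector` are hypothetical:

```python
import numpy as np

def dict_to_vector(params):
    """Flatten each array (matrix W[l] or vector b[l]) and concatenate
    them all into one big vector, in a fixed (sorted-key) order."""
    return np.concatenate([params[key].ravel() for key in sorted(params)])

# Hypothetical two-layer network: 2 -> 3 -> 1
params = {"W1": np.ones((3, 2)), "b1": np.zeros(3),
          "W2": np.ones((1, 3)), "b2": np.zeros(1)}
theta = dict_to_vector(params)
print(theta.shape)  # (13,) = 6 + 3 + 3 + 1
```

The same helper works for the derivatives, since dW^[l] and db^[l] have the same shapes as W^[l] and b^[l]; the only requirement is that both vectors use the same ordering.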
- Approximate the gradients using the gradient approximation formula
- Where dθ_approx represents our approximated gradient
- Where each entry is computed from J: dθ_approx[i] = (J(θ₁, ..., θᵢ + ε, ...) − J(θ₁, ..., θᵢ − ε, ...)) / 2ε
- Where the approximated gradients for each θᵢ are concatenated together into the big vector dθ_approx
- Check if dθ_approx ≈ dθ
- This is a check to see if dθ is truly the gradient of J
- If not, then there might be a bug in our code
- We use the Euclidean distance to compute the similarity of these two vectors dθ_approx and dθ
- Then, we normalize the similarity score by the sizes of the vectors
- Specifically, we calculate the following: ||dθ_approx − dθ||₂ / (||dθ_approx||₂ + ||dθ||₂)
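Putting the approximation and the normalized-distance check together, a minimal sketch (assuming θ and dθ are already flattened NumPy vectors; the function name `gradient_check` is mine) looks like this:

```python
import numpy as np

def gradient_check(J, theta, dtheta, eps=1e-7):
    """Compare the analytic gradient dtheta against a two-sided numerical
    approximation, returning the normalized Euclidean distance."""
    dtheta_approx = np.zeros_like(theta)
    for i in range(theta.size):
        plus, minus = theta.copy(), theta.copy()
        plus[i] += eps   # nudge theta_i up by eps
        minus[i] -= eps  # nudge theta_i down by eps
        dtheta_approx[i] = (J(plus) - J(minus)) / (2 * eps)
    numerator = np.linalg.norm(dtheta_approx - dtheta)
    denominator = np.linalg.norm(dtheta_approx) + np.linalg.norm(dtheta)
    return numerator / denominator

# Example: J(theta) = sum(theta**2) has gradient 2*theta
theta = np.array([1.0, -2.0, 3.0])
score = gradient_check(lambda t: np.sum(t ** 2), theta, 2 * theta)
print(score)  # well below 1e-7, so the analytic gradient passes
```

Note the loop calls J twice per parameter, which is why this check is far too expensive to run during training.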
- Evaluate the similarity score
- Typically, we'll use an ε = 10^-7 for gradient checking
- An output around 10^-7 is great, meaning the two vectors are essentially the same
- An output around 10^-5 is okay, meaning they're basically the same, but might need double-checking
- An output of 10^-3 or larger is concerning, meaning there is most likely a bug in our backward propagation code
Gradient Checking Implementation Footnotes
- Gradient checking should be used for debugging purposes only
- It should not be used in training
- Specifically, we should only calculate dθ in training
- Afterwards, while debugging, we would calculate dθ_approx
- Then, we'd compare dθ_approx and dθ
- We would never make this comparison during training, since computing dθ_approx for every parameter is very slow and would destroy the training performance
- If gradient checking fails, then look at the components to identify the bug
- In other words, we should look at certain areas of the vectors representing different layers or parameters
- For example, we should compare the individual entries dθ_approx[i] and dθ[i] if there is a bug
- Then, we may find that the areas representing db^[l] aren't similar, whereas all the areas representing dW^[l] are similar, suggesting the bug is in how we compute db^[l]
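Locating the disagreeing components can be sketched as below. The vectors here are hypothetical hand-made data, with a deliberate "bug" planted in the entries standing in for db^[1]:

```python
import numpy as np

# Hypothetical analytic (dtheta) and approximated (dtheta_approx) gradients;
# suppose indices 6-8 correspond to db[1], and the analytic values there are wrong
dtheta = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 9.9, 9.9, 9.9])
dtheta_approx = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])

# Per-component absolute differences point at the suspect entries
diffs = np.abs(dtheta_approx - dtheta)
suspects = np.where(diffs > 1e-3)[0]
print(suspects)  # the indices whose components disagree
```

Mapping the suspect indices back through the concatenation order then tells us which dW^[l] or db^[l] block the bug lives in.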
- Remember to include regularization
- If J includes a regularization term, then dθ needs to reflect the regularization term as well
- We can't approximate gradients easily with dropout, so turn dropout off (e.g. set keep_prob = 1) before running gradient checking
tldr
- Sometimes we want to verify that our implementation of backward propagation is correct
- For these situations, we can perform gradient checking
- Gradient checking compares our analytic gradients against numerical approximations of them
- Therefore, we first need to numerically approximate our gradients, then check that they match the gradients from our backward propagation implementation