Motivating Normalization of Inputs
- If the scale of our input data is vastly different across features, then our cost function may take much longer to converge to its minimum
- In other words, gradient descent will take a long time to find optimal values for the parameters w and b if our data is unnormalized
- In certain situations, the cost function may not even converge, depending on the scale of our input data and the size of our learning rate for gradient descent
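The effect on gradient descent can be sketched with a toy quadratic cost (not from the notes): when the curvature differs wildly across coordinates, as it does for features on very different scales, the learning rate is capped by the steepest direction, and the shallow direction converges slowly.

```python
import numpy as np

def gd_iters(scales, lr, tol=1e-6, max_iters=100_000):
    # Gradient descent on the toy cost J(w) = 0.5 * sum(scales * w**2),
    # whose curvature differs per coordinate, mimicking features on
    # very different scales. Returns iterations until |w| < tol.
    w = np.ones_like(scales, dtype=float)
    for i in range(max_iters):
        if np.max(np.abs(w)) < tol:
            return i
        w -= lr * scales * w  # gradient of J is scales * w
    return max_iters

# "Unnormalized": curvatures differ 100x, so a stable learning rate
# is limited by the steepest direction (lr < 2/100), leaving the
# shallow direction to crawl.
slow = gd_iters(np.array([1.0, 100.0]), lr=0.01)

# "Normalized": equal curvatures tolerate a much larger learning rate.
fast = gd_iters(np.array([1.0, 1.0]), lr=0.5)

print(slow, fast)
```

With the larger learning rate applied to the unnormalized cost, the iterates would instead diverge, which is the non-convergence case mentioned above.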
Illustrating the Need for Normalization
- Let's say we define our cost function as the following:
- And our input data looks like the following:
- The contour could look like the following after gradient descent:
- Where the darker colors represent smaller values
Illustrating the Goal of Normalization
- Using our input data from before, our normalized data could look like the following:
- The contour could look like the following after gradient descent:
- Note that there isn't a great need for normalization if our input data is on the same relative scale
- However, normalization is especially important if the scale of our input data varies across features dramatically
- To be safe, we should just always normalize our data
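A quick numerical sketch of the goal (hypothetical data, not from the notes): two features on wildly different scales end up on a comparable unit scale after normalization.

```python
import numpy as np

# Hypothetical data: x1 ranges roughly 0..1, x2 roughly 0..1000.
rng = np.random.default_rng(1)
X = np.column_stack([rng.uniform(0, 1, 500),
                     rng.uniform(0, 1000, 500)])

print(X.std(axis=0))  # per-feature scales differ by roughly 1000x

# After centering and scaling, both features sit on a unit scale.
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_norm.std(axis=0))
```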
Normalization Algorithm
- Normalizing input data includes the following steps:
- Center data around the mean
- Normalize the variance
- Centering x around its mean μ will make it so the new mean of x is 0
- Normalizing the variance of x will make the new variance of x equal to 1
- Input data should be normalized for both the training and test sets
- We should first calculate μ and σ² for the training set
- Then, the test set should use those same parameters μ and σ² from the training set
- We should not calculate one set of parameters for the training set, and a different set of parameters for the test set
- In other words, our training and test sets should be scaled in the exact same way
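The steps above can be sketched as follows (variable names and the synthetic data are assumptions for illustration): compute μ and σ² on the training set only, then apply the same parameters to both sets.

```python
import numpy as np

# Hypothetical data: rows are examples, columns are features.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=50.0, scale=10.0, size=(1000, 3))
X_test = rng.normal(loc=50.0, scale=10.0, size=(200, 3))

# Step 1: compute mu and sigma^2 on the TRAINING set only.
mu = X_train.mean(axis=0)
sigma2 = X_train.var(axis=0)

# Step 2: center and scale BOTH sets with those same parameters,
# so training and test data are transformed identically.
X_train_norm = (X_train - mu) / np.sqrt(sigma2)
X_test_norm = (X_test - mu) / np.sqrt(sigma2)

# The training set now has mean ~0 and variance ~1 per feature;
# the test set is close to that, but not exact, since it reuses
# the training-set parameters rather than its own.
print(X_train_norm.mean(axis=0), X_train_norm.var(axis=0))
```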
tldr
- We need to normalize our input data to improve training performance
- Normalization involves the following steps:
- Centering x around its mean μ will make it so the new mean of x is 0
- Normalizing the variance of x will make the new variance of x equal to 1
- Input data should be normalized for both the training and test sets
- Normalization involves calculating parameters μ and σ² for each of our input features
- We should use the same μ and σ² parameters for both the training set and test set
- To reiterate, this form of normalization can only be applied to our input data
- This can't be applied to any activations in our hidden layers
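For each feature, the two steps can be written out as (per-feature scalars, over m training examples):

```latex
\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}, \qquad x^{(i)} := x^{(i)} - \mu
```

```latex
\sigma^2 = \frac{1}{m}\sum_{i=1}^{m} \left(x^{(i)}\right)^2, \qquad x^{(i)} := \frac{x^{(i)}}{\sigma}
```

Note that σ² here is computed from the already-centered x, so it equals the variance, and dividing by the standard deviation σ yields unit variance.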