Motivating Regularization
- Regularization is, roughly speaking, a shrinkage and variable selection technique used in regression
- We typically use regularization when we have a large number of variables in our model (and relatively few observations), since such models are prone to overfitting
- Regularization can significantly reduce the variance of the model without substantially increasing its bias
- In other words, regularization is commonly used to avoid overfitting
- To achieve this, regularization methods shrink the OLS coefficients toward zero; as the tuning parameter is increased toward its optimal value, only the most important coefficients remain large
- Essentially, regularized coefficients are the minimizers of the OLS objective with an added penalty term, as sketched below
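For concreteness, here is a sketch of the standard penalized objective, written with the ridge (squared, or L2) penalty; the lasso replaces the squared penalty with a sum of absolute values:

$$
\hat{\beta} = \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \right\}
$$

Here $\lambda \ge 0$ is the tuning parameter: $\lambda = 0$ recovers ordinary OLS, and larger values of $\lambda$ shrink the coefficients more aggressively. (No intercept appears because the data are assumed centered, per the steps below.)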
Steps of Basic Regularization Methods
- Standardize the data
- Centering the variables means the model no longer needs an intercept
- This is useful because the intercept should not be a factor in the shrinkage; centering removes it from the penalized problem entirely
- Minimize the penalized objective function (i.e. the OLS objective function plus an added penalty on the coefficients) in order to find the most significant coefficients; see the sketch after this list
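As a minimal sketch of these two steps in Python (using scikit-learn's StandardScaler, Ridge, and Lasso; the synthetic data and the choices of alpha here are illustrative assumptions, not values from these notes):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge, Lasso
from sklearn.pipeline import make_pipeline

# Illustrative synthetic data: many predictors (p) relative to observations (n)
rng = np.random.default_rng(0)
n, p = 50, 100
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]  # only 5 predictors truly matter
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Step 1: standardize the data (centering handles the intercept,
# scaling puts every coefficient on an equal footing for the penalty)
# Step 2: minimize the penalized OLS objective; alpha plays the role
# of the tuning parameter lambda
ridge = make_pipeline(StandardScaler(), Ridge(alpha=10.0)).fit(X, y)
lasso = make_pipeline(StandardScaler(), Lasso(alpha=0.1)).fit(X, y)

# Ridge shrinks all coefficients toward zero; the lasso drives many
# of them exactly to zero, performing variable selection
print("nonzero ridge coefficients:", np.sum(ridge[-1].coef_ != 0))
print("nonzero lasso coefficients:", np.sum(lasso[-1].coef_ != 0))
```

In practice the tuning parameter would be chosen by cross-validation (e.g. with scikit-learn's RidgeCV or LassoCV) rather than fixed by hand.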