Describing Bayesian Regularization
- The MAP estimator with different prior distributions leads to different regularizers
- A MAP estimator with a zero-mean Laplace prior yields the cost function associated with L1 regularization of OLS estimation (i.e. LASSO)
- A MAP estimator with a zero-mean Gaussian prior yields the cost function associated with L2 regularization of OLS estimation (i.e. Ridge)
- In other words, the choice of regularizer is analogous to the choice of prior over the weights in the Bayesian framework, as the derivation sketched below makes concrete
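To see why, here is a brief sketch of the correspondence, assuming a linear model y = Xw + ε with Gaussian noise (the λ below absorbs the noise variance and the prior scale):

```latex
\hat{w}_{\mathrm{MAP}}
  = \arg\max_{w} \; p(w \mid y)
  = \arg\max_{w} \; p(y \mid w)\, p(w)
  = \arg\min_{w} \; \bigl[ -\log p(y \mid w) - \log p(w) \bigr]
% Gaussian likelihood: -\log p(y \mid w) \propto \|y - Xw\|_2^2  (the OLS term)
% Laplace prior:  p(w) \propto e^{-\lambda \|w\|_1}   adds the penalty \lambda \|w\|_1    (LASSO)
% Gaussian prior: p(w) \propto e^{-\lambda \|w\|_2^2} adds the penalty \lambda \|w\|_2^2  (Ridge)
```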
L1 Regularization
- The cost function associated with L1 regularization corresponds to MAP estimation with a zero-mean Laplace prior
- The Laplace distribution is similar in form to the Gaussian distribution, but has a sharper peak at zero and looks like two exponential distributions placed back to back
- This Laplace prior promotes sparsity, meaning many coefficient estimates are driven exactly to zero (see the sketch below)
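A minimal sketch of this sparsity effect, assuming scikit-learn is available (the data and the alpha value here are synthetic and purely illustrative):

```python
# Illustrative sketch: LASSO (L1 / Laplace prior) zeroes out irrelevant coefficients.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
true_w = np.zeros(20)
true_w[:3] = [2.0, -1.5, 1.0]             # only 3 of 20 features actually matter
y = X @ true_w + 0.1 * rng.normal(size=100)

lasso = Lasso(alpha=0.1).fit(X, y)        # alpha plays the role of the prior's scale
print("nonzero coefficients:", int(np.sum(lasso.coef_ != 0)))  # far fewer than 20
```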
L2 Regularization
- The cost function associated with L2 regularization corresponds to MAP estimation with a zero-mean Gaussian prior
- This Gaussian prior doesn't promote sparsity, meaning the coefficient estimates shrink smoothly toward zero but never become exactly zero (see the sketch below)
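For contrast, a minimal sketch with Ridge on the same kind of synthetic data (again assuming scikit-learn; the alpha value is illustrative):

```python
# Illustrative sketch: Ridge (L2 / Gaussian prior) shrinks coefficients but keeps them nonzero.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
true_w = np.zeros(20)
true_w[:3] = [2.0, -1.5, 1.0]             # same setup as the LASSO sketch above
y = X @ true_w + 0.1 * rng.normal(size=100)

ridge = Ridge(alpha=10.0).fit(X, y)
print("nonzero coefficients:", int(np.sum(ridge.coef_ != 0)))       # typically all 20
print("smallest |coefficient|:", float(np.abs(ridge.coef_).min()))  # small, but > 0
```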