Motivating Smoothing Splines
- We create a regression spline by specifying a set of knots, producing a sequence of basis functions (constraints), and then using least squares to estimate the spline coefficients
- Smoothing splines are created using a different approach
- Instead, smoothing splines are created by finding a function that minimizes the residual sum of squares (RSS) with some penalty
- This notion of minimizing the loss function + penalty term is also found in ridge regression and lasso
Describing Smoothing Splines
- A spline is a special function defined piecewise by polynomials
- Again, smoothing splines are created by minimizing the RSS with some constraints
- If we don't put any constraints on $g(x_i)$, then we will always overfit the data by choosing a $g$ that interpolates every observation, which will always make the RSS zero
- Therefore, we want to find the function $g$ that minimizes the RSS with added constraints, which will make the curve smooth
- A smoothing spline is the function $g$ that minimizes the sum of the RSS and the penalty term
Finding a Function
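- We find a function $g$ by minimizing the following, where $\lambda \geq 0$ is a tuning parameter:
$$\sum_{i=1}^{n} (y_i - g(x_i))^2 + \lambda \int g''(t)^2 \, dt$$
- The term $\sum_{i=1}^{n} (y_i - g(x_i))^2$ refers to our loss function
- The term $\lambda \int g''(t)^2 \, dt$ refers to our penalty term
- The loss function encourages $g$ to fit the data well
- The penalty term penalizes the variability in $g$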
- The notation $g''(t)$ indicates the second derivative of the function $g$
- The first derivative $g'(t)$ measures the slope of a function at $t$
- The second derivative corresponds to the amount by which the slope is changing
- Roughly speaking, the second derivative of a function is a measure of its roughness
- Specifically, it is large in absolute value if $g(t)$ is very wiggly near $t$, and it is close to zero otherwise
- For example, the second derivative of a straight line is zero, since a straight line is perfectly smooth
- The notation $\int$ is an integral, which we can think of as a summation over the range of $t$
- In other words, $\int g''(t)^2 \, dt$ is simply a measure of the total amount of roughness across the function
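To make the roughness measure concrete, here is a minimal Python sketch that approximates $\int g''(t)^2 \, dt$ numerically. The helper name `roughness`, the grid size, the interval $[0, 1]$, and the two example functions are assumptions chosen for illustration and are not part of the original notes.

```python
import numpy as np

def roughness(g, a, b, num=2001):
    """Approximate the integral of g''(t)^2 over [a, b] (hypothetical helper)."""
    t = np.linspace(a, b, num)
    h = t[1] - t[0]
    y = g(t)
    # Central second difference approximates g''(t) at the interior grid points.
    second = (y[2:] - 2.0 * y[1:-1] + y[:-2]) / h**2
    # Riemann sum of the squared second derivative.
    return float(np.sum(second**2) * h)

# Illustrative functions (assumptions): a straight line and a wiggly sine curve.
line = lambda t: 2.0 * t + 1.0        # second derivative is zero everywhere
wiggly = lambda t: np.sin(10.0 * t)   # second derivative is -100 sin(10 t)

r_line = roughness(line, 0.0, 1.0)
r_wiggly = roughness(wiggly, 0.0, 1.0)
print(r_line, r_wiggly)
```

The straight line should give a value that is numerically indistinguishable from zero, while the sine curve gives a large value, which is the behavior the penalty term relies on.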
Summarizing the Minimization Formula
- The loss function encourages $g$ to fit the data well
- The penalty term penalizes the variability in $g$
- Roughly speaking, the second derivative of a function is a measure of its roughness
- Minimizing the error and roughness is what we want, since we want a smooth curve that fits well
- When the tuning parameter $\lambda = 0$, the penalty term has no effect
- In this case, the function $g$ will exactly interpolate the training data, causing overfitting
- When the tuning parameter $\lambda \to \infty$, $g$ will be perfectly smooth
- In this case, $g$ will just be a straight line
- Specifically, $g$ will be the linear least squares line
- Essentially, the tuning parameter controls the bias-variance trade-off of the smoothing spline
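The two extremes of the tuning parameter can be checked with a small Python sketch. Note the assumption: this is not a true smoothing spline but a discrete analog on equally spaced points, where the integral of the squared second derivative is replaced by a sum of squared second differences. The function name `discrete_smoother`, the simulated data, and the penalty values are all illustrative choices.

```python
import numpy as np

def discrete_smoother(y, lam):
    """Minimize sum (y_i - g_i)^2 + lam * sum (second differences of g)^2.

    Discrete stand-in for the smoothing spline criterion (an assumption,
    valid as an analogy only for equally spaced x values).
    """
    n = len(y)
    # D maps g to its n - 2 second differences g_{i+2} - 2 g_{i+1} + g_i.
    D = np.diff(np.eye(n), n=2, axis=0)
    # Setting the gradient to zero gives (I + lam * D'D) g = y.
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

# Simulated data (assumption): a noisy sine curve on an even grid.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

g_small = discrete_smoother(y, 0.0)   # no penalty: reproduces y exactly
g_large = discrete_smoother(y, 1e8)   # heavy penalty: nearly a straight line

# Ordinary least squares line, for comparison with the heavy-penalty fit.
slope, intercept = np.polyfit(x, y, 1)
ols_line = slope * x + intercept
print(np.max(np.abs(g_small - y)), np.max(np.abs(g_large - ols_line)))
```

Intermediate values of the penalty trade fidelity to the data against smoothness, which is the bias-variance trade-off described above.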
Finding the Best Smoothing Spline
- The function that minimizes the smoothing spline function is a natural cubic spline with knots at each data point
- Specifically, the function that minimizes the smoothing spline criterion is a piecewise cubic polynomial with the following properties:
- Knots at the unique values of $x_1, \ldots, x_n$
- Continuous first derivative at each knot
- Continuous second derivative at each knot
- However, it is not the same natural cubic spline that one would get from the basis function approach used in regression splines
- Instead, it is a shrunken version of that natural cubic spline, where the value of the tuning parameter $\lambda$ controls the level of shrinkage
Choosing the Smoothing Parameter
- We have seen that a smoothing spline is simply a natural cubic spline with knots at every unique value of $x_i$
- It might seem that a smoothing spline will have far too many degrees of freedom, since a knot at each data point allows a great deal of flexibility
- However, the tuning parameter $\lambda$ will control the roughness of the smoothing spline, and hence the effective degrees of freedom
- Roughly speaking, as the tuning parameter $\lambda$ increases from $0$ to $\infty$, the effective degrees of freedom decrease from $n$ to $2$
- Therefore, we do not need to select the number or locations of the knots, since there will be a knot at each training observation when fitting a smoothing spline
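- Instead, we need to choose the value of the tuning parameter $\lambda$ that makes the cross-validated RSS as small as possible
- The leave-one-out cross-validation (LOOCV) error can be computed very efficiently for smoothing splines, at essentially the cost of a single fit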
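The effective degrees of freedom and the efficient LOOCV computation can be sketched in Python for any linear smoother whose fitted values are $\hat{g} = S_\lambda y$. The effective degrees of freedom are the trace of $S_\lambda$, and the LOOCV RSS is $\sum_i \left[ (y_i - \hat{g}(x_i)) / (1 - \{S_\lambda\}_{ii}) \right]^2$, which needs only one fit. As an assumption, the sketch below uses a discrete second-difference smoother on equally spaced points as a stand-in for a true smoothing spline; the function names, the simulated data, and the grid of $\lambda$ values are hypothetical.

```python
import numpy as np

def smoother_matrix(n, lam):
    """Matrix S_lam with fitted values S_lam @ y (discrete analog, an assumption)."""
    D = np.diff(np.eye(n), n=2, axis=0)   # second-difference operator
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)

def effective_df(S):
    """Effective degrees of freedom: the sum of the diagonal of S."""
    return float(np.trace(S))

def loocv_rss(y, S):
    """Leave-one-out RSS from a single fit, using the diagonal of S."""
    resid = y - S @ y
    return float(np.sum((resid / (1.0 - np.diag(S))) ** 2))

# Simulated data (assumption): a noisy sine curve on an even grid.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

# Candidate penalties; lam = 0 is excluded because 1 - S_ii would be zero.
lams = [1e-2, 1.0, 1e2, 1e4, 1e6]
scores = {lam: loocv_rss(y, smoother_matrix(y.size, lam)) for lam in lams}
dfs = [effective_df(smoother_matrix(y.size, lam)) for lam in lams]
best_lam = min(scores, key=scores.get)

df_at_zero = effective_df(smoother_matrix(y.size, 0.0))
df_at_huge = effective_df(smoother_matrix(y.size, 1e8))
print(best_lam, df_at_zero, df_at_huge)
```

In this analog the effective degrees of freedom run from $n$ with no penalty down toward $2$ with a very large penalty, matching the behavior described for smoothing splines.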