Describing Local Regression
- Local regression is a generalization of moving average and polynomial regression
- The most common method of local regression is loess (locally estimated scatterplot smoothing)
- Local regression is used for fitting non-linear functions
- Local regression involves computing a fit for each data point using only nearby training observations (i.e. performing a moving average-like computation when fitting the model)
- Specifically, we need to fit a new weighted least squares regression model in order to obtain the local regression fit at each new point
The Local Regression Algorithm
-
Gather the fraction (i.e. span) of training points whose are closest to
- In other words, we determine the training observations considered in our local regression fit (at each point) by choosing a reasonable fraction of data points closest to our point we're fitting on
- The span plays a role like that of the tuning parameter in smoothing splines
- Specifically, it controls the flexibility of the non-linear fit
- More specifically, a smaller value of s will lead to a more local (or wiggly) fit to the data, and a larger value of s will lead to a global (or linear) fit to the data (i.e. using all of the training observations)
-
Assign a weight to each point in this neighborhood, so that the point furthest from has weight zero, and the closest has the highest weight
- In other words, we assign high weights to each training point closest to our training point we're fitting on, and we assign very small weights to each training point furthest from our training point we're fitting on
- We only assign weights to points within our fraction of training observations we're considering
- Any training point will be assigned a weight of zero if it is either outside of our fraction of observations or extremely far away from the point we are fitting on
-
Fit a weighted least squares regression of the on the using the aforementioned weights, by finding the beta-coefficients that minimize the following equation:
- In other words, estimate the beta-coefficients by minimizing the weighted least squares cost function
- The fitted value at is given by the following:
Additional Notes about the Algorithm
-
In order to perform local regression, the following choices need to be made:
- What fraction of closest training observations should we include in our group of observations (from step one)
- What should we choose for our weighting functions (from step 2)
- Do we fit a linear, constant, or quadratic regression model (i.e. step 3)
- The most important choice is the span , defined in step 1
References
Previous
Next