Describing OLS Estimation
- The ordinary least squares (OLS) criterion is a method for estimating regression coefficients (i.e. estimating the population parameters)
- The least squares criterion involves minimizing the sum of squares of the residuals (i.e. $\sum_{i=1}^{n} e_i^2$, where $e_i = y_i - \hat{y}_i$)
- In other words, the least squares criterion chooses coefficient estimates, such as $\hat{\beta}_0$ and $\hat{\beta}_1$, that minimize the residual sum of squares
- Using the least squares approach, we can apply some calculus to derive equations for the coefficient estimates that minimize the residual sum of squares
- For example, in simple linear regression, we can use calculus to find the following coefficient estimates:
  $$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
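As a sketch, the closed-form estimates for simple linear regression can be computed directly from the sample means and deviations; the function and variable names below are illustrative, not from a particular library:

```python
import numpy as np

def ols_estimates(x, y):
    """Closed-form least squares estimates for simple linear regression."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: sum of cross-deviations over sum of squared x-deviations
    beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # Intercept: forces the fitted line through the point (x_bar, y_bar)
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat

# Toy data lying exactly on the line y = 2x
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
b0, b1 = ols_estimates(x, y)  # b0 == 0.0, b1 == 2.0
```

Because the toy data are perfectly linear, the estimates recover the exact slope and intercept with zero residual sum of squares.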
Residual Sum of Squares
- The residual sum of squares is defined as the sum of the squared residuals
- Mathematically, we write the equation for the residual sum of squares as the following:
  $$RSS = \sum_{i=1}^{n} e_i^2$$
- Since $e_i$ is the difference between our observations and predictions, the formula can be written as the following:
  $$RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
- In the case of simple linear regression, our formula looks like this:
  $$RSS = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$$
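The definition above translates to a one-line computation; here is a minimal sketch with made-up observations and predictions:

```python
import numpy as np

def rss(y, y_hat):
    """Residual sum of squares: sum of squared residuals e_i = y_i - y_hat_i."""
    residuals = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return float(np.sum(residuals ** 2))

# Hypothetical observations and model predictions
y = [3.0, 5.0, 7.0]
y_hat = [2.5, 5.5, 7.0]
value = rss(y, y_hat)  # 0.5**2 + (-0.5)**2 + 0.0**2 = 0.5
```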
The True Regression Line
- The least squares regression line is the linear regression line represented by our coefficient estimates
- For example, our least squares regression line could look like the following:
  $$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$
- The true regression line is a linear regression line represented by our population parameters and some random error:
  $$y = \beta_0 + \beta_1 x + \varepsilon$$
- The least squares regression line is our best guess at representing the true regression line, assuming the true relationship is linear
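A small simulation illustrates this "best guess" idea: when data really are generated by a true regression line plus normal noise, the least squares estimates land close to the population parameters. The true parameters, noise level, and sample size below are arbitrary choices for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# True regression line y = beta0 + beta1 * x + error (parameters are assumptions)
beta0, beta1 = 1.0, 3.0
x = rng.uniform(0.0, 10.0, size=10_000)
y = beta0 + beta1 * x + rng.normal(0.0, 1.0, size=x.size)

# Least squares fit of a degree-1 polynomial; returns (slope, intercept)
beta1_hat, beta0_hat = np.polyfit(x, y, deg=1)
# With this much data, the estimates sit very close to (3.0, 1.0)
```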
MLE versus OLS
- Minimizing the squared error is equivalent to maximizing the likelihood when the errors are normally distributed (i.e. in the case of linear regression)
- We can use MLE for fitting linear regression, where the response is normally distributed, or for response variables that have a non-normal distribution
- In other words, we can use MLE for estimating the parameters of our response variable's distribution, which could be a Bernoulli-distributed random variable, an exponentially-distributed random variable, a Poisson-distributed random variable, etc.
- In this case, we would map the linear predictor to the non-normal distribution of the response variable using a link function
- Then, the likelihood function becomes the product of all the marginal probabilities of the outcomes after the transformation of the predictor variables, assuming independence
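As a sketch of this idea, we can fit a Bernoulli-distributed response through a logit link by maximizing the log-likelihood with gradient ascent. The true parameters, sample size, step size, and iteration count below are all assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a Bernoulli response mapped through an inverse logit link
# (the "true" parameters are made up for this sketch)
b0_true, b1_true = -1.0, 2.0
x = rng.normal(0.0, 1.0, size=5000)
p = 1.0 / (1.0 + np.exp(-(b0_true + b1_true * x)))  # inverse logit link
y = rng.binomial(1, p)

# Maximize the Bernoulli log-likelihood over (b0, b1) by gradient ascent
X = np.column_stack([np.ones_like(x), x])  # intercept column plus predictor
b = np.zeros(2)
for _ in range(500):
    p_hat = 1.0 / (1.0 + np.exp(-(X @ b)))
    grad = X.T @ (y - p_hat) / len(y)  # average gradient of the log-likelihood
    b += 1.0 * grad                    # step size of 1.0 is an arbitrary choice
b0_hat, b1_hat = b  # estimates land near the true parameters
```

Note that, by independence, the log-likelihood being maximized is the sum of the log of each outcome's marginal probability, exactly as described above; the gradient expression `X.T @ (y - p_hat)` follows from differentiating that sum.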