Linear Regression

Describing Linear Regression

  • Regression is a statistical technique for estimating the relationship between a continuous response variable and a set of predictor variables
  • Linear regression is a type of regression that models a linear relationship between a continuous response variable and its predictor variables
  • In other words, linear regression assumes a constant rate of change between a response variable and its predictor variables
  • The most common approach for estimating the population parameters involved in the linear regression model is the method of least squares (see the sketch after this list)

    • There are other ways of estimating the population parameters, such as maximum likelihood estimation (or MLE)
  • Roughly speaking, residuals are estimates (or realizations) of the random error described by the error term $\epsilon$
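
A minimal sketch of the method of least squares in code, assuming simulated data with made-up parameter values (nothing here comes from the notes themselves):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate data from an assumed linear data-generating process:
# y = 2 + 3x + Gaussian noise
x = rng.uniform(0, 10, size=200)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, size=200)

# Method of least squares: choose the coefficients that minimize the
# sum of squared residuals
X = np.column_stack([np.ones_like(x), x])         # design matrix with intercept
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # [intercept, slope]

residuals = y - X @ beta_hat  # realizations of the error term
print(beta_hat)               # should be close to (2, 3)
print(residuals.mean())       # should be close to 0
```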

Model Components

  • In linear regression, our true data generating process could look like the following:
$$Y = \beta_{0} + \beta_{1} X + \epsilon$$
$$Y \sim N(\beta_{0} + \beta_{1} X, \sigma^{2})$$
$$E[Y] = \beta_{0} + \beta_{1} X$$
$$Var[Y] = \sigma^{2}$$
  • In this case, the response variable $Y$, the predictor variable $X$, and the error term $\epsilon$ are all random variables
  • On the other hand, the $\beta_{0}$ and $\beta_{1}$ coefficients are our fixed population parameters
  • Usually, we're interested in the mean of the response variable conditional on our predictor variables, which looks like this:
$$E[Y|X] = \mu_{Y|X} = \beta_{0} + \beta_{1} X$$
  • We'll typically estimate these conditional means using MLE or OLS parameter estimates (as simulated in the sketch at the end of this section):
$$\hat{\mu}_{Y|X} = \hat{\beta}_{0} + \hat{\beta}_{1} X$$
  • The response variable conditional on the predictor variables is denoted as the following:
$$Y|X = \beta_{0} + \beta_{1} X + \epsilon$$
$$Y|X \sim N(\beta_{0} + \beta_{1} X, \sigma^{2})$$
$$E[Y|X] = \beta_{0} + \beta_{1} X$$
$$Var[Y|X] = \sigma^{2}$$
  • The error term conditional on the predictor variables is denoted as the following:
$$\epsilon|X \sim N(0, \sigma^{2})$$
$$E[\epsilon|X] = 0$$
$$Var[\epsilon|X] = \sigma^{2}$$
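
A rough simulation of these components under assumed parameter values (β0 = 1, β1 = 2, σ = 0.5 are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma = 1.0, 2.0, 0.5  # assumed population parameters

# Y|X ~ N(beta0 + beta1 * X, sigma^2)
x = rng.uniform(0, 5, size=1000)
epsilon = rng.normal(0.0, sigma, size=1000)  # epsilon|X ~ N(0, sigma^2)
y = beta0 + beta1 * x + epsilon

# OLS estimates of the fixed population parameters
X = np.column_stack([np.ones_like(x), x])
b0_hat, b1_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# Estimated conditional mean at a new value of X
x_new = 2.0
mu_hat = b0_hat + b1_hat * x_new
print(mu_hat)  # should be close to beta0 + beta1 * 2 = 5
```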

Assumptions of the Gaussian-Noise Simple Linear Regression Model

  1. The distribution of any predictor variable is unspecified (possibly even deterministic)
  2. The relationship between the response variable and the predictor variables is linear:
$$Y|X = \beta_{0} + \beta_{1} X + \epsilon$$
  3. The error term is normally distributed with a mean of 0 and constant variance for all values of $X$:
$$\epsilon|X \sim N(0, \sigma^{2})$$
  • This implies the error term is uncorrelated across observations and is uncorrelated with the predictors
  • In other words, $\epsilon$ is independent across observations, and $\epsilon$ is independent of the predictor variables
  • Said another way, homoscedasticity is maintained (i.e. constant variance; a quick simulated check follows the equations below):
$$E[\epsilon|X=x] = 0$$
$$Var[\epsilon|X=x] = \sigma^{2}$$
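
A small sketch of checking these two conditions on simulated residuals by binning on $X$ (the binning scheme, sample size, and parameter values are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=5000)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=5000)  # homoscedastic errors

X = np.column_stack([np.ones_like(x), x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat

# Bin residuals by x and compare means and variances across bins:
# under the assumptions, means sit near 0 and variances near sigma^2 = 1
bins = np.digitize(x, np.linspace(0, 10, 6))
for b in np.unique(bins):
    r = resid[bins == b]
    print(b, round(r.mean(), 3), round(r.var(), 3))
```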

Benefits of Assuming the Gaussian-Noise Model

  • We can use the Central Limit Theorem

    • The noise might be due to adding up the effects of lots of little random causes, all nearly independent of each other and of $X$, where each effect is of roughly similar magnitude
    • Then the Central Limit Theorem will take over, and the distribution of the sum of effects will indeed be pretty Gaussian
    • Therefore, the Central Limit Theorem justifies the assumption that $\epsilon$ is normally distributed
  • It is mathematically convenient

    • Assuming Gaussian noise lets us work out a very complete theory of inference and prediction for the model, with lots of closed-form answers to questions like:

      • What is the optimal estimate of our β\beta terms using MLE or OLS?
      • Can we assume our β\beta estimates are Gaussian?
      • What is the optimal estimate of the variance?
      • What is the probability that we'd see a fit this good from a line with a non-zero intercept if the true line goes through the origin?
    • Answering such questions without the Gaussian-noise assumption needs somewhat more advanced techniques, and much more advanced computing (a sketch of some closed-form answers appears below)
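
A sketch of a few of those closed-form answers under the Gaussian-noise model, using the standard normal-equations formulas (the simulated data and true parameter values are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.uniform(0, 10, size=n)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.5, size=n)

X = np.column_stack([np.ones(n), x])

# Closed-form OLS/MLE estimate of the beta terms: (X'X)^{-1} X'y
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# Unbiased estimate of the error variance (the MLE divides by n instead)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - 2)

# Under Gaussian noise, beta_hat is itself Gaussian with this covariance,
# which yields the usual standard errors
beta_cov = sigma2_hat * XtX_inv
se = np.sqrt(np.diag(beta_cov))
print(beta_hat, sigma2_hat, se)
```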

Clarifying Properties of Linear Models

  • The predictor variables do not need to be normally distributed
  • The response variable does not need to be normally distributed
  • The response variable conditional on the predictor variables needs to be normally distributed (the sketch below illustrates this distinction)
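
A quick simulated illustration of this distinction, with a skewed (exponential) predictor chosen purely as an assumed example:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Skewed predictor: X is exponential, so neither X nor the marginal
# distribution of Y is normal
x = rng.exponential(scale=2.0, size=5000)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=5000)

print(stats.skew(x), stats.skew(y))  # both clearly skewed (nonzero)

# But Y conditional on X (i.e., the errors/residuals) is normal
X = np.column_stack([np.ones_like(x), x])
resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
print(stats.skew(resid))  # close to 0, consistent with normal errors
```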

Clarifying Properties of Residuals

  • The residuals should have an expected value of zero:
$$E[\epsilon|X=x] = 0$$
  • The residuals should show a nearly constant variance:
$$Var[\epsilon|X=x] = \sigma^{2}$$
  • We don't expect the residuals to ever be completely uncorrelated with each other, but the correlation should be extremely weak and become negligible as $n \to \infty$
  • The residuals should be Gaussian, since the errors should be Gaussian (see the diagnostic sketch below)
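
A minimal sketch of checking these properties on fitted residuals (the choice of normality test and the simulated data are assumptions for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, size=300)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=300)

X = np.column_stack([np.ones_like(x), x])
resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

print(resid.mean())                 # expected value near zero
print(resid.var())                  # roughly constant variance estimate
print(stats.shapiro(resid).pvalue)  # large p-value: no evidence against normality
```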
