Describing Errors
- An error is the difference between a population parameter and its observed value
- A population parameter is typically fixed and unobserved
- In some situations, we may actually have a very close estimate of the population parameter, but we still don't know the exact population parameter because we will be off by some margin
- For example, we may have a very good guess of a population mean, such as the mean height of NBA players
- However, every measurement device has finite error inherent in the measurement device
- The best we could claim is that the population mean equals some very close estimate plus or minus the observational error or measurement error
- Specifically, our ruler could capture measurements down to 1 millimeter, so the best we can measure the mean height of NBA players is within 1 millimeter
- Therefore, errors are almost always theoretical and unobservable
Describing Residuals
- A residual is the difference between an estimated parameter and its observed value
- In other words, residuals are empirical and observable
- Roughly speaking, a residual is an estimate of the true error
- For example, the sample mean would be a good estimate of the population mean for a normally-distributed random variable
- In this case, the difference between each observation in our sample and the unobservable population mean is a statistical error:
- The difference between each observation in our sample and the observable sample mean is a residual:
- In linear regression, we define an error as the following:
- Then, we would define a residual as the following:
- Essentially, we are able to estimate an error associated with a population parameter by estimating the population parameter itself
- Then, our parameter estimate will allow us to work with residuals, meaning we'll be able to work with our data
Sampling Error
- Sampling error refers to error in our estimates caused by observing a sample instead of the entire population
- Therefore, sampling error is a property of an estimator
- Sampling error is more of a theoretical type of error, rather than an empirical type of error
- The sampling error is influenced by our sample size and sample standard deviation
- Essentially, the sampling error states that the more data we receive, the smaller our errors become
- Said another way, as our errors become smaller, the impact of estimating the errors using residuals becomes extremely marginal
Standard Error
- The standard error measures sampling error, along with some other properties about our sample
- Specifically, the standard error is used to quantify the amount of uncertainty around our point estimate
- Therefore, the standard error is a property of an estimator
- The standard error is defined as the following:
- In other words, the accuracy of our estimates depends on our sample standard deviation and our sample size
- With that being said, our sample standard deviation is typically considered fixed while our sample size can change easily
- If our sample size is small, then our sample standard deviation has a large effect on our estimates
- If our sample size is decently large, then our sample standard deviation has some effect on our estimates
- If our sample size is infinitely large, then our sample standard deviation doesn't really have an effect on our estimates
Source of Inaccuracies in Estimates
- Sampling error is one source of inaccurate estimates
- Bias can be a greater source of inaccurate estimates
References
Previous
Next