Sample Statistics

Sample Mean

  • An estimator of the population mean $\mu$ is written $\hat{\mu}$
  • In most cases, our best estimate $\hat{\mu}$ is the sample mean $\bar{X}$
  • The sample mean is an unbiased estimator of the population mean $\mu$ (a simulation check follows below)
  • The sample mean is just what you'd expect:
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
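
A minimal sketch (not part of the original notes) of what "unbiased" means in practice: draw many samples from a population with a known mean and check that the sample means average out to $\mu$. The population parameters and sample size here are arbitrary illustrative choices.

```python
import random

random.seed(0)
mu, sigma, n, trials = 5.0, 2.0, 10, 100_000  # hypothetical population and sample size

def sample_mean(xs):
    """The estimator x_bar = (1/n) * sum(x_i)."""
    return sum(xs) / len(xs)

# Average the sample mean over many independent samples.
total = 0.0
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    total += sample_mean(xs)

# Unbiasedness says E[x_bar] = mu, so this should print a value very close to 5.0.
print(total / trials)
```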

Sample Variance

  • An estimator of the population variance $\sigma^2$ is written $\hat{\sigma}^2$
  • In most cases, we'd expect our best estimate $\hat{\sigma}^2$ to be the sample variance $s^2$
  • The sample variance $s^2$ is defined as follows:
$$s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
  • However, the sample variance $s^2$ is not a perfect estimate of the population variance $\sigma^2$
  • Specifically, it's a biased estimator of the population variance $\sigma^2$: on average, it's too small
  • The population variance $\sigma^2$ is therefore best estimated as follows:
$$\hat{\sigma}^2 = \frac{n}{n-1}\,s^2$$
  • This is the reason we use the notation $s^2$, rather than $\hat{\sigma}^2$
  • The story here, heuristically, is that we tend to lose variation under sampling
  • So measures of variation in the sample need to be corrected upwards, and this is the right correction to use
  • A more sophisticated story is that what we should really divide by is not the number of data points but the number of degrees of freedom
  • To get the variance, we first need to estimate the mean, thereby losing one degree of freedom and leaving $n-1$
  • Concretely, $E[s^2] = \frac{n-1}{n}\sigma^2$, so multiplying by $\frac{n}{n-1}$ exactly removes the bias (the sketch below checks this by simulation)
  • Essentially, while we should use $\frac{n}{n-1}s^2$ as our estimate of the population variance $\sigma^2$, if the difference between that and $s^2$ is big enough to matter, you should probably think about getting more data points
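
The same kind of simulation (again a sketch with arbitrary illustrative parameters, not from the original notes) makes the bias visible: with $\sigma^2 = 4$ and $n = 10$, the $n$-divisor variance $s^2$ averages to about $\frac{n-1}{n}\sigma^2 = 3.6$, and the $\frac{n}{n-1}$ correction recovers $\sigma^2$.

```python
import random

random.seed(0)
mu, sigma, n, trials = 5.0, 2.0, 10, 100_000  # hypothetical population and sample size

def biased_var(xs):
    """s^2: mean squared deviation from the sample mean, dividing by n."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

total = 0.0
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    total += biased_var(xs)

avg_s2 = total / trials
print(avg_s2)                # ~3.6: systematically below sigma^2 = 4
print(avg_s2 * n / (n - 1))  # ~4.0: the n/(n-1) correction removes the bias
```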
