Example of Causality

Defining Notation for Determining Causality

  • As an example, let's say we want to know if providing schools with laptops will cause their students to receive better SAT scores

    • The control group will include schools without laptops
    • The treatment group will include schools with laptops
  • We use the following notation:

    • i represents a given school
    • t represents the group that school i falls into

      • t=0 if school i isn't provided laptops
      • t=1 if school i is provided laptops
    • Y represents the observed average SAT score for school i

      • Y^0 is school i's average SAT score without laptops (its potential outcome under control)
      • Y^1 is school i's average SAT score with laptops (its potential outcome under treatment)
    • δ = Y^1 - Y^0 is the treatment effect for school i
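A minimal sketch of this notation in Python, assuming each school is stored as a record holding its group t and both potential outcomes Y^0 and Y^1 (the `School` class and its field names are hypothetical). The observed score Y is whichever potential outcome matches the school's group, and δ is Y^1 - Y^0. The example uses school 3's values from the table in the next section.

```python
from dataclasses import dataclass


@dataclass
class School:
    name: str
    t: int      # 1 if the school is provided laptops, 0 otherwise
    y0: float   # potential average SAT score without laptops (Y^0)
    y1: float   # potential average SAT score with laptops (Y^1)

    @property
    def y(self) -> float:
        """Observed score Y: Y^1 if the school is treated, Y^0 otherwise."""
        return self.y1 if self.t == 1 else self.y0

    @property
    def delta(self) -> float:
        """School-level treatment effect: delta = Y^1 - Y^0."""
        return self.y1 - self.y0


school_3 = School("School 3", t=1, y0=1100, y1=1150)
print(school_3.y, school_3.delta)  # 1150 50
```

In real data only `y` would ever be observed; `delta` needs both potential outcomes, which is exactly what the following sections are about.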

Illustrating ATE with SAT Scores

  • Hypothetically, imagine we could somehow observe each school i in both the treatment and control groups
  • Again, this isn't possible in practice, but if it were, our data could look like this:
i t Y Y^0 Y^1 δ
School 1 0 800 800 850 50
School 2 0 900 900 870 -30
School 3 1 1150 1100 1150 50
School 4 1 1400 1300 1400 100
  • In this case, we can calculate the population parameter ATE directly
  • Specifically, it would be the following:
ATE = \frac{850+870+1150+1400}{4} - \frac{800+900+1100+1300}{4} = 1067.5 - 1025 = 42.5
  • This indicates providing laptops to schools would increase SAT scores by 42.5 points on average
  • Thus, laptops cause only a modest improvement, small relative to the differences in SAT scores between schools
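The ATE calculation can be sketched in a few lines of Python, assuming the full (hypothetical) table is stored as a dictionary mapping each school to (t, Y^0, Y^1); the variable names are illustrative. The sketch also confirms that the difference in mean potential outcomes equals the mean of the δ column.

```python
from statistics import fmean

# Full (hypothetical) potential-outcomes table: name -> (t, Y^0, Y^1)
schools = {
    "School 1": (0, 800, 850),
    "School 2": (0, 900, 870),
    "School 3": (1, 1100, 1150),
    "School 4": (1, 1300, 1400),
}

# ATE as a difference in the means of the two potential outcomes...
ate_diff = (fmean(y1 for _, _, y1 in schools.values())
            - fmean(y0 for _, y0, _ in schools.values()))

# ...which equals the mean of the school-level effects delta = Y^1 - Y^0
ate_delta = fmean(y1 - y0 for _, y0, y1 in schools.values())

print(ate_diff, ate_delta)  # 42.5 42.5
```

Both forms use every school's Y^0 and Y^1, which is why ATE can only be computed in this imaginary setting.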

Illustrating SDO with Test Scores

  • In reality, each school can be in only one group: control or treatment

    • For example, school 1 can't actually participate in both groups at the same time
    • For school 1, Y^1 is its counterfactual
    • For school 4, Y^0 is its counterfactual
  • As a result, our data actually looks like the following:
i t Y Y^0 Y^1 δ
School 1 0 800 800 null null
School 2 0 900 900 null null
School 3 1 1150 null 1150 null
School 4 1 1400 null 1400 null
  • Notice, we can't calculate ATE here, since each school is missing one of its potential outcomes
  • Instead, we must estimate ATE by calculating SDO
  • Specifically, SDO would be the following:
SDO = \frac{1150+1400}{2} - \frac{800+900}{2} = 1275 - 850 = 425
  • Taken at face value, this suggests providing laptops to schools would increase SAT scores by 425 points on average
  • In this case, drawing a causal conclusion from SDO would be a mistake
  • Again, SDO is only an estimator of ATE, and here it badly overstates the true effect, largely because of selection bias
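A minimal sketch of how the observed table arises and how SDO uses it, assuming the full potential-outcomes table from the previous section is the ground truth (the `observe` and `sdo` helpers are hypothetical names). Each school reveals only the potential outcome matching its group, the other becomes `None`, and SDO is computed from the observed Y column alone. School 4's observed score therefore comes out as 1400, its Y^1 in the first table.

```python
from statistics import fmean

# Full (hypothetical) table: name -> (t, Y^0, Y^1)
full = {
    "School 1": (0, 800, 850),
    "School 2": (0, 900, 870),
    "School 3": (1, 1100, 1150),
    "School 4": (1, 1300, 1400),
}


def observe(t, y0, y1):
    """What a researcher actually sees: (t, Y, Y^0, Y^1).

    Only the potential outcome matching t is revealed; the other one is
    the counterfactual and is recorded as None.
    """
    y = y1 if t == 1 else y0
    return (t, y, y0 if t == 0 else None, y1 if t == 1 else None)


observed = {name: observe(*row) for name, row in full.items()}


def sdo(data):
    """Simple difference in mean outcomes: E[Y | t=1] - E[Y | t=0]."""
    treated = [y for t, y, _, _ in data.values() if t == 1]
    control = [y for t, y, _, _ in data.values() if t == 0]
    return fmean(treated) - fmean(control)


print(observed["School 4"])  # (1, 1400, None, 1400)
print(sdo(observed))         # 425.0
```

Compared with the ATE from the full table (about 42 points), this SDO is roughly ten times larger, even though the code never touches a counterfactual.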

Motivating Selection Bias in Causality

  • Bias is what separates association from causation
  • Selection bias could exist if the treatment group contains schools with more resources

    • For example, maybe the treatment group contains private schools
    • And maybe the control group contains underfunded schools
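One way to quantify this story is to compare what each group would have scored without laptops, using the Y^0 column of the hypothetical first table. The baseline gap E[Y^0 | t=1] - E[Y^0 | t=0] is a common definition of selection bias; it can only be computed here because we pretended to see every counterfactual. A minimal sketch, with illustrative variable names:

```python
from statistics import fmean

# Y^0 (score without laptops) and group t, from the hypothetical full table
y0 = {"School 1": 800, "School 2": 900, "School 3": 1100, "School 4": 1300}
t = {"School 1": 0, "School 2": 0, "School 3": 1, "School 4": 1}

# How the groups would compare if no school received laptops
baseline_treated = fmean(y0[s] for s in y0 if t[s] == 1)  # 1200.0
baseline_control = fmean(y0[s] for s in y0 if t[s] == 0)  # 850.0

# Selection bias: E[Y^0 | t=1] - E[Y^0 | t=0]
selection_bias = baseline_treated - baseline_control
print(selection_bias)  # 350.0
```

With these numbers the treated schools would score 350 points higher even without laptops, which accounts for most of the gap between SDO and ATE.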
  • Since bias exists, SDO can't tell us about causal effects, whether we care about ATE, ATT, or ATU
  • If we could observe every potential outcome, as in the first table, we would see that laptops help the treatment group much more than they would help the control group
  • Specifically, we could see this by computing and comparing ATT and ATU
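  • Using the δ column of the first table, ATT and ATU are the following:
ATT = \frac{100+50}{2} = 75
ATU = \frac{50-30}{2} = 10
  • Notice, ATE is just a weighted average of ATT and ATU, weighted by the share of schools in each group (here, half treated and half control):
ATE = \frac{1}{2}(75) + \frac{1}{2}(10) = 42.5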
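A minimal sketch checking these relationships numerically with the hypothetical first table; variable names are illustrative. Besides the weighted-average identity, it checks a standard algebraic identity of the potential-outcomes framework, SDO = ATE + selection bias + (1 - π)(ATT - ATU), where selection bias is E[Y^0 | t=1] - E[Y^0 | t=0] and π is the share of treated schools. The last term is often called heterogeneous treatment effect bias.

```python
from statistics import fmean

# Full (hypothetical) table: name -> (t, Y^0, Y^1)
table = {
    "School 1": (0, 800, 850),
    "School 2": (0, 900, 870),
    "School 3": (1, 1100, 1150),
    "School 4": (1, 1300, 1400),
}

treated = [row for row in table.values() if row[0] == 1]
control = [row for row in table.values() if row[0] == 0]
pi = len(treated) / len(table)  # share of treated schools (0.5 here)

att = fmean(y1 - y0 for _, y0, y1 in treated)         # 75.0
atu = fmean(y1 - y0 for _, y0, y1 in control)         # 10.0
ate = fmean(y1 - y0 for _, y0, y1 in table.values())  # 42.5

# ATE is a weighted average of ATT and ATU
assert ate == pi * att + (1 - pi) * atu

# SDO uses only observed outcomes: Y^1 for treated, Y^0 for control
sdo = fmean(y1 for _, _, y1 in treated) - fmean(y0 for _, y0, _ in control)  # 425.0
selection_bias = (fmean(y0 for _, y0, _ in treated)
                  - fmean(y0 for _, y0, _ in control))  # 350.0
hte_bias = (1 - pi) * (att - atu)  # 32.5

# SDO decomposes exactly into ATE plus the two bias terms
assert sdo == ate + selection_bias + hte_bias

print(att, atu, ate, sdo, selection_bias, hte_bias)
```

All values here are exact halves, so the equality checks are safe in floating point; with arbitrary data, a tolerance such as `math.isclose` would be the safer choice.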
