Defining Notation for Determining Causality
- As an example, let's say we want to know whether providing schools with laptops causes their students to earn better SAT scores
  - The control group will include schools without laptops
  - The treatment group will include schools with laptops
- Each variable has the following notation:
  - $i$ represents a given school
  - $T_i$ represents the group that school $i$ falls into
    - $T_i = 0$ if the school isn't provided laptops
    - $T_i = 1$ if the school is provided laptops
  - $Y_i$ represents the average SAT score for school $i$
    - $Y_{0i}$ is the average SAT score for school $i$ in the control group
    - $Y_{1i}$ is the average SAT score for school $i$ in the treatment group
Illustrating with SAT Scores
- Hypothetically, imagine we could somehow observe each school in both the treatment group and the control group
- This isn't possible in practice, but our data could look like this:
School 1 | |||||
School 2 | |||||
School 3 | |||||
School 4 |
- In this case, we can calculate the population parameter, the average treatment effect ($ATE$)
- Specifically, it would be the following:
$$ATE = E[Y_{1i} - Y_{0i}]$$
- This indicates providing laptops to schools would increase SAT scores by points on average
- Thus, we see there isn't any real significant difference between the control group and treatment group
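To make the calculation concrete, here is a minimal sketch of the average treatment effect when both potential outcomes are known for every school. The numbers are made up for illustration and are not the values from the table above; the names `y0`, `y1`, and `average_treatment_effect` are hypothetical.

```python
# Hypothetical potential outcomes for four schools (made-up numbers).
# y0[i]: average SAT score of school i without laptops
# y1[i]: average SAT score of school i with laptops
y0 = [900, 1000, 1420, 1400]
y1 = [920, 1000, 1410, 1390]


def average_treatment_effect(y0, y1):
    """Average the per-school effects y1[i] - y0[i] over all schools."""
    effects = [with_laptops - without for without, with_laptops in zip(y0, y1)]
    return sum(effects) / len(effects)


ate = average_treatment_effect(y0, y1)
print(ate)
```

With these illustrative numbers the per-school effects cancel out, so the average effect is zero, which mirrors the "no real difference" conclusion above.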
Illustrating with Test Scores
- In reality, each school can be in only one group, either control or treatment
  - For example, a school can't actually participate in both groups at the same time
  - For school 1, the outcome in the group it wasn't assigned to is a counterfactual
  - For school 4, the outcome in the group it wasn't assigned to is its counterfactual
- As a result, our data actually looks like the following:
School 1 | null | null | |||
School 2 | null | null | |||
School 3 | null | null | |||
School 4 | null | null |
- Notice, in almost all cases we can't calculate the average treatment effect ($ATE$)
- Instead, we must estimate $ATE$ by calculating the difference between the observed group means, $E[Y_i \mid T_i = 1] - E[Y_i \mid T_i = 0]$
- Specifically, this estimate indicates providing laptops to schools would increase SAT scores by 450 points on average
- In this case, making an assumption about causality based on this estimate would be a mistake
- Again, the difference in means is only an estimator, so its gap from the true effect is due to selection bias
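The estimate described above can be sketched as a plain difference in group means over the observed data. The data here is made up for illustration (chosen so the result matches the 450-point figure in the text) and is not the table above; `t`, `y`, and `difference_in_means` are hypothetical names.

```python
from statistics import mean

# Hypothetical observed data: each school appears in exactly one group.
# t[i] = 1 means school i received laptops, t[i] = 0 means it did not.
# y[i] is the observed average SAT score of school i.
t = [0, 0, 1, 1]
y = [900, 1000, 1410, 1390]


def difference_in_means(t, y):
    """Mean observed outcome of treated schools minus that of control schools."""
    treated = [yi for ti, yi in zip(t, y) if ti == 1]
    control = [yi for ti, yi in zip(t, y) if ti == 0]
    return mean(treated) - mean(control)


naive_estimate = difference_in_means(t, y)
print(naive_estimate)
```

The result is 450 points even though nothing in this calculation separates the effect of laptops from pre-existing differences between the two groups of schools.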
Motivating Selection Bias in Causality
- Bias is what separates association from causation
- Selection bias could exist if the treatment group contains schools with more resources
  - For example, maybe the treatment group contains private schools
  - And maybe the control group contains underfunded schools
- Since bias exists, we can't draw any conclusions about causal effects using , , or
- If there wasn't any bias, we could see the treatment group has much better SAT scores compared to the control group
- Specifically, we could see this by computing and comparing and
- We can see and are the following:
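- Notice, the $ATE$ is just a weighted average of the average treatment effect on the treated ($ATT$) and the average treatment effect on the untreated ($ATU$)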
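As a rough sketch of how bias separates association from causation, the snippet below uses the standard identity that the observed difference in means equals the effect on the treated plus a bias term, $E[Y_0 \mid T = 1] - E[Y_0 \mid T = 0]$. It assumes we could see both potential outcomes, which is impossible in practice. All numbers are made up, and the names (`att` for the average effect on treated schools, `atu` for the average effect on untreated schools, `bias`, `naive`) are chosen here for illustration.

```python
from statistics import mean

# Hypothetical potential outcomes (made-up numbers, unobservable in practice).
t = [0, 0, 1, 1]                # 1 = school received laptops
y0 = [900, 1000, 1420, 1400]    # average SAT score without laptops
y1 = [920, 1000, 1410, 1390]    # average SAT score with laptops

treated = [i for i, ti in enumerate(t) if ti == 1]
control = [i for i, ti in enumerate(t) if ti == 0]

# Average effect among treated schools and among untreated schools.
att = mean(y1[i] - y0[i] for i in treated)
atu = mean(y1[i] - y0[i] for i in control)

# Bias: how much the groups would differ even if nobody received laptops.
bias = mean(y0[i] for i in treated) - mean(y0[i] for i in control)

# Observed difference in means: treated schools show y1, control schools show y0.
naive = mean(y1[i] for i in treated) - mean(y0[i] for i in control)

# Average effect over all schools as a weighted average of the two group effects.
share_treated = len(treated) / len(t)
ate = share_treated * att + (1 - share_treated) * atu

print(att, atu, bias, naive, ate)
```

In this made-up data the observed difference is large only because the treated schools already scored higher without laptops; the bias term accounts for more than the entire gap.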