Motivating Matching for Causality
- The goal of any causal analysis is to isolate some causal effect
- To do this, we must satisfy the backdoor criterion in our study
  - Meaning, we must close all open backdoor paths
  - Closing backdoor paths can be achieved by carefully applying conditioning strategies in our study
- Roughly, there are three different types of conditioning strategies:
  - Subclassification
  - Exact matching
  - Approximate matching
Motivating the Conditional Independence Assumption
- The conditional independence assumption (or CIA) states that treatment assignment is independent of the potential outcomes after conditioning on observed covariates
- Sometimes we know that randomization occurred only conditional on some observable characteristics
  - Comparing groups without conditioning on those characteristics would violate the backdoor criterion
- In order to estimate a causal effect when there is a confounder, we must satisfy CIA
  - In DAG notation, this means closing every backdoor path that runs through a confounder
  - Meaning, once CIA holds, there isn't any confounding bias
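
The assumption can be written compactly in potential-outcomes notation. The symbols below are assumed for illustration, since the notes do not define them: $Y^1$ and $Y^0$ are the potential outcomes with and without treatment, $D$ is the treatment indicator, and $X$ is the set of observed covariates.

```latex
(Y^1, Y^0) \perp\!\!\!\perp D \mid X
```

Read aloud: within groups of units that share the same value of $X$, treatment assignment carries no information about the potential outcomes.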
Introducing Matching for Estimating
- Matching is one of three conditioning methods used for satisfying the backdoor criterion
- Matching estimates a causal effect by imputing missing potential outcomes, conditioning on the confounder
- Specifically, we could fill in missing potential outcomes for each treatment unit using a control group unit that was closest to the treatment group unit for some confounder
- This would give us estimates of all the counterfactuals from which we could simply take the average over the differences
- In this way, matching conditions on the confounder so that CIA isn't violated
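
The imputation step described above can be sketched in a few lines of Python. This is a minimal illustration with one covariate, not a full matching estimator: the function names and the toy numbers are hypothetical, and ties are broken by taking the first closest control.

```python
def impute_counterfactuals(treated, controls):
    """Pair each treated unit's outcome with the outcome of its closest control.

    Both arguments are lists of (covariate, outcome) tuples.
    Returns a list of (treated_outcome, imputed_untreated_outcome) pairs.
    """
    pairs = []
    for x_t, y_t in treated:
        # Closest control on the single covariate; min() keeps the first on ties.
        _, y_c = min(controls, key=lambda c: abs(c[0] - x_t))
        pairs.append((y_t, y_c))
    return pairs


def average_difference(pairs):
    """Average of (observed outcome - imputed counterfactual) over treated units."""
    return sum(y_t - y_c for y_t, y_c in pairs) / len(pairs)


# Hypothetical toy data: (covariate, outcome)
treated = [(20, 10.0), (30, 14.0)]
controls = [(19, 8.0), (25, 9.0), (31, 11.0)]

pairs = impute_counterfactuals(treated, controls)
estimate = average_difference(pairs)
print(pairs, estimate)
```

Each treated unit keeps its observed outcome and borrows the outcome of its nearest control as a stand-in for the outcome it would have had without treatment.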
Using Matching instead of Subclassification
- Subclassification takes the difference between treatment and control group averages within each stratum and achieves covariate balance by using the strata's probability weights to weight those differences
- As long as there is enough data for stratifying our covariates, subclassification can be a viable option
- However, if subclassification suffers from the curse of dimensionality (some strata end up without treated or without control units), then we must use other methods (like matching)
- Typically, the curse of dimensionality applies, so we'll prefer other methods like matching
- Specifically, subclassification is a weighting method used on all individuals, regardless of the overlap of distributions
- Whereas, matching is a form of stratification (or sampling method) that attempts to match distributions
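
For contrast, here is a rough sketch of a subclassification estimate. It assumes each stratum is weighted by its share of all units, which is one common choice; the notes do not say which weights are meant. The function name and the toy rows are hypothetical.

```python
from collections import defaultdict


def subclassification_estimate(rows):
    """Weighted average of within-stratum differences in mean outcomes.

    rows: iterable of (stratum, treated_flag, outcome).
    Assumption: each stratum is weighted by its share of all units.
    """
    by_stratum = defaultdict(lambda: {True: [], False: []})
    for stratum, treated, y in rows:
        by_stratum[stratum][bool(treated)].append(y)

    n_total = sum(len(g[True]) + len(g[False]) for g in by_stratum.values())
    estimate = 0.0
    for stratum, g in by_stratum.items():
        if not g[True] or not g[False]:
            # An empty cell is what the curse of dimensionality looks like in practice.
            raise ValueError(f"stratum {stratum!r} lacks treated or control units")
        diff = sum(g[True]) / len(g[True]) - sum(g[False]) / len(g[False])
        weight = (len(g[True]) + len(g[False])) / n_total
        estimate += weight * diff
    return estimate


# Hypothetical toy data: (stratum, treated?, outcome)
rows = [
    ("young", True, 10.0), ("young", False, 8.0), ("young", False, 6.0),
    ("old", True, 20.0), ("old", True, 22.0), ("old", False, 15.0),
]
estimate = subclassification_estimate(rows)
print(estimate)
```

The `ValueError` branch is the practical limit of the method: as more covariates are stratified, more cells lose either their treated or their control units, and the estimate can no longer be computed.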
Illustrating Exact Matching
- Suppose we have the following data:
  - Where earnings is our outcome
  - And age is a confounder
- And, an observation is either a trainee or a non-trainee
  - Which represents our treatment variable
| Trainees | | | Non-Trainees | | | Matched Sample | | |
|---|---|---|---|---|---|---|---|---|
| Unit | Age | Earnings | Unit | Age | Earnings | Unit | Age | Earnings |
| 1 | 18 | 9500 | 1 | 20 | 8500 | 14 | 18 | 8050 |
| 2 | 29 | 12250 | 2 | 27 | 10075 | 6 | 29 | 10525 |
| 3 | 24 | 11000 | 3 | 21 | 8725 | 9 | 24 | 9400 |
| 4 | 27 | 11750 | 4 | 39 | 12775 | 2 | 27 | 10075 |
| 5 | 33 | 13250 | 5 | 38 | 12550 | 11 | 33 | 11425 |
| 6 | 22 | 10500 | 6 | 29 | 10525 | 13 | 22 | 8950 |
| 7 | 19 | 9750 | 7 | 39 | 12775 | 17 | 19 | 8275 |
| 8 | 20 | 10000 | 8 | 33 | 11425 | 1 | 20 | 8500 |
| 9 | 21 | 10250 | 9 | 24 | 9400 | 3 | 21 | 8725 |
| 10 | 30 | 12500 | 10 | 30 | 10750 | avg(10,18) | 30 | 9875 |
| | | | 11 | 33 | 11425 | | | |
| | | | 12 | 36 | 12100 | | | |
| | | | 13 | 22 | 8950 | | | |
| | | | 14 | 18 | 8050 | | | |
| | | | 15 | 43 | 13675 | | | |
| | | | 16 | 39 | 12775 | | | |
| | | | 17 | 19 | 8275 | | | |
| | | | 18 | 30 | 9000 | | | |
| | | | 19 | 51 | 15475 | | | |
| | | | 20 | 48 | 14800 | | | |
| Mean | 24.3 | $11075 | | 31.95 | $11101 | | 24.3 | $9380 |
- Notice, the treatment and control groups have different age distributions
  - So, we create a third group by sampling from the non-trainees group to match the age distribution of the trainees group
  - By imputing missing counterfactuals, we satisfy the CIA (which would have been violated otherwise)
  - Now, estimating the average treatment effect on the treated (ATT) on this matched sample provides a better estimate, averaging over the $N_T$ treated units (those with $D_i = 1$):

$$
\hat{\delta}_{ATT} = \frac{1}{N_T} \sum_{D_i = 1} \left( Y_i - Y_{j(i)} \right)
$$

- And, $Y_{j(i)}$ refers to the outcome of the unit $j$ matched to the unit $i$ based on $j$ being closest to unit $i$ for some covariate $X$
  - Here, $i$ refers to an index in the treatment group
  - Whereas, $j$ refers to an index in the control group
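
As a check on the table's arithmetic, the matched-sample earnings and the resulting difference in means can be recomputed with a short script. The data are copied from the table; the variable names are illustrative, and averaging all same-age non-trainees is an assumption that mirrors the table's `avg(10,18)` row.

```python
from collections import defaultdict

# (age, earnings) pairs copied from the table, in unit order.
trainees = [(18, 9500), (29, 12250), (24, 11000), (27, 11750), (33, 13250),
            (22, 10500), (19, 9750), (20, 10000), (21, 10250), (30, 12500)]
non_trainees = [(20, 8500), (27, 10075), (21, 8725), (39, 12775), (38, 12550),
                (29, 10525), (39, 12775), (33, 11425), (24, 9400), (30, 10750),
                (33, 11425), (36, 12100), (22, 8950), (18, 8050), (43, 13675),
                (39, 12775), (19, 8275), (30, 9000), (51, 15475), (48, 14800)]

# Group non-trainee earnings by age so exact matches are a dictionary lookup.
controls_by_age = defaultdict(list)
for age, earnings in non_trainees:
    controls_by_age[age].append(earnings)

# For each trainee, impute the untreated outcome from non-trainees of the same age.
matched = []
for age, _ in trainees:
    same_age = controls_by_age.get(age)
    if not same_age:
        raise ValueError(f"no exact match for age {age}")
    matched.append(sum(same_age) / len(same_age))  # average when several match

treated_mean = sum(e for _, e in trainees) / len(trainees)
control_mean = sum(e for _, e in non_trainees) / len(non_trainees)
matched_mean = sum(matched) / len(matched)

naive_difference = treated_mean - control_mean   # ignores the age imbalance
att_estimate = treated_mean - matched_mean       # compares like with like on age

print(treated_mean, control_mean, matched_mean)
print(naive_difference, att_estimate)
```

The naive comparison of group means is slightly negative because the non-trainees are older on average, while the comparison against the age-matched sample is positive.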
Motivating Approximate Matching
- Exact matching works well if we can find a unit in the other group with exactly the covariate value we're looking for
- Otherwise, we'll need to use approximate matching