Data Science

The goal of any causal analysis is to isolate some causal effect
To do this, we must satisfy the backdoor criterion in our study
- Meaning, we must close all open backdoor paths
Closing backdoor paths can be achieved through carefully performing conditioning strategies in our study
Roughly, there are three different types of conditioning strategies:
- Subclassification
- Exact matching
- Approximate matching

The conditional independence assumption (or CIA) states that treatment assignment is independent of the potential outcomes after conditioning on observed covariates
Sometimes we know that randomization occurred only conditional on some observable characteristics
- Without conditioning on those characteristics, a backdoor path stays open and the backdoor criterion is not satisfied
In order to estimate a causal effect when there is a confounder, we must satisfy CIA
- In DAG notation, this means closing every backdoor path that runs through a confounder
- Meaning, once CIA holds, there isn't any remaining confounding bias

(Y^{1}, Y^{0}) \perp T | X
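
One consequence worth spelling out (a standard implication of the assumption, written in the same notation): within each value of $X$, the expected potential outcomes are the same for treated and control units

E[Y^{1} | T=1, X] = E[Y^{1} | T=0, X]
E[Y^{0} | T=1, X] = E[Y^{0} | T=0, X]

So, within a value of $X$, the observed difference in means recovers the causal effect for that value of $X$

E[Y | T=1, X] - E[Y | T=0, X] = E[Y^{1} - Y^{0} | X]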

Matching is one of the conditioning methods used to satisfy the backdoor criterion
Matching estimates $ATE$ by imputing missing potential outcomes, conditioning on the confounders
Specifically, we could fill in the missing potential outcome for each treatment group unit using the control group unit that is closest to it on some confounder $X$
This would give us estimates of all the counterfactuals from which we could simply take the average over the differences
Specifically, matching on $X$ is how we carry out the conditioning that the CIA requires
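
The imputation step described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `impute_from_nearest_control` and the toy numbers are hypothetical, there is a single confounder, and ties go to the first control unit in the list. Because it imputes only for the treated units, the average it returns is taken over treated units alone.

```python
def impute_from_nearest_control(treated, controls):
    """treated, controls: lists of (x, y) pairs, x = confounder, y = outcome.

    For each treated unit, borrow the outcome of the control unit whose x is
    closest, then average the treated-minus-imputed differences.
    """
    imputed = []
    for x_t, _ in treated:
        # Control unit with the smallest covariate distance (first one wins ties).
        _, y_match = min(controls, key=lambda c: abs(c[0] - x_t))
        imputed.append(y_match)
    gaps = [y_t - y0 for (_, y_t), y0 in zip(treated, imputed)]
    return imputed, sum(gaps) / len(gaps)


# Hypothetical toy data: (x, y) pairs.
treated = [(20, 12.0), (30, 15.0), (40, 20.0)]
controls = [(19, 10.0), (31, 13.0), (45, 17.0), (60, 25.0)]

imputed, effect = impute_from_nearest_control(treated, controls)
print(imputed)  # imputed control outcomes for the three treated units
print(effect)   # average of the treated-minus-imputed differences
```

Note that the control unit with $X = 60$ is never used, since no treated unit is close to it.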

Subclassification takes the difference in mean outcomes between treatment and control group units within each of $K$ strata, and achieves covariate balance by using the $K$ strata probability weights to weight those differences
As long as there is enough data for stratifying our covariates, subclassification can be a viable option
However, if subclassification suffers from the curse of dimensionality, then we must use other methods (like matching)
Typically, the curse of dimensionality does arise, so we'll prefer other methods like matching
Specifically, subclassification is a weighting method used on all individuals, regardless of the overlap of distributions
By contrast, matching is a form of stratification (or sampling method) that attempts to match the covariate distributions of the two groups
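
A rough sketch of subclassification, assuming the strata are already defined and each stratum is weighted by its share of all units (one common choice for the $K$ weights when targeting $ATE$). The function name `subclassification_ate` and the data are hypothetical. The error raised for an empty cell is where the curse of dimensionality shows up in practice: with many covariates, some strata contain no treated or no control units.

```python
from collections import defaultdict


def subclassification_ate(rows):
    """rows: list of (stratum, t, y) with t in {0, 1} and y the outcome."""
    by_stratum = defaultdict(lambda: {0: [], 1: []})
    for stratum, t, y in rows:
        by_stratum[stratum][t].append(y)

    n_total = len(rows)
    estimate = 0.0
    for stratum, groups in by_stratum.items():
        if not groups[0] or not groups[1]:
            # No within-stratum comparison is possible for this cell.
            raise ValueError(f"stratum {stratum!r} lacks treated or control units")
        diff = sum(groups[1]) / len(groups[1]) - sum(groups[0]) / len(groups[0])
        weight = (len(groups[0]) + len(groups[1])) / n_total  # stratum share of the sample
        estimate += weight * diff
    return estimate


# Hypothetical toy data: two age strata.
rows = [
    ("young", 1, 10.0), ("young", 1, 12.0), ("young", 0, 8.0),
    ("old", 1, 22.0), ("old", 0, 15.0), ("old", 0, 17.0), ("old", 0, 19.0),
]
estimate = subclassification_ate(rows)
print(estimate)
```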

Suppose we have the following data:
- Where, earnings is our outcome $Y$
- And, age is a confounder $X$
- And, an observation is either a trainee or a non-trainee
  - Which represents our treatment variable $T$

Notice, the treatment and control groups have different age distributions
- So, we create a third group by sampling from the non-trainee group to match the age distribution of the trainee group
- By imputing missing counterfactuals from age-matched non-trainees, we condition on age as the CIA requires (the unmatched comparison would have been confounded by age)
Now, estimating $ATE$ on this matched sample provides a better estimate:

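\hat{\delta}_{ATE} = \frac{1}{N} \sum_{i=1}^{N} (2T_{i}-1)\left(Y_{i} - \frac{1}{M} \sum_{m=1}^{M} Y_{j_{m}(i)}\right)

And, $Y_{j_{m}(i)}$ refers to the outcome of the $m^{th}$ closest unit to the $i^{th}$ unit on some $X$ covariate, taken from the opposite treatment group
- Here, $i$ indexes every unit in the sample, whether treated or control
- Whereas, $j_{m}(i)$ indexes a matched unit from the other group (a control unit when $i$ is treated, a treated unit when $i$ is a control)
- The $(2T_{i}-1)$ term is $+1$ for treated units and $-1$ for control units, so every difference is oriented as treated minus control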
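
The estimator above can be sketched directly, assuming a single covariate, absolute distance, and matching with replacement. The function name `matching_ate` and the data are hypothetical. Each unit is compared with the average outcome of its `m` closest units from the opposite group, and the sign term built from the 0/1 treatment indicator orients each difference as treated minus control.

```python
def matching_ate(units, m=1):
    """units: list of (t, x, y) with t in {0, 1}, x = covariate, y = outcome."""
    total = 0.0
    for t_i, x_i, y_i in units:
        # Candidate matches come from the opposite treatment group.
        opposite = [(abs(x_j - x_i), y_j) for t_j, x_j, y_j in units if t_j != t_i]
        if len(opposite) < m:
            raise ValueError("not enough units in the opposite group to form m matches")
        opposite.sort(key=lambda pair: pair[0])  # closest first; stable sort keeps ties in input order
        y_matched = sum(y for _, y in opposite[:m]) / m
        total += (2 * t_i - 1) * (y_i - y_matched)
    return total / len(units)


# Hypothetical toy data: (t, x, y) triples.
units = [
    (1, 20, 12.0), (1, 30, 15.0),
    (0, 19, 10.0), (0, 31, 13.0), (0, 45, 17.0),
]
ate_one_match = matching_ate(units, m=1)
ate_two_matches = matching_ate(units, m=2)
print(ate_one_match, ate_two_matches)
```

Raising `m` averages over more matches per unit, which smooths the imputed outcome but pulls in units that are farther away on $X$.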

Exact matching works well if we can find another unit with that exact same value we're looking for in the other group
Otherwise, we'll need to use approximate matching
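
A small sketch of why exact matching can fall short, assuming one discrete covariate. The function name `exact_match` and the data are hypothetical. Treated units whose exact $X$ value never appears among the controls are returned separately, and these are the units that would need approximate matching.

```python
def exact_match(treated, controls):
    """treated, controls: lists of (x, y) pairs.

    Returns (matched, unmatched): matched holds (x, y_treated, mean control y
    at that exact x); unmatched holds treated units with no exact control match.
    """
    controls_by_x = {}
    for x, y in controls:
        controls_by_x.setdefault(x, []).append(y)

    matched, unmatched = [], []
    for x, y in treated:
        if x in controls_by_x:
            ys = controls_by_x[x]
            matched.append((x, y, sum(ys) / len(ys)))
        else:
            unmatched.append((x, y))
    return matched, unmatched


# Hypothetical toy data: (x, y) pairs.
treated = [(20, 12.0), (30, 15.0), (41, 20.0)]
controls = [(20, 10.0), (20, 11.0), (30, 13.0), (45, 17.0)]

matched, unmatched = exact_match(treated, controls)
print(matched)
print(unmatched)  # treated units with no control at the same x
```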

Subclassification

Approximate Matching

Exact Matching