Defining Causal Inference
- Causal inference involves estimating the impact of events on a given outcome of interest
- It involves determining the independent, actual effect of a phenomenon that is part of a larger system
- Causal inference examines how an effect variable responds when one of its causes is changed
Correlation is not Causation
- When the rooster crows, the sun rises soon afterwards
- We know the rooster didn’t cause the sun to rise
- If the rooster had been eaten by a cat, the sun still would have risen
- In other words, a rooster's crow is correlated with the sun rising
- But, a rooster's crow doesn't cause the sun to rise
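The rooster example can be sketched as a small simulation. This is a minimal illustration under an assumed toy model, not a real dataset: a hypothetical `dawn` variable (first light, in minutes after midnight) drives both the crow and the sunrise, and all names and numbers are made up for the sketch.

```python
import math
import random
import statistics

random.seed(0)

# Assumed toy model: first light (dawn) drives both the crow and the sunrise.
dawn = [random.gauss(360, 20) for _ in range(1000)]  # minutes after midnight


def crow_times(dawn):
    # The rooster reacts to first light, with some noise.
    return [d + random.gauss(10, 3) for d in dawn]


def sunrise_times(dawn, rooster_alive):
    # rooster_alive is deliberately unused: sunrise depends on dawn alone.
    return [d + 30 for d in dawn]


def pearson(xs, ys):
    # Plain Pearson correlation coefficient.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


crow = crow_times(dawn)
sunrise = sunrise_times(dawn, rooster_alive=True)

# Observational view: crow time and sunrise time move together.
corr = pearson(crow, sunrise)

# Interventional view: remove the rooster and look at the sunrise again.
sunrise_without_rooster = sunrise_times(dawn, rooster_alive=False)

print(f"correlation(crow, sunrise) = {corr:.2f}")
print("sunrise unchanged without rooster:", sunrise == sunrise_without_rooster)
```

The correlation is high because both series share a common cause, yet removing the rooster leaves every sunrise time exactly as it was.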
Comparing Prediction with Causation
- Prediction works well within strict boundaries, where new data resembles the data used to train the model
- So, it can be useful for translating from English to Portuguese
- It can be useful for recognizing faces
- It can be useful for classifying sentiments
- Determining causality requires us to answer what-if questions
- For example, what if we change the price of our product?
- What if I follow a low-fat diet instead of a low-sugar diet?
- Or, how does revenue change if we make modifications to our customer line?
Describing Potential Outcomes
- Theoretically, each observation has two potential outcomes
- In reality, each observation only has one actual outcome
- In other words, actual outcomes are realized outcomes
- Whereas, potential outcomes are hypothetical random variables
- In other words, they are usually estimated using predictions
- Potential outcomes are defined by $Y_{ti}$
- Note, each observation has two separate potential outcomes
- And, $t$ refers to an actual, known control group or treatment group
- Whereas, $Y$ refers to an unknown, potential outcome for each group
Defining Notation for Causal Data
- Causal data usually follows a specific notation
- $i$ represents a given observation
- $T_i$ represents the actual, known group that observation $i$ belongs to
- $T_i = 0$ when the customer is in the control group
- $T_i = 1$ when the customer is in the treatment group
- $Y_i$ represents the unknown variable being measured in the study
- $Y_{0i}$ represents $Y_i$ of only customers in the control group
- $Y_{1i}$ represents $Y_i$ of only customers in the treatment group
- $Y_{1i} - Y_{0i}$ represents the unit-specific treatment effect
- This can be used to measure a causal effect
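The notation can be made concrete with a small simulated table. This is a sketch on made-up data, and the dictionary keys are assumptions chosen for illustration: `T` is the group indicator (0 = control, 1 = treatment), `Y0` and `Y1` are the two potential outcomes, `Y` is the observed outcome, and `effect` is the unit-specific treatment effect. Only a simulation can show both potential outcomes for the same unit.

```python
import random

random.seed(1)

# Made-up data: each unit i gets both potential outcomes and a group.
units = []
for i in range(6):
    y0 = round(random.gauss(100, 10), 1)      # outcome without treatment
    y1 = round(y0 + random.gauss(5, 2), 1)    # outcome with treatment
    t = random.randint(0, 1)                  # 0 = control, 1 = treatment
    units.append({"i": i, "T": t, "Y0": y0, "Y1": y1})

for u in units:
    # Observed outcome: Y = T * Y1 + (1 - T) * Y0
    u["Y"] = u["T"] * u["Y1"] + (1 - u["T"]) * u["Y0"]
    # Unit-specific treatment effect, knowable only inside a simulation
    u["effect"] = round(u["Y1"] - u["Y0"], 1)
    # What an analyst would actually see: one potential outcome is missing
    u["Y0_seen"] = u["Y0"] if u["T"] == 0 else None
    u["Y1_seen"] = u["Y1"] if u["T"] == 1 else None

for u in units:
    print(u["i"], u["T"], u["Y0_seen"], u["Y1_seen"], u["Y"])
```

Each printed row has exactly one potential outcome filled in, which is why the `effect` column could never be computed from real observations alone.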
Describing the Struggle with Causality
- The fundamental problem with causal inference is that we can never observe the same unit with and without treatment
- In other words, we want to compute $Y_{1i} - Y_{0i}$
- But, we can only observe either $Y_{0i}$ or $Y_{1i}$
- Thus, it's impossible to be certain about causal effects
- As a result, we can't know individual treatment effects
- Instead, we estimate the average treatment effect across the overall group
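Estimating a group-level average can be sketched as a difference in mean outcomes between the treatment and control groups. This is a minimal sketch on simulated data, and it assumes units are assigned to groups at random, an assumption of the example that lets the simple difference stand in for the average effect. The value `true_effect` and all other numbers are made up for illustration.

```python
import random
import statistics

random.seed(42)

n = 10_000
true_effect = 5.0  # assumed average effect, chosen for illustration

observed = []
for _ in range(n):
    y0 = random.gauss(100, 10)               # outcome without treatment
    y1 = y0 + random.gauss(true_effect, 2)   # outcome with treatment
    t = random.random() < 0.5                # random assignment to a group
    # Only one potential outcome is ever recorded per unit.
    observed.append((t, y1 if t else y0))

treated = [y for t, y in observed if t]
control = [y for t, y in observed if not t]

# Difference in group means as an estimate of the average treatment effect
ate_estimate = statistics.mean(treated) - statistics.mean(control)

print(f"estimated average effect: {ate_estimate:.2f} (simulated truth: {true_effect})")
```

No individual effect is ever computed here. The estimate relies only on each unit's single observed outcome and its group.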