What is Causality?

Defining Causal Inference

  • Causal inference involves estimating the impact of events on a given outcome of interest
  • It involves determining the independent, actual effect of a phenomenon that is part of a larger system
  • Causal inference attempts to observe the response of an effect variable when a cause of the effect variable is changed

Correlation is not Causation

  • When the rooster crows, the sun rises soon afterwards
  • We know the rooster didn’t cause the sun to rise
  • If the rooster had been eaten by a cat, the sun still would have risen
  • In other words, a rooster's crow is correlated with the sun rising
  • But, a rooster's crow doesn't cause the sun to rise
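The rooster example can be made concrete with a small toy simulation. Everything in it is an invented assumption: the sun always rises at 06:00, and the rooster crows in the hour before sunrise 90% of the time and 1% of the time in any other hour. `simulate` and `p_sunrise_given` are hypothetical helpers. When we only observe, a crow strongly predicts a sunrise. When we intervene by removing the rooster, the number of sunrises does not change.

```python
import random

SUNRISE_HOUR = 6  # toy assumption: the sun rises at 06:00 every day


def simulate(n_days, rooster_alive=True, seed=0):
    """Return one (crowed, sun_rises_next_hour) pair per simulated hour.

    Toy assumption: approaching dawn makes the rooster crow in the hour before
    sunrise 90% of the time, and 1% of the time in any other hour. Sunrise
    never depends on the rooster.
    """
    rng = random.Random(seed)
    hours = []
    for _ in range(n_days):
        for hour in range(24):
            p_crow = 0.9 if hour == SUNRISE_HOUR - 1 else 0.01
            # A rooster that has been eaten never crows.
            crowed = rooster_alive and rng.random() < p_crow
            sun_rises_next_hour = hour + 1 == SUNRISE_HOUR
            hours.append((crowed, sun_rises_next_hour))
    return hours


def p_sunrise_given(hours, crowed):
    """Empirical P(sun rises next hour | rooster crowed == crowed)."""
    matching = [rises for c, rises in hours if c == crowed]
    return sum(matching) / len(matching)


observed = simulate(1000)
print(f"P(sunrise | crow)    = {p_sunrise_given(observed, True):.2f}")
print(f"P(sunrise | no crow) = {p_sunrise_given(observed, False):.4f}")

# Intervention: the cat eats the rooster, so it never crows.
eaten = simulate(1000, rooster_alive=False)
print("sunrises with rooster:   ", sum(r for _, r in observed))  # 1000
print("sunrises without rooster:", sum(r for _, r in eaten))     # 1000
```

The conditional probabilities show a strong correlation, but the intervention shows that the crow has no effect on sunrise.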

Comparing Prediction with Causation

  • Making predictions works within strict boundaries, where the data the model sees after training doesn't differ much from the data used to train it

    • So, it can be useful for translating from English to Portuguese
    • It can be useful for recognizing faces
    • It can be useful for classifying sentiments
  • Determining causality requires us to answer what-if questions

    • For example, what if we change the price of our product?
    • What if I do a low fat diet, instead of a low sugar diet?
    • Or, how does revenue change if we make modifications to our customer line?

Describing Potential Outcomes

  • Theoretically, each observation has two potential outcomes
  • In reality, each observation only has one actual outcome
  • In other words, actual outcomes are realized outcomes
  • Whereas, potential outcomes are hypothetical random variables

    • In other words, they are usually estimated using predictions
  • Potential outcomes are denoted by $Y^{t}$
  • Note, each $i^{th}$ observation has two separate potential outcomes, $Y_{i}^{0}$ and $Y_{i}^{1}$
  • And, $t$ refers to an actual, known control group or treatment group
  • Whereas, $Y$ refers to an unknown, potential outcome for each group
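The idea that each observation has two potential outcomes, only one of which is realized, can be sketched in Python. The customers and dollar amounts below are invented for illustration. The gap-filling rule (replace each missing potential outcome with the mean observed outcome of the opposite group) is a deliberately crude, hypothetical stand-in for a real predictive model, not a recommended method.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Unit:
    """One observation i with its group t and its two potential outcomes.

    In real data only one of y0 / y1 is known; the other is None.
    """
    t: int               # 0 = control, 1 = treatment
    y0: Optional[float]  # Y^0: outcome if untreated
    y1: Optional[float]  # Y^1: outcome if treated


def observe(t, y0, y1):
    """Hide the potential outcome that was not realized."""
    return Unit(t, y0 if t == 0 else None, y1 if t == 1 else None)


# Hypothetical spend per customer: (t, Y^0, Y^1). Values are invented.
data = [observe(0, 10.0, 12.0), observe(0, 8.0, 9.0),
        observe(1, 11.0, 14.0), observe(1, 7.0, 10.0)]

# Crude prediction (assumption): fill each missing potential outcome with
# the mean observed outcome of the opposite group.
mean_y0 = mean(u.y0 for u in data if u.t == 0)  # 9.0
mean_y1 = mean(u.y1 for u in data if u.t == 1)  # 12.0
filled = [Unit(u.t,
               u.y0 if u.y0 is not None else mean_y0,
               u.y1 if u.y1 is not None else mean_y1)
          for u in data]

for u in filled:
    print(u)
```

The filled-in values are predictions, not observations; a poor predictor gives poor estimates of the unobserved potential outcomes.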

Defining Notation for Causal Data

  • Causal data usually follows a specific type of notation
  • $i$ represents a given observation
  • $t$ represents the actual, known group that observation $i$ belongs to

    • $t=0$ when the $i^{th}$ customer is in the control group
    • $t=1$ when the $i^{th}$ customer is in the treatment group
  • $Y$ represents the outcome variable being measured in the study

    • $Y^{0}$ represents $Y$ under control, which is observed only for customers in the control group
    • $Y^{1}$ represents $Y$ under treatment, which is observed only for customers in the treatment group
  • $\delta$ represents the unit-specific treatment effect

    • It measures the causal effect of the treatment on a single observation $i$

The observed outcome and the unit-specific treatment effect are:

$$Y_{i} = t_{i} Y_{i}^{1} + (1 - t_{i}) Y_{i}^{0}$$

$$\delta_{i} = Y_{i}^{1} - Y_{i}^{0}$$
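The switching equation for the observed outcome, $Y_{i} = t_{i} Y_{i}^{1} + (1 - t_{i}) Y_{i}^{0}$, and the unit effect $\delta_{i} = Y_{i}^{1} - Y_{i}^{0}$ can be sketched as plain functions. Because $t_{i}$ is either 0 or 1, exactly one term of the switching equation survives. The helper names `observed_outcome` and `unit_effect` are hypothetical, and the example customer's potential outcomes are invented, since in real data we never know both.

```python
def observed_outcome(t, y1, y0):
    """Switching equation: Y_i = t_i * Y_i^1 + (1 - t_i) * Y_i^0."""
    return t * y1 + (1 - t) * y0


def unit_effect(y1, y0):
    """Unit-specific treatment effect: delta_i = Y_i^1 - Y_i^0."""
    return y1 - y0


# Hypothetical customer with both potential outcomes known
# (only possible in a simulation).
y1, y0 = 14.0, 11.0
print(observed_outcome(1, y1, y0))  # 14.0: treated, so Y^1 is realized
print(observed_outcome(0, y1, y0))  # 11.0: untreated, so Y^0 is realized
print(unit_effect(y1, y0))          # 3.0
```

Note that `unit_effect` needs both `y1` and `y0`, while any real observation supplies only the output of `observed_outcome`.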

Describing the Struggle with Causality

  • The fundamental problem with causal inference is that we can never observe the same unit with and without treatment

    • In other words, we want to compute $\delta_{i}$
    • But, we can only observe either $Y_{i}^{1}$ or $Y_{i}^{0}$, never both
    • Thus, it's impossible to be certain about causal effects for an individual unit
  • As a result, we can't know individual treatment effects $\delta_{i}$
  • Instead, we estimate the average treatment effect across the whole group
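The averaging idea can be illustrated with a simulation in which, unlike real data, both potential outcomes are known for every unit. This sketch assumes random assignment to treatment, which is what lets the difference between the observed group means estimate the average of the $\delta_{i}$. The distributions and the helper `simulate_ate` are invented for illustration.

```python
import random
from statistics import mean


def simulate_ate(n, seed=0):
    """Compare the true average effect mean(delta_i) with a difference-in-means
    estimate that uses only the observed outcomes.

    Toy assumptions: Y^0 ~ Normal(50, 10) and each unit's effect
    delta_i ~ Normal(5, 2), so the true average effect is close to 5.
    """
    rng = random.Random(seed)
    y0 = [rng.gauss(50, 10) for _ in range(n)]
    y1 = [a + rng.gauss(5, 2) for a in y0]
    # Uses both potential outcomes per unit: impossible with real data.
    true_ate = mean(b - a for a, b in zip(y0, y1))

    # Random assignment: each unit is treated with probability 0.5.
    t = [rng.random() < 0.5 for _ in range(n)]
    treated = [b for b, ti in zip(y1, t) if ti]      # only Y^1 seen for treated
    control = [a for a, ti in zip(y0, t) if not ti]  # only Y^0 seen for control
    estimate = mean(treated) - mean(control)
    return true_ate, estimate


true_ate, estimate = simulate_ate(10_000)
print(f"true ATE = {true_ate:.2f}, difference-in-means estimate = {estimate:.2f}")
```

The estimate lands close to the true average even though no individual $\delta_{i}$ is ever observed. If assignment were not random (for example, if only high spenders were treated), the same difference in means would generally be biased.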
