Data Science

Correlation is defined as the association (or dependence) between two variables (typically referring to a linear relationship)
Causation is defined as the cause of one variable results in an effect in the other variable
Correlation is not causation
Not even statistical dependence is causation
At best, both of these statements tell us that there's an association between two variable, but not necessarily causation

Confounding variables
- This happens when a third variable causes our response and predictor variable
- For example, ice cream sales and homicide rates are correlated, but the relationship is not causal (i.e. ice cream sales and homicides are most likely caused by weather)
Reversed causality
- This happens when our predictor variable actually causes our response variable
- An example of this is saying my bad mood causes it to rain, rather than rain causes me to be in a bad mood
Bidirectional causality
- Causality is not necessarily one-way
- An example of this is preserving grasslands causes there to be more elephants
- Also, having more elephants causes better preservation of grasslands
  - This is because elephants feed the grass with manure and play a role in the ecosystem such that more elephants create more grass and vice versa
Coincidental causality
- This happens when our predictor variable and response variable aren't related at all, but correlate by chance
- An example of this relates to alternating bald-hairy Russian leaders
- Specifically, a bald state leader of Russia has succeeded a non-bald (i.e. hairy) leader, and vice versa, for nearly 200 years

Determining correlation between two variable involves simpler statistical testing, such as hypothesis testing
Determining causation between two variables involves complicated statistical testing in a completely controlled environment using AB testing

Hypothesis Testing

Multicollinearity