Illustrating Causality using DAGs
- A DAG is a theoretical representation of the knowledge about a studied phenomenon
- In reality, causality can run in multiple directions
- However, in DAG notation, causality runs in only one direction
- Specifically, it only runs forward in time
- Meaning, there are no cycles in a DAG
- DAGs are useful for explaining causality in terms of counterfactuals
- A causal effect is defined as a comparison between two states:
- One state that actually happened when some intervention took on some value
- And another state that didn’t happen (i.e. the counterfactual) under some other intervention
- We can think of a DAG as a graphical representation of a chain of causal effects
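To make the counterfactual comparison concrete, here is a minimal NumPy sketch with made-up numbers: each unit has two potential outcomes, we only ever observe one of them, and under random assignment a group comparison recovers the average causal effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Hypothetical potential outcomes: y0 is the state without the intervention,
# y1 is the state under the intervention (true effect of +2, chosen arbitrarily).
y0 = rng.normal(10, 1, n)
y1 = y0 + 2

# Each unit is observed in only one state; the other is the counterfactual.
d = rng.integers(0, 2, n)          # random intervention assignment
y_obs = np.where(d == 1, y1, y0)   # the state that actually happened

# The individual effect y1 - y0 is never observed directly, but random
# assignment lets a simple group comparison recover its average.
ate = y_obs[d == 1].mean() - y_obs[d == 0].mean()
print(f"true effect: 2.00, estimated: {ate:.2f}")
```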
Defining Notation for DAGs
- DAGs consist of a combination of nodes and arrows
- Nodes represent random variables
- These random variables are created by a data-generating process
- Arrows represent a causal effect between two random variables
- The direction of the arrow captures the direction of causality
- Causal effects can happen in two related ways (see the sketch below):
- Direct relationship: one variable affects another through a single arrow (e.g. D → Y)
- Indirect relationship: one variable affects another through one or more mediating variables (e.g. D → M → Y)
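Here is a small sketch of both kinds of relationship using networkx (the node names D, M, and Y are illustrative, not from the notes): every directed path from D to Y is a causal path, and its length tells us whether the effect is direct or mediated.

```python
import networkx as nx

# A hypothetical DAG: D -> Y is a direct relationship,
# D -> M -> Y is an indirect relationship through the mediator M.
g = nx.DiGraph([("D", "Y"), ("D", "M"), ("M", "Y")])

for path in nx.all_simple_paths(g, "D", "Y"):
    kind = "direct" if len(path) == 2 else "indirect"
    print(kind, " -> ".join(path))
```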
Defining Bi-Directional and Circular Paths
- A circular path can't be represented using DAGs
- However, circular paths appear all the time in reality
- For example, there is a causal effect of learning resources on IQ
- Also, there is a causal effect of IQ on income
- And, there is a causal effect of income on learning resources
- Similarly, bi-directional paths can't be represented using DAGs
- Bi-directional paths are very similar to circular relationships
- For example, there is a causal effect of chances of getting an interview on amount of experience
- Also, there is a causal effect of amount of experience on chances of getting an interview
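One way to see that these relationships break the DAG rules is to build the circular example in networkx and run its acyclicity check, a rough sketch:

```python
import networkx as nx

# The circular example: learning resources -> IQ -> income -> learning resources
g = nx.DiGraph([
    ("learning resources", "IQ"),
    ("IQ", "income"),
    ("income", "learning resources"),
])

print(nx.is_directed_acyclic_graph(g))  # False: the cycle disqualifies it as a DAG
print(nx.find_cycle(g))                 # the offending chain of edges
```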
Illustrating a Simple DAG
- The below DAG has three random variables: D, X, and Y
- Here, there is a direct path D → Y
- This represents the causal effect of D on Y
- Also, there is a backdoor path D ← X → Y
- This isn't a causal effect
- This path creates spurious correlations between D and Y
- Open backdoor paths are a common source of bias
- In this example, X is known as a confounder
- This is because X jointly determines D and Y
- So, X confounds our ability to determine the effect of D on Y in naive comparisons (see the simulation below)
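A minimal simulation of this DAG (unit effect sizes are assumed for illustration, and D has no true effect on Y) shows the spurious correlation created by the open backdoor path, and how it disappears once we condition on X:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# D <- X -> Y, with no arrow from D to Y: X causes both,
# so D and Y are correlated even though neither affects the other.
x = rng.normal(size=n)
d = x + rng.normal(size=n)  # X -> D
y = x + rng.normal(size=n)  # X -> Y

print(f"naive corr(D, Y): {np.corrcoef(d, y)[0, 1]:.2f}")  # ~0.5, spurious

# Conditioning on X (here crudely: holding X roughly fixed)
# closes the backdoor path and removes the association.
held = np.abs(x) < 0.1
print(f"corr(D, Y | X near 0): {np.corrcoef(d[held], y[held])[0, 1]:.2f}")  # ~0
```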
Defining a Causal Path
- The below DAG illustrates a causal path
- Here, we see a treatment linked to an outcome
- Notice, there is a mediating variable between them
- Mediating variables are those which form part of a causal pathway
- For example, sexual promiscuity is a risk factor for HPV
- Also, HPV is a risk factor for cervical cancer
- So in this case, HPV is a mediating variable
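A quick sketch of such a chain with made-up coefficients, written generically as D → M → Y: the total effect of the treatment on the outcome is the product of the effects along the causal pathway.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# A causal chain D -> M -> Y: the effect of D on Y flows entirely through M.
d = rng.normal(size=n)
m = 0.8 * d + rng.normal(size=n)  # D -> M
y = 0.5 * m + rng.normal(size=n)  # M -> Y

# Total effect = product of path coefficients: 0.8 * 0.5 = 0.4
slope = np.polyfit(d, y, 1)[0]
print(f"expected 0.40, estimated {slope:.2f}")
```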
Defining a Confounding Path
- The confounder is a parent of both treatment and outcome
- Here, X is known as the confounder
- For example, there is a causal effect of weather on ice cream sales
- Also, there is a causal effect of weather on sunburns
- As a result, there are spurious correlations created between ice cream sales and sunburns
Defining a Colliding Path
- A collider is a child of both treatment and outcome
- Here, X is known as the collider
- For example, there is a causal effect of pneumonia on hospital admittance
- Also, there is a causal effect of a stroke on hospital admittance
- A collider closes a backdoor path as long as it's excluded from (i.e. not conditioned on in) the model
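A simulation of this collider (the disease and admission rates are made up) reproduces the classic pattern: pneumonia and stroke are independent in the population, but conditioning on hospital admittance induces a negative association between them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Pneumonia and stroke are independent, but both cause admission (the collider).
pneumonia = rng.random(n) < 0.05
stroke = rng.random(n) < 0.05
admitted = pneumonia | stroke | (rng.random(n) < 0.01)

print(f"population corr: {np.corrcoef(pneumonia, stroke)[0, 1]:+.3f}")  # ~0

# Conditioning on the collider (restricting to admitted patients)
# opens the path and creates a spurious negative correlation.
print(f"among admitted:  {np.corrcoef(pneumonia[admitted], stroke[admitted])[0, 1]:+.3f}")
```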
Defining a Backdoor Path
- A backdoor path is a non-causal path from a node D to a node Y that would remain if any arrows pointing out of D were removed
- These removed arrows are potentially causal paths
- The most common example of a backdoor path is a confounding path
- But, not all backdoor paths are confounding paths
- The following DAG is an example of an open backdoor path:
- There is a causal effect of smoking on obesity
- There is a causal effect of smoking on mortality
- There is a causal effect of obesity on mortality
- So, when estimating the effect of obesity on mortality, obesity ← smoking → mortality is an open backdoor path (see the sketch below)
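A small networkx sketch can classify each path in this DAG mechanically, using the rule that a backdoor path's first edge points into the treatment:

```python
import networkx as nx

# The smoking/obesity/mortality DAG, with obesity as the treatment
# and mortality as the outcome.
g = nx.DiGraph([
    ("smoking", "obesity"),
    ("smoking", "mortality"),
    ("obesity", "mortality"),
])
treatment, outcome = "obesity", "mortality"

def fmt(path):
    """Render a path with the true direction of each edge."""
    out = [path[0]]
    for a, b in zip(path, path[1:]):
        out += [" -> " if g.has_edge(a, b) else " <- ", b]
    return "".join(out)

# Walk every path in the undirected skeleton; a path whose first
# edge points INTO the treatment is a backdoor path.
for path in nx.all_simple_paths(g.to_undirected(), treatment, outcome):
    label = "backdoor:" if g.has_edge(path[1], path[0]) else "causal:  "
    print(label, fmt(path))
```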
Defining an Open Backdoor Path
- A backdoor path is open if the following are true:
- There is a causal effect of D on Y
- There is a common ancestor X of D and Y
- An open backdoor path is the most common source of bias
- Thus, our goal is to close backdoor paths
- Every open backdoor path has a confounder, but not all confounders indicate a backdoor path is open
Creating Bias with Open Backdoor Paths
- There are three reasons a backdoor path can be open:
- We could be conditioning on a collider
- We could be conditioning on a mediator of a collider (i.e. a descendant of the collider)
- We may not be capturing or controlling for an unobserved confounder
- In summary, there are three basic types of bias from open backdoor paths:
- Confounding bias
- Selection bias due to conditioning on a collider
- Selection bias due to conditioning on a mediator
- A mediator is a variable between the treatment and outcome
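A minimal sketch of the mediator case with made-up coefficients: in the chain D → M → Y, including the mediator in the regression blocks the causal path, so the treatment wrongly looks ineffective.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# D -> M -> Y, with a true total effect of D on Y equal to 1.
d = rng.normal(size=n)
m = d + rng.normal(size=n)
y = m + rng.normal(size=n)

def coef_on_d(*controls):
    """OLS coefficient on D, with optional control variables."""
    X = np.column_stack([np.ones(n), d, *controls])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

print(f"without M: {coef_on_d():.2f}")   # ~1.0, the true effect
print(f"with M:    {coef_on_d(m):.2f}")  # ~0.0, causal path blocked
```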
Closing Open Backdoor Paths
- There are two ways to close an open backdoor path:
- Conditioning on a confounder
- Obviously, we can only do this if a confounder exists on an open backdoor path
- Conditioning on a variable is equivalent to fixing (or including) a variable in our regression model
- Not conditioning on a collider or its mediators
- Not conditioning on a collider always closes a backdoor path
- Not conditioning on a mediator of a collider always closes a backdoor path
- Not conditioning on a variable is equivalent to excluding a variable from our regression model
- Both methods must be enforced in order to close all open backdoor paths
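To close out, here is a hedged NumPy sketch of both rules at once. The DAG and coefficients are assumed for illustration: X is a confounder (D ← X → Y), C is a collider (D → C ← Y), and the true effect of D on Y is 1. Conditioning on the confounder closes the backdoor path; additionally conditioning on the collider opens a new non-causal path and re-introduces bias.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# X confounds D and Y; C is a common consequence of D and Y (a collider).
x = rng.normal(size=n)
d = x + rng.normal(size=n)      # X -> D
y = d + x + rng.normal(size=n)  # D -> Y (effect = 1) and X -> Y
c = d + y + rng.normal(size=n)  # D -> C <- Y

def effect_of_d(*controls):
    """OLS coefficient on D, with optional control variables."""
    X = np.column_stack([np.ones(n), d, *controls])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

print(f"naive (backdoor open):   {effect_of_d():.2f}")      # ~1.5, biased
print(f"conditioning on X:       {effect_of_d(x):.2f}")     # ~1.0, unbiased
print(f"conditioning on X and C: {effect_of_d(x, c):.2f}")  # biased again
```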