Panel Data

Motivating the Use of Panel Data

  • Often, our outcome variable depends on several factors

    • These factors may be observed or unobserved in our data
    • As we know, if any unobserved variables are correlated with the treatment variable, then the treatment variable is endogenous
    • Meaning, the correlation between the treatment and the outcome is not an estimate of a causal effect
  • Panel data refers to data where we observe the same units over more than one time period

    • E.g. individuals, firms, countries, etc.
  • Panel data is very similar to time series data with one key difference

    • Time series data refers to data consisting of observations of one individual at multiple time points
    • Whereas, panel data refers to data consisting of observations of multiple individuals at multiple time points
  • Panel data is used to estimate causal effects when there are unobserved confounders (that are constant over time)

    • To do this, we make the assumption that these unobserved confounders are constant over time

Illustrating Panel Data with Unobserved Confounders

  • Panel data allows us to control for unobserved variables by using a fixed, known variable
  • For example, we can’t measure attributes like beauty and intelligence
  • But, we know that a person is the same individual across time

    • Meaning, we can treat their beauty and intelligence as fixed across time
  • So, we can create a dummy variable for that person (i.e. an indicator for their name), which stands in for their whole set of fixed, unobserved variables

    • Then, we can control for their unobserved variables by adding that person to a regression model
  • This is what we mean when we say we can control for the person itself

    • We are adding a variable (dummy in this case) that denotes that particular person
  • By controlling for this dummy variable, we can estimate causal effects of a treatment on outcomes when there are unobserved confounders
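As a rough illustration of the idea above, the following sketch simulates a small panel in which an unobserved, time-constant trait drives both the treatment and the outcome. The panel size, the coefficients, and all variable names are assumptions made up for this example. It compares a naive regression with one that adds a dummy variable for every person.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_periods = 200, 5      # hypothetical panel size
true_effect = 2.0                 # hypothetical causal effect of the treatment

# Unobserved, time-constant trait per person (e.g. "intelligence").
trait = rng.normal(size=n_people)
person = np.repeat(np.arange(n_people), n_periods)  # person id for every row

# The treatment is correlated with the trait, so it is endogenous.
treatment = trait[person] + rng.normal(size=person.size)
outcome = (true_effect * treatment
           + 3.0 * trait[person]
           + rng.normal(size=person.size))

# Naive regression: intercept + treatment, ignoring the trait.
X_naive = np.column_stack([np.ones(person.size), treatment])
naive_effect = np.linalg.lstsq(X_naive, outcome, rcond=None)[0][1]

# Dummy-variable regression: the treatment plus one 0/1 column per person.
dummies = (person[:, None] == np.arange(n_people)[None, :]).astype(float)
X_dummy = np.column_stack([treatment, dummies])
dummy_effect = np.linalg.lstsq(X_dummy, outcome, rcond=None)[0][0]

print(f"naive estimate:      {naive_effect:.2f}")
print(f"with person dummies: {dummy_effect:.2f}")
```

In this simulated setup the naive estimate absorbs part of the trait's effect, while the estimate with person dummies should land close to the assumed true effect.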

Defining Types of Estimators for Panel Data

  • There are several different kinds of estimators for panel data
  • For now, we'll focus on fixed effects (FE)
  • Note, panel methods are usually based on the traditional notation

    • And, not the potential outcomes notation
    • Keep this in mind when we define their notation
  • The notation is defined as the following:

    • Let $Y$ be our observed outcome
    • Let $D = (D_{1}, D_{2}, ..., D_{K})$ be a set of $K$ observed variables
    • Let $u$ be an unobservable random variable
    • We're interested in the partial effects of variable $D_{j}$ in the population regression function:
    $$E[Y | D_{1}, D_{2}, ..., D_{K}, u]$$
    • Thus, our regression model looks like the following for unit $i$ at time $t$, where $u_{i}$ is that unit's value of the unobservable and $\epsilon_{it}$ is the error term:
    $$Y_{it} = \delta_{1} D_{it1} + \delta_{2} D_{it2} + ... + \delta_{K} D_{itK} + u_{i} + \epsilon_{it}$$
    • And, the entire panel (sample of data) looks like the following for the $i^{th}$ unit:
    $$Y_{i} = \begin{pmatrix} Y_{i1} \\ \vdots \\ Y_{it} \\ \vdots \\ Y_{iT} \end{pmatrix}_{T \times 1} \quad D_{i} = \begin{pmatrix} D_{i11} & D_{i12} & D_{i1j} & \dots & D_{i1K} \\ \vdots & \vdots & \vdots & & \vdots \\ D_{it1} & D_{it2} & D_{itj} & \dots & D_{itK} \\ \vdots & \vdots & \vdots & & \vdots \\ D_{iT1} & D_{iT2} & D_{iTj} & \dots & D_{iTK} \end{pmatrix}_{T \times K}$$
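To make the shapes in this notation concrete, here is a minimal sketch that stores a toy panel as NumPy arrays. The sizes `N`, `T`, `K` and the random values are placeholders chosen for the example, not part of the notes.

```python
import numpy as np

N, T, K = 3, 4, 2   # hypothetical: 3 units, 4 periods, 2 observed variables
rng = np.random.default_rng(0)

# One (T x K) matrix D_i and one (T x 1) vector Y_i per unit i.
D = rng.normal(size=(N, T, K))
Y = rng.normal(size=(N, T, 1))

# The block for the first unit has the shapes given in the notation.
Y_i, D_i = Y[0], D[0]
print(Y_i.shape, D_i.shape)

# Stacking the units on top of each other gives the usual "long" layout,
# with one row per (unit, period) pair.
Y_long = Y.reshape(N * T, 1)
D_long = D.reshape(N * T, K)
print(Y_long.shape, D_long.shape)
```

The long layout is the one most regression routines expect, with a separate column identifying which unit each row belongs to.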

Illustrating Fixed Effects

  • Generally, a fixed effect model is defined as the following:
    $$y_{it} = \beta X_{it} + \alpha U_{i} + \epsilon_{it}$$
  • Here, $y_{it}$ is the outcome of the $i^{th}$ individual at time $t$

    • Let $X_{it}$ be a set of observed variables for individual $i$ at time $t$
    • Let $U_{i}$ be a set of unobserved variables for individual $i$

      • Notice, these unobservables are assumed to be fixed over time
      • Hence, the lack of the time $t$ subscript
    • Finally, $\epsilon_{it}$ is the error term
  • As an example, $y_{it}$ could represent wages

    • And $X_{it}$ are the observed variables that change over time

      • E.g. marriage and experience
    • And $U_{i}$ are the unobserved variables that are constant over time

      • E.g. beauty and intelligence

Defining Fixed Effects

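  • The fixed effects model works with the average of each variable for every person in our panel
  • Essentially, including a dummy for each individual is equivalent to regressing every other variable on those dummies and keeping the residuals, which are the deviations from each individual's own mean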
  • This motivates the following estimation procedure:

    1. Create time-demeaned variables by subtracting each individual's mean over time:
    $$\ddot{Y}_{it} = Y_{it} - \bar{Y}_{i} \qquad \ddot{X}_{it} = X_{it} - \bar{X}_{i}$$
    2. Regress $\ddot{Y}_{it}$ on $\ddot{X}_{it}$, where $\ddot{\epsilon}_{it}$ is the time-demeaned error term:
    $$\ddot{Y}_{it} = \beta \ddot{X}_{it} + \ddot{\epsilon}_{it}$$
  • Notice, the unobserved variables $U_{i}$ vanish after performing the above transformation: since $U_{i}$ is constant over time, its mean over time is $U_{i}$ itself, so $U_{i} - \bar{U}_{i} = 0$

    • Actually, even observed variables $X_{i}$ that are constant over time are eliminated by the transformation
    • For this reason, the effect of any variable that is constant across time can't be estimated with fixed effects, since that variable would be a linear combination of the individual dummy variables
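The two-step procedure above can be sketched as follows on simulated data. The helper `demean_by_unit`, the sample sizes, and the coefficients are all assumptions introduced for illustration. The sketch checks that the unobserved `U` and a time-constant observed variable `Z` are wiped out by demeaning, and then estimates $\beta$ from the demeaned variables.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_periods = 300, 4       # hypothetical panel size
beta = 1.5                        # hypothetical true coefficient

unit = np.repeat(np.arange(n_units), n_periods)
U = rng.normal(size=n_units)[unit]            # unobserved, constant within unit
X = 0.8 * U + rng.normal(size=unit.size)      # observed, varies over time
Z = rng.integers(0, 2, size=n_units)[unit].astype(float)  # observed, constant
Y = beta * X + 2.0 * U + rng.normal(size=unit.size)

def demean_by_unit(values, unit, n_units):
    """Subtract each unit's time average from its observations."""
    sums = np.bincount(unit, weights=values, minlength=n_units)
    counts = np.bincount(unit, minlength=n_units)
    return values - (sums / counts)[unit]

# Step 1: time-demean every variable.
Y_dd = demean_by_unit(Y, unit, n_units)
X_dd = demean_by_unit(X, unit, n_units)
U_dd = demean_by_unit(U, unit, n_units)
Z_dd = demean_by_unit(Z, unit, n_units)

# Anything constant within a unit is (numerically) zero after demeaning.
print("U eliminated:", np.allclose(U_dd, 0.0))
print("Z eliminated:", np.allclose(Z_dd, 0.0))

# Step 2: regress demeaned Y on demeaned X (single regressor, no intercept).
beta_hat = (X_dd @ Y_dd) / (X_dd @ X_dd)
print(f"within estimate of beta: {beta_hat:.2f}")
```

With a single regressor the regression in step 2 reduces to a ratio of sums; with several regressors one would pass the demeaned matrix to a least-squares routine instead.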

Defining the Identifying Assumptions

  • To identify $\beta$ with a fixed effects model, we must satisfy the following assumptions (written in the $D$ notation from earlier, where $D_{it}$ plays the role of $X_{it}$):

    1. $E[\epsilon_{it} | D_{i1}, D_{i2}, ..., D_{iT}, u_{i}] = 0$

      • In other words, there can't be any unobserved variables that change over time and are correlated with the observed variables
    2. $\operatorname{rank}(\sum_{t=1}^{T} E[\ddot{D}_{it}' \ddot{D}_{it}]) = K$

      • Meaning, the time-demeaned observed variables aren't perfectly collinear, which also rules out observed variables that are constant over time
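A sample analogue of the rank condition can be checked numerically. The sketch below uses made-up variables (`experience`, `married`, `birth_year`) and a hypothetical helper `demean_by_unit`. It shows that adding a regressor that never changes within a unit makes the demeaned matrix lose rank.

```python
import numpy as np

rng = np.random.default_rng(2)
n_units, n_periods = 50, 4        # hypothetical panel size
unit = np.repeat(np.arange(n_units), n_periods)

experience = rng.normal(size=unit.size)                       # varies over time
married = rng.integers(0, 2, size=unit.size).astype(float)    # varies over time
birth_year = rng.integers(1960, 1990, size=n_units)[unit].astype(float)  # fixed

def demean_by_unit(matrix, unit):
    """Subtract each unit's column means from that unit's rows."""
    out = matrix.copy()
    for u in np.unique(unit):
        rows = unit == u
        out[rows] -= matrix[rows].mean(axis=0)
    return out

D_ok = np.column_stack([experience, married])                 # K = 2
D_bad = np.column_stack([experience, married, birth_year])    # K = 3

rank_ok = np.linalg.matrix_rank(demean_by_unit(D_ok, unit))
rank_bad = np.linalg.matrix_rank(demean_by_unit(D_bad, unit))

print("rank with time-varying regressors:", rank_ok, "of", D_ok.shape[1])
print("rank after adding birth_year:     ", rank_bad, "of", D_bad.shape[1])
```

When the rank falls short of the number of columns, at least one coefficient cannot be separated from the others, which is why time-constant regressors have to be left out.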

Describing the Problem with Panel Data

  • Panel data is useful when controlling for confounding with non-random data (i.e. non-experimental data)
  • However, it isn't great for every scenario, due to its assumptions
  • There are two common situations when panel data doesn't work effectively to estimate causal effects:

    1. When we have reverse causality
    2. When the unmeasured confounders change over time
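The second situation can be illustrated with a small simulation. This is only a sketch under assumed coefficients and an assumed helper `within_estimate`, and it does not cover reverse causality. It applies the same within (demeaning) estimator to one panel with a time-constant confounder and to another with a confounder that changes every period.

```python
import numpy as np

rng = np.random.default_rng(3)
n_units, n_periods = 500, 5       # hypothetical panel size
beta = 1.0                        # hypothetical true effect
unit = np.repeat(np.arange(n_units), n_periods)

def within_estimate(x, y, unit):
    """Fixed effects estimate for a single regressor via time-demeaning."""
    def demean(v):
        means = np.bincount(unit, weights=v) / np.bincount(unit)
        return v - means[unit]
    x_dd, y_dd = demean(x), demean(y)
    return (x_dd @ y_dd) / (x_dd @ x_dd)

# Case 1: the unmeasured confounder is constant within each unit.
fixed_conf = rng.normal(size=n_units)[unit]
x1 = fixed_conf + rng.normal(size=unit.size)
y1 = beta * x1 + 2.0 * fixed_conf + rng.normal(size=unit.size)

# Case 2: the unmeasured confounder takes a new value every period.
moving_conf = rng.normal(size=unit.size)
x2 = moving_conf + rng.normal(size=unit.size)
y2 = beta * x2 + 2.0 * moving_conf + rng.normal(size=unit.size)

est_fixed = within_estimate(x1, y1, unit)
est_moving = within_estimate(x2, y2, unit)
print(f"confounder fixed over time:    {est_fixed:.2f}")
print(f"confounder changing over time: {est_moving:.2f}")
```

In this setup demeaning removes the confounder only in the first case; in the second, the estimate stays well away from the assumed true effect.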
