Difference-in-Difference 101. What is Difference-in-difference (DiD… | by Henam Singla | May, 2024
Our research question is: what is the effect of treatment D on outcome y? DiD allows us to estimate what would have happened to the treatment group if the intervention had not occurred. This counterfactual scenario is essential for understanding the true effect of the treatment. Many jobs revolve around answering similar questions about the effects of interventions, policy changes, or treatments, and DiD is used across many fields. In economics, it can assess the impact of tax cuts on economic growth; in public policy, it can evaluate the effects of new traffic laws on accident rates; and in marketing, it can analyze the influence of advertising campaigns on sales.
For example, in the diagram above, our sample is drawn from a population. We divide the data into a treatment group, which receives the intervention, and a control group, which does not. For both groups, we observe the outcome before (pre) and after (post) the intervention.
Simple Treatment/Control Difference Estimator
This equation will calculate the treatment effect by comparing the changes in the outcome over time between the treatment and control groups.
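Written out with group means, a standard way to express this estimator is shown below. The notation $\bar{y}_{T,\text{post}}$ (the average outcome of the treatment group after the intervention, and likewise for the other three group-period cells) is introduced here for illustration and may differ from the original figure:

$$\widehat{\text{DiD}} = \left(\bar{y}_{T,\text{post}} - \bar{y}_{T,\text{pre}}\right) - \left(\bar{y}_{C,\text{post}} - \bar{y}_{C,\text{pre}}\right)$$

The first bracket is the before-after change in the treatment group, and the second is the same change in the control group, which stands in for what would have happened to the treatment group without the intervention.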
I have created a fake example to help understand the math.
The DiD coefficient would be 9 using the formula mentioned above.
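The arithmetic can be sketched in a few lines of Python. The four group means below are hypothetical values invented for illustration (they are not the numbers from the author's table); they were simply chosen so that the result is also 9.

```python
# Hypothetical group means (not the author's table), chosen so the answer is 9
means = {
    ("treatment", "pre"): 20.0,
    ("treatment", "post"): 35.0,
    ("control", "pre"): 18.0,
    ("control", "post"): 24.0,
}

def did_2x2(m):
    """Simple DiD: change in the treatment group minus change in the control group."""
    change_treatment = m[("treatment", "post")] - m[("treatment", "pre")]  # 15.0
    change_control = m[("control", "post")] - m[("control", "pre")]        # 6.0
    return change_treatment - change_control

print(did_2x2(means))  # 9.0
```

The control group's change (6) is treated as the change the treatment group would have experienced anyway, so only the extra 9 is attributed to the intervention.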
DiD Estimator: Calculation using a regression
DiD helps control for time-invariant characteristics that might bias the estimation of treatment effects. This means that it removes the influence of variables that are constant over time (e.g., geographical location, gender, ethnicity, innate ability). It can do so because these characteristics affect the pre-treatment and post-treatment periods equally within each group, so they cancel out when we take the before-after difference.
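The core equation for a basic DiD model is:

$$y_{ijt} = \alpha + \gamma\,\text{After}_t + \lambda\,\text{Treatment}_j + \beta\,(\text{After}_t \times \text{Treatment}_j) + \varepsilon_{ijt}$$

where $\alpha$ is the intercept, $\gamma$ is the change over time common to both groups, $\lambda$ is the baseline difference between the groups, $\varepsilon_{ijt}$ is the error term, and: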
- $y_{ijt}$ is the outcome variable for individual $i$ in group $j$ at time $t$.
- $\text{After}_t$ is a dummy variable equal to 1 if the observation is in the post-treatment period.
- $\text{Treatment}_j$ is a dummy variable equal to 1 if the observation belongs to the treatment group.
- $\text{After}_t \times \text{Treatment}_j$ is the interaction term, with the coefficient $\beta$ capturing the DiD estimate.
The coefficient β on the interaction term is the DiD estimate of the treatment effect on y. Researchers often prefer the regression form because it provides standard errors directly and makes it easy to control for additional variables.
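As a minimal sketch of the regression version, the Python code below simulates data with a made-up true effect of 9, fits the interaction model by ordinary least squares with NumPy, and reports the interaction coefficient with classical (homoskedastic) standard errors. All parameter values are hypothetical. In real panel applications, standard errors are often clustered by group instead, which this sketch does not do.

```python
import numpy as np

def did_regression(y, after, treated):
    """OLS of y on [1, After, Treatment, After x Treatment].

    Returns the coefficient vector and classical (homoskedastic) standard errors.
    """
    X = np.column_stack([np.ones_like(y), after, treated, after * treated])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)  # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, se

rng = np.random.default_rng(0)
n = 1000  # hypothetical number of observations per group-period cell
# Cell order: (control, pre), (control, post), (treated, pre), (treated, post)
treated = np.repeat([0.0, 0.0, 1.0, 1.0], n)
after = np.tile(np.repeat([0.0, 1.0], n), 2)
# Simulated outcome with a true DiD effect of 9 (all parameters are made up)
y = 18 + 6 * after + 2 * treated + 9 * after * treated + rng.normal(0, 5, size=4 * n)

coef, se = did_regression(y, after, treated)
print(f"beta (interaction) = {coef[3]:.2f}, SE = {se[3]:.2f}")
```

Because this two-group, two-period model has one parameter per group-period cell, the interaction coefficient equals the simple difference of the four cell means exactly; the regression adds standard errors and room for extra control variables.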
The parallel trends assumption is one of the key assumptions in DiD. It states that, in the absence of treatment, the difference between the treatment and control groups would remain constant over time. In other words, in the absence of treatment, β (the DiD estimate) would be 0.
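Formally, writing $y(0)$ for the outcome a unit would have without the treatment, this means:

$$E\left[y_{\text{post}}(0) - y_{\text{pre}}(0) \mid \text{Treatment}_j = 1\right] = E\left[y_{\text{post}}(0) - y_{\text{pre}}(0) \mid \text{Treatment}_j = 0\right]$$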
Another way to think about this is that the difference between the two groups would have remained the same over time without the policy change. If the trends are not parallel before the treatment, the DiD estimates may be biased.
How to check this assumption
The next question is how to check it. The validity of the parallel trends assumption can be assessed through graphical analysis and placebo tests.
The assumption is that, in the absence of treatment, the treatment group (orange line) and the control group (blue dashed line) would follow parallel paths over time. The intervention (vertical line) marks the point at which the treatment is applied, allowing the comparison of the differences in trends between the two groups before and after the intervention to estimate the treatment effect.
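Alongside the graph, a simple numeric companion is to fit a straight line to each group's pre-intervention means and compare the slopes. The sketch below does this with `np.polyfit`; the years and group means are hypothetical, and this is only a descriptive check, not a formal test of the assumption.

```python
import numpy as np

def pre_trend_slope_gap(years, treated_means, control_means):
    """Difference in linear pre-period slopes (treatment minus control)."""
    x = years - years.mean()  # center years for numerical stability
    slope_t = np.polyfit(x, treated_means, 1)[0]
    slope_c = np.polyfit(x, control_means, 1)[0]
    return slope_t - slope_c

pre_years = np.arange(2014, 2019)            # hypothetical pre-intervention years
control = 10 + 0.5 * (pre_years - 2014)
parallel = 12 + 0.5 * (pre_years - 2014)     # same slope, different level: fine for DiD
diverging = 12 + 0.8 * (pre_years - 2014)    # steeper slope: violates parallel trends

print(pre_trend_slope_gap(pre_years, parallel, control))   # close to 0.0
print(pre_trend_slope_gap(pre_years, diverging, control))  # close to 0.3
```

A level difference between the groups is allowed; what matters is whether the slopes differ before the intervention.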
Examples That Violate the Parallel Trends Assumption
In simple terms, we look for two patterns in the treatment group:
1. Change in the slope
In both of the cases above, the parallel trends assumption is not satisfied: the treatment group's outcome grows either faster (part a) or slower (part b) than the control group's outcome. Mathematically:

$$\text{DiD} = \text{true effect} + \text{differential trend}$$

For DiD to recover the true effect, the differential trend must be 0. Here it is positive in part (a) and negative in part (b), so DiD cannot isolate the impact of the intervention (the true effect) from the differential trend.
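A small numeric sketch of this decomposition: the numbers are hypothetical, and the treatment group is given extra growth of its own (the differential trend) on top of the trend it shares with the control group.

```python
def did_from_means(t_pre, t_post, c_pre, c_post):
    """Simple 2x2 DiD from group means."""
    return (t_post - t_pre) - (c_post - c_pre)

true_effect = 5.0          # hypothetical effect of the intervention
common_trend = 2.0         # change both groups would share without treatment
differential_trend = 1.5   # extra growth in the treatment group only (violates parallel trends)

c_pre, t_pre = 10.0, 14.0
c_post = c_pre + common_trend
t_post = t_pre + common_trend + differential_trend + true_effect

estimate = did_from_means(t_pre, t_post, c_pre, c_post)
print(estimate)  # 6.5 = true effect (5.0) + differential trend (1.5)
```

Setting `differential_trend = -1.5` mimics part (b): the estimate drops to 3.5, understating the true effect by the same amount.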
2. Jump in the treatment line (either up or down) after the intervention
In the image above, the treatment group's trend changes in a way that the control group's trend does not, even though the two should have moved together without the intervention. A jump like this is not allowed in a DiD study.
Placebo tests are used to verify whether observed treatment effects are truly due to the treatment and not due to other confounding factors. They involve applying the same analysis to a period or group where no treatment effect is expected. If a significant effect is found in these placebo tests, it suggests that the original results may be spurious.
For example, suppose a study evaluates an intervention that gave tablets to high schools in 2019. For a placebo test, we can assign a fake intervention year, say 2017, in which we know no policy change occurred, and rerun the analysis on data from before the real intervention. If the analysis at the placebo date (2017) shows no significant change, this suggests that the observed effect in 2019 (if any) is likely due to the actual policy intervention.
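The tablet example can be sketched as a simulation. Everything below is hypothetical: a balanced panel of made-up schools from 2014 to 2022, parallel trends built in, and an invented effect of 3 starting in 2019. The placebo analysis keeps only pre-2019 data and pretends the program began in 2017, so its estimate should be close to zero.

```python
import numpy as np

def did_estimate(y, treated, post):
    """Simple DiD from cell means: (T_post - T_pre) - (C_post - C_pre)."""
    def cell(t, p):
        return y[(treated == t) & (post == p)].mean()
    return (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))

rng = np.random.default_rng(42)
years = np.arange(2014, 2023)   # hypothetical study window
n_units = 200                   # hypothetical schools per group
treated = np.repeat([0, 1], n_units * len(years))  # first half control, second half treated
year = np.tile(years, 2 * n_units)                 # each school observed every year
true_effect = 3.0               # made-up effect of the 2019 program
y = (10 + 0.5 * (year - 2014) + 2 * treated
     + true_effect * treated * (year >= 2019)
     + rng.normal(0, 1, size=year.size))

# Real analysis: intervention in 2019
real = did_estimate(y, treated, (year >= 2019).astype(int))

# Placebo: keep only pre-2019 data and pretend the program started in 2017
pre = year < 2019
placebo = did_estimate(y[pre], treated[pre], (year[pre] >= 2017).astype(int))

print(f"real DiD: {real:.2f}, placebo DiD: {placebo:.2f}")
```

Restricting the placebo to pre-2019 data matters: otherwise the real 2019 effect would leak into the "post" period of the fake intervention.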
Two common extensions and variations of DiD are:
- Event Study DiD: Estimates year-specific treatment effects, which is useful for assessing the timing of treatment effects and checking for pre-trends. The model allows the treatment effect to vary by year, so we can study the effect at times t+1, t+2, …, t+n.
- Synthetic Control Method (SCM): SCM constructs a synthetic control group by weighting multiple untreated units to create a composite that approximates the characteristics of the treated unit before the intervention. This method is particularly useful when a single treated unit is compared to a pool of untreated units. It provides a more credible counterfactual by combining information from several units.
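A minimal event-study sketch, assuming a balanced panel in which every treated unit starts treatment in the same (hypothetical) year. In that setting, the coefficients from the usual regression with unit fixed effects, year fixed effects, and treatment-by-year dummies (omitting a reference year) reduce to comparisons of group-year means, so the sketch computes those directly. The panel, treatment year, and effect sizes are made up; with staggered treatment timing this shortcut does not apply.

```python
import numpy as np

def event_study(y, treated, year, ref_year):
    """Event-study estimates from group-year means, relative to ref_year.

    For each year k: (T_k - C_k) - (T_ref - C_ref).
    """
    def gap(k):
        in_k = year == k
        return y[in_k & (treated == 1)].mean() - y[in_k & (treated == 0)].mean()
    base = gap(ref_year)
    return {int(k): gap(k) - base for k in np.unique(year)}

rng = np.random.default_rng(1)
years = np.arange(2014, 2023)   # hypothetical panel years
start = 2019                    # hypothetical common treatment year
n_units = 200                   # hypothetical units per group
treated = np.repeat([0, 1], n_units * len(years))
year = np.tile(years, 2 * n_units)
# Effect grows with time since treatment: 1, 2, 3, 4 in 2019..2022 (illustrative)
effect = np.where(year >= start, year - start + 1, 0) * treated
y = 5 + 0.3 * (year - 2014) + 1.5 * treated + effect + rng.normal(0, 1, year.size)

estimates = event_study(y, treated, year, ref_year=start - 1)
for k, b in estimates.items():
    print(f"event time {k - start:+d}: {b:6.2f}")
```

The estimates for years before 2019 should hover around 0 (no pre-trends), the reference year 2018 is 0 by construction, and the estimates from 2019 onward should track the simulated effects of roughly 1, 2, 3, and 4.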
There are many more, but I will limit this post to these two. I may write a later post explaining the rest in detail.
In this post, I have analyzed the Difference-in-Differences (DiD) estimator, a popular method for estimating average treatment effects (typically the average effect on the treated group). DiD is widely used to study policy effects by comparing changes over time between treatment and control groups. Its key advantage is the ability to control for unobserved confounders that remain constant over time, which isolates the true impact of an intervention as long as the parallel trends assumption holds.
We also explored key concepts like the parallel trends assumption, the importance of pre-treatment data, and how to check for assumption violations using graphical analysis and placebo tests. Additionally, I discussed extensions and variations of DiD, such as the Event Study DiD and the Synthetic Control Method, which offer further insights and robustness in different scenarios.
Thank you for reading! If you enjoyed this post and want to see more, consider following me. You can also follow me on LinkedIn. I plan to write blogs about causal inference and data analysis, always aiming to keep things simple.
A small disclaimer: I write to learn, so mistakes might happen despite my best efforts. If you spot any errors, please let me know. I also welcome suggestions for new topics!