Oh my! Matching can make DID worse - Causal Book: Design Patterns in Causal Inference

In [[Design Pattern III - Difference-in-Differences (DID)|the Difference-in-Differences (DID) pattern]], it is tempting to make the treated and control groups look the same before treatment. Suppose an online travel agency rolls out an instant-book feature to 2,000 hotels at the start of 2025, while a larger pool of comparable hotels remains untreated. In the pre-treatment year, 2024, the treated hotels have higher baseline booking revenue than the untreated hotels. A natural instinct is to match treated hotels to untreated hotels with similar 2024 revenue, then run the DID on the matched sample. The result looks cleaner: the groups are balanced before treatment, the plot looks more comparable, and the estimate looks more plausible.

That instinct can be wrong. Here's why. DID does not require the treated and control groups to have the same pre-treatment *levels*. It requires their untreated changes, or *trends*, to be comparable. Daw and Hatfield (2018) focus on this very point: a variable is a DID confounder only if it is related to treatment assignment and to the change in the outcome over time, not merely to the level of the outcome at baseline.[^1] Matching on pre-period levels can therefore target the wrong problem. Worse, if the matching variable is transitory and time-varying, the matching step can introduce a new problem through [[Regression to the mean|regression to the mean (RTM)]].

The mechanism behind this is simple. Suppose treated hotels start with higher booking revenue than untreated hotels. If we match on observed 2024 revenue, we will tend to select treated hotels that are unusually low for the treated group and untreated hotels that are unusually high for the control group. Those hotels may look comparable in 2024, but part of that comparability can come from temporary shocks: a bad year for the treated hotel and a good year for the untreated hotel.
In 2025, even if the same hotels remain in the sample and instant-book has no effect, those temporary shocks may not persist. The matched treated hotels then tend to move back up toward the treated-group mean, while the matched untreated hotels tend to move back down toward the control-group mean. The matched sample now has a mechanically induced difference in trends, and the DID estimator would read that difference as a treatment effect.

When treatment assignment is correlated only with the pre-period outcome *level*, the unmatched DID is unbiased anyway: level differences alone do not violate parallel trends. But matching on the pre-period *outcome level* induces bias. The bias grows when the baseline level difference between groups is larger and when the outcome variable has lower serial correlation. Low serial correlation means baseline outliers are less persistent, so selected units regress more strongly toward their group means. **The paradox is that the larger the baseline level gap, the more tempting matching becomes, and the worse the regression-to-the-mean risk can be if we match on that level.**

Matching on *covariate levels* is not automatically safe either. If a baseline covariate differs between groups and is correlated with outcome levels, matching on that covariate can create the same regression-to-the-mean problem. The risk is highest when the covariate is unstable over time and strongly related to the outcome. The risk is lower for fixed or highly persistent covariates such as region, chain affiliation, or stable property type, because there is little longitudinal noise to regress away.

> [!NOTE]
> In a DID model with *unit fixed effects*, these time-invariant differences are absorbed by the fixed effects in the outcome model, so matching on them may improve common support, but it is less likely to change the identifying comparison than matching on pre-period outcomes or other time-varying measures.

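
The regression-to-the-mean mechanism described above can be checked with a small simulation. What follows is a minimal sketch under stated assumptions, not a reproduction of Daw and Hatfield's simulations: the function name `simulate_matched_did`, its parameters (`gap`, `rho`, `caliper`, `growth`), and the data-generating process are all hypothetical and chosen for illustration. Each hotel's revenue is a group mean plus a stable hotel component plus a transitory yearly shock, both groups share the same growth, and the true effect is zero, so parallel trends holds by construction. Here `rho` plays the role of the serial correlation of the outcome within a group.

```python
import numpy as np


def simulate_matched_did(gap=1.0, rho=0.5, n_treated=2000, n_control=6000,
                         caliper=0.05, growth=0.2, seed=0):
    """Toy two-period DID with a true effect of zero (illustrative only).

    gap     : baseline level difference between treated and control means
    rho     : share of outcome variance that is stable, i.e. the serial
              correlation between the two periods within a group
    caliper : largest pre-period distance allowed for a matched pair
    Returns (unmatched DID, matched DID, number of matched treated units).
    """
    rng = np.random.default_rng(seed)

    def draw(n, mean):
        stable = rng.normal(0.0, np.sqrt(rho), n)  # persistent unit component
        pre = mean + stable + rng.normal(0.0, np.sqrt(1.0 - rho), n)
        # Same growth in both groups and no treatment effect in the post period.
        post = mean + growth + stable + rng.normal(0.0, np.sqrt(1.0 - rho), n)
        return pre, post

    pre_t, post_t = draw(n_treated, gap)
    pre_c, post_c = draw(n_control, 0.0)

    # Unmatched DID: difference in mean changes across the full groups.
    unmatched = (post_t - pre_t).mean() - (post_c - pre_c).mean()

    # Nearest-neighbor matching on the pre-period outcome, with replacement.
    order = np.argsort(pre_c)
    sorted_pre = pre_c[order]
    pos = np.searchsorted(sorted_pre, pre_t)
    right = np.clip(pos, 0, n_control - 1)
    left = np.clip(pos - 1, 0, n_control - 1)
    use_left = np.abs(pre_t - sorted_pre[left]) <= np.abs(sorted_pre[right] - pre_t)
    nearest = np.where(use_left, left, right)

    # Keep only treated units that have a control within the caliper.
    keep = np.abs(pre_t - sorted_pre[nearest]) <= caliper
    ctrl = order[nearest[keep]]

    matched = ((post_t[keep] - pre_t[keep]).mean()
               - (post_c[ctrl] - pre_c[ctrl]).mean())
    return unmatched, matched, int(keep.sum())


if __name__ == "__main__":
    for rho in (0.2, 0.5, 0.9):
        u, m, n = simulate_matched_did(gap=1.0, rho=rho)
        print(f"rho={rho}: unmatched={u:.3f}, matched={m:.3f}, pairs={n}")
```

In this toy model, exact matching on the pre-period outcome implies a spurious estimate of roughly `(1 - rho) * gap`, so the matched estimate should drift away from zero as the baseline gap widens or as serial correlation falls, while the unmatched estimate stays near zero. The sketch assumes the default settings leave at least one matched pair; a very small caliper with little overlap would leave none.
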
To be clear, this is not an argument against matching before DID; it is against matching on the wrong variables under the wrong conditions. Daw and Hatfield (2018) offer a decision tree to explain the tradeoffs:

```mermaid
%%{init: {'markdownAutoWrap': false, 'themeVariables': {'fontSize': '13px'}, 'flowchart': {'nodeSpacing': 29, 'rankSpacing': 34, 'padding': 10}}}%%
flowchart TD
    root(["Pre-intervention difference<br/>between treatment and control?"])
    root -->|No| ok1["OK to match†"]
    root -->|Yes, in outcome trend| violate["Violates DID assumptions"]
    root -->|Yes, in outcome level| rtm_level["RTM risk: do not match"]
    root -->|Yes, in covariate level| corr{"Covariate correlated<br/>with outcome?"}
    ok1 ~~~ corr
    violate ~~~ corr
    rtm_level ~~~ corr
    violate ~~~ rtm_level
    corr -->|No| instrument["Instrument: do not match‡"]
    corr -->|Yes| tv{"Time-varying<br/>covariate?"}
    tv -->|Yes| rtm_cov["RTM risk: do not match"]
    tv -->|No| ok2["OK to match‡"]
    instrument ~~~ footnotes["<div style='white-space:nowrap; text-align:left;'>† May increase precision (Stuart, 2010).    ‡ May increase bias (Bhattacharya & Vogt, 2007) and reduce precision (Brookhart et al., 2006).</div>"]
    rtm_cov ~~~ footnotes
    ok2 ~~~ footnotes
    style ok1 fill:#d5f5e3,stroke:#27ae60
    style ok2 fill:#d5f5e3,stroke:#27ae60
    style root fill:#ffffff,stroke:#808080
    style corr fill:#ffffff,stroke:#808080
    style tv fill:#ffffff,stroke:#808080
    style violate fill:#ffffff,stroke:#c23b22
    style rtm_level fill:#ffffff,stroke:#c23b22
    style rtm_cov fill:#ffffff,stroke:#c23b22
    style instrument fill:#ffffff,stroke:#c23b22
    style footnotes fill:transparent,stroke:transparent,color:#555555,font-size:11px,text-align:left
```

*Adapted from Figure 4 in Daw & Hatfield (2018)*

If treatment assignment is correlated with the pre-period trend, the parallel trends assumption fails and unmatched DID is biased, as we would expect. Matching on that trend does not fully repair the design, but it may reduce bias when pre-period trends are stable.
When the pre-period trend reflects a transitory swing rather than a lasting direction of change, though, matching mainly produces a nicely aligned pre-period plot while the groups would still diverge afterward. That is the reminder in [[Oh my! Pre-trends are not parallel]]: a clean pre-period is only limited evidence.

What if parallel trends is already in doubt? In a recent working paper, Ge and Ham (2026) study the bias-variance implications of matching before DID.[^2] While Daw and Hatfield (2018) focus on what goes wrong when matching is unnecessary but done anyway, Ge and Ham (2026) ask whether matching is worth it once both bias and variance are counted. The bottom line is that matching on covariates can reduce bias but may increase variance by dropping control units and limiting the sample. But there's a silver lining. In their structural linear model, once we match on covariates, adding pre-treatment outcomes reduces the variance relative to covariate-only matching.[^3]

The two papers complement each other by asking about different matching targets: Daw and Hatfield (2018) distinguish outcome levels, covariate levels, and pre-period trends; Ge and Ham (2026) focus on the bias-variance implications of covariate matching when parallel trends is in doubt.

| Situation | Daw & Hatfield (2018) | Ge & Ham (2026) |
| --- | --- | --- |
| Baseline levels differ, but parallel trends is credible | Prefer **unmatched DID**; do not match away baseline outcome or covariate **levels** just for balance. | Same: **unmatched DID** keeps ATT identification and the full sample, avoiding the bias-variance tradeoff. |
| A stable covariate predicts outcome changes | Matching can be reasonable if a covariate is fixed or persistent; avoid unstable time-varying covariates. | Matching on covariates may reduce bias but may increase variance by discarding controls; use MSE to compare. |
| Pre-period outcome trends differ | Trend matching may reduce bias when trends are stable, but it does not repair the design and can fail when trends are transitory. | Conditional on covariate matching, adding pre-treatment outcomes may reduce variance relative to covariate-only matching. |

The intuitive rule is to ask what imbalance means for the counterfactual trend:

1. **Don't match away baseline levels just because they exist.** DID does not require baseline outcome levels to be the same. Only consider matching on a baseline variable if it predicts how the outcome will change over time (the trend), rather than just predicting where it starts.
2. **Prefer stable variables; be cautious with time-varying ones.** If a variable is fixed or highly persistent over time, it is relatively safe from RTM bias. However, if a variable is time-varying and noisy (which often includes the outcome), avoid matching on it. Doing so forces you to select units based on transitory shocks, mechanically inducing RTM bias in the post-period.[^4]
3. **Do not match on instrumental variables.** If a baseline difference between groups is entirely unrelated to your outcome, do not match on it. Matching on these instrumental variables can amplify bias and reduce precision.
4. **Do not fall into the illusion of nice parallel pre-period lines.** If pre-period trends differ, matching on trends may improve comparability, but it does not *validate* the [[Parallel trends assumption|parallel trends assumption]]. Report the unmatched and matched estimates, and show the pre-period trajectories before and after matching.
5. Finally, as Ge and Ham (2026) warn, if matching discards many control units, treat the design choice as a mean squared error problem: the reduction in bias has to be worth the increase in variance.

In the hypothetical travel-agency rollout, imagine treated hotels have higher 2024 revenue because they are larger properties in business-travel districts. That level difference is not a DID violation by itself if both treated and untreated hotels would have grown by the same amount absent the instant-book option. Matching hotels on their 2024 revenue could select temporarily low-revenue treated hotels and temporarily high-revenue untreated hotels. When both sets of hotels return toward their own group means in 2025, the matched DID could overstate the effect of the instant-book option.

The takeaway is that, in matching for parallel trends, the question is whether the matched control group's *trend* is a credible counterfactual for what the treated group would have done absent treatment. Pre-period *levels* are not an identification requirement, and matching on the wrong variables can do more harm than good.

[^1]: Daw, J. R., & Hatfield, L. A. (2018). Matching and Regression to the Mean in Difference‐in‐Differences Analysis. *Health Services Research*, 53(6), 4138–4156. https://doi.org/10.1111/1475-6773.12993
[^2]: Ge, M., & Ham, D. W. (2026). Bias-Variance Tradeoff of Matching Prior to Difference-in-Differences. *arXiv:2510.20191*. https://arxiv.org/abs/2510.20191
[^3]: This variance reduction is conditional on already matching on covariates. The added step of matching on pre-treatment outcomes helps control for unobserved latent variables without discarding additional units.
[^4]: If matching on a continuous time-varying variable with moderate serial correlation is unavoidable, Daw and Hatfield (2018) recommend using a stabilizing transformation such as binning the variable into broader categories (e.g., low, medium, and high) before matching.

> [!info]- Last updated: June 11, 2026