Local identification by a cutoff - Causal Book: Design Patterns in Causal Inference

Regression Discontinuity (RD) design is a [[Quasi-experiment|quasi-experimental]] approach to estimating the causal effect of a treatment ($D$) on an outcome ($Y$) when treatment assignment is determined by a sharp, known cutoff ($c$) on a continuous variable, the **[[Running variable|running variable]]** ($X$). By comparing individuals just above the cutoff ($X \geq c$) to those just below it ($X < c$), the RD design pattern creates a local environment that is "as good as random," allowing us to identify the [[Local Average Treatment Effect]] (LATE) for individuals infinitesimally close to the cutoff.

**Let's understand the problem first:** Consider a common business case: identifying the causal effect of a "Platinum Status." Awarding customers status tiers is a common practice for building customer loyalty. For example, [Expedia](https://www.expedia.com/welcome-one-key) awards each of its customers one of four status tiers: Blue, Silver, Gold, and Platinum. We would like to estimate the causal effect of **Platinum Status ($D$)** on **Total Spend in the Next Year ($Y$)**. Let's assume that treatment assignment is based on a sharp cutoff ($c = \$5,000$) on the **Running Variable ($X$)**, which is total spend in the last year.

**So, we have the following variables:**

* **$X$ (Running Variable):** Total Spend in the last year.
* **$D$ (Treatment):** Platinum Status ($1$ if $X \geq \$5,000$, $0$ otherwise).
* **$Y$ (Outcome):** Total Spend in the Next Year.
* **$U$ (Unmeasured Confounders):** Customer characteristics (e.g., brand loyalty, personal income).

**What seems to be the problem here?** A naïve comparison of the future spending ($Y$) of Platinum members ($D=1$) with that of non-Platinum members ($D=0$) is severely biased. Customers who spent more in the past ($X$) are naturally predisposed to spend more in the future ($Y$), and unmeasured factors ($U$) such as loyalty or income drive both.
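To see how large the naïve bias can be, here is a minimal simulation sketch in Python (NumPy). The data-generating process, the assumed effect of \$300, and all coefficients are hypothetical choices for illustration, not estimates from real loyalty-program data: $U$ raises both last year's spend $X$ and next year's spend $Y$, and Platinum status is assigned sharply at $X \geq \$5,000$.

```python
import numpy as np

# Hypothetical data-generating process (all numbers are illustrative assumptions)
rng = np.random.default_rng(42)
n = 100_000
cutoff = 5_000.0
true_effect = 300.0  # assumed effect of Platinum status on next-year spend ($)

u = rng.normal(0, 1, n)                          # U: unmeasured income/loyalty
x = 4_000 + 1_000 * u + rng.normal(0, 800, n)    # X: last year's spend, driven by U
d = (x >= cutoff).astype(float)                  # D: Platinum status, sharp at $5,000
y = 500 + 0.8 * x + 600 * u + true_effect * d + rng.normal(0, 500, n)  # Y: next year's spend

# Naive comparison: mean Y of Platinum members minus mean Y of everyone else
naive = y[d == 1].mean() - y[d == 0].mean()
print(f"assumed effect: {true_effect:.0f}, naive difference: {naive:.0f}")
```

Because Platinum members have both higher $X$ and higher $U$ on average, the naïve difference in means absorbs both backdoor paths and lands far above the assumed effect.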
Let's look at the problem graphically. The following DAG ([[Directed acyclic graph]]) illustrates the initial [[Confounding|confounding]] problem in a naïve comparison. We will then show how the RD design pattern solves the problem *locally* by appealing to the continuity assumption, which we also explain below.

![[Local identification by the cutoff - 2025-11-03-15-14-05.png]]

Looks messy, right? Higher-income customers are likely to have spent more last year, and thus to have reached Platinum status, and they are also likely to spend more next year. This confounding leaves two open backdoor paths between $D$ and $Y$:

1. $D \leftarrow X \rightarrow Y$
2. $D \leftarrow U \rightarrow Y$

**Here's the solution:** The RD design pattern assumes that a customer who spent **\$4,999.99** is virtually identical to a customer who spent **\$5,000** in terms of unmeasured characteristics ($U$) and intrinsic spending behavior (the continuous part of $X$). The only difference is the treatment they received ($D$), which in this case is the Platinum status. By comparing the future spending ($Y$) of these two groups, we argue that the confounding paths are locally controlled. We then define the discontinuous jump in average future spending at the $\$5,000$ cutoff as the treatment effect (LATE) of the Platinum Status ($D$) for customers near the cutoff.

Let's see the solution in DAG form:

![[Local identification by the cutoff - 2025-11-03-15-54-12.png]]

Nice and simple! By comparing only individuals near the cutoff, we locally control for the continuous part of $X$ and for $U$, which we assume to be locally balanced around the cutoff.

**Why might this not work?** The RD design pattern depends entirely on the **continuity assumption** holding at the cutoff. If either the running variable's density (a sign of sorting) or the unobserved confounders are discontinuous at the cutoff, local identification fails.

1\.
**Discontinuous sorting (Manipulation of $X$)**

**Potential threat to validity:** Individuals can precisely manipulate the running variable $X$ to place themselves on the treated side of the cutoff.

* **In our example:** Customers intentionally spend an extra dollar (or cent) to hit the $\$5,000$ threshold and gain Platinum status.
* **Implication in DAG form:** The assumption that $U$ is balanced for $X \approx c$ is violated. Individuals just above the cutoff are systematically different (more motivated or strategic) from those just below it. Basically, the initial [[Confounding|confounding]] problem remains unsolved.
* **How to test it?** Plot the density of the running variable $X$. A significant, discontinuous jump in the number of individuals right at the cutoff indicates manipulation. We will discuss this further when we demonstrate the implementation of the design pattern.

2\. **Discontinuities in other (potentially unmeasured) variables**

**Potential threat to validity:** Another factor causes a discontinuous jump in the outcome $Y$ *exactly at the same cutoff* $c$, independently of the treatment $D$.

* **In our example:** A separate, unannounced "Premium Club" is also assigned at exactly $\$5,000$ and provides benefits that directly increase future spending ($Y$).
* **Implication in DAG form:** A new confounding path opens through an unobserved variable $V$ that also switches on at $c$ and **causes a discontinuity in $Y$ at** $c$. The measured jump in $Y$ is then a mix of $D \rightarrow Y$ and $V \rightarrow Y$.
* **How to test it?** Run the RD analysis on a set of **pre-treatment/placebo outcomes**. These cannot be affected by the later treatment $D$, so a jump in any of them at the cutoff indicates that the continuity assumption is likely violated. We will also discuss this further in our empirical demonstration.
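The local comparison and the two checks above can be sketched on simulated data. This is a minimal illustration under stated assumptions, not the implementation used later in the book: the hypothetical `rd_jump` fits a straight line on each side of the cutoff within a fixed bandwidth of \$500 (a uniform kernel) and returns the gap between the two fitted values at $c$, and the hypothetical `bin_ratio` is a crude stand-in for a formal density test. The placebo variable `tenure`, the manipulation scenario, and all coefficients are likewise assumptions.

```python
import numpy as np

def rd_jump(x, y, cutoff, bandwidth):
    """Sharp-RD sketch: local linear fit on each side (uniform kernel within the
    bandwidth); returns the gap between the two fitted values at the cutoff."""
    left = (x >= cutoff - bandwidth) & (x < cutoff)
    right = (x >= cutoff) & (x <= cutoff + bandwidth)
    # Center X at the cutoff so each fitted intercept is the prediction at X = c.
    # np.polyfit(..., 1) returns [slope, intercept].
    _, at_c_left = np.polyfit(x[left] - cutoff, y[left], 1)
    _, at_c_right = np.polyfit(x[right] - cutoff, y[right], 1)
    return at_c_right - at_c_left

def bin_ratio(x, cutoff, width):
    """Crude manipulation check: count just above vs. just below the cutoff."""
    below = np.sum((x >= cutoff - width) & (x < cutoff))
    above = np.sum((x >= cutoff) & (x < cutoff + width))
    return above / below

# Hypothetical simulated customers (all numbers are illustrative assumptions)
rng = np.random.default_rng(0)
n = 200_000
cutoff, bandwidth = 5_000.0, 500.0
true_effect = 300.0

u = rng.normal(0, 1, n)                           # unmeasured income/loyalty
x = 4_000 + 1_000 * u + rng.normal(0, 800, n)     # last year's spend
d = (x >= cutoff).astype(float)                   # Platinum status (sharp)
y = 500 + 0.8 * x + 600 * u + true_effect * d + rng.normal(0, 500, n)
tenure = 5 + 2 * u + rng.normal(0, 1, n)          # hypothetical pre-treatment placebo outcome

effect = rd_jump(x, y, cutoff, bandwidth)         # should be close to the assumed 300
placebo = rd_jump(x, tenure, cutoff, bandwidth)   # should be close to 0

# Threat 1, hypothetical manipulation: half of the customers within $200 below
# the cutoff top up their spend and land just above it
movers = (x >= cutoff - 200) & (x < cutoff) & (rng.random(n) < 0.5)
x_manip = np.where(movers, cutoff + rng.uniform(0, 50, n), x)

ratio_clean = bin_ratio(x, cutoff, 100)           # near 1: no bunching
ratio_manip = bin_ratio(x_manip, cutoff, 100)     # well above 1: bunching at the cutoff

print(f"RD estimate: {effect:.0f} (assumed {true_effect:.0f}), placebo jump: {placebo:.2f}")
print(f"bin ratio, clean: {ratio_clean:.2f}, manipulated: {ratio_manip:.2f}")
```

In practice, bandwidths are usually chosen in a data-driven way and manipulation is assessed with a formal density test (for example, with the rdrobust and rddensity packages); the fixed bandwidth and bin width here only keep the sketch short.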
**Before we wrap up, let's also discuss an extension of the RD design pattern:** What if the cutoff does not guarantee the treatment but only makes customers eligible to opt in to the Platinum status? This scenario introduces the *fuzzy* RD design, because compliance comes into the picture as a source of bias.

* **In our example:** The $\$5,000$ cutoff makes customers *eligible* for the Platinum Status, but they must explicitly **opt in**.
* **Implication in DAG form:** $U$ is locally balanced except for motivation. In this case, the **cutoff determines eligibility ($Z$)**, and **eligibility ($Z$) influences the actual treatment ($D$)** (Platinum Status), which in turn affects the outcome ($Y$).
* **Solution:** You may be surprised how straightforward the solution to this problem is: [[Design Pattern I - Instrumental Variable (IV)]]. If the **\$5,000 spend** only makes a customer *eligible* for the Platinum Status ($Z$), but they still have to manually *opt in* (actual treatment $D$), the cutoff creates a discontinuous jump only in the *probability* of treatment.

Let's see the problem in a DAG:

![[Local identification by the cutoff - 2025-11-04-13-17-49.png]]

$U$ is locally balanced except for motivation, because some eligible customers opt in while others don't (depending on how motivated they are). This imbalance is the same [[Confounding|confounding]] problem we discussed in [[Confounding by intention]]. The solution is also the same: **Eligibility ($Z$)**, based on $X \geq \$5,000$, acts as an **Instrumental Variable (IV)** for the actual **Platinum Status ($D$)**, and the design is now a *fuzzy* RDD. Using **Eligibility ($Z$)** as an [[Design Pattern I - Instrumental Variable (IV)|instrumental variable]] (IV), we isolate the effect of the actual Platinum Status on Total Spend in the next year:

![[Local identification by a cutoff - 2025-11-04-16-21-51.png]]

How so?
Because **Eligibility ($Z$)** meets all three requirements of an IV for this problem (see [[Data centricity in the IV pattern]] for a discussion of the IV assumptions):

* **Relevance ($Z \rightarrow D$ and $\text{Cor}(Z, D) \neq 0$):** Crossing the cutoff makes a customer eligible for the Platinum Status. In other words, the probability of treatment changes discontinuously at the cutoff, but treatment is not deterministically assigned.
* **Exclusion ($Z \not\rightarrow Y$ and $\text{Cor}(Z, Y | D, X) = 0$):** The cutoff affects the outcome ($Y$) *only* through its effect on the actual treatment ($D$): **Platinum Status**. In other words, mere *eligibility* for the Platinum Status (crossing the $\$5,000$ line) should not affect next year's spending ($Y$) other than by causing the customer to actually *become* a Platinum Status member ($D$). For example, if customers who cross the cutoff suddenly feel special and spend more than usual regardless of whether they opt in, the exclusion restriction is violated (which doesn't sound plausible!).
* **Exogeneity ($U \not\rightarrow Z$ and $\text{Cor}(Z, U) = 0$):** Unobserved confounders ($U$) of the Platinum Status ($D$) and next year's spending ($Y$) are unrelated to the assignment of eligibility ($Z$). This is related to the core RD assumption that individuals infinitesimally close to the cutoff are exchangeable with respect to their unobserved characteristics.

Cool solution to the compliance problem, isn't it? This is a good time to stop our conceptual discussion and get our hands on data in [[Data and conceptual model (RD)|Data and conceptual model]]!

> [!info]- Last updated: November 4, 2025
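
The fuzzy case can be sketched the same way. Under the IV logic of the fuzzy RD design, the effect is the jump in average next-year spend at the cutoff divided by the jump in the share of customers who actually hold Platinum status. The simulation below is a hypothetical illustration, not the book's implementation: the helper `local_gap`, the logistic opt-in model, the `motivation` variable, the bandwidth, and all coefficients are assumptions, and the treatment effect is set to be the same for everyone so that the ratio can recover it.

```python
import numpy as np

def local_gap(x, v, cutoff, bandwidth):
    """Gap at the cutoff between local linear fits of v on each side (uniform kernel)."""
    left = (x >= cutoff - bandwidth) & (x < cutoff)
    right = (x >= cutoff) & (x <= cutoff + bandwidth)
    # np.polyfit(..., 1) returns [slope, intercept]; centering makes the intercept the value at c
    _, at_c_left = np.polyfit(x[left] - cutoff, v[left], 1)
    _, at_c_right = np.polyfit(x[right] - cutoff, v[right], 1)
    return at_c_right - at_c_left

# Hypothetical simulated customers (all numbers are illustrative assumptions)
rng = np.random.default_rng(1)
n = 200_000
cutoff, bandwidth = 5_000.0, 500.0
true_effect = 300.0

u = rng.normal(0, 1, n)                          # unmeasured income/loyalty
x = 4_000 + 1_000 * u + rng.normal(0, 800, n)    # last year's spend
z = (x >= cutoff).astype(float)                  # eligibility (the instrument)
motivation = rng.normal(0, 1, n)                 # drives both opt-in and spending
p_opt_in = 1 / (1 + np.exp(-(0.5 + 1.5 * motivation)))
d = z * (rng.random(n) < p_opt_in)               # only eligible customers can opt in
y = (500 + 0.8 * x + 600 * u + 400 * motivation
     + true_effect * d + rng.normal(0, 500, n))

# Fuzzy RD (Wald-type ratio): jump in Y divided by jump in P(D = 1) at the cutoff
jump_y = local_gap(x, y, cutoff, bandwidth)
jump_d = local_gap(x, d, cutoff, bandwidth)
fuzzy = jump_y / jump_d

# For contrast: opt-ins vs. non-opt-ins among customers just above the cutoff
above = (x >= cutoff) & (x < cutoff + 100)
as_treated = y[above & (d == 1)].mean() - y[above & (d == 0)].mean()

print(f"jump in P(D=1): {jump_d:.2f}, fuzzy RD: {fuzzy:.0f}, "
      f"as-treated: {as_treated:.0f} (assumed {true_effect:.0f})")
```

Because motivated customers both opt in more often and spend more, the as-treated comparison overstates the effect, while the ratio of the two jumps recovers the assumed \$300, which here is the effect for compliers near the cutoff.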