Design Pattern II - Regression Discontinuity (RD) - Causal Book: Design Patterns in Causal Inference

## Causal Design Pattern II: Regression Discontinuity

#### Problem:

Suppose we want to measure the causal effect of a **"Platinum Status"** (treatment) on **customers' future spending** (outcome). The status assignment is determined by a customer's preceding spending. Let's say the assignment rule is: **Platinum Status** is awarded only to customers who spent at least $5,000 in the past year.

The challenge is that customers who spent more are inherently different from those who spent less (for example, they may have a lower price elasticity of demand, higher brand loyalty, or higher average purchase frequency). A simple comparison between the two groups would suffer from severe [[Selection Bias|selection bias]].

The regression discontinuity (RD) design is suited to address situations where:

1. Selection bias is expected due to the non-random assignment of the treatment. The treatment group (awarded Platinum Status) and control group (not awarded) represent different populations.
2. This selection bias arises from unobserved confounding variables (e.g., brand loyalty, income) that are expected to affect both the assignment variable (preceding spending) and the outcome (future spending).

#### Solution:

Use the (in this case, sharp) cutoff ($5,000) to create a comparison between individuals who fall just below and just above the threshold. This approach relies on the principle of *local randomization*, which assumes that individuals *right* around the cutoff are essentially comparable. For example, the difference between a customer who spent $4,999 and one who spent $5,000 is considered quasi-random, thus isolating the causal effect of **Platinum Status** from the effect of inherent spending differences.

The causal effect of the intervention is estimated by comparing the average outcome (future spending) immediately to the left and immediately to the right of the cutoff. This discontinuity in the outcome variable is attributed to the treatment.
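The comparison at the cutoff can be sketched in code. The example below is a minimal illustration on simulated data, not a definitive implementation: the variable names (`past_spend`, `future_spend`), the data-generating process, the true effect of $300, the $500 bandwidth, and the uniform (unweighted) window are all assumptions made for this sketch. It fits a separate line on each side of the cutoff within the window and reads the treatment effect off the jump between the two fitted lines at $5,000.

```python
import numpy as np


def sharp_rd_estimate(running, outcome, cutoff, bandwidth):
    """Local linear sharp-RD estimate: the jump in the fitted outcome at the cutoff."""
    x = np.asarray(running, dtype=float) - cutoff  # center the running variable
    y = np.asarray(outcome, dtype=float)

    # Keep only observations within the bandwidth on either side of the cutoff.
    window = np.abs(x) <= bandwidth
    x, y = x[window], y[window]
    treated = (x >= 0).astype(float)

    # Columns: intercept, jump at cutoff, slope below cutoff, slope change above.
    design = np.column_stack([np.ones_like(x), treated, x, treated * x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]  # coefficient on the treatment indicator


# Simulated data (hypothetical): future spending rises smoothly with past
# spending, plus a jump of 300 for customers with Platinum Status.
rng = np.random.default_rng(0)
n = 20_000
past_spend = rng.uniform(3_000, 7_000, size=n)
platinum = past_spend >= 5_000  # sharp assignment rule
true_effect = 300.0
future_spend = (
    1_000
    + 0.5 * past_spend
    + true_effect * platinum
    + rng.normal(0, 200, size=n)
)

# Naive comparison of group means mixes the treatment effect with the
# underlying spending differences between the two groups.
naive_gap = future_spend[platinum].mean() - future_spend[~platinum].mean()

# RD comparison uses only customers near the cutoff.
rd_effect = sharp_rd_estimate(past_spend, future_spend, cutoff=5_000, bandwidth=500)

print(f"Naive difference in means: {naive_gap:.0f}")
print(f"RD estimate at the cutoff: {rd_effect:.0f}")
```

In this simulation the naive difference in means is far larger than the simulated effect, because it also picks up the smooth relationship between past and future spending, while the RD estimate should land close to it. The bandwidth here is picked by hand; in applied work it is usually chosen with a data-driven procedure, and results should be checked for sensitivity to that choice.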
See [[Data and conceptual model (RD)|Data and conceptual model]] for a conceptual explanation.

#### Requirements:

For the RD design to yield an unbiased estimate of the local treatment effect at the cutoff, the following requirements must be met:

1. The treatment assignment must be a perfectly deterministic function of the running variable, which is the assignment variable (here, preceding spending) (Sharp RD), or the probability of receiving the treatment must change discontinuously at the cutoff (Fuzzy RD).
2. In the absence of treatment, the expected outcome and all confounding variables must vary continuously with the running variable at the cutoff. This implies that there is no other intervention or event that also changes abruptly at the same threshold.
3. Customers must not be able to precisely manipulate their running variable value (preceding spending) to sort themselves into or out of the treatment group (**gaining Platinum Status**).

#### Challenges:

For various reasons, the design pattern may not work as intended.

A key challenge is the *manipulation* of the running variable. If customers are aware of the $5,000 threshold and strategically adjust their spending in the preceding year to qualify for **Platinum Status**, the *local randomization* assumption is violated. Such sorting shows up as bunching of observations just above the cutoff: customers on either side of it are no longer comparable, and the estimates are biased. This is discussed in detail in [[Testing RD for manipulation]].

Another challenge is *external validity*. The causal effect estimated is strictly local; it is the effect *at the $5,000 cutoff*. There is no guarantee that the effect of **Platinum Status** for a customer who spent $10,000 will be the same as the effect for a customer who spent $5,000. The causal inference is only valid for those near the threshold.

#### Mathematical background explained using DAGs:

If you are quantitatively savvy, see [[Local identification by a cutoff]] for a summary of the design pattern along with the challenges described above.

> [!info]- Last updated: November 4, 2025