#### How to implement the design pattern using statistics:
The standard approach to estimating the treatment effect in a regression discontinuity design is local polynomial regression at the cutoff. Unlike the parametric models used in the [[Statistical modeling of IV|IV design pattern]], the RD design pattern benefits from a nonparametric approach that avoids assumptions about the functional form of the relationship between the running variable and the outcome far from the cutoff. We will use the _rdrobust_ library, the workhorse package for nonparametric RD estimation with data-driven bandwidth selection.[^1]
###### Using R
Prior to the RD model, we fit a naïve model to seek the raw difference in future spending between customers who received Platinum Status and those who did not. Let's load and model the data generated in [[Data and conceptual model (RD)]]:[^2]
```r
df <- readr::read_csv('synthetic-loyalty-data-for-rd_v100.csv')
names(df) <- tolower(names(df)) # standardize column names
fixest::etable(fixest::feols(future_spending ~ status_platinum, data = df))
fixest::feols(fu..
Dependent Var.: future_spending
Constant 8,414.4*** (8.213)
status_platinumTRUE 2,101.0*** (10.12)
___________________ __________________
S.E. type IID
Observations 50,000
R2 0.46285
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
The naïve results show that Platinum customers spend approximately \$2,101 more than non-Platinum customers in the following year on average. We cannot trust this estimate because of the [[Selection bias|selection bias]] discussed in [[Design Pattern II - Regression Discontinuity (RD)]]: customers with higher past spending are systematically different from those with lower past spending (e.g., higher income, stronger brand loyalty), and these same characteristics independently drive future spending. The naïve estimate bundles the true effect of Platinum Status with the inherent spending differences between the two populations. Note that the intercept (\$8,414) represents the average future spending of non-Platinum customers.
Next, we turn to the RD design pattern to *locally* isolate the causal effect at the cutoff, as discussed in [[Local identification by a cutoff]]. Local polynomial regression estimates the relationship between past spending and future spending on each side of the cutoff separately, and the treatment effect is the discontinuous jump in the fitted values at the cutoff. The key decision is the **bandwidth**: how far from the cutoff should we look? Too wide, and we include customers who are fundamentally different; too narrow, and we lose statistical power. The `rdrobust` library solves this by selecting an [[MSE-optimal bandwidth]] automatically:
```r
library(rdrobust)
rd <- rdrobust(y = df$future_spending, x = df$past_spending, c = 10000)
summary(rd)
```
The results are as follows:
```r
Sharp RD estimates using local polynomial regression.
Number of Obs. 50000
BW type mserd
Kernel Triangular
VCE method NN
Number of Obs. 17081 32919
Eff. Number of Obs. 6697 6838
Order est. (p) 1 1
Order bias (q) 2 2
BW est. (h) 1849.278 1849.278
BW bias (b) 3385.648 3385.648
rho (h/b) 0.546 0.546
Unique Obs. 16816 32487
=====================================================================
Point Robust Inference
Estimate z P>|z| [ 95% C.I. ]
---------------------------------------------------------------------
RD Effect 1567.178 36.225 0.000 [1496.639 , 1667.855]
=====================================================================
```
The estimated treatment effect at the cutoff is approximately \$1,567: Platinum Status increases future spending by this amount for customers near the \$10,000 threshold. This is the [[Local Average Treatment Effect]] (LATE) at the cutoff, and it is close to the ground truth $\tau = \$1,500$ defined in the [[Synthetic data for the RDD|data generation process]]. Compared to the naïve estimate of \$2,101, the RD estimate is substantially lower because it removes the bias from comparing inherently different populations.
> [!NOTE]
> The output reports the point estimate (the conventional local polynomial estimate) and robust inference (the z-statistic, p-value, and confidence interval that account for the additional uncertainty introduced by bias correction). The robust confidence interval is the recommended one for inference.[^3]
>
> The [[MSE-optimal bandwidth]] selected here is approximately \$1,849 on each side of the cutoff. This means the estimation uses customers who spent between roughly \$8,151 and \$11,849 in the past year. The effective number of observations within this window is about 13,535 (out of 50,000). You can override the bandwidth with `rdrobust(..., h = 1500)` or change the bandwidth selection method with `rdrobust(..., bwselect = 'cerrd')` for a coverage error rate (CER) optimal bandwidth, which is typically narrower. For details on how the bandwidth affects the estimates, see [[Oh my! Bandwidth sensitivity in the RD model]].
The following plot provides a visual summary of the RD design. The binned scatter plot shows the average future spending within evenly spaced bins of past spending, along with fitted local polynomials on each side of the cutoff:
```r
p <- rdplot(y = df$future_spending, x = df$past_spending, c = 10000, hide = TRUE)
bins <- p$vars_bins
bins$side <- ifelse(bins$rdplot_mean_x < 10000, "below", "above")
poly <- p$vars_poly
poly$side <- ifelse(poly$rdplot_x < 10000, "below", "above")
ggplot2::ggplot() +
geom_point(data = bins, aes(x = rdplot_mean_x, y = rdplot_mean_y, color = side),
size = 2, alpha = 0.7) +
geom_line(data = poly, aes(x = rdplot_x, y = rdplot_y, color = side),
linewidth = 1) +
geom_vline(xintercept = 10000, linetype = "dashed", color = "grey50") +
scale_color_manual(values = c("below" = "#E74C3C", "above" = "#2980B9")) +
labs(x = "Past Spending ($)", y = "Future Spending ($)") +
guides(color = "none") +
theme_minimal(base_size = 14) +
theme(panel.grid.minor = element_blank(), legend.position = "top", text = element_text(size = 20))
```
![[statistical-modeling-of-RD-2026-04-12-20-24-47.png]]
The visible jump in the fitted values at \$10,000 is the treatment effect. The dots on each side follow a smooth upward trend, confirming that the relationship between past and future spending is continuous except at the cutoff, where the Platinum Status is assigned.
**Diagnostic tests**
To validate the results, we will run two diagnostics tests. First, the McCrary density test checks whether there is an unusual concentration of observations right at the cutoff, which would suggest that customers manipulated their spending to qualify for Platinum Status (discussed in [[Local identification by a cutoff]]):
```r
library(rddensity)
dens <- rddensity(X = df$past_spending, c = 10000)
summary(dens)
Manipulation testing using local polynomial density estimation.
Number of obs = 50000
Model = unrestricted
Kernel = triangular
BW method = estimated
VCE method = jackknife
c = 10000 Left of c Right of c
Number of obs 17081 32919
Eff. Number of obs 5462 6344
Order est. (p) 2 2
Order bias (q) 3 3
BW est. (h) 1492.155 1726.459
Method T P > |T|
Robust -0.9576 0.3382
```
The p-value of the manipulation test (0.34) is well above conventional significance levels, so we fail to reject the null hypothesis stating that the density of the running variable is continuous at the cutoff. The results pass the first test.
Second, we check whether observable covariates exhibit a discontinuity at the cutoff. If the RD design is valid, pre-treatment characteristics should be balanced across the cutoff, meaning there should be no jump in age, bookings, or travel patterns at \$10,000:
```r
covariates <- c('age_years', 'total_bookings', 'avg_booking_value', 'pct_international_trips')
balance <- do.call(rbind, lapply(covariates, function(var) {
rd_cov <- rdrobust(y = df[[var]], x = df$past_spending, c = 10000)
data.frame(Covariate = var,
Estimate = round(rd_cov$coef[1], 3),
P_value = round(rd_cov$pv[1], 3))
}))
balance
Covariate Estimate P_value
1 age_years 0.387 0.326
2 total_bookings 0.028 0.582
3 avg_booking_value -42.269 0.223
4 pct_international_trips -0.007 0.255
```
None of the covariates show a statistically significant discontinuity at the cutoff. This is consistent with the validity of the RD design: customers just below and just above \$10,000 are comparable in their observable characteristics, and the only systematic difference is the Platinum Status assignment.
> [!NOTE]
> We estimated the Sharp RD above. For the Fuzzy RD case, where the cutoff determines only *eligibility* rather than guaranteed treatment (discussed in [[Local identification by a cutoff]]), `rdrobust` can be extended with the `fuzzy` argument: `rdrobust(..., fuzzy = df$actual_treatment)`. The fuzzy RD estimate is numerically equivalent to a local Wald estimator, which is the ratio of the reduced-form jump to the first-stage jump at the cutoff, the same identification strategy as in the [[Design Pattern I - Instrumental Variable (IV)|IV design pattern]].
Now, click [[Replication of RD in Python|here]] to implement the same model in Python!
[^1]: Calonico, S., Cattaneo, M. D., & Titiunik, R. (2014). Robust nonparametric confidence intervals for regression-discontinuity designs. Econometrica, 82(6), 2295-2326. And Calonico, S., Cattaneo, M. D., Farrell, M. H., & Titiunik, R. (2017). rdrobust: Software for regression-discontinuity designs. The Stata Journal, 17(2), 372-404.
[^2]: All data files used in this book can be found in the book's [GitHub repository](https://github.com/causalbook).
[^3]: For a concise guide to the bias-correction and robust inference procedure, see Cattaneo, M. D., Idrobo, N., & Titiunik, R. (2020). A practical introduction to regression discontinuity designs: Foundations. Cambridge University Press.
> [!info]- Last updated: April 12, 2026