Oh my! Bandwidth sensitivity in the RD model - Causal Book: Design Patterns in Causal Inference

The estimate from [[Statistical modeling of RD|Chapter 2.2.2]] is \$1,567 at the [[MSE-optimal bandwidth|MSE-optimal bandwidth]] of \$1,849. But what does the estimate look like at \$500? At \$5,000? The bandwidth determines which customers are "near the cutoff" and which are too far away to matter, and it is the single most consequential choice in [[rdrobust]]. The [[MSE-optimal bandwidth|MSE-optimal selector]] picks $h$ to minimize the mean squared error of the estimator, but it does not know the *right* answer: it only picks the estimated best trade-off between bias and variance.

We'll rerun the same `rdrobust` call over a grid of manual bandwidths:

```r
library(rdrobust)

# Manual bandwidths, in dollars of past spending on each side of the cutoff
manual_h <- c(500, 1000, 2000, 3000, 4000, 5000)

# Fit the RD at a fixed bandwidth h and collect the key quantities.
# Index 1 = conventional, index 3 = robust (in rd$coef, rd$se, rd$ci).
run_h <- function(h) {
  rd <- rdrobust(y = df$future_spending, x = df$past_spending, c = 10000, h = h)
  data.frame(h = h, coef_conv = rd$coef[1], coef_rob = rd$coef[3],
             se_rob = rd$se[3], ci_lo = rd$ci[3, 1], ci_hi = rd$ci[3, 2],
             n_eff = sum(rd$N_h))  # observations inside the bandwidth, both sides
}

bw_table <- do.call(rbind, lapply(manual_h, run_h))
```

We then add the two data-driven selectors, [[MSE-optimal bandwidth|MSE-optimal]] (`bwselect = 'mserd'`) and [[CER-optimal bandwidth|CER-optimal]] (`bwselect = 'cerrd'`), to the same table:

```r
spec             h coef_conv coef_rob se_rob   ci_lo   ci_hi n_eff
h = $500     500.0   1629.93  1758.08 101.69 1558.76 1957.40  3689
h = $1000   1000.0   1555.51  1635.13  74.16 1489.78 1780.49  7432
CER-optimal 1076.6   1553.39  1558.45  51.53 1457.45 1659.46  8005
MSE-optimal 1849.3   1567.18  1582.25  43.68 1496.64 1667.85 13535
h = $2000   2000.0   1568.20  1547.81  53.42 1443.11 1652.50 14608
h = $3000   3000.0   1542.43  1583.07  44.19 1496.46 1669.68 21449
h = $4000   4000.0   1519.70  1573.69  38.61 1498.01 1649.36 27259
h = $5000   5000.0   1509.84  1550.62  34.82 1482.38 1618.86 32226
```

Reading the manual rows top to bottom reveals the bias-variance trade-off: the standard error shrinks from \$102 at $h=\$500$ to \$35 at $h=\$5,000$ as the effective sample size grows from 3,689 to 32,226.
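
The shrinking standard errors can be compared with the familiar $1/\sqrt{n}$ rate. The sketch below copies `se_rob` and `n_eff` for the six manual rows from the table and checks whether `se_rob * sqrt(n_eff)` stays roughly constant. The variable names are illustrative and not part of the chapter's code, and the $1/\sqrt{n}$ benchmark is a rough heuristic, not the exact variance formula `rdrobust` uses.

```r
# Robust standard errors and effective sample sizes for the manual rows,
# copied from the sensitivity table.
h      <- c(500, 1000, 2000, 3000, 4000, 5000)
se_rob <- c(101.69, 74.16, 53.42, 44.19, 38.61, 34.82)
n_eff  <- c(3689, 7432, 14608, 21449, 27259, 32226)

# If the standard error scaled purely as 1 / sqrt(n),
# se * sqrt(n) would be constant across bandwidths.
scaled <- se_rob * sqrt(n_eff)

# Observed vs. predicted shrinkage between the narrowest and widest window.
observed_ratio  <- se_rob[1] / se_rob[length(se_rob)]
predicted_ratio <- sqrt(n_eff[length(n_eff)] / n_eff[1])

print(data.frame(h = h, se_rob = se_rob, n_eff = n_eff,
                 se_x_sqrt_n = round(scaled)))
cat(sprintf("observed SE ratio: %.2f, 1/sqrt(n) prediction: %.2f\n",
            observed_ratio, predicted_ratio))
```

The two ratios come out close (about 2.92 observed against 2.96 predicted), which suggests that most of the precision gained by widening the window is a plain sample-size effect.
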
The endpoints show a downward drift: the robust estimate falls from \$1,758 to \$1,551, and the conventional estimate from \$1,630 to \$1,510. Every robust CI except the one at $h = \$500$, whose lower limit is \$1,559, covers the ground truth $\tau = \$1,500$. Visualizing the manual rows:

![[oh-my-bandwidth-sensitivity.png]]

Two things to note.

First, the [[MSE-optimal bandwidth|MSE-optimal]] $h = \$1,849$ and the [[CER-optimal bandwidth|CER-optimal]] $h = \$1,077$ differ by a factor of about 1.7, and they should: the MSE selector minimizes the MSE of the point estimate, while the CER selector minimizes the coverage error of the confidence interval. They optimize different objectives. The CER-optimal bandwidth is narrower because tighter coverage requires lower bias, which requires fewer observations far from the cutoff.

Second, the estimate at $h = \$500$ is not particularly unstable. Its CI [\$1,559, \$1,957] is wide and narrowly misses $\tau$, but it overlaps every other interval in the table, so the inference is internally consistent. What the small-$h$ row shows is that you can always shrink the bandwidth to make the estimate "more local," and the cost is paid in standard errors. The MSE-optimal row hits the balance where the bias added by including one more observation away from the cutoff equals the variance reduction it provides.

> [!NOTE]
> The conventional point estimate at $h = \$2,000$ (\$1,568.20) is essentially identical to the MSE-optimal point at $h = \$1,849$ (\$1,567.18). The robust-column points disagree by a wider margin (\$1,547.81 vs \$1,582.25), and the reason is the [[Oh my! The point estimate is not centered in the CI|bias-correction term in the robust calculation]]: the bias estimate is itself bandwidth-dependent, and the sign of the correction can flip across nearby bandwidths.

The practical guidance follows the empirical pattern. Report the [[MSE-optimal bandwidth|MSE-optimal]] estimate as your primary result, but also show a sensitivity table like the one above in a footnote or appendix to demonstrate that the conclusion is not an artifact of the bandwidth choice.
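
One way to summarize such a sensitivity table is to ask whether each row tells the same story as the primary estimate. The sketch below is a minimal illustration using the robust columns copied from the table. The three checks (interval excludes zero, interval contains the MSE-optimal estimate, percentage deviation from it) are assumptions chosen for illustration, not a procedure prescribed by `rdrobust` or by Calonico et al.

```r
# Robust estimates and confidence limits, copied from the sensitivity table.
bw <- data.frame(
  spec     = c("h = 500", "h = 1000", "CER-optimal", "MSE-optimal",
               "h = 2000", "h = 3000", "h = 4000", "h = 5000"),
  coef_rob = c(1758.08, 1635.13, 1558.45, 1582.25,
               1547.81, 1583.07, 1573.69, 1550.62),
  ci_lo    = c(1558.76, 1489.78, 1457.45, 1496.64,
               1443.11, 1496.46, 1498.01, 1482.38),
  ci_hi    = c(1957.40, 1780.49, 1659.46, 1667.85,
               1652.50, 1669.68, 1649.36, 1618.86)
)

# Primary result: the robust estimate at the MSE-optimal bandwidth.
primary <- bw$coef_rob[bw$spec == "MSE-optimal"]

# Does the qualitative conclusion (a positive effect) survive every bandwidth?
bw$excludes_zero <- bw$ci_lo > 0

# Does each interval contain the primary estimate?
bw$covers_primary <- bw$ci_lo <= primary & primary <= bw$ci_hi

# Deviation of each robust estimate from the primary one, in percent.
bw$pct_from_primary <- round(100 * (bw$coef_rob - primary) / primary, 1)

print(bw[, c("spec", "coef_rob", "excludes_zero",
             "covers_primary", "pct_from_primary")])
```

In this table every interval excludes zero and contains the MSE-optimal estimate; the largest deviation is the $h = \$500$ row, at roughly 11%.
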
If the conclusion changes when you halve or double the bandwidth, the design is fragile and the local-randomization story does not actually hold; if the conclusion is stable, you have evidence that you are estimating a real effect at the cutoff.

Calonico et al. (2017) give the canonical recommendation: always report the [[MSE-optimal bandwidth|MSE-optimal]] estimate, and supplement it with the [[CER-optimal bandwidth|CER-optimal]] estimate when the coverage of the confidence interval is the focus rather than the point estimate.[^1]

[^1]: Calonico, S., Cattaneo, M. D., Farrell, M. H., & Titiunik, R. (2017). rdrobust: Software for regression-discontinuity designs. *The Stata Journal*, 17(2), 372-404.

> [!info]- Last updated: May 13, 2026