Replication of RD in Python - Causal Book: Design Patterns in Causal Inference

#### Replication of the naïve model In what follows, we replicate the naïve model in [[Statistical modeling of RD]] using Python's *[[PyFixest]]* library. ```python import pandas as pd import pyfixest as pf df = pd.read_csv('synthetic-loyalty-data-for-rd_v100.csv') df.columns = df.columns.str.lower() # standardize column names pf.feols("future_spending ~ status_platinum", data = df).summary() ``` The results are as follows: ```python Estimation: OLS Dep. var.: future_spending, Fixed effects: 0 Inference: iid Observations: 50000 | Coefficient | Estimate | Std. Error | t value | Pr(>|t|) | 2.5% | 97.5% | |:----------------|-----------:|-------------:|----------:|-----------:|---------:|---------:| | Intercept | 8414.359 | 8.213 | 1024.471 | 0.000 | 8398.260 | 8430.457 | | status_platinum | 2101.010 | 10.122 | 207.561 | 0.000 | 2081.170 | 2120.850 | --- RMSE: 1073.419 R2: 0.463 ``` The results are identical to the R model in [[Statistical modeling of RD]]: Platinum customers spend approximately \$2,101 more than non-Platinum customers in the following year on average. The standard errors also match because both use IID standard errors. ##### Replication of the RD model We now replicate the (Sharp) RD estimation using the Python port of `rdrobust`:[^1] ```python from rdrobust import rdrobust rd = rdrobust(y = df['future_spending'], x = df['past_spending'], c = 10000) print(rd) ``` The results are as follows: ```python Call: rdrobust Number of Observations: 50000 Polynomial Order Est. (p): 1 Polynomial Order Bias (q): 2 Kernel: Triangular Bandwidth Selection: mserd Var-Cov Estimator: NN Left Right ------------------------------------------------ Number of Observations 17081 32919 Number of Unique Obs. 16816 32487 Number of Effective Obs. 6697 6838 Bandwidth Estimation 1849.278 1849.278 Bandwidth Bias 3385.648 3385.648 rho (h/b) 0.546 0.546 Method Coef. S.E. t-stat P>|t| 95% CI ------------------------------------------------------------------------- Conventional 1567.178 38.508 40.697 0.000e+00 [1491.703, 1642.653] Robust - - 36.225 2.452e-287 [1496.639, 1667.855] ``` The point estimate (\$1,567), bandwidth (\$1,849), and robust confidence interval all match the R results. This is expected: both the R and Python versions of `rdrobust` are maintained by the same authors and use the same underlying algorithms. Note that the Python implementation presents the output in a slightly different layout than the R version: the bandwidth selection details come first, followed by a two-row table showing the Conventional and Robust estimates side by side. ##### Replication of the diagnostic tests **McCrary density test** ```python from rddensity import rddensity dens = rddensity(X = df['past_spending'], c = 10000) print(dens.test) ``` The results are as follows: ```python t_asy NaN t_jk -0.957637 p_asy NaN p_jk 0.338246 dtype: float64 ``` The jackknife-based test statistic (`t_jk = -0.96`) and p-value (`p_jk = 0.34`) match the R result exactly. Unlike the R version, the default `print(dens)` in Python does not show the test result, so we access it through the `.test` attribute. The `_asy` entries are `NaN` here because the asymptotic variance is not computed by default (the default `vce='jackknife'`); they would populate if `vce='plugin'` were specified. **Covariate balance** ```python covariates = ['age_years', 'total_bookings', 'avg_booking_value', 'pct_international_trips'] for var in covariates: rd_cov = rdrobust(y = df[var], x = df['past_spending'], c = 10000) print(f"{var:30s} Estimate: {rd_cov.coef.iloc[0, 0]:8.3f} P-value: {rd_cov.pv.iloc[0, 0]:.3f}") ``` The results are as follows: ```python age_years Estimate: 0.387 P-value: 0.326 total_bookings Estimate: 0.028 P-value: 0.582 avg_booking_value Estimate: -42.269 P-value: 0.223 pct_international_trips Estimate: -0.007 P-value: 0.255 ``` All covariates are balanced at the cutoff, consistent with the R results. ##### RD plot The Python `rdrobust` package also includes `rdplot` for visualization: ```python from rdrobust import rdplot rdplot(y = df['future_spending'], x = df['past_spending'], c = 10000, x_label = "Past Spending ($)", y_label = "Future Spending ($)", title = "") ``` The resulting plot should be visually identical to the R version shown in [[Statistical modeling of RD]] except for the aesthetics. > [!NOTE] > The Python port of `rdrobust` uses the same bandwidth selection, kernel functions, and bias-correction procedures as the R version. Unlike the [[Replication of IV in Python|IV replication]], where differences in small sample corrections between `fixest` and `PyFixest` led to slight discrepancies in standard errors, the `rdrobust` outputs should be numerically identical across the two languages because the package does not involve the same clustering or degrees-of-freedom adjustments. [^1]: The Python packages `rdrobust` and `rddensity` can be installed via `pip install rdrobust rddensity`. > [!info]- Last updated: April 17, 2026