Replication using linearmodels - Causal Book: Design Patterns in Causal Inference

##### Replication of the naïve model

The following replicates the naïve model in [[Statistical modeling of IV]] using Python's *linearmodels* library:

```python
import pandas as pd
from linearmodels import PanelOLS

# Load the data
df = pd.read_csv('cpi-policy-lobbyist-as-iv_v202.csv')

# Convert date to datetime
df['date'] = pd.to_datetime(df['date'])

# Set state and date as index
df = df.set_index(['state', 'date'])

# Create the panel regression model
model = PanelOLS.from_formula('consumer_price_index ~ esworker_health_bill + EntityEffects + TimeEffects', data=df)

# Fit the model with clustered standard errors by state
results = model.fit(cov_type='clustered', cluster_entity=True)

# Display the results summary
print(results)
```

```text
                           PanelOLS Estimation Summary
==================================================================================
Dep. Variable:     consumer_price_index   R-squared:                        0.1447
Estimator:                     PanelOLS   R-squared (Between):              0.0028
No. Observations:                   302   R-squared (Within):               0.0652
Cov. Estimator:               Clustered   R-squared (Overall):              0.0035
                                          Log-likelihood                   -831.55
                                          F-statistic:                      35.198
Entities:                             8   P-value                           0.0000
Avg Obs:                         37.750   Distribution:                   F(1,208)
Min Obs:                         17.000
Max Obs:                         86.000   F-statistic (robust):             8.2824
                                          P-value                           0.0044
Time periods:                        86   Distribution:                   F(1,208)
Avg Obs:                         3.5116
Min Obs:                         1.0000
Max Obs:                         6.0000

                                  Parameter Estimates
========================================================================================
                      Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
----------------------------------------------------------------------------------------
esworker_health_bill     10.683     3.7121     2.8779     0.0044      3.3649      18.001
========================================================================================

F-test for Poolability: 91.695
P-value: 0.0000
Distribution: F(92,208)

Included effects: Entity, Time
```

The point estimate is identical to the one from *fixest*; the clustered standard error is slightly smaller (3.71 vs. 3.90).
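The `EntityEffects + TimeEffects` terms in the formula above absorb the state and date fixed effects, whereas the IV replication that follows spells the same effects out as dummy variables. The two approaches give the same slope. The sketch below illustrates this on a small synthetic, balanced panel; the data, the sizes, and the helper names `group_means` and `two_way_demean` are assumptions made for illustration and are not part of the CPI dataset or of *linearmodels*. The real panel is unbalanced (17 to 86 observations per state), so the one-pass demeaning shown here would not be exact for it; the dummy-variable form still applies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical balanced panel: 4 entities observed over 6 periods.
n_entities, n_periods = 4, 6
entity = np.repeat(np.arange(n_entities), n_periods)
period = np.tile(np.arange(n_periods), n_entities)

# Synthetic regressor and outcome with entity and time effects (true slope = 2).
x = rng.normal(size=entity.size) + 0.5 * entity
y = 2.0 * x + 3.0 * entity - 1.0 * period + rng.normal(scale=0.1, size=entity.size)


def group_means(values, groups):
    """Return, for each observation, the mean of its group."""
    sums = np.bincount(groups, weights=values)
    counts = np.bincount(groups)
    return (sums / counts)[groups]


def two_way_demean(values):
    """Two-way within transformation (exact only for a balanced panel)."""
    return (values
            - group_means(values, entity)
            - group_means(values, period)
            + values.mean())


# Within estimator: regress demeaned y on demeaned x.
x_w, y_w = two_way_demean(x), two_way_demean(y)
beta_within = float(x_w @ y_w / (x_w @ x_w))

# Dummy-variable (LSDV) estimator: constant, x, and entity/period dummies
# with the first level of each dropped to avoid collinearity.
entity_dummies = (entity[:, None] == np.arange(1, n_entities)).astype(float)
period_dummies = (period[:, None] == np.arange(1, n_periods)).astype(float)
design = np.column_stack([np.ones(entity.size), x, entity_dummies, period_dummies])
beta_lsdv = float(np.linalg.lstsq(design, y, rcond=None)[0][1])

# The two slopes agree up to floating-point error.
print(beta_within, beta_lsdv)
```

Only the point estimates are compared here. Standard errors can still differ between the absorbed and dummy-variable forms, because the degrees-of-freedom accounting depends on how each library counts the fixed effects.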
##### Replication of the IV model

The following replicates the IV model in [[Statistical modeling of IV]] using Python's *linearmodels* library:

```python
import pandas as pd
import numpy as np
from linearmodels import IV2SLS

# Load the data
df = pd.read_csv('cpi-policy-lobbyist-as-iv_v202.csv')

# Convert date to datetime
df['date'] = pd.to_datetime(df['date'])

# Create dummy variables for state and time fixed effects
state_dummies = pd.get_dummies(df['state'], prefix='state', drop_first=True)
time_dummies = pd.get_dummies(df['date'], prefix='time', drop_first=True)

# Combine the original dataframe with the dummy variables
df_with_dummies = pd.concat([df, state_dummies, time_dummies], axis=1)

# Set up the dependent variable, endogenous variable, and instrument
y = df_with_dummies['consumer_price_index']
X = df_with_dummies['esworker_health_bill']
Z = df_with_dummies['n_of_regis_lobbyists']

# Create a list of exogenous variables (including the constant and fixed effects)
exog = pd.concat([pd.Series(1, index=df_with_dummies.index, name='const'), state_dummies, time_dummies], axis=1)

# Create the 2SLS model
model = IV2SLS(y, exog, X, Z)

# Fit the model with clustered standard errors by state
results = model.fit(cov_type='clustered', clusters=df_with_dummies['state'])

# Display the results summary
print(results)
```

```text
                           IV-2SLS Estimation Summary
================================================================================
Dep. Variable:     consumer_price_index   R-squared:                      0.9065
Estimator:                      IV-2SLS   Adj. R-squared:                 0.8647
No. Observations:                   302   F-statistic:                 2.623e+15
Cov. Estimator:               Clustered   P-value (F-stat)                0.0000
                                          Distribution:                 chi2(93)

                                    Parameter Estimates
============================================================================================
                       Parameter   Std. Err.      T-stat     P-value    Lower CI    Upper CI
--------------------------------------------------------------------------------------------
esworker_health_bill      60.780      19.583      3.1037      0.0019      22.398      99.162
============================================================================================

Endogenous: esworker_health_bill
Instruments: n_of_regis_lobbyists
Clustered Covariance (One-Way)
Debiased: False
Num Clusters: 8
```

The effect size is again identical to the one from the *R/fixest* implementation, but the standard error from *Python/linearmodels* is again (and somewhat strangely) smaller. The gap is modest in the naïve model above (3.71 vs. 3.90) but quite large in the IV model (19.58 vs. 24.77). To understand the discrepancies in standard errors, see [[Oh my! Different standard errors everywhere]].

> [!info]- Last updated: January 18, 2025