While replicating the IV model in R using the `fixest` library ([[Statistical modeling of IV]]) and in Python using the `PyFixest` library ([[Replication of IV in Python]]) or the `linearmodels` library (applied to different data in [[Replication using linearmodels]]), we observed discrepancies in the way the standard errors are corrected for within-state correlation (clustered). Below we list all the ways you can obtain identical results for the IV model. We will also add a fourth, very popular and stable package that has been around since 2006: `R/plm` (where plm stands for Panel Linear Models). The differences between the outputs of these packages and functions are too large to be attributed to computational details, such as the use of different matrix inversion algorithms.
First of all, if the errors are assumed to be [[Independent and identically distributed random variables|i.i.d.]], the results are exactly the same across both languages and all three packages. In `R/fixest`, we can add the `vcov = 'iid'` argument to the model to apply this (here undesirable) assumption:
```r
fixest::etable(fixest::feols(pack_sales_per_capita ~ price_per_pack | as.factor(state) + as.factor(year), data = df), vcov = 'iid')
```
In `Python/PyFixest`, the syntax is almost identical:
```python
pf.feols("pack_sales_per_capita ~ price_per_pack | state + year", data = df, vcov = 'iid').summary()
```
In `Python/linearmodels`, `model.fit()` applies the [[Independent and identically distributed random variables|i.i.d.]] assumption by default, but it can also be requested explicitly as follows:
```python
model.fit(cov_type = 'unadjusted')  # 'unadjusted' is the i.i.d. default
```
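For context, `model` is the `linearmodels` panel model defined in [[Replication using linearmodels]]. A minimal sketch of such a setup, with the DataFrame layout and column names assumed here for illustration (the actual replication uses different data):
```python
from linearmodels.panel import PanelOLS

# linearmodels expects an (entity, time) MultiIndex on the DataFrame
panel = df.set_index(["state", "year"])

# Two-way fixed effects model, mirroring the fixest/PyFixest specifications above
model = PanelOLS.from_formula(
    "pack_sales_per_capita ~ price_per_pack + EntityEffects + TimeEffects",
    data=panel,
)
```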
These three calls also report identical standard errors in a panel data model with fixed effects. The inconsistency in standard errors arises because we do not want to incorrectly assume [[Independent and identically distributed random variables|i.i.d.]] errors, as discussed in [[Clustered standard errors]]. So the question is: how can we obtain identical standard errors when we correctly apply clustering?
In `R/fixest`:
```r
fixest::etable(fixest::feols(pack_sales_per_capita ~ price_per_pack | as.factor(state) + as.factor(year), data = df), vcov = 'cluster')
                 fixest::feols(pack_..
Dependent Var.:  pack_sales_per_capita

price_per_pack        -6.625** (2.022)
Fixed-Effects:   ---------------------
as.factor(state)                   Yes
as.factor(year)                    Yes
________________ _____________________
S.E.: Clustered              by: state
Observations                     2,550
R2                             0.89377
Within R2                      0.06331
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
Ideally, with the assumptions discussed in [[Data and conceptual model]], this is our (correctly) corrected standard error, which takes the within-state correlation of the errors into account. To have a referee for this comparison, we have also [[Naïve model in Stata|replicated the model in Stata]] using the `xtreg` command. Why do this? Because Stata's `vce(cluster clustvar)` option has well-documented behavior, as discussed in [[Clustered standard errors]], and performs exactly the correction we want to apply here. So we know that this is the correct standard error, unless Stata screwed up its code for the function. But we also know that Stata did not screw up its code, because its results are identical to the `R/fixest` results!
However, `Python/PyFixest` will **not** report identical standard errors:
```python
pf.feols("pack_sales_per_capita ~ price_per_pack | state + year", data = df).summary()
Estimation: OLS
Dep. var.: pack_sales_per_capita, Fixed effects: state+year
Inference: CRV1
Observations: 2550
| Coefficient | Estimate | Std. Error | t value | Pr(>|t|) | 2.5% | 97.5% |
|:---------------|-----------:|-------------:|----------:|-----------:|--------:|--------:|
| price_per_pack | -6.625 | 2.002 | -3.310 | 0.002 | -10.646 | -2.605 |
---
RMSE: 13.537 R2: 0.894 R2 Within: 0.063
```
This is due to a behind-the-scenes computational correction called the small sample adjustment. The adjustment originates from the inherent trade-off of using a sample for population-level inference: the larger the sample, the smaller the impact of the correction on the standard error, because the adjustment is a function of the sample size and the number of estimated parameters. While this much is fairly straightforward, the adjustment itself can be applied in many different ways, and this is where `R/fixest` and `Python/PyFixest` differ.
> [!note]
> In `R/fixest`, the function to manage the small sample correction is `ssc()`. By default, `R/fixest` applies two steps:
> 1. `adj = TRUE`
> 2. `cluster.adj = TRUE`
>
> The first argument, `adj`, is a binary decision on whether to apply the small sample correction, in the form of `(n - 1) / (n - K)` with sample size `n` and number of estimated parameters `K`. The second argument, `cluster.adj`, adds another term to the correction, `G/(G-1)`, where `G` is the number of unique values of the cluster variable. In other words, the first argument multiplies the variance-covariance matrix by "the `n` correction", and the second multiplies it further by "the `G` correction":
> $$\frac{n - 1}{n - K} \times \frac{G}{G - 1}$$
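Since standard errors are square roots of the diagonal of the variance-covariance matrix, the effect of this multiplicative factor on a reported standard error is its square root, a step worth making explicit:
$$\mathrm{SE}_{\text{adjusted}} = \mathrm{SE}_{\text{unadjusted}} \times \sqrt{\frac{n - 1}{n - K} \cdot \frac{G}{G - 1}}$$
With thousands of observations and dozens of clusters, as in our data, this factor is close to one, which is why the discrepancies below surface only in the second or third decimal of the standard error.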
So, the default, not-shown parameter in our `R/fixest` model above is `ssc = fixest::ssc(adj = TRUE, cluster.adj = TRUE)`. Now, to get identical results from `R/fixest` and `Python/PyFixest`, we need to disable the second step of the small sample correction:
```r
fixest::etable(fixest::feols(pack_sales_per_capita ~ price_per_pack | as.factor(state) + as.factor(year), data = df),
vcov = "cluster", ssc = fixest::ssc(adj = TRUE, cluster.adj = FALSE), coefstat = "se")
                 fixest::feols(pack_..
Dependent Var.:  pack_sales_per_capita

price_per_pack        -6.625** (2.002)
Fixed-Effects:   ---------------------
as.factor(state)                   Yes
as.factor(year)                    Yes
________________ _____________________
S.E.: Clustered              by: state
Observations                     2,550
R2                             0.89377
Within R2                      0.06331
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
Notice that the standard error, previously `2.022`, is now `2.002`, the same as in the `Python/PyFixest` output. This settles the consistency between the two packages.
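As a quick sanity check: if the 2,550 observations comprise $G = 51$ state clusters (our assumption, consistent with 51 states observed over 50 years), the `cluster.adj` term alone scales the standard error by $\sqrt{51/50} \approx 1.00995$, and indeed $2.002 \times 1.00995 \approx 2.022$.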
Now on to `R/plm` and `Python/linearmodels`, both of which existed long before `R/fixest` and `Python/PyFixest`. `R/plm` applies no small sample correction by default, so to match it from `R/fixest`, both correction parameters must be disabled. Note that `cluster.adj` must be set to `FALSE` in addition to `adj`, because setting only `adj = FALSE` removes `(n - 1) / (n - K)` from the variance-covariance correction but keeps `G/(G-1)`, as shown in the note above. These settings yield results identical to those of `R/plm`:
```r
fixest::etable(fixest::feols(pack_sales_per_capita ~ price_per_pack | as.factor(state) + as.factor(year), data = df),
vcov = "cluster", ssc = fixest::ssc(adj = FALSE, cluster.adj = FALSE), coefstat = "se")
                 fixest::feols(pack_..
Dependent Var.:  pack_sales_per_capita

price_per_pack        -6.625** (1.982)
Fixed-Effects:   ---------------------
as.factor(state)                   Yes
as.factor(year)                    Yes
________________ _____________________
S.E.: Clustered              by: state
Observations                     2,550
R2                             0.89377
Within R2                      0.06331
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
While `R/plm`'s defaults give us this standard error of `1.982`, we can also ask `R/plm` to apply the same small sample correction as `R/fixest`: computing the variance-covariance matrix with `plm::vcovHC(type = 'sss')` (Stata-like small sample scaling) reproduces the original standard error of `2.022` in `R/plm`.
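For completeness, a minimal sketch of what this looks like in `R/plm`, assuming the same `df` with `state` and `year` columns (the exact call is not shown in the original comparison, so treat this as an illustration):
```r
library(plm)

# Build the state-year panel and fit the two-way within (fixed effects) model
pdf <- pdata.frame(df, index = c("state", "year"))
m <- plm(pack_sales_per_capita ~ price_per_pack, data = pdf,
         model = "within", effect = "twoways")

# plm default: clustered by group with no small sample correction (~1.982)
summary(m, vcov = function(x) plm::vcovHC(x, cluster = "group"))

# Stata-like small sample scaling, matching the fixest defaults (~2.022)
summary(m, vcov = function(x) plm::vcovHC(x, type = "sss", cluster = "group"))
```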
Finally, let's turn to `Python/linearmodels`. As in `R/fixest`, removing the small sample correction requires multiple arguments. We need to change the values of the following three arguments of the `fit()` method (by default they are all `True`):
1. Set `debiased = False` to remove the small sample correction, `n / (n - K)`, from the estimation.
2. Set `auto_df = False` to disable the automatic adjustment of the degrees of freedom for fixed effects (`n - K - G`).
3. Set `count_effects = False` to disable explicit counting of fixed effects in the degrees-of-freedom adjustment. This argument is used only when `auto_df = False`; left at its default of `True`, it also produces the `n - K - G` correction. The difference between `auto_df` and `count_effects` comes down to a slight difference in how `G` is counted.
Now let's see the changes in a model fit of `Python/linearmodels`:
```python
# Cluster by the entity (state) dimension and strip all degrees-of-freedom adjustments
results = model.fit(cov_type = 'clustered', cluster_entity = True,
                    debiased = False,        # drop the n / (n - K) correction
                    auto_df = False,         # no automatic dof adjustment for fixed effects
                    count_effects = False)   # do not count fixed effects in the dof
```
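The clustered standard errors can then be read directly off the results object:
```python
# Standard errors as a pandas Series indexed by coefficient name
print(results.std_errors)
```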
The standard error from this fit confirms the one originally reported by `R/fixest`: `2.022`. Voilà! The mystery is solved.
> [!info]- Last updated: January 18, 2025