Panel data consist of repeated observations within clusters. Clustering the standard errors allows for within-group correlation, relaxing the usual requirement that observations be independent: observations are still assumed to be independent across states (the clusters), but not necessarily within them. For example, the consumer price index observed in the same location will be correlated over time. Note that clustering affects the variance-covariance matrix of the estimators (and hence the standard errors), but not the coefficients.
Correcting standard errors for within-cluster correlation produces a type of robust standard error. The formula for the robust estimator of variance is:
$\hat{V}_{\text{robust}} = \hat{V}\left(\sum_{j=1}^N \mathbf{u}_j'\mathbf{u}_j\right)\hat{V}$
where $\hat{V} = (-\partial^2\ln L/\partial\beta^2)^{-1}$ (the conventional estimator of variance) and $\mathbf{u}_j$ (a row vector) is the contribution from the $j$th observation to $\partial\ln L/\partial\beta$.
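For intuition, here is a minimal R sketch of this sandwich on simulated data. It uses the OLS special case, where the conventional variance is $\hat{\sigma}^2(X'X)^{-1}$ and $\mathbf{u}_j \propto x_j e_j$; the $\hat{\sigma}^2$ factors cancel in the product, so the sandwich reduces to the familiar HC0 heteroskedasticity-robust estimator.

```r
## Minimal sketch on simulated data (OLS special case, where the score
## contribution of observation j is proportional to u_j = x_j * e_j).
set.seed(1)
n   <- 200
x   <- rnorm(n)
y   <- 1 + 2 * x + rnorm(n) * (1 + abs(x))   # heteroskedastic errors
fit <- lm(y ~ x)

X     <- model.matrix(fit)
e     <- residuals(fit)
U     <- X * e                      # rows are the score contributions u_j
bread <- solve(crossprod(X))        # (X'X)^{-1}, the "bread"
meat  <- crossprod(U)               # sum_j u_j' u_j, the "meat"
V_rob <- bread %*% meat %*% bread   # sandwich: bread * meat * bread

sqrt(diag(V_rob))                   # robust standard errors
## matches sandwich::vcovHC(fit, type = "HC0") up to numerical error
```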
In the example above, observations are assumed to be independent.
Assume for a moment that the observations denoted by $j$ are not independent but that they can be divided into $M$ groups $G_1$, $G_2, \ldots, G_M$ that are independent. The robust estimator of variance is:
$\hat{V}_{\text{robust}} = \hat{V}\left(\sum_{k=1}^M \mathbf{u}_k^{(G)'}\mathbf{u}_k^{(G)}\right)\hat{V}$
where $\mathbf{u}_k^{(G)}$ is the contribution of the $k$th group to $\partial\ln L/\partial\beta$. That is, application of the robust variance formula merely involves using a different decomposition of $\partial\ln L/\partial\beta$, namely, $\mathbf{u}_k^{(G)}$, $k = 1,\ldots,M$, rather than $\mathbf{u}_j$, $j = 1,\ldots,N$. Moreover, if the log-likelihood function is additive in the observations denoted by $j$,
$\ln L = \sum_{j=1}^N \ln L_j$
then $\mathbf{u}_j = \partial\ln L_j/\partial\beta$, so
$\mathbf{u}_k^{(G)} = \sum_{j\in G_k} \mathbf{u}_j$
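A minimal R sketch of the clustered version, again on simulated data with a hypothetical cluster id `g`, shows that the only change is summing the score rows within each group before forming the meat:

```r
## Clustered sandwich by hand on simulated data (hypothetical cluster id g).
set.seed(2)
M   <- 50                                 # number of clusters
n   <- M * 10
g   <- rep(seq_len(M), each = 10)         # cluster ids
a   <- rnorm(M)[g]                        # cluster-level shock -> within-cluster correlation
x   <- rnorm(n)
y   <- 1 + 2 * x + a + rnorm(n)
fit <- lm(y ~ x)

X     <- model.matrix(fit)
e     <- residuals(fit)
U     <- X * e                            # observation-level scores u_j
U_G   <- rowsum(U, g)                     # u_k^(G): sum of u_j over j in G_k
bread <- solve(crossprod(X))
meat  <- crossprod(U_G)                   # sum_k u_k^(G)' u_k^(G)
V_cl  <- bread %*% meat %*% bread         # cluster-robust sandwich

sqrt(diag(V_cl))                          # cluster-robust standard errors
## up to finite-sample corrections, this matches sandwich::vcovCL(fit, cluster = g)
```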
Computing the sandwich from these group-level score sums is what *fixest* in R does by default in fixed-effects models and what *xtreg, vce(cluster clustvar)* does in Stata; it is a generalization of Huber's (1967) sandwich estimator by Froot (1989) and Rogers (1993).[^1]
[^1]: "The name “sandwich” refers to the mathematical form of the estimate, namely, that it is calculated as the product of three matrices: the matrix formed by taking the outer product of the observation-level likelihood/pseudolikelihood score vectors is used as the middle of these matrices (the meat of the sandwich), and this matrix is in turn pre- and postmultiplied by the usual model-based variance matrix (the bread of the sandwich)."
> [!info]- Last updated: September 4, 2024