R vs. STATA

There appear to be differences between `pweight` and `aweight` in STATA and `weights` in R (for instance, in the `glm` function). To summarize:

- In STATA, `pweight` (probability weights) is equivalent to `aweight` (analytic weights) with robust standard errors.
- `aweight` is equivalent to `weights` in `glm`.
- `pweight` is equivalent to `weights` in `glm` with robust standard errors.

Here I provide a numerical example showing that robust standard errors in R do indeed reproduce the results STATA gives with `pweight`.

A numerical example

We will use the same data as here, so that we can match the results obtained in R with those obtained in STATA.

The dataset `df` contains a column, `pw`, which holds our weights. The aim is to show that by computing robust standard errors in R, we obtain the same results as STATA does when using `pweight` (or `aweight` with robust standard errors).

The results for STATA are shown below:

```
## . reg api00 ell meals mobility cname [pweight = pw]
## (sum of wgt is 6.1940e+03)
##
## Linear regression                               Number of obs     =        200
##                                                 F( 4, 195)        =     104.68
##                                                 Prob > F          =     0.0000
##                                                 R-squared         =     0.6601
##                                                 Root MSE          =     72.589
##
## ------------------------------------------------------------------------------
##              |               Robust
##        api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
## -------------+----------------------------------------------------------------
##          ell |   -.513901   .4063997    -1.26   0.208    -1.315404     .287602
##        meals |  -3.148314   .2925792   -10.76   0.000     -3.72534   -2.571288
##     mobility |   .2346743   .4047053     0.58   0.563    -.5634871    1.032836
##              |
##        cname |
##       Group2 |  -9.708186   19.92028    -0.49   0.627    -48.99504    29.57867
##        _cons |   830.4303   21.18687    39.20   0.000     788.6455    872.2152
## ------------------------------------------------------------------------------
##
## . estout . using mod1.txt, cells("b se t p") stats(N) replace
## (note: file mod1.txt not found)
## (output written to mod1.txt)
##
## . estimates store t1
```
We will now fit a simple weighted linear model.

```r
# Gaussian family by default, so this is weighted least squares with pw as weights
mod <- glm(
  api00 ~ ell + meals + mobility + cname,
  data = df,
  weights = pw
)
```

As you can see in Table 1, the standard errors differ.
| term | R (SE) | STATA (SE) |
|---|---|---|
| ell | 0.3721 | 0.4064 |
| meals | 0.2701 | 0.2926 |
| mobility | 0.4629 | 0.4047 |
| cnameGroup2 | 16.8738 | 19.9203 |

Table 1: standard errors, default `glm` in R vs. STATA with `pweight`.
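Why do the two columns differ? By default, `glm` reports model-based standard errors, sigma^2 (X'WX)^-1 with sigma^2 = sum(w * e^2) / (n - p), which is the kind of variance STATA reports with `aweight`; `pweight` instead asks STATA for a robust (sandwich) variance. The base-R sketch below reproduces `vcov()` by hand and also checks that rescaling the weights to sum to n, as STATA does for `aweight`, changes neither the coefficients nor the standard errors. It runs on hypothetical simulated data: `dat`, `x1`, `x2` and `w` are stand-ins, not the post's `df`.

```r
# Hypothetical simulated data; the post's df is not reproduced here
set.seed(1)
n   <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), w = runif(n, 1, 60))
dat$y <- 800 - 3 * dat$x1 + 0.5 * dat$x2 + rnorm(n, sd = 70)

# Weighted Gaussian GLM, i.e. weighted least squares
fit <- glm(y ~ x1 + x2, data = dat, weights = w)

# Model-based covariance: sigma2 * (X'WX)^-1, sigma2 = sum(w * e^2) / (n - p)
X        <- model.matrix(fit)
e        <- residuals(fit, type = "response")
p        <- ncol(X)
sigma2   <- sum(dat$w * e^2) / (n - p)
se_model <- sqrt(diag(sigma2 * solve(crossprod(X, dat$w * X))))
all.equal(unname(se_model), unname(sqrt(diag(vcov(fit)))))      # TRUE

# Rescale the weights to sum to n (STATA's aweight normalization) and refit
dat$w_norm <- dat$w * n / sum(dat$w)
fit_scaled <- glm(y ~ x1 + x2, data = dat, weights = w_norm)
all.equal(coef(fit), coef(fit_scaled))                           # TRUE
all.equal(sqrt(diag(vcov(fit))), sqrt(diag(vcov(fit_scaled))))   # TRUE
```

The invariance holds because multiplying every weight by a constant c multiplies sigma^2 by c and (X'WX)^-1 by 1/c, so the product is unchanged.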
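We will now estimate robust (HC1) standard errors using the sandwich R package, reporting them with `lmtest::coeftest()`.

```r
# Recompute the coefficient table with an HC1 heteroskedasticity-consistent covariance matrix
robust_se <- lmtest::coeftest(
  mod,
  vcov. = sandwich::vcovHC(mod, type = "HC1")
)
```

| term | R (SE) | STATA (SE) |
|---|---|---|
| ell | 0.4064 | 0.4064 |
| meals | 0.2926 | 0.2926 |
| mobility | 0.4047 | 0.4047 |
| cnameGroup2 | 19.9203 | 19.9203 |

Table 2: standard errors, HC1 robust in R vs. STATA with `pweight`.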
As you can see in Table 2, the standard errors now match.
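To see what `sandwich::vcovHC(mod, type = "HC1")` computes for a weighted Gaussian `glm`, here is a base-R sketch of the same estimator: (X'WX)^-1 X' diag(w_i^2 e_i^2) X (X'WX)^-1, scaled by n / (n - p). That n / (n - p) factor is, as far as I know, also what STATA's robust variance for `regress` applies, which is why HC1 rather than HC0 lines up with the `pweight` output. The data are again hypothetical and simulated (`dat`, `x1`, `x2`, `w` are stand-ins for the post's `df`).

```r
# Hypothetical simulated data; the post's df is not reproduced here
set.seed(1)
n   <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), w = runif(n, 1, 60))
dat$y <- 800 - 3 * dat$x1 + 0.5 * dat$x2 + rnorm(n, sd = 70)

fit <- glm(y ~ x1 + x2, data = dat, weights = w)

X <- model.matrix(fit)
e <- residuals(fit, type = "response")   # raw residuals y - fitted
w <- weights(fit, type = "prior")        # the weights passed to glm()
p <- ncol(X)

bread  <- solve(crossprod(X, w * X))     # (X'WX)^-1
meat   <- crossprod(X, (w * e)^2 * X)    # sum_i w_i^2 e_i^2 x_i x_i'
vc_hc1 <- n / (n - p) * bread %*% meat %*% bread   # HC1 small-sample factor

se_hc1 <- sqrt(diag(vc_hc1))
se_hc1
```

If the sandwich package is installed, `sqrt(diag(sandwich::vcovHC(fit, type = "HC1")))` should return the same numbers; with `type = "HC0"` the n / (n - p) factor is dropped and the standard errors shrink slightly.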