R vs. STATA

Weights are handled differently in STATA and in R: STATA distinguishes between pweight and aweight, whereas R's modelling functions (for instance glm) take a single weights argument. To summarize:

In STATA, pweight (probability weights) is equivalent to aweight (analytic weights) with robust standard errors.
aweight is equivalent to weights in glm.
pweight is equivalent to weights in glm with robust standard errors.

The numerical example below shows that combining weights in glm with robust standard errors reproduces the standard errors that STATA reports with pweight.

A numerical example

We will use the same data as here, so that we can match the results obtained in R with those obtained in STATA.

The dataset df contains a column, pw, which represents our weights. The aim is to show that by calculating robust standard errors in R, we obtain the same results as those in STATA when using pweight (or aweight and robust standard errors).

The results for STATA are shown below:

## . reg api00 ell meals mobility cname [pweight = pw]
## (sum of wgt is   6.1940e+03)
## 
## Linear regression                                      Number of obs =     200
##                                                        F(  4,   195) =  104.68
##                                                        Prob > F      =  0.0000
##                                                        R-squared     =  0.6601
##                                                        Root MSE      =  72.589
## 
## ------------------------------------------------------------------------------
##              |               Robust
##        api00 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
## -------------+----------------------------------------------------------------
##          ell |   -.513901   .4063997    -1.26   0.208    -1.315404     .287602
##        meals |  -3.148314   .2925792   -10.76   0.000     -3.72534   -2.571288
##     mobility |   .2346743   .4047053     0.58   0.563    -.5634871    1.032836
##              |
##        cname |
##      Group2  |  -9.708186   19.92028    -0.49   0.627    -48.99504    29.57867
##        _cons |   830.4303   21.18687    39.20   0.000     788.6455    872.2152
## ------------------------------------------------------------------------------
## 
## . estout . using mod1.txt, cells("b se t p") stats(N) replace
## (note: file mod1.txt not found)
## (output written to mod1.txt)
## 
## . estimates store t1

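We will now fit the same weighted linear model in R. Because no family is specified, glm defaults to the gaussian family, so this is a weighted least-squares fit.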

mod <- glm(
  api00 ~ ell + meals + mobility + cname,
  data = df,
  weights = pw
)
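As a quick sanity check on what this call estimates, the following sketch compares glm and lm with the same weights. It uses simulated data, since the article's dataset is not reproduced here; the data frame sim and its columns x, y and w are made up for illustration. Under these assumptions, both fits should return the same coefficients and the same model-based (non-robust) standard errors.

```r
# Simulated data: names and values are illustrative, not the article's dataset.
set.seed(1)
n <- 200
sim <- data.frame(x = rnorm(n), w = runif(n, 0.5, 3))
# Error variance depends on x, so the data are heteroskedastic.
sim$y <- 1 + 2 * sim$x + rnorm(n, sd = 1 + abs(sim$x))

# glm() defaults to family = gaussian(), i.e. weighted least squares.
fit_glm <- glm(y ~ x, data = sim, weights = w)
fit_lm  <- lm(y ~ x, data = sim, weights = w)

# Model-based standard errors from each fit.
se_glm <- summary(fit_glm)$coefficients[, "Std. Error"]
se_lm  <- summary(fit_lm)$coefficients[, "Std. Error"]

print(cbind(glm = se_glm, lm = se_lm))
```

These model-based standard errors are the ones reported in the R column of Table 1.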

As Table 1 shows, the model-based standard errors from glm differ from the standard errors STATA reports with pweight.

Table 1: Comparison of STATA’s pweight with R’s standard glm

term	R¹	STATA¹
ell	0.3721	0.4064
meals	0.2701	0.2926
mobility	0.4629	0.4047
cnameGroup2	16.8738	19.9203
¹ Standard errors

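We will now compute heteroskedasticity-robust standard errors with the sandwich R package, using the HC1 estimator, and pass them to lmtest::coeftest. HC1 applies the n / (n - k) small-sample correction, which is the same correction STATA's robust option applies in linear regression.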

robust_se <- lmtest::coeftest(
  mod,
  vcov. = sandwich::vcovHC(mod, type = "HC1")
)
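To make the HC1 computation less of a black box, here is a minimal base-R sketch of the weighted sandwich estimator, again on simulated data (the data frame sim and its columns are hypothetical). It assumes the usual form for a weighted least-squares fit: bread (X'WX)^-1, meat equal to the sum of w_i^2 e_i^2 x_i x_i', and the n / (n - k) correction. The final block cross-checks against sandwich::vcovHC only if that package is installed.

```r
# Simulated data: names and values are illustrative, not the article's dataset.
set.seed(2)
n <- 200
sim <- data.frame(x = rnorm(n), w = runif(n, 0.5, 3))
sim$y <- 1 + 2 * sim$x + rnorm(n, sd = 1 + abs(sim$x))

fit <- glm(y ~ x, data = sim, weights = w)

X <- model.matrix(fit)
w <- sim$w
e <- sim$y - drop(X %*% coef(fit))    # raw (response) residuals
k <- ncol(X)

bread <- solve(crossprod(X, w * X))   # (X'WX)^-1
meat  <- crossprod(X * (w * e))       # sum of w_i^2 e_i^2 x_i x_i'
vcov_hc1 <- n / (n - k) * bread %*% meat %*% bread
se_hc1 <- sqrt(diag(vcov_hc1))

# Compare with the model-based standard errors from summary().
se_model <- summary(fit)$coefficients[, "Std. Error"]
print(cbind(model_based = se_model, hc1 = se_hc1))

# Optional cross-check against the sandwich package, if available.
if (requireNamespace("sandwich", quietly = TRUE)) {
  se_pkg <- sqrt(diag(sandwich::vcovHC(fit, type = "HC1")))
  print(all.equal(unname(se_hc1), unname(se_pkg)))
}
```

Note that the weights enter the meat squared. As a result, the robust variance is unaffected by rescaling the weights by a constant.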

Table 2: Comparison of STATA’s pweight with R’s robust glm

term	R¹	STATA¹
ell	0.4064	0.4064
meals	0.2926	0.2926
mobility	0.4047	0.4047
cnameGroup2	19.9203	19.9203
¹ Standard errors

As Table 2 shows, the robust standard errors from R now match those from STATA to four decimal places.