We consider a randomized experiment as per Quant 2, lecture 2. From a large population we draw a sample of units indexed by \(i=1,\ldots,n\). For each unit we have a treatment indicator \(D_i \in \{0,1\}\), potential outcomes \(Y_{1i}\) and \(Y_{0i}\), an observed outcome \(Y_i = D_i Y_{1i} + (1-D_i)Y_{0i}\), and a single covariate \(X_i\). The experiment uses complete random assignment of \(n_1\) units to treatment (\(D_i=1\)) and the rest to control (\(D_i=0\)).
We consider three estimators: (i) the simple difference in means, equivalent to the OLS regression of \(Y_i\) on \(D_i\) and a constant; (ii) the OLS regression of \(Y_i\) on \(D_i\), \(X_i\), and a constant; and (iii) the interacted OLS regression of \(Y_i\) on \(D_i\), \(X_i\), \(D_iX_i\), and a constant. Estimator (i) is unbiased, as per the lecture notes. Lin (2013) provides expressions for the leading terms in the bias of (ii) and (iii) relative to the SATE. For (ii), we have \[ ltb(SATE)_{(ii)} = - \frac{1}{n}\frac{1}{\sigma^2_X} \lim_{n \rightarrow \infty} \frac{1}{n}\sum_{i=1}^n[\rho_i - \rho](X_i - \bar{X})^2, \] where \(\sigma^2_X = (1/n)\sum_{i=1}^n(X_i - \bar{X})^2\). Then for (iii), we have \[ ltb(SATE)_{(iii)} = -\frac{1}{\sigma^2_X}\left[\left(\frac{1}{n_1} - \frac{1}{n} \right) \lim_{n \rightarrow \infty} \frac{1}{n}\sum_{i=1}^n e_{1i}(X_i - \bar{X})^2 - \left(\frac{1}{n-n_1} - \frac{1}{n} \right) \lim_{n \rightarrow \infty} \frac{1}{n}\sum_{i=1}^n e_{0i}(X_i - \bar{X})^2 \right], \] where \(e_{di}\) is the prediction error for \(Y_{di}\) from regression (iii).
Examining these expressions, we see that both are of order \(1/n\), so the bias shrinks quickly as the sample grows. In small samples, the bias is driven by the correlation between the squared, centered covariate values and the potential outcomes/unit-level effects.
We simulate a population whose data distribution makes the small-sample bias of the adjusted regressions substantial, that is, one in which the centered, squared covariate is strongly correlated with the unit-level effects:
# Simulate a population of N units
N <- 1000
x <- runif(N, 0, 2)                              # single covariate
y0 <- .5*x - .25*(x-mean(x))^2 + .25*rnorm(N)    # control potential outcome
y1 <- 2*y0 + .25 + (x-mean(x))^2 + .25*rnorm(N)  # treated potential outcome
index <- 1:N
potcov <- data.frame(index, y1, y0, x)           # population data frame
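As a quick check on the feature described above, we can confirm that the unit-level effects are strongly correlated with the centered, squared covariate. This snippet is an illustrative addition (the name `rho_i` is introduced here for the unit-level effects):
# Correlation between unit-level effects and the centered, squared covariate
rho_i <- potcov$y1 - potcov$y0
cor(rho_i, (potcov$x - mean(potcov$x))^2)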
Now plot these data:
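A minimal base-R sketch for such a plot, showing both potential outcomes against the covariate (the colors and legend placement are illustrative choices):
# Scatter plot of potential outcomes against the covariate
plot(potcov$x, potcov$y0, pch = 16, cex = .5, col = "steelblue",
     ylim = range(c(potcov$y0, potcov$y1)),
     xlab = "x", ylab = "Potential outcomes")
points(potcov$x, potcov$y1, pch = 16, cex = .5, col = "firebrick")
legend("topleft", legend = c("y0", "y1"), pch = 16,
       col = c("steelblue", "firebrick"))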
We will see how, in such a population, the unadjusted estimator (i) and the adjusted estimators (ii, iii) perform as the sample size \(n\) increases. Our metric will be the bias measured relative to the standard error (bias/SE):
# Sample sizes to consider
n.vec <- c(20, 40, 80, 160, 320)
# Number of simulated experiments per sample size
nsim <- 2000
# Matrices to store results (rows: simulations, columns: sample sizes)
sate.hat.noadj <- sate.hat.simpadj <- sate.hat.intadj <- sate <- matrix(NA, ncol=length(n.vec), nrow=nsim)
for(j in 1:length(n.vec)){
  n <- n.vec[j]
  for(i in 1:nsim){
    # Simple random sample (without replacement) from the population
    potcov.s <- potcov[potcov$index %in% sample(potcov$index, n),]
    # Sample average treatment effect for this sample
    sate[i,j] <- mean(potcov.s$y1 - potcov.s$y0)
    # Complete random assignment of treatment
    n1 <- floor(.33*n) # you can play around with the treatment allocation;
                       # this is set to an imbalanced design.
    potcov.s$D <- 0
    potcov.s$D[potcov.s$index %in% sample(potcov.s$index, n1)] <- 1
    potcov.s$Y <- with(potcov.s, D*y1 + (1-D)*y0)
    # (i) No adjustment (difference in means)
    ols.1 <- lm(Y~D, data=potcov.s)
    # (ii) Simple covariate adjustment
    ols.2 <- lm(Y~D+x, data=potcov.s)
    # (iii) OLS with covariate interaction adjustment, using the centered covariate
    ols.3 <- lm(Y~D*I(x-mean(x)), data=potcov.s)
    # Because X is centered, the coefficient on D estimates the SATE directly.
    # It is the same as what you would get if you ran the interacted regression
    # without centering X and then computed the difference in means of the
    # predicted values from that regression. Note that in that uncentered
    # regression, the coefficient on D is not the estimate of the SATE.
    sate.hat.noadj[i,j] <- coef(ols.1)["D"]
    sate.hat.simpadj[i,j] <- coef(ols.2)["D"]
    sate.hat.intadj[i,j] <- coef(ols.3)["D"]
  }
}
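# Summarize each estimator across simulations: SE (SD of the estimates),
# bias relative to the sample SATE, and standardized bias (bias/SE)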
se.sate.hat.noadj <- apply(sate.hat.noadj, 2, sd)
bias.sate.hat.noadj <- apply(sate.hat.noadj-sate, 2, mean)
std.bias.sate.hat.noadj <- bias.sate.hat.noadj/se.sate.hat.noadj
se.sate.hat.simpadj <- apply(sate.hat.simpadj, 2, sd)
bias.sate.hat.simpadj <- apply(sate.hat.simpadj-sate, 2, mean)
std.bias.sate.hat.simpadj <- bias.sate.hat.simpadj/se.sate.hat.simpadj
se.sate.hat.intadj <- apply(sate.hat.intadj, 2, sd)
bias.sate.hat.intadj <- apply(sate.hat.intadj-sate, 2, mean)
std.bias.sate.hat.intadj <- bias.sate.hat.intadj/se.sate.hat.intadj
res.tab <- cbind(c("SE","Bias","Bias/SE","SE","Bias","Bias/SE","SE","Bias","Bias/SE"),
round(1000*rbind( se.sate.hat.noadj,
bias.sate.hat.noadj,
std.bias.sate.hat.noadj,
se.sate.hat.simpadj,
bias.sate.hat.simpadj,
std.bias.sate.hat.simpadj,
se.sate.hat.intadj,
bias.sate.hat.intadj,
std.bias.sate.hat.intadj),0)/1000)
res.tab.out <- cbind(c("No adjustment","","",
"Simple adjustment","","",
"Interacted adjustment","",""), res.tab)
colnames(res.tab.out) <- c("Estimator","Statistic",paste("N=",n.vec,sep=""))
Estimator | Statistic | N=20 | N=40 | N=80 | N=160 | N=320 |
---|---|---|---|---|---|---|
No adjustment | SE | 0.326 | 0.225 | 0.161 | 0.115 | 0.079 |
 | Bias | -0.005 | -0.003 | -0.005 | -0.002 | 0.001 |
 | Bias/SE | -0.015 | -0.015 | -0.03 | -0.017 | 0.011 |
Simple adjustment | SE | 0.254 | 0.173 | 0.126 | 0.089 | 0.059 |
 | Bias | -0.017 | -0.005 | -0.005 | -0.003 | -0.001 |
 | Bias/SE | -0.067 | -0.031 | -0.042 | -0.036 | -0.01 |
Interacted adjustment | SE | 0.277 | 0.175 | 0.125 | 0.087 | 0.058 |
 | Bias | -0.034 | -0.009 | -0.006 | -0.004 | -0.001 |
 | Bias/SE | -0.123 | -0.051 | -0.049 | -0.043 | -0.022 |
Barring an unlucky run of the simulation, some things you should observe:

- The adjusted estimators (ii) and (iii) have smaller standard errors than the unadjusted estimator (i) at every sample size.
- In the smallest samples the adjusted estimators show noticeable bias, with the interacted estimator (iii) the most biased.
- As \(n\) grows, the bias shrinks much faster than the standard error, so the bias quickly becomes negligible relative to the standard error.
So, this is an example of a bias-variance trade-off that is favorable for the biased estimators. That is why we use regression adjustment even though we have experimental data.
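One way to summarize the trade-off in a single number is the root mean squared error, which combines bias and variance. A minimal sketch using the simulation output above (the `rmse` helper is defined here purely for illustration):
# RMSE of each estimator relative to the sample SATE, by sample size
rmse <- function(est, truth) sqrt(apply((est - truth)^2, 2, mean))
round(rbind(noadj   = rmse(sate.hat.noadj, sate),
            simpadj = rmse(sate.hat.simpadj, sate),
            intadj  = rmse(sate.hat.intadj, sate)), 3)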
Note as well that the specification here is linear in \(X\), but, as is clear from the scatter plot above, this fails to capture the non-linearities. That does not matter, though. Regression adjustment is motivated purely by efficiency considerations here, so even though the estimated partial relationship between \(X\) and the outcomes is inaccurate, the regression still partials out some of the variation in the outcome, yielding the efficiency gains.