Here we present methods that can be implemented using the SpatialEffect package for R, which follows Aronow, Samii, and Wang (2019) and contains the SpatialEffect() function to estimate the “marginalized individualistic response” (MIR) with spatial data. For questions or comments, please contact the package authors listed above.
The package is currently in development and can be pulled from the GitHub repository, https://github.com/YeWang1576/SpatialEffectPackage.
Once you have downloaded the latest version of the repository, you can load the package from the local directory where it is stored on your computer.
A few preliminaries:
- Make sure your working directory is set to the github/spatial/package directory inside the repository.
- (For Mac users) Make sure Xcode and gfortran are installed on your computer. See here: https://github.com/fxcoudert/gfortran-for-macOS/releases
- Install the following packages:
  - Rcpp
  - RcppArmadillo
  - raster (which includes dependencies)
  - fields (which includes dependencies)
  - ri

Then, install the SpatialEffect package by running
install.packages("SpatialEffect_1.0.tar.gz", repos=NULL, type="source")
in the R console. You are now ready to call up the relevant packages:
library(Rcpp)
library(RcppArmadillo)
library(raster)
## Loading required package: sp
library(fields)
## Loading required package: spam
## Loading required package: dotCall64
## Loading required package: grid
## Spam version 2.2-1 (2018-12-20) is loaded.
## Type 'help( Spam)' or 'demo( spam)' for a short introduction
## and overview of this package.
## Help for individual functions is also obtained by adding the
## suffix '.spam' to the function name, e.g. 'help( chol.spam)'.
##
## Attaching package: 'spam'
## The following objects are masked from 'package:base':
##
## backsolve, forwardsolve
## Loading required package: maps
## See www.image.ucar.edu/~nychka/Fields for
## a vignette and other supplements.
library(ri)
library(SpatialEffect)

The SpatialEffect() function estimates the “marginalized individualistic response” (MIR) as defined in Aronow, Samii, and Wang (2019). We provide a brief definition and illustration here.
Suppose that we have a set of intervention nodes indexed by \(i=1,...,N\), and these nodes are arrayed as points in a spatial field that consists of a finite set of points, \(\mathcal{X}\). Each point in this spatial field, \(x \in \mathcal{X}\), is a coordinate vector of length 2 (recording, e.g., longitude and latitude coordinates). The coordinates of intervention node \(i\) are given by \(x(i)\).
For each intervention node, we have a binary random variable \(Z_i \in \{0,1\}\) that connotes whether, at node \(i\), treatment is activated (\(Z_i = 1\)) or not (\(Z_i = 0\)). Denote the ordered vector of treatments as \(\mathbf{Z} = (Z_1,...,Z_N)'\), with support \(\mathcal{Z}\).
We use potential outcomes notation such that point \(x \in \mathcal{X}\) has value \(Y_x(\mathbf{Z})\) when the treatment vector equals \(\mathbf{Z}\). When \(\mathbf{Z}\) is a vector of only zeroes, then no nodes have active treatments, in which case we have a field of null outcomes, \(Y_x(0)\).
Data for points in \(\mathcal{X}\) come in various formats. We may have raster data, for which all points in a given raster cell will take the same value. Alternatively, we may have data for only a sparse set of points in \(\mathcal{X}\), in which case we could either limit ourselves to working with those sparse points, or we could work with values interpolated between the points. We consider these two cases below.
We define a distance operator, \(d_i(x) = \| x(i) - x \|\), which measures the distance between point \(x\) and intervention node \(i\). Correspondingly, the circle average takes the form, \[ \mu_i(a; d) = \frac{1}{| \{x:d_i(x) =d \}|}\int_{x \in \{x:d_i(x) =d \}} a_x dx \] where \(\{x:d_i(x) =d \}\) defines a circle with radius \(d\) around intervention node \(i\) and \(|\{x:d_i(x) =d \}|\) is the length of this circle.
In cases where observation points are discrete, then the circle average function would use a coarsened distance operator, \[ d_i(x; \kappa) = \lfloor \| x(i) - x \| \times 10^{\kappa}+0.5 \rfloor \times 10^{-\kappa} \] which measures the coarsened distance between point \(x\) and intervention node \(i\), with coarsening parameter, \(\kappa\). The set \(\{x:d_i(x;\kappa) =d \}\) defines a coarsened circle with radius \(d\) around intervention node \(i\) that contains all the points for which \(d_i(x; \kappa) = d\).
Correspondingly, the circle mean would take the form, \[ \mu_i(a_x; d, \kappa) = \frac{1}{| \{x:d_i(x;\kappa) =d \}|}\sum_{x \in \{x:d_i(x;\kappa) =d \}} a_x \] which computes the average of \(a_x\) values at points with coarsened distance \(d\) from node \(i\).
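As a purely illustrative sketch (not package code), the coarsened distance operator and the coarsened circle mean can be written in a few lines of base R; coarse_dist() and circle_mean() are hypothetical helper names introduced here for illustration:

```r
# Coarsened distance between point x and node location xi:
# floor(||x - xi|| * 10^kappa + 0.5) * 10^(-kappa)
coarse_dist <- function(x, xi, kappa) {
  floor(sqrt(sum((x - xi)^2)) * 10^kappa + 0.5) * 10^(-kappa)
}

# Coarsened circle mean: average a_x over all points whose
# coarsened distance from the node equals d
circle_mean <- function(pts, a, xi, d, kappa) {
  dists <- apply(pts, 1, coarse_dist, xi = xi, kappa = kappa)
  mean(a[dists == d])
}

# Toy example: four points on the unit circle around the origin
pts <- rbind(c(1, 0), c(0, 1), c(-1, 0), c(0, -1))
a   <- c(2, 4, 6, 8)
circle_mean(pts, a, xi = c(0, 0), d = 1, kappa = 2)  # 5
```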
Below, we focus on the continuous case, and extension to the coarsened case would follow in a straightforward manner by substituting appropriately.
As the potential outcome notation \(Y_x(\mathbf{Z})\) indicates, outcomes at each raster cell, \(x\), depend on the full vector of treatment assignments, \(\mathbf{Z}\). This allows for spatial spillovers or other interference effects.
We can rewrite the potential outcome at point \(x\) as \(Y_x(Z_i, \mathbf{Z}_{-i})\), where \(\mathbf{Z}_{-i}\) is the random variable equalling \(\mathbf{Z}\), but omitting the value for node \(i\). This allows us to pay special attention to how variation in treatment at node \(i\) relates to potential outcomes at point \(x\), given variation in treatment values in \(\mathbf{Z}_{-i}\).
We can marginalize over this variation in \(\mathbf{Z}_{-i}\) to define an “individualistic” average of potential outcomes for cell \(x\), holding node \(i\) to treatment value \(z\): \[ Y_{ix}(z; \alpha) = \sum_{\mathbf{z}_{-i}\in \mathcal{Z}_{-i}} Y_x(z, \mathbf{z}_{-i})\mathrm{Pr}(\mathbf{Z}_{-i}=\mathbf{z}_{-i}|Z_i = z, \alpha), \] where \(\alpha\) is the experimental design parameter that governs the distribution of \(\mathbf{Z}\), \(\mathbf{z}_{-i}\) is a vector of treatment values at nodes other than node \(i\), and \(\mathcal{Z}_{-i}\) is the set of possible values that \(\mathbf{Z}_{-i}\) can take.
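To make this marginalization concrete, here is a tiny numerical sketch (not package code) with \(N=2\) nodes. For simplicity it assumes independent Bernoulli(\(\alpha\)) assignment, so that conditioning on \(Z_1 = z\) leaves the distribution of \(Z_2\) unchanged; Yx is a made-up potential outcome function with interference:

```r
alpha <- 0.5
# Hypothetical potential outcome at one point x, depending on both nodes
Yx <- function(z1, z2) 1 + 2 * z1 + 0.5 * z2 + z1 * z2

# Individualistic average for node 1 held at z, marginalizing over Z2
Y1x <- function(z) {
  sum(sapply(0:1, function(z2) Yx(z, z2) * dbinom(z2, 1, alpha)))
}

Y1x(1) - Y1x(0)  # tau_1x(alpha) = 2 + alpha = 2.5
```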
This allows us to define the causal effect at point \(x\) of intervening on node \(i\), allowing other nodes to vary as they otherwise would under \(\alpha\): \[ \tau_{ix}(\alpha) = Y_{ix}(1;\alpha) - Y_{ix}(0;\alpha). \] This defines the effect on raster cell \(x\) of switching node \(i\) from no treatment to active treatment, averaging over possible treatment assignments to the nodes other than \(i\).
Taking the circle average of such intervention effects around node \(i\), \(\mu_i(\tau_{ix}(\alpha); d)\), the MIR for distance \(d\) is given by, \[ \tau(d;\alpha) = \textrm{E}\left[\mu_i(\tau_{ix}(\alpha); d) \right], \] where the expectations operator marginalizes over the possible values of \(\mathbf{Z}\).
As discussed in Aronow, Samii, and Wang (2019), an unbiased non-parametric estimator for the MIR is given by, \[ \widehat{\tau}(d) = \frac{1}{Np}\sum_{i=1}^N Z_i \widehat{\mu}_i(d) - \frac{1}{N(1-p)}\sum_{i=1}^N (1-Z_i) \widehat{\mu}_i(d), \] where \(p\) is the proportion of intervention nodes with active treatment, \[ \widehat{\mu}_i(d) = \frac{1}{| \{x:d_i(x) =d \wedge x \in \mathcal{X_E} \}|}\sum_{x \in \{x:d_i(x) =d \wedge x \in \mathcal{X_E} \}} Y_x, \] and \(\mathcal{X_E}\) is a set of independently selected evaluation points on the circle of radius \(d\) around node \(i\).
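In other words, \(\widehat{\tau}(d)\) is a difference in inverse-probability-weighted means of the estimated circle averages across treated and control nodes. A minimal base-R sketch (mir_hat is a hypothetical helper, and the mu_hat values are made up):

```r
# Difference in weighted means of circle averages:
# treated nodes weighted by 1/(N*p), control nodes by 1/(N*(1-p))
mir_hat <- function(Z, mu_hat, p = mean(Z)) {
  N <- length(Z)
  sum(Z * mu_hat) / (N * p) - sum((1 - Z) * mu_hat) / (N * (1 - p))
}

Z      <- c(1, 1, 0, 0)  # complete random assignment, p = 1/2
mu_hat <- c(3, 5, 1, 3)  # estimated circle averages at some distance d
mir_hat(Z, mu_hat)       # (3 + 5)/2 - (1 + 3)/2 = 2
```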
With raster data, we can implement \(\widehat{\mu}_i(d)\) using outcome values for points within raster cells. That is, with raster data, the value \(Y_x\) corresponds to the value of the raster cell within which \(x\) falls.
For data that consist of discrete points over the field, we have two options. One is to use coarsened circle averages, as defined above. But if the discrete points are sparse, then another approach is to use interpolation. The SpatialEffect function includes an approach to do this using a kriging fit. That is, it fits a spatial Gaussian process kriging regression based on the Krig() function in the fields package. Then, the predicted values from the kriging fit are used for the \(Y_x\) values in the expression above. Aronow, Samii, and Wang (2019) discuss performance of this interpolation approach.
The SpatialEffect() function allows for two types of inference for the estimated MIR:

- randomization inference under the sharp null hypothesis of no effect, based on quantiles of the permutation distribution of the estimator; and
- large-sample inference based on spatial-HAC (Conley) standard errors.
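The permutation approach can be illustrated with a bare-bones sketch (a simplified stand-in for what the package does internally via genperms() from the ri package): under the sharp null of no effect, the circle averages are fixed regardless of assignment, so re-randomizing the treatment labels traces out the null distribution of the estimator:

```r
set.seed(1)
mu_hat <- c(3, 5, 1, 3, 2, 4)  # circle averages at some distance d
Z_obs  <- c(1, 1, 1, 0, 0, 0)  # observed assignment

# Difference in mean circle averages, treated minus control
dim_est <- function(Z, mu) mean(mu[Z == 1]) - mean(mu[Z == 0])

# Re-randomize under complete random assignment; mu_hat stays fixed
# under the sharp null, giving the estimator's null distribution
null_dist <- replicate(2000, dim_est(sample(Z_obs), mu_hat))
quantile(null_dist, c(0.025, 0.975))  # sharp-null reference quantiles
```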
We illustrate using a toy example with 4 intervention nodes in a spatial field where outcomes are recorded in a 4-by-4 raster.
set.seed(123567)
Yscale <- 4 # Number of raster grid cells per side.
# Be sure to make it a multiple of YZratio
Y0_sd <- 1 # Noisiness of null outcomes
YZratio <- 2 # Size of strata within which Z is assigned, relative to gridcells
Zjitter <- 1 # Amount of randomness away from center of assignment stratum for
# placing the intervention nodes (Z).
# Create a null Y0 raster
ras0 <- raster( nrows=Yscale,
ncols=Yscale,
xmn=0, xmx=Yscale,
ymn=0, ymx=Yscale,
vals=rnorm(Yscale*Yscale, mean=0, sd=Y0_sd))
# Dataset with Y0 values and then centroids of the raster cells.
Y0 <- getValues(ras0)
Yxy <- coordinates(ras0)
Yindex <- 1:nrow(Yxy)
Ydata <- data.frame(Yindex,cbind(Y0,Yxy))
# Intervention nodes
# we create them using the aggregate function:
Zxy <- coordinates(aggregate(ras0, fact=YZratio, fun=mean))
Zxy[,1] <- jitter(Zxy[,1], amount=Zjitter)
Zxy[,2] <- jitter(Zxy[,2], amount=Zjitter)
index <- 1:nrow(Zxy)
Zdata <- as.data.frame(cbind(index, Zxy))
Zdata$Zstart <- 0
# View the null raster:
par(mfrow=c(1,1), mar=c(2,2,3,3))
plot(ras0, main="Null raster")
points(Zxy, pch=19, cex=2.25, col=gray(0))
text(Zxy, labels=Zdata$index, col="gray")

You will note that the three data inputs are (1) the raster file, ras0, (2) the raster outcome data, which may be stored in the raster file itself or in a separate dataset, here Ydata, that indexes back to the raster file, and then (3) the intervention node data, Zdata:
print(ras0)
## class : RasterLayer
## dimensions : 4, 4, 16 (nrow, ncol, ncell)
## resolution : 1, 1 (x, y)
## extent : 0, 4, 0, 4 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
## data source : in memory
## names : layer
## values : -1.566185, 1.12945 (min, max)
Ydata
## Yindex Y0 x y
## 1 1 0.26307772 0.5 3.5
## 2 2 -0.59826394 1.5 3.5
## 3 3 -0.69950201 2.5 3.5
## 4 4 -0.13265323 3.5 3.5
## 5 5 0.53971262 0.5 2.5
## 6 6 -0.94654426 1.5 2.5
## 7 7 1.12945022 2.5 2.5
## 8 8 -1.56618485 3.5 2.5
## 9 9 0.32988531 0.5 1.5
## 10 10 -0.94057807 1.5 1.5
## 11 11 -0.09254688 2.5 1.5
## 12 12 0.37808100 3.5 1.5
## 13 13 -0.45701250 0.5 0.5
## 14 14 0.30168029 1.5 0.5
## 15 15 -1.07571926 2.5 0.5
## 16 16 -0.43678372 3.5 0.5
Zdata
## index x y Zstart
## 1 1 0.9444997 2.0109679 0
## 2 2 3.7144239 3.8361388 0
## 3 3 1.9624461 0.9393304 0
## 4 4 3.3109567 0.4648655 0
In the example here, the coordinate \(x\) appears as the data columns x and y. The coordinates in Ydata are the raster cell centroids; each raster cell stands in for the whole set of points contained within the cell.
For the purposes of illustration, we create a hypothetical effect function that allows us to see how the estimation and inference would work. Suppose that when treatment is activated at an intervention node, it generates effects that have the following properties:

- points very close to the treated node get pulled up,
- points at an intermediate distance from the treated node get pulled down, and
- points far from the treated node are unaffected.
We construct this hypothetical effect function by mixing two gamma-distribution kernels. This yields a non-monotonic effect function that takes the following arguments:
- d is the distance of a given raster cell centroid from the intervention node,
- sh controls the shape of the gamma kernels,
- sc controls the scale of the gamma kernels, and
- a controls the amount of mixing of the two gammas.

Here is the function along with an illustration of how it affects outcomes over distance:
effectFun <- function(d, sh, sc, a){(dgamma(abs(d), shape=sh, scale=2*sc) - a*dgamma(abs(d), shape=5*sh, scale=sc))}
dist_vec_up <- seq(from=.1, to=2, by = .1) # Distances over which we will evaluate effects
par(mfrow=c(1,1))
plot(dist_vec_up,
effectFun(dist_vec_up, 1, .15, 1.5),
type="l",
ylab="Effect on Yx",
xlab="distance from intervention node")

To apply this effect function in our simulation, we need the distances of each raster cell centroid from each intervention node:
for(i in 1:nrow(Zdata)){
Ydata[,paste("dZ",i, sep="")] <- as.matrix(dist(rbind(Zdata[i,c("x","y")], Ydata[,c("x","y")])))[-1,1]
}
print(Ydata)
## Yindex Y0 x y dZ1 dZ2 dZ3 dZ4
## 1 1 0.26307772 0.5 3.5 1.5539615 3.2319515 2.9488604 4.1368489
## 2 2 -0.59826394 1.5 3.5 1.5892756 2.2397907 2.6020925 3.5343465
## 3 3 -0.69950201 2.5 3.5 2.1533225 1.2600852 2.6164849 3.1416066
## 4 4 -0.13265323 3.5 3.5 2.9576677 0.3987065 2.9868212 3.0410160
## 5 5 0.53971262 0.5 2.5 0.6608573 3.4810613 2.1387937 3.4703385
## 6 6 -0.94654426 1.5 2.5 0.7400898 2.5862985 1.6277426 2.7242130
## 7 7 1.12945022 2.5 2.5 1.6305623 1.8055725 1.6506525 2.1907585
## 8 8 -1.56618485 3.5 2.5 2.6018713 1.3532348 2.1908358 2.0438957
## 9 9 0.32988531 0.5 1.5 0.6772505 3.9736715 1.5662372 2.9954934
## 10 10 -0.94057807 1.5 1.5 0.7547641 3.2188845 0.7267784 2.0859212
## 11 11 -0.09254688 2.5 1.5 1.6372750 2.6329394 0.7767333 1.3149730
## 12 12 0.37808100 3.5 1.5 2.6060833 2.3459586 1.6365886 1.0522551
## 13 13 -0.45701250 0.5 0.5 1.5749934 4.6327468 1.5270101 2.8111762
## 14 14 0.30168029 1.5 0.5 1.6098462 4.0041847 0.6378618 1.8112975
## 15 15 -1.07571926 2.5 0.5 2.1685491 3.5503024 0.6942444 0.8117174
## 16 16 -0.43678372 3.5 0.5 2.9687718 3.3430225 1.5990882 0.1922806
To get the true MIR, we need to marginalize over all of the ways that treatment could be applied. We suppose complete random assignment of half of nodes to treatment, in which case there are \({4 \choose 2}=6\) ways treatment could be applied. We can use the genperms command from the ri package to produce a permutation matrix:
pZ <- .5 # Share of intervention nodes that should have Z=1.
Zdata$Zstart[1:floor(pZ*length(Zdata$Zstart))] <- 1
curZ <- genperms(Zdata$Zstart)
print(curZ)
## [,1] [,2] [,3] [,4] [,5] [,6]
## 1 1 1 1 0 0 0
## 2 1 0 0 1 1 0
## 3 0 1 0 1 0 1
## 4 0 0 1 0 1 1
The genperms function is embedded in the SpatialEffect function to implement permutation tests.
We can then simulate the potential outcomes that would correspond to each treatment assignment and view the result:
for(k in 1:ncol(curZ)){
Ydata[,paste("Ypo",k,sep="")] <- Ydata$Y0 + 3*apply(Ydata[,grep("dZ", names(Ydata))],
1,
function(x){sum(effectFun(x, 1, .25, 1)*curZ[,k])})
}
par(mfrow=c(2,3), mar=c(2,2,3,3))
for(k in 1:6){
rasUp <- setValues(ras0, Ydata[,paste("Ypo",k,sep="")])
plot(rasUp, main=paste("Z", k, sep=""))
points(Zxy, pch=19, cex=3, col=gray(curZ[,k]))
text(Zxy, labels=Zdata$index, col="gray")
}

We can now construct the causal quantities. First up are the \(\tau_{ix}(\alpha)\) values (the node-and-point-specific effects). For each grid cell, there will be a separate \(\tau_{ix}(\alpha)\) value corresponding to the effect of switching the treatment status of each intervention node, averaging over possible assignments at other nodes. This corresponds to the difference in mean potential outcomes when node \(i\) is in treatment versus in control, where these means are taken with respect to the set of assignments in the assignment matrix (\(\mathcal{Z}\) in the formal analysis, curZ in the code).
So, to compute the \(\tau_{ix}(\alpha)\) values, we go through each row in curZ and take differences in means for outcome values corresponding to columns in curZ with 1 minus columns with 0.
for(i in 1:nrow(curZ)){
Ydata[,paste("tau",i,sep="")] <- apply(Ydata[,grep("Ypo", names(Ydata))][,curZ[i,]==1],
1, mean)-apply(Ydata[,grep("Ypo", names(Ydata))][,curZ[i,]==0],
1, mean)
}
print(Ydata)
## Yindex Y0 x y dZ1 dZ2 dZ3 dZ4
## 1 1 0.26307772 0.5 3.5 1.5539615 3.2319515 2.9488604 4.1368489
## 2 2 -0.59826394 1.5 3.5 1.5892756 2.2397907 2.6020925 3.5343465
## 3 3 -0.69950201 2.5 3.5 2.1533225 1.2600852 2.6164849 3.1416066
## 4 4 -0.13265323 3.5 3.5 2.9576677 0.3987065 2.9868212 3.0410160
## 5 5 0.53971262 0.5 2.5 0.6608573 3.4810613 2.1387937 3.4703385
## 6 6 -0.94654426 1.5 2.5 0.7400898 2.5862985 1.6277426 2.7242130
## 7 7 1.12945022 2.5 2.5 1.6305623 1.8055725 1.6506525 2.1907585
## 8 8 -1.56618485 3.5 2.5 2.6018713 1.3532348 2.1908358 2.0438957
## 9 9 0.32988531 0.5 1.5 0.6772505 3.9736715 1.5662372 2.9954934
## 10 10 -0.94057807 1.5 1.5 0.7547641 3.2188845 0.7267784 2.0859212
## 11 11 -0.09254688 2.5 1.5 1.6372750 2.6329394 0.7767333 1.3149730
## 12 12 0.37808100 3.5 1.5 2.6060833 2.3459586 1.6365886 1.0522551
## 13 13 -0.45701250 0.5 0.5 1.5749934 4.6327468 1.5270101 2.8111762
## 14 14 0.30168029 1.5 0.5 1.6098462 4.0041847 0.6378618 1.8112975
## 15 15 -1.07571926 2.5 0.5 2.1685491 3.5503024 0.6942444 0.8117174
## 16 16 -0.43678372 3.5 0.5 2.9687718 3.3430225 1.5990882 0.1922806
## Ypo1 Ypo2 Ypo3 Ypo4 Ypo5 Ypo6
## 1 -0.98428974 -1.0162048 -0.96062059 0.18200540 0.23758961 0.20567455
## 2 -2.11076833 -1.9088113 -1.77404693 -1.08848618 -0.95372181 -0.75176478
## 3 -2.72434015 -1.2574879 -1.15087305 -2.44413483 -2.33752003 -0.87066773
## 4 1.85875701 -0.2384635 -0.23113482 1.86314836 1.87047706 -0.22674347
## 5 0.39234524 -0.0289492 0.39194541 0.09606504 0.51695965 0.09566521
## 6 -1.72025584 -2.6750068 -1.67488846 -2.20146583 -1.20134752 -2.15609845
## 7 -0.80202813 -1.0391737 -0.35700042 -0.76999226 -0.08781895 -0.32496455
## 8 -3.22355088 -2.0964542 -2.23842877 -3.46539122 -3.60736577 -2.48026911
## 9 0.08296758 -1.1184032 0.03537035 -0.87522286 0.27855072 -0.92282010
## 10 -1.66913192 -2.1922082 -2.12749341 -1.51480828 -1.45009351 -1.97316977
## 11 -1.31526120 -1.9974414 -2.73820763 -1.04093427 -1.78170048 -2.46388071
## 12 -0.03551086 -0.8552870 -1.36520892 -0.98355526 -1.49347721 -2.31325332
## 13 -1.64623243 -2.9116808 -1.72910739 -1.72238134 -0.53980791 -1.80525630
## 14 -0.83343844 -0.8085217 -1.65467083 0.32330598 -0.52284318 -0.49792642
## 15 -1.49002627 -1.8346971 -2.45920858 -1.43815111 -2.06266262 -2.40733342
## 16 -0.50759177 -1.6408752 3.51324097 -1.60490500 3.54921115 2.41592774
## tau1 tau2 tau3 tau4
## 1 -1.19546157 0.40215204 0.3595986 0.43371091
## 2 -0.99988459 0.09388223 0.3631583 0.54284409
## 3 0.17320718 -1.40898879 0.5468143 0.68896734
## 4 -0.70590776 2.09624141 -0.7000526 -0.69028103
## 5 0.01555052 0.18223617 -0.3794897 0.18170306
## 6 -0.17041308 0.46097483 -0.8120264 0.52146467
## 7 -0.33847550 0.02043312 -0.2957610 0.61380340
## 8 0.66486408 -1.16038525 0.3424103 0.15311088
## 9 0.17314231 0.49738281 -1.1044449 0.43391983
## 10 -0.35025398 0.55294588 -0.1444891 -0.05820276
## 11 -0.25479827 1.02054461 0.1109710 -0.87671731
## 12 0.84475969 0.67373529 -0.4192995 -1.09919546
## 13 -0.73985837 0.84587428 -0.8413902 0.73537434
## 14 -0.86638911 0.64271443 0.6759368 -0.45226210
## 15 0.04140508 0.57013302 0.1105720 -0.72211006
## 16 -0.99848662 -0.95052639 -2.4615709 4.41058394
The MIR is then based on the distance-specific averages around each node. Since outcomes are recorded in a raster format, we use the cellFromXY() function from the raster package:
# A component of the circle average function:
unitCircle <- function(numpts_in){
ptE <- sin((1:numpts_in/numpts_in)*2*pi)
ptN <- cos(((1:numpts_in)/numpts_in)*2*pi)
circPoints <- cbind(ptE, ptN)
return(circPoints)
}
# A circle average function, which
# computes average at distance d (hence
# "dMeans"):
dMeans <- function( Zdata_in = NULL,
ras_in = NULL,
Ydata_in = NULL,
Yvar="Y",
Outstub=Yvar,
gridRes_in=res(ras_in)[1],
dist_vec=(gridRes_in:10*gridRes_in),
evalpts_in=10,
only.unique=FALSE){
for(distUp in dist_vec){
numpts <- ceiling(ceiling(distUp/(gridRes_in/sqrt(2))+1)*pi)*evalpts_in
circPoints <- unitCircle(numpts)
for(i in 1:nrow(Zdata_in)){
ZxTemp <- Zdata_in[i,"x"] + circPoints[,1]*distUp
ZyTemp <- Zdata_in[i,"y"] + circPoints[,2]*distUp
if(only.unique==TRUE){
Zdata_in[i,paste(Outstub, "bar", distUp, sep="")] <- mean(Ydata_in[unique(na.omit(cellFromXY(ras_in,
cbind(ZxTemp, ZyTemp)))),Yvar])
}
if(only.unique==FALSE){
Zdata_in[i,paste(Outstub, "bar", distUp, sep="")] <- mean(Ydata_in[na.omit(cellFromXY(ras_in,
cbind(ZxTemp, ZyTemp))),Yvar])
}
}
}
return(Zdata_in)
}
# Illustrating the circle average function
# using the indices for the Yx values:
dMeans( Zdata_in=Zdata,
ras_in=ras0,
Ydata_in=Ydata,
Yvar="Yindex",
gridRes_in=res(ras0)[1],
dist_vec=dist_vec_up,
evalpts_in=10,
only.unique=FALSE)
## index x y Zstart Yindexbar0.1 Yindexbar0.2 Yindexbar0.3
## 1 1 0.9444997 2.0109679 1 7.20000 7.285714 7.428571
## 2 2 3.7144239 3.8361388 1 4.00000 4.000000 4.000000
## 3 3 1.9624461 0.9393304 0 13.17143 12.771429 12.685714
## 4 4 3.3109567 0.4648655 0 16.00000 16.000000 16.000000
## Yindexbar0.4 Yindexbar0.5 Yindexbar0.6 Yindexbar0.7 Yindexbar0.8
## 1 7.457143 7.457143 7.457143 7.485714 7.430000
## 2 4.000000 4.000000 4.000000 4.000000 3.676471
## 3 12.685714 12.600000 12.600000 12.600000 12.610000
## 4 15.771429 15.672131 14.818182 14.297872 13.384615
## Yindexbar0.9 Yindexbar1 Yindexbar1.1 Yindexbar1.2 Yindexbar1.3
## 1 7.450000 7.303371 7.578313 7.784810 7.773333
## 2 5.060606 5.281250 5.500000 5.516129 5.612903
## 3 12.610000 12.191011 11.626506 11.202532 10.840000
## 4 13.021277 12.822222 12.785714 12.707317 12.538462
## Yindexbar1.4 Yindexbar1.5 Yindexbar1.6 Yindexbar1.7 Yindexbar1.8
## 1 7.958904 8.021277 8.086957 8.133333 8.181818
## 2 5.800000 5.894737 5.973684 5.947368 5.842105
## 3 10.534247 10.354839 10.175824 9.988764 9.793103
## 4 12.256410 12.163265 11.062500 10.404255 10.021739
## Yindexbar1.9 Yindexbar2
## 1 8.162791 8.654321
## 2 6.789474 6.918919
## 3 9.701149 9.389610
## 4 9.866667 9.755556
# Plot for checking whether the circle averages
# above are correct
par(mfrow=c(1,1))
plot(ras0)
points(Zdata[,c("x","y")], cex=3)
text(Zdata[,c("x","y")], labels=Zdata$index)
text(Ydata[,"x"], Ydata[,"y"], Ydata$Yindex)

We can thus get the \(d\)-specific \(\tau_{ix}(\alpha)\) values for each of the intervention nodes, and then aggregate to get the vector of \(d\)-specific MIR values:
# This is just to create the dummy cells in the Z data:
Zdata_out <- dMeans( Zdata_in=Zdata,
ras_in=ras0,
Ydata_in=Ydata,
Yvar="tau1",
Outstub="tau",
gridRes_in=res(ras0)[1],
dist_vec=dist_vec_up,
evalpts_in=10,
only.unique=FALSE)
# Now go intervention node by node:
for(i in 1:nrow(Zdata_out)){
Zdata_out[i,] <- dMeans( Zdata_in=Zdata_out[i,],
ras_in=ras0,
Ydata_in=Ydata,
Yvar=paste("tau",i,sep=""),
Outstub="tau",
gridRes_in=res(ras0)[1],
dist_vec=dist_vec_up,
evalpts_in=10,
only.unique=FALSE)
}
# Now we can get the true MIRs:
MIRvec <- apply(Zdata_out[,grep("taubar", names(Zdata_out))], 2, mean)
plot(dist_vec_up,
MIRvec,
type="l",
ylab="true MIR",
xlab="distance (d)")

The true MIR values reflect the non-monotonicity of the effect function, although note that it is not a perfect match, because of the marginalization that is involved in defining the MIR values.
Let’s get our simulated data set up so that it resembles what we would have in an actual application.
We have our Zdata for the intervention nodes, and we can consider the Zstart variable to be our treatment variable:
print(Zdata)
## index x y Zstart
## 1 1 0.9444997 2.0109679 1
## 2 2 3.7144239 3.8361388 1
## 3 3 1.9624461 0.9393304 0
## 4 4 3.3109567 0.4648655 0
The value of the Zstart vector corresponds to the first column in the matrix of treatment assignments, curZ. We can set up the raster data so that we set the raster cell values to the associated potential outcomes:
ras_sim <- setValues(ras0, Ydata[,paste("Ypo",1,sep="")])
print(ras_sim)
## class : RasterLayer
## dimensions : 4, 4, 16 (nrow, ncol, ncell)
## resolution : 1, 1 (x, y)
## extent : 0, 4, 0, 4 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
## data source : in memory
## names : layer
## values : -3.223551, 1.858757 (min, max)
We can also create an outcome dataset that corresponds to this raster object:
Ysim <- getValues(ras_sim)
Yxy_sim <- coordinates(ras_sim)
Yindex <- 1:nrow(Yxy_sim)
Ydata_sim <- data.frame(Yindex,cbind(Ysim,Yxy_sim))
print(Ydata_sim)
## Yindex Ysim x y
## 1 1 -0.98428974 0.5 3.5
## 2 2 -2.11076833 1.5 3.5
## 3 3 -2.72434015 2.5 3.5
## 4 4 1.85875701 3.5 3.5
## 5 5 0.39234524 0.5 2.5
## 6 6 -1.72025584 1.5 2.5
## 7 7 -0.80202813 2.5 2.5
## 8 8 -3.22355088 3.5 2.5
## 9 9 0.08296758 0.5 1.5
## 10 10 -1.66913192 1.5 1.5
## 11 11 -1.31526120 2.5 1.5
## 12 12 -0.03551086 3.5 1.5
## 13 13 -1.64623243 0.5 0.5
## 14 14 -0.83343844 1.5 0.5
## 15 15 -1.49002627 2.5 0.5
## 16 16 -0.50759177 3.5 0.5
We use the SpatialEffect() function to estimate the MIR. We need to supply the outcome raster, ras_sim, and the intervention node data, Zdata:
MIRest <- SpatialEffect(ras = ras_sim,
Zdata = Zdata,
x_coord_Z = "x",
y_coord_Z = "y",
treatment = "Zstart",
dVec = seq(from=.1, to=2, by=.1),
cutoff = .4,
nPerms = 5000)
print(MIRest)
## $MIRhat.mat
## d taudhat
## 1 0.1 1.4907275
## 2 0.2 1.5504199
## 3 0.3 1.5504199
## 4 0.4 1.6907676
## 5 0.5 1.7141589
## 6 0.6 1.6996987
## 7 0.7 1.6780083
## 8 0.8 1.0015140
## 9 0.9 -0.8164325
## 10 1.0 -0.7909072
## 11 1.1 -0.7891421
## 12 1.2 -1.0581549
## 13 1.3 -1.1995415
## 14 1.4 -1.1907508
## 15 1.5 -1.1235566
## 16 1.6 -0.8219402
## 17 1.7 -0.8732635
## 18 1.8 -0.7611800
## 19 1.9 -0.2987104
## 20 2.0 0.1796545
##
## $Per.CI
## 2.5% 97.5%
## [1,] -1.5664619 1.5664619
## [2,] -1.6261543 1.6261543
## [3,] -1.6261543 1.6261543
## [4,] -1.6664999 1.6664999
## [5,] -1.6840434 1.6840434
## [6,] -1.6731982 1.6731982
## [7,] -1.6569304 1.6569304
## [8,] -1.0049027 1.0049027
## [9,] -1.3283508 1.3283508
## [10,] -1.3070277 1.3070277
## [11,] -1.0292437 1.0292437
## [12,] -1.0250823 1.0250823
## [13,] -1.1550229 1.1550229
## [14,] -1.1484299 1.1484299
## [15,] -1.0730147 1.0730147
## [16,] -1.0260541 1.0260541
## [17,] -0.9875616 0.9875616
## [18,] -0.9374055 0.9374055
## [19,] -0.3514481 0.3514481
## [20,] -0.6405816 0.6405816
##
## $Conley.SE
## [1] 0.6959008 0.6399043 0.5235779 0.4709446 0.4427664 0.4596011 0.5091843
## [8] 0.7799539 1.2086808 1.0798117 1.1239454 1.0518100 0.9536329 0.8971970
## [15] 0.8204585 0.5894988 0.5436474 0.6151568 0.5038444 0.7596568
##
## $Conley.CI
## [,1] [,2]
## [1,] 0.1267620 2.8546930
## [2,] 0.2962075 2.8046322
## [3,] 0.5242072 2.5766326
## [4,] 0.7677163 2.6138190
## [5,] 0.8463369 2.5819810
## [6,] 0.7988805 2.6005168
## [7,] 0.6800070 2.6760095
## [8,] -0.5271956 2.5302235
## [9,] -3.1854468 1.5525818
## [10,] -2.9073383 1.3255238
## [11,] -2.9920750 1.4137909
## [12,] -3.1197026 1.0033927
## [13,] -3.0686620 0.6695791
## [14,] -2.9492568 0.5677553
## [15,] -2.7316552 0.4845420
## [16,] -1.9773580 0.3334775
## [17,] -1.9388125 0.1922854
## [18,] -1.9668874 0.4445274
## [19,] -1.2862454 0.6888246
## [20,] -1.3092728 1.6685819
yrange_sim <- range(c(MIRest$Per.CI, MIRest$Conley.CI))
par(mar=c(4,4, 1,1))
plot(MIRest$MIRhat.mat, type="l",
ylim=yrange_sim)
for(colUp in c(1,2)){
points(MIRest$MIRhat.mat[,1],
MIRest$Per.CI[,colUp],
type="l",
col="blue",
lty="dashed")
points(MIRest$MIRhat.mat[,1],
MIRest$Conley.CI[,colUp],
type="l",
col="red",
lty="dashed")
}

In the graph, the black solid line is the MIR estimate for each of the distance (d) values. The red dashed lines show a confidence interval based on the spatial-HAC standard error. The blue dashed lines show the 0.025 and 0.975 quantiles of the sharp null permutation distribution, which can be used to gauge statistical significance relative to the sharp null hypothesis.
Here we conduct a Monte Carlo simulation to illustrate the performance of the estimators. Rather than the toy example with only 4 nodes and a 4-by-4 raster, we enlarge the simulation to a more realistic 100 intervention nodes:
Yscale <- 20 # Number of raster grid cells per side.
# Be sure to make it a multiple of YZratio
Y0_sd <- 1 # Noisiness of null outcomes
YZratio <- 2 # Size of strata within which Z is assigned, relative to gridcells
Zjitter <- 1 # Amount of randomness away from center of assignment stratum for
# placing the intervention nodes (Z).
pZ <- .5 # Share of intervention nodes that should have Z=1.
dist_vec_up <- seq(from=.1, to=(Yscale/2), by = .1) # Distances over which we will evaluate effects
# Create a null Y0 raster
ras0 <- raster( nrows=Yscale,
ncols=Yscale,
xmn=0, xmx=Yscale,
ymn=0, ymx=Yscale,
vals=rnorm(Yscale*Yscale, mean=0, sd=Y0_sd))
# Dataset with Y0 values and then centroids of the raster cells.
Y0 <- getValues(ras0)
Yxy <- coordinates(ras0)
Yindex <- 1:nrow(Yxy)
Ydata <- data.frame(Yindex,cbind(Y0,Yxy))
# Intervention nodes
# we create them using the aggregate function:
Zxy <- coordinates(aggregate(ras0, fact=YZratio, fun=mean))
Zxy[,1] <- jitter(Zxy[,1], amount=Zjitter)
Zxy[,2] <- jitter(Zxy[,2], amount=Zjitter)
index <- 1:nrow(Zxy)
Zdata <- as.data.frame(cbind(index, Zxy))
# View the null raster:
par(mfrow=c(1,1))
plot(ras0, main="Null raster")
points(Zxy, pch=19, cex=1.25, col=gray(0))
text(Zxy, labels=Zdata$index, col="gray", cex=.5)
# Create a potential outcome function:
# - if you are very close to a treated node,
# you get pulled up,
# - if you are intermediate distance from a treated
# node, you get pulled down,
# - if you are far from a treated node, nothing happens.
# It is a non-monotonic kernel function, that takes arguments:
# - d is the distance,
# - sh controls the shape of the gammas
# - sc controls the scale of the gammas
# - a controls the amount of mixing of two gammas.
effectFun <- function(d, sh, sc, a){(dgamma(abs(d), shape=sh, scale=2*sc) - a*dgamma(abs(d), shape=5*sh, scale=sc))}
par(mfrow=c(1,1))
plotVec <- seq(from=0, to = (Yscale/2), by = .1)
plot(plotVec,
effectFun(plotVec, 1, .25, 1),
type="l",
xlab="distance from intervention node",
ylab="Yx",
main="Effect function")
# For POs, need euclidean distances of each observation cell
# from each intervention node:
for(i in 1:nrow(Zdata)){
Ydata[,paste("dZ",i, sep="")] <- as.matrix(dist(rbind(Zdata[i,c("x","y")], Ydata[,c("x","y")])))[-1,1]
}
# We have a different potential outcome for each
# possible assignment. Thus, to get the POs, we need
# the set of feasible assignments.
# Construct this assigment matrix, assuming complete
# random assignment
# (Could also do sims that look at Bernoulli)
# requires "ri" package:
Zdata$Zstart <- 0
Zdata$Zstart[1:floor(pZ*length(Zdata$Zstart))] <- 1
curZ <- genperms(Zdata$Zstart, maxiter=500)
## Too many permutations to use exact method.
## Defaulting to approximate method.
## Increase maxiter to at least 1.00891344545563e+29 to perform exact estimation.
# Then, create potential outcomes for each assignment:
for(k in 1:ncol(curZ)){
Ydata[,paste("Ypo",k,sep="")] <- Ydata$Y0 + 3*apply(Ydata[,grep("dZ", names(Ydata))],
1,
function(x){sum(effectFun(x, 1, .25, 1)*curZ[,k])})
}
# Now start constructing the causal quantities.
# First up are the tau_ix values (node-and-point-specific effects).
# For each grid cell, there will be a separate
# tau_ix value corresponding to the effect of switching
# treatment status of each intervention node, averaging
# over possible assignments at other nodes.
# This corresponds to the difference in mean POs when
# node i is in treatment versus in control, where
# these means are taken with respect to the set of assignments
# in the assignment matrix (curZ).
# So, to compute these tau_ix values, go through each row in curZ
# and take differences in means for tau_ix values corresponding to
# columns in curZ with 1 minus columns with 0.
for(i in 1:nrow(curZ)){
Ydata[,paste("tau",i,sep="")] <- apply(Ydata[,grep("Ypo", names(Ydata))][,curZ[i,]==1],
1, mean)-apply(Ydata[,grep("Ypo", names(Ydata))][,curZ[i,]==0],
1, mean)
}
# Now start computing the distance-specific aggregates.
# Function to estimate E[Vx|d_ix = d]:
# Get d-specific raster cell means around each node
# (requires "raster" package)
# Gives you the option of marginalizing over only
# unique cells versus over evaluation points.
unitCircle <- function(numpts_in){
ptE <- sin((1:numpts_in/numpts_in)*2*pi)
ptN <- cos(((1:numpts_in)/numpts_in)*2*pi)
circPoints <- cbind(ptE, ptN)
return(circPoints)
}
dMeans <- function( Zdata_in = NULL,
ras_in = NULL,
Ydata_in = NULL,
Yvar="Y",
Outstub=Yvar,
gridRes_in=res(ras_in)[1],
dist_vec=(gridRes_in:10*gridRes_in),
evalpts_in=10,
only.unique=FALSE){
  for(distUp in dist_vec){
    numpts <- ceiling(ceiling(distUp/(gridRes_in/sqrt(2))+1)*pi)*evalpts_in
    circPoints <- unitCircle(numpts)
    for(i in 1:nrow(Zdata_in)){
      ZxTemp <- Zdata_in[i,"x"] + circPoints[,1]*distUp
      ZyTemp <- Zdata_in[i,"y"] + circPoints[,2]*distUp
      if(only.unique){
        cells <- unique(na.omit(cellFromXY(ras_in, cbind(ZxTemp, ZyTemp))))
      } else {
        cells <- na.omit(cellFromXY(ras_in, cbind(ZxTemp, ZyTemp)))
      }
      Zdata_in[i,paste(Outstub, "bar", distUp, sep="")] <- mean(Ydata_in[cells, Yvar])
    }
  }
  return(Zdata_in)
}
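The number of evaluation points that dMeans() places on each circle scales with the circle's circumference measured in raster cell diagonals. Mirroring the numpts formula above, with an assumed gridRes_in of 1 and evalpts_in of 10:

```r
gridRes_in <- 1
evalpts_in <- 10
numpts_for <- function(distUp){
  # circumference in cell diagonals, padded and scaled by evalpts_in
  ceiling(ceiling(distUp/(gridRes_in/sqrt(2)) + 1)*pi)*evalpts_in
}
sapply(c(1, 5, 10), numpts_for)   # 100 290 510
```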
# Now get the d-specific tau values for each of the intervention nodes.
# This first call just creates the "taubar" columns in the Z data:
Zdata_out <- dMeans( Zdata_in=Zdata,
ras_in=ras0,
Ydata_in=Ydata,
Yvar="tau1",
Outstub="tau",
gridRes_in=res(ras0)[1],
dist_vec=dist_vec_up,
evalpts_in=10,
only.unique=FALSE)
# Now go intervention node by node:
for(i in 1:nrow(Zdata_out)){
Zdata_out[i,] <- dMeans( Zdata_in=Zdata_out[i,],
ras_in=ras0,
Ydata_in=Ydata,
Yvar=paste("tau",i,sep=""),
Outstub="tau",
gridRes_in=res(ras0)[1],
dist_vec=dist_vec_up,
evalpts_in=10,
only.unique=FALSE)
}
# Now we can get the true MIRs:
MIRvec <- apply(Zdata_out[,grep("taubar", names(Zdata_out))], 2, mean)
par(mfrow=c(1,1))
plot(dist_vec_up, MIRvec, type="l",
ylab="True MIR",
xlab="distance from intervention node")
# Trial run with a single assignment before looping through all of curZ:
assignUp <- 1
Zdata$Z <- curZ[,assignUp]
Ydata_use <- Ydata[,c("Yindex", "x","y")]
Ydata_use$Y <- Ydata[,paste("Ypo", assignUp, sep="")]
ras_use <- setValues(ras0, Ydata[,paste("Ypo",assignUp,sep="")])
for(assignUp in 1:ncol(curZ)){
Zdata$Z <- curZ[,assignUp]
Ydata_use <- Ydata[,c("Yindex", "x","y")]
Ydata_use$Y <- Ydata[,paste("Ypo", assignUp, sep="")]
ras_use <- setValues(ras0, Ydata[,paste("Ypo",assignUp,sep="")])
MIRest_out <- SpatialEffect(ras = ras_use,
Zdata = Zdata,
x_coord_Z = "x",
y_coord_Z = "y",
treatment = "Z",
dVec = dist_vec_up,
cutoff = (Yscale/5),
per.se = 0,
conley.se = 0)
tauHat <- MIRest_out$MIRhat.mat$taudhat
if(assignUp==1){
tauHat.mat <- tauHat
}
if(assignUp>1){
tauHat.mat <- rbind(tauHat.mat, tauHat)
}
}
# Plot results:
plot(dist_vec_up, MIRvec,
ylim=range(tauHat.mat),
ylab="MIR estimates",
xlab="distance from intervention node",
type="n")
for(i in 1:nrow(tauHat.mat)){
points(dist_vec_up, tauHat.mat[i,], type="l", col=gray(0, alpha=.1))
}
points(dist_vec_up, MIRvec, type="l", col="red")
It is clear from the graph that our estimator provides unbiased estimates of the MIR curve (red line), as the average of the 500 estimates (black lines) is almost identical to the true curve at each distance value.
Now we demonstrate how to apply our method in empirical studies, using two examples. The first comes from Paler et al. (2015), in which the authors conducted a field experiment in Aceh, Indonesia, randomly assigning forest rangers to 28 villages. The outcome of interest is forest coverage in the district (ranging from 0 to 1), and the dataset takes the form of a raster object in which each tile represents a grid cell on the map. The second example is based on Miguel and Kremer (2004). In this study, pupils in 25 villages, randomly selected out of 50 in an area of Kenya, received deworming treatment. We know the average effect of the treatment at the village level, and we want to understand its diffusion across the whole area. For this purpose, kriging is first used to extrapolate the outcome variable's value at each spatial point, and the MIR is then estimated using these extrapolated values.
load("~/Documents/GitHub/spatial/data/forest_ras.RData")
load("~/Documents/GitHub/spatial/data/forest_Zdata.RData")
len <- 1.5
dVec <- seq(from=.25, to=len, by=.01)
result.list <- SpatialEffect(ras = forest_ras,
                             Zdata = forest_Zdata,
                             x_coord_Z = "x_coord",
                             y_coord_Z = "y_coord",
                             treatment = "treatment",
                             dVec = dVec,
                             numpts = NULL,
                             evalpts = 0.001,
                             only.unique = 0,
                             per.se = 1,
                             conley.se = 1,
                             cutoff = 0.3,
                             m = 2,
                             lambda = 0.02,
                             nPerms = 1000)
## Too many permutations to use exact method.
## Defaulting to approximate method.
## Increase maxiter to at least 40116600 to perform exact estimation.
## Warning in sqrt(diag(VCE)): NaNs produced
MIRhat.mat <- result.list[[1]]
Per.CI <- result.list[[2]]
Conley.SE <- result.list[[3]]
Conley.CI <- result.list[[4]]
par(mfrow=c(1,1))
par(pty="s")
plot(MIRhat.mat[,1:2],
ylab="MIR estimates",
xlab="distance from intervention node",
type="l",
main="MIR Estimates for Paler et al. (2015)",
ylim =c(-.5, .5),
col="black")
points( Conley.CI[, 1] ~ MIRhat.mat[,1], col="red", type="l", lty = 2)
points( Conley.CI[, 2] ~ MIRhat.mat[,1], col="red", type="l", lty = 2)
points( Per.CI[, 1] ~ MIRhat.mat[,1], col="blue", type="l", lty = 2)
points( Per.CI[, 2] ~ MIRhat.mat[,1], col="blue", type="l", lty = 2)
data(worms)
dVec <- seq(.01,.4,.01)
result.list <- SpatialEffect(outcome = "any_ics99",
                             Zdata = wormsprep,
                             x_coord_Z = "east",
                             y_coord_Z = "north",
                             treatment = "wgrp1",
                             dVec = dVec,
                             numpts = 10,
                             evalpts = 1,
                             only.unique = 0,
                             cutoff = 0.1,
                             per.se = 1,
                             conley.se = 1,
                             m = 2,
                             lambda = 0.02,
                             nPerms = 1000)
## Too many permutations to use exact method.
## Defaulting to approximate method.
## Increase maxiter to at least 126410606437752 to perform exact estimation.
MIRhat.mat <- result.list[[1]]
Per.CI <- result.list[[2]]
Conley.SE <- result.list[[3]]
Conley.CI <- result.list[[4]]
plot(MIRhat.mat[,1:2],
ylab="MIR estimates",
xlab="distance from intervention node",
type="l",
ylim=c(-.3, 0.2),
main="MIR Estimates for Miguel and Kremer (2004)")
points( Conley.CI[, 1] ~ MIRhat.mat[,1], col="red", type="l", lty = 2)
points( Conley.CI[, 2] ~ MIRhat.mat[,1], col="red", type="l", lty = 2)
points( Per.CI[, 1] ~ MIRhat.mat[,1], col="blue", type="l", lty = 2)
points( Per.CI[, 2] ~ MIRhat.mat[,1], col="blue", type="l", lty = 2)
As before, the solid black line in each of the two figures represents the estimate of the MIR curve. The red and blue dashed lines around it indicate confidence intervals based on the spatial-HAC standard errors and on the sharp-null permutation distribution, respectively. We find a declining diffusion effect of treatment in both examples, although the effect is not precisely estimated at all distance values.
Aronow, Peter M., Cyrus Samii, and Ye Wang. 2019. “Design-Based Inference for Spatial Experiments.” arXiv.
Conley, Timothy G. 1999. “GMM Estimation with Cross Sectional Dependence.” Journal of Econometrics 92 (1): 1–45.