We are increasingly performing spatial analyses in R. The replicability and the
efficiency of programming languages is much more appealing than using pieces of
user friendly software like ArcGIS,
even though you can still code your way through analyses when using them
(latter versions of QGIS do a fantastic
job in that regard!). The performance of tools available for spatial analyses in
R is however not completely certain.
In this post, we compare four different methods to perform spatial intersects
between objects in R, from three different packages:
raster::intersect
rgeos::gIntersects
rgeos::gIntersection
sf::st_intersects
sf::st_intersection
More specifically, we test how these methods fare when performing binary
(TRUE/FALSE) and zonal or aerial intersects. Keep in mind, not all methods
can be used for both binary and zonal intersects:
Function
Binary
Zonal
raster::intersect
X
X
rgeos::gIntersects
X
rgeos::gIntersection
X
X
sf::st_intersects
X
sf::st_intersection
X
X
Obviously, if you mean to perform binary intersects only, the binary functions
make more sense as they are built to include less calculations. We nonetheless
compare all the functions together for the sake of comparison in this post.
We start be generating random spatial object in space. For the record,
the area selected is within the St. Lawrence estuary in eastern Canada
(see online ecology series),
although the actual location really does not matter for this post!
Grid
We use a regular grid to intersect vectorized data, i.e. points and polygons
for this post. This simulate the use of a grid used to extract environmental
data (biotic and/or abiotic) from multiple sources to characterize a study area.
# Projection in lambersprojSpat<-"+proj=lcc +lat_1=46 +lat_2=60 +lat_0=44 +lon_0=-68.5 +x_0=0 +y_0=0 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"# Bounding boxlatmax<-625000latmin<-585000lonmax<-150000lonmin<-100000# Create a spatial bounding box for the area of interest using the 'sp' package:bb<-cbind(c(lonmin,lonmax,lonmax,lonmin,lonmin),c(latmin,latmin,latmax,latmax,latmin))%>%Polygon()%>%list()%>%Polygons(ID='ID1')%>%list()%>%SpatialPolygons(proj4string=CRS(projSpat))# Create spatial grid that will be used for intersectsA<-5*1000000# Surface of cells in m^2 (area in km^2 * 1000000)cellsize<-sqrt(2*A)/3^(1/4)# value of the small diagonalgrid<-sp::spsample(bb,type="hexagonal",cellsize=cellsize,offset=c(0,0))%>%# Points for a hexagonal gridsp::HexPoints2SpatialPolygons()# creating polygons# We now have a grid of polygons to work with!plot(grid)lines(bb,col='blue',lwd=2)
1
2
3
4
5
6
7
8
9
10
# We need this grid in `sp` and `sf` usable formatsgridSP<-gridgridSF<-sf::st_as_sf(grid)class(gridSP)#R> [1] "SpatialPolygons"#R> attr(,"package")#R> [1] "sp"class(gridSF)#R> [1] "sf" "data.frame"
Points and Polygons
Now we generate random points within the bounding box to test the intersects.
This is done for 1, 10, 50, 100, 250, 500, 1000, 10000 points. Then, to get
all data required to perform the tests, we also need to create polygons from
the point data.
# Number of samplessamp<-c(1,10,50,100,250,500,1000,10000)nSamp<-length(samp)# Names of points samplessampNames<-paste0('Pt',samp)sampNames#R> [1] "Pt1" "Pt10" "Pt50" "Pt100" "Pt250" "Pt500" "Pt1000"#R> [8] "Pt10000"# Random points for all samplesfor(iin1:nSamp){assign(x=sampNames[i],value=data.frame(lon=runif(n=samp[i],min=lonmin,max=lonmax),lat=runif(n=samp[i],min=latmin,max=latmax)))}# We now have nSamp new objects with randomly generated coordinatesls()#R> [1] "A" "bb" "cellsize" "eplot" "grid" "gridSF" #R> [7] "gridSP" "i" "latmax" "latmin" "lonmax" "lonmin" #R> [13] "nSamp" "projSpat" "Pt1" "Pt10" "Pt100" "Pt1000" #R> [19] "Pt10000" "Pt250" "Pt50" "Pt500" "res1" "res2" #R> [25] "samp" "sampNames"# Let's now create spatial objects with those coordinates for use with the `sp` packagesampNamesSP<-paste0(sampNames,'SP')for(iin1:nSamp){assign(x=sampNamesSP[i],value=SpatialPoints(coords=get(sampNames[i]),proj4string=CRS(projSpat)))}# Visualize thempar(mfrow=c(3,3))for(iin1:nSamp){par(mar=c(0,0,0,0))plot(bb,lwd=2)points(get(sampNamesSP[i]),cex=0.75,col='transparent',bg='#1e6b7955',pch=21)}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
# Now we generate polygons from the point data using the `rgeos::gBuffer`,# see `sf::st_buffer` is the equivalent for the `sf` packagesampNamesPoly<-paste0('Poly',samp)sampNamesSPPoly<-paste0(sampNamesPoly,'SP')for(iin1:nSamp){assign(x=sampNamesSPPoly[i],value=rgeos::gBuffer(get(sampNamesSP[i]),byid=T,width=2000))}# Visualize thempar(mfrow=c(3,3))for(iin1:nSamp){par(mar=c(0,0,0,0))plot(bb,lwd=2)plot(get(sampNamesSPPoly[i]),border='transparent',col='#1e6b7955',add=T)}
1
2
3
4
5
6
7
8
9
10
# Transform points and polygons for use with the `sf` packagesampNamesSF<-paste0(sampNames,'SF')# for pointssampNamesSFPoly<-paste0(sampNamesPoly,'SF')# for polygonsfor(iin1:nSamp){assign(x=sampNamesSF[i],sf::st_as_sf(get(sampNamesSP[i])))# pointsassign(x=sampNamesSFPoly[i],sf::st_as_sf(get(sampNamesSPPoly[i])))# polygons}
Benchmarking
Now that the data is ready for use, we can perform the tests! But first, let’s
take a quick look at the type of results that are returned by each function.
Points intersections
First, we will test the intersects only with the points.
# Visualize resultscols<-c("#3fb3b2","#484f42","#ffdd55","#c7254e","#1b95e0")layout(matrix(c(1,2),ncol=2),widths=c(5,2),heights=3)par(mar=c(4,4,1,1),family="serif",las=1)#plot(res1$raster_intersect~res1$n,type='l',lwd=2,col=cols[1],ylim=c(0,max(res1[,2:6])),xlab='Number of points',ylab='Time (s)')lines(res1$rgeos_gIntersects~res1$n,lwd=2,col=cols[2])lines(res1$rgeos_gIntersection~res1$n,lwd=2,col=cols[3])lines(res1$sf_st_intersects~res1$n,lwd=2,col=cols[4])lines(res1$sf_st_intersection~res1$n,lwd=2,col=cols[5])# Legendpar(mar=c(0,0,0,0),family="serif",las=1)eplot()legend(x='center',legend=as.character(colnames(res1)[-1]),bty='n',lty=1,col=cols,seg.len=2,cex=0.8,title=expression(bold('Fonctions')))
1
2
3
4
5
6
7
8
9
10
11
12
13
# Visualize results without gIntersectionlayout(matrix(c(1,2),ncol=2),widths=c(5,2),heights=3)par(mar=c(4,4,1,1),family="serif",las=1)#plot(res1$raster_intersect~res1$n,type='l',lwd=2,col=cols[1],ylim=c(0,max(res1[,c(2,3,5,6)])),xlab='Number of points',ylab='Time (s)')lines(res1$rgeos_gIntersects~res1$n,lwd=2,col=cols[2])lines(res1$sf_st_intersects~res1$n,lwd=2,col=cols[4])lines(res1$sf_st_intersection~res1$n,lwd=2,col=cols[5])# Legendpar(mar=c(0,0,0,0),family="serif")eplot()legend(x='center',legend=as.character(colnames(res1)[-c(1,4)]),bty='n',lty=1,col=cols[-3],seg.len=2,cex=0.8,title=expression(bold('Fonctions')))
In this analysis rgeos::gIntersection is clearly much less efficient than the
alternative options. Using raster::intersect, rgeos::gIntersects,
sf::st_intersects or sf::st_intersection significantly decreases calculation
time, with sf::st_intersects proving to be the most efficient option.
Polygons intersections
Now let’s take a look at intersects using polygons only.
# Visualize resultslayout(matrix(c(1,2),ncol=2),widths=c(5,2),heights=3)par(mar=c(4,4,1,1),las=1,family="serif")##plot(res2$raster_intersect~res2$n,type='l',lwd=2,col=cols[1],ylim=c(0,max(res2[,2:6])),xlab='Number of polygons',ylab='Time (s)')lines(res2$rgeos_gIntersects~res1$n,lwd=2,col=cols[2])lines(res2$rgeos_gIntersection~res2$n,lwd=2,col=cols[3])lines(res2$sf_st_intersects~res2$n,lwd=2,col=cols[4])lines(res2$sf_st_intersection~res2$n,lwd=2,col=cols[5])# Legendpar(mar=c(0,0,0,0),family="serif")eplot()legend(x='center',legend=as.character(colnames(res2)[-1]),bty='n',lty=1,col=cols,seg.len=2,cex=0.8,title=expression(bold('Fonctions')))
1
2
3
4
5
6
7
8
9
10
11
12
# Visualize results without gIntersectionlayout(matrix(c(1,2),ncol=2),widths=c(5,2),heights=3)par(mar=c(4,4,1,1),family="serif",las=1)#plot(res2$rgeos_gIntersects~res2$n,type='l',lwd=2,col=cols[2],ylim=c(0,max(res2[,c(3,5,6)])),xlab='Number of points',ylab='Time (s)')lines(res2$sf_st_intersects~res2$n,lwd=2,col=cols[4])lines(res2$sf_st_intersection~res2$n,lwd=2,col=cols[5])# Legendpar(mar=c(0,0,0,0),family="serif")eplot()legend(x='center',legend=as.character(colnames(res1)[-c(1,2,4)]),bty='n',lty=1,col=cols[-c(1,3)],seg.len=2,cex=0.8,title=expression(bold('Fonctions')))
We see here that rgeos::gIntersects, sf::st_intersects, sf::st_intersection
are far more efficient when dealing with polygons only intersects, with
rgeos::gIntersects the most efficient option. raster::intersects loses its
previous efficiency, while the efficiency of rgeos::gIntersection decreases
even further.
Concluding remarks
Et voilà! It is obvious from these simulations that the sf package overall
provides the most efficient options to perform spatial intersects in R.
rgeos is also very efficient when it comes to binary intersects, especially
with polygons on polygons intersects where it edges st_intersects by
decreasing calculation time in half.
Our recommendation: use sf::st_intersects for binary intersects and
use sf::st_intersection for zonal intersects. However, be aware that the sf
package evolves very rapidly and functions are likely to be modified,
although one would hope that efficiency decrease will not be the price of
further development.
If you wish to stick with the older packages, then binary intersects could be
done quite efficiently, but if you need zonal intersects, we recommend that you
start considering changing your ways!