Clip polygons in R

Clipping polygons is a common GIS task. For example, you might want to study local authorities (LADs) in the Yorkshire and the Humber region but can only obtain a shapefile with all the LADs in England. Removing all the LADs outside of the Yorkshire and the Humber can be achieved by ‘clipping’ the LADs, using the extent of the larger region as a template.

HUGOMORE42

First, install and load sf for loading the shapefiles, and tidyverse for manipulating the data and plotting the maps.

# install.packages("tidyverse")
# install.packages("sf")

library("tidyverse")
library("sf")

Then download and unzip our test data (shapefiles of English regions and local authorities):

tmp_dir = tempdir()

# a region shapefile for England ~6MB
tmp_reg = tempfile(tmpdir = tmp_dir, fileext = ".zip")
download.file(
  "https://borders.ukdataservice.ac.uk/ukborders/easy_download/prebuilt/shape/England_gor_2011.zip",
  destfile = tmp_reg
)
unzip(tmp_reg, exdir = tmp_dir)

# local authorities for England ~ 
tmp_lad = tempfile(tmpdir = tmp_dir, fileext = ".zip")
download.file(
  "https://borders.ukdataservice.ac.uk/ukborders/easy_download/prebuilt/shape/England_lad_2011.zip",
  destfile = tmp_lad
)
unzip(tmp_lad, exdir = tmp_dir)

Now load these files ready for clipping:

reg = read_sf(  # regions
  dsn = tmp_dir,
  layer = "england_gor_2011"
)

lad = read_sf(  # Local Authority District
  dsn = tmp_dir,
  layer = "england_lad_2011"
)

Let’s check these have loaded correctly with a plot:

ggplot() + 
  geom_sf(data = lad, colour = "light gray", fill = NA) +
  geom_sf(data = reg, colour = "dark gray", fill = NA)

To clip the LADs to a region, use sf::st_contains() with sparse = FALSE argument:

yh = 
  reg %>% 
  filter(name == "Yorkshire and The Humber")

yh_lads = 
  lad %>% 
  filter(st_contains(yh, ., sparse = FALSE))

And a plot to check the clip succeeded:

ggplot(yh_lads) + geom_sf()

We can also check the data are still included in the yh_lads object (without the geometry):

yh_lads
## Simple feature collection with 21 features and 4 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 363191.2 ymin: 378555.1 xmax: 542343 ymax: 519226.1
## epsg (SRID):    NA
## proj4string:    +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs
## # A tibble: 21 x 5
##    code    name       altname label                                geometry
##    <chr>   <chr>      <chr>   <chr>                      <MULTIPOLYGON [m]>
##  1 E07000… Hambleton  <NA>    E0700… (((420403.7 487934, 420403.8 487934, …
##  2 E07000… Selby      <NA>    E0700… (((449563.5 448780.2, 449563.5 448780…
##  3 E08000… Doncaster  <NA>    E0800… (((447460 404259.3, 447455.8 404261.4…
##  4 E07000… Scarborou… <NA>    E0700… (((478796.4 519204.3, 478800.3 519201…
##  5 E08000… Bradford   <NA>    E0800… (((401481.6 443899, 401472.5 443917.2…
##  6 E07000… Harrogate  <NA>    E0700… (((436734.8 478785.4, 436734.6 478768…
##  7 E07000… Ryedale    <NA>    E0700… (((466763.2 457589.2, 466763.2 457589…
##  8 E07000… Richmonds… <NA>    E0700… (((411525.3 511326.6, 411531.8 511321…
##  9 E06000… North Eas… <NA>    E0600… (((529614.4 410900.2, 529622.3 410894…
## 10 E08000… Leeds      <NA>    E0800… (((419443 436664, 419443.2 436664.9, …
## # … with 11 more rows

Original post (rgdal)

I’ve archived the original post based on rgdal and reproduced the instructions below, but I strongly recommend using sf for new code.

# install.packages("rgdal") # uncomment and run if not already installed
# install.packages("rgeos")

# If you run in to trouble installing rgdal and rgeos on Linux (Ubuntu) see:
# https://philmikejones.me/post/2014-07-14-installing-rgdal-in-r-on-linux/

require("rgdal")
require("rgeos")


tmp_dir = tempdir()

tmp_reg = tempfile(tmpdir = tmp_dir, fileext = ".zip")
download.file(
  "https://census.edina.ac.uk/ukborders/easy_download/prebuilt/shape/England_gor_2011.zip",
  destfile = tmp_reg
)  # ~6.4MB

tmp_lad = tempfile(tmpdir = tmp_dir, fileext = ".zip")
download.file(
  "https://census.edina.ac.uk/ukborders/easy_download/prebuilt/shape/England_lad_2011.zip",
  destfile = tmp_lad
)  # ~25MB
unzip(tmp_reg, exdir = tmp_dir)
unzip(tmp_lad, exdir = tmp_dir)

# I recommend loading the shapefiles with readOGR() in the rgdal package

regions <- readOGR(tmp_dir, "england_gor_2011")
lads <- readOGR(".", "england_lad_2011")

# Extract Yorkshire and the Humber
regions <- regions[regions@data$name == "Yorkshire and The Humber", ]

# Clip the LADs
yh_lads <- gIntersection(regions, lads, byid = TRUE, drop_lower_td = TRUE)


# Easy, but we're not finished yet.
# The clipped polygons no longer contain a data frame because the gIntersection
# doesn't (and can't) know which data frame items to save in to the new object.
# This means we must add them back in manually, but even this is relatively
# straight-forward.

# Recreate the data frame
# The row names for the clipped polygons are a concatenation of the regions row
# name and the LAD row names. These are in the form x yyyy where x is the
# region row name (in our case always 5, because Yorkshire and The Humber was
# row 5) and yyyy is the LAD row name.
row.names(yh_lads) <- gsub("5 ", "", row.names(yh_lads))

keep <- row.names(yh_lads)

# For the sake of simplicity I also change the polygon IDs so they are 
# consistent with their respective row names. This step is probably optional, 
# but I prefer to ensure they match so I have a consistent set of IDs to work
# with later on. If you choose not to do this I don't think you'll run into 
# any problems but YMMV
yh_lads <- spChFIDs(yh_lads, keep)

# Now we create a copy of the LAD dataframe and filter it so that it only
#  contains the rows we want to keep, using row names to perform the match:
yh_data <- as.data.frame(lads@data[keep, ])

# Finally we create a SpatialPolygonsDataFrame object with our clipped polygons
#  and our subsetted data frame:
yh_lads <- SpatialPolygonsDataFrame(yh_lads, yh_data)

# Check it worked with a plot
plot(yh_lads)

# and check the attributes:
yh_lads@data
##           CODE                        NAME ALTNAME
##  2   E07000164                   Hambleton    
##  8   E07000169                       Selby    
##  9   E08000017                   Doncaster    
##  37  E07000168                 Scarborough    
##  39  E08000032                    Bradford    
##  43  E07000165                   Harrogate    
##  105 E07000167                     Ryedale    
##  124 E07000166               Richmondshire    
##  162 E06000012     North East Lincolnshire    
##  165 E08000035                       Leeds    
##  175 E07000163                      Craven    
##  204 E08000033                  Calderdale    
##  207 E06000013          North Lincolnshire    
##  214 E08000019                   Sheffield    
##  225 E06000010 Kingston upon Hull, City of    
##  245 E06000011    East Riding of Yorkshire    
##  248 E06000014                        York    
##  251 E08000036                   Wakefield    
##  254 E08000034                    Kirklees    
##  297 E08000016                    Barnsley    
##  320 E08000018                   Rotherham    
R  rstats  GIS  sf  clip  polygon-clip