Dissolve polygons in R

Dissolving polygons is an elementary GIS task that I need to perform regularly. A dissolve removes internal boundaries, leaving only the outline.

Packages

Install and load the packages we’ll need. I use sf because it’s more intuitive than rgdal, and I’m loading tidyverse because it plays well with sf.

# install.packages(c("tidyverse", "sf"))

library("tidyverse")
library("sf")

Boundary data

Download the shapefile we’re going to dissolve.

tmp_dir = tempdir()
tmp     = tempfile(tmpdir = tmp_dir, fileext = ".zip")

download.file(
  "http://census.edina.ac.uk/ukborders/easy_download/prebuilt/shape/England_gor_2011.zip",
  destfile = tmp
)
unzip(tmp, exdir = tmp_dir)

And now load the shapefile we just extracted and plot it to ensure it’s worked:

regions = read_sf(tmp_dir, "england_gor_2011")
ggplot(regions) + geom_sf()

Now lets have a look at the data (without the geometry):

regions %>%
  st_set_geometry(NULL) %>%
  glimpse()
## Observations: 9
## Variables: 3
## $ code  <chr> "E12000006", "E12000007", "E12000002", "E12000001", "E12...
## $ name  <chr> "East of England", "London", "North West", "North East",...
## $ label <chr> "E12000006", "E12000007", "E12000002", "E12000001", "E12...

Dissolves with sf() do not save the data frame (other than the geometry column) by default without specifying how to aggregate the data. Using area as an example:

regions$area <- st_area(regions)
regions
## Simple feature collection with 9 features and 4 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 82643.6 ymin: 5333.602 xmax: 655989 ymax: 657599.5
## epsg (SRID):    NA
## proj4string:    +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs
## # A tibble: 9 x 5
##   code    name       label                               geometry area    
##   <chr>   <chr>      <chr>                     <MULTIPOLYGON [m]> <S3: un>
## 1 E12000… East of E… E1200… (((547491 193549, 547472.1 193465.5,… 1957502…
## 2 E12000… London     E1200… (((526048.4 161889.6, 526037.4 16189… " 15947…
## 3 E12000… North West E1200… (((392703.4 434383.5, 392681.8 43436… 1491524…
## 4 E12000… North East E1200… (((420570.7 637160.3, 420570.6 63716… " 86764…
## 5 E12000… East Midl… E1200… (((456472.5 274014.3, 456432.6 27408… 1581475…
## 6 E12000… Yorkshire… E1200… (((537480.1 415251.6, 537473.2 41526… 1556415…
## 7 E12000… South West E1200… (((90195.03 10349.04, 90194.1 10345.… 2439226…
## 8 E12000… West Midl… E1200… (((325576.3 238639.3, 325570.7 23865… 1300380…
## 9 E12000… South East E1200… (((429042.2 84841.15, 429056.2 84832… 1941297…

Now we have something some data that’s meaningful to group, we can just use summarise() to group the data and perform the dissolve:

england <-
  regions %>%
  summarise(area = sum(area))
ggplot(england) + geom_sf()

If you don’t have or want data to save the dissolve just create a column to group_by() so that the features (rows) that are to be grouped together are given the same data:

england <-
  regions %>%
  mutate(country = "england") %>%
  group_by(country) %>%
  summarise()

Original rgdal instructions

I’ve archived these in case but I strongly suggest using sf now for most GIS tasks in R, including dissolves.

# Load a few packages. dplyr makes merges easier
require("rgdal")
require("rgeos")
require("dplyr")

# Use the original English regions shapefile downloaded above:
region = readOGR(tmp_dir, "England_gor_2011")

# Check the shapefile has loaded correctly:
plot(region)

# We're going to dissolve all regions in to one country (England!)
# For this we'll create a lookup table and merge with the spatial data
# Hopefully for your 'real' data you have a lookup table of all polygons and
# their larger geography already!

lu = data.frame()
lu = rbind(lu, region@data)
lu$CODE = as.character(lu$CODE)
lu$NAME = as.character(lu$NAME)
lu$country = NA
lu$country = "England"

# Merge
# We now need to merge the lookup table into our spatial object data frame.
# We should end up with one row per zone to dissolve, each with a reference for the relevant larger geography.
# I think the trick is to make sure the row names match exactly, and if you can match the polygon IDs as well with spChFIDs().
region@data$CODE = as.character(region@data$CODE)
region@data = full_join(region@data, lu, by = "CODE")

# Tidy merged data
region@data = select(region@data, -NAME.x)
colnames(region@data) = c("code", "name", "country")

# Ensure shapefile row.names and polygon IDs are sensible
row.names(region) = row.names(region@data)
region = spChFIDs(region, row.names(region))

# Dissolve
# I use gUnaryUnion() (and indeed I think unionSpatialPolygons() in the maptools package uses this by default).
region = gUnaryUnion(region, id = region@data$country)

# If you want to just plot this using base plot you can stop there.
# If you want to do anything with the data or plot using ggplot you need to recreate the data frame.

# If you want to recreate an object with a data frame make sure row names match
row.names(region) = as.character(1:length(region))

# Extract the data you want (the larger geography)
lu = unique(lu$country)
lu = as.data.frame(lu)
colnames(lu) = "country"  # your data will probably have more than 1 row!

# And add the data back in
region = SpatialPolygonsDataFrame(region, lu)

# Check it's all worked
plot(region)

# And your data frame should look like this:
region@data

##   country
## 1 England