Test your data

I’ve written and spoken before about how important it is to test your functions and data analysis scripts. I decided to revisit these ideas in this tutorial after a recent experience of calculating how many units of alcohol the panel members of the NCDS and BCS70 birth cohorts drank at different time points. I expected a straightforward calculation, but it turned out to be far more complicated (it always does!). Tests on the data identified the problem, which I would probably have missed without them, and confirmed when I had solved it. I use testthat in R, although the ideas are language-agnostic.
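As a rough illustration of what "testing your data" can look like with testthat, here is a minimal sketch. It is not the code from the post: `alcohol_units()` and the `drinks` data frame are hypothetical, the data are made up (not NCDS or BCS70), and the drink strengths (4% beer in 568 ml pints, 12% wine in 175 ml glasses) are assumptions. It relies on the UK definition of one unit as 10 ml of pure alcohol, so units = volume (ml) × ABV (%) / 1000.

```r
library(testthat)

# Hypothetical helper: UK alcohol units from drink volume (ml) and strength (% ABV).
# One UK unit is 10 ml of pure alcohol, so units = volume_ml * abv_percent / 1000.
alcohol_units <- function(volume_ml, abv_percent) {
  volume_ml * abv_percent / 1000
}

# Made-up weekly drinking data (NOT from NCDS or BCS70).
# Assumed strengths: beer 4% ABV in 568 ml pints, wine 12% ABV in 175 ml glasses.
drinks <- data.frame(
  id           = 1:3,
  pints_beer   = c(4, 0, 10),
  glasses_wine = c(0, 3, 1)
)
drinks$units <- alcohol_units(568 * drinks$pints_beer, 4) +
  alcohol_units(175 * drinks$glasses_wine, 12)

# Test the function against examples worked out by hand
test_that("the conversion matches hand-worked examples", {
  expect_equal(alcohol_units(568, 4), 2.272)  # one pint of 4% beer
  expect_equal(alcohol_units(175, 12), 2.1)   # one 175 ml glass of 12% wine
})

# Test the derived data, not just the function
test_that("derived units are complete, non-negative and as expected", {
  expect_false(anyNA(drinks$units))
  expect_true(all(drinks$units >= 0))
  expect_equal(drinks$units, c(9.088, 6.3, 24.82))
})
```

If a respondent's drink count were missing, `anyNA()` would make the second test fail, flagging the problem before the derived variable reaches a model.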


Trigpoints data set released

I’ve packaged up the Ordnance Survey’s archive of trig points into an R package, ready to download and use. Install it with:

    install.packages("trigpoints")

Load it as you would any other package (I also load a few other useful packages here):

    library("trigpoints")
    library("dplyr")
    ## 
    ## Attaching package: 'dplyr'
    ## The following objects are masked from 'package:stats':
    ## 
    ##     filter, lag
    ## The following objects are masked from 'package:base':
    ## 
    ##     intersect, setdiff, setequal, union
    library("sf")
    ## Linking to GEOS 3.

Formal informal testing of research code

When writing research code I do test my code and results, but until recently I had only done so informally rather than systematically. I decided it was time to change my testing habits when I noticed I had recoded a variable incorrectly even though my informal checks suggested nothing was wrong. Correcting the error made a small but noticeable difference to my model.
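To make the contrast with informal checking concrete, here is a minimal sketch of a formal test for a recode. The recode itself is hypothetical (an age variable grouped into bands, not the variable from the post); it is chosen because boundary values are an easy place for a recode to go wrong while a quick look at a frequency table still seems plausible.

```r
library(testthat)

# Hypothetical recode (not the variable from the post): age in years -> three bands.
# cut() uses right-closed intervals by default: (-Inf, 17], (17, 64], (64, Inf].
band_age <- function(age) {
  cut(age,
      breaks = c(-Inf, 17, 64, Inf),
      labels = c("0-17", "18-64", "65+"))
}

# Check every boundary explicitly instead of eyeballing a table
test_that("every boundary value lands in the intended band", {
  ages <- c(0, 17, 18, 64, 65, 90)
  expect_equal(as.character(band_age(ages)),
               c("0-17", "0-17", "18-64", "18-64", "65+", "65+"))
})

# Check that the recode neither drops rows nor invents values for missing data
test_that("missing ages stay missing and no rows are lost", {
  ages <- c(3, 45, 70, NA)
  banded <- band_age(ages)
  expect_length(banded, length(ages))
  expect_true(is.na(banded[4]))
  expect_equal(sum(is.na(banded)), sum(is.na(ages)))
})
```

Writing the expected bands down in advance is what makes the check formal: the test fails loudly if the breaks or interval closure change, rather than relying on noticing an odd count.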

Clip polygons in R

Clipping polygons is a common GIS task. For example, you might want to study the local authority districts (LADs) in the Yorkshire and the Humber region but only be able to obtain a shapefile of all the LADs in England. You can remove the LADs outside Yorkshire and the Humber by ‘clipping’ them, using the larger region as a template.
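A minimal sketch of clipping with sf, using made-up square polygons in place of real LAD and region boundaries. It uses `sf::st_intersection()`, which clips each district to the region's outline; this may differ from the exact approach in the full post. `make_square()` is a hypothetical helper for building the toy data. If you only want to clip to the region's rectangular bounding box, `sf::st_crop()` is an alternative.

```r
library(sf)

# Hypothetical helper: an axis-aligned square polygon with lower-left corner (x0, y0)
make_square <- function(x0, y0, size = 1) {
  coords <- matrix(
    c(x0,        y0,
      x0 + size, y0,
      x0 + size, y0 + size,
      x0,        y0 + size,
      x0,        y0),  # polygons must be closed: repeat the first point
    ncol = 2, byrow = TRUE
  )
  st_polygon(list(coords))
}

# Made-up "local authority districts": three unit squares side by side
lads <- st_sf(
  name = c("A", "B", "C"),
  geometry = st_sfc(make_square(0, 0), make_square(1, 0), make_square(2, 0))
)

# Made-up "region": covers A completely, half of B and none of C
region <- st_sf(
  region = "R",
  geometry = st_sfc(make_square(0, 0, size = 1.5))
)

# Declare attributes constant over geometries so st_intersection() doesn't warn
st_agr(lads) <- "constant"
st_agr(region) <- "constant"

# Clip: keep only the parts of each district that fall inside the region
clipped <- st_intersection(lads, region)

# No CRS is set, so areas are plain planar units of the toy coordinates
clipped$area <- as.numeric(st_area(clipped))
print(st_drop_geometry(clipped))
```

District C lies entirely outside the region and disappears, A survives whole (area 1) and B is cut in half (area 0.5).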

R  rstats  GIS  sf  clip  polygon-clip 

Regression Diagnostics with R

The R statistical software is my preferred statistical package for many reasons. It's mature, well supported by communities such as Stack Overflow, has programming abilities built right in and, most importantly, is completely free (in both senses), so anyone can reproduce and check your analyses. R offers an extremely comprehensive range of statistical analyses, and it can be extended with additional packages that are free and easy to install. When there isn't a readily available built-in function (for example, I don't know of one that calculates the standard error of the mean), R's support for writing functions makes it a doddle to write your own.
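As an example of how little code such a function needs, here is a minimal sketch of a standard error of the mean, computed as the sample standard deviation divided by the square root of the number of non-missing observations. The name `se()` and the `na.rm` default are my own choices for illustration, not a function from base R or from the post.

```r
# Hypothetical helper name: standard error of the mean, sd(x) / sqrt(n).
# Missing values are dropped first so they don't count towards n.
se <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

se(c(2, 4, 4, 4, 5, 5, 7, 9))      # sqrt(4/7), about 0.756
se(c(2, 4, 4, 4, 5, 5, 7, 9, NA))  # same value: the NA is dropped
```

Dropping missing values before counting matters: `length(x)` on the raw vector would include the `NA` and understate the standard error.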