This tutorial takes you through a simple approach to measuring distances or travel times across a road network between multiple pairs of points. You will construct a road network from OpenStreetMap data, and use this network along with the dodgr library to calculate your chosen metric between two coordinate datasets.
For this tutorial, we’ll be using Portsmouth in the U.K. as our area of interest. The city is located on the south coast of England and was, as of 2011, the only city in the U.K. whose population density exceeded that of London! One of the reasons is that the city primarily occupies an outcrop of land on the south coast (an island called Portsea Island), and extends only slightly to the north, past the M27. There are lots of geographical issues and challenges within the city that you could investigate, including (TBC).
One prominent topic within the city is public health and childhood obesity. According to figures released in March 2020 by Public Health England, more than one in three school pupils in Portsmouth are overweight or obese by the time they finish primary school - higher than the national average of one in four. One potential contributor to this health crisis is the ease and availability of fast food in the city. In the city’s local newspaper, The Portsmouth News, Healthwatch Portsmouth chairman Roger Batterbury was quoted: ‘For people in areas of deprivation, every penny counts and when it comes to buying food, the cheapest option is important but that is rarely the healthy choice.’ See more at: https://www.portsmouth.co.uk/health/one-three-portsmouth-pupils-are-overweight-or-obese-time-they-finish-primary-school-2063613/
The City Council itself has aimed to address the issue by banning new fast food takeaways within 400m of schools - it started with a pilot at Arundel Court Primary Academy in Landport in September 2019. Since the pilot, no new hot food takeaways have been able to open within a 400m radius of the school.
To assess the likely impact of this policy, we will investigate the accessibility of fast food outlets for school children - we want to know if there is a geography to accessibility and, in future tutorials, whether certain socio-economic demographics are more exposed to fast food than others. We will measure accessibility by counting how many fast food outlets are within specific walking distances of each school, starting at 400m, then 800m, and finally 1km. We’ll then aggregate these counts at the Lower Super Output Area (LSOA) scale and compare across the city.
To get this data ready for our spatial and socio-economic analyses, we’ll need to first calculate the distances between our schools and fast food outlets. This involves calculating the shortest distance a child would walk between a school and a fast food outlet, using roads or streets. This means we need to conduct a road network analysis between each school and fast food outlet - just what this tutorial is designed to do!
Let’s get started!
For our network analysis, we will be using the very recent (as of August 2020!) dodgr library; more info: https://atfutures.github.io/dodgr/index.html. Prior to the creation of dodgr, this analysis would have been incredibly complex to do. Whilst R has had many network analysis libraries, the majority of these focus on networks in graph space rather than geographical space. Creating measures of distance or time would therefore be more complex, as you would need to transform your graph distances into geographical ones - but thanks to dodgr, not anymore!
In addition to dodgr, another network library is in the works: sfnetworks (https://luukvdmeer.github.io/sfnetworks/index.html). sfnetworks is still in active development, whilst dodgr has a stable release (it is on CRAN), so we will focus on dodgr for now! The two appear to offer very similar functions and functionality, so hopefully in the future we can use our case study to compare the efficiency and results of our work!
We’ll also be loading several other libraries to help with our analysis, including:
magrittr: to allow us to use the pipe function (%>%) within our work, to enable efficient programming.
osmdata: to download OSM data from the OSM API server.
NB: dodgr also has an in-built function to download a network directly for use within the library (without any of the formatting you’ll see us use), but this function (currently) does not provide the flexibility of determining what type of network you want to construct. In our case, we want to omit any motorways from our network to ensure it represents where people will walk. Hence we’ll use the osmdata library to create our pedestrian-friendly network!
sf: the Simple Features library, used to encode spatial data. This has replaced sp as the default spatial library as it works better with the tidyverse way of doing things; see more at: https://www.nickbearman.me.uk/2019/04/spatial-r-moving-from-sp-to-sf/. And finally (for now):
expss: this library provides many functions that are usually seen in Excel or SPSS. Here we use its count_row_if function (you’ll see why later on), though this functionality could probably be replaced by base R; we just found a quick solution to what we needed to make this tutorial work!
Make sure to install these libraries first using the R console: install.packages(c('magrittr', 'osmdata', 'dodgr', 'sf', 'expss'))
# Load our libraries
library(magrittr)
library(osmdata)
## Data (c) OpenStreetMap contributors, ODbL 1.0. https://www.openstreetmap.org/copyright
library(dodgr)
library(sf)
library(expss)
##
## Attaching package: 'expss'
## The following objects are masked from 'package:magrittr':
##
## and, equals, or
Then set your working directory:
# Set working directory
# Replace the path below with your file path
setwd("~/R-GIS-Tutorials/")
Once we’ve loaded our libraries and set our directory, the first step in our tutorial is to download our OpenStreetMap data to create the network dataset for our analysis.
To do this, we’ll use the osmdata library and its add_osm_feature function. You can find out more about this and how to construct your queries in the following tutorial: https://cran.r-project.org/web/packages/osmdata/vignettes/osmdata.html.
To use the function, we need to provide it with either a bounding box of our area of interest (AOI) or a set of points, from which the function will create its own bounding box.
We’ll use the former approach, and a very useful tool for extracting your bounding box can be found at: https://boundingbox.klokantech.com. You simply navigate to your AOI and then use the rectangle + arrow button to access a tool that will draw a bounding box you can then edit. Alternatively, you can use the pentagon button to draw your own polygon. At the bottom of the webpage you’ll see options to copy and paste your box; choose CSV, ready to paste your coordinates into the code below.
To download our road network dataset, we first define a variable to store our bounding box coordinates, p_bbox. We then use this within our osm query to extract specific types of highways within that bounding box - the results of our query are then stored in an osmdata object (this one is for sf). You can find out more info about the osmdata object in Section 3 of the osmdata vignette tutorial linked above.
# Define our bbox coordinates, here our coordinates relate to Portsmouth
p_bbox <- c(-1.113197,50.775781,-1.026508,50.859941)
# Pass our bounding box coordinates into the OverPassQuery (opq) function
osmdata <- opq(bbox = p_bbox ) %>%
# Pipe this into the add_osm_feature data query function to extract our highways
# Note here, we specify the values we are interested in, omitting motorways
add_osm_feature(key = 'highway', value = c('primary', 'secondary', 'tertiary', 'residential','path','footway', 'unclassified','living_street', 'pedestrian')) %>%
# And then pipe this into our osmdata_sf object
osmdata_sf()
You should now see an osmdata variable appear in your environment window - as explained in the linked tutorial, the osmdata object contains the bounding box of your query, a time-stamp of the query, and then the spatial data as ‘osm_points’, ‘osm_lines’, ‘osm_multilines’ and ‘osm_polygons’ (each listed with its respective fields). Some of the spatial features may be empty, depending on what you asked your query to return.
What is important to know is that the actual spatial data contained in an osmdata object can be extracted - and will be in the sf format, when using the osmdata_sf() function (as we did) or in the sp format if you use the osmdata_sp() function instead.
Our next step, therefore, is to extract our spatial data from our osmdata object to create our road network dataset. This is in fact incredibly easy, using R’s traditional $ operator to access these spatial features from our object.
Deciding what to extract is probably the more complicated aspect of this - mainly because you need to understand how to represent your road network, which is usually determined by the library/functions you’ll be using it within. Luckily, I’ve done all the pre-reading for you: we want to pass, in preference, what are known as the edges of the network, i.e. the lines that represent the roads, rather than the nodes of the network, i.e. the points that represent the locations at which the roads intersect. The latter can be used by the dodgr library, but edges are preferred due to the unintended data errors that can occur when deleting nodes, versus deleting edges, from a network. I won’t explain this in any further detail, but in preference, choose your edges!
Despite this, here we will extract both the points/nodes and the lines/our road edges within our network - we might want to use the former for visualisation later in our analysis. During extraction, we’ll also reduce the number of fields the spatial data contains. For our points, we’ll keep just the osm_id, in case we need to refer to it later. For our lines, we’ll keep a little more information that we might want to use within our road network or analysis, including the type of highway, the maximum speed on the road, and whether the road is one-way or not. Remember, OpenStreetMap is an open source of spatial data, so these fields may not be complete for every road, and their accuracy and currency cannot be guaranteed.
Extract the data as follows:
# Extract our spatial data into variables of their own
# Extract the points, with their osm_id.
ports_roads_nodes <- osmdata$osm_points[, "osm_id"]
# Extract the lines, with their osm_id, name, type of highway, max speed and oneway attributes
ports_roads_edges <- osmdata$osm_lines[ , c("osm_id", "name", "highway","maxspeed", "oneway")]
We should now have two additional variables in our Environment, ready to create our road network dataset - or in network terms, our graph.
To check our dataset, we can quickly plot the edges of our road network using the plot() function:
plot(ports_roads_edges)
This looks like Portsmouth to me! And our main plot for the dataset (osm_id) looks pretty complete. The other plots are also interesting to look at, including where the one-way streets are in Portsmouth, as well as the predictable similarities between the highway and maxspeed variables.
Before we construct our graph, we also need to create our ORIGIN and DESTINATION points, i.e. the datasets we wish to calculate the distances between. As we will use the dodgr_distances function to calculate these distances, according to the dodgr documentation, these points need to be in either a vector or matrix format, containing the two coordinates for each point.
For our Portsmouth scenario, we are interested in calculating the shortest distances between schools and fast food outlets, therefore we need to download these datasets ready for our use - and we’ll use OpenStreetMap to do this. (If you are using this tutorial to help guide your own analysis and already have your OD datasets ready, you can skip this step!).
Following a similar structure to our query above, we’ll use our knowledge of OpenStreetMap keys and values to extract the points of interest (POIs) we are interested in:
Note, we do not need to restate the bounding box coordinates as this is still stored as a variable in our session’s memory.
# Download our schools from OSM
schools <- opq(bbox = p_bbox) %>%
add_osm_feature(key = 'amenity', value = 'school') %>%
osmdata_sf()
# And our fast food outlets (could add convenience stores - separate query)
ff_outlets <- opq(bbox = p_bbox) %>%
add_osm_feature(key = 'amenity', value = 'fast_food') %>%
osmdata_sf()
We also need to follow a similar extraction of our two datasets from the osmdata object as we did for our road dataset:
# Extract our school points
ports_schools <- schools$osm_points[ , c("osm_id", "name")]
# Extract our fast food outlet points
ports_ff <- ff_outlets$osm_points[ , c("osm_id", "name")]
We now have our road network data and our OD points - we’re ready to construct our network graph and run our network analysis!
With any network analysis, the main data structure is a graph, constructed from our nodes and edges. To create a graph for use within dodgr, we pass our ports_roads_edges data into the weight_streetnet function. The dodgr library also contains weighting profiles, which you can customise, for use within your network analysis. These profiles weight each type of highway according to the mode of transport they aim to model. Here we will use the foot weighting profile, as we’re looking to model walking accessibility.
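If you’re curious how the foot profile weights each highway type, dodgr bundles its profiles as a built-in dataset. A quick sketch - note that in older dodgr versions weighting_profiles is a single data frame rather than a list, so the first line may need adjusting for your installed version:

```r
library(dodgr)

# The bundled profiles live in the weighting_profiles dataset;
# in recent dodgr versions this is a list of data frames
wp <- weighting_profiles$weighting_profiles

# Show how each highway type is weighted for pedestrians
wp[wp$name == "foot", ]
```

Highway types with higher weights are preferred when routing, so the foot profile favours footways and paths over large roads.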
Let’s create our graph:
# Create network graph using our edge data, with the foot weighting profile
graph <- weight_streetnet(ports_roads_edges, wt_profile = "foot")
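Before calculating distances, it can be worth a quick look at the graph itself: it is simply a data frame of edges. A minimal sketch (column names such as d and d_weighted reflect the dodgr version current at the time of writing):

```r
# Each row of the graph is a directed edge; columns include the from/to
# node ids, the geographical distance (d) and the profile-weighted
# distance (d_weighted) used for routing
head(graph, 3)
nrow(graph)  # total number of edges in the graph
```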
Once we have our graph, we can then use this to calculate our network distances between our OD points. Here we use the dodgr_distances function, which you can find out more about in its documentation.
In this function, we first pass our graph, then our origin points (schools), in the from argument, and then our destination points (fast food outlets), in the to argument. There are several other arguments the function takes, which, again, you can read about in the documentation.
One thing to note is our addition of the st_coordinates function as we pass our two point datasets to the from and to arguments. In their current format, our point data are sf data frames, which the function cannot accept - we need to instead provide it with a vector or matrix. We can achieve this simply by using the st_coordinates function, which retrieves the coordinates of any sf (spatial) data frame as a matrix.
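To see what st_coordinates does, here is a minimal, self-contained sketch using a made-up sf point data frame (the coordinates below are arbitrary illustrative values, not data from our analysis):

```r
library(sf)

# Two hypothetical points stored as an sf data frame
pts <- st_as_sf(data.frame(id  = 1:2,
                           lon = c(-1.09, -1.07),
                           lat = c(50.80, 50.81)),
                coords = c("lon", "lat"), crs = 4326)

# st_coordinates() returns a plain numeric matrix of X/Y columns -
# the format dodgr_distances() expects for its from/to arguments
st_coordinates(pts)
```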
We can now calculate our distances:
# Calculate distances between schools to fast food stores
sch_to_ff_cal <- dodgr_distances(graph,
                                 from = st_coordinates(ports_schools),
                                 to = st_coordinates(ports_ff),
                                 shortest = TRUE, pairwise = FALSE,
                                 quiet = FALSE)
## Calculating shortest paths ... done.
For our dataset, the query runs very quickly - a total of 876 x 294 calculations in a few seconds.
Let’s check our output:
head(sch_to_ff_cal, 1)
## 1 2 3 4 5 6 7 8 9 10
## 1 NA NA NA 9709.562 11262.46 2208.698 11943.54 2320.115 692.3284 1671.141
## 11 12 13 14 15 16 17 18 19 20 21 22
## 1 1674.336 1691.717 1671.141 NA NA 2061.186 NA NA NA NA 3360.103 1697.704
## 23 24 25 26 27 28 29 30 31
## 1 580.8907 NA 3622.593 1357.423 3342.665 340.6474 1101.115 11155.47 2975.787
## ... [output truncated: distances to all 294 outlets are returned]
Our output shows the calculations for the first school: the distances between that school and every fast food outlet. We do have several NAs - this is likely because parts of our network do not connect with each other, and so warrants further investigation. We would want to check the reasons for this and see how we could fix it.
We won’t do this today, but it does highlight that all analysis is fallible to errors and mistakes.
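As a first diagnostic, though, we can at least count how many OD pairs were unreachable. A minimal sketch, assuming the distance matrix produced above:

```r
# Total number of unreachable school-to-outlet pairs
sum(is.na(sch_to_ff_cal))

# Which schools (rows) have at least one unreachable outlet?
which(rowSums(is.na(sch_to_ff_cal)) > 0)
```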
The next step of processing all depends on what you’re trying to assess - here we want to understand which schools have a higher exposure or accessibility to fast food outlets compared to others, quantified by how many outlets are within walking distance of specific distances.
We will therefore count how many outlets are within X walking distance of each school and store this as a new column within our ports_schools data frame.
To do this, we’ll use the count_row_if function from the expss library, which counts, for each row, the number of observations that meet a certain condition. Here we use the lt function, which means less than (there’s also gt, which means greater than). We simply need to add in the distance we’re looking to use, and store the output as a new variable, which will be a new column (remember, denoted through $) within our data frame.
We can repeat this for all three distances, simply exchanging the values of our distance and renaming our variable/fieldname. We could also automate this with a for loop, but with only three distances, we’ll stick with our copy-and-paste method.
# Add column to ports_schools, counting number of ff within 400m walking distance from them
ports_schools$ff_within_400m <- count_row_if(lt(401),sch_to_ff_cal)
# Add column to ports_schools, counting number of ff within 800m walking distance from them
ports_schools$ff_within_800m <- count_row_if(lt(801),sch_to_ff_cal)
# Add column to ports_schools, counting number of ff within 1000m walking distance from them
ports_schools$ff_within_1km <- count_row_if(lt(1001),sch_to_ff_cal)
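As noted earlier, the same counts can probably be produced without expss. A base-R sketch - treat this as an illustration rather than a drop-in replacement, as we have not verified it against the expss output:

```r
# A matrix comparison gives TRUE where a distance is under the threshold;
# rowSums() then counts the TRUEs per school, with na.rm = TRUE ignoring
# unreachable (NA) school-to-outlet pairs
ff_within_400m_base <- rowSums(sch_to_ff_cal < 401, na.rm = TRUE)
```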
We can then look at our outputs quickly again using the plot function:
plot(ports_schools)
Just from this simple plot, we can see some clear geographical patterns in exposure and accessibility across our distances. Areas with greater access/exposure to fast food outlets (denoted by the yellow/pink colours) appear to be within the city centre and in the south, whereas schools in the north have less exposure. If we head back to the interactive map at the start of this practical, you will be able to see that these two areas correlate quite well with the more commercial areas within Portsmouth: the high street and an area known as Gunwharf Quays.
We can quantify this by applying many of the spatial analysis techniques you’ve used over the module. For example, using interpolation to create a raster surface. Alternatively, we can aggregate our datasets to the Lower Super Output Area (LSOA) scale and use this to check for spatial autocorrelation.
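A hedged sketch of that aggregation step - the file name lsoa_portsmouth.shp and the field lsoa_code are hypothetical placeholders for whatever LSOA boundary data you obtain:

```r
library(sf)

# Hypothetical LSOA boundary file - replace with your own data source
lsoa <- st_read("lsoa_portsmouth.shp")
lsoa <- st_transform(lsoa, 4326)  # match the CRS of our OSM-derived points

# Attach to each school the LSOA it falls within, then average the
# fast food counts per LSOA (lsoa_code is a hypothetical field name)
schools_lsoa <- st_join(ports_schools, lsoa)
aggregate(ff_within_400m ~ lsoa_code,
          data = st_drop_geometry(schools_lsoa), FUN = mean)
```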