Learning objectives

This tutorial takes you through a simple approach to measuring greenspace access for schools in London, using geometric operations as the main methods for processing and analysing your data. You will construct a buffer dataset around our greenspace and determine whether nearby schools intersect with this buffer. We will first visualise our data as points to see if we can identify areas of high versus low access - and then aggregate the data to the ward level for potential further use within analysis with statistical data, such as census information.

Our analysis case study

Recent research (Bijnens et al, 2020) has shown that children brought up in proximity to greenspace have a higher IQ and fewer behavioral problems, irrespective of socio-economic background.

In our analysis today, we will look to understand whether there are geographical patterns to schools that have high versus low access of greenspace and where a lack of greenspace needs to be addressed in London.

Below, we can see where schools are located in London and get a general understanding of their proximity to large greenspace just through a simple navigation of the map. Our following practical allows us to quantify these visual patterns we may observe.

Let’s get started!

Greenspace analysis in R-Studio

The following practical will replicate the steps seen in the Q-GIS demonstration to map the access of schools in London to greenspace, according to our chosen parameters.

Setting up R-Studio

We will use the Project approach as outlined in your second week of CASA0005 when using R-Studio - this enables us to keep our code and data working together seamlessly within the same folder.

To get started, make sure you have a Principles of Spatial Analysis (PSA) folder and within this a week_3 (or wk3 per CASA0005 notation, whatever you’d prefer) folder/directory to hold all of the work and data for this week. 0. If you haven’t done so already, create a week_3 folder within your PSA folder of your computer folder/files storage.

Within this folder, we will create a new Project within R-Studio:

  1. To create a new project select File > New Project

  2. Start a new project in your week_3 folder.

Now we have our project created, we can create a new data folder within our project and then copy over our data into this folder.

  1. Within your new R Project, create a new folder called data.

Next, if you haven’t already, download this week’s practical data zipfile from Moodle. This will be the last week, we’ll provide the data ready bundled for you! Once downloaded, we want to move this zipfile from your Downloads and into this data folder. You can do this either in your computer OS’s normal file management tool, e.g. finder in Mac OS, or you can do this using R-Studio within the Files window.

  1. Move the zipfile within the data folder of your Project.

Next, we want to unzip the file to gain access to the data within our R Project.

  1. Click on the zipfile and it should unzip automatically into a raw data folder within the R project. You’ll then see each of the datasets are within their own named folder.We created this folder set up prior to zipping the file to make this next steps as seamless as possible - but in the future, don’t forget to create your own raw folder when adding data to your folder.

  2. Move your original zipfolder into the raw folder to store it there as an archive, just in case you make a permanent mistake with any of your files. To do this in R-Studio, check the box next to the zip, click on More and then select Move and place it within your raw folder.

And we’re done! - let’s get ready to process our data!

Setting up your script - script, libraries and working directory

To enable the efficient, repeatable and reproducible functionality of our work, we will use R-Studio’s ability to create and host code as a script. Before we do anything therefore, we will need to create a new R script:

  1. File > New File > R Script

This will set up a basic scripting environment, as you have already used within CASA0005. There are many other types of scripting options, including creating an R-Markdown document as this one, which you’ll get to work with in the coming weeks in CASA0005 - but for now, we’re focusing on how to write R code and structure our scripts appropriately to analyse our datasets. We can add on the interactivity of these other options at a latter stage!

Let’s go ahead and save our script now, so we know it’s stored in our system - and in the future, we only need to remind ourselves to complete a quick save (e.g. CMD/CTRL + S).

  1. File > Save As > greenspace_access.rmd .

Right, now we’re ready to start writing our processing script!

The first two tasks you will do everytime you create a new script is to first, point your computer to your working directory (so it knows where all your data is) and second, (pre-emptively) load many of the libraries you think you’ll be using in your analysis. Luckily, we can now use the latter to also do the former - saving us a little bit of time in our set-up and a lot of time in our script-writing later on - by using the here library introduced to you in CASA0005.

As a recap, the here library creates a simple command that simply points to a file path - which when using our Project system, will point automatically at this project - it’ll be one of the first libraries you’ll load into your script. You should already have the package installed from last week, so the only thing you need to do at the moment is at the top of your script write:

# Load libraries, ready for use
library(here)
## here() starts at /Users/Jo/R-GIS-Tutorials

If you do not have the package installed, simply type into the console: install.packages("here")

But this library alone will not be enough for our analysis today! At the moment, we know that we’ll need to load some spatial data - therefore we need to load a library capable of handling our datsets. For PSA (and CASA0005), and preferably within your R programming moving forward, we will focus on using the sf or Simple Features library that allows us to load, manipulate, process and export spatial data.


A note about the sf library and spatial data in R

Prior to the sf library, which was introduced to R-Studio in 2017, the sp library was the main spatial library used in R-Studio. As a result, you may see older code or scripts using the sp to handle spatial data. sf has replaced sp as the default spatial library as it works better with the tidyverse way of doing things, see more here and within Chapter 2 of Geocomptuation with R by Lovelace et al (2020) on your reading list (specifically section 2.2.1!).

Ultimately, you can convert between the two library formats (and some other libraries we will use later on in the term still only work with sp) - but it is best practice to try to use the sf ibrary in your code moving forward.


In addition to the sf library, we want to add in the magrittr library that will allow us to use the pipe function (%>%) within our work and enable more efficient programming. You should have come across this pipe function in CASA0005 last week - but to be clear, you need to load the magrittr library in order for this function to work within your code.

We are likely going to need some additional libraries to help further manipulate or visualise our datasets as we move forward with our processing - we’ll add these in now, but explain them in a little more detail as we get to use them in our code. These libraries include: dplyr, units, ggplot2, tmap, and mapview.

As a result, the top of our script should look something like:

# Load libraries, ready for use
library(here)
library(sf)
library(magrittr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(units)
## udunits system database from /Users/Jo/Library/R/3.6/library/units/share/udunits
library(ggplot2)
library(tmap)
library(mapview)
  1. Copy across the load libraries code above and execute this in your R script - your console should confirm the libraries are loaded and also provide you with the data path at which the here command is pointing to!

Once we’ve got our libraries loaded up and ready to run, we can focus on loading up our datasets.

Loading our datasets

From now on, please copy the code in the boxes below into your script (or console when instructed!) and execute each line as you progress through the practical.

All three of our datasets are provided as shapefiles which will make working with the data relatively straight-forward (e.g. even for our point data, the schools, we do not need to convert them from a csv as we often find with this type of data). But As you know from the QGIS demonstration, we’ll need to do quite a few steps of processing to get our final dataset.

We will follow the exact same steps used in the Q-GIS demonstration - but in a slightly different order, to make our script more readable. Usually once we add our libraries, the next step is to load all of our data into variables - it’s really useful if this is done at the top of the script and all in one place, as it makes it easier to a) identify these variables quickly (if you’re new to the script) and b) change these variables if necessary later on.

Let’s go ahead and load our three variables - we will use the sf library st_read command to load our datasets into variables for use within our code:

### Load our datasets

# Load our london school shapefile into a variable called london_schools
london_schools <- st_read(here::here("data/raw/schools", "school_data_london_Atlas_2016.shp"))
## Reading layer `school_data_london_Atlas_2016' from data source `/Users/Jo/R-GIS-Tutorials/data/raw/schools/school_data_london_Atlas_2016.shp' using driver `ESRI Shapefile'
## Simple feature collection with 3889 features and 44 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -63477.1 ymin: 6665527 xmax: 40792.11 ymax: 6751279
## CRS:            3857
# Load our london ward shapefile into a variable called london_wards
london_wards <- st_read(here::here("data/raw/administrative_boundaries", "London_Ward_CityMerged.shp"))
## Reading layer `London_Ward_CityMerged' from data source `/Users/Jo/R-GIS-Tutorials/data/raw/administrative_boundaries/London_Ward_CityMerged.shp' using driver `ESRI Shapefile'
## Simple feature collection with 633 features and 6 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: 503568.2 ymin: 155850.8 xmax: 561957.5 ymax: 200933.9
## CRS:            27700
# Load our two greenspace shapefiles into separate variables
TL_greenspace <- st_read(here::here("data/raw/greenspace", "TL_GreenspaceSite.shp"))
## Reading layer `TL_GreenspaceSite' from data source `/Users/Jo/R-GIS-Tutorials/data/raw/greenspace/TL_GreenspaceSite.shp' using driver `ESRI Shapefile'
## Simple feature collection with 8614 features and 6 fields
## geometry type:  MULTIPOLYGON
## dimension:      XYZ
## bbox:           xmin: 499139.7 ymin: 198946.3 xmax: 601149.4 ymax: 300165
## z_range:        zmin: 0 zmax: 0
## CRS:            27700
TQ_greenspace <- st_read(here::here("data/raw/greenspace", "TQ_GreenspaceSite.shp"))
## Reading layer `TQ_GreenspaceSite' from data source `/Users/Jo/R-GIS-Tutorials/data/raw/greenspace/TQ_GreenspaceSite.shp' using driver `ESRI Shapefile'
## Simple feature collection with 19575 features and 6 fields
## geometry type:  MULTIPOLYGON
## dimension:      XYZ
## bbox:           xmin: 499419.4 ymin: 99770.75 xmax: 600740.5 ymax: 201001.5
## z_range:        zmin: 0 zmax: 0
## CRS:            27700

To see what each variable looks like, you can type in plot(VARIABLE) into the R console. This is a quick command to understand both the spatial coverage and attributes of your data - as it will display the data by each of its attribute fields as a plot.

Data Processing

Now we have our data loaded as variables, we’re ready to start processing! As we know from the QGIS demonstration, each datasets needs a few steps of processing. In spatial data processing, the question always is: where do I start first? And the easiest answer to that is: make sure all of your data is in the same Projected (or Coordinate) Reference System as each other.

Checking - and changing projections - should always be the first step of any workflow as this will ensure you do not carry through any potential mistakes or errors that using the wrong system can cause.

When you loaded your datasets in the above step, you may have notice that in the console additional information about the dataset is printed - this includes the metadata on the dataset’s Coordinate Referene System! As a result, it is quite easy to simply scroll the terminal to check the CRS for each dataset - which as you’ll see, all the datasets bar the school are using 27700, whih is the code for British National Grid, whereas our schools dataset shows 3857, the code for WGS84.

That means we need to start with our london_schools variable - as we know that this is the only dataset currently in the wrong projection (WGS84) instead of using British National Grid.

Reprojecting data in R

To reproject our dataset, we can use a function within the sf library, known as st_transform. It is very simple to use - you only need to provide the function with the dataset and the code for the new CRS you wish to use with the data. For now, we will simply store the result of this transformation as a new variable - but you could in the future, rewrite this code to use pipes to pipe this transformation when loading the dataset (see section 5.7.1. of the CASA pratical workbook for an example).

# Reproject our london schools dataset to BNG and store as a new variable
london_schools_BNG <- st_transform(london_schools, 27700)

We can now double-check our new variable is in the correct CRS by typing the following into the console and checking the result:

# Type this into the console, not your script!
st_crs(london_schools_BNG)
## Coordinate Reference System:
##   User input: EPSG:27700 
##   wkt:
## PROJCS["OSGB 1936 / British National Grid",
##     GEOGCS["OSGB 1936",
##         DATUM["OSGB_1936",
##             SPHEROID["Airy 1830",6377563.396,299.3249646,
##                 AUTHORITY["EPSG","7001"]],
##             TOWGS84[446.448,-125.157,542.06,0.15,0.247,0.842,-20.489],
##             AUTHORITY["EPSG","6277"]],
##         PRIMEM["Greenwich",0,
##             AUTHORITY["EPSG","8901"]],
##         UNIT["degree",0.0174532925199433,
##             AUTHORITY["EPSG","9122"]],
##         AUTHORITY["EPSG","4277"]],
##     PROJECTION["Transverse_Mercator"],
##     PARAMETER["latitude_of_origin",49],
##     PARAMETER["central_meridian",-2],
##     PARAMETER["scale_factor",0.9996012717],
##     PARAMETER["false_easting",400000],
##     PARAMETER["false_northing",-100000],
##     UNIT["metre",1,
##         AUTHORITY["EPSG","9001"]],
##     AXIS["Easting",EAST],
##     AXIS["Northing",NORTH],
##     AUTHORITY["EPSG","27700"]]

As you can see from the output above, our dataset has been reprojected into 27700 or British National Grid!

The next step to process our london_schools_BNG dataset is to reduce the schools to only our chosen London extent. As you’ll remember from the Q-GIS practical and even from looking at the map above, we can see that our schools cover an area larger than our usual London extent. We can even make a quick map of this to check:

# A quick map overlaying our london schoools variable on our london ward variable
tm_shape(london_wards) + tm_polygons() + tm_shape(london_schools_BNG) + tm_dots()

As we can see, we do have schools outside of our London wards - as a result, we want to remove those schools outside of this boundary. We will use the same approach used in Q-GIS - first dissolving our ward file to create a more simplified shapefile for use as a “cookie-cutter”.

Dissolving our London Wards shapefile

Creating a dissolved output with R-Studio is a little different to how we saw this working within QGIS. Unlike in QGIS, there is no specific “dissolve” function per se in sf, compared to the other geometric operations we will use in the remainder of the practical. Instead, we have to use a little bit of data manipulation to create our London Outline - luckily it’s only one line of code!

To dissolve a polygon shapefile using R code, we will use the summarise function that comes from the dplyr library and summarise our London Wards dataset by summing its total area (supplied in the HECTARES attribute field/column) across all records. This will reduce our data frame to a single row, which will only contain one attribute - our total area of London, which we can then map/use as our clip (cookie-cutter) feature!

If this is still a little confusing, do not worry too much - just remember that this code is here and you can come back to it in the future to use in other dissolve operations.

# Pipe our london_wards into the dplyr summarise function and summarise it by its area
# Store this output as a london_outline
london_outline <- london_wards %>% summarise(area = sum(HECTARES))

You can check the resulting feature by typing plot(london_outline) into the console.

Spatially subsetting our schools data to our London outline

Now we have out London outline, we can go ahead and clip our schools dataset by our London outline. Whilst there is a clip function within the sf library, what we will do here is use a techinque known as spatial subsetting, which is more similar to selecting by location: we will subset our london schools dataset by filtering out those that are not within the London Outline.

This approach in R is much quicker than using the clip function - although deciding which approach to use is not only a question of speed but also how each function will affect the filtered data.

When using a clip function, the function acts exactly like a cookie-cutter and will trim off any data that overlaps with the boundaries used.

Conversely, when using a subsetting approach, if a data point or polygon overlaps on the boundary, it will still be included (depending on the topological relationship used) but in its entirety (i.e. no trimming!).

We’ll use the clip approach for our greenspace dataset later on in the practical.

As we’re using point data, it is generally easier to use a subset approach. There are multiple ways to conduct spatial subsetting within R-Studio:

First, we can either use [] just like you would use for selecting and slicing a normal (table-based) dataframe from R’s base package - or we can use the filter function from dplyr within the tidyverse.

Second, sf has its own library of subsetting through geometric operations, including: intersection, difference, symmetrical difference and snap.

To keep things simple, we will use the base subsetting approach - which also works similarly when programming in Python.

# Subset our London schools data by our London outline, for now, store this as a new variable
london_schools_BNG_ss <- london_schools_BNG[london_outline,]

In the future, I would usually just overwrite the current london_schools_BNG variable as I know this is the dataset I want to use in the future - I would do this by simply storing the output of this subset into a variable of the same name (i.e. london_schools_BNG). Much of this code could be condensed into several lines using pipes to make our code shorter and more efficient - but then it would be harder to explain! As you progress with R and programming, you are welcome to bring pipes and restructuring into own your code - but even if you don’t, as long as your code does what you need it to do, then that’s our main aim with this course!

Once you have run the above code, you should notice that your london_schools_BNG_ss variable now only contains 3372 records, instead of the original 3889. We can also plot our variable using the same code as above, to double-check that it worked:

# A quick map overlaying our new london schoools variable on our london ward variable
tm_shape(london_wards) + tm_polygons() + tm_shape(london_schools_BNG_ss) + tm_dots()

We should now see that our schools are all contained within our ward dataset, so we know this dataset is ready to be used for analysis, i.e. determining which schools are within 400m of greenspace or not. But first, we now need to get our greenspace data ready so we can create the 400m buffers needed for this analysis!

Unioning our greenspace datasets

We’ve done a lot of processing so far to do with our schools and ward data, but now it’s time for the greenspace datasets. If you look back at your code, you should remember that we have two datasets for our greenspace in London, which (if you’ve watched the Q-GIS demo) we now need to join together. in GUI-GIS, this type of join is known as a union - and this is the type of tool you would want to look for across any GUI system.

When it comes to programming, however, in either R or python, there is a much simpler way of joining datasets - and that’s simply copying over the records or observations from one variable into another - and the sf library has a ready-to-go function for us to use, known as rbind. This function allows you to ‘bind’ rows from one sf variable to another, including copying the geometry column over:

# Bind our datasets together, forming a unioned greenspace variable
all_greenspace = rbind(TQ_greenspace, TL_greenspace)

Clipping our greenspace dataset*

The next step is to clip our reduced greenspace data to our London outline. Within sf, the clip function is known as the st_intersection function - not to be confused with st_intersects from above! As explained in the QGIS demo, a clip will change the geometry of some of our greenspaces on the outskirts of London, i.e. cookie-cut them precisely to the London outline. If we used the subset approach approach as we did earlier with our point data, we would simply extract all greenspaces that intersect with the London outline - but not change their geometry.

What we can do however if reduce the processing required by our computer by using a mixture of these two methods - if we first subset our all_greenspace dataset by our London outline and then run the clip, our processing will be much faster:

# Clip our london greenspace dataset by our london outline
london_greenspace = all_greenspace[london_outline,] %>% st_intersection(london_outline)
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries

Now we have only London greenspaces in our dataset, the next step, if you remember from the QGIS demo, is to reduce the number of greenspaces to only those bigger than a football pitch. To do this, we will use another type of subsetting you’ve probably come across, which is attribute subsetting - by using a simple query to subset only records that have an area larger than 76,900 square feet or 7,140 square metres. To do this, we’ll use the filter function from the dplyr library we mentioned earlier as well as another function called set_units which is from the unit library that you’ve loaded - but we haven’t yet discussed.

Attribute filtering our greenspace dataset

To be able to query on our area, we must first calculate the area of each of our greenspaces - a bit like we did in QGIS. To do so in R, we can use the st_area function within sf, which will calculate the area of each of our records/observations in our greenspace dataset.

To store the output of this function as a new column in our london_greenspace dataset, we use a simple notation at the end of our london_greenspace variable: $area . The $ in R means for this data frame, access the column that proceeds this sign. In our case, we do not as yet have a column called area_m, therefore R will automatically create this column and then store the outputs of the function in this column:

# Calculate area of each greenspace record by using the st_area function,  store in a new column called area_m
london_greenspace$area_m <- st_area(london_greenspace)

Once we have our area column, we can now filter our dataset based on that column. One thing to note here from the QGIS tutorial is the value we will use - for the QGIS tutorial, I got the value wrong due to my own confusion on units - yes, we can make mistakes! - and used the square footage of a football pitch to filter our dataset rather than the square metres, believing QGIS had calculated our area in square foot (don’t ask me why, it is just what has happened!). But to get exactly the same results as the QGIS processing (and to keep our processing relatively simple), we will stick with this value of 76,900 square metres as an ‘adequate size of greenspace’ (irrespective of the football pitch reference - or rather, now we want something the size of 10 football pitches instead!).

Another technical note is that we’re finally using a function from the units library we loaded at the start of pratical - this library and specifically the set_units function allows us to assign units to numerical values we are using within our query, i.e. here, for our query to run, our value must be in square metres to match the unit of the area_m column.

# Filter our london greenspace dataset to only large greenspaces in London (i.e. bigger than 76,900 square metres)
large_london_greenspace <- london_greenspace %>% filter(area_m > set_units(76900.0, m^2))

We now can look at our final greenspace dataset against our london outline to see its final coverage:

# A quick map overlaying our unioned greenspace variable on our london ward variable
tm_shape(london_outline) + tm_polygons() + tm_shape(large_london_greenspace) + tm_polygons()

Great - this looks pretty similar to the dataset we generated in QGIS!

Buffering our greenspace dataset

We now have our London greenspace dataset - we are ready for the last step of processing with this dataset - generating our buffers. Once again, the sf library has a function for generating buffers - we just need to know how to deploy it successfully on our London greenspace dataset - and this involves understanding how to denote our distance correctly - as well as understanding if and how we can dissolve our buffer into a single record.

To do this, we would investigate the documentation of the function st_buffer to find out what additional parameters it takes - and how. What we can find out is that we need to (of course!) provide a distance for our buffer - but whatever figure we supply, this will be interpreted within the units of the CRS we are using.

In our case, we are using British National Grid and, luckily for us, the units of the CRS is metres - which makes are life significantly easier when calculating these buffers. For other CRS, many use a base unit of an Arc Degree, e.g. WGS84.

In this case, you have two options: 1) reproject your data into a CRS that uses metres as its base unit OR 2) convert your distance into an Arc Degree measurement. This type of manipulation applies to QGIS - but here we have a +1 for ArcGIS, where you can proactively set or change the units you’ll use for your buffer!

Fortunately none of this is our concern - we know we can simply input the figure or 400 into our buffer and this will generate a buffer of 400m. The one issue we do face though is that the parameter arguments do not allow us to dissolve our buffers into a single record, as we did in QGIS. But, as we’ve seen above, there is some code we can use to dissolve our buffers into a single feature after generating them.

# Generate a 400m buffer around our greenspaces
gs_buffer_400m <- st_buffer(large_london_greenspace, dist = 400)

You can then go ahead and plot our buffer to see the results, entering plot(gs_buffer_400m) within the console.

As our final bit of processing with our greenspace buffer, we want to dissolve the whole buffer into a single record. To do this, we’ll replicate the code used for our london ward dissolve, creating a an area value for our buffer records in the process to be used within the summarisation - and then result in a new gs_buffer_400m_single variable:

# Pipe our gs_buffer into the dplyr summarise function and summarise it by its area
# Store this output as a new variable, gs_buffer_400m_single
gs_buffer_400m_single <- gs_buffer_400m %>% summarise(area = sum(st_area(gs_buffer_400m)))

You can then go ahead and plot our single buffer to see the results, entering plot(gs_buffer_400m_single) within the console.

Preparing for Data Analysis

Great, we are now ready to bring our two datasets together reaady for anlaysis - and to do so, we’ll use subsetting as well as the st_intersects function, although with this one, we’ll use it in two different ways!

Identify schools with access to greenspace

Our first task is to identify those schools that have access to greenspace - and extract them to create a new variable for use within our final point-in-polygon count (i.e. how many schools within each ward has access to greenspace). As we know, we can subset our london_schools dataset by our greenspace buffer quite easily using the subset approach:

# Subset our London schools data by our London outline and store this as a new variable
london_schools_gs <- london_schools_BNG_ss[gs_buffer_400m_single,]

Our london_schools_gs variable has been subsetted correctly if we end up with 1477 records, instead of the 3372 records we had previously. We can now use this dataset and our previous london_schools_BNG_ss dataset to create counts at the ward level.

But before we do that, we will create our binary attribute of greenspace access within our london_schools_BNG_ss variable to visualise our school ‘points’ as we did in QGIS. To do this, we’ll use the st_intersects function mentioned above and add a new column, gs_access (i.e. greenspace access), which will tell us which schools have access to greenspace or not.

The st_intersects function is really useful as its output is a simple TRUE or FALSE statement - does this record intersect with the greenspace buffer? This result is what will be stored in our new column as a TRUE or FALSE response and what we can use to map our schools and their greenspace access:

# Test whether a school intersects with our greenspace buffer and store this as a new column called gs_access
# We set the `sparse` parameter to `FALSE` to have a value returned for each school, not just those that do intersect
london_schools_BNG_ss$gs_access <-st_intersects(london_schools_BNG_ss, gs_buffer_400m_single, sparse=FALSE)

We could go ahead and recode this to create a 1 or 0, or YES or NO after processing, but for now we’ll leave it as TRUE or FALSE. We can go head and now visualise our schools based on this column, to see if they have access (TRUE) or do not have access (FALSE) to greenspace. To do this, we’ll use the tmap library again:

# Plot our london schools and visualise their greenspace accessibility, setting the palette to reflect greenspace
tm_shape(london_schools_BNG_ss) + tm_dots(col="gs_access", palette="BuGn")

We can also add a little bit more functionality to this map by using some additional basic interactivity tmap offers by changing its (tmap’s) visualising mode from “plot” mode into “view” mode (more on this in your CASA0005 module!) and then (re)call the last plot, like so:

tmap_mode("view")
## tmap mode set to interactive viewing
tmap_last()

As you’ll see, we now have a nice intereactive map to investigate our points further. You can switch back to the original plot mode by changing the tmap_mode back to plot.

###Calculating greenspace access for schools at the ward level

You’ll be pleased to read that we are finally here - we are at the last stage of our processing and can finally create our rate of schools that have greenspace access, versus those that do not!

To do this, we’ll be using the same process we used in QGIS - counting the number of points in each of our polygons, i.e. the number of schools in each ward.

To do so in R and with sf, it is one line of code - which at first look does not sound at all like it is completing a point-in-polygon calculation - but it does!

To create a PIP count within sf, we use the st_intersects function again - but instead of using the output of TRUE or FALSE, what we actually extract from our function is its lengths recording. The lengths part of this function records how many times a join feature (i.e. our schools) intersects with our main features (i.e. our wards). (Note here, we do not set the sparse function to FALSE but leave it as TRUE/its default by not entering the parameter). As a result, the length of this list is equal to the count of how many schools are within the polygon - i.e. a PIP calculation.

This is a really simple way of doing a PIP calculation - and makes it easy for us to store the output of the function and its lengths (and thus the count) directly as a column within our london_wards dataset, as so:

# PIP for total number of schools using st_intersects function
london_wards$total_schools <- lengths(st_intersects(london_wards, london_schools_BNG_ss))

# PIP for number of schools with access to gs using st_intersects function
london_wards$gs_schools <- lengths(st_intersects(london_wards, london_schools_gs))

As you can seee from the code above, we’ve now calculated this for our total schools dataset and the schools that have access to greenspace. The final step in our processing therefore is to create our rate. To do so, we’ll use the same approach of generating a new column within our london_wards dataset - and then use a mathematical formula to calculate our rates, just as we saw in QGIS:

# Calculate the rate of schools per ward with access to greenspace
# times by 100 to get a percentage
london_wards$gs_rate <- (london_wards$gs_schools/london_wards$total_schools)*100

And that’s it! We now have our greenspace rate for our wards, which we can now again map:

tmap_mode("plot")
## tmap mode set to plotting
tm_shape(london_wards) + tm_polygons(col="gs_rate", palette="Greens")

And again, we can look at this interactively as well by entering the same code as above:

tmap_mode("view")
tmap_last()

We now have our final dataset ready for analysis. Right now, we haven’t introduced you to any statistical or spatial analysis technqiues to fully analyse our dataset - but instead, we can focus on what are data shows visually!

In preparation for this week’s seminar, have a look at the data and write down one or two clear patterns that you can see within the data in terms of access to greenspace for schools within London.

You could also think about how this might impact children in the area, referring back to our original research context.

Exporting our data

The last step of any programming is to extract our variables into permanent datasets for use at a later time. You can at any point in this practical, extract a permanent data file for each of our variables. For now, we’ll extract our new london_wards dataset as we might want to use this in some additional analysis that we could look at next week or for our assessments at a later stage. The great thing about coding this up now, is that it will be easy to re-run all of this analysis and export any of the variables, again, at a later time!

st_write(obj = london_wards, dsn = "data/london_ward_gs.shp", delete_dsn = TRUE)
## Warning in abbreviate_shapefile_names(obj): Field names abbreviated for ESRI
## Shapefile driver
## Deleting source `data/london_ward_gs.shp' using driver `ESRI Shapefile'
## Writing layer `london_ward_gs' to data source `data/london_ward_gs.shp' using driver `ESRI Shapefile'
## Writing 633 features with 9 fields and geometry type Polygon.

You should now see the dataset appear in your files - whereever you have exported it to!

Now you’ve watched Jo complete the analysis on Q-GIS, and then used these instructions to repeat it in R-Studio, which one do you prefer? Or do you think they have different advantages and limitations?