R4SpatialAnalysis

Author

Guixing Wei, Ph.D., Senior Reserach Scientist,
Spatial Structures in the Social Sciences (S4),
Brown University
Email: guixing_wei@brown.edu

Intro

CRAN Task Views

CRAN Task View: Analysis of Spatial Data

CRAN Task View: Handling and Analyzing Spatio-Temporal Data

Packages

General: Dealing with Spatial Data

Map-Making: Visualizing Spatial Data

Spatial Analysis: Analyzing Spatial Data

Data Structures & Formats

Data Structures

  • Vector Data: objects are represented by points, lines, and polygons. E.g., state boundary data

    1

  • Raster Data: continuous varying phenomena are modeled by a grid of cells. E.g.,, satellite imagery data

    2

Data/File Formats:

  • Vector:

    • Shapefile: a GIS data format developed by Environmental Systems Research Institute (ESRI). NOTE: Shapefile data is NOT a single file.

      File Extension Information
      .dbf non-spatial information, i.e, attribute table
      .prj Coordinate reference system information, e..g, units of the data, whether data is projected
      .shp geometry information. i.e., coordiantes
      .shx spatial index information
      optional, metadata
    • GeoJSON: an extension of JSON format, encoding geographic data structures.

      3

  • Raster

Tutorial: Exploring the Relationship between Obesity and Outdoor Recreation Facilities at the sub-county level

Data

All the data sets used for this tutorial can be found here or here.

  • Point locations depicting ballfields, playgrounds, basketball & tennis courts, beaches, swimming pools, etc in RI (Source: RIGIS)

  • RI Municipalities (towns and cities) boundary data 2020 (Source: Census Bureau) Note: choose county subdivision when you download the data from Census Bureau website.

  • RI Census Tract Obesity Rate Data (Source: CDC). This is a point data set. Each point represents the centroid of a census tract. Census tract is a smaller geographic unit than county sub-division. [Have trouble understanding the census geographic units? Read this diagram. ]

Processing

Install Packages

please install “sf”, “tidyverse” and “tmap” packages.

  • Menu ->Tools->Install Packages.

  • install.packages(c(“sf”, “tidyverse”, “tmap”))

Load Packages

library(sf)
Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is TRUE
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.0     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.1     ✔ tibble    3.1.8
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tmap)

Read Data

path.towns <- "./Data/tl_2020_44_cousub/tl_2020_44_cousub.shp"
path.rec <- "./Data/Rec_Facilities_RI.geojson"
path.ob <- "./Data/Obesity_RI_2020.geojson"

towns <- read_sf(path.towns)#read as a tibble object
recs <- read_sf(path.rec)
obs <- st_read(path.ob)#read as a df object
Reading layer `Obesity_RI_2020' from data source 
  `C:\Users\gwei3\Dropbox (Brown)\0Brown\Work\Workshops_Trainings\2023 Spaital Analysis with R\Data\Obesity_RI_2020.geojson' 
  using driver `GeoJSON'
Simple feature collection with 240 features and 22 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -71.82613 ymin: 41.1764 xmax: -71.16283 ymax: 42.01409
Geodetic CRS:  WGS 84

Check Coordinate Reference System (CRS)

Two types of CRS:

  • Geographic Coordinate System (GCS): Latitude, Longitude; Units: Degree

  • Projected Coordinate System (PCS): X, Y; Units: meter/feet

st_crs(towns)
Coordinate Reference System:
  User input: NAD83 
  wkt:
GEOGCRS["NAD83",
    DATUM["North American Datum 1983",
        ELLIPSOID["GRS 1980",6378137,298.257222101,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["latitude",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["longitude",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4269]]
st_crs(recs)
Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]
st_crs(obs)
Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]
st_is_longlat(towns)
[1] TRUE
st_is_longlat(recs)
[1] TRUE
st_is_longlat(obs)
[1] TRUE

Project the Data (“EPSG:6568”)

EPSG: 6568 is a state plane coordinate system for RI, unit: Feet.

towns.prj <- st_transform(towns, crs = "EPSG:6568")
recs.prj <- st_transform(recs, crs = 6568)
obs.prj <- st_transform(obs, crs = st_crs(6568))
st_is_longlat(towns.prj)
[1] FALSE
st_is_longlat(recs.prj)
[1] FALSE
st_is_longlat(obs.prj)
[1] FALSE
st_crs(towns.prj)
Coordinate Reference System:
  User input: EPSG:6568 
  wkt:
PROJCRS["NAD83(2011) / Rhode Island (ftUS)",
    BASEGEOGCRS["NAD83(2011)",
        DATUM["NAD83 (National Spatial Reference System 2011)",
            ELLIPSOID["GRS 1980",6378137,298.257222101,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",6318]],
    CONVERSION["SPCS83 Rhode Island zone (US Survey feet)",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",41.0833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",-71.5,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.99999375,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",328083.3333,
            LENGTHUNIT["US survey foot",0.304800609601219],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["US survey foot",0.304800609601219],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["easting (X)",east,
            ORDER[1],
            LENGTHUNIT["US survey foot",0.304800609601219]],
        AXIS["northing (Y)",north,
            ORDER[2],
            LENGTHUNIT["US survey foot",0.304800609601219]],
    USAGE[
        SCOPE["Engineering survey, topographic mapping."],
        AREA["United States (USA) - Rhode Island - counties of Bristol; Kent; Newport; Providence; Washington."],
        BBOX[41.13,-71.85,42.02,-71.08]],
    ID["EPSG",6568]]

Plotting

plot(towns.prj)
Warning: plotting the first 10 out of 18 attributes; use max.plot = 18 to plot
all

plot(towns$geometry)

plot(towns.prj$AWATER)

plot(towns.prj['AWATER'])

# visualize with tmap 
tm_shape(towns.prj)+
  tm_polygons()+
  tm_shape(recs.prj)+
  tm_dots()

#use tmap as a GIS system
tmap_mode("view")
tmap mode set to interactive viewing
tm_shape(towns.prj)+
  tm_polygons()+
  tm_shape(recs.prj)+
  tm_dots()

Calculate how many recreation facilities are within each town

recs.in.town <- st_intersects(towns.prj, recs.prj)
recs.in.town
Sparse geometry binary predicate list of length 40, where the predicate
was `intersects'
first 10 elements:
 1: 89, 96, 100, 101, 102, 199, 397, 399, 404, 407, ...
 2: 121, 197, 252, 253, 434, 444, 450, 451, 452, 453, ...
 3: 105, 106, 176, 177, 178, 230, 412, 417, 418, 426, ...
 4: 123, 124, 229, 466, 478, 480, 482, 502, 510, 764, ...
 5: 152, 241, 521, 525, 526, 528, 539, 544, 545, 547, ...
 6: (empty)
 7: 168, 170, 239, 254, 584, 586, 587, 594, 595, 599, ...
 8: 126, 250, 467, 469, 477, 479, 484, 485, 487, 489, ...
 9: 133, 144, 155, 523, 524, 532, 538, 549, 555, 560, ...
 10: 161, 162, 164, 166, 180, 181, 212, 213, 537, 558, ...
#get the length of each list item
num.recs.inTown <- lengths(recs.in.town)
num.recs.inTown
 [1]  21  25  14  20  18   0  23  28  14  29  12  23  10  12   7  21  19  33  62
[20]  11  11  18  39  12  37  23  10  20  43  23  38 121  18  14  12  43  17  16
[39]  11  40
#add the list as a new column back to towns.prj
towns.prj <- towns.prj %>% mutate(num.recs = num.recs.inTown)
# make a thematic map using the num.recs column
tm_shape(towns.prj)+
  tm_polygons("num.recs")+
  tm_text("NAME")

Calculate the population-weighted average obesity rate at the sub-county level

We have the obesity rate data organized at the census tract level. The goal of this step is to spatially “aggregate” the rates by county subdivision.

\[ Rate_{pop}=\frac{\sum{pop_i*rate_i}}{\sum{pop_i}} \]

Example:

\[ Rate_{pop} = \frac{pop_1*rate_1+pop_2*rate_2+pop_3*rate_3}{pop_1+pop_2+pop_3} \]

#look at the obs.prj layer
class(obs.prj)
[1] "sf"         "data.frame"
view(obs.prj)
colnames(obs.prj)
 [1] "measure"                    "low_confidence_limit"      
 [3] "data_value_unit"            "data_value"                
 [5] "short_question_text"        "statedesc"                 
 [7] "locationid"                 "countyname"                
 [9] "year"                       "high_confidence_limit"     
[11] "categoryid"                 "stateabbr"                 
[13] "data_value_footnote"        "data_value_type"           
[15] "data_value_footnote_symbol" "locationname"              
[17] "category"                   "datavaluetypeid"           
[19] "measureid"                  "countyfips"                
[21] "datasource"                 "totalpopulation"           
[23] "geometry"                  
#select only the colums that will be used in this step and rename them
rates.prj <- obs.prj %>% transmute(rate = as.numeric(data_value),
                                 pop = as.numeric(totalpopulation),
                                 geometry = geometry,
                                 pop_times_rate = rate*pop)
# Spatial aggregation
rates.wby.pop <- aggregate(rates.prj[c("pop_times_rate", "pop")], by = towns.prj, FUN= sum)
rates <- rates.wby.pop %>% 
  mutate(rate = pop_times_rate/pop)
  
#add the rates vector back to the town.prj
towns.prj <- towns.prj %>%
  mutate(rate = rates$rate, pop = rates$pop)
#make a thematic map using the weighted rate column
tmap_mode("view")
tmap mode set to interactive viewing
tm_shape(towns.prj) +
  tm_polygons("rate") +
  tm_text("NAME")

Calculate Facility Rate/Density

\[ Rate_{rec} = \frac{num_{rec}}{pop} \]

towns.prj <- towns.prj%>%
  mutate(rate_rec = num.recs/pop)

Calculate the correlation between facility rate and obesity rate

cor(towns.prj$rate_rec, towns.prj$rate)#because there is an NA
[1] NA
#let's remove New Shoreham town: 4400950500; County Subdivision not defined: 4400500000

data <- towns.prj %>%
  filter(!GEOID %in% c("4400950500", "4400500000"))

cor(data$rate, data$rate_rec)
[1] -0.5945802

Conclusion

The higher the recreation facility rate, the lower the obesity rate at the town level in RI

Q&A

Footnotes

  1. Image Source: https://datacarpentry.org/organization-geospatial/02-intro-vector-data/↩︎

  2. Image Source: https://r.geocompx.org/spatial-class.html↩︎

  3. Image Source: https://geojson.org/↩︎