Skip to contents

Overview

The adjacency structure is a set of instructions that show the spatial relationship between each region and informs RSTr which counties to pull information from when spatially smoothing. In this vignette, we will cover the requirements for the adjacency object along with walking through the setup of the adjacency information.

Requirements

Adjacency information is input as a list adj where each element adj[[i]] is a vector of indices denoting regions that are neighbors of region i. Below, we will walk through the construction of the adjacency information and address common issues that may arise.

Example: Massachusetts Map Data

For this example, we will continue using Massachusetts data as we did in section 2. Included is an example dataset that contains Massachusetts shapefile information, generated using the tigris package:

library(RSTr)
library(sf)
#> Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.4.0; sf_use_s2() is TRUE

head(mamap)
#> Simple feature collection with 6 features and 12 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -73.06851 ymin: 41.4811 xmax: -70.52557 ymax: 42.73664
#> Geodetic CRS:  NAD83
#>     STATEFP COUNTYFP COUNTYNS       AFFGEOID GEOID      NAME         NAMELSAD
#> 15       25      017 00606935 0500000US25017 25017 Middlesex Middlesex County
#> 31       25      005 00606929 0500000US25005 25005   Bristol   Bristol County
#> 32       25      015 00606934 0500000US25015 25015 Hampshire Hampshire County
#> 46       25      025 00606939 0500000US25025 25025   Suffolk   Suffolk County
#> 320      25      023 00606938 0500000US25023 25023  Plymouth  Plymouth County
#> 356      25      027 00606940 0500000US25027 25027 Worcester Worcester County
#>     STUSPS    STATE_NAME LSAD      ALAND     AWATER
#> 15      MA Massachusetts   06 2118256715   75315911
#> 31      MA Massachusetts   06 1432553805  357457333
#> 32      MA Massachusetts   06 1365533877   46623894
#> 46      MA Massachusetts   06  150875411  160499488
#> 320     MA Massachusetts   06 1705521260 1126101572
#> 356     MA Massachusetts   06 3912618040  177370899
#>                           geometry
#> 15  MULTIPOLYGON (((-71.89877 4...
#> 31  MULTIPOLYGON (((-70.83595 4...
#> 32  MULTIPOLYGON (((-73.06577 4...
#> 46  MULTIPOLYGON (((-70.93091 4...
#> 320 MULTIPOLYGON (((-70.88335 4...
#> 356 MULTIPOLYGON (((-72.31363 4...

This dataset contains several variables related to Census information on Massachusetts, but for our purposes we are only interested in two variables:

  • GEOID contains the FIPS code for each county. We can use this to associate each county with the counties in our event/population data; and

  • geometry describes how to map each region. We can use this to generate our adjacency information.

To begin, let’s first arrange mamap by GEOID to ensure the counties are in the same order as our maexample data, then generate some adjacency information using the sfdep package:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(sfdep)

ma_shp <- arrange(mamap, GEOID)
ma_adj <- st_contiguity(ma_shp)
#> Warning in spdep::poly2nb(geometry, queen = queen, ...): some observations have no neighbours;
#> if this seems unexpected, try increasing the snap argument.
#> Warning in spdep::poly2nb(geometry, queen = queen, ...): neighbour object has 3 sub-graphs;
#> if this sub-graph count seems unexpected, try increasing the snap argument.

The poly2nb() function takes the information in ma_shp$geometry and looks for any regions with touching boundaries (i.e., queen contiguity). ma_adj is a list that contains the indices of each region that is a neighbor with a given region. Looking at ma_adj gives us some more information:

ma_adj
#> Neighbour list object:
#> Number of regions: 14 
#> Number of nonzero links: 38 
#> Percentage nonzero weights: 19.38776 
#> Average number of links: 2.714286 
#> 2 regions with no links:
#> 4, 10
#> 3 disjoint connected subgraphs

ma_adj tells that there are 14 regions linked in 38 unique ways. However, it also notes that there are two regions with no links. Regions with no links are problematic for RSTr because it has no information with which to spatially smooth, causing the model to crash. Because our data is in this incomplete format, we will have to do some investigation and fixing to get it ready for use.

Let us first look at the dataset to see which counties have no neighbors. ma_adj tells us that regions 4 and 10 are our regions with no links, but we can also use the data generated by poly2nb() to give us the indices of our no-link counties. poly2nb assigns regions without links a 0:

no_neigh <- which(sapply(ma_adj, \(x) all(x == 0)))

Using the sapply() function, we can return a logical vector that checks each element of ma_adj for zeros. Then, with which(), we specify the indices where our logical is true. We can now investigate ma_shp for which counties have no neighbors:

ma_shp[no_neigh, ]
#> Simple feature collection with 2 features and 12 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -70.95137 ymin: 41.23796 xmax: -69.96018 ymax: 41.52178
#> Geodetic CRS:  NAD83
#>    STATEFP COUNTYFP COUNTYNS       AFFGEOID GEOID      NAME         NAMELSAD
#> 4       25      007 00606930 0500000US25007 25007     Dukes     Dukes County
#> 10      25      019 00606936 0500000US25019 25019 Nantucket Nantucket County
#>    STUSPS    STATE_NAME LSAD     ALAND     AWATER
#> 4      MA Massachusetts   06 267292993 1004291414
#> 10     MA Massachusetts   06 119637319  666826424
#>                          geometry
#> 4  MULTIPOLYGON (((-70.8071 41...
#> 10 MULTIPOLYGON (((-70.23405 4...

According to this, our two no-link counties are Dukes and Nantucket (FIPS 25007 and 25019, respectively). Let’s check out a ggplot of our data to further investigate:

library(ggplot2)

ggplot(mamap) +
  geom_sf(aes(fill = NAME)) +
  geom_sf_label(aes(label = NAME))
#> Warning in st_point_on_surface.sfc(sf::st_zm(x)): st_point_on_surface may not
#> give correct results for longitude/latitude data

Here, we’ve generated a map colored by region and labeled with the name of each county. We can see Dukes and Nantucket in the southeast corner, and if you zoom in, you can see that these two counties aren’t touching any other counties. Upon closer inspection, Barnstable County seems like a good contender for a neighbor to both Dukes and Nantucket County, and Dukes and Nantucket County are also close enough to be neighbors to each other. To rectify this, we can manually make some changes to our adjacency information. Let’s first get a feel for which counties are associated with which indices:

county_key <- seq_along(ma_shp$NAME)
names(county_key) <- ma_shp$NAME
county_key
#> Barnstable  Berkshire    Bristol      Dukes      Essex   Franklin    Hampden 
#>          1          2          3          4          5          6          7 
#>  Hampshire  Middlesex  Nantucket    Norfolk   Plymouth    Suffolk  Worcester 
#>          8          9         10         11         12         13         14

We can see that Barnstable has an index of 1, Dukes has an index of 4, and Nantucket has an index of 10. Now, let’s manually add neighbors as necessary to our adjacency structures:

ma_adj[[1]] <- as.integer(c(ma_adj[[1]], 4, 10)) # Add neighbors to Barnstable
ma_adj[[4]] <- as.integer(c(1, 10)) # Replace 0 with neighbors for Dukes
ma_adj[[10]] <- as.integer(c(1, 4)) # Replace 0 with neighbors for Nantucket

Here, we append the neighbors of Barnstable to the pre-existing neighbors of Barnstable, but overwrite the 0’s for Dukes and Nantucket and cast them all as integer vectors. Now if we look at ma_adj, the message about no-link regions is gone.

One last feature we can add to these adjacencies are dimension names. This isn’t necessary, but it can be useful to associate regions in our data object with regions inside of our adjacency structure:

names(ma_adj) <- ma_shp$GEOID
for (i in seq_along(ma_adj)) {
  names(ma_adj[[i]]) <- ma_shp$GEOID[ma_adj[[i]]]
}

The above code assigns each FIPS code to each element of ma_adj, then assigns the indices inside of each element of ma_adj to its associated FIPS code.

Note that if you run a model in RSTr and load in the adjacency information saved in your model’s spatial_data.Rds file, the indices will have changed. This is because ma_adj is only used within Rcpp, which uses a 0-indexing system as opposed to the 1-indexing system used by R. When inputting data into initialize_model(), only use a 1-indexed adjacency structure.

Finally, even though we connected the two island counties to the mainland counties of MA, as long as each region has at least one neighbor, it is usable within RSTr. This means that, theoretically, we could have made just Dukes and Nantucket neighbors of each other, creating two separate “islands” of regions with no related neighbors, but every region on both “islands” still has at least one neighbor.

Closing Thoughts

In this vignette, we used the poly2nb() function inside of the sfdep package to generate our adjacency structure and demonstrated the two types of appropriate adjacency objects: a list of indices or a binary matrix. We then fixed issues with counties without neighbors and applied a naming scheme to the adjacency information consistent with the data object created in section 2. From here, the RSTr model is ready to be run with our data and adjacency information!