Getting Started with deprivateR

Overview

deprivateR provides a unified framework for calculating measures of area-level deprivation in the United States. These measures are commonly used in social determinants of health research to quantify neighborhood disadvantage.

The package supports the following indices:

Area Deprivation Index (ADI) ("adi") - a factor-based measure of socioeconomic deprivation (via sociome)
Gini Coefficient ("gini") - a measure of income inequality (via tidycensus)
Neighborhood Deprivation Index, Messer ("ndi_m") - a factor-based deprivation measure (via ndi)
Neighborhood Deprivation Index, Powell-Wiley ("ndi_pw") - an alternative NDI formulation (via ndi)
Social Vulnerability Index (SVI) ("svi10", "svi14", "svi20", "svi20s") - the CDC’s composite vulnerability measure, with four methodology variants

Data can be retrieved at the county, census tract, ZCTA5, or ZCTA3 level (five-digit and three-digit ZIP Code Tabulation Areas, respectively) for years 2010 through 2022.

Setup

Installation

The easiest way to install deprivateR is from CRAN:

install.packages("deprivateR")

Alternatively, you can install deprivateR from GitHub:

# install.packages("remotes")
remotes::install_github("pfizer-opensource/deprivateR")

Census API Key

To download data from the Census Bureau, you need a free API key. You can request one at https://api.census.gov/data/key_signup.html.

Once you have your key, store it for use with tidycensus:

tidycensus::census_api_key("YOUR_KEY_HERE", install = TRUE)

This saves the key to your .Renviron file so it is available across sessions.
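Before making requests, it can help to confirm that the key is actually visible to R. The following is a minimal sketch in base R: tidycensus stores the key in the CENSUS_API_KEY environment variable, while has_census_key() is a hypothetical helper name used only for illustration and is not part of deprivateR or tidycensus.

```r
# Hypothetical helper (not part of deprivateR or tidycensus):
# returns TRUE when a non-empty Census API key is set in the environment.
has_census_key <- function() {
  nzchar(Sys.getenv("CENSUS_API_KEY"))
}

# A key saved with install = TRUE is picked up by new sessions;
# re-read .Renviron to use it right away in the current one.
renviron <- path.expand("~/.Renviron")
if (!has_census_key() && file.exists(renviron)) {
  readRenviron(renviron)
}

has_census_key()
```

If this returns FALSE after re-reading .Renviron, the key was probably not saved, and the census_api_key() call above is worth repeating.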

Quick Start with Sample Data

The package includes sample data so you can explore functionality without an API key. The sample data contains 2022 ACS 5-year estimates for all 115 Missouri counties and county equivalents (114 counties plus the independent City of St. Louis).

library(deprivateR)

Load and Calculate an Index

# load sample data for the Messer NDI
ndi_data <- dep_sample_data(index = "ndi_m")

# calculate the index
ndi_results <- dep_calc_index(
  ndi_data,
  geography = "county",
  index = "ndi_m",
  year = 2022,
  return_percentiles = TRUE
)
#> Warning: The proportion of variance explained by PC1 is less than 0.50.

# view the results
ndi_results[, c("GEOID", "NAME", "NDI_M")]
#> # A tibble: 115 × 3
#>    GEOID NAME                       NDI_M
#>    <chr> <chr>                      <dbl>
#>  1 29001 Adair County, Missouri     63.2 
#>  2 29003 Andrew County, Missouri     6.14
#>  3 29005 Atchison County, Missouri  24.6 
#>  4 29007 Audrain County, Missouri   58.8 
#>  5 29009 Barry County, Missouri     59.6 
#>  6 29011 Barton County, Missouri    92.1 
#>  7 29013 Bates County, Missouri     83.3 
#>  8 29015 Benton County, Missouri    68.4 
#>  9 29017 Bollinger County, Missouri 78.9 
#> 10 29019 Boone County, Missouri     14.0 
#> # ℹ 105 more rows

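The NDI_M column contains the calculated Neighborhood Deprivation Index scores. Because return_percentiles = TRUE was set, the scores shown here are expressed as percentiles on a 0 to 100 scale rather than as raw factor scores. Higher values indicate greater deprivation.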
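The values in the output above are consistent with a percentile rank of the form (rank - 1) / (n - 1) * 100. For example, 6.14 corresponds to 7 / 114 * 100 for 115 areas. The sketch below illustrates that formula in base R. It is an assumption inferred from the printed values, not a confirmed description of how deprivateR computes percentiles, and percent_rank_100() is a hypothetical helper.

```r
# Hypothetical helper: percentile rank scaled to 0-100.
# Ties share the lowest rank; missing values stay missing.
percent_rank_100 <- function(x) {
  n <- sum(!is.na(x))
  (rank(x, ties.method = "min", na.last = "keep") - 1) / (n - 1) * 100
}

# Toy raw scores for 115 areas (made-up values, not real NDI output)
set.seed(1)
raw_scores <- rnorm(115)
pct <- percent_rank_100(raw_scores)

range(pct)               # lowest area maps to 0, highest to 100
round(sort(pct)[8], 2)   # eighth-lowest area: 7 / 114 * 100
```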

Quantiles for Analysis

To use deprivation scores as categorical variables in statistical models, you can split them into quantiles:

# split NDI into quartiles
ndi_results <- dep_quantiles(
  ndi_results,
  source_var = NDI_M,
  new_var = ndi_quartile,
  n = 4L,
  return = "label"
)

# view the distribution
table(ndi_results$ndi_quartile)
#> 
#>  (1) Lowest Quartile  (2) Second Quartile   (3) Third Quartile 
#>                   29                   29                   28 
#> (4) Highest Quartile 
#>                   29
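Conceptually, a quartile split computes cut points from the score distribution and then bins each score. The base R sketch below shows the idea with quantile() and cut() on toy percentile scores. It is an illustration under stated assumptions, not necessarily how dep_quantiles() is implemented.

```r
# Toy percentile scores for 115 areas, evenly spaced from 0 to 100
scores <- (0:114) / 114 * 100

# Quartile cut points, then bin each score into a labeled category
cut_points <- quantile(scores, probs = seq(0, 1, by = 0.25))
quartile <- cut(
  scores,
  breaks = cut_points,
  include.lowest = TRUE,
  labels = c("(1) Lowest Quartile", "(2) Second Quartile",
             "(3) Third Quartile", "(4) Highest Quartile")
)

table(quartile)
```

With 115 areas the groups cannot all be the same size, which is why the counts in the output above are 29, 29, 28, and 29 rather than four equal groups.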

Map Breaks for Visualization

To create choropleth maps, use dep_map_breaks() to calculate appropriate classification breaks:

# calculate Fisher-Jenks breaks with 5 classes
ndi_results <- dep_map_breaks(
  ndi_results,
  var = "NDI_M",
  new_var = "map_class",
  classes = 5,
  style = "fisher"
)

# view the break labels
levels(ndi_results$map_class)
#> [1] "0.00 - 19.74"   "19.75 - 39.91"  "39.92 - 60.09"  "60.10 - 80.26" 
#> [5] "80.27 - 100.00"

You can also specify manual breaks:

# define custom break points
my_breaks <- c(
  min(ndi_results$NDI_M, na.rm = TRUE),
  25, 50, 75,
  max(ndi_results$NDI_M, na.rm = TRUE)
)

# apply manual breaks
ndi_results <- dep_map_breaks(
  ndi_results,
  var = "NDI_M",
  new_var = "map_class_manual",
  breaks = my_breaks
)

levels(ndi_results$map_class_manual)
#> [1] "0.00 - 25.00"   "25.01 - 50.00"  "50.01 - 75.00"  "75.01 - 100.00"
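The labels above describe non-overlapping classes: each class after the first starts 0.01 above the previous upper bound. A minimal base R sketch of that labeling scheme follows. break_labels() is a hypothetical helper, the two-decimal format is assumed from the output shown, and none of this is taken from deprivateR's internals.

```r
# Hypothetical helper: build "lower - upper" labels from break points,
# nudging each lower bound after the first up by 0.01 so classes do not overlap.
break_labels <- function(breaks) {
  lower <- breaks[-length(breaks)]
  upper <- breaks[-1]
  lower[-1] <- lower[-1] + 0.01
  sprintf("%.2f - %.2f", lower, upper)
}

toy_breaks <- c(0, 25, 50, 75, 100)
class_labels <- break_labels(toy_breaks)

# Assign toy scores to classes: right-closed intervals, lowest value included
toy_scores <- c(0, 12.5, 25, 25.01, 60, 100)
classes <- cut(toy_scores, breaks = toy_breaks,
               labels = class_labels, include.lowest = TRUE)
as.character(classes)
```

A score of exactly 25 falls in the first class here because the intervals are closed on the right, which matches the "0.00 - 25.00" label.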

Downloading Data with dep_get_index()

When you have a Census API key configured, dep_get_index() handles the full workflow of downloading raw data and computing indices in one step:

# download and calculate SVI for Missouri tracts
mo_svi <- dep_get_index(
  geography = "tract",
  index = "svi20",
  year = 2020,
  state = "MO"
)

Multiple Indices at Once

You can request multiple indices in a single call:

# calculate ADI and Gini together for Missouri counties
mo_multi <- dep_get_index(
  geography = "county",
  index = c("adi", "gini"),
  year = 2022,
  state = "MO"
)

Spatial Output for Mapping

Set output = "sf" to get results as an sf object with geometry attached, ready for mapping with ggplot2 or leaflet:

# get SVI with geometry for mapping
mo_svi_sf <- dep_get_index(
  geography = "tract",
  index = "svi20",
  year = 2020,
  state = "MO",
  output = "sf"
)

# plot with ggplot2
library(ggplot2)
ggplot(mo_svi_sf) +
  geom_sf(aes(fill = SVI20), color = NA) +
  scale_fill_viridis_c(direction = -1) +
  theme_void() +
  labs(title = "Social Vulnerability Index, Missouri Tracts (2020)")

Subscales and Components

For deeper analysis, you can retain subscales and the underlying component variables:

# keep SVI theme subscales and all component variables
mo_detailed <- dep_get_index(
  geography = "county",
  index = "svi20",
  year = 2020,
  state = "MO",
  keep_subscales = TRUE,
  keep_components = TRUE
)

Two-Step Workflow

For more control, you can separate data retrieval from calculation. This is useful when you want to inspect or modify the raw data before computing scores:

# step 1: build the variable list and download data
library(tidycensus)

vars <- dep_build_varlist(
  geography = "county",
  index = "ndi_m",
  year = 2022
)

raw_data <- get_acs(
  geography = "county",
  variables = vars,
  year = 2022,
  state = "MO",
  output = "wide"
)

# step 2: calculate the index on your data
results <- dep_calc_index(
  raw_data,
  geography = "county",
  index = "ndi_m",
  year = 2022
)
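One common inspection between the two steps is checking for missing estimates. In wide-format tidycensus output, estimate columns end in "E" and margin-of-error columns end in "M". The sketch below runs that check on a small mock data frame so it works without an API key. The values are made up, and the variable IDs are placeholders rather than the actual list returned by dep_build_varlist().

```r
# Mock of wide-format ACS output (made-up values; the variable IDs are
# placeholders, not the actual variables required for any index)
raw_mock <- data.frame(
  GEOID = c("00001", "00003", "00005"),
  NAME = c("Area A", "Area B", "Area C"),
  B01003_001E = c(25000, 18000, NA),
  B01003_001M = c(NA, NA, NA),
  B19013_001E = c(45000, 62000, 51000),
  B19013_001M = c(2100, 3400, 4000)
)

# Estimate columns: a Census variable ID followed by "E"
est_cols <- grep("^[A-Z0-9]+_[0-9]+E$", names(raw_mock), value = TRUE)

# Count missing estimates per variable before calculating an index
missing_counts <- colSums(is.na(raw_mock[est_cols]))
missing_counts

# Rows with any missing estimate, to review before step 2
raw_mock[!complete.cases(raw_mock[est_cols]), c("GEOID", "NAME")]
```

The same check can be applied to raw_data from step 1 by swapping it in for raw_mock.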

Summary of Key Functions

| Function | Purpose |
| --- | --- |
| `dep_get_index()` | Download data and calculate indices (one step) |
| `dep_calc_index()` | Calculate indices on existing data |
| `dep_build_varlist()` | Get the Census variable names needed for an index |
| `dep_sample_data()` | Load bundled sample data (no API key required) |
| `dep_quantiles()` | Split scores into quantile categories |
| `dep_percentiles()` | Calculate percentile ranks |
| `dep_map_breaks()` | Create classification breaks for choropleth maps |