Calculate Deprivation Measures — dep_get_index

Downloads raw data and then calculates various measures of deprivation and/or vulnerability, with a range of options for structuring output. The measures include four versions of the CDC's social vulnerability index (SVI), which is a unique offering, along with wrappers that bring in additional measures from related packages: the area deprivation index (ADI; via sociome), the Gini coefficient (via tidycensus), and the neighborhood deprivation index (NDI; via ndi). Both ADI and NDI have variations as well. See Details for more information.

dep_get_index(geography, index, year, survey = "acs5",
    return_percentiles = FALSE, keep_subscales = FALSE,
    keep_components = FALSE, output = "wide",
    state = NULL, county = NULL, puerto_rico = FALSE, zcta = NULL,
    zcta_geo_method = NULL, zcta_cb = FALSE, zcta3_method = NULL,
    shift_geo = FALSE, key = NULL)

Arguments

geography

A character scalar; one of "county", "zcta3", "zcta5", or "tract"

index

A character scalar or vector listing deprivation measures to return. These include the area deprivation index ("adi"), the Gini coefficient ("gini"), two versions of the neighborhood deprivation index, by Messer ("ndi_m") and by Powell and Wiley ("ndi_pw"), and four versions of the social vulnerability index ("svi10", "svi14", "svi20", and "svi20s"). See Details.

year

A numeric scalar or vector. 2010 is the earliest year deprivateR supports, while 2022 is the most recent year.

survey

A character scalar representing the Census product. It can be any American Community Survey product (one of "acs1", "acs3", or "acs5"). Note that "acs3" was discontinued after 2013.

return_percentiles

A logical scalar; if TRUE, scales (and their subscales) will be returned as percentiles instead of in raw scores. If FALSE (default), raw scores will be returned. Note that SVI is returned as a percentile regardless of what return_percentiles is set to.

keep_subscales

A logical scalar; if FALSE (default), only the full ADI and/or SVI scores (depending on what is passed to the index argument) will be returned. If TRUE and an SVI version (e.g. "svi20") is listed for the index argument, the four SVI "themes" (see Details) will be returned along with the full SVI score. Similarly, if "adi" is listed for the index argument, the three ADI subscales (see Details) will be returned.

keep_components

A logical scalar; if FALSE (default), none of the components used to calculate the deprivation measures will be returned. If TRUE, all of the demographic variables used to calculate ADI and/or SVI will be returned.

output

A character scalar; if "wide" (default), a tibble will be returned with one row per jurisdiction and individual measures of deprivation stored in columns. If "tidy", a tibble will be returned with one row for each combination of jurisdiction and deprivation measure. If "sf", a "wide" data set will be returned with geometric data appended to facilitate mapping and/or spatial statistics.

state

A character scalar or vector with character state abbreviations (e.g. "MO") or numeric FIPS codes (e.g. 29).

county

A character scalar or vector with character GEOIDs (e.g. "29510").
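
For readers unfamiliar with the identifiers used by the state and county arguments, the following base R sketch shows how a five-digit county GEOID combines a two-digit state FIPS code with a three-digit county code. It is illustrative only: the variable names are hypothetical, and it does not show how deprivateR itself parses these arguments.

```r
# A five-digit county GEOID: two-digit state FIPS + three-digit county FIPS.
county_geoid <- "29510"

state_fips  <- substr(county_geoid, 1, 2)  # state portion
county_fips <- substr(county_geoid, 3, 5)  # county portion

# A numeric state FIPS code can be zero-padded to the same two-character form.
state_numeric <- 29L
state_padded  <- sprintf("%02d", state_numeric)

# The padded numeric code matches the state portion of the GEOID.
identical(state_padded, state_fips)
```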

puerto_rico

A logical scalar; if TRUE, data for Puerto Rico will be included in calculations. If FALSE (default), Puerto Rico will not be included.

zcta

An optional vector of ZCTAs that demographic data are requested for. If this is NULL and geography = "zcta5", data will be returned for all ZCTAs. If a vector is supplied and geography = "zcta5", only data for those requested ZCTAs will be returned. The vector can be created with zippeR::zi_get_geometry() and should only contain five-digit ZCTAs.

zcta_geo_method

A character scalar; if geography = "zcta5" or geography = "zcta3", either "intersect" or "centroid" should be supplied. These two options alter how ZCTA overlap with states or counties is defined. See zippeR::zi_get_geometry() for more information.

zcta_cb

A logical scalar; if FALSE, the most detailed TIGER/Line data will be used for geography = "zcta5". If TRUE, a generalized (1:500k) version of the data will be used. The generalized data will download significantly faster, though they show less detail. According to the tigris::zctas() documentation, the download size is ~65MB if cb = TRUE and ~500MB if cb = FALSE.

This argument does not apply to geography = "zcta3", which only returns generalized data. It only applies if output = "sf".

zcta3_method

A character scalar; if geography = "zcta3", a method for aggregating spatially intensive values should be given; either "mean" or "median". In either case, a weighted approach is used, where the total population of each five-digit ZCTA is used to calculate individual ZCTAs' weights. For American Community Survey data, this is applied to the margin of error as well.
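
The weighting idea behind the "mean" option can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not deprivateR's actual implementation: the data frame, its column names, and all values are made up, and only the population-weighted mean is shown.

```r
# Toy five-digit ZCTA data; values and populations are invented for illustration.
zcta <- data.frame(
  zcta5 = c("63101", "63102", "63103", "65201", "65202"),
  value = c(0.2, 0.4, 0.6, 0.1, 0.3),   # a spatially intensive measure
  pop   = c(1000, 3000, 1000, 2000, 2000),
  stringsAsFactors = FALSE
)

# A three-digit ZCTA is the first three characters of the five-digit code.
zcta$zcta3 <- substr(zcta$zcta5, 1, 3)

# Population-weighted mean of the measure within each three-digit ZCTA.
groups <- split(zcta, zcta$zcta3)
zcta3_means <- vapply(
  groups,
  function(g) weighted.mean(g$value, w = g$pop),
  numeric(1)
)

zcta3_means
```

Here the five-digit ZCTA with 3,000 residents pulls the "631" estimate toward its own value more than its two smaller neighbors do, which is the intent of weighting by population.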

shift_geo

A logical scalar; if TRUE, Alaska, Hawaii, and Puerto Rico will be re-positioned so that they lie to the southwest of the continental United States. This defaults to FALSE, and can only be used when states are not listed for the state argument. It only applies if output = "sf".

key

A Census API key, which can be obtained at https://api.census.gov/data/key_signup.html. This can be omitted if tidycensus::census_api_key() has been used to write your key to your .Renviron file. You can check whether an API key has been written to .Renviron by using Sys.getenv("CENSUS_API_KEY").
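
The check described above can be written as a short base R snippet. The variable names are illustrative; the only assumption is that the key, if present, is stored in the CENSUS_API_KEY environment variable as stated above.

```r
# Sys.getenv() returns an empty string when the variable is not set.
census_key <- Sys.getenv("CENSUS_API_KEY")
has_key <- nzchar(census_key)

if (has_key) {
  message("A Census API key was found; the key argument can be omitted.")
} else {
  message("No Census API key was found; supply one via the key argument.")
}
```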

Value

A tibble with the requested deprivation measures. The number of columns and rows depends upon the input arguments. If output = "wide", the number of columns will be equal to the number of deprivation measures requested plus the number of columns needed to store the geographic information. Each unique combination of jurisdiction and year will receive its own row.

If output = "tidy", the number of columns will be equal to the number of deprivation measures requested plus the number of columns needed to store the geographic information. Each unique combination of jurisdiction, year, and deprivation measure will receive its own row.
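
To make the difference in row structure concrete, here is a base R sketch that builds a toy "wide" table and reshapes it so that each jurisdiction and measure pair has its own row. The values are made up, and the column names measure and value are hypothetical; the columns deprivateR actually returns for output = "tidy" may differ.

```r
# Toy "wide" result: one row per jurisdiction, one column per measure.
# All values are invented for illustration.
wide <- data.frame(
  GEOID = c("29189", "29510"),
  ADI   = c(80.5, 120.3),
  SVI   = c(0.25, 0.90),
  stringsAsFactors = FALSE
)

# Long layout: one row per jurisdiction-measure pair.
measures <- c("ADI", "SVI")
tidy <- do.call(rbind, lapply(measures, function(m) {
  data.frame(
    GEOID   = wide$GEOID,
    measure = m,            # hypothetical column name
    value   = wide[[m]],    # hypothetical column name
    stringsAsFactors = FALSE
  )
}))

# Sort so that rows for the same jurisdiction sit together.
tidy <- tidy[order(tidy$GEOID, tidy$measure), ]
rownames(tidy) <- NULL

tidy
```

With two jurisdictions and two measures, the wide table has two rows while the long table has four.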

Details

deprivateR provides a unique implementation of the Centers for Disease Control and Prevention's Social Vulnerability Index for a greater range of years and geographies than the CDC originally supported. Four versions of the SVI are offered:

"svi10": The CDC's 2010 SVI vintage did not include a measure of civilians with a disability, unlike their later vintages. This version can be calculated using deprivateR for each year from 2010 through 2021.
"svi14": The CDC's 2014, 2016, and 2018 vintages added the measure of civilians with a disability to their SVI calculations. The disability measure was added to the American Community Survey beginning in 2012, so this version can be calculated using deprivateR for each year from 2012 through 2021.
"svi20": The CDC's 2020 vintage made multiple substantive changes to how SVI is calculated, altering the underlying data used for the first three of the four themes. In the SES theme: (1) per capita income was replaced with a measure of housing burden; (2) the poverty measure was changed to 150% of the poverty level; and (3) a measure of health insurance coverage was added. The Household Composition & Disability (HCD) theme was renamed Household Characteristics (HOU), and the English language proficiency measure was moved here from the former Minority Status and Language (MSL) theme. Since the English language measure was removed from the MSL theme, it was renamed Racial & Ethnic Minority Status (REM). Though the CDC released this definition with their 2020 data, the underlying data can be accessed from the American Community Survey from 2012 onward. This means that this version can be calculated using deprivateR for each year from 2012 through 2021.
"svi20s": The CDC's 2020 vintage changed the variables used to calculate the number of single-parent households. Their new approach does not have the backward compatibility that the other changes made in 2020 do. This version of SVI uses the same underlying data for single-parent households that the CDC's 2020 vintage does, along with the other changes made in 2020. This version can be calculated using deprivateR for each year from 2012 through 2019.

In addition, wrappers to the sociome, ndi, and tidycensus packages create a single point of departure for comparative work using multiple measures of deprivation or inequality.

Examples

# \donttest{
  # calculate ADI for all US counties
  dep_get_index(geography = "county", index = "adi", year = 2022)
#> Warning: • You have not set a Census API key. Users without a key are limited to 500
#> queries per day and may experience performance limitations.
#> ℹ For best results, get a Census API key at
#> http://api.census.gov/data/key_signup.html and then supply the key to the
#> `census_api_key()` function to use it throughout your tidycensus session.
#> This warning is displayed once per session.
#> Warning: 
#> The variables C24010_039 and C24010_040 are both present.
#> C24010_039 will be used for "civilian females age 16+ in
#> white-collar occupations", which is incorrect for pre-2010 data.
#> If seeking pre-2010 estimates, remove C24010_039 from dataset.
#> 
#> Single imputation performed
#> # A tibble: 3,144 × 3
#>    GEOID NAME                       ADI
#>    <chr> <chr>                    <dbl>
#>  1 01001 Autauga County, Alabama   88.5
#>  2 01003 Baldwin County, Alabama   84.1
#>  3 01005 Barbour County, Alabama  137. 
#>  4 01007 Bibb County, Alabama     124. 
#>  5 01009 Blount County, Alabama   109. 
#>  6 01011 Bullock County, Alabama  147. 
#>  7 01013 Butler County, Alabama   122. 
#>  8 01015 Calhoun County, Alabama  115. 
#>  9 01017 Chambers County, Alabama 117. 
#> 10 01019 Cherokee County, Alabama 107. 
#> # ℹ 3,134 more rows

  # calculate two forms of SVI for all Missouri ZCTAs
  dep_get_index(geography = "zcta5", index = c("svi20", "svi20s"), year = 2022,
    state = "MO")
#> # A tibble: 33,642 × 3
#>    GEOID SVI_20 SVI_20S
#>    <chr>  <dbl>   <dbl>
#>  1 01001 0.771   0.766 
#>  2 01002 0.676   0.691 
#>  3 01003 0.206   0.211 
#>  4 01005 0.215   0.202 
#>  5 01007 0.362   0.379 
#>  6 01008 0.0468  0.0501
#>  7 01009 0.143   0.154 
#>  8 01010 0.240   0.202 
#>  9 01011 0.361   0.376 
#> 10 01012 0.222   0.147 
#> # ℹ 33,632 more rows

  # calculate ADI and one form of SVI for all US counties over three years
  # percentiles are returned to ease comparison
  dep_get_index(geography = "county", index = c("adi", "svi14"),
    year = c(2018:2020), return_percentiles = TRUE)
#> 
#> Single imputation performed
#> 
#> Single imputation performed
#> 
#> Single imputation performed
#> # A tibble: 9,427 × 5
#>    GEOID NAME                     YEAR   SVI   ADI
#>    <chr> <chr>                   <int> <dbl> <dbl>
#>  1 01001 Autauga County, Alabama  2018  44.2  28.3
#>  2 01001 Autauga County, Alabama  2019  38.8  31.6
#>  3 01001 Autauga County, Alabama  2020  47.5  35.1
#>  4 01003 Baldwin County, Alabama  2018  22.7  16.6
#>  5 01003 Baldwin County, Alabama  2019  24.1  16.3
#>  6 01003 Baldwin County, Alabama  2020  22.9  15.5
#>  7 01005 Barbour County, Alabama  2018  99.6  96.0
#>  8 01005 Barbour County, Alabama  2019  99.7  97.2
#>  9 01005 Barbour County, Alabama  2020  99.5  97.0
#> 10 01007 Bibb County, Alabama     2018  59.5  65.9
#> # ℹ 9,417 more rows
# }