This function takes five-digit ZCTA data as input and aggregates it to three-digit ZCTAs, which cover considerably larger areas. These regions are sometimes used in American health care contexts for publishing geographic identifiers.

Usage

zi_aggregate(.data, year, extensive = NULL, intensive = NULL,
    intensive_method = "mean", survey, output = "tidy", zcta = NULL,
    key = NULL)

Arguments

.data

A tidy set of demographic data containing one or more variables that should be aggregated to three-digit ZCTAs. This data frame or tibble should contain all five-digit ZCTAs within the three-digit ZCTAs that you plan to use for aggregating data. See Details below for formatting requirements.

year

A four-digit numeric scalar for year. zippeR currently supports data from 2010 to 2022. Different survey products are available for different years. See the survey parameter for more details.

extensive

A character scalar or vector listing all extensive (i.e. count data) variables you wish to aggregate. These will be summed. For American Community Survey data, the margin of error will be calculated by taking the square root of the summed, squared margins of error for each five-digit ZCTA within a given three-digit ZCTA.
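
The arithmetic described above can be sketched in base R. This is only an illustration, not zippeR's internal code: the data frame zcta5_pop and all of its values are invented, and the three five-digit ZCTAs are assumed to share one three-digit ZCTA.

```r
# hypothetical estimates and margins of error for three five-digit ZCTAs
# that fall within the same three-digit ZCTA (all values are invented)
zcta5_pop <- data.frame(
  GEOID = c("63101", "63102", "63103"),
  estimate = c(1200, 800, 500),
  moe = c(30, 40, 120)
)

# extensive (count) variables are summed
total_estimate <- sum(zcta5_pop$estimate)

# the aggregated margin of error is the square root of the
# summed, squared margins of error
total_moe <- sqrt(sum(zcta5_pop$moe^2))

total_estimate
#> [1] 2500
total_moe
#> [1] 130
```

Note that the combined margin of error (130) is smaller than the simple sum of the individual margins (190).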

intensive

A character scalar or vector listing all intensive (i.e. ratio, percent, or median data) variables you wish to aggregate. These will be combined using the approach listed for intensive_method.

intensive_method

A character scalar; either "mean" (default) or "median". In either case, a weighted approach is used, with the total population of each five-digit ZCTA determining that ZCTA's weight. For American Community Survey data, the same weighting is applied to the margin of error as well.
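
As a minimal sketch of the "mean" option, the base R code below weights hypothetical median income estimates by total population. It is not zippeR's internal code: the data frame zcta5_income, its column names, and its values are invented, and applying the same weighted mean to the margins of error is an assumption about how the description above is implemented. The "median" option is not shown.

```r
# hypothetical five-digit ZCTA values within one three-digit ZCTA
# (names and numbers are invented for illustration)
zcta5_income <- data.frame(
  GEOID = c("63101", "63102", "63103"),
  estimate = c(50000, 60000, 80000),
  moe = c(4000, 5000, 9000),
  total_pop = c(1000, 3000, 1000)
)

# population-weighted mean of the estimates
weighted_estimate <- weighted.mean(zcta5_income$estimate,
  w = zcta5_income$total_pop)

# the same population weights applied to the margins of error
weighted_moe <- weighted.mean(zcta5_income$moe,
  w = zcta5_income$total_pop)

weighted_estimate
#> [1] 62000
weighted_moe
#> [1] 5600
```

Because the second ZCTA holds 60 percent of the population, it pulls the result toward its own estimate; an unweighted mean of the same estimates would be about 63333.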

survey

A character scalar representing the Census product. It can be either a Decennial Census product (either "sf1" or "sf3") or an American Community Survey product (either "acs1", "acs3", or "acs5"). For Decennial Census calls, only the 2010 Census is available. In addition, if a variable cannot be found in "sf1", the function will look in "sf3". Also note that "acs3" was discontinued after 2013.

output

A character scalar; one of "tidy" (long output) or "wide", depending on the data format you want. If you are planning to join these data with geometric data, "wide" is strongly encouraged.

zcta

An optional vector of ZCTAs that demographic data are requested for. If this is NULL, data will be returned for all ZCTAs. If a vector is supplied, only data for those requested ZCTAs will be returned. The vector can be created with zi_get_geometry(). If style = "zcta5", this vector should be made up of five-digit GEOID values. If style = "zcta3", this vector should be made up of three-digit ZCTA3 values.
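
To show how the two kinds of identifiers relate, the base R sketch below derives three-digit values from five-digit ones. It assumes that a three-digit ZCTA is identified by the first three digits of its five-digit ZCTAs; the geoids vector is invented, and in practice the vector would normally come from zi_get_geometry() as described above.

```r
# hypothetical five-digit GEOID values, stored as character strings
# so that any leading zeros are preserved
geoids <- c("63101", "63102", "63005", "65201")

# assumed relationship: the three-digit value is the first three digits
zcta3 <- unique(substr(geoids, 1, 3))

zcta3
#> [1] "631" "630" "652"
```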

key

A Census API key, which can be obtained at https://api.census.gov/data/key_signup.html. This can be omitted if tidycensus::census_api_key() has been used to write your key to your .Renviron file. You can check whether an API key has been written to .Renviron by using Sys.getenv("CENSUS_API_KEY").
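
The check mentioned above can be wrapped in a small helper. This is a sketch only: has_census_key() is a hypothetical function written for this illustration and is not part of zippeR or tidycensus.

```r
# hypothetical helper: TRUE when a non-empty key string is available;
# by default it reads the CENSUS_API_KEY environment variable
has_census_key <- function(key = Sys.getenv("CENSUS_API_KEY")) {
  nzchar(key)
}

if (has_census_key()) {
  message("CENSUS_API_KEY is set; the key argument can be omitted.")
} else {
  message("CENSUS_API_KEY is not set; supply the key argument explicitly.")
}
```

Sys.getenv() returns an empty string for an unset variable, which is why nzchar() is enough to distinguish the two cases.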

Value

A tibble containing all aggregated data requested in either "tidy" or "wide" format.

Examples

# load sample demographic data
mo22_demos <- zi_mo_pop

  # the above data can be replicated with the following code:
  # zi_get_demographics(year = 2022, variables = c("B01003_001", "B19013_001"),
  #   survey = "acs5")

# load sample geometric data
mo22_zcta3 <- zi_mo_zcta3

  # the above data can be replicated with the following code:
  # zi_get_geometry(year = 2022, style = "zcta3", state = "MO",
  #   method = "intersect")

# aggregate a single variable
zi_aggregate(mo22_demos, year = 2020, extensive = "B01003_001", survey = "acs5",
  zcta = mo22_zcta3$ZCTA3)
#> # A tibble: 62 × 4
#> # Groups:   ZCTA3 [31]
#>    ZCTA3 variable   estimate     moe
#>    <chr> <chr>         <dbl>   <dbl>
#>  1 501   B01003_001   190606   1928.
#>  2 501   B19013_001  4927943 224587.
#>  3 516   B01003_001    22779    649.
#>  4 516   B19013_001  1326659 128553.
#>  5 525   B01003_001   109992   1607.
#>  6 525   B19013_001  2965635 153947.
#>  7 526   B01003_001   101629   1496.
#>  8 526   B19013_001  2187815 116391.
#>  9 620   B01003_001   296331   2842.
#> 10 620   B19013_001  4623927 163752.
#> # ℹ 52 more rows

# \donttest{
# aggregate multiple variables, outputting wide data
zi_aggregate(mo22_demos, year = 2020,
  extensive = "B01003_001", intensive = "B19013_001", survey = "acs5",
  zcta = mo22_zcta3$ZCTA3, output = "wide")
#> Warning:  You have not set a Census API key. Users without a key are limited to 500
#> queries per day and may experience performance limitations.
#>  For best results, get a Census API key at
#> http://api.census.gov/data/key_signup.html and then supply the key to the
#> `census_api_key()` function to use it throughout your tidycensus session.
#> This warning is displayed once per session.
#> # A tibble: 31 × 5
#> # Groups:   ZCTA3 [31]
#>    ZCTA3 B01003_001E B01003_001M B19013_001E B19013_001M
#>    <chr>       <dbl>       <dbl>       <dbl>       <dbl>
#>  1 501        190606       1928.      74666.      19684.
#>  2 516         22779        649.      69824.      21896.
#>  3 525        109992       1607.      63099.      17528.
#>  4 526        101629       1496.      70575.      16179.
#>  5 620        296331       2842.      66056.      16494.
#>  6 630        736536       7216.      84735.      12280.
#>  7 631        901642       8313.      69807.       9031 
#>  8 633        536266       5361.      78343.      16652 
#>  9 634         67129       1583.      60690.      21671.
#> 10 635         60410       1185.      54968.      17621.
#> # ℹ 21 more rows
# }