This function compares input data containing ZIP Codes against a crosswalk file and appends the corresponding ZCTAs. This is an important step because not all ZIP Codes have the same five digits as their enclosing ZCTA.
Usage
zi_crosswalk(.data, input_var, zip_source = "UDS", source_var,
source_result, year = NULL, qtr = NULL, target = NULL, query = NULL,
by = NULL, return_max = NULL, key = NULL, return = "id")
Arguments
- .data
An input object, either a data.frame or a tibble, that contains ZIP Codes to be crosswalked.
- input_var
The column in the input data that contains five-digit ZIP Codes. If the input is numeric, it will be transformed to character data and leading zeros will be added.
- zip_source
Required character scalar or data frame; specifies the source of ZIP Code crosswalk data. This can be either "UDS" (default) or "HUD", or a data frame containing a custom dictionary.
- source_var
Character scalar, required when zip_source is a data frame containing a custom dictionary; specifies the column name in the dictionary object that contains ZIP Codes.
- source_result
Character scalar, required when zip_source is a data frame containing a custom dictionary; specifies the column name in the dictionary object that contains ZCTAs, GEOIDs, or other values.
- year
Optional four-digit numeric scalar for year; the available years vary by source. For "UDS", years 2009 through 2023 are available. For "HUD", years 2010 through 2024 are available. Does not need to be specified when a custom dictionary is used.
- qtr
Numeric scalar, required when zip_source is "HUD". Integer value between 1 and 4, representing the quarter of the year.
- target
Character scalar, required when zip_source is "HUD". Can be one of "TRACT", "COUNTY", "CBSA", "CBSADIV", "CD", or "COUNTYSUB".
- query
Scalar or vector, required when zip_source is "HUD". This can be a five-digit numeric or character ZIP Code, a vector of ZIP Codes, a two-letter character state abbreviation, or "all".
- by
Character scalar, required when zip_source is "HUD"; the column name to use for identifying the best match for a given ZIP Code. This can be "residential", "commercial", or "total".
- return_max
Logical scalar, required when zip_source is "HUD"; if TRUE (default), only the geography with the highest proportion of the ZIP Code type will be returned. If the ZIP Code straddles two states, two records will be returned. If FALSE, all records for the ZIP Code will be returned. Where a tie exists (i.e., two geographies each contain half of all addresses), the county with the lowest GEOID value will be returned.
- key
Optional when zip_source is "HUD". This should be a character string containing your HUD API key. Alternatively, it can be stored in your .Rprofile as hud_key.
- return
Character scalar; specifies the type of output to return. Can be either "id" (default), which appends only the crosswalked value, or "all", which returns the entire crosswalk file appended to the source data.
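The input_var description notes that numeric ZIP Codes are converted to character data with leading zeros restored. A minimal base R sketch of that kind of conversion is shown below. The helper name pad_zip is hypothetical and is not part of the package; the package's internal implementation may differ.

```r
# Hypothetical helper (not part of the package): convert numeric ZIP Codes
# to five-character strings, restoring any leading zeros.
pad_zip <- function(x) {
  if (is.numeric(x)) {
    # "%05d" pads integers on the left with zeros to a width of five
    x <- sprintf("%05d", as.integer(x))
  }
  as.character(x)
}

# Numeric input loses leading zeros, e.g. 02134 is stored as 2134
zips_numeric <- c(2134, 63005, 501)
zips_padded <- pad_zip(zips_numeric)
print(zips_padded)
```

Character input is passed through unchanged, so the helper can be applied without first checking the column type.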
Value
A tibble with crosswalk values (or, optionally, the full crosswalk file) appended, depending on the return argument.
Examples
# create sample data
df <- data.frame(id = c(1:3), zip5 = c("63005", "63139", "63636"))
# UDS crosswalk
# \donttest{
zi_crosswalk(df, input_var = zip5, zip_source = "UDS", year = 2022)
#> # A tibble: 3 × 3
#> zip5 id source_zcta
#> <chr> <int> <chr>
#> 1 63005 1 63005
#> 2 63139 2 63139
#> 3 63636 3 63636
# }
# HUD crosswalk
# you will need to replace INSERT_HUD_KEY with your own key
if (FALSE) { # \dontrun{
zi_crosswalk(df, input_var = zip5, zip_source = "HUD", year = 2023,
qtr = 1, target = "COUNTY", query = "MO", by = "residential",
return_max = TRUE, key = INSERT_HUD_KEY)
} # }
# custom dictionary
## load sample crosswalk data to simulate custom dictionary
mo_xwalk <- zi_mo_hud
# prep crosswalk
# when a ZIP Code crosses county boundaries, the portion with the largest
# number of residential addresses will be returned
mo_xwalk <- zi_prep_hud(mo_xwalk, by = "residential", return_max = TRUE)
## crosswalk
zi_crosswalk(df, input_var = zip5, zip_source = mo_xwalk, source_var = zip5,
source_result = geoid)
#> # A tibble: 3 × 3
#> zip5 id source_geoid
#> <chr> <int> <chr>
#> 1 63005 1 29189
#> 2 63139 2 29510
#> 3 63636 3 29093
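The selection rule described for return_max = TRUE (keep the geography with the largest share of addresses, breaking ties by the lowest GEOID) can be illustrated with a small base R sketch. The data frame, its values, and the column name res_ratio are assumptions made up for illustration. This is not the package's own code, and it does not model the case where a ZIP Code straddles two states.

```r
# Toy crosswalk: one row per ZIP Code and county pairing.
# All values are invented for illustration only.
xwalk <- data.frame(
  zip5 = c("11111", "11111", "22222", "22222", "33333"),
  geoid = c("29189", "29510", "29099", "29093", "29071"),
  res_ratio = c(0.8, 0.2, 0.5, 0.5, 1.0),
  stringsAsFactors = FALSE
)

# Within each ZIP Code, put the largest share first and break ties
# by the lowest GEOID.
ordered <- xwalk[order(xwalk$zip5, -xwalk$res_ratio, xwalk$geoid), ]

# Keep only the first (best) row for each ZIP Code.
best <- ordered[!duplicated(ordered$zip5), ]
rownames(best) <- NULL
print(best)
```

Sorting once and then dropping duplicates keeps the tie-breaking rule explicit in the order() call, which makes it easy to change if a different rule is wanted.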