This function matches input data containing ZIP Codes against a crosswalk file and appends the corresponding ZCTAs. This is an important step because not all ZIP Codes have the same five digits as their enclosing ZCTA.
Usage
zi_crosswalk(.data, input_var, zip_source = "UDS", source_var,
  source_result, year = NULL, qtr = NULL, target = NULL, query = NULL,
  by = NULL, return_max = NULL, key = NULL, return = "id")
Arguments
- .data
An input object, either a data.frame or a tibble, that contains ZIP Codes to be crosswalked.
- input_var
The column in the input data that contains five-digit ZIP Codes. If the input is numeric, it will be transformed to character data and leading zeros will be added.
- zip_source
Required character scalar or data frame; specifies the source of ZIP Code crosswalk data. This can be either "UDS" (default) or "HUD", or a data frame containing a custom dictionary.
- source_var
Character scalar, required when zip_source is a data frame containing a custom dictionary; specifies the column name in the dictionary object that contains ZIP Codes.
- source_result
Character scalar, required when zip_source is a data frame containing a custom dictionary; specifies the column name in the dictionary object that contains ZCTAs, GEOIDs, or other values.
- year
Optional four-digit numeric scalar for year; varies based on source. For "UDS", years 2009 through 2023 are available. For "HUD", years 2010 through 2024 are available. Does not need to be specified when a custom dictionary is used.
- qtr
Numeric scalar, required when zip_source is "HUD". Integer value between 1 and 4, representing the quarter of the year.
- target
Character scalar, required when zip_source is "HUD". Can be one of "TRACT", "COUNTY", "CBSA", "CBSADIV", "CD", or "COUNTYSUB".
- query
Scalar or vector, required when zip_source is "HUD". This can be a five-digit numeric or character ZIP Code, a vector of ZIP Codes, a two-letter character state abbreviation, or "all".
- by
Character scalar, required when zip_source is "HUD"; the column name to use for identifying the best match for a given ZIP Code. This can be one of "residential", "commercial", or "total".
- return_max
Logical scalar, required when zip_source is "HUD"; if TRUE (default), only the geography with the highest proportion of the ZIP Code type will be returned. If the ZIP Code straddles two states, two records will be returned. If FALSE, all records for the ZIP Code will be returned. Where a tie exists (i.e. two geographies each contain half of all addresses), the county with the lowest GEOID value will be returned.
- key
Optional when zip_source is "HUD". This should be a character string containing your HUD API key. Alternatively, it can be stored in your .Rprofile as hud_key.
- return
Character scalar, specifies the type of output to return. Can be one of "id" (default), which appends only the crosswalked value, or "all", which returns the entire crosswalk file appended to the source data.
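The input_var argument notes that numeric ZIP Codes are converted to character data with leading zeros restored. The following is a minimal base R sketch of that kind of conversion, not the package's internal code; pad_zip is a hypothetical helper name used only for illustration.

```r
# Hypothetical helper (not part of the package): convert numeric ZIP Codes
# to five-character strings, restoring any leading zeros that were dropped.
pad_zip <- function(x) {
  if (is.numeric(x)) {
    # "%05d" pads integers with zeros on the left to a width of five
    x <- sprintf("%05d", as.integer(x))
  }
  as.character(x)
}

# A ZIP Code read in as a number has lost its leading zero
pad_zip(c(2108, 63005))
#> [1] "02108" "63005"
```

Character input passes through unchanged, so applying a helper like this before crosswalking is harmless when ZIP Codes are already stored as text.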
Value
A tibble with crosswalk values (or, optionally, the full crosswalk file) appended, based on the return argument.
Examples
# create sample data
df <- data.frame(id = c(1:3), zip5 = c("63005", "63139", "63636"))
# UDS crosswalk
# \donttest{
zi_crosswalk(df, input_var = zip5, zip_source = "UDS", year = 2022)
#> # A tibble: 3 × 3
#>   zip5     id source_zcta
#>   <chr> <int> <chr>
#> 1 63005     1 63005
#> 2 63139     2 63139
#> 3 63636     3 63636
# }
# HUD crosswalk
# you will need to replace INSERT_HUD_KEY with your own key
if (FALSE) { # \dontrun{
zi_crosswalk(df, input_var = zip5, zip_source = "HUD", year = 2023,
  qtr = 1, target = "COUNTY", query = "MO", by = "residential",
  return_max = TRUE, key = "INSERT_HUD_KEY")
} # }
# custom dictionary
## load sample crosswalk data to simulate custom dictionary
mo_xwalk <- zi_mo_hud
# prep crosswalk
# when a ZIP Code crosses county boundaries, the portion with the largest
# number of residential addresses will be returned
mo_xwalk <- zi_prep_hud(mo_xwalk, by = "residential", return_max = TRUE)
## crosswalk
zi_crosswalk(df, input_var = zip5, zip_source = mo_xwalk, source_var = zip5,
source_result = geoid)
#> # A tibble: 3 × 3
#>   zip5     id source_geoid
#>   <chr> <int> <chr>
#> 1 63005     1 29189
#> 2 63139     2 29510
#> 3 63636     3 29093
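The selection rule described for return_max (keep the geography with the highest proportion, and on a tie keep the lowest GEOID) can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the toy_xwalk data frame, its res_ratio column name, and its ratio values are made up for the example, and the two-state case mentioned in the argument description is not handled.

```r
# Made-up crosswalk rows for illustration only: each ZIP Code appears once
# per county it overlaps, with a hypothetical share of residential addresses.
toy_xwalk <- data.frame(
  zip5 = c("63005", "63005", "63636", "63636"),
  geoid = c("29189", "29071", "29179", "29093"),
  res_ratio = c(0.9, 0.1, 0.5, 0.5),
  stringsAsFactors = FALSE
)

# Sort so that, within each ZIP Code, the highest ratio comes first and
# ties are ordered by the lowest GEOID.
toy_xwalk <- toy_xwalk[order(toy_xwalk$zip5, -toy_xwalk$res_ratio, toy_xwalk$geoid), ]

# Keep the first row per ZIP Code, which is the best match under that rule
best <- toy_xwalk[!duplicated(toy_xwalk$zip5), ]
best$geoid
#> [1] "29189" "29093"
```

For 63636 the two rows tie at 0.5, so the row with the lower GEOID is the one kept.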