This function matches ZIP Codes in the input data against a crosswalk file and appends the corresponding ZCTAs (or, depending on the crosswalk source, other geographic identifiers). This is an important step because not all ZIP Codes have the same five digits as their enclosing ZCTA.
Usage
zi_crosswalk(
.data,
input_var,
zip_source = "UDS",
source_var,
source_result,
year = NULL,
qtr = NULL,
target = NULL,
query = NULL,
by = NULL,
return_max = NULL,
key = NULL,
return = "id",
input_zip,
dict = NULL
)
Arguments
- .data
An input object, either a data.frame or a tibble, that contains ZIP Codes to be crosswalked.
- input_var
The column in the input data that contains five-digit ZIP Codes, specified as a bare (unquoted) column name. Input must be character data with proper leading zeros; use zi_repair to fix numeric inputs first.
- zip_source
Required character scalar or data frame; specifies the source of ZIP Code crosswalk data. This can be either "UDS" (default) or "HUD", or a data frame containing a custom dictionary.
- source_var
Character scalar, required when zip_source is a data frame containing a custom dictionary; specifies the column name in the dictionary object that contains ZIP Codes.
- source_result
Character scalar, required when zip_source is a data frame containing a custom dictionary; specifies the column name in the dictionary object that contains ZCTAs, GEOIDs, or other values.
- year
Optional four-digit numeric scalar for year; the available years vary by source. For "UDS", years 2009 through 2022 are available. For "HUD", years 2010 through 2024 are available. Does not need to be specified when a custom dictionary is used.
- qtr
Numeric scalar, required when zip_source is "HUD". Integer value between 1 and 4, representing the quarter of the year.
- target
Character scalar, required when zip_source is "HUD". Can be one of "TRACT", "COUNTY", "CBSA", "CBSADIV", "CD", or "COUNTYSUB".
- query
Scalar or vector, required when zip_source is "HUD". This can be a five-digit numeric or character ZIP Code, a vector of ZIP Codes, a two-letter character state abbreviation, or "all".
- by
Character scalar, required when zip_source is "HUD"; the column name to use for identifying the best match for a given ZIP Code. This can be one of "residential", "commercial", or "total".
- return_max
Logical scalar, required when zip_source is "HUD"; if TRUE (default), only the geography with the highest proportion of the ZIP Code type will be returned. If the ZIP Code straddles two states, two records will be returned. If FALSE, all records for the ZIP Code will be returned. Where a tie exists (i.e. two geographies each contain half of all addresses), the county with the lowest GEOID value will be returned.
- key
Optional when zip_source is "HUD". This should be a character string containing your HUD API key. Alternatively, it can be stored in your .Rprofile as hud_key.
- return
Character scalar; specifies the type of output to return. Can be one of "id" (default), which appends only the crosswalked value, or "all", which returns the entire crosswalk file appended to the source data.
- input_zip
[Deprecated] Use input_var instead. Will be removed in early 2027.
- dict
[Deprecated] Use zip_source and year instead. Will be removed in early 2027.
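The selection rule described for return_max can be sketched in base R. This is only an illustration of the rule (highest proportion wins, ties go to the lowest GEOID), not the package's implementation; the crosswalk rows, the ZIP Codes, and the column name res_ratio are hypothetical, and the special handling of ZIP Codes that straddle two states is not modeled.

```r
# Hypothetical crosswalk rows: one row per ZIP Code / county pairing.
# All values below are invented for illustration.
xwalk <- data.frame(
  zip5 = c("00001", "00001", "00002", "00002", "00003"),
  geoid = c("29189", "29510", "29099", "29071", "29093"),
  res_ratio = c(0.8, 0.2, 0.5, 0.5, 1.0),
  stringsAsFactors = FALSE
)

# Sort so that, within each ZIP Code, the highest ratio comes first and
# ties are broken by the lowest GEOID.
ordered <- xwalk[order(xwalk$zip5, -xwalk$res_ratio, xwalk$geoid), ]

# Keep only the first (best) row for each ZIP Code.
best <- ordered[!duplicated(ordered$zip5), ]
rownames(best) <- NULL
best
```

ZIP Code "00002" is split evenly between two counties in this toy data, so the row with the lower GEOID is the one that survives.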
Value
A tibble with crosswalk values (or, optionally, the full crosswalk file) appended, depending on the return argument.
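As a rough mental model, the difference between return = "id" and return = "all" resembles a left join that keeps either one crosswalk column or all of them. The sketch below uses base R's merge() rather than zi_crosswalk itself; the dictionary is a stand-in built by hand, and its res_ratio column is a hypothetical extra field.

```r
# Input data, as in the Examples section.
df <- data.frame(
  id = 1:3,
  zip5 = c("63005", "63139", "63636"),
  stringsAsFactors = FALSE
)

# Stand-in crosswalk; res_ratio is a hypothetical additional column.
dict <- data.frame(
  zip5 = c("63005", "63139", "63636"),
  geoid = c("29189", "29510", "29093"),
  res_ratio = c(1, 1, 1),
  stringsAsFactors = FALSE
)

# Analogous to return = "id": append only the crosswalked value.
id_only <- merge(df, dict[, c("zip5", "geoid")], by = "zip5", all.x = TRUE)

# Analogous to return = "all": append every crosswalk column.
all_cols <- merge(df, dict, by = "zip5", all.x = TRUE)

names(id_only)
names(all_cols)
```

Note that merge() moves the key column to the front and sorts by it, so column and row order here will not necessarily match what zi_crosswalk returns.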
Examples
# create sample data
df <- data.frame(id = c(1:3), zip5 = c("63005", "63139", "63636"))
if (FALSE) { # interactive()
# UDS crosswalk
zi_crosswalk(df, input_var = zip5, zip_source = "UDS", year = 2022)
}
if (FALSE) { # nzchar(Sys.getenv("hud_key"))
# HUD crosswalk
zi_crosswalk(df, input_var = zip5, zip_source = "HUD", year = 2023,
  qtr = 1, target = "COUNTY", query = "MO", by = "residential",
  return_max = TRUE, key = Sys.getenv("hud_key"))
}
# custom dictionary
## load sample crosswalk data to simulate custom dictionary
mo_xwalk <- zi_mo_hud
# prep crosswalk
# when a ZIP Code crosses county boundaries, the portion with the largest
# number of residential addresses will be returned
mo_xwalk <- zi_prep_hud(mo_xwalk, by = "residential", return_max = TRUE)
## crosswalk
zi_crosswalk(df, input_var = zip5, zip_source = mo_xwalk, source_var = zip5,
  source_result = geoid)
#> # A tibble: 3 × 3
#>      id zip5  source_geoid
#>   <int> <chr> <chr>
#> 1     1 63005 29189
#> 2     2 63139 29510
#> 3     3 63636 29093
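Because input_var must be character data with leading zeros, ZIP Codes that were read in as numbers need to be padded before crosswalking. The package provides zi_repair for this; the snippet below is only a base R sketch of the same idea using sprintf(), and it does not reproduce whatever additional cleaning zi_repair performs. The object names are illustrative.

```r
# ZIP Codes stored as numbers lose their leading zeros.
zips_numeric <- c(2108, 63005, 501)

# Pad to five characters with leading zeros.
zips_char <- sprintf("%05d", as.integer(zips_numeric))
zips_char
#> [1] "02108" "63005" "00501"
```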