Crosswalk ZIP Codes with UDS, HUD, or a Custom Dictionary

This function matches the ZIP Codes in the input data against a crosswalk file and appends the corresponding ZCTAs (or, for HUD and custom dictionaries, other geographic identifiers). This is an important step because not all ZIP Codes have the same five digits as their enclosing ZCTA.
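
Conceptually, a crosswalk behaves like a left join between the input data and a lookup table. The base R sketch below illustrates only that idea; it is not the package's implementation, and the lookup table `xwalk`, its column names, and its ZIP-to-ZCTA pairs are invented for illustration.

```r
# Hypothetical input: numeric ZIP Codes lose their leading zeros
input <- data.frame(id = 1:3, zip5 = c(2134, 63005, 63199))

# Restore five-character ZIP Codes by padding with leading zeros
input$zip5 <- sprintf("%05d", as.integer(input$zip5))

# Invented lookup table (NOT real crosswalk data): the third ZIP Code
# is assigned a ZCTA whose digits differ from the ZIP Code itself
xwalk <- data.frame(
  zip5 = c("02134", "63005", "63199"),
  zcta = c("02134", "63005", "63101"),
  stringsAsFactors = FALSE
)

# Left join: every input row is kept, and unmatched ZIP Codes would get NA
result <- merge(input, xwalk, by = "zip5", all.x = TRUE)
```

Note that `merge()` sorts the result by the join key, so row order may differ from the input.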

Usage

zi_crosswalk(.data, input_var, zip_source = "UDS", source_var,
    source_result, year = NULL, qtr = NULL, target = NULL, query = NULL,
    by = NULL, return_max = NULL, key = NULL, return = "id")

Arguments

.data: An input object, either a data.frame or a tibble, that contains the ZIP Codes to be crosswalked.
input_var: The column in the input data that contains five-digit ZIP Codes. If the input is numeric, it will be transformed to character data and leading zeros will be added.
zip_source: Required character scalar or data frame; specifies the source of ZIP Code crosswalk data. This can be "UDS" (default), "HUD", or a data frame containing a custom dictionary.
source_var: Character scalar, required when zip_source is a data frame containing a custom dictionary; specifies the column name in the dictionary object that contains ZIP Codes.
source_result: Character scalar, required when zip_source is a data frame containing a custom dictionary; specifies the column name in the dictionary object that contains ZCTAs, GEOIDs, or other values.
year: Optional four-digit numeric scalar for year; varies based on source. For "UDS", years 2009 through 2023 are available. For "HUD", years 2010 through 2024 are available. Does not need to be specified when a custom dictionary is used.
qtr: Numeric scalar, required when zip_source is "HUD". Integer value between 1 and 4, representing the quarter of the year.
target: Character scalar, required when zip_source is "HUD". Can be one of "TRACT", "COUNTY", "CBSA", "CBSADIV", "CD", or "COUNTYSUB".
query: Scalar or vector, required when zip_source is "HUD". This can be a five-digit numeric or character ZIP Code, a vector of ZIP Codes, a two-letter character state abbreviation, or "all".
by: Character scalar, required when zip_source is "HUD"; the column name to use for identifying the best match for a given ZIP Code. This can be "residential", "commercial", or "total".
return_max: Logical scalar, required when zip_source is "HUD"; if TRUE (default), only the geography with the highest proportion of the ZIP Code type will be returned. If the ZIP Code straddles two states, two records will be returned. If FALSE, all records for the ZIP Code will be returned. Where a tie exists (e.g., two geographies each contain half of all addresses), the county with the lowest GEOID value will be returned.
key: Optional when zip_source is "HUD". This should be a character string containing your HUD API key. Alternatively, it can be stored in your .Rprofile as hud_key.
return: Character scalar, specifies the type of output to return. Can be one of "id" (default), which appends only the crosswalked value, or "all", which returns the entire crosswalk file appended to the source data.
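
The selection rule described for return_max can be sketched in base R as follows. This is a hedged illustration of the documented behavior, not the package's code: the data frame `hud`, its column names (`zip5`, `geoid`, `res_ratio`), and its values are all hypothetical, and the special handling of ZIP Codes that straddle two states is omitted.

```r
# Hypothetical crosswalk rows: one ZIP Code can span several counties
hud <- data.frame(
  zip5 = c("11111", "11111", "22222", "22222"),
  geoid = c("29189", "29099", "29510", "29183"),
  res_ratio = c(0.8, 0.2, 0.5, 0.5),
  stringsAsFactors = FALSE
)

# Sort by ZIP Code, then highest ratio first, then lowest GEOID to break ties
ord <- order(hud$zip5, -hud$res_ratio, hud$geoid)
hud_sorted <- hud[ord, ]

# Keep only the first (best) row for each ZIP Code
best <- hud_sorted[!duplicated(hud_sorted$zip5), ]
```

In this made-up data, ZIP Code "22222" is split evenly between two counties, so the row with the lower GEOID is the one retained.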

Value

A tibble with crosswalk values (or optionally, the full crosswalk file) appended based on the return argument.

Examples

# create sample data
df <- data.frame(id = c(1:3), zip5 = c("63005", "63139", "63636"))

# UDS crosswalk
# \donttest{
  zi_crosswalk(df, input_var = zip5, zip_source = "UDS", year = 2022)
#> # A tibble: 3 × 3
#>   zip5     id source_zcta
#>   <chr> <int> <chr>      
#> 1 63005     1 63005      
#> 2 63139     2 63139      
#> 3 63636     3 63636      
# }

# HUD crosswalk
# you will need to replace INSERT_HUD_KEY with your own key
if (FALSE) { # \dontrun{
  zi_crosswalk(df, input_var = zip5, zip_source = "HUD", year = 2023,
    qtr = 1, target = "COUNTY", query = "MO", by = "residential",
    return_max = TRUE, key = "INSERT_HUD_KEY")
} # }

# custom dictionary
## load sample crosswalk data to simulate custom dictionary
mo_xwalk <- zi_mo_hud

# prep crosswalk
# when a ZIP Code crosses county boundaries, the portion with the largest
# number of residential addresses will be returned
mo_xwalk <- zi_prep_hud(mo_xwalk, by = "residential", return_max = TRUE)

## crosswalk
zi_crosswalk(df, input_var = zip5, zip_source = mo_xwalk, source_var = zip5,
  source_result = geoid)
#> # A tibble: 3 × 3
#>   zip5     id source_geoid
#>   <chr> <int> <chr>       
#> 1 63005     1 29189       
#> 2 63139     2 29510       
#> 3 63636     3 29093