Developing Community Indices to Measure Change

Part 1

Gentrification Metrics

Gentrification, the physical and demographic change that brings wealthier residents, new businesses, investment, and development into a neighborhood, is an important topic in social science. It also raises social-justice concerns because low-income residents can be displaced from the neighborhood. Across the United States, the neighborhoods that gentrify have traditionally been low-income ones. Identifying neighborhood changes related to gentrification helps cities plan for its negative by-products and for the urban development of the area. A city can, for example, plan for low-income amenities and rent-controlled housing in recently developed and developing neighborhoods.
Gentrification metrics are constructed to capture these neighborhood changes. Gentrification is not caused by a single variable; it results from a pool of gentrifiers with a cultural preference for urban living, better amenities, disposable income, and urban housing. Variables describing these factors can track trends in latent characteristics of a neighborhood, such as economic strength, human capital, distress, and happiness, allowing the city to intervene before the low-income population is severely affected.
The objective of this paper is to capture an initial picture of neighborhood conditions in the year 2000 using the Longitudinal Tract Data Base (LTDB), which is built from census data and covers almost 72,000 unique census tracts. It contains almost 70 variables describing the race, socio-economic status, housing, age, and marital status of the population. The key to observing change is converting census tract data into useful 'features', or variables, that help predict gentrification. Theories of neighborhood change suggest that low-priced neighborhoods adjacent to wealthy ones are the most likely to gentrify when new housing demand arrives.
This paper uses median home value as one of the main variables for constructing the gentrification metrics. To measure general dimensions of community strength and vulnerability in the year 2000, four instruments are created, each measuring a different latent construct. For the purpose of this paper, neighborhood health is constructed from the entire available LTDB data.

The LTDB data is first loaded in its raw form.

# load packages and data
library( dplyr )
library( stringr )
dat_2000 <- read.csv( file='../data/data-raw/harmonised_census_data_part01/ltdb_std_2000_sample.csv', stringsAsFactors=F ) %>%
  rename_all( str_to_lower )

Some LTDB variables code missing values as -999, a common practice in older data sets. To avoid biasing the results, every -999 is replaced with NA before the analysis.

# replace -999 in the data set with NA, then drop rows that contain any NA
# (the cleaned table is printed here, not stored back into dat_2000)
dat_2000 %>%
  as_tibble() %>%
  mutate_all( function(i) ifelse( i == -999, NA, i ) ) %>%
  na.omit()
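
The recoding step can also be sketched in base R on a toy table. The data frame `toy` and its values are invented for illustration and are not LTDB data. The sketch stores the recoded table under a new name (`toy_clean`, a hypothetical name) so that later steps would see the NA values.

```r
# toy stand-in for the LTDB table (values are invented for illustration)
toy <- data.frame(
  trtid10  = c( 1, 2, 3 ),
  hinc00   = c( 41580, -999, 54047 ),
  mhmval00 = c( 100900, 142900, -999 )
)

# recode the -999 missing-value flag as NA and keep the result
toy_clean <- toy
toy_clean[ toy_clean == -999 ] <- NA

# count the missing values in each column
colSums( is.na( toy_clean ) )
```

Whether rows with NA should then be dropped, as `na.omit()` does, or kept and handled pairwise is a separate choice that affects how many tracts remain.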

Part 2

Selection of the gentrification metrics

In this step, four different instruments (metrics) are constructed to measure neighborhood health. Median home value is common to all of the instruments, since it can capture changes in their initial stages. The variables in each instrument are converted to Z-scores so that no single variable is overweighted because of its scale. After each variable is standardized, Cronbach's alpha is calculated; it shows how closely the variables are related to one another as a group. An alpha of 0.70 (70%) or higher is generally considered acceptable for the internal reliability of the group.
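
For reference, the two quantities used in this step can be written with their standard textbook definitions. The notation is introduced here and is not taken from the LTDB documentation: x_ij is the value of variable j in tract i, x̄_j and s_j are that variable's mean and standard deviation, k is the number of variables in the instrument, and T_i is the sum of the standardized variables for tract i.

```latex
z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j},
\qquad
T_i = \sum_{j=1}^{k} z_{ij},
\qquad
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{j=1}^{k} s_{z_j}^{2}}{s_{T}^{2}}\right)
```

Here s²_{z_j} is the variance of standardized variable j and s²_T is the variance of the totals T_i. Alpha approaches 1 when the variables move together and falls toward 0 when they are unrelated.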

Instrument 1

This instrument measures economic wealth in the neighborhood. The variables included in the instrument are median home value, percent of professional employees, and total median household income. The percent of professional employees is calculated by dividing the number of professional employees by the number of employed people aged 16 or over.

The higher the percent of professional employees, the greater the number of new businesses and educated people in the neighborhood. Median household income for this population is higher, so these households can afford higher-value homes. This instrument measures neighborhood health in terms of economic wealth.

Instrument 1 yields a Cronbach's alpha of 0.87.

# Instrument 1: economic wealth
# median home value (mhmval00)
# percent professional employees (prof00 / empclf00)
# median household income, total (hinc00)

instrument1 <- dat_2000 %>% 
  mutate(
    pprof00 = (prof00/empclf00)
  ) %>% 
  select(mhmval00, pprof00,hinc00) %>%
   
  mutate(
    mhmval00zscore = scale(mhmval00, center = TRUE, scale = TRUE),
    pprof00zscore = scale(pprof00, center = TRUE, scale = TRUE),
    hinc00zscore = scale(hinc00, center = TRUE, scale = TRUE)
  )  %>%   
  select(mhmval00zscore,pprof00zscore,hinc00zscore) %>%
  rename(
      "median home value" = mhmval00zscore,
     "percent professional employees" = pprof00zscore,
     "Median household income" = hinc00zscore
      )%>% 
  data.frame()
#pairs( instrument1, lower.panel= panel.smooth, upper.panel= panel.cor )
# Cronbach's alpha of the standardized variables
alpha1 <- psych::alpha( instrument1, check.keys=TRUE )$total$raw_alpha
alpha1

## [1] 0.8676727
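
To make the reliability score less of a black box, Cronbach's alpha can be computed directly from its textbook definition. The following is a minimal base R sketch on invented toy data; `cronbach_alpha` is a hypothetical helper, not a function from the psych package. It assumes complete data and does not reverse negatively keyed items, which `psych::alpha` does when `check.keys = TRUE`, so it is not a drop-in replacement for the call above.

```r
# Cronbach's alpha from its textbook definition:
# alpha = k/(k-1) * ( 1 - sum(item variances) / variance(row totals) )
cronbach_alpha <- function( items ) {
  items     <- as.matrix( items )
  k         <- ncol( items )
  item_var  <- apply( items, 2, var )
  total_var <- var( rowSums( items ) )
  ( k / (k - 1) ) * ( 1 - sum( item_var ) / total_var )
}

# invented toy items that share a common signal plus independent noise
set.seed( 1 )
signal <- rnorm( 200 )
toy_items <- data.frame(
  a = signal + rnorm( 200, sd = 0.5 ),
  b = signal + rnorm( 200, sd = 0.5 ),
  c = signal + rnorm( 200, sd = 0.5 )
)

# standardize to Z-scores, as the instruments do, then compute alpha
toy_z <- as.data.frame( scale( toy_items ) )
cronbach_alpha( toy_z )
```

Items that are identical give an alpha of exactly 1, which is a quick sanity check for the helper.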

Instrument 2

This instrument measures human capital in the neighborhood. The variables included in the instrument are median home value, percent of the population with a four-year college degree or more, and per-capita income. The percent is calculated by dividing the number of people with four or more years of college by the population aged 25 or over.

More educated people form the human capital of a society and tend to earn more per capita than their less educated counterparts. This segment of the population has a greater appetite for urban amenities and housing. The instrument measures neighborhood health in terms of human capital.

Instrument 2 yields a Cronbach's alpha of 0.89.

# Instrument 2: human capital
# median home value (mhmval00)
# percent with four years of college or more (col00 / ag25up00)
# per-capita income (incpc00)

instrument2 <- dat_2000 %>% 
  mutate(
    pcol00 = (col00/ag25up00)
  ) %>% 
  select(mhmval00, pcol00, incpc00) %>%
  mutate(
    mhmval00zscore = scale(mhmval00, center = TRUE, scale = TRUE),
    pcol00zscore = scale(pcol00, center = TRUE, scale = TRUE),
    incpc00zscore = scale(incpc00, center = TRUE, scale = TRUE)
  )  %>%   
  select(mhmval00zscore,pcol00zscore,incpc00zscore)%>%
  rename(
      "median home value" = mhmval00zscore,
     "percent with four years college degree or more" = pcol00zscore,
     "per capita income" = incpc00zscore
      )%>% 
  data.frame()
#pairs( instrument2, lower.panel= panel.smooth, upper.panel= panel.cor )
# Cronbach's alpha of the standardized variables
alpha2 <- psych::alpha( instrument2, check.keys=TRUE )$total$raw_alpha
alpha2

## [1] 0.8900003

Instrument 3

This instrument measures the hardship index of the neighborhood. The variables included in the instrument are median home value, percent of the population with a high school degree or less, and median household income. The percent is calculated by dividing the number of people with a high school degree or less by the employed population aged 16 or over.

People with less education usually work low-paying jobs and have a harder time holding on to a job. More often than not they work more than one job, which makes family life harder. This instrument measures neighborhood vulnerability in terms of hardship.

Instrument 3 yields a Cronbach's alpha of 0.83.

# Instrument 3: hardship index
# median home value (mhmval00)
# percent with high school degree or less (hs00 / empclf00)
# median household income, total (hinc00)

instrument3 <- dat_2000 %>% 
  mutate(
    phs00 = (hs00/empclf00)
  ) %>%
  select(mhmval00, phs00, hinc00) %>%
  mutate(
    mhmval00zscore = scale(mhmval00, center = TRUE, scale = TRUE),
    phs00zscore = scale(phs00, center = TRUE, scale = TRUE),
    hinc00zscore = scale(hinc00, center = TRUE, scale = TRUE)
  )  %>%   
  select(mhmval00zscore,phs00zscore,hinc00zscore)%>%
  rename(
      "median home value" = mhmval00zscore,
     "percent with highschool degree or less" = phs00zscore,
     "median household income total" = hinc00zscore
      )%>% 
  data.frame()
#pairs( instrument3, lower.panel= panel.smooth, upper.panel= panel.cor )
# Cronbach's alpha of the standardized variables
alpha3 <- psych::alpha( instrument3, check.keys=TRUE )$total$raw_alpha
alpha3

## [1] 0.830247

Instrument 4

This instrument measures the distress index of the neighborhood. The variables included in the instrument are median home value, median household income, percent of the population that is unemployed, and percent of the population that is widowed, divorced, or separated. The percent unemployed is calculated by dividing the number of unemployed people by the total number of people in the labor force. The percent widowed, divorced, or separated is calculated by dividing the number of such people by the total population aged 15 or over.

A distressed neighborhood is often characterized by unemployment, broken families, and low median income. This instrument measures neighborhood vulnerability in terms of neighborhood distress.

Instrument 4 yields a Cronbach's alpha of 0.75.

# Instrument 4: distress index
# median home value (mhmval00)
# median household income, total (hinc00)
# percent unemployed (unemp00 / clf00)
# percent widowed, divorced, or separated (wds00 / ag15up00)


instrument4 <- dat_2000 %>%
  mutate(
    punemp00 = (unemp00/clf00),
    pwds00 = (wds00/ag15up00)
  ) %>%
  select(mhmval00,hinc00,punemp00,pwds00) %>%
  mutate(
    mhmval00zscore = scale(mhmval00, center = TRUE, scale = TRUE),
    hinc00zscore = scale(hinc00, center = TRUE, scale = TRUE),
    punemp00zscore = scale(punemp00, center = TRUE, scale = TRUE),
    pwds00zscore = scale(pwds00, center = TRUE, scale = TRUE)
  )  %>%   
  select(mhmval00zscore, hinc00zscore, punemp00zscore, pwds00zscore)%>%
  rename(
      "median home value" = mhmval00zscore,
     "median household income" = hinc00zscore,
     "percent unemployed" = punemp00zscore,
     "percent widowed, divorced and separated" = pwds00zscore
     )%>% 
  data.frame()
#pairs( instrument4, lower.panel= panel.smooth, upper.panel= panel.cor )
# Cronbach's alpha of the standardized variables
alpha4 <- psych::alpha( instrument4, check.keys=TRUE )$total$raw_alpha
alpha4

## [1] 0.7468594

Part 3

Descriptive statistics on all of the metrics for all urban census tracts

To differentiate urban from rural tracts, metadata from the Metropolitan Statistical Area (MSA) files is loaded. Every county that belongs to an MSA is coded as urban, and the rest are coded as rural.

# download the CBSA-to-county crosswalk data
URL <- "https://data.nber.org/cbsa-msa-fips-ssa-county-crosswalk/cbsatocountycrosswalk.csv"
crosswalk <- read.csv( URL, stringsAsFactors=F )

# all metro areas in the country
sort( unique( crosswalk$cbsaname ) )

crosswalk$urban <- ifelse( crosswalk$cbsaname == "", "rural", "urban" )

keep.these <- c( "countyname","state","fipscounty", 
                 "msa","msaname", 
                 "cbsa","cbsaname",
                 "urban" )

# all_of() makes the use of an external character vector explicit
cw <- dplyr::select( crosswalk, dplyr::all_of( keep.these ) )

This paper shows two methods of filtering the urban tracts out of the LTDB census data before computing descriptive statistics on the variables.

Method 1

The first method keeps only the urban counties in the crosswalk and performs an inner join of the two data sets (LTDB 2000 and the crosswalk) on the CBSA code. This keeps only the entries that match in both data sets. Because the crosswalk has one row per county, a tract in a multi-county CBSA is matched more than once, so the result is further filtered to keep one entry per unique tract id.

# keep urban entries only, and rename cbsa to match the LTDB column
cw_urban <- cw %>%
  filter( urban == "urban" ) %>%
  rename( "cbsa10" = cbsa )
cw_urban

# inner join the two data sets, keep one row per tract, and summarize
dat_2000 %>%
  inner_join( cw_urban, by = "cbsa10" ) %>%
  distinct( trtid10, .keep_all = TRUE ) %>%
  select( hinc00, unemp00, wds00, mhmval00, prof00, col00, incpc00, hs00 ) %>%
  summary()

##      hinc00          unemp00            wds00           mhmval00      
##  Min.   :  2499   Min.   :   0.00   Min.   :   0.0   Min.   :      0  
##  1st Qu.: 32068   1st Qu.:  50.00   1st Qu.: 362.0   1st Qu.:  74100  
##  Median : 41580   Median :  84.04   Median : 535.0   Median : 100900  
##  Mean   : 44766   Mean   : 105.29   Mean   : 561.6   Mean   : 120462  
##  3rd Qu.: 54047   3rd Qu.: 133.01   3rd Qu.: 733.0   3rd Qu.: 142900  
##  Max.   :200001   Max.   :6405.33   Max.   :4450.0   Max.   :1000001  
##  NA's   :137                                         NA's   :136      
##      prof00           col00           incpc00            hs00     
##  Min.   :   0.0   Min.   :   0.0   Min.   :     0   Min.   :   0  
##  1st Qu.: 292.9   1st Qu.: 229.0   1st Qu.: 15583   1st Qu.: 671  
##  Median : 515.0   Median : 453.0   Median : 19696   Median :1072  
##  Mean   : 602.9   Mean   : 604.5   Mean   : 21240   Mean   :1152  
##  3rd Qu.: 824.8   3rd Qu.: 835.1   3rd Qu.: 24913   3rd Qu.:1546  
##  Max.   :3524.7   Max.   :4644.7   Max.   :147633   Max.   :6994  
##

Method 2

The second method also keeps only the urban counties in the crosswalk, then reduces them to a list of unique CBSA codes. It then looks for those CBSA codes in the LTDB 2000 data and keeps only the tracts that match.

# filter urban areas from the crosswalk and keep the unique CBSA codes
cbsa <-
  cw %>%
  filter( urban == "urban" ) %>%
  select( cbsa, urban ) %>%
  unique()
nrow( cbsa )

# identify and keep the rows of dat_2000 that belong to an urban CBSA
cbsa.id <- cbsa$cbsa 
keep.these <- dat_2000$cbsa10 %in% cbsa.id 
dat_2000_urban <- filter( dat_2000, keep.these ) %>% 
  select(hinc00,unemp00,wds00,mhmval00, prof00, col00, incpc00, hs00) 

summary(dat_2000_urban )

##      hinc00          unemp00            wds00           mhmval00      
##  Min.   :  2499   Min.   :   0.00   Min.   :   0.0   Min.   :      0  
##  1st Qu.: 32068   1st Qu.:  50.00   1st Qu.: 362.0   1st Qu.:  74100  
##  Median : 41580   Median :  84.04   Median : 535.0   Median : 100900  
##  Mean   : 44766   Mean   : 105.29   Mean   : 561.6   Mean   : 120462  
##  3rd Qu.: 54047   3rd Qu.: 133.01   3rd Qu.: 733.0   3rd Qu.: 142900  
##  Max.   :200001   Max.   :6405.33   Max.   :4450.0   Max.   :1000001  
##  NA's   :137                                         NA's   :136      
##      prof00           col00           incpc00            hs00     
##  Min.   :   0.0   Min.   :   0.0   Min.   :     0   Min.   :   0  
##  1st Qu.: 292.9   1st Qu.: 229.0   1st Qu.: 15583   1st Qu.: 671  
##  Median : 515.0   Median : 453.0   Median : 19696   Median :1072  
##  Mean   : 602.9   Mean   : 604.5   Mean   : 21240   Mean   :1152  
##  3rd Qu.: 824.8   3rd Qu.: 835.1   3rd Qu.: 24913   3rd Qu.:1546  
##  Max.   :3524.7   Max.   :4644.7   Max.   :147633   Max.   :6994  
##
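
The two summaries are identical, which is what one would expect if both methods select the same tracts. A minimal base R sketch on invented toy tables illustrates why; the tables `tracts` and `cw_toy` and all their values are hypothetical, and `merge()` stands in for the dplyr join used above.

```r
# invented toy tracts: two in CBSA 100, one in CBSA 200, one outside any CBSA
tracts <- data.frame(
  trtid10 = c( 11, 12, 21, 31 ),
  cbsa10  = c( 100, 100, 200, NA ),
  hinc00  = c( 40000, 52000, 38000, 30000 )
)

# invented toy crosswalk: one row per county, so CBSA 100 appears twice
cw_toy <- data.frame(
  countyname = c( "A", "B", "C", "D" ),
  cbsa       = c( 100, 100, 200, NA ),
  urban      = c( "urban", "urban", "urban", "rural" )
)
cw_toy_urban <- cw_toy[ cw_toy$urban == "urban", ]

# Method 1: join on the CBSA code, then drop the repeated tracts
joined  <- merge( tracts, cw_toy_urban, by.x = "cbsa10", by.y = "cbsa" )
method1 <- joined[ !duplicated( joined$trtid10 ), ]

# Method 2: keep the tracts whose CBSA code is in the urban list
method2 <- tracts[ tracts$cbsa10 %in% unique( cw_toy_urban$cbsa ), ]

# the join repeats tracts in multi-county CBSAs; both methods keep the same tracts
nrow( joined )
setequal( method1$trtid10, method2$trtid10 )
```

Method 2 avoids the duplicated rows altogether, while Method 1 carries the county and CBSA names along with each tract, which can be useful for later grouping.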

LAB-02-goel

Sunayna Goel

April 07, 2020