Gentrification Metrics
Gentrification, the physical and demographic change of a neighborhood that brings in wealthier residents, new businesses, investment, and development, is an important topic in social science. It also raises social-justice concerns because low-income residents can be displaced from the neighborhood. Traditionally low-income neighborhoods across the United States are gentrifying. Identifying neighborhood changes related to gentrification can help cities plan for its negative by-products and for the urban development of the area. For example, a city can plan for low-income amenities and rent-controlled housing in recently developed and developing neighborhoods.
To capture these neighborhood changes, gentrification metrics are constructed. Gentrification is not caused by a single variable; it results from a pool of gentrifiers with a cultural preference for urban living, better amenities, disposable income, and urban housing. Variables describing these factors can track latent changes in a neighborhood, such as economic strength, human capital, distress, and happiness, among many others, allowing the city to intervene before the low-income population is severely affected.
The objective of this paper is to capture an initial picture of neighborhood change in the year 2000 using the Longitudinal Tract Data Base (LTDB), which is built from census data and covers almost 72,000 unique census tracts. It contains almost 70 variables describing the racial, socio-economic, housing, age, and marital-status characteristics of the population. The key to observing change is converting census tract data into useful 'features', or variables, that help predict gentrification. Theories of neighborhood change suggest that low-priced neighborhoods adjacent to wealthy ones have the highest probability of gentrifying in the face of new housing demand.
This paper uses median home value as one of the main variables for constructing gentrification metrics. To measure general dimensions of community strength and vulnerability in the year 2000, four instruments are created, each measuring a different latent construct. For the purpose of this paper, neighborhood health is constructed from the entire available LTDB data.
The LTDB data is first loaded in its raw form.
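# load packages and data
library(dplyr)    # %>%, rename_all(), mutate(), select()
library(stringr)  # str_to_lower()
dat_2000 <- read.csv(file = '../data/data-raw/harmonised_census_data_part01/ltdb_std_2000_sample.csv', stringsAsFactors = FALSE) %>%
  rename_all(str_to_lower)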
Some variables in the LTDB use -999 to code missing values, a common practice in older data sets. To avoid biasing the results, every -999 is replaced with NA prior to the analysis.
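# replace -999 in the data set with NA and store the result
# (alternative: replace_with_na_all(condition = ~.x == -999) from the naniar package)
# rows containing NA are kept; the summaries later in the paper report NA counts
dat_2000 <- dat_2000 %>%
  as_tibble() %>%
  mutate_all(function(i) ifelse(i == -999, NA, i))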
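As a quick sanity check on this step, the sketch below counts the -999 codes per column before recoding and confirms that none remain afterwards. It runs on a small invented data frame (`toy`) rather than the LTDB file and uses base R only; the column names are borrowed from the LTDB purely for illustration.

```r
# Toy data standing in for the LTDB extract (values are invented)
toy <- data.frame(
  trtid10  = c(1, 2, 3, 4),
  hinc00   = c(41000, -999, 52000, 38000),
  mhmval00 = c(98000, 120000, -999, -999)
)

# Count the -999 codes in each column before recoding
n_sentinel <- colSums(toy == -999, na.rm = TRUE)

# Recode -999 to NA in every column
toy_clean <- toy
toy_clean[toy_clean == -999] <- NA

# Count the NA values per column after recoding
n_missing <- colSums(is.na(toy_clean))
```

If `n_missing` matches `n_sentinel` column by column, every -999 was recoded and no other values were touched.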
Selection of the gentrification metrics
In this step, four instruments (metrics) are constructed to measure neighborhood health. Median home value is common to all of the instruments, since it can capture changes in their initial stages. Each variable in an instrument is converted to a z-score, which avoids overweighting any one variable because of its scale. After normalizing, Cronbach's alpha is calculated; it shows how closely the variables are related to one another as a group. An alpha of 0.70 or higher is generally considered acceptable for the internal reliability of the group.
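To make the two steps concrete, here is a minimal sketch on invented data that standardizes three variables and computes Cronbach's alpha from its textbook formula, alpha = (k / (k - 1)) * (1 - sum of item variances / variance of the summed score). The function `cronbach_alpha` is an illustrative helper, not the `psych::alpha` implementation used in this paper, which may treat missing values differently.

```r
# Toy items: three invented variables that tend to move together
items <- data.frame(
  home_value = c(90, 120, 150, 200, 260, 310),
  income     = c(30, 38, 45, 52, 70, 85),
  pct_prof   = c(0.10, 0.18, 0.22, 0.25, 0.40, 0.45)
)

# Convert each column to a z-score (mean 0, standard deviation 1)
z <- as.data.frame(scale(items))

# Cronbach's alpha from item variances and the variance of the row sums
cronbach_alpha <- function(df) {
  k <- ncol(df)
  item_var  <- sum(apply(df, 2, var))
  total_var <- var(rowSums(df))
  (k / (k - 1)) * (1 - item_var / total_var)
}

alpha_toy <- cronbach_alpha(z)
```

Because the toy columns are strongly correlated, `alpha_toy` comes out close to 1; weakly related variables would pull it toward 0.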
The first instrument measures economic wealth in the neighborhood. The variables included in the instrument are median home value, percent of professional employees, and median household income. Percent of professional employees is calculated by dividing the number of professional employees by the number of employed people who are 16 years of age or over.
A higher percentage of professional employees indicates more new businesses and educated people in the neighborhood. Median household income for this population is higher, so they can afford higher-value homes. This instrument measures neighborhood health in terms of economic wealth.
Instrument 1 has a Cronbach's alpha of 0.87.
# Instrument 1
# Economic wealth
# Median home value (mhmval00)
# Percent professional employees
# Median HH income, total
instrument1 <- dat_2000 %>%
  mutate(
    pprof00 = (prof00 / empclf00)
  ) %>%
  select(mhmval00, pprof00, hinc00) %>%
  mutate(
    mhmval00zscore = scale(mhmval00, center = TRUE, scale = TRUE),
    pprof00zscore = scale(pprof00, center = TRUE, scale = TRUE),
    hinc00zscore = scale(hinc00, center = TRUE, scale = TRUE)
  ) %>%
  select(mhmval00zscore, pprof00zscore, hinc00zscore) %>%
  rename(
    "median home value" = mhmval00zscore,
    "percent professional employees" = pprof00zscore,
    "Median household income" = hinc00zscore
  ) %>%
  data.frame()
# pairs(instrument1, lower.panel = panel.smooth, upper.panel = panel.cor)
alpha1 <- psych::alpha(instrument1, check.keys = TRUE)$total$raw_alpha
alpha1
## [1] 0.8676727
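The percent variables in the instruments divide one count by another, so a tract with a zero denominator (for example, no employed residents) would yield NaN or Inf rather than a usable share. One possible guard is sketched below with a hypothetical helper, `safe_ratio`, which is not part of the original analysis; the counts are invented.

```r
# Hypothetical helper: ratio that returns NA when the denominator is zero or missing
safe_ratio <- function(num, den) {
  ifelse(is.na(den) | den == 0, NA_real_, num / den)
}

# Invented counts: professional employees and employed population per tract
prof   <- c(120, 0, 45, 10)
empclf <- c(400, 0, 150, 0)

pprof_raw  <- prof / empclf             # contains NaN (0/0) and Inf (10/0)
pprof_safe <- safe_ratio(prof, empclf)  # NA in place of NaN and Inf
```

Returning NA keeps such tracts out of the mean and standard deviation that `scale()` computes, instead of letting an infinite value distort them.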
The second instrument measures human capital in the neighborhood. The variables included in the instrument are median home value, percent of the population with a four-year college degree or more, and per-capita income. The percentage is calculated by dividing the number of people with four years of college or more by the population aged 25 or over.
More educated people form the human capital of a society and tend to earn more per capita than their less educated counterparts. This segment of the population has a greater appetite for urban amenities and housing. The instrument measures neighborhood health in terms of human capital.
Instrument 2 has a Cronbach's alpha of 0.89.
# Instrument 2
# Human capital
# Median home value
# % with 4 years college degree or more
# Per capita income
instrument2 <- dat_2000 %>%
  mutate(
    pcol00 = (col00 / ag25up00)
  ) %>%
  select(mhmval00, pcol00, incpc00) %>%
  mutate(
    mhmval00zscore = scale(mhmval00, center = TRUE, scale = TRUE),
    pcol00zscore = scale(pcol00, center = TRUE, scale = TRUE),
    incpc00zscore = scale(incpc00, center = TRUE, scale = TRUE)
  ) %>%
  select(mhmval00zscore, pcol00zscore, incpc00zscore) %>%
  rename(
    "median home value" = mhmval00zscore,
    "percent with four years college degree or more" = pcol00zscore,
    "per capita income" = incpc00zscore
  ) %>%
  data.frame()
# pairs(instrument2, lower.panel = panel.smooth, upper.panel = panel.cor)
alpha2 <- psych::alpha(instrument2, check.keys = TRUE)$total$raw_alpha
alpha2
## [1] 0.8900003
The third instrument measures a hardship index for the neighborhood. The variables included in the instrument are median home value, percent of the population with a high school degree or less, and median household income. The percentage is calculated by dividing the number of people with a high school degree or less by the employed population aged 16 or over.
People with less education usually work lower-paying jobs and have a harder time holding on to a job. They often work more than one job, and family life is harder. This instrument measures neighborhood vulnerability in terms of hardship.
Instrument 3 has a Cronbach's alpha of 0.83.
# Instrument 3
# Hardship index
# Median home value
# % high school degree or less
# Median HH income
instrument3 <- dat_2000 %>%
  mutate(
    phs00 = (hs00 / empclf00)
  ) %>%
  select(mhmval00, phs00, hinc00) %>%
  mutate(
    mhmval00zscore = scale(mhmval00, center = TRUE, scale = TRUE),
    phs00zscore = scale(phs00, center = TRUE, scale = TRUE),
    hinc00zscore = scale(hinc00, center = TRUE, scale = TRUE)
  ) %>%
  select(mhmval00zscore, phs00zscore, hinc00zscore) %>%
  rename(
    "median home value" = mhmval00zscore,
    "percent with highschool degree or less" = phs00zscore,
    "median household income total" = hinc00zscore
  ) %>%
  data.frame()
# pairs(instrument3, lower.panel = panel.smooth, upper.panel = panel.cor)
alpha3 <- psych::alpha(instrument3, check.keys = TRUE)$total$raw_alpha
alpha3
## [1] 0.830247
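In the hardship instrument, the share with a high school degree or less can be expected to run opposite to home value and income. This is where `check.keys=TRUE` matters: per the psych documentation, it reverse-keys items that load negatively on the first principal component before alpha is computed. The sketch below illustrates the effect with invented data and a hand-written alpha function (`cronbach_alpha`, an illustrative helper, not the psych implementation), flipping the sign of the opposing item manually.

```r
# Textbook Cronbach's alpha from item variances and the variance of the row sums
cronbach_alpha <- function(df) {
  k <- ncol(df)
  (k / (k - 1)) * (1 - sum(apply(df, 2, var)) / var(rowSums(df)))
}

# Invented z-score-like items: the third runs opposite to the first two
toy <- data.frame(
  home_value = c(-1.2, -0.6, -0.1, 0.3, 0.7, 0.9),
  income     = c(-1.0, -0.8, 0.0, 0.2, 0.6, 1.0),
  pct_hs     = c(1.1, 0.7, 0.1, -0.2, -0.8, -0.9)
)

# Alpha with the opposing item left as-is
alpha_unkeyed <- cronbach_alpha(toy)

# Reverse-score the negatively related item by flipping its sign
toy_keyed   <- transform(toy, pct_hs = -pct_hs)
alpha_keyed <- cronbach_alpha(toy_keyed)
```

Without reverse-keying, the opposing item cancels the others in the summed score and alpha turns negative; after the sign flip the three items reinforce one another and alpha is high.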
The fourth instrument measures a distress index for the neighborhood. The variables included in the instrument are median home value, median household income, percent of the population unemployed, and percent of the population widowed, divorced, or separated. Percent unemployed is calculated by dividing the number of unemployed people by the total number of people in the labor force. Percent widowed, divorced, or separated is calculated by dividing the number of such people by the total population aged 15 or over.
A distressed neighborhood is often characterized by unemployment, broken families, and low median income. This instrument measures neighborhood vulnerability in terms of neighborhood distress.
Instrument 4 has a Cronbach's alpha of 0.75.
# Instrument 4
# Distressed neighborhood
# Median home value
# Median HH income, total
# Percent unemployed
# Percent widowed, divorced and separated
instrument4 <- dat_2000 %>%
  mutate(
    punemp00 = (unemp00 / clf00),
    pwds00 = (wds00 / ag15up00)
  ) %>%
  select(mhmval00, hinc00, punemp00, pwds00) %>%
  mutate(
    mhmval00zscore = scale(mhmval00, center = TRUE, scale = TRUE),
    hinc00zscore = scale(hinc00, center = TRUE, scale = TRUE),
    punemp00zscore = scale(punemp00, center = TRUE, scale = TRUE),
    pwds00zscore = scale(pwds00, center = TRUE, scale = TRUE)
  ) %>%
  select(mhmval00zscore, hinc00zscore, punemp00zscore, pwds00zscore) %>%
  rename(
    "median home value" = mhmval00zscore,
    "median household income" = hinc00zscore,
    "percent unemployed" = punemp00zscore,
    "percent widowed, divorced and separated" = pwds00zscore
  ) %>%
  data.frame()
# pairs(instrument4, lower.panel = panel.smooth, upper.panel = panel.cor)
alpha4 <- psych::alpha(instrument4, check.keys = TRUE)$total$raw_alpha
alpha4
## [1] 0.7468594
Descriptive statistics on all of the metrics for all urban census tracts
To differentiate urban from rural tracts, metadata from the Metropolitan Statistical Area (MSA) crosswalk file is loaded. Every county that belongs to an MSA is coded as urban, and the rest are coded as rural.
# download crosswalk data
URL <- "https://data.nber.org/cbsa-msa-fips-ssa-county-crosswalk/cbsatocountycrosswalk.csv"
crosswalk <- read.csv(URL, stringsAsFactors = FALSE)
# all metro areas in the country
sort(unique(crosswalk$cbsaname))
# counties without a CBSA name are coded as rural
crosswalk$urban <- ifelse(crosswalk$cbsaname == "", "rural", "urban")
keep.these <- c("countyname", "state", "fipscounty",
                "msa", "msaname",
                "cbsa", "cbsaname",
                "urban")
# all_of() makes selection by an external character vector explicit
cw <- dplyr::select(crosswalk, tidyselect::all_of(keep.these))
This paper shows two methods of filtering urban tracts from the LTDB census data before computing descriptive statistics on the variables.
The first method filters the urban counties from the crosswalk and performs an inner join of the two data sets (LTDB_2000 and crosswalk) on the CBSA code, which keeps only the entries that match in both. Because the crosswalk has one row per county, the join can repeat a tract once for every county in its CBSA, so the result is filtered further to keep one entry per unique tract ID.
# filter urban entries only, and rename cbsa
cw_urban <- cw %>%
  filter(urban == "urban") %>%
  rename("cbsa10" = cbsa)
cw_urban
# inner join the two data sets
dat_2000 %>%
  inner_join(cw_urban, by = "cbsa10") %>%
  distinct(trtid10, .keep_all = TRUE) %>%
  select(hinc00, unemp00, wds00, mhmval00, prof00, col00, incpc00, hs00) %>%
  summary()
## hinc00 unemp00 wds00 mhmval00
## Min. : 2499 Min. : 0.00 Min. : 0.0 Min. : 0
## 1st Qu.: 32068 1st Qu.: 50.00 1st Qu.: 362.0 1st Qu.: 74100
## Median : 41580 Median : 84.04 Median : 535.0 Median : 100900
## Mean : 44766 Mean : 105.29 Mean : 561.6 Mean : 120462
## 3rd Qu.: 54047 3rd Qu.: 133.01 3rd Qu.: 733.0 3rd Qu.: 142900
## Max. :200001 Max. :6405.33 Max. :4450.0 Max. :1000001
## NA's :137 NA's :136
## prof00 col00 incpc00 hs00
## Min. : 0.0 Min. : 0.0 Min. : 0 Min. : 0
## 1st Qu.: 292.9 1st Qu.: 229.0 1st Qu.: 15583 1st Qu.: 671
## Median : 515.0 Median : 453.0 Median : 19696 Median :1072
## Mean : 602.9 Mean : 604.5 Mean : 21240 Mean :1152
## 3rd Qu.: 824.8 3rd Qu.: 835.1 3rd Qu.: 24913 3rd Qu.:1546
## Max. :3524.7 Max. :4644.7 Max. :147633 Max. :6994
##
The second method filters the urban counties from the crosswalk and keeps one entry per unique CBSA. It then looks for those CBSA codes in the LTDB_2000 data and keeps only the tracts that match. The resulting summary is identical to that of the first method.
# filter urban areas from the crosswalk
cbsa <-
  cw %>%
  filter(urban == "urban") %>%
  select(cbsa, urban)
cbsa <- unique(cbsa)
nrow(cbsa)
# identify and keep the common rows in dat_2000
cbsa.id <- cbsa$cbsa
keep.these <- dat_2000$cbsa10 %in% cbsa.id
dat_2000_urban <- filter(dat_2000, keep.these) %>%
  select(hinc00, unemp00, wds00, mhmval00, prof00, col00, incpc00, hs00)
summary(dat_2000_urban)
## hinc00 unemp00 wds00 mhmval00
## Min. : 2499 Min. : 0.00 Min. : 0.0 Min. : 0
## 1st Qu.: 32068 1st Qu.: 50.00 1st Qu.: 362.0 1st Qu.: 74100
## Median : 41580 Median : 84.04 Median : 535.0 Median : 100900
## Mean : 44766 Mean : 105.29 Mean : 561.6 Mean : 120462
## 3rd Qu.: 54047 3rd Qu.: 133.01 3rd Qu.: 733.0 3rd Qu.: 142900
## Max. :200001 Max. :6405.33 Max. :4450.0 Max. :1000001
## NA's :137 NA's :136
## prof00 col00 incpc00 hs00
## Min. : 0.0 Min. : 0.0 Min. : 0 Min. : 0
## 1st Qu.: 292.9 1st Qu.: 229.0 1st Qu.: 15583 1st Qu.: 671
## Median : 515.0 Median : 453.0 Median : 19696 Median :1072
## Mean : 602.9 Mean : 604.5 Mean : 21240 Mean :1152
## 3rd Qu.: 824.8 3rd Qu.: 835.1 3rd Qu.: 24913 3rd Qu.:1546
## Max. :3524.7 Max. :4644.7 Max. :147633 Max. :6994
##
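The two summaries agree because both methods select the same tracts. The sketch below reproduces that logic on invented data with base R (`merge` and `%in%` stand in for the dplyr calls used above); all values and the `_toy` objects are illustrative assumptions.

```r
# Invented tracts: cbsa10 is the metro-area code carried on each tract
tracts <- data.frame(
  trtid10 = c(101, 102, 103, 104, 105),
  cbsa10  = c(10, 10, 20, 30, NA),
  hinc00  = c(40000, 52000, 38000, 61000, 45000)
)

# Invented county-level crosswalk: one row per county, so CBSA 10 repeats
cw_toy <- data.frame(
  fipscounty = c(1, 2, 3, 4),
  cbsa10     = c(10, 10, 20, NA),
  urban      = c("urban", "urban", "urban", "rural")
)
cw_urban_toy <- cw_toy[cw_toy$urban == "urban", ]

# Method 1: inner join, then drop the duplicate tracts the join creates
joined  <- merge(tracts, cw_urban_toy, by = "cbsa10")
method1 <- joined[!duplicated(joined$trtid10), ]

# Method 2: keep tracts whose CBSA code appears in the urban list
method2 <- tracts[tracts$cbsa10 %in% unique(cw_urban_toy$cbsa10), ]

# Both methods should retain the same set of tract IDs
same_tracts <- setequal(method1$trtid10, method2$trtid10)
```

The join first inflates the two tracts in CBSA 10 to four rows, which is why the de-duplication step is needed in the first method, whereas the `%in%` filter never duplicates rows.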