The State Determinants of COVID-19

CDE 498 - Spring 2021

Joshua Geenen

compiled on March 31, 2021

Setting Up

#Library packages
library(rmdformats)
library(dplyr)
library(tidyverse)
library(GGally)
library(leaps)
library(sjPlot)
library(ggplot2)
library(corrplot)
library(RColorBrewer)
library(car)
library(ggsci)
library(jtools)
library(ppcor)
library(rmarkdown)

#Set working directory
setwd("C:/Users/joshu/Desktop/COLLEGE_GENERAL/ASU_GENERAL/ASU_Spring_2021/CDE498")

#Read in data set
covid <- read.csv("state_covid.csv", head=T)

#Renaming and reformatting data
covid$poverty100 <- covid$poverty * 100
covid$college100 <- covid$college_grad_rate * 100
covid$democrat <- covid$Democrat_shr_2016
covid$republican <- covid$Republican_shr_2016
covid$covid <- covid$COVID_cases_per_100k

Question 1

Variables

Predictor variables: college100, republican, region_short Outcome variable: covid

college100: continuous predictor variable – I believe COVID cases will be negatively correlated with college graduation rates because college-educated individuals tend to get more exposure and training in scientific fields than individuals that are not college educated. This may lead individuals to have more faith and be more up-to-date on scientific findings related to the spread and severity of the disease around the U.S.

republican: continuous predictor variable – I believe states in which a larger proportion of citizens are Republican will have higher rates of COVID since they appear to have less strict COVID-prevention policies like mask mandates and lock downs.

region_short: categorical predictor variable – I believe states in the south and the west are less likely to implement restrictions regarding COVID-19, leading to more cases in these regions.

Summary Statistics

##College graduation rate
mean(covid$college100, na.rm = TRUE)
## [1] 35.9683
sd(covid$college100, na.rm = TRUE)
## [1] 7.52454
##Republican share
mean(covid$republican, na.rm = TRUE)
## [1] 48.4902
sd(covid$republican, na.rm = TRUE)
## [1] 11.8918
##Region
#Northeast region
mean(covid$region_short == "Northeast", na.rm = TRUE)
## [1] 0.1764706
#West region
mean(covid$region_short == "West", na.rm = TRUE)
## [1] 0.254902
#South
mean(covid$region_short == "South", na.rm = TRUE)
## [1] 0.254902
#Midwest
mean(covid$region_short == "Midwest", na.rm = TRUE)
## [1] 0.3137255
##COVID rates
mean(covid$covid, na.rm = TRUE)
## [1] 2216.451
sd(covid$covid, na.rm = TRUE)
## [1] 856.6188

Correlations

variables <- subset(covid, select = c(covid, college100, republican))

cormat <- cor(variables, use = "complete.obs")
print(cormat)
##                 covid college100 republican
## covid       1.0000000 -0.2829080  0.3216431
## college100 -0.2829080  1.0000000 -0.7060591
## republican  0.3216431 -0.7060591  1.0000000
corrplot(cormat, method = "circle")

Question 2

#Scatter plot using R
plot(covid$republican, covid$covid,
          main = "COVID-19 Rate by Percent Republican",
          xlab = "Republican %",
          ylab = "COVID-19 Cases per 100k",
          col = "purple")
abline(lm(covid$covid ~ covid$republican))

#Scatter plot using ggplot2
plot2 <- ggplot(data=covid, aes(x=republican, y=covid)) + 
  geom_point(aes(fill=region_short), size = 6, pch = 21) + 
  xlab("Republican Voting Share") + 
  ylab("COVID-19 Cases per 100K Citizens") + 
  ggtitle("COVID-19 Rate by Republican Voting Share") + 
  theme_bw() + 
  theme(plot.title = element_text(hjust = 0.5, size = 22, face = "bold"),
        axis.text = element_text(size = 12),
        axis.title = element_text(size = 12),
        legend.title = element_text(size = 12, face = "bold"),
        legend.position = "bottom") + 
  scale_fill_simpsons(name = "Regions") + 
  geom_text(label = covid$STATE_NAME, size = 3) + 
  geom_smooth(formula = y ~ x, method = "lm", se = F, color = "black")
print(plot2)

ggsave(plot = plot2,"covid_rate_by_republican.png", dpi = 300, width = 15, height = 9)

Question 3

Simple Linear Regression

I expect that Republican voting share will have a positive association with COVID cases per 100K citizens. Additionally, I expect that college graduation rates will have a negative association with COVID cases per 100K citizens.

##OLS Regression (one predictor or IV or X)
slm1 <- lm(covid$covid ~ covid$republican)
summary(slm1)
## 
## Call:
## lm(formula = covid$covid ~ covid$republican)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1707.6  -677.1   206.9   583.1  1319.2 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)  
## (Intercept)      1092.964    486.209   2.248   0.0291 *
## covid$republican   23.169      9.744   2.378   0.0214 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 819.3 on 49 degrees of freedom
## Multiple R-squared:  0.1035, Adjusted R-squared:  0.08516 
## F-statistic: 5.654 on 1 and 49 DF,  p-value: 0.02136
summ(slm1)
## MODEL INFO:
## Observations: 51
## Dependent Variable: covid$covid
## Type: OLS linear regression 
## 
## MODEL FIT:
## F(1,49) = 5.65, p = 0.02
## R² = 0.10
## Adj. R² = 0.09 
## 
## Standard errors: OLS
## ---------------------------------------------------------
##                             Est.     S.E.   t val.      p
## ---------------------- --------- -------- -------- ------
## (Intercept)              1092.96   486.21     2.25   0.03
## covid$republican           23.17     9.74     2.38   0.02
## ---------------------------------------------------------
slm2 <- lm(covid$covid ~ covid$college100)
summary(slm2)
## 
## Call:
## lm(formula = covid$covid ~ covid$college100)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1756.9  -631.8   169.3   563.9  1295.5 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       3374.89     572.98   5.890 3.45e-07 ***
## covid$college100   -32.21      15.60  -2.065   0.0443 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 830 on 49 degrees of freedom
## Multiple R-squared:  0.08004,    Adjusted R-squared:  0.06126 
## F-statistic: 4.263 on 1 and 49 DF,  p-value: 0.04426
summ(slm2)
## MODEL INFO:
## Observations: 51
## Dependent Variable: covid$covid
## Type: OLS linear regression 
## 
## MODEL FIT:
## F(1,49) = 4.26, p = 0.04
## R² = 0.08
## Adj. R² = 0.06 
## 
## Standard errors: OLS
## ---------------------------------------------------------
##                             Est.     S.E.   t val.      p
## ---------------------- --------- -------- -------- ------
## (Intercept)              3374.89   572.98     5.89   0.00
## covid$college100          -32.21    15.60    -2.06   0.04
## ---------------------------------------------------------

A one-unit increase in Republican voting share is associated with an additional 23.17 COVID cases per 100K citizens. The p-value of .02136 indicates that this model is statistically significant. The r-squared value of .1035 indicates that Republican voting share explains roughly 10% of the variability in COVID cases per 100K citizens.

A one-unit increase in college graduation rate is associated with an 32.21 fewer COVID cases per 100K citizens. The p-value of .0443 indicates that this model is statistically significant. The r-squared value of .08004 indicates that college graduation rates explain roughly 8% of the variability in COVID cases per 100K citizens.

Multiple Linear Regression

covid$region_short <- as.factor(covid$region_short)
mlm1 <- lm(covid$covid ~ covid$republican + 
             covid$college100 + relevel(covid$region_short, ref = "South"))
summary(mlm1)
## 
## Call:
## lm(formula = covid$covid ~ covid$republican + covid$college100 + 
##     relevel(covid$region_short, ref = "South"))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1867.4  -480.3    48.4   615.4  1280.6 
## 
## Coefficients:
##                                                      Estimate Std. Error
## (Intercept)                                         3583.3682  1442.9084
## covid$republican                                      -0.9241    14.0848
## covid$college100                                     -26.5993    22.7122
## relevel(covid$region_short, ref = "South")Midwest     -9.9284   284.8462
## relevel(covid$region_short, ref = "South")Northeast -883.8299   335.6613
## relevel(covid$region_short, ref = "South")West      -809.2942   322.1588
##                                                     t value Pr(>|t|)  
## (Intercept)                                           2.483   0.0168 *
## covid$republican                                     -0.066   0.9480  
## covid$college100                                     -1.171   0.2477  
## relevel(covid$region_short, ref = "South")Midwest    -0.035   0.9723  
## relevel(covid$region_short, ref = "South")Northeast  -2.633   0.0116 *
## relevel(covid$region_short, ref = "South")West       -2.512   0.0157 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 749.2 on 45 degrees of freedom
## Multiple R-squared:  0.3116, Adjusted R-squared:  0.2351 
## F-statistic: 4.073 on 5 and 45 DF,  p-value: 0.003903
summ(mlm1)
## MODEL INFO:
## Observations: 51
## Dependent Variable: covid$covid
## Type: OLS linear regression 
## 
## MODEL FIT:
## F(5,45) = 4.07, p = 0.00
## R² = 0.31
## Adj. R² = 0.24 
## 
## Standard errors: OLS
## -------------------------------------------------------------------
##                                      Est.      S.E.   t val.      p
## ------------------------------- --------- --------- -------- ------
## (Intercept)                       3583.37   1442.91     2.48   0.02
## covid$republican                    -0.92     14.08    -0.07   0.95
## covid$college100                   -26.60     22.71    -1.17   0.25
## relevel(covid$region_short,         -9.93    284.85    -0.03   0.97
## ref = "South")Midwest                                              
## relevel(covid$region_short,       -883.83    335.66    -2.63   0.01
## ref = "South")Northeast                                            
## relevel(covid$region_short,       -809.29    322.16    -2.51   0.02
## ref = "South")West                                                 
## -------------------------------------------------------------------

The p-value of .003903 indicates that this model is statistically significant. The adjusted r-squared value of .2351 indicates that the model explains roughly 23% of the variability in COVID cases per 100K citizens. The effect size for Republican voting share changed when other variables were included in the model. Additionally, the effect size for college graduation rates changed when other variables were included in the model.

Controlling for college graduation rate, a one-point increase in Republican voting share is associated with .9241 fewer COVID cases per 100K citizens for states in the South.

Controlling for Republican voting share, a one-point increase in college graduation rates is associated with 26.5993 fewer COVID cases per 100K citizens for states in the South.

Compared to states in the South, states in the Midwest had 9.9284 fewer COVID cases per 100K citizens when controlling for college graduation rates.

Compared to states in the South, states in the Northeast had 883.8299 fewer COVID cases per 100K citizens when controlling for college graduation rates.

Compared to states in the South, states in the West had 809.2942 fewer COVID cases per 100K citizens when controlling for college graduation rates.

Question 4

The South was shown to have higher rates of COVID per 100K citizens compared to any other region when controlling for other variables. Additionally, states with higher college graduation rates saw fewer COVID cases per 100K citizens. Furthermore, states with a higher rate of Republican voters saw higher rates of COVID per 100K citizens in general, but the South saw lower rates of COVID associated with a higher Republican voting share.