Coqual
I program I've been working on over time in R Markdown to quickly generate reports of our data. It is structured in an easy-to-navigate format, with a collapsible table of contents on the side. This is especially useful when we are dealing with a lot of data!
I always start with a look at the demographics. This gives a sense of which intersectional cuts are feasible to run.
The first step in any analysis is to visualize and inspect your data--searching for any particular issues with skewness, kurtosis, and outliers.
We get a lot of data, but using R Markdown and functional programming, I am able to examine as many variables as I want, generate a quick visualization, perform statistical testing, and report to the "quals". With the magic wand that is functional programming, I am even able to cast a spell on the fly if someone asks "What about turnover intentions for LGBT Gen Z-ers?" Very useful if you have over fifty variables.
This is even more useful when I want to look at intersections. For example, do older LGBT folks have lower turnover intentions than their Gen Z or Millennial counterparts? This can create endless possible combinations, so it's useful to have a short bit of code that can give you exactly what you want without too much fluff.
Plotwtdcat will:
"Dichotomize" the variable. If the outcome is binary, that's already done, but if we have Likert-scale items, I cut it at a certain level and declare each case either "High" or "Low". Using survey weights, calculate weighted proportions and convert to percentages. This is mainly for visualization purposes.
For binary outcomes, perform a weighted logistic regression. For Likert outcomes, perform a weighted t-test (if the predictor is dichotomous) or weighted ANOVA.
Summarize significance testing information (p values, significant contrasts if applicable), and apply a transparency layer to non-significant tests
Visualize the means along with significance testing information using ggplot2.
Plotwtdcat Code
ttestp<-function(dat, VAR1, Outcome, survey_des){ # Continous Outcome, Binary predictor
p<-svyttest(Outcome ~ Demo, survey_des)$p.value
}
anovap<-function(dat, VAR1, Outcome, survey_des){ # Continous Outcome, Multigroup Predictor
glm <- svyglm(Outcome ~ Demo,survey_des)
p<-regTermTest(glm, ~Demo, method=c("Wald"))$p # p<-summary(aov(glm))[[1]][["Pr(>F)"]][1]
}
logp<-function(dat, VAR1, Outcome, survey_des){ # Binary Outcome
glm <- svyglm(Outcome ~ Demo, des = survey_des,family=quasibinomial())
p<-Anova(glm)[3]
}
SigOrNotSig<-function(dat, VAR1, VAR2,survey_des,binary=F,more2grps=F){
pmod<-ifelse(more2grps==F & binary==F, ttestp(dat, VAR1, Outcome, survey_des),
ifelse(more2grps==F & binary==T, logp(dat, VAR1, Outcome, survey_des),
anovap(dat, VAR1, Outcome, survey_des)
) )
return(pmod)
}
plotwtdcat<-function(dat, VAR1, VAR2,binary=F,more2grps=F,Level=3){
dich<-paste(VAR2,"_d",sep="")
temp_dat<-Dich(dat, VAR2,Level)
temp_dat<-temp_dat%>% dplyr::select(record,VAR1,VAR2,dich, weight)
colnames(temp_dat)[2] <- "Demo"
colnames(temp_dat)[3] <- "Outcome"
colnames(temp_dat)[4] <- "Outcome_d"
survey_des<-svydesign(ids = ~1, weights= ~weight, data=temp_dat)
d<-svyby(~ Outcome_d,~Demo, design=survey_des, svymean, na.rm=T)
d[,2:3]<-d[,2:3]%>% multiply_by(100)%>% round(digits=2)
if(binary==F){
d<-d%>% rename(wtd_percent = Outcome_dHigh)}else {
d<-d%>% rename(wtd_percent = Outcome_dYes)}
d<-d %>% select(Demo, wtd_percent)
pmod<-SigOrNotSig(temp_dat, Demo, Outcome, survey_des,binary,more2grps) %>% as.numeric()
sig<-ifelse(pmod<.001,paste("Sig (p<.001);"),
ifelse(pmod<.05, paste(paste("Sig (p=",round(pmod,3),");",sep=''),sep=""),
paste(paste("Not sig (p=",round(pmod,3),").",sep=''),sep="\n")))
transparency<-ifelse(pmod<.05, 1, "") #change transparency if nonsig
if(more2grps==T){
glm <- svyglm(Outcome ~ Demo, des = survey_des)
cells <- emmeans(glm, ~ Demo)
con<-as.data.frame(pairs(cells, adjust = "none"))
con<-con%>% mutate(p.value=round(p.value,5))%>% filter(p.value<.05)%>%dplyr::select(contrast)
contrasts<-paste(con)
contrasts<-str_replace_all(contrasts,"^c|[\"()]|\\\\n","")
contrasts<-str_replace(contrasts,"haracter0","")
contrasts <- ifelse(pmod<0.05, contrasts ,"")
}
else(contrasts<-"")
sub<-paste(VAR1,": ",sig," ", contrasts,sep="")
sub = paste(strwrap(sub, width = 50), collapse = "\n", sep="") #change width to 80 when only race/racegender
ggplot(d, aes(x=d[,1], y=d[,2], fill=d[,1])) +
geom_bar(position='dodge', stat='identity',aes(alpha=transparency)) +
geom_text(aes(label=paste(round(wtd_percent),"%",sep="")), size=3.5, position=dodge_text, vjust=-.5) +
labs(x=element_blank(),y = element_blank(), subtitle=sub,title=element_blank())+
nice_theme()+
theme(legend.position = "None")+
ylim(0,120)
}