Coqual

About Me

Portolio

CV

I program I've been working on over time in R Markdown to quickly generate reports of our data. It is structured in an easy-to-navigate format, with a collapsible table of contents on the side. This is especially useful when we are dealing with a lot of data!

I always start with a look at the demographics. This gives a sense of which intersectional cuts are feasible to run.

The first step in any analysis is to visualize and inspect your data--searching for any particular issues with skewness, kurtosis, and outliers.

We get a lot of data, but using R Markdown and functional programming, I am able to examine as many variables as I want, generate a quick visualization, perform statistical testing, and report to the "quals". With the magic wand that is functional programming, I am even able to cast a spell on the fly if someone asks "What about turnover intentions for LGBT Gen Z-ers?" Very useful if you have over fifty variables.

This is even more useful when I want to look at intersections. For example, do older LGBT folks have lower turnover intentions than their Gen Z or Millennial counterparts? This can create endless possible combinations, so it's useful to have a short bit of code that can give you exactly what you want without too much fluff.

Plotwtdcat will:

"Dichotomize" the variable. If the outcome is binary, that's already done, but if we have Likert-scale items, I cut it at a certain level and declare each case either "High" or "Low". Using survey weights, calculate weighted proportions and convert to percentages. This is mainly for visualization purposes.
For binary outcomes, perform a weighted logistic regression. For Likert outcomes, perform a weighted t-test (if the predictor is dichotomous) or weighted ANOVA.
Summarize significance testing information (p values, significant contrasts if applicable), and apply a transparency layer to non-significant tests
Visualize the means along with significance testing information using ggplot2.

Plotwtdcat Code

ttestp<-function(dat, VAR1, Outcome, survey_des){ # Continous Outcome, Binary predictor

p<-svyttest(Outcome ~ Demo, survey_des)$p.value

}

anovap<-function(dat, VAR1, Outcome, survey_des){ # Continous Outcome, Multigroup Predictor

glm <- svyglm(Outcome ~ Demo,survey_des)

p<-regTermTest(glm, ~Demo, method=c("Wald"))$p # p<-summary(aov(glm))[[1]][["Pr(>F)"]][1]

}

logp<-function(dat, VAR1, Outcome, survey_des){ # Binary Outcome

glm <- svyglm(Outcome ~ Demo, des = survey_des,family=quasibinomial())

p<-Anova(glm)[3]

}

SigOrNotSig<-function(dat, VAR1, VAR2,survey_des,binary=F,more2grps=F){

pmod<-ifelse(more2grps==F & binary==F, ttestp(dat, VAR1, Outcome, survey_des),

ifelse(more2grps==F & binary==T, logp(dat, VAR1, Outcome, survey_des),

anovap(dat, VAR1, Outcome, survey_des)

) )

return(pmod)

}

plotwtdcat<-function(dat, VAR1, VAR2,binary=F,more2grps=F,Level=3){

dich<-paste(VAR2,"_d",sep="")

temp_dat<-Dich(dat, VAR2,Level)

temp_dat<-temp_dat%>% dplyr::select(record,VAR1,VAR2,dich, weight)

colnames(temp_dat)[2] <- "Demo"

colnames(temp_dat)[3] <- "Outcome"

colnames(temp_dat)[4] <- "Outcome_d"

survey_des<-svydesign(ids = ~1, weights= ~weight, data=temp_dat)

d<-svyby(~ Outcome_d,~Demo, design=survey_des, svymean, na.rm=T)

d[,2:3]<-d[,2:3]%>% multiply_by(100)%>% round(digits=2)

if(binary==F){

d<-d%>% rename(wtd_percent = Outcome_dHigh)}else {

d<-d%>% rename(wtd_percent = Outcome_dYes)}

d<-d %>% select(Demo, wtd_percent)

pmod<-SigOrNotSig(temp_dat, Demo, Outcome, survey_des,binary,more2grps) %>% as.numeric()

sig<-ifelse(pmod<.001,paste("Sig (p<.001);"),

ifelse(pmod<.05, paste(paste("Sig (p=",round(pmod,3),");",sep=''),sep=""),

paste(paste("Not sig (p=",round(pmod,3),").",sep=''),sep="\n")))

transparency<-ifelse(pmod<.05, 1, "") #change transparency if nonsig

if(more2grps==T){

glm <- svyglm(Outcome ~ Demo, des = survey_des)

cells <- emmeans(glm, ~ Demo)

con<-as.data.frame(pairs(cells, adjust = "none"))

con<-con%>% mutate(p.value=round(p.value,5))%>% filter(p.value<.05)%>%dplyr::select(contrast)

contrasts<-paste(con)

contrasts<-str_replace_all(contrasts,"^c|[\"()]|\\\\n","")

contrasts<-str_replace(contrasts,"haracter0","")

contrasts <- ifelse(pmod<0.05, contrasts ,"")

}

else(contrasts<-"")

sub<-paste(VAR1,": ",sig," ", contrasts,sep="")

sub = paste(strwrap(sub, width = 50), collapse = "\n", sep="") #change width to 80 when only race/racegender

ggplot(d, aes(x=d[,1], y=d[,2], fill=d[,1])) +

geom_bar(position='dodge', stat='identity',aes(alpha=transparency)) +

geom_text(aes(label=paste(round(wtd_percent),"%",sep="")), size=3.5, position=dodge_text, vjust=-.5) +

labs(x=element_blank(),y = element_blank(), subtitle=sub,title=element_blank())+

nice_theme()+

theme(legend.position = "None")+

ylim(0,120)

}