5 2/13 Lab IV | Project Runway

Our goal is to visualize the difference between the population percent (popPct) and the survey percent (svyPct) for various age groups. We’ll use the data in the table below (a full viz would, of course, include more subgroups). Design one or more ways to present this information, and be ready to share both your concept and your actual viz with the entire class. You can work individually or in small groups. Do not show code alongside your visualizations; instead, create an appendix at the end of the document that displays each code chunk. Make sure no warnings or messages appear in the output either.

Use the simulated data to make at least two plots: one in Base R and one with library(ggplot2). For the last two visualizations, you can either use a dataset of your choice or keep working with the fake data.
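One common way to meet the "no code in the body, all code in an appendix" requirement is to hide code globally and then reprint every chunk once at the end. The sketch below assumes the document is knitted with knitr (as in the setup code of this lab); the appendix chunk itself is described in comments because its options belong in the chunk header rather than in R code.

```r
# Setup chunk: hide code, warnings, and messages in the body of the report.
knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE)

# Appendix chunk: add an otherwise empty chunk at the end of the document
# and give it these options in its header:
#   ref.label = knitr::all_labels(), echo = TRUE, eval = FALSE
# knitr::all_labels() returns the label of every chunk in the document,
# so the appendix chunk prints all of the code once without re-running it.
```

If the appendix should be split into subsections, `ref.label` can instead be given the label of a single chunk, one appendix chunk per section.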

Table 5.1: Population and Survey Percentages by Age Group
age        popPct  svyPct
18 to 29   29      19
36 to 50   21      21
51 to 64   30      32
65+        20      28
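
Since the goal is the gap between the two percentages, it can help to compute that gap directly before choosing a design. This is a minimal sketch in base R using the values in the table above; the names `ages` and `diffPct` are illustrative and not part of the lab materials.

```r
# Simulated percentages copied from the table above
ages <- data.frame(age    = c("18 to 29", "36 to 50", "51 to 64", "65+"),
                   popPct = c(29, 21, 30, 20),
                   svyPct = c(19, 21, 32, 28))

# Survey percent minus population percent:
# positive = over-represented in the survey, negative = under-represented
ages$diffPct <- ages$svyPct - ages$popPct

ages$diffPct
# differences by age group: -10, 0, 2, 8
```

A design built on `diffPct` (for example, bars diverging from zero) shows over- and under-representation directly, whereas side-by-side bars leave the reader to do the subtraction.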

5.1 Base R Version

5.2 library(ggplot2) Version 1

5.3 library(ggplot2) Version 2

5.4 Alternative Plot of Your Choice

5.5 Code Appendix

5.5.1 Setup Code

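# Load packages used in this session of R
library(knitr)
library(tidyverse)
library(ggplot2)  # already attached by tidyverse; listed for clarity

# Hide code, warnings, and messages in the body of the report;
# the appendix chunks turn echo back on to display the code
opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE)
# Print numbers with 2 significant digits
options(digits = 2)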

5.5.2 Preparation Code

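# Simulated population and survey percentages by age group
df <- data.frame(age = c("18 to 29", "36 to 50", "51 to 64", "65+"),
                 popPct = c(29, 21, 30, 20),
                 svyPct = c(19, 21, 32, 28))

# The output format adds the "Table 5.1:" prefix, so the caption omits "Table:"
kable(df, caption = "Population and survey percentages by age group")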

5.5.3 Base R Plot Code

# Each vector holds c(survey %, population %) for one age group
Age18to29 <- c(19, 29)
Age36to50 <- c(21, 21)
Age51to64 <- c(32, 30)
Over65 <- c(28, 20)
age_groups <- cbind(Age18to29, Age36to50, Age51to64, Over65)
# Set the bar colors explicitly so the bars match the legend fill
bar_cols <- c("black", "lightgray")
barplot(age_groups, beside = TRUE, col = bar_cols, xlab = "Age Group",
        names.arg = c("18 - 29", "36 - 50", "51 - 64", "65+"), ylab = "Percent",
        main = "Percent Surveyed and Percent in Population by Age Group",
        ylim = c(0, 35), las = 1)
legend("bottomleft", c("Surveyed %", "Population %"),
       fill = bar_cols, horiz = FALSE, cex = 0.73, bg = "white")

5.5.4 library(ggplot2) First Plot Code

# Reshape to long format: one row per age group and series
df %>% 
  rename(Population = popPct, Survey = svyPct) %>%
  pivot_longer(-age, names_to = "Group", values_to = "Percent") %>%
  ggplot(aes(x = age, y = Percent, fill = Group)) +
  # geom_col() plots the values as given; "dodge" puts the series side by side
  geom_col(position = "dodge") +
  scale_fill_grey() + 
  theme_minimal() +
  labs(x = "Age Group", y = "Percent",
       title = "Population and Survey Sample Percentages by Age Group")

5.5.5 library(ggplot2) Second Plot Code

# Same data as the first plot, drawn with horizontal bars
df %>% 
  rename(Population = popPct, Survey = svyPct) %>%
  pivot_longer(-age, names_to = "Group", values_to = "Percent") %>%
  ggplot(aes(x = age, y = Percent, fill = Group)) +
  geom_col(position = "dodge") +
  # Flip the axes so the age groups run down the vertical axis
  coord_flip() +
  scale_fill_grey() + 
  theme_minimal() +
  labs(x = "Age Group", y = "Percent",
       title = "Population and Survey Sample Percentages by Age Group")

5.5.6 Alternative Plot Code

library(apyramid)

# Pyramid layout: population on one side, survey on the other
df %>% 
  rename(Population = popPct, Survey = svyPct) %>%
  pivot_longer(-age, names_to = "Group", values_to = "Percent") %>%
  mutate(age = as.factor(age)) %>%
  age_pyramid(data = ., age_group = "age", split_by = "Group",
              count = "Percent", show_midpoint = FALSE) +
  scale_fill_grey() +
  theme_minimal() +
  labs(x = "Age Group", y = "Percent", fill = NULL,
       title = "Percent Surveyed and Percent in Population by Age Group")