Lab 02 – Data Visualization

Data visualization

Tidyverse

Ridge plots

ANES

Goals

In this lab, you will:

Gain proficiency in data visualization
Apply principles of effective visualization to a real dataset
Continue developing a workflow for reproducible data analysis

Getting Started

Go to our class GitHub repo and download the .qmd file for this lab.
Refer back to Lab 01 for instructions on how to get started on a lab.
You will work in your Lab 01–02 groups (see Blackboard).

Packages

We will work with the tidyverse package as usual. We will also use the viridis and ggridges packages.
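A minimal sketch of a setup chunk that loads these packages, assuming all three are already installed (if one is missing, install.packages() will fetch it from CRAN):

```r
# Load the packages used in this lab.
library(tidyverse)  # ggplot2, dplyr, readr (read_csv), and related packages
library(viridis)    # colorblind-friendly color scales
library(ggridges)   # geom_density_ridges() for ridge plots
```

Loading tidyverse prints a startup message listing attached packages and conflicts; that is expected and can be hidden in the rendered report with chunk options.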

Data: 2020 American National Election Study

The dataset comes from the 2020 American National Election Study.

anes <- read_csv("data/anes2020_subset.csv")

A subset of the variables is provided here. Some have already been recoded; others you may need to recode yourself to carry out your analysis. The variables are as follows:

CASEID: A case ID for the respondent.
hunt_fish: A dummy variable indicating whether the respondent has gone hunting or fishing in the past year.
scientists: A feeling thermometer that asks how warmly respondents feel toward scientists. A score of 0 represents the coolest rating, while a score of 100 represents the warmest rating.
education: An ordinal categorical variable representing the respondent’s highest level of education, ranging from less than high school to a professional degree.
ideology: A seven-point self-rating scale for the respondent’s ideology, ranging from most liberal to most conservative.
urbanrural: A variable indicating how rural or urban the respondent’s home community is, with four possible values: rural, small town, suburb, or city.
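Recoding often means turning a column into a factor with explicitly ordered levels so that plots show categories in a meaningful order. The sketch below uses a small made-up tibble; the numeric 1 to 7 coding and the lowercase labels are assumptions for illustration, so check how the values are actually stored in anes before adapting it.

```r
library(dplyr)

# Hypothetical toy data: the real coding in anes2020_subset.csv may differ.
toy <- tibble(
  ideology   = c(1, 4, 7, 4, NA),
  urbanrural = c("city", "rural", "suburb", "small town", "city")
)

toy_recoded <- toy |>
  mutate(
    # Treat the numeric scale as a category with all seven levels present
    ideology_f = factor(ideology, levels = 1:7),
    # Order community types from least to most urban instead of alphabetically
    urbanrural_f = factor(
      urbanrural,
      levels = c("rural", "small town", "suburb", "city")
    )
  )

levels(toy_recoded$urbanrural_f)
```

Missing values stay NA after the conversion, which matters later when you look at non-response.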

Exercises

All plots should follow the best visualization practices discussed in lecture. Plots should include an informative title, axes should be labeled, and careful consideration should be given to aesthetic choices.
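As a rough illustration of these guidelines, the sketch below adds a title and axis labels with labs(). It uses the mpg dataset bundled with ggplot2 as a stand-in, so the variables and wording are placeholders rather than anything from the ANES data.

```r
library(ggplot2)

# mpg ships with ggplot2; it stands in for the lab data here.
p_labeled <- ggplot(mpg, aes(x = class)) +
  geom_bar() +
  labs(
    title = "SUVs are the most common vehicle class in the mpg data",
    x = "Vehicle class",
    y = "Number of car models"
  ) +
  theme_minimal()

p_labeled
```

A title that states the takeaway, as here, usually serves readers better than one that only names the variables.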

Remember that developing a sound workflow for reproducible data analysis remains important as you complete this lab and other assignments in the course. Comment your code adequately and include narrative text interpreting your results.

Exercise 1

Provide a brief description of the data collection process — who collected the data, when, who was surveyed, and anything else you find relevant/interesting.

Exercise 2

How many rows and columns are in the anes dataset? Include code and output.

Exercise 3

Create a bar chart of ideology (counts on y-axis).

Include axis labels and an informative title.
What is the most common ideology?
Do respondents tend to be moderate or more ideologically extreme?

Exercise 4

Now, let’s examine whether ideology differs by where people live. Make a filled bar plot with one bar for each ideology, the proportion of respondents (from 0 to 1) on the y-axis, and the fill determined by urbanrural.

Include labels.
You are encouraged to use viridis colors.
Where do people of each ideology tend to live?
Does the level of non-response vary by ideology?
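A filled bar plot is a bar chart drawn with position = "fill", which rescales every bar to a total height of 1. The sketch below shows the pattern on the mpg dataset from ggplot2 rather than on anes, so class and drv are stand-ins for the lab variables.

```r
library(ggplot2)

# Each bar is one vehicle class; segments show the share of each drive train.
p_fill <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "fill") +
  scale_fill_viridis_d() +  # discrete viridis palette built into ggplot2
  labs(
    title = "Drive train mix by vehicle class",
    x = "Vehicle class",
    y = "Proportion of car models",
    fill = "Drive train"
  )

p_fill
```

The viridis package offers an equivalent scale, scale_fill_viridis(discrete = TRUE). By default, missing values in the fill variable appear as their own segment, which is one way to see non-response.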

Exercise 5

How do people view scientists?

Make a histogram of the scientists feeling thermometer.
Comment on the distribution, including features such as skewness and peaks. Interpret in context.
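The look of a histogram depends heavily on bin width, so it helps to set it explicitly and try a few values. A minimal sketch on the built-in faithful dataset (not the ANES data); the binwidth of 5 is an arbitrary starting point:

```r
library(ggplot2)

# faithful is built into R; waiting is the time between eruptions in minutes.
p_hist <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(binwidth = 5, color = "white") +
  labs(
    title = "Waiting times between Old Faithful eruptions",
    x = "Waiting time (minutes)",
    y = "Number of eruptions"
  )

p_hist
```

For a 0 to 100 scale such as a feeling thermometer, a bin width that divides the range evenly tends to make peaks at round numbers easier to read.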

Exercise 6

Does the ideology of those who have gone hunting or fishing in the past year differ from those who haven’t? Explore this using side-by-side boxplots.

Start your code with:

anes |>
  drop_na(hunt_fish) |>
  mutate(hunted_fished = ifelse(hunt_fish == 0,
                                "Did Not Hunt or Fish",
                                "Hunted or Fished"))

Exercise 7

Explore the same question as in Exercise 6, this time using geom_density_ridges() to construct a ridge plot as an alternative to side-by-side box plots.

You can read more about ridge plots here.
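geom_density_ridges() expects a numeric variable on x and a grouping variable on y, and draws one density curve per group. The sketch below shows the pattern on the built-in iris dataset, so the variables are placeholders for the ones in the exercise.

```r
library(ggplot2)
library(ggridges)

# One density ridge per species; alpha makes overlapping ridges visible.
p_ridge <- ggplot(iris, aes(x = Sepal.Length, y = Species, fill = Species)) +
  geom_density_ridges(alpha = 0.7) +
  scale_fill_viridis_d() +
  labs(
    title = "Sepal length by iris species",
    x = "Sepal length (cm)",
    y = "Species"
  ) +
  theme_ridges() +
  theme(legend.position = "none")  # the y-axis already names the groups

p_ridge
```

When the plot is drawn, ggridges prints a message about the bandwidth it picked; this is informational, not an error.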

Exercise 8

Describe what you observe in the boxplots (Ex 6) and ridge plots (Ex 7). What can you learn from one plot that you do not see in the other or that adds additional context to the other?

Exercise 9

Is education related to views of scientists? Create a scatterplot with:

education on the x-axis
the scientists feeling thermometer on the y-axis
a best-fit line layer: geom_smooth(method = "lm")

Is this visualization useful? Why or why not?

Exercise 10

What is an additional question you could investigate using this data? State the question, provide a visualization that investigates it, and comment on what your visualization shows.

Exercise 11

Write a brief paragraph summarizing what you found in this lab, what limitations the data and analyses might have, and what additional data you would like to have to explore further questions.

Submission

Before submitting your .html (as a .zip file to Blackboard):

Check your code for neatness - add spaces and line breaks where appropriate to improve readability
Check visualizations for clean titles and labels
Suppress extraneous messages and warnings (e.g., set #| warning: false and #| message: false at the top of code chunks)
Ensure exercises are clearly labeled and your text responses are visually distinguished
Confirm neat organization and readable structure
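In a .qmd file, chunk options go on their own #| lines at the very top of an R code chunk. A minimal sketch of the chunk contents; the label and the built-in airquality dataset are placeholders:

```r
#| label: ozone-histogram
#| warning: false
#| message: false

library(ggplot2)

# Ozone contains missing values, so ggplot2 warns about removed rows
# when the plot is drawn; warning: false keeps that out of the report.
p_ozone <- ggplot(airquality, aes(x = Ozone)) +
  geom_histogram(binwidth = 10, color = "white") +
  labs(
    title = "Daily ozone readings",
    x = "Ozone (ppb)",
    y = "Number of days"
  )

p_ozone
```

Hiding a warning does not fix its cause, so read each one at least once before suppressing it.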

Render one last time, check the .html file for accuracy, then compress it into a .zip file and upload it to Blackboard.

Grading (50 pts)

Component	Points
Exercise 1	2
Exercise 2	2
Exercise 3	4
Exercise 4	4
Exercise 5	4
Exercise 6	4
Exercise 7	4
Exercise 8	4
Exercise 9	4
Exercise 10	4
Exercise 11	4
Reflection prompts	5
Workflow & formatting	5

Grading notes:

The “Workflow & formatting” grade assesses your reproducible workflow. This includes readable code (e.g., adequate use of spacing and line breaks), labeled code chunks, informative headers and sub-headers, and an organized, uncluttered report (e.g., suppressed messages and warnings, no extraneous output).