
Lab 03 – What should I major in?
Goals
In this lab, you will:
- Explore relationships between college majors, earnings, and employment
- Practice data wrangling and visualization skills
- Continue developing a reproducible data analysis workflow
Getting Started
- You will be working in your Lab 03–04 Groups (see Blackboard).
- Download the .qmd file for this lab from our class GitHub repo.
- Refer back to Lab 01 for detailed workflow and submission instructions.
Packages
We will use the following packages:
- tidyverse: data wrangling and visualization
- scales: formatting labels
- ggridges: ridge plots
- kableExtra: table formatting
- fivethirtyeight: data source
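A minimal setup chunk that attaches the packages listed above, assuming they are already installed (if not, install them once with install.packages() in the console):

```r
# Attach the packages used in this lab
library(tidyverse)       # data wrangling (dplyr) and visualization (ggplot2)
library(scales)          # label formatting, e.g. percent()
library(ggridges)        # ridge plots, e.g. geom_density_ridges()
library(kableExtra)      # table formatting, e.g. kbl(), kable_styling()
library(fivethirtyeight) # provides the college_recent_grads data
```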
Data: College Majors and Earnings
In this assignment we explore data on college majors and earnings, specifically the data in the FiveThirtyEight story “The Economic Guide To Picking A College Major”.
These data originally come from the American Community Survey (ACS) 2010-2012 Public Use Microdata Series. While this is outside the scope of this assignment, if you are curious about how raw data from the ACS were cleaned and prepared, see the code FiveThirtyEight authors used. These data are over a decade old at this point, but you could pull and analyze more recent ACS data for your project! The ACS includes many more survey topics than those in this analysis.
We should also note that there are many considerations that go into picking a college major. Earnings potential and employment prospects are two of them, and they are important, but they don’t tell the whole story. Keep this in mind as you analyze the data.
The dataset is included in the fivethirtyeight package and is called college_recent_grads.
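Before answering any questions, it can help to see which variables the data frame contains. A quick sketch, assuming dplyr and fivethirtyeight are installed; glimpse() prints one line per column with its type and first few values:

```r
library(dplyr)
library(fivethirtyeight)

# One row per column: name, type, and the first few values
glimpse(college_recent_grads)
```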
To read the documentation for the dataset, run the following in the console:
?college_recent_grads
The college_recent_grads data frame is a trove of information. Let’s think about some questions we might want to answer with these data:
- Which major has the lowest unemployment rate?
- Which major has the highest percentage of women?
- How do the distributions of median income compare across major categories?
- Do women tend to choose majors with lower or higher earnings?
In the next section we aim to answer these questions.
Exercises
Respond to each exercise with clearly labeled code and written interpretation.
The slice() function and its variants (such as slice_head(), slice_min(), and slice_max()) can help you display only a subset of rows. Check out the help documentation by running ?slice in the console.
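As an illustration of a slice() variant, here is a sketch that keeps the three majors with the most graduates (the total column). The choice of total and n = 3 is only an example, not one of the exercises; with_ties = FALSE makes slice_max() return exactly n rows even if values tie:

```r
library(dplyr)
library(fivethirtyeight)

# Keep the 3 rows with the largest total, then show only a few columns
largest_majors <- college_recent_grads %>%
  slice_max(order_by = total, n = 3, with_ties = FALSE) %>%
  select(major, major_category, total)

largest_majors
```

slice_min() works the same way but keeps the smallest values instead.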
For more table formatting options, see the kableExtra package documentation.
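A sketch of a formatted table, assuming HTML output (as in the rendered .html you submit). kbl() builds the table and kable_styling() adds Bootstrap styling; the column selection and labels here are illustrative choices:

```r
library(dplyr)
library(kableExtra)
library(fivethirtyeight)

# First five rows, three columns, readable column names, striped HTML table
income_table <- college_recent_grads %>%
  slice_head(n = 5) %>%
  select(major, major_category, median) %>%
  kbl(
    format = "html",
    col.names = c("Major", "Major category", "Median income ($)")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

income_table
```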
Hint: run ?percent in the console and read how the accuracy argument of percent() works.
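A quick sketch of the accuracy argument on a single made-up proportion (0.1234 is only an example value): accuracy is the precision the percentage is rounded to.

```r
library(scales)

# 1 rounds to whole percents, 0.1 to one decimal place, 0.01 to two
percent(0.1234, accuracy = 1)    # "12%"
percent(0.1234, accuracy = 0.1)  # "12.3%"
percent(0.1234, accuracy = 0.01) # "12.34%"
```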
How do the distributions of median income compare across major categories?
There are three income variables reported in this data frame: p25th, median, and p75th. These correspond to the 25th, 50th, and 75th percentiles of the income distribution of sampled individuals for a given major.
A percentile is a measure used in statistics indicating the value below which a given percentage of observations in a group of observations fall. For example, the 20th percentile is the value below which 20% of the observations may be found. (Source: Wikipedia)
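To make the definition concrete, here is a sketch computing the 25th, 50th, and 75th percentiles of five made-up incomes with base R's quantile(). Note that quantile() interpolates between observations by default, so for other samples a percentile may fall between two observed values:

```r
# Five hypothetical incomes (made-up values for illustration)
incomes <- c(30000, 35000, 40000, 45000, 50000)

# 25th, 50th (the median), and 75th percentiles
quantile(incomes, probs = c(0.25, 0.5, 0.75))
```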
The question we want to answer is “How do the distributions of median income compare across major categories?” Answering it takes a few steps. First, we group the data by major_category. Then, we need a way to summarize the distribution of median income within each group. The best summary depends on the shapes of these distributions, so we first need to visualize the data.
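One way to visualize these distributions is a ridge plot, with one density curve per major category. This is a sketch assuming ggridges is the intended tool (it is listed in the packages above); a faceted histogram would also work. The labels and alpha value are illustrative choices:

```r
library(ggplot2)
library(ggridges)
library(scales)
library(fivethirtyeight)

# One density "ridge" per major category, showing how the median incomes
# of the majors in that category are distributed
median_ridges <- ggplot(
  college_recent_grads,
  aes(x = median, y = major_category)
) +
  geom_density_ridges(alpha = 0.6) +
  scale_x_continuous(labels = label_dollar()) +
  labs(
    x = "Median income",
    y = NULL,
    title = "Median income of recent graduates by major category"
  )

median_ridges
```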
Now that we’ve seen the shapes of the distributions of median income for each major category, we should have a better idea of which summary statistic to use to quantify the typical median income: the mean or the median? (Yes, we are talking about the mean/median of medians!)
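Why the shape matters: in a skewed distribution, a single extreme value pulls the mean far more than the median. A small sketch with made-up values:

```r
# A right-skewed set of hypothetical median incomes (made-up values)
x <- c(30000, 32000, 35000, 38000, 110000)

mean(x)   # 49000: pulled up by the one large value
median(x) # 35000: stays near the bulk of the data
```

If the distributions within categories are skewed or contain outliers, the median is usually the more representative summary; for roughly symmetric distributions the two will be close.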
All STEM fields aren’t the same
One of the sections of the FiveThirtyEight story is “All STEM fields aren’t the same”. Let’s see if this is true.
To compare STEM and non-STEM majors, we use mutate to create a new variable called major_type, which is defined as "stem" if major_category is in stem_categories (a vector listing the STEM major categories) and as "not stem" otherwise.
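The step described above can be sketched as follows. The four categories in stem_categories are an assumption for illustration; check them against the category names in the data and the definition your instructor uses:

```r
library(dplyr)
library(fivethirtyeight)

# Major categories treated as STEM (assumed list; adjust if needed)
stem_categories <- c(
  "Biology & Life Science",
  "Computers & Mathematics",
  "Engineering",
  "Physical Sciences"
)

# Label each major as "stem" or "not stem" based on its category
college_recent_grads <- college_recent_grads %>%
  mutate(major_type = ifelse(major_category %in% stem_categories, "stem", "not stem"))

# Number of majors of each type
count(college_recent_grads, major_type)
```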
%in% is a logical operator. Other commonly used logical operators are:
| Operator | Operation |
|---|---|
| x < y | less than |
| x > y | greater than |
| x <= y | less than or equal to |
| x >= y | greater than or equal to |
| x != y | not equal to |
| x == y | equal to |
| x %in% y | each element of x is in y |
| x \| y | or |
| x & y | and |
| !x | not |
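A quick sketch of these operators in action; x, y, and the category names are arbitrary example values. Each comparison returns TRUE or FALSE, and %in% returns one logical value per element on its left-hand side:

```r
x <- 3
y <- 5

x < y  # TRUE
x != y # TRUE
x == y # FALSE

# Is each element on the left found in the vector on the right?
c("Engineering", "Arts") %in% c("Engineering", "Physical Sciences") # TRUE FALSE

(x < y) | (x > y) # TRUE: at least one side is TRUE
(x < y) & (x > y) # FALSE: both sides must be TRUE
!(x == y)         # TRUE
```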
Submission
Before submitting your .html:
- Check code readability
- Suppress unnecessary warnings and messages
- Ensure figures and tables have clear labels
- Confirm exercises are clearly labeled
Compress the final .html file into a .zip file and upload it to Blackboard.
Grading (50 pts)
| Component | Points |
|---|---|
| Exercises 1–14 | 42 (3 each) |
| Workflow & formatting | 5 |
| Reflection | 3 |
| Bonus | 2 |