We will use the babynames data from the babynames package, which contains counts of baby names in the United States from 1880 to 2017. A name is included for a given year only if it occurs at least 5 times in that calendar year. The data come from the U.S. Social Security Administration.
Note: each row contains COUNTS and PROPORTIONS for one combination of name, sex, and year. A row does NOT represent one baby. Keep this in mind as you analyze the data.
Exercises
Exercise 1
Create four new variables in the babynames data:
name_length (counts the number of letters in the name)
first_letter
last_letter
name_ending (extract the last three letters)
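As a starting point, the four variables above could be sketched with stringr's `str_length()` and `str_sub()`. This is a minimal sketch, not a full solution: it assumes dplyr and stringr are installed, and `toy` is a made-up data frame that only mimics the columns of babynames.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data with the same column names as babynames (values are invented)
toy <- data.frame(
  year = c(2000, 2000, 2001),
  sex  = c("F", "M", "F"),
  name = c("Elizabeth", "Noah", "Mia"),
  n    = c(100L, 80L, 60L)
)

toy_vars <- toy |>
  mutate(
    name_length  = str_length(name),       # number of characters in the name
    first_letter = str_sub(name, 1, 1),    # first character
    last_letter  = str_sub(name, -1, -1),  # negative positions count from the end
    name_ending  = str_sub(name, -3, -1)   # last three characters (whole name if shorter)
  )
```

Negative positions in `str_sub()` count back from the end of the string, which avoids computing the length of each name first.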
Exercise 2
How many unique names contain the string “liz”?
Produce a table with counts of the top 10 variations of “liz”. Hint: sum the counts over all years first.
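One possible shape for the "sum over all years first" step is sketched below. It assumes dplyr and stringr, and it uses a made-up `toy` data frame in place of babynames. Lower-casing the name before matching is a choice made here so that "Liz" and "Eliza" are both found; whether you want that is up to you.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data (values are invented)
toy <- data.frame(
  year = c(1990, 2000, 1990, 2000, 2000),
  sex  = c("F", "F", "F", "F", "M"),
  name = c("Liz", "Liz", "Eliza", "Eliza", "Noah"),
  n    = c(10L, 20L, 5L, 7L, 50L)
)

liz_counts <- toy |>
  filter(str_detect(str_to_lower(name), "liz")) |>  # case-insensitive match on "liz"
  group_by(name) |>
  summarize(total = sum(n), .groups = "drop") |>    # one row per name, summed over years
  slice_max(total, n = 10)                          # keep the 10 largest totals

n_unique <- nrow(liz_counts)  # valid as a count of unique names only because fewer than 10 names match here
```

On the full data, count the unique names before applying `slice_max()`, since the top-10 table keeps only ten rows (more if totals tie).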
Exercise 3
Are girl names more likely to end in vowels (aeiouy)?
Create a variable indicating whether the name ends in a vowel
For each year and sex, what proportion of babies received a vowel-ending name? Has that changed over time? Is the pattern different for boys and girls?
Create a line plot to investigate
Briefly comment on your results
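Because each row is a name rather than a baby, the proportion asked for above needs to be weighted by the count column `n`. The sketch below illustrates that idea on a made-up `toy` data frame, assuming dplyr and stringr; it stops before the plot.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data (values are invented)
toy <- data.frame(
  year = c(2000, 2000, 2000, 2000),
  sex  = c("F", "F", "M", "M"),
  name = c("Emma", "Megan", "Noah", "Kai"),
  n    = c(30L, 10L, 45L, 5L)
)

vowel_props <- toy |>
  mutate(ends_in_vowel = str_detect(name, "[aeiouy]$")) |>  # $ anchors the match to the end
  group_by(year, sex) |>
  summarize(
    prop_vowel = sum(n[ends_in_vowel]) / sum(n),  # babies with vowel-ending names / all babies
    .groups = "drop"
  )
```

Taking `mean(ends_in_vowel)` instead would give the proportion of distinct names, not of babies, which answers a different question.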
Exercise 4
Have names starting with K become more common?
Produce a visualization that investigates this by sex. Comment on your results.
Choose two additional letters to investigate, and provide a second plot that shows the trends over time for all three letters.
Exercise 5
Are longer names becoming more common?
Plot the average name length by sex over time
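The same caution about rows versus babies applies to the average: a per-baby average name length weights each name by its count. A minimal sketch using base R's `weighted.mean()`, again on a made-up `toy` data frame and assuming dplyr and stringr:

```r
library(dplyr)
library(stringr)

# Hypothetical toy data (values are invented)
toy <- data.frame(
  year = c(2000, 2000, 2000),
  sex  = c("F", "F", "M"),
  name = c("Mia", "Elizabeth", "Noah"),
  n    = c(30L, 10L, 50L)
)

avg_length <- toy |>
  group_by(year, sex) |>
  summarize(
    avg_name_length = weighted.mean(str_length(name), w = n),  # weight each name by its count
    .groups = "drop"
  )
```

For the girls in this toy example the weighted average is 4.5 letters, while an unweighted mean over the two names would be 6.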
Exercise 6
What is the most common letter of first names? Has this changed over time? Does this differ by sex?
Produce an appropriate visualization to explore this. Briefly comment on your results. Hint: it may help to sketch a few candidate visualizations by hand first.
Exercise 7
Which name endings (last three letters) are most popular among boys versus girls?
Produce a table that shows the top 5 name endings for each sex.
Exercise 8
Create a visualization that explores the popularity of your name over time. Briefly comment on the results.
BONUS
Propose one additional question that can be investigated with these data, and provide a visualization that investigates it.