Week 03

Data wrangling pt. 1

Thursday announcements

  • Quiz 01 to start class today
  • Then Lab 03 in groups (see Blackboard)
  • Week 04 lecture posted - shorter this week!
  • One week from today: project brainstorm due

Statistician(s) of the Week

Edna Paisano

Desi Small-Rodriguez

Announcements

  • Lab 02 optional (full credit if you turn it in completed, exempt if you don’t)

  • Annotations due Thursday

  • First quiz in class Thursday - will focus on Week 01 & 02 (basics + data viz, not new data wrangling stuff yet)

Questions?

  • See Perusall for answers to your questions!

    • all answered as of 8pm Monday
    • others will be answered later this week
    • follow up in class or on Piazza if urgent!

What’s the difference between the fill and color aesthetics in ggplot?

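  • For histograms and other geoms with bars, color sets the outline and fill sets the interior.
library(ggplot2)  # provides ggplot() and the midwest data set
# Blue outlines; the bar interiors keep the default dark gray fill
ggplot(midwest, aes(x = popdensity)) +
  geom_histogram(bins = 10, color = "blue")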

ggplot(midwest, aes(x = popdensity)) +
  geom_histogram(bins = 10, color = "blue", fill = "red")

  • For points, use color (the default point shape has no separate interior for fill to act on)
ggplot(midwest, aes(x = area, y = poptotal, 
                    color = state)) +
  geom_point()
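
A minimal sketch, not from the original slides, of the one case where fill does matter for points: shapes 21 to 25 have both an outline and an interior. The shape number and the colors below are arbitrary choices for illustration.

```r
library(ggplot2)

# Default point shape: only color applies; fill is ignored
p_default <- ggplot(midwest, aes(x = area, y = poptotal)) +
  geom_point(color = "blue", fill = "red")

# Shape 21 is a fillable circle: color = outline, fill = interior
p_filled <- ggplot(midwest, aes(x = area, y = poptotal)) +
  geom_point(shape = 21, color = "blue", fill = "red")

p_default  # solid blue points
p_filled   # red circles with blue outlines
```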

Note that the examples above either map color inside aes() in the global ggplot() call or set it outside aes() in the geom_ layer.

  • Put inside aes() if you want to map a variable in the data to the color/fill
  • Put outside aes(), usually in the geom_ layer, if you want to set color/fill to a specific color for the whole plot
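  • An aesthetic set in a geom_ layer overrides the mapping inherited from ggplot(), so the points below are all pink even though color is mapped to state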
ggplot(midwest, aes(x = area, y = poptotal, 
                    color = state)) +
  geom_point(color = "pink")
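
The inside/outside rule is easiest to see when it goes wrong. The sketch below is an illustration added here, not part of the original slides: it puts a color name inside aes() by mistake and uses layer_data() to inspect the colors ggplot2 actually computed. The object names p_set and p_mapped are made up for this example.

```r
library(ggplot2)

# Setting: color is outside aes(), so every point is drawn in blue
p_set <- ggplot(midwest, aes(x = area, y = poptotal)) +
  geom_point(color = "blue")

# Mapping by mistake: inside aes(), "blue" is treated as a one-level
# variable, so points get a default palette color (not blue) and a legend
p_mapped <- ggplot(midwest, aes(x = area, y = poptotal)) +
  geom_point(aes(color = "blue"))

# layer_data() returns the values computed for a layer of the plot
unique(layer_data(p_set)$colour)
unique(layer_data(p_mapped)$colour)
```

If the points come out in an unexpected color with a legend labeled with a color name, this is usually the cause.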

Application Exercise

  • The remainder of class will be spent on AE-03.
  • You can access it from GitHub.
  • It is due at the end of class today.
  • To turn it in, you should upload your .html file to Blackboard.