class: center, middle, inverse, title-slide

.title[
# Intro to R - STAT 7500
]

.author[
### Dr. Katie Fitzgerald
]

---
layout: true

<div class="my-footer">
<span>
<a href="https://kgfitzgerald.github.io/stat-7500" target="_blank">kgfitzgerald.github.io/stat-7500</a>
</span>
</div>

---

## R learning objectives

During this section of the course, you will:

- Learn to explore, visualize, and analyze data in a *reproducible* and *shareable* manner
- Gain (continued) experience in data wrangling, exploratory data analysis, data visualization, and applied statistical programming
- Work on problems and case studies inspired by and based on *real-world questions* and data
- Learn to effectively communicate results through written reports

---

## Some of what you will learn

- Fundamentals of `R`
- Data visualization and wrangling with `ggplot2` and `dplyr` from the `tidyverse`
- Data science workflow and scientific practice
- Data communication / storytelling
- Reproducible reports with `Quarto`

---

## R learning objectives

Ultimately, the goal is that you will be able to...

--

- gain insight from data

--

- gain insight from data, **reproducibly**

--

- gain insight from data, reproducibly **(with literate programming)**

--

- gain insight from data, reproducibly (with literate programming), **using modern programming tools and techniques**

---
class: middle

# Reproducible data analysis

---

## Reproducibility checklist

.question[
What does it mean for a data analysis to be "reproducible"?
]

--

Near-term goals:

- Are the tables and figures reproducible from the code and data?
- Does the code actually do what you think it does?
- In addition to what was done, is it clear *why* it was done?

Long-term goals:

- Can the code be used for other data?
- Can you extend the code to do other things?
---

## Toolkit for reproducibility

- Scriptability `\(\rightarrow\)` R

--

- Literate programming (code, narrative, output in one place) `\(\rightarrow\)` Quarto

--

- Shareability `\(\rightarrow\)` R Projects & GitHub

---
class: middle

# R and RStudio

---

## R and RStudio

.pull-left[
<img src="img/r-logo.png" width="25%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="img/rstudio-logo.png" width="50%" style="display: block; margin: auto;" />
]

- R is an open-source statistical **programming language**
- RStudio is a convenient interface for R called an **IDE** (integrated development environment), e.g. *"I write R code in the RStudio IDE"*
- At its simplest:<sup>*</sup>
    - R is like a car’s engine
    - RStudio is like a car’s dashboard

.footnote[
*Source: [Modern Dive](https://moderndive.com/)
]

---

## R packages

- **Packages** are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and sample data<sup>1</sup>
- As of September 2025, there are over 22,000 R packages available on **CRAN** (the Comprehensive R Archive Network)<sup>2</sup>
- We're going to work with a small (but important) subset of these!

.footnote[
<sup>1</sup> Wickham and Bryan, [R Packages](https://r-pkgs.org/).

<sup>2</sup> [CRAN contributed packages](https://cran.r-project.org/web/packages/).
]

---

## Tour: R and RStudio

<img src="img/tour-r-rstudio.png" width="80%" style="display: block; margin: auto;" />

---

## Some key differences with SAS

- Literate programming / integration of code, narrative, and output
- CASE SENSITIVE!
- No semicolons needed :) No explicit run statements
- SAS is procedural, step-based (you tell SAS what to do step-by-step)
- R is function- and object-oriented. You create objects (e.g. data frames, models) and functions return results you can store, manipulate, or pass along. In R, almost everything is an object that can be reused or modified.
- R is more flexible with data types (data can be stored as vectors, data frames, lists, matrices, etc.)

---

## A short list (for now) of R essentials

- Functions are (most often) verbs, followed by what they will be applied to in parentheses:

``` r
do_this(to_this)
do_that(to_this, to_that, with_those)
```

--

- Packages are installed once with the `install.packages` function and then loaded with the `library` function, once per session:

``` r
install.packages("package_name")
library(package_name)
```

---

## R essentials (continued)

- Columns (variables) in data frames are accessed with `$`:

.small[
``` r
dataframe$var_name
```
]

--

- Object documentation can be accessed with `?`

``` r
?mean
```

---

# Mathematical Functions in R

| Operator / Function | Meaning |
|---------------------|---------|
| `+, -, *, /` | add, subtract, multiply, divide |
| `^` or `**` | power |
| `%%` | modulo (remainder after division) |
| `log()` | natural log |
| `exp()` | exponential function |
| `sqrt()` | square root |
| `abs()` | absolute value |
| `round(x, n)` | round to nearest integer (optional `n` = # decimal places) |
| `floor()` | round down (lower integer) |
| `ceiling()` | round up (higher integer) |

+ Note: R will NOT read parentheses as multiplication; write `2*(3+4)`, not `2(3+4)`

---

# Logical operators in R

| Operator | Meaning |
|----------|---------|
| `==` | equal |
| `>` | greater than |
| `>=` | greater than or equal to |
| `<` | less than |
| `<=` | less than or equal to |
| `&` | and |
| `|` | or |
| `!=` | not equal to (`!` is a common prefix for “not”) |

---

# TRUE/FALSE

In R, when T/F are used in mathematical formulas, T=1 and F=0, so…

+ `T + T + F` is 2
+ `sum(c(T,T,F,T))` is 3 (number of `T`s)
+ `mean(c(T,T,F,T))` is 0.75 (proportion of `T`s)

---

# Your turn!

In RStudio, go to File > New File > R Script OR in your Files quadrant, click the 2nd green "plus" icon and select R Script.
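
Once the script is open, a short warm-up like the following can help before the exercises. It is only a sketch: the values are illustrative choices made here (none come from the exercise list), and the variable names are hypothetical. It exercises a few of the operators and functions from the preceding tables, plus the TRUE/FALSE arithmetic:

```r
# Illustrative values only; variable names are hypothetical.

# Arithmetic operators and math functions
remainder <- 7 %% 3            # modulo: what is left after dividing 7 by 3
power     <- 2^3               # 2 ** 3 gives the same result
rounded   <- round(3.14159, 2) # second argument = number of decimal places
doubled   <- 2 * (3 + 4)       # the * must be written out; 2(3 + 4) is an error

# Logical operators return TRUE or FALSE
not_equal <- 3 != 4
both_true <- (2 < 3) & (3 < 2) # & is TRUE only when both sides are TRUE

# TRUE counts as 1 and FALSE as 0 in arithmetic
n_true    <- sum(c(TRUE, FALSE, TRUE))
prop_true <- mean(c(TRUE, FALSE, TRUE, TRUE))

print(c(remainder, power, rounded, doubled))
print(c(not_equal, both_true))
print(c(n_true, prop_true))
```

Storing each result in a named object, rather than only printing it, mirrors the point that in R almost everything is an object that can be reused later.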
In your R Script (.R file), write & execute code to verify:

+ 5+5 = 10
+ `\(5^5\)` = 3,125
+ 5-(-5) = 10
+ 1+2*3 = 7 (make sure order of operations works properly)
+ If you round the square root of `\(e^{10}\)`, you get 148
+ If you round the square root of `\(e^{10}\)` to one decimal place, you get 148.4
+ 5+6 > 10, using logical operators
+ The square root of 25 equals 5, using logical operators
+ Both of the previous two statements are true, using the “&” logical operator

Once you've finished, download the install_packages.R file from the website and run all the lines to install (most of) the necessary packages for the semester.

---

## tidyverse

.pull-left[
<img src="img/tidyverse.png" width="99%" style="display: block; margin: auto;" />
]

.pull-right[
.center[.large[
[tidyverse.org](https://www.tidyverse.org/)
]]

- The **tidyverse** is an opinionated collection of R packages designed for data science
- All packages share an underlying philosophy and a common grammar
]

---
class: middle

# Quarto

---

## Quarto

.center[.large[
[https://quarto.org/](https://quarto.org/)
]]

- **Quarto** and the various packages that support it enable R users to write their code and prose in reproducible computational documents
    + each time you render, the analysis is run from the beginning
- Quarto documents have a `.qmd` file extension
- Simple markdown syntax for text
- Code goes in chunks delimited by three backticks; narrative goes outside the chunks

---

## Tour: Quarto

<img src="img/tour-quarto.jpg" width="60%" height="100%" style="display: block; margin: auto;" />

---

## Environments

.tip[
The environment of your Quarto document is separate from the Console!
]

Remember this, and expect it to bite you a few times as you're learning to work with Quarto!

---

## Environments

.pull-left[
First, run the following in the console

.small[
``` r
x <- 2
x * 3
```
]

.question[
All looks good, eh?
]
]

--

.pull-right[
Then, add the following in an R chunk in your Quarto document

.small[
``` r
x * 3
```
]

.question[
What happens? Why the error?
]
]

---

## How will we use Quarto?

- Every R assignment will be a Quarto document
- You'll always have a template Quarto document to start with
- The amount of scaffolding in the template will decrease over the semester

---

## What's with all the hexes?

<img src="img/hex-australia.png" width="60%" style="display: block; margin: auto;" />

.footnote[
Mitchell O'Hara-Wild, [useR! 2018 feature wall](https://www.mitchelloharawild.com/blog/user-2018-feature-wall/)
]

---
class: middle, center

# Let's dive in!

---
background-image: url("img/unvotes/unvotes-01.jpeg")

---
class: inverse

<img src="img/unvotes/unvotes-02.jpeg" width="100%" style="display: block; margin: auto;" />

---
class: inverse

<img src="img/unvotes/unvotes-03.jpeg" width="100%" style="display: block; margin: auto;" />

---
class: inverse

<img src="img/unvotes/unvotes-04.jpeg" width="100%" style="display: block; margin: auto;" />

---
class: inverse

<img src="img/unvotes/unvotes-05.jpeg" width="100%" style="display: block; margin: auto;" />

---
class: inverse

<img src="img/unvotes/unvotes-06.jpeg" width="100%" style="display: block; margin: auto;" />

---
class: inverse

<img src="img/unvotes/unvotes-07.jpeg" width="100%" style="display: block; margin: auto;" />

---
class: inverse

<img src="img/unvotes/unvotes-08.jpeg" width="90%" style="display: block; margin: auto;" />

---
class: inverse

<img src="img/unvotes/unvotes-09.jpeg" width="90%" style="display: block; margin: auto;" />

---
class: inverse

<img src="img/unvotes/unvotes-10.jpeg" width="90%" style="display: block; margin: auto;" />

---
class: inverse

<img src="img/unvotes/unvotes-11.jpeg" width="90%" style="display: block; margin: auto;" />

---
class: inverse

<img src="img/unvotes/unvotes-12.jpeg" width="90%" style="display: block; margin: auto;" />

---
class: inverse

<img src="img/unvotes/unvotes-13.jpeg"
width="100%" style="display: block; margin: auto;" />

---
class: inverse

<img src="img/unvotes/unvotes-14.jpeg" width="100%" style="display: block; margin: auto;" />

---

# Let's edit!

- UN Votes.qmd

---

# Your turn!

R HW 01