A Look at Paul George’s Missed Free Throws

Law of Large Numbers

Unconditional Probability

Conditional Probability

Estimating Probabilities from Data

Authors

Affiliations

Connor Bryson

Indiana University Indianapolis

Joshua Patrick

Baylor University

Published

July 18, 2024

Facilitation notes

This module would be suitable for an in-class lab or take-home assignment in an introductory statistics course.
It assumes a basic familiarity with R.
Students should be provided with the following data file (.csv)
- pgfreethrow.csv

Introduction

On June 22, 2021, game 2 of the NBA Western Conference Finals featured the Los Angeles Clippers at the Phoenix Suns.

With 8.2 seconds left in the game, Suns’ Forward Mikal Bridges fouled Clippers Forward Paul George on an in-bounds play, which gave George two free-throws. At this moment of the game, the Clippers were leading 103-102. If Paul George makes these two free-throws, the Suns would have just a small chance to tie or win the game.

Paul George missed both free throws. The Suns would score on an alley-oop with 0.9 seconds left in the game and win by one point. They would go on to win the Western Conference Finals 4-2.

How unlikely was it for Paul George to miss both free throws? In this module, we will answer this question with estimating probabilities from data.

Paul George reacts to missing a free throw at the end of game 2 of the Western Conference Finals. Image source: Robert Gauthier / Los Angeles Times

Learning Objectives

By the end of this module, you will be able to:

Estimate probabilities using the Law of Large Numbers
Filter a dataset in R in order to estimate unconditional probabilities
Estimate conditional probabilities from data
Estimate the intersection of two events from data
Estimate the probability that Paul George misses two free throws when he goes to the line

Estimating Probabilities with the Law of Large Numbers

We begin with a discussion on how to estimate probabilities using the Law of Large Numbers (LLN). According to the LLN, if an experiment is performed a large number of times, the proportion of times an outcome is observed will be close to the true probability of that outcome. In this problem, the outcome of interest is whether or not Paul George makes a free throw. Let’s look at data and use the LLN to estimate the probability Paul George hits any one free throw.

Career Data

Below is a data table with each row representing a free throw for Paul George in his career (through the 2021-2022 season). The variables are represented as columns. The following variables are included:

date : Date of game
shot : Which free throw attempt is this shot for this trip to the free throw line
num_shots: How many total attempts for this trip to the free throw line
make : Whether this attempt was a make (TRUE) or miss (FALSE)
opponent : The opposing team that Paul George was playing against
quarter : Which quarter this free throw attempt took place
time : The time remaining in the quarter when the free throw was attempted
court : Whether Paul George was on the home team (home) or visiting team (away)
away_score : The score for the away team after the free throw attempt
home_score : The score for the home team after the free throw attempt

Use the data table below to explore the different variables. Note the data type of each variable.

NOTE: We will be using the dplyr package (part of tidyverse) for data manipulation. The dplyr package provides more readable and intuitive syntax compared to base R.

To read the data file into R we can use the following code:

library(tidyverse)

career_data <- read_csv("pgfreethrow.csv")

Estimating free throw probabilities from the career data

NOTE: A technical free throw is awarded when a player, coach, or team violates certain rules that are not related to physical contact during play. Common reasons include: unsportsmanlike conduct, having too many players on the court, or calling a timeout when none are available.

Using the Law of Large Numbers, let’s first estimate the probabilities that Paul George will make any one free throw in his career. In the career data above, there are 3952 free throw attempts (note that technical free throws are not included here).

We can count the number of made free throws by first filtering the dataframe to only include the made free throws (make == TRUE) and then count how many rows are in the filtered dataframe.

total_makes = career_data |> 
  filter(make == TRUE) |> 
  count()

print(total_makes)

# A tibble: 1 × 1
      n
  <int>
1  3334

Of these free throws, Paul George made 3334 of them. Now, we can estimate the probability that Paul George will make any one free throw by taking the number of made free throws and dividing by the total number of free throw attempts.

Exercise 1

Estimate the probability that Paul George makes any one free throw during his career. Round to four decimal places. (Click on the box below to see the solution)

Exercise 1: Solution

uncon_prob = total_makes / nrow(career_data)
uncon_prob |> 
  round(4) |> 
  print()

       n
1 0.8436

Other Unconditional Probabilities

So far, we have found what is called unconditional probability. That is, we are just interested in the event of Paul George making any one free throw in his career. We did not take into account any other information. Before we move on to conditional probability, let’s estimate a couple more unconditional probabilities but only for a subset of the data.

We first introduce some notation. We represent the unconditional probability of some event \(A\) with the notation \[ P(A) \]

If we let \(A\) represent the probability that Paul George makes any one free throw in his career, then we found above \[ P(A) = 0.8436 \]

NOTE: In the NBA, if a player is fouled while in the act of shooting a two-point shot during regular play, they get two free throws if they missed the shot. If they make the shot in this scenario, they get the points from the shot plus one additional free throw (call an “and-one” situation). If a player is fouled while in the act of shooting a three-point shot during regular play, they get three free throws if they missed the shot and one free throw if they make the shot.

Two free throws can also be given to a player if they are fouled without shooting but the opposing team has committed too many fouls in a quarter (known as being in the “bonus”).

In basketball, a trip to the free throw line will usually consists of 1, 2, or 3 attempts. So now, let’s look at the probability that Paul George will make his first shot on every trip to the free throw line. We will still estimate this probability with the Law of Large Numbers but this time, the data will only consist of the first free throw attempt when he goes to the line.

In R, we can filter the data to only contain the first attempt using the filter function as we did above but instead of make==TRUE in the function, we will use shot==1. Let’s call this filtered dataset first_attempt_data.

Exercise 2

Determine the number of shots Paul George took that was the first shot at the free throw line. (Click on the box below to see the solution)

Exercise 2: Solution

first_attempt_data = career_data |> 
  filter(shot==1) 

first_attempt_shots = first_attempt_data |> count()
print(first_attempt_shots)

# A tibble: 1 × 1
      n
  <int>
1  2115

Now let’s see how many of those first shots he made. We can do so by filtering first_attempt_data where make==TRUE and then using count as we have done before. Let’s call this count first_attempt_makes.

Exercise 3

Determine the number of first shot Paul George made in his career. (Click on the box below to see the solution)

Exercise 3: Solution

first_attempt_makes = first_attempt_data |> 
  filter(make==TRUE) |> 
  count()

print(first_attempt_makes)

# A tibble: 1 × 1
      n
  <int>
1  1744

Now we can use first_attempt_makes and first_attempt_data to estimate the probability of Paul George making the first shot when he goes to the free throw line.

Exercise 4

Estimate the probability that Paul George makes the first shot when he goes to the free throw line during his career. (Click on the box below to see the solution)

Exercise 4: Solution

first_attempt_prob = first_attempt_makes / nrow(first_attempt_data)
first_attempt_prob |> 
  round(4) |> 
  print()

       n
1 0.8246

Using the same steps as above, let’s now estimate the probability that Paul George will make the second shot when he goes to the free throw line. In basketball, there are times when there is only one shot when a player goes to the free throw line so we don’t expect the number of second shots to be the same as the number of first shots.

Exercise 5

Estimate the probability that Paul George makes the second shot when he goes to the free throw line during his career. (Click on the box below to see the solution)

Exercise 5: Solution

second_attempt_data = career_data |> 
  filter(shot==2) 

second_attempt_makes = second_attempt_data |> 
  filter(make==TRUE) |> 
  count()

second_attempt_prob = second_attempt_makes / nrow(second_attempt_data)
second_attempt_prob |> 
  round(4) |> 
  print()

       n
1 0.8629

Summary of Unconditional Probability

So far, we have estimated the probability that Paul George making any one free throw during his career. Let’s call this event \(A\). We found \[ P(A) = 0.8436 \]

We also estimated the probability that Paul George will make the first shot when he goes to the free throw line. Let’s call this event \(B_1\). We found \[ P(B_1) = 0.8246 \]

The last probability that we estimated was for Paul George’s second shot when he goes to the free throw line. Let’s call the event that he makes the second shot \(B_2\). We found \[ P(B_2) = 0.8629 \]

Probabilities of a Missed Free Throw

Recall that our goal is to estimate the probability that Paul George would miss both free throws in game 2 of the Western Conference Finals. Thus, we want the probabilities of missing the shots. Since missing the shot is the complement of making the shot, we can find the probability of missing the shot as one minus the probability of making the shot.

NOTE: We denote the probability of a complement of some event with a superscript \(c\) on the event. So the probability of \(A\) complement would be denoted as \[ P(A^c)=1-P(A) \]

Exercise 6

What is the estimated probability that Paul George misses any one free throw during his career? Round to four decimal places. (Click on the box below to see the solution)

Exercise 6: Solution

miss_prob = 1- uncon_prob

miss_prob |> 
  round(4) |> 
  print()

       n
1 0.1564

Exercise 7

What is the estimated probability that Paul George misses the first shot when he goes to the free throw line during his career? Round to four decimal places. (Click on the box below to see the solution)

Exercise 7: Solution

miss_first_prob = 1- first_attempt_prob

miss_first_prob |> 
  round(4) |> 
  print()

       n
1 0.1754

Exercise 8

What is the estimated probability that Paul George misses the second shot when he goes to the free throw line during his career? Round to four decimal places. (Click on the box below to see the solution)

Exercise 8: Solution

miss_second_prob = 1- second_attempt_prob

miss_second_prob |> 
  round(4) |> 
  print()

       n
1 0.1371

So the probability that Paul George misses any one free throw during his career is estimated as \[ \begin{align*} P(A^c)&=1-P(A)\\ & = 1-0.8436\\ & = 0.1564 \end{align*} \]

The probability that Paul George misses the first free throw attempt during his career is estimated as \[ \begin{align*} P(B_1^c)&=1-P(B_1)\\ & = 1-0.8246\\ & = 0.1754 \end{align*} \]

The probability that Paul George misses the second free throw attempt during his career is estimated as \[ \begin{align*} P(B_2^c)&=1-P(B_2)\\ & = 1-0.8629\\ & = 0.1371 \end{align*} \]

To answer the question about missing both free throws, we must now discuss Conditional Probability.

Estimating Conditional Probablities

Conditional probability is a specific type of probability that deals with the likelihood of an event occurring given that another event has already occurred. It’s denoted as \[ P(A|B) \] which is read as “the probability of A occurring given that B has occurred.”

For Paul George’s free throw attempts, let’s look at the the following conditional probabilities:

He makes the second shot given he made the first shot.
He misses the second shot given he made the first shot.
He makes the second shot given he missed the first shot.
He misses the second shot given he missed the first shot.

Let \[ \begin{align*} B_1 &= \text{the event that he makes the first shot}\\ B_1^c &= \text{the event that he misses the first shot}\\ B_2 &= \text{the event that he makes the second shot}\\ B_2^c &= \text{the event that he misses the second shot} \end{align*} \] Use this notation to answer the following questions.

Let’s now find the four probabilities listed above using the career data in R.

We will only focus on trips to the free throw line that consisted of two or three attempts. We do not want to use the data where there was only one free throw attempt since we want to estimate the probability of the second shot conditioned on the first shot. Let’s start by filtering the data to take out the free throw trips that only had one shot.

free_throw_data = career_data |> 
  filter(num_shots >= 2) 

free_throw_data |> 
  count() |> 
  print()

# A tibble: 1 × 1
      n
  <int>
1  3587

We see that we have 3587 free throw attempts once we remove the trips to the free throw line that only had one attempt. Going forward, we will be using this filtered dataframe called free_throw_data.

To estimate the conditional probability using the Law of Large Numbers, there are two approaches we can use. We can use the conditional probability formula or we can just reduce the dataframe to the conditioned event. Let’s examine both approches.

Using the Conditional Probability Formula

To find a conditional probability, we will use the formula \[ P(A|B)=\frac{P(A\text{ and }B)}{P(B)} \] Let’s start with the first probability above:

He makes the second shot given he made the first shot.

Using the conditional probability formula, we have \[ P(B_2|B_1)=\frac{P(B_2\text{ and }B_1)}{P(B_1)} \]

Let’s start by finding the probability of making the first shot, \(P(B_1)\). We filter the dataframe to include only the attempts where he made the first shot. We can do this with the filter function by filtering on shot==1 and ‘make==TRUE’.

first_made = free_throw_data |> 
  filter(shot==1 & make==TRUE) |> 
  count()

first_made |> 
  print()

# A tibble: 1 × 1
      n
  <int>
1  1456

Now we can divide this value by the total number of first shots. In the free_throw_data dataframe, this can be determined by just counting the number of times shot==1. Dividing the number of rows in made_first by the total number of rows where shot==1 gives us the estimated probability for making the first shot.

first_total = free_throw_data |> 
  filter(shot==1) |> 
  count()

first_total |> 
  print()

# A tibble: 1 × 1
      n
  <int>
1  1750

first_made_prob = first_made / first_total

first_made_prob |> 
  print()

      n
1 0.832

So we see that \[ P(B_1) = 0.832 \]

To estimate the probability \(P(B_2\text{ and }B_1)\), we need to filter the data were the first shot is made and the second shot is made in the same trip to the free throw line. Since the data frame is in chronological order, we can assume the previous row in the data frame corresponds to the previous free throw attempt. In R, we can use the lag function to refer to the previous row value in our dataframe. The code lag(make)==FALSE will find the rows that come right after a row where make==FALSE. In other words, it will find the attempts where the previous shot was a miss.

make_both = free_throw_data |> 
  filter(shot==2 & make==TRUE & lag(make)==TRUE) |> 
  count()

second_total = free_throw_data |> 
  filter(shot==2) |> 
  count()

make_both_prob = make_both / second_total 

make_both_prob |> 
  round(4) |> 
  print()

       n
1 0.7223

Thus, we estimate the probability he makes both as \[ P(B_2\text{ and }B_1) = 0.7223 \]

We can now find the conditional probability that he makes the second given he makes the first as \[ \begin{align*} P(B_2|B_1)&=\frac{P(B_2\text{ and }B_1)}{P(B_1)}\\ &=\frac{0.7223}{0.832}\\ & = 0.8681 \end{align*} \]

Filtering the Dataframe to the Conditioned Event

We now will consider an alternative method for estimating the probability \(P(B_2|B_1)\). Instead of using the conditional probability formula, we will just filter the data to be the conditioned event. When you condition on a event, you really are just “reducing the sample space” to the conditioned event.

Let’s start by filtering free_throw_data to only where the first shot was made and there was a second shot. We first make sure the conditioned event occurred. Since the data is in sequential order, we can just use lag(make)==TRUE along with shot==2 since this will give us all the second shots where the previous shot (the first shot) was a make.

first_made_data = free_throw_data |> 
  filter(shot==2 & lag(make)==TRUE)

Of this filtered dataframe, how many did Paul George make the second shot? We can find this with the following code.

second_made = first_made_data |> 
  filter(make==TRUE) |> 
  count()

second_made |> 
  print()

# A tibble: 1 × 1
      n
  <int>
1  1264

The estimated probability that he makes the second shot given he made the first shot can be found with the following code.

second_made = first_made_data |> 
  filter(make==TRUE) |> 
  count()

B1_B2_prob = second_made / count(first_made_data) 

B1_B2_prob |> 
  round(4) |> 
  print()

       n
1 0.8681

Note that this is the same estimated probability as we found using the conditional probability formula.

Finding the Remaining Conditional probabilities

Let’s now estimate the second probability above:

He misses the second shot given he made the first shot.

All the code for using the conditional probability formula approach are given below.

first_made = free_throw_data |> 
  filter(shot==1 & make==TRUE) |> 
  count()

first_total = free_throw_data |> 
  filter(shot==1) |> 
  count()

# P(B1)
first_made_prob = first_made / first_total 

miss_second_make_first = free_throw_data |> 
  filter(shot==2 & make==FALSE & lag(make)==TRUE) |> 
  count()

second_total = free_throw_data |> 
  filter(shot==2) |> 
  count()

# P(B2^c and B1)
miss_second_make_first_prob = miss_second_make_first / second_total 


#P(B2^c|B1)
B_2comp_given_B1 = miss_second_make_first_prob / first_made_prob

B_2comp_given_B1 |> 
  round(4) |> 
  print()

       n
1 0.1319

Note: The conditional probability of two events that are complements of each other, and have the same conditioned event can be found using the complement rule. That is, \[ P(B^c|A) = 1 - P(B|A) \] Thus, we could find the probability \(P(B_2^c|B_1)\) as \[ \begin{align*} P(B_2^c|B_1) &= 1 - P(B_2|B_1)\\ & = 1 - 0.8681\\ & = 0.1319 \end{align*} \]

Now let’s estimate the probability by filtering the dataframe to the conditional event.

first_made_data = free_throw_data |> 
  filter(shot==2 & lag(make)==TRUE) 

second_missed = first_made_data |> 
  filter(make==FALSE) |> 
  count()

B2comp_B1_prob = second_missed / count(first_made_data) 

B2comp_B1_prob |> 
  round(4) |> 
  print()

       n
1 0.1319

The remaining two conditional probabilities from above are left for you to do. Try to find them by adapting the code above using the conditional probability formula approach or the filtering to the conditional event approach. The solutions to the following two exercises show both approaches.

Exercise 10

What is the estimated probability that Paul George makes the second shot given he missed the first shot when he goes to the free throw line during his career? Round to four decimal places. (Click on the box below to see the solution)

Exercise 10: Solution

Conditional Probability Formula Approach

first_missed = free_throw_data |> 
  filter(shot==1 & make==FALSE) |> 
  count()

first_total = free_throw_data |> 
  filter(shot==1) |> 
  count()

# P(B1^c)
first_missed_prob = first_missed / first_total 

make_second_missed_first = free_throw_data |> 
  filter(shot==2 & make==TRUE & lag(make)==FALSE) |> 
  count()

second_total = free_throw_data |> 
  filter(shot==2) |> 
  count()

# P(B2 and B1^c)
make_second_missed_first_prob = make_second_missed_first / second_total 


#P(B2|B1^c)
B_2_given_B1comp = make_second_missed_first_prob / first_missed_prob

B_2_given_B1comp |> 
  round(4) |> 
  print()

       n
1 0.8367

Filtering the Dataframe to the Conditioned Event Approach

first_missed_data = free_throw_data |> 
  filter(shot==2 & lag(make)==FALSE) 

second_made = first_missed_data |> 
  filter(make==TRUE) |> 
  count()

B2_B1comp_prob = second_made / count(first_missed_data) 

B2_B1comp_prob |> 
  round(4) |> 
  print()

       n
1 0.8367

Note 1: Exercise 11

What is the estimated probability that Paul George misses the second shot given he missed the first shot when he goes to the free throw line during his career? Round to four decimal places. (Click on the box below to see the solution)

Exercise 11: Solution

Conditional Probability Formula Approach

first_missed = free_throw_data |> 
  filter(shot==1 & make==FALSE) |> 
  count()

first_total = free_throw_data |> 
  filter(shot==1) |> 
  count()

# P(B1^c)
first_missed_prob = first_missed / first_total 

missed_second_missed_first = free_throw_data |> 
  filter(shot==2 & make==FALSE & lag(make)==FALSE) |> 
  count()

second_total = free_throw_data |> 
  filter(shot==2) |> 
  count()

# P(B2^c and B1^c)
missed_second_missed_first_prob = missed_second_missed_first / second_total 


#P(B2|B1^c)
B2comp_given_B1comp = missed_second_missed_first_prob / first_missed_prob

B2comp_given_B1comp |> 
  round(4) |> 
  print()

       n
1 0.1633

Filtering the Dataframe to the Conditioned Event Approach

first_missed_data = free_throw_data |> 
  filter(shot==2 & lag(make)==FALSE) 

second_missed = first_missed_data |> 
  filter(make==FALSE) |> 
  count()

B2comp_B1comp_prob = second_missed / count(first_missed_data) 

B2comp_B1comp_prob |> 
  round(4) |> 
  print()

       n
1 0.1633

Using the Complement Rule

B2comp_B1comp_prob = 1-B2_B1comp_prob

B2comp_B1comp_prob |> 
  round(4) |> 
  print()

       n
1 0.1633

Estimating the Probability that Both Shots are Missed

In Exercise 11, you found \[ \begin{align*} P(\text{missed second shot}\vert \text{missed first shot}) & = P(B_2^c|B_1^c)\\ & = 0.1633 \end{align*} \] We want to answer the questions “How unlikely was it for Paul George to miss both free throws?” This question is the probability \[ P(B_2^c\text{ and }B_1^c) \] whereas the fourth conditional probability (found in Exercise 11):

He misses the second shot given he missed the first shot.

is the probability \[ \begin{align*} P(B_2^c|B_1^c)&=\frac{P(B_2^c\text{ and }B_1^c)}{P(B_1^c)} \end{align*} \]

So what is the difference between these two probabilities in terms of Paul George shooting those two free throws?

Let’s think about the moment right after Paul George was fouled by Mikal Bridges. Paul George is stepping to the line to shoot two free throws. Before he has shot the first attempt, we want to know the probability of missing both. That is the probability \(P(B_2^c\text{ and }B_1^c)\).

Now let’s think about the moment right after he shot the first free throw attempt and missed. He is about to shoot the second attempt. We want to know the probability of missing the second given he has just missed the first. This is the probability \[ \begin{align*} P(B_2^c|B_1^c) = 0.1633 \end{align*} \]

Since we have already estimated the probability \(P(B_2^c|B_1^c)\), we now want to estimate the probability that he missed both shots \(P(B_2^c\text{ and }B_1^c)\). Note that this probability has already been estimated if the conditional probability formula approached was used to find \(P(B_2^c|B_1^c)\). See the solution to Exercise 11.

Note 2: Exercise 12

What is the estimated probability that Paul George misses both the first shot and the second shot when he goes to the free throw line during his career? Round to four decimal places. (Click on the box below to see the solution)

Exercise 12: Solution

missed_second_missed_first = free_throw_data |> 
  filter(shot==2 & make==FALSE & lag(make)==FALSE) |> 
  count()

second_total = free_throw_data |> 
  filter(shot==2) |> 
  count()

# P(B2^c and B1^c)
missed_second_missed_first_prob = missed_second_missed_first / second_total 

missed_second_missed_first_prob |> 
  round(4) |> 
  print()

       n
1 0.0274

We see that it is unlikely that Paul George misses both free throws when he goes to the line. In fact, this only happens 2.74% of the time he goes to the free throw line in his career.

Summary

In this module, we have discussed

How to use the Law of Large Numbers to estimate probabilities
How to filter a dataset in R in order to estimate unconditional probabilities
How to estimate conditional probabilities from data
How to estimate the intersection of two events from data
How unlikely it was for Paul George to miss two free throws in a game

Other questions can be asked about Paul George missing those two free throws in that game. For example:

Does the probability change if we only consider free throws taken in a Playoff game?
Does the probability change if we only consider the last few minutes of a game?

You can find these probabilities using the data in this module. These questions are left for you to find on your own.