A Look at Paul George’s Missed Free Throws
Introduction
On June 22, 2021, game 2 of the NBA Western Conference Finals featured the Los Angeles Clippers at the Phoenix Suns.
With 8.2 seconds left in the game, Suns forward Mikal Bridges fouled Clippers forward Paul George on an inbounds play, which gave George two free throws. At this moment of the game, the Clippers were leading 103-102. If Paul George made both free throws, the Suns would have just a small chance to tie or win the game.
Paul George missed both free throws. The Suns would score on an alley-oop with 0.9 seconds left in the game and win by one point. They would go on to win the Western Conference Finals 4-2.
How unlikely was it for Paul George to miss both free throws? In this module, we will answer this question by estimating probabilities from data.
Paul George reacts to missing a free throw at the end of game 2 of the Western Conference Finals. Image source: Robert Gauthier / Los Angeles Times
Estimating Probabilities with the Law of Large Numbers
We begin with a discussion on how to estimate probabilities using the Law of Large Numbers (LLN). According to the LLN, if an experiment is performed a large number of times, the proportion of times an outcome is observed will be close to the true probability of that outcome. In this problem, the outcome of interest is whether or not Paul George makes a free throw. Let’s look at data and use the LLN to estimate the probability Paul George hits any one free throw.
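The idea behind the LLN can be illustrated with a short simulation. This is only a sketch: the value of true_prob below is a hypothetical probability chosen for illustration, not an estimate from Paul George's data, and the shots are simulated as independent.

```r
# Simulate many free throws from a shooter with a known (hypothetical) make probability
set.seed(1)
true_prob = 0.8      # assumed value, for illustration only
n_shots = 100000

# 1 = make, 0 = miss
makes = rbinom(n_shots, size = 1, prob = true_prob)

# Proportion of makes after each additional shot
running_prop = cumsum(makes) / seq_along(makes)

# The proportion settles near true_prob as the number of shots grows
print(running_prop[c(10, 100, 1000, 10000, 100000)])
```

With only a handful of shots the proportion can be far from the assumed probability, but it tends to settle close to it as the number of shots increases.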
Career Data
Below is a data table with each row representing a free throw for Paul George in his career (through the 2021-2022 season). The variables are represented as columns. The following variables are included:
- date: Date of game
- shot: Which free throw attempt is this shot for this trip to the free throw line
- num_shots: How many total attempts for this trip to the free throw line
- make: Whether this attempt was a make (TRUE) or miss (FALSE)
- opponent: The opposing team that Paul George was playing against
- quarter: Which quarter this free throw attempt took place
- time: The time remaining in the quarter when the free throw was attempted
- court: Whether Paul George was on the home team (home) or visiting team (away)
- away_score: The score for the away team after the free throw attempt
- home_score: The score for the home team after the free throw attempt
Use the data table below to explore the different variables. Note the data type of each variable.
NOTE: We will be using the dplyr package (part of tidyverse) for data manipulation. The dplyr package provides more readable and intuitive syntax compared to base R.
To read the data file into R, we can use the following code:
library(tidyverse)
career_data <- read_csv("pgfreethrow.csv")
Estimating free throw probabilities from the career data
NOTE: A technical free throw is awarded when a player, coach, or team violates certain rules that are not related to physical contact during play. Common reasons include: unsportsmanlike conduct, having too many players on the court, or calling a timeout when none are available.
Using the Law of Large Numbers, let’s first estimate the probability that Paul George will make any one free throw in his career. In the career data above, there are 3952 free throw attempts (note that technical free throws are not included here).
We can count the number of made free throws by first filtering the dataframe to only include the made free throws (make == TRUE) and then counting how many rows are in the filtered dataframe.
total_makes = career_data |>
  filter(make == TRUE) |>
  count()
print(total_makes)
# A tibble: 1 × 1
n
<int>
1 3334
Of these free throws, Paul George made 3334 of them. Now, we can estimate the probability that Paul George will make any one free throw by taking the number of made free throws and dividing by the total number of free throw attempts.
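The division described here can be sketched directly from the two counts reported above. The variable names total_attempts and career_make_prob are introduced only for this illustration.

```r
# Counts reported in the text above
total_makes = 3334      # made free throws
total_attempts = 3952   # all free throw attempts (technical free throws excluded)

# Law of Large Numbers estimate: proportion of attempts that were makes
career_make_prob = total_makes / total_attempts
print(round(career_make_prob, 4))
```

Rounded to four decimal places, this gives 0.8436.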
Other Unconditional Probabilities
So far, we have found what is called unconditional probability. That is, we are just interested in the event of Paul George making any one free throw in his career. We did not take into account any other information. Before we move on to conditional probability, let’s estimate a couple more unconditional probabilities but only for a subset of the data.
We first introduce some notation. We represent the unconditional probability of some event \(A\) with the notation \[ P(A) \]
If we let \(A\) represent the event that Paul George makes any one free throw in his career, then we found above \[ P(A) = 0.8436 \]
NOTE: In the NBA, if a player is fouled while in the act of shooting a two-point shot during regular play, they get two free throws if they missed the shot. If they make the shot in this scenario, they get the points from the shot plus one additional free throw (called an “and-one” situation). If a player is fouled while in the act of shooting a three-point shot during regular play, they get three free throws if they missed the shot and one free throw if they make the shot.
Two free throws can also be given to a player if they are fouled without shooting but the opposing team has committed too many fouls in a quarter (known as being in the “bonus”).
In basketball, a trip to the free throw line usually consists of 1, 2, or 3 attempts. So now, let’s look at the probability that Paul George will make his first shot on any trip to the free throw line. We will still estimate this probability with the Law of Large Numbers, but this time the data will only consist of the first free throw attempt when he goes to the line.
In R, we can filter the data to only contain the first attempt using the filter function as we did above, but instead of make==TRUE in the function, we will use shot==1. Let’s call this filtered dataset first_attempt_data.
Now let’s see how many of those first shots he made. We can do so by filtering first_attempt_data where make==TRUE and then using count as we have done before. Let’s call this count first_attempt_makes.
Now we can use first_attempt_makes and first_attempt_data to estimate the probability of Paul George making the first shot when he goes to the free throw line.
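The three steps above might look like the following sketch. So that it runs on its own, it uses a small made-up tibble called toy_data (hypothetical rows, not Paul George's data) with the same column names as career_data; with the real data, career_data would take the place of toy_data.

```r
library(dplyr)

# Hypothetical toy data: three trips to the line (2 shots, 1 shot, 2 shots)
toy_data = tibble(
  shot      = c(1, 2, 1, 1, 2),
  num_shots = c(2, 2, 1, 2, 2),
  make      = c(TRUE, FALSE, TRUE, FALSE, TRUE)
)

# Step 1: keep only the first attempt of each trip
first_attempt_data = toy_data |>
  filter(shot == 1)

# Step 2: count how many of those first attempts were made
first_attempt_makes = first_attempt_data |>
  filter(make == TRUE) |>
  count()

# Step 3: proportion of first attempts that were made
first_attempt_prob = first_attempt_makes / count(first_attempt_data)
print(first_attempt_prob)
```

In the toy data, two of the three first attempts are makes, so the estimate is 2/3. The same pattern with shot == 2 gives the estimate for second attempts.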
Using the same steps as above, let’s now estimate the probability that Paul George will make the second shot when he goes to the free throw line. In basketball, there are times when there is only one shot when a player goes to the free throw line so we don’t expect the number of second shots to be the same as the number of first shots.
Summary of Unconditional Probability
So far, we have estimated the probability that Paul George makes any one free throw during his career. Let’s call this event \(A\). We found \[ P(A) = 0.8436 \]
We also estimated the probability that Paul George will make the first shot when he goes to the free throw line. Let’s call this event \(B_1\). We found \[ P(B_1) = 0.8246 \]
The last probability that we estimated was for Paul George’s second shot when he goes to the free throw line. Let’s call the event that he makes the second shot \(B_2\). We found \[ P(B_2) = 0.8629 \]
Probabilities of a Missed Free Throw
Recall that our goal is to estimate the probability that Paul George would miss both free throws in game 2 of the Western Conference Finals. Thus, we want the probabilities of missing the shots. Since missing the shot is the complement of making the shot, we can find the probability of missing the shot as one minus the probability of making the shot.
NOTE: We denote the probability of a complement of some event with a superscript \(c\) on the event. So the probability of \(A\) complement would be denoted as \[ P(A^c)=1-P(A) \]
So the probability that Paul George misses any one free throw during his career is estimated as \[ \begin{align*} P(A^c)&=1-P(A)\\ & = 1-0.8436\\ & = 0.1564 \end{align*} \]
The probability that Paul George misses the first free throw attempt during his career is estimated as \[ \begin{align*} P(B_1^c)&=1-P(B_1)\\ & = 1-0.8246\\ & = 0.1754 \end{align*} \]
The probability that Paul George misses the second free throw attempt during his career is estimated as \[ \begin{align*} P(B_2^c)&=1-P(B_2)\\ & = 1-0.8629\\ & = 0.1371 \end{align*} \]
To answer the question about missing both free throws, we must now discuss Conditional Probability.
Estimating Conditional Probabilities
Conditional probability is a specific type of probability that deals with the likelihood of an event occurring given that another event has already occurred. It’s denoted as \[ P(A|B) \] which is read as “the probability of A occurring given that B has occurred.”
For Paul George’s free throw attempts, let’s look at the following conditional probabilities:
- He makes the second shot given he made the first shot.
- He misses the second shot given he made the first shot.
- He makes the second shot given he missed the first shot.
- He misses the second shot given he missed the first shot.
Let \[ \begin{align*} B_1 &= \text{the event that he makes the first shot}\\ B_1^c &= \text{the event that he misses the first shot}\\ B_2 &= \text{the event that he makes the second shot}\\ B_2^c &= \text{the event that he misses the second shot} \end{align*} \] Use this notation to answer the following questions.
Let’s now find the four probabilities listed above using the career data in R.
We will only focus on trips to the free throw line that consisted of two or three attempts. We do not want to use the data where there was only one free throw attempt since we want to estimate the probability of the second shot conditioned on the first shot. Let’s start by filtering the data to take out the free throw trips that only had one shot.
free_throw_data = career_data |>
  filter(num_shots >= 2)
free_throw_data |>
  count() |>
  print()
# A tibble: 1 × 1
n
<int>
1 3587
We see that we have 3587 free throw attempts once we remove the trips to the free throw line that only had one attempt. Going forward, we will be using this filtered dataframe called free_throw_data.
To estimate the conditional probability using the Law of Large Numbers, there are two approaches we can use. We can use the conditional probability formula or we can just reduce the dataframe to the conditioned event. Let’s examine both approaches.
Using the Conditional Probability Formula
To find a conditional probability, we will use the formula \[ P(A|B)=\frac{P(A\text{ and }B)}{P(B)} \] Let’s start with the first probability above:
- He makes the second shot given he made the first shot.
Using the conditional probability formula, we have \[ P(B_2|B_1)=\frac{P(B_2\text{ and }B_1)}{P(B_1)} \]
Let’s start by finding the probability of making the first shot, \(P(B_1)\). We filter the dataframe to include only the attempts where he made the first shot. We can do this with the filter function by filtering on shot==1 and make==TRUE.
first_made = free_throw_data |>
  filter(shot==1 & make==TRUE) |>
  count()
first_made |>
  print()
# A tibble: 1 × 1
n
<int>
1 1456
Now we can divide this value by the total number of first shots. In the free_throw_data dataframe, this can be determined by just counting the number of rows where shot==1. Dividing the count in first_made by the total number of rows where shot==1 gives us the estimated probability for making the first shot.
first_total = free_throw_data |>
  filter(shot==1) |>
  count()
first_total |>
  print()
# A tibble: 1 × 1
n
<int>
1 1750
first_made_prob = first_made / first_total
first_made_prob |>
  print()
n
1 0.832
So we see that \[ P(B_1) = 0.832 \]
To estimate the probability \(P(B_2\text{ and }B_1)\), we need to filter the data to the rows where the first shot is made and the second shot is made in the same trip to the free throw line. Since the data frame is in chronological order, we can assume the previous row in the data frame corresponds to the previous free throw attempt. In R, we can use the lag function to refer to the previous row value in our dataframe. For example, the code lag(make)==FALSE will find the rows that come right after a row where make==FALSE. In other words, it will find the attempts where the previous shot was a miss. Here we want the previous shot to be a make, so we use lag(make)==TRUE.
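To see what lag does, here is a small sketch on made-up data. The tibble toy_data and the column previous_make are hypothetical and exist only for this illustration; the rows are assumed to be in chronological order, as in the career data.

```r
library(dplyr)

# Hypothetical toy data: three two-shot trips in chronological order
toy_data = tibble(
  shot = c(1, 2, 1, 2, 1, 2),
  make = c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE)
)

# lag(make) shifts the make column down one row, so each row
# sees the result of the attempt directly before it (NA for the first row)
toy_data |>
  mutate(previous_make = lag(make)) |>
  print()

# Second shots that were made right after a made first shot
made_both = toy_data |>
  filter(shot == 2 & make == TRUE & lag(make) == TRUE) |>
  count()
print(made_both)
```

Only the first trip has both shots made, so the count is 1. Because lag is evaluated on the full column before any rows are dropped, pairing it with shot == 2 keeps the comparison within the same trip as long as the rows are in order.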
make_both = free_throw_data |>
  filter(shot==2 & make==TRUE & lag(make)==TRUE) |>
  count()
second_total = free_throw_data |>
  filter(shot==2) |>
  count()
make_both_prob = make_both / second_total
make_both_prob |>
  round(4) |>
  print()
n
1 0.7223
Thus, we estimate the probability he makes both as \[ P(B_2\text{ and }B_1) = 0.7223 \]
We can now find the conditional probability that he makes the second given he makes the first as \[ \begin{align*} P(B_2|B_1)&=\frac{P(B_2\text{ and }B_1)}{P(B_1)}\\ &=\frac{0.7223}{0.832}\\ & = 0.8681 \end{align*} \]
Filtering the Dataframe to the Conditioned Event
We will now consider an alternative method for estimating the probability \(P(B_2|B_1)\). Instead of using the conditional probability formula, we will just filter the data to the conditioned event. When you condition on an event, you are really just “reducing the sample space” to the conditioned event.
Let’s start by filtering free_throw_data to only where the first shot was made and there was a second shot. We first make sure the conditioned event occurred. Since the data is in sequential order, we can just use lag(make)==TRUE along with shot==2, since this will give us all the second shots where the previous shot (the first shot) was a make.
first_made_data = free_throw_data |>
  filter(shot==2 & lag(make)==TRUE)
Within this filtered dataframe, how many second shots did Paul George make? We can find this with the following code.
second_made = first_made_data |>
  filter(make==TRUE) |>
  count()
second_made |>
  print()
# A tibble: 1 × 1
n
<int>
1 1264
The estimated probability that he makes the second shot given he made the first shot can be found with the following code.
second_made = first_made_data |>
  filter(make==TRUE) |>
  count()
B1_B2_prob = second_made / count(first_made_data)
B1_B2_prob |>
  round(4) |>
  print()
n
1 0.8681
Note that this is the same estimated probability as we found using the conditional probability formula.
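A quick way to see why the two approaches agree is to plug in the counts reported above. This is only a sketch: the variable names are chosen for illustration, and it assumes each of the 1750 trips in free_throw_data has exactly one second shot, which is consistent with the reported value of 0.7223.

```r
# Counts reported above for trips with at least two attempts
first_made  = 1456   # first shots made
first_total = 1750   # first shots attempted (assumed equal to the number of second shots)
make_both   = 1264   # second shots made right after a made first shot

# Formula approach: P(B2 and B1) / P(B1)
formula_approach = (make_both / first_total) / (first_made / first_total)

# Filtering approach: proportion of makes among second shots that follow a made first shot
filtering_approach = make_both / first_made

print(round(c(formula_approach, filtering_approach), 4))
```

The total of 1750 cancels in the formula approach, leaving 1264 / 1456 in both cases.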
Finding the Remaining Conditional probabilities
Let’s now estimate the second probability above:
- He misses the second shot given he made the first shot.
All the code for the conditional probability formula approach is given below.
first_made = free_throw_data |>
  filter(shot==1 & make==TRUE) |>
  count()
first_total = free_throw_data |>
  filter(shot==1) |>
  count()
# P(B1)
first_made_prob = first_made / first_total
miss_second_make_first = free_throw_data |>
  filter(shot==2 & make==FALSE & lag(make)==TRUE) |>
  count()
second_total = free_throw_data |>
  filter(shot==2) |>
  count()
# P(B2^c and B1)
miss_second_make_first_prob = miss_second_make_first / second_total
# P(B2^c|B1)
B_2comp_given_B1 = miss_second_make_first_prob / first_made_prob
B_2comp_given_B1 |>
  round(4) |>
  print()
n
1 0.1319
Note: When two events are complements of each other and are conditioned on the same event, their conditional probabilities can be found using the complement rule. That is, \[ P(B^c|A) = 1 - P(B|A) \] Thus, we could find the probability \(P(B_2^c|B_1)\) as \[ \begin{align*} P(B_2^c|B_1) &= 1 - P(B_2|B_1)\\ & = 1 - 0.8681\\ & = 0.1319 \end{align*} \]
Now let’s estimate the same probability by filtering the dataframe to the conditioned event.
first_made_data = free_throw_data |>
  filter(shot==2 & lag(make)==TRUE)
second_missed = first_made_data |>
  filter(make==FALSE) |>
  count()
B2comp_B1_prob = second_missed / count(first_made_data)
B2comp_B1_prob |>
  round(4) |>
  print()
n
1 0.1319
The remaining two conditional probabilities from above are left for you to do. Try to find them by adapting the code above using the conditional probability formula approach or the filtering to the conditional event approach. The solutions to the following two exercises show both approaches.
Estimating the Probability that Both Shots are Missed
In Exercise 11, you found \[ \begin{align*} P(\text{missed second shot}\vert \text{missed first shot}) & = P(B_2^c|B_1^c)\\ & = 0.1633 \end{align*} \] We want to answer the question “How unlikely was it for Paul George to miss both free throws?” This question asks for the probability \[ P(B_2^c\text{ and }B_1^c) \] whereas the fourth conditional probability (found in Exercise 11):
- He misses the second shot given he missed the first shot.
is the probability \[ \begin{align*} P(B_2^c|B_1^c)&=\frac{P(B_2^c\text{ and }B_1^c)}{P(B_1^c)} \end{align*} \]
So what is the difference between these two probabilities in terms of Paul George shooting those two free throws?
Let’s think about the moment right after Paul George was fouled by Mikal Bridges. Paul George is stepping to the line to shoot two free throws. Before he has shot the first attempt, we want to know the probability of missing both. That is the probability \(P(B_2^c\text{ and }B_1^c)\).
Now let’s think about the moment right after he shot the first free throw attempt and missed. He is about to shoot the second attempt. We want to know the probability of missing the second given he has just missed the first. This is the probability \[ \begin{align*} P(B_2^c|B_1^c) = 0.1633 \end{align*} \]
Since we have already estimated the probability \(P(B_2^c|B_1^c)\), we now want to estimate the probability that he missed both shots \(P(B_2^c\text{ and }B_1^c)\). Note that this probability has already been estimated if the conditional probability formula approach was used to find \(P(B_2^c|B_1^c)\). See the solution to Exercise 11.
We see that it is unlikely that Paul George misses both free throws when he goes to the line. In fact, this only happens 2.74% of the time he goes to the free throw line in his career.
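The 2.74% figure can be checked by rearranging the conditional probability formula into the multiplication rule. The sketch below assumes \(P(B_1^c)\) is taken from the trips with at least two attempts, so that \(P(B_1^c) = 1 - 0.832 = 0.168\), and it uses the rounded values reported above, so the last digit is approximate. \[ \begin{align*} P(B_2^c\text{ and }B_1^c) &= P(B_2^c|B_1^c)\,P(B_1^c)\\ &\approx 0.1633 \times 0.168\\ &\approx 0.0274 \end{align*} \]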
Summary
In this module, we have discussed
- How to use the Law of Large Numbers to estimate probabilities
- How to filter a dataset in R in order to estimate unconditional probabilities
- How to estimate conditional probabilities from data
- How to estimate the intersection of two events from data
- How unlikely it was for Paul George to miss two free throws in a game
Other questions can be asked about Paul George missing those two free throws in that game. For example:
- Does the probability change if we only consider free throws taken in a Playoff game?
- Does the probability change if we only consider the last few minutes of a game?
You can estimate these probabilities using the data in this module. These questions are left for you to explore on your own.