MLS - Types of Decision Errors

Decision Errors

Exploring types of decision errors

Author: Jonathan Lieb

Affiliation: Baylor University

Published: March 30, 2025

Facilitation notes

This module would be suitable for an in-class lab in an introductory statistics course or for supplemental self-study.
This module is designed for students just learning about hypothesis testing and is meant to bolster their understanding of errors that can be made in hypothesis testing. It assumes a basic knowledge of what hypothesis testing is.
This module requires no programming knowledge.
The data used to create this module can be downloaded here
Much of the data for the module was derived from an MLS article and fbref.com. The rest of the data concerning the opening of training facilities was individually researched.

Introduction

In this module, we will explore the concept of decision errors in hypothesis testing. We will use MLS (Major League Soccer) training facility data to explore Type I and Type II errors.

Inter Miami CF built training facilities that opened in 2020. One of the fields is adjacent to their home stadium and can be seen in the background of the image below.

Image Source: Cornfield948, CC BY-SA 3.0, via Wikimedia Commons

In the last decade (2014-2023), more than half of all MLS teams opened new training facilities, and several more clubs have plans to build new facilities soon. These cutting-edge facilities are designed to provide players with the best possible environment to train and develop, hopefully translating to better performance on the field.

But these new facilities come at a high cost. According to the MLS, between 2017 and 2020 over 400 million dollars were spent on training facilities. For reference, Atlanta United FC’s new training facility, which opened in 2017, cost $60 million. Inter Miami, a relatively new expansion team, opted to spend $40 million on their training facilities, which sprawl over 25 acres and include 7 fields.

All this money being spent raises some important questions. Do these new, state-of-the-art facilities lead to better performance on the field? What are some potential consequences of claiming they make a difference when they don’t? What are some potential consequences of claiming they don’t make a difference when they do?

We will explore these questions and more as we kick off our exploration of hypothesis testing and decision errors. For our purposes we will consider any team with a training facility that opened in 2014 or later to have a new training facility and any team with a training facility that opened before 2014 to have an old training facility.

Learning Objectives

By the end of this module, you should be able to:

Understand the basic concepts of hypothesis testing
Define Type I errors
Define Type II errors
Explain the relationship between Type I and Type II errors

Research Questions

What could be a consequence of claiming that new training facilities lead to better performance on the field when they don’t?
What could be a consequence of claiming that new training facilities don’t lead to better performance on the field when they do?
What type of error is worse to make?

Terms to Know

ON THE RISE New teams like St. Louis City SC have helped fuel the MLS’s growth. St. Louis City SC joined the league in 2023 and has already made a splash.

Ratings, attendance, and revenue have all increased dramatically over the last decade.

Image Source: IagoQnsi, CC BY-SA 4.0, via Wikimedia Commons

Soccer Vocabulary

Let’s knock out some soccer terminology before we head into hypothesis testing and decision errors.

MLS: Major League Soccer. The top professional soccer league in the United States and Canada.
Match: A soccer game
Points per Match Played: The average number of points a team earns per match played. In MLS, a team earns 3 points for a win, 1 point for a draw, and 0 points for a loss. This is not the number of goals scored per match.
Training Facility: A facility where a team practices and trains. These facilities can include fields, gyms, locker rooms, and more. Often these teams share facilities with other teams in the organization.
Table: The standings in a league. Teams are ranked by the number of points they have earned. The team with the most points is at the top of the table.

Data Exploration

Let’s take a quick look at the data that will be used for this module and calculate some basic summary statistics. The data includes the points per match played for MLS teams in 2023 and when their current training facility opened. For the sake of this module, we will consider any facility opened within 10 years of 2023 (that is, in 2014 or later) to be a new facility. There are 17 teams with new training facilities and 12 teams with old training facilities in the data.

Fun Fact: The top team in the MLS in 2023, FC Cincinnati, averaged 2.03 points per match. They opened a new training facility in 2019.

Below are the 2023 points per match played for teams with new training facilities.

1.5, 2.03, 1.18, 1, 1.53, 1.21, 1.62, 1.21, 1.85, 1.62, 1.47, 1.29, 1.41, 1.44, 1.15, 1.26, 1.65

What is the mean of the points per match played for teams with new training facilities in 2023?

The points per match played for teams with old training facilities in 2023 are shown below.

1.18, 0.79, 1.68, 1.35, 1.5, 1.06, 1.21, 1.26, 1.26, 1.29, 1.56, 0.65

What is the mean of the points per match played for teams with old training facilities in 2023?
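For readers who want to check their hand calculations with software, the two means can be computed in a few lines of Python. This is an optional sketch using only the standard library. The variable names `new_ppm` and `old_ppm` are illustrative and are not part of the original module.

```python
from statistics import mean

# 2023 points per match played, copied from the two lists above.
new_ppm = [1.5, 2.03, 1.18, 1, 1.53, 1.21, 1.62, 1.21, 1.85,
           1.62, 1.47, 1.29, 1.41, 1.44, 1.15, 1.26, 1.65]
old_ppm = [1.18, 0.79, 1.68, 1.35, 1.5, 1.06, 1.21, 1.26,
           1.26, 1.29, 1.56, 0.65]

mean_new = mean(new_ppm)
mean_old = mean(old_ppm)

print(f"New facilities: n = {len(new_ppm)}, mean = {mean_new:.3f}")
print(f"Old facilities: n = {len(old_ppm)}, mean = {mean_old:.3f}")
print(f"Difference in sample means (new - old): {mean_new - mean_old:.3f}")
```

A difference in sample means on its own does not say whether the difference is larger than what chance alone could produce. That is the question hypothesis testing addresses.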

The graph below shows the points per match played in 2023 for teams with new training facilities and teams without new training facilities.

Hypothesis Testing

Before we can kick off our exploration of decision errors, we need to understand the basics of hypothesis testing.

Hypothesis testing, a method used to make decisions about a population parameter based on sample data, is a foundational concept in statistics. The decision is made by comparing the sample data results to a null hypothesis. The null hypothesis is generally a statement that there is no effect, or no difference from what is assumed to be true. An alternative hypothesis is a statement that there is an effect or a difference. A hypothesis test can be done for one sample, two samples, or many samples.

Null and Alternative Hypotheses

The null hypothesis is generally denoted by $H_0$ and the alternative hypothesis is generally denoted by $H_a$. For one sample tests, the null hypothesis is often that the population parameter is equal to a specific value. For example $H_0 : \mu = \mu_0$. For multiple sample tests, the null hypothesis often states that there is no difference between the groups. For example $H_0: \mu_{1} = \mu_{2} = ... = \mu_{n}$.

NOTE: All examples in this section are using the population mean ($\mu$) as the parameter of interest. Any population parameter can be tested using hypothesis testing. Other common parameters include the population proportion ($p$) and the population variance ($\sigma^2$).

An example of a hypothesis test using proportions is shown below:

\[H_0: p = p_0\]

\[H_a: p \neq p_0\]

Where $p$ is the population proportion and $p_0$ is the assumed population proportion.

The alternative hypothesis is a statement that there is an effect or a difference. For a one sample test, the alternative hypothesis is generally that the population parameter is not equal to, greater than, or less than a specific value. For example, $H_a: \mu \neq \mu_0$, $H_a: \mu > \mu_0$, or $H_a: \mu < \mu_0$. For a multi-sample test, the alternative hypothesis is generally that there is a difference between the groups. $H_a: \text{ At least one } \mu_i \text{ is different}$.

Setting up a Hypothesis Test

Let’s set up a hypothesis test to determine if a new training facility leads to better performance on the field. We will use points per match played as our measure of performance. The null hypothesis is that there is no difference in points per match played between teams with new training facilities and teams without new training facilities. The alternative hypothesis is that the teams with new training facilities have a higher points per match played than teams without new training facilities. Let $\mu_{new}$ be the population mean points per match played for teams with new training facilities and $\mu_{old}$ be the population mean points per match played for teams without new training facilities.

Which of the following shows the correct null and alternative hypotheses for this test?

$H_0: \mu_{new} = \mu_{old}$, $H_a: \mu_{new} > \mu_{old}$
$H_0: \mu_{new} > \mu_{old}$, $H_a: \mu_{new} = \mu_{old}$
$H_0: \mu_{new} = \mu_{old}$, $H_a: \mu_{new} < \mu_{old}$
$H_0: \mu_{new} = \mu_{old}$, $H_a: \mu_{new} \neq \mu_{old}$

Significance Level and Decisions in Hypothesis Testing

In hypothesis testing, we make a decision to either reject the null hypothesis or fail to reject the null hypothesis. We make this decision based on the sample data and the significance level. The significance level, denoted by $\alpha$, is the probability of making a Type I error when the null hypothesis is true, which we will expand on soon. To make our decision we often use the p-value: the probability, assuming the null hypothesis is true, of observing sample results at least as extreme as the ones we actually observed. If the p-value is less than the significance level, we reject the null hypothesis. If the p-value is greater than the significance level, we fail to reject the null hypothesis.

Click here to read more about p-values and significance levels.

These decisions can be either correct or incorrect. A correct decision occurs when we reject the null hypothesis when the null hypothesis is false or when we fail to reject the null hypothesis when the null hypothesis is true. An incorrect decision occurs when we reject the null hypothesis when the null hypothesis is true or when we fail to reject the null hypothesis when the null hypothesis is false.

All the possible outcomes of a hypothesis test are shown in the table below.

	$H_0$ is True	$H_0$ is False
Do not reject $H_0$	Correct Decision	Type II Error
Reject $H_0$	Type I Error	Correct Decision
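The four cells of the table can also be written as a small lookup function. The sketch below is only an illustration, and the function name `classify_outcome` is made up for this example. In practice we never know whether $H_0$ is true, so the first argument is something we can only reason about hypothetically.

```python
def classify_outcome(null_is_true: bool, reject_null: bool) -> str:
    """Return the cell of the decision table for one test outcome."""
    if reject_null and null_is_true:
        return "Type I Error"       # rejected a true null (false positive)
    if not reject_null and not null_is_true:
        return "Type II Error"      # failed to reject a false null (false negative)
    return "Correct Decision"

# Print every combination of truth and decision.
for null_is_true in (True, False):
    for reject_null in (False, True):
        truth = "H0 is True" if null_is_true else "H0 is False"
        decision = "Reject H0" if reject_null else "Do not reject H0"
        print(f"{truth:12} | {decision:16} | {classify_outcome(null_is_true, reject_null)}")
```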

Type I Errors

A Type I error occurs when we reject a true null hypothesis. In other words, we conclude that there is an effect when there is no effect. Type I errors are also known as false positives. The probability of making a Type I error is denoted by $\alpha$ and is also known as the significance level. We control the likelihood of making a Type I error by setting the significance level.

A 95% confidence interval corresponds to a significance level of 0.05. At this level, if the null hypothesis is true, there is a 5% chance of making a Type I error. $\alpha = 0.05$ is the most common significance level used in hypothesis testing.

Quote: “For in fact no scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas” - Ronald Fisher

Keep in mind that the significance level can be adjusted based on the context of the hypothesis test and what type of error is more costly. Alphas of 0.05 or 0.01 are not the end-all-be-all significance levels.

In our example of the new training facilities, a Type I error would occur if we conclude that teams with new training facilities have a higher points per match played than teams without new training facilities, when there truly is no difference in points per match played between the two groups.
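One way to see what the significance level controls is to simulate many seasons in which new facilities truly make no difference and count how often a test rejects anyway. The sketch below is a simplified illustration, not the analysis used in this module: it uses a one-sided z-test with a known standard deviation instead of a t-test, and the common mean (1.35) and standard deviation (0.3) are assumed values chosen only for the simulation. The function name `simulate_type1_rate` is hypothetical.

```python
import random
from statistics import NormalDist

def simulate_type1_rate(alpha=0.05, n_new=17, n_old=12, trials=20_000, seed=1):
    """Estimate how often a one-sided z-test rejects H0 when H0 is true."""
    rng = random.Random(seed)
    mu = 1.35      # assumed common mean for BOTH groups, so H0 is true
    sigma = 0.3    # assumed known standard deviation (illustrative)
    z_crit = NormalDist().inv_cdf(1 - alpha)          # one-sided critical value
    se = sigma * (1 / n_new + 1 / n_old) ** 0.5       # std. error of the difference
    rejections = 0
    for _ in range(trials):
        new = [rng.gauss(mu, sigma) for _ in range(n_new)]
        old = [rng.gauss(mu, sigma) for _ in range(n_old)]
        z = (sum(new) / n_new - sum(old) / n_old) / se
        if z > z_crit:                                # every rejection here is a Type I error
            rejections += 1
    return rejections / trials

rate = simulate_type1_rate()
print(f"Estimated Type I error rate at alpha = 0.05: {rate:.3f}")
```

Because both groups are drawn from the same distribution, every rejection in the simulation is a false positive, and the long-run rejection rate should land close to the chosen $\alpha$.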

Exercise 1: Type I Errors

Which of the following could be a potential consequence of making a Type I error in the context of the new training facilities?

Teams may invest millions in new training facilities that don't actually help them win more games
Teams may not invest in new training facilities that could help them win more games

Would a Type I error in this context most likely have greater consequences for the team’s finances or the team’s performance?

If alpha = 0.05 and there truly is no difference in points per match played between the two groups, what is the percent chance that we reject the null hypothesis and accept the alternative hypothesis of greater points per match played for teams with new training facilities?

If alpha = 0.01, what is the percent chance of making a Type I error, given we reject the null hypothesis?

If alpha = 0.01, what is the likelihood of making a Type I error if we fail to reject the null hypothesis?

T/F: There is a best significance level to always use in hypothesis testing.

Type II Errors

A Type II error occurs when we fail to reject a false null hypothesis. We conclude that there is no effect when there is an effect. Type II errors are also known as false negatives. The probability of making a Type II error is denoted by $\beta$. The value of beta is determined by the power of the test, which is the probability of correctly rejecting a false null hypothesis. Often we want power to be at least 0.80. This means that we want the probability of making a Type II error to be at most 0.20.

\[\beta = 1 - \text{Power}\]

The power of a test depends on the sample size, the effect size, and the significance level. As a general rule, increasing the sample size, the effect size, or the significance level will increase the power of a test. The full calculations for power are beyond the scope of this lesson.

In our example of the new training facilities, a Type II error would occur if we conclude that there is no difference in points per match played between teams with new training facilities and teams without new training facilities, when in reality teams with new training facilities have a higher points per match played than teams without new training facilities.

The effect size of our test for comparing points per match played between teams with new training facilities and teams without new training facilities is found using Cohen's d, and the power is then calculated using more advanced statistical methods.
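For the curious, Cohen's d can be computed directly from the two lists of points per match played. The sketch below assumes the common pooled-standard-deviation form of Cohen's d. The module does not state which variant was used, so treat the formula and the function name `cohens_d` as assumptions made for this illustration.

```python
from statistics import mean, variance

# 2023 points per match played, copied from the Data Exploration section.
new_ppm = [1.5, 2.03, 1.18, 1, 1.53, 1.21, 1.62, 1.21, 1.85,
           1.62, 1.47, 1.29, 1.41, 1.44, 1.15, 1.26, 1.65]
old_ppm = [1.18, 0.79, 1.68, 1.35, 1.5, 1.06, 1.21, 1.26,
           1.26, 1.29, 1.56, 0.65]

def cohens_d(group_a, group_b):
    """Difference in sample means divided by the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance is the sample variance (divides by n - 1).
    pooled_var = ((n_a - 1) * variance(group_a) +
                  (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5

d = cohens_d(new_ppm, old_ppm)
print(f"Cohen's d (new vs. old facilities): {d:.2f}")
```

Cohen's d expresses the gap between the two group means in units of standard deviations, which makes it comparable across studies that use different measurement scales.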

Exercise 2: Type II Errors

Which of the following could be a potential consequence of making a Type II error in the context of the new training facilities?

Teams may not invest in new training facilities that could help them win more games
Teams may invest millions in new training facilities that don't actually help them win more games

Would a Type II error in this context most likely have greater consequences for the team’s finances or the team’s performance?

Power can sometimes be hard to come by with small samples in hypothesis testing. For example, the 2023 points per match played data includes only 17 teams with new training facilities and 12 teams without new training facilities. With these sample sizes, the samples’ effect size, and $\alpha = 0.05$, we have a power of 0.46269.

What is the probability of making a Type II error in this instance, given we fail to reject the null hypothesis? (Round to the hundredths)

What is the probability of making a Type II error in this instance, given we reject the null hypothesis? (Round to the hundredths)

What could be done to increase the power of the test?

Relationship between Type I and Type II Errors

There is a trade-off between Type I and Type II errors. For a fixed sample size and effect size, decreasing the probability of making a Type I error increases the probability of making a Type II error, and increasing the probability of making a Type I error decreases the probability of making a Type II error. This means that we need to carefully consider the consequences of each type of error when designing a hypothesis test.

In our example of the new training facilities, if we increase the significance level from 0.05 to 0.10, our power increases from 0.463 to 0.596. This means that we are more likely to detect a difference in points per match played between teams with new training facilities and teams without new training facilities. However, if there truly is no difference, we are also twice as likely to make a Type I error as before.

Below is a plot showing this trade-off between Type I and Type II errors in our example of the new training facilities.

In the plot above it can be seen that we never have as much power as we would like for any reasonable Type I error rate.
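The shape of this trade-off can be sketched with a simple normal approximation. The code below is not the calculation behind the module's plot or its power values, which come from more advanced t-based methods. It assumes a one-sided z-test, and `shift` (the true difference in means divided by its standard error) is a hypothetical value chosen for illustration, so the printed numbers will not match the module's figures.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def approx_power(alpha: float, shift: float) -> float:
    """Approximate power of a one-sided z-test.

    shift = (true difference in means) / (standard error of the difference).
    A shift of 0 means the null hypothesis is true.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)      # rejection cutoff for this alpha
    return 1 - STD_NORMAL.cdf(z_crit - shift)   # chance the test statistic clears it

shift = 1.5  # hypothetical standardized shift, for illustration only
for alpha in (0.01, 0.05, 0.10):
    power = approx_power(alpha, shift)
    print(f"alpha = {alpha:.2f}   power = {power:.3f}   beta = {1 - power:.3f}")
```

Raising alpha lowers the rejection cutoff, so power rises and beta falls. A larger sample shrinks the standard error, which increases `shift` and raises power without touching alpha.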

Exercise 3: Relationship between Type I and Type II Errors

TRUE or FALSE

Sample size affects the probability of making a Type I error.
Sample size affects the probability of making a Type II error.
Changing the significance level affects the probability of making a Type I error.
Changing the significance level affects the probability of making a Type II error.
For a test at the 0.05 significance level, increasing the sample size from 10 to 100 will decrease the probability of making a Type I error.
For a test at the 0.05 significance level, increasing the sample size from 10 to 100 will decrease the probability of making a Type II error.
A hypothesis test at the 0.01 significance level is more likely to result in a Type I error than a hypothesis test at the 0.05 significance level.
Given we hold sample size constant, a hypothesis test at the 0.01 significance level is more likely to result in a Type II error than a hypothesis test at the 0.05 significance level.

What’s Worse: Type I or Type II Errors?

The answer to this question depends on the context of the hypothesis test. In some cases, a Type I error is more serious than a Type II error. In other cases, a Type II error is more serious than a Type I error. Below are two examples related to injuries in sports.

Example where Type I error is more serious than Type II error

Assume that a new recovery method is being tested for athletes with muscle injuries. This method claims to help athletes recover faster from their injuries. The null hypothesis is that the new recovery method takes the same amount of time as traditional recovery methods. The alternative hypothesis is that the new recovery method is faster than traditional recovery methods.

A Type I error in this context would claim that the new recovery method is effective, when in reality it is not. This could result in athletes using the new recovery method and potentially worsening their injuries by returning to play too soon. A Type II error in this context would be when the new recovery method is assumed to be ineffective when it is actually effective. This would result in athletes continuing to recover following traditional recovery timelines.

The risk of more serious injuries and more setbacks in the injury recovery process appears to be more serious than the risk of athletes continuing to follow standard recovery timelines. Therefore, in this context, a Type I error is more serious than a Type II error.

Example where Type II error is more serious than Type I error

An example of a Type II error being more serious than a Type I error can be seen in the context of concussion protocols in sports. A player takes a hard hit to the head during a game. Team physicians run quick tests on the player. The null hypothesis is that the player performs as well on the test as an uninjured player and therefore does not have a concussion, while the alternative hypothesis is that the player performs worse and therefore has a concussion.

A Type I error would assume a player has a concussion when they do not. This could result in the player being taken out of the game unnecessarily, but would not have long term consequences for their health and safety. A Type II error in this context would incorrectly assume a player was fine after a head injury, when they in fact have a concussion. This could result in the player continuing to play when they should not, potentially leading to serious long-term brain damage.

The risk of serious long-term brain damage from a concussion is a much greater concern than the risk of a player being taken out of a game unnecessarily. Therefore, in this context, a Type II error is more serious than a Type I error.

Exercise 4: Consequences of Errors

Answer the following questions in the context of the new training facilities:

Which type of error would be more likely to hurt/fail to help the performance of the team?

Which type of error would be more costly financially?

Which type of error do you think is more serious in this context: Type I or Type II?

Results of the Hypothesis Test

Using our example of the new training facilities, we can perform a hypothesis test to determine if there is a difference in points per match played between teams with new training facilities and teams without new training facilities. Results of different hypothesis tests with different significance levels and power are shown below. The p-value of the test is 0.03524.

The test used to produce these results is a two-sample t-test for the means of two independent samples. For more information on tests like this click here.
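The test statistic behind a two-sample comparison like this can be sketched with the standard library. The module does not say whether the pooled or the unequal-variance (Welch) form of the t-test was used, so the Welch form below is an assumption, and `welch_t` is a name made up for this example. Turning the statistic into a p-value requires the t-distribution, which is not in Python's standard library, so that step is left to a statistics package.

```python
from statistics import mean, variance

# 2023 points per match played, copied from the Data Exploration section.
new_ppm = [1.5, 2.03, 1.18, 1, 1.53, 1.21, 1.62, 1.21, 1.85,
           1.62, 1.47, 1.29, 1.41, 1.44, 1.15, 1.26, 1.65]
old_ppm = [1.18, 0.79, 1.68, 1.35, 1.5, 1.06, 1.21, 1.26,
           1.26, 1.29, 1.56, 0.65]

def welch_t(group_a, group_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    va = variance(group_a) / n_a      # squared standard error contributed by group A
    vb = variance(group_b) / n_b      # squared standard error contributed by group B
    t = (mean(group_a) - mean(group_b)) / (va + vb) ** 0.5
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df

t_stat, df = welch_t(new_ppm, old_ppm)
print(f"t = {t_stat:.2f}, approximate df = {df:.1f}")
```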

Test No.	Significance Level	Power	Decision	Possible Error	Possible Error Rate
1	0.05	0.463	Reject Null Hypothesis	Type I Error	0.05
2	0.10	0.596	Reject Null Hypothesis	Type I Error	0.10
3	0.01	0.224	Fail to Reject Null Hypothesis	Type II Error	0.776
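The Decision, Possible Error, and Possible Error Rate columns follow mechanically from the p-value, the significance level, and the power. The sketch below reproduces them from the values reported in the table. The function name `summarize` is illustrative only.

```python
P_VALUE = 0.03524  # p-value reported for the test in this module

# (significance level, power) pairs taken from the table above.
tests = [(0.05, 0.463), (0.10, 0.596), (0.01, 0.224)]

def summarize(alpha, power, p_value=P_VALUE):
    """Return (decision, possible error, possible error rate) for one test."""
    if p_value < alpha:
        # Rejecting H0 can only be wrong if H0 is true: a Type I error (rate alpha).
        return "Reject Null Hypothesis", "Type I Error", alpha
    # Failing to reject can only be wrong if H0 is false: a Type II error (rate 1 - power).
    return "Fail to Reject Null Hypothesis", "Type II Error", round(1 - power, 3)

for number, (alpha, power) in enumerate(tests, start=1):
    decision, error, rate = summarize(alpha, power)
    print(f"Test {number}: alpha = {alpha:.2f} -> {decision}; {error} (rate {rate})")
```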

Exercise 5: Results of the Hypothesis Test

Which hypothesis test results in a decision that has the highest chance of being an error?

Which hypothesis test has the greatest chance of producing a Type I error?

Which hypothesis test has the greatest chance of making a Type II error?

If you were advising a team owner on whether to invest in new training facilities, what would you recommend?

Conclusion

In this lesson, we learned about Type I and Type II errors in hypothesis testing. We learned that a Type I error is a false positive, and the rate is controlled by the significance level of the test. We learned that a Type II error is a false negative, and the rate is controlled by the power of the test. There is a trade-off between Type I and Type II errors, and the consequences of each type of error depend on the context of the hypothesis test.

Fun Fact: The Seattle Sounders have clearly bought into the idea that new training facilities will help them win more games. Their new facility opened in 2024 and they bid farewell to the old Starfire Training Facility depicted below.

Image Source: Joe Mabel, CC BY-SA 3.0, via Wikimedia Commons

We saw in our example of new training facilities that if we performed a hypothesis test with alpha = 0.05, we would conclude that teams with new training facilities have a higher points per match played than teams without new training facilities. This conclusion is based on the p-value of 0.03524, which is less than the significance level of 0.05. However, we also learned that this conclusion could be a Type I error: if new training facilities truly make no difference, a test at this significance level would still reject the null hypothesis 5% of the time. If our decision in this case was incorrect, it could cost the team somewhere in the vicinity of $50 million for new facilities that don’t actually help the team win more games.

If we thought that risk was too high, we might set alpha = 0.01 from the beginning in order to lessen our Type I error rate. This would switch our decision to fail to reject the null hypothesis, and we would conclude that there is no difference in points per match played between teams with new training facilities and teams without new training facilities. This conclusion is based on the p-value of 0.03524, which is greater than the significance level of 0.01. Our power is very weak in this scenario, though: if teams with new training facilities truly do perform better by the amount our samples suggest, there would be a 77.6% chance of making a Type II error. In other words, there is a 77.6% chance that we would incorrectly conclude that teams with new training facilities do not perform better than teams without new training facilities, which could prevent the team from performing better on the field and generating more revenue.

If you were to advise a team owner on whether to invest in new training facilities, you would need to carefully consider the consequences of each type of error before forming your hypothesis test design and making a recommendation. It would also be important to communicate the possibility that an error had occurred, particularly if the rate of error is high (such as 77.6%).

Errors in hypothesis testing can have serious consequences, so it is important to carefully consider the significance level and power of the test when designing a hypothesis test. Increasing the sample size is the most effective way to make our hypothesis test more powerful. Just remember, when performing and interpreting hypothesis tests there is always a chance of making an error, and it is important to consider the consequences of each type of error before beginning.