Updated October 15, 2022

What is a P-Value?

Robinhood Learn


Definition:

A p-value is a “probability value” used in hypothesis testing. It measures how likely you would be to see results at least as extreme as yours if there were really no relationship between two variables.

🤔 Understanding p-value

When you’re testing a hypothesis, you can use a p-value to determine the statistical significance of your result. If the result is statistically significant, it would be unlikely to arise from chance alone if there were no real effect. The assumption that there is no relationship between two variables is referred to as the “null hypothesis.” So, if the data give strong evidence against the null hypothesis, they support your alternative hypothesis, although they never prove it. To decide whether the evidence is strong enough to reject the null hypothesis, you choose a benchmark level of significance in advance (typically 0.10, 0.05, or 0.01). If the p-value of your statistical test is below the selected level of significance, then you have sufficient evidence to reject the null hypothesis.

Example

Let’s assume that you want to check a claim about the average age of a group of 1,000 students. Your null hypothesis is that the average age of the students is 19. The alternative hypothesis is that the average age is not 19.

After taking a sample of 100 students, you find that the average age of the sampled group is 18. You then choose a level of significance of 0.05 and calculate the p-value to test the null hypothesis (that the population’s average age is 19).

P-value < 0.05: Sufficient statistical evidence to reject the null hypothesis.
P-value ≥ 0.05: Insufficient statistical evidence, so you can’t reject the null hypothesis.

The p-value doesn’t tell you whether or not the null hypothesis is correct. Instead, it is the probability of getting a sample average at least as far from 19 as the one you observed, assuming the true average really is 19. A p-value below 0.05 means a result like yours would be unusual if the null hypothesis were true, which lends weight to the theory that the average student age is not actually 19.
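To make the student-age example concrete, here is a minimal sketch of one way the p-value could be computed, using a two-sided z-test. The example gives no standard deviation, so the value of 2 years below is purely an assumption for illustration, and the function name is hypothetical. The sketch also ignores any correction for sampling 100 students out of only 1,000.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_z_test(sample_mean, null_mean, sd, n):
    """Return (z, p_value) for a two-sided z-test of a mean."""
    standard_error = sd / sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    # Probability of a z-score at least this far from 0 in either direction,
    # assuming the null hypothesis is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# ASSUMPTION: a standard deviation of 2 years (not given in the example).
ASSUMED_SD = 2.0

z, p = two_sided_z_test(sample_mean=18, null_mean=19, sd=ASSUMED_SD, n=100)
print(f"z = {z:.2f}, p-value = {p:.2g}")
print("Reject the null" if p < 0.05 else "Can't reject the null")
```

With a different assumed standard deviation, the p-value would change, so the outcome here depends on that assumption as much as on the sample average.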

Takeaway

Testing hypotheses with a p-value is like playing Family Feud…

In the TV show, a team tries to find out if its selected answer matches the top ones from a survey of 100 individuals. If enough survey respondents choose the team’s response, then the team is able to back up that its answer is significant to the game. Likewise, p-value acts like Family Feud’s 100-person survey, allowing you to test a null hypothesis statistically.


What is the p-value?

A p-value is a probability used in hypothesis testing to judge how well your data agree with a baseline claim (e.g., the average price that my customers are willing to pay for shoes is $20). Hypothesis testing works by “playing devil’s advocate”: you assume the baseline claim is true, then check whether your data are surprising enough to reject it in favor of the opposite proposition (e.g., the average price that my customers are willing to pay for shoes is not $20).

The null hypothesis is the claim that there is no relationship between two variables (or no difference from a baseline value). Meanwhile, the alternative hypothesis is the opposite: there is a relationship or a difference.

When you’re able to gather enough evidence to reject the null hypothesis, you have a stronger case for the alternative hypothesis, and that case is backed by statistical data.

Calculating the p-value

You can calculate a p-value using a statistical table or software. A critical step is choosing your significance level before you look at the data. Researchers typically work at a 90%, 95%, or 99% confidence level, which corresponds to a significance level of 0.10, 0.05, or 0.01, respectively (the significance level is one minus the confidence level). The lower the significance level, the stronger the evidence you need before rejecting the null hypothesis. For example, a 0.01 significance level is stricter than a 0.05 significance level.

When using a statistical table to calculate a p-value, you’ll obtain an estimated probability value (e.g., a p-value between 0.05 and 0.01). Some statistical tables provide a more detailed breakdown of p-values than others.

For example, one table may stop at the 99% confidence level, and another goes an extra step to the 99.99% confidence level. If you were to require the exact p-value, you could use software, such as Excel’s Analysis ToolPak add-in, to calculate a highly accurate p-value.

Interpreting the p-value

Once you calculate your p-value, you test it against your chosen significance level:

P-value < significance level: Reject the null hypothesis due to sufficient statistical evidence.
P-value ≥ significance level: Can’t reject the null hypothesis due to insufficient statistical evidence.

What do p-values tell you about statistical data?

P-values tell you how compatible your data are with a null hypothesis (the belief that there is no relationship between two variables). A p-value is not the probability that the null hypothesis is true.

If the p-value is below the selected significance level, then you’re able to reject the null hypothesis. Let’s assume that you choose the conventional 0.05 significance level and that you find a p-value of 0.03 with your data set.

In this example, the p-value of 0.03 indicates that, if the null hypothesis were true, there would be only a 3% probability of seeing results at least as extreme as yours. Because 0.03 is below 0.05, you can reject the null hypothesis.

Meanwhile, a p-value above the significance level tells you that results like yours would not be unusual if the null hypothesis were true. By choosing your significance level in advance, you set a benchmark to make an objective decision. At the 0.05 significance level, any p-value of 0.05 or greater is considered too consistent with chance to reject the null hypothesis.

For example, a p-value of 0.07 indicates that there would be a 7% probability of results at least this extreme under the null hypothesis, and you can’t reject the null hypothesis in this case.
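The comparison described above can be sketched in a few lines of Python. The function name `decide` is hypothetical and the default threshold of 0.05 is just the conventional choice from the example; this is an illustration of the decision rule, not a full testing procedure.

```python
def decide(p_value, significance_level=0.05):
    """Apply the decision rule: reject only if p is strictly below the threshold."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < significance_level:
        return "reject the null hypothesis"
    return "can't reject the null hypothesis"


# The two p-values from the examples above, tested at the 0.05 threshold.
for p in (0.03, 0.07):
    print(f"p = {p}: {decide(p)}")
```

Note that a p-value exactly equal to the threshold falls on the "can't reject" side, matching the ≥ in the rule.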

How are p-values calculated?

A p-value can be calculated using a formula, statistical table, or software (such as Microsoft Excel).

First, let’s set up the following scenario: Jimmy wants to find out if people in his town can recognize his “world-famous Caesar dressing” from the generic one sold by his local grocer.

He selects 50 individuals from his town for a sample test. Each subject tastes dressing from five containers. Only one container contained Jimmy’s Caesar dressing, and the order was randomized.

Jimmy’s null hypothesis is that consumers won’t be able to recognize his “world-famous Caesar dressing.”
The alternative hypothesis is that consumers are able to recognize his special dressing.
Jimmy expects 10 people (one in five) might pick his dressing by chance since it’s one of the five options provided.
When Jimmy runs the sample test with the 50 subjects, 18 of them recognized his “world-famous Caesar dressing.”
The experiment’s “degrees of freedom” equal the number of possible outcome categories minus one. Since each subject’s answer falls into one of two categories (picked Jimmy’s dressing or picked a generic one), the experiment has one degree of freedom.
Jimmy selects the 0.05 significance level.

Now let’s look at two different ways to calculate Jimmy’s p-value:

Example of p-value calculation using the formula and statistical table

Different scenarios require different approaches (e.g., t-test, z-test, chi square test) to calculate the p-value. If you were trying to determine whether or not a sample represents a population, you would use a different p-value calculation approach than if you were trying to establish whether or not two samples are comparable.
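As one illustration of a different approach, Jimmy's scenario (50 tasters, 18 correct picks, a one-in-five chance of a lucky guess) could also be checked with an exact binomial tail probability. This is a sketch of an alternative method, not the calculation this article walks through, and the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(successes, trials, chance):
    """P(X >= successes) when X ~ Binomial(trials, chance)."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Probability that 18 or more of 50 tasters pick Jimmy's dressing
# purely by guessing among five containers.
p_value = binomial_upper_tail(successes=18, trials=50, chance=1 / 5)
print(f"one-sided exact p-value = {p_value:.4f}")
```

Because this is a one-sided test with its own assumptions, its p-value need not match the chi square result exactly, but it answers the same question: how surprising are 18 correct picks if everyone is guessing?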

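To find the p-value of Jimmy’s example, you can use the chi square formula, a measure of the difference between the observed and expected values in an experiment. The “chi square” (χ²) value is the squared difference between the observed and expected counts, divided by the expected count:

χ² = (observed − expected)² / expected = (18 − 10)² / 10 = 64 / 10 = 6.4

A complete chi square test adds the same term for every outcome category. Including the 32 subjects who didn’t pick Jimmy’s dressing (against 40 expected) would add (32 − 40)² / 40 = 1.6, for a total of 8.0 and even stronger evidence. The rest of this example works with the single term, 6.4.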
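The arithmetic can be sketched in Python as follows. The helper name is hypothetical. The article's calculation uses only the term for the subjects who picked Jimmy's dressing; summing over both outcome categories, as a standard chi square goodness-of-fit test would, is shown alongside it as an assumption about how a fuller calculation might look.

```python
def chi_square_terms(observed, expected):
    """Return the (observed - expected)^2 / expected term for each category."""
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]


trials = 50
expected_hits = trials / 5      # one in five picks expected by chance -> 10
observed_hits = 18

# Category order: picked Jimmy's dressing, picked a generic one.
terms = chi_square_terms(
    observed=[observed_hits, trials - observed_hits],
    expected=[expected_hits, trials - expected_hits],
)

article_value = terms[0]        # the single term used in the article
full_statistic = sum(terms)     # ASSUMPTION: full test sums both categories

print(f"single term: {article_value:.1f}")
print(f"sum over both categories: {full_statistic:.1f}")
```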

Once you have your chi square value, you’ll use a statistical table to estimate the p-value.

Here’s how to read the table: The first column (df) indicates your degrees of freedom, and you move across that row until you reach the values closest to your chi square value. In this example, there is one degree of freedom; in the first row, 6.4 falls between 5.024 (the 0.975 column) and 6.635 (the 0.99 column).

The experiment’s p-value is therefore between 0.025 (1 – 0.975) and 0.01 (1 – 0.99). Both bounds are below the 0.05 significance level, so Jimmy has sufficient evidence to reject the null hypothesis.
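The table lookup can be mimicked with a short sketch. The critical values below are the standard chi square table entries for one degree of freedom, rounded to three decimals and keyed by right-tail probability (one minus the column heading used above). The function name is hypothetical, and the sketch only covers the df = 1 row.

```python
# (right-tail probability, chi square critical value) for df = 1.
DF1_CRITICAL_VALUES = [
    (0.10, 2.706),
    (0.05, 3.841),
    (0.025, 5.024),
    (0.01, 6.635),
    (0.005, 7.879),
]


def bracket_p_value(chi_square):
    """Return (lower, upper) bounds on the p-value from the df = 1 table row."""
    lower, upper = 0.0, 1.0
    for tail_probability, critical_value in DF1_CRITICAL_VALUES:
        if chi_square >= critical_value:
            upper = tail_probability    # p is at most this
        else:
            lower = tail_probability    # p is above this
            break
    return lower, upper


low, high = bracket_p_value(6.4)
print(f"p-value is between {low} and {high}")
```

A table can only bracket the p-value this way; getting a single number requires software.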

In other words, the townspeople’s ability to recognize his “world-famous Caesar dressing” from the generic one sold by his local grocer is unlikely to be due to chance alone.

Example of p-value calculation using Microsoft Excel

A limitation of using a statistical table to calculate the p-value is that you only get a range. With software, you can find the exact p-value. One way to do this is with Excel.

Using our scenario, you can use the CHISQ.DIST function with a chi square value of 6.4 and one degree of freedom to get the cumulative probability of 0.98858, which gives a p-value of 0.01142 (1 – 0.98858). Excel’s CHISQ.DIST.RT function returns this right-tail probability directly.

This p-value is still below the 0.05 significance level, so it also supports Jimmy rejecting the null hypothesis.
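The same exact p-value can be reproduced outside Excel. The sketch below uses only Python's standard library and relies on the fact that, with one degree of freedom, a chi square value is a squared standard normal value. It therefore works only for df = 1, and the function name is hypothetical.

```python
from math import erfc, sqrt


def chi_square_p_value_df1(chi_square):
    """Right-tail probability of a chi square value with ONE degree of freedom."""
    if chi_square < 0:
        raise ValueError("a chi square value can't be negative")
    # With one degree of freedom, chi square is a squared standard normal,
    # so its right tail equals erfc(sqrt(x / 2)).
    return erfc(sqrt(chi_square / 2))


p_value = chi_square_p_value_df1(6.4)
print(f"p-value = {p_value:.5f}")
print("Reject the null" if p_value < 0.05 else "Can't reject the null")
```

For other degrees of freedom, a statistics library would be the more practical route.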

What are the significant p-values?

When statistically testing a hypothesis, a significant p-value is any p-value below the selected significance level. The commonly used significance levels are 0.1, 0.05, and 0.01, which correspond to testing hypotheses at 90%, 95%, or 99% confidence, respectively.

If you were to choose a 0.01 significance level, then significant p-values would be those smaller than 0.01.

Significant p-values serve as the basis to make an objective decision about a hypothesis. A study’s resulting p-value is compared against the significance level to decide whether or not to reject the null hypothesis.

P-value < significance level: The p-value is significant, and you can reject the null hypothesis.
P-value ≥ significance level: The p-value isn’t statistically significant, and you can’t reject the null hypothesis.

What are Type I and Type II errors?

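When testing whether or not to reject a hypothesis, you have the possibility of making a wrong decision. The summary below shows the two possible types of errors in hypothesis testing: Type I error and Type II error.

Null hypothesis is true, and you reject it: Type I error.
Null hypothesis is false, and you don’t reject it: Type II error.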

A Type I error takes place when you incorrectly dismiss the null hypothesis. Typically referred to as the “alpha,” the probability of making a Type I error is determined by the significance level.

For example, if you were to choose a 0.05 significance level, then you have a 5% chance of dismissing a true null hypothesis or a 5% chance of making a Type I error.
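A small simulation can illustrate this. The sketch below repeatedly draws data for which the null hypothesis is true by construction, runs a two-sided z-test each time, and counts how often the test rejects anyway. The setup (normal data, known standard deviation of 1, 30 observations, 4,000 repetitions, fixed random seed) is entirely an assumption chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulate_type_i_rate(alpha=0.05, trials=4000, n=30, seed=0):
    """Fraction of tests that reject when the null hypothesis is true."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        # Data drawn with mean 0 and sd 1, so the null (mean = 0) is true.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) / (1.0 / sqrt(n))
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        if p_value < alpha:
            rejections += 1     # a Type I error
    return rejections / trials


rate = simulate_type_i_rate()
print(f"Share of true nulls rejected at alpha = 0.05: {rate:.3f}")
```

The share should land close to 0.05, though it will vary a little with the seed and the number of repetitions.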

A Type II error happens when you don’t reject a false null hypothesis. The probability of making a Type II error is referred to as the “beta.” Using the beta, you can define the “power of a test,” which is one minus the beta: your probability of properly rejecting a false null hypothesis.

While a high power is desirable, there is a trade-off: for a given sample size, decreasing the chance of making a Type II error increases the chance of making a Type I error.
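The trade-off can be seen in a worked calculation. The sketch below computes the power of a one-sided z-test at several significance levels. The effect size, standard deviation, and sample size are hypothetical values picked for illustration, and the function name is an assumption as well.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_z_power(alpha, effect, sd, n):
    """Power of a one-sided z-test when the true mean exceeds the null by `effect`."""
    std_normal = NormalDist()
    critical_z = std_normal.inv_cdf(1 - alpha)   # rejection cutoff under the null
    shift = effect / (sd / sqrt(n))              # true effect in standard-error units
    return 1 - std_normal.cdf(critical_z - shift)


# ASSUMPTION: hypothetical effect of 0.5, sd of 1, and 20 observations.
for alpha in (0.10, 0.05, 0.01):
    power = one_sided_z_power(alpha, effect=0.5, sd=1.0, n=20)
    print(f"alpha = {alpha:.2f}: power = {power:.3f}, beta = {1 - power:.3f}")
```

With the sample size held fixed, tightening alpha (fewer Type I errors) lowers the power, which means a larger beta (more Type II errors).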

What are the problems with using p-values?

One of the main problems with using p-values is that doing so involves choosing a potentially arbitrary threshold. While 0.05 is often used to test statistical significance, it isn’t a one-size-fits-all benchmark.

P-values are a useful tool for evaluating a hypothesis, but they can’t be the sole basis for an entire body of research. For the sake of meeting a level of significance, some researchers may turn to “p-hacking”: running analyses in different ways until something “significant” emerges.
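A quick calculation shows why trying many analyses is a problem. The sketch below assumes the tests are independent and that every null hypothesis is actually true, which is a simplification; the function name is hypothetical.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """P(at least one p-value < alpha) across independent tests of true nulls."""
    return 1 - (1 - alpha) ** num_tests


for num_tests in (1, 5, 20):
    chance = chance_of_false_positive(num_tests)
    print(f"{num_tests:>2} tests: {chance:.1%} chance of at least one 'significant' result")
```

Under these assumptions, running 20 tests gives roughly a 64% chance that at least one looks significant even though nothing real is going on.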

Additionally, the American Statistical Association (ASA) notes that a p-value indicates how incompatible the data are with a specified statistical model, but it doesn’t measure the probability that the hypothesis itself is true.

Correlation doesn’t necessarily mean causation, and overlooking this may lead to misinterpretation of p-values. The ASA suggests supplementing p-values with other approaches, such as confidence intervals, as appropriate, and not letting p-values become the end-all of research.

Another problem with using p-value analysis is the probability of making an incorrect rejection of the null hypothesis (Type I error) or failing to reject a false null hypothesis (Type II error).
