P-value



 
 








In statistical hypothesis testing, the p-value is the probability of obtaining a result at least as extreme as the one that was actually observed, assuming that the null hypothesis is true. The fact that p-values are based on this assumption is crucial to their correct interpretation.

More technically, a p-value of an experiment is a random variable defined over the sample space of the experiment such that its distribution under the null hypothesis is uniform on the interval [0,1]. The uniformity is exact when the test statistic is continuous; for a discrete statistic, such as a count of coin flips, the p-value can take only finitely many values, so it holds only approximately. Many p-values can be defined for the same experiment.
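The definition above can be written out symbolically. This is a sketch under stated assumptions rather than the only possible formulation: it assumes an upper-tailed test in which larger values of a test statistic T are "more extreme", writes t(x) for the value of T at the observed outcome x, and assumes T has a continuous distribution under the null hypothesis H0 (which is what makes the second statement an exact equality).

```latex
p(x) = \Pr\bigl(T \ge t(x) \mid H_0\bigr),
\qquad
\Pr\bigl(p(X) \le \alpha \mid H_0\bigr) = \alpha
\quad \text{for all } \alpha \in [0,1].
```

The second statement is the uniformity property: under H0, a p-value of α or less occurs with probability α.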

Coin flipping example


For example, an experiment is performed to determine whether a coin flip is fair (50% chance of landing heads or tails) or unfairly biased, either toward heads (> 50% chance of landing heads) or toward tails (< 50% chance of landing heads). (A bent coin produces biased results.)

Since we consider both biased alternatives, a two-tailed test is performed. The null hypothesis is that the coin is fair, and that any deviations from the 50% rate can be ascribed to chance alone.

Suppose that the experimental results show the coin turning up heads 14 times out of 20 total flips. The p-value of this result would be the chance of a fair coin landing on heads at least 14 times out of 20 flips plus the chance of a fair coin landing on tails 14 or more times out of 20 flips. In this case the number of heads, T, has a binomial distribution with 20 trials and success probability 1/2 under the null hypothesis. The probability that 20 flips of a fair coin would result in 14 or more heads is 0.0577. By symmetry, the probability that 20 flips of the coin would result in 14 or more tails (alternatively, 6 or fewer heads) is the same, 0.0577. Thus, the p-value for the coin turning up the same face 14 times out of 20 total flips is 0.0577 + 0.0577 = 0.1154 (summing the rounded tail probabilities; the unrounded total is about 0.1153).
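The tail probabilities in this example can be checked directly from the binomial distribution. The following is a minimal sketch using only the Python standard library; the function names are illustrative and not taken from any source. Note that the exact two-tailed sum rounds to 0.1153, marginally below the 0.1154 obtained by adding the two rounded tails.

```python
from math import comb


def upper_tail(n: int, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, 1/2), i.e. a fair coin."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def lower_tail(n: int, k: int) -> float:
    """P(X <= k) for X ~ Binomial(n, 1/2)."""
    return sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n


n_flips = 20
p_heads = upper_tail(n_flips, 14)   # 14 or more heads
p_tails = lower_tail(n_flips, 6)    # 6 or fewer heads, i.e. 14 or more tails
two_tailed = p_heads + p_tails

print(round(p_heads, 4))     # 0.0577
print(round(p_tails, 4))     # 0.0577
print(round(two_tailed, 4))  # 0.1153
```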

Interpretation

Generally, one rejects the null hypothesis if the p-value is smaller than or equal to the significance level, often represented by the Greek letter α (alpha). If the level is 0.05, then the null hypothesis is rejected only when a result at least as extreme as the one observed would occur no more than 5% of the time if the null hypothesis were true.

In the above example we have:
  • null hypothesis (H0): fair coin;
  • observation (O): 14 heads out of 20 flips; and
  • p-value: the probability, given H0, of a result at least as extreme as O, which is 0.0577 × 2 (two-tailed) = 0.1154 (11.54% expressed as a percentage).
The calculated p-value exceeds 0.05, so the observation is consistent with the null hypothesis — that the observed result of 14 heads out of 20 flips can be ascribed to chance alone — as it falls within the range of what would happen 95% of the time were this in fact the case. In our example, we fail to reject the null hypothesis at the 5% level. Although the coin did not fall evenly, the deviation from expected outcome is just small enough to be reported as being "not statistically significant at the 5% level".

However, had a single extra head been obtained, the resulting p-value (two-tailed) would be 0.0414 (4.14%). This time the null hypothesis - that the observed result of 15 heads out of 20 flips can be ascribed to chance alone - is rejected. Such a finding would be described as being "statistically significant at the 5% level".
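The decision rule for the two outcomes discussed here (14 and 15 heads) can be sketched as follows. This is an illustrative example, not a reference implementation: the helper name `two_sided_p_value` and the variable `alpha` are assumptions introduced for the sketch, and the doubling of one tail relies on the symmetry of the fair-coin null hypothesis.

```python
from math import comb


def two_sided_p_value(n: int, heads: int) -> float:
    """Two-tailed p-value for `heads` out of `n` flips under a fair coin.

    Doubles the probability of the more extreme tail, which is valid
    here because Binomial(n, 1/2) is symmetric. Capped at 1.
    """
    extreme = max(heads, n - heads)
    one_tail = sum(comb(n, i) for i in range(extreme, n + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)


alpha = 0.05  # significance level, chosen before looking at the data

for heads in (14, 15):
    p = two_sided_p_value(20, heads)
    decision = "reject H0" if p <= alpha else "fail to reject H0"
    print(f"{heads} heads: p = {p:.4f} -> {decision}")

# All head counts (out of 20) that would lead to rejection at this level.
rejection_region = [h for h in range(21) if two_sided_p_value(20, h) <= alpha]
print(rejection_region)
```

With 20 flips and a 5% level, the rejection region works out to 5 or fewer heads, or 15 or more, which is why a single extra head changes the conclusion.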

Critics of p-values point out that the criterion used to decide "statistical significance" is based on the somewhat arbitrary choice of level (often set at 0.05). A proposed replacement for the p-value is p-rep. It is necessary to use a reasonable null hypothesis to assess the result fairly. The choice of null hypothesis entails assumptions.

Frequent misunderstandings


Comparing the p-value to a significance level yields one of two results: either the null hypothesis is rejected, or the null hypothesis cannot be rejected at that significance level. The null hypothesis cannot be accepted simply on the basis of this comparison (in the coin example, 11% > 5%); alternative tests have to be performed, such as some "goodness of fit" tests. It would be very irresponsible to conclude that the null hypothesis should be accepted merely because the p-value is larger than the chosen significance level.

The use of p-values is widespread; however, such use has come under heavy criticism due both to its inherent shortcomings and the potential for misinterpretation.

There are several common misunderstandings about p-values.

  1. The p-value is not the probability that the null hypothesis is true. (This false conclusion is used to justify the "rule" of considering a result to be significant if its p-value is very small (near zero).)
    In fact, frequentist statistics does not, and cannot, attach probabilities to hypotheses. Comparison of Bayesian and classical approaches shows that a p-value can be very close to zero while the posterior probability of the null is very close to unity. This is the Jeffreys-Lindley paradox.
  2. The p-value is not the probability that a finding is "merely a fluke." (Again, this conclusion arises from the "rule" that small p-values indicate significant differences.)
    As the calculation of a p-value is based on the assumption that a finding is the product of chance alone, it patently cannot also be used to gauge the probability of that assumption being true. The real meaning is subtly different: the p-value is the probability of obtaining a result at least as extreme as the one observed, given that the null hypothesis is true. It says how compatible the data are with the null hypothesis, not how likely the null hypothesis is.
  3. The p-value is not the probability of falsely rejecting the null hypothesis. This error is a version of the so-called prosecutor's fallacy.
  4. The p-value is not the probability that a replicating experiment would not yield the same conclusion.
  5. 1 − (p-value) is not the probability of the alternative hypothesis being true (see (1)).
  6. The significance level of the test is not determined by the p-value.
    The significance level of a test is a value that should be decided upon by the agent interpreting the data before the data are viewed, and is compared against the p-value or any other statistic calculated after the test has been performed.
  7. The p-value does not indicate the size or importance of the observed effect (compare with effect size).
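The gap between a p-value and the probability that the null hypothesis is true (misunderstanding 1 and the Jeffreys-Lindley paradox) can be illustrated numerically. The sketch below rests on assumptions that are not part of the text above: hypothetical data of 5,100 heads in 10,000 flips, a prior probability of 1/2 for the fair-coin hypothesis, and, under the alternative, a uniform prior on the coin's heads probability. Different priors would give different posterior values.

```python
from math import comb

n, heads = 10_000, 5_100  # hypothetical data, chosen for illustration

# Frequentist side: exact two-tailed p-value under a fair coin.
# Build C(n, i) for i = heads..n iteratively to keep the sum cheap.
term = comb(n, heads)
tail = 0
for i in range(heads, n + 1):
    tail += term
    term = term * (n - i) // (i + 1)  # C(n, i+1) from C(n, i), exact
p_value = 2 * tail / 2 ** n           # doubling uses symmetry of the null

# Bayesian side (assumed priors, see lead-in).
# Likelihood of exactly `heads` heads under H0 (fair coin):
like_h0 = comb(n, heads) / 2 ** n
# Marginal likelihood under H1 with a uniform prior on the heads
# probability: the integral of C(n,k) q^k (1-q)^(n-k) over [0,1] is 1/(n+1).
like_h1 = 1 / (n + 1)
# Posterior probability of H0 with equal prior weight on H0 and H1.
posterior_h0 = like_h0 / (like_h0 + like_h1)

print(f"two-tailed p-value     = {p_value:.4f}")
print(f"posterior P(H0 | data) = {posterior_h0:.3f}")
```

Under these assumptions the result is "significant at the 5% level", yet the posterior probability of the null hypothesis comes out above 0.9, so the p-value clearly cannot be read as the probability that the null hypothesis is true.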


See also

  • Counternull
  • Statistical hypothesis testing
  • Binomial test


Additional reading

  • Dallal GE (2007)
  • Hubbard R, Armstrong JS (2005) (PDF)
  • Fisher's method for combining independent tests of significance using their p-values
  • Dallal GE (2007) (A good tutorial)


External links

  • for various specific tests (chi-square, Fisher's F-test, etc.).
  • , including a Java applet that illustrates how the numerical values of p-values can give quite misleading impressions about the truth or falsity of the hypothesis under test.