The practice of science involves formulating and testing hypotheses, assertions that are capable of being proven false using a test of observed data. The null hypothesis typically corresponds to a general or default position. For example, the null hypothesis might be that there is no relationship between two measured phenomena or that a potential treatment has no effect.
The term was originally coined by English geneticist and statistician Ronald Fisher in 1935.
It is typically paired with a second hypothesis, the alternative hypothesis, which asserts a particular relationship between the phenomena. Jerzy Neyman and Egon Pearson formalized the notion of the alternative. The alternative need not be the logical negation of the null hypothesis; it predicts the results from the experiment if the alternative hypothesis is true. The use of alternative hypotheses was not part of Fisher's formulation, but became standard.
It is important to understand that the null hypothesis can never be proven. A set of data can only reject a null hypothesis or fail to reject it. For example, if a comparison of two groups (e.g., treatment and no treatment) reveals no statistically significant difference between the two, it does not mean that there is no difference in reality. It only means that there is not enough evidence to reject the null hypothesis (in other words, one fails to reject the null hypothesis).
Principle
Hypothesis testing works by collecting data and measuring how likely the particular set of data is, assuming the null hypothesis is true. If the data set is very unlikely, defined as belonging to a set of data that will only rarely be observed (usually less than 5% or 1% of the time), the experimenter rejects the null hypothesis, concluding that it is (probably) false. If the data do not contradict the null hypothesis, then only a weak conclusion can be made: namely, that the observed data set provides no strong evidence against the null hypothesis. Because the null hypothesis could be true or false in this case, in some contexts this is interpreted as meaning that the data give insufficient evidence to make any conclusion; in others it means that there is no evidence to support changing from a currently useful regime to a different one.
For instance, a certain drug may reduce the chance of having a heart attack. Possible null hypotheses are "this drug does not reduce the chances of having a heart attack" or "this drug has no effect on the chances of having a heart attack". The test of the hypothesis consists of administering the drug to half of the people in a study group as a controlled experiment. If the data show a statistically significant change in the people receiving the drug, the null hypothesis is rejected.
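As a rough illustration of the drug example, the sketch below computes a one-sided p-value with Fisher's exact test, which is only one of several tests that could be used here. The function name `fisher_one_sided` and the patient counts are hypothetical and chosen purely for illustration; they do not come from any study.

```python
from math import comb

def fisher_one_sided(events_treated, n_treated, events_control, n_control):
    """One-sided Fisher's exact test p-value.

    Null hypothesis: the drug does not reduce the chance of a heart attack.
    Returns the probability, with the table margins held fixed, of seeing
    `events_treated` or fewer heart attacks in the treated group.
    """
    total = n_treated + n_control
    total_events = events_treated + events_control
    # Smallest number of events the treated group could possibly have.
    lowest = max(0, total_events - n_control)
    favourable = 0
    for k in range(lowest, events_treated + 1):
        # Ways to place k events in the treated group and the rest in control.
        favourable += comb(total_events, k) * comb(total - total_events, n_treated - k)
    return favourable / comb(total, n_treated)

# Hypothetical counts: 5 heart attacks among 100 treated, 15 among 100 controls.
p_value = fisher_one_sided(5, 100, 15, 100)
print(f"one-sided p-value: {p_value:.4f}")
```

If the printed p-value is below the chosen significance level (for example 5%), the null hypothesis would be rejected for these made-up counts.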
Choice of H_{0}
The choice of null hypothesis (H_{0}) and consideration of directionality (see "one-tailed test") are critical. Consider the question of whether a tossed coin is fair (i.e. that on average it lands heads up 50% of the time). A potential null hypothesis is "this coin is not biased towards heads" (one-tail test). The experiment is to repeatedly toss the coin. A possible result of 5 tosses is 5 heads. Under this null hypothesis, the data are considered unlikely (with a fair coin, the probability of this is about 3%). At the 5% significance level the data lead to rejection of the null hypothesis, with the conclusion that the coin is biased.
Alternatively, the null hypothesis "this coin is fair" allows runs of tails as well as heads, increasing the probability of 5 of a kind to about 6% (two-tail test), which is no longer statistically significant at the 5% level, so the null hypothesis is not rejected.
This example illustrates one hazard of hypothesis testing: evaluating a large number of true null hypotheses against a single dataset is likely to spuriously reject some of them because of the inevitable noise in the data. However, formulating the null hypothesis before collecting data rejects a true null hypothesis only a small percentage of the time.
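The coin-toss probabilities quoted above can be checked with a short calculation. This is a minimal sketch; the helper name `binomial_tail` is an illustrative choice, not standard terminology.

```python
from math import comb

def binomial_tail(n, k, p=0.5):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# One-tailed test: probability of 5 heads in 5 tosses of a fair coin.
one_tailed = binomial_tail(5, 5)

# Two-tailed test: 5 heads or 5 tails. For a fair coin the two outcomes
# are equally likely, so the one-tailed probability is doubled.
two_tailed = 2 * one_tailed

print(one_tailed)  # 0.03125, the "3%" in the text
print(two_tailed)  # 0.0625, the "6%" in the text
```

The same data therefore fall below a 5% threshold under the one-tailed null hypothesis and above it under the two-tailed one.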
Testing for differences
In scientific and medical research, null hypotheses play a major role in testing the significance of differences in treatment and control groups. This use, while widespread, offers several grounds for criticism, including straw man, Bayesian criticism and publication bias.
The typical null hypothesis at the outset of the experiment is that no difference exists between the control and experimental groups (for the variable being compared). Other possibilities include:
- that values in samples from a given population can be modeled using a certain family of statistical distributions.
- that the variability of data in different groups is the same, although they may be centered around different values.
Example
Given the test scores of two random samples of men and women, does one group differ from the other? A possible null hypothesis is that the mean male score is the same as the mean female score:
- H_{0}: μ_{1} = μ_{2}
where:
- H_{0} = the null hypothesis
- μ_{1} = the mean of population 1, and
- μ_{2} = the mean of population 2.
A stronger null hypothesis is that the two samples are drawn from the same population, such that the variance and shape of the distributions are also equal.
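One way to test the stronger null hypothesis (both samples drawn from the same population) is a permutation test, sketched below under stated assumptions. The function name `permutation_test`, the number of permutations, the seed and the score lists are all hypothetical; the source does not prescribe a particular test.

```python
import random
from statistics import mean

def permutation_test(sample_a, sample_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means.

    Under the null hypothesis that both samples come from the same
    population, group labels are exchangeable, so the labels are shuffled
    and the shuffled mean differences are compared with the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical test scores for the two groups.
scores_1 = [72, 85, 90, 66, 78, 88, 95, 70]
scores_2 = [75, 80, 92, 68, 81, 84, 91, 73]
p_value = permutation_test(scores_1, scores_2)
print(f"estimated p-value: {p_value:.3f}")
```

A large p-value here would mean only that the data fail to reject the null hypothesis, not that the two populations are identical.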
Terminology
Much of the terminology used in connection with null hypotheses derives from the immediate relation to statistical hypothesis testing; part of this terminology is outlined here.
Simple hypothesis : Any hypothesis which specifies the population distribution completely.
Composite hypothesis : Any hypothesis which does not specify the population distribution completely.
A point hypothesis is more complicated to describe. The term arises in contexts where the set of all possible population distributions is put in parametric form. A point hypothesis is one where exact values are specified for either all the parameters or for a subset of the parameters. Formally, the case where only a subset of parameters is defined is still a composite hypothesis; nonetheless, the term point hypothesis is often applied in such cases, particularly where the hypothesis test can be structured in such a way that the distribution of the test statistic (the distribution under the null hypothesis) does not depend on the parameters whose values have not been specified under the point null hypothesis. Careful treatments of point hypotheses for subsets of parameters do consider them as composite hypotheses and study how the p-value for a fixed critical value of the test statistic varies with the parameters that are not specified by the null hypothesis.
A one-tailed hypothesis is a hypothesis in which the value of a parameter is specified as being either:
- above a certain value, or
- below a certain value.
An example of a one-tailed null hypothesis would be that, in a medical context, an existing treatment, A, is no worse than a new treatment, B. The corresponding alternative hypothesis would be that B is better than A. Here if the null hypothesis were accepted (i.e. there is no reason to reject the hypothesis that A is at least as good as B), the conclusion would be that treatment A should continue to be used. If the null hypothesis were rejected, the result would be that treatment B would be used in future, given that there is evidence that it is better than A. A hypothesis test would look for evidence that B is better than A, not for evidence that the outcomes of treatments A and B are different. Formulating the hypothesis as a "better than" comparison is said to give the hypothesis directionality.
Directionality
Quite often statements of point null hypotheses appear not to have a "directionality", namely, that values larger or smaller than a hypothesized value are conceptually identical. However, null hypotheses can and do have "direction"; in many instances statistical theory allows the formulation of the test procedure to be simplified, so that the test is equivalent to testing for an exact identity. For instance, if the one-tailed alternative hypothesis is "application of Drug A will lead to increased growth in patients", then the true null hypothesis is the opposite of the alternative hypothesis, i.e. "application of Drug A will not lead to increased growth in patients" (a composite null hypothesis). The effective null hypothesis will be "application of Drug A will have no effect on growth in patients" (a point null hypothesis).
In order to understand why the effective null hypothesis is valid, it is instructive to consider the above hypotheses. The alternative predicts that exposed patients experience increased growth compared to the control group. That is,
- H_{1}: μ_{drug} > μ_{control} (where μ = the patients' mean growth)
The true null hypothesis is:
- H_{T}: μ_{drug} ≤ μ_{control}
The effective null hypothesis is:
- H_{0}: μ_{drug} = μ_{control}
The reduction occurs because, in order to gauge support for the alternative, classical hypothesis testing requires calculating how often the results would be as or more extreme than the observations. This requires first measuring the probability of rejecting the null hypothesis for each possibility it includes, and second ensuring that these probabilities are all less than or equal to the test's quoted significance level. For reasonable test procedures the largest such probability occurs on the boundary of the region H_{T}, specifically for the cases included in H_{0} only. Thus the test procedure can be defined (that is, the critical values can be defined) for testing the null hypothesis H_{T} exactly as if the null hypothesis of interest were the reduced version H_{0}.
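The boundary argument can be illustrated numerically. The sketch below assumes a one-sided z-test on a mean difference with a known standard deviation, which is a simplification not stated in the text; the function name `rejection_probability` and the values of `n`, `sigma` and `alpha` are hypothetical.

```python
from statistics import NormalDist

def rejection_probability(true_diff, n, sigma=1.0, alpha=0.05):
    """Probability that a one-sided z-test rejects H_0.

    The test rejects when the standardized sample mean difference exceeds
    the critical value chosen for significance level `alpha`.
    `true_diff` is the actual value of mu_drug - mu_control.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    # The test statistic is normal with this mean and unit variance.
    shift = true_diff * n ** 0.5 / sigma
    return 1 - std_normal.cdf(z_crit - shift)

# True differences allowed by H_T (mu_drug <= mu_control).
for diff in (-1.0, -0.5, -0.1, 0.0):
    print(diff, round(rejection_probability(diff, n=25), 4))
```

Under these assumptions the rejection probability grows as the true difference approaches zero and reaches the significance level only at the boundary, which is why critical values computed for H_{0} also control the error rate over all of H_{T}.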
Fisher said, "the null hypothesis must be exact, that is free of vagueness and ambiguity, because it must supply the basis of the 'problem of distribution,' of which the test of significance is the solution", implying a more restrictive domain for H_{0}. According to this view, the null hypothesis must be numerically exact: it must state that a particular quantity or difference is equal to a particular number. In classical science, it is most typically the statement that there is no effect of a particular treatment; in observations, it is typically that there is no difference between the value of a particular measured variable and that of a prediction. The majority of null hypotheses in practice do not meet this "exactness" criterion. For example, consider the usual test that two means are equal where the true values of the variances are unknown: exact values of the variances are not specified.
Most statisticians believe that it is valid to state direction as part of the null hypothesis, or as part of a null hypothesis/alternative hypothesis pair. The logic is quite simple: if the direction is omitted, then if the null hypothesis is not rejected it is quite confusing to interpret the conclusion. For example, consider an H_{0} that claims the population mean equals some value μ_{0}, with the one-tailed alternative that the mean is greater than μ_{0}. If the sample evidence gives x-bar equal to −200 and the corresponding t-test statistic equals −50, what is the conclusion? Not enough evidence to reject the null hypothesis? Surely not! But we cannot accept the one-sided alternative in this case. Therefore, to overcome this ambiguity, it is better to include the direction of the effect if the test is one-sided. The statistical theory required to deal with the simple cases dealt with here, and more complicated ones, makes use of the concept of an unbiased test.
Sample size
Statistical hypothesis testing involves performing the same experiment on multiple subjects. The number of subjects is known as the sample size. The properties of the procedure depend on the sample size. Even if a null hypothesis does not hold for the population, an insufficient sample size may prevent its rejection. If sample size is under a researcher's control, a good choice depends on the statistical power of the test, the effect size that the test must reveal and the desired significance level. The significance level is the probability of rejecting the null hypothesis when the null hypothesis holds in the population. The statistical power is the probability of rejecting the null hypothesis when it does not hold in the population (i.e., for a particular effect size).
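The relationship between sample size, power, effect size and significance level can be sketched for one simple case. The example assumes a one-sided z-test on a single mean with known standard deviation, which is an illustrative simplification; the function name `required_sample_size` and its default values are hypothetical.

```python
import math
from statistics import NormalDist

def required_sample_size(effect, sigma=1.0, alpha=0.05, power=0.8):
    """Smallest n for which a one-sided z-test reaches the requested power.

    `effect` is the true mean difference the test must reveal, `sigma` the
    known standard deviation, `alpha` the significance level and `power`
    the desired probability of rejecting a false null hypothesis.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    z_power = std_normal.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) * sigma / effect) ** 2)

# Smaller effects require more subjects for the same power.
for effect in (1.0, 0.5, 0.2):
    print(effect, required_sample_size(effect))
```

Under these assumptions, halving the effect size roughly quadruples the required number of subjects, which shows how an insufficient sample can prevent rejection of a null hypothesis that is actually false.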