An F-test is any statistical test in which the test statistic has an F-distribution under the null hypothesis. It is most often used when comparing statistical models that have been fit to a data set, in order to identify the model that best fits the population from which the data were sampled. Exact F-tests mainly arise when the models have been fit to the data using least squares. The name was coined by George W. Snedecor, in honour of Sir Ronald A. Fisher. Fisher initially developed the statistic as the variance ratio in the 1920s.
Common examples of F-tests

Examples of F-tests include:

- The hypothesis that the means of several normally distributed populations, all having the same standard deviation, are equal. This is perhaps the best-known F-test, and plays an important role in the analysis of variance (ANOVA).
- The hypothesis that a proposed regression model fits the data well. See Lack-of-fit sum of squares.
- The hypothesis that a data set in a regression analysis follows the simpler of two proposed linear models that are nested within each other.
- Scheffé's method for multiple comparisons adjustment in linear models.
F-test of the equality of two variances

This F-test is extremely sensitive to non-normality. In the analysis of variance (ANOVA), alternative tests include Levene's test, Bartlett's test, and the Brown–Forsythe test. However, when any of these tests are conducted to test the underlying assumption of homoscedasticity (i.e. homogeneity of variance), as a preliminary step to testing for mean effects, there is an increase in the experiment-wise Type I error rate.
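As a sketch of this variance-ratio test (the two samples below are made-up values, chosen only for illustration):

```python
# F-test of the equality of two variances: F = s1^2 / s2^2, referred to
# an F-distribution with (n1 - 1, n2 - 1) degrees of freedom.
# The data are illustrative made-up samples, not from any real study.

def sample_variance(xs):
    """Unbiased sample variance (divisor n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

x = [10.2, 9.8, 11.1, 10.5, 9.9, 10.8]   # sample 1
y = [10.1, 10.0, 10.3, 9.9, 10.2, 10.0]  # sample 2

s2_x, s2_y = sample_variance(x), sample_variance(y)

# Conventionally the larger variance goes in the numerator, so the ratio
# is compared against upper-tail critical values of the F-distribution;
# the degrees of freedom are ordered to match the numerator sample.
if s2_x >= s2_y:
    F, df = s2_x / s2_y, (len(x) - 1, len(y) - 1)
else:
    F, df = s2_y / s2_x, (len(y) - 1, len(x) - 1)
```

Because the test is so sensitive to non-normality, in practice one of the alternatives above (e.g. Levene's test) is usually preferred.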
Formula and calculation

Most F-tests arise by considering a decomposition of the variability in a collection of data in terms of sums of squares. The test statistic in an F-test is the ratio of two scaled sums of squares reflecting different sources of variability. These sums of squares are constructed so that the statistic tends to be greater when the null hypothesis is not true. In order for the statistic to follow the F-distribution under the null hypothesis, the sums of squares should be statistically independent, and each should follow a scaled chi-squared distribution. The latter condition is guaranteed if the data values are independent and normally distributed with a common variance.
Multiple-comparison ANOVA problems

The F-test in one-way analysis of variance is used to assess whether the expected values of a quantitative variable within several pre-defined groups differ from each other. For example, suppose that a medical trial compares four treatments. The ANOVA F-test can be used to assess whether any of the treatments is on average superior, or inferior, to the others versus the null hypothesis that all four treatments yield the same mean response. This is an example of an "omnibus" test, meaning that a single test is performed to detect any of several possible differences. Alternatively, we could carry out pairwise tests among the treatments (for instance, in the medical trial example with four treatments we could carry out six tests among pairs of treatments). The advantage of the ANOVA F-test is that we do not need to pre-specify which treatments are to be compared, and we do not need to adjust for making multiple comparisons. The disadvantage of the ANOVA F-test is that if we reject the null hypothesis, we do not know which treatments can be said to be significantly different from the others: if the F-test is performed at level α, we cannot state that the treatment pair with the greatest mean difference is significantly different at level α.
The formula for the one-way ANOVA F-test statistic is

F = (explained variance) / (unexplained variance),

or

F = (between-group variability) / (within-group variability).

The "explained variance", or "between-group variability", is

\sum_{i=1}^{K} n_i (\bar{Y}_{i\cdot} - \bar{Y})^2 / (K - 1),

where \bar{Y}_{i\cdot} denotes the sample mean in the i-th group, n_i is the number of observations in the i-th group, and \bar{Y} denotes the overall mean of the data.

The "unexplained variance", or "within-group variability", is

\sum_{i=1}^{K} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2 / (N - K),

where Y_{ij} is the j-th observation in the i-th out of K groups and N is the overall sample size. This F-statistic follows the F-distribution with (K − 1, N − K) degrees of freedom under the null hypothesis. The statistic will be large if the between-group variability is large relative to the within-group variability, which is unlikely to happen if the population means of the groups all have the same value.
Note that when there are only two groups for the one-way ANOVA F-test, F = t^2, where t is the Student's t statistic.
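These formulas can be sketched in pure Python; the helper functions and the two small samples below are illustrative, not part of the original text. The final assertion checks the F = t^2 identity for the two-group case.

```python
# One-way ANOVA F statistic computed directly from the formulas above:
# explained (between-group) mean square over unexplained (within-group)
# mean square.  `anova_f` and `pooled_t` are illustrative helpers.

def anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    K = len(groups)                          # number of groups
    N = sum(len(g) for g in groups)          # overall sample size
    grand_mean = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares, scaled by K - 1.
    ms_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means)) / (K - 1)
    # Within-group sum of squares, scaled by N - K.
    ms_within = sum((y - m) ** 2
                    for g, m in zip(groups, means)
                    for y in g) / (N - K)
    return ms_between / ms_within, K - 1, N - K

def pooled_t(x, y):
    """Two-sample pooled-variance Student's t statistic."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    sp2 = (sum((v - mx) ** 2 for v in x) +
           sum((v - my) ** 2 for v in y)) / (nx + ny - 2)  # pooled variance
    return (mx - my) / (sp2 * (1 / nx + 1 / ny)) ** 0.5

g1, g2 = [6, 8, 4, 5, 3, 4], [8, 12, 9, 11, 6, 8]
F, df1, df2 = anova_f([g1, g2])
t = pooled_t(g1, g2)
assert abs(F - t ** 2) < 1e-9  # with exactly two groups, F = t^2
```

With more than two groups the same helper applies unchanged, and the resulting F is referred to the F-distribution with (K − 1, N − K) degrees of freedom.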
Regression problems

Consider two models, 1 and 2, where model 1 is 'nested' within model 2: model 1 is the restricted model, and model 2 is the unrestricted one. That is, model 1 has p_1 parameters, and model 2 has p_2 parameters, where p_2 > p_1, and for any choice of parameters in model 1, the same regression curve can be achieved by some choice of the parameters of model 2. (We use the convention that any constant parameter in a model is included when counting the parameters. For instance, the simple linear model y = mx + b has p = 2 under this convention.) The model with more parameters will always be able to fit the data at least as well as the model with fewer parameters. Thus typically model 2 will give a better (i.e. lower-error) fit to the data than model 1. But one often wants to determine whether model 2 gives a significantly better fit to the data. One approach to this problem is to use an F-test.
If there are n data points from which to estimate the parameters of both models, then one can calculate the F statistic, given by

F = \frac{(\mathrm{RSS}_1 - \mathrm{RSS}_2)/(p_2 - p_1)}{\mathrm{RSS}_2/(n - p_2)},

where RSS_i is the residual sum of squares of model i. If the regression model has been calculated with weights, then replace RSS_i with χ^2, the weighted sum of squared residuals. Under the null hypothesis that model 2 does not provide a significantly better fit than model 1, F will have an F-distribution with (p_2 − p_1, n − p_2) degrees of freedom. The null hypothesis is rejected if the F calculated from the data is greater than the critical value of the F-distribution for some desired false-rejection probability (e.g. 0.05). The F-test is a Wald test.
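As an illustrative sketch (the data points below are made up), this procedure can be applied to compare a constant-only model 1 (p_1 = 1) against a straight-line model 2 (p_2 = 2), both fit by least squares:

```python
# F-test for comparing nested regression models:
#   F = ((RSS1 - RSS2) / (p2 - p1)) / (RSS2 / (n - p2))
# Model 1: y = b        (p1 = 1, the restricted model)
# Model 2: y = m*x + b  (p2 = 2, the unrestricted model)
# The data points are made up for illustration.

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]
n, p1, p2 = len(x), 1, 2

# Model 1: the least-squares constant is simply the mean of y.
ybar = sum(y) / n
rss1 = sum((yi - ybar) ** 2 for yi in y)

# Model 2: closed-form simple linear regression (ordinary least squares).
xbar = sum(x) / n
slope = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
intercept = ybar - slope * xbar
rss2 = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))

# Under H0 (model 2 no better than model 1), F ~ F(p2 - p1, n - p2).
F = ((rss1 - rss2) / (p2 - p1)) / (rss2 / (n - p2))
```

Rejecting the null hypothesis here means the extra slope parameter earns its keep: the drop in residual sum of squares is too large to attribute to the additional parameter alone.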
One-way ANOVA example

Consider an experiment to study the effect of three different levels of a factor on a response (e.g. three levels of a fertilizer on plant growth). If we had 6 observations for each level, we could write the outcome of the experiment in a table like this, where a_1, a_2, and a_3 are the three levels of the factor being studied.

a_1   a_2   a_3
 6     8    13
 8    12     9
 4     9    11
 5    11     8
 3     6     7
 4     8    12
The null hypothesis, denoted H_0, for the overall F-test for this experiment would be that all three levels of the factor produce the same response, on average. To calculate the F-ratio:

Step 1: Calculate the mean within each group:

\bar{Y}_1 = (6 + 8 + 4 + 5 + 3 + 4)/6 = 5
\bar{Y}_2 = (8 + 12 + 9 + 11 + 6 + 8)/6 = 9
\bar{Y}_3 = (13 + 9 + 11 + 8 + 7 + 12)/6 = 10

Step 2: Calculate the overall mean:

\bar{Y} = (\bar{Y}_1 + \bar{Y}_2 + \bar{Y}_3)/a = (5 + 9 + 10)/3 = 8,

where a is the number of groups.

Step 3: Calculate the "between-group" sum of squares:

S_B = n(\bar{Y}_1 - \bar{Y})^2 + n(\bar{Y}_2 - \bar{Y})^2 + n(\bar{Y}_3 - \bar{Y})^2
    = 6(5 - 8)^2 + 6(9 - 8)^2 + 6(10 - 8)^2 = 84,

where n is the number of data values per group.

The between-group degrees of freedom is one less than the number of groups:

f_B = 3 - 1 = 2,

so the between-group mean square value is

MS_B = S_B / f_B = 84 / 2 = 42.
Step 4: Calculate the "within-group" sum of squares. Begin by centering the data in each group:

a_1           a_2            a_3
6 − 5 = 1     8 − 9 = −1     13 − 10 = 3
8 − 5 = 3     12 − 9 = 3     9 − 10 = −1
4 − 5 = −1    9 − 9 = 0      11 − 10 = 1
5 − 5 = 0     11 − 9 = 2     8 − 10 = −2
3 − 5 = −2    6 − 9 = −3     7 − 10 = −3
4 − 5 = −1    8 − 9 = −1     12 − 10 = 2
The within-group sum of squares is the sum of squares of all 18 values in this table:

S_W = (1 + 9 + 1 + 0 + 4 + 1) + (1 + 9 + 0 + 4 + 9 + 1) + (9 + 1 + 1 + 4 + 9 + 4) = 68.

The within-group degrees of freedom is

f_W = a(n − 1) = 3(6 − 1) = 15,

so the within-group mean square value is

MS_W = S_W / f_W = 68 / 15 ≈ 4.5.

Step 5: The F-ratio is

F = MS_B / MS_W = 42 / 4.5 ≈ 9.3.
The critical value is the number that the test statistic must exceed to reject the test. In this case, F_crit(2, 15) = 3.68 at α = 0.05. Since F = 9.3 > 3.68, the results are significant at the 5% significance level. One would reject the null hypothesis, concluding that there is strong evidence that the expected values in the three groups differ. The p-value for this test is 0.002.
After performing the F-test, it is common to carry out some "post-hoc" analysis of the group means. In this case, the first two group means differ by 4 units, the first and third group means differ by 5 units, and the second and third group means differ by only 1 unit. The standard error of each of these differences is \sqrt{4.5/6 + 4.5/6} ≈ 1.2. Thus the first group is strongly different from the other groups, as each of those mean differences is more than 3 times the standard error, so we can be highly confident that the population mean of the first group differs from the population means of the other groups. However, there is no evidence that the second and third groups have different population means from each other, as their mean difference of one unit is comparable to the standard error.
Note that F(x, y) denotes an F-distribution with x degrees of freedom in the numerator and y degrees of freedom in the denominator.
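The worked example above can be verified with a short pure-Python script using the same 18 observations:

```python
# Reproduces the one-way ANOVA example above: three factor levels,
# six observations per level.

a1 = [6, 8, 4, 5, 3, 4]
a2 = [8, 12, 9, 11, 6, 8]
a3 = [13, 9, 11, 8, 7, 12]
groups = [a1, a2, a3]

K = len(groups)                     # 3 groups
N = sum(len(g) for g in groups)     # 18 observations in total
grand = sum(sum(g) for g in groups) / N       # overall mean: 8
means = [sum(g) / len(g) for g in groups]     # group means: 5, 9, 10

ss_between = sum(len(g) * (m - grand) ** 2
                 for g, m in zip(groups, means))            # 84
ss_within = sum((v - m) ** 2
                for g, m in zip(groups, means) for v in g)  # 68

ms_between = ss_between / (K - 1)   # 84 / 2 = 42
ms_within = ss_within / (N - K)     # 68 / 15, about 4.5
F = ms_between / ms_within          # about 9.3 > F_crit(2, 15) = 3.68
```

Every intermediate quantity matches the hand calculation in Steps 1–5.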
ANOVA's robustness with respect to Type I errors for departures from population normality

The one-way ANOVA can be generalized to the factorial and multivariate layouts, as well as to the analysis of covariance. None of these F-tests, however, are robust when there are severe violations of the assumption that each population follows the normal distribution, particularly for small alpha levels and unbalanced layouts. Furthermore, if the underlying assumption of homoscedasticity is violated, the Type I error properties degenerate much more severely. For nonparametric alternatives in the factorial layout, see Sawilowsky. For more discussion see ANOVA on ranks.