Post-hoc analysis - AbsoluteAstronomy.com

Post-hoc analysis, in the context of design and analysis of experiments, refers to looking at the data, after the experiment has concluded, for patterns that were not specified a priori. Critics sometimes call it data dredging, to evoke the sense that the more one looks, the more likely something will be found. More subtly, each time a pattern in the data is considered, a statistical test is effectively performed. This greatly inflates the total number of statistical tests and necessitates the use of multiple testing procedures to compensate. However, this is difficult to do precisely, and in fact most results of post-hoc analyses are reported as they are, with unadjusted p-values. These p-values must be interpreted in light of the fact that they are a small, selected subset of a potentially large group of p-values. Results of post-hoc analysis should be explicitly labeled as such in reports and publications to avoid misleading readers.
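The inflation described above can be made concrete with a small calculation. The sketch below assumes m independent tests, each run at level alpha, with every null hypothesis true; under those assumptions the chance of at least one false positive is 1 − (1 − alpha)^m. The function name `familywise_error_rate` is illustrative and does not come from the article, and real post-hoc tests are rarely fully independent, so this is only a rough guide.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false positive among m independent tests.

    Assumes every null hypothesis is true and each test is run at level alpha.
    """
    return 1.0 - (1.0 - alpha) ** m


# How quickly the chance of a spurious "finding" grows with the number of looks.
for m in (1, 5, 20, 100):
    print(m, round(familywise_error_rate(0.05, m), 3))
```

With alpha = 0.05, twenty looks at the data already give roughly a 64% chance of at least one nominally significant result when nothing is there.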

In practice, post-hoc analyses are usually concerned with finding patterns and/or relationships between subgroups of sampled populations that would otherwise remain undetected were a scientific community to rely strictly upon a priori statistical methods. Post-hoc tests, also known as a posteriori tests, greatly expand the range and capability of methods that can be applied in exploratory research. Post-hoc examination strengthens induction by limiting the probability that significant effects will seem to have been discovered between subgroups of a population when none actually exist. As it is, many scientific papers are published without adequate, preventative post-hoc control of the Type I error rate.

Post-hoc analysis is an important procedure: without it, multivariate hypothesis testing would suffer greatly, because the chance of reporting false positives would be unacceptably high. Ultimately, post-hoc testing creates better-informed scientists, who can therefore formulate better, more efficient a priori hypotheses and research designs.
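As a rough illustration of controlling the Type I error rate across several post-hoc comparisons, the sketch below adjusts a list of p-values with the Bonferroni correction. It also shows Holm's step-down variant, which the article does not mention and which is included here only for comparison. The function names and the example p-values are made up for illustration.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm step-down adjustment (not part of the article; shown for comparison).

    The smallest p-value is multiplied by m, the next by m - 1, and so on;
    a running maximum keeps the adjusted values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical unadjusted p-values from four post-hoc comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
bonf = bonferroni(raw)
holm_adj = holm(raw)
print(bonf)
print(holm_adj)
```

An adjusted p-value is then compared with the usual significance level. Holm's adjusted values are never larger than Bonferroni's, so it rejects at least as often while still controlling the family-wise error rate.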

Student Newman–Keuls post-hoc ANOVA

The Student Newman–Keuls and related tests are often referred to as post hoc. However, an experimenter often plans to test all pairwise comparisons before seeing the data. Therefore these tests are better categorized as a priori.

An example of an analysis often mislabeled as a post-hoc analysis is the Newman–Keuls method: "A different approach to evaluating a posteriori pairwise comparisons stems from the work of Student (1927), Newman (1939), and Keuls (1952). The Newman–Keuls procedure is based on a stepwise or layer approach to significance testing. Sample means are ordered from the smallest to the largest. The largest difference, which involves means that are r = p steps apart, is tested first at α level of significance; if significant, means that are r = p − 1 steps apart are tested at α level of significance and so on. The Newman–Keuls procedure provides an r-mean significance level equal to α for each group of r ordered means, that is, the probability of falsely rejecting the hypothesis that all means in an ordered group are equal is α. It follows that the concept of error rate applies neither on an experimentwise nor on a per comparison basis; the actual error rate falls somewhere between the two. The Newman–Keuls procedure, like Tukey's procedure, requires equal sample n's.

The critical difference $\hat{\psi}(W_r)$ that two means separated by r steps must exceed to be declared significant is, according to the Newman–Keuls procedure,

$\hat{\psi}(W_r) = q_{\alpha;\, r,\, \nu} \sqrt{MS_{error} / n}$

The Newman–Keuls and Tukey procedures require the same critical difference for the first comparison that is tested. The Tukey procedure uses this critical difference for all the remaining tests, whereas the Newman–Keuls procedure reduces the size of the critical difference, depending on the number of steps separating the ordered means. As a result, the Newman–Keuls test is more powerful than Tukey's test. Remember, however, that the Newman–Keuls procedure does not control the experimentwise error rate at α.

Frequently a test of the overall null hypothesis μ₁ = μ₂ = … = μ_p is performed with an F statistic in ANOVA rather than with a range statistic. If the F statistic is significant, Shaffer (1979) recommends using the critical difference

$\hat{\psi}(W_r) = q_{\alpha;\, p-1,\, \nu} \sqrt{MS_{error} / n}$

instead of

$\hat{\psi}(W_r) = q_{\alpha;\, p,\, \nu} \sqrt{MS_{error} / n}$

to evaluate the largest pairwise comparison at the first step of the testing procedure. The testing procedure for all subsequent steps is unchanged. She has shown that the modified procedure leads to greater power at the first step without affecting control of the type I error rate. This makes dissonances, in which the overall null hypothesis is rejected by an F test without rejecting any one of the proper subsets of comparison, less likely."
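The stepwise procedure in the quotation can be sketched in a few lines. This is a minimal illustration rather than a reference implementation. It assumes equal group sizes n, a mean square error `ms_error` taken from the ANOVA table, and studentized range critical values supplied by the caller as a dictionary `q_crit` keyed by the number of steps r. The group means and the name `newman_keuls` are hypothetical. The critical values are approximate table entries for α = 0.05 and ν = 20 error degrees of freedom and should be checked against a table before use.

```python
import math


def newman_keuls(means, ms_error, n, q_crit):
    """Stepwise Newman-Keuls comparisons for equal group sizes.

    means    -- dict mapping group name to sample mean
    ms_error -- mean square error from the ANOVA
    n        -- common sample size per group
    q_crit   -- dict mapping r (steps apart) to the studentized range
                critical value for r means (supplied by the caller)
    Returns the pairs (low, high) declared significantly different.
    """
    ordered = sorted(means.items(), key=lambda kv: kv[1])
    p = len(ordered)
    se = math.sqrt(ms_error / n)
    retained = []      # index ranges (i, j) declared homogeneous
    significant = []
    for r in range(p, 1, -1):            # widest range first
        for i in range(p - r + 1):
            j = i + r - 1
            # A range inside a non-significant range is not tested.
            if any(a <= i and j <= b for a, b in retained):
                continue
            diff = ordered[j][1] - ordered[i][1]
            if diff > q_crit[r] * se:    # critical difference for r steps
                significant.append((ordered[i][0], ordered[j][0]))
            else:
                retained.append((i, j))
    return significant


# Hypothetical data: four groups of n = 6, so nu = 4 * (6 - 1) = 20.
means = {"A": 10.0, "B": 12.0, "C": 13.0, "D": 17.0}
q_crit = {2: 2.95, 3: 3.58, 4: 3.96}     # approximate table values, alpha = 0.05
significant = newman_keuls(means, ms_error=6.0, n=6, q_crit=q_crit)
print(significant)
```

In this example the critical difference shrinks from 3.96 for r = 4 to 2.95 for r = 2. Tukey's procedure would apply the r = 4 value of 3.96 to every pair, which is why the Newman–Keuls test is the more powerful of the two.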

List of post-hoc tests

Fisher's least significant difference (LSD)
Bonferroni correction
Duncan's new multiple range test
Friedman test
Newman–Keuls method
Scheffé's method
Tukey's range test
