In statistics, a meta-analysis combines the results of several studies that address a set of related research hypotheses. In its simplest form, this is normally done by identifying a common measure of effect size, for which a weighted average might be the output of the meta-analysis. Here the weighting might be related to sample sizes within the individual studies. More generally there are other differences between the studies that need to be allowed for, but the general aim of a meta-analysis is to estimate the true "effect size" more powerfully than is possible from a single study carried out under a single set of assumptions and conditions.
Meta-analyses are often, but not always, important components of a systematic review procedure. Here it is convenient to follow the terminology used by the Cochrane Collaboration, and use "meta-analysis" to refer to statistical methods of combining evidence, leaving other aspects of 'research synthesis' or 'evidence synthesis', such as combining information from qualitative studies, for the more general context of systematic reviews.
The term "meta-analysis" was coined by Gene V. Glass.
History
The first meta-analysis was performed by Karl Pearson in 1904, in an attempt to overcome the problem of reduced statistical power in studies with small sample sizes; analyzing the results from a group of studies can allow more accurate data analysis. However, the first meta-analysis of all conceptually identical experiments concerning a particular research issue, and conducted by independent researchers, has been identified as the 1940 book-length publication Extra-sensory perception after sixty years, authored by Duke University psychologists J. G. Pratt, J. B. Rhine, and associates. This encompassed a review of 145 reports on ESP experiments published from 1882 to 1939, and included an estimate of the influence of unpublished papers on the overall effect (the file-drawer problem). Although meta-analysis is widely used in epidemiology and evidence-based medicine today, a meta-analysis of a medical treatment was not published until 1955. In the 1970s, more sophisticated analytical techniques were introduced in educational research, starting with the work of Gene V. Glass, Frank L. Schmidt and John E. Hunter.
Gene V. Glass was the first modern statistician to formalize the use of meta-analysis, and is widely recognized as the modern founder of the method. The online Oxford English Dictionary lists the first usage of the term in the statistical sense as 1976 by Glass. The statistical theory surrounding meta-analysis was greatly advanced by the work of Nambury S. Raju, Larry V. Hedges, Harris Cooper, Ingram Olkin, John E. Hunter, Jacob Cohen, Thomas C. Chalmers, Robert Rosenthal and Frank L. Schmidt.
Advantages of meta-analysis
Advantages of meta-analysis (e.g. over classical literature reviews or simple overall means of effect sizes) include:
 Shows whether the results are more varied than would be expected from sampling variation alone
 Derivation and statistical testing of overall factors / effect size parameters in related studies
 Generalization to the population of studies
 Ability to control for between-study variation
 Including moderators to explain variation
 Higher statistical power to detect an effect than in any single study (an 'n=1' sample of studies)
 Deals with information overload: the high number of articles published each year
 Combines several studies and is therefore less influenced by local findings than a single study
 Makes it possible to show whether publication bias exists
Steps in a meta-analysis
1. Formulation of the problem
2. Search of literature
3. Selection of studies ('incorporation criteria')
 Based on quality criteria, e.g. the requirement of randomization and blinding in a clinical trial
 Selection of specific studies on a well-specified subject, e.g. the treatment of breast cancer
 Decide whether unpublished studies are included to avoid publication bias (file drawer problem)
4. Decide which dependent variables or summary measures are allowed. For instance:
 Differences (discrete data)
 Means (continuous data)
 Hedges' g is a popular summary measure for continuous data that is standardized in order to eliminate scale differences; it incorporates an index of variation, the pooled variance: $g = \frac{\bar{x}_t - \bar{x}_c}{s_p}$, in which $\bar{x}_t$ is the treatment mean, $\bar{x}_c$ is the control mean, and $s_p^2$ the pooled variance.
5. Model selection (see the next section)
For reporting guidelines, see the QUOROM statement.
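The standardized mean difference named in step 4 can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name and the sample data are hypothetical, and the optional small-sample correction factor 1 - 3/(4(n_t + n_c) - 9) is a commonly used approximation that the text above does not mention.

```python
import math
from statistics import mean, variance


def standardized_mean_difference(treatment, control, small_sample_correction=False):
    """Difference in group means divided by the pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    d = (mean(treatment) - mean(control)) / math.sqrt(pooled_var)
    if small_sample_correction:
        # Assumption: the usual approximate bias correction for small samples.
        d *= 1 - 3 / (4 * (n_t + n_c) - 9)
    return d


# Hypothetical scores for two small groups.
treatment_scores = [6, 7, 8, 9, 10]
control_scores = [4, 5, 6, 7, 8]
print(standardized_mean_difference(treatment_scores, control_scores))
print(standardized_mean_difference(treatment_scores, control_scores, True))
```

Because the result is expressed in pooled standard deviation units, studies that measured the outcome on different scales can be placed on a common footing before they are combined.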
Meta-regression models
Generally, three types of models can be distinguished in the literature on meta-analysis: simple regression, fixed-effect meta-regression and random-effects meta-regression.
Simple regression
The model can be specified as
$y_j = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \cdots + \eta_j$
where $y_j$ is the effect size in study $j$ and $\beta_0$ (intercept) is the estimated overall effect size. The variables $x_{1j}, x_{2j}, \ldots$ specify different characteristics of the study, and $\eta_j$ specifies the between-study variation. Note that this model does not allow specification of within-study variation.
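As a rough illustration of the simple regression model, the sketch below fits an unweighted least-squares line of effect size on a single study characteristic. The function name, the choice of a single moderator and the data are all hypothetical; every study counts equally, which is precisely the limitation noted above.

```python
from statistics import mean


def simple_regression(effect_sizes, moderator):
    """Ordinary least squares of effect size on one study-level moderator."""
    x_bar, y_bar = mean(moderator), mean(effect_sizes)
    sxx = sum((x - x_bar) ** 2 for x in moderator)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(moderator, effect_sizes))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: effect size per study and a coded study characteristic.
effects = [0.2, 0.4, 0.6, 0.8]
dose_level = [0, 1, 2, 3]
intercept, slope = simple_regression(effects, dose_level)
print(intercept, slope)
```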
Fixed-effect meta-regression
Fixed-effect meta-regression assumes that the observed effect size $y_j$ is normally distributed around the true effect size $\theta$, $y_j \sim \mathcal{N}(\theta, \sigma^2_{\varepsilon_j})$, where $\sigma^2_{\varepsilon_j}$ is the within-study variance of the effect size. A fixed-effect meta-regression model thus allows for within-study variability, but no between-study variability, because all studies have the identical expected fixed effect size $\theta$, i.e. the between-study term $\eta_j = 0$. Note that "fixed-effect" takes no plural (in contrast to "random-effects") because only one true effect across all datasets is assumed.
$y_j = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \cdots + \varepsilon_j$
Here $\sigma^2_{\varepsilon_j}$ is the variance of the effect size in study $j$.
Fixed-effect meta-regression ignores between-study variation. As a result, parameter estimates are biased if between-study variation cannot be ignored. Furthermore, generalizations to the population are not possible.
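A fixed-effect meta-regression can be sketched as a weighted least-squares fit in which each study is weighted by the inverse of its within-study variance. The code below is a minimal sketch under that assumption, with one hypothetical moderator; the function name and the data are illustrative only.

```python
def fixed_effect_meta_regression(effect_sizes, variances, moderator):
    """Weighted least squares with inverse-variance weights, one moderator."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    # Weighted means of the moderator and of the effect sizes.
    x_bar = sum(w * x for w, x in zip(weights, moderator)) / total
    y_bar = sum(w * y for w, y in zip(weights, effect_sizes)) / total
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, moderator))
    sxy = sum(
        w * (x - x_bar) * (y - y_bar)
        for w, x, y in zip(weights, moderator, effect_sizes)
    )
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Standard error of the slope when the within-study variances are
    # treated as known and between-study variance is assumed to be zero.
    se_slope = (1.0 / sxx) ** 0.5
    return intercept, slope, se_slope


# Hypothetical studies: effect size, within-study variance, moderator value.
effects = [0.1, 0.4, 0.7, 1.0]
within_variances = [0.04, 0.01, 0.02, 0.05]
moderator_values = [0, 1, 2, 3]
print(fixed_effect_meta_regression(effects, within_variances, moderator_values))
```

The standard error here reflects within-study variance only, which is why it tends to be too small when real between-study variation is present.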
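Random-effects meta-regression
Random-effects meta-regression rests on the assumption that the true effect size $\theta_j$ in $\mathcal{N}(\theta_j, \sigma^2_{\varepsilon_j})$ is a random variable following a (hyper-)distribution $\mathcal{N}(\theta, \sigma^2_\eta)$. A random-effects meta-regression is called a mixed-effects model when moderators are added to the model.
$y_j = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \cdots + \eta_j + \varepsilon_j$
Here $\sigma^2_{\varepsilon_j}$ is the variance of the effect size in study $j$. The between-study variance $\sigma^2_\eta$ is estimated using common estimation procedures for random-effects models (restricted maximum likelihood (REML) estimators).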
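The text names REML as the estimation procedure. REML needs an iterative fit, so the sketch below swaps in the simpler DerSimonian-Laird method-of-moments estimator of the between-study variance, for a model without moderators. It is an illustrative substitute rather than the procedure described above, and the function name and data are hypothetical.

```python
def random_effects_pooled(effect_sizes, variances):
    """Pooled effect under a random-effects model without moderators.

    Between-study variance is estimated with the DerSimonian-Laird
    method of moments (a stand-in for REML, which requires iteration).
    """
    k = len(effect_sizes)
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed_mean = sum(wi * y for wi, y in zip(w, effect_sizes)) / sw
    # Weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (y - fixed_mean) ** 2 for wi, y in zip(w, effect_sizes))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)  # truncated at zero
    # Each weight now reflects within-study plus between-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_star, effect_sizes)) / sum(w_star)
    se = (1.0 / sum(w_star)) ** 0.5
    return pooled, se, tau2


# Hypothetical studies whose effects disagree more than sampling error allows.
print(random_effects_pooled([0.0, 1.0], [0.01, 0.01]))
```

When the estimated between-study variance is zero the weights reduce to the inverse within-study variances, and the result coincides with the fixed-effect estimate.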
Which model to choose
The simple regression model does not allow for within-study variation, which makes it too easy to obtain significant results. The fixed-effect regression model does not allow for between-study variation, which likewise makes it too easy to obtain significant results. The random- or mixed-effects model allows for both within-study and between-study variation and is therefore the most appropriate model to choose. Whether there is between-study variation can be tested by testing whether the effect sizes are homogeneous. If the test does not show that the effect sizes are heterogeneous, the fixed-effect meta-regression might seem appropriate; however, this test often does not have enough power to detect between-study variation. Besides the lack of power of this test, one can argue that the fixed-effect assumption of homogeneous effect sizes is hard to justify, because it assumes that all studies are exactly the same, whereas no two studies are exactly the same. To cope with the fact that each study is different (different sample, different time, different place, etc.), a random- or mixed-effects model is the appropriate model to choose and gives the most reliable results.
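The text does not name a specific homogeneity test. One standard choice is Cochran's Q statistic, shown here as an illustration rather than as the only option. Writing $v_j$ for the within-study variance of study $j$, $w_j = 1/v_j$ for its weight, and $k$ for the number of studies:

```latex
\bar{y}_w = \frac{\sum_{j=1}^{k} w_j \, y_j}{\sum_{j=1}^{k} w_j},
\qquad
Q = \sum_{j=1}^{k} w_j \,\bigl(y_j - \bar{y}_w\bigr)^2
```

Under the null hypothesis of homogeneous effect sizes, $Q$ approximately follows a chi-squared distribution with $k - 1$ degrees of freedom, so a value of $Q$ well above $k - 1$ suggests between-study variation. With few studies the test has little power, which is the caveat raised in the paragraph above.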
Applications in modern science
Modern statistical meta-analysis does more than just combine the effect sizes of a set of studies. It can test whether the outcomes of studies show more variation than is expected from sampling different research participants. If that is the case, study characteristics such as the measurement instrument used, the population sampled, or aspects of the studies' design are coded. These characteristics are then used as predictor variables to analyze the excess variation in the effect sizes. Some methodological weaknesses in studies can be corrected statistically. For example, it is possible to correct effect sizes or correlations for the downward bias due to measurement error or restriction of score ranges.
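The correction for measurement error mentioned above can be illustrated with the classical disattenuation formula, in which an observed correlation is divided by the square root of the product of the two measures' reliabilities. This is a minimal sketch; the function name and the numbers are hypothetical, and the text does not say which correction a given meta-analysis would use.

```python
import math


def correct_for_attenuation(r_observed, reliability_x, reliability_y):
    """Estimate the correlation that error-free measures would have shown."""
    for rel in (reliability_x, reliability_y):
        if not 0 < rel <= 1:
            raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_observed / math.sqrt(reliability_x * reliability_y)


# Hypothetical study: observed r = 0.36, reliabilities 0.81 and 0.64.
print(correct_for_attenuation(0.36, 0.81, 0.64))
```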
Meta-analysis can be done with single-subject designs as well as group research designs. This is important because much of the research on low-incidence populations has been done with single-subject research designs. Considerable dispute exists over the most appropriate meta-analytic technique for single-subject research.
Meta-analysis leads to a shift of emphasis from single studies to multiple studies. It emphasizes the practical importance of the effect size instead of the statistical significance of individual studies. This shift in thinking has been termed "meta-analytic thinking". The results of a meta-analysis are often shown in a forest plot.
Results from studies are combined using different approaches. One approach frequently used in meta-analysis in health care research is termed the 'inverse variance method'. The average effect size across all studies is computed as a weighted mean, whereby the weights are equal to the inverse variance of each study's effect estimator. Larger studies and studies with less random variation are given greater weight than smaller studies. Other common approaches include the Mantel–Haenszel method and the Peto method.
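The inverse variance method described above amounts to a short calculation. The sketch below returns the pooled estimate, its standard error and an approximate 95% confidence interval; the function name, the normal-approximation multiplier of 1.96 and the data are assumptions made for illustration.

```python
import math


def inverse_variance_pooled(effect_sizes, variances, z=1.96):
    """Weighted mean of effect sizes with weights equal to 1 / variance."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effect_sizes)) / total
    se = math.sqrt(1.0 / total)
    return pooled, se, (pooled - z * se, pooled + z * se)


# Hypothetical studies: the first is more precise, so it gets more weight.
pooled, se, interval = inverse_variance_pooled([0.2, 0.6], [0.01, 0.03])
print(pooled, se, interval)
```

In this example the more precise study receives three times the weight of the other, so the pooled estimate lies closer to its effect size.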
A recent approach to studying the influence that weighting schemes can have on results has been proposed through the construct of gravity, which is a special case of combinatorial meta-analysis.
Signed differential mapping is a statistical technique for meta-analyzing studies on differences in brain activity or structure which used neuroimaging techniques such as fMRI, VBM or PET.
Weaknesses
Some have argued that a weakness of the method is that sources of bias are not controlled by the method. A good meta-analysis of badly designed studies will still result in bad statistics, according to Robert Slavin. Slavin has argued that only methodologically sound studies should be included in a meta-analysis, a practice he calls 'best evidence meta-analysis'. Other meta-analysts would include weaker studies, and add a study-level predictor variable that reflects the methodological quality of the studies to examine the effect of study quality on the effect size. However, Glass argued that the better approach preserves variance in the study sample, casting as wide a net as possible, and that methodological selection criteria introduce unwanted subjectivity, defeating the purpose of the approach.
File drawer problem
Another weakness of the method is the heavy reliance on published studies, which may create exaggerated outcomes, as it is very hard to publish studies that show no significant results. For any given research area, one cannot know how many studies have been conducted but never reported and the results filed away.
This file drawer problem results in a distribution of effect sizes that is biased, skewed or completely cut off, creating a serious base rate fallacy, in which the significance of the published studies is overestimated. For example, if there were fifty tests, and only ten got results, then the real outcome is only 20% as significant as it appears, except that the other 80% were not submitted for publishing, or were thrown out by publishers as uninteresting. This should be seriously considered when interpreting the outcomes of a meta-analysis.
This can be visualized with a funnel plot, which is a scatter plot of sample size against effect size. There are several procedures available that attempt to correct for the file drawer problem, once identified, such as estimating the cut-off part of the distribution of study effects.
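A small simulation can make the file drawer problem concrete. The sketch below draws many hypothetical studies of an effect that is truly zero and "publishes" only those with a significantly positive result. Every parameter (number of studies, standard error, significance cut-off, seed) is an assumption chosen for illustration, not a value from the text.

```python
import random
from statistics import mean


def simulate_file_drawer(n_studies=2000, true_effect=0.0, standard_error=0.2,
                         z_crit=1.96, seed=0):
    """Compare the mean of all simulated studies with the published subset."""
    rng = random.Random(seed)
    all_effects = [rng.gauss(true_effect, standard_error) for _ in range(n_studies)]
    # Only significantly positive results leave the file drawer.
    published = [e for e in all_effects if e / standard_error > z_crit]
    published_mean = mean(published) if published else float("nan")
    return mean(all_effects), published_mean, len(published)


all_mean, published_mean, n_published = simulate_file_drawer()
print(all_mean, published_mean, n_published)
```

Although the true effect is zero, the mean of the published subset is well above zero, because every published effect had to exceed the significance threshold. A meta-analysis restricted to such studies would inherit that distortion.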
Other weaknesses are Simpson's paradox (two smaller studies may point in one direction, and the combination study in the opposite direction); the coding of an effect is subjective; the decision to include or reject a particular study is subjective; there are two different ways to measure effect: correlation or standardized mean difference; the interpretation of effect size is purely arbitrary; it has not been determined if the statistically most accurate method for combining results is the fixed-effect model or the random-effects model; and, for medicine, the underlying risk in each studied group is of significant importance, and there is no universally agreed-upon way to weight the risk.
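The Simpson's paradox item can be checked with a few lines of arithmetic. The counts below are illustrative numbers chosen to exhibit the reversal and are not taken from the text: treatment A has the higher success rate within each study, yet the lower rate once the raw counts are simply added together.

```python
# (successes, patients) per treatment arm in two illustrative studies.
studies = {
    "study 1": {"A": (81, 87), "B": (234, 270)},
    "study 2": {"A": (192, 263), "B": (55, 80)},
}


def rate(successes, patients):
    return successes / patients


def pooled_rate(arm):
    """Success rate after naively adding the raw counts across studies."""
    successes = sum(study[arm][0] for study in studies.values())
    patients = sum(study[arm][1] for study in studies.values())
    return successes / patients


for name, arms in studies.items():
    print(name, "A:", round(rate(*arms["A"]), 3), "B:", round(rate(*arms["B"]), 3))
print("pooled", "A:", round(pooled_rate("A"), 3), "B:", round(pooled_rate("B"), 3))
```

The reversal arises because the two arms are unevenly distributed across the studies. This is one reason meta-analytic methods combine study-level effect estimates rather than adding raw counts.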
The Rind et al. controversy illustrates an application of meta-analysis in which many components of the analysis have been the subject of subsequent criticism.
Dangers of agenda-driven bias
The most severe weakness and abuse of meta-analysis often occurs when the person or persons doing the meta-analysis have an economic, social, or political agenda such as the passage or defeat of legislation. Those persons with these types of agenda have a high likelihood of abusing meta-analysis due to personal bias. For example, researchers favorable to the author's agenda are likely to have their studies "cherry picked" while those not favorable will be ignored or labeled as "not credible". In addition, the favored authors may themselves be biased or paid to produce results that support their overall political, social, or economic goals in ways such as selecting small favorable data sets and not incorporating larger unfavorable data sets.
If a meta-analysis is conducted by an individual or organization with a bias or predetermined desired outcome, it should be treated as highly suspect or having a high likelihood of being "junk science". From an integrity perspective, researchers with a bias should avoid meta-analysis and use a less abuse-prone (or independent) form of research.
A 2011 study done to disclose possible conflicts of interest in underlying research studies used for medical meta-analyses reviewed 29 meta-analyses and found that conflicts of interest in the studies underlying the meta-analyses were rarely disclosed. The 29 meta-analyses included 11 from general medicine journals, 15 from specialty medicine journals, and 3 from the Cochrane Database of Systematic Reviews. The 29 meta-analyses reviewed an aggregate of 509 randomized controlled trials (RCTs). Of these, 318 RCTs reported funding sources, with 219 (69%) industry funded. 132 of the 509 RCTs reported author conflict of interest disclosures, with 91 studies (69%) disclosing industry financial ties with one or more authors. The information was, however, seldom reflected in the meta-analyses. Only two (7%) reported RCT funding sources and none reported RCT author-industry ties. The authors concluded “without acknowledgment of COI due to industry funding or author industry financial ties from RCTs included in meta-analyses, readers’ understanding and appraisal of the evidence from the meta-analysis may be compromised.”
Comparison of meta-analysis to the scientific method
Francis Bacon described a method of procedure for advancing the physical sciences.
“Aphorism 106: In forming our axioms from induction, we must examine and try whether the axiom we derive be only fitted and calculated for the particular instances from which it is deduced, or whether it be more extensive and general. If it be the latter, we must observe, whether it confirms its own extent and generality by giving surety, as it were, in pointing out new particulars, so that we may neither stop at actual discoveries, nor with a careless grasp catch at shadows and abstract forms, instead of substances of a determinate nature: and as soon as we act thus, well authorized hope may with reason, be said to beam upon us.”
George Boole gave a similar description.
“The study of every department of physical science begins with observation; it advances by the collation of facts to a presumptive acquaintance with their connecting law, the validity of such presumption it tests by new experiments so devised as to augment, if the presumption be well founded, its probability indefinitely; and finally, the law of the phenomenon having been with sufficient confidence determined, the investigation of causes, conducted by the due mixture of hypothesis and deduction, crowns the inquiry.” (Boole, 1958, p. 402)
In both descriptions there are three steps: first assemble data, second formulate an explanatory physical law, and third test the proposed physical law in future experiments. In a meta-analysis the first two steps are carried out, but the third step is modified. Meta-analysis, being retrospective, has no data gathered after the formulation of the physical law, and so confirms the physical law using data that were known at the time the physical law was formulated. This requires a change from the usual notion of probability:
“Probability is expectation founded upon partial knowledge. A perfect acquaintance with all the circumstances affecting the occurrence of an event would change expectation into certainty, and leave neither room nor demand for a theory of probabilities.”(Boole, 1958, p. 402)
Statistical significance in a hypothesis test is the probability of rejecting the null hypothesis when it is true. In the scientific method, statistical significance is the probability of a future event. In a meta-analysis, statistical significance is the probability of a past event.
In a meta-analysis the analyst has “perfect acquaintance with all the circumstances affecting the occurrence” of any event defined by the data at the time the hypotheses are specified. So there is no uncertainty and the probabilities of such events, using Boole’s notion of probability, would be zero or one. The procedure in meta-analysis is to simulate the necessary incompleteness of knowledge by calculating the power and statistical significance as if none of the data were known to the analyst at the time the hypotheses were specified. A meta-analysis hypothesis test is, within the context of the scientific method of Bacon and Boole, a simulated hypothesis test.
See also
 Epidemiologic methods
 Newcastle–Ottawa scale
 Reporting bias
 Review journal
 Study heterogeneity
 Systematic review
Further reading
. Explores two contrasting views: does meta-analysis provide "objective, quantitative methods for combining evidence from separate but similar studies" or merely "statistical tricks which make unjustified assumptions in producing oversimplified generalisations out of a complex of disparate studies"?
 Wilson, D. B., & Lipsey, M. W. (2001). Practical meta-analysis. Thousand Oaks: Sage Publications. ISBN 0761921680
 O'Rourke, K. (2007). Just the history from the combining of information: investigating and synthesizing what is possibly common in clinical observations or studies via likelihood. Oxford: University of Oxford, Department of Statistics. Gives technical background material and details on the "An historical perspective on meta-analysis" paper cited in the references.
 Owen, A. B. (2009). "Karl Pearson's meta-analysis revisited". Annals of Statistics, 37 (6B), 3867–3892. Supplementary report.
 Ellis, Paul D. (2010). The Essential Guide to Effect Sizes: An Introduction to Statistical Power, Meta-Analysis and the Interpretation of Research Results. United Kingdom: Cambridge University Press. ISBN 0521142466
 Bonett, D. G. (2009). Meta-analytic interval estimation for standardized and unstandardized mean differences. Psychological Methods, 14, 225–238.