Selection bias



 
 


Selection bias (e.g. Berkson's bias) is a distortion of evidence or data that arises from the way that the data are collected. It is sometimes referred to as the selection effect. The term selection bias most often refers to the distortion of a statistical analysis due to the method of collecting samples. If the selection bias is not taken into account, then any conclusions drawn may be wrong.

Bias from sample selection

Sample selection may involve pre- or post-selecting the samples in a way that preferentially includes or excludes certain kinds of results. Typically this causes measures of statistical significance to appear much stronger than they are, but it can also produce completely illusory artifacts. Selection bias can be the result of scientific fraud that manipulates data directly, but more often it is either unconscious or due to biases in the instruments used for observation.

For example, if an experiment were conducted to measure the distribution of sizes of fish in a lake, a net might be used to catch a representative sample of fish. If the net had a mesh size of 1 cm, then no fish narrower than 1 cm would be found in the sample. This is a result of the method of selection: there is no way of knowing whether there are any fish smaller than 1 cm based on an experiment using that net.

To determine in a particular setting whether there is selection bias or not, it is not sufficient to establish that there has been selection. Instead, one must establish that the quantity of interest (fish size, for example) is systematically different in the sample than in the entire population of interest, as the selection procedure may simultaneously lead to bias in one quantity such as the fish size, but not in another, for example the sex ratio of the fish.
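The fish example can be sketched as a small simulation. This is only an illustration under stated assumptions: the lake population, the uniform width distribution and the names `lake`, `catch` and `MESH_CM` are hypothetical, and sex is assumed to be independent of width.

```python
import random

random.seed(0)

# Hypothetical lake: each fish has a width in cm and a sex assigned
# independently of its width (an assumption made for this sketch).
lake = [
    {"width_cm": random.uniform(0.2, 5.0), "sex": random.choice(["F", "M"])}
    for _ in range(10_000)
]

MESH_CM = 1.0  # mesh size from the example in the text

# The net only retains fish at least as wide as the mesh.
catch = [fish for fish in lake if fish["width_cm"] >= MESH_CM]


def mean_width(fish_list):
    """Average width in cm."""
    return sum(f["width_cm"] for f in fish_list) / len(fish_list)


def female_share(fish_list):
    """Fraction of fish that are female."""
    return sum(f["sex"] == "F" for f in fish_list) / len(fish_list)


print(f"mean width:   lake {mean_width(lake):.2f} cm, catch {mean_width(catch):.2f} cm")
print(f"female share: lake {female_share(lake):.3f}, catch {female_share(catch):.3f}")
print(f"smallest fish in catch: {min(f['width_cm'] for f in catch):.2f} cm")
```

Under these assumptions the catch overstates the mean width, while the female share in the catch stays close to that of the lake: the same selection procedure biases one quantity and not the other.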

Types of selection bias

There are many types of possible selection bias, including:

Spatial

  • Selecting end-points of a series. For example, to maximise a claimed trend, you could start the time series at an unusually low year, and end on a high one.
  • Early termination of a trial at a time when its results support a desired conclusion.
  • A trial may be terminated early at an extreme value (often for ethical reasons), but the extreme value is likely to be reached by the variable with the largest variance, even if all variables have a similar mean. As a result of that early termination, the means of variables with larger variances are overestimated.
  • Partitioning data with knowledge of the contents of the partitions, and then analyzing them with tests designed for blindly chosen partitions (see stratified sampling, cluster sampling, Texas sharpshooter fallacy).
  • Analyzing the lengths of intervals by selecting intervals that occupy randomly chosen points in time or space, a process that favors longer intervals. This is known as length time bias.
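The last item, length time bias, can be illustrated with a short simulation. It is a sketch rather than a general method: the exponential interval lengths and the names `lengths`, `ends` and `sampled` are assumptions chosen for the example.

```python
import bisect
import random
from itertools import accumulate

random.seed(1)

# Hypothetical back-to-back intervals with exponentially distributed
# lengths (mean 1.0), laid end to end on a timeline.
lengths = [random.expovariate(1.0) for _ in range(50_000)]
ends = list(accumulate(lengths))  # end time of each interval
total = ends[-1]

# Pick random points on the timeline and record the length of the
# interval that each point lands in.
sampled = []
for _ in range(20_000):
    t = random.uniform(0.0, total)
    idx = bisect.bisect_left(ends, t)  # first interval whose end is >= t
    sampled.append(lengths[idx])

mean_all = sum(lengths) / len(lengths)
mean_sampled = sum(sampled) / len(sampled)
print(f"mean length of all intervals:     {mean_all:.2f}")
print(f"mean length of sampled intervals: {mean_sampled:.2f}")
```

Because a long interval covers more of the timeline, a randomly chosen point is more likely to fall inside it, so the sampled intervals are longer on average than the intervals as a whole.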


Data

  • Rejection of "bad" data on arbitrary grounds, instead of according to previously stated or generally agreed criteria.
  • Rejection of "outliers" on statistical grounds that fail to take into account important information that could be derived from "wild" observations.


Participants

  • Pre-screening of trial participants, or advertising for volunteers within particular groups. For example, to "prove" that smoking doesn't affect fitness, advertise for both groups at the local fitness centre, but advertise for smokers during the advanced aerobics class and for non-smokers during the weight loss sessions.
  • Discounting trial subjects/tests that did not run to completion. For example, in a test of a dieting program, the researcher may simply reject everyone who drops out of the trial. But most of those who drop out are those for whom it wasn't working.
  • Self-selection bias, which is possible whenever the group of people being studied has any form of control over whether to participate. Participants' decision to participate may be correlated with traits that affect the study, making the participants a non-representative sample. For example, people who have strong opinions or substantial knowledge may be more willing to spend time answering a survey than those who don't.
  • Migration bias may be introduced by excluding subjects who have recently moved into the study area (this may occur when newcomers are not available in a register used to identify the source population) or by excluding subjects who move out of the study area during follow-up.
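The dieting example in the list above (discounting subjects who did not complete the trial) can be sketched numerically. Everything here is hypothetical: the programme is assumed to have no real effect, and the dropout probabilities in `drops_out` are invented for illustration.

```python
import random

random.seed(2)

# Hypothetical dieting trial: the true weight change (kg) is centred on
# zero, i.e. the programme is assumed to have no real effect.
participants = [random.gauss(0.0, 3.0) for _ in range(5_000)]


def drops_out(change_kg):
    """Assumption: people who are not losing weight are far more likely to quit."""
    p_quit = 0.8 if change_kg >= 0 else 0.1
    return random.random() < p_quit


# The biased analysis keeps only those who ran to completion.
completers = [c for c in participants if not drops_out(c)]

mean_everyone = sum(participants) / len(participants)
mean_completers = sum(completers) / len(completers)
print(f"mean change, everyone enrolled: {mean_everyone:+.2f} kg")
print(f"mean change, completers only:   {mean_completers:+.2f} kg")
```

With these assumed dropout rates, the completers show a clear average weight loss even though the full group shows essentially none.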


Studies

  • Selection of which studies to include in a meta-analysis (see also combinatorial meta-analysis).
  • Performing repeated experiments and reporting only the most favourable results. (Perhaps relabelling lab records of other experiments as "calibration tests", "instrumentation errors" or "preliminary surveys".)
  • Presenting the most significant result of a data dredge as if it were a single experiment. (Which is logically the same as the previous item, but curiously is seen as much less dishonest.)
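The effect of reporting only the most favourable of several experiments can be sketched as follows. The setup is an assumption made for illustration: every experiment is pure noise analysed with a two-sided z-test, and the names `TRIES`, `STUDIES` and `p_value_null_experiment` are hypothetical.

```python
import math
import random

random.seed(3)


def p_value_null_experiment(n=30):
    """One experiment on pure noise: n draws from N(0, 1), two-sided z-test of mean == 0."""
    sample_mean = sum(random.gauss(0.0, 1.0) for _ in range(n)) / n
    z = sample_mean * math.sqrt(n)  # known sigma = 1
    return math.erfc(abs(z) / math.sqrt(2))


ALPHA = 0.05
TRIES = 20       # experiments run per reported "study" (hypothetical)
STUDIES = 2_000  # number of simulated studies

# Honest reporting: one experiment per study.
single_hits = sum(p_value_null_experiment() < ALPHA for _ in range(STUDIES))

# Selective reporting: run TRIES experiments, report only the smallest p-value.
best_hits = sum(
    min(p_value_null_experiment() for _ in range(TRIES)) < ALPHA
    for _ in range(STUDIES)
)

print(f"false-positive rate, one experiment: {single_hits / STUDIES:.3f}")
print(f"false-positive rate, best of {TRIES}:     {best_hits / STUDIES:.3f}")
print(f"theoretical best-of-{TRIES} rate:        {1 - (1 - ALPHA) ** TRIES:.3f}")
```

For independent tests the chance that at least one of 20 null experiments falls below the 0.05 threshold is 1 − 0.95^20, roughly 0.64, which is why the best result of a dredge cannot be read as a single experiment.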


Related issues

Selection bias is closely related to:
  • sample bias, a selection bias produced by an accidental bias in the sampling technique, as opposed to deliberate or unconscious manipulation.
  • publication bias or reporting bias, the distortion produced in community perception or meta-analyses by not publishing uninteresting (usually negative) results, or results which go against the experimenter's prejudices, a sponsor's interests, or community expectations.
  • confirmation bias, the distortion produced by experiments that are designed to seek confirmatory evidence instead of trying to disprove the hypothesis.
  • exclusion bias, which results from applying different eligibility criteria to cases and controls, or from using different variables as the basis for exclusion.


Overcoming selection bias

In the general case, selection biases cannot be overcome with statistical analysis of existing data alone, though see the work of James Heckman for some strategies in special cases (Heckman correction). An informal assessment of the degree of selection bias can be made by examining correlations between (exogenous) background variables and a treatment indicator. However, in regression models, it is the correlation between unobserved determinants of the outcome and unobserved determinants of selection into the sample that biases estimates, and this correlation between unobservables cannot be directly assessed from the observed determinants of treatment.
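The informal assessment described above, correlating a background variable with a treatment indicator, might look like the following sketch. The data are simulated and the variable `age`, the two assignment rules and the helper `pearson` are all hypothetical.

```python
import math
import random

random.seed(4)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical background variable (age) for 5,000 subjects.
age = [random.uniform(20, 70) for _ in range(5_000)]

# Two hypothetical assignment rules: a coin flip, and one in which
# older subjects are more likely to end up in the treated group.
treated_random = [1 if random.random() < 0.5 else 0 for _ in age]
treated_selected = [1 if random.random() < (a - 20) / 50 else 0 for a in age]

r_random = pearson(age, treated_random)
r_selected = pearson(age, treated_selected)
print(f"corr(age, treatment), randomised: {r_random:+.3f}")
print(f"corr(age, treatment), selected:   {r_selected:+.3f}")
```

A correlation well away from zero flags selection on that observed variable. As the paragraph above notes, a correlation near zero does not rule out selection on unobserved determinants, which this check cannot see.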

See also

  • Biased sample
  • Berkson's paradox
  • Self-fulfilling prophecy
  • Attrition bias
  • Black Swan theory
  • List of cognitive biases
  • Wyatt Earp effect