The clustering illusion is the tendency to erroneously perceive small samples from random distributions as having significant "streaks" or "clusters". It arises from a human tendency to underpredict the amount of variability likely to appear in a small sample of random or semi-random data due to chance.
Thomas Gilovich found that most people thought the sequence "OXXXOXXXOXXOOOXOOXXOO" looked non-random when, in fact, it has several characteristics maximally probable for a "random" stream: a nearly equal number of each result (11 X's and 10 O's) and exactly as many adjacent pairs that repeat as pairs that alternate (10 of each). In sequences like this, people seem to expect more alternations than one would predict statistically. The probability of an alternation in a sequence of independent, equally likely binary events is .5, yet people seem to expect an alternation rate of about .7. In fact, over a small number of trials, variability and non-random-looking "streaks" are quite probable.
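As a rough illustration of the figures above, the following Python sketch measures the alternation rate and longest streak of Gilovich's sequence, then estimates by simulation how often 21 fair, independent binary outcomes contain a streak of four or more. The function names, the seed, the trial count and the streak threshold of four are choices made for this example, not anything taken from Gilovich's work.

```python
import random

def alternation_rate(seq):
    """Fraction of adjacent pairs whose two outcomes differ."""
    pairs = list(zip(seq, seq[1:]))
    return sum(a != b for a, b in pairs) / len(pairs)

def longest_run(seq):
    """Length of the longest run of identical consecutive outcomes."""
    best = current = 1
    for a, b in zip(seq, seq[1:]):
        current = current + 1 if a == b else 1
        best = max(best, current)
    return best

gilovich = "OXXXOXXXOXXOOOXOOXXOO"
print(alternation_rate(gilovich))  # 0.5
print(longest_run(gilovich))       # 3

# Assumed setup: fair, independent outcomes; same length as the sequence above.
rng = random.Random(0)   # fixed seed so the run is repeatable
trials = 10_000
length = len(gilovich)
with_streak = sum(
    longest_run([rng.choice("XO") for _ in range(length)]) >= 4
    for _ in range(trials)
)
estimate = with_streak / trials
print(f"Share of random sequences with a streak of 4 or more: {estimate:.2f}")
```

Under these assumptions, the sequence itself alternates exactly half the time, and the simulated estimate suggests that a streak of four or more identical outcomes turns up in roughly four out of five random sequences of this length.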
Daniel Kahneman and Amos Tversky explained this kind of misprediction as being caused by the representativeness heuristic (which they also first proposed). Gilovich argues that a similar effect occurs for other types of random dispersions, including two-dimensional data, such as seeing clusters in the impact locations of V-1 flying bombs on London during World War II, or seeing streaks in stock market price fluctuations over time.
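The two-dimensional case can be sketched in a similar way. The example below scatters points uniformly at random over a square divided into a grid, a loose stand-in for impact sites on a map, and counts how many land in each cell. The grid size, number of points and seed are arbitrary assumptions for illustration and are not drawn from the historical V-1 data.

```python
import random
from collections import Counter

def cell_counts(n_points, grid, rng):
    """Drop n_points uniformly on the unit square and count hits per grid cell."""
    counts = Counter()
    for _ in range(n_points):
        x, y = rng.random(), rng.random()   # each coordinate lies in [0, 1)
        counts[(int(x * grid), int(y * grid))] += 1
    return counts

# Assumed setup: 200 points on a 10 x 10 grid, i.e. 2 hits per cell on average.
rng = random.Random(1)
grid = 10
n_points = 200
counts = cell_counts(n_points, grid, rng)

mean_hits = n_points / grid**2
busiest = max(counts.values())
empty = grid * grid - len(counts)   # cells that received no point at all

print(f"Average hits per cell: {mean_hits:.1f}")
print(f"Busiest cell: {busiest} hits")
print(f"Empty cells: {empty}")
```

Even though every location is equally likely, some cells typically collect several times the average while others stay empty, which is the kind of pattern that tends to be read as deliberate clustering.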
The clustering illusion was central to a widely reported study by Gilovich, Robert Vallone and Amos Tversky. They found that the idea that basketball players shoot successfully in "streaks", which sportscasters sometimes call having a "hot hand" and which Gilovich et al.'s subjects widely believed, was false. In the data they collected, if anything, the success of a previous throw very slightly predicted a subsequent miss rather than another success.
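One way to see what a test of the "hot hand" looks for is to compare the hit rate after a hit with the hit rate after a miss. The sketch below does this for a simulated shooter whose shots are independent by construction. The 50% hit probability, the number of shots and the helper name `conditional_hit_rates` are assumptions for the example, and it is not a reconstruction of the analysis by Gilovich et al.

```python
import random

def conditional_hit_rates(shots):
    """Return (hit rate after a hit, hit rate after a miss) for a list of 0/1 shots."""
    after_hit = [cur for prev, cur in zip(shots, shots[1:]) if prev == 1]
    after_miss = [cur for prev, cur in zip(shots, shots[1:]) if prev == 0]
    return sum(after_hit) / len(after_hit), sum(after_miss) / len(after_miss)

# Assumed shooter: every shot is an independent 50% chance, so no real "hot hand".
rng = random.Random(42)
shots = [1 if rng.random() < 0.5 else 0 for _ in range(100_000)]

rate_after_hit, rate_after_miss = conditional_hit_rates(shots)
print(f"P(hit | previous hit)  = {rate_after_hit:.3f}")
print(f"P(hit | previous miss) = {rate_after_miss:.3f}")
```

For an independent shooter the two rates should be close to each other, so any streaks in the simulated record are produced by chance alone.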
Using this cognitive bias in causal reasoning may result in the Texas sharpshooter fallacy. It may also be related to the gambler's fallacy. More general forms of erroneous pattern recognition are pareidolia and apophenia.
Clustering (or the illusion of clustering) is also used in the analysis of cryptographically secure pseudorandom number generators (CSPRNGs) and TCP/IP sequence numbers:
Strange Attractors and TCP/IP Sequence Number Analysis - One Year Later
External links