Bayesian inference

 


Bayesian inference is statistical inference in which evidence or observations are used to update or to newly infer the probability that a hypothesis may be true. The name "Bayesian" comes from the frequent use of Bayes' theorem in the inference process. Bayes' theorem was derived from the work of the Reverend Thomas Bayes.

Evidence and changing beliefs


Bayesian inference uses aspects of the scientific method, which involves collecting evidence that is meant to be consistent or inconsistent with a given hypothesis. As evidence accumulates, the degree of belief in a hypothesis ought to change. With enough evidence, it should become very high or very low. Thus, proponents of Bayesian inference say that it can be used to discriminate between conflicting hypotheses: hypotheses with very high support should be accepted as true and those with very low support should be rejected as false. However, detractors say that this inference method may be biased by the initial beliefs that one holds before any evidence is ever collected. (This is a form of inductive bias.)

Bayesian inference uses a numerical estimate of the degree of belief in a hypothesis before evidence has been observed and calculates a numerical estimate of the degree of belief in the hypothesis after evidence has been observed. (This process is repeated when additional evidence is obtained.) Bayesian inference usually relies on degrees of belief, or subjective probabilities, in the induction process and does not necessarily claim to provide an objective method of induction. Nonetheless, some Bayesian statisticians believe probabilities can have an objective value and therefore Bayesian inference can provide an objective method of induction. See scientific method.

Bayes' theorem adjusts probabilities given new evidence in the following way:

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$

where

  • $H$ represents a specific hypothesis, which may or may not be some null hypothesis.
  • $P(H)$ is called the prior probability of $H$ that was inferred before new evidence, $E$, became available.
  • $P(E \mid H)$ is called the conditional probability of seeing the evidence $E$ if the hypothesis $H$ happens to be true. It is also called a likelihood function when it is considered as a function of $H$ for fixed $E$.
  • $P(E)$ is called the marginal probability of $E$: the a priori probability of witnessing the new evidence $E$ under all possible hypotheses. For any complete set of mutually exclusive hypotheses $H_i$, it is the sum, over all hypotheses, of the product of each hypothesis's probability and the corresponding conditional probability:

    $$P(E) = \sum_i P(E \mid H_i)\,P(H_i).$$

  • $P(H \mid E)$ is called the posterior probability of $H$ given $E$.

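The update defined by these terms can be sketched in Python for a finite set of mutually exclusive, exhaustive hypotheses. This is a minimal illustration, not a library API. The function name `posterior` and its list-based interface are assumptions made for this example.

```python
def posterior(priors, likelihoods):
    """Return the posteriors P(H_i | E) for mutually exclusive, exhaustive hypotheses.

    priors:      list of P(H_i), assumed to sum to 1
    likelihoods: list of P(E | H_i), in the same order as priors
    """
    if len(priors) != len(likelihoods):
        raise ValueError("need one likelihood per hypothesis")
    # Marginal probability of the evidence: P(E) = sum_i P(E | H_i) P(H_i)
    p_evidence = sum(l * p for p, l in zip(priors, likelihoods))
    if p_evidence == 0:
        raise ValueError("P(E) = 0: the evidence is impossible under every hypothesis")
    # Bayes' theorem: P(H_i | E) = P(E | H_i) P(H_i) / P(E)
    return [l * p / p_evidence for p, l in zip(priors, likelihoods)]


# Two equally likely hypotheses; the evidence is more probable under the first.
print(posterior([0.5, 0.5], [0.75, 0.5]))
```

Dividing by the same $P(E)$ for every hypothesis is what guarantees the posteriors again sum to 1.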

The factor $P(E \mid H)/P(E)$ represents the impact that the evidence has on the belief in the hypothesis. If it is likely that the evidence $E$ would be observed when the hypothesis under consideration is true, but unlikely a priori that $E$ would have been the outcome of the observation, then this factor will be large. Multiplying the prior probability of the hypothesis by this factor would result in a larger posterior probability of the hypothesis given the evidence. Conversely, if it is unlikely that the evidence $E$ would be observed if the hypothesis under consideration is true, but a priori likely that $E$ would be observed, then the factor would reduce the posterior probability for $H$. Under Bayesian inference, Bayes' theorem therefore measures how much new evidence should alter a belief in a hypothesis.

Bayesian statisticians argue that even when people have very different prior subjective probabilities, new evidence from repeated observations will tend to bring their posterior subjective probabilities closer together. However, others argue that when people hold widely different prior subjective probabilities their posterior subjective probabilities may never converge even with repeated collection of evidence. These critics argue that worldviews which are completely different initially can remain completely different over time despite a large accumulation of evidence.

Multiplying the prior probability $P(H)$ by the factor $P(E \mid H)/P(E)$ will never yield a probability that is greater than 1, since $P(E)$ is at least as great as $P(E \wedge H)$ (where $\wedge$ denotes "and"), which equals $P(E \mid H)\,P(H)$ (see joint probability).

The probability of $E$ given $H$, $P(E \mid H)$, can be represented as a function of its second argument with its first argument held fixed. Such a function is called a likelihood function; it is a function of $H$ alone, with $E$ treated as a parameter. A ratio of two likelihood functions is called a likelihood ratio, $\Lambda$. For example,

$$\Lambda = \frac{L(H \mid E)}{L(\neg H \mid E)} = \frac{P(E \mid H)}{P(E \mid \neg H)},$$

where the dependence of $\Lambda$ on $E$ is suppressed for simplicity (as $E$ might have been, except that we will need to use that parameter below). Since $H$ and not-$H$ ($\neg H$) are mutually exclusive and span all possibilities, the sum previously given for the marginal probability reduces to

$$P(E) = P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H).$$

As a result, we can rewrite Bayes' theorem as

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)} = \frac{\Lambda\,P(H)}{\Lambda\,P(H) + P(\neg H)}.$$

We could then exploit the identity

$$P(\neg H) = 1 - P(H)$$

to exhibit $P(H \mid E)$ as a function of just $P(H)$ (and $\Lambda$, which is computed directly from the evidence).

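As a check that the likelihood-ratio form agrees with the original statement of Bayes' theorem, here is a short Python sketch. The function names are hypothetical, and the numbers $P(H) = 0.2$, $P(E \mid H) = 0.9$, $P(E \mid \neg H) = 0.3$ are assumed purely for illustration.

```python
def posterior_from_ratio(prior, likelihood_ratio):
    """P(H | E) written in terms of P(H) and Lambda = P(E | H) / P(E | not-H)."""
    # P(H | E) = Lambda P(H) / (Lambda P(H) + P(not-H)), with P(not-H) = 1 - P(H)
    return likelihood_ratio * prior / (likelihood_ratio * prior + (1.0 - prior))


def posterior_direct(prior, p_e_given_h, p_e_given_not_h):
    """The same quantity from Bayes' theorem with P(E) expanded over H and not-H."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)
    return p_e_given_h * prior / p_e


# Illustrative (assumed) numbers: Lambda = 0.9 / 0.3 = 3
print(posterior_from_ratio(0.2, 0.9 / 0.3), posterior_direct(0.2, 0.9, 0.3))
```

Both calls should print (up to rounding) $3/7 \approx 0.4286$. The ratio form needs only the single number $\Lambda$ from the evidence.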
With two pieces of evidence $E_1$ and $E_2$ that are marginally and conditionally independent of each other given the hypotheses, Bayesian inference can be applied iteratively. We could use the first piece of evidence to calculate an initial posterior probability, and then use that posterior probability as a new prior probability to calculate a second posterior probability given the second piece of evidence. Bayes' theorem applied iteratively yields

$$P(H \mid E_1 \wedge E_2) = \frac{P(E_1 \mid H)\,P(E_2 \mid H)\,P(H)}{P(E_1)\,P(E_2)}.$$

Using likelihood ratios, we find that

$$P(H \mid E_1 \wedge E_2) = \frac{\Lambda_1 \Lambda_2\,P(H)}{\Lambda_1 \Lambda_2\,P(H) + P(\neg H)},$$

where $\Lambda_1$ and $\Lambda_2$ are the likelihood ratios of $E_1$ and $E_2$ respectively.

This iteration of Bayesian inference could be extended with more independent pieces of evidence.

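The following Python sketch shows that updating on one piece of evidence at a time gives the same answer as a single update using $\Lambda_1 \Lambda_2$. It assumes a binary hypothesis and evidence that is conditionally independent given $H$ and given $\neg H$, which is what makes the likelihood ratios multiply. The function names and numbers are hypothetical.

```python
def update(prior, p_e_given_h, p_e_given_not_h):
    """One application of Bayes' theorem for H versus not-H."""
    numerator = p_e_given_h * prior
    return numerator / (numerator + p_e_given_not_h * (1.0 - prior))


def sequential(prior, evidence):
    """Update once per piece of evidence; each posterior becomes the next prior."""
    for p_e_given_h, p_e_given_not_h in evidence:
        prior = update(prior, p_e_given_h, p_e_given_not_h)
    return prior


def combined(prior, evidence):
    """Single update using the product of likelihood ratios Lambda_1 * Lambda_2 * ..."""
    lam = 1.0
    for p_e_given_h, p_e_given_not_h in evidence:
        lam *= p_e_given_h / p_e_given_not_h
    return lam * prior / (lam * prior + (1.0 - prior))


# Hypothetical evidence: two observations, each with P(E_i | H) = 0.8, P(E_i | not-H) = 0.4
evidence = [(0.8, 0.4), (0.8, 0.4)]
print(sequential(0.1, evidence), combined(0.1, evidence))
```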
Bayesian inference is used to calculate probabilities for decision making under uncertainty. Besides the probabilities, a loss function should be evaluated to take into account the relative impact of the alternatives.

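The sketch below shows one way a loss function can enter a decision. It picks the action with the smallest posterior expected loss, which is the usual Bayes decision rule. The hypotheses, actions, loss values and function name are all hypothetical; they are chosen only to show that a less probable hypothesis can still drive the decision when its losses are large.

```python
def bayes_action(posteriors, losses):
    """Return the action minimising posterior expected loss, plus all expected losses.

    posteriors: list of P(H_j | E) over mutually exclusive, exhaustive hypotheses
    losses:     dict mapping action -> list of losses, one per hypothesis H_j
    """
    expected = {
        action: sum(loss * p for loss, p in zip(row, posteriors))
        for action, row in losses.items()
    }
    best = min(expected, key=expected.get)
    return best, expected


# Hypothetical: H_1 = "has disease", H_2 = "healthy"; losses in arbitrary units.
posteriors = [0.02, 0.98]
losses = {"treat": [0.0, 1.0], "do not treat": [100.0, 0.0]}
print(bayes_action(posteriors, losses))
```

Here "treat" wins with expected loss 0.98 against 2.0, even though "healthy" is far more probable.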
Simple examples of Bayesian inference


From which bowl is the cookie?


To illustrate, suppose there are two full bowls of cookies. Bowl #1 has 10 chocolate chip and 30 plain cookies, while bowl #2 has 20 of each. Our friend Fred picks a bowl at random, and then picks a cookie at random. We may assume there is no reason to believe Fred treats one bowl differently from another, and likewise for the cookies. The cookie turns out to be a plain one. How probable is it that Fred picked it out of bowl #1?

Intuitively, it seems clear that the answer should be more than a half, since there are more plain cookies in bowl #1. The precise answer is given by Bayes' theorem. Let $H_1$ correspond to bowl #1, and $H_2$ to bowl #2. It is given that the bowls are identical from Fred's point of view, thus $P(H_1) = P(H_2)$, and the two must add up to 1, so both are equal to 0.5. The event $E$ is the observation of a plain cookie. From the contents of the bowls, we know that $P(E \mid H_1) = 30/40 = 0.75$ and $P(E \mid H_2) = 20/40 = 0.5$. Bayes' formula then yields

$$P(H_1 \mid E) = \frac{P(E \mid H_1)\,P(H_1)}{P(E \mid H_1)\,P(H_1) + P(E \mid H_2)\,P(H_2)} = \frac{0.75 \times 0.5}{0.75 \times 0.5 + 0.5 \times 0.5} = 0.6.$$

Before we observed the cookie, the probability we assigned for Fred having chosen bowl #1 was the prior probability, $P(H_1)$, which was 0.5. After observing the cookie, we must revise the probability to $P(H_1 \mid E)$, which is 0.6.

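The cookie calculation can be reproduced exactly in Python with rational arithmetic. This is a sketch. The dictionary layout and variable names are illustrative; only the bowl contents and the equal priors come from the example.

```python
from fractions import Fraction

# Bowl contents from the example: (plain cookies, total cookies)
bowls = {"bowl #1": (30, 40), "bowl #2": (20, 40)}

# Fred treats the bowls identically: P(H1) = P(H2) = 1/2
prior = {name: Fraction(1, 2) for name in bowls}

# P(E | H): chance of drawing a plain cookie from each bowl
likelihood = {name: Fraction(plain, total) for name, (plain, total) in bowls.items()}

# P(E) = sum over bowls of P(E | H) P(H)
evidence = sum(likelihood[name] * prior[name] for name in bowls)

# Bayes' theorem for each bowl
posterior = {name: likelihood[name] * prior[name] / evidence for name in bowls}
print(posterior["bowl #1"], posterior["bowl #2"])
```

Using `Fraction` keeps the answer as the exact value 3/5 rather than a floating-point approximation of 0.6.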
False positives in a medical test


False positives result when a test falsely or incorrectly reports a positive result. For example, a medical test for a disease may return a positive result indicating that the patient has a disease even if the patient does not have the disease. We can use Bayes' theorem to determine the probability that a positive result is in fact a false positive. We find that if a disease is rare, then the majority of positive results may be false positives, even if the test is accurate.

Suppose that a test for a disease generates the following results:

  • If a tested patient has the disease, the test returns a positive result 99% of the time, or with probability 0.99
  • If a tested patient does not have the disease, the test returns a positive result 5% of the time, or with probability 0.05.


Naively, one might think that only 5% of positive test results are false, but that is quite wrong, as we shall see.

Suppose that only 0.1% of the population has that disease, so that a randomly selected patient has a 0.001 prior probability of having the disease.

We can use Bayes' theorem to calculate the probability that a positive test result is a false positive.

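Let $A$ represent the condition in which the patient has the disease, and $B$ represent the evidence of a positive test result. Then the probability that the patient actually has the disease given the positive test result is

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)} = \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.05 \times 0.999} \approx 0.019,$$

and hence the probability that a positive result is a false positive is about $1 - 0.019 = 0.981$, or 98%.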

Despite the apparent high accuracy of the test, the incidence of the disease is so low that the vast majority of patients who test positive do not have the disease. Nonetheless, the fraction of patients who test positive who do have the disease (0.019) is 19 times the fraction of people who have not yet taken the test who have the disease (0.001). Thus the test is not useless, and re-testing may improve the reliability of the result.

In order to reduce the problem of false positives, a test should be very accurate in reporting a negative result when the patient does not have the disease. If the test reported a negative result in patients without the disease with probability 0.999, then

$$P(A \mid B) = \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.001 \times 0.999} \approx 0.5,$$

so that the probability of a false positive is now about $1 - 0.5 = 0.5$.

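Both false-positive scenarios can be checked with a small Python function. This is a sketch. The function and parameter names are hypothetical; the prevalence, sensitivity and false-positive rates are the figures used above.

```python
def p_disease_given_positive(prevalence, sensitivity, false_positive_rate):
    """P(A | B): probability of disease given a positive test, by Bayes' theorem.

    prevalence:          P(A), prior probability of having the disease
    sensitivity:         P(B | A), probability of a positive result if diseased
    false_positive_rate: P(B | not A), probability of a positive result if healthy
    """
    true_positive = sensitivity * prevalence                  # P(B | A) P(A)
    false_positive = false_positive_rate * (1 - prevalence)   # P(B | not A) P(not A)
    return true_positive / (true_positive + false_positive)


# The two scenarios in the text: P(B | not A) = 0.05, then 0.001
for fpr in (0.05, 0.001):
    ppv = p_disease_given_positive(0.001, 0.99, fpr)
    print(f"P(B|not A) = {fpr}: P(A|B) = {ppv:.4f}, false-positive share = {1 - ppv:.4f}")
```

The false-positive share falls from about 98% to about 50% only because the term $P(B \mid \neg A)\,P(\neg A)$ shrinks. Sensitivity is unchanged between the two runs.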
On the other hand, false negatives result when a test falsely or incorrectly reports a negative result. For example, a medical test for a disease may return a negative result indicating that the patient does not have a disease even though the patient actually has the disease. We can also use Bayes' theorem to calculate the probability of a false negative. In the first example above,

$$P(A \mid \neg B) = \frac{P(\neg B \mid A)\,P(A)}{P(\neg B \mid A)\,P(A) + P(\neg B \mid \neg A)\,P(\neg A)} = \frac{0.01 \times 0.001}{0.01 \times 0.001 + 0.95 \times 0.999} \approx 0.0000105.$$

The probability that a negative result is a false negative is about 0.0000105 or 0.00105%. When a disease is rare, false negatives will not be a major problem with the test.

But if 60% of the population had the disease, then the probability of a false negative would be greater. With the above test, the probability of a false negative would be

$$P(A \mid \neg B) = \frac{0.01 \times 0.6}{0.01 \times 0.6 + 0.95 \times 0.4} \approx 0.0155.$$

The probability that a negative result is a false negative rises to 0.0155 or 1.55%.

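A companion sketch for false negatives, again with hypothetical names. It uses the text's figures: $P(\neg B \mid A) = 0.01$, $P(\neg B \mid \neg A) = 0.95$, and prevalences of 0.1% and 60%.

```python
def p_disease_given_negative(prevalence, sensitivity, specificity):
    """P(A | not B): probability that a negative result is a false negative.

    prevalence:  P(A)
    sensitivity: P(B | A), so P(not B | A) = 1 - sensitivity
    specificity: P(not B | not A)
    """
    missed = (1 - sensitivity) * prevalence          # P(not B | A) P(A)
    true_negative = specificity * (1 - prevalence)   # P(not B | not A) P(not A)
    return missed / (missed + true_negative)


for prevalence in (0.001, 0.6):
    print(prevalence, p_disease_given_negative(prevalence, 0.99, 0.95))
```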
In the courtroom


Bayesian inference can be used in a court setting by an individual juror to coherently accumulate the evidence for and against the guilt of the defendant, and to see whether, in totality, it meets their personal threshold for 'beyond a reasonable doubt'.

  • Let $G$ denote the event that the defendant is guilty.
  • Let $E$ denote the event that the defendant's DNA matches DNA found at the crime scene.
  • Let $p(E \mid G)$ denote the probability of seeing event $E$ if the defendant is actually guilty. (Usually this would be taken to be near unity.)
  • Let $p(G \mid E)$ denote the probability that the defendant is guilty assuming the DNA match (event $E$).
  • Let $p(G)$ denote the juror's personal estimate of the probability that the defendant is guilty, based on the evidence other than the DNA match. This could be based on his responses under questioning, or previously presented evidence.


Bayesian inference tells us that if we can assign a probability $p(G)$ to the defendant's guilt before we take the DNA evidence into account, then we can revise this probability to the conditional probability $p(G \mid E)$, since

$$p(G \mid E) = \frac{p(G)\,p(E \mid G)}{p(E)}.$$

Suppose, on the basis of other evidence, a juror decides that there is a 30% chance that the defendant is guilty. Suppose also that the forensic testimony was that the probability that a person chosen at random would have DNA that matched that at the crime scene is 1 in a million, or $10^{-6}$.

The event E can occur in two ways. Either the defendant is guilty (with prior probability 0.3) and thus his DNA is present with probability 1, or he is innocent (with prior probability 0.7) and he is unlucky enough to be one of the 1 in a million matching people.

Thus the juror could coherently revise his opinion to take into account the DNA evidence as follows:

$$p(G \mid E) = \frac{0.3 \times 1.0}{0.3 \times 1.0 + 0.7 \times 10^{-6}} \approx 0.99999766667.$$

The benefit of adopting a Bayesian approach is that it gives the juror a formal mechanism for combining the evidence presented. The approach can be applied successively to all the pieces of evidence presented in court, with the posterior from one stage becoming the prior for the next.
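Here is a Python sketch of the juror's bookkeeping, in which each posterior is reused as the next prior. Only the DNA step uses figures from the text: prior 0.3, $p(E \mid G) = 1$, and $10^{-6}$ for an innocent person. The helper name is hypothetical, and any further evidence would need its own pair of likelihoods.

```python
def update_guilt(prior, p_e_if_guilty, p_e_if_innocent):
    """p(G | E) = p(G) p(E | G) / p(E), with p(E) expanded over guilty and innocent."""
    numerator = prior * p_e_if_guilty
    return numerator / (numerator + (1 - prior) * p_e_if_innocent)


# Each entry: (p(E | G), p(E | not G)) for one piece of evidence, applied in order.
# Only the DNA match from the text is listed; further items would be appended here.
evidence = [(1.0, 1e-6)]

p_guilt = 0.3  # juror's prior, based on the other evidence
for p_if_guilty, p_if_innocent in evidence:
    p_guilt = update_guilt(p_guilt, p_if_guilty, p_if_innocent)
print(f"{p_guilt:.11f}")
```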

The juror would still have to have a prior estimate for the guilt probability before the first piece of evidence is considered. It has been suggested that this could reasonably be the guilt probability of a random person taken from the qualifying population. Thus, for a crime known to have been committed by an adult male living in a town containing 50,000 adult males, the appropriate initial prior probability might be 1/50,000.

For the purpose of explaining Bayes' theorem to jurors, it will usually be appropriate to give it in the form of betting odds rather than probabilities, as these are more widely understood. In this form Bayes' theorem states that

Posterior odds = prior odds × Bayes factor

In the example above, the juror who has a prior probability of 0.3 for the defendant being guilty would now express that in the form of odds of 3:7 in favour of the defendant being guilty, the Bayes factor is one million, and the resulting posterior odds are 3 million to 7 or about 429,000 to one in favour of guilt.
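The odds form can be computed exactly with Python's `fractions` module. This is a sketch with a hypothetical function name. Passing the prior as the string "0.3" keeps it exactly 3/10, so the prior odds come out as exactly 3/7.

```python
from fractions import Fraction


def posterior_odds(prior_probability, bayes_factor):
    """Posterior odds = prior odds x Bayes factor, using exact fractions."""
    p = Fraction(prior_probability)
    prior_odds = p / (1 - p)  # 3/10 -> 3/7, i.e. odds of 3:7 in favour of guilt
    return prior_odds * bayes_factor


odds = posterior_odds("0.3", 10**6)
print(odds, round(float(odds)))  # exact odds, then the rounded "to one" figure
```

The result is exactly 3,000,000/7, or about 428,571 to one. The text rounds this to about 429,000 to one.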

A logarithmic approach which replaces multiplication with addition and reduces the range of the numbers involved might be easier for a jury to handle. This approach, developed by Alan Turing during World War II and later promoted by I. J. Good and E. T. Jaynes among others, amounts to the use of information entropy.

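Here is a sketch of the logarithmic bookkeeping, in which odds and Bayes factors are converted to log-odds so that updating becomes addition. The choice of decibans (ten times the base-10 logarithm) as the unit is an assumed presentation choice; any fixed logarithm base works the same way. The figures are those of the juror example, and the function names are hypothetical.

```python
import math


def to_decibans(ratio):
    """Express an odds ratio or Bayes factor as 10 * log10(ratio) decibans."""
    return 10 * math.log10(ratio)


def from_decibans(db):
    """Invert to_decibans: convert a weight in decibans back to a ratio."""
    return 10 ** (db / 10)


prior = to_decibans(3 / 7)   # prior odds of 3:7 is about -3.7 decibans
dna = to_decibans(1e6)       # a Bayes factor of one million is 60 decibans
posterior = prior + dna      # multiplying odds becomes adding weights
print(round(prior, 2), dna, round(posterior, 2), from_decibans(posterior))
```

A weight of 60 is easier to handle than a factor of 1,000,000. Further pieces of evidence would simply add their own weights.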
In the United Kingdom, Bayes' theorem was explained to the jury in the odds form by a statistician expert witness in the rape case of Regina versus Denis John Adams. A conviction was secured, but the case went to appeal, as no means of accumulating evidence had been provided for those jurors who did not want to use Bayes' theorem. The Court of Appeal upheld the conviction, but also gave their opinion that "To introduce Bayes' Theorem, or any similar method, into a criminal trial plunges the Jury into inappropriate and unnecessary realms of theory and complexity, deflecting them from their proper task." No further appeal was allowed, and the issue of Bayesian assessment of forensic DNA data remains controversial.

Gardner-Medwin argues that the criterion on which a verdict in a criminal trial should be based is not the probability of guilt, but rather the probability of the evidence given that the defendant is innocent (akin to a frequentist p-value). He argues that if the posterior probability of guilt is to be computed by Bayes' theorem, the prior probability of guilt must be known. This will depend on the incidence of the crime, which is an unusual piece of evidence to consider in a criminal trial. Consider the following three propositions:

A: The known facts and testimony could have arisen if the defendant is guilty,

B: The known facts and testimony could have arisen if the defendant is innocent,

C: The defendant is guilty.

Gardner-Medwin argues that the jury should believe both A and not-B in order to convict. A and not-B together imply the truth of C, but the reverse is not true. It is possible that B and C are both true, but in this case he argues that a jury should acquit, even though they know that they will be letting some guilty people go free. See also Lindley's paradox.

Other court cases in which probabilistic arguments played some role were the Howland will forgery trial, the Sally Clark case, and the Lucia de Berk case.

Search theory


In May 1968 the US nuclear submarine Scorpion (SSN-589) failed to arrive as expected at her home port of Norfolk, Virginia. The US Navy was convinced that the vessel had been lost off the Eastern seaboard, but an extensive search failed to discover the wreck. The US Navy's deep water expert, John Craven USN, believed that it was elsewhere, and he organised a search southwest of the Azores based on a controversial approximate triangulation by hydrophones. He was allocated only a single ship, the Mizar, and he took advice from a firm of consultant mathematicians in order to maximise his resources. A Bayesian search methodology was adopted. Experienced submarine commanders were interviewed to construct hypotheses about what could have caused the loss of the Scorpion.

The sea area was divided up into grid squares and a probability assigned to each square, under each of the hypotheses, to give a number of probability grids, one for each hypothesis. These were then added together to produce an overall probability grid. The probability attached to each square was then the probability that the wreck was in that square. A second grid was constructed with probabilities that represented the probability of successfully finding the wreck if that square were to be searched and the wreck were to be actually there. This was a known function of water depth. The result of combining this grid with the previous grid is a grid which gives the probability of finding the wreck in each grid square of the sea if it were to be searched.

This sea grid was systematically searched in a manner which started with the high probability regions first and worked down to the low probability regions last. Each time a grid square was searched and found to be empty, its probability was reassessed using Bayes' theorem. This then forced the probabilities of all the other grid squares to be reassessed (upwards), also by Bayes' theorem. The use of this approach was a major computational challenge for the time, but it was eventually successful, and the Scorpion was found about 740 kilometers southwest of the Azores in October of that year.

Suppose a grid square has a probability $p$ of containing the wreck and that the probability of successfully detecting the wreck if it is there is $q$. If the square is searched and no wreck is found, then, by Bayes' theorem, the revised probability of the wreck being in the square is given by

$$p' = \frac{p(1 - q)}{(1 - p) + p(1 - q)} = \frac{p(1 - q)}{1 - pq}.$$

For each other grid square, if its prior probability is $r$, its posterior probability is given by

$$r' = \frac{r}{1 - pq}.$$

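The two update rules, together with a "search the most promising square first" policy, can be sketched in Python. The grid, the detection probabilities and the function names are hypothetical. Ranking squares by $p \cdot q$ follows the combined grid described earlier, but this toy is not a reconstruction of the actual 1968 procedure.

```python
def failed_search_update(grid, searched, q):
    """Revise a wreck-location grid after searching one square without success.

    grid:     dict mapping square -> probability the wreck is there (values sum to 1)
    searched: the square that was searched
    q:        probability of detecting the wreck in that square if it is there
    """
    p = grid[searched]
    p_miss = 1 - p * q  # probability that the search comes up empty
    # p' = p(1 - q) / (1 - pq) for the searched square, r' = r / (1 - pq) elsewhere
    return {
        square: (r * (1 - q) if square == searched else r) / p_miss
        for square, r in grid.items()
    }


def next_square(grid, detect):
    """Pick the square with the highest probability of finding the wreck, p * q."""
    return max(grid, key=lambda square: grid[square] * detect[square])


# Hypothetical three-square grid with depth-dependent detection probabilities.
grid = {"A": 0.5, "B": 0.3, "C": 0.2}
detect = {"A": 0.8, "B": 0.9, "C": 0.9}
first = next_square(grid, detect)  # "A", since 0.5 * 0.8 = 0.40 is the largest
grid = failed_search_update(grid, first, detect[first])
print(first, grid)
```

After the empty search, square A drops to 1/6, while B and C rise to 1/2 and 1/3. The grid still sums to 1.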

More mathematical examples


Naive Bayes classifier


See naive Bayes classifier.

Posterior distribution of the binomial parameter


In this example we consider the computation of the posterior distribution for the binomial parameter. This is the same problem considered by Bayes in Proposition 9 of his essay.

We are given m observed successes and n observed failures in a binomial experiment. The experiment may be tossing a coin, drawing a ball from an urn, or asking someone their opinion, among many other possibilities. What we know about the parameter (let's call it a) is stated as the prior distribution, p(a).

For a given value of $a$, the probability of $m$ successes in $m+n$ trials is

$$p(m, n \mid a) = \binom{m+n}{m} a^m (1 - a)^n.$$

Since $m$ and $n$ are fixed, and $a$ is unknown, this is a likelihood function for $a$. Combining Bayes' theorem with the continuous form of the law of total probability (which supplies the denominator), we have

$$p(a \mid m, n) = \frac{p(m, n \mid a)\,p(a)}{\int_0^1 p(m, n \mid a)\,p(a)\,da} = \frac{\binom{m+n}{m} a^m (1 - a)^n\,p(a)}{\int_0^1 \binom{m+n}{m} a^m (1 - a)^n\,p(a)\,da}.$$

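When the integral has no convenient closed form, the posterior can be approximated numerically. The minimal Python sketch below uses an evenly spaced grid and the trapezoidal rule; that choice of numerical method is an assumption, not something from Bayes' essay. The function name and example counts are hypothetical.

```python
from math import comb


def grid_posterior(m, n, prior, points=1001):
    """Approximate the posterior density p(a | m, n) on an even grid over [0, 1].

    prior: function giving the prior density p(a) (need not be normalised)
    """
    h = 1.0 / (points - 1)
    grid = [i * h for i in range(points)]
    # Numerator of Bayes' theorem: C(m+n, m) a^m (1-a)^n p(a)
    unnorm = [comb(m + n, m) * a**m * (1 - a) ** n * prior(a) for a in grid]
    # Denominator: trapezoidal approximation of the integral over [0, 1]
    z = h * (sum(unnorm) - 0.5 * (unnorm[0] + unnorm[-1]))
    return grid, [u / z for u in unnorm]


# Uniform prior p(a) = 1 with 3 successes and 1 failure (illustrative numbers)
grid, density = grid_posterior(3, 1, lambda a: 1.0)
print(grid[500], density[500])
```

With a uniform prior, the exact posterior in this case is $20a^3(1 - a)$. The printed density at $a = 0.5$ should therefore be very close to 1.25.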
For some special choices of the prior distribution $p(a)$, the integral can be solved and the posterior takes a convenient form. In particular, if $p(a)$ is a beta distribution with parameters $m_0$ and $n_0$, then the posterior is also a beta distribution with parameters $m + m_0$ and $n + n_0$.

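The conjugate update itself is just parameter arithmetic. The sketch below assumes a Beta(m_0, n_0) prior with density proportional to a^(m_0 - 1) (1 - a)^(n_0 - 1). It computes the posterior parameters and checks numerically that prior times likelihood is proportional to the Beta(m + m_0, n + n_0) density, i.e. that their ratio is the same at every a. The function names and the example values m_0 = n_0 = 2, m = 7, n = 3 are hypothetical.

```python
import math

def beta_pdf(x, alpha, beta):
    """Density of the Beta(alpha, beta) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(log_norm + (alpha - 1) * math.log(x) + (beta - 1) * math.log1p(-x))

def beta_binomial_update(m0, n0, m, n):
    """Beta(m0, n0) prior plus m successes and n failures -> Beta(m + m0, n + n0)."""
    return m + m0, n + n0

m0, n0, m, n = 2, 2, 7, 3
alpha_post, beta_post = beta_binomial_update(m0, n0, m, n)  # (9, 5)
posterior_mean = alpha_post / (alpha_post + beta_post)     # 9 / 14

def prior_times_likelihood(a):
    # Unnormalized posterior; the binomial coefficient is omitted because it
    # does not depend on a.
    return beta_pdf(a, m0, n0) * a**m * (1 - a) ** n

# If the posterior really is Beta(alpha_post, beta_post), this ratio is constant in a.
ratios = [prior_times_likelihood(a) / beta_pdf(a, alpha_post, beta_post)
          for a in (0.1, 0.3, 0.5, 0.9)]
```

No integral has to be evaluated: the posterior mean (m + m_0)/(m + n + m_0 + n_0) follows directly from the updated parameters.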
A conjugate prior is a prior distribution, such as the beta distribution in the above example, with the property that the posterior belongs to the same family of distributions as the prior.

What is "Bayesian" about Proposition 9 is that Bayes presented it as a probability for the parameter a. That is, probabilities can be computed not only for experimental outcomes but also for the parameter that governs them, and the same algebra is used to make inferences of either kind. Bayes states his question in a way that might make the idea of assigning a probability distribution to a parameter palatable to a frequentist. He supposes that a billiard ball is thrown at random onto a billiard table, and that p and q are the probabilities that subsequent billiard balls will fall above or below the first ball. By making the binomial parameter a depend on a random event, he sidesteps a philosophical difficulty of which he was most likely not even aware.

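Bayes' billiard-table device can be imitated by simulation. The sketch below is a rejection-sampling illustration, not Bayes' own analytic argument: it throws the first ball uniformly, counts how many of the m + n later balls land on the "success" side, keeps only the tables that reproduce the observed m successes, and averages the first ball's position. Assuming the uniform throw, the kept positions follow a Beta(m + 1, n + 1) distribution, so their average should approach (m + 1)/(m + n + 2). The function name, trial count, and seed are hypothetical.

```python
import random

def billiard_posterior_mean(m, n, trials=100_000, seed=0):
    """Estimate E[a | m successes, n failures] by simulating Bayes' billiard table.

    The first ball's position a is uniform on [0, 1). Each of the m + n later
    balls counts as a success with probability a. Only tables that reproduce
    exactly m successes are kept (rejection sampling).
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(trials):
        a = rng.random()  # first ball: uniform prior on the parameter
        successes = sum(rng.random() < a for _ in range(m + n))
        if successes == m:
            kept.append(a)
    return sum(kept) / len(kept)

estimate = billiard_posterior_mean(7, 3)  # should be near (7 + 1) / (7 + 3 + 2)
```

Because the parameter is itself the outcome of a physical random throw, averaging over the kept tables is an ordinary frequency calculation, which is the point of Bayes' framing.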
Computer applications


Bayesian inference has applications in artificial intelligence and expert systems. Bayesian inference techniques have been a fundamental part of computerized pattern recognition techniques since the late 1950s. There is also an ever-growing connection between Bayesian methods and simulation-based Monte Carlo techniques, since the posterior distributions of complex models usually cannot be obtained in closed form, while the graphical model structure inherent to many statistical models may allow for efficient simulation algorithms such as Gibbs sampling and other Metropolis-Hastings algorithm schemes. Recently Bayesian inference has gained popularity in the phylogenetics community for these reasons; applications such as , and allow many demographic and evolutionary parameters to be estimated simultaneously.

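As a minimal illustration of the Metropolis-Hastings idea, the sketch below draws samples from the posterior of the binomial parameter discussed earlier in this article, assuming a uniform prior and a Gaussian random-walk proposal. The proposal is symmetric, so the Hastings correction cancels and the scheme reduces to the original Metropolis algorithm. The function names, step size, burn-in length, and seed are hypothetical choices. This example has a closed-form answer, which makes it easy to check; such samplers are most useful for models where no closed form exists.

```python
import math
import random

def log_posterior(a, m, n):
    """Unnormalized log posterior of a under a uniform prior on (0, 1)."""
    if not 0.0 < a < 1.0:
        return -math.inf  # the prior density is zero outside (0, 1)
    return m * math.log(a) + n * math.log1p(-a)

def metropolis(m, n, steps=50_000, step_size=0.1, seed=1):
    """Random-walk Metropolis sampler targeting p(a | m, n); returns every state visited."""
    rng = random.Random(seed)
    a = 0.5
    current = log_posterior(a, m, n)
    chain = []
    for _ in range(steps):
        proposal = a + rng.gauss(0.0, step_size)  # symmetric Gaussian proposal
        candidate = log_posterior(proposal, m, n)
        # Accept with probability min(1, posterior(proposal) / posterior(a)),
        # compared on the log scale; 1 - random() lies in (0, 1], so log is defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            a, current = proposal, candidate
        chain.append(a)
    return chain

chain = metropolis(7, 3)
samples = chain[1_000:]                  # discard a burn-in period
estimate = sum(samples) / len(samples)   # should be near (7 + 1) / (7 + 3 + 2)
```

Only ratios of posterior densities enter the acceptance test, so the normalizing integral in Bayes' theorem never has to be computed.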
As applied to statistical classification, Bayesian inference has been used in recent years to develop algorithms for identifying unsolicited bulk e-mail spam. Applications that make use of Bayesian inference for spam filtering include DSPAM, Bogofilter, SpamAssassin, SpamBayes, and Mozilla. Spam classification is treated in more detail in the article on the naive Bayes classifier.

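To make the idea concrete, here is a toy multinomial naive Bayes filter. It is a sketch under stated assumptions (whitespace tokenization, add-one smoothing, a four-message hypothetical training set) and is not how DSPAM, Bogofilter, SpamAssassin, SpamBayes, or Mozilla actually score messages; those programs use their own tokenizers and combination rules. It multiplies each class prior by per-word probabilities, treating words as independent given the class, and normalizes the two class scores with Bayes' theorem. The function names are hypothetical.

```python
import math
from collections import Counter

def train(messages):
    """Count words per class. `messages` is a list of (text, label), label 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in messages:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, class_counts

def spam_probability(text, word_counts, class_counts):
    """P(spam | words) under the naive independence assumption, with add-one smoothing."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total = sum(class_counts.values())
    log_score = {}
    for label in ("spam", "ham"):
        log_p = math.log(class_counts[label] / total)  # prior P(class)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            log_p += math.log((word_counts[label][word] + 1) / denom)  # P(word | class)
        log_score[label] = log_p
    # Bayes' theorem: P(spam | words) = e^s / (e^s + e^h) = 1 / (1 + e^(h - s)).
    return 1.0 / (1.0 + math.exp(log_score["ham"] - log_score["spam"]))

messages = [  # hypothetical training data
    ("win cash prize now", "spam"),
    ("cheap prize offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
word_counts, class_counts = train(messages)
p_spam = spam_probability("cash prize", word_counts, class_counts)
p_ham = spam_probability("monday meeting", word_counts, class_counts)
```

Working with sums of logarithms rather than products of probabilities avoids numerical underflow on long messages.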
In some applications fuzzy logic is an alternative to Bayesian inference. Fuzzy logic and Bayesian inference, however, are mathematically and semantically incompatible: in general, a degree of truth in fuzzy logic cannot be interpreted as a probability, nor a probability as a degree of truth. Fuzziness measures "the degree to which an event occurs, not whether it occurs".

External links

  • from Queen Mary University of London
  • Bayes' Theorem for the curious and bewildered; an excruciatingly gentle introduction by Eliezer Yudkowsky
  • Paul Graham. (exposition of a popular approach for spam classification)
  • How to implement Bayes' Theorem for online rating and ranking systems
  • , categorized and annotated. Designed for cognitive science; maintained by .
  • A short article on Bayesian multisensory perception
  • Bayesian probabilistic learning in robots
  • a comprehensive Bayesian treatment of Inductive Logic and Confirmation Theory
  • An extensive presentation of Bayesian Confirmation Theory