Bayesian Probability

Bayesian probability interprets the concept of probability as 'a measure of a state of knowledge', and not as a frequency as in orthodox statistics. Broadly speaking, there are two views on Bayesian probability that interpret the 'state of knowledge' concept in different ways. For the objectivist school, the rules of Bayesian statistics can be justified by desiderata of rationality and consistency and interpreted as an extension of Aristotelian logic. For the subjectivist school, the state of knowledge corresponds to a 'personal belief'. Many modern machine learning methods are based on objectivist Bayesian principles. One of the crucial features of the Bayesian view is that a probability can be assigned to a hypothesis, which is not possible under the frequentist view, where a hypothesis can only be rejected or not rejected.

The Bayesian probability calculus


According to the Bayesian probability calculus, the probability of a hypothesis given the data (the posterior) is proportional to the product of the likelihood and the prior probability (often just called the prior). The likelihood brings in the effect of the data, while the prior specifies the belief in the hypothesis before the data was observed.

More formally, the Bayesian probability calculus makes use of Bayes' formula (a theorem that is valid in all common interpretations of probability) in the following way:

$$P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}$$

where

  • $H$ is a hypothesis, and $D$ is the data.
  • $P(H)$ is the prior probability of $H$: the probability that $H$ is correct before the data $D$ was seen.
  • $P(D \mid H)$ is the conditional probability of seeing the data $D$ given that the hypothesis $H$ is true. $P(D \mid H)$ is called the likelihood.
  • $P(D)$ is the marginal probability of $D$.
  • $P(H \mid D)$ is the posterior probability: the probability that the hypothesis is true, given the data and the previous state of belief about the hypothesis.
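
As a small numerical illustration of Bayes' formula, P(H|D) = P(D|H) P(H) / P(D), the sketch below plugs in made-up values. The function name `posterior` and all three numbers are assumptions chosen for illustration; they do not come from any real data set.

```python
def posterior(prior: float, likelihood: float, marginal: float) -> float:
    """Bayes' formula: P(H|D) = P(D|H) * P(H) / P(D)."""
    if marginal <= 0.0:
        raise ValueError("P(D) must be positive")
    return likelihood * prior / marginal


# Hypothetical inputs (assumed for illustration only).
p_h = 0.01          # prior P(H)
p_d_given_h = 0.9   # likelihood P(D|H)
p_d = 0.108         # marginal P(D), taken as given here

p_h_given_d = posterior(p_h, p_d_given_h, p_d)
print(f"P(H|D) = {p_h_given_d:.4f}")
```

With these assumed numbers the data raise the probability of the hypothesis from 1% to roughly 8%: the likelihood is high, but the small prior still dominates.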


$P(D)$ is the a priori probability of witnessing the data $D$ under all possible hypotheses. Given any exhaustive set of mutually exclusive hypotheses $H_i$, we have:

$$P(D) = \sum_i P(D, H_i) = \sum_i P(D \mid H_i)\, P(H_i).$$

We can consider $i$ here to index alternative worlds, of which there is exactly one which we inhabit, and $H_i$ is the hypothesis that we are in the world $i$. $P(D, H_i)$ is then the probability that we are in the world $i$ and witness the data. Since the set of alternative worlds was assumed to be mutually exclusive and exhaustive, the above formula is a case of the law of alternatives (also known as the law of total probability).
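
The marginal P(D) can be computed by summing P(D|H_i) P(H_i) over a set of mutually exclusive, exhaustive hypotheses. The following is a minimal sketch of that sum; the three hypotheses, their priors, and their likelihoods are invented for illustration, and `marginal_probability` is a hypothetical helper name.

```python
def marginal_probability(priors, likelihoods):
    """P(D) = sum_i P(D|H_i) * P(H_i) for mutually exclusive, exhaustive H_i."""
    if len(priors) != len(likelihoods):
        raise ValueError("need one likelihood per hypothesis")
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors over an exhaustive set must sum to 1")
    return sum(l * p for l, p in zip(likelihoods, priors))


# Hypothetical priors P(H_i) and likelihoods P(D|H_i) for three "worlds".
priors = [0.5, 0.3, 0.2]
likelihoods = [0.1, 0.4, 0.8]

p_d = marginal_probability(priors, likelihoods)

# Dividing each joint term P(D|H_i) * P(H_i) by P(D) gives the posteriors,
# which therefore sum to 1.
posteriors = [l * p / p_d for l, p in zip(likelihoods, priors)]
print(p_d, posteriors)
```

Because P(D) is the sum of the joint terms, dividing by it is exactly what makes the posterior probabilities of the competing hypotheses add up to one.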

$P(D)$ is a normalizing constant that only depends on the data, and which in most cases does not need to be computed explicitly. As a result, Bayes' formula is often simplified to:

$$P(H \mid D) \propto P(D \mid H)\, P(H)$$

where $\propto$ denotes proportionality.
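
To see why the normalizing constant can often be skipped, the sketch below ranks hypotheses using only the product of likelihood and prior, and normalizes afterwards. The setting is an assumption made for illustration: eleven candidate values for a coin's heads probability, a uniform prior over them, and hypothetical data of 7 heads and 3 tails.

```python
# Candidate hypotheses: the coin's heads probability is one of 0.0, 0.1, ..., 1.0.
thetas = [i / 10 for i in range(11)]
prior = [1.0 / len(thetas)] * len(thetas)   # uniform prior (an assumption)

heads, tails = 7, 3                          # hypothetical data

# Unnormalized posterior: likelihood times prior, with P(D) never computed.
unnormalized = [
    (t ** heads) * ((1.0 - t) ** tails) * p
    for t, p in zip(thetas, prior)
]

# The most probable hypothesis can be read off without normalizing.
best_index = max(range(len(thetas)), key=lambda i: unnormalized[i])
best_theta = thetas[best_index]

# Normalizing afterwards recovers proper probabilities when they are needed.
total = sum(unnormalized)
posterior = [u / total for u in unnormalized]
print(best_theta, posterior[best_index])
```

The ordering of the hypotheses is the same before and after normalization, which is why comparisons between hypotheses only need the proportional form.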

In general, Bayesian methods are characterized by the following concepts and procedures:

  • The use of hierarchical models, and the marginalization over the values of nuisance parameters. In most cases, the computation is intractable, but good approximations can be obtained using Markov chain Monte Carlo methods.
  • The sequential use of Bayes' formula: when more data becomes available after calculating a posterior distribution, the posterior becomes the next prior.
  • The assignment of probabilities to hypotheses: in frequentist statistics, a hypothesis can only be rejected or not rejected, whereas in Bayesian statistics, a probability can be assigned to a hypothesis.
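
The sequential use of Bayes' formula can be sketched with a coin whose heads probability has a Beta prior. This model is an assumption chosen because its update rule is simple (observed heads and tails are added to the two Beta parameters); the counts and the helper name `update_beta` are hypothetical.

```python
def update_beta(alpha, beta, heads, tails):
    """Posterior Beta parameters after observing the given heads and tails."""
    return alpha + heads, beta + tails


prior = (1.0, 1.0)  # Beta(1, 1), i.e. a uniform prior (an assumption)

# First batch of data: the posterior becomes the prior for the next batch.
after_first = update_beta(*prior, heads=3, tails=1)
after_second = update_beta(*after_first, heads=4, tails=2)

# Processing all data in one step gives the same posterior.
all_at_once = update_beta(*prior, heads=7, tails=3)

posterior_mean = after_second[0] / (after_second[0] + after_second[1])
print(after_second, all_at_once, round(posterior_mean, 4))
```

That the two routes agree is the point of the sequential rule: it makes no difference whether the data arrive in one batch or in several.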


The objective and subjective variants of Bayesian probability differ mainly in their interpretation and construction of the prior probability.

History


The term Bayesian refers to Thomas Bayes (1702–1761), who proved a special case of what is now called Bayes' theorem. However, it was Pierre-Simon Laplace (1749–1827) who introduced a general version of the theorem and used it to approach problems in celestial mechanics, medical statistics, reliability, and jurisprudence.

The frequentist view of probability overshadowed the Bayesian view during the first half of the 20th century due to prominent figures such as Ronald Fisher, Jerzy Neyman and Egon Pearson. The word Bayesian appeared in the 1950s, and by the 1960s it had become the term preferred by people who sought to escape the limitations and inconsistencies of the frequentist approach to probability theory. Before that time, Bayesian methods were known under the name of inverse probability (because they often involve inferring causes from effects).

In the 20th century, the ideas of Laplace were further developed in two different directions, giving rise to the objective and subjective Bayesian schools. In the objectivist school, the statistical analysis depends only on the model assumed and the data analysed. No subjective decisions need to be involved. In contrast, the subjectivist school denies the possibility of fully objective analysis for the general case.

In the further development of Laplace's ideas, the subjective school predates the objectivist school. The idea that 'probability' should be interpreted as 'subjective degree of belief in a proposition' was proposed by John Maynard Keynes in the early 1920s. This idea was taken further by Bruno de Finetti in Italy (Fondamenti Logici del Ragionamento Probabilistico, 1930) and Frank Ramsey in Cambridge (The Foundations of Mathematics, 1931). The approach was devised to solve problems with the frequentist definition of probability but also with the earlier, objectivist approach of Laplace. The subjective school was further developed and popularized in the 1950s by L.J. Savage.

The strong revival of objective Bayesian inference was mainly due to Harold Jeffreys, whose seminal book "Theory of Probability" first appeared in 1939. In 1957, Edwin Jaynes introduced the concept of maximum entropy, which is an important principle in the formulation of objective methods, mainly for discrete problems. In 1979, José-Miguel Bernardo introduced reference analysis, which offers a generally applicable framework for objective analysis.

In contrast to the frequentist view of probability, the Bayesian viewpoint has a well-formulated axiomatic basis. In 1946, Richard T. Cox showed that the rules of Bayesian inference necessarily follow from a simple set of desiderata, including the representation of degrees of belief by real numbers and the need for consistency. Another fundamental justification of the Bayesian approach is de Finetti's theorem, which was formulated in 1930.

Other well-known proponents of Bayesian probability theory include I.J. Good, B.O. Koopman, Dennis Lindley, Howard Raiffa, Robert Schlaifer and Alan Turing.

In the 1980s, there was a dramatic growth in research and applications of Bayesian methods, mostly attributed to improvements in hardware and software, and to an increasing interest in nonstandard, complex applications. Despite the advantages of the Bayesian approach (such as a solid axiomatic basis and wider scope), most undergraduate teaching is still based on frequentist statistics, mainly due to academic inertia. Nonetheless, Bayesian methods are widely accepted and used, for example in the field of machine learning.

Justification of the Bayesian view


There are three main ways in which the Bayesian view can be justified: the Cox axioms, the Dutch book argument and de Finetti's theorem.
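
The idea behind the Dutch book argument can be illustrated with simple arithmetic. In the sketch below an agent treats its degree of belief in an event as the fair price of a ticket that pays 1 unit if the event occurs. The prices are hypothetical, and the helper `agent_net_gain` is an illustrative name rather than a standard function. If the agent's beliefs in A and in not-A add up to more than 1, buying both tickets loses money whatever happens.

```python
from fractions import Fraction


def agent_net_gain(price_a, price_not_a, a_occurs):
    """Net gain for an agent who buys one ticket on A and one on not-A."""
    cost = price_a + price_not_a
    payout_a = 1 if a_occurs else 0        # ticket on A pays only if A occurs
    payout_not_a = 0 if a_occurs else 1    # ticket on not-A pays otherwise
    return payout_a + payout_not_a - cost


# Hypothetical degrees of belief used as ticket prices.
incoherent = (Fraction(6, 10), Fraction(6, 10))   # sums to more than 1
coherent = (Fraction(6, 10), Fraction(4, 10))     # sums to exactly 1

incoherent_gains = [agent_net_gain(*incoherent, a_occurs=o) for o in (True, False)]
coherent_gains = [agent_net_gain(*coherent, a_occurs=o) for o in (True, False)]
print(incoherent_gains, coherent_gains)
```

With the incoherent prices the agent loses one fifth of a unit in both outcomes, while prices that obey the sum rule leave no guaranteed loss in this bet.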

Richard T. Cox showed that Bayesian inference is the only inductive inference that is logically consistent. The rules of Bayesian inference necessarily follow from some simple desiderata, such as consistency and the fact that a probability is expressed numerically. Both Cox and E. T. Jaynes promoted the view of Bayesian inference as an extension of Aristotelian logic.

Objective versus subjective Bayesian inference

Subjective Bayesian probability interprets 'probability' as 'the degree of belief (or strength of belief) an individual has in the truth of a proposition', and is in that respect subjective. In particular, subjective Bayesians claim that the choice of the prior is necessarily subjective.

Other Bayesians state that such subjectivity can be avoided, and claim that the prior state of knowledge uniquely defines a prior probability distribution for well-posed problems. This was also the position taken by the first followers of the Bayesian view, beginning with Laplace. In the Bayesian revival of the 20th century, the chief proponents of this objectivist school were Edwin Thompson Jaynes and Harold Jeffreys. More recently, James Berger (Duke University) and José-Miguel Bernardo (Universitat de València) have contributed to the development of objective Bayesian methods. For the objective construction of the prior distribution, the following principles can be applied:

  • Maximum entropy
  • Transformation group analysis
  • Reference analysis
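
As a rough illustration of the first principle, maximum entropy: with no constraint beyond normalization, the distribution over a finite set of outcomes with the largest Shannon entropy is the uniform one. The sketch below only compares three hand-picked candidate priors over four outcomes. The candidates are assumptions for illustration, and this is not a general maximum entropy solver.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in nats; terms with zero probability contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical candidate priors over four outcomes.
candidates = {
    "uniform": [0.25, 0.25, 0.25, 0.25],
    "skewed": [0.7, 0.1, 0.1, 0.1],
    "certain": [1.0, 0.0, 0.0, 0.0],
}

entropies = {name: shannon_entropy(p) for name, p in candidates.items()}
least_committal = max(entropies, key=entropies.get)
print(entropies, least_committal)
```

Among these candidates the uniform prior has the highest entropy, log 4, which matches the intuition that it encodes the least additional information about the outcome.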


Scientific method


The scientific method can be interpreted as an application of Bayesian probabilistic inference. In this view, Bayes' theorem is explicitly or implicitly used to update the strength of prior scientific beliefs in the truth of hypotheses in the light of new information from observation or experiment. E. T. Jaynes' book Probability Theory (which appeared posthumously in 2003) has The Logic of Science as its subtitle, and argues for the scientific method as a form of Bayesian inference.

See also

  • Bayesian inference: practical application of the Bayesian view
  • Bayesian network: Bayesian reasoning for multiple variables in the presence of conditional independencies
  • Bertrand's paradox: a paradox in classical probability, solved by Bayesian methods
  • De Finetti's game: a procedure for evaluating someone's subjective probability
  • Fiducial inference: Fisher's attempt to produce 'posterior' distributions without the use of a prior
  • Frequency probability: the main alternative to the Bayesian view
  • Inference
  • Maximum entropy thermodynamics: a Bayesian view of thermodynamics due to Edwin T. Jaynes
  • Probability interpretations
  • Uncertainty

