Dempster-Shafer theory

 

The Dempster–Shafer theory is a mathematical theory of evidence based on belief functions and plausible reasoning, which is used to combine separate pieces of information (evidence) to calculate the probability of an event. The theory was developed by Arthur P. Dempster and Glenn Shafer.

Consider two possible gambles

The first gamble is that we bet on a head turning up when we toss a coin that is known to be fair. Now consider the second gamble, in which we bet on the outcome of a fight between the world's greatest boxer and the world's greatest wrestler. Assume we are fairly ignorant about martial arts and would have great difficulty making a choice of who to bet on.

Many people would feel more unsure about taking the second gamble, in which the probabilities are unknown, than the first gamble, in which the probabilities are easily seen to be one half for each outcome. Dempster–Shafer theory allows one to consider the confidence one has in the probabilities assigned to the various outcomes.

Formalism

Let X be the universal set: the set of all states under consideration. The power set, $2^X$, is the set of all possible subsets of X, including the empty set $\emptyset$. For example, if:

$$X = \{a, b\}$$

then

$$2^X = \{\emptyset, \{a\}, \{b\}, X\}.$$

Each element of the power set can be taken to represent a proposition that one might be interested in: it contains all and only the states in which that proposition is true.

The theory of evidence assigns a belief mass to each element of the power set. Formally, a function $m: 2^X \to [0, 1]$ is called a basic belief assignment (BBA) when it satisfies two axioms. First, the mass of the empty set is zero:

$$m(\emptyset) = 0.$$

Second, the masses of the remaining members of the power set add up to a total of 1:

$$\sum_{A \in 2^X} m(A) = 1.$$

The mass m(A) of a given member of the power set, A, expresses the proportion of all relevant and available evidence that supports the claim that the actual state belongs to A but to no particular subset of A. The value of m(A) pertains only to the set A and makes no additional claims about any subsets of A, each of which has, by definition, its own mass.

From the mass assignments, the upper and lower bounds of a probability interval can be defined. This interval contains the precise probability of a set of interest (in the classical sense), and is bounded by two non-additive continuous measures called belief (or support) and plausibility:

$$\operatorname{bel}(A) \le P(A) \le \operatorname{pl}(A).$$

The belief bel(A) for a set A is defined as the sum of all the masses of (not necessarily proper) subsets of the set of interest:

$$\operatorname{bel}(A) = \sum_{B \mid B \subseteq A} m(B).$$

The plausibility pl(A) is the sum of all the masses of the sets B that intersect the set of interest A:

$$\operatorname{pl}(A) = \sum_{B \mid B \cap A \neq \emptyset} m(B).$$

The two measures are related to each other as follows:

$$\operatorname{pl}(A) = 1 - \operatorname{bel}(\overline{A}).$$

It follows from the above that knowing any one of the three (mass, belief, or plausibility) suffices to deduce the other two, though you may need to know its values for many sets in order to calculate one of the other values for a particular set.
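The conversions between mass, belief, and plausibility can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the frame {a, b, c} and its masses are hypothetical, the function names are invented for this sketch, and the recovery of masses from beliefs uses the standard Möbius inversion over subsets, m(A) = Σ (−1)^|A − B| bel(B) for B ⊆ A, which the text above does not spell out.

```python
from itertools import combinations

def powerset(frame):
    """Return every subset of `frame` as a frozenset."""
    items = sorted(frame)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def belief(m, a):
    """bel(A): total mass of all (not necessarily proper) subsets of A."""
    return sum(v for b, v in m.items() if b <= a)

def plausibility(m, a):
    """pl(A): total mass of all sets that intersect A."""
    return sum(v for b, v in m.items() if b & a)

def mass_from_belief(bel, frame):
    """Recover the masses from a full belief table by Moebius inversion."""
    m = {}
    for a in powerset(frame):
        total = sum((-1) ** len(a - b) * bel[b] for b in powerset(a))
        if abs(total) > 1e-12:  # drop sets that carry no mass
            m[a] = total
    return m

# Hypothetical frame and masses, chosen only for illustration.
frame = frozenset({"a", "b", "c"})
m = {frozenset({"a"}): 0.4, frozenset({"a", "b"}): 0.3, frame: 0.3}

bel = {s: belief(m, s) for s in powerset(frame)}
# Plausibility deduced from belief alone: pl(A) = 1 - bel(complement of A).
pl = {s: 1.0 - bel[frame - s] for s in powerset(frame)}
recovered = mass_from_belief(bel, frame)
```

Note that recovering a single mass requires the belief of every subset of the set in question, which is the sense in which values for many sets may be needed.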

Dempster's rule of combination

The problem we now face is how to combine two independent sets of mass assignments. The original combination rule, known as Dempster's rule of combination, is a generalization of Bayes' rule. This rule strongly emphasises the agreement between multiple sources and ignores all the conflicting evidence through a normalization factor. Use of that rule has come under serious criticism when significant conflict in the information is encountered.

Specifically, the combination (called the joint mass) is calculated from the two sets of masses $m_1$ and $m_2$ in the following manner:

$$m_{1,2}(\emptyset) = 0$$

$$m_{1,2}(A) = (m_1 \oplus m_2)(A) = \frac{1}{1 - K} \sum_{B \cap C = A \neq \emptyset} m_1(B)\, m_2(C)$$

where

$$K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C)$$

is a measure of the amount of conflict between the two mass sets. The normalization factor, $1 - K$, has the effect of completely ignoring conflict and attributing any mass associated with conflict to the null set. Consequently, this operation yields counterintuitive results in the face of significant conflict in certain contexts.
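A minimal sketch of Dempster's rule in Python may make the mechanics concrete: products of masses whose sets intersect in a non-empty set A are accumulated on A, products whose sets do not intersect are accumulated into the conflict K, and the result is rescaled by 1 − K. The representation (dictionaries keyed by frozenset), the function name `combine`, and the two example sources are assumptions made for this illustration only.

```python
def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Returns (joint_mass, K), where K is the conflict between the sources.
    """
    joint = {}
    conflict = 0.0
    for b, v1 in m1.items():
        for c, v2 in m2.items():
            a = b & c
            if a:
                # Agreement: the product supports the intersection.
                joint[a] = joint.get(a, 0.0) + v1 * v2
            else:
                # Disjoint sets: the product counts as conflict.
                conflict += v1 * v2
    if conflict >= 1.0:
        raise ValueError("total conflict: the rule is undefined when K = 1")
    # Normalize by 1 - K so that the joint masses sum to 1 again.
    return {a: v / (1.0 - conflict) for a, v in joint.items()}, conflict

# Hypothetical sources over the frame {alive, dead}.
ALIVE, DEAD = frozenset({"alive"}), frozenset({"dead"})
EITHER = ALIVE | DEAD
m1 = {ALIVE: 0.6, EITHER: 0.4}
m2 = {DEAD: 0.5, EITHER: 0.5}

joint, k = combine(m1, m2)
```

In this example the only conflicting pair is (alive, dead), so K = 0.6 × 0.5 = 0.3, and the remaining products are divided by 0.7. The larger K becomes, the more mass the normalization discards, which is where the criticism of the rule under heavy conflict applies.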

Discussion


Dempster–Shafer theory is a generalization of the Bayesian theory of subjective probability; whereas the latter requires probabilities for each question of interest, belief functions base degrees of belief (or confidence, or trust) for one question on the probabilities for a related question. These degrees of belief may or may not have the mathematical properties of probabilities; how much they differ depends on how closely the two questions are related. Put another way, it is a way of representing epistemic plausibilities, but it can yield answers which contradict those arrived at using probability theory.

Often used as a method of sensor fusion, Dempster–Shafer theory is based on two ideas: obtaining degrees of belief for one question from subjective probabilities for a related question, and Dempster's rule for combining such degrees of belief when they are based on independent items of evidence. In essence, the degree of belief in a proposition depends primarily upon the number of answers (to the related questions) containing the proposition, and the subjective probability of each answer. Also contributing are the rules of combination that reflect general assumptions about the data.

In this formalism a degree of belief (also referred to as a mass) is represented as a belief function rather than a Bayesian probability distribution. Probability values are assigned to sets of possibilities rather than single events: their appeal rests on the fact they naturally encode evidence in favor of propositions.

Dempster–Shafer theory assigns its masses to all of the subsets of the entities that comprise a system. Suppose for example that a system has five members, that is to say five independent states, exactly one of which is actual. If the original set is called S, with $|S| = 5$, then the set of all subsets (the power set) is called $2^S$. Since you can express each possible subset as a binary vector (describing whether any particular member is present or not by writing a "1" or a "0" for that member's slot), it can be seen that there are $2^5 = 32$ subsets possible ($2^{|S|}$ in general), ranging from the empty subset (0, 0, 0, 0, 0) to the "everything" subset (1, 1, 1, 1, 1). The empty subset represents a contradiction, which is not true in any state, and is thus assigned a mass of zero; the remaining masses are normalised so that their total is 1. The "everything" subset is often labelled "unknown" as it represents the state where all elements are present, in the sense that you cannot tell which is actual.
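The binary-vector view of the power set can be sketched as follows. The member labels s1 to s5 and the function name are hypothetical, introduced only for this illustration.

```python
from itertools import product

# Hypothetical labels for the five states of the system S.
S = ["s1", "s2", "s3", "s4", "s5"]

def subsets_as_bit_vectors(members):
    """Yield (bits, subset) for every subset of `members`.

    bits[i] is 1 when members[i] belongs to the subset and 0 otherwise.
    """
    for bits in product((0, 1), repeat=len(members)):
        subset = frozenset(x for x, bit in zip(members, bits) if bit)
        yield bits, subset

table = list(subsets_as_bit_vectors(S))
```

The list holds one entry per subset, starting with the empty subset (0, 0, 0, 0, 0) and ending with the "everything" subset (1, 1, 1, 1, 1).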

Belief and plausibility

Shafer's framework allows for belief about propositions to be represented as intervals, bounded by two values, belief (or support) and plausibility:

belief ≤ plausibility.


Belief in a hypothesis is constituted by the sum of the masses of all sets enclosed by it (i.e. the sum of the masses of all subsets of the hypothesis). It is the amount of belief that directly supports a given hypothesis at least in part, forming a lower bound. Plausibility is 1 minus the sum of the masses of all sets whose intersection with the hypothesis is empty. It is an upper bound on the possibility that the hypothesis could be true, i.e. it "could possibly happen" up to that value, because there is only so much evidence that contradicts that hypothesis.

For example, suppose we have a belief of 0.5 and a plausibility of 0.8 for a proposition, say "the cat in the box is dead." This means that we have evidence that allows us to state strongly that the proposition is true with a confidence of 0.5. However, the evidence contrary to that hypothesis (i.e. "the cat is alive") only has a confidence of 0.2. The remaining mass of 0.3 (the gap between the 0.5 supporting evidence on the one hand, and the 0.2 contrary evidence on the other) is "indeterminate," meaning that the cat could either be dead or alive. This interval represents the level of uncertainty based on the evidence in your system.

Hypothesis                      Mass   Belief   Plausibility
Null (neither alive nor dead)   0      0        0
Alive                           0.2    0.2      0.5
Dead                            0.5    0.5      0.8
Either (alive or dead)          0.3    1.0      1.0


The mass of the null hypothesis is set to zero by definition (it corresponds to "no solution"). The orthogonal hypotheses "Alive" and "Dead" have probabilities of 0.2 and 0.5, respectively. This could correspond to "Live/Dead Cat Detector" signals, which have respective reliabilities of 0.2 and 0.5. Finally, the all-encompassing "Either" hypothesis (which simply acknowledges there is a cat in the box) picks up the slack so that the sum of the masses is 1. The belief for the "Alive" and "Dead" hypotheses matches their corresponding masses because they have no subsets; belief for "Either" consists of the sum of all three masses (Either, Alive, and Dead) because "Alive" and "Dead" are each subsets of "Either". The "Alive" plausibility is 1 − m(Dead) and the "Dead" plausibility is 1 − m(Alive). Finally, the "Either" plausibility sums m(Alive) + m(Dead) + m(Either). The universal hypothesis ("Either") will always have 100% belief and plausibility; it acts as a checksum of sorts.

Here is a somewhat more elaborate example where the behaviour of belief and plausibility begins to emerge. We are looking, through a variety of detector modes, at a faraway object which can only be coloured in one of three colours (red, white, and blue):

Hypothesis      Mass   Belief   Plausibility
Null            0      0        0
Red             0.35   0.35     0.56
White           0.25   0.25     0.45
Blue            0.15   0.15     0.34
Red or white    0.06   0.66     0.85
Red or blue     0.05   0.55     0.75
White or blue   0.04   0.44     0.65
Any             0.1    1.0      1.0
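The belief and plausibility columns of the colour table can be recomputed from the mass column alone. The sketch below assumes a dictionary-of-frozensets representation and helper names chosen for this illustration; only the masses are taken from the table.

```python
R, W, B = "red", "white", "blue"

# Mass column of the colour table (the null hypothesis carries no mass).
mass = {
    frozenset({R}): 0.35,
    frozenset({W}): 0.25,
    frozenset({B}): 0.15,
    frozenset({R, W}): 0.06,
    frozenset({R, B}): 0.05,
    frozenset({W, B}): 0.04,
    frozenset({R, W, B}): 0.10,
}

def belief(m, a):
    """Sum of the masses of all subsets of the hypothesis a."""
    return sum(v for b, v in m.items() if b <= a)

def plausibility(m, a):
    """Sum of the masses of all sets that intersect the hypothesis a."""
    return sum(v for b, v in m.items() if b & a)

for hypothesis in mass:
    label = " or ".join(sorted(hypothesis))
    print(f"{label:22s} "
          f"bel={belief(mass, hypothesis):.2f} "
          f"pl={plausibility(mass, hypothesis):.2f}")
```

For instance, the belief in "red or white" collects the masses of "red", "white", and "red or white" (0.35 + 0.25 + 0.06 = 0.66), while its plausibility leaves out only the mass of "blue" (1 − 0.15 = 0.85).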


These are arguably rather bad examples from the standpoint of classical probability, as events of that kind would not be modelled as disjoint sets in the probability space. Rather, the event "red or blue" would be considered as the union of the events "red" and "blue", so that (see the axioms of probability theory) P(red or white) ≥ P(white) = 0.25 and P(any) = 1. Only the three disjoint events "Blue", "Red" and "White" would need to add up to 1. In fact, one could model a probability measure on the space that is linearly proportional to "plausibility" (normalized so that P(red) + P(white) + P(blue) = 1, with all probabilities still ≤ 1).
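One possible reading of that last remark, sketched in Python: take the plausibilities of the three single colours from the table and rescale them so that they sum to 1. The variable names are assumptions for this illustration, and this is only one way to build such a measure.

```python
# Plausibilities of the three disjoint colours, taken from the table.
plausibility = {"red": 0.56, "white": 0.45, "blue": 0.34}

# Rescale so the three values form a probability distribution.
total = sum(plausibility.values())
probability = {colour: pl / total for colour, pl in plausibility.items()}
```

The rescaled values keep the same ordering as the plausibilities (red, then white, then blue) and each stays below 1.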

Combining beliefs

Beliefs corresponding to independent pieces of information are combined using Dempster's rule of combination, which is a generalisation of the special case of Bayes' theorem where events are independent. Note that the probability masses from propositions that contradict each other can also be used to obtain a measure of how much conflict there is in a system. This measure has been used as a criterion for clustering multiple pieces of seemingly conflicting evidence around competing hypotheses.

In addition, one of the computational advantages of the Dempster–Shafer framework is that priors and conditionals need not be specified, unlike Bayesian methods, which often use a symmetry (minimax error) argument to assign prior probabilities to random variables (e.g. assigning 0.5 to binary values for which no information is available about which is more likely). However, any information contained in the missing priors and conditionals is not used in the Dempster–Shafer framework unless it can be obtained indirectly, in which case it is arguably also available for calculation using Bayes' equations.

Dempster–Shafer theory allows one to specify a degree of ignorance in this situation instead of being forced to supply prior probabilities which add to unity. This sort of situation, and whether there is a real distinction between risk and ignorance, has been extensively discussed by statisticians and economists. See, for example, the contrasting views of Daniel Ellsberg, Howard Raiffa, Kenneth Arrow and Frank Knight.
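The difference between known probabilities and ignorance can be illustrated with the two gambles described earlier. In the sketch below, the fair coin puts all of its mass on the single outcomes, while the fight is modelled, as an assumption made for this illustration, by total ignorance: all mass on the whole frame. The helper names and the representation are likewise hypothetical.

```python
def belief(m, a):
    """Sum of the masses of all subsets of the hypothesis a."""
    return sum(v for b, v in m.items() if b <= a)

def plausibility(m, a):
    """Sum of the masses of all sets that intersect the hypothesis a."""
    return sum(v for b, v in m.items() if b & a)

# First gamble: a coin known to be fair; mass sits on the single outcomes.
HEADS, TAILS = frozenset({"heads"}), frozenset({"tails"})
coin = {HEADS: 0.5, TAILS: 0.5}

# Second gamble: assumed total ignorance; all mass sits on the whole frame.
BOXER, WRESTLER = frozenset({"boxer"}), frozenset({"wrestler"})
fight = {BOXER | WRESTLER: 1.0}

coin_interval = (belief(coin, HEADS), plausibility(coin, HEADS))
fight_interval = (belief(fight, BOXER), plausibility(fight, BOXER))
```

The coin yields the degenerate interval from 0.5 to 0.5, whereas the fight yields the widest possible interval, from 0 to 1. A Bayesian prior of 0.5 for each fighter would make the two gambles look identical; the interval width is what records the ignorance.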

Critics

Judea Pearl (1988a, chapter 9; 1988b and 1990) has argued that it is misleading to interpret belief functions as representing either "probabilities of an event," or "the confidence one has in the probabilities assigned to various outcomes," or "degrees of belief (or confidence, or trust) in a proposition," or "degree of ignorance in a situation." Instead, belief functions represent the probability that a given proposition is provable from a set of other propositions, to which probabilities are assigned. Confusing probabilities of truth with probabilities of provability may lead to counterintuitive results in reasoning tasks such as (1) representing incomplete knowledge, (2) belief-updating and (3) evidence pooling. He further demonstrated that, if partial knowledge is encoded and updated by belief function methods, the resulting beliefs cannot serve as a basis for rational decisions.

Klopotek and Wierzchon proposed to interpret the Dempster–Shafer theory in terms of statistics of decision tables (of the rough set theory), whereby the operator of combining evidence should be seen as a relational join of decision tables. In another interpretation they propose to view this theory as describing destructive material processing (under loss of properties), as in some semiconductor production processes. Under both interpretations reasoning in DST gives correct results, contrary to the earlier probabilistic interpretations, criticized by Pearl in the cited papers and by other researchers.

See also

  • Possibility theory
  • Probability theory
  • Bayes' theorem
  • Bayesian network
  • G.L.S. Shackle
  • Transferable belief model
  • Info-gap decision theory
  • Subjective logic


Further reading

  • Yager, R. R., & Liu, L. (2008). Classic works of the Dempster–Shafer theory of belief functions. Studies in fuzziness and soft computing, v. 219. Berlin: Springer. ISBN 9783540253815.