Probability box
A probability box (or p-box) is a characterization of an uncertain number consisting of both aleatoric and epistemic uncertainties that is often used in risk analysis or quantitative uncertainty modeling where numerical calculations must be performed. Probability bounds analysis is used to make arithmetic and logical calculations with p-boxes.

An example p-box is shown in the figure at right for an uncertain number x consisting of a left (upper) bound and a right (lower) bound on the probability distribution for x. The bounds are coincident for values of x below 0 and above 24. The bounds may have almost any shape, including step functions, so long as they are monotonically non-decreasing and do not cross each other. A p-box is used to express simultaneously incertitude (epistemic uncertainty), which is represented by the breadth between the left and right edges of the p-box, and variability (aleatory uncertainty), which is represented by the overall slant of the p-box.

Interpretation

There are dual interpretations of a p-box. It can be understood as bounds on the cumulative probability associated with any x-value. For instance, in the p-box depicted at right, the probability that the value will be 2.5 or less is between 4% and 36%. A p-box can also be understood as bounds on the x-value at any particular probability level. In the example, the 95th percentile is sure to be between 9 and 16.
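
The two readings can be illustrated with a minimal Python sketch. The p-box used here is hypothetical and is not the one in the figure: its left edge is assumed to be the distribution function of normal(5, 1) and its right edge that of normal(7, 1). Reading the p-box vertically at a value x gives bounds on the cumulative probability, and reading it horizontally at a probability level p gives bounds on the corresponding quantile.

```python
from statistics import NormalDist

# Hypothetical p-box (an assumption for illustration only):
# the left (upper) edge is the CDF of normal(5, 1) and
# the right (lower) edge is the CDF of normal(7, 1).
left_edge = NormalDist(mu=5.0, sigma=1.0)
right_edge = NormalDist(mu=7.0, sigma=1.0)

def probability_bounds(x):
    """Bounds on P(X <= x): read the p-box vertically at x."""
    return right_edge.cdf(x), left_edge.cdf(x)

def quantile_bounds(p):
    """Bounds on the p-th quantile: read the p-box horizontally at p."""
    return left_edge.inv_cdf(p), right_edge.inv_cdf(p)

p_lo, p_hi = probability_bounds(6.0)   # bounds on P(X <= 6)
q_lo, q_hi = quantile_bounds(0.95)     # bounds on the 95th percentile
print(f"P(X <= 6) is in [{p_lo:.3f}, {p_hi:.3f}]")
print(f"95th percentile is in [{q_lo:.3f}, {q_hi:.3f}]")
```

Note that the lower probability bound comes from the right edge and the lower quantile bound comes from the left edge, which is why the two functions use the edges in opposite order.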

If the left and right bounds of a p-box are sure to enclose the unknown distribution, the bounds are said to be rigorous, or absolute. The bounds may also be the tightest possible such bounds on the distribution function given the available information about it, in which case they are said to be best-possible. It may commonly be the case, however, that not every distribution that lies within these bounds is a possible distribution for the uncertain number, even when the bounds are rigorous and best-possible.

Mathematical definition

P-boxes are specified by left and right bounds on the cumulative probability distribution function (or, equivalently, the survival function) of a quantity and, optionally, additional information about the quantity’s mean, variance and distributional shape (family, unimodality, symmetry, etc.). A p-box represents a class of probability distributions consistent with these constraints.

Let $\mathbb{D}$ denote the space of distribution functions on the real numbers $\mathbb{R}$, i.e., $\mathbb{D} = \{D \mid D : \mathbb{R} \to [0,1],\ D(x) \le D(y) \text{ whenever } x < y, \text{ for all } x, y \in \mathbb{R}\}$, and let $\mathbb{I}$ denote the set of real intervals, i.e., $\mathbb{I} = \{i \mid i = [i_1, i_2],\ i_1 \le i_2,\ i_1, i_2 \in \mathbb{R}\}$. Then a p-box is a quintuple $\{\overline{F}, \underline{F}, m, v, \mathcal{F}\}$, where $\overline{F}, \underline{F} \in \mathbb{D}$, while $m, v \in \mathbb{I}$, and $\mathcal{F} \subseteq \mathbb{D}$. This quintuple denotes the set of distribution functions $F \in \mathbb{D}$ matching the following constraints:
$\underline{F}(x) \le F(x) \le \overline{F}(x)$,
$\int_{-\infty}^{\infty} x \, dF(x) \in m$,
$\int_{-\infty}^{\infty} x^2 \, dF(x) - \left(\int_{-\infty}^{\infty} x \, dF(x)\right)^2 \in v$, and
$F \in \mathcal{F}$.


Thus, the constraints are that the distribution function F falls within prescribed bounds, the mean of the distribution (given by the Riemann-Stieltjes integral) is in the interval m, the variance of the distribution is in the interval v, and the distribution is within some admissible class of distributions $\mathcal{F}$. The Riemann-Stieltjes integrals do not depend on the differentiability of F.

P-boxes serve the same role for random variables that upper and lower probabilities serve for events. In robust Bayes analysis a p-box is also known as a distribution band. A p-box can be constructed as a closed neighborhood of a distribution $F \in \mathbb{D}$ under the Kolmogorov, Lévy or Wasserstein metric. A p-box is a crude but computationally convenient kind of credal set. Whereas a credal set is defined solely in terms of the constraint $\mathcal{F}$ as a convex set of distributions (which automatically determine $\overline{F}$, $\underline{F}$, m, and v, but are often very difficult to compute with), a p-box usually has a loosely constraining specification of $\mathcal{F}$, or even no constraint so that $\mathcal{F} = \mathbb{D}$. Calculations with p-boxes, unlike credal sets, are often quite efficient, and algorithms for all standard mathematical functions are known.

A p-box is minimally specified by its left and right bounds, in which case the other constraints are understood to be vacuous, as in $\{\overline{F}, \underline{F}, [-\infty, \infty], [0, \infty], \mathbb{D}\}$. Even when these ancillary constraints are vacuous, there may still be nontrivial bounds on the mean and variance that can be inferred from the left and right edges of the p-box.
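
As a rough illustration of inferring moment bounds from the edges alone, the Python sketch below computes bounds on the mean. It assumes a hypothetical discretized p-box in which each edge is a step function with n equally weighted jump points; the point values are invented for the example. Because the left edge is itself the distribution function of the stochastically smallest member of the p-box and the right edge that of the largest, their means bracket the mean of every enclosed distribution, provided those means exist.

```python
from statistics import fmean

# Hypothetical discretized p-box: each edge is a step function that
# jumps by 1/n at each of n equally weighted points (values are made up).
left_edge_points = [0.0, 1.0, 2.0, 3.0]   # left (upper) bound on the CDF
right_edge_points = [2.0, 3.0, 4.0, 6.0]  # right (lower) bound on the CDF

def mean_bounds(left_points, right_points):
    """Interval enclosing the mean of every distribution in the p-box."""
    # The mean of an equally weighted step distribution is the plain average.
    return fmean(left_points), fmean(right_points)

mean_lo, mean_hi = mean_bounds(left_edge_points, right_edge_points)
print(f"mean is in [{mean_lo}, {mean_hi}]")
```

Bounds on the variance can also be inferred from the edges, but they require an optimization over the distributions inside the p-box and are not attempted in this sketch.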

Where p-boxes come from

P-boxes may arise from a variety of kinds of incomplete information about a quantity, and there are several ways to obtain p-boxes from data and analytical judgment.

Distributional p-boxes

When a probability distribution is known to have a particular shape (e.g., normal, uniform, beta, Weibull, etc.) but its parameters can only be specified imprecisely as intervals, the result is called a distributional p-box, or sometimes a parametric p-box. Such a p-box is usually easy to obtain by enveloping extreme distributions given the possible parameters. For instance, if a quantity is known to be normal with mean somewhere in the interval [7,8] and standard deviation within the interval [1,2], the left and right edges of the p-box can be found by enveloping the distribution functions of four probability distributions, namely, normal(7,1), normal(8,1), normal(7,2), and normal(8,2), where normal(μ,σ) represents a normal distribution with mean μ and standard deviation σ. All probability distributions that are normal and have means and standard deviations inside these respective intervals will have distribution functions that fall entirely within this p-box. The left and right bounds enclose many non-normal distributions, but these would be excluded from the p-box by specifying normality as the distribution family.
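
The enveloping step for the normal example above can be sketched in Python as follows. This is a minimal illustration using only the standard library; the function names left_edge and right_edge are chosen for this example. At each x the left edge is the largest of the four corner distribution functions and the right edge is the smallest.

```python
from itertools import product
from statistics import NormalDist

mean_interval = (7.0, 8.0)  # interval for the mean
sd_interval = (1.0, 2.0)    # interval for the standard deviation

# The four extreme distributions: normal(7,1), normal(7,2), normal(8,1), normal(8,2).
corners = [NormalDist(mu, sigma) for mu, sigma in product(mean_interval, sd_interval)]

def left_edge(x):
    """Upper bound on P(X <= x): pointwise maximum of the corner CDFs."""
    return max(d.cdf(x) for d in corners)

def right_edge(x):
    """Lower bound on P(X <= x): pointwise minimum of the corner CDFs."""
    return min(d.cdf(x) for d in corners)

# A normal distribution with parameters inside both intervals stays within the edges.
member = NormalDist(7.3, 1.5)
for x in range(0, 16):
    assert right_edge(x) <= member.cdf(x) <= left_edge(x)

print(f"P(X <= 7.5) is in [{right_edge(7.5):.4f}, {left_edge(7.5):.4f}]")
```

Which corner distribution forms the edge changes with x: below the means the wider distributions give the larger cumulative probabilities, while above them the narrower ones do, which is why all four corners are needed.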

Distribution-free p-boxes

Even if the parameters such as mean and variance of a distribution are known precisely, the distribution cannot be specified precisely if the distribution family is unknown. In such situations, envelopes of all distributions matching given moments can be constructed from inequalities such as those due to Markov, Chebyshev, Cantelli, or Rowe that enclose all distribution functions having specified parameters. These define distribution-free p-boxes because they make no assumption whatever about the family or shape of the uncertain distribution. When qualitative information is available, such as that the distribution is unimodal, the p-boxes can often be tightened substantially.
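
One way such an envelope might be built is from Cantelli's (one-sided Chebyshev) inequality, as in the Python sketch below. It assumes only that the mean and standard deviation are known exactly; the helper name cantelli_pbox and the numerical values are hypothetical. For x below the mean, the inequality gives P(X ≤ x) ≤ σ²/(σ² + (μ − x)²), and for x above the mean it gives P(X ≤ x) ≥ (x − μ)²/(σ² + (x − μ)²).

```python
def cantelli_pbox(mean, sd):
    """Return (left_edge, right_edge) CDF bounds from Cantelli's inequality.

    Assumes sd > 0 and that nothing is known beyond the mean and sd.
    """
    var = sd * sd

    def left_edge(x):
        # Upper bound on P(X <= x); informative only below the mean.
        if x >= mean:
            return 1.0
        return var / (var + (mean - x) ** 2)

    def right_edge(x):
        # Lower bound on P(X <= x); informative only above the mean.
        if x <= mean:
            return 0.0
        return (x - mean) ** 2 / (var + (x - mean) ** 2)

    return left_edge, right_edge

# Hypothetical quantity with mean 10 and standard deviation 2.
left_edge, right_edge = cantelli_pbox(mean=10.0, sd=2.0)
for x in (6.0, 8.0, 10.0, 12.0, 14.0):
    print(f"x={x}: P(X <= x) is in [{right_edge(x):.3f}, {left_edge(x):.3f}]")
```

The resulting p-box is wide near the mean, where the bounds are vacuous, and narrows only in the tails, which reflects how little the first two moments alone say about the distribution.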

P-boxes from imprecise measurements

When all members of a population can be measured, or when random sample data are abundant, analysts often use an empirical distribution to summarize the values. When those data have non-negligible measurement uncertainty represented by interval ranges about each sample value, an empirical distribution may be generalized to a p-box. Such a p-box can be specified by cumulating the lower endpoints of all the interval measurements into a cumulative distribution forming the left edge of the p-box, and cumulating the upper endpoints to form the right edge. The broader the measurement uncertainty, the wider the resulting p-box.
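
The cumulation of endpoints described above can be sketched in a few lines of Python. The interval measurements below are invented for illustration, and the helper ecdf is a name chosen for this example.

```python
from bisect import bisect_right

def ecdf(points):
    """Empirical distribution function of a collection of numbers."""
    pts = sorted(points)
    n = len(pts)
    # Fraction of the points that are less than or equal to x.
    return lambda x: bisect_right(pts, x) / n

# Hypothetical interval measurements (lower endpoint, upper endpoint).
measurements = [(1.0, 1.5), (2.0, 2.2), (2.1, 3.0), (3.5, 4.0)]

left_edge = ecdf(lo for lo, _ in measurements)   # cumulated lower endpoints
right_edge = ecdf(hi for _, hi in measurements)  # cumulated upper endpoints

x = 2.1
print(f"P(X <= {x}) is in [{right_edge(x)}, {left_edge(x)}]")
```

If every interval collapses to a single point, the two edges coincide and the p-box reduces to the ordinary empirical distribution function.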

Interval measurements can also be used to generalize distributional estimates based on the method of matching moments or maximum likelihood, which make shape assumptions such as normality or lognormality. Although the measurement uncertainty can be treated rigorously, the resulting distributional p-box generally will not be rigorous when it is a sample estimate based on only a subsample of the possible values. But, because these calculations take account of the dependence between the parameters of the distribution, they will often yield tighter p-boxes than could be obtained by treating the interval estimates of the parameters as unrelated, as is done for distributional p-boxes.

Confidence bands

There may be uncertainty about the shape of a probability distribution because the sample size of the empirical data characterizing it is small. Several methods in traditional statistics have been proposed to account for this sampling uncertainty about the distribution shape, including Kolmogorov-Smirnov and similar confidence bands, which are distribution-free in the sense that they make no assumption about the shape of the underlying distribution. There are related confidence-band methods that do make assumptions about the shape or family of the underlying distribution, which can often result in tighter confidence bands. Constructing confidence bands requires one to select the probability defining the confidence level, which usually must be less than 100% for the result to be non-vacuous. Confidence bands at the 100(1−α)% confidence level are defined such that, 100(1−α)% of the time they are constructed, they will completely enclose the distribution from which the data were randomly sampled. A confidence band about a distribution function is sometimes used as a p-box even though it represents statistical rather than rigorous or sure bounds. This use implicitly assumes that the true distribution, whatever it is, is inside the p-box.
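
A distribution-free band of this kind can be sketched in Python as below. Note that the sketch does not use exact Kolmogorov-Smirnov critical values; it substitutes the closely related Dvoretzky-Kiefer-Wolfowitz inequality, which gives the closed-form half-width ε = sqrt(ln(2/α) / (2n)) for a band at the 100(1−α)% level. The sample values and the function name dkw_band are hypothetical.

```python
import math
from bisect import bisect_right

def dkw_band(sample, alpha=0.05):
    """Distribution-free confidence band around the empirical CDF.

    Uses the Dvoretzky-Kiefer-Wolfowitz inequality (an assumption of this
    sketch, standing in for exact Kolmogorov-Smirnov critical values).
    """
    pts = sorted(sample)
    n = len(pts)
    eps = math.sqrt(math.log(2.0 / alpha) / (2.0 * n))  # band half-width

    def ecdf(x):
        return bisect_right(pts, x) / n

    def left_edge(x):
        # Upper bound on the CDF, truncated at 1.
        return min(1.0, ecdf(x) + eps)

    def right_edge(x):
        # Lower bound on the CDF, truncated at 0.
        return max(0.0, ecdf(x) - eps)

    return left_edge, right_edge, eps

# Hypothetical random sample of size 10.
sample = [2.3, 3.1, 3.8, 4.0, 4.4, 5.2, 5.9, 6.1, 7.5, 8.2]
left_edge, right_edge, eps = dkw_band(sample, alpha=0.05)
print(f"half-width = {eps:.3f}")
print(f"P(X <= 5) is in [{right_edge(5.0):.3f}, {left_edge(5.0):.3f}]")
```

With only ten observations the band is very wide, which illustrates how small sample sizes translate into broad p-boxes. The half-width shrinks in proportion to the inverse square root of n.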

An analogous Bayesian structure is called a Bayesian p-box, which encloses all distributions having parameters within a subset of parameter space corresponding to some specified probability level from a Bayesian analysis of the data. This subset is the credible region for the parameters given the data, which could be defined as the highest posterior probability density region, or the lowest posterior loss region, or in some other suitable way. To construct a Bayesian p-box one must select a prior distribution, in addition to specifying the credibility level (analogous to a confidence level).

P-boxes from calculation results

P-boxes can arise from computations involving probability distributions, or involving both a probability distribution and an interval, or involving other p-boxes. For example, the sum of a quantity represented by a probability distribution and a quantity represented by an interval will generally be characterized by a p-box. The sum of two random variables characterized by well-specified probability distributions is another precise probability distribution typically only when the copula (dependence function) between the two summands is completely specified. When their dependence is unknown or only partially specified, the sum will be more appropriately represented by a p-box because different dependence relations lead to many different distributions for the sum. Kolmogorov originally asked what bounds could be placed about the distribution of a sum when nothing is known about the dependence between the distributions of the addends. The question was only answered in the early 1980s. Since that time, formulas and algorithms for sums have been generalized and extended to differences, products, quotients and other binary and unary functions under various dependence assumptions.
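
The simplest of these cases, the sum of a precise distribution and an interval, can be sketched in Python as follows. The inputs are hypothetical: X is assumed to follow normal(0, 1) and Y is known only to lie in [1, 3]. Since X + 1 ≤ X + Y ≤ X + 3 whatever value Y takes, the distribution function of the sum at z is bounded above by F(z − 1) and below by F(z − 3), where F is the distribution function of X.

```python
from statistics import NormalDist

x_dist = NormalDist(mu=0.0, sigma=1.0)  # hypothetical precise distribution for X
y_lo, y_hi = 1.0, 3.0                   # hypothetical interval enclosing Y

def sum_left_edge(z):
    """Upper bound on P(X + Y <= z), attained when Y sits at its lower endpoint."""
    return x_dist.cdf(z - y_lo)

def sum_right_edge(z):
    """Lower bound on P(X + Y <= z), attained when Y sits at its upper endpoint."""
    return x_dist.cdf(z - y_hi)

z = 2.0
print(f"P(X + Y <= {z}) is in [{sum_right_edge(z):.4f}, {sum_left_edge(z):.4f}]")
```

The result is the distribution of X shifted by each endpoint of the interval, so the width of the output p-box equals the width of the interval. Sums of two distributions under unknown dependence need the more involved bounds mentioned above and are not covered by this sketch.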

These methods, collectively called probability bounds analysis, provide algorithms to evaluate mathematical expressions when there is uncertainty about the input values, their dependencies, or even the form of mathematical expression itself. The calculations yield results that are guaranteed to enclose all possible distributions of the output variable if the input p-boxes were also sure to enclose their respective distributions. In some cases, a calculated p-box will also be best-possible in the sense that only possible distributions are within the p-box, but this is not always guaranteed.
For instance, the set of probability distributions that could result from adding random values without the independence assumption from two (precise) distributions is generally a proper subset of all the distributions admitted by the computed p-box. That is, there are distributions within the output p-box that could not arise under any dependence between the two input distributions. The output p-box will, however, always contain all distributions that are possible, so long as the input p-boxes were sure to enclose their respective underlying distributions. This property often suffices for use in risk analysis.

Special cases

Precise probability distributions and intervals are special cases of p-boxes, as are real values and integers. Because a probability distribution expresses variability and lacks incertitude, the left and right bounds of its p-box are coincident for all x-values at the value of the cumulative distribution function (which is a non-decreasing function from zero to one). Mathematically, a probability distribution F is the degenerate p-box $\{F, F, E(F), V(F), F\}$, where E and V denote the expectation and variance operators. An interval expresses only incertitude. Its p-box looks like a rectangular box whose upper and lower bounds jump from zero to one at the endpoints of the interval. Mathematically, an interval [a, b] corresponds to the degenerate p-box $\{H(a), H(b), [a, b], [0, (b-a)^2/4], \mathbb{D}\}$, where H(t) denotes the Heaviside step function with its jump located at t. A precise scalar number c lacks both kinds of uncertainty. Its p-box is just a step function from 0 to 1 at the value c; mathematically this is $\{H(c), H(c), c, 0, H(c)\}$.

Applications

P-boxes and probability bounds analysis have been used in many applications spanning many disciplines in engineering and environmental science, including
  • ODE
    Ordinary differential equation
    In mathematics, an ordinary differential equation is a relation that contains functions of only one independent variable, and one or more of their derivatives with respect to that variable....

     models of chemical reactor
    Chemical reactor
    In chemical engineering, chemical reactors are vessels designed to contain chemical reactions. The design of a chemical reactor deals with multiple aspects of chemical engineering. Chemical engineers design reactors to maximize net present value for the given reaction...

     dynamics
  • Engineering design
  • Analysis of species sensitivity distributions
  • Sensitivity analysis
    Sensitivity analysis
    Sensitivity analysis is the study of how the variation in the output of a statistical model can be attributed to different variations in the inputs of the model. Put another way, it is a technique for systematically changing variables in a model to determine the effects of such changes.In any...

     in aerospace engineering
    Aerospace engineering
    Aerospace engineering is the primary branch of engineering concerned with the design, construction and science of aircraft and spacecraft. It is divided into two major and overlapping branches: aeronautical engineering and astronautical engineering...

     of the buckling load of the frontskirt of the Ariane 5
    Ariane 5
    Ariane 5 is, as a part of Ariane rocket family, an expendable launch system used to deliver payloads into geostationary transfer orbit or low Earth orbit . Ariane 5 rockets are manufactured under the authority of the European Space Agency and the Centre National d'Etudes Spatiales...

     launcher
  • Pharmacokinetic
    Pharmacokinetics
    Pharmacokinetics, sometimes abbreviated as PK, is a branch of pharmacology dedicated to the determination of the fate of substances administered externally to a living organism...

     variability of inhaled VOC
    Volatile organic compound
    Volatile organic compounds are organic chemicals that have a high vapor pressure at ordinary, room-temperature conditions. Their high vapor pressure results from a low boiling point, which causes large numbers of molecules to evaporate or sublimate from the liquid or solid form of the compound and...

    s
  • Groundwater modeling
    Groundwater model
    Groundwater models are computer models of groundwater flow systems, and are used by hydrogeologists. Groundwater models are used to simulate and predict aquifer conditions.-Characteristics:...

  • Bounding failure
    Failure analysis
    Failure analysis is the process of collecting and analyzing data to determine the cause of a failure. It is an important discipline in many branches of manufacturing industry, such as the electronics industry, where it is a vital tool used in the development of new products and for the improvement...

     probability for series systems
  • Lead
    Lead
    Lead is a main-group element in the carbon group with the symbol Pb and atomic number 82. Lead is a soft, malleable poor metal. It is also counted as one of the heavy metals. Metallic lead has a bluish-white color after being freshly cut, but it soon tarnishes to a dull grayish color when exposed...

     contamination in soil
    Soil contamination
    Soil contamination or soil pollution is caused by the presence of xenobiotic chemicals or other alteration in the natural soil environment....

     at an ironworks
    Ironworks
    An ironworks or iron works is a building or site where iron is smelted and where heavy iron and/or steel products are made. The term is both singular and plural, i.e...

     brownfield
    Brownfield land
    Brownfield sites are abandoned or underused industrial and commercial facilities available for re-use. Expansion or redevelopment of such a facility may be complicated by real or perceived environmental contaminations. Cf. Waste...

  • Uncertainty propagation for salinity
    Salinity in Australia
    Soil salinity and dryland salinity are two problems degrading the environment of Australia. Salinity is a concern in most states, but especially in the south-west of Western Australia....

     risk models
  • Power supply system safety assessment
  • Contaminated land risk assessment
  • Engineered systems for drinking water treatment
    Water treatment
    Water treatment describes those processes used to make water more acceptable for a desired end-use. These can include use as drinking water, industrial processes, medical and many other uses. The goal of all water treatment process is to remove existing contaminants in the water, or reduce the...

  • Computing soil screening levels
  • Human health and ecological risk analysis by the United States Environmental Protection Agency
    United States Environmental Protection Agency
    The U.S. Environmental Protection Agency is an agency of the federal government of the United States charged with protecting human health and the environment, by writing and enforcing regulations based on laws passed by Congress...

     of PCB
    Polychlorinated biphenyl
    Polychlorinated biphenyls are a class of organic compounds with 2 to 10 chlorine atoms attached to biphenyl, which is a molecule composed of two benzene rings. The chemical formula for PCBs is C12H10-xClx...

     contamination at the Housatonic River
    Housatonic River
    The Housatonic River is a river, approximately long, in western Massachusetts and western Connecticut in the United States. It flows south to southeast, and drains about of southwestern New England into Long Island Sound...

     Superfund
    Superfund
    Superfund is the common name for the Comprehensive Environmental Response, Compensation, and Liability Act of 1980 , a United States federal law designed to clean up sites contaminated with hazardous substances...

     site
  • Environmental assessment
    Environmental impact assessment
    An environmental impact assessment is an assessment of the possible positive or negative impact that a proposed project may have on the environment, together consisting of the natural, social and economic aspects....

     for the Calcasieu Estuary
    Calcasieu River
    The Calcasieu River is a river on the Gulf Coast of southwestern Louisiana, U.S.A.. Approximately long, it drains a largely rural area of forests and bayou country, meandering southward to the Gulf of Mexico. The name "Calcasieu" comes from the Native American Atakapa language katkosh, for...

     Superfund site
  • Verification and validation
    Verification and Validation
    In software project management, software testing, and software engineering, verification and validation is the process of checking that a software system meets specifications and that it fulfills its intended purpose...

     in scientific computation for engineering problems
  • Toxicity to small mammals of environmental mercury
    Mercury (element)
    Mercury is a chemical element with the symbol Hg and atomic number 80. It is also known as quicksilver or hydrargyrum...

     contamination
  • Modeling travel time of pollution in groundwater
    Groundwater
    Groundwater is water located beneath the ground surface in soil pore spaces and in the fractures of rock formations. A unit of rock or an unconsolidated deposit is called an aquifer when it can yield a usable quantity of water. The depth at which soil pore spaces or fractures and voids in rock...

  • Endangered species
    Endangered species
    An endangered species is a population of organisms which is at risk of becoming extinct because it is either few in numbers, or threatened by changing environmental or predation parameters...

     assessment for reintroduction of Leadbeater’s Possum
  • Exposure of insectivorous
    Insectivore
    An insectivore is a type of carnivore with a diet that consists chiefly of insects and similar small creatures. An alternate term is entomophage, which also refers to the human practice of eating insects....

     birds to an agricultural pesticide
    Pesticide
    Pesticides are substances or mixture of substances intended for preventing, destroying, repelling or mitigating any pest.A pesticide may be a chemical unicycle, biological agent , antimicrobial, disinfectant or device used against any pest...

  • Climate change
    Climate change
    Climate change is a significant and lasting change in the statistical distribution of weather patterns over periods ranging from decades to millions of years. It may be a change in average weather conditions or the distribution of events around that average...

     projections
  • Waiting time in queuing systems
    Queueing theory
    Queueing theory is the mathematical study of waiting lines, or queues. The theory enables mathematical analysis of several related processes, including arriving at the queue, waiting in the queue , and being served at the front of the queue...

  • Extinction
    Extinction
    In biology and ecology, extinction is the end of an organism or of a group of organisms , normally a species. The moment of extinction is generally considered to be the death of the last individual of the species, although the capacity to breed and recover may have been lost before this point...

     risk analysis for spotted owl
    Northern Spotted Owl
    The Northern Spotted Owl, Strix occidentalis caurina, is one of three Spotted Owl subspecies. A Western North American bird in the family Strigidae, genus Strix, it is a medium-sized dark brown owl sixteen to nineteen inches in length and one to one and one sixth pounds. Females are larger than males...

     on the Olympic Peninsula
    Olympic Peninsula
    The Olympic Peninsula is the large arm of land in western Washington state of the USA, that lies across Puget Sound from Seattle. It is bounded on the west by the Pacific Ocean, the north by the Strait of Juan de Fuca, and the east by Puget Sound. Cape Alava, the westernmost point in the contiguous...

  • Biosecurity against introduction of invasive species or agricultural pests
  • Finite-element structural analysis

Criticisms

No internal structure. Because a p-box retains little information about any internal structure within the bounds, it does not elucidate which distributions within the p-box are most likely, nor whether the edges represent very unlikely or distinctly likely scenarios. This could complicate decisions in some cases if an edge of a p-box encloses a decision threshold.
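
As a rough illustration of this criticism, the sketch below builds a hypothetical p-box (its edges are assumed to be the CDFs of Uniform(0, 1) and Uniform(1, 2), chosen only for convenience) and checks that a widely spread distribution and a single point mass both fit inside it. The helper names are illustrative and not part of any p-box library. The p-box itself gives no indication of which of the two is more plausible.

```python
# Hypothetical p-box: left (upper) edge is the CDF of Uniform(0, 1),
# right (lower) edge is the CDF of Uniform(1, 2). Both are assumptions
# made for this illustration only.

def uniform_cdf(x, a, b):
    """CDF of a uniform distribution on [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

def upper_bound(x):
    return uniform_cdf(x, 0.0, 1.0)

def lower_bound(x):
    return uniform_cdf(x, 1.0, 2.0)

def point_mass_cdf(x, at):
    """CDF of a distribution that puts all its mass on one value."""
    return 1.0 if x >= at else 0.0

def inside_pbox(cdf, grid):
    """Check lower_bound <= cdf <= upper_bound at every grid point.
    This is a grid check, not a proof for all real x."""
    return all(lower_bound(x) <= cdf(x) <= upper_bound(x) for x in grid)

grid = [i / 100 for i in range(-50, 251)]  # -0.5 to 2.5 in steps of 0.01

def spread_out(x):
    return uniform_cdf(x, 0.0, 2.0)        # mass spread over [0, 2]

def concentrated(x):
    return point_mass_cdf(x, 1.0)          # all mass at x = 1

print(inside_pbox(spread_out, grid))       # True
print(inside_pbox(concentrated, grid))     # True
```

Both candidate distributions pass the same check, so the p-box alone cannot say whether the quantity varies over the whole range or is a single fixed value.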

Loses information. To achieve computational efficiency, p-boxes lose information compared to more complex Dempster-Shafer structures or credal sets. In particular, p-boxes lose information about the mode (most probable value) of a quantity. This information could be useful to keep, especially in situations where the quantity is an unknown but fixed value.
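
The loss relative to Dempster-Shafer structures can be sketched with a small example. It assumes the usual construction in which the upper CDF bound at x sums the masses of focal intervals whose left endpoint is at most x, and the lower bound sums those whose right endpoint is at most x. The two structures and the function names below are hypothetical, chosen so that the resulting p-boxes coincide even though the structures differ.

```python
from fractions import Fraction

def pbox_from_ds(structure, xs):
    """Evaluate the (upper, lower) CDF bounds induced by a Dempster-Shafer
    structure, given as a list of ((lo, hi), mass) pairs, at the points xs."""
    upper = [sum(m for (lo, hi), m in structure if lo <= x) for x in xs]
    lower = [sum(m for (lo, hi), m in structure if hi <= x) for x in xs]
    return upper, lower

def belief(structure, a, b):
    """Belief in [a, b]: total mass of focal intervals contained in [a, b]."""
    return sum(m for (lo, hi), m in structure if a <= lo and hi <= b)

half = Fraction(1, 2)
nested = [((0, 3), half), ((1, 2), half)]      # one interval inside the other
staggered = [((0, 2), half), ((1, 3), half)]   # two overlapping intervals

xs = [Fraction(i, 4) for i in range(-4, 17)]   # -1 to 4 in steps of 1/4

# Same left endpoints {0, 1} and same right endpoints {2, 3}: same p-box.
print(pbox_from_ds(nested, xs) == pbox_from_ds(staggered, xs))  # True

# Yet the structures disagree about the event [1, 2].
print(belief(nested, 1, 2))     # 1/2
print(belief(staggered, 1, 2))  # 0
```

Once only the two bounding CDFs are kept, the difference in belief about [1, 2] can no longer be recovered.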

Traditional probability sufficient. Some critics of p-boxes argue that precisely specified probability distributions are sufficient to characterize uncertainty of all kinds. For instance, Lindley has asserted, "Whatever way uncertainty is approached, probability is the only sound way to think about it." These critics argue that it is meaningless to talk about ‘uncertainty about probability’ and that traditional probability is a complete theory that is sufficient to characterize all forms of uncertainty. Under this criticism, users of p-boxes have simply not made the requisite effort to identify the appropriate precisely specified distribution functions.

Possibility theory can do better. Some critics contend that it makes sense in some cases to work with a possibility distribution rather than working separately with the left and right edges of p-boxes. They argue that the set of probability distributions induced by a possibility distribution is a subset of those enclosed by an analogous p-box's edges. Others make a counterargument that one cannot do better with a possibility distribution than with a p-box.
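
The subset argument can be illustrated with a minimal sketch. It assumes a hypothetical triangular possibility distribution on [0, 2] with its peak at 1, the standard reading that a probability measure P is compatible with it when P(A) does not exceed the possibility of A for every event A, and an analogous p-box whose edges are the running suprema of the possibility distribution. The two-point distribution below is an invented example that fits inside the p-box yet is not compatible with the possibility distribution.

```python
def possibility(x):
    """Hypothetical triangular possibility distribution: 0 at 0 and 2, 1 at 1."""
    return max(0.0, 1.0 - abs(x - 1.0))

def pbox_upper(x):
    # Supremum of the possibility distribution over (-inf, x].
    return possibility(x) if x < 1.0 else 1.0

def pbox_lower(x):
    # One minus the supremum of the possibility distribution over (x, inf).
    return 0.0 if x < 1.0 else 1.0 - possibility(x)

# Candidate distribution: half the mass at 0.5, half at 1.5 (an assumption).
atoms = {0.5: 0.5, 1.5: 0.5}

def cdf(x):
    return sum(mass for point, mass in atoms.items() if point <= x)

# Grid check (not a proof) that the candidate CDF stays within the p-box.
grid = [i / 100 for i in range(-50, 251)]
in_pbox = all(pbox_lower(x) <= cdf(x) <= pbox_upper(x) for x in grid)

# Event A = {0.5, 1.5}: its probability is 1, its possibility only 0.5.
prob_A = sum(atoms.values())
poss_A = max(possibility(point) for point in atoms)

print(in_pbox)           # True
print(prob_A <= poss_A)  # False: not compatible with the possibility distribution
```

The sketch only illustrates why the induced set can be strictly smaller than the p-box. It does not address the counterargument mentioned above.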

See also

  • uncertain number
  • interval
  • cumulative probability distribution
  • upper and lower probabilities
  • credal set
  • risk analysis
  • uncertainty propagation
  • probability bounds analysis
  • Dempster–Shafer theory and the section on Dempster-Shafer structure
  • imprecise probability
  • simultaneous confidence bands on distribution and survival functions using likelihood ratios
  • pointwise binomial confidence intervals for F(X) for a given X
  • uncertainty propagation software

Additional references

Baudrit, C., and D. Dubois (2006). Practical representations of incomplete probabilistic knowledge. Computational Statistics & Data Analysis 51: 86-108.

Baudrit, C., D. Dubois, D. Guyonnet (2006). Joint propagation and exploitation of probabilistic and possibilistic information in risk assessment. IEEE Transactions on Fuzzy Systems 14: 593-608.

Bernardini, A., and F. Tonon (2009). Extreme probability distributions of random/fuzzy sets and p-boxes. International Journal of Reliability and Safety 3: 57-78.

Destercke, S., D. Dubois and E. Chojnacki (2008). Unifying practical uncertainty representations – I: Generalized p-boxes. International Journal of Approximate Reasoning 49: 649-663.

Dubois, D. (2010). (Commentary) Representation, propagation, and decision issues in risk analysis under incomplete probabilistic information. Risk Analysis 30: 361-368. DOI: 10.1111/j.1539-6924.2010.01359.x.

Dubois, D., and D. Guyonnet (2011). Risk-informed decision-making in the presence of epistemic uncertainty. International Journal of General Systems 40: 145-167.

Guyonnet, D., F. Blanchard, C. Harpet, Y. Ménard, B. Côme and C. Baudrit (2005). Projet IREA—Traitement des incertitudes en évaluation des risques d'exposition. Rapport BRGM/RP-54099-FR, Bureau de Recherches Géologiques et Minières, France.
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.