Multiset
Encyclopedia
In mathematics
Mathematics
Mathematics is the study of quantity, space, structure, and change. Mathematicians seek out patterns and formulate new conjectures. Mathematicians resolve the truth or falsity of conjectures by mathematical proofs, which are arguments sufficient to convince other mathematicians of their validity...

, the notion of multiset (or bag) is a generalization of the notion of set in which members are allowed to appear more than once. For example, there is a unique set that contains the elements a and b and no others, but there are many multisets with this property, such as the multiset that contains two copies of a and one of b or the multiset that contains three copies of both a and b. The term "multiset" was coined by Nicolaas Govert de Bruijn
Nicolaas Govert de Bruijn
Nicolaas Govert de Bruijn is a Dutch mathematician, affiliated as professor emeritus with the Eindhoven University of Technology. He received his Ph.D. in 1943 from Vrije Universiteit Amsterdam....

 in the 1970s.
The use of multisets in mathematics predates the name "multiset" by nearly 90 years: Richard Dedekind
Richard Dedekind
Julius Wilhelm Richard Dedekind was a German mathematician who did important work in abstract algebra , algebraic number theory and the foundations of the real numbers.-Life:...

 used multisets in a paper published in 1888.

Overview

The number of times an element belongs to the multiset is the multiplicity
Multiplicity (mathematics)
In mathematics, the multiplicity of a member of a multiset is the number of times it appears in the multiset. For example, the number of times a given polynomial equation has a root at a given point....

 of that member. The total number of elements in a multiset, including repeated memberships, is the cardinality of the multiset. For example, in the multiset {a, a, b, b, b, c} the multiplicities of the members a, b, and c are respectively 2, 3, and 1, and the cardinality of the multiset is 6. To distinguish between sets and multisets, a notation that incorporates brackets is sometimes used: the multiset {2,2,3} can be represented as [2,2,3]. In multisets, as in sets and in contrast to tuples, the order of elements is irrelevant: The multisets {a, b} and {b, a} are equal.

Formal definition

Within set theory
Set theory
Set theory is the branch of mathematics that studies sets, which are collections of objects. Although any type of object can be collected into a set, set theory is applied most often to objects that are relevant to mathematics...

, a multiset may be formally defined as a 2-tuple(A, m) where A is some set and m : A → N≥1 is a function
Function (mathematics)
In mathematics, a function associates one quantity, the argument of the function, also known as the input, with another quantity, the value of the function, also known as the output. A function assigns exactly one output to each input. The argument and the value may be real numbers, but they can...

 from A to the set N≥1 = {1, 2, 3, ...} of positive natural number
Natural number
In mathematics, the natural numbers are the ordinary whole numbers used for counting and ordering . These purposes are related to the linguistic notions of cardinal and ordinal numbers, respectively...

s. The set A is called the underlying set of elements. For each a in A the multiplicity (that is, number of occurrences) of a is the number m(a). If a universe U in which the elements of A must live is specified, the definition can be simplified to just a multiplicity function mU : U → N from U to the set N = {0, 1, 2, 3, ...} of natural numbers, obtained by extending m to U with values 0 outside A. This extended multiplicity function is the multiplicity function called 1A below. Like any function, the function m may be defined as its graph: the set of ordered pairs { (am(a)) : a in A }. With these definitions the multiset written as { aab } is defined as ({ ab },{ (a, 2), (b, 1) }), and the multiset { ab } is defined as ({ ab },{ (a, 1), (b, 1) }).

The concept of a multiset is a generalization
Generalization
A generalization of a concept is an extension of the concept to less-specific criteria. It is a foundational element of logic and human reasoning. Generalizations posit the existence of a domain or set of elements, as well as one or more common characteristics shared by those elements. As such, it...

 of the concept of a set. A multiset corresponds to an ordinary set if the multiplicity of every element is one (as opposed to some larger natural number). However to replace set theory by "multiset theory" so as to have multisets directly into the foundations is not easy: a privileged role would still have to be given to (ordinary) sets when defining maps, as there is no clear notion of maps (functions) between multisets. It can be done, with the result that classical theorems such as the Cantor–Bernstein–Schroeder theorem
Cantor–Bernstein–Schroeder theorem
In set theory, the Cantor–Bernstein–Schroeder theorem, named after Georg Cantor, Felix Bernstein, and Ernst Schröder, states that, if there exist injective functions and between the sets A and B, then there exists a bijective function...

 or Cantor's theorem
Cantor's theorem
In elementary set theory, Cantor's theorem states that, for any set A, the set of all subsets of A has a strictly greater cardinality than A itself...

, when generalized to multisets, are false; they remain true only in the case of finite multisets. In addition, the notion of a set as a "class of items satisfying a certain property" – i.e. the extension of a predicate
Extension (predicate logic)
The extension of a predicatea truth-valued functionis the set of tuples of values that, used as arguments, satisfy the predicate. Such a set of tuples is a relation.For example the statement "d2 is the weekday following d1"...

 – is used throughout mathematics, and this notion lacks a sensible generalization to multisets with multiple memberships.

An indexed family
Indexed family
In mathematics, an indexed family is a collection of values that are associated with indexes. For example, a family of real numbers, indexed by the integers is a collection of real numbers, where each integer is associated with one of the real numbers....

, ( ai ), where i is in some index-set, may define a multiset, sometimes written { ai }, in which the multiplicity of any element x is the number of indices i such that ai = x. The condition for this to be possible is that no element occurs infinitely many times in the family: even in an infinite multiset, the multiplicities must be finite numbers.

Multiplicity function

The set indicator function of a normal set is generalized to the multiplicity function for multisets. The set indicator function of a subset A of a set X is the function


defined by


The set indicator function of the intersection
Intersection (set theory)
In mathematics, the intersection of two sets A and B is the set that contains all elements of A that also belong to B , but no other elements....

 of sets is the minimum function of the indicator functions


The set indicator function of the union
Union (set theory)
In set theory, the union of a collection of sets is the set of all distinct elements in the collection. The union of a collection of sets S_1, S_2, S_3, \dots , S_n\,\! gives a set S_1 \cup S_2 \cup S_3 \cup \dots \cup S_n.- Definition :...

 of sets is the maximum function of the indicator functions


The set indicator function of a subset
Subset
In mathematics, especially in set theory, a set A is a subset of a set B if A is "contained" inside B. A and B may coincide. The relationship of one set being a subset of another is called inclusion or sometimes containment...

 is smaller than or equal to that of the superset


The set indicator function of a cartesian product
Cartesian product
In mathematics, a Cartesian product is a construction to build a new set out of a number of given sets. Each member of the Cartesian product corresponds to the selection of one element each in every one of those sets...

 is the product of the indicator functions of the cartesian factors


The cardinality of a (finite) set is the sum of the indicator function values


Now generalize the concept of set indicator function by releasing the constraint that the values are 0 and 1 only and allow the values 2, 3, 4 and so on. The resulting function is called a multiplicity function and such a function defines a multiset. The concepts of intersection, union, subset, cartesian product and cardinality of multisets are defined by the above formulas.

The multiplicity function of a multiset sum, is the sum of the multiplicity functions


The multiplicity function of a multiset difference is the zero-truncated subtraction of the multiplicity functions


The scalar multiplication
Scalar multiplication
In mathematics, scalar multiplication is one of the basic operations defining a vector space in linear algebra . In an intuitive geometrical context, scalar multiplication of a real Euclidean vector by a positive real number multiplies the magnitude of the vector without changing its direction...

 of a multiset by a natural number n may be defined as:


A small finite multiset, A, is represented by a list where each element, x, occurs as many times as the multiplicity, 1A(x), indicates.

Examples

One of the simplest and most natural examples is the multiset of prime
Prime number
A prime number is a natural number greater than 1 that has no positive divisors other than 1 and itself. A natural number greater than 1 that is not a prime number is called a composite number. For example 5 is prime, as only 1 and 5 divide it, whereas 6 is composite, since it has the divisors 2...

 factors of a number n. Here the underlying set of elements is the set of prime divisor
Divisor
In mathematics, a divisor of an integer n, also called a factor of n, is an integer which divides n without leaving a remainder.-Explanation:...

s of n. For example the number 120
120 (number)
120 is the natural number following 119 and preceding 121. 120 was known as "the great hundred", especially prior to the year 1700, from the Teutonic Hundert which equalled 120. The number 100, now known commonly as "one hundred" was then known as "the small hundred". It is also known as...

 has the prime factorization


which gives the multiset {2, 2, 2, 3, 5}.

A related example is the multiset of solutions of an algebraic equation. A quadratic equation
Quadratic equation
In mathematics, a quadratic equation is a univariate polynomial equation of the second degree. A general quadratic equation can be written in the formax^2+bx+c=0,\,...

, for example, has two solutions. However, in some cases they are both the same number. Thus the multiset of solutions of the equation could be { 3, 5 }, or it could be { 4, 4 }. In the latter case it has a solution of multiplicity 2.

Free commutative monoids

The free commutative monoid
Monoid
In abstract algebra, a branch of mathematics, a monoid is an algebraic structure with a single associative binary operation and an identity element. Monoids are studied in semigroup theory as they are naturally semigroups with identity. Monoids occur in several branches of mathematics; for...

 on a set X (see free object
Free object
In mathematics, the idea of a free object is one of the basic concepts of abstract algebra. It is a part of universal algebra, in the sense that it relates to all types of algebraic structure . It also has a formulation in terms of category theory, although this is in yet more abstract terms....

) can be taken to be the set of finite multisets with elements drawn from X, with the monoid operation being multiset sum and the empty multiset as identity element. Such monoids are also known as (finite) formal sums of elements of X with natural coefficents. The free commutative semigroup is the subset of the free commutative monoid which contains all multisets with elements drawn from X except the empty multiset.

Free abelian group
Free abelian group
In abstract algebra, a free abelian group is an abelian group that has a "basis" in the sense that every element of the group can be written in one and only one way as a finite linear combination of elements of the basis, with integer coefficients. Hence, free abelian groups over a basis B are...

s are formal sums (i.e. linear combinations) of elements of X with integer coefficients. Equivalently, they may be seen as signed finite multisets with elements drawn from X.

Counting multisets

The number of multisets of cardinality k, with elements taken from a finite set of cardinality n, is called the multiset coefficient or multiset number. This number is written by some authors as , a notation that is meant to resemble that of binomial coefficient
Binomial coefficient
In mathematics, binomial coefficients are a family of positive integers that occur as coefficients in the binomial theorem. They are indexed by two nonnegative integers; the binomial coefficient indexed by n and k is usually written \tbinom nk , and it is the coefficient of the x k term in...

s; it is used for instance in (Stanley, 1997), and could be pronounced "n multichoose k" to resemble "n choose k" for . Unlike for binomial coefficients, there is no "multiset theorem" in which multiset coefficients would occur, and they should not be confused with the unrelated multinomial coefficients that occur in the multinomial theorem
Multinomial theorem
In mathematics, the multinomial theorem says how to expand a power of a sum in terms of powers of the terms in that sum. It is the generalization of the binomial theorem to polynomials.-Theorem:...

.
The value of multiset coefficients can be given explicitly as


where the last expression expresses it as binomial coefficient; many authors in fact avoid separate notation and just write binomial coefficients. So, the number of such multisets is the same as the number of subsets of cardinality k in a set of cardinality n + k − 1. The analogy with binomial coefficients can be stressed by writing the numerator in the above expression as a rising factorial power
to match the expression of binomial coefficients using a falling factorial power:

There are for example 4 multisets of cardinality 3 with elements taken from the set {1,2} of cardinality 2 (n=2, k=3), namely : {1,1,1}, {1,1,2}, {1,2,2}, {2,2,2}. And there are also 4 subsets of cardinality 3 in the set {1,2,3,4} of cardinality 4 (n+k-1 = 4), namely : {1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}.

One simple way to prove the equality of multiset coefficients and binomial coefficients given above, involves representing multisets in the following way. First, consider the notation for multisets that would represent { a, a, a, a, a, a, b, b, c, c, c, d, d, d, d, d, d, d } (6 as, 2 bs, 3 cs, 7 ds) in this form:


This is a multiset of cardinality 18 made of elements of a set of cardinality 4. The number of characters including both dots and vertical lines used in this notation is 18 + 4 − 1. The number of vertical lines is 4 − 1. The number of multisets of cardinality 18 is then the number of ways to arrange the 4 − 1 vertical lines among the 18 + 4 − 1 characters, and is thus the number of subsets of cardinality 4 − 1 in a set of cardinality 18 + 4 − 1. Equivalently, it is the number of ways to arrange the 18 dots among the 18 + 4 − 1 characters, which is the number of subsets of cardinality 18 of a set of cardinality 18 + 4 − 1. This is


so that is the value of the multiset coefficient


One may define a generalized binomial coefficient


in which n is not required to be a nonnegative integer, but may be negative or a non-integer, or a non-real complex number
Complex number
A complex number is a number consisting of a real part and an imaginary part. Complex numbers extend the idea of the one-dimensional number line to the two-dimensional complex plane by using the number line for the real part and adding a vertical axis to plot the imaginary part...

. (If k = 0, then the value of this coefficient is 1 because it is the empty product
Empty product
In mathematics, an empty product, or nullary product, is the result of multiplying no factors. It is equal to the multiplicative identity 1, given that it exists for the multiplication operation in question, just as the empty sum—the result of adding no numbers—is zero, or the additive...

.) Then the number of multisets of cardinality k in a set of cardinality n is


This fact led Gian-Carlo Rota
Gian-Carlo Rota
Gian-Carlo Rota was an Italian-born American mathematician and philosopher.-Life:Rota was born in Vigevano, Italy...

 to ask "Why are negative sets multisets?". He considered that question worthy of the attention of philosophers of mathematics
Philosophy of mathematics
The philosophy of mathematics is the branch of philosophy that studies the philosophical assumptions, foundations, and implications of mathematics. The aim of the philosophy of mathematics is to provide an account of the nature and methodology of mathematics and to understand the place of...

.

Recurrence relation

A recurrence relation
Recurrence relation
In mathematics, a recurrence relation is an equation that recursively defines a sequence, once one or more initial terms are given: each further term of the sequence is defined as a function of the preceding terms....

 for multiset coefficients may be given as


with


The above recurrence may be interpreted as follows.
Let [n] := {1, ..., n} be the source set. There is always exactly one (empty) multiset of size 0, and if n = 0 there are no larger multisets, which gives the initial conditions.

Now, consider the case in which n,k > 0. A multiset of cardinality k with elements from [n] might or might not contain any instance of the final element n. If it does appear, then by removing n once, one is left with a multiset of cardinality k − 1 of elements from [n], and every such multiset can arise, which gives a total of
possibilities.

If n does not appear, then our original multiset is equal to a multiset of cardinality k with elements from [n − 1], of which there are


Thus,

Polynomial notation

The set {x} may be represented by the monomial
Monomial
In mathematics, in the context of polynomials, the word monomial can have one of two different meanings:*The first is a product of powers of variables, or formally any value obtained by finitely many multiplications of a variable. If only a single variable x is considered, this means that any...

 x. The set of subsets, { {}, {x} } , is represented by the binomial
Binomial
In algebra, a binomial is a polynomial with two terms —the sum of two monomials—often bound by parenthesis or brackets when operated upon...

 1 + x.

The set {x,y} may be represented by the monomial x·y. The set of subsets, { {}, {x}, {y}, {x,y} } , is represented by the polynomial
Polynomial
In mathematics, a polynomial is an expression of finite length constructed from variables and constants, using only the operations of addition, subtraction, multiplication, and non-negative integer exponents...

·(1 + y) = 1 + x + y + x·y.

The multiset {x,x} may be represented by the monomial x·x = x2. The multiset of submultisets, { {}, {x}, {x}, {x,x} }, is represented by the polynomial2 = 1 + x + x + x·x = 1 + 2·x + x2.

The multiset of submultisets of the multiset xn is


That is why the binomial coefficient
Binomial coefficient
In mathematics, binomial coefficients are a family of positive integers that occur as coefficients in the binomial theorem. They are indexed by two nonnegative integers; the binomial coefficient indexed by n and k is usually written \tbinom nk , and it is the coefficient of the x k term in...

 counts the number of k-combination
Combination
In mathematics a combination is a way of selecting several things out of a larger group, where order does not matter. In smaller cases it is possible to count the number of combinations...

s of an n-set.

The multiset xK·yN−K , containing N elements, K of which are hits, is called a statistical population
Statistical population
A statistical population is a set of entities concerning which statistical inferences are to be drawn, often based on a random sample taken from the population. For example, if we were interested in generalizations about crows, then we would describe the set of crows that is of interest...

, and a submultiset is called a statistical sample. The set of samples is
K·(1 + y)N−K

which by the binomial theorem
Binomial theorem
In elementary algebra, the binomial theorem describes the algebraic expansion of powers of a binomial. According to the theorem, it is possible to expand the power n into a sum involving terms of the form axbyc, where the exponents b and c are nonnegative integers with , and the coefficient a of...

 equals


So the number of n-samples with k hits is


See hypergeometric distribution and inferential statistics for further on the distribution of hits.

The infinite set of finite multisets of elements taken from the set {x}:
{ {}, {x}, {x,x}, {x,x,x}, ... }


may be represented by the formal power series
Formal power series
In mathematics, formal power series are a generalization of polynomials as formal objects, where the number of terms is allowed to be infinite; this implies giving up the possibility to substitute arbitrary values for indeterminates...


S = 1 + x + x2 + x3 + ... = 1 + xS .


The formal solution,
S = (1 − x)−1,


makes sense as a set of multisets, but the intermediate result, 1−x, does not make sense as a set of multisets.

The infinite set of finite multisets of elements taken from the set x·y is
−1·(1 − y)−1 = 1 + (x + y) + (x2 + x·y + y2) + ...

The special case y=x : The infinite multiset of finite multisets of elements taken from the multiset x2 is
−2 =  1 + 2·x + 3·x2 + ...
The general case:
The infinite multiset of finite multisets of elements taken from the multiset xn is
 .

This explains why "multisets are negative sets". The negative binomial coefficient
Binomial coefficient
In mathematics, binomial coefficients are a family of positive integers that occur as coefficients in the binomial theorem. They are indexed by two nonnegative integers; the binomial coefficient indexed by n and k is usually written \tbinom nk , and it is the coefficient of the x k term in...

s count the number of k-multisets of elements taken from an n-set.

Cumulant generating function

A non-negative integer, n, can be represented by the monomial xn . A finite multiset of non-negative integers, say {2, 2, 2, 3, 5}, can likewise be represented by a polynomial f(x), say f(x) = 3·x2 + x3 + x5 .

It is convenient to consider the cumulant generating function g(t) = log(f(et)), say g(t) = log(3·et + et
 + et) .
  • The cardinality of the multiset is eg(0) = f(1), say 3 + 1 + 1 = 5.

  • The derivative
    Derivative
    In calculus, a branch of mathematics, the derivative is a measure of how a function changes as its input changes. Loosely speaking, a derivative can be thought of as how much one quantity is changing in response to changes in some other quantity; for example, the derivative of the position of a...

     g is g '(t) = f(et)−1·f '(et)·et, say g '(t) = (3·et + et + et)−1·(6·et + 3·et + 5·et) .
    • The mean value of the multiset is μ = g '(0) = f(1)−1·f '(1), say μ = (3+1+1)−1·(6+3+5) = 2.8 .
    • The variance
      Variance
      In probability theory and statistics, the variance is a measure of how far a set of numbers is spread out. It is one of several descriptors of a probability distribution, describing how far the numbers lie from the mean . In particular, the variance is one of the moments of a distribution...

       of the multiset is σ2 = g ' '(0) .


The numbers ( μ, σ2, ··· )
 = ( g '(0), g ' '(0), ··· )
are called cumulant
Cumulant
In probability theory and statistics, the cumulants κn of a probability distribution are a set of quantities that provide an alternative to the moments of the distribution. The moments determine the cumulants in the sense that any two probability distributions whose moments are identical will have...

s.

The infinite set of non-negative integers {0, 1, 2, ···} is represented by the formal power series
Formal power series
In mathematics, formal power series are a generalization of polynomials as formal objects, where the number of terms is allowed to be infinite; this implies giving up the possibility to substitute arbitrary values for indeterminates...

 1 + x + x2 + ··· = (1 − x)−1. The mean value and standard deviation are undefined. Nevertheless it has a cumulant-generating function, g(t) = −log(1−et). The derivative of this cumulant-generating function is g '(t) = (et−1)−1.

A finite multiset of real numbers , A = { Ai }, is represented by the cumulant generating function


This representation is unique: different multisets have different cumulant generating functions. See partition function (statistical mechanics)
Partition function (statistical mechanics)
Partition functions describe the statistical properties of a system in thermodynamic equilibrium. It is a function of temperature and other parameters, such as the volume enclosing a gas...

 for the case where the numbers in question are the energy levels of a physical system.

The cumulant-generating function of a multiset of n real numbers having mean μ and standard deviation σ is:
g(t) = log(n) + μ·t 
+ 2−1·(σ·t)2 + ··· ,
and the derivative is simply: g '(t) = μ + σ2·t + ···

The cumulant-generating function of set, {k}, consisting of a single real number, k, is g(t) = k·t , and the derivative is the number itself: g '(t) = k . So the concept of the derivative of the cumulant generating function of a multiset of real numbers is a generalization
Generalization
A generalization of a concept is an extension of the concept to less-specific criteria. It is a foundational element of logic and human reasoning. Generalizations posit the existence of a domain or set of elements, as well as one or more common characteristics shared by those elements. As such, it...

 of the concept of a real number.

The cumulant-generating function of a constant multiset, {k, k, k, k, ··· , k} of n elements all equal to the same real number k, is g(t) = log(n)+k·t , and the derivative is the number itself: g '(t) = k , irrespective of n.

The cumulant-generating function of the multiset of sums of elements of two multisets of numbers is the sum of the two cumulant-generating functions:


There is unfortunately no general formula for computing the cumulant generating function of a product


but the special case of a constant times a multiset of numbers is:


The multiset 2·A = {2·Ai} is not the same multiset as 2×A =A+A = {Ai+Aj}. For example, 2·{+1,−1} = {+2,−2} while 2×{+1,−1} = {+1,−1} + {+1,−1} = {+1+1,+1−1,−1+1,−1−1} = {+2,0,0,−2}.


The standard normal distribution is like a limit of big multisets of numbers.


This limit does not make sense as a multiset of numbers, but the derivative of the cumulant generating functions of the multisets in question makes sense, and the limit is well defined.


The constant term k2·log(2) vanishes by differentiation. The terms ··· vanish in the limit. So for the standard normal distribution, having mean 0 and standard deviation
Standard deviation
Standard deviation is a widely used measure of variability or diversity used in statistics and probability theory. It shows how much variation or "dispersion" there is from the average...

 1, the derivative of the cumulant generating function is simply g '(t) = t . For the normal distribution having mean μ and standard deviation σ, the derivative of the cumulant generating function is
g '(t) = μ + σ2·t .

See also random variable
Random variable
In probability and statistics, a random variable or stochastic variable is, roughly speaking, a variable whose value results from a measurement on some type of random process. Formally, it is a function from a probability space, typically to the real numbers, which is measurable functionmeasurable...

.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK