Conditional entropy
In information theory, the conditional entropy (or equivocation) quantifies the remaining entropy (i.e. uncertainty) of a random variable Y given that the value of another random variable X is known. It is referred to as the entropy of Y conditional on X, and is written H(Y|X). Like other entropies, the conditional entropy is measured in bits, nats, or bans.

Definition

More precisely, if H(Y|X=x) is the entropy of the variable Y conditional on the variable X taking a certain value x, then H(Y|X) is the result of averaging H(Y|X=x) over all possible values x that X may take. Given a discrete random variable X with support \mathcal X and a discrete random variable Y with support \mathcal Y, the conditional entropy of Y given X is defined as:

\begin{align}
H(Y|X)\ &\equiv \sum_{x\in\mathcal X}\,p(x)\,H(Y|X=x)\\
&= \sum_{x\in\mathcal X}p(x)\sum_{y\in\mathcal Y}\,p(y|x)\,\log\,\frac{1}{p(y|x)}\\
&= -\sum_{x\in\mathcal X}\sum_{y\in\mathcal Y}\,p(x,y)\,\log\,p(y|x)\\
&= -\sum_{x\in\mathcal X,\, y\in\mathcal Y}p(x,y)\,\log\,p(y|x)\\
&= -\sum_{x\in\mathcal X,\, y\in\mathcal Y}p(x,y)\,\log\,\frac{p(x,y)}{p(x)}\\
&= \sum_{x\in\mathcal X,\, y\in\mathcal Y}p(x,y)\,\log\,\frac{p(x)}{p(x,y)}.
\end{align}

Note: The supports of X and Y can be replaced by their domains if it is understood that 0 \log 0 should be treated as being equal to zero.
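As a concrete illustration of this definition, the following Python sketch computes H(Y|X) directly from a joint probability table. The table p_xy and the helper name conditional_entropy are made-up examples, not part of the original article; the calculation uses p(y|x) = p(x,y)/p(x) and skips zero-probability cells, which is equivalent to treating 0 \log 0 as zero, as in the note above.

```python
import math

# Hypothetical joint distribution p(x, y); the values are illustrative only.
p_xy = {
    (0, 0): 0.25, (0, 1): 0.25, (0, 2): 0.00,
    (1, 0): 0.10, (1, 1): 0.10, (1, 2): 0.30,
}

def conditional_entropy(p_xy, base=2):
    """H(Y|X) = -sum_{x,y} p(x,y) log p(y|x), treating 0 log 0 as 0."""
    # Marginal p(x), obtained by summing the joint distribution over y.
    p_x = {}
    for (x, _), p in p_xy.items():
        p_x[x] = p_x.get(x, 0.0) + p
    # p(y|x) = p(x,y) / p(x); zero-probability cells contribute nothing.
    return -sum(p * math.log(p / p_x[x], base)
                for (x, _), p in p_xy.items() if p > 0)

print(conditional_entropy(p_xy))  # H(Y|X) in bits (base-2 logarithm)
```

With these illustrative values the result is the p(x)-weighted average of H(Y|X=0) and H(Y|X=1), matching the first line of the definition.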

Chain rule

From this definition and the definition of conditional probability, the chain rule for conditional entropy is

H(Y|X) \,=\, H(X,Y) - H(X) \,.

This is true because

\begin{align}
H(Y|X) &= \sum_{x\in\mathcal X,\, y\in\mathcal Y}p(x,y)\,\log\,\frac{p(x)}{p(x,y)}\\
&= -\sum_{x\in\mathcal X,\, y\in\mathcal Y}p(x,y)\,\log\,p(x,y) + \sum_{x\in\mathcal X,\, y\in\mathcal Y}p(x,y)\,\log\,p(x)\\
&= H(X,Y) + \sum_{x\in\mathcal X}p(x)\,\log\,p(x)\\
&= H(X,Y) - H(X).
\end{align}
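The chain rule can be checked numerically for any joint distribution. The sketch below (again with made-up probabilities) computes H(X,Y), H(X), and H(Y|X) separately and confirms that H(Y|X) = H(X,Y) - H(X) up to floating-point rounding.

```python
import math

# Hypothetical joint distribution over binary X and Y (illustrative values).
p_xy = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.10, (1, 1): 0.40}

def entropy(probs, base=2):
    # Shannon entropy of a collection of probabilities, with 0 log 0 = 0.
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Marginal p(x).
p_x = {}
for (x, _), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p

h_xy = entropy(p_xy.values())   # H(X,Y), entropy of the joint distribution
h_x = entropy(p_x.values())     # H(X), entropy of the marginal
h_y_given_x = -sum(p * math.log(p / p_x[x], 2)      # H(Y|X) from the definition
                   for (x, _), p in p_xy.items() if p > 0)

# Chain rule: H(Y|X) = H(X,Y) - H(X), up to floating-point rounding.
print(abs(h_y_given_x - (h_xy - h_x)) < 1e-12)  # True
```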

Intuition

Intuitively, the combined system contains H(X,Y) bits of information: we need H(X,Y) bits of information to reconstruct its exact state. If we learn the value of X, we have gained H(X) bits of information, and the system has H(Y|X) bits of uncertainty remaining. H(Y|X)=0 if and only if the value of Y is completely determined by the value of X. Conversely, H(Y|X) = H(Y) if and only if Y and X are independent random variables.
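Both boundary cases can be illustrated with small hypothetical distributions: one in which Y is a deterministic function of X, and one in which X and Y are independent. The helper below repeats the illustrative conditional_entropy calculation from the earlier sketch, here written with the equivalent form \sum p(x,y)\log(p(x)/p(x,y)).

```python
import math

def conditional_entropy(p_xy, base=2):
    # H(Y|X) = sum_{x,y} p(x,y) log(p(x)/p(x,y)), with 0 log 0 = 0.
    p_x = {}
    for (x, _), p in p_xy.items():
        p_x[x] = p_x.get(x, 0.0) + p
    return sum(p * math.log(p_x[x] / p, base)
               for (x, _), p in p_xy.items() if p > 0)

# Y fully determined by X (here Y = X): no uncertainty remains, so H(Y|X) = 0.
determined = {(0, 0): 0.5, (1, 1): 0.5}
print(conditional_entropy(determined))   # 0.0

# X and Y independent and uniform: knowing X tells us nothing about Y,
# so H(Y|X) = H(Y) = 1 bit.
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
print(conditional_entropy(independent))  # 1.0
```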

Generalization to quantum theory

In quantum information theory, the conditional entropy is generalized to the conditional quantum entropy.

Other properties

For any X and Y:
  • H(X|Y) \le H(X)
  • H(X,Y) = H(X|Y) + H(Y|X) + I(X;Y), where I(X;Y) is the mutual information between X and Y.
  • I(X;Y) \le H(X)

For independent X and Y:
  • H(Y|X) = H(Y) and H(X|Y) = H(X)

Although the specific conditional entropy H(X|Y=y) can be either less than or greater than H(X|Y), H(X|Y=y) can never exceed H(X) when X is the uniform distribution.
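These identities can be verified numerically for a particular distribution. In the sketch below the joint table p_xy is a made-up example; H(X|Y) and H(Y|X) are obtained via the chain rule above, and I(X;Y) is computed as H(X) + H(Y) - H(X,Y).

```python
import math

# Hypothetical joint distribution p(x, y) over binary X and Y (illustrative values).
p_xy = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def entropy(probs, base=2):
    # Shannon entropy, with 0 log 0 = 0.
    return -sum(p * math.log(p, base) for p in probs if p > 0)

def marginal(p_xy, axis):
    # Sum the joint distribution over the other variable.
    m = {}
    for key, p in p_xy.items():
        m[key[axis]] = m.get(key[axis], 0.0) + p
    return m

p_x, p_y = marginal(p_xy, 0), marginal(p_xy, 1)
h_x, h_y, h_xy = entropy(p_x.values()), entropy(p_y.values()), entropy(p_xy.values())

h_x_given_y = h_xy - h_y            # H(X|Y), via the chain rule
h_y_given_x = h_xy - h_x            # H(Y|X), via the chain rule
i_xy = h_x + h_y - h_xy             # mutual information I(X;Y)

print(h_x_given_y <= h_x + 1e-12)                              # H(X|Y) <= H(X)
print(abs(h_xy - (h_x_given_y + h_y_given_x + i_xy)) < 1e-12)  # H(X,Y) = H(X|Y) + H(Y|X) + I(X;Y)
print(i_xy <= h_x + 1e-12)                                     # I(X;Y) <= H(X)
```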

See also

  • Entropy (information theory)
  • Mutual information
  • Conditional quantum entropy
  • Variation of information
  • Likelihood function