Conditional entropy
In information theory, the conditional entropy (or equivocation) quantifies the remaining entropy (i.e. uncertainty) of a random variable Y given that the value of another random variable X is known. It is referred to as the entropy of Y conditional on X, and is written H(Y|X). Like other entropies, the conditional entropy is measured in bits, nats, or bans.

Definition

More precisely, if H(Y|X=x) is the entropy of the variable Y conditional on the variable X taking a certain value x, then H(Y|X) is the result of averaging H(Y|X=x) over all possible values x that X may take. Given a discrete random variable X with support \mathcal X and a discrete random variable Y with support \mathcal Y, the conditional entropy of Y given X is defined as:

\begin{align}
H(Y|X)\ &\equiv \sum_{x\in\mathcal X}\,p(x)\,H(Y|X=x)\\
&= \sum_{x\in\mathcal X}p(x)\sum_{y\in\mathcal Y}\,p(y|x)\,\log\,\frac{1}{p(y|x)}\\
&= -\sum_{x\in\mathcal X}\sum_{y\in\mathcal Y}\,p(x,y)\,\log\,p(y|x)\\
&= -\sum_{x\in\mathcal X,\, y\in\mathcal Y}p(x,y)\,\log\,p(y|x)\\
&= -\sum_{x\in\mathcal X,\, y\in\mathcal Y}p(x,y)\,\log\,\frac{p(x,y)}{p(x)}\\
&= \sum_{x\in\mathcal X,\, y\in\mathcal Y}p(x,y)\,\log\,\frac{p(x)}{p(x,y)}.
\end{align}

Note: The supports of X and Y can be replaced by their domains if it is understood that 0 \log 0 should be treated as being equal to zero.
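As a concrete illustration of this definition, the following Python sketch computes H(Y|X) directly from a joint probability table. The table p_xy and the helper name conditional_entropy are made-up examples, not part of the original article; the calculation uses p(y|x) = p(x,y)/p(x) and skips zero-probability cells, which is equivalent to treating 0 \log 0 as zero, as in the note above.

```python
import math

# Hypothetical joint distribution p(x, y); the values are illustrative only.
p_xy = {
    (0, 0): 0.25, (0, 1): 0.25, (0, 2): 0.00,
    (1, 0): 0.10, (1, 1): 0.10, (1, 2): 0.30,
}

def conditional_entropy(p_xy, base=2):
    """H(Y|X) = -sum_{x,y} p(x,y) log p(y|x), treating 0 log 0 as 0."""
    # Marginal p(x), obtained by summing the joint distribution over y.
    p_x = {}
    for (x, _), p in p_xy.items():
        p_x[x] = p_x.get(x, 0.0) + p
    # p(y|x) = p(x,y) / p(x); zero-probability cells contribute nothing.
    return -sum(p * math.log(p / p_x[x], base)
                for (x, _), p in p_xy.items() if p > 0)

print(conditional_entropy(p_xy))  # H(Y|X) in bits (base-2 logarithm)
```

With these illustrative values the result is the p(x)-weighted average of H(Y|X=0) and H(Y|X=1), matching the first line of the definition.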

Chain rule

From this definition and the definition of conditional probability, the chain rule for conditional entropy is

H(Y|X) \,=\, H(X,Y) - H(X) \,.

This is true because

\begin{align}
H(Y|X) &= \sum_{x\in\mathcal X,\, y\in\mathcal Y}p(x,y)\,\log\,\frac{p(x)}{p(x,y)}\\
&= -\sum_{x\in\mathcal X,\, y\in\mathcal Y}p(x,y)\,\log\,p(x,y) + \sum_{x\in\mathcal X,\, y\in\mathcal Y}p(x,y)\,\log\,p(x)\\
&= H(X,Y) + \sum_{x\in\mathcal X}p(x)\,\log\,p(x)\\
&= H(X,Y) - H(X).
\end{align}
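The chain rule can be checked numerically for any joint distribution. The sketch below (again with made-up probabilities) computes H(X,Y), H(X), and H(Y|X) separately and confirms that H(Y|X) = H(X,Y) - H(X) up to floating-point rounding.

```python
import math

# Hypothetical joint distribution over binary X and Y (illustrative values).
p_xy = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.10, (1, 1): 0.40}

def entropy(probs, base=2):
    # Shannon entropy of a collection of probabilities, with 0 log 0 = 0.
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Marginal p(x).
p_x = {}
for (x, _), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p

h_xy = entropy(p_xy.values())   # H(X,Y), entropy of the joint distribution
h_x = entropy(p_x.values())     # H(X), entropy of the marginal
h_y_given_x = -sum(p * math.log(p / p_x[x], 2)      # H(Y|X) from the definition
                   for (x, _), p in p_xy.items() if p > 0)

# Chain rule: H(Y|X) = H(X,Y) - H(X), up to floating-point rounding.
print(abs(h_y_given_x - (h_xy - h_x)) < 1e-12)  # True
```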

Intuition

Intuitively, the combined system contains H(X,Y) bits of information: we need H(X,Y) bits of information to reconstruct its exact state. If we learn the value of X, we have gained H(X) bits of information, and the system has H(Y|X) bits of uncertainty remaining. H(Y|X)=0 if and only if the value of Y is completely determined by the value of X. Conversely, H(Y|X) = H(Y) if and only if Y and X are independent random variables.
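Both boundary cases can be illustrated with small hypothetical distributions: one in which Y is a deterministic function of X, and one in which X and Y are independent. The helper below repeats the illustrative conditional_entropy calculation from the earlier sketch, here written with the equivalent form \sum p(x,y)\log(p(x)/p(x,y)).

```python
import math

def conditional_entropy(p_xy, base=2):
    # H(Y|X) = sum_{x,y} p(x,y) log(p(x)/p(x,y)), with 0 log 0 = 0.
    p_x = {}
    for (x, _), p in p_xy.items():
        p_x[x] = p_x.get(x, 0.0) + p
    return sum(p * math.log(p_x[x] / p, base)
               for (x, _), p in p_xy.items() if p > 0)

# Y fully determined by X (here Y = X): no uncertainty remains, so H(Y|X) = 0.
determined = {(0, 0): 0.5, (1, 1): 0.5}
print(conditional_entropy(determined))   # 0.0

# X and Y independent and uniform: knowing X tells us nothing about Y,
# so H(Y|X) = H(Y) = 1 bit.
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
print(conditional_entropy(independent))  # 1.0
```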

Generalization to quantum theory

In quantum information theory, the conditional entropy is generalized to the conditional quantum entropy.

Other properties

For any X and Y:
  • H(X|Y) \le H(X)
  • H(X,Y) = H(X|Y) + H(Y|X) + I(X;Y), where I(X;Y) is the mutual information between X and Y.
  • I(X;Y) \le H(X)

For independent X and Y:
  • H(Y|X) = H(Y) and H(X|Y) = H(X)

Although the specific conditional entropy H(X|Y=y) can be either less than or greater than H(X|Y), H(X|Y=y) can never exceed H(X) when X is the uniform distribution.
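These identities can be verified numerically for a particular distribution. In the sketch below the joint table p_xy is a made-up example; H(X|Y) and H(Y|X) are obtained via the chain rule above, and I(X;Y) is computed as H(X) + H(Y) - H(X,Y).

```python
import math

# Hypothetical joint distribution p(x, y) over binary X and Y (illustrative values).
p_xy = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def entropy(probs, base=2):
    # Shannon entropy, with 0 log 0 = 0.
    return -sum(p * math.log(p, base) for p in probs if p > 0)

def marginal(p_xy, axis):
    # Sum the joint distribution over the other variable.
    m = {}
    for key, p in p_xy.items():
        m[key[axis]] = m.get(key[axis], 0.0) + p
    return m

p_x, p_y = marginal(p_xy, 0), marginal(p_xy, 1)
h_x, h_y, h_xy = entropy(p_x.values()), entropy(p_y.values()), entropy(p_xy.values())

h_x_given_y = h_xy - h_y            # H(X|Y), via the chain rule
h_y_given_x = h_xy - h_x            # H(Y|X), via the chain rule
i_xy = h_x + h_y - h_xy             # mutual information I(X;Y)

print(h_x_given_y <= h_x + 1e-12)                              # H(X|Y) <= H(X)
print(abs(h_xy - (h_x_given_y + h_y_given_x + i_xy)) < 1e-12)  # H(X,Y) = H(X|Y) + H(Y|X) + I(X;Y)
print(i_xy <= h_x + 1e-12)                                     # I(X;Y) <= H(X)
```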

See also

  • Entropy (information theory)
  • Mutual information
  • Conditional quantum entropy
  • Variation of information
  • Likelihood function