Differential entropy

 

Differential entropy (also referred to as continuous entropy) is a concept in information theory that attempts to extend the idea of (Shannon) entropy, a measure of the average surprisal of a random variable, to continuous probability distributions.

Definition

Let X be a random variable with a probability density function f whose support is a set $\mathcal{X}$. The differential entropy $h(X)$ or $h(f)$ is defined as

$$h(X) = -\int_{\mathcal{X}} f(x)\log f(x)\,dx.$$

As with its discrete analog, the units of differential entropy depend on the base of the logarithm, which is usually 2 (i.e., the units are bits). See logarithmic units for logarithms taken in different bases. Related concepts such as joint differential entropy, conditional differential entropy, and relative entropy are defined in a similar fashion. One must take care in trying to apply properties of discrete entropy to differential entropy, since probability density functions can be greater than 1. For example, Uniform(0,1/2) has density 2 on its support, so its differential entropy is negative:

$$h(X) = -\int_0^{1/2} 2\log 2\,dx = -\log 2,$$

that is, $-1$ bit.
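The following is a rough numerical check of the definition. It approximates $h(X)$ with a midpoint Riemann sum. The helper names `differential_entropy` and `uniform_pdf`, the step count, and the choice of quadrature are illustrative assumptions and do not come from any library. Applied to the Uniform(0,1/2) example, it should reproduce the negative value in both bits and nats.

```python
import math
from typing import Callable


def differential_entropy(pdf: Callable[[float], float], a: float, b: float,
                         base: float = 2.0, n: int = 100_000) -> float:
    """Approximate h(X) = -integral of f(x) log f(x) dx over [a, b]
    using the midpoint rule with n subintervals."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        fx = pdf(a + (i + 0.5) * dx)
        if fx > 0.0:  # convention: 0 log 0 = 0
            total -= fx * math.log(fx) * dx
    # total is in nats; dividing by ln(base) converts to the requested unit.
    return total / math.log(base)


def uniform_pdf(lo: float, hi: float) -> Callable[[float], float]:
    # Density of Uniform(lo, hi): constant 1/(hi - lo) on [lo, hi], 0 elsewhere.
    return lambda x: 1.0 / (hi - lo) if lo <= x <= hi else 0.0


f = uniform_pdf(0.0, 0.5)
h_bits = differential_entropy(f, 0.0, 0.5)                # base 2
h_nats = differential_entropy(f, 0.0, 0.5, base=math.e)   # base e
print(h_bits, h_nats)
```

Both printed values should agree with $-1$ bit and $-\ln 2 \approx -0.693$ nats to many decimal places. Changing `base` changes only the unit, not the sign.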

The definition of differential entropy above can be obtained by partitioning the range of X into bins of length $\Delta$ with associated sample points $i\Delta$ within the bins, assuming f is Riemann integrable. This gives a quantized version of X, defined by $X^\Delta = i\Delta$ if $i\Delta \le X < (i+1)\Delta$. Since $P(X^\Delta = i\Delta) \approx \Delta f(i\Delta)$, the entropy of $X^\Delta$ is

$$H(X^\Delta) \approx -\sum_i \Delta f(i\Delta)\log f(i\Delta) - \sum_i \Delta f(i\Delta)\log\Delta.$$

The first term approximates the differential entropy. The second term is approximately $-\log\Delta$, because $\sum_i \Delta f(i\Delta) \approx \int f(x)\,dx = 1$. Note that this procedure suggests that the differential entropy of a discrete random variable should be $-\infty$.
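The quantization argument can be checked numerically. For illustration, the sketch below assumes X ~ N(0,1), because the exact bin probabilities are then available through `math.erfc`; it uses these exact probabilities rather than the approximation $\Delta f(i\Delta)$. It computes the Shannon entropy of $X^\Delta$ for shrinking $\Delta$. The function name `quantized_entropy` and the cutoff `lim` are hypothetical choices. $H(X^\Delta)$ should grow without bound, while $H(X^\Delta) + \ln\Delta$ should approach $h(X) = \frac{1}{2}\ln(2\pi e)$ nats.

```python
import math


def quantized_entropy(delta: float, lim: float = 12.0) -> float:
    """Shannon entropy (nats) of X^Delta for X ~ N(0, 1), with bins [i*delta, (i+1)*delta).

    N(0, 1) is symmetric and these bins mirror each other about 0, so only
    bins with i >= 0 are summed and the result is doubled. Bin probabilities
    come from erfc, which stays accurate in the upper tail. Probability mass
    beyond +/- lim is ignored (it is negligible for lim = 12)."""
    root2 = math.sqrt(2.0)
    h = 0.0
    for i in range(int(math.ceil(lim / delta))):
        p = 0.5 * (math.erfc(i * delta / root2) - math.erfc((i + 1) * delta / root2))
        if p > 0.0:
            h -= p * math.log(p)
    return 2.0 * h


h_true = 0.5 * math.log(2.0 * math.pi * math.e)   # h(X) for N(0, 1), in nats

for delta in (1.0, 0.1, 0.01):
    H = quantized_entropy(delta)
    print(f"delta={delta}: H(X^delta)={H:.5f}  H + ln(delta)={H + math.log(delta):.5f}")
print(f"h(X)={h_true:.5f}")
```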

Note that the continuous mutual information $I(X;Y)$ has the distinction of retaining its fundamental significance as a measure of discrete information. This is because it is the limit of the discrete mutual information of partitions of X and Y as these partitions become finer and finer. Thus it is invariant under invertible linear transformations applied to X and to Y separately. It still represents the amount of discrete information that can be transmitted over a channel that admits a continuous space of values.
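For jointly Gaussian variables every term has a closed form, which makes the invariance easy to see. The sketch below assumes a bivariate Gaussian. It uses $I(X;Y) = h(X) + h(Y) - h(X,Y)$ together with the standard Gaussian entropy $\frac{1}{2}\ln\left((2\pi e)^n \det K\right)$. The parameter values and helper names are illustrative only. Rescaling X and Y separately changes each differential entropy by $\ln|a|$ or $\ln|b|$, but it leaves the mutual information unchanged.

```python
import math


def gaussian_entropy(det_cov: float, dim: int) -> float:
    # h = 1/2 * ln((2*pi*e)^dim * det K), in nats, for a Gaussian with covariance K.
    return 0.5 * math.log((2.0 * math.pi * math.e) ** dim * det_cov)


def gaussian_mutual_information(var_x: float, var_y: float, cov_xy: float) -> float:
    # I(X;Y) = h(X) + h(Y) - h(X,Y) for a bivariate Gaussian (X, Y).
    det_joint = var_x * var_y - cov_xy ** 2
    return (gaussian_entropy(var_x, 1) + gaussian_entropy(var_y, 1)
            - gaussian_entropy(det_joint, 2))


var_x, var_y, cov_xy = 1.0, 2.0, 0.9   # illustrative covariance entries
a, b = 3.0, -0.5                        # rescale X -> aX and Y -> bY

i_orig = gaussian_mutual_information(var_x, var_y, cov_xy)
i_scaled = gaussian_mutual_information(a * a * var_x, b * b * var_y, a * b * cov_xy)

rho = cov_xy / math.sqrt(var_x * var_y)
i_closed = -0.5 * math.log(1.0 - rho ** 2)   # known closed form for Gaussians

# The marginal entropy does change under scaling: h(aX) - h(X) = ln|a|.
shift = gaussian_entropy(a * a * var_x, 1) - gaussian_entropy(var_x, 1)

print(i_orig, i_scaled, i_closed)
print(shift, math.log(abs(a)))
```

This only exercises coordinate-wise scaling, which is the simplest case of separate invertible linear maps on X and Y.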

Properties of differential entropy

  • For two densities f and g, $D(f\|g) = \int f(x)\log\frac{f(x)}{g(x)}\,dx \ge 0$, with equality if and only if $f = g$ almost everywhere. Similarly, for two random variables X and Y, $I(X;Y) \ge 0$ and $h(X \mid Y) \le h(X)$, with equality if and only if X and Y are independent.
  • The chain rule for differential entropy holds as in the discrete case: $h(X_1, \ldots, X_n) = \sum_{i=1}^{n} h(X_i \mid X_1, \ldots, X_{i-1}) \le \sum_{i=1}^{n} h(X_i)$.
  • Differential entropy is translation invariant, i.e., $h(X + c) = h(X)$ for a constant c.
  • Differential entropy is in general not invariant under arbitrary invertible maps. In particular, for a nonzero constant a, $h(aX) = h(X) + \log|a|$. For a vector-valued random variable X and an invertible matrix A, $h(AX) = h(X) + \log|\det A|$.
  • In general, for a transformation from a random vector X to a random vector $Y = m(X)$ of the same dimension, the corresponding entropies are related via $h(Y) \le h(X) + \int f(x)\log\left|\frac{\partial m}{\partial x}\right|dx$. Here $\left|\frac{\partial m}{\partial x}\right|$ is the Jacobian of the transformation m, that is, the absolute value of the determinant of its Jacobian matrix. Equality holds when m is a bijection.
  • If a random vector $X \in \mathbb{R}^n$ has mean zero and covariance matrix K, then $h(X) \le \frac{1}{2}\log\left((2\pi e)^n \det K\right)$, with equality if and only if X is jointly Gaussian.
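Two of the properties above lend themselves to a quick numerical check: the scaling rule $h(aX) = h(X) + \log|a|$ and the Gaussian upper bound for a given variance. The sketch below assumes a zero-mean Laplace variable with unit variance (scale $b = 1/\sqrt{2}$). It relies on the fact that $aX$ is Laplace with scale $|a|b$ and integrates numerically with a midpoint rule. The helper names, integration limits, and step count are illustrative choices.

```python
import math


def entropy_nats(pdf, lo: float, hi: float, n: int = 100_000) -> float:
    # Midpoint-rule approximation of -integral_{lo}^{hi} f(x) ln f(x) dx.
    dx = (hi - lo) / n
    h = 0.0
    for i in range(n):
        fx = pdf(lo + (i + 0.5) * dx)
        if fx > 0.0:
            h -= fx * math.log(fx) * dx
    return h


def laplace_pdf(scale: float):
    # Zero-mean Laplace density with the given scale; its variance is 2 * scale**2.
    return lambda x: math.exp(-abs(x) / scale) / (2.0 * scale)


b = 1.0 / math.sqrt(2.0)   # chosen so that Var(X) = 1
a = 3.0                    # scaling constant

h_x = entropy_nats(laplace_pdf(b), -40.0, 40.0)
h_ax = entropy_nats(laplace_pdf(a * b), -120.0, 120.0)   # aX ~ Laplace(0, a*b)
gauss_bound = 0.5 * math.log(2.0 * math.pi * math.e)     # (1/2) ln(2*pi*e*K) with K = 1

print("h(aX) - h(X) =", h_ax - h_x, " ln|a| =", math.log(abs(a)))
print("h(X) =", h_x, " Gaussian bound =", gauss_bound)
```

Because the Laplace variable is not Gaussian, its entropy should fall strictly below the bound for the same variance.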


Example: Exponential distribution

Let X be an exponentially distributed random variable with parameter $\lambda$, that is, with probability density function

$$f(x) = \lambda e^{-\lambda x} \quad \text{for } x \ge 0.$$

Its differential entropy is then

$$\begin{aligned}
h_e(X) &= -\int_0^\infty \lambda e^{-\lambda x}\ln\left(\lambda e^{-\lambda x}\right)dx \\
&= -\left(\int_0^\infty (\ln\lambda)\,\lambda e^{-\lambda x}\,dx - \int_0^\infty (\lambda x)\,\lambda e^{-\lambda x}\,dx\right) \\
&= -\ln\lambda \int_0^\infty f(x)\,dx + \lambda\,\mathrm{E}[X] \\
&= -\ln\lambda + 1.
\end{aligned}$$

Here, $h_e(X)$ is written rather than $h(X)$ to make explicit that the logarithm is taken to base e, which simplifies the calculation.
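The closed form $1 - \ln\lambda$ can be compared against direct numerical integration. The sketch below is an illustrative check that uses a midpoint rule truncated at $x = 50/\lambda$, where the remaining probability mass is $e^{-50}$. The function name is hypothetical. The result is negative whenever $\lambda > e$, which is another instance of differential entropy taking negative values.

```python
import math


def exponential_entropy_numeric(lam: float, n: int = 100_000) -> float:
    """Midpoint-rule estimate (nats) of -integral_0^inf f ln f for f(x) = lam*exp(-lam*x).

    The integral is truncated at x = 50/lam, beyond which the probability
    mass is exp(-50) and the neglected contribution is negligible."""
    upper = 50.0 / lam
    dx = upper / n
    h = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        log_fx = math.log(lam) - lam * x   # ln f(x), computed directly
        h -= math.exp(log_fx) * log_fx * dx
    return h


for lam in (0.5, 1.0, 5.0):
    print(f"lambda={lam}: numeric={exponential_entropy_numeric(lam):.6f}, "
          f"1 - ln(lambda)={1.0 - math.log(lam):.6f}")
```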

Differential entropies for various distributions

In the table below, $\Gamma(x) = \int_0^\infty e^{-t}t^{x-1}\,dt$ (the gamma function), $\psi(x) = \frac{d}{dx}\ln\Gamma(x) = \frac{\Gamma'(x)}{\Gamma(x)}$ (the digamma function), $B(p,q) = \frac{\Gamma(p)\Gamma(q)}{\Gamma(p+q)}$ (the beta function), and $\gamma_E$ is Euler's constant.
Table of differential entropies.
Distribution name | Probability density function (pdf) | Entropy in nats
Uniform | $f(x) = \frac{1}{b-a}$ for $a \le x \le b$ | $\ln(b-a)$
Normal | $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ | $\ln\left(\sigma\sqrt{2\pi e}\right)$
Exponential | $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$ | $1 - \ln\lambda$
Rayleigh | $f(x) = \frac{x}{\sigma^2}\exp\left(-\frac{x^2}{2\sigma^2}\right)$ for $x \ge 0$ | $1 + \ln\frac{\sigma}{\sqrt{2}} + \frac{\gamma_E}{2}$
Beta | $f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}$ for $0 \le x \le 1$ | $\ln B(\alpha,\beta) - (\alpha-1)[\psi(\alpha)-\psi(\alpha+\beta)] - (\beta-1)[\psi(\beta)-\psi(\alpha+\beta)]$
Cauchy | $f(x) = \frac{\gamma}{\pi}\frac{1}{\gamma^2+x^2}$ | $\ln(4\pi\gamma)$
Chi | $f(x) = \frac{2}{2^{k/2}\Gamma(k/2)}x^{k-1}\exp\left(-\frac{x^2}{2}\right)$ for $x \ge 0$ | $\ln\frac{\Gamma(k/2)}{\sqrt{2}} - \frac{k-1}{2}\psi\left(\frac{k}{2}\right) + \frac{k}{2}$
Chi-squared | $f(x) = \frac{1}{2^{k/2}\Gamma(k/2)}x^{k/2-1}\exp\left(-\frac{x}{2}\right)$ for $x \ge 0$ | $\ln\left(2\Gamma\left(\frac{k}{2}\right)\right) + \left(1-\frac{k}{2}\right)\psi\left(\frac{k}{2}\right) + \frac{k}{2}$
Erlang | $f(x) = \frac{\lambda^k}{(k-1)!}x^{k-1}e^{-\lambda x}$ for $x \ge 0$ | $(1-k)\psi(k) + \ln\frac{\Gamma(k)}{\lambda} + k$
F | $f(x) = \frac{n_1^{n_1/2}n_2^{n_2/2}}{B(n_1/2,\,n_2/2)}\frac{x^{n_1/2-1}}{(n_2+n_1x)^{(n_1+n_2)/2}}$ for $x \ge 0$ | $\ln\left(\frac{n_2}{n_1}B\left(\frac{n_1}{2},\frac{n_2}{2}\right)\right) + \left(1-\frac{n_1}{2}\right)\psi\left(\frac{n_1}{2}\right) - \left(1+\frac{n_2}{2}\right)\psi\left(\frac{n_2}{2}\right) + \frac{n_1+n_2}{2}\psi\left(\frac{n_1+n_2}{2}\right)$
Gamma | $f(x) = \frac{x^{k-1}e^{-x/\theta}}{\theta^k\Gamma(k)}$ for $x \ge 0$ | $\ln(\theta\Gamma(k)) + (1-k)\psi(k) + k$
Laplace | $f(x) = \frac{1}{2b}\exp\left(-\frac{|x-\mu|}{b}\right)$ | $1 + \ln(2b)$
Logistic | $f(x) = \frac{e^{-x}}{(1+e^{-x})^2}$ | $2$
Lognormal | $f(x) = \frac{1}{\sigma x\sqrt{2\pi}}\exp\left(-\frac{(\ln x-\mu)^2}{2\sigma^2}\right)$ for $x > 0$ | $\mu + \frac{1}{2}\ln(2\pi e\sigma^2)$
Maxwell-Boltzmann | $f(x) = \frac{1}{a^3}\sqrt{\frac{2}{\pi}}\,x^2\exp\left(-\frac{x^2}{2a^2}\right)$ for $x \ge 0$ | $\ln\left(a\sqrt{2\pi}\right) + \gamma_E - \frac{1}{2}$
Generalized normal | $f(x) = \frac{2\beta^{\alpha/2}}{\Gamma(\alpha/2)}x^{\alpha-1}\exp(-\beta x^2)$ for $x \ge 0$ | $\ln\frac{\Gamma(\alpha/2)}{2\beta^{1/2}} - \frac{\alpha-1}{2}\psi\left(\frac{\alpha}{2}\right) + \frac{\alpha}{2}$
Pareto | $f(x) = \frac{\alpha x_m^\alpha}{x^{\alpha+1}}$ for $x \ge x_m$ | $\ln\frac{x_m}{\alpha} + 1 + \frac{1}{\alpha}$
Student's t | $f(x) = \frac{(1+x^2/\nu)^{-(\nu+1)/2}}{\sqrt{\nu}\,B(1/2,\,\nu/2)}$ | $\frac{\nu+1}{2}\left[\psi\left(\frac{\nu+1}{2}\right) - \psi\left(\frac{\nu}{2}\right)\right] + \ln\left(\sqrt{\nu}\,B\left(\frac{1}{2},\frac{\nu}{2}\right)\right)$
Triangular | $f(x) = \frac{2(x-a)}{(b-a)(c-a)}$ for $a \le x \le c$; $f(x) = \frac{2(b-x)}{(b-a)(b-c)}$ for $c < x \le b$ | $\frac{1}{2} + \ln\frac{b-a}{2}$
Weibull | $f(x) = \frac{k}{\lambda^k}x^{k-1}\exp\left(-\frac{x^k}{\lambda^k}\right)$ for $x \ge 0$ | $\frac{(k-1)\gamma_E}{k} + \ln\frac{\lambda}{k} + 1$
Multivariate normal | $f(x) = \frac{\exp\left(-\frac{1}{2}(x-\mu)^\top\Sigma^{-1}(x-\mu)\right)}{(2\pi)^{N/2}|\Sigma|^{1/2}}$ | $\frac{1}{2}\ln\left((2\pi e)^N\det\Sigma\right)$
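A few rows of the table can be spot-checked by numerical integration. The sketch below is illustrative and rests on several choices made here:

- It uses particular parameter values.
- It truncates each integral where the tail is negligible.
- It hard-codes Euler's constant, because Python's `math` module does not provide it.
- It writes the logistic density in the equivalent form $\frac{1}{4}\operatorname{sech}^2(x/2)$ to avoid overflow for large negative x.

Heavy-tailed rows such as Cauchy are left out, because truncation converges slowly for them.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant (not in the math module)


def entropy_nats(pdf, lo: float, hi: float, n: int = 100_000) -> float:
    # Midpoint-rule approximation of -integral_{lo}^{hi} f(x) ln f(x) dx.
    dx = (hi - lo) / n
    h = 0.0
    for i in range(n):
        fx = pdf(lo + (i + 0.5) * dx)
        if fx > 0.0:
            h -= fx * math.log(fx) * dx
    return h


b, sigma = 1.5, 2.0  # illustrative Laplace scale and Rayleigh parameter
checks = {
    "Laplace (mu=0, b=1.5)": (
        entropy_nats(lambda x: math.exp(-abs(x) / b) / (2.0 * b), -100.0, 100.0),
        1.0 + math.log(2.0 * b)),
    "Logistic (standard)": (
        entropy_nats(lambda x: 0.25 / math.cosh(x / 2.0) ** 2, -60.0, 60.0),
        2.0),
    "Rayleigh (sigma=2)": (
        entropy_nats(lambda x: x / sigma**2 * math.exp(-x * x / (2.0 * sigma**2)), 0.0, 40.0),
        1.0 + math.log(sigma / math.sqrt(2.0)) + EULER_GAMMA / 2.0),
}

for name, (numeric, closed) in checks.items():
    print(f"{name}: numeric {numeric:.6f}, table {closed:.6f}")
```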
 


See also

  • Information entropy
  • Information theory
  • Self-information
  • Kullback-Leibler divergence
  • Entropy estimation


External links