Information theory

# Information theory

Overview
Information theory is a branch of applied mathematics
Applied mathematics
Applied mathematics is a branch of mathematics that concerns itself with mathematical methods that are typically used in science, engineering, business, and industry. Thus, "applied mathematics" is a mathematical science with specialized knowledge...

and electrical engineering
Electrical engineering
Electrical engineering is a field of engineering that generally deals with the study and application of electricity, electronics and electromagnetism. The field first became an identifiable occupation in the late nineteenth century after commercialization of the electric telegraph and electrical...

involving the quantification
Quantification
Quantification has several distinct senses. In mathematics and empirical science, it is the act of counting and measuring that maps human sense observations and experiences into members of some set of numbers. Quantification in this sense is fundamental to the scientific method.In logic,...

of information
Information
Information in its most restricted technical sense is a message or collection of messages that consists of an ordered sequence of symbols, or it is the meaning that can be interpreted from such a message or collection of messages. Information can be recorded or transmitted. It can be recorded as...

. Information theory was developed by Claude E. Shannon to find fundamental limits on signal processing
Signal processing
Signal processing is an area of systems engineering, electrical engineering and applied mathematics that deals with operations on or analysis of signals, in either discrete or continuous time...

operations such as compressing data
Data compression
In computer science and information theory, data compression, source coding or bit-rate reduction is the process of encoding information using fewer bits than the original representation would use....

and on reliably storing and communicating
Telecommunication
Telecommunication is the transmission of information over significant distances to communicate. In earlier times, telecommunications involved the use of visual signals, such as beacons, smoke signals, semaphore telegraphs, signal flags, and optical heliographs, or audio messages via coded...

data. Since its inception it has broadened to find applications in many other areas, including statistical inference
Statistical inference
In statistics, statistical inference is the process of drawing conclusions from data that are subject to random variation, for example, observational errors or sampling variation...

, natural language processing
Natural language processing
Natural language processing is a field of computer science and linguistics concerned with the interactions between computers and human languages; it began as a branch of artificial intelligence....

, cryptography
Cryptography
Cryptography is the practice and study of techniques for secure communication in the presence of third parties...

generally, networks other than communication networks — as in neurobiology, the evolution and function of molecular codes, model selection in ecology, thermal physics, quantum computing, plagiarism detection and other forms of data analysis
Data analysis
Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making...

.

A key measure of information is known as entropy, which is usually expressed by the average number of bits needed to store or communicate one symbol in a message.
Discussion
 Ask a question about 'Information theory' Start a new discussion about 'Information theory' Answer questions from other users Full Discussion Forum

Encyclopedia
Information theory is a branch of applied mathematics
Applied mathematics
Applied mathematics is a branch of mathematics that concerns itself with mathematical methods that are typically used in science, engineering, business, and industry. Thus, "applied mathematics" is a mathematical science with specialized knowledge...

and electrical engineering
Electrical engineering
Electrical engineering is a field of engineering that generally deals with the study and application of electricity, electronics and electromagnetism. The field first became an identifiable occupation in the late nineteenth century after commercialization of the electric telegraph and electrical...

involving the quantification
Quantification
Quantification has several distinct senses. In mathematics and empirical science, it is the act of counting and measuring that maps human sense observations and experiences into members of some set of numbers. Quantification in this sense is fundamental to the scientific method.In logic,...

of information
Information
Information in its most restricted technical sense is a message or collection of messages that consists of an ordered sequence of symbols, or it is the meaning that can be interpreted from such a message or collection of messages. Information can be recorded or transmitted. It can be recorded as...

. Information theory was developed by Claude E. Shannon to find fundamental limits on signal processing
Signal processing
Signal processing is an area of systems engineering, electrical engineering and applied mathematics that deals with operations on or analysis of signals, in either discrete or continuous time...

operations such as compressing data
Data compression
In computer science and information theory, data compression, source coding or bit-rate reduction is the process of encoding information using fewer bits than the original representation would use....

and on reliably storing and communicating
Telecommunication
Telecommunication is the transmission of information over significant distances to communicate. In earlier times, telecommunications involved the use of visual signals, such as beacons, smoke signals, semaphore telegraphs, signal flags, and optical heliographs, or audio messages via coded...

data. Since its inception it has broadened to find applications in many other areas, including statistical inference
Statistical inference
In statistics, statistical inference is the process of drawing conclusions from data that are subject to random variation, for example, observational errors or sampling variation...

, natural language processing
Natural language processing
Natural language processing is a field of computer science and linguistics concerned with the interactions between computers and human languages; it began as a branch of artificial intelligence....

, cryptography
Cryptography
Cryptography is the practice and study of techniques for secure communication in the presence of third parties...

generally, networks other than communication networks — as in neurobiology, the evolution and function of molecular codes, model selection in ecology, thermal physics, quantum computing, plagiarism detection and other forms of data analysis
Data analysis
Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making...

.

A key measure of information is known as entropy, which is usually expressed by the average number of bits needed to store or communicate one symbol in a message. Entropy quantifies the uncertainty involved in predicting the value of a random variable
Random variable
In probability and statistics, a random variable or stochastic variable is, roughly speaking, a variable whose value results from a measurement on some type of random process. Formally, it is a function from a probability space, typically to the real numbers, which is measurable functionmeasurable...

. For example, specifying the outcome of a fair coin flip (two equally likely outcomes) provides less information (lower entropy) than specifying the outcome from a roll of a die (six equally likely outcomes).

Applications of fundamental topics of information theory include lossless data compression
Lossless data compression
Lossless data compression is a class of data compression algorithms that allows the exact original data to be reconstructed from the compressed data. The term lossless is in contrast to lossy data compression, which only allows an approximation of the original data to be reconstructed, in exchange...

(e.g. ZIP files
ZIP (file format)
Zip is a file format used for data compression and archiving. A zip file contains one or more files that have been compressed, to reduce file size, or stored as is...

), lossy data compression
Lossy data compression
In information technology, "lossy" compression is a data encoding method that compresses data by discarding some of it. The procedure aims to minimize the amount of data that need to be held, handled, and/or transmitted by a computer...

(e.g. MP3
MP3
MPEG-1 or MPEG-2 Audio Layer III, more commonly referred to as MP3, is a patented digital audio encoding format using a form of lossy data compression...

s and JPGs), and channel coding
Channel capacity
In electrical engineering, computer science and information theory, channel capacity is the tightest upper bound on the amount of information that can be reliably transmitted over a communications channel...

(e.g. for Digital Subscriber Line (DSL)). The field is at the intersection of mathematics
Mathematics
Mathematics is the study of quantity, space, structure, and change. Mathematicians seek out patterns and formulate new conjectures. Mathematicians resolve the truth or falsity of conjectures by mathematical proofs, which are arguments sufficient to convince other mathematicians of their validity...

, statistics
Statistics
Statistics is the study of the collection, organization, analysis, and interpretation of data. It deals with all aspects of this, including the planning of data collection in terms of the design of surveys and experiments....

, computer science
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

, physics
Physics
Physics is a natural science that involves the study of matter and its motion through spacetime, along with related concepts such as energy and force. More broadly, it is the general analysis of nature, conducted in order to understand how the universe behaves.Physics is one of the oldest academic...

, neurobiology, and electrical engineering
Electrical engineering
Electrical engineering is a field of engineering that generally deals with the study and application of electricity, electronics and electromagnetism. The field first became an identifiable occupation in the late nineteenth century after commercialization of the electric telegraph and electrical...

. Its impact has been crucial to the success of the Voyager
Voyager program
The Voyager program is a U.S program that launched two unmanned space missions, scientific probes Voyager 1 and Voyager 2. They were launched in 1977 to take advantage of a favorable planetary alignment of the late 1970s...

missions to deep space, the invention of the compact disc, the feasibility of mobile phones, the development of the Internet
Internet
The Internet is a global system of interconnected computer networks that use the standard Internet protocol suite to serve billions of users worldwide...

, the study of linguistics
Linguistics
Linguistics is the scientific study of human language. Linguistics can be broadly broken into three categories or subfields of study: language form, language meaning, and language in context....

and of human perception, the understanding of black hole
Black hole
A black hole is a region of spacetime from which nothing, not even light, can escape. The theory of general relativity predicts that a sufficiently compact mass will deform spacetime to form a black hole. Around a black hole there is a mathematically defined surface called an event horizon that...

s, and numerous other fields. Important sub-fields of information theory are source coding
Source coding
In information theory, Shannon's source coding theorem establishes the limits to possible data compression, and the operational meaning of the Shannon entropy....

, channel coding, algorithmic complexity theory, algorithmic information theory
Algorithmic information theory
Algorithmic information theory is a subfield of information theory and computer science that concerns itself with the relationship between computation and information...

, information-theoretic security, and measures of information.

## Overview

The main concepts of information theory can be grasped by considering the most widespread means of human communication: language. Two important aspects of a concise language are as follows: First, the most common words (e.g., "a", "the", "I") should be shorter than less common words (e.g., "benefit", "generation", "mediocre"), so that sentences will not be too long. Such a tradeoff in word length is analogous to data compression
Data compression
In computer science and information theory, data compression, source coding or bit-rate reduction is the process of encoding information using fewer bits than the original representation would use....

and is the essential aspect of source coding
Source coding
In information theory, Shannon's source coding theorem establishes the limits to possible data compression, and the operational meaning of the Shannon entropy....

. Second, if part of a sentence is unheard or misheard due to noise — e.g., a passing car — the listener should still be able to glean the meaning of the underlying message. Such robustness is as essential for an electronic communication system as it is for a language; properly building such robustness into communications is done by channel coding
Channel capacity
In electrical engineering, computer science and information theory, channel capacity is the tightest upper bound on the amount of information that can be reliably transmitted over a communications channel...

. Source coding and channel coding are the fundamental concerns of information theory.

Note that these concerns have nothing to do with the importance of messages. For example, a platitude such as "Thank you; come again" takes about as long to say or write as the urgent plea, "Call an ambulance!" while the latter may be more important and more meaningful in many contexts. Information theory, however, does not consider message importance or meaning, as these are matters of the quality of data rather than the quantity and readability of data, the latter of which is determined solely by probabilities.

Information theory is generally considered to have been founded in 1948 by Claude Shannon
Claude Elwood Shannon
Claude Elwood Shannon was an American mathematician, electronic engineer, and cryptographer known as "the father of information theory"....

in his seminal work, "A Mathematical Theory of Communication
A Mathematical Theory of Communication
"A Mathematical Theory of Communication" is an influential 1948 article by mathematician Claude E. Shannon. As of November 2011, Google Scholar has listed more than 48,000 unique citations of the article and the later-published book version...

". The central paradigm of classical information theory is the engineering problem of the transmission of information over a noisy channel. The most fundamental results of this theory are Shannon's source coding theorem, which establishes that, on average, the number of bits needed to represent the result of an uncertain event is given by its entropy
Information entropy
In information theory, entropy is a measure of the uncertainty associated with a random variable. In this context, the term usually refers to the Shannon entropy, which quantifies the expected value of the information contained in a message, usually in units such as bits...

; and Shannon's noisy-channel coding theorem, which states that reliable communication is possible over noisy channels provided that the rate of communication is below a certain threshold, called the channel capacity. The channel capacity can be approached in practice by using appropriate encoding and decoding systems.

Information theory is closely associated with a collection of pure and applied disciplines that have been investigated and reduced to engineering practice under a variety of rubrics
A rubric is an assessment tool for communicating expectations of quality. Rubrics support student self-reflection and self-assessment as well as communication between assessor and assessees...

throughout the world over the past half century or more: adaptive system
The term adaptation arises mainly in the biological scope as a trial to study the relationship between the characteristics of living beings and their environments...

s, anticipatory systems, artificial intelligence
Artificial intelligence
Artificial intelligence is the intelligence of machines and the branch of computer science that aims to create it. AI textbooks define the field as "the study and design of intelligent agents" where an intelligent agent is a system that perceives its environment and takes actions that maximize its...

, complex system
Complex system
A complex system is a system composed of interconnected parts that as a whole exhibit one or more properties not obvious from the properties of the individual parts....

s, complexity science, cybernetics
Cybernetics
Cybernetics is the interdisciplinary study of the structure of regulatory systems. Cybernetics is closely related to information theory, control theory and systems theory, at least in its first-order form...

, informatics
Informatics is the science of information, the practice of information processing, and the engineering of information systems. Informatics studies the structure, algorithms, behavior, and interactions of natural and artificial systems that store, process, access and communicate information...

, machine learning
Machine learning
Machine learning, a branch of artificial intelligence, is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as from sensor data or databases...

, along with systems science
Systems science
Systems science is an interdisciplinary field of science that studies the nature of complex systems in nature, society, and science. It aims to develop interdisciplinary foundations, which are applicable in a variety of areas, such as engineering, biology, medicine and social sciences.Systems...

s of many descriptions. Information theory is a broad and deep mathematical theory, with equally broad and deep applications, amongst which is the vital field of coding theory
Coding theory
Coding theory is the study of the properties of codes and their fitness for a specific application. Codes are used for data compression, cryptography, error-correction and more recently also for network coding...

.

Coding theory is concerned with finding explicit methods, called codes, of increasing the efficiency and reducing the net error rate of data communication over a noisy channel to near the limit that Shannon proved is the maximum possible for that channel. These codes can be roughly subdivided into data compression
Data compression
In computer science and information theory, data compression, source coding or bit-rate reduction is the process of encoding information using fewer bits than the original representation would use....

(source coding) and error-correction (channel coding) techniques. In the latter case, it took many years to find the methods Shannon's work proved were possible. A third class of information theory codes are cryptographic algorithms (both code
Code (cryptography)
In cryptography, a code is a method used to transform a message into an obscured form, preventing those who do not possess special information, or key, required to apply the transform from understanding what is actually transmitted. The usual method is to use a codebook with a list of common...

s and cipher
Cipher
In cryptography, a cipher is an algorithm for performing encryption or decryption — a series of well-defined steps that can be followed as a procedure. An alternative, less common term is encipherment. In non-technical usage, a “cipher” is the same thing as a “code”; however, the concepts...

s). Concepts, methods and results from coding theory and information theory are widely used in cryptography
Cryptography
Cryptography is the practice and study of techniques for secure communication in the presence of third parties...

and cryptanalysis
Cryptanalysis
Cryptanalysis is the study of methods for obtaining the meaning of encrypted information, without access to the secret information that is normally required to do so. Typically, this involves knowing how the system works and finding a secret key...

. See the article ban (information)
Ban (information)
A ban, sometimes called a hartley or a dit , is a logarithmic unit which measures information or entropy, based on base 10 logarithms and powers of 10, rather than the powers of 2 and base 2 logarithms which define the bit. As a bit corresponds to a binary digit, so a ban is a decimal digit...

for a historical application.

Information theory is also used in information retrieval
Information retrieval
Information retrieval is the area of study concerned with searching for documents, for information within documents, and for metadata about documents, as well as that of searching structured storage, relational databases, and the World Wide Web...

, intelligence gathering
Intelligence (information gathering)
Intelligence assessment is the development of forecasts of behaviour or recommended courses of action to the leadership of an organization, based on a wide range of available information sources both overt and covert. Assessments are developed in response to requirements declared by the leadership...

, gambling
Gambling
Gambling is the wagering of money or something of material value on an event with an uncertain outcome with the primary intent of winning additional money and/or material goods...

, statistics
Statistics
Statistics is the study of the collection, organization, analysis, and interpretation of data. It deals with all aspects of this, including the planning of data collection in terms of the design of surveys and experiments....

, and even in musical composition
Musical composition
Musical composition can refer to an original piece of music, the structure of a musical piece, or the process of creating a new piece of music. People who practice composition are called composers.- Musical compositions :...

.

## Historical background

The landmark event that established the discipline of information theory, and brought it to immediate worldwide attention, was the publication of Claude E. Shannon's classic paper "A Mathematical Theory of Communication
A Mathematical Theory of Communication
"A Mathematical Theory of Communication" is an influential 1948 article by mathematician Claude E. Shannon. As of November 2011, Google Scholar has listed more than 48,000 unique citations of the article and the later-published book version...

" in the Bell System Technical Journal
Bell System Technical Journal
The Bell System Technical Journal was the in-house scientific journal of Bell Labs that was published from 1922 to 1983.- Notable papers :...

in July and October 1948.

Prior to this paper, limited information-theoretic ideas had been developed at Bell Labs
Bell Labs
Bell Laboratories is the research and development subsidiary of the French-owned Alcatel-Lucent and previously of the American Telephone & Telegraph Company , half-owned through its Western Electric manufacturing subsidiary.Bell Laboratories operates its...

, all implicitly assuming events of equal probability. Harry Nyquist
Harry Nyquist
Harry Nyquist was an important contributor to information theory.-Personal life:...

's 1924 paper, Certain Factors Affecting Telegraph Speed, contains a theoretical section quantifying "intelligence" and the "line speed" at which it can be transmitted by a communication system, giving the relation , where W is the speed of transmission of intelligence, m is the number of different voltage levels to choose from at each time step, and K is a constant. Ralph Hartley
Ralph Hartley
Ralph Vinton Lyon Hartley was an electronics researcher. He invented the Hartley oscillator and the Hartley transform, and contributed to the foundations of information theory.-Biography:...

's 1928 paper, Transmission of Information, uses the word information as a measurable quantity, reflecting the receiver's ability to distinguish one sequence of symbols from any other, thus quantifying information as , where S was the number of possible symbols, and n the number of symbols in a transmission. The natural unit of information was therefore the decimal digit, much later renamed the hartley
Ban (information)
A ban, sometimes called a hartley or a dit , is a logarithmic unit which measures information or entropy, based on base 10 logarithms and powers of 10, rather than the powers of 2 and base 2 logarithms which define the bit. As a bit corresponds to a binary digit, so a ban is a decimal digit...

in his honour as a unit or scale or measure of information. Alan Turing
Alan Turing
Alan Mathison Turing, OBE, FRS , was an English mathematician, logician, cryptanalyst, and computer scientist. He was highly influential in the development of computer science, providing a formalisation of the concepts of "algorithm" and "computation" with the Turing machine, which played a...

in 1940 used similar ideas as part of the statistical analysis of the breaking of the German second world war Enigma
Cryptanalysis of the Enigma
Cryptanalysis of the Enigma enabled the western Allies in World War II to read substantial amounts of secret Morse-coded radio communications of the Axis powers that had been enciphered using Enigma machines. This yielded military intelligence which, along with that from other decrypted Axis radio...

ciphers.

Much of the mathematics behind information theory with events of different probabilities was developed for the field of thermodynamics
Thermodynamics
Thermodynamics is a physical science that studies the effects on material bodies, and on radiation in regions of space, of transfer of heat and of work done on or by the bodies or radiation...

by Ludwig Boltzmann
Ludwig Boltzmann
Ludwig Eduard Boltzmann was an Austrian physicist famous for his founding contributions in the fields of statistical mechanics and statistical thermodynamics...

and J. Willard Gibbs. Connections between information-theoretic entropy and thermodynamic entropy, including the important contributions by Rolf Landauer
Rolf Landauer
Rolf William Landauer was an IBM physicist who in 1961 argued that when information is lost in an irreversible circuit, the information becomes entropy and an associated amount of energy is dissipated as heat...

in the 1960s, are explored in Entropy in thermodynamics and information theory
Entropy in thermodynamics and information theory
There are close parallels between the mathematical expressions for the thermodynamic entropy, usually denoted by S, of a physical system in the statistical thermodynamics established by Ludwig Boltzmann and J. Willard Gibbs in the 1870s; and the information-theoretic entropy, usually expressed as...

.

In Shannon's revolutionary and groundbreaking paper, the work for which had been substantially completed at Bell Labs by the end of 1944, Shannon for the first time introduced the qualitative and quantitative model of communication as a statistical process underlying information theory, opening with the assertion that
"The fundamental problem of communication is that of reproducing at one point, either exactly or approximately, a message selected at another point."

With it came the ideas of
• the information entropy
Information entropy
In information theory, entropy is a measure of the uncertainty associated with a random variable. In this context, the term usually refers to the Shannon entropy, which quantifies the expected value of the information contained in a message, usually in units such as bits...

and redundancy
Redundancy (information theory)
Redundancy in information theory is the number of bits used to transmit a message minus the number of bits of actual information in the message. Informally, it is the amount of wasted "space" used to transmit certain data...

of a source, and its relevance through the source coding theorem;
• the mutual information
Mutual information
In probability theory and information theory, the mutual information of two random variables is a quantity that measures the mutual dependence of the two random variables...

, and the channel capacity
Channel capacity
In electrical engineering, computer science and information theory, channel capacity is the tightest upper bound on the amount of information that can be reliably transmitted over a communications channel...

of a noisy channel, including the promise of perfect loss-free communication given by the noisy-channel coding theorem;
• the practical result of the Shannon–Hartley law for the channel capacity of a Gaussian channel; as well as
• the bit
Bit
A bit is the basic unit of information in computing and telecommunications; it is the amount of information stored by a digital device or other physical system that exists in one of two possible distinct states...

—a new way of seeing the most fundamental unit of information.

## Quantities of information

Information theory is based on probability theory
Probability theory
Probability theory is the branch of mathematics concerned with analysis of random phenomena. The central objects of probability theory are random variables, stochastic processes, and events: mathematical abstractions of non-deterministic events or measured quantities that may either be single...

and statistics
Statistics
Statistics is the study of the collection, organization, analysis, and interpretation of data. It deals with all aspects of this, including the planning of data collection in terms of the design of surveys and experiments....

. The most important quantities of information are entropy, the information in a random variable
Random variable
In probability and statistics, a random variable or stochastic variable is, roughly speaking, a variable whose value results from a measurement on some type of random process. Formally, it is a function from a probability space, typically to the real numbers, which is measurable functionmeasurable...

, and mutual information
Mutual information
In probability theory and information theory, the mutual information of two random variables is a quantity that measures the mutual dependence of the two random variables...

, the amount of information in common between two random variables. The former quantity indicates how easily message data can be compressed
Data compression
In computer science and information theory, data compression, source coding or bit-rate reduction is the process of encoding information using fewer bits than the original representation would use....

while the latter can be used to find the communication rate across a channel
Channel (communications)
In telecommunications and computer networking, a communication channel, or channel, refers either to a physical transmission medium such as a wire, or to a logical connection over a multiplexed medium such as a radio channel...

.

The choice of logarithmic base in the following formulae determines the unit
Units of measurement
A unit of measurement is a definite magnitude of a physical quantity, defined and adopted by convention and/or by law, that is used as a standard for measurement of the same physical quantity. Any other value of the physical quantity can be expressed as a simple multiple of the unit of...

of information entropy
Information entropy
In information theory, entropy is a measure of the uncertainty associated with a random variable. In this context, the term usually refers to the Shannon entropy, which quantifies the expected value of the information contained in a message, usually in units such as bits...

that is used. The most common unit of information is the bit
Bit
A bit is the basic unit of information in computing and telecommunications; it is the amount of information stored by a digital device or other physical system that exists in one of two possible distinct states...

, based on the binary logarithm
Binary logarithm
In mathematics, the binary logarithm is the logarithm to the base 2. It is the inverse function of n ↦ 2n. The binary logarithm of n is the power to which the number 2 must be raised to obtain the value n. This makes the binary logarithm useful for anything involving powers of 2,...

. Other units include the nat
Nat (information)
A nat is a logarithmic unit of information or entropy, based on natural logarithms and powers of e, rather than the powers of 2 and base 2 logarithms which define the bit. The nat is the natural unit for information entropy...

, which is based on the natural logarithm
Natural logarithm
The natural logarithm is the logarithm to the base e, where e is an irrational and transcendental constant approximately equal to 2.718281828...

, and the hartley, which is based on the common logarithm
Common logarithm
The common logarithm is the logarithm with base 10. It is also known as the decadic logarithm, named after its base. It is indicated by log10, or sometimes Log with a capital L...

.

In what follows, an expression of the form is considered by convention to be equal to zero whenever This is justified because for any logarithmic base.

### Entropy

The entropy, , of a discrete random variable is a measure of the amount of uncertainty associated with the value of .

Suppose one transmits 1000 bits (0s and 1s). If these bits are known ahead of transmission (to be a certain value with absolute probability), logic dictates that no information has been transmitted. If, however, each is equally and independently likely to be 0 or 1, 1000 bits (in the information theoretic sense) have been transmitted. Between these two extremes, information can be quantified as follows. If is the set of all messages that could be, and is the probability of given some , then the entropy of is defined:

(Here, is the self-information
Self-information
In information theory, self-information is a measure of the information content associated with the outcome of a random variable. It is expressed in a unit of information, for example bits,nats,or...

, which is the entropy contribution of an individual message, and is the expected value
Expected value
In probability theory, the expected value of a random variable is the weighted average of all possible values that this random variable can take on...

.) An important property of entropy is that it is maximized when all the messages in the message space are equiprobable ,—i.e., most unpredictable—in which case .

The special case of information entropy for a random variable with two outcomes is the binary entropy function, usually taken to the logarithmic base 2:

### Joint entropy

The joint entropy of two discrete random variables and is merely the entropy of their pairing: . This implies that if and are independent
Statistical independence
In probability theory, to say that two events are independent intuitively means that the occurrence of one event makes it neither more nor less probable that the other occurs...

, then their joint entropy is the sum of their individual entropies.

For example, if represents the position of a chess
Chess
Chess is a two-player board game played on a chessboard, a square-checkered board with 64 squares arranged in an eight-by-eight grid. It is one of the world's most popular games, played by millions of people worldwide at home, in clubs, online, by correspondence, and in tournaments.Each player...

piece — the row and the column, then the joint entropy of the row of the piece and the column of the piece will be the entropy of the position of the piece.

Despite similar notation, joint entropy should not be confused with cross entropy
Cross entropy
In information theory, the cross entropy between two probability distributions measures the average number of bits needed to identify an event from a set of possibilities, if a coding scheme is used based on a given probability distribution q, rather than the "true" distribution p.The cross entropy...

.

### Conditional entropy (equivocation)

The conditional entropy
Conditional entropy
In information theory, the conditional entropy quantifies the remaining entropy of a random variable Y given that the value of another random variable X is known. It is referred to as the entropy of Y conditional on X, and is written H...

or conditional uncertainty of given random variable (also called the equivocation of about ) is the average conditional entropy over :

Because entropy can be conditioned on a random variable or on that random variable being a certain value, care should be taken not to confuse these two definitions of conditional entropy, the former of which is in more common use. A basic property of this form of conditional entropy is that:

### Mutual information (transinformation)

Mutual information
Mutual information
In probability theory and information theory, the mutual information of two random variables is a quantity that measures the mutual dependence of the two random variables...

measures the amount of information that can be obtained about one random variable by observing another. It is important in communication where it can be used to maximize the amount of information shared between sent and received signals. The mutual information of relative to is given by:

where (Specific mutual Information) is the pointwise mutual information
Pointwise Mutual Information
Pointwise mutual information , or point mutual information, is a measure of association used in information theory and statistics.-Definition:...

.

A basic property of the mutual information is that

That is, knowing Y, we can save an average of bits in encoding X compared to not knowing Y.

Mutual information is symmetric
Symmetric function
In algebra and in particular in algebraic combinatorics, the ring of symmetric functions, is a specific limit of the rings of symmetric polynomials in n indeterminates, as n goes to infinity...

:

Mutual information can be expressed as the average Kullback–Leibler divergence
Kullback–Leibler divergence
In probability theory and information theory, the Kullback–Leibler divergence is a non-symmetric measure of the difference between two probability distributions P and Q...

(information gain) of the posterior probability distribution
Posterior probability
In Bayesian statistics, the posterior probability of a random event or an uncertain proposition is the conditional probability that is assigned after the relevant evidence is taken into account...

of X given the value of Y to the prior distribution
Prior probability
In Bayesian statistical inference, a prior probability distribution, often called simply the prior, of an uncertain quantity p is the probability distribution that would express one's uncertainty about p before the "data"...

on X:

In other words, this is a measure of how much, on the average, the probability distribution on X will change if we are given the value of Y. This is often recalculated as the divergence from the product of the marginal distributions to the actual joint distribution:

Mutual information is closely related to the log-likelihood ratio test
Likelihood-ratio test
In statistics, a likelihood ratio test is a statistical test used to compare the fit of two models, one of which is a special case of the other . The test is based on the likelihood ratio, which expresses how many times more likely the data are under one model than the other...

in the context of contingency tables and the multinomial distribution and to Pearson's χ2 test
Pearson's chi-squared test
Pearson's chi-squared test is the best-known of several chi-squared tests – statistical procedures whose results are evaluated by reference to the chi-squared distribution. Its properties were first investigated by Karl Pearson in 1900...

: mutual information can be considered a statistic for assessing independence between a pair of variables, and has a well-specified asymptotic distribution.

### Kullback–Leibler divergence (information gain)

The Kullback–Leibler divergence
Kullback–Leibler divergence
In probability theory and information theory, the Kullback–Leibler divergence is a non-symmetric measure of the difference between two probability distributions P and Q...

(or information divergence, information gain, or relative entropy) is a way of comparing two distributions: a "true" probability distribution
Probability distribution
In probability theory, a probability mass, probability density, or probability distribution is a function that describes the probability of a random variable taking certain values....

p(X), and an arbitrary probability distribution q(X). If we compress data in a manner that assumes q(X) is the distribution underlying some data, when, in reality, p(X) is the correct distribution, the Kullback–Leibler divergence is the number of average additional bits per datum necessary for compression. It is thus defined

Although it is sometimes used as a 'distance metric', KL divergence is not a true metric
Metric (mathematics)
In mathematics, a metric or distance function is a function which defines a distance between elements of a set. A set with a metric is called a metric space. A metric induces a topology on a set but not all topologies can be generated by a metric...

since it is not symmetric and does not satisfy the triangle inequality
Triangle inequality
In mathematics, the triangle inequality states that for any triangle, the sum of the lengths of any two sides must be greater than or equal to the length of the remaining side ....

(making it a semi-quasimetric).

### Kullback–Leibler divergence of a prior from the truth

Another interpretation of KL divergence is this: suppose a number X is about to be drawn randomly from a discrete set with probability distribution p(x). If Alice knows the true distribution p(x), while Bob believes (has a prior) that the distribution is q(x), then Bob will be more surprised
Self-information
In information theory, self-information is a measure of the information content associated with the outcome of a random variable. It is expressed in a unit of information, for example bits,nats,or...

than Alice, on average, upon seeing the value of X. The KL divergence is the (objective) expected value of the Bob's (subjective) surprisal minus Alice's surprisal, measured in bits if the log is in base 2. In this way, the extent to which Bob's prior is "wrong" can be quantified in terms of how "unnecessarily surprised" it's expected to make him.

### Other quantities

Other important information theoretic quantities include Rényi entropy
Rényi entropy
In information theory, the Rényi entropy, a generalisation of Shannon entropy, is one of a family of functionals for quantifying the diversity, uncertainty or randomness of a system...

(a generalization of entropy), differential entropy
Differential entropy
Differential entropy is a concept in information theory that extends the idea of entropy, a measure of average surprisal of a random variable, to continuous probability distributions.-Definition:...

(a generalization of quantities of information to continuous distributions), and the conditional mutual information
Conditional mutual information
In probability theory, and in particular, information theory, the conditional mutual information is, in its most basic form, the expected value of the mutual information of two random variables given the value of a third.-Definition:...

.

## Coding theory

Coding theory
Coding theory
Coding theory is the study of the properties of codes and their fitness for a specific application. Codes are used for data compression, cryptography, error-correction and more recently also for network coding...

is one of the most important and direct applications of information theory. It can be subdivided into source coding
Data compression
In computer science and information theory, data compression, source coding or bit-rate reduction is the process of encoding information using fewer bits than the original representation would use....

theory and channel coding theory. Using a statistical description for data, information theory quantifies the number of bits needed to describe the data, which is the information entropy of the source.
• Data compression (source coding): There are two formulations for the compression problem:
1. lossless data compression
Lossless data compression
Lossless data compression is a class of data compression algorithms that allows the exact original data to be reconstructed from the compressed data. The term lossless is in contrast to lossy data compression, which only allows an approximation of the original data to be reconstructed, in exchange...

: the data must be reconstructed exactly;
2. lossy data compression
Lossy data compression
In information technology, "lossy" compression is a data encoding method that compresses data by discarding some of it. The procedure aims to minimize the amount of data that need to be held, handled, and/or transmitted by a computer...

: allocates bits needed to reconstruct the data, within a specified fidelity level measured by a distortion function. This subset of Information theory is called rate–distortion theory.

• Error-correcting codes (channel coding): While data compression removes as much redundancy
Redundancy (information theory)
Redundancy in information theory is the number of bits used to transmit a message minus the number of bits of actual information in the message. Informally, it is the amount of wasted "space" used to transmit certain data...

as possible, an error correcting code adds just the right kind of redundancy (i.e., error correction) needed to transmit the data efficiently and faithfully across a noisy channel.

This division of coding theory into compression and transmission is justified by the information transmission theorems, or source–channel separation theorems that justify the use of bits as the universal currency for information in many contexts. However, these theorems only hold in the situation where one transmitting user wishes to communicate to one receiving user. In scenarios with more than one transmitter (the multiple-access channel), more than one receiver (the broadcast channel) or intermediary "helpers" (the relay channel
Relay channel
In information theory, a relay channel is a probability model of the communication between a sender and a receiver aided by one or more intermediate relay nodes...

), or more general networks
Computer network
A computer network, often simply referred to as a network, is a collection of hardware components and computers interconnected by communication channels that allow sharing of resources and information....

, compression followed by transmission may no longer be optimal. Network information theory refers to these multi-agent communication models.

### Source theory

Any process that generates successive messages can be considered a source
Communication source
A source or sender is one of the basic concepts of communication and information processing. Sources are objects which encode message data and transmit the information, via a channel, to one or more observers ....

of information. A memoryless source is one in which each message is an independent identically-distributed random variable, whereas the properties of ergodicity
Ergodic theory
Ergodic theory is a branch of mathematics that studies dynamical systems with an invariant measure and related problems. Its initial development was motivated by problems of statistical physics....

and stationarity
Stationary process
In the mathematical sciences, a stationary process is a stochastic process whose joint probability distribution does not change when shifted in time or space...

impose more general constraints. All such sources are stochastic
Stochastic process
In probability theory, a stochastic process , or sometimes random process, is the counterpart to a deterministic process...

. These terms are well studied in their own right outside information theory.

#### Rate

Information rate
Entropy rate
In the mathematical theory of probability, the entropy rate or source information rate of a stochastic process is, informally, the time density of the average information in a stochastic process...

is the average entropy per symbol. For memoryless sources, this is merely the entropy of each symbol, while, in the case of a stationary stochastic process, it is

that is, the conditional entropy of a symbol given all the previous symbols generated. For the more general case of a process that is not necessarily stationary, the average rate is

that is, the limit of the joint entropy per symbol. For stationary sources, these two expressions give the same result.

It is common in information theory to speak of the "rate" or "entropy" of a language. This is appropriate, for example, when the source of information is English prose. The rate of a source of information is related to its redundancy
Redundancy (information theory)
Redundancy in information theory is the number of bits used to transmit a message minus the number of bits of actual information in the message. Informally, it is the amount of wasted "space" used to transmit certain data...

and how well it can be compressed
Data compression
In computer science and information theory, data compression, source coding or bit-rate reduction is the process of encoding information using fewer bits than the original representation would use....

, the subject of source coding.

### Channel capacity

Communications over a channel—such as an ethernet
Ethernet
Ethernet is a family of computer networking technologies for local area networks commercially introduced in 1980. Standardized in IEEE 802.3, Ethernet has largely replaced competing wired LAN technologies....

cable
Cable
A cable is two or more wires running side by side and bonded, twisted or braided together to form a single assembly. In mechanics cables, otherwise known as wire ropes, are used for lifting, hauling and towing or conveying force through tension. In electrical engineering cables are used to carry...

—is the primary motivation of information theory. As anyone who's ever used a telephone (mobile or landline) knows, however, such channels often fail to produce exact reconstruction of a signal; noise, periods of silence, and other forms of signal corruption often degrade quality. How much information can one hope to communicate over a noisy (or otherwise imperfect) channel?

Consider the communications process over a discrete channel. A simple model of the process is shown below:

Here X represents the space of messages transmitted, and Y the space of messages received during a unit time over our channel. Let be the conditional probability
Conditional probability
In probability theory, the "conditional probability of A given B" is the probability of A if B is known to occur. It is commonly notated P, and sometimes P_B. P can be visualised as the probability of event A when the sample space is restricted to event B...

distribution function of Y given X. We will consider to be an inherent fixed property of our communications channel (representing the nature of the noise of our channel). Then the joint distribution of X and Y is completely determined by our channel and by our choice of , the marginal distribution of messages we choose to send over the channel. Under these constraints, we would like to maximize the rate of information, or the signal
Signal (electrical engineering)
In the fields of communications, signal processing, and in electrical engineering more generally, a signal is any time-varying or spatial-varying quantity....

, we can communicate over the channel. The appropriate measure for this is the mutual information
Mutual information
In probability theory and information theory, the mutual information of two random variables is a quantity that measures the mutual dependence of the two random variables...

, and this maximum mutual information is called the channel capacity
Channel capacity
In electrical engineering, computer science and information theory, channel capacity is the tightest upper bound on the amount of information that can be reliably transmitted over a communications channel...

and is given by:
This capacity has the following property related to communicating at information rate R (where R is usually bits per symbol). For any information rate R < C and coding error ε > 0, for large enough N, there exists a code of length N and rate ≥ R and a decoding algorithm, such that the maximal probability of block error is ≤ ε; that is, it is always possible to transmit with arbitrarily small block error. In addition, for any rate R > C, it is impossible to transmit with arbitrarily small block error.

Channel coding
Channel code
In digital communications, a channel code is a broadly used term mostly referring to the forward error correction code and bit interleaving in communication and storage where the communication media or storage media is viewed as a channel...

is concerned with finding such nearly optimal codes
Error detection and correction
In information theory and coding theory with applications in computer science and telecommunication, error detection and correction or error control are techniques that enable reliable delivery of digital data over unreliable communication channels...

that can be used to transmit data over a noisy channel with a small coding error at a rate near the channel capacity.

#### Capacity of particular channel models

• A continuous-time analog communications channel subject to Gaussian noise
Gaussian noise
Gaussian noise is statistical noise that has its probability density function equal to that of the normal distribution, which is also known as the Gaussian distribution. In other words, the values that the noise can take on are Gaussian-distributed. A special case is white Gaussian noise, in which...

— see Shannon–Hartley theorem
Shannon–Hartley theorem
In information theory, the Shannon–Hartley theorem tells the maximum rate at which information can be transmitted over a communications channel of a specified bandwidth in the presence of noise. It is an application of the noisy channel coding theorem to the archetypal case of a continuous-time...

.

• A binary symmetric channel
Binary symmetric channel
A binary symmetric channel is a common communications channel model used in coding theory and information theory. In this model, a transmitter wishes to send a bit , and the receiver receives a bit. It is assumed that the bit is usually transmitted correctly, but that it will be "flipped" with a...

(BSC) with crossover probability p is a binary input, binary output channel that flips the input bit with probability p. The BSC has a capacity of bits per channel use, where is the binary entropy function to the base 2 logarithm:

• A binary erasure channel
Binary erasure channel
A binary erasure channel is a common communications channel model used in coding theory and information theory. In this model, a transmitter sends a bit , and the receiver either receives the bit or it receives a message that the bit was not received...

(BEC) with erasure probability p is a binary input, ternary output channel. The possible channel outputs are 0, 1, and a third symbol 'e' called an erasure. The erasure represents complete loss of information about an input bit. The capacity of the BEC is 1 - p bits per channel use.

### Intelligence uses and secrecy applications

Information theoretic concepts apply to cryptography
Cryptography
Cryptography is the practice and study of techniques for secure communication in the presence of third parties...

and cryptanalysis
Cryptanalysis
Cryptanalysis is the study of methods for obtaining the meaning of encrypted information, without access to the secret information that is normally required to do so. Typically, this involves knowing how the system works and finding a secret key...

. Turing
Turing
Alan Turing was a British mathematician, logician, cryptanalyst, and computer scientist.Turing may also refer to:*Turing machine, a basic, abstract symbol-manipulating device...

's information unit, the ban
Ban (information)
A ban, sometimes called a hartley or a dit , is a logarithmic unit which measures information or entropy, based on base 10 logarithms and powers of 10, rather than the powers of 2 and base 2 logarithms which define the bit. As a bit corresponds to a binary digit, so a ban is a decimal digit...

, was used in the Ultra project, breaking the German Enigma machine
Enigma machine
An Enigma machine is any of a family of related electro-mechanical rotor cipher machines used for the encryption and decryption of secret messages. Enigma was invented by German engineer Arthur Scherbius at the end of World War I...

code and hastening the end of WWII in Europe
Victory in Europe Day
Victory in Europe Day commemorates 8 May 1945 , the date when the World War II Allies formally accepted the unconditional surrender of the armed forces of Nazi Germany and the end of Adolf Hitler's Third Reich. The formal surrender of the occupying German forces in the Channel Islands was not...

. Shannon himself defined an important concept now called the unicity distance
Unicity distance
In cryptography, unicity distance is the length of an original ciphertext needed to break the cipher by reducing the number of possible spurious keys to zero in a brute force attack. That is, after trying every possible key, there should be just one decipherment that makes sense, i.e...

. Based on the redundancy
Redundancy (information theory)
Redundancy in information theory is the number of bits used to transmit a message minus the number of bits of actual information in the message. Informally, it is the amount of wasted "space" used to transmit certain data...

of the plaintext
Plaintext
In cryptography, plaintext is information a sender wishes to transmit to a receiver. Cleartext is often used as a synonym. Before the computer era, plaintext most commonly meant message text in the language of the communicating parties....

, it attempts to give a minimum amount of ciphertext
Ciphertext
In cryptography, ciphertext is the result of encryption performed on plaintext using an algorithm, called a cipher. Ciphertext is also known as encrypted or encoded information because it contains a form of the original plaintext that is unreadable by a human or computer without the proper cipher...

necessary to ensure unique decipherability.

Information theory leads us to believe it is much more difficult to keep secrets than it might first appear. A brute force attack
Brute force attack
In cryptography, a brute-force attack, or exhaustive key search, is a strategy that can, in theory, be used against any encrypted data. Such an attack might be utilized when it is not possible to take advantage of other weaknesses in an encryption system that would make the task easier...

can break systems based on asymmetric key algorithms
Public-key cryptography
Public-key cryptography refers to a cryptographic system requiring two separate keys, one to lock or encrypt the plaintext, and one to unlock or decrypt the cyphertext. Neither key will do both functions. One of these keys is published or public and the other is kept private...

or on most commonly used methods of symmetric key algorithms
Symmetric-key algorithm
Symmetric-key algorithms are a class of algorithms for cryptography that use trivially related, often identical, cryptographic keys for both encryption of plaintext and decryption of ciphertext. The encryption key is trivially related to the decryption key, in that they may be identical or there is...

(sometimes called secret key algorithms), such as block cipher
Block cipher
In cryptography, a block cipher is a symmetric key cipher operating on fixed-length groups of bits, called blocks, with an unvarying transformation. A block cipher encryption algorithm might take a 128-bit block of plaintext as input, and output a corresponding 128-bit block of ciphertext...

s. The security of all such methods currently comes from the assumption that no known attack can break them in a practical amount of time.

Information theoretic security
Information theoretic security
A cryptosystem is information-theoretically secure if its security derives purely from information theory. That is, it is secure even when the adversary has unlimited computing power. The adversary simply does not have enough information to break the security...

refers to methods such as the one-time pad
In cryptography, the one-time pad is a type of encryption, which has been proven to be impossible to crack if used correctly. Each bit or character from the plaintext is encrypted by a modular addition with a bit or character from a secret random key of the same length as the plaintext, resulting...

that are not vulnerable to such brute force attacks. In such cases, the positive conditional mutual information
Mutual information
In probability theory and information theory, the mutual information of two random variables is a quantity that measures the mutual dependence of the two random variables...

between the plaintext
Plaintext
In cryptography, plaintext is information a sender wishes to transmit to a receiver. Cleartext is often used as a synonym. Before the computer era, plaintext most commonly meant message text in the language of the communicating parties....

and ciphertext
Ciphertext
In cryptography, ciphertext is the result of encryption performed on plaintext using an algorithm, called a cipher. Ciphertext is also known as encrypted or encoded information because it contains a form of the original plaintext that is unreadable by a human or computer without the proper cipher...

(conditioned on the key
Key (cryptography)
In cryptography, a key is a piece of information that determines the functional output of a cryptographic algorithm or cipher. Without a key, the algorithm would produce no useful result. In encryption, a key specifies the particular transformation of plaintext into ciphertext, or vice versa...

) can ensure proper transmission, while the unconditional mutual information between the plaintext and ciphertext remains zero, resulting in absolutely secure communications. In other words, an eavesdropper would not be able to improve his or her guess of the plaintext by gaining knowledge of the ciphertext but not of the key. However, as in any other cryptographic system, care must be used to correctly apply even information-theoretically secure methods; the Venona project
Venona project
The VENONA project was a long-running secret collaboration of the United States and United Kingdom intelligence agencies involving cryptanalysis of messages sent by intelligence agencies of the Soviet Union, the majority during World War II...

was able to crack the one-time pads of the Soviet Union
Soviet Union
The Soviet Union , officially the Union of Soviet Socialist Republics , was a constitutionally socialist state that existed in Eurasia between 1922 and 1991....

due to their improper reuse of key material.

### Pseudorandom number generation

Pseudorandom number generator
Pseudorandom number generator
A pseudorandom number generator , also known as a deterministic random bit generator , is an algorithm for generating a sequence of numbers that approximates the properties of random numbers...

s are widely available in computer language libraries and application programs. They are, almost universally, unsuited to cryptographic use as they do not evade the deterministic nature of modern computer equipment and software. A class of improved random number generators is termed cryptographically secure pseudorandom number generator
Cryptographically secure pseudorandom number generator
A cryptographically secure pseudo-random number generator is a pseudo-random number generator with properties that make it suitable for use in cryptography.Many aspects of cryptography require random numbers, for example:...

s, but even they require external to the software random seed
Random seed
A random seed is a number used to initialize a pseudorandom number generator.The choice of a good random seed is crucial in the field of computer security...

s to work as intended. These can be obtained via extractors, if done carefully. The measure of sufficient randomness in extractors is min-entropy
Min-entropy
In probability theory or information theory, the min-entropy of a discrete random event x with possible states 1... n and corresponding probabilities p1... pn is...

, a value related to Shannon entropy through Rényi entropy
Rényi entropy
In information theory, the Rényi entropy, a generalisation of Shannon entropy, is one of a family of functionals for quantifying the diversity, uncertainty or randomness of a system...

; Rényi entropy is also used in evaluating randomness in cryptographic systems. Although related, the distinctions among these measures mean that a random variable
Random variable
In probability and statistics, a random variable or stochastic variable is, roughly speaking, a variable whose value results from a measurement on some type of random process. Formally, it is a function from a probability space, typically to the real numbers, which is measurable functionmeasurable...

with high Shannon entropy is not necessarily satisfactory for use in an extractor and so for cryptography uses.

### Seismic exploration

One early commercial application of information theory was in the field seismic oil exploration. Work in this field made it possible to strip off and separate the unwanted noise from the desired seismic signal. Information theory and digital signal processing
Digital signal processing
Digital signal processing is concerned with the representation of discrete time signals by a sequence of numbers or symbols and the processing of these signals. Digital signal processing and analog signal processing are subfields of signal processing...

offer a major improvement of resolution and image clarity over previous analog methods.

### Miscellaneous applications

Information theory also has applications in gambling and investing
Gambling and information theory
Statistical inference might be thought of as gambling theory applied to the world around. The myriad applications for logarithmic information measures tell us precisely how to take the best guess in the face of partial information. In that sense, information theory might be considered a formal...

, black holes
Black hole information paradox
The black hole information paradox results from the combination of quantum mechanics and general relativity. It suggests that physical information could disappear in a black hole, allowing many physical states to evolve into the same state...

, bioinformatics
Bioinformatics
Bioinformatics is the application of computer science and information technology to the field of biology and medicine. Bioinformatics deals with algorithms, databases and information systems, web technologies, artificial intelligence and soft computing, information and computation theory, software...

, and music
Music
Music is an art form whose medium is sound and silence. Its common elements are pitch , rhythm , dynamics, and the sonic qualities of timbre and texture...

.

• Communication theory
Communication theory
Communication theory is a field of information and mathematics that studies the technical process of information and the human process of human communication.- History :- Origins :...

• List of important publications
• Philosophy of information
Philosophy of information
The philosophy of information is the area of research that studies conceptual issues arising at the intersection of computer science, information technology, and philosophy.It includes:...

### Applications

• Cryptanalysis
Cryptanalysis
Cryptanalysis is the study of methods for obtaining the meaning of encrypted information, without access to the secret information that is normally required to do so. Typically, this involves knowing how the system works and finding a secret key...

• Cryptography
Cryptography
Cryptography is the practice and study of techniques for secure communication in the presence of third parties...

• Cybernetics
Cybernetics
Cybernetics is the interdisciplinary study of the structure of regulatory systems. Cybernetics is closely related to information theory, control theory and systems theory, at least in its first-order form...

• Entropy in thermodynamics and information theory
Entropy in thermodynamics and information theory
There are close parallels between the mathematical expressions for the thermodynamic entropy, usually denoted by S, of a physical system in the statistical thermodynamics established by Ludwig Boltzmann and J. Willard Gibbs in the 1870s; and the information-theoretic entropy, usually expressed as...

• Gambling
Gambling
Gambling is the wagering of money or something of material value on an event with an uncertain outcome with the primary intent of winning additional money and/or material goods...

• Intelligence (information gathering)
Intelligence (information gathering)
Intelligence assessment is the development of forecasts of behaviour or recommended courses of action to the leadership of an organization, based on a wide range of available information sources both overt and covert. Assessments are developed in response to requirements declared by the leadership...

• Seismic exploration
Reflection seismology
Reflection seismology is a method of exploration geophysics that uses the principles of seismology to estimate the properties of the Earth's subsurface from reflected seismic waves. The method requires a controlled seismic source of energy, such as dynamite/Tovex, a specialized air gun or a...

### History

• Hartley, R.V.L.
Ralph Hartley
Ralph Vinton Lyon Hartley was an electronics researcher. He invented the Hartley oscillator and the Hartley transform, and contributed to the foundations of information theory.-Biography:...

• History of information theory
History of information theory
The decisive event which established the discipline of information theory, and brought it to immediate worldwide attention, was the publication of Claude E...

• Shannon, C.E.
Claude Elwood Shannon
Claude Elwood Shannon was an American mathematician, electronic engineer, and cryptographer known as "the father of information theory"....

• Timeline of information theory
Timeline of information theory
A timeline of events related to  information theory,  quantum information theory,  data compression,  error correcting codes and related subjects....

• Yockey, H.P.
Hubert Yockey
Professor Hubert P. Yockey , PhD is a physicist and information theorist. He worked under Robert Oppenheimer on the Manhattan Project, and at the University of California, Berkeley....

### The classic work

• Shannon, C.E.
Claude Elwood Shannon
Claude Elwood Shannon was an American mathematician, electronic engineer, and cryptographer known as "the father of information theory"....

(1948), "A Mathematical Theory of Communication
A Mathematical Theory of Communication
"A Mathematical Theory of Communication" is an influential 1948 article by mathematician Claude E. Shannon. As of November 2011, Google Scholar has listed more than 48,000 unique citations of the article and the later-published book version...

", Bell System Technical Journal, 27, pp. 379–423 & 623–656, July & October, 1948. PDF.
Notes and other formats.
• R.V.L. Hartley, "Transmission of Information", Bell System Technical Journal, July 1928
• Andrey Kolmogorov
Andrey Kolmogorov
Andrey Nikolaevich Kolmogorov was a Soviet mathematician, preeminent in the 20th century, who advanced various scientific fields, among them probability theory, topology, intuitionistic logic, turbulence, classical mechanics and computational complexity.-Early life:Kolmogorov was born at Tambov...

(1968), "Three approaches to the quantitative definition of information" in International Journal of Computer Mathematics.

### Other journal articles

• J. L. Kelly, Jr., Saratoga.ny.us, "A New Interpretation of Information Rate" Bell System Technical Journal, Vol. 35, July 1956, pp. 917–26.
• R. Landauer, IEEE.org, "Information is Physical" Proc. Workshop on Physics and Computation PhysComp'92 (IEEE Comp. Sci.Press, Los Alamitos, 1993) pp. 1–4.
• R. Landauer, IBM.com, "Irreversibility and Heat Generation in the Computing Process" IBM J. Res. Develop. Vol. 5, No. 3, 1961

### Textbooks on information theory

• Claude E. Shannon, Warren Weaver. The Mathematical Theory of Communication. Univ of Illinois Press, 1949. ISBN 0-252-72548-4
• Robert Gallager. Information Theory and Reliable Communication. New York: John Wiley and Sons, 1968. ISBN 0-471-29048-3
• Robert B. Ash. Information Theory. New York: Interscience, 1965. ISBN 0-470-03445-9. New York: Dover 1990. ISBN 0-486-66521-6
• Thomas M. Cover
Thomas M. Cover
Thomas M. Cover is Professor jointly in the Departments of Electrical Engineering and Statistics at Stanford University...

, Joy A. Thomas. Elements of information theory, 1st Edition. New York: Wiley-Interscience, 1991. ISBN 0-471-06259-6.
2nd Edition. New York: Wiley-Interscience, 2006. ISBN 0-471-24195-4.
• Imre Csiszar
Imre Csiszár
Imre Csiszár is a Hungarian mathematician with contributions to information theoryand probability theory. In 1996 he won the Claude E. Shannon Award, the highest annualaward given in the field of information theory....

, Janos Korner. Information Theory: Coding Theorems for Discrete Memoryless Systems Akademiai Kiado: 2nd edition, 1997. ISBN 963-05-7440-3
• Raymond W. Yeung. A First Course in Information Theory Kluwer Academic/Plenum Publishers, 2002. ISBN 0-306-46791-7
• David J. C. MacKay. Information Theory, Inference, and Learning Algorithms Cambridge: Cambridge University Press, 2003. ISBN 0-521-64298-1
• Raymond W. Yeung. Information Theory and Network Coding Springer 2008, 2002. ISBN 978-0-387-79233-0
• Stanford Goldman. Information Theory. New York: Prentice Hall, 1953. New York: Dover 1968 ISBN 0-486-62209-6, 2005 ISBN 0-486-44271-3
• Fazlollah Reza
Fazlollah Reza
Fazlollah M. Reza is an Iranian university professor. He was born in Rasht, Iran.-Career:Reza graduated from Faculty of Engineering, University of Tehran in 1938 by receiving his bachelor's degree in electrical engineering...

. An Introduction to Information Theory. New York: McGraw-Hill 1961. New York: Dover 1994. ISBN 0-486-68210-2
• Masud Mansuripur. Introduction to Information Theory. New York: Prentice Hall, 1987. ISBN 0-13-484668-0
• Christoph Arndt: Information Measures, Information and its Description in Science and Engineering (Springer Series: Signals and Communication Technology), 2004, ISBN 978-3-540-40855-0

### Other books

• Leon Brillouin, Science and Information Theory, Mineola, N.Y.: Dover, [1956, 1962] 2004. ISBN 0-486-43918-6
• James Gleick
James Gleick
James Gleick is an American author, journalist, and biographer, whose books explore the cultural ramifications of science and technology...

, The Information: A History, a Theory, a Flood
The Information: A History, a Theory, a Flood
The Information: A History, a Theory, a Flood is a book by science history writer James Gleick, author of Chaos: Making a New Science. It covers the genesis of our current information age. The Information has also been published in ebook formats by Fourth Estate and Random House, and as an...

, New York: Pantheon, 2011. ISBN 978-0375423727
• A. I. Khinchin, Mathematical Foundations of Information Theory, New York: Dover, 1957. ISBN 0-486-60434-9
• H. S. Leff and A. F. Rex, Editors, Maxwell's Demon: Entropy, Information, Computing, Princeton University Press
Princeton University Press
-Further reading:* "". Artforum International, 2005.-External links:* * * * *...

, Princeton, NJ (1990). ISBN 0-691-08727-X
• Tom Siegfried, The Bit and the Pendulum, Wiley, 2000. ISBN 0-471-32174-5
• Charles Seife, Decoding The Universe, Viking, 2006. ISBN 0-670-03441-X
• Jeremy Campbell, Grammatical Man, Touchstone/Simon & Schuster, 1982, ISBN 0-671-44062-4
• Henri Theil, Economics and Information Theory, Rand McNally & Company - Chicago, 1967.

• alum.mit.edu, Eprint, Schneider, T. D., "Information Theory Primer"
• ND.edu, Srinivasa, S. "A Review on Multivariate Mutual Information"
• Chem.wisc.edu, Journal of Chemical Education, Shuffled Cards, Messy Desks, and Disorderly Dorm Rooms - Examples of Entropy Increase? Nonsense!
• ITsoc.org, IEEE Information Theory Society and ITsoc.org review articles
• Cam.ac.uk, On-line textbook: "Information Theory, Inference, and Learning Algorithms" by David MacKay
David MacKay (scientist)
David John Cameron MacKay, FRS, is the professor of natural philosophy in the department of Physics at the University of Cambridge and chief scientific adviser to the UK Department of Energy and Climate Change...

- giving an entertaining and thorough introduction to Shannon theory, including state-of-the-art methods from coding theory, such as arithmetic coding
Arithmetic coding
Arithmetic coding is a form of variable-length entropy encoding used in lossless data compression. Normally, a string of characters such as the words "hello there" is represented using a fixed number of bits per character, as in the ASCII code...

, low-density parity-check code
Low-density parity-check code
In information theory, a low-density parity-check code is a linear error correcting code, a method of transmitting a message over a noisy transmission channel, and is constructed using a sparse bipartite graph...

s, and Turbo code
Turbo code
In information theory, turbo codes are a class of high-performance forward error correction codes developed in 1993, which were the first practical codes to closely approach the channel capacity, a theoretical maximum for the code rate at which reliable communication is still possible given a...

s.
• UMBC.edu, Eprint, Erill, I., "A gentle introduction to information content in transcription factor binding sites"