Truncated binary encoding - AbsoluteAstronomy.com

Truncated binary encoding is an entropy encoding

Entropy encoding

In information theory an entropy encoding is a lossless data compression scheme that is independent of the specific characteristics of the medium....

typically used for uniform probability distribution

Probability distribution

In probability theory, a probability mass, probability density, or probability distribution is a function that describes the probability of a random variable taking certain values....

s with a finite alphabet. It is parameterized by an alphabet with total size of number n. It is a slightly more general form of binary encoding

Binary numeral system

The binary numeral system, or base-2 number system, represents numeric values using two symbols, 0 and 1. More specifically, the usual base-2 system is a positional notation with a radix of 2...

when n is not a power of two

Power of two

In mathematics, a power of two means a number of the form 2n where n is an integer, i.e. the result of exponentiation with as base the number two and as exponent the integer n....

.

Let n = 2^k+b, for 0 ≤ b ≤ 2^k. If n is a power of 2, you may choose two possible values of k; both produce the same coded values, which are identical to a simple binary code.

Truncated binary encoding assigns the first 2^k−b symbols codewords of length k and then assigns the remaining 2b symbols the last 2b codewords of length k+1. Because all the codewords of length k+1 consist of an unassigned codeword of length k with a "0" or "1" appended, the resulting code is a prefix code

Prefix code

A prefix code is a type of code system distinguished by its possession of the "prefix property"; which states that there is no valid code word in the system that is a prefix of any other valid code word in the set...

Example with n = 5

For example, for the alphabet {0, 1, 2, 3, 4}, n = 5 = 2²+1. Truncated binary encoding assigns the first three symbols (2²−1) the codewords 00, 01, and 10, all of length 2, then assigns the last two symbols the codewords 110 and 111, the last two codewords of length 3, each of which is the unused two-bit codeword 11 with an extra digit appended.

For example, if n is 5, plain binary encoding and truncated binary encoding allocates the following codewords. Digits shown ~~struck~~ are not transmitted in truncated binary.

Truncated binary	\| Encoding	Standard binary
0	0	0	0	0
1	0	0	1	1
2	0	1	0	2
UNUSED	0	1	1	3
UNUSED	1	0	0	4
UNUSED	1	0	1	5/UNUSED
3	1	1	0	6/UNUSED
4	1	1	1	7/UNUSED

The nearest power of 2 after n = 5 is 2³ = 8, so there are u = 8−5 = 3 unused codes in the straightforward 3-bit binary encoding.

In numerical terms, to send a value 0 ≤ x < n, which is one of 2^k ≤ n ≤ 2^k+1 symbols, there are u = 2^k+1−n unused entries when the alphabet size is rounded up to the nearest power of two. The process to encode the number x in truncated binary is: If x is less than u, encode it in k binary bits. If x is greater than or equal to u, encode the value x+u in k+1 binary bits.

Example with n = 10

Another example, encoding an alphabet of size 10 (between 0 and 9) requires k=4 bits, but there are 2⁴−10 = 6 unused codes, so input values less than 6 have the first bit discarded, while input values greater than or equal to 6 are offset by 6 to the end of the binary space. (Unused patterns are not shown in this table.)

Input value	Offset	Offset value	Standard Binary	Truncated Binary
0	0	0	0000	000
1	0	1	0001	001
2	0	2	0010	010
3	0	3	0011	011
4	0	4	0100	100
5	0	5	0101	101

6	6	12	1100	1100
7	6	13	1101	1101
8	6	14	1110	1110
9	6	15	1111	1111

To decode, read the first k bits. If they encode a value less than u, decoding is complete. Otherwise, add an additional bit and subtract u from the result.

Example with n = 7

Here is a more extreme case: with n = 7 the next power of 2 is 8 (we will then use 3 bits or 2 bits if the high bit is discarded) and u = 1:

Input value	Offset	Offset value	Standard Binary	Truncated Binary
0	0	0	000	00

1	1	2	010	010
2	1	3	011	011
3	1	4	100	100
4	1	5	101	101
5	1	6	110	110
6	1	7	111	111

This last example demonstrates that a leading zero bit does not always indicate a short code; if u < 2^k−1, some long codes will begin with a zero bit.

If n is a power of two, then the encoding may be done with either of two different values of k. Both produce equivalent outputs; one just has u = 2^k and encodes all values as "short" k-bit codes, while the other has u = 0 and encodes everything with "long" k+1-bit codes.

The source of this article is wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.