Units of information

In computing and telecommunications, a unit of information is the capacity of some standard data storage system or communication channel, used to measure the capacities of other systems and channels. In information theory, units of information are also used to measure the information contents or entropy of random variables.

The most common units are the bit (the capacity of a system that can be in only two states) and the byte or octet (equivalent to eight independent bits). Larger units can be formed from these by the SI power-of-ten prefixes or the newer IEC binary power prefixes.

Primary units



As observed by Hartley in 1928, and further formalized by Shannon in 1945, the information that can be stored in a system is proportional to the logarithm $\log_b N$ of the number $N$ of possible states of that system. Changing the base of the logarithm from $b$ to a different number $c$ has the effect of multiplying the value of the logarithm by a fixed constant, namely

$\log_c N = (\log_c b)\,\log_b N$

Therefore, the choice of the base $b$ determines the unit used to measure information. In particular, if $b$ is an integer greater than 1, then the unit is the amount of information that can be stored in a system with $b$ possible states.
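As a minimal sketch in Python, the capacity of an $N$-state system in a chosen base, and a numerical check of the change-of-base identity above, look like this. The helper name `capacity` is hypothetical and used only for illustration.

```python
import math

def capacity(num_states: int, base: float = 2) -> float:
    """Information storable by a system with num_states states, in log-base-`base` units."""
    if num_states < 1:
        raise ValueError("a system needs at least one state")
    if base <= 1:
        raise ValueError("the logarithm base must be greater than 1")
    return math.log(num_states, base)

# Check the change-of-base identity: log_c N = (log_c b) * log_b N
N, b, c = 8, 2, 10
lhs = math.log(N, c)
rhs = math.log(b, c) * math.log(N, b)
print(f"log_{c} {N} = {lhs:.6f}, (log_{c} {b}) * log_{b} {N} = {rhs:.6f}")
print(capacity(8), "bits;", capacity(8, math.e), "nats")
```

The same 8-state system measures 3 in base 2 and $\ln 8$ in base $e$; only the unit changes, not the quantity.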

When $b$ is 2, the unit is the "bit" (a contraction of binary digit). A system with 8 possible states, for example, can store up to $\log_2 8 = 3$ bits of information. Other units that have been named include:
  • Base $b = 3$: the unit is called "trit", and is equal to $\log_2 3$ (≈ 1.585) bits.
  • Base $b = 10$: the unit is called "decimal digit", "Hartley", "ban", "decit", or "dit", and is equal to $\log_2 10$ (≈ 3.322) bits.
  • Base $b = e$, the base of natural logarithms: the unit is called a "nat", "nit", or "nepit" (from Neperian), and is worth $\log_2 e$ (≈ 1.443) bits.

The trit, ban, and nat are rarely used to measure storage capacity; but the nat, in particular, is often used in information theory, because natural logarithms are sometimes easier to handle than logarithms in other bases.
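Because every one of these units is just a logarithm in a different base, converting between them reduces to multiplying by a constant. The Python sketch below expresses each unit in bits ($\log_2$ of its base) and converts through bits; the table and function names are illustrative, not a standard API.

```python
import math

# Size of one unit of each kind, expressed in bits (log2 of the unit's base).
BITS_PER_UNIT = {
    "bit": 1.0,
    "trit": math.log2(3),        # ≈ 1.585 bits
    "ban": math.log2(10),        # ≈ 3.322 bits (also hartley, decimal digit, decit, dit)
    "nat": math.log2(math.e),    # ≈ 1.443 bits
}

def convert(amount: float, from_unit: str, to_unit: str) -> float:
    """Convert an amount of information between logarithmic units via bits."""
    return amount * BITS_PER_UNIT[from_unit] / BITS_PER_UNIT[to_unit]

for unit, bits in BITS_PER_UNIT.items():
    print(f"1 {unit} = {bits:.3f} bits")

# An 8-state system holds 3 bits; the same quantity in nats is ln 8.
print(convert(3, "bit", "nat"), math.log(8))
```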

Byte


Historically, a byte was the number of bits used to encode a character of text in the computer, which depended on computer hardware architecture; but today it almost always means eight bits, that is, an octet. A byte can represent $2^8 = 256$ distinct values, such as the integers 0 to 255, or -128 to 127. The IEEE 1541-2002 standard specifies "B" (upper case) as the symbol for byte. Bytes, or multiples thereof, are almost always used to specify the sizes of computer files and the capacity of storage units. Most modern computers and peripheral devices are designed to manipulate data in whole bytes or groups of bytes, rather than individual bits.
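The two value ranges quoted above come from reading the same eight bits as unsigned or signed. The sketch below assumes the signed reading is two's complement, which is what gives -128 to 127; the text itself does not name the encoding. It uses Python's built-in `int.from_bytes`.

```python
def unsigned_range(bits: int) -> tuple[int, int]:
    """Smallest and largest unsigned integer representable in `bits` bits."""
    return 0, 2 ** bits - 1

def signed_range(bits: int) -> tuple[int, int]:
    """Range under two's complement (assumed here as the signed encoding)."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

BYTE = 8
print(2 ** BYTE, unsigned_range(BYTE), signed_range(BYTE))  # 256 (0, 255) (-128, 127)

# One bit pattern, two interpretations.
raw = bytes([0xFF])
print(int.from_bytes(raw, "big"), int.from_bytes(raw, "big", signed=True))  # 255 -1
```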

Nybble


A group of four bits, or half a byte, is sometimes called a nibble or nybble. This unit is most often used in the context of hexadecimal number representation, since a nybble can store precisely one hexadecimal digit.
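Since each nybble maps to exactly one hexadecimal digit, a byte can be written as two hex digits by splitting it with a shift and a mask. A short Python sketch (the function names are illustrative):

```python
HEX_DIGITS = "0123456789ABCDEF"

def nibbles(byte_value: int) -> tuple[int, int]:
    """Split one byte (0..255) into its high and low 4-bit nybbles."""
    if not 0 <= byte_value <= 0xFF:
        raise ValueError("not a byte value")
    high = (byte_value >> 4) & 0xF  # upper four bits
    low = byte_value & 0xF          # lower four bits
    return high, low

def byte_to_hex(byte_value: int) -> str:
    """Two hex digits, one per nybble."""
    high, low = nibbles(byte_value)
    return HEX_DIGITS[high] + HEX_DIGITS[low]

print(nibbles(0xB7), byte_to_hex(0xB7))  # (11, 7) B7
```

The result agrees with Python's own `format(value, "02X")` for every byte value.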

Word, line, and page


Computers usually manipulate bits in groups of a fixed size, conventionally called words. The number of bits in a word is usually defined by the size of the registers in the computer's CPU, or by the number of data bits that are fetched from its main memory in a single operation. In the IA-32 architecture, more commonly known as x86-32, a word is 16 bits (a convention inherited from the 16-bit 8086, even though IA-32 registers are 32 bits wide), but other past and current architectures use words with 8, 24, 32, 36, 51, 64, 80 bits or others.

Some machine instructions and computer number formats use two words (a "double word" or "dword"), or four words (a "quad word" or "quad").

Computer memory caches usually operate on blocks of memory that consist of several consecutive words. These units are customarily called "lines".

Virtual memory systems partition the computer's main storage into even larger units, traditionally called "pages".
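When a line or page size is a power of two, a memory address splits into "which unit" and "offset inside it" with a shift and a mask. The Python sketch below uses the 4 KiB x86 page size listed under size examples; the 64-byte cache-line size is a hypothetical value chosen only for illustration, since real line sizes vary by processor.

```python
PAGE_SIZE = 4096  # 4 KiB, the x86 page size listed under size examples
LINE_SIZE = 64    # hypothetical cache-line size for illustration; real CPUs vary

def split_address(addr: int, unit_size: int) -> tuple[int, int]:
    """Return (unit number, offset inside the unit) for a power-of-two unit size."""
    if unit_size <= 0 or unit_size & (unit_size - 1):
        raise ValueError("unit size must be a positive power of two")
    shift = unit_size.bit_length() - 1  # log2(unit_size)
    return addr >> shift, addr & (unit_size - 1)

addr = 0x12345
print("page", split_address(addr, PAGE_SIZE))  # page (18, 837)
print("line", split_address(addr, LINE_SIZE))  # line (1165, 5)
```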

Systematic multiples


Terms for large quantities of bits can be formed using the standard range of SI prefixes for powers of 10, e.g., kilo = $10^3$ = 1000 (kilobit or kbit), mega = $10^6$ = 1,000,000 (megabit or Mbit) and giga = $10^9$ = 1,000,000,000 (gigabit or Gbit). These prefixes are more often used for multiples of bytes, as in kilobyte (kB = 8,000 bits), megabyte (MB = 8,000,000 bits), and gigabyte (GB = 8,000,000,000 bits).
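The bit counts in parentheses follow from two multiplications: the SI power of ten, then 8 bits per byte. A minimal Python sketch (the lookup table and function name are illustrative):

```python
BITS_PER_BYTE = 8
SI_EXPONENT = {"": 0, "k": 3, "M": 6, "G": 9}  # SI prefixes as powers of ten

def si_bytes_to_bits(value: int, prefix: str) -> int:
    """Bits in `value` SI-prefixed bytes, e.g. (1, "k") for one kilobyte."""
    return value * 10 ** SI_EXPONENT[prefix] * BITS_PER_BYTE

for p in ("k", "M", "G"):
    print(f"1 {p}B = {si_bytes_to_bits(1, p):,} bits")
```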

However, for technical reasons, the capacities of computer memories and some storage units are often multiples of some large power of two, such as $2^{28}$ = 268,435,456 bytes. To avoid such unwieldy numbers, people have often misused the SI prefixes to mean the nearest power of two, e.g. using "kilo" for $2^{10}$ = 1024, "mega" for $2^{20}$ = 1,048,576, "giga" for $2^{30}$ = 1,073,741,824, and so on. So, for example, a memory chip with capacity of $2^{28}$ bytes would be referred to as a "256 megabyte chip". The table below illustrates these differences.
Symbol   Prefix   SI meaning              Binary meaning          Size difference (binary vs. SI)
k        kilo     $10^{3} = 1000^{1}$     $2^{10} = 1024^{1}$     2.40%
M        mega     $10^{6} = 1000^{2}$     $2^{20} = 1024^{2}$     4.86%
G        giga     $10^{9} = 1000^{3}$     $2^{30} = 1024^{3}$     7.37%
T        tera     $10^{12} = 1000^{4}$    $2^{40} = 1024^{4}$     9.95%
P        peta     $10^{15} = 1000^{5}$    $2^{50} = 1024^{5}$     12.59%
E        exa      $10^{18} = 1000^{6}$    $2^{60} = 1024^{6}$     15.29%
Z        zetta    $10^{21} = 1000^{7}$    $2^{70} = 1024^{7}$     18.06%
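The last column is $(1024^n - 1000^n)/1000^n$ expressed as a percentage, where $n$ is the prefix's position (kilo = 1, mega = 2, and so on). The Python sketch below recomputes it, along with both readings of the "256 megabyte chip" of $2^{28}$ bytes; the function name is illustrative.

```python
SI_SYMBOLS = ["k", "M", "G", "T", "P", "E", "Z"]

def size_difference(n: int) -> float:
    """Percent by which the binary meaning 1024**n exceeds the SI meaning 1000**n."""
    return (1024 ** n - 1000 ** n) / 1000 ** n * 100

for n, symbol in enumerate(SI_SYMBOLS, start=1):
    print(f"{symbol}: {size_difference(n):.2f}%")

# The "256 megabyte chip": 2**28 bytes read with binary vs. SI mega.
chip = 2 ** 28
print(chip // 2 ** 20, "binary megabytes vs", chip / 10 ** 6, "SI megabytes")
```

The gap grows with each prefix because it compounds: each step multiplies the ratio by another factor of 1.024.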


In the past, uppercase "K" has been used instead of "k" to indicate 1024 instead of 1000. However, this usage was never consistently applied.

On the other hand, for external storage systems (such as optical disks), the SI prefixes were commonly used with their proper values (powers of 10). There have been many attempts to resolve the confusion by providing alternative notations for power-of-two multiples. In 1998 the International Electrotechnical Commission (IEC) issued a standard for this purpose, namely a series of binary prefixes that use 1024 instead of 1000 as the main radix:
Symbol   Prefix              Example             Size               Equals
Ki       kibi, binary kilo   1 kibibyte (KiB)    $2^{10}$ bytes     1024 B
Mi       mebi, binary mega   1 mebibyte (MiB)    $2^{20}$ bytes     1024 KiB
Gi       gibi, binary giga   1 gibibyte (GiB)    $2^{30}$ bytes     1024 MiB
Ti       tebi, binary tera   1 tebibyte (TiB)    $2^{40}$ bytes     1024 GiB
Pi       pebi, binary peta   1 pebibyte (PiB)    $2^{50}$ bytes     1024 TiB
Ei       exbi, binary exa    1 exbibyte (EiB)    $2^{60}$ bytes     1024 PiB
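To see how the two prefix systems label the same quantity, the Python sketch below formats a byte count either with SI prefixes (dividing by 1000) or with IEC prefixes (dividing by 1024). The function name `human_size` and its two-decimal output format are illustrative choices, not part of either standard.

```python
def human_size(num_bytes: int, binary: bool = False) -> str:
    """Format a byte count with SI (powers of 1000) or IEC (powers of 1024) prefixes."""
    base = 1024 if binary else 1000
    prefixes = ["Ki", "Mi", "Gi", "Ti", "Pi", "Ei"] if binary else ["k", "M", "G", "T", "P", "E"]
    if num_bytes < base:
        return f"{num_bytes} B"
    value = float(num_bytes)
    symbol = ""
    for p in prefixes:
        if value < base:
            break
        value /= base  # step up one prefix
        symbol = p
    return f"{value:.2f} {symbol}B"

chip = 2 ** 28
print(human_size(chip), "|", human_size(chip, binary=True))  # 268.44 MB | 256.00 MiB
```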


JEDEC, however, recommends uppercase K, M, G, and T for the binary powers $2^{10}$, $2^{20}$, $2^{30}$, and $2^{40}$.

Size examples

  • 90 bytes: enough to store a typical line of text from a book.
  • 512 bytes = ½ KiB: the typical sector of a hard disk.
  • 1024 bytes = 1 KiB: the classical block size in UNIX filesystems.
  • 2048 bytes = 2 KiB: a CD-ROM sector.
  • 4096 bytes = 4 KiB: a memory page in x86 (since Intel 80386).
  • 4 kB: about one page of text from a novel.
  • 120 kB: the text of a typical pocket book.
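Storage is allocated in whole sectors, blocks, or pages, so the space a file occupies is its size rounded up to a multiple of the unit. The Python sketch below uses integer ceiling division with the unit sizes listed above; reading "120 kB" as exactly 120,000 bytes is an assumption made for the example, and `units_needed` is a hypothetical helper.

```python
def units_needed(size_bytes: int, unit_bytes: int) -> int:
    """Whole fixed-size units (sectors, blocks, pages) needed to hold size_bytes."""
    if unit_bytes <= 0:
        raise ValueError("unit size must be positive")
    return -(-size_bytes // unit_bytes)  # ceiling division using integers only

text_size = 120_000  # assumed: "120 kB" pocket book read as 120 * 1000 bytes
for name, unit in [("512-byte disk sector", 512), ("2 KiB CD-ROM sector", 2048), ("4 KiB x86 page", 4096)]:
    n = units_needed(text_size, unit)
    print(f"{n} x {name}, {n * unit - text_size} bytes of slack")
```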


Obsolete and unusual units


Several other units of information storage have been named:
  • 1 bit: sniff.
  • 2 bits: crumb, quad, quarter, tayste, tydbit.
  • 5 bits: nickel, nyckle.
  • 6 bits: byte (in early IBM machines using BCD alphamerics).
  • 10 bits: deckle, dyme.
  • 16 bits: doublet, plate, playte, chomp, chawmp (on a 32-bit machine).
  • 18 bits: chomp, chawmp (on a 36-bit machine).
  • 32 bits: quadlet, dinner, dynner, gawble (on a 32-bit machine).
  • 48 bits: gobble, gawble (under circumstances that remain obscure).
  • 64 bits: octlet.


Most of these names are jargon, obsolete, or used only in very restricted contexts.
