In computing and telecommunications, a unit of information is the capacity of some standard data storage system or communication channel, used to measure the capacities of other systems and channels. In information theory, units of information are also used to measure the information content or entropy of random variables.
The most common units are the bit (the capacity of a system that can be in only two states) and the byte or octet (equivalent to eight independent bits). Larger units can be formed from these by the SI power-of-ten prefixes or the newer IEC binary power prefixes.
Primary units
As observed by Hartley in 1928, and further formalized by Shannon in 1945, the information that can be stored in a system is proportional to the logarithm $\log_b N$ of the number $N$ of possible states of that system. Changing the base of the logarithm from $b$ to a different number $c$ has the effect of multiplying the value of the logarithm by a fixed constant, namely
$\log_c N = (\log_c b) \log_b N$
Therefore, the choice of the base $b$ determines the unit used to measure information. In particular, if $b$ is a positive integer, then the unit is the amount of information that can be stored in a system with $b$ possible states.
When $b$ is 2, the unit is the "bit" (a contraction of binary digit). A system with 8 possible states, for example, can store up to $\log_2 8 = 3$ bits of information. Other units that have been named include:
- Base $b = 3$: the unit is called "trit", and is equal to $\log_2 3$ (≈ 1.585) bits.
- Base $b = 10$: the unit is called "decimal digit", "Hartley", "ban", "decit", or "dit", and is equal to $\log_2 10$ (≈ 3.322) bits.
- Base $b = e$, the base of natural logarithms: the unit is called a "nat", "nit", or "nepit" (from Neperian), and is worth $\log_2 e$ (≈ 1.443) bits.
The trit, ban, and nat are rarely used to measure storage capacity; but the nat, in particular, is often used in information theory, because natural logarithms are sometimes easier to handle than logarithms in other bases.
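The relationship between these units can be checked numerically. The following is a minimal Python sketch, not a reference implementation; the function names `bits_per_unit` and `capacity` are hypothetical and chosen only for illustration.

```python
import math


def bits_per_unit(base: float) -> float:
    """Number of bits carried by one unit defined by logarithm base `base`."""
    return math.log2(base)


def capacity(states: int, base: float = 2) -> float:
    """Capacity log_b(N) of a system with `states` possible states,
    expressed in the unit that corresponds to logarithm base `base`."""
    return math.log(states) / math.log(base)


BITS_PER_TRIT = bits_per_unit(3)
BITS_PER_BAN = bits_per_unit(10)
BITS_PER_NAT = bits_per_unit(math.e)

# Rounded to three decimals these match the values quoted in the list above.
print(round(BITS_PER_TRIT, 3), round(BITS_PER_BAN, 3), round(BITS_PER_NAT, 3))

# A system with 8 states holds about 3 bits, or about 1.893 trits.
print(capacity(8), capacity(8, base=3))
```

Multiplying a capacity in trits by `BITS_PER_TRIT` gives the same capacity in bits, which is the change-of-base identity applied with $c = 2$.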
Byte
Historically, a byte was the number of bits used to encode a character of text in the computer, which depended on computer hardware architecture; but today it almost always means eight bits, that is, an octet. A byte can represent $2^8 = 256$ distinct values, such as the integers 0 to 255, or -128 to 127. The IEEE 1541-2002 standard specifies "B" (upper case) as the symbol for byte. Bytes, or multiples thereof, are almost always used to specify the sizes of computer files and the capacity of storage units. Most modern computers and peripheral devices are designed to manipulate data in whole bytes or groups of bytes, rather than individual bits.
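The two integer ranges mentioned above are different readings of the same eight bits. The short Python sketch below illustrates this, assuming the signed range uses the common two's-complement convention (the text does not name an encoding); the helper `byte_interpretations` is a hypothetical name.

```python
def byte_interpretations(value: int) -> tuple[int, int]:
    """Return the (unsigned, signed) readings of one 8-bit pattern.

    `value` is the bit pattern given as an integer in 0..255; the signed
    reading assumes two's complement, which is what `signed=True` uses.
    """
    raw = bytes([value])  # raises ValueError outside 0..255
    unsigned = int.from_bytes(raw, byteorder="big", signed=False)
    signed = int.from_bytes(raw, byteorder="big", signed=True)
    return unsigned, signed


DISTINCT_VALUES = 2 ** 8  # 256 possible bit patterns in one byte

print(DISTINCT_VALUES)
print(byte_interpretations(0x7F))
print(byte_interpretations(0x80))
print(byte_interpretations(0xFF))
```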
Nybble
A group of four bits, or half a byte, is sometimes called a nibble or nybble. This unit is most often used in the context of hexadecimal number representation, since a nybble can store precisely one hexadecimal digit.
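As an illustration of the nybble-to-hexadecimal correspondence, here is a small Python sketch that splits a byte into its two nybbles and renders each as one hexadecimal digit. The function names are hypothetical.

```python
HEX_DIGITS = "0123456789ABCDEF"


def split_nibbles(byte: int) -> tuple[int, int]:
    """Split an 8-bit value into its (high, low) four-bit halves."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in the range 0..255")
    return byte >> 4, byte & 0x0F


def to_hex(byte: int) -> str:
    """Render a byte as two hexadecimal digits, one per nybble."""
    high, low = split_nibbles(byte)
    return HEX_DIGITS[high] + HEX_DIGITS[low]


print(split_nibbles(0xA7))
print(to_hex(0xA7), to_hex(5))
```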
Word, line, and page
Computers usually manipulate bits in groups of a fixed size, conventionally called words. The number of bits in a word is usually defined by the size of the registers in the computer's CPU, or by the number of data bits that are fetched from its main memory in a single operation. In the IA-32 architecture, more commonly known as x86-32, a word is 16 bits, but other past and current architectures use words with 8, 24, 32, 36, 51, 64, 80 bits or others.
Some machine instructions and computer number formats use two words (a "double word" or "dword"), or four words (a "quad word" or "quad").
Computer memory caches usually operate on blocks of memory that consist of several consecutive words. These units are customarily called "lines".
Virtual memory systems partition the computer's main storage into even larger units, traditionally called "pages".
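The following Python sketch illustrates these groupings under stated assumptions: a 16-bit word (the IA-32 convention mentioned above), little-endian byte order, and a 4096-byte page used only as an example block size. The helpers `split_dword` and `split_address` are hypothetical names, not part of any standard API.

```python
import struct

# Sizes in bytes of a word, double word and quad word, assuming a 16-bit word.
# The "<" prefix selects the struct module's standard (platform-independent) sizes.
SIZES = {
    "word": struct.calcsize("<H"),   # 16 bits
    "dword": struct.calcsize("<I"),  # two words
    "qword": struct.calcsize("<Q"),  # four words
}


def split_dword(value: int) -> tuple[int, int]:
    """Split a 32-bit value into its (low, high) 16-bit words."""
    packed = struct.pack("<I", value)   # little-endian: low word comes first
    low, high = struct.unpack("<2H", packed)
    return low, high


def split_address(address: int, block_size: int) -> tuple[int, int]:
    """Return (block number, offset within block) for a cache line or page."""
    return divmod(address, block_size)


print(SIZES)
print([hex(w) for w in split_dword(0x12345678)])
print(split_address(0x12345, 4096))  # page number and offset for 4 KiB pages
```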
Systematic multiples
Terms for large quantities of bits can be formed using the standard range of SI prefixes for powers of 10, e.g., kilo = $10^3$ = 1000 (kilobit or kbit), mega = $10^6$ = 1,000,000 (megabit or Mbit) and giga = $10^9$ = 1,000,000,000 (gigabit or Gbit). These prefixes are more often used for multiples of bytes, as in kilobyte (kB = 8,000 bits), megabyte (MB = 8,000,000 bits), and gigabyte (GB = 8,000,000,000 bits).
However, for technical reasons, the capacities of computer memories and some storage units are often multiples of some large power of two, such as $2^{28}$ = 268,435,456 bytes. To avoid such unwieldy numbers, people have often misused the SI prefixes to mean the nearest power of two, e.g. using "kilo" for $2^{10}$ = 1024, "mega" for $2^{20}$ = 1,048,576, "giga" for $2^{30}$ = 1,073,741,824, and so on. So, for example, a memory chip with capacity of $2^{28}$ bytes would be referred to as a "256 megabyte chip". The table below illustrates these differences.
| Symbol | Prefix | SI meaning | Binary meaning | Size difference |
|---|---|---|---|---|
| k | kilo | $10^3 = 1000^1$ | $2^{10} = 1024^1$ | 2.40% |
| M | mega | $10^6 = 1000^2$ | $2^{20} = 1024^2$ | 4.86% |
| G | giga | $10^9 = 1000^3$ | $2^{30} = 1024^3$ | 7.37% |
| T | tera | $10^{12} = 1000^4$ | $2^{40} = 1024^4$ | 9.95% |
| P | peta | $10^{15} = 1000^5$ | $2^{50} = 1024^5$ | 12.59% |
| E | exa | $10^{18} = 1000^6$ | $2^{60} = 1024^6$ | 15.29% |
| Z | zetta | $10^{21} = 1000^7$ | $2^{70} = 1024^7$ | 18.06% |
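The size differences follow directly from the ratio $1024^n / 1000^n$ and can be recomputed with a few lines of Python. This is only an illustrative sketch; the function name `size_difference_percent` is hypothetical.

```python
PREFIXES = ["k", "M", "G", "T", "P", "E", "Z"]


def size_difference_percent(n: int) -> float:
    """Percentage by which 1024**n exceeds 1000**n."""
    return (1024 ** n / 1000 ** n - 1) * 100


for n, symbol in enumerate(PREFIXES, start=1):
    print(f"{symbol}: {size_difference_percent(n):.2f}%")

# The memory chip example: 2**28 bytes is exactly 256 binary "megabytes".
print(2 ** 28, 2 ** 28 // 2 ** 20)
```

The gap grows with each prefix because the 2.4% difference at the kilo level compounds at every further step.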
In the past, uppercase "K" has been used instead of "k" to indicate 1024 instead of 1000. However, this usage was never consistently applied.
On the other hand, for external storage systems (such as optical disks), the SI prefixes were commonly used with their proper values (powers of 10). There have been many attempts to resolve the confusion by providing alternative notations for power-of-two multiples. In 1998 the International Electrotechnical Commission (IEC) issued a standard for this purpose, namely a series of binary prefixes that use 1024 instead of 1000 as the main radix:
| Symbol | Prefix | Example unit | Size | Equivalent |
|---|---|---|---|---|
| Ki | kibi, binary kilo | 1 kibibyte (KiB) | $2^{10}$ bytes | 1024 B |
| Mi | mebi, binary mega | 1 mebibyte (MiB) | $2^{20}$ bytes | 1024 KiB |
| Gi | gibi, binary giga | 1 gibibyte (GiB) | $2^{30}$ bytes | 1024 MiB |
| Ti | tebi, binary tera | 1 tebibyte (TiB) | $2^{40}$ bytes | 1024 GiB |
| Pi | pebi, binary peta | 1 pebibyte (PiB) | $2^{50}$ bytes | 1024 TiB |
| Ei | exbi, binary exa | 1 exbibyte (EiB) | $2^{60}$ bytes | 1024 PiB |
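To show how the two prefix families label the same quantity differently, here is a minimal Python sketch of a size formatter. It is an illustration under assumptions, not a standard routine: the function name `format_bytes`, its `binary` flag, and the two-decimal output format are all choices made for this example.

```python
IEC_PREFIXES = ["", "Ki", "Mi", "Gi", "Ti", "Pi", "Ei"]  # steps of 1024
SI_PREFIXES = ["", "k", "M", "G", "T", "P", "E"]         # steps of 1000


def format_bytes(count: int, binary: bool = True) -> str:
    """Format a byte count with IEC binary prefixes or SI decimal prefixes."""
    base = 1024 if binary else 1000
    prefixes = IEC_PREFIXES if binary else SI_PREFIXES
    value = float(count)
    index = 0
    # Divide by the base until the value fits under the next prefix.
    while value >= base and index < len(prefixes) - 1:
        value /= base
        index += 1
    return f"{value:.2f} {prefixes[index]}B"


print(format_bytes(2 ** 28))                # binary prefixes
print(format_bytes(2 ** 28, binary=False))  # SI prefixes
```

For the $2^{28}$-byte memory chip discussed earlier, the binary form gives a round 256 MiB, while the SI form gives roughly 268 MB.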
JEDEC, however, recommends uppercase K, M, G, and T for the binary powers $2^{10}$, $2^{20}$, $2^{30}$, and $2^{40}$.
Size examples
- 90 bytes: enough to store a typical line of text from a book.
- 512 bytes = ½ KiB: the typical sector of a hard disk.
- 1024 bytes = 1 KiB: the classical block size in UNIX filesystems.
- 2048 bytes = 2 KiB: a CD-ROM sector.
- 4096 bytes = 4 KiB: a memory page in x86 (since Intel 80386).
- 4 kB: about one page of text from a novel.
- 120 kB: the text of a typical pocket book.
Obsolete and unusual units
Several other units of information storage have been named:
- 1 bit: sniff.
- 2 bits: crumb, quad, quarter, tayste, tydbit.
- 5 bits: nickel, nyckle.
- 6 bits: byte (in early IBM machines using BCD alphamerics).
- 10 bits: deckle, dyme.
- 16 bits: doublet, plate, playte, chomp, chawmp (on a 32-bit machine).
- 18 bits: chomp, chawmp (on a 36-bit machine).
- 32 bits: quadlet, dinner, dynner, gawble (on a 32-bit machine).
- 48 bits: gobble, gawble (under circumstances that remain obscure).
- 64 bits: octlet.
Most of these names are jargon, obsolete, or used only in very restricted contexts.
External links