Endianness - AbsoluteAstronomy.com

In computing, the term endian or endianness refers to the ordering of individually addressable sub-components within the representation of a larger data item as stored in external memory (or, sometimes, as sent on a serial connection). Each sub-component in the representation has a unique degree of significance, like the place value of digits in a decimal number. These sub-components are typically 16- or 32-bit words, 8-bit bytes, or even bits. Endianness is a difference in data representation at the hardware level and may or may not be transparent at higher levels, depending on factors such as the type of high-level language used.

The most common cases refer to how bytes are ordered within a single 16-, 32-, or 64-bit word, and endianness is then the same as byte order. The usual contrast is whether the most significant or least significant byte is ordered first (i.e. at the lowest byte address) within the larger data item. A big-endian machine stores the most significant byte first, and a little-endian machine stores the least significant byte first. In these standard forms, the bytes remain ordered by significance. However, mixed forms are also possible where the ordering of bytes within a 16-bit word may differ from the ordering of 16-bit words within a 32-bit word, for instance. Although rare, such cases do exist and may sometimes be referred to as mixed-endian or middle-endian.

Endianness is important as a low-level attribute of a particular data format. For example, the order in which the two bytes of a UCS-2 character are stored in memory is of considerable importance in network programming where two computers with different byte orders may be communicating with each other. Failure to account for varying endianness across architectures when writing code for mixed platforms leads to failures and bugs that can be difficult to detect.
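As a minimal sketch of the UCS-2 case mentioned above (the helper name `ucs2_bytes` is hypothetical and not taken from any library), the following Python snippet shows the same character producing two different byte sequences, and how a receiver that assumes the wrong byte order silently obtains a different code unit.

```python
def ucs2_bytes(ch: str, byteorder: str) -> bytes:
    """Encode one character as a single 16-bit UCS-2 code unit.

    `byteorder` is "big" or "little". Hypothetical helper for illustration.
    """
    code_point = ord(ch)
    if code_point > 0xFFFF:
        raise ValueError("character is outside the 16-bit UCS-2 range")
    return code_point.to_bytes(2, byteorder)


sent = ucs2_bytes("A", "big")              # bytes 00 41 as transmitted
misread = int.from_bytes(sent, "little")   # receiver assumes the wrong order
print(sent.hex(" "), hex(misread))         # prints "00 41 0x4100"
```

No exception is raised in the mismatched case, which is one reason such bugs can be hard to detect.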

Endian	First byte (lowest address)	Middle bytes	Last byte (highest address)	Notes
big	most significant	...	least significant	Similar to a number written on paper (in Arabic numerals as used in most Western scripts)
little	least significant	...	most significant	Arithmetic calculation order (see carry propagation); similar to numerals in the Arabic script

Endianness and hardware

The full register width among different CPUs and other processor types varies widely (typically between 4 and 64 bits). The internal bit-, byte-, or word-ordering within such a register is normally not considered "endianness", despite the fact that some CPU instructions may address individual bits (or other parts) using various kinds of internal addressing schemes. The "endianness" only describes how the bits are organized as seen from the outside (i.e. when stored in memory). The fact that some assembly languages label bits in an unorthodox manner is also largely another matter (a few architectures/assemblers turn the conventional msb..lsb = D31..D0 the other way round, so that msb=D0).

Large integers are usually stored in memory as a sequence of smaller ones and obtained by simple concatenation. The simple forms are:

increasing numeric significance with increasing memory addresses (or increasing time), known as little-endian, and
decreasing numeric significance with increasing memory addresses (or increasing time), known as big-endian
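The two simple forms can be sketched in Python with the built-in `int.to_bytes`; the helper name `store` is an assumption made for this illustration. Index 0 of the returned byte string stands for the lowest memory address.

```python
def store(value: int, width: int, byteorder: str) -> bytes:
    """Return the bytes of `value` in address order (index 0 = lowest address)."""
    return value.to_bytes(width, byteorder)


little = store(0x0A0B0C0D, 4, "little")  # significance increases with address
big = store(0x0A0B0C0D, 4, "big")        # significance decreases with address

print(little.hex(" "))  # prints "0d 0c 0b 0a"
print(big.hex(" "))     # prints "0a 0b 0c 0d"
```

In Python, `sys.byteorder` reports which of the two forms the host machine uses natively.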

Well-known processor architectures that use the little-endian format include x86 (including x86-64), 6502 (including 65802, 65C816), Z80 (including Z180, eZ80 etc.), MCS-48, 8051, DEC Alpha, Altera Nios, Atmel AVR, SuperH, VAX, and, largely, PDP-11.

Well-known processors that use the big-endian format include Motorola 6800 and 68k, Xilinx MicroBlaze, IBM POWER, and System/360 and its successors such as System/370, ESA/390, and z/Architecture. The PDP-10 also used big-endian addressing for byte-oriented instructions. SPARC historically used big-endian until version 9, which is bi-endian just like the ARM architecture, and the PowerPC and Power Architecture descendants of IBM POWER are also bi-endian (see below).

Serial protocols may also be regarded as either little or big-endian at the bit- and/or byte-levels (which may differ). Many serial interfaces, such as the ubiquitous USB, are little-endian at the bit-level. Physical standards like RS-232, RS-422 and RS-485 are also typically used with UARTs that send the least significant bit first, such as in industrial instrumentation applications, lighting protocols (DMX512), and so on. The same could be said for digital current loop signaling systems such as MIDI. There are also several serial formats where the most significant bit is normally sent first, such as I²C and the related SMBus. However, the bit order may often be reversed (or is "transparent") in the interface between the UART or communication controller and the host CPU or DMA controller (and/or system memory), especially in more complex systems and personal computers. These interfaces may be of any type and are often configurable.
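Bit-level ordering on a serial line can be sketched as follows. This is an illustration only: the function names are hypothetical, and framing details such as start, stop, and parity bits are deliberately left out.

```python
def bits_lsb_first(byte: int) -> list:
    """Order in which a typical UART would shift out the data bits."""
    return [(byte >> i) & 1 for i in range(8)]


def bits_msb_first(byte: int) -> list:
    """Order used by formats that send the most significant bit first."""
    return [(byte >> i) & 1 for i in range(7, -1, -1)]


print(bits_lsb_first(0x4A))  # prints [0, 1, 0, 1, 0, 0, 1, 0]
print(bits_msb_first(0x4A))  # prints [0, 1, 0, 0, 1, 0, 1, 0]
```

The two sequences are mirror images of each other, which is why a controller can present either order to the host transparently.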

Bi-endian hardware

Some architectures (including ARM, PowerPC, Alpha, SPARC V9, MIPS, PA-RISC and IA-64) feature a setting which allows for switchable endianness in data segments, code segments or both. This feature can improve performance or simplify the logic of networking devices and software. The word bi-endian, when said of hardware, denotes the capability of the machine to compute or pass data in either endian format.

Many of these architectures can be switched via software to default to a specific endian format (usually done when the computer starts up); however, on some systems the default endianness is selected by hardware on the motherboard and cannot be changed via software (e.g., the Alpha, which runs only in big-endian mode on the Cray T3E).

Note that the term "bi-endian" refers primarily to how a processor treats data accesses. Instruction accesses (fetches of instruction words) on a given processor may still assume a fixed endianness, even if data accesses are fully bi-endian, though this is not always the case, such as on Intel's IA-64-based Itanium CPU, which allows both.

Note, too, that some nominally bi-endian CPUs require motherboard help to fully switch endianness. For instance, the 32-bit desktop-oriented PowerPC processors in little-endian mode act as little-endian from the point of view of the executing programs, but they require the motherboard to perform a 64-bit swap across all 8 byte lanes to ensure that the little-endian view of things will apply to I/O devices. In the absence of this unusual motherboard hardware, device driver software must write to different addresses to undo the incomplete transformation and also must perform a normal byte swap.

Some CPUs, such as many PowerPC processors intended for embedded use, allow per-page choice of endianness.

Floating-point and endianness

Although the ubiquitous x86 processors of today use little-endian storage for all types of data (integer, floating point, BCD), there have been a few historical machines where floating-point numbers were represented in big-endian form while integers were represented in little-endian form. Because there have been many floating-point formats with no "network" standard representation for them, there is no formal standard for transferring floating-point values between heterogeneous systems. It may therefore appear strange that the widespread IEEE 754 floating-point standard does not specify endianness. Theoretically, this means that even standard IEEE floating-point data written by one machine might not be readable by another. However, on modern standard computers (i.e. implementing IEEE 754), one may in practice safely assume that the endianness is the same for floating-point numbers as for integers, making the conversion straightforward regardless of data type. (Small embedded systems using special floating-point formats may be another matter, however.)
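Under the assumption stated above (IEEE 754 on both sides, with floating-point byte order matching integer byte order), converting a value between machines reduces to reversing its bytes. The sketch below uses Python's `struct` module with explicit byte-order prefixes; the helper name `double_bytes` is hypothetical.

```python
import struct


def double_bytes(x: float, byteorder: str) -> bytes:
    """Pack `x` as an IEEE 754 binary64 value in the requested byte order."""
    fmt = "<d" if byteorder == "little" else ">d"
    return struct.pack(fmt, x)


big = double_bytes(1.0, "big")
little = double_bytes(1.0, "little")
print(big.hex(" "))     # prints "3f f0 00 00 00 00 00 00"
print(little.hex(" "))  # prints "00 00 00 00 00 00 f0 3f"
```

This does not cover the historical machines or special embedded formats mentioned in the text, where floating-point and integer byte orders may differ.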

Etymology

The term big-endian originally comes from Jonathan Swift's satirical novel Gulliver's Travels by way of Danny Cohen in 1980. In 1726, Swift described tensions in Lilliput and Blefuscu: whereas royal edict in Lilliput requires cracking open one's soft-boiled egg at the small end, inhabitants of the rival kingdom of Blefuscu crack theirs at the big end (giving them the moniker Big-endians). The terms little-endian and endianness have a similar intent.

"On Holy Wars and a Plea for Peace" by Danny Cohen ends with: "Swift's point is that the difference between breaking the egg
at the little-end and breaking it at the big-end is trivial. Therefore, he suggests, that everyone does it in
his own preferred way. We agree that the difference between sending eggs with the little- or the big-end first
is trivial, but we insist that everyone must do it in the same way, to avoid anarchy. Since the difference is trivial
we may choose either way, but a decision must be made."

History

The problem of dealing with data in different representations is sometimes termed the NUXI problem. This terminology alludes to the issue that a value represented by the byte-string "UNIX" on a big-endian system may be stored as "NUXI" on a PDP-11 middle-endian system; UNIX was one of the first systems to allow the same code to run on, and transfer data between, platforms with different internal representations.
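The "UNIX" to "NUXI" transformation can be reproduced by treating the four characters as two 16-bit words and swapping the bytes within each word. This is a sketch of the effect, not of any actual PDP-11 or UNIX code; the function name is an assumption.

```python
import struct


def swap_bytes_in_words(data: bytes) -> bytes:
    """Read `data` as big-endian 16-bit words and write them little-endian.

    `data` must have an even length.
    """
    count = len(data) // 2
    words = struct.unpack(f">{count}H", data)
    return struct.pack(f"<{count}H", *words)


print(swap_bytes_in_words(b"UNIX"))  # prints b'NUXI'
```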

An often-cited argument in favor of big-endian is that it is consistent with the ordering commonly used in natural languages. Spoken languages have a wide variety of organizations of numbers: the decimal number 92 is spoken in English as ninety-two, in German and Dutch as two and ninety, and in French as four-twenty-twelve, with a similar system in Danish (two-and-four-and-a-half-times-twenty). However, numbers are written almost universally in the Hindu-Arabic numeral system, in which the most significant digits are written first in languages written left-to-right, and last in languages written right-to-left.

Optimization

The little-endian system has the property that the same value can be read from memory at different lengths without using different addresses (even when alignment restrictions are imposed). For example, a 32-bit memory location with content 4A 00 00 00 can be read at the same address as either 8-bit (value = 4A), 16-bit (004A), 24-bit (00004A), or 32-bit (0000004A), all of which retain the same numeric value. Although this little-endian property is rarely used directly by high-level programmers, it is often employed by code optimizers as well as by assembly language programmers.
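The property can be checked with a small model of memory as a byte string. The helper `read` is hypothetical and simply interprets a slice of bytes as an integer.

```python
def read(memory: bytes, address: int, width: int, byteorder: str) -> int:
    """Read `width` bytes starting at `address` and interpret them as an integer."""
    return int.from_bytes(memory[address:address + width], byteorder)


little = bytes.fromhex("4a000000")  # 0x4A stored little-endian in 32 bits
big = bytes.fromhex("0000004a")     # the same value stored big-endian

# Same address, four different widths: the little-endian reads all agree.
print([hex(read(little, 0, w, "little")) for w in (1, 2, 3, 4)])
# On the big-endian layout only the full-width read at address 0 gives 0x4A.
print([hex(read(big, 0, w, "big")) for w in (1, 2, 3, 4)])
```

On the big-endian layout, a narrower read has to start at a higher address (address 3 for the 8-bit read) to obtain the same value.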

On the other hand, in some situations it may be useful to obtain an approximation of a multi-byte or multi-word value by reading only its most-significant portion instead of the complete representation; a big-endian processor may read such an approximation using the same base-address that would be used for the full value.
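A short sketch of this big-endian approximation, again modelling memory as a byte string (the helper `read_big` and the sample value are assumptions chosen for illustration): reading only the first two bytes at the base address yields the upper half of the value.

```python
def read_big(memory: bytes, address: int, width: int) -> int:
    """Read `width` bytes at `address` as a big-endian integer."""
    return int.from_bytes(memory[address:address + width], "big")


value = 0x12345678
memory = value.to_bytes(4, "big")

top = read_big(memory, 0, 2)     # most significant 16 bits, same base address
print(hex(top), hex(top << 16))  # prints "0x1234 0x12340000"
```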

Calculation order

Little-endian representation simplifies hardware in processors that add multi-byte integral values a byte at a time, such as small-scale byte-addressable processors and microcontrollers. As carry propagation must start at the least significant bit (and thus byte), multi-byte addition can then be carried out with a monotonic incrementing address sequence, a simple operation already present in hardware. On a big-endian processor, the addressing unit has to be told how big the addition is going to be so that it can hop forward to the least significant byte, then count back down towards the most significant. However, high-performance processors usually fetch multi-byte operands from memory in a single operation, so the complexity of the hardware is not affected by the byte ordering.
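The byte-at-a-time addition described above can be sketched in Python. This models the address sequence only, not any particular processor; the function name is hypothetical, and the final carry out of the top byte is simply discarded.

```python
def add_little_endian(a: bytes, b: bytes) -> bytes:
    """Add two equal-length little-endian integers one byte at a time."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    result = bytearray(len(a))
    carry = 0
    # The address only ever increments: the least significant byte comes first.
    for address in range(len(a)):
        total = a[address] + b[address] + carry
        result[address] = total & 0xFF
        carry = total >> 8
    return bytes(result)  # a carry out of the last byte is dropped


x = (0x00FF00FF).to_bytes(4, "little")
y = (0x00000101).to_bytes(4, "little")
print(hex(int.from_bytes(add_little_endian(x, y), "little")))  # prints "0xff0200"
```

With big-endian operands the same loop would have to start at the highest address and count down, which is the extra step the text refers to.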

Diagram for mapping registers to memory locations

Using this chart, one can map an access (for a concrete example, "write 32 bit value to address 0") from register to memory or from memory to register. To help in understanding that access, little and big endianness can be seen in the diagram as differing in the orientation of their coordinate systems. In big-endian, the atomic units (in this example the atomic unit is the byte) and the memory coordinate system increase in the diagram from left to right, while in little-endian they increase from right to left.

A simple reminder is "In Little Endian, The Least significant byte goes into the Lowest value slot". So in the above example, D, the least significant byte, goes into slot 0.

If you write the hex value 0x0a0b0c0d in a western language, you write the bytes from left to right, which is implicitly big-endian style: 0x0a at 0, 0x0b at 1, 0x0c at 2, 0x0d at 3. The output of memory is normally also printed bytewise from left to right: first memory address 0, then memory address 1, then memory address 2, then memory address 3. So on a big-endian system, when you write a 32-bit value (from a register) to an address in memory and then output the memory, you "see what you have written" (because you are using the left-to-right coordinate system for the output of values in registers as well as the output of memory).

On a little-endian system, however, the logical 0 address of a value in a register (for 8-bit, 16-bit and 32-bit) is the least significant byte, the one to the right: 0x0d at 0, 0x0c at 1, 0x0b at 2, 0x0a at 3. If you write a 32-bit register value to a memory location on a little-endian system and then output the memory location (with growing addresses from left to right), the output of the memory will appear reversed (byte-swapped). There are two ways to make the displayed register values and the displayed memory agree: swap the output of the register values (0x0a0b0c0d => 0x0d0c0b0a), or swap the output of the memory (print from right to left). Because the values of registers are interpreted as numbers, which in western languages are written from left to right, it is natural to use the second approach and display the memory from right to left. The above diagram does exactly that: when visualizing memory on a little-endian system, the memory should be seen growing to the left.
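The two display choices discussed above can be sketched with a tiny memory dump routine. The function `dump` and its `right_to_left` flag are hypothetical names for this illustration.

```python
def dump(memory: bytes, right_to_left: bool = False) -> str:
    """Format bytes as hex, with addresses growing rightwards or leftwards."""
    cells = [f"{b:02x}" for b in memory]
    if right_to_left:
        cells.reverse()  # address 0 ends up at the right-hand side
    return " ".join(cells)


memory = (0x0A0B0C0D).to_bytes(4, "little")  # what a little-endian store writes

print(dump(memory))                      # prints "0d 0c 0b 0a"
print(dump(memory, right_to_left=True))  # prints "0a 0b 0c 0d"
```

The second form matches the register value as written, at the cost of reversing any text stored in the same memory.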

Examples of storing the value `0A0B0C0D_h` in memory

Note that hexadecimal notation is used.

To illustrate the notions, this section provides example layouts of the 32-bit number 0A0B0C0D_h in the most common variants of endianness. There exist several digital processors that use other formats, but these two are the most common in general processors. That is true for typical embedded systems as well as for general computer CPU(s). Most processors used in non-CPU roles in typical computers (in storage units, peripherals etc.) also use one of these two basic formats, although not always 32-bit of course.

All the examples refer to the storage in memory of the value.

Big-endian

Atomic element size 8-bit, address increment 1-byte (octet)

increasing addresses →
`...`	`0A_h`	`0B_h`	`0C_h`	`0D_h`	`...`

The most significant byte (MSB) value, which is 0A_h in our example, is stored at the memory location with the lowest address, the next byte value in significance, 0B_h, is stored at the following memory location and so on. This is akin to Left-to-Right reading in hexadecimal order.

Atomic element size 16-bit

increasing addresses →
`...`	`0A0B_h`	`0C0D_h`	`...`

The most significant atomic element now stores the value 0A0B_h, followed by 0C0D_h.

Little-endian

Atomic element size 8-bit, address increment 1-byte (octet)

increasing addresses →
`...`	`0D_h`	`0C_h`	`0B_h`	`0A_h`	`...`

The least significant byte (LSB) value, 0D_h, is at the lowest address. The other bytes follow in increasing order of significance.

Atomic element size 16-bit

increasing addresses →
`...`	`0C0D_h`	`0A0B_h`	`...`

The least significant 16-bit unit stores the value 0C0D_h, immediately followed by 0A0B_h. Note that 0C0D_h and 0A0B_h represent integers, not bit layouts (see bit numbering).
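The four layouts above (8-bit and 16-bit atomic elements, most or least significant element first) can be generated by one small function. This is a sketch with a hypothetical name and signature; it treats each atomic element as an integer, as the note above says, and lists elements in order of increasing address.

```python
def layout(value: int, total_bits: int, element_bits: int, byteorder: str) -> list:
    """List the atomic elements of `value` in order of increasing address."""
    count = total_bits // element_bits
    mask = (1 << element_bits) - 1
    # Element 0 of this list is the least significant one.
    elements = [(value >> (i * element_bits)) & mask for i in range(count)]
    if byteorder == "big":
        elements.reverse()  # most significant element at the lowest address
    digits = element_bits // 4
    return [f"{e:0{digits}X}" for e in elements]


print(layout(0x0A0B0C0D, 32, 8, "big"))      # prints ['0A', '0B', '0C', '0D']
print(layout(0x0A0B0C0D, 32, 16, "little"))  # prints ['0C0D', '0A0B']
```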

Byte addresses increasing from right to left

Visualising memory addresses from left to right makes little-endian values appear backwards. If the addresses are written increasing towards the left instead, each individual little-endian value will appear forwards. However, strings of values or characters then appear reversed instead.

With 8-bit atomic elements:

← increasing addresses
`...`	`0A_h`	`0B_h`	`0C_h`	`0D_h`	`...`

The least significant byte (LSB) value, 0D_h, is at the lowest address. The other bytes follow in increasing order of significance.

With 16-bit atomic elements:

← increasing addresses
`...`	`0A0B_h`	`0C0D_h`	`...`

The least significant 16-bit unit stores the value 0C0D_h, immediately followed by 0A0B_h.

The display of text is reversed from the normal display of languages such as English that read from left to right. For example, here is the word "XRAY" displayed in this manner, with each character stored in an 8-bit atomic element:

← increasing addresses
`...`	`"Y"`	`"A"`	`"R"`	`"X"`	`...`

If pairs of characters are stored in 16-bit atomic elements (using 8 bits per character), it could look even stranger:

← increasing addresses
`...`	`"AY"`	`"XR"`	`...`

This conflict between the memory arrangements of binary data and text is intrinsic to the nature of the little-endian convention, but is a conflict only for languages written left-to-right, such as Indo-European languages including English. For right-to-left languages such as Arabic and Hebrew, there is no conflict of text with binary, and the preferred display in both cases would be with addresses increasing to the left. (On the other hand, right-to-left languages have a complementary intrinsic conflict in the big-endian system.)
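
The right-to-left display discussed in this section can be imitated in a few lines of Python. This is a rough sketch under the assumption that memory is modelled as a `bytes` object whose index is the address; the helper name `show_right_to_left` is hypothetical.

```python
def show_right_to_left(memory: bytes) -> str:
    """Render bytes as hex with the lowest address at the right-hand end."""
    return " ".join(f"{b:02X}" for b in reversed(memory))

# A little-endian 32-bit value reads "forwards" in this view...
value_memory = (0x0A0B0C0D).to_bytes(4, "little")
value_view = show_right_to_left(value_memory)
print(value_view)   # 0A 0B 0C 0D

# ...while text stored one character per byte reads "backwards".
text_memory = b"XRAY"
text_view = bytes(reversed(text_memory)).decode("ascii")
print(text_view)    # YARX
```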

Middle-endian

Numerous other orderings, generically called middle-endian or mixed-endian, are possible. On the PDP-11 (16-bit little-endian), for example, the compiler stored 32-bit values with the 16-bit halves swapped from the expected little-endian order. This ordering is known as PDP-endian.

storage of a 32-bit word on a PDP-11

increasing addresses →
`...`	`0B_h`	`0A_h`	`0D_h`	`0C_h`	`...`

The ARM architecture can also produce this format when writing a 32-bit word to an address 2 bytes from a 32-bit word alignment.
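
The PDP-endian layout can be expressed as a small Python function. This is a sketch for illustration only: the name `pdp_endian_bytes` is invented here, and the function simply writes the high 16-bit half first, with each half stored little-endian.

```python
def pdp_endian_bytes(value: int) -> bytes:
    """Lay out a 32-bit value as two little-endian 16-bit halves, high half first."""
    high_half = (value >> 16) & 0xFFFF
    low_half = value & 0xFFFF
    return high_half.to_bytes(2, "little") + low_half.to_bytes(2, "little")

layout = pdp_endian_bytes(0x0A0B0C0D)
print(layout.hex(" "))  # 0b 0a 0d 0c
```

The result matches the storage diagram for the 32-bit word on a PDP-11: neither the pure big-endian nor the pure little-endian sequence.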

Endianness in networking

Many IETF RFCs use the term network order; it simply describes the order of transmission for bits and bytes over the wire in network protocols. Among others, the historic RFC 1700, corresponding to Internet standard STD 2, explains this big-endian order.

The telephone network, historically and presently, sends the most significant part first, the area code; doing so allows routing while a telephone number is being composed.

The Internet Protocol defines big-endian as the standard network byte order used for all numeric values in the packet headers and by many higher level protocols and file formats that are designed for use over IP. The Berkeley sockets API defines a set of functions to convert 16-bit and 32-bit integers to and from network byte order: the htonl (host-to-network-long) and htons (host-to-network-short) functions convert 32-bit and 16-bit values respectively from machine (host) to network order; the ntohl and ntohs functions convert from network to host order. These functions may be a no-op on a big-endian system.
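
Python's standard `socket` module exposes the same four conversion functions, which makes for a compact sketch of how they behave. The example below is illustrative only; the variable names are made up, and `struct.pack` with the `=` prefix is used to look at the converted integer as it would sit in host memory.

```python
import socket
import struct
import sys

host_value = 0x0A0B0C0D

# Convert to network (big-endian) order, then view the bytes in host memory.
network_value = socket.htonl(host_value)
wire_bytes = struct.pack("=I", network_value)

# ntohl undoes htonl; on a big-endian host both are no-ops.
round_trip = socket.ntohl(network_value)

# The 16-bit variants work the same way, e.g. for a port number.
port_wire = struct.pack("=H", socket.htons(80))

print(sys.byteorder, wire_bytes.hex(" "), port_wire.hex(" "))
```

Whatever the host's byte order, `wire_bytes` holds the most significant byte first, which is what `struct.pack("!I", host_value)` would also produce directly.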

In CANopen, multi-byte parameters are always sent least significant byte first (little-endian).

While the lowest network protocols may deal with sub-byte formatting, all the layers above them usually consider the byte (mostly meant as octet) as their atomic unit.

Endianness in files and byte swap

Endianness is a problem when a binary file created on a computer is read on another computer with different endianness. Some compilers have built-in facilities to deal with data written in other formats. For example, the Intel Fortran compiler supports the non-standard CONVERT specifier, so a file can be opened as

OPEN(unit,CONVERT='BIG_ENDIAN',...)

OPEN(unit,CONVERT='LITTLE_ENDIAN',...)

Some compilers have options to generate code that globally enables the conversion for all file IO operations. This allows one to reuse code on a system with the opposite endianness without having to modify the code itself. If the compiler does not support such conversion, the programmer needs to swap the bytes via ad hoc code.

Fortran sequential unformatted files created with one endianness usually cannot be read on a system using the other endianness because Fortran usually implements a record (defined as the data written by a single Fortran statement) as data preceded and succeeded by count fields, which are integers equal to the number of bytes in the data. An attempt to read such a file on a system of the other endianness then results in a run-time error, because the count fields are incorrect. This problem can be avoided by writing out sequential binary files as opposed to sequential unformatted.
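
The failure mode with count fields can be imitated in Python. The sketch below is not a reader for any particular compiler's files: it assumes 4-byte count fields (the width is compiler-dependent), and the helper names `write_record` and `read_record` are hypothetical.

```python
import struct

MARKER_SIZE = 4  # assumed width of each count field, in bytes

def write_record(payload: bytes, order: str) -> bytes:
    """Frame payload with leading and trailing byte counts ('<' or '>' order)."""
    marker = struct.pack(order + "i", len(payload))
    return marker + payload + marker

def read_record(buf: bytes, order: str) -> bytes:
    """Return the payload, or raise ValueError if the count fields make no sense."""
    (count,) = struct.unpack_from(order + "i", buf, 0)
    if count < 0 or MARKER_SIZE + count + MARKER_SIZE > len(buf):
        raise ValueError(f"implausible record length {count}")
    (trailer,) = struct.unpack_from(order + "i", buf, MARKER_SIZE + count)
    if trailer != count:
        raise ValueError("leading and trailing counts disagree")
    return buf[MARKER_SIZE:MARKER_SIZE + count]

payload = struct.pack("<2d", 1.0, 2.0)       # 16 bytes of data
record = write_record(payload, "<")          # written on a little-endian system

same_order = read_record(record, "<")        # reads back correctly
try:
    read_record(record, ">")                 # count 16 is misread as 0x10000000
    wrong_order_failed = False
except ValueError:
    wrong_order_failed = True
```

The reader with the opposite byte order fails before it ever looks at the data, because the count field itself is misinterpreted.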

Unicode text can optionally start with a byte order mark (BOM) to signal the endianness of the file or stream. Its code point is U+FEFF. In UTF-32, for example, a big-endian file should start with 0x00 00 FE FF. In a little-endian file these bytes are reversed.
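
A reader can check for the UTF-32 byte order mark with a few lines of Python. This is a minimal sketch: the function name `utf32_byte_order` is invented for the example, and the two BOM constants come from the standard `codecs` module.

```python
import codecs
from typing import Optional

def utf32_byte_order(data: bytes) -> Optional[str]:
    """Return 'big' or 'little' if data starts with a UTF-32 BOM, else None."""
    if data.startswith(codecs.BOM_UTF32_BE):   # 00 00 FE FF
        return "big"
    if data.startswith(codecs.BOM_UTF32_LE):   # FF FE 00 00
        return "little"
    return None

# U+FEFF encoded explicitly in each byte order, followed by some text.
big_sample = "\ufeffhi".encode("utf-32-be")
little_sample = "\ufeffhi".encode("utf-32-le")

print(utf32_byte_order(big_sample))      # big
print(utf32_byte_order(little_sample))   # little
```

Because the BOM is optional, a `None` result does not say anything about the byte order; the reader then has to rely on other information.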

Application binary data formats, such as MATLAB .mat files or the .BIL data format used in topography, are usually endianness-independent. This is achieved by always storing the data in one fixed endianness, or by carrying with the data a switch that indicates which endianness the data was written with. When reading the file, the application converts the endianness, transparently to the user.

This is the case for TIFF image files, whose header indicates the endianness of their internal binary integers. If a file starts with the signature "MM", integers are represented as big-endian, while "II" means little-endian. Each signature needs a single 16-bit word, and both are palindromes (that is, they read the same forwards and backwards), so they are endianness-independent. "I" stands for Intel and "M" stands for Motorola, the respective CPU providers of the IBM PC compatibles and Apple Macintosh platforms in the 1980s. Intel CPUs are little-endian, while Motorola 680x0 CPUs are big-endian. This explicit signature allows a TIFF reader program to swap bytes if necessary when a given file was generated by a TIFF writer program running on a computer with a different endianness.
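
How a reader might act on the signature can be sketched in Python. The function name `tiff_byte_order` is hypothetical. The check of the following 16-bit word against the value 42 is taken from the TIFF specification rather than from the text above, and it is included only to show the chosen byte order being put to use.

```python
import struct

def tiff_byte_order(header: bytes) -> str:
    """Return 'little' or 'big' based on the first four bytes of a TIFF file."""
    if len(header) < 4:
        raise ValueError("header too short")
    signature = header[:2]
    if signature == b"II":
        order = "<"      # Intel: little-endian integers
    elif signature == b"MM":
        order = ">"      # Motorola: big-endian integers
    else:
        raise ValueError("not a TIFF byte-order signature")
    # Decode the next 16-bit word using the byte order just selected.
    (magic,) = struct.unpack(order + "H", header[2:4])
    if magic != 42:
        raise ValueError("unexpected TIFF magic number")
    return "little" if order == "<" else "big"

print(tiff_byte_order(b"II\x2a\x00"))   # little
print(tiff_byte_order(b"MM\x00\x2a"))   # big
```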

The LabVIEW programming environment, though most commonly installed on Windows machines, was first developed on a Macintosh, and uses big-endian format for its binary numbers, while most Windows programs use little-endian format.

Note that since the required byte swap depends on the length of the variables stored in the file (two 2-byte integers require a different swap than one 4-byte integer), a general utility to convert endianness in binary files cannot exist.
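
The dependence of the swap on the field width is easy to see in a short Python sketch. The helper `swap_fields` is a made-up name; it reverses the bytes of each fixed-width field in turn, which is the kind of ad hoc code a programmer would write when the layout of the file is known.

```python
def swap_fields(data: bytes, width: int) -> bytes:
    """Reverse the byte order of each consecutive width-byte field in data."""
    if len(data) % width != 0:
        raise ValueError("data length is not a multiple of the field width")
    return b"".join(data[i:i + width][::-1] for i in range(0, len(data), width))

raw = bytes.fromhex("0a0b0c0d")

as_two_shorts = swap_fields(raw, 2)   # treated as two 2-byte integers
as_one_long = swap_fields(raw, 4)     # treated as one 4-byte integer

print(as_two_shorts.hex(" "))  # 0b 0a 0d 0c
print(as_one_long.hex(" "))    # 0d 0c 0b 0a
```

The same four input bytes give two different results, so the conversion cannot be done without knowing which fields the file contains.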

"Bit endianness"

The terms bit endianness or bit-level endianness are seldom used when talking about the representation of a stored value, as they are only meaningful for the rare computer architectures where each individual bit has a unique address. They are used, however, to refer to the transmission order of bits over a serial medium. Most often that order is transparently managed by the hardware and is the bit-level analogue of little-endian (low-bit first), although protocols exist which require the opposite ordering (e.g. I²C). In networking, the decision about the order of transmission of bits is made in the very bottom of the data link layer of the OSI model.
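
The two possible transmission orders for the bits of one byte can be illustrated with a small Python sketch. The function names are invented for the example, and the lists simply show the sequence in which bits would be placed on a serial line.

```python
def bits_lsb_first(byte: int) -> list:
    """Bits of an 8-bit value in low-bit-first transmission order."""
    return [(byte >> i) & 1 for i in range(8)]

def bits_msb_first(byte: int) -> list:
    """Bits of an 8-bit value in high-bit-first transmission order."""
    return [(byte >> i) & 1 for i in range(7, -1, -1)]

low_first = bits_lsb_first(0x0D)    # 0x0D is 00001101 in binary
high_first = bits_msb_first(0x0D)

print(low_first)    # [1, 0, 1, 1, 0, 0, 0, 0]
print(high_first)   # [0, 0, 0, 0, 1, 1, 0, 1]
```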

Other meanings

Some authors extend the usage of the word "endianness", and of related terms, to entities such as street addresses, date formats and others. Such usages, which basically reduce endianness to a mere synonym of ordering of the parts, are non-standard (e.g., ISO 8601:2004 talks about "descending order year-month-day", not about "big-endian format"), do not have widespread usage, and are generally (other than for date formats) employed in a metaphorical sense.

"Endianness" is sometimes used to describe the order of the components of a domain name, e.g. 'en.wikipedia.org' (the usual modern 'little-endian' form) versus the reverse-DNS 'org.wikipedia.en' ('big-endian', used for naming components, packages, or types in computer systems, for example Java packages, Macintosh ".plist" files, etc.). URLs can be considered 'big-endian', even though the host part could be a 'little-endian' DNS name.
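
The conversion between the two domain-name orders is a simple reversal of the dot-separated components, as the following Python sketch shows. The function name `reverse_dns` is made up for the illustration.

```python
def reverse_dns(name: str) -> str:
    """Reverse the order of the dot-separated components of a name."""
    return ".".join(reversed(name.split(".")))

big_endian_name = reverse_dns("en.wikipedia.org")
print(big_endian_name)                # org.wikipedia.en
print(reverse_dns(big_endian_name))   # en.wikipedia.org
```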

External links

The source of this article is wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.


Examples of storing the value `0A0B0C0D_h` in memory