Comparison of Unicode encodings
Encyclopedia
This article compares Unicode
Unicode
Unicode is a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems...

 encodings. Two situations are considered: 8-bit-clean environments and environments that forbid use of byte
Byte
The byte is a unit of digital information in computing and telecommunications that most commonly consists of eight bits. Historically, a byte was the number of bits used to encode a single character of text in a computer and for this reason it is the basic addressable element in many computer...

 values that have the high bit set. Originally such prohibitions were to allow for links that used only seven data bits, but they remain in the standards and so software must generate messages that comply with the restrictions. Standard Compression Scheme for Unicode
Standard Compression Scheme for Unicode
The Standard Compression Scheme for Unicode is a Unicode Technical Standard for reducing the number of bytes needed to represent Unicode text, especially if that text uses mostly characters from one or a small number of per-language character blocks. It does so by dynamically mapping values in the...

 and Binary Ordered Compression for Unicode
Binary Ordered Compression for Unicode
Binary Ordered Compression for Unicode is a MIME compatible Unicode compression scheme. BOCU-1 combines the wide applicability of UTF-8 with the compactness of Standard Compression Scheme for Unicode . This Unicode encoding is designed to be useful for compressing short strings, and maintains code...

 are excluded from the comparison tables because it is difficult to simply quantify their size.

Compatibility issues

A UTF-8
UTF-8
UTF-8 is a multibyte character encoding for Unicode. Like UTF-16 and UTF-32, UTF-8 can represent every character in the Unicode character set. Unlike them, it is backward-compatible with ASCII and avoids the complications of endianness and byte order marks...

 file that contains only ASCII
ASCII
The American Standard Code for Information Interchange is a character-encoding scheme based on the ordering of the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that use text...

 characters is identical to an ASCII file. Legacy programs can generally handle UTF-8 encoded files even if they contain non-ASCII characters. For instance the C printf
Printf
Printf format string refers to a control parameter used by a class of functions typically associated with some types of programming languages. The format string specifies a method for rendering an arbitrary number of varied data type parameter into a string...

 function can print a UTF-8 format string, as it only looks for the byte matching the ASCII '%' character and prints all other bytes unchanged, thus any UTF-8 (which never contains a '%' byte) will be copied unchanged to the output.

UTF-16 and UTF-32 are incompatible with ASCII files, and thus require Unicode
Unicode
Unicode is a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems...

-aware programs to display, print and manipulate them, even if the file is known to contain only characters in the ASCII subset. Because they contain many zero bytes, the strings cannot be manipulated by normal null-terminated string
Null-terminated string
In computer programming, a null-terminated string is a character string stored as an array containing the characters and terminated with a null character...

 handling for even simple operations such as copy.

Therefore even most UTF-16 systems such as Windows and Java store text files such as program code with 8-bit encodings (ASCII, ISO-8859-1, or UTF-8), not UTF-16. One of the few counterexamples of a UTF-16 file is the "strings" file used by Mac OS X
Mac OS X
Mac OS X is a series of Unix-based operating systems and graphical user interfaces developed, marketed, and sold by Apple Inc. Since 2002, has been included with all new Macintosh computer systems...

 (10.3 and later) applications for lookup of internationalized versions of messages, these default to UTF-16 and "files encoded using UTF-8 are not guaranteed to work. When in doubt, encode the file using UTF-16". This is because the default string class in Mac OS X (NSString) stores characters in UTF-16.

XML
XML
Extensible Markup Language is a set of rules for encoding documents in machine-readable form. It is defined in the XML 1.0 Specification produced by the W3C, and several other related specifications, all gratis open standards....

 is by default encoded as UTF-8, and all XML processors must at least support UTF-8 (including US-ASCII by definition) and UTF-16.

Size issues

UTF-32/UCS-4
UTF-32/UCS-4
UTF-32 is a protocol to encode Unicode characters that uses exactly 32 bits per Unicode code point. All other Unicode transformation formats use variable-length encodings. The UTF-32 form of a character is a direct representation of its codepoint.The main advantage of UTF-32, versus variable...

 requires four bytes to encode any character. Since characters outside the basic multilingual plane (BMP)
Mapping of Unicode character planes
In the Unicode system, planes are groups of numerical values that point to specific characters. Unicode code points are logically divided into 17 planes, each with 65,536 code points. Planes are identified by the numbers 0 to 16decimal, which corresponds with the possible values 00-10hexadecimal...

 are typically rare, a document encoded in UTF-32 will often be nearly twice as large as its UTF-16/UCS-2–encoded equivalent because UTF-16 uses two bytes for the characters inside the BMP, or four bytes otherwise.

UTF-8 uses between one and four bytes to encode a character. It requires one byte for ASCII characters, making it half the space of UTF-16 for texts consisting only of ASCII. For other Latin characters and many non-Latin scripts it requires two bytes, the same as UTF-16. Only a few frequently used Western characters in the range U+0800 to U+FFFF, such as the € sign
Euro sign
The euro sign is the currency sign used for the euro, the official currency of the Eurozone in the European Union . The design was presented to the public by the European Commission on 12 December 1996. The international three-letter code for the euro is EUR...

 U+20AC, require three bytes in UTF-8. Characters outside of the BMP
Mapping of Unicode character planes
In the Unicode system, planes are groups of numerical values that point to specific characters. Unicode code points are logically divided into 17 planes, each with 65,536 code points. Planes are identified by the numbers 0 to 16decimal, which corresponds with the possible values 00-10hexadecimal...

 above U+FFFF need four bytes in UTF-8 and UTF-16.

The conservation of bytes in encoding files to a Unicode transformation format (UTF) depends on encoded code point
Code point
In character encoding terminology, a code point or code position is any of the numerical values that make up the code space . For example, ASCII comprises 128 code points in the range 0hex to 7Fhex, Extended ASCII comprises 256 code points in the range 0hex to FFhex, and Unicode comprises 1,114,112...

s, namely, blocks from which those code points are drawn. Say, it depends on the scripts
Alphabet
An alphabet is a standard set of letters—basic written symbols or graphemes—each of which represents a phoneme in a spoken language, either as it exists now or as it was in the past. There are other systems, such as logographies, in which each character represents a word, morpheme, or semantic...

 in use. For example, UTF-16 use less space than UTF-32 only for characters from BMP, which are though overwhelmingly most common of all Unicode. In the same way using characters predominantly from the UTF-8 scripts makes UTF-8 more space efficient than UTF-16. The UTF-8 scripts are those scripts where UTF-8 only requires fewer than three bytes per character (only one byte for the ASCII-equivalent Basic Latin block, digits and most punctuation marks) and include: Latin
Latin alphabet
The Latin alphabet, also called the Roman alphabet, is the most recognized alphabet used in the world today. It evolved from a western variety of the Greek alphabet called the Cumaean alphabet, which was adopted and modified by the Etruscans who ruled early Rome...

, Greek
Greek alphabet
The Greek alphabet is the script that has been used to write the Greek language since at least 730 BC . The alphabet in its classical and modern form consists of 24 letters ordered in sequence from alpha to omega...

, Cyrillic
Cyrillic alphabet
The Cyrillic script or azbuka is an alphabetic writing system developed in the First Bulgarian Empire during the 10th century AD at the Preslav Literary School...

, Armenian
Armenian alphabet
The Armenian alphabet is an alphabet that has been used to write the Armenian language since the year 405 or 406. It was devised by Saint Mesrop Mashtots, an Armenian linguist and ecclesiastical leader, and contained originally 36 letters. Two more letters, օ and ֆ, were added in the Middle Ages...

, Hebrew
Hebrew alphabet
The Hebrew alphabet , known variously by scholars as the Jewish script, square script, block script, or more historically, the Assyrian script, is used in the writing of the Hebrew language, as well as other Jewish languages, most notably Yiddish, Ladino, and Judeo-Arabic. There have been two...

, Arabic
Arabic alphabet
The Arabic alphabet or Arabic abjad is the Arabic script as it is codified for writing the Arabic language. It is written from right to left, in a cursive style, and includes 28 letters. Because letters usually stand for consonants, it is classified as an abjad.-Consonants:The Arabic alphabet has...

, Syriac
Syriac alphabet
The Syriac alphabet is a writing system primarily used to write the Syriac language from around the 2nd century BC . It is one of the Semitic abjads directly descending from the Aramaic alphabet and shares similarities with the Phoenician, Hebrew, Arabic, and the traditional Mongolian alphabets.-...

, Thaana, N'Ko
N'Ko
N'Ko is both a script devised by Solomana Kante in 1949 as a writing system for the Mande languages of West Africa, and the name of the literary language itself written in the script. The term N'Ko means 'I say' in all Manding languages....

, and the IPA
International Phonetic Alphabet
The International Phonetic Alphabet "The acronym 'IPA' strictly refers [...] to the 'International Phonetic Association'. But it is now such a common practice to use the acronym also to refer to the alphabet itself that resistance seems pedantic...

 and other Latin-based phonetic alphabets.

All printable characters in UTF-EBCDIC
UTF-EBCDIC
UTF-EBCDIC is a character encoding used to represent Unicode characters. It is meant to be EBCDIC-friendly, so that legacy EBCDIC applications on mainframes may process the characters without much difficulty. Its advantages for existing EBCDIC-based systems are similar to UTF-8's advantages for...

 use at least as many bytes as in UTF-8, and most use more, due to a decision made to allow encoding the C1 control codes as single bytes.

For seven-bit environments, UTF-7
UTF-7
UTF-7 is a variable-length character encoding that was proposed for representing Unicode text using a stream of ASCII characters...

 is more space efficient than the combination of other Unicode encodings with quoted-printable
Quoted-printable
Quoted-printable, or QP encoding, is an encoding using printable ASCII characters to transmit 8-bit data over a 7-bit data path or, generally, over a medium which is not 8-bit clean...

 or base64
Base64
Base64 is a group of similar encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation...

 for almost all types of text (see "Seven bit environments" below).

Processing issues

For processing, a format should be easy to search, truncate, and generally process safely. All normal Unicode encodings use some form of fixed size code unit. Depending on the format and the code point to be encoded, one or more of these code units will represent a Unicode code point
Code point
In character encoding terminology, a code point or code position is any of the numerical values that make up the code space . For example, ASCII comprises 128 code points in the range 0hex to 7Fhex, Extended ASCII comprises 256 code points in the range 0hex to FFhex, and Unicode comprises 1,114,112...

. To allow easy searching and truncation, a sequence must not occur within a longer sequence or across the boundary of two other sequences. UTF-8, UTF-16, UTF-32 and UTF-EBCDIC have these important properties but UTF-7
UTF-7
UTF-7 is a variable-length character encoding that was proposed for representing Unicode text using a stream of ASCII characters...

 and GB 18030
GB 18030
GB18030 is a Chinese government standard describing the required language and character support necessary for software in China. In addition to the "GB18030 code page" this standard contains requirements about which scripts must be supported, font support, etc....

 do not.

Fixed-size characters can be helpful, but even if there is a fixed byte count per code point (as in UTF-32), there is not a fixed byte count per displayed character due to combining character
Combining character
In digital typography, combining characters are characters that are intended to modify other characters. The most common combining characters in the Latin script are the combining diacritical marks ....

s. If you are working with a particular API
Application programming interface
An application programming interface is a source code based specification intended to be used as an interface by software components to communicate with each other...

 heavily and that API has standardised on a particular Unicode encoding, it is generally a good idea to use the encoding that the API does to avoid the need to convert before every call to the API. Similarly if you are writing server-side software, it may simplify matters to use the same format for processing that you are communicating in.

UTF-16 is popular because many APIs date to the time when Unicode was 16-bit fixed width. However, using UTF-16 makes characters outside the Basic Multilingual Plane
Mapping of Unicode character planes
In the Unicode system, planes are groups of numerical values that point to specific characters. Unicode code points are logically divided into 17 planes, each with 65,536 code points. Planes are identified by the numbers 0 to 16decimal, which corresponds with the possible values 00-10hexadecimal...

 a special case which increases the risk of oversights related to their handling. That said, programs that mishandle surrogate pairs probably also have problems with combining sequences, so using UTF-32 is unlikely to solve the more general problem of poor handling of multi-code-unit characters.

If any stored data is in UTF-8 (such as file contents or names), it is very difficult to write a system that uses UTF-16 or UTF-32 as an api. This is due to the often-overlooked fact that the byte array used by UTF-8 can physically contain invalid sequences. For instance it is impossible to fix an invalid UTF-8 filename using a UTF-16 api, as no possible UTF-16 string will translate to that invalid filename. The opposite is not true, it is trivial to translate invalid UTF-16 to a unique (though technically invalid) UTF-8 string, so a UTF-8 API can control both UTF-8 and UTF-16 files and names, making UTF-8 preferred in any such mixed environment. (An unfortunate but far more common "solution" used by UTF-16 systems is to interpret the UTF-8 as some other encoding such as cp1252 and ignore the mojibake
Mojibake
, from the Japanese 文字 "character" + 化け "change", is the occurrence of incorrect, unreadable characters shown when computer software fails to render text correctly according to its associated character encoding.-Causes:...

 for any non-ASCII data)

For communication and storage

UTF-16 and UTF-32 are not byte oriented, so a byte order must be selected when transmitting them over a byte-oriented network or storing them in a byte-oriented file. This may be achieved by standardising on a single byte order, by specifying the endianness
Endianness
In computing, the term endian or endianness refers to the ordering of individually addressable sub-components within the representation of a larger data item as stored in external memory . Each sub-component in the representation has a unique degree of significance, like the place value of digits...

 as part of external metadata (for example the MIME
MIME
Multipurpose Internet Mail Extensions is an Internet standard that extends the format of email to support:* Text in character sets other than ASCII* Non-text attachments* Message bodies with multiple parts...

 charset registry has distinct UTF-16BE and UTF-16LE registrations) or by using a byte-order mark at the start of the text. UTF-8 is byte-oriented and does not have this problem.

If the byte stream is subject to corruption
Data corruption
Data corruption refers to errors in computer data that occur during writing, reading, storage, transmission, or processing, which introduce unintended changes to the original data...

 then some encodings recover better than others. UTF-8 and UTF-EBCDIC are best in this regard as they can always resynchronize at the start of the next code point, GB 18030 is unable to recover after a corrupt or missing byte until the next ASCII non-number. UTF-16 and UTF-32 will handle corrupt (altered) bytes by resynchronizing on the next good code point, but an odd number of lost or spurious byte (octet)
Octet (computing)
An octet is a unit of digital information in computing and telecommunications that consists of eight bits. The term is often used when the term byte might be ambiguous, as there is no standard for the size of the byte.-Overview:...

s will garble all following text.

In detail

The tables below list the number of bytes per code point for different Unicode ranges. Any additional comments needed are included in the table. The figures assume that overheads at the start and end of the block of text are negligible.


N.B. The tables below list numbers of bytes per code point, not per user visible "character" (or "grapheme cluster"). It can take multiple code points to describe a single grapheme cluster, so even in UTF-32, care must be taken when splitting or concatenating strings.

Eight-bit environments

Code range (hexadecimal) UTF-8
UTF-8
UTF-8 is a multibyte character encoding for Unicode. Like UTF-16 and UTF-32, UTF-8 can represent every character in the Unicode character set. Unlike them, it is backward-compatible with ASCII and avoids the complications of endianness and byte order marks...

 
UTF-16  UTF-32  UTF-EBCDIC
UTF-EBCDIC
UTF-EBCDIC is a character encoding used to represent Unicode characters. It is meant to be EBCDIC-friendly, so that legacy EBCDIC applications on mainframes may process the characters without much difficulty. Its advantages for existing EBCDIC-based systems are similar to UTF-8's advantages for...

 
GB 18030
GB 18030
GB18030 is a Chinese government standard describing the required language and character support necessary for software in China. In addition to the "GB18030 code page" this standard contains requirements about which scripts must be supported, font support, etc....

000000 – 00007F 1 2 4 1 1
000080 – 00009F 2 2 for characters inherited from
GB 2312
GB 2312
GB2312 is the registered internet name for a key official character set of the People's Republic of China, used for simplified Chinese characters...

/GBK
GBK
GBK is an extension of the GB2312 character set for simplified Chinese characters, used in the People's Republic of China.GB abbreviates Guojia Biaozhun , which means national standard in Chinese, while K stands for Extension...

 (e.g. most
Chinese characters) 4 for
everything else.
0000A0 – 0003FF 2
000400 – 0007FF 3
000800 – 003FFF 3
004000 – 00FFFF 4
010000 – 03FFFF 4 4 4
040000 – 10FFFF 5

Seven-bit environments

This table may not cover every special case and so should be used for estimation and comparison only. To accurately determine the size of text in an encoding, see the actual specifications.
Code range (hexadecimal) UTF-7 UTF-8 quoted-
printable
Quoted-printable
Quoted-printable, or QP encoding, is an encoding using printable ASCII characters to transmit 8-bit data over a 7-bit data path or, generally, over a medium which is not 8-bit clean...

UTF-8 base64
Base64
Base64 is a group of similar encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation...

UTF-16 q.-p. UTF-16 base64 GB 18030 q.-p. GB 18030 base64
ASCII
graphic character
Graphic character
In ISO/IEC 646 and related standards including ISO 8859 and Unicode, a graphic character is any character intended to be written, printed, or otherwise displayed in a form that can be read by humans...

s
(except U+003D “=”)
1 for "direct characters" (depends on the encoder setting for some code points), 2 for U+002B “+”, otherwise same as for 000080 – 00FFFF 1 1⅓ 4 2⅔ 1 1⅓
00003D (equals sign) 3 6 3
ASCII
control character
Control character
In computing and telecommunication, a control character or non-printing character is a code point in a character set, that does not in itself represent a written symbol.It is in-band signaling in the context of character encoding....

s:
000000 – 00001F
and 00007F
as above, depending on directness 1 or 3 depending on directness 1 or 3 depending on directness
000080 – 0007FF 5 for an isolated case inside a run of single byte characters. For runs 2⅔ per character plus padding to make it a whole number of bytes plus two to start and finish the run 6 2⅔ 2–6 depending on if the byte values need to be escaped 4–6 for characters inherited from GB2312/GBK (e.g.
most Chinese characters) 8 for everything else.
2⅔ for characters inherited from GB2312/GBK (e.g.
most Chinese characters) 5⅓ for everything else.
000800 – 00FFFF 9 4
010000 – 10FFFF 8 for isolated case, 5⅓ per character plus padding to integer plus 2 for a run 12 5⅓ 8–12 depending on if the low bytes of the surrogates need to be escaped. 5⅓ 8 5⅓

Size of codes for UTF-16 do not differ for -LE and -BE versions of UTF-16.
The use of UTF-32 under quoted-printable is highly impratical, but if implemented, will result in 8–12 bytes per code point (about 10 bytes in average), namely for BMP, each code point will occupy exactly 6 bytes more than the same code in quoted-printable/UTF-16. Base64/UTF-32 gets 5⅓ bytes for any code point. Endianness also does not affect sizes for UTF-32.

An ASCII control character under quoted-printable or UTF-7 may be represented either directly or encoded (escaped). The need to escape a given control character depends on many circumstances, but newline
Newline
In computing, a newline, also known as a line break or end-of-line marker, is a special character or sequence of characters signifying the end of a line of text. The name comes from the fact that the next character after the newline will appear on a new line—that is, on the next line below the...

s in text data are usually coded directly.

Compression schemes

BOCU-1
Binary Ordered Compression for Unicode
Binary Ordered Compression for Unicode is a MIME compatible Unicode compression scheme. BOCU-1 combines the wide applicability of UTF-8 with the compactness of Standard Compression Scheme for Unicode . This Unicode encoding is designed to be useful for compressing short strings, and maintains code...

 and SCSU
Standard Compression Scheme for Unicode
The Standard Compression Scheme for Unicode is a Unicode Technical Standard for reducing the number of bytes needed to represent Unicode text, especially if that text uses mostly characters from one or a small number of per-language character blocks. It does so by dynamically mapping values in the...

 are two ways to compress Unicode data. Their encoding
Character encoding
A character encoding system consists of a code that pairs each character from a given repertoire with something else, such as a sequence of natural numbers, octets or electrical pulses, in order to facilitate the transmission of data through telecommunication networks or storage of text in...

 relies on how frequently the text is used. Most runs of text use the same script; for example, Latin
Latin alphabet
The Latin alphabet, also called the Roman alphabet, is the most recognized alphabet used in the world today. It evolved from a western variety of the Greek alphabet called the Cumaean alphabet, which was adopted and modified by the Etruscans who ruled early Rome...

, Cyrillic
Cyrillic alphabet
The Cyrillic script or azbuka is an alphabetic writing system developed in the First Bulgarian Empire during the 10th century AD at the Preslav Literary School...

, Greek
Greek alphabet
The Greek alphabet is the script that has been used to write the Greek language since at least 730 BC . The alphabet in its classical and modern form consists of 24 letters ordered in sequence from alpha to omega...

 and so on. This normal use allows many runs of text to compress down to about 1 byte per code point. These stateful encodings make it more difficult to randomly access text at any position of a string.

These two compression schemes are not as efficient as other compression schemes, like zip
ZIP (file format)
Zip is a file format used for data compression and archiving. A zip file contains one or more files that have been compressed, to reduce file size, or stored as is...

 or bzip2
Bzip2
bzip2 is a free and open source implementation of the Burrows–Wheeler algorithm. It is developed and maintained by Julian Seward. Seward made the first public release of bzip2, version 0.15, in July 1996.-Compression efficiency:...

. Those general-purpose compression schemes can compress longer runs of bytes to just a few bytes. The SCSU
Standard Compression Scheme for Unicode
The Standard Compression Scheme for Unicode is a Unicode Technical Standard for reducing the number of bytes needed to represent Unicode text, especially if that text uses mostly characters from one or a small number of per-language character blocks. It does so by dynamically mapping values in the...

 and BOCU-1
Binary Ordered Compression for Unicode
Binary Ordered Compression for Unicode is a MIME compatible Unicode compression scheme. BOCU-1 combines the wide applicability of UTF-8 with the compactness of Standard Compression Scheme for Unicode . This Unicode encoding is designed to be useful for compressing short strings, and maintains code...

 compression schemes will not compress more than the theoretical 25% of text encoded as UTF-8, UTF-16 or UTF-32. Other general-purpose compression schemes can easily compress to 10% of original text size. The general purpose schemes require more complicated algorithms and longer chunks of text for a good compression ratio.

Unicode Technical Note #14 contains a more detailed comparison of compression schemes.

Historical: UTF-5 and UTF-6

Proposals have been made for a UTF-5 and UTF-6 for the internationalization of domain names
Internationalized domain name
An internationalized domain name is an Internet domain name that contains at least one label that is displayed in software applications, in whole or in part, in a language-specific script or alphabet, such as Arabic, Chinese, Russian, Hindi or the Latin alphabet-based characters with diacritics,...

 (IDN). The UTF-5 proposal used a base 32 encoding, where Punycode
Punycode
In computing, Punycode is an instance of a general encoding syntax by which a string of Unicode characters is transformed uniquely and reversibly into a smaller, restricted character set....

 is (among other things, and not exactly) a base 36
Base 36
Base 36 is a positional numeral system using 36 as the radix. The choice of 36 is convenient in that the digits can be represented using the Arabic numerals 0-9 and the Latin letters A-Z...

 encoding. explains the name UTF-5 for a code unit of 5 bits. The UTF-6 proposal added a running length encoding to UTF-5, here 6 simply stands for UTF-5 plus 1.
The IETF
Internet Engineering Task Force
The Internet Engineering Task Force develops and promotes Internet standards, cooperating closely with the W3C and ISO/IEC standards bodies and dealing in particular with standards of the TCP/IP and Internet protocol suite...

 IDN WG later adopted the more efficient Punycode
Punycode
In computing, Punycode is an instance of a general encoding syntax by which a string of Unicode characters is transformed uniquely and reversibly into a smaller, restricted character set....

 for this purpose.

Not being seriously pursued

UTF-1
UTF-1
UTF-1 is a way of transforming ISO 10646/Unicode into a stream of bytes. Due to the design it is not possible to resynchronise if decoding starts in the middle of a character and simple byte-oriented search routines cannot be reliably used with it. UTF-1 is also fairly slow due to its use of...

 never gained serious acceptance. UTF-8 is much more frequently used.

UTF-9 and UTF-18
UTF-9 and UTF-18
UTF-9 and UTF-18 were two April Fools' Day RFC joke specifications for encoding Unicode on systems where the nonet is a better fit for the native word size than the octet, such as the 36-bit PDP-10...

, despite being theoretically functional encodings, were not intended for practical use, mostly because systems using 9-bit bytes were largely extinct by the time they were designed.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK