HZ (character encoding)
The HZ character encoding is an encoding of GB2312 that was formerly commonly used in email and USENET postings. It was designed in 1989 by Fung Fung Lee (李楓峰) of Stanford University, and subsequently codified in 1995 into RFC 1843.

The HZ (short for Hanzi) encoding was invented to facilitate the use of Chinese characters through e-mail, which at that time only allowed 7-bit characters. Therefore, in lieu of standard ISO 2022 escape sequences (as in the case of ISO-2022-JP) or 8-bit characters (as in the case of EUC), the HZ code uses only printable, 7-bit characters to represent Chinese characters.

It was also popular on USENET, which in the late 1980s and early 1990s generally did not allow transmission of 8-bit characters or escape characters.

Structure and use

In the HZ encoding system, the character sequences "~{" and "~}" act as escape sequences; anything between them is interpreted as Chinese encoded in GB2312 (the most significant bit of each byte is ignored). Outside the escape sequences, characters are assumed to be ASCII.
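As a quick illustration of the escape sequences, Python's standard library includes a codec named "hz". The snippet below is only a minimal sketch using that codec; the sample string is made up for illustration, and other decoders may behave differently.

```python
# Round-trip a short mixed ASCII/Chinese string through Python's built-in "hz" codec.
text = "one: 一"  # illustrative sample, not taken from the specification

encoded = text.encode("hz")      # the GB2312 character is wrapped in ~{ ... ~}
decoded = encoded.decode("hz")   # text outside the escapes is read as ASCII

print(encoded)                   # only printable 7-bit bytes
print(decoded == text)
```

Because the encoded form contains only 7-bit bytes, it could pass through mail and news systems that would have stripped or rejected 8-bit data.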

An example will help illustrate the relationship between GB2312, EUC-CN, and the HZ code:

Various forms of the GB2312 code (0xD2BB) for the character "一" (one)

Form                        Code         With escape sequences            Remarks
Kuten / Qūwèi / 区位 form   5027         -                                Zone (ku/qū/区) 50, point (ten/wèi/位) 27
ISO 2022 form               52₁₆ 3B₁₆    0E₁₆ 52₁₆ 3B₁₆ 0F₁₆              50 + 32 = 82 = 52₁₆
EUC-CN form                 D2₁₆ BB₁₆    D2₁₆ BB₁₆                        52₁₆ ∨ 80₁₆ = D2₁₆
HZ form (standard)          52₁₆ 3B₁₆    7E₁₆ 7B₁₆ 52₁₆ 3B₁₆ 7E₁₆ 7D₁₆    Appears as ~{R;~} without HZ decoder
HZ form (alternate)         D2₁₆ BB₁₆    7E₁₆ 7B₁₆ D2₁₆ BB₁₆ 7E₁₆ 7D₁₆    EUC form acceptable to at least some decoders
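
The arithmetic in the Remarks column can be sketched in Python as follows. The function name forms_from_quwei and the dictionary keys are hypothetical names chosen for this illustration; the only facts used are the ones in the table (add 32 to the zone and point, then OR with 0x80 for EUC-CN).

```python
def forms_from_quwei(zone: int, point: int) -> dict:
    """Derive the byte forms of a GB2312 character from its zone/point pair."""
    iso = bytes([zone + 32, point + 32])   # ISO 2022 (7-bit) form: 50 + 32 = 0x52
    euc = bytes(b | 0x80 for b in iso)     # EUC-CN form: 0x52 | 0x80 = 0xD2
    return {
        "iso2022": iso,
        "euc_cn": euc,
        "hz": b"~{" + iso + b"~}",             # standard HZ form
        "hz_alternate": b"~{" + euc + b"~}",   # EUC bytes between HZ escapes
    }

forms = forms_from_quwei(50, 27)           # zone 50, point 27 is "一"
for name, value in forms.items():
    print(name, value.hex(" ").upper())
```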


HZ was originally designed to be used purely as a 7-bit code. However, when situations allow, the escape sequences "~{" and "~}" sometimes surround characters represented in EUC-CN; this alternative use allows Chinese to be readable either with the help of HZ decoder software, or with a system that understands EUC-CN.

Additionally, the specification defines that
  • the sequence "~~" is to be treated as encoding a single ASCII "~"
  • the character "~" followed by a newline is to be discarded.

However, not all HZ decoders follow these two rules.
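
The escape sequences and the two rules above can be sketched as a small decoder in Python. This is an illustrative sketch, not the reference implementation: the name hz_decode is hypothetical, the decoder sets the high bit of every byte between the escapes so that both the standard and the alternate (EUC) forms are accepted, and it passes a stray "~" through unchanged, which is a leniency assumed here rather than something taken from the specification.

```python
def hz_decode(data: bytes) -> str:
    """Decode HZ bytes to str, honouring "~{", "~}", "~~" and "~" + newline."""
    out = []
    i = 0
    in_gb = False                      # True between "~{" and "~}"
    while i < len(data):
        pair = data[i:i + 2]
        if in_gb:
            if pair == b"~}":
                in_gb = False          # back to ASCII mode
            elif len(pair) < 2:
                raise ValueError("truncated two-byte character")
            else:
                # Force the high bits on so 7-bit and EUC-CN input decode alike.
                euc = bytes(b | 0x80 for b in pair)
                out.append(euc.decode("gb2312"))
            i += 2
        elif pair == b"~{":
            in_gb = True
            i += 2
        elif pair == b"~~":
            out.append("~")            # "~~" encodes a single ASCII "~"
            i += 2
        elif pair == b"~\n":
            i += 2                     # "~" followed by a newline is discarded
        else:
            out.append(data[i:i + 1].decode("ascii"))
            i += 1
    return "".join(out)

print(hz_decode(b"one ~{R;~} two"))
```

A decoder that skips the "~~" and "~" plus newline branches would instead show those bytes literally.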

HZ decoders

The first HZ decoder was written in 1989 by the code's inventor for the Unix operating system.

The hztty program, also for the Unix operating system, was also among the first and one of the most popular HZ decoders. It deviates from the specification in that it will display the escape sequences (i.e., "~{" and "~}"), and it does not treat "~~" and "~" followed by a newline specially. This was probably to allow software which assumes one character to occupy one screen position (on a text screen) to function correctly without modification.

Support on Microsoft Windows came later, and a number of third-party "Chinese systems" support HZ. These systems may provide an option to hide the escape sequences.
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 