TIS 620
Thai Industrial Standard 620-2533, commonly referred to as TIS-620, is the most common character set and character encoding for the Thai language. The standard is published by the Thai Industrial Standards Institute (TISI), an organ of the Ministry of Industry under the Royal Thai Government, and is the sole official standard for encoding Thai in Thailand. The descriptive name of the standard is "Standard for Thai Character Codes for Computers" (Thai: รหัสสำหรับอักขระไทยที่ใช้กับคอมพิวเตอร์). "2533" refers to year 2533 of the Buddhist Era (1990), the year the present version of the standard was published; a previous revision, TIS 620-2529 (1986), is now obsolete.

TIS-620 is also the IANA preferred charset name for this encoding. The same charset name is used for ISO/IEC 8859-11, which adds a no-break space character at 0xA0, a code that is unassigned in TIS-620. When the IANA name is used, the codes are supplemented with the C0 and C1 control codes from ISO/IEC 6429.

Structure

TIS-620 is a conventionally structured Extended ASCII national character set that retains full compatibility with 7-bit ASCII and uses the 8-bit range hex A1 to FB for encoding the Thai alphabet. Due to the complex combining nature of Thai vowels and diacritics, TIS-620 is intended for information interchange only, and an additional display engine is required to compose characters correctly.

Variants

A nearly identical version of TIS-620 was adopted as ISO/IEC 8859-11 in 2001, the sole difference being that ISO/IEC 8859-11 defines hex A0 as a non-breaking space, while TIS-620 leaves it undefined but reserved. (In practice, this small distinction is usually ignored.)

The ISO/IEC 8859-11 set has also been registered as ISO-IR-166 by Ecma International, but this variation adds explicit escape codes for signaling the beginning and end of Thai character sequences.

The TIS-620 character set ordering has been used essentially as is within Unicode (ISO/IEC 10646) as well. Unicode's Thai block spans U+0E00 through U+0E7F, with the first character at U+0E01, and TIS-620 Thai characters can be converted to UTF-16 code units simply by subtracting hex A0 from each byte value and prefixing the result with 0E. For example, byte A1 becomes U+0E01.
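The offset rule described above can be sketched in a few lines of Python. This is an illustration only: the function name `tis620_to_unicode` is hypothetical, bytes below hex 80 are assumed to pass through unchanged as ASCII and C0 controls, and every other byte outside A1-DA and DF-FB is rejected because TIS-620 assigns nothing there.

```python
def tis620_to_unicode(data: bytes) -> str:
    """Map TIS-620 bytes to a Unicode string using the A0 offset rule."""
    out = []
    for b in data:
        if b < 0x80:
            # Assumption: the lower half is passed through as ASCII.
            out.append(chr(b))
        elif 0xA1 <= b <= 0xDA or 0xDF <= b <= 0xFB:
            # Subtract hex A0, then place the result in the 0E00 row.
            out.append(chr(0x0E00 + (b - 0xA0)))
        else:
            raise ValueError(f"byte 0x{b:02X} is not assigned in TIS-620")
    return "".join(out)


# Byte A1 is THAI CHARACTER KO KAI, byte D2 is THAI CHARACTER SARA AA.
print(tis620_to_unicode(b"\xa1\xd2"))
```

In practice a library codec would normally be used instead; Python, for instance, ships a `tis_620` codec, and the sketch is only meant to make the arithmetic visible.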

Codepage layout

[Code chart: TIS-620 code values laid out in rows of 16; the cell contents are not recoverable from this copy.]

In the table above, 20 is the regular SPACE character. Code values 00-1F, 7F, 80-9F, A0, DB-DE and FC-FF are not assigned to characters by TIS-620.
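The unassigned ranges listed above translate directly into a lookup. The following Python sketch assumes nothing beyond those ranges; the names `UNASSIGNED_RANGES` and `is_assigned` are made up for illustration.

```python
# Inclusive (low, high) byte ranges that TIS-620 itself leaves unassigned,
# copied from the list in the text.
UNASSIGNED_RANGES = [
    (0x00, 0x1F),
    (0x7F, 0x7F),
    (0x80, 0x9F),
    (0xA0, 0xA0),
    (0xDB, 0xDE),
    (0xFC, 0xFF),
]


def is_assigned(b: int) -> bool:
    """Return True if TIS-620 assigns a character to byte value b."""
    if not 0 <= b <= 0xFF:
        raise ValueError("byte value out of range")
    return not any(low <= b <= high for low, high in UNASSIGNED_RANGES)


# Count the assigned code values across the whole 8-bit space.
assigned_count = sum(is_assigned(b) for b in range(256))
print(assigned_count)
```

Note that this reflects TIS-620 proper; under the IANA charset name the control-code ranges are filled by the C0 and C1 sets, so a validator for that usage would accept more bytes.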

Code values D1, D4-DA and E7-EE are combining characters.
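As a rough illustration of what the combining code values mean for processing, the Python sketch below groups each base byte with the combining bytes that follow it. The names `COMBINING_BYTES`, `split_cells` and `unicode_agrees` are hypothetical, and the grouping is a simplification, not the composition logic of a real Thai display engine.

```python
import unicodedata

# Combining code values as listed in the text: D1, D4-DA, E7-EE.
COMBINING_BYTES = {0xD1, *range(0xD4, 0xDB), *range(0xE7, 0xEF)}


def split_cells(data: bytes) -> list[bytes]:
    """Attach each combining byte to the nearest preceding base byte."""
    cells: list[bytes] = []
    for b in data:
        if b in COMBINING_BYTES and cells:
            cells[-1] += bytes([b])
        else:
            cells.append(bytes([b]))
    return cells


def unicode_agrees() -> bool:
    """Check the list against Unicode's nonspacing marks (category Mn)."""
    marks = {
        b
        for b in range(0xA1, 0xFC)
        if unicodedata.category(chr(0x0E00 + b - 0xA0)) == "Mn"
    }
    return marks == COMBINING_BYTES


# B7 D5 E8 is a consonant with a vowel sign and a tone mark; B9 is a consonant.
print(split_cells(b"\xb7\xd5\xe8\xb9"))
```

The `unicode_agrees` helper relies on the A0 offset between TIS-620 bytes and the Unicode Thai block, so it only cross-checks the list and does not replace it.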

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 