KOI character encodings
KOI is a family of several code pages for the Cyrillic alphabet.
The name stands for Kod Obmena Informatsiey which means "Code for Information Exchange".

A particular feature of the KOI code pages is that the text remains human-readable when the leftmost bit is stripped, should it inadvertently pass through equipment or software that can only deal with 7-bit characters. This is because each Cyrillic letter is placed 128 code points away from the Latin letter it most closely corresponds to. This ordering, however, does not match the alphabetical order of any language written in Cyrillic, so lookup tables are needed to perform sorting.
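Both effects can be illustrated with a short sketch. It assumes Python's built-in koi8_r codec as a representative member of the family; the helper names strip_high_bit and alphabetical_key and the sample words are made up for illustration and are not part of any standard.

```python
# Sketch only: uses Python's built-in "koi8_r" codec as a stand-in for the KOI family.

def strip_high_bit(raw: bytes) -> str:
    """Simulate a 7-bit channel by clearing the top bit of every byte."""
    return bytes(b & 0x7F for b in raw).decode("ascii")

# Lookup table: KOI8-R byte value -> position in the Russian alphabet.
# Lowercase letters only, to keep the sketch short.
ALPHABET = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
RANK = {ch.encode("koi8_r")[0]: pos for pos, ch in enumerate(ALPHABET)}

def alphabetical_key(raw: bytes) -> list:
    """Sort key that maps each byte through the lookup table.
    Bytes outside the table sort after all letters, in byte order."""
    return [RANK.get(b, len(ALPHABET) + b) for b in raw]

text = "Привет"
print(strip_high_bit(text.encode("koi8_r")))  # pRIWET

words = [w.encode("koi8_r") for w in ("вода", "юг", "арка")]
by_bytes = [w.decode("koi8_r") for w in sorted(words)]
by_table = [w.decode("koi8_r") for w in sorted(words, key=alphabetical_key)]
print(by_bytes)  # ['юг', 'арка', 'вода']
print(by_table)  # ['арка', 'вода', 'юг']
```

The stripped text comes out with inverted letter case because, in KOI8-R, lowercase Cyrillic letters sit 128 code points above uppercase Latin letters and vice versa. Sorting the raw bytes puts "юг" first, while the table-based key restores alphabetical order.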

These encodings are derived from ASCII on the basis of a nearly phonetic correspondence between Latin and Cyrillic letters, which was already used in the Russian variant of Morse code and in the MTK-2 telegraph code.

KOI8

Modern KOI code pages are 8-bit extensions of ASCII.
This family of encodings is also known as KOI8, KOI 8 and KOI-8.

The family members are:
  • KOI8-R for Russian and Bulgarian
  • KOI8-U and KOI8-RU for Ukrainian and Belarusian
  • KOI8-T for Tajik
  • KOI8-CS for Czech and Slovak (ČSN (Czech technical standard) 369103, devised by the Comecon. This encoded Latin with diacritics, as used in Czech and Slovak, rather than Cyrillic, but the basic idea was the same: text was meant to remain legible with the 8th bit cleared, so that e.g. Č became C.)
  • KOI8-O for Old Russian

KOI7

There is also an obsolete 7-bit KOI7 code page, which does not contain lowercase letters.
The codes of the 31 Russian uppercase letters are simply their KOI8 codes with the most significant bit cleared. Other code points are the same as in ASCII.
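
The relationship between the two code pages can be sketched as follows. Python ships no KOI7 codec, so this derives the codes from its koi8_r codec. The names UPPERCASE and koi7_codes are illustrative, and the choice to skip the letter that would land on 0x7F is an assumption made here so that the count matches the 31 letters mentioned above.

```python
# Sketch only: derives the codes from Python's "koi8_r" codec; Python has no KOI7 codec.
UPPERCASE = "АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"  # 32 letters, Ё left out

koi7_codes = {}
for letter in UPPERCASE:
    code = letter.encode("koi8_r")[0] & 0x7F  # clear the most significant bit
    if code == 0x7F:
        # Assumption: 0x7F stays the DEL control character, so the letter
        # that would land there (Ъ) gets no 7-bit code in this sketch.
        continue
    koi7_codes[letter] = code

print(len(koi7_codes))       # 31
print(hex(koi7_codes["А"]))  # 0x61
```

The resulting codes fall in the range that ASCII uses for lowercase Latin letters.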

External links

  • http://koi8.pp.ru/main.html
  • http://www.orwell.ru/info/cyrsoup
  • http://czyborra.com/charsets/cyrillic.html
  • http://www.iis.ru/cyrillic/resource/tables.en.html
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 