KOI8-R

KOI8-R is an 8-bit character encoding designed to cover Russian, which uses the Cyrillic alphabet. It also happens to cover Bulgarian, but in practice it is not used for Bulgarian, since CP1251 is the accepted encoding there. A derivative encoding, KOI8-U, adds Ukrainian characters. The original KOI-8 encoding was designed by Soviet authorities in 1974.

KOI8 remains much more commonly used than ISO 8859-5, which never really caught on. Another common Cyrillic character encoding is Windows-1251. These older code pages are being replaced by Unicode, which is now the more common way to represent Cyrillic alongside other non-Latin scripts.
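As a rough illustration of how the encodings named above differ, the sketch below encodes the same Cyrillic text with each of them. It assumes Python's standard-library codecs (`koi8_r`, `cp1251`, `iso8859_5`, `utf-8`); the helper name `encode_all` is hypothetical and only serves this example.

```python
# Encode the same text under several Cyrillic-capable encodings and compare
# the resulting bytes. The codec names are those of Python's standard library.
CODECS = ["koi8_r", "cp1251", "iso8859_5", "utf-8"]


def encode_all(text):
    """Return a dict mapping each codec name to the encoded bytes as hex."""
    return {name: text.encode(name).hex() for name in CODECS}


if __name__ == "__main__":
    for name, hexbytes in encode_all("Я").items():
        print(f"{name:10} {hexbytes}")
```

The three 8-bit encodings each store a Cyrillic letter in one byte but disagree on which byte, so text written in one and read in another comes out garbled. UTF-8 uses two bytes per Cyrillic letter.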

In Microsoft Windows, KOI8-R is assigned code page number 20866. IBM assigns it code page 878.

In Russian, KOI8 stands for "Код Обмена Информацией, 8 бит" (Kod Obmena Informatsiey, 8 bit), which means "Code for Information Exchange, 8 bit".

The KOI8 character sets place the Russian Cyrillic letters in pseudo-Roman order rather than in the natural Cyrillic alphabetical order used by ISO 8859-5. Although this may seem unnatural, it has a useful property: if the 8th bit is stripped, the text remains partially readable in ASCII and may convert to syntactically correct KOI7. For instance, "Русский Текст" in KOI8-R becomes rUSSKIJ tEKST ("Russian Text") if the 8th bit is stripped; attempting to interpret the ASCII string rUSSKIJ tEKST as KOI7 yields "Русский Текст". The case is inverted because the lowercase Cyrillic letters occupy 0xC0 to 0xDF, which fall onto the uppercase ASCII range 0x40 to 0x5F once the 8th bit is cleared.
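The bit-stripping property can be checked with a short sketch. It assumes Python's built-in `koi8_r` codec; the function names are hypothetical. The reverse step sets the 8th bit on ASCII letters only, which is a simplification for illustration and not a full KOI7 decoder.

```python
# Demonstrate the "strip the 8th bit" property using Python's koi8_r codec.

def strip_high_bit(text):
    """Encode text as KOI8-R, clear bit 7 of every byte, and read it as ASCII."""
    raw = text.encode("koi8_r")
    return bytes(b & 0x7F for b in raw).decode("ascii")


def set_high_bit_on_letters(ascii_text):
    """Approximate the reverse step: set bit 7 on ASCII letters only.

    Assumption: non-letter bytes (spaces, digits, punctuation) are left
    untouched, so this is not a complete KOI7 interpretation.
    """
    raw = ascii_text.encode("ascii")
    out = bytes((b | 0x80) if chr(b).isalpha() else b for b in raw)
    return out.decode("koi8_r")


if __name__ == "__main__":
    print(strip_high_bit("Русский Текст"))
    print(set_high_bit_on_letters("rUSSKIJ tEKST"))
```

Run on the article's example, the first call gives the Latin-looking string and the second restores the Cyrillic one.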

Codepage layout

[Code chart for the upper half of KOI8-R (0x80 to 0xFF, eight rows of sixteen cells); the cell contents are missing from this copy.]
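A chart of the upper half of the code page can be regenerated from an existing codec. The sketch below assumes Python's built-in `koi8_r` codec as the source of the mapping; the function name `koi8r_upper_half` is hypothetical.

```python
# Rebuild an 8 x 16 chart of the upper half of KOI8-R (bytes 0x80 to 0xFF)
# from Python's built-in koi8_r codec.

def koi8r_upper_half():
    """Return 8 rows of 16 characters, one row per high nibble 0x8 to 0xF."""
    rows = []
    for high in range(0x8, 0x10):
        row = [bytes([high * 16 + low]).decode("koi8_r") for low in range(16)]
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Column header: the low nibble of each byte.
    print("    " + " ".join(f"{low:X}" for low in range(16)))
    for high, row in zip(range(0x8, 0x10), koi8r_upper_half()):
        print(f"{high:X}_  " + " ".join(row))
```

In the printed chart, the rows for 0xC_ to 0xF_ hold the Cyrillic letters in the pseudo-Roman order described above.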

External links

RFC 1489
All about KOI8-R
Universal Cyrillic decoder, an online program that may help recovering Cyrillic texts with broken KOI8-R or other character encodings.
A brief history of Cyrillic encodings
IBM CDRA - IBM codepage 878

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.