KOI8-R is an 8-bit character encoding designed to cover Russian, which uses the Cyrillic alphabet. It also happens to cover Bulgarian, but is not used for that language, where CP1251 is the accepted encoding. A derivative encoding is KOI8-U, which adds Ukrainian characters. The original KOI-8 encoding was designed by Soviet authorities in 1974.
KOI8 remains much more commonly used than ISO 8859-5, which never really caught on. Another common Cyrillic character encoding is Windows-1251. These older code pages are being replaced by Unicode, which is a more common way to represent Cyrillic together with other non-Latin scripts.
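As an illustration of moving legacy text to Unicode, the following minimal sketch transcodes KOI8-R bytes to UTF-8 using Python's built-in `koi8_r` codec. The function name `transcode_koi8r_to_utf8` is a hypothetical helper chosen for this example, and the sketch assumes the input really is KOI8-R; decoding it with the wrong code page (for example Windows-1251) produces a different, unreadable string.

```python
def transcode_koi8r_to_utf8(data: bytes) -> bytes:
    """Decode KOI8-R bytes into a Unicode string, then re-encode as UTF-8.

    Hypothetical helper for illustration; assumes `data` is valid KOI8-R.
    """
    text = data.decode("koi8_r")
    return text.encode("utf-8")


# Sample input: the word "Русский" as it would be stored in KOI8-R.
legacy_bytes = "Русский".encode("koi8_r")
utf8_bytes = transcode_koi8r_to_utf8(legacy_bytes)

# Reading the same bytes under the wrong code page gives mojibake.
misread = legacy_bytes.decode("cp1251")
```

Each Cyrillic letter takes one byte in KOI8-R and two bytes in UTF-8, so `utf8_bytes` is twice as long as `legacy_bytes` for this all-Cyrillic word.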
In Microsoft Windows, KOI8-R is assigned code page number 20866. IBM assigns KOI8-R code page 878.
In Russian, KOI8 stands for "Код Обмена Информацией, 8 бит" (Kod Obmena Informatsiey, 8 bit), which means "Code for Information Exchange, 8 bit".
The KOI8 character sets place the Russian Cyrillic letters in pseudo-Roman order rather than in the natural Cyrillic alphabetical order used by ISO 8859-5. Although this may seem unnatural, it has a useful property: if the 8th bit is stripped, the text remains partially readable in ASCII and may convert to syntactically correct KOI7. For instance, "Русский Текст" in KOI8-R becomes "rUSSKIJ tEKST" ("Russian Text") if the 8th bit is stripped; attempting to interpret the ASCII string "rUSSKIJ tEKST" as KOI7 yields "Русский Текст".
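The bit-stripping behaviour can be reproduced with a short sketch using Python's built-in `koi8_r` codec. The helper names `strip_high_bit` and `restore_high_bit` are hypothetical. Python has no KOI7 codec, so the reverse direction is only approximated here by setting the 8th bit on bytes in the range 0x40 to 0x7F and decoding the result as KOI8-R; this is an assumption made for illustration, not a KOI7 implementation.

```python
def strip_high_bit(text: str) -> str:
    """Encode text as KOI8-R, clear the 8th bit of each byte, read as ASCII."""
    koi8_bytes = text.encode("koi8_r")
    return bytes(b & 0x7F for b in koi8_bytes).decode("ascii")


def restore_high_bit(ascii_text: str) -> str:
    """Approximate inverse (assumption, not a KOI7 codec).

    Sets the 8th bit on bytes 0x40-0x7F, where the letters live, leaves
    spaces, digits and punctuation below 0x40 alone, then decodes as KOI8-R.
    """
    raw = ascii_text.encode("ascii")
    return bytes(b | 0x80 if b >= 0x40 else b for b in raw).decode("koi8_r")


print(strip_high_bit("Русский Текст"))    # rUSSKIJ tEKST
print(restore_high_bit("rUSSKIJ tEKST"))  # Русский Текст
```

The case inversion in the stripped text follows from the layout: lowercase Cyrillic letters sit 0x80 above uppercase Latin letters, and uppercase Cyrillic letters sit 0x80 above lowercase Latin letters.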
Codepage layout
[Table: KOI8-R code page layout, 16 rows by 16 columns. The full mapping is defined in RFC 1489.]
External links
- RFC 1489
- All about KOI8-R
- Universal Cyrillic decoder, an online program that may help recover Cyrillic texts with broken KOI8-R or other character encodings.
- A brief history of Cyrillic encodings
- IBM CDRA