Cyrillic digraphs
Encyclopedia
The Cyrillic alphabet
Cyrillic alphabet
The Cyrillic script or azbuka is an alphabetic writing system developed in the First Bulgarian Empire during the 10th century AD at the Preslav Literary School...

 family contains a large number of specially treated two-letter combinations, or digraphs
Digraph (orthography)
A digraph or digram is a pair of characters used to write one phoneme or a sequence of phonemes that does not correspond to the normal values of the two characters combined...

, but few of these are used in Slavic languages
Slavic languages
The Slavic languages , a group of closely related languages of the Slavic peoples and a subgroup of Indo-European languages, have speakers in most of Eastern Europe, in much of the Balkans, in parts of Central Europe, and in the northern part of Asia.-Branches:Scholars traditionally divide Slavic...

. In a few alphabets, trigraphs
Trigraph (orthography)
A trigraph is a group of three letters used to represent a single sound or a combination of sounds that does not correspond to the written letters combined. For example, in the word schilling, the trigraph sch represents the voiceless postalveolar fricative , rather than the consonant cluster...

 and even the occasional tetragraph
Tetragraph
A tetragraph is a sequence of four letters used to represent a single sound , or a combination of sounds, that do not necessarily correspond to the individual values of the letters. In German, for example, the tetragraph tsch represents the sound of the English digraph ch...

 are used.

In early Cyrillic, the digraphs ⟨оу⟩ and ⟨оѵ⟩ were used for /u/. As with the equivalent digraph in Greek, they were reduced to a typographic ligature, ⟨ꙋ⟩, and are now written ⟨у⟩. The modern letters ⟨ы⟩ and ⟨ю⟩ started out as digraphs, ⟨ъі⟩ and ⟨іо⟩. In Church Slavonic printing practice, both historical and modern, ⟨оу⟩ (which is considered as a letter from the alphabet's point of view) is mostly treated as two individual characters, but ⟨ы⟩ is a single letter. For example, letter-spacing affects ⟨оу⟩ as if they were two individual letters, and never affects components of ⟨ы⟩. In a context of Old Slavonic language, ⟨шт⟩ is a digraph that can replace a letter ⟨щ⟩ and vice versa.

Modern Slavic languages written in the Cyrillic alphabet make little or no use of digraphs. There are only two true digraphs: ⟨дж⟩ for /dʐ/ (Belarusian, Bulgarian, Ukrainian) and ⟨дз⟩ for /dz/ (Belarusian, Ukrainian). Sometimes these digraphs are even considered as special letters of respective alphabets. In standard Russian, however, the letters in ⟨дж⟩ and ⟨дз⟩ are always pronounced separately. Digraph-like letter pairs include combinations of consonants with the soft sign ⟨ь⟩ (Serbian/Macedonian letters ⟨љ⟩ and ⟨њ⟩ are derived from ⟨ль⟩ and ⟨нь⟩), and ⟨жж⟩ or ⟨зж⟩ for the uncommon and optional Russian phoneme /ʑː/. Native descriptions of Cyrillic writing system often use the term "digraph" to combinations ⟨ьо⟩ and ⟨йо⟩ (Bulgarian, Ukrainian) as they both correspond to a single letter ⟨ё⟩ of Russian and Belarusian alphabets (⟨ьо⟩ is used for /ʲo/, and ⟨йо⟩ for /jo/).

Cyrillic only uses large numbers of digraphs when used to write non-Slavic languages, and in some languages such as Avar these are completely regular in formation.

Many Caucasian languages use ⟨ә⟩ (Abkhaz), ⟨у⟩ (Kabardian), or ⟨в⟩ (Avar) for labialization, for instance Abkhaz ⟨дә⟩ for /dʷ/ (sometimes [d͡b]), just as many of them, like Russian, use ⟨ь⟩ for palatalization
Palatalization
In linguistics, palatalization , also palatization, may refer to two different processes by which a sound, usually a consonant, comes to be produced with the tongue in a position in the mouth near the palate....

. Since such sequences are decomposable, regular forms will not be listed below. (In Abkhaz
Abkhaz language
Abkhaz is a Northwest Caucasian language spoken mainly by the Abkhaz people. It is the official language of Abkhazia where around 100,000 people speak it. Furthermore, it is spoken by thousands of members of the Abkhazian diaspora in Turkey, Georgia's autonomous republic of Adjara, Syria, Jordan...

, ⟨ә⟩ with sibilants is equivalent to ⟨ьә⟩, for instance ж /ʐ/, жь /ʒ/~/ʐʲ/, жә /ʒʷ/~/ʐʲʷ/, but this is predictable phonetic detail.) Similarly, long vowels written double in some languages, such as ⟨аа⟩ for Abkhaz /aː/ or ⟨аюу⟩ for Kirghiz /ajuː/ "bear", or with glottal stop, as Tajik
Tajik language
Tajik, Tajik Persian, or Tajiki, is a variety of modern Persian spoken in Central Asia. Historically Tajiks called their language zabani farsī , meaning Persian language in English; the term zabani tajikī, or Tajik language, was introduced in the 20th century by the Soviets...

 аъ [aʔ~aː], are not included.

Archi

Archi
Archi language
Archi is a Northeast Caucasian language spoken by the 1,200 Archis in the village of Archib, southern Dagestan, Russia and the six surrounding smaller villages...

: а́а [áː], аӀ [aˤ], а́Ӏ [áˤ], ааӀ, гв [ɡʷ], гь [h], гъ [ʁ], гъв [ʁʷ], гъӀ [ʁˤ], гъӀв [ʁʷˤ], гӀ [ʕ], е́е [éː], еӀ [eˤ], е́Ӏ [éˤ], жв [ʒʷ], зв [], и́и [íː], иӀ [iˤ], кк [], кв [kʷ], ккв [], кӀ [kʼ], кӀв [kʷʼ], къ [qʼ], къв [], ккъ [], къӀ [qˤʼ], ккъӀ [qːˤʼ], къӀв [qʷˤʼ], ккъӀв [], кь [kʟ̥ʼ], кьв [kʟ̥ʷʼ], лъ [ɬ], ллъ [ɬː], лъв [ɬʷ], ллъв [ɬːʷ], лӀ [kʟ̥], лӀв [kʟ̥ʷ], о́о [óː], оӀ [oˤ], о́Ӏ [óˤ], ооӀ [], пп [], пӀ [pʼ], сс [sː], св [], тт [], тӀ [tʼ], тв [tʷ], твӀ [], у́у [úː], уӀ [uˤ], у́Ӏ [úˤ], хх [χː], хв [χʷ], ххв [χːʷ], хӀ [ħ], хьӀ [χˤ], ххьӀ [χːˤ], хьӀв [χʷˤ], ххьӀв [χːʷˤ], хъ [q], хъв [], хъӀ [qˤ], хъӀв [qʷˤ], цв [], цӀ [tsʼ], ццӀ [], чв [], чӀ [tʃʼ], чӀв [], шв [ʃʷ], щв [ʃːʷ], ээ [], эӀ []

Avar

Avar
Avar language
The modern Avar language belongs to the Avar–Andic group of the Northeast Caucasian language family....

 uses ⟨в⟩ for labialization, as in хьв /xʷ/. Other digraphs are:
  • Ejective consonant
    Ejective consonant
    In phonetics, ejective consonants are voiceless consonants that are pronounced with simultaneous closure of the glottis. In the phonology of a particular language, ejectives may contrast with aspirated or tenuis consonants...

    s in ⟨Ӏ⟩: кӀ /kʼ/, цӀ /tsʼ/, чӀ /tʃʼ/
  • Other consonants based on к /k/: къ /qʼː/, кь /tɬʼː/,
  • Based on г /ɡ/: гъ /ʁ/, гь /h/, гӀ /ʕ/
  • Based on л /l/: лъ /tɬː/
  • Based on х /χ/: хъ /qː/, хь /x/, хӀ /ħ/


The ь digraphs are spelled this way even before vowels, as in /habuna/ "made", not *гябуна.
  • Gemination: кк /kː/, кӀкӀ /kʼː/, хх /χː/, цц /tsː/, цӀцӀ /tsʼː/, чӀчӀ /tʃʼː/.

Note that three of these are tetragraphs. However, gemination for the 'strong' consonants in Avar orthography is sporadic, and the simple letters or digraphs are frequently used in their place.

Chechen and Ingush

Chechen
Chechen language
The Chechen language is spoken by more than 1.5 million people, mostly in Chechnya and by Chechen people elsewhere. It is a member of the Northeast Caucasian languages.-Classification:...

 uses the following digraphs:
  • Vowels: аь /æ/, яь /jæ/, оь /ø/, ёь /jø/, уь /y/, юь /jy/
  • Ejectives in ⟨Ӏ⟩: кӀ /kʼ/, пӀ /pʼ/, тӀ /tʼ/, цӀ /tsʼ/, чӀ /tʃʼ/
  • Other consonants: гӀ /ɣ/, кх /q/, къ /qʼ/, хь /ħ/, хӀ /h/
  • The trigraph рхӀ /r̥/


In Ingush
Ingush language
Ingush is a language spoken by about 413,000 people , known as the Ingush, across a region covering Ingushetia, Chechnya, Kazakhstan and Russia. In Ingush, the language is called ГІалгІай Ğalğaj .-Classification:...

, there are no ejectives, so for example кӀ is pronounced /k/. Some of the other values are also different: аь /æ/ etc., уь /ɨ/ etc., кх /qχ/ (vs. къ /q/), хь /ç/

The vowel digraphs are used for front vowels for other Dagestanian languages and also the local Turkic languages Kumyk
Kumyk language
Kumyk is a Turkic language, spoken by about 365,000 speakers in the Dagestan republic of Russian Federation....

 and Nogay. ⟨Ӏ⟩ digraphs for ejectives is common across the North Caucasus, as is гӀ for /ɣ~ʁ~ʕ/.

Kabardian

Kabardian
Kabardian language
The Kabardian language, also known as East Circassian , is a Northwest Caucasian language, closely related to the Adyghe language. It is spoken mainly in the Russian republics of Kabardino-Balkaria and Karachay-Cherkessia and in Turkey and the Middle East...

 uses ⟨у⟩ for labialization, as Ӏу /ʔʷ/. гу is /ɡʷ/, though г is /ɣ/); ку is /kʷ/, despite the fact that к is not used outside loan words. Other digraphs are:
  • Slavic дж /ɡʲ/, дз /dz/
  • Ejectives in ⟨Ӏ⟩: кӀ /kʲʼ/ (but кӀу is /kʷʼ/), лӀ /ɬʼ/, пӀ /pʼ/, тӀ /tʼ/, фӀ /fʼ/, цӀ /tsʼ/, щӀ /ɕʼ/
  • Other consonants: гъ /ʁ/, жь /ʑ/, къ /qʼ/, лъ /ɬ/ (from л /ɮ/), хь /ħ/, хъ /χ/
  • The trigraph кхъ /q/

Labialized, the trigraph becomes the unusual tetragraph кхъу /qʷ/.

Tabasaran

Tabasaran
Tabasaran language
Tabasaran is a Northeast Caucasian language of the Lezgic branch. It is spoken by the Tabasaran people in southern part of the Russian Republic of Dagestan. There are two main dialects: North and South Tabasaran. It has a literary language based on the Southern dialect, one of six in the Dagestan...

 uses gemination for its 'strong' consonants, but this has a different value with г.
  • Front vowels: аь /æ/, уь /y/
  • Gemination for 'strong' consonants: кк /kː/, пп /pː/, тт /tː/, цц /tsʰː/, чч /tʃʰː/
  • Ejectives with ⟨Ӏ⟩: кӀ /kʼ/, пӀ /pʼ/, тӀ /tʼ/, цӀ /tsʼ/, чӀ /tʃʼ/
  • Based on г /ɡ/: гг /ɣ/, гъ /ʕ/, гь /h/
  • Other consonants based on к /kʰ/: къ /qʰː/, кь /qʼ/,
  • Based on х /ɦ/: хъ /qʰ/, хь /x/


It uses ⟨в⟩ for labialization of its postalveolar consonant
Postalveolar consonant
Postalveolar consonants are consonants articulated with the tongue near or touching the back of the alveolar ridge, further back in the mouth than the alveolar consonants, which are at the ridge itself, but not as far back as the hard palate...

s: шв /ʃʷ/, жв /ʒʷ/, чв /tʃʰʷ/, джь /dʒʷ/, ь /tʃʼʷ/, ччь /tʃʷʰː/).

Tatar

Tatar
Tatar language
The Tatar language , or more specifically Kazan Tatar, is a Turkic language spoken by the Tatars of historical Kazan Khanate, including modern Tatarstan and Bashkiria...

 has a number of vowels which are written with ambiguous letters that are normally resolved by context, but which are resolved by discontinuous digraphs when context is not sufficient. These ambiguous vowel letters are е, front
Front vowel
A front vowel is a type of vowel sound used in some spoken languages. The defining characteristic of a front vowel is that the tongue is positioned as far in front as possible in the mouth without creating a constriction that would be classified as a consonant. Front vowels are sometimes also...

 /je/ or back
Back vowel
A back vowel is a type of vowel sound used in spoken languages. The defining characteristic of a back vowel is that the tongue is positioned as far back as possible in the mouth without creating a constriction that would be classified as a consonant. Back vowels are sometimes also called dark...

 /jɤ/, ю, front /jy/ or back /ju/; and я, front /jæ/ or back /ja/. They interact with the ambiguous consonant letters к, velar
Velar consonant
Velars are consonants articulated with the back part of the tongue against the soft palate, the back part of the roof of the mouth, known also as the velum)....

 /k/ or uvular
Uvular consonant
Uvulars are consonants articulated with the back of the tongue against or near the uvula, that is, further back in the mouth than velar consonants. Uvulars may be plosives, fricatives, nasal stops, trills, or approximants, though the IPA does not provide a separate symbol for the approximant, and...

 /q/, and г, velar /ɡ/ or uvular /ʁ/.

In general, velar consonants occur before front vowels and uvular consonants before back vowels, so it is frequently not necessary to specify these values in the orthography. However, this is not always the case. A uvular followed by a front vowel, as in /qærdæʃ/ "kinsman", for example, is written with the corresponding back vowel to specify the uvular value: кардәш. The front value of а is required by vowel harmony
Vowel harmony
Vowel harmony is a type of long-distance assimilatory phonological process involving vowels that occurs in some languages. In languages with vowel harmony, there are constraints on which vowels may be found near each other....

 with the following front vowel ә, so this spelling is unambiguous.

If, however, the proper value of the vowel is not recoverable by through vowel harmony, then the letter ь /ʔ/ is added at the end of the syllable, as in /ʃaʁir/ "poet". That is, /i/ is written with a ы rather than a и to show that the г is pronounced /ʁ/ rather than /ɡ/, then the ь is added to show that the ы is pronounced as if it were a и, so the discontinuous digraph ы...ь is used here to write the vowel /i/. This strategy is also followed with the ambiguous letters е, ю, and я in final syllables, for instance in /jyn/ cheap. That is, the discontinuous digraphs е...ь, ю...ь, я...ь are used for /j/ plus the front vowels /e, y, æ/.

Exceptional final-syllable velars and uvulars, however, are written with simple digraphs, with ь for velars and ъ for uvulars: /pak/ pure, /wæʁdæ/ promise.

Other alphabets

Dungan
Dungan language
The Dungan language is a Sinitic language spoken by the Dungan of Central Asia, an ethnic group related to the Hui people of China.-History:...

  • ан (ян) /(j)æ̃/, он /(j)aŋ/, эр /əɻ/, etc.


Mandarin Chinese
In the Cyrillization of Mandarin, there are digraphs цз and чж, which correspond to Pinyin z/j and zh. Final n is нь, while н stands for final ng. юй is yu, but ю you, ю- yu-, -уй -ui.

Karachay-Balkar
Karachay-Balkar language
The Karachay-Balkar language is a Turkic language spoken by the Karachays and Balkars. It is divided into two dialects: Karachay-Baksan-Chegem which pronounces two phonemes as and , and Balkar, which pronounces the corresponding phonemes as and .- Alphabet :Modern Karachay-Balkar Cyrillic...

  • гъ /ɣ/, дж /dʒ/~/dz/, къ /q/, нг /ŋ/. Нг /ŋ/ is also found in Uzbek
    Uzbek language
    Uzbek is a Turkic language and the official language of Uzbekistan. It has about 25.5 million native speakers, and it is spoken by the Uzbeks in Uzbekistan and elsewhere in Central Asia...

    .


Khanty
Khanty language
Khanty or Xanty language, also known previously as the Ostyak language, is a language of the Khant peoples. It is spoken in Khanty-Mansi and Yamalo-Nenets Autonomous okrugs, as well as in Aleksandrovsky and Kargosoksky districts of Tomsk Oblast in Russia...

  • л’ /ɬ/, ч’ ?


Ossetian
  • Slavic дж /dʒ/, дз /dz/
  • Ejectives in ⟨ъ⟩: къ /kʼ/, пъ /pʼ/, тъ /tʼ/, цъ /tsʼ/, чъ /tʃʼ/
  • гъ /ʁ/, хъ /q/


Komi
Komi language
The Komi language is a Finno-Permic language spoken by the Komi peoples in the northeastern European part of Russia. Komi is one of the two members of the Permic subgroup of the Finno-Ugric branch...

  • дж /dʒ/, дз /dz/, тш /tʃ/ (ч is /tsʲ/.)


Turkmen
Turkmen language
Turkmen is the national language of Turkmenistan...

  • Long үй /yː/, from ү /y/.


Yakut
  • дь /ɟ/, нь /ɲ/

See also

  • Languages using Cyrillic
    Languages using Cyrillic
    This is a list of languages that have been written in the Cyrillic script at one time or another. See also early Cyrillic alphabet.- Indo-European languages :* Indo-Iranian languages**Indo-Aryan languages...

  • List of Cyrillic letters
  • Bigram
    Bigram
    Bigrams or digrams are groups of two written letters, two syllables, or two words, and are very commonly used as the basis for simple statistical analysis of text. They are used in one of the most successful language models for speech recognition...

  • Diacritic
    Diacritic
    A diacritic is a glyph added to a letter, or basic glyph. The term derives from the Greek διακριτικός . Diacritic is both an adjective and a noun, whereas diacritical is only an adjective. Some diacritical marks, such as the acute and grave are often called accents...

  • Diphthong
    Diphthong
    A diphthong , also known as a gliding vowel, refers to two adjacent vowel sounds occurring within the same syllable. Technically, a diphthong is a vowel with two different targets: That is, the tongue moves during the pronunciation of the vowel...

  • Typographic ligature
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK