A cedilla also known as cedilha or cédille, is a hook ( ) added under certain letters as a
diacritical markA diacritic is a glyph added to a letter, or basic glyph. The term derives from the Greek διακριτικός . Diacritic is both an adjective and a noun, whereas diacritical is only an adjective. Some diacritical marks, such as the acute and grave are often called accents...
to modify their pronunciation.
Origin
The tail originated in Spain as the bottom half of a miniature
cursiveCursive, also known as joined-up writing, joint writing, or running writing, is any style of handwriting in which the symbols of the language are written in a simplified and/or flowing manner, generally for the purpose of making writing easier or faster...
zZ is the twenty-sixth and final letter of the basic modern Latin alphabet.-Name and pronunciation:In most dialects of English, the letter's name is zed , reflecting its derivation from the Greek zeta but in American English, its name is zee , deriving from a late 17th century English dialectal...
. The word "cedilla" is the
diminutiveIn language structure, a diminutive, or diminutive form , is a formation of a word used to convey a slight degree of the root meaning, smallness of the object or quality named, encapsulation, intimacy, or endearment...
of the
Old SpanishThe language known today as Spanish is derived from a dialect of spoken Latin that developed in the north-central part of the Iberian Peninsula in what is now northern Spain. Over the past 1,000 years, the language expanded south to the Mediterranean Sea, and was later transferred to the Spanish...
name for this letter,
cedaZeta is the sixth letter of the Greek alphabet. In the system of Greek numerals, it has a value of 7. It was derived from the Phoenician letter Zayin...
(zeta). Modern Spanish, however, no longer uses this diacritic, although it is still current in
PortuguesePortuguese is a Romance language that arose in the medieval Kingdom of Galicia, nowadays Galicia and Northern Portugal. The southern part of the Kingdom of Galicia became independent as the County of Portugal in 1095...
,
CatalanCatalan is a Romance language, the national and only official language of Andorra and a co-official language in the Spanish autonomous communities of Catalonia, the Balearic Islands and Valencian Community, where it is known as Valencian , as well as in the city of Alghero, on the Italian island...
,
Occitan, and
FrenchFrench is a Romance language spoken as a first language in France, the Romandy region in Switzerland, Wallonia and Brussels in Belgium, Monaco, the regions of Quebec and Acadia in Canada, and by various communities elsewhere. Second-language speakers of French are distributed throughout many parts...
, which gives
EnglishEnglish is a West Germanic language that arose in the Anglo-Saxon kingdoms of England and spread into what was to become south-east Scotland under the influence of the Anglian medieval kingdom of Northumbria...
the alternative spellings of
cedille, from
FrenchFrench is a Romance language spoken as a first language in France, the Romandy region in Switzerland, Wallonia and Brussels in Belgium, Monaco, the regions of Quebec and Acadia in Canada, and by various communities elsewhere. Second-language speakers of French are distributed throughout many parts...
"
cédille", and the
PortuguesePortuguese is a Romance language that arose in the medieval Kingdom of Galicia, nowadays Galicia and Northern Portugal. The southern part of the Kingdom of Galicia became independent as the County of Portugal in 1095...
form
cedilha. An obsolete spelling of
cedilla is
cerilla. The earliest use in English cited by the
Oxford English DictionaryThe Oxford English Dictionary , published by the Oxford University Press, is the self-styled premier dictionary of the English language. Two fully bound print editions of the OED have been published under its current name, in 1928 and 1989. The first edition was published in twelve volumes , and...
is a 1599 Spanish-English dictionary and grammar. Chambers'
Cyclopædia is cited for the printer-trade variant
ceceril in use in 1738. The main use in English is not universal and applies to loan words from
FrenchFrench is a Romance language spoken as a first language in France, the Romandy region in Switzerland, Wallonia and Brussels in Belgium, Monaco, the regions of Quebec and Acadia in Canada, and by various communities elsewhere. Second-language speakers of French are distributed throughout many parts...
and
PortuguesePortuguese is a Romance language that arose in the medieval Kingdom of Galicia, nowadays Galicia and Northern Portugal. The southern part of the Kingdom of Galicia became independent as the County of Portugal in 1095...
such as "
cachaçaCachaça is a liquor made from fermented sugarcane.It is the most popular distilled alcoholic beverage in Brazil. It is also known as aguardente, pinga, caninha and many other names...
", "
limaçonIn geometry, a limaçon or limacon , also known as a limaçon of Pascal, is defined as a roulette formed when a circle rolls around the outside of a circle of equal radius. It can also be defined as the roulette formed when a circle rolls around a circle with half its radius so that the smaller...
" and "
façadeA facade or façade is generally one exterior side of a building, usually, but not always, the front. The word comes from the French language, literally meaning "frontage" or "face"....
" (often typed "cachaca", "limacon" and "facade" due to lack of
Ç keys on the keyboards of most Anglophone countries).
Encodings
In Unicode, the symbol is .
Many precomposed characters are defined too, like:
Use of the cedilla with the letter C
The most frequent character with cedilla is "
çis a Latin script letter, used in the Albanian, Azerbaijani, Ligurian, Tatar, Turkish, Turkmen, Kurdish and Zazaki alphabets. This letter also appears in Catalan, French, Friulian, Occitan and Portuguese as a variant of the letter “c”...
" ("c" with cedilla, as in
façade). It was first used for the sound of the
voiceless alveolar affricateThe voiceless alveolar affricate is a type of consonantal sound, used in some spoken languages. The sound is transcribed in the International Phonetic Alphabet with ⟨⟩ or ⟨⟩ . The voiceless alveolar affricate occurs in such languages as German, Cantonese, Italian, Russian, Japanese and Mandarin...
/ts/ in old Spanish and stems from the
VisigothicVisigothic script was a type of medieval script that originated in the Visigothic kingdom in Hispania...
form of the letter "z" , whose upper loop was lengthened and reinterpreted as a "c", whereas its lower loop became the diminished appendage, the cedilla.
It represents the "soft" sound /s/ where a "c" would normally represent the "hard" sound /k/ (before "a", "o", "u", or at the end of a word) in English and in certain Romance languages such as
CatalanCatalan is a Romance language, the national and only official language of Andorra and a co-official language in the Spanish autonomous communities of Catalonia, the Balearic Islands and Valencian Community, where it is known as Valencian , as well as in the city of Alghero, on the Italian island...
,
FrenchFrench is a Romance language spoken as a first language in France, the Romandy region in Switzerland, Wallonia and Brussels in Belgium, Monaco, the regions of Quebec and Acadia in Canada, and by various communities elsewhere. Second-language speakers of French are distributed throughout many parts...
,
Occitan, and
PortuguesePortuguese is a Romance language that arose in the medieval Kingdom of Galicia, nowadays Galicia and Northern Portugal. The southern part of the Kingdom of Galicia became independent as the County of Portugal in 1095...
. In Occitan, Friulian and Catalan
ç can also be found at the beginning of a word (
Çubran,
ço) or at the end (
braç).
It represents the
voiceless postalveolar affricateThe voiceless palato-alveolar affricate or domed postalveolar affricate is a type of consonantal sound used in some spoken languages. The sound is transcribed in the International Phonetic Alphabet with ⟨⟩ or ⟨⟩...
/tʃ/ (as in English "church") in
AlbanianAlbanian is an Indo-European language spoken by approximately 7.6 million people, primarily in Albania and Kosovo but also in other areas of the Balkans in which there is an Albanian population, including western Macedonia, southern Montenegro, southern Serbia and northwestern Greece...
,
AzerbaijaniAzerbaijani or Azeri or Torki is a language belonging to the Turkic language family, spoken in southwestern Asia by the Azerbaijani people, primarily in Azerbaijan and northwestern Iran...
,
FriulianFriulan , is a Romance language belonging to the Rhaeto-Romance family, spoken in the Friuli region of northeastern Italy. Friulan has around 800,000 speakers, the vast majority of whom also speak Italian...
,
KurdishKurdish is a dialect continuum spoken by the Kurds in western Asia. It is part of the Iranian branch of the Indo-Iranian group of Indo-European languages....
,
TatarThe Tatar language , or more specifically Kazan Tatar, is a Turkic language spoken by the Tatars of historical Kazan Khanate, including modern Tatarstan and Bashkiria...
,
TurkishTurkish is a language spoken as a native language by over 83 million people worldwide, making it the most commonly spoken of the Turkic languages. Its speakers are located predominantly in Turkey and Northern Cyprus with smaller groups in Iraq, Greece, Bulgaria, the Republic of Macedonia, Kosovo,...
(as in
çiçek,
çam,
çekirdek,
ÇorumÇorum is a landlocked northern Anatolian city that is the capital of the Çorum Province of Turkey. Çorum is located inland in the central Black Sea Region of Turkey, and is approximately from Ankara and from Istanbul...
), and
TurkmenTurkmen is the national language of Turkmenistan...
. It is also sometimes used this way in
ManxManx , also known as Manx Gaelic, and as the Manks language, is a Goidelic language of the Indo-European language family, historically spoken by the Manx people. Only a small minority of the Island's population is fluent in the language, but a larger minority has some knowledge of it...
, to distinguish it from the
velar fricativeVelar fricative can refer to*voiced velar fricative: in the International Phonetic Alphabet*voiceless velar fricative: in the International Phonetic Alphabet...
.
In the
International Phonetic AlphabetThe International Phonetic Alphabet "The acronym 'IPA' strictly refers [...] to the 'International Phonetic Association'. But it is now such a common practice to use the acronym also to refer to the alphabet itself that resistance seems pedantic...
, /ç/ represents the
voiceless palatal fricativeThe voiceless palatal fricative is a type of consonantal sound used in some spoken languages. The symbol in the International Phonetic Alphabet that represents this sound is . The symbol ç is the letter c with a cedilla, as used to spell French words such as façade...
.
Use of the cedilla with the letter S
The character "ş" represents the
voiceless postalveolar fricativeThe voiceless palato-alveolar fricative or voiceless domed postalveolar fricative is a type of consonantal sound, used in many spoken languages, including English...
/ʃ/ (as in "
show") in several languages, including many belonging to the
Turkic languagesThe Turkic languages constitute a language family of at least thirty five languages, spoken by Turkic peoples across a vast area from Eastern Europe and the Mediterranean to Siberia and Western China, and are considered to be part of the proposed Altaic language family.Turkic languages are spoken...
:
- Turkish
Turkish is a language spoken as a native language by over 83 million people worldwide, making it the most commonly spoken of the Turkic languages. Its speakers are located predominantly in Turkey and Northern Cyprus with smaller groups in Iraq, Greece, Bulgaria, the Republic of Macedonia, Kosovo,...
(It is included as a separate letter (Ş) in the Turkish alphabetThe Turkish alphabet is a Latin alphabet used for writing the Turkish language, consisting of 29 letters, seven of which have been modified from their Latin originals for the phonetic requirements of the language. This alphabet represents modern Turkish pronunciation with a high degree of accuracy...
.)
- For example, it is used in Turkish words and names like Eskişehir
Eskişehir is a city in northwestern Turkey and the capital of the Eskişehir Province. According to the 2009 census, the population of the city is 631,905. The city is located on the banks of the Porsuk River, 792 m above sea level, where it overlooks the fertile Phrygian Valley. In the nearby...
, ŞımarıkŞımarık is a 1997 song by Turkish singer Tarkan. Its lyrics were written by Sezen Aksu, with music credited as composed by Tarkan, Aksu and Ozan Çolakoğlu at the time. However, Tarkan later admitted in a 2006 interview that this had been done without Aksu's consent, who was the true copyright owner...
, Hasan ŞaşHasan Gökhan Şaş is a former Turkish international football player, most commonly known for his time at Galatasaray.-Galatasaray:...
, Rüştü ReçberRüştü Reçber is a Turkish international footballer who currently plays as a goalkeeper for Beşiktaş J.K. in the Turkish Süper Lig.He has played a key role in the success of the Turkish national squad, and it was at the 2002 FIFA World Cup, where Turkey finished third, that he earned a selection to...
etc.
- Azerbaijani
Azerbaijani or Azeri or Torki is a language belonging to the Turkic language family, spoken in southwestern Asia by the Azerbaijani people, primarily in Azerbaijan and northwestern Iran...
- Crimean Tatar
The Crimean Tatar language is the language of the Crimean Tatars. It is a Turkic language spoken in Crimea, Central Asia , and the Crimean Tatar diasporas in Turkey, Romania, Bulgaria...
- Gagauz
The Gagauz language is a Turkic language, spoken by the Gagauz people, and the official language of Gagauzia, Moldova. There are two dialects, Bulgar Gagauzi and Maritime Gagauzi. This is a different language from Balkan Gagauz Turkish....
- Tatar
The Tatar language , or more specifically Kazan Tatar, is a Turkic language spoken by the Tatars of historical Kazan Khanate, including modern Tatarstan and Bashkiria...
- Turkmen
Turkmen is the national language of Turkmenistan...
- Romanian
Romanian Romanian Romanian (or Daco-Romanian; obsolete spellings Rumanian, Roumanian; self-designation: română, limba română ("the Romanian language") or românește (lit. "in Romanian") is a Romance language spoken by around 24 to 28 million people, primarily in Romania and Moldova...
(substitution use when S-commaS-comma is a letter which is part of the Romanian alphabet, used to represent the Romanian language sound , the voiceless postalveolar fricative ....
[Ș] was missing from pre-3.0 UnicodeUnicode is a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems...
standards, and older standards, still frequent)
- Kurdish
Kurdish is a dialect continuum spoken by the Kurds in western Asia. It is part of the Iranian branch of the Indo-Iranian group of Indo-European languages....
It is also used in some
RomanizationIn linguistics, romanization or latinization is the representation of a written word or spoken speech with the Roman script, or a system for doing so, where the original word or language uses a different writing system . Methods of romanization include transliteration, for representing written...
s of
ArabicArabic is a name applied to the descendants of the Classical Arabic language of the 6th century AD, used most prominently in the Quran, the Islamic Holy Book...
,
PersianPersian is an Iranian language within the Indo-Iranian branch of the Indo-European languages. It is primarily spoken in Iran, Afghanistan, Tajikistan and countries which historically came under Persian influence...
and
Tiberian HebrewTiberian Hebrew is the extinct canonical pronunciation of the Hebrew Bible or Tanakh and related documents in the Roman Empire. This traditional medieval pronunciation was committed to writing by Masoretic scholars based in the Jewish community of Tiberias , in the form of the Tiberian vocalization...
to represent a variant of "s", (ص in the Arabic alphabet, צ in Hebrew) although the letter "" is more frequently used for this.
- In HTML character entity references
Ş and ş can be used.
Use of the cedilla in Latvian
Comparatively, some consider the diacritics on the
LatvianLatvian is the official state language of Latvia. It is also sometimes referred to as Lettish. There are about 1.4 million native Latvian speakers in Latvia and about 150,000 abroad. The Latvian language has a relatively large number of non-native speakers, atypical for a small language...
consonants "ģ", "ķ", "ļ", "ņ", and formerly "ŗ" to be cedillas. Although their
AdobeAdobe Systems Incorporated is an American computer software company founded in 1982 and headquartered in San Jose, California, United States...
glyphA glyph is an element of writing: an individual mark on a written medium that contributes to the meaning of what is written. A glyph is made up of one or more graphemes....
names are commas, their names in the
UnicodeUnicode is a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems...
Standard are "g", "k", "l", "n", and "r" with a cedilla. They were introduced to the
UnicodeUnicode is a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world's writing systems...
standard before 1992, and their name cannot be altered. The uppercase equivalent "Ģ" sometimes has a regular cedilla.
Use of the cedilla in Marshallese
Four lettersMarshallese underwent a change of orthography in recent times. However, most people still use the old orthography. It is written in a form of the Latin script with unusual diacritic combinations. There are different alphabetic systems in use by Marshallese speakers depending on religious...
in
MarshalleseThe Marshallese language is a Malayo-Polynesian language of the Marshall Islands, and the principal language of the country...
have cedillas: "", "", "" and "". In standard printed text they are
always cedillas, and their omission or the substitution of comma below and dot below diacritics are nonstandard.
, many font rendering engines do not display
any of these properly, for two reasons:
- "" and "" usually do not display properly at all, because of the use of the cedilla in Latvian. Unicode has precombined glyphs for these letters, but most quality fonts display them with comma below diacritics to accommodate the expectations of Latvian orthography
Latvian orthography, historically, has used a system based upon German phonetic principles and the Latgalian dialect was written using Polish orthographic principles. The present-day orthography has been in use since 1908. Its basis is the Latin alphabet. For the most part it is phonetic in that it...
. This is considered nonstandard in Marshallese. The use of a zero-width non-joinerThe zero-width non-joiner is a non-printing character used in the computerization of writing systems that make use of ligatures. When placed between two characters that would otherwise be connected into a ligature, a ZWNJ causes them to be printed in their final and initial forms, respectively...
between the letter and the diacritic can alleviate this problem: "" and "" may display properly, but may not; see below.
- "" and "" do not currently exist in Unicode as precombined glyphs, and must be encoded as the plain Latin letters "" and "" with the combining cedilla diacritic. Most Unicode fonts issued with Windows do not display combining diacritics properly, showing them too far to the right of the letter, as with Tahoma
Tahoma is the original form of the word "Tacoma", as in the city of Tacoma, Washington. It can refer to:Places:* Mount Tahoma, an alternative spelling of Mount Tacoma, the original name of Mount Rainier in the Cascade Range...
("m̧" and "o̧") and Times New Roman ("m̧" and "o̧"). This mostly affects "", and may or may not affect "". But some common Unicode fonts like Arial Unicode MSIn digital typography, the TrueType font Arial Unicode MS is an extended version of the font Arial. Compared to Arial, it includes higher line height, omits kerning pairs and adds enough glyphs to cover a large subset of Unicode 2.1—thus supporting most Microsoft code pages, but also requiring much...
("m̧" and "o̧"), CambriaCambria is part of the suite of fonts that comes with Microsoft Windows Vista, Windows 7, Microsoft Office 2007, Microsoft Office 2008 for Mac and Microsoft Office 2011 for Mac, specifically designed for on-screen reading and to be aesthetically pleasing when printed at small sizes. It is a...
("m̧" and "o̧") and Lucida Sans UnicodeIn digital typography, Lucida Sans Unicode OpenType font from the design studio of Bigelow & Holmes is designed to support the most commonly used characters defined in version 2.0 of the Unicode standard...
("m̧" and "o̧") don't have this problem. When "" is properly displayed, the cedilla is either underneath the center of the letter, or is underneath the right-most leg of the letter, but is always directly underneath the letter wherever it's positioned.
Because of these font display issues, it is not uncommon to find nonstandard
ad hoc substitutes for these letters. The
Marshallese-English Dictionary (the only complete Marshallese dictionary in existence) displays the letters with dot below diacritics, all of which do exist as precombined glyphs in Unicode: "", "", "" and "". The first three exist in the International Alphabet of Sanskrit Transliteration, and "" exists in the
Vietnamese alphabetThe Vietnamese alphabet, called Chữ Quốc Ngữ , usually shortened to Quốc Ngữ , is the modern writing system for the Vietnamese language...
, and both of these systems are supported by the most recent versions of common fonts like
ArialArial, sometimes marketed or displayed in software as Arial MT, is a sans-serif typeface and set of computer fonts. Fonts from the Arial family are packaged with Microsoft Windows, some other Microsoft software applications, Apple Mac OS X and many PostScript 3 computer printers...
, Courier New,
TahomaTahoma is the original form of the word "Tacoma", as in the city of Tacoma, Washington. It can refer to:Places:* Mount Tahoma, an alternative spelling of Mount Tacoma, the original name of Mount Rainier in the Cascade Range...
and Times New Roman. This sidesteps most of the Marshallese text display issues associated with the cedilla, but is still inappropriate for polished standard text.
Other diacritics confused with the cedilla
Several languages add a comma (virgula) to some letters, such as , which looks like a cedilla, but is more precisely a diacritical comma. This is particularly confusing with letters which can take either diacritic: for example, the consonant /ʃ/ is written as "ş" in
TurkishTurkish is a language spoken as a native language by over 83 million people worldwide, making it the most commonly spoken of the Turkic languages. Its speakers are located predominantly in Turkey and Northern Cyprus with smaller groups in Iraq, Greece, Bulgaria, the Republic of Macedonia, Kosovo,...
but in
RomanianRomanian Romanian Romanian (or Daco-Romanian; obsolete spellings Rumanian, Roumanian; self-designation: română, limba română ("the Romanian language") or românește (lit. "in Romanian") is a Romance language spoken by around 24 to 28 million people, primarily in Romania and Moldova...
, and Romanian writers will sometimes use the former instead of the latter because of insufficient font or character-set support.
The
PolishPolish is a language of the Lechitic subgroup of West Slavic languages, used throughout Poland and by Polish minorities in other countries...
letters "ą" and "ę" and
LithuanianLithuanian is the official state language of Lithuania and is recognized as one of the official languages of the European Union. There are about 2.96 million native Lithuanian speakers in Lithuania and about 170,000 abroad. Lithuanian is a Baltic language, closely related to Latvian, although they...
letters "ą", "ę", "į", and "ų" are not made with the cedilla either, but with the unrelated
ogonekThe ogonek is a diacritic hook placed under the lower right corner of a vowel in the Latin alphabet used in several European and Native American languages.-Use:...
diacritic, which merely resembles a reversed cedilla (opening to the right instead of the left).
A proposal for the use of the cedilla with the letter T in French
In 1868, Ambroise Firmin-Didot suggested in his book
Observations sur l'orthographe, ou ortografie, française (Observations on French Spelling) that French phonetics could be better regularized by adding a cedilla beneath the letter "t" in some words. For example, it is well-known that in the suffix
-tion this letter is usually not pronounced as (or close to) /t/ in either French or English. It has to be distinctly learned that in words such as French
diplomatie (but not
diplomatique) and English
action it is pronounced /s/ and /ʃ/, respectively (but not in
active in either language). A similar effect occurs with other prefixes or within words also in French and English, such as
partial where
t is pronounced /s/ and /ʃ/ respectively. Firmin-Didot surmised that a new character could be added to French orthography. A similar letter, the
t-commaT-comma is a letter which is part of the Romanian alphabet, used to represent the Romanian language sound , the voiceless alveolar affricate . It is written as the letter T with a small comma below and it has both the lower-case and the upper-case variants...
, does exist in Romanian, but it has a comma accent, not a cedilla one.
External links