Hapax legomenon

Hapax legomenon

Overview
A hapax legomenon is a word
Word
In language, a word is the smallest free form that may be uttered in isolation with semantic or pragmatic content . This contrasts with a morpheme, which is the smallest unit of meaning but will not necessarily stand on its own...

 which occurs only once within a context, either in the written record of an entire language, in the works of an author, or just in a single text. The term is sometimes used incorrectly to describe a word that occurs in just one of an author's works, even though it occurs more than once in that work. Hapax legomenon is a transliteration of Greek
Greek language
Greek is an independent branch of the Indo-European family of languages. Native to the southern Balkans, it has the longest documented history of any Indo-European language, spanning 34 centuries of written records. Its writing system has been the Greek alphabet for the majority of its history;...

 ἅπαξ λεγόμενον, meaning "(something) said (only) once".

The related terms dis legomenon, tris legomenon, and tetrakis legomenon refer respectively to double, triple, or quadruple occurrences, but are far less commonly used.

Hapax legomena are quite common, as predicted by Zipf's law, which states that the frequency of any word in a work or corpus is inversely proportional to its rank in the frequency table.
Discussion
Ask a question about 'Hapax legomenon'
Start a new discussion about 'Hapax legomenon'
Answer questions from other users
Full Discussion Forum
 
Recent Discussions
Encyclopedia
A hapax legomenon is a word
Word
In language, a word is the smallest free form that may be uttered in isolation with semantic or pragmatic content . This contrasts with a morpheme, which is the smallest unit of meaning but will not necessarily stand on its own...

 which occurs only once within a context, either in the written record of an entire language, in the works of an author, or just in a single text. The term is sometimes used incorrectly to describe a word that occurs in just one of an author's works, even though it occurs more than once in that work. Hapax legomenon is a transliteration of Greek
Greek language
Greek is an independent branch of the Indo-European family of languages. Native to the southern Balkans, it has the longest documented history of any Indo-European language, spanning 34 centuries of written records. Its writing system has been the Greek alphabet for the majority of its history;...

 ἅπαξ λεγόμενον, meaning "(something) said (only) once".

The related terms dis legomenon, tris legomenon, and tetrakis legomenon refer respectively to double, triple, or quadruple occurrences, but are far less commonly used.

Hapax legomena are quite common, as predicted by Zipf's law, which states that the frequency of any word in a work or corpus is inversely proportional to its rank in the frequency table. For large corpora, about 40% to 60% of the words (counting by type
Type-token distinction
In disciplines such as philosophy and knowledge representation, the type-token distinction is a distinction that separates an abstract concept from the objects which are particular instances of the concept...

) occurring are hapax legomena, and another 10% to 15% are dis legomena. Thus, in the Brown Corpus
Brown Corpus
The Brown University Standard Corpus of Present-Day American English was compiled in the 1960s by Henry Kucera and W. Nelson Francis at Brown University, Providence, Rhode Island as a general corpus in the field of corpus linguistics...

 of American English, about half of the 50,000 words are hapax legomena within that corpus.

Note that the term hapax legomenon refers to a word's appearance in a body of text, not to its origins, nor to its prevalence in speech. It thus differs from a nonce word
Nonce word
A nonce word is a word used only "for the nonce"—to meet a need that is not expected to recur. Quark, for example, was formerly a nonce word in English, appearing only in James Joyce's Finnegans Wake. Murray Gell-Mann then adopted it to name a new class of subatomic particle...

, which may never be recorded, or may find currency and be recorded widely, or may appear several times in the work which coins it, and so on.

Significance


Hapax legomena in ancient texts are difficult to translate and decipher, since it is easier to infer meaning from multiple contexts than from just one. For example, many of the remaining undeciphered Mayan glyphs
Maya script
The Maya script, also known as Maya glyphs or Maya hieroglyphs, is the writing system of the pre-Columbian Maya civilization of Mesoamerica, presently the only Mesoamerican writing system that has been substantially deciphered...

 are hapax legomena, and Biblical (particularly Hebrew
Hebrew language
Hebrew is a Semitic language of the Afroasiatic language family. Culturally, is it considered by Jews and other religious groups as the language of the Jewish people, though other Jewish languages had originated among diaspora Jews, and the Hebrew language is also used by non-Jewish groups, such...

) hapax legomena pose sometimes difficult issues in translation. Hapax legomena also pose challenges in natural language processing
Natural language processing
Natural language processing is a field of computer science and linguistics concerned with the interactions between computers and human languages; it began as a branch of artificial intelligence....

.

Some scholars consider Hapax legomena useful in determining the authorship of written works. For example, each of Shakespeare's plays contains a roughly similar percentage of hapax legomena not found elsewhere in his work.

P.N. Harrison, in The Problem of the Pastoral Epistles (1921) made hapax legomena popular among Bible scholars
Biblical studies
Biblical studies is the academic study of the Judeo-Christian Bible and related texts. For Christianity, the Bible traditionally comprises the New Testament and Old Testament, which together are sometimes called the "Scriptures." Judaism recognizes as scripture only the Hebrew Bible, also known as...

, when he argued that there are considerably more of them in the three Pastoral Epistles
Pastoral epistles
The three pastoral epistles are books of the canonical New Testament: the First Epistle to Timothy the Second Epistle to Timothy , and the Epistle to Titus. They are presented as letters from Paul of Tarsus...

 than in other Pauline Epistles
Pauline epistles
The Pauline epistles, Epistles of Paul, or Letters of Paul, are the thirteen New Testament books which have the name Paul as the first word, hence claiming authorship by Paul the Apostle. Among these letters are some of the earliest extant Christian documents...

. He argued that the number of hapax legomena in a putative author's corpus indicates his or her vocabulary and is characteristic of the author as an individual.

Harrison's theory has faded in significance due to a number of problems raised by other scholars. For example, in 1896, W.P. Workman found the following numbers of hapax legomena in each Pauline Epistle
Pauline epistles
The Pauline epistles, Epistles of Paul, or Letters of Paul, are the thirteen New Testament books which have the name Paul as the first word, hence claiming authorship by Paul the Apostle. Among these letters are some of the earliest extant Christian documents...

: Rom. 113, I Cor. 110, II Cor. 99, Gal. 34, Eph. 43 Phil. 41, Col. 38, I Thess. 23, II Thess. 11, Philem. 5, I Tim. 82, II Tim. 53, Titus 33. At first glance, the last three totals (for the Pastoral Epistles) are not out of line with the others. To take account of the varying length of the epistles, Workman also calculated the average number of hapax legomena per page of the Greek text
The New Testament in the Original Greek
The New Testament in the Original Greek is the name of a Greek language version of the New Testament published in 1881. It is also known as the Westcott and Hort text, after its editors Brooke Foss Westcott and Fenton John Anthony Hort...

, which ranged from 3.6 to 13, as summarised in the diagram on the right. Although the Pastoral Epistles have more hapax legomena per page, Workman found the differences to be moderate in comparison to the variation between other Epistles. This was reinforced when Workman looked at several plays
Shakespeare's plays
William Shakespeare's plays have the reputation of being among the greatest in the English language and in Western literature. Traditionally, the 37 plays are divided into the genres of tragedy, history, and comedy; they have been translated into every major living language, in addition to being...

 by William Shakespeare
William Shakespeare
William Shakespeare was an English poet and playwright, widely regarded as the greatest writer in the English language and the world's pre-eminent dramatist. He is often called England's national poet and the "Bard of Avon"...

, which showed similar variations (from 3.4 to 10.4 per page of Irving's one-volume edition), as summarised in the second diagram on the right.

Apart from author identity, there are several other factors which can explain the number of hapax legomena in a work, such as:
  • text length: this directly affects the expected number and percentage of hapax legomena; the brevity of the Pastoral Epistles also makes any statistical analysis problematic.
  • text topic: if the author writes on different subjects, of course many subject-specific words will occur only in limited contexts.
  • text audience: if the author is writing to a peer rather than a student, or their spouse rather than their employer, again quite different vocabulary will appear.
  • time: over the course of years, both the language itself and a given author's knowledge and use of language, will change.


In the particular case of the Pastoral Epistles, all of these variables are quite different than in the rest of the Pauline corpus, and hapax legomena are no longer widely accepted as a strong indicator of authorship (although the authorship of the Pastorals is subject to debate on other grounds).

There are also subjective questions over whether two forms amount to "the same word": dog vs dogs, clue vs clueless, sign vs signature, and many other grey cases arise. The Jewish Encyclopedia points out that although there are 1500 hapaxes in the Old Testament
Old Testament
The Old Testament, of which Christians hold different views, is a Christian term for the religious writings of ancient Israel held sacred and inspired by Christians which overlaps with the 24-book canon of the Masoretic Text of Judaism...

, only about 400 of those are not obviously related to other attested word forms.

It would not be especially difficult for a forger to construct a work with any percentage of hapax legomena desired. However, it seems unlikely that forgers much before the 20th century would have thought of such a ploy, much less thought it worth the effort.

A final difficulty with the use of hapax legomena for authorship determination is that there is considerable variation among works known to be by a single author, and disparate authors can show very similar values. In other words, it is not a reliable indicator. Authorship studies now usually use a wide range of measures, and look for a pattern across them, rather than relying on a single measurement.

Computer science


In the fields of computational linguistics
Computational linguistics
Computational linguistics is an interdisciplinary field dealing with the statistical or rule-based modeling of natural language from a computational perspective....

 and natural language processing
Natural language processing
Natural language processing is a field of computer science and linguistics concerned with the interactions between computers and human languages; it began as a branch of artificial intelligence....

 (NLP), esp. corpus linguistics
Corpus linguistics
Corpus linguistics is the study of language as expressed in samples or "real world" text. This method represents a digestive approach to deriving a set of abstract rules by which a natural language is governed or else relates to another language. Originally done by hand, corpora are now largely...

 and machine learned
Machine learning
Machine learning, a branch of artificial intelligence, is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as from sensor data or databases...

 NLP, it is common to disregard hapax legomena (and sometimes other infrequent words), as they are likely to have little value for computational techniques. This has the added benefit of significantly reducing the memory usage of application, since by Zipf's law, many words are hapaxes.

Hebrew examples

  • Gvina (גבינה – cheese
    Cheese
    Cheese is a generic term for a diverse group of milk-based food products. Cheese is produced throughout the world in wide-ranging flavors, textures, and forms....

    ) is a hapax legomenon of Biblical Hebrew, found only in . The word has become extremely common in modern Hebrew
    Hebrew language
    Hebrew is a Semitic language of the Afroasiatic language family. Culturally, is it considered by Jews and other religious groups as the language of the Jewish people, though other Jewish languages had originated among diaspora Jews, and the Hebrew language is also used by non-Jewish groups, such...

    .
  • Akut (אקוט – fought), only appears once in the Hebrew Bible, in .
  • Lilith
    Lilith
    Lilith is a character in Jewish mythology, found earliest in the Babylonian Talmud, who is generally thought to be related to a class of female demons Līlīṯu in Mesopotamian texts. However, Lowell K. Handy notes, "Very little information has been found relating to the Akkadian and Babylonian view...

    (לילית) occurs once in the Hebrew Bible
    Hebrew Bible
    The Hebrew Bible is a term used by biblical scholars outside of Judaism to refer to the Tanakh , a canonical collection of Jewish texts, and the common textual antecedent of the several canonical editions of the Christian Old Testament...

    , in , which describes the desolation of Edom
    Edom
    Edom or Idumea was a historical region of the Southern Levant located south of Judea and the Dead Sea. It is mentioned in biblical records as a 1st millennium BC Iron Age kingdom of Edom, and in classical antiquity the cognate name Idumea was used to refer to a smaller area in the same region...

    . The word is translated into English in several ways.
  • Atzei Gopher (עֲצֵי-גֹפֶר – Gopher wood
    Gopher wood
    Gopher wood or gopherwood is a term used once in the Bible, for the substance from which Noah's ark was built. Gen 6:14 states that Noah was to build the Ark of גפר, gofer, more commonly gopher wood, a word not otherwise known in the Bible or in Hebrew.Although,Older English translations, including...

    ) is mentioned once in the Bible, in , in the instruction to make Noah's ark "of gopher wood". Because of the single appearance, the literal meaning is lost. Gopher is simply a transliteration, although scholars today tentatively suggest that the wood intended is cypress
    Cypress
    Cypress is the name applied to many plants in the cypress family Cupressaceae, which is a conifer of northern temperate regions. Most cypress species are trees, while a few are shrubs...

    .

Greek examples

  • autoguos (αυτογυος), an ancient Greek
    Ancient Greek
    Ancient Greek is the stage of the Greek language in the periods spanning the times c. 9th–6th centuries BC, , c. 5th–4th centuries BC , and the c. 3rd century BC – 6th century AD of ancient Greece and the ancient world; being predated in the 2nd millennium BC by Mycenaean Greek...

     word for a sort of plough
    Plough
    The plough or plow is a tool used in farming for initial cultivation of soil in preparation for sowing seed or planting. It has been a basic instrument for most of recorded history, and represents one of the major advances in agriculture...

    , is found once (and exclusively) in Hesiod
    Hesiod
    Hesiod was a Greek oral poet generally thought by scholars to have been active between 750 and 650 BC, around the same time as Homer. His is the first European poetry in which the poet regards himself as a topic, an individual with a distinctive role to play. Ancient authors credited him and...

    : the precise meaning remaining obscure.
  • panaorios (παναωριος), ancient Greek
    Ancient Greek
    Ancient Greek is the stage of the Greek language in the periods spanning the times c. 9th–6th centuries BC, , c. 5th–4th centuries BC , and the c. 3rd century BC – 6th century AD of ancient Greece and the ancient world; being predated in the 2nd millennium BC by Mycenaean Greek...

     for "very untimely", is one of many hapax legomena of the Iliad
    Iliad
    The Iliad is an epic poem in dactylic hexameters, traditionally attributed to Homer. Set during the Trojan War, the ten-year siege of the city of Troy by a coalition of Greek states, it tells of the battles and events during the weeks of a quarrel between King Agamemnon and the warrior Achilles...

    .
  • The Greek New Testament
    New Testament
    The New Testament is the second major division of the Christian biblical canon, the first such division being the much longer Old Testament....

     contains 686 local hapax legomena, sometimes called "New Testament hapax" of which 62 occur in 1 Peter, and 54 occur in 2 Peter.
  • aphedron
    Aphedron
    Aphedron is a Greek word for latrine.Because the word occurs only once in the New Testament , and was unknown in classical texts, Martin Luther translated the word as meaning the bowel. Likewise during the 19th Century various scholars assumed it was a euphemism for the human bowel....

    "latrine" was a hapax legomenon thought to mean "bowel" until an inscription was found in Pergamos.

Latin examples

  • Ambacu, a word relating in some way to a meat dish, only appears in book 2 of Apuleius
    Apuleius
    Apuleius was a Latin prose writer. He was a Berber, from Madaurus . He studied Platonist philosophy in Athens; travelled to Italy, Asia Minor and Egypt; and was an initiate in several cults or mysteries. The most famous incident in his life was when he was accused of using magic to gain the...

    ' Metamorphoses.
  • Mnemosynus, presumably meaning a keepsake or aide-memoire, only appears in Poem 12 of Catullus
    Catullus
    Gaius Valerius Catullus was a Latin poet of the Republican period. His surviving works are still read widely, and continue to influence poetry and other forms of art.-Biography:...

    ' Carmina
    Poetry of Catullus
    The poetry of Gaius Valerius Catullus was written towards the end of the Roman Republic. It describes the lifestyle of the poet and his friends, as well as, most famously, his love for the woman he calls Lesbia.-Sources and organization:...

    .
  • Deproeliantis, a participle of the word deproelior, which means "to fight fiercely" or "to struggle violently," only appears in line 11 of Horace's Ode 1.9.

Arabic examples

  • The proper nouns Iram (Q 89:7, Iram of the Pillars
    Iram of the Pillars
    Iram of the Pillars , also called Aram, Iram, Irum, Irem, Erum, Wabar, Ubar, or the City of a Thousand Pillars, is a lost city on the Arabian Peninsula.-Introduction:Ubar, a name of a region or a name of a people, was mentioned in ancient records, and was spoken of in folk...

    ), Bābil (Q 2:102, Babylon
    Babylon
    Babylon was an Akkadian city-state of ancient Mesopotamia, the remains of which are found in present-day Al Hillah, Babil Province, Iraq, about 85 kilometers south of Baghdad...

    ), Bakka(t) (Q 3:96, Bakkah
    Bakkah
    Bakkah is an ancient name for Mecca, the most holy city of Islam. Most people believe they are synonyms, but to Muslim scholars there is a distinction: Bakkah refers to the Kaaba and the sacred site immediately surrounding it, while Mecca is the name of the city in which they are both...

    ), Ǧibt (Q 4:51), Ramaḍān (Q 2:185, Ramadan
    Ramadan
    Ramadan is the ninth month of the Islamic calendar, which lasts 29 or 30 days. It is the Islamic month of fasting, in which participating Muslims refrain from eating, drinking, smoking and sex during daylight hours and is intended to teach Muslims about patience, spirituality, humility and...

    ), ar-Rūm (Q 30:2, Ancient Rome
    Ancient Rome
    Ancient Rome was a thriving civilization that grew on the Italian Peninsula as early as the 8th century BC. Located along the Mediterranean Sea and centered on the city of Rome, it expanded to one of the largest empires in the ancient world....

    ), Tasnīm (Q 83:27), Qurayš (Q 106:1, Quraysh), Maǧūs (Q 22:17, Magi
    Magi
    Magi is a term, used since at least the 4th century BC, to denote a follower of Zoroaster, or rather, a follower of what the Hellenistic world associated Zoroaster with, which...

    ), Mārūt (Q 2:102, Harut and Marut
    Harut and Marut
    Harut and Marut are two angels mentioned in the second Surah of the Qur'an, who were sent down to test the people at Babel or Babylon by performing deeds of magic....

    ), Makka(t) (Q 48:24, Mecca
    Mecca
    Mecca is a city in the Hijaz and the capital of Makkah province in Saudi Arabia. The city is located inland from Jeddah in a narrow valley at a height of above sea level...

    ), Nasr (Q 71:23), (Ḏū) an-Nūn (Q 21:87) and Hārūt (Q 2:102, Harut and Marut
    Harut and Marut
    Harut and Marut are two angels mentioned in the second Surah of the Qur'an, who were sent down to test the people at Babel or Babylon by performing deeds of magic....

    ) occur only once in the Qurʾān.
  • zanǧabīl (زَنْجَبِيل - ginger
    Ginger
    Ginger is the rhizome of the plant Zingiber officinale, consumed as a delicacy, medicine, or spice. It lends its name to its genus and family . Other notable members of this plant family are turmeric, cardamom, and galangal....

    ) is a Qurʾānic hapax (Q 76:17).
  • The epitheton ornans aṣ-ṣamad (الصَّمَد – the One besought (Names of God in the Qur'an)) is a Qurʾānic hapax (Q 112:2).

Italian examples

  • Ramogna is mentioned only once in Italian literature
    Italian literature
    Italian literature is literature written in the Italian language, particularly within Italy. It may also refer to literature written by Italians or in Italy in other languages spoken in Italy, often languages that are closely related to modern Italian....

    , precisely in Dante
    Dante Alighieri
    Durante degli Alighieri, mononymously referred to as Dante , was an Italian poet, prose writer, literary theorist, moral philosopher, and political thinker. He is best known for the monumental epic poem La commedia, later named La divina commedia ...

    's Divina Commedia (Purgatorio
    Purgatorio
    Purgatorio is the second part of Dante's Divine Comedy, following the Inferno, and preceding the Paradiso. The poem was written in the early 14th century. It is an allegory telling of the climb of Dante up the Mount of Purgatory, guided by the Roman poet Virgil...

    XI, 25).
  • Trasumanar is another hapax legomenon mentioned in Dante
    Dante Alighieri
    Durante degli Alighieri, mononymously referred to as Dante , was an Italian poet, prose writer, literary theorist, moral philosopher, and political thinker. He is best known for the monumental epic poem La commedia, later named La divina commedia ...

    's Divina Commedia (Paradiso
    Paradiso (Dante)
    Paradiso is the third and final part of Dante's Divine Comedy, following the Inferno and the Purgatorio. It is an allegory telling of Dante's journey through Heaven, guided by Beatrice, who symbolises theology...

    I, 70, translated as "Passing beyond the human" by Mandelbaum
    Allen Mandelbaum
    Allen Mandelbaum was a American professor of Italian literature, poet, and translator. He was the W. R...

    ).

English examples

  • Honorificabilitudinitatibus
    Honorificabilitudinitatibus
    Honorificabilitudinitatibus is the dative and ablative plural of the mediæval Latin word honorificabilitudinitas, which can be translated as "the state of being able to achieve honours". It is mentioned by the character Costard in Act V, Scene I of William Shakespeare's Love's Labour's Lost...

    is a hapax legomenon of Shakespeare's works.
  • Nortelrye, a word for "education", occurs exactly once in Chaucer
    Geoffrey Chaucer
    Geoffrey Chaucer , known as the Father of English literature, is widely considered the greatest English poet of the Middle Ages and was the first poet to have been buried in Poet's Corner of Westminster Abbey...

    .
  • Slæpwerigne occurs exactly once in the Old English corpus, in the Exeter Book
    Exeter Book
    The Exeter Book, Exeter Cathedral Library MS 3501, also known as the Codex Exoniensis, is a tenth-century book or codex which is an anthology of Anglo-Saxon poetry. It is one of the four major Anglo-Saxon literature codices. The book was donated to the library of Exeter Cathedral by Leofric, the...

    . There is debate over whether it means "weary with sleep" or "weary for sleep".
  • Flother, a synonym for snow
    Snow
    Snow is a form of precipitation within the Earth's atmosphere in the form of crystalline water ice, consisting of a multitude of snowflakes that fall from clouds. Since snow is composed of small ice particles, it is a granular material. It has an open and therefore soft structure, unless packed by...

    flake, is a hapax legomenon of written English
    English language
    English is a West Germanic language that arose in the Anglo-Saxon kingdoms of England and spread into what was to become south-east Scotland under the influence of the Anglian medieval kingdom of Northumbria...

    pre-1900, found in a manuscript from around 1275.

External links