Frequency analysis



 
 
In cryptanalysis, frequency analysis is the study of the frequency of letters or groups of letters in a ciphertext. The method is used as an aid to breaking classical ciphers.

Frequency analysis is based on the fact that, in any given stretch of written language, certain letters and combinations of letters occur with varying frequencies. Moreover, there is a characteristic distribution of letters that is roughly the same for almost all samples of that language. For instance, in a section of English text, E tends to be very common, while X is very rare. Likewise, ST, NG, TH, and QU are common pairs of letters (termed bigrams or digraphs), while NZ and QJ are rare. The nonsense phrase "ETAOIN SHRDLU" represents the 12 most frequent letters in typical English language text.

In some ciphers, such properties of the natural language plaintext are preserved in the ciphertext, and these patterns have the potential to be exploited in a ciphertext-only attack.

Frequency analysis for simple substitution ciphers

In a simple substitution cipher, each letter of the plaintext is replaced with another, and any particular letter in the plaintext will always be transformed into the same letter in the ciphertext. For instance, if all occurrences of the letter e turn into the letter X, a ciphertext message containing numerous instances of the letter X would suggest to a cryptanalyst that X represents e.

The basic use of frequency analysis is to first count the frequency of ciphertext letters and then associate guessed plaintext letters with them. More X's in the ciphertext than anything else suggests that X corresponds to e in the plaintext, but this is not certain; t and a are also very common in English, so X might stand for either of them instead. It is unlikely to be a plaintext z or q, which are much less common. Thus the cryptanalyst may need to try several combinations of mappings between ciphertext and plaintext letters.
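The counting step and the first round of guessing can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the frequency ordering beyond the twelve letters of "ETAOIN SHRDLU" is an assumed approximation. As noted above, a rank-for-rank pairing gives a starting point for guesses, not a solution.

```python
from collections import Counter

# Approximate English letters from most to least frequent. The first twelve
# follow "ETAOIN SHRDLU"; the remaining order is an assumption for illustration.
ENGLISH_ORDER = "etaoinshrdlucmfwypvbgkqjxz"


def letter_counts(ciphertext: str) -> Counter:
    """Count the letters A-Z, ignoring case, spaces and punctuation."""
    return Counter(ch for ch in ciphertext.upper() if "A" <= ch <= "Z")


def rank_guess(ciphertext: str) -> dict:
    """Pair the k-th most common ciphertext letter with the k-th entry of ENGLISH_ORDER."""
    ranked = [letter for letter, _ in letter_counts(ciphertext).most_common()]
    return dict(zip(ranked, ENGLISH_ORDER))
```

Calling `letter_counts(text).most_common(5)` lists the five most frequent ciphertext letters with their counts, which is usually more useful to inspect by hand than the full `rank_guess` mapping.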

More complex use of statistics can be conceived, such as considering counts of pairs of letters (bigrams), triplets (trigrams), and so on. This provides more information to the cryptanalyst; for instance, Q and U nearly always occur together in that order in English, even though Q itself is rare.
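Counting groups of letters works the same way as counting single letters. The following sketch uses a hypothetical helper, `ngram_counts`, that slides a window of n letters across the text. It strips everything except the letters A-Z first, so in text that contains spaces it also counts groups that straddle word boundaries.

```python
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count overlapping runs of n consecutive letters, ignoring non-letters."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    return Counter("".join(letters[i:i + n]) for i in range(len(letters) - n + 1))
```

For example, `ngram_counts(ciphertext, 2).most_common(3)` returns the three most frequent pairs and `ngram_counts(ciphertext, 3)` does the same for triplets.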

An example


Suppose Eve has intercepted the cryptogram below, and it is known to be encrypted using a simple substitution cipher:

LIVITCSWPIYVEWHEVSRIQMXLEYVEOIEWHRXEXIPFEMVEWHKVSTYLXZIXLIKIIXPIJVSZEYPERRGERIM WQLMGLMXQERIWGPSRIHMXQEREKIETXMJTPRGEVEKEITREWHEXXLEXXMZITWAWSQWXSWEXTVEPMRXRSJ GSTVRIEYVIEXCVMUIMWERGMIWXMJMGCSMWXSJOMIQXLIVIQIVIXQSVSTWHKPEGARCSXRWIEVSWIIBXV IZMXFSJXLIKEGAEWHEPSWYSWIWIEVXLISXLIVXLIRGEPIRQIVIIBGIIHMWYPFLEVHEWHYPSRRFQMXLE PPXLIECCIEVEWGISJKTVWMRLIHYSPHXLIQIMYLXSJXLIMWRIGXQEROIVFVIZEVAEKPIEWHXEAMWYEPP XLMWYRMWXSGSWRMHIVEXMSWMGSTPHLEVHPFKPEZINTCMXIVJSVLMRSCMWMSWVIRCIGXMWYMX

For this example, uppercase letters are used to denote ciphertext, lowercase letters are used to denote plaintext (or guesses at such), and X~t is used to express a guess that ciphertext letter X represents the plaintext letter t.

Eve could use frequency analysis to help solve the message along the following lines: counts of the letters in the cryptogram show that I is the most common single letter, XL is the most common bigram, and XLI is the most common trigram. In English, e is the most common letter, th is the most common bigram, and "the" is the most common trigram. This strongly suggests that X~t, L~h and I~e. The second most common letter in the cryptogram is E; since the first and second most frequent letters in the English language, e and t, are already accounted for, Eve guesses that E~a, the third most frequent letter. Tentatively making these assumptions, the following partially decrypted message is obtained.

heVeTCSWPeYVaWHaVSReQMthaYVaOeaWHRtatePFaMVaWHKVSTYhtZetheKeetPeJVSZaYPaRRGaReM WQhMGhMtQaReWGPSReHMtQaRaKeaTtMJTPRGaVaKaeTRaWHatthattMZeTWAWSQWtSWatTVaPMRtRSJ GSTVReaYVeatCVMUeMWaRGMeWtMJMGCSMWtSJOMeQtheVeQeVetQSVSTWHKPaGARCStRWeaVSWeeBtV eZMtFSJtheKaGAaWHaPSWYSWeWeaVtheStheVtheRGaPeRQeVeeBGeeHMWYPFhaVHaWHYPSRRFQMtha PPtheaCCeaVaWGeSJKTVWMRheHYSPHtheQeMYhtSJtheMWReGtQaROeVFVeZaVAaKPeaWHtaAMWYaPP thMWYRMWtSGSWRMHeVatMSWMGSTPHhaVHPFKPaZeNTCMteVJSVhMRSCMWMSWVeRCeGtMWYMt

Using these initial guesses, Eve can spot patterns that confirm her choices, such as "that". Moreover, other patterns suggest further guesses. "Rtate" might be "state", which would mean R~s. Similarly "atthattMZe" could be guessed as "atthattime", yielding M~i and Z~m. Furthermore, "heVe" might be "here", giving V~r. Filling in these guesses, Eve gets:

hereTCSWPeYraWHarSseQithaYraOeaWHstatePFairaWHKrSTYhtmetheKeetPeJrSmaYPassGasei WQhiGhitQaseWGPSseHitQasaKeaTtiJTPsGaraKaeTsaWHatthattimeTWAWSQWtSWatTraPistsSJ GSTrseaYreatCriUeiWasGieWtiJiGCSiWtSJOieQthereQeretQSrSTWHKPaGAsCStsWearSWeeBtr emitFSJtheKaGAaWHaPSWYSWeWeartheStherthesGaPesQereeBGeeHiWYPFharHaWHYPSssFQitha PPtheaCCearaWGeSJKTrWisheHYSPHtheQeiYhtSJtheiWseGtQasOerFremarAaKPeaWHtaAiWYaPP thiWYsiWtSGSWsiHeratiSWiGSTPHharHPFKPameNTCiterJSrhisSCiWiSWresCeGtiWYit
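The substitution step Eve performs above, writing guessed letters in lowercase and leaving unknown letters in uppercase, can be sketched as follows. The function name `apply_guesses` is hypothetical; the two dictionaries hold the guesses made so far in this example.

```python
def apply_guesses(ciphertext: str, guesses: dict) -> str:
    """Replace guessed ciphertext letters with lowercase plaintext; keep the rest unchanged."""
    return "".join(guesses.get(ch, ch) for ch in ciphertext)


# First round of guesses: X~t, L~h, I~e, E~a.
first_guesses = {"X": "t", "L": "h", "I": "e", "E": "a"}
# Second round adds R~s, M~i, Z~m, V~r.
more_guesses = {**first_guesses, "R": "s", "M": "i", "Z": "m", "V": "r"}

print(apply_guesses("LIVI", first_guesses))  # prints "heVe"
print(apply_guesses("LIVI", more_guesses))   # prints "here"
```

Because unknown letters pass through untouched, an incorrect guess can be undone simply by removing its entry from the dictionary and applying the mapping to the original ciphertext again.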

In turn, these guesses suggest still others (for example, "remarA" could be "remark", implying A~k) and so on, and it is relatively straightforward to deduce the rest of the letters, eventually yielding the plaintext.

hereuponlegrandarosewithagraveandstatelyairandbroughtmethebeetlefromaglasscasei nwhichitwasencloseditwasabeautifulscarabaeusandatthattimeunknowntonaturalistsof courseagreatprizeinascientificpointofviewthereweretworoundblackspotsnearoneextr emityofthebackandalongoneneartheotherthescaleswereexceedinglyhardandglossywitha lltheappearanceofburnishedgoldtheweightoftheinsectwasveryremarkableandtakingall thingsintoconsiderationicouldhardlyblamejupiterforhisopinionrespectingit

At this point, it would be a good idea for Eve to insert spaces and punctuation:

Hereupon Legrand arose, with a grave and stately air, and brought me the beetle from a glass case in which it was enclosed. It was a beautiful scarabaeus, and, at that time, unknown to naturalists—of course a great prize in a scientific point of view. There were two round black spots near one extremity of the back, and a long one near the other. The scales were exceedingly hard and glossy, with all the appearance of burnished gold. The weight of the insect was very remarkable, and, taking all things into consideration, I could hardly blame Jupiter for his opinion respecting it.

In this example from The Gold-Bug, Eve's guesses were all correct. This would not always be the case, however; the variation in statistics for individual plaintexts can mean that initial guesses are incorrect. It may be necessary to backtrack from incorrect guesses or to analyze the available statistics in much more depth than the somewhat simplified justifications given in the above example.

It is also possible that the plaintext does not exhibit the expected distribution of letter frequencies. Shorter messages are likely to show more variation. It is also possible to construct artificially skewed texts. For example, entire novels have been written that omit the letter "e" altogether, a form of literature known as a lipogram.

History and usage

The first known recorded explanation of frequency analysis (indeed, of any kind of cryptanalysis) was given in the 9th century by Al-Kindi, an Arab polymath, in A Manuscript on Deciphering Cryptographic Messages. It has been suggested that close textual study of the Qur'an first brought to light that Arabic has a characteristic letter frequency. The use of frequency analysis spread, and similar systems were widely used in European states by the time of the Renaissance. By 1474 Cicco Simonetta had written a manual on deciphering encryptions of Latin and Italian text.

Several schemes were invented by cryptographers to defeat this weakness in simple substitution encryptions. These included:

  • Use of homophones: several alternatives to the most common letters in otherwise monoalphabetic substitution ciphers (for example, for English, both X and Y ciphertext might mean plaintext E);
  • Polyalphabetic substitution, that is, the use of several alphabets, chosen in assorted, more or less devious, ways (Leone Alberti seems to have been the first to propose this); and
  • Polygraphic substitution, schemes where pairs or triplets of plaintext letters are treated as units for substitution, rather than single letters (for example, the Playfair cipher invented by Charles Wheatstone in the mid 1800s).
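To see why several alphabets blunt single-letter counting, consider a minimal sketch of the Vigenère cipher, a well-known polyalphabetic scheme that is used here purely as an illustration and is not one of the systems named in the list above. The function name is hypothetical, and the sketch assumes that both the plaintext and the key contain only the letters a-z.

```python
def vigenere_encrypt(plaintext: str, key: str) -> str:
    """Shift each plaintext letter by the matching key letter, cycling through the key.

    Assumes plaintext and key contain only the letters a-z (in either case).
    """
    out = []
    for i, ch in enumerate(plaintext.lower()):
        shift = ord(key[i % len(key)].lower()) - ord("a")
        out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("A")))
    return "".join(out)


print(vigenere_encrypt("eeee", "abcd"))  # prints "EFGH"
```

The same plaintext letter e becomes four different ciphertext letters, so a plain count of ciphertext letters no longer singles out one symbol as the likely stand-in for e.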


A disadvantage of all these attempts to defeat frequency counting attacks is that they complicate both enciphering and deciphering, leading to mistakes. Famously, a British Foreign Secretary is said to have rejected the Playfair cipher because, even if school boys could cope successfully as Wheatstone and Playfair had shown, 'our attachés could never learn it!'.

The rotor machines of the first half of the 20th century (for example, the Enigma machine) were essentially immune to straightforward frequency analysis. However, other kinds of analysis ("attacks") successfully decoded messages from some of those machines.

Frequency analysis requires only a basic understanding of the statistics of the plaintext language and some problem-solving skills, and, if performed by hand, some tolerance for extensive letter bookkeeping. During World War II (WWII), both the British and the Americans recruited codebreakers by placing crossword puzzles in major newspapers and running contests for who could solve them the fastest. Several of the ciphers used by the Axis powers were breakable using frequency analysis (for example, some of the consular ciphers used by the Japanese). Mechanical methods of letter counting and statistical analysis (generally IBM card type machinery) were first used in WWII, possibly by the US Army's SIS. Today, the hard work of letter counting and analysis has been replaced by computer software, which can carry out such analysis in seconds. With modern computing power, classical ciphers are unlikely to provide any real protection for confidential data.

Frequency analysis in fiction

Frequency analysis has been described in fiction. Edgar Allan Poe's "The Gold-Bug" and Sir Arthur Conan Doyle's Sherlock Holmes tale "The Adventure of the Dancing Men" are examples of stories which describe the use of frequency analysis to attack simple substitution ciphers. The cipher in the Poe story is encrusted with several deception measures, but this is more a literary device than anything significant cryptographically.


See also

  • ETAOIN SHRDLU
  • Letter frequencies
  • Index of coincidence
  • Topics in cryptography
  • Zipf's law


Further reading

  • Helen Fouché Gaines, "Cryptanalysis", 1939, Dover. ISBN 0-486-20097-3
  • Abraham Sinkov, "Elementary Cryptanalysis: A Mathematical Approach", The Mathematical Association of America, 1966. ISBN 0-88385-622-0.

