Home      Discussion      Topics      Dictionary      Almanac
Signup       Login
Substitution cipher

Substitution cipher

Overview
In cryptography
Cryptography
Cryptography is the practice and study of hiding information. Modern cryptography intersects the disciplines of mathematics, computer science, and engineering...

, a substitution cipher is a method of encryption
Encryption
In cryptography, encryption is the process of transforming information using an algorithm to make it unreadable to anyone except those possessing special knowledge, usually referred to as a key. The result of the process is encrypted information...

 by which units of plaintext are replaced with ciphertext
Ciphertext
In cryptography, ciphertext is the result of the process of transforming information using an algorithm to make it unreadable to anyone except those possessing special knowledge, usually referred to as a key. This result is also known as encrypted information...

 according to a regular system; the "units" may be single letters (the most common), pairs of letters, triplets of letters, mixtures of the above, and so forth. The receiver deciphers the text by performing an inverse substitution.

Substitution ciphers can be compared with transposition cipher
Transposition cipher
In cryptography, a transposition cipher is a method of encryption by which the positions held by units of plaintext are shifted according to a regular system, so that the ciphertext constitutes a permutation of the plaintext. That is, the order of the units is changed...

s. In a transposition cipher, the units of the plaintext are rearranged in a different and usually quite complex order, but the units themselves are left unchanged.
Discussion
Ask a question about 'Substitution cipher'
Start a new discussion about 'Substitution cipher'
Answer questions from other users
Full Discussion Forum
 
Encyclopedia
In cryptography
Cryptography
Cryptography is the practice and study of hiding information. Modern cryptography intersects the disciplines of mathematics, computer science, and engineering...

, a substitution cipher is a method of encryption
Encryption
In cryptography, encryption is the process of transforming information using an algorithm to make it unreadable to anyone except those possessing special knowledge, usually referred to as a key. The result of the process is encrypted information...

 by which units of plaintext are replaced with ciphertext
Ciphertext
In cryptography, ciphertext is the result of the process of transforming information using an algorithm to make it unreadable to anyone except those possessing special knowledge, usually referred to as a key. This result is also known as encrypted information...

 according to a regular system; the "units" may be single letters (the most common), pairs of letters, triplets of letters, mixtures of the above, and so forth. The receiver deciphers the text by performing an inverse substitution.

Substitution ciphers can be compared with transposition cipher
Transposition cipher
In cryptography, a transposition cipher is a method of encryption by which the positions held by units of plaintext are shifted according to a regular system, so that the ciphertext constitutes a permutation of the plaintext. That is, the order of the units is changed...

s. In a transposition cipher, the units of the plaintext are rearranged in a different and usually quite complex order, but the units themselves are left unchanged. By contrast, in a substitution cipher, the units of the plaintext are retained in the same sequence in the ciphertext, but the units themselves are altered.

There are a number of different types of substitution cipher. If the cipher operates on single letters, it is termed a simple substitution cipher; a cipher that operates on larger groups of letters is termed polygraphic. A monoalphabetic cipher uses fixed substitution over the entire message, whereas a polyalphabetic cipher uses a number of substitutions at different times in the message, where a unit from the plaintext is mapped to one of several possibilities in the ciphertext and vice-versa.

Simple substitution


Substitution over a single letter—simple substitution—can be demonstrated by writing out the alphabet in some order to represent the substitution. This is termed a substitution alphabet. The cipher alphabet may be shifted or reversed (creating the Caesar
Caesar cipher
In cryptography, a Caesar cipher, also known as a Caesar's cipher, the shift cipher, Caesar's code or Caesar shift, is one of the simplest and most widely known encryption techniques. It is a type of substitution cipher in which each letter in the plaintext is replaced by a letter some fixed number...

 and Atbash
Atbash
Atbash is a simple substitution cipher for the Hebrew alphabet. It consists in substituting aleph for tav , beth for shin , and so on, reversing the alphabet. In the Book of Jeremiah, Lev Kamai is Atbash for Kasdim , and Sheshakh is Atbash for Bavel...

 ciphers, respectively) or scrambled in a more complex fashion, in which case it is called a mixed alphabet or deranged alphabet. Traditionally, mixed alphabets are created by first writing out a keyword, removing repeated letters in it, then writing all the remaining letters in the alphabet.

Examples


Using this system, the keyword "zebras" gives us the following alphabets:
Plaintext alphabet: abcdefghijklmnopqrstuvwxyz
Ciphertext alphabet: ZEBRASCDFGHIJKLMNOPQTUVWXY

A message of

flee at once. we are discovered!

enciphers to

SIAA ZQ LKBA. VA ZOA RFPBLUAOAR!

Traditionally, the ciphertext is written out in blocks of fixed length, omitting punctuation and spaces; this is done to help avoid transmission errors and to disguise word boundaries from the plaintext
Plaintext
In cryptography, plaintext is information a sender wishes to transmit to a receiver. Cleartext is, sometimes confusingly, often used as a synonym. Before the computer era, plaintext most commonly meant message text in the language of the communicating parties...

. These blocks are called "groups", and sometimes a "group count" (i.e., the number of groups) is given as an additional check. Five letter groups are traditional, dating from when messages used to be transmitted by telegraph
Telegraphy
Telegraphy is the long-distance transmission of written messages without physical transport of letters. It is a compound term formed from the Greek words tele = far and graphein = write. Radiotelegraphy or wireless telegraphy transmits messages using radio...

:

SIAAZ QLKBA VAZOA RFPBL UAOAR

If the length of the message happens not to be divisible by five, it may be padded at the end with "null
Null
Null means 'nothing' or without value or consequence.Null may refer to:* Null , a special value in computer programming* Null , a zero value in several branches of mathematics...

s". These can be any characters that decrypt to obvious nonsense, so the receiver can easily spot them and discard them.

The ciphertext alphabet is sometimes different from the plaintext alphabet; for example, in the pigpen cipher
Pigpen cipher
The pigpen cipher is a simple substitution cipher exchanging letters for symbols based on a grid. The use of symbols is no impediment to cryptanalysis however, and cryptanalysis is identical to that of other simple substitution schemes...

, the ciphertext consists of a set of symbols derived from a grid. For example:

Such features make little difference to the security of a scheme, however — at the very least, any set of strange symbols can be transcribed back into an A-Z alphabet and dealt with as normal.

In lists and catalogues for sales people sometimes a very simple encryption is used to replace numeric digits by letters.
Plain digits: 1234567890
Ciphertext alphabet: MAKEPROFIT 

Example: MAT would be used to represent 120.

Security for simple substitution ciphers


A disadvantage of this method of derangement is that the last letters of the alphabet (which are mostly low frequency) tend to stay at the end. A stronger way of constructing a mixed alphabet is to perform a columnar transposition on the ordinary alphabet using the keyword, but this is not often done.

Although the number of possible key
Key (cryptography)
In cryptography, a key is a piece of information that determines the functional output of a cryptographic algorithm or cipher. Without a key, the algorithm would have no result. In encryption, a key specifies the particular transformation of plaintext into ciphertext, or vice versa during decryption...

s is very large (26! ≈ 288.4, or about 88 bits
Key size
In cryptography, key size or key length is the size of the key used in a cryptographic algorithm . An algorithm's key length is distinct from its cryptographic security, which is a logarithmic measure of the fastest known computational attack on the algorithm, also measured in bits...

), this cipher is not very strong, being easily broken. Provided the message is of reasonable length (see below), the cryptanalyst
Cryptanalysis
Cryptanalysis is the study of methods for obtaining the meaning of encrypted information, without access to the secret information which is normally required to do so. Typically, this involves knowing how the system works and finding a secret key...

 can deduce the probable meaning of the most common symbols by analyzing the frequency distribution
Frequency distribution
In statistics, a frequency distribution is a tabulation of the values that one or more variables take in a sample.-Univariate frequency tables:...

 of the ciphertext—frequency analysis
Frequency analysis
In cryptanalysis, frequency analysis is the study of the frequency of letters or groups of letters in a ciphertext. The method is used as an aid to breaking classical ciphers....

. This allows formation of partial words, which can be tentatively filled in, progressively expanding the (partial) solution (see frequency analysis for a demonstration of this). In some cases, underlying words can also be determined from the pattern of their letters; for example, attract, osseous, and words with those two as the root are the only common English
English language
English is a West Germanic language that developed in England during the Anglo-Saxon era. As a result of the military, economic, scientific, political, and cultural influence of the British Empire during the 18th, 19th, and early 20th centuries, and of the United States since the mid 20th century,...

 words with the pattern ABBCADB. Many people solve such ciphers for recreation, as with cryptogram
Cryptogram
A cryptogram is a type of puzzle which consists of a short piece of encrypted text. Generally the cipher used to encrypt the text is simple enough that cryptogram can be solved by hand. Frequently used are substitution ciphers where each letter is replaced by a different letter or number. To solve...

 puzzles in the newspaper.

According to the unicity distance
Unicity distance
Unicity distance is a term used in cryptography referring to the length of an original ciphertext needed to break the cipher by reducing the number of possible spurious keys to zero in a brute force attack...

 of English
English language
English is a West Germanic language that developed in England during the Anglo-Saxon era. As a result of the military, economic, scientific, political, and cultural influence of the British Empire during the 18th, 19th, and early 20th centuries, and of the United States since the mid 20th century,...

, 27.6 letters of ciphertext are required to crack a mixed alphabet simple substitution. In practice, typically about 50 letters are needed, although some messages can be broken with fewer if unusual patterns are found. In other cases, the plaintext can be contrived to have a nearly flat frequency distribution, and much longer plaintexts will then be required by the user.

Homophonic substitution



An early attempt to increase the difficulty of frequency analysis attacks on substitution ciphers was to disguise plaintext letter frequencies by homophony. In these ciphers, plaintext letters map to more than one ciphertext symbol. Usually, the highest-frequency plaintext symbols are given more equivalents than lower frequency letters. In this way, the frequency distribution is flattened, making analysis more difficult.

Since more than 26 characters will be required in the ciphertext alphabet, various solutions are employed to invent larger alphabets. Perhaps the simplest is to use a numeric substitution 'alphabet'. Another method consists of simple variations on the existing alphabet; uppercase, lowercase, upside down, etc. More artistically, though not necessarily more securely, some homophonic ciphers employed wholly invented alphabets of fanciful symbols. (See Poe
Edgar Allan Poe
Edgar Allan Poe was an American writer, poet, editor and literary critic, considered part of the American Romantic Movement. Best known for his tales of mystery and the macabre, Poe was one of the earliest American practitioners of the short story and is considered the inventor of the...

's "The Gold-Bug
The Gold-Bug
"The Gold-Bug" is a short story by Edgar Allan Poe. Set on Sullivan's Island, South Carolina, the plot follows William Legrand, who was recently bitten by a gold-colored bug, as well as his servant Jupiter and an unnamed narrator...

" for a literary example; cf. the Voynich manuscript
Voynich manuscript
The Voynich manuscript is a mysterious, undeciphered illustrated book. It is thought to have been written in the 15th or 16th century. The author, script, and language of the manuscript remain unknown....

.)

An interesting variant is the nomenclator. Named after the public official who announced the titles of visiting dignitaries, this cipher combined a small codebook
Codebook
In cryptography, a codebook is a document used for implementing a code. A codebook contains a lookup table for coding and decoding; each word or phrase has one or more strings which replace it. To decipher messages written in code, corresponding copies of the codebook must be available at either end...

 with large homophonic substitution tables. Originally the code
Code (cryptography)
In cryptography, a code is a method used to transform a message into an obscured form, preventing those who do not possess special information, or key, required to apply the transform from understanding what is actually transmitted. The usual method is to use a codebook with a list of common...

 was restricted to the names of important people, hence the name of the cipher; in later years it covered many common words and place names as well. The symbols for whole words (codewords in modern parlance) and letters (cipher
Cipher
In cryptography, a cipher is an algorithm for performing encryption or decryption — a series of well-defined steps that can be followed as a procedure. An alternative, less common term is encipherment. In non-technical usage, a “cipher” is the same thing as a “code”; however, the concepts...

 in modern parlance) were not distinguished in the ciphertext. The Rossignols
Rossignols
The Rossignols, a family of French cryptographers and cryptanalysts, included:* Antoine Rossignol * Bonaventure Rossignol* Antoine-Bonaventure RossignolThe family name meant "nightingale" in French...

' Great Cipher
Great Cipher
In the history of cryptography, the Great Cipher or Grand Chiffre was a nomenclator cipher developed by the Rossignols, several generations of whom served the French Crown as cryptographers. The Great Cipher was excellent of its class and so was given this name; it was reputed to be unbreakable...

 used by Louis XIV of France
Louis XIV of France
Louis XIV , popularly known as the Sun King , was King of France and of Navarre His reign, from 1643 to his death in 1715, lasted seventy-two years, three months, and eighteen days, and is the longest documented reign of any European monarch.Louis began personally governing France after the death...

 was one; after it went out of use, messages in French archive
Archive
An archive is a collection of historical records, and the location in which the collection is kept. Archives contain records which have accumulated over the course of an individual or organization's lifetime....

s were unbreakable for several hundred years.

Nomenclators were the standard fare of diplomatic
Diplomacy
Diplomacy is the art and practice of conducting negotiations between representatives of groups or states. It usually refers to international diplomacy, the conduct of international relations through the intercession of professional diplomats with regard to issues of peace-making, trade, war,...

 correspondence, espionage
Espionage
Espionage or spying involves an individual obtaining information that is considered secret or confidential without the permission of the holder of the information. Espionage is inherently clandestine, as the legitimate holder of the information may change plans or take other countermeasures once it...

, and advanced political conspiracy
Conspiracy (political)
In a political sense, conspiracy refers to a group of persons united in the goal of usurping or overthrowing an established political power. Typically, the final goal is to gain power through a revolutionary coup d'état or through assassination....

 from the early fifteenth century to the late eighteenth century; most conspirators were and have remained less cryptographically sophisticated. Although government
Government
A government is the body within a community, political entity or organization which has the authority to make and enforce rules, laws and regulations.....

 intelligence
Intelligence agency
An intelligence agency is a governmental agency that is devoted to information gathering for purposes of national security and defense. Means of information gathering may include espionage, communication interception, cryptanalysis, cooperation with other institutions, and evaluation of public...

 cryptanalysts were systematically breaking nomenclators by the mid-sixteenth century, and superior systems had been available since 1467, the usual response to cryptanalysis
Cryptanalysis
Cryptanalysis is the study of methods for obtaining the meaning of encrypted information, without access to the secret information which is normally required to do so. Typically, this involves knowing how the system works and finding a secret key...

 was simply to make the tables larger. By the late eighteenth century, when the system was beginning to die out, some nomenclators had 50,000 symbols.

Nevertheless, not all nomenclators were broken; today, cryptanalysis of archived ciphertexts remains a fruitful area of historical research
History
History is the study of the human past, with special attention to the written record. Scholars who write about history are called historians. It is a field of research which uses a narrative to examine and analyse the sequence of events, and it often attempts to investigate objectively the patterns...

.

The book cipher
Book cipher
A book cipher is a cipher in which the key is some aspect of a book or other piece of text; books being common and widely available in modern times, users of book cyphers take the position that the details of the key is sufficiently well hidden from attackers in practice. This is in some ways an...

 and straddling checkerboard
Straddling checkerboard
In cryptography, a straddling checkerboard is a device for converting an alphabetic plaintext into digits whilst simultaneously achieving fractionation and homophony...

 are types of homophonic cipher.

Polyalphabetic substitution



Polyalphabetic substitution ciphers were first described in 1467 by Leone Battista Alberti
Leone Battista Alberti
Leon Battista Alberti was an Italian author, artist, architect, poet, priest, linguist, philosopher, and cryptographer, and general Renaissance humanist polymath...

 in the form of disks. Johannes Trithemius
Johannes Trithemius
Johannes Trithemius was born Johann Heidenberg. He was an abbot, cryptographer and occultist who had an influence on later occultism. The name by which he is more commonly known is derived from his native town of Trittenheim on the Mosel in Germany.-Life:He studied at the University of Heidelberg...

, in his book Steganographia (Ancient Greek
Ancient Greek
Ancient Greek is the historical stage in the development of the Greek language spanning across the Archaic , Classical , and Hellenistic periods of ancient Greece and the ancient world. It is predated in the 2nd millennium BC by Mycenaean Greek...

 for "hidden writing") introduced the now more standard form of a tableau (see below; ca. 1500 but not published until much later). A more sophisticated version using mixed alphabets was described in 1563 by Giovanni Battista della Porta in his book, De Furtivis Literarum Notis (Latin
Latin
Latin is an Italic language originally spoken in Latium and Ancient Rome. Through the Roman conquest, Latin spread throughout the Mediterranean and a large part of Europe...

 for "On concealed characters in writing").

In a polyalphabetic cipher, multiple cipher alphabets are used. To facilitate encryption, all the alphabets are usually written out in a large table
Table (information)
A table is both a mode of visual communication and a means of arranging data. The use of tables is pervasive throughout all communication, research and data analysis. Tables appear in print media, handwritten notes, computer software, architectural ornamentation, traffic signs and many other places...

, traditionally called a tableau. The tableau is usually 26×26, so that 26 full ciphertext alphabets are available. The method of filling the tableau, and of choosing which alphabet to use next, defines the particular polyalphabetic cipher. All such ciphers are easier to break than once believed, as substitution alphabets are repeated for sufficiently large plaintexts.

One of the most popular was that of Blaise de Vigenère
Blaise de Vigenère
Blaise de Vigenère was a French diplomat and cryptographer. The Vigenère cipher is so named due to the cipher being incorrectly attributed to him in the 19th century....

. First published in 1585, it was considered unbreakable until 1863, and indeed was commonly called le chiffre indéchiffrable (French
French language
French is a Romance language globally spoken by about 65 million people as a first language , by 50 million as a second language, and by about another 200 million people as an acquired foreign language, with significant speakers in 57 countries. Most native speakers of the language live in France,...

 for "indecipherable cipher").

In the Vigenère cipher
Vigenère cipher
The Vigenère cipher is a method of encrypting alphabetic text by using a series of different Caesar ciphers based on the letters of a keyword. It is a simple form of polyalphabetic substitution....

, the first row of the tableau is filled out with a copy of the plaintext alphabet, and successive rows are simply shifted one place to the left. (Such a simple tableau is called a tabula recta
Tabula recta
In cryptography, the tabula recta is a square table of alphabets, each row of which is made by shifting the previous one to the left. The term was invented by Johannes Trithemius in 1518, and used in his cipher....

, and mathematically corresponds to adding the plaintext and key letters, modulo
Modular arithmetic
In mathematics, modular arithmetic is a system of arithmetic for integers, where numbers "wrap around" after they reach a certain value—the modulus...

 26.) A keyword is then used to choose which ciphertext alphabet to use. Each letter of the keyword is used in turn, and then they are repeated again from the beginning. So if the keyword is 'CAT', the first letter of plaintext is enciphered under alphabet 'C', the second under 'A', the third under 'T', the fourth under 'C' again, and so on. In practice, Vigenère keys were often phrases several words long.

In 1863, Friedrich Kasiski
Friedrich Kasiski
Major Friedrich Wilhelm Kasiski was a Prussian infantry officer, cryptographer and archeologist. Kasiski was born in Schlochau, West Prussia .-Military service:...

 published a method (probably discovered secretly and independently before the Crimean War
Crimean War
The Crimean War was fought between the Russian Empire on one side and an alliance of the British Empire, France, the Ottoman Empire and the Kingdom of Sardinia on the other. The war was part of a long-running contest between the major European powers for influence over territories of the declining...

 by Charles Babbage
Charles Babbage
Charles Babbage, FRS was an English mathematician, philosopher, inventor and mechanical engineer who originated the concept of a programmable computer. Parts of his uncompleted mechanisms are on display in the London Science Museum. In 1991, a perfectly functioning difference engine was...

) which enabled the calculation of the length of the keyword in a Vigenère ciphered message. Once this was done, ciphertext letters that had been enciphered under the same alphabet could be picked out and attacked separately as a number of semi-independent simple substitutions - complicated by the fact that within one alphabet letters were separated and did not form complete words, but simplified by the fact that usually a tabula recta had been employed.

As such, even today a Vigenère type cipher should theoretically be difficult to break if mixed alphabets are used in the tableau, if the keyword is random, and if the total length of ciphertext is less than 27.6 times the length of the keyword. These requirements are rarely understood in practice, and so Vigenère enciphered message security is usually less than might have been.

Other notable polyalphabetics include:
  • The Gronsfeld cipher. This is identical to the Vigenère except that only 10 alphabets are used, and so the "keyword" is numerical.
  • The Beaufort cipher
    Beaufort cipher
    The Beaufort cipher, created by Sir Francis Beaufort, is a substitution cipher that is similar to the Vigenère cipher but uses a slightly modified enciphering mechanism and tableau...

    . This is practically the same as the Vigenère, except the tabula recta is replaced by a backwards one, mathematically equivalent to ciphertext = key - plaintext. This operation is self-inverse, whereby the same table is used for both encryption and decryption.
  • The autokey cipher
    Autokey cipher
    An autokey cipher is a cipher which incorporates the message into the key. There are two forms of autokey cipher: key autokey and text autokey ciphers. A key-autokey cipher uses previous members of the keystream to determine the next element in the keystream...

    , which mixes plaintext with a key to avoid periodic
    Periodic function
    In mathematics, a periodic function is a function that repeats its values in regular intervals or periods. The most important examples are the trigonometric functions, which repeat over intervals of length 2π...

    ity.
  • The running key cipher
    Running key cipher
    In classical cryptography, the running key cipher is a type of polyalphabetic substitution cipher in which a text, typically from a book, is used to provide a very long keystream...

    , where the key is made very long by using a passage from a book or similar text.


Modern stream cipher
Stream cipher
In cryptography, a stream cipher is a symmetric key cipher where plaintext bits are combined with a pseudorandom cipher bit stream , typically by an exclusive-or operation. In a stream cipher the plaintext digits are encrypted one at a time, and the transformation of successive digits varies...

s can also be seen, from a sufficiently abstract perspective, to be a form of polyalphabetic cipher in which all the effort has gone into making the keystream
Keystream
In cryptography, a keystream is a stream of random or pseudorandom characters that are combined with a plaintext message to produce an encrypted message ....

 as long and unpredictable as possible.

Polygraphic substitution


In a polygraphic substitution cipher, plaintext letters are substituted in larger groups, instead of substituting letters individually. The first advantage is that the frequency distribution is much flatter than that of individual letters (though not actually flat in real languages; for example, 'TH' is much more common than 'XQ' in English). Second, the larger number of symbols requires correspondingly more ciphertext to productively analyze letter frequencies.

To substitute pairs of letters would take a substitution alphabet 676 symbols long . In the same De Furtivis Literarum Notis mentioned above, della Porta actually proposed such a system, with a 20 x 20 tableau (for the 20 letters of the Italian/Latin alphabet he was using) filled with 400 unique glyph
Glyph
A glyph is an element of writing. It is a slightly vague term, but a more precise definition might be an individual mark on paper or another written medium which contributes to the meaning of what is written there...

s. However the system was impractical and probably never actually used.

The earliest practical digraphic cipher (pairwise substitution), was the so-called Playfair cipher
Playfair cipher
The Playfair cipher or Playfair square is a manual symmetric encryption technique and was the first literal digraph substitution cipher. The scheme was invented in 1854 by Charles Wheatstone, but bears the name of Lord Playfair who promoted the use of the cipher.The technique encrypts pairs of...

, invented by Sir Charles Wheatstone
Charles Wheatstone
Sir Charles Wheatstone FRS , was a British scientist and inventor of many scientific breakthroughs of the Victorian era, including the English concertina, the stereoscope , and the Playfair cipher...

 in 1854. In this cipher, a 5 x 5 grid is filled with the letters of a mixed alphabet (two letters, usually I and J, are combined). A digraphic substitution is then simulated by taking pairs of letters as two corners of a rectangle, and using the other two corners as the ciphertext (see the Playfair cipher
Playfair cipher
The Playfair cipher or Playfair square is a manual symmetric encryption technique and was the first literal digraph substitution cipher. The scheme was invented in 1854 by Charles Wheatstone, but bears the name of Lord Playfair who promoted the use of the cipher.The technique encrypts pairs of...

 main article for a diagram). Special rules handle double letters and pairs falling in the same row or column. Playfair was in military use from the Boer War
Second Boer War
The Second Boer War , commonly referred to as The Boer War and also known as the South African War , the Anglo-Boer War and in Afrikaans as the Anglo-Boereoorlog or Tweede Vryheidsoorlog , or the Engelse oorlog was fought...

 through World War II
World War II
World War II, or the Second World War , was a global military conflict which involved a majority of the world's nations, including all great powers, organized into two opposing military alliances: the Allies and the Axis...

.

Several other practical polygraphics were introduced in 1901 by Felix Delastelle
Felix Delastelle
Félix Marie Delastelle was a Frenchman most famous for his invention of several systems of polygraphic substitution ciphers including the bifid, trifid, and the four-square ciphers....

, including the bifid
Bifid cipher
In classical cryptography, the bifid cipher is a cipher which combines the Polybius square with transposition, and uses fractionation to achieve diffusion...

 and four-square cipher
Four-square cipher
The four-square cipher is a manual symmetric encryption technique. It was invented by famous French cryptographer Felix Delastelle.The technique encrypts pairs of letters , and thus falls into a category of ciphers known as polygraphic substitution ciphers...

s (both digraphic) and the trifid cipher
Trifid cipher
In classical cryptography, the trifid cipher is a cipher invented around 1901 by Felix Delastelle, which extends the concept of the bifid cipher to a third dimension, allowing each symbol to be fractionated into 3 elements instead of two...

 (probably the first practical trigraphic).

The Hill cipher
Hill cipher
In classical cryptography, the Hill cipher is a polygraphic substitution cipher based on linear algebra. Invented by Lester S. Hill in 1929, it was the first polygraphic cipher in which it was practical to operate on more than three symbols at once...

, invented in 1929 by Lester S. Hill, is a polygraphic substitution which can combine much larger groups of letters simultaneously using linear algebra
Linear algebra
Linear algebra is a branch of mathematics concerned with the study of vectors, vector spaces , linear maps , and systems of linear equations. Vector spaces are a central theme in modern mathematics; thus, linear algebra is widely used in both abstract algebra and functional analysis...

. Each letter is treated as a digit in base 26
Numeral system
A numeral system is a writing system for expressing numbers, that is a mathematical notation for representing numbers of a given set, using graphemes or symbols in a consistent manner....

: A = 0, B =1, and so on. (In a variation, 3 extra symbols are added to make the basis
Basis (linear algebra)
In linear algebra, a basis is a set of vectors that, in a linear combination, can represent every vector in a given vector space or free module, and such that no element of the set can be represented as a linear combination of the others...

 prime
Prime number
In mathematics, a prime number is a natural number which has exactly two distinct natural number divisors: 1 and itself. The first twenty-six prime numbers are:An infinitude of prime numbers exists, as demonstrated by Euclid around 300 BC. The number 1 is by definition not a prime number...

.) A block of n letters is then considered as a vector
Vector space
A vector space is a mathematical structure formed by a collection of vectors: objects that may be added together and multiplied by numbers, called scalars in this context. Scalars are often taken to be real numbers, but one may also consider vector spaces with scalar multiplication by complex...

 of n dimension
Dimension
In mathematics and physics, the dimension of a space or object is informally defined as the minimum number of coordinates needed to specify each point within it. Thus a line has a dimension of one because only one coordinate is needed to specify a point on it...

s, and multiplied by a n x n matrix
Matrix (mathematics)
In mathematics, a matrix is a rectangular array of numbers, such asEntries of a matrix are often denoted by a variable with two subscripts, as shown on the right. Matrices of the same size can be added and subtracted entrywise and matrices of compatible size can be multiplied...

, modulo
Modular arithmetic
In mathematics, modular arithmetic is a system of arithmetic for integers, where numbers "wrap around" after they reach a certain value—the modulus...

 26. The components of the matrix are the key, and should be random provided that the matrix is invertible in (to ensure decryption is possible). A Hill cipher of dimension 6 was once implemented mechanically.

The Hill cipher is vulnerable to a known-plaintext attack
Known-plaintext attack
The known-plaintext attack is an attack model for cryptanalysis where the attacker has samples of both the plaintext and its encrypted version and is at liberty to make use of them to reveal further secret information such as secret keys and code books.-History:At Bletchley Park in World War II,...

 because it is completely linear
Linear
The word linear comes from the Latin word linearis, which means created by lines.In mathematics, a linear map or function f is a function which satisfies the following two properties......

, so it must be combined with some non-linear step to defeat this attack. The combination of wider and wider weak, linear diffusive
Confusion and diffusion
In cryptography, confusion and diffusion are two properties of the operation of a secure cipher which were identified by Claude Shannon in his paper Communication Theory of Secrecy Systems, published in 1949....

 steps like a Hill cipher, with non-linear substitution steps, ultimately leads to a substitution-permutation network
Substitution-permutation network
In cryptography, an SP-network, or substitution-permutation network , is a series of linked mathematical operations used in block cipher algorithms such as AES ....

 (e.g. a Feistel cipher
Feistel cipher
In cryptography, a Feistel cipher is a symmetric structure used in the construction of block ciphers, named after the German-born physicist and cryptographer Horst Feistel who did pioneering research while working for IBM ; it is also commonly known as a Feistel network. A large proportion of block...

), so it is possible — from this extreme perspective — to consider modern block cipher
Block cipher
In cryptography, a block cipher is a symmetric key cipher operating on fixed-length groups of bits, termed blocks, with an unvarying transformation. A block cipher encryption algorithm might take a 128-bit block of plaintext as input, and output a corresponding 128-bit block of ciphertext...

s as a type of polygraphic substitution.

Mechanical substitution ciphers


Between circa World War I
World War I
World War I , also known as the First World War, the Great War, and the War to End All Wars, was a global military conflict which involved most of the world's great powers, assembled in two opposing alliances: the Triple Entente and the Triple Alliance...

 and the widespread availability of computer
Computer
A computer is a machine that manipulates data according to a set of instructions.Although mechanical examples of computers have existed through much of recorded human history, the first electronic computers were developed in the mid-20th century . These were the size of a large room, consuming as...

s (for some governments this was approximately the 1950s or 1960s; for other organizations it was a decade or more later; for individuals it was no earlier than 1975), mechanical implementations of polyalphabetic substitution ciphers were widely used. Several inventors had similar ideas about the same time, and rotor cipher machine
Rotor machine
In cryptography, a rotor machine is an electro-mechanical device used for encrypting and decrypting secret messages. Rotor machines were the cryptographic state-of-the-art for a brief but prominent period of history; they were in widespread use in the 1930s–1950s...

s were patented four times in 1919. The most important of the resulting machines was the Enigma
Enigma machine
An Enigma machine is any of a family of related electro-mechanical rotor machines used for the encryption and decryption of secret messages. The first Enigma was invented by German engineer Arthur Scherbius at the end of World War I...

, especially in the versions used by the German military
Wehrmacht
Wehrmacht was the name of the unified armed forces of Germany from 1935 to 1945. It consisted of the Heer , the Kriegsmarine and the Luftwaffe ....

 from approximately 1930. The Allies
Allies
In general, allies are people, groups, or nations that have joined together in an association for mutual benefit or to achieve some common purpose. In English usage, those who share a common goal and whose work toward that goal is complementary may be viewed as allies for various purposes even when...

 also developed and used rotor machines (eg, SIGABA
SIGABA
In the history of cryptography, the ECM Mark II was a rotor machine used by the United States from World War II until the 1950s. The machine was also known as the SIGABA or Converter M-134 by the Army, or CSP-888/889 by the Navy, and a modified Navy version was termed the CSP-2900.Like many...

 and Typex
Typex
In the history of cryptography, Typex machines were British cipher machines used from 1937. It was an adaptation of the commercial German Enigma with a number of enhancements that greatly increased its security....

).

All of these were similar in that the substituted letter was chosen electrically from amongst the huge number of possible combinations resulting from the rotation of several letter disks. Since one or more of the disks rotated mechanically with each plaintext letter enciphered, the number of alphabets used was substantially more than astronomical. Early versions of these machine were, nevertheless, breakable. William F. Friedman
William F. Friedman
William Frederick Friedman was a US Army cryptographer . He ran the research division of the Army's Signals Intelligence Service in the 1930s, and parts of its follow-on services into the 1950s...

 of the US Army's SIS
Signals Intelligence Service
The Signals Intelligence Service was the United States Army codebreaking division, headquartered at Arlington Hall. It was a part of the Signal Corps so secret that outside the office of the Chief Signal officer, it did not officially exist. William Friedman began the division with three "junior...

 early found vulnerabilities in Hebern's rotor machine
Hebern rotor machine
The Hebern Rotor Machine was an electro-mechanical encryption machine built by combining the mechanical parts of a standard typewriter with the electrical parts of an electric typewriter, connecting the two through a scrambler...

, and GC&CS's Dillwyn Knox solved versions of the Enigma machine (those without the "plugboard") well before WWII
World War II
World War II, or the Second World War , was a global military conflict which involved a majority of the world's nations, including all great powers, organized into two opposing military alliances: the Allies and the Axis...

 began. Traffic protected by essentially all of the German military Enigmas was broken by Allied cryptanalysts, most notably those at Bletchley Park
Bletchley Park
Bletchley Park, also known as Station X, is an estate located in the town of Bletchley, in Buckinghamshire, England. Since 1967, Bletchley has been part of Milton Keynes....

, beginning with the German Army variant used in the early 1930s. This version was broken by inspired mathematical insight by Marian Rejewski
Marian Rejewski
Marian Adam Rejewski was a Polish mathematician and cryptologist who in 1932 solved the plugboard-equipped Enigma machine, the main cipher device used by Germany...

 in Poland
Poland
Poland , officially the Republic of Poland , is a country in Central Europe . Poland is bordered by Germany to the west; the Czech Republic and Slovakia to the south; Ukraine, Belarus and Lithuania to the east; and the Baltic Sea and Kaliningrad Oblast, a Russian exclave, to the north...

.

No messages protected by the SIGABA
SIGABA
In the history of cryptography, the ECM Mark II was a rotor machine used by the United States from World War II until the 1950s. The machine was also known as the SIGABA or Converter M-134 by the Army, or CSP-888/889 by the Navy, and a modified Navy version was termed the CSP-2900.Like many...

 and Typex
Typex
In the history of cryptography, Typex machines were British cipher machines used from 1937. It was an adaptation of the commercial German Enigma with a number of enhancements that greatly increased its security....

 machines were ever, so far as is publicly known, broken.

The one-time pad



One type of substitution cipher, the one-time pad
One-time pad
In cryptography, the one-time pad is an encryption algorithm in which the plaintext is combined with a secret random key or pad, which is used only once. A modular addition is typically used to combine plaintext elements with pad elements. It was invented in 1917 and patented a couple of years...

, is quite special. It was invented near the end of WWI by Gilbert Vernam
Gilbert Vernam
Gilbert Sandford Vernam was a AT&T Bell Labs engineer who, in 1917, invented the stream cipher and later co-invented the one-time pad cipher. Vernam proposed a teletype cipher in which a previously-prepared key, kept on paper tape, is combined character by character with the plaintext message to...

 and Joseph Mauborgne
Joseph Mauborgne
In the history of cryptography, Joseph Oswald Mauborgne co-invented the one-time pad with Gilbert Vernam of Bell Labs. In 1914 he published the first recorded solution of the Playfair cipher....

 in the US. It was mathematically proved unbreakable by Claude Shannon
Claude Shannon
Claude Elwood Shannon , an American electronic engineer and mathematician, is known as "the father of information theory".Shannon is famous for having founded information theory with one landmark paper published in 1948...

, probably during WWII; his work was first published in the late 1940s. In its most common implementation, the one-time pad can be called a substitution cipher only from an unusual perspective; typically, the plaintext letter is combined (not substituted) in some manner (eg, XOR) with the key material character at that position.

The one-time pad is, in most cases, impractical as it requires that the key material be as long as the plaintext, actually random, used once and only once, and kept entirely secret from all except the sender and intended receiver. When these conditions are violated, even marginally, the one-time pad is no longer unbreakable. Soviet
Soviet Union
The Union of Soviet Socialist Republics was a constitutionally socialist state that existed in Eurasia from 1922 to 1991. The name is a translation of the , tr. Soyuz Sovetskikh Sotsialisticheskikh Respublik, abbreviated СССР, SSSR. The common short name is Soviet Union, from , Sovetskiy Soyuz...

 one-time pad messages sent from the US for a brief time during WWII used non-random key material. US cryptanalysts, beginning in the late 40s, were able to, entirely or partially, break a few thousand messages out of several hundred thousand. (See VENONA)

In a mechanical implementation, rather like the ROCKEX
Rockex
Rockex, or Telekrypton, was an offline one-time tape cipher machine known to have been used by Britain and Canada from 1943. It was developed by Benjamin deForest Bayly, working during the war for British Security Coordination....

 equipment, the one-time pad was used for messages sent on the Moscow
Moscow
Moscow is the capital and the largest city of Russia. It is also the largest metropolitan area in Europe, and ranks among the largest urban areas in the world. Moscow is a major political, economic, cultural, religious, financial, educational, and transportation centre of Russia and the world, a...

-Washington
Washington, D.C.
Washington, D.C. , formally the District of Columbia and commonly referred to as Washington, the District, or simply D.C., is the capital of the United States, founded on July 16, 1790...

 hot line established after the Cuban missile crisis
Cuban Missile Crisis
The Cuban Missile Crisis was a confrontation between the United States, the Soviet Union, and Cuba in October 1962, during the Cold War. In Russia, former Eastern Bloc, and communist countries , it is termed the "Caribbean Crisis" , while in Cuba it is called the "October Crisis"...

.

Substitution in modern cryptography


Substitution ciphers as discussed above, especially the older pencil-and-paper hand ciphers, are no longer in serious use. However, the cryptographic concept of substitution carries on even today. From a sufficiently abstract perspective, modern bit-oriented block cipher
Block cipher
In cryptography, a block cipher is a symmetric key cipher operating on fixed-length groups of bits, termed blocks, with an unvarying transformation. A block cipher encryption algorithm might take a 128-bit block of plaintext as input, and output a corresponding 128-bit block of ciphertext...

s (eg, DES
Data Encryption Standard
The Data Encryption Standard is a block cipher that was selected by the National Bureau of Standards as an official Federal Information Processing Standard for the United States in 1976 and which has subsequently enjoyed widespread use internationally. It is based on a symmetric-key algorithm...

, or AES
Advanced Encryption Standard
In cryptography, the Advanced Encryption Standard is an encryption standard adopted by the U.S. government. The standard comprises three block ciphers, AES-128, AES-192 and AES-256, adopted from a larger collection originally published as Rijndael. Each AES cipher has a 128-bit block size, with...

) can be viewed as substitution ciphers on an enormously large binary
Binary numeral system
The binary numeral system, or base-2 number system represents numeric values using two symbols, 0 and 1. More specifically, the usual base-2 system is a positional notation with a radix of 2...

 alphabet. In addition, block ciphers often include smaller substitution tables called S-boxes. See also substitution-permutation network
Substitution-permutation network
In cryptography, an SP-network, or substitution-permutation network , is a series of linked mathematical operations used in block cipher algorithms such as AES ....

.

Substitution ciphers in popular culture

  • Sherlock Holmes
    Sherlock Holmes
    Sherlock Holmes is a fictional character of the late nineteenth and early twentieth centuries who first appeared in publication in 1887. He is the creation of British author and physician Sir Arthur Conan Doyle...

     breaks a substitution cipher in "The Adventure of the Dancing Men
    The Adventure of the Dancing Men
    The Adventure of the Dancing Men, one of the 56 Sherlock Holmes short stories written by British author Sir Arthur Conan Doyle, is one of 13 stories in the cycle collected as The Return of Sherlock Holmes.-Summary:...

    ."
  • The Al Bhed language in Final Fantasy X
    Final Fantasy X
    is a console role-playing game developed and published by Square as the tenth title in the Final Fantasy series. It was released in 2001 for Sony's PlayStation 2. The game marks the Final Fantasy series' transition from entirely pre-rendered backdrops to fully three-dimensional areas, and is also...

    is actually a substitution cipher, although it is pronounced phonetically (i.e. "you" in English is translated to "oui" in Al Bhed, but is pronounced the same way that "oui" is pronounced in French
    French language
    French is a Romance language globally spoken by about 65 million people as a first language , by 50 million as a second language, and by about another 200 million people as an acquired foreign language, with significant speakers in 57 countries. Most native speakers of the language live in France,...

    ).
  • Uryuomoco, the fictional language from El Goonish Shive
    El Goonish Shive
    El Goonish Shive is a contemporary fantasy webcomic, written and drawn by Dan Shive. It debuted on 2002-01-21 and was hosted on Keenspot from mid-2003 to early-2009, when it changed hosting to 910CMX. Shive rates EGS as being "13+"....

    is a substitution cipher from English.
  • The Minbari
    Minbari
    The Minbari are a fictional alien race featured in the television show Babylon 5. The Minbari characters of Delenn, Lennier, Neroon, Draal, and Dukhat figure prominently in the series....

    's alphabet from the Babylon 5
    Babylon 5
    Babylon 5 is an American science fiction television series created, produced and largely written by J. Michael Straczynski. The show centers on the Babylon 5 space station: a focal point for politics, diplomacy, and conflict during the years 2257–2262...

     series is a substitution cipher from English.
  • The language in Starfox Adventures: Dinosaur Planet spoken by native Saurians and Krystal
    Krystal
    Krystal may refer to:*Krystal , one of the oldest fast-food chains in the United States, founded in 1932*Krystal, a video game character from the Star Fox series*Krystal, a character from a video game, Dark Wizard...

     is also a substitution cipher of the English alphabet
    English alphabet
    The modern English alphabet is a Latin-based alphabet consisting of 26 letters – the same letters that are found in the Basic modern Latin alphabet:
    The exact shape of printed letters varies depending on the typeface...

    .
  • The television program Futurama
    Futurama
    Futurama is an animated American sci-fi sitcom created by Matt Groening and developed by Groening and David X. Cohen for the Fox network. The series follows the adventures of a late 20th-century New York City pizza delivery boy, Philip J...

     contained a substitution cipher in which all 26 letters were replaced by symbols and called "Alien Language". This was deciphered rather quickly by the die hard viewers by showing a "Slurm" ad with the word "Drink" in both plain English and the Alien language thus giving the key. Later, the producers created a second alien language that used a combination of replacement and mathematical Ciphers. Once the English letter of the alien language is deciphered, then the numerical value of that letter (1 through 26 respectively) is then added to the value of the previous letter showing the actual intended letter. These messages can be seen throughout every episode of the series and it's subsequent movies.

See also

  • Vigenère cipher
    Vigenère cipher
    The Vigenère cipher is a method of encrypting alphabetic text by using a series of different Caesar ciphers based on the letters of a keyword. It is a simple form of polyalphabetic substitution....

  • Topics in cryptography
  • Ban (information)
    Ban (information)
    A ban, sometimes called a hartley or a dit , is a logarithmic unit which measures information or entropy, based on base 10 logarithms and powers of 10, rather than the powers of 2 and base 2 logarithms which define the bit. As a bit corresponds to a binary digit, so a ban is a decimal digit...

    with Centiban Table

External links