Substitution cipher
In cryptography, a substitution cipher is a method of encryption by which units of plaintext are replaced with ciphertext according to a regular system; the "units" may be single letters (the most common), pairs of letters, triplets of letters, mixtures of the above, and so forth. The receiver deciphers the text by performing an inverse substitution.

Substitution ciphers can be compared with transposition ciphers. In a transposition cipher, the units of the plaintext are rearranged in a different and usually quite complex order, but the units themselves are left unchanged. By contrast, in a substitution cipher, the units of the plaintext are retained in the same sequence in the ciphertext, but the units themselves are altered.

There are several types of substitution cipher. If the cipher operates on single letters, it is termed a simple substitution cipher; a cipher that operates on larger groups of letters is termed polygraphic. A monoalphabetic cipher uses a fixed substitution over the entire message, whereas a polyalphabetic cipher uses a number of substitutions at different times in the message, where a unit from the plaintext is mapped to one of several possibilities in the ciphertext and vice versa.

Simple substitution

Substitution over a single letter (simple substitution) can be demonstrated by writing out the alphabet in some order to represent the substitution. This is termed a substitution alphabet. The cipher alphabet may be shifted or reversed (creating the Caesar and Atbash ciphers, respectively) or scrambled in a more complex fashion, in which case it is called a mixed alphabet or deranged alphabet. Traditionally, mixed alphabets are created by first writing out a keyword, removing repeated letters in it, then writing all the remaining letters in the alphabet.

Examples

Using this system, the keyword "zebras" gives us the following alphabets:
Plaintext alphabet: ABCDEFGHIJKLMNOPQRSTUVWXYZ
Ciphertext alphabet: ZEBRASCDFGHIJKLMNOPQTUVWXY

A message of

flee at once. we are discovered!

enciphers to

SIAA ZQ LKBA. VA ZOA RFPBLUAOAR!
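
The keyword construction and the example above can be reproduced with a short Python sketch. The function names (mixed_alphabet, encipher, decipher) are illustrative choices, and the sketch assumes the 26-letter English alphabet and leaves spaces and punctuation untouched.

```python
import string

def mixed_alphabet(keyword):
    """Keyword letters with repeats dropped, then the unused letters in order."""
    seen = []
    for ch in keyword.upper() + string.ascii_uppercase:
        if ch in string.ascii_uppercase and ch not in seen:
            seen.append(ch)
    return "".join(seen)

def encipher(plaintext, cipher_alphabet):
    # Map A..Z onto the cipher alphabet; other characters pass through.
    table = str.maketrans(string.ascii_uppercase, cipher_alphabet)
    return plaintext.upper().translate(table)

def decipher(ciphertext, cipher_alphabet):
    # The inverse substitution simply swaps the two alphabets.
    table = str.maketrans(cipher_alphabet, string.ascii_uppercase)
    return ciphertext.upper().translate(table)

alphabet = mixed_alphabet("zebras")
print(alphabet)
print(encipher("flee at once. we are discovered!", alphabet))
```

Deciphering uses the same alphabet with the mapping reversed, which is the inverse substitution mentioned in the introduction.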

Traditionally, the ciphertext is written out in blocks of fixed length, omitting punctuation and spaces; this is done to help avoid transmission errors and to disguise word boundaries from the plaintext. These blocks are called "groups", and sometimes a "group count" (i.e., the number of groups) is given as an additional check. Five-letter groups are traditional, dating from when messages used to be transmitted by telegraph:

SIAAZ QLKBA VAZOA RFPBL UAOAR

If the length of the message happens not to be divisible by five, it may be padded at the end with "nulls". These can be any characters that decrypt to obvious nonsense, so the receiver can easily spot them and discard them.
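
Grouping and padding can be sketched as follows. The helper name to_groups and the choice of "X" as the null are assumptions for illustration; in practice any letters that decrypt to obvious nonsense would do.

```python
def to_groups(ciphertext, size=5, null="X"):
    """Strip spaces and punctuation, pad with nulls, and split into groups.

    Returns the grouped text and the group count.
    """
    letters = [ch for ch in ciphertext.upper() if ch.isalpha()]
    while len(letters) % size:
        letters.append(null)  # assumed null character
    groups = ["".join(letters[i:i + size]) for i in range(0, len(letters), size)]
    return " ".join(groups), len(groups)

print(to_groups("SIAA ZQ LKBA. VA ZOA RFPBLUAOAR!"))
```

The second value returned is the group count, which the sender could transmit as the additional check described above.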

The ciphertext alphabet is sometimes different from the plaintext alphabet; for example, in the pigpen cipher, the ciphertext consists of a set of symbols derived from a grid.

Such features make little difference to the security of a scheme, however – at the very least, any set of strange symbols can be transcribed back into an A-Z alphabet and dealt with as normal.

In lists and catalogues for salespeople, a very simple encryption is sometimes used to replace numeric digits with letters.
Plain digits: 1234567890
Ciphertext alphabet: MAKEPROFIT 

Example: MAT would be used to represent 120.
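
As a small sketch of the digit substitution above (the function names encode_price and decode_price are hypothetical):

```python
DIGITS = "1234567890"
LETTERS = "MAKEPROFIT"  # ten distinct letters, one per digit

def encode_price(number_text):
    """Replace each digit with its letter from the key word."""
    return number_text.translate(str.maketrans(DIGITS, LETTERS))

def decode_price(code):
    """Invert the mapping to recover the digits."""
    return code.upper().translate(str.maketrans(LETTERS, DIGITS))

print(encode_price("120"))
```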

Security for simple substitution ciphers

A disadvantage of the keyword method of derangement is that the last letters of the alphabet (which are mostly low frequency) tend to stay at the end. A stronger way of constructing a mixed alphabet is to perform a columnar transposition on the ordinary alphabet using the keyword, but this is not often done.

Although the number of possible keys is very large (26! ≈ 2^88.4, or about 88 bits), this cipher is not very strong, being easily broken. Provided the message is of reasonable length (see below), the cryptanalyst can deduce the probable meaning of the most common symbols by analyzing the frequency distribution of the ciphertext (frequency analysis). This allows formation of partial words, which can be tentatively filled in, progressively expanding the (partial) solution (see frequency analysis for a demonstration of this). In some cases, underlying words can also be determined from the pattern of their letters; for example, attract, osseous, and words with those two as the root are the only common English words with the pattern ABBCADB. Many people solve such ciphers for recreation, as with cryptogram puzzles in the newspaper.
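
Both tools mentioned in this paragraph, letter counts and word patterns, are easy to sketch in Python. The function names are illustrative, and the sketch only tabulates evidence; matching counts against expected English frequencies is left to the analyst.

```python
from collections import Counter

def letter_frequencies(ciphertext):
    """Count ciphertext letters, most common first."""
    letters = [ch for ch in ciphertext.upper() if ch.isalpha()]
    return Counter(letters).most_common()

def word_pattern(word):
    """Label each distinct letter A, B, C, ... in order of first appearance."""
    mapping = {}
    pattern = []
    for ch in word.upper():
        if ch not in mapping:
            mapping[ch] = chr(ord("A") + len(mapping))
        pattern.append(mapping[ch])
    return "".join(pattern)

print(letter_frequencies("SIAA ZQ LKBA. VA ZOA RFPBLUAOAR!")[:3])
print(word_pattern("attract"), word_pattern("osseous"))
```

In the "zebras" example the most frequent ciphertext letter is A, which stands for plaintext E, the kind of clue frequency analysis exploits.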

According to the unicity distance of English, 27.6 letters of ciphertext are required to crack a mixed alphabet simple substitution. In practice, typically about 50 letters are needed, although some messages can be broken with fewer if unusual patterns are found. In other cases, the plaintext can be contrived to have a nearly flat frequency distribution, and much longer plaintexts will then be required by the cryptanalyst.
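
The 27.6 figure is consistent with the usual unicity-distance formula, key entropy divided by the redundancy of the language. The redundancy value of about 3.2 bits per letter for English is an assumed textbook figure, not one stated in this article:

```latex
U = \frac{H(K)}{D} = \frac{\log_2 26!}{D}
  \approx \frac{88.4\ \text{bits}}{3.2\ \text{bits/letter}}
  \approx 27.6\ \text{letters}
```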

Homophonic substitution

An early attempt to increase the difficulty of frequency analysis attacks on substitution ciphers was to disguise plaintext letter frequencies by homophony. In these ciphers, plaintext letters map to more than one ciphertext symbol. Usually, the highest-frequency plaintext symbols are given more equivalents than lower frequency letters. In this way, the frequency distribution is flattened, making analysis more difficult.

Since more than 26 characters will be required in the ciphertext alphabet, various solutions are employed to invent larger alphabets. Perhaps the simplest is to use a numeric substitution 'alphabet'. Another method consists of simple variations on the existing alphabet; uppercase, lowercase, upside down, etc. More artistically, though not necessarily more securely, some homophonic ciphers employed wholly invented alphabets of fanciful symbols. (See Poe's "The Gold-Bug" for a literary example; cf. the Voynich manuscript.)
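
A minimal sketch of a numeric homophonic cipher follows. The homophone counts are invented for illustration (they are not taken from any historical table); they only follow the principle that commoner letters receive more equivalents.

```python
import random

# Assumed, illustrative counts: common letters get more homophones.
HOMOPHONE_COUNTS = {"E": 4, "T": 3, "A": 3, "O": 3,
                    "I": 2, "N": 2, "S": 2, "H": 2, "R": 2}

def build_table(counts=HOMOPHONE_COUNTS):
    """Assign consecutive numbers to letters; unlisted letters get one symbol."""
    table, next_symbol = {}, 0
    for letter in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
        n = counts.get(letter, 1)
        table[letter] = list(range(next_symbol, next_symbol + n))
        next_symbol += n
    return table

def encipher(plaintext, table, rng=random):
    """Pick one of the letter's homophones at random for each letter."""
    return [rng.choice(table[ch]) for ch in plaintext.upper() if ch in table]

def decipher(symbols, table):
    """Every symbol belongs to exactly one letter, so decryption is a lookup."""
    inverse = {s: letter for letter, syms in table.items() for s in syms}
    return "".join(inverse[s] for s in symbols)

table = build_table()
print(encipher("flee at once", table))
```

Because the choice among homophones is random, the same plaintext can produce different ciphertexts, while decipherment remains unambiguous.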

An interesting variant is the nomenclator. Named after the public official who announced the titles of visiting dignitaries, this cipher combined a small codebook with large homophonic substitution tables. Originally the code was restricted to the names of important people, hence the name of the cipher; in later years it covered many common words and place names as well. The symbols for whole words (codewords in modern parlance) and letters (cipher in modern parlance) were not distinguished in the ciphertext. The Rossignols' Great Cipher used by Louis XIV of France was one; after it went out of use, messages in French archives were unbroken for several hundred years.

Nomenclators were the standard fare of diplomatic correspondence, espionage, and advanced political conspiracy from the early fifteenth century to the late eighteenth century; most conspirators were and have remained less cryptographically sophisticated. Although government intelligence cryptanalysts were systematically breaking nomenclators by the mid-sixteenth century, and superior systems had been available since 1467, the usual response to cryptanalysis was simply to make the tables larger. By the late eighteenth century, when the system was beginning to die out, some nomenclators had 50,000 symbols.

Nevertheless, not all nomenclators were broken; today, cryptanalysis of archived ciphertexts remains a fruitful area of historical research.

The Beale Ciphers are another example of a homophonic cipher. They tell a story of buried treasure, described in the 1819-21 period by a ciphered text keyed to the Declaration of Independence. Each ciphertext character is a number. To encipher a plaintext letter, one finds a word in the Declaration of Independence that starts with that letter and uses the numerical position of that word as the encrypted form of the letter. Since many words in the Declaration of Independence start with the same letter, a given letter can be enciphered as any of the numbers belonging to words that start with it. Deciphering a ciphertext number X is as simple as looking up the Xth word of the Declaration of Independence and taking its first letter.
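
The word-position scheme described above can be sketched as follows. The key text here is a short made-up sentence standing in for the Declaration of Independence, and the function names are illustrative.

```python
import random

# Hypothetical stand-in for the key document.
KEY_TEXT = "a big cat danced elegantly among brown daisies"

def index_words(key_text):
    """Map each initial letter to the 1-based positions of words starting with it."""
    positions = {}
    for number, word in enumerate(key_text.split(), start=1):
        first = word[0].upper()
        if first.isalpha():
            positions.setdefault(first, []).append(number)
    return positions

def encipher(plaintext, key_text, rng=random):
    """Replace each letter with the position of some word that starts with it.

    Raises KeyError if no word in the key text starts with a needed letter.
    """
    positions = index_words(key_text)
    return [rng.choice(positions[ch]) for ch in plaintext.upper() if ch.isalpha()]

def decipher(numbers, key_text):
    """Look up the Xth word and take its first letter."""
    words = key_text.split()
    return "".join(words[n - 1][0].upper() for n in numbers)

print(decipher([3, 1, 2], KEY_TEXT), decipher([3, 6, 7], KEY_TEXT))
```

Both number sequences in the usage line decipher to the same word, which illustrates the homophony: A can be written as 1 or 6, and B as 2 or 7.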

Another homophonic cipher was described by Stahl and was one of the first attempts to secure computer data systems through encryption. In Stahl's method, plaintext and ciphertext were stored as binary strings of digits, and he constructed the cipher so that the number of homophones for a given character was in proportion to the frequency of the character, thus making frequency analysis much more difficult.

The book cipher and straddling checkerboard are types of homophonic cipher.

Polyalphabetic substitution

Polyalphabetic substitution ciphers were first described in 1467 by Leone Battista Alberti in the form of disks. Johannes Trithemius, in his book Steganographia (Ancient Greek for "hidden writing"), introduced the now more standard form of a tableau (see below; ca. 1500 but not published until much later). A more sophisticated version using mixed alphabets was described in 1563 by Giovanni Battista della Porta in his book, De Furtivis Literarum Notis (Latin for "On concealed characters in writing").

In a polyalphabetic cipher, multiple cipher alphabets are used. To facilitate encryption, all the alphabets are usually written out in a large table, traditionally called a tableau. The tableau is usually 26×26, so that 26 full ciphertext alphabets are available. The method of filling the tableau, and of choosing which alphabet to use next, defines the particular polyalphabetic cipher. All such ciphers are easier to break than once believed, as substitution alphabets are repeated for sufficiently large plaintexts.

One of the most popular was that of Blaise de Vigenère. First published in 1585, it was considered unbreakable until 1863, and indeed was commonly called le chiffre indéchiffrable (French for "indecipherable cipher").

In the Vigenère cipher, the first row of the tableau is filled out with a copy of the plaintext alphabet, and successive rows are simply shifted one place to the left. (Such a simple tableau is called a tabula recta, and mathematically corresponds to adding the plaintext and key letters, modulo 26.) A keyword is then used to choose which ciphertext alphabet to use. Each letter of the keyword is used in turn, and then they are repeated again from the beginning. So if the keyword is 'CAT', the first letter of plaintext is enciphered under alphabet 'C', the second under 'A', the third under 'T', the fourth under 'C' again, and so on. In practice, Vigenère keys were often phrases several words long.
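
The modulo-26 addition can be sketched directly, without building the tableau. The function name vigenere and its decrypt flag are illustrative; the sketch assumes A = 0 through Z = 25 and passes non-letters through without consuming a key letter.

```python
A = ord("A")

def vigenere(text, keyword, decrypt=False):
    """Add (or subtract) the repeating keyword to the text, letter by letter, mod 26."""
    key = [ord(k) - A for k in keyword.upper() if "A" <= k <= "Z"]
    out, i = [], 0
    for ch in text.upper():
        if "A" <= ch <= "Z":
            shift = key[i % len(key)]  # the keyword repeats from the beginning
            if decrypt:
                shift = -shift
            out.append(chr((ord(ch) - A + shift) % 26 + A))
            i += 1
        else:
            out.append(ch)  # spaces and punctuation are left alone
    return "".join(out)

print(vigenere("AAAA", "CAT"))
print(vigenere("ATTACKATDAWN", "LEMON"))
```

Enciphering a run of A's exposes the key directly (the first line of output is the keyword cycling), since adding A = 0 leaves the key letter unchanged.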

In 1863, Friedrich Kasiski published a method (probably discovered secretly and independently before the Crimean War by Charles Babbage) which enabled the calculation of the length of the keyword in a Vigenère ciphered message. Once this was done, ciphertext letters that had been enciphered under the same alphabet could be picked out and attacked separately as a number of semi-independent simple substitutions, complicated by the fact that within one alphabet letters were separated and did not form complete words, but simplified by the fact that usually a tabula recta had been employed.
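
The article does not spell out how the keyword length is calculated. The sketch below assumes the usual textbook account of Kasiski's method: repeated ciphertext fragments tend to lie a multiple of the keyword length apart, so common factors of those distances suggest the length. Taking a plain greatest common divisor, as done here, is a simplification that a single chance repeat can spoil.

```python
from functools import reduce
from math import gcd

def repeat_distances(ciphertext, length=3):
    """Distances between successive occurrences of each repeated fragment."""
    text = "".join(ch for ch in ciphertext.upper() if ch.isalpha())
    last_seen, distances = {}, []
    for i in range(len(text) - length + 1):
        chunk = text[i:i + length]
        if chunk in last_seen:
            distances.append(i - last_seen[chunk])
        last_seen[chunk] = i
    return distances

def likely_key_length(ciphertext, length=3):
    """Crude estimate: the GCD of all repeat distances, or None if no repeats."""
    distances = repeat_distances(ciphertext, length)
    return reduce(gcd, distances) if distances else None

# Toy string (not a real ciphertext) with "ABC" recurring every 6 letters.
print(repeat_distances("ABCXYZABCQQQABC"), likely_key_length("ABCXYZABCQQQABC"))
```

With a candidate length n in hand, every nth letter can be collected into its own group and attacked as a simple substitution, as the paragraph above describes.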

As such, even today a Vigenère type cipher should theoretically be difficult to break if mixed alphabets are used in the tableau, if the keyword is random, and if the total length of ciphertext is less than 27.6 times the length of the keyword. These requirements are rarely understood in practice, and so the security of Vigenère-enciphered messages is usually less than it might have been.

Other notable polyalphabetics include:
  • The Gronsfeld cipher. This is identical to the Vigenère except that only 10 alphabets are used, and so the "keyword" is numerical.
  • The Beaufort cipher. This is practically the same as the Vigenère, except the tabula recta is replaced by a backwards one, mathematically equivalent to ciphertext = key - plaintext. This operation is self-inverse, whereby the same table is used for both encryption and decryption.
  • The autokey cipher, which mixes plaintext with a key to avoid periodicity.
  • The running key cipher, where the key is made very long by using a passage from a book or similar text.
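
Three of the variants in the list above can be sketched with the same modulo-26 arithmetic. The function names are illustrative. The autokey shown is one common form (keyword followed by the plaintext itself as the key stream), which is an assumption, since the article does not specify the variant.

```python
A = ord("A")

def _nums(text):
    """Letters A..Z as numbers 0..25; everything else is dropped."""
    return [ord(ch) - A for ch in text.upper() if "A" <= ch <= "Z"]

def _text(nums):
    return "".join(chr(n % 26 + A) for n in nums)

def beaufort(text, keyword):
    """ciphertext = key - plaintext (mod 26); applying it twice restores the text."""
    key = _nums(keyword)
    return _text(key[i % len(key)] - p for i, p in enumerate(_nums(text)))

def gronsfeld(text, digits):
    """Vigenere restricted to shifts 0..9, so the key is a string of digits."""
    shifts = [int(d) for d in digits]
    return _text(p + shifts[i % len(shifts)] for i, p in enumerate(_nums(text)))

def autokey_encipher(text, keyword):
    """Key stream = keyword followed by the plaintext (assumed variant)."""
    plain = _nums(text)
    stream = (_nums(keyword) + plain)[:len(plain)]
    return _text(p + k for p, k in zip(plain, stream))

def autokey_decipher(text, keyword):
    """Each recovered plaintext letter extends the key stream."""
    stream, plain = _nums(keyword), []
    for i, c in enumerate(_nums(text)):
        p = (c - stream[i]) % 26
        plain.append(p)
        stream.append(p)
    return _text(plain)

print(beaufort(beaufort("DEFENDTHEEASTWALL", "FORTIFY"), "FORTIFY"))
```

The usage line applies the Beaufort function twice with the same key and gets the plaintext back, which is the self-inverse property noted in the list.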


Modern stream ciphers can also be seen, from a sufficiently abstract perspective, to be a form of polyalphabetic cipher in which all the effort has gone into making the keystream as long and unpredictable as possible.

Polygraphic substitution

In a polygraphic substitution cipher, plaintext letters are substituted in larger groups, instead of substituting letters individually. The first advantage is that the frequency distribution is much flatter than that of individual letters (though not actually flat in real languages; for example, 'TH' is much more common than 'XQ' in English). Second, the larger number of symbols requires correspondingly more ciphertext to productively analyze letter frequencies.

To substitute pairs of letters would take a substitution alphabet 676 symbols long (26² = 676). In the same De Furtivis Literarum Notis mentioned above, della Porta actually proposed such a system, with a 20 x 20 tableau (for the 20 letters of the Italian/Latin alphabet he was using) filled with 400 unique glyphs. However, the system was impractical and probably never actually used.

The earliest practical digraphic cipher (pairwise substitution) was the so-called Playfair cipher, invented by Sir Charles Wheatstone in 1854. In this cipher, a 5 x 5 grid is filled with the letters of a mixed alphabet (two letters, usually I and J, are combined). A digraphic substitution is then simulated by taking pairs of letters as two corners of a rectangle, and using the other two corners as the ciphertext (see the Playfair cipher main article for a diagram). Special rules handle double letters and pairs falling in the same row or column. Playfair was in military use from the Boer War through World War II.
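
A sketch of Playfair encipherment follows. The article only says that special rules exist for double letters and for pairs in the same row or column; the specific rules used here (an X filler to split doubles and pad an odd tail, the letter to the right for same-row pairs, the letter below for same-column pairs) are the commonly described conventions and should be read as assumptions, as should the function names.

```python
def build_square(keyword):
    """5 x 5 square as a 25-letter string, keyword first; J is folded into I."""
    seen = []
    for ch in keyword.upper() + "ABCDEFGHIKLMNOPQRSTUVWXYZ":
        ch = "I" if ch == "J" else ch
        if "A" <= ch <= "Z" and ch not in seen:
            seen.append(ch)
    return "".join(seen)

def digraphs(plaintext, filler="X"):
    """Split into letter pairs, breaking doubles and padding with the filler."""
    letters = ["I" if c == "J" else c for c in plaintext.upper() if "A" <= c <= "Z"]
    pairs, i = [], 0
    while i < len(letters):
        a = letters[i]
        b = letters[i + 1] if i + 1 < len(letters) else filler
        if a == b:
            b = filler   # split a double letter; its twin starts the next pair
            i += 1
        else:
            i += 2
        pairs.append((a, b))
    return pairs

def encipher(plaintext, keyword):
    square = build_square(keyword)
    out = []
    for a, b in digraphs(plaintext):
        ra, ca = divmod(square.index(a), 5)
        rb, cb = divmod(square.index(b), 5)
        if ra == rb:      # same row: take the letter to the right, wrapping
            out += [square[ra * 5 + (ca + 1) % 5], square[rb * 5 + (cb + 1) % 5]]
        elif ca == cb:    # same column: take the letter below, wrapping
            out += [square[((ra + 1) % 5) * 5 + ca], square[((rb + 1) % 5) * 5 + cb]]
        else:             # rectangle: keep the row, take the other letter's column
            out += [square[ra * 5 + cb], square[rb * 5 + ca]]
    return "".join(out)

print(encipher("Hide the gold in the tree stump", "playfair example"))
```

Decipherment would reverse the row and column shifts (left and up) while the rectangle rule stays the same; it is omitted here for brevity.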

Several other practical polygraphics were introduced in 1901 by Felix Delastelle, including the bifid and four-square ciphers (both digraphic) and the trifid cipher (probably the first practical trigraphic).

The Hill cipher, invented in 1929 by Lester S. Hill, is a polygraphic substitution which can combine much larger groups of letters simultaneously using linear algebra. Each letter is treated as a digit in base 26: A = 0, B = 1, and so on. (In a variation, 3 extra symbols are added to make the modulus, 29, prime.) A block of n letters is then considered as a vector of n dimensions, and multiplied by an n x n matrix, modulo 26. The components of the matrix are the key, and should be random provided that the matrix is invertible modulo 26 (to ensure decryption is possible). A Hill cipher of dimension 6 was once implemented mechanically.
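The block-times-matrix step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it is limited to n = 2, the key matrix KEY and the function names are chosen for the example, and the text is assumed to contain only the letters A to Z with an even length.

```python
from math import gcd

# Illustrative 2 x 2 key (an assumption for this sketch, not a recommended key).
# Its determinant is 9, and gcd(9, 26) = 1, so it is invertible modulo 26.
KEY = [[3, 3],
       [2, 5]]

def inverse2(m):
    """Return the inverse of a 2 x 2 matrix modulo 26, or raise if none exists."""
    d = (m[0][0] * m[1][1] - m[0][1] * m[1][0]) % 26
    if gcd(d, 26) != 1:
        raise ValueError("matrix is not invertible modulo 26")
    d_inv = pow(d, -1, 26)  # modular inverse of the determinant (Python 3.8+)
    return [[(m[1][1] * d_inv) % 26, (-m[0][1] * d_inv) % 26],
            [(-m[1][0] * d_inv) % 26, (m[0][0] * d_inv) % 26]]

def hill_apply(m, text):
    """Multiply each pair of letters (A = 0, ..., Z = 25) by the matrix m, modulo 26."""
    if len(text) % 2 != 0:
        raise ValueError("text length must be a multiple of the block size (2)")
    nums = [ord(ch) - ord("A") for ch in text]
    out = []
    for i in range(0, len(nums), 2):
        x, y = nums[i], nums[i + 1]
        out.append((m[0][0] * x + m[0][1] * y) % 26)
        out.append((m[1][0] * x + m[1][1] * y) % 26)
    return "".join(chr(n + ord("A")) for n in out)

ciphertext = hill_apply(KEY, "HELP")                  # "HIAT"
plaintext = hill_apply(inverse2(KEY), ciphertext)     # "HELP"
```

Decryption is the same operation with the inverse matrix, which is why the key must be invertible modulo 26.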

The Hill cipher is vulnerable to a known-plaintext attack because it is completely linear, so it must be combined with some non-linear step to defeat this attack. The combination of wider and wider weak, linear diffusive steps like a Hill cipher, with non-linear substitution steps, ultimately leads to a substitution-permutation network (e.g. a Feistel cipher), so it is possible, from this extreme perspective, to consider modern block ciphers as a type of polygraphic substitution.
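To see why linearity makes the Hill cipher fall to known plaintext, consider the 2 x 2 case. If the attacker knows two plaintext blocks and their ciphertext blocks, and writes them as the columns of matrices P and C, then C = K P modulo 26, so K = C P^-1 whenever P is invertible modulo 26. The sketch below assumes exactly that setting; the function names and the sample plaintext/ciphertext pair are illustrative.

```python
from math import gcd

def inverse2(m):
    """Inverse of a 2 x 2 matrix modulo 26 (raises if it does not exist)."""
    d = (m[0][0] * m[1][1] - m[0][1] * m[1][0]) % 26
    if gcd(d, 26) != 1:
        raise ValueError("matrix is not invertible modulo 26")
    d_inv = pow(d, -1, 26)
    return [[(m[1][1] * d_inv) % 26, (-m[0][1] * d_inv) % 26],
            [(-m[1][0] * d_inv) % 26, (m[0][0] * d_inv) % 26]]

def matmul2(a, b):
    """Product of two 2 x 2 matrices modulo 26."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) % 26 for j in range(2)]
            for i in range(2)]

def recover_key(plain, cipher):
    """Recover a 2 x 2 Hill key from two known plaintext/ciphertext blocks."""
    p = [ord(ch) - ord("A") for ch in plain]
    c = [ord(ch) - ord("A") for ch in cipher]
    # Each block becomes one column of the matrix.
    P = [[p[0], p[2]], [p[1], p[3]]]
    C = [[c[0], c[2]], [c[1], c[3]]]
    return matmul2(C, inverse2(P))

# Four letters of known plaintext are enough in this toy case.
key = recover_key("HELP", "HIAT")   # [[3, 3], [2, 5]]
```

If the known blocks happen to give a non-invertible P, the attacker simply needs a different pair of blocks; a non-linear step removes the simple matrix relation the attack relies on.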

Mechanical substitution ciphers

Between about World War I and the widespread availability of computers (for some governments this was approximately the 1950s or 1960s; for other organizations it was a decade or more later; for individuals it was no earlier than 1975), mechanical implementations of polyalphabetic substitution ciphers were widely used. Several inventors had similar ideas at about the same time, and rotor cipher machines were patented four times in 1919. The most important of the resulting machines was the Enigma, especially in the versions used by the German military from approximately 1930. The Allies also developed and used rotor machines (e.g., SIGABA and Typex).

All of these were similar in that the substituted letter was chosen electrically from amongst the huge number of possible combinations resulting from the rotation of several letter disks. Since one or more of the disks rotated mechanically with each plaintext letter enciphered, the number of alphabets used was astronomical. Early versions of these machines were, nevertheless, breakable. William F. Friedman of the US Army's SIS found vulnerabilities in Hebern's rotor machine early on, and GC&CS's Dillwyn Knox solved versions of the Enigma machine (those without the "plugboard") well before WWII began. Traffic protected by essentially all of the German military Enigmas was broken by Allied cryptanalysts, most notably those at Bletchley Park, beginning with the German Army variant used in the early 1930s. This version was broken through inspired mathematical insight by Marian Rejewski in Poland.

No messages protected by the SIGABA and Typex machines were ever, so far as is publicly known, broken.

The one-time pad

One type of substitution cipher, the one-time pad, is quite special. It was invented near the end of WWI by Gilbert Vernam and Joseph Mauborgne in the US. It was mathematically proven unbreakable by Claude Shannon, probably during WWII; his work was first published in the late 1940s. In its most common implementation, the one-time pad can be called a substitution cipher only from an unusual perspective; typically, the plaintext letter is combined (not substituted) in some manner (e.g., XOR) with the key material character at that position.
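The XOR combination mentioned above can be sketched as follows. This is a minimal illustration working on bytes rather than letters; the function name otp_xor and the sample message are assumptions made for the example, and the pad is drawn from Python's secrets module as a stand-in for a truly random key source.

```python
import secrets

def otp_xor(data: bytes, pad: bytes) -> bytes:
    """Combine each byte of data with the pad byte at the same position."""
    if len(pad) != len(data):
        raise ValueError("the pad must be exactly as long as the message")
    return bytes(d ^ k for d, k in zip(data, pad))

message = b"ATTACK AT DAWN"
pad = secrets.token_bytes(len(message))   # key material as long as the plaintext

ciphertext = otp_xor(message, pad)
recovered = otp_xor(ciphertext, pad)      # XOR with the same pad undoes the encryption
```

Because XOR is its own inverse, the same function serves for encryption and decryption.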

The one-time pad is, in most cases, impractical as it requires that the key material be as long as the plaintext, actually random, used once and only once, and kept entirely secret from all except the sender and intended receiver. When these conditions are violated, even marginally, the one-time pad is no longer unbreakable. Soviet one-time pad messages sent from the US for a brief time during WWII used non-random key material. US cryptanalysts, beginning in the late 1940s, were able to break, entirely or partially, a few thousand messages out of several hundred thousand. (See VENONA)
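The "used once and only once" condition can be made concrete with a small sketch, again assuming a byte-wise XOR pad. If the same pad encrypts two messages, XORing the two ciphertexts cancels the pad and leaves the XOR of the two plaintexts, which no longer depends on the key at all. The messages and names below are illustrative only.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings position by position."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))

m1 = b"MEET AT NOON"
m2 = b"SEND MONEY!!"
pad = secrets.token_bytes(len(m1))

# The mistake: the same pad is used for two different messages.
c1 = xor_bytes(m1, pad)
c2 = xor_bytes(m2, pad)

# An eavesdropper holding only c1 and c2 obtains m1 XOR m2 without the pad.
leak = xor_bytes(c1, c2)

# If one message is known or guessed, the other follows immediately.
recovered_m2 = xor_bytes(leak, m1)
```

In practice neither message is known outright, but the combined stream can often be separated using the statistics of natural language.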

In a mechanical implementation, rather like the ROCKEX equipment, the one-time pad was used for messages sent on the Moscow-Washington hot line established after the Cuban missile crisis.

Substitution in modern cryptography

Substitution ciphers as discussed above, especially the older pencil-and-paper hand ciphers, are no longer in serious use. However, the cryptographic concept of substitution carries on even today. From a sufficiently abstract perspective, modern bit-oriented block ciphers (e.g., DES or AES) can be viewed as substitution ciphers on an enormously large binary alphabet. In addition, block ciphers often include smaller substitution tables called S-boxes. See also substitution-permutation network.
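As a rough illustration of an S-box as a small substitution table, the sketch below applies a 4-bit table to each half of a byte. The table is simply a permutation of the values 0 to 15 picked for the example; real ciphers choose their tables according to careful design criteria, and the function names here are assumptions.

```python
# Illustrative 4-bit substitution table: a permutation of 0..15.
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]

# Build the inverse table so that the substitution can be undone.
INV_SBOX = [0] * 16
for index, value in enumerate(SBOX):
    INV_SBOX[value] = index

def sub_byte(b: int, table) -> int:
    """Substitute the high and low 4-bit halves of a byte independently."""
    return (table[b >> 4] << 4) | table[b & 0xF]

def sub_bytes(data: bytes, table) -> bytes:
    """Apply the table to every byte of data."""
    return bytes(sub_byte(b, table) for b in data)

scrambled = sub_bytes(b"substitution", SBOX)
restored = sub_bytes(scrambled, INV_SBOX)
```

On its own this is just a simple substitution cipher on a 16-symbol alphabet and is no more secure than one; block ciphers gain their strength by combining many such tables with key mixing and diffusion over several rounds.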

Substitution ciphers in popular culture

  • Sherlock Holmes breaks a substitution cipher in "The Adventure of the Dancing Men".
  • The Al Bhed language in Final Fantasy X is actually a substitution cipher, although it is pronounced phonetically (e.g. "you" in English is translated to "oui" in Al Bhed, but is pronounced the same way that "oui" is pronounced in French).
  • The Minbari's alphabet from the Babylon 5 series is a substitution cipher from English.
  • The language in Starfox Adventures: Dinosaur Planet spoken by native Saurians and Krystal is also a substitution cipher of the English alphabet.
  • The television program Futurama contained a substitution cipher, called "Alien Language", in which all 26 letters were replaced by symbols. Die-hard viewers deciphered it rather quickly from a "Slurm" ad that showed the word "Drink" in both plain English and the alien language, thus giving the key. Later, the producers created a second alien language that combined substitution with arithmetic: once a symbol has been deciphered to an English letter, the numerical value of that letter (1 through 26) is added to the value of the previous letter to give the actual intended letter. These messages can be seen throughout the series and the subsequent movies.

See also

  • Ban (information) with Centiban Table
  • Copiale cipher
  • Dvorak encoding
  • Leet
  • Vigenère cipher
  • Topics in cryptography

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 