Collision attack
Encyclopedia
In cryptography, a collision attack on a cryptographic hash tries to find two arbitrary inputs that will produce the same hash value, i.e. a hash collision
Hash collision
Not to be confused with wireless packet collision.In computer science, a collision or clash is a situation that occurs when two distinct pieces of data have the same hash value, checksum, fingerprint, or cryptographic digest....

. In contrast to a preimage attack
Preimage attack
In cryptography, the preimage attack is a classification of attacks on hash functions for finding a message that has a specific hash value.There are two types of preimage attacks:...

, neither the hash value nor one of the inputs is specified.

There are roughly two types of collision attacks:
  • Collision attack: find two arbitrary different messages m1 and m2 such that hash(m1) = hash(m2).
  • Prefix collision attack: given two different prefixes p1, p2 find two appendages m1 and m2 such that hash(p1 ∥ m1) = hash(p2 ∥ m2) (where is the concatenation
    Concatenation
    In computer programming, string concatenation is the operation of joining two character strings end-to-end. For example, the strings "snow" and "ball" may be concatenated to give "snowball"...

     operation).

Classical collision attack

Mathematically stated, a collision attack finds two different messages m1 and m2, such that hash(m1) = hash(m2). In a classical collision attack, the attacker has no control over the content of either message, but they are arbitrarily chosen by the algorithm.

Much like symmetric-key ciphers are vulnerable to brute force attack
Brute force attack
In cryptography, a brute-force attack, or exhaustive key search, is a strategy that can, in theory, be used against any encrypted data. Such an attack might be utilized when it is not possible to take advantage of other weaknesses in an encryption system that would make the task easier...

s, every cryptographic hash function
Cryptographic hash function
A cryptographic hash function is a deterministic procedure that takes an arbitrary block of data and returns a fixed-size bit string, the hash value, such that an accidental or intentional change to the data will change the hash value...

 is inherently vulnerable to collisions using a birthday attack
Birthday attack
A birthday attack is a type of cryptographic attack that exploits the mathematics behind the birthday problem in probability theory. This attack can be used to abuse communication between two or more parties...

. Due to the birthday problem, these attacks are much faster than a brute force would be. A hash of n bits can be broken in 2n / 2 time (evaluations of the hash function).

More efficient attacks are possible by employing cryptanalysis
Cryptanalysis
Cryptanalysis is the study of methods for obtaining the meaning of encrypted information, without access to the secret information that is normally required to do so. Typically, this involves knowing how the system works and finding a secret key...

 to specific hash functions. When a collision attack is discovered and is found to be faster than a birthday attack, a hash function is often denounced as "broken". The NIST hash function competition
NIST hash function competition
The NIST hash function competition is an open competition held by the US National Institute of Standards and Technology for a new SHA-3 function to replace the older SHA-1 and SHA-2, which was formally announced in the Federal Register on November 2, 2007...

 was largely induced by published collision attacks against two very commonly used hash functions, MD5
MD5
The MD5 Message-Digest Algorithm is a widely used cryptographic hash function that produces a 128-bit hash value. Specified in RFC 1321, MD5 has been employed in a wide variety of security applications, and is also commonly used to check data integrity...

 and SHA-1. The collision attacks against MD5 have improved so much that it takes just a few seconds on a regular computer.

Hash collisions created this way are usually constant length and largely unstructured, so cannot directly be applied to attack widespread document formats or protocols. However, workarounds are possible by abusing dynamic constructs present in many formats. Such a malicious document would contain two different messages in the same document, but conditionally displays one or the other, depending on which of two collided hash values is present:
  • Computer program
    Computer program
    A computer program is a sequence of instructions written to perform a specified task with a computer. A computer requires programs to function, typically executing the program's instructions in a central processor. The program has an executable form that the computer can use directly to execute...

    s have conditional constructs (if-then-else) that allow testing whether a location in the file has one value or another.
  • Some document formats like PostScript
    PostScript
    PostScript is a dynamically typed concatenative programming language created by John Warnock and Charles Geschke in 1982. It is best known for its use as a page description language in the electronic and desktop publishing areas. Adobe PostScript 3 is also the worldwide printing and imaging...

    , or macros in Microsoft Word
    Microsoft Word
    Microsoft Word is a word processor designed by Microsoft. It was first released in 1983 under the name Multi-Tool Word for Xenix systems. Subsequent versions were later written for several other platforms including IBM PCs running DOS , the Apple Macintosh , the AT&T Unix PC , Atari ST , SCO UNIX,...

    , also have conditional constructs.
  • File formats that can include images, including TIFF and PDF, are vulnerable to collision attacks by using colliding hash values as indexed color
    Indexed color
    In computing, indexed color is a technique to manage digital images' colors in a limited fashion, in order to save computer memory and file storage, while speeding up display refresh and file transfers...

    s, such that text of one message is displayed with a bright color that blends into the background, and text of the other message is displayed with a dark color.

Chosen prefix collision attack

An extension of the collision attack is the chosen prefix collision attack, which is specific to Merkle–Damgård hash functions. In this case, the attacker can choose two arbitrarily different documents, and then append different calculated values that result in the whole documents having an equal hash value. This attack is much more powerful than a classical collision attack.

Mathematically stated, given two different prefixes p1, p2, the attack finds two appendages m1 and m2 such that hash(p1 ∥ m1) = hash(p2 ∥ m2) (where is the concatenation
Concatenation
In computer programming, string concatenation is the operation of joining two character strings end-to-end. For example, the strings "snow" and "ball" may be concatenated to give "snowball"...

 operation).

In 2007, a chosen prefix collision attack was found against MD5, requiring roughly 250 evaluations of the MD5 function. The paper also demonstrates two X.509
X.509
In cryptography, X.509 is an ITU-T standard for a public key infrastructure and Privilege Management Infrastructure . X.509 specifies, amongst other things, standard formats for public key certificates, certificate revocation lists, attribute certificates, and a certification path validation...

 certificates for different domain names, with colliding hash values. This means that a certificate authority
Certificate authority
In cryptography, a certificate authority, or certification authority, is an entity that issues digital certificates. The digital certificate certifies the ownership of a public key by the named subject of the certificate...

 could be asked to sign a certificate for one domain, and then that certificate could be used to impersonate another domain.

Perhaps the best known real-world collision attack was published in December 2008 when a group of security researchers published a forged X.509
X.509
In cryptography, X.509 is an ITU-T standard for a public key infrastructure and Privilege Management Infrastructure . X.509 specifies, amongst other things, standard formats for public key certificates, certificate revocation lists, attribute certificates, and a certification path validation...

 signing certificate that could be used to launch a rogue certificate authority
Certificate authority
In cryptography, a certificate authority, or certification authority, is an entity that issues digital certificates. The digital certificate certifies the ownership of a public key by the named subject of the certificate...

, taking advantage of a prefix collision attack against the MD5 hash function. This meant that an attacker could impersonate any SSL-secured website as a man-in-the-middle, subverting certificate validation in web browsers. What's worse, the rogue certificate would not be revokable by real authorities, and could also have an arbitrary forged expiry time. Even though MD5 was known to be very weak in 2004, certificate authorities were still willing to sign MD5-verified certificates in December 2008.

Attack scenarios

Many applications of crytographic hash functions do not rely on collision resistance
Collision resistance
Collision resistance is a property of cryptographic hash functions: a hash function is collision resistant if it is hard to find two inputs that hash to the same output; that is, two inputs a and b such that H = H, and a ≠ b.Every hash function with more inputs than outputs will necessarily have...

, thus collision attacks do not affect their security. For example, password hashing
Password cracking
Password cracking is the process of recovering passwords from data that has been stored in or transmitted by a computer system. A common approach is to repeatedly try guesses for the password...

 and HMAC
HMAC
In cryptography, HMAC is a specific construction for calculating a message authentication code involving a cryptographic hash function in combination with a secret key. As with any MAC, it may be used to simultaneously verify both the data integrity and the authenticity of a message...

s are not vulnerable. For the attack to be useful, the attacker must be in control of the input to the hash function.

Digital signatures

Because digital signature
Digital signature
A digital signature or digital signature scheme is a mathematical scheme for demonstrating the authenticity of a digital message or document. A valid digital signature gives a recipient reason to believe that the message was created by a known sender, and that it was not altered in transit...

 algorithms cannot sign a large amount of data efficiently, most implementations use a hash function to reduce ("compress") the amount of data that needs to be signed down to a constant size. Digital signature schemes are often vulnerable to hash collisions, unless using techniques like randomized hashing.

Note that all public key certificate
Public key certificate
In cryptography, a public key certificate is an electronic document which uses a digital signature to bind a public key with an identity — information such as the name of a person or an organization, their address, and so forth...

s, like SSL
Transport Layer Security
Transport Layer Security and its predecessor, Secure Sockets Layer , are cryptographic protocols that provide communication security over the Internet...

certificates, also rely on the security of digital signatures and are compromised by hash collisions.

The usual attack scenario goes like this:
  1. Mallory creates two different documents A and B, that have an identical hash value (collision).
  2. Mallory then sends document A to Alice, who agrees to what the document says, signs it and sends it back to Mallory.
  3. Mallory copies the signature sent by Alice from document A to document B.
  4. Then she sends document B to Bob, claiming that Alice signed the different document. Because the digital signature matches the document hash, Bob's software is unable to detect the modification.

External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK