Provably secure cryptographic hash function
Encyclopedia
In cryptography
Cryptography
Cryptography is the practice and study of techniques for secure communication in the presence of third parties...

, cryptographic hash functions can be divided into two main categories. In the first category are those functions whose designs are based on a mathematical problem and thus their security follows from rigorous mathematical proofs, complexity theory
Computational complexity theory
Computational complexity theory is a branch of the theory of computation in theoretical computer science and mathematics that focuses on classifying computational problems according to their inherent difficulty, and relating those classes to each other...

 and formal reduction
Reduction (complexity)
In computability theory and computational complexity theory, a reduction is a transformation of one problem into another problem. Depending on the transformation used this can be used to define complexity classes on a set of problems....

. These functions are called Provably Secure Cryptographic Hash Functions. However this does not mean that such a function could not be broken. To construct them is very difficult and only a few examples were introduced. The practical use is limited.

In the second category are functions that are not based on mathematical problems but on an ad hoc basis, where the bits of the message are mixed to produce the hash. They are then believed to be hard to break, but no such formal proof is given. Almost all widely-spread hash functions fall in this category. Some of these functions are already broken and are no longer in use.

Types of Security of Hash Functions

Generally, the basic security of cryptographic hash functions can be seen from three different angles: pre-image resistance, second pre-image resistance and collision resistance.
  • Pre-image resistance: given a hash h it should be hard to find any message m such that h = hash(m). This concept is related to that of one way function. Functions that lack this property are vulnerable to pre-image attacks.
  • Second pre-image resistance: given an input m1, it should be hard to find another input, m2 (not equal to m1) such that hash(m1) = hash(m2). This property is sometimes referred to as weak collision resistance. Functions that lack this property are vulnerable to second pre-image attacks.
  • Collision resistance: it should be hard to find two different messages m1 and m2 such that hash(m1) = hash(m2). Such a pair is called a (cryptographic) hash collision. This property is sometimes referred to as strong collision resistance. It requires a hash value at least twice as long as what is required for pre-image resistance, otherwise collisions may be found by a birthday attack
    Birthday attack
    A birthday attack is a type of cryptographic attack that exploits the mathematics behind the birthday problem in probability theory. This attack can be used to abuse communication between two or more parties...

    .

The Meaning of "Hard"

The basic question is the meaning of "hard". There are two approaches to answer this question. First is the intuitive/practical approach: "hard means that it is almost certainly beyond the reach of any adversary who must be prevented from breaking the system for as long as the security of the system is deemed important."

The second approach is theoretical and is based on the computational complexity theory
Computational complexity theory
Computational complexity theory is a branch of the theory of computation in theoretical computer science and mathematics that focuses on classifying computational problems according to their inherent difficulty, and relating those classes to each other...

. If problem A is hard, there exists a formal security reduction
Reduction (complexity)
In computability theory and computational complexity theory, a reduction is a transformation of one problem into another problem. Depending on the transformation used this can be used to define complexity classes on a set of problems....

 from a problem which is widely supposed to be unsolvable in polynomial time, such as integer factorization
Integer factorization
In number theory, integer factorization or prime factorization is the decomposition of a composite number into smaller non-trivial divisors, which when multiplied together equal the original integer....

 problem or discrete logarithm
Discrete logarithm
In mathematics, specifically in abstract algebra and its applications, discrete logarithms are group-theoretic analogues of ordinary logarithms. In particular, an ordinary logarithm loga is a solution of the equation ax = b over the real or complex numbers...

 problem.

However, non-existence of a polynomial time algorithm does not automatically ensure that the system is secure. It is equally important to choose the parameters sensibly (e.g. a length of the numbers that the system works with). For instance, factoring 21 is easy although in general integer factorization
Integer factorization
In number theory, integer factorization or prime factorization is the decomposition of a composite number into smaller non-trivial divisors, which when multiplied together equal the original integer....

 is considered a hard problem.

Classical Hash Functions - Practical Approach to Security

Most hash functions are built on an ad hoc basis, where the bits of the message are nicely mixed to produce the hash. Various bitwise operation
Bitwise operation
A bitwise operation operates on one or more bit patterns or binary numerals at the level of their individual bits. This is used directly at the digital hardware level as well as in microcode, machine code and certain kinds of high level languages...

s (e.g. rotations), modular additions
Modular arithmetic
In mathematics, modular arithmetic is a system of arithmetic for integers, where numbers "wrap around" after they reach a certain value—the modulus....

 and compression functions are used in iterative mode to ensure high complexity and pseudo-randomness of the output. In this way, the security is very hard to prove and the proof is usually not done. Only a few years ago, one of the most popular hash functions, SHA-1, was shown to be less secure than its length suggested: collisions could be found in only 263 tests, rather than the brute-force number of 280. The search for a "good" hash function has thus become a hot topic.

In other words, most of the hash functions in use nowadays, are not provably collision-resistant. These hashes are not based on purely mathematical functions. This approach results generally in more effective hashing functions, but with the risk that a weakness of such a function will be eventually used to find collisions. The famous case is MD5
MD5
The MD5 Message-Digest Algorithm is a widely used cryptographic hash function that produces a 128-bit hash value. Specified in RFC 1321, MD5 has been employed in a wide variety of security applications, and is also commonly used to check data integrity...

.

Meaning of "hard to find collision" in these cases means "almost certainly beyond the reach of any adversary who must be prevented from breaking the system for as long as the security of the system is deemed important." The meaning of the term is therefore somewhat dependent on the application, since the effort that a malicious agent may put into the task is usually proportional to his expected gain. However, since the needed effort usually grows very quickly with the digest length, even a thousandfold advantage in processing power can be neutralized by adding a few dozen bits to the latter.

Provably Secure Hash Functions

In this approach, we base the security of hash function on some hard mathematical problem and we prove that finding collisions of the hash functions is as hard as breaking the underlying problem. This gives much stronger security than just relying on complex mixing of bits as in the classical approach.

A cryptographic hash function has provable security against collision attacks if finding collisions is provably polynomial-time reducible
Polynomial-time reduction
In computational complexity theory a polynomial-time reduction is a reduction which is computable by a deterministic Turing machine in polynomial time. If it is a many-one reduction, it is called a polynomial-time many-one reduction, polynomial transformation, or Karp reduction...

 from problem P which is supposed to be unsolvable in polynomial time. The function is then called provably secure, or just provable.

It means that if finding collisions would be feasible in polynomial time by algorithm A, we could find and use polynomial time algorithm R (reduction algorithm) that would use algorithm A to solve problem P, which is widely supposed to be unsolvable in polynomial time. That is a contradiction. This means, that finding collisions cannot be easier than solving P.

Hash functions with the proof of their security are based on mathematical functions.

Hard problems

Examples of problems, that are assumed to be not solvable in polynomial time
  • Discrete Logarithm Problem
    Discrete logarithm
    In mathematics, specifically in abstract algebra and its applications, discrete logarithms are group-theoretic analogues of ordinary logarithms. In particular, an ordinary logarithm loga is a solution of the equation ax = b over the real or complex numbers...

  • Finding Modular Square Roots
  • Integer Factorization Problem
    Integer factorization
    In number theory, integer factorization or prime factorization is the decomposition of a composite number into smaller non-trivial divisors, which when multiplied together equal the original integer....

  • Subset Sum Problem

Downsides of Provable Approach

  • Current collision-resistant hash algorithms that have provable security reductions
    Reduction (complexity)
    In computability theory and computational complexity theory, a reduction is a transformation of one problem into another problem. Depending on the transformation used this can be used to define complexity classes on a set of problems....

     are too inefficient to be used in practice. In comparison to classical hash functions, they tend to be relatively slow and do not always meet all of criteria traditionally expected of cryptographic hashes. Very smooth hash
    Very smooth hash
    In cryptography, Very Smooth Hash is a secure cryptographic hash function invented in 2005 by Scott Contini, Arjen Lenstra and Ron Steinfeld....

     is an example.
  • Constructing a hash function with provable security is much more difficult than using a classical approach where we just hope that the complex mixing of bits in the hashing algorithm is strong enough to prevent adversary from finding collisions.
  • The proof is often a reduction to a problem with asymptotically hard worst-case or average-case complexity
    Average-case complexity
    Average-case complexity is a subfield of computational complexity theory that studies the complexity of algorithms on random inputs.The study of average-case complexity has applications in the theory of cryptography....

    . Worst-case measures the difficulty of solving pathological cases rather than typical cases of the underlying problem. Even a reduction to a problem with hard average complexity offers only limited security as there still can be an algorithm that easily solves the problem for a subset of the problem space. For example, early versions of Fast Syndrome Based Hash
    Fast Syndrome Based Hash
    In cryptography, the Fast Syndrome-based hash Functions are a family of cryptographic hash functions introduced in 2003 by Daniel Augot, Matthieu Finiasz, and Nicolas Sendrier....

     turned out to be insecure. This problem was solved in the latest version.


SWIFFT
SWIFFT
In cryptography, SWIFFT is a collection of provably secure hash functions. It is based on the concept of the Fast Fourier Transform . SWIFFT is not the first hash function based on FFT, but it sets itself apart by providing a mathematical proof of its security. It also uses the LLL basis reduction...

 is an example of a hash function that circumvents these security problems. It can be shown that for any algorithm that can break SWIFFT with probability P within an estimated time T, we can find an algorithm that solves the worst case scenario of a certain difficult mathematical problem within time T' depending on T and P.

Example of (unpractical) Provably Secure Hash Function

Hash(m) = xm mod n where n is hard to factor composite number, and x is some prespecified base value. A collision xm1 congruent to xm2 reveals a multiple m1 - m2 of the order of x. Such information can be used to factor n in polynomial time assuming certain properties of x.

But the algorithm is quite inefficient because it requires on average 1.5 multiplications modulo n per message-bit.

More practical provably secure hash functions

  • VSH - Very Smooth Hash function
    Very smooth hash
    In cryptography, Very Smooth Hash is a secure cryptographic hash function invented in 2005 by Scott Contini, Arjen Lenstra and Ron Steinfeld....

     - a provably secure collision-resistant hash function assuming the hardness of finding nontrivial modular square roots modulo composite number (this is proven to be as hard as factoring
    Integer factorization
    In number theory, integer factorization or prime factorization is the decomposition of a composite number into smaller non-trivial divisors, which when multiplied together equal the original integer....

     ).
  • MuHASH
  • ECOH - Elliptic Curve Only hash function
    Elliptic curve only hash
    The elliptic curve only hash algorithm was submitted as a candidate for SHA-3 in the NIST hash function competition. However, it was rejected in the beginning of the competition since a second pre-image attack was found....

     - based on the concept of Elliptic curves, Subset Sum Problem and summation of polynomials. The security proof of the collision resistance was based on weakened assumptions and eventually a second pre-image attack was found.
  • FSB - Fast Syndrome-Based hash function
    Fast Syndrome Based Hash
    In cryptography, the Fast Syndrome-based hash Functions are a family of cryptographic hash functions introduced in 2003 by Daniel Augot, Matthieu Finiasz, and Nicolas Sendrier....

     - it can be proven that breaking FSB is at least as difficult as solving a certain NP-complete problem known as Regular Syndrome Decoding.
  • SWIFFT
    SWIFFT
    In cryptography, SWIFFT is a collection of provably secure hash functions. It is based on the concept of the Fast Fourier Transform . SWIFFT is not the first hash function based on FFT, but it sets itself apart by providing a mathematical proof of its security. It also uses the LLL basis reduction...

     - SWIFFT is based on the Fast Fourier transform
    Fast Fourier transform
    A fast Fourier transform is an efficient algorithm to compute the discrete Fourier transform and its inverse. "The FFT has been called the most important numerical algorithm of our lifetime ." There are many distinct FFT algorithms involving a wide range of mathematics, from simple...

     and is provably collision resistant, under a relatively mild assumption about the worst-case difficulty of finding short vectors in cyclic/ideal lattices
    Ideal lattice cryptography
    Ideal lattices are a special class of lattices and a generalization of cyclic lattices. Ideal lattices naturally occur in many parts of number theory, but also in other areas. In particular, they have a significant place in cryptography. Micciancio defined a generalization of cyclic lattices as...

    .
  • Chaum, van Heijst, Pfitzmann hash function - A compression function where finding collisions is as hard as solving the discrete logarithm problem
    Discrete logarithm
    In mathematics, specifically in abstract algebra and its applications, discrete logarithms are group-theoretic analogues of ordinary logarithms. In particular, an ordinary logarithm loga is a solution of the equation ax = b over the real or complex numbers...

     in a finite group .
  • Knapsack-based hash functions - A family of hash functions based on the Knapsack problem
    Knapsack problem
    The knapsack problem or rucksack problem is a problem in combinatorial optimization: Given a set of items, each with a weight and a value, determine the count of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as...

    .
  • The Zémor-Tillich hash function - A family of hash functions that rely on the arithmetic of the group of matrices SL2
    Special linear group
    In mathematics, the special linear group of degree n over a field F is the set of n×n matrices with determinant 1, with the group operations of ordinary matrix multiplication and matrix inversion....

    . Finding collisions is at least as difficult as finding factorization of certain elements in this group. This is supposed to be hard, at least PSPACE-complete
    PSPACE-complete
    In complexity theory, a decision problem is PSPACE-complete if it is in the complexity class PSPACE, and every problem in PSPACE can be reduced to it in polynomial time...

    . For this hash, an attack was eventually discovered with a time complexity close to . This beat by far the birthday bound and ideal pre-image complexities which are and for the Zémor-Tillich hash function. As the attacks include a birthday search in a reduced set of size they indeed do not destroy the idea of provable security of invalidate the scheme but rather suggest that the initial parameters were too small.
  • Hash functions from Sigma Protocols - there exists a general way of constructing a provably secure hash, specifically from a any (suitable) sigma protocol. A faster version of VSH
    Very smooth hash
    In cryptography, Very Smooth Hash is a secure cryptographic hash function invented in 2005 by Scott Contini, Arjen Lenstra and Ron Steinfeld....

    (called VSH*) could be obtained in this way.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK