Home      Discussion      Topics      Dictionary      Almanac
Signup       Login
Hash tree

Hash tree

Overview
In cryptography
Cryptography
Cryptography is the practice and study of techniques for secure communication in the presence of third parties...

 and computer science
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

 Hash trees or Merkle trees are a type of data structure
Data structure
In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks...

which contains a tree
Tree (data structure)
In computer science, a tree is a widely-used data structure that emulates a hierarchical tree structure with a set of linked nodes.Mathematically, it is an ordered directed tree, more specifically an arborescence: an acyclic connected graph where each node has zero or more children nodes and at...

 of summary information about a larger piece of data – for instance a file – used to verify its contents. Hash trees are a combination of hash list
Hash list
In computer science, a hash list is typically a list of hashes of the data blocks in a file or set of files. Lists of hashes are used for many different purposes, such as fast table lookup and distributed databases...

s and hash chain
Hash chain
A hash chain is the successive application of a cryptographic hash function to a piece of data. In computer security, a hash chain is a method to produce many one-time keys from a single key or password...

ing, which in turn are extensions of hash
Hash function
A hash function is any algorithm or subroutine that maps large data sets to smaller data sets, called keys. For example, a single integer can serve as an index to an array...

ing. Hash trees in which the underlying hash function is Tiger are often called Tiger trees or Tiger tree hashes.
Discussion
Ask a question about 'Hash tree'
Start a new discussion about 'Hash tree'
Answer questions from other users
Full Discussion Forum
 
Recent Discussions
Encyclopedia
In cryptography
Cryptography
Cryptography is the practice and study of techniques for secure communication in the presence of third parties...

 and computer science
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

 Hash trees or Merkle trees are a type of data structure
Data structure
In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks...

which contains a tree
Tree (data structure)
In computer science, a tree is a widely-used data structure that emulates a hierarchical tree structure with a set of linked nodes.Mathematically, it is an ordered directed tree, more specifically an arborescence: an acyclic connected graph where each node has zero or more children nodes and at...

 of summary information about a larger piece of data – for instance a file – used to verify its contents. Hash trees are a combination of hash list
Hash list
In computer science, a hash list is typically a list of hashes of the data blocks in a file or set of files. Lists of hashes are used for many different purposes, such as fast table lookup and distributed databases...

s and hash chain
Hash chain
A hash chain is the successive application of a cryptographic hash function to a piece of data. In computer security, a hash chain is a method to produce many one-time keys from a single key or password...

ing, which in turn are extensions of hash
Hash function
A hash function is any algorithm or subroutine that maps large data sets to smaller data sets, called keys. For example, a single integer can serve as an index to an array...

ing. Hash trees in which the underlying hash function is Tiger are often called Tiger trees or Tiger tree hashes.

Uses


Hash trees can be used to verify any kind of data stored, handled and transferred in and between computers. Currently the main use of hash trees is to make sure that data blocks received from other peers in a peer-to-peer network
Peer-to-peer
Peer-to-peer computing or networking is a distributed application architecture that partitions tasks or workloads among peers. Peers are equally privileged, equipotent participants in the application...

 are received undamaged and unaltered, and even to check that the other peers do not lie and send fake blocks. Suggestions have been made to use hash trees in trusted computing
Trusted Computing
Trusted Computing is a technology developed and promoted by the Trusted Computing Group. The term is taken from the field of trusted systems and has a specialized meaning. With Trusted Computing, the computer will consistently behave in expected ways, and those behaviors will be enforced by...

 systems. Sun Microsystems
Sun Microsystems
Sun Microsystems, Inc. was a company that sold :computers, computer components, :computer software, and :information technology services. Sun was founded on February 24, 1982...

 has used Hash Trees in the ZFS
ZFS
In computing, ZFS is a combined file system and logical volume manager designed by Sun Microsystems. The features of ZFS include data integrity verification against data corruption modes , support for high storage capacities, integration of the concepts of filesystem and volume management,...

 filesystem. Hash Trees are used in Google Wave
Google Wave
Apache Wave is a software framework for real-time collaborative editing online. Google Inc. originally developed it as Google Wave.It was announced at the Google I/O conference on May 27, 2009....

 protocol, Git
Git (software)
Git is a distributed revision control system with an emphasis on speed. Git was initially designed and developed by Linus Torvalds for Linux kernel development. Every Git working directory is a full-fledged repository with complete history and full revision tracking capabilities, not dependent on...

 distributed revision control system, the Tahoe-LAFS backup system, the Bitcoin
Bitcoin
Bitcoin is a decentralized, peer-to-peer network over which users make transactions that are tracked and verified through this network. The word Bitcoin also refers to the digital currency implemented as the currency medium for user transactions over this network...

 peer-to-peer network, and a number of NoSQL
Nosql
In computing, NoSQL is a broad class of database management systems that differ from the classic model of the relational database management system in some significant ways. These data stores may not require fixed table schemas, usually avoid join operations, and typically scale horizontally...

 systems like Apache Cassandra & Riak
Riak
Riak is a NoSQL database implementing the principles from Amazon's Dynamo paper.Riak has a pluggable backend for its core shard-partitioned storage, with the default storage backend being Bitcask as of the 0.12 release...



Hash trees were invented in 1979 by Ralph Merkle
Ralph Merkle
Ralph C. Merkle is a researcher in public key cryptography, and more recently a researcher and speaker on molecular nanotechnology and cryonics...

. The original purpose was to make it possible to efficiently handle many Lamport one-time signatures. Lamport signatures are believed to still be secure in the event that quantum computer
Quantum computer
A quantum computer is a device for computation that makes direct use of quantum mechanical phenomena, such as superposition and entanglement, to perform operations on data. Quantum computers are different from traditional computers based on transistors...

s become reality. Unfortunately each Lamport key can only be used to sign a single message. But combined with hash trees they can be used for many messages and then become a fairly efficient digital signature scheme
Merkle signature scheme
The Merkle signature scheme is a digital signature scheme based on hash trees and one-time signatures such as the Lamport signature scheme. It was developed by Ralph Merkle in the late 70s and is an alternative to traditional digital signatures such as the Digital Signature Algorithm or RSA...

.

How hash trees work


A hash tree is a tree
Binary tree
In computer science, a binary tree is a tree data structure in which each node has at most two child nodes, usually distinguished as "left" and "right". Nodes with children are parent nodes, and child nodes may contain references to their parents. Outside the tree, there is often a reference to...

 of hashes
Hash function
A hash function is any algorithm or subroutine that maps large data sets to smaller data sets, called keys. For example, a single integer can serve as an index to an array...

 in which the leaves are hashes of data blocks in, for instance, a file or set of files. Nodes further up in the tree are the hashes of their respective children. For example, in the picture hash 0 is the result of hashing hash 0-0 and then hash 0-1. That is, hash 0 = hash( hash 0-0 || hash 0-1 ) where || denotes concatenation.

Most hash tree implementations are binary (two child nodes under each node) but they can just as well use many more child nodes under each node.

Usually, a cryptographic hash function
Cryptographic hash function
A cryptographic hash function is a deterministic procedure that takes an arbitrary block of data and returns a fixed-size bit string, the hash value, such that an accidental or intentional change to the data will change the hash value...

 such as SHA-1, Whirlpool, or Tiger
Tiger (hash)
In cryptography, Tiger is a cryptographic hash function designed by Ross Anderson and Eli Biham in 1995 for efficiency on 64-bit platforms. The size of a Tiger hash value is 192 bits. Truncated versions can be used for compatibility with protocols assuming a particular hash size...

 is used for the hashing. If the hash tree only needs to protect against unintentional damage, much less secure checksum
Checksum
A checksum or hash sum is a fixed-size datum computed from an arbitrary block of digital data for the purpose of detecting accidental errors that may have been introduced during its transmission or storage. The integrity of the data can be checked at any later time by recomputing the checksum and...

s such as CRCs
Cyclic redundancy check
A cyclic redundancy check is an error-detecting code commonly used in digital networks and storage devices to detect accidental changes to raw data...

 can be used.

In the top of a hash tree there is a top hash (or root hash or master hash). Before downloading a file on a p2p network, in most cases the top hash is acquired from a trusted source, for instance a friend or a web site that is known to have good recommendations of files to download. When the top hash is available, the hash tree can be received from any non-trusted source, like any peer in the p2p network. Then, the received hash tree is checked against the trusted top hash, and if the hash tree is damaged or fake, another hash tree from another source will be tried until the program finds one that matches the top hash.

The main difference from a hash list
Hash list
In computer science, a hash list is typically a list of hashes of the data blocks in a file or set of files. Lists of hashes are used for many different purposes, such as fast table lookup and distributed databases...

 is that one branch of the hash tree can be downloaded at a time and the integrity of each branch can be checked immediately, even though the whole tree is not available yet. For example, in the picture the integrity of data block 001 can be verified immediately if the tree already contains hash 0-0 and hash 1 by hashing the data block and iteratively combining the result with hash 0-0 and then hash 1 and finally comparing the result with the top hash. Similarly, the integrity of data block 002 can be verified if the tree already has hash 1-1 and hash 0. This can be an advantage since it is efficient to split files up in very small data blocks so that only small blocks have to be redownloaded if they get damaged. If the hashed file is very big, such a hash tree or hash list becomes fairly big. But if it is a tree, one small branch can be downloaded quickly, the integrity of the branch can be checked, and then the downloading of data blocks can start.

There are several additional tricks, benefits and details regarding hash trees. See the references and external links below for more in-depth information.

Tiger tree hash


The Tiger tree hash is a widely used form of hash tree. It uses a binary hash tree (two child nodes under each node), usually has a data block size of 1024-byte
Byte
The byte is a unit of digital information in computing and telecommunications that most commonly consists of eight bits. Historically, a byte was the number of bits used to encode a single character of text in a computer and for this reason it is the basic addressable element in many computer...

s and uses the cryptographically secure Tiger hash
Tiger (hash)
In cryptography, Tiger is a cryptographic hash function designed by Ross Anderson and Eli Biham in 1995 for efficiency on 64-bit platforms. The size of a Tiger hash value is 192 bits. Truncated versions can be used for compatibility with protocols assuming a particular hash size...

.

Tiger tree hashes are used in the Gnutella
Gnutella
Gnutella is a large peer-to-peer network which, at the time of its creation, was the first decentralized peer-to-peer network of its kind, leading to other, later networks adopting the model...

, Gnutella2
Gnutella2
Gnutella2, often referred to as G2, is a peer-to-peer protocol developed mainly by Michael Stokes and released in 2002. While inspired by the gnutella protocol, G2 shares little of its design with the exception of its connection handshake and download mechanics. It adopts an extensible binary...

, and Direct Connect
Direct Connect (file sharing)
Direct connect is a peer-to-peer file sharing protocol. Direct connect clients connect to a central hub and can download files directly from one another. Advanced Direct Connect can be considered a successor protocol....

 P2P
Peer-to-peer
Peer-to-peer computing or networking is a distributed application architecture that partitions tasks or workloads among peers. Peers are equally privileged, equipotent participants in the application...

 file sharing protocols and in file sharing
File sharing
File sharing is the practice of distributing or providing access to digitally stored information, such as computer programs, multimedia , documents, or electronic books. It may be implemented through a variety of ways...

 applications such as
Phex
Phex
- External links :* * * * at SourceForge.net* at *...

, BearShare
BearShare
BearShare is a peer-to-peer file sharing application originally created by Free Peers, Inc. for Microsoft Windows, and now sold by MusicLab, LLC .- History :...

, LimeWire
LimeWire
LimeWire is a free peer-to-peer file sharing client program that runs on Windows, Mac OS X, Linux, and other operating systems supported by the Java software platform. LimeWire uses the gnutella network as well as the BitTorrent protocol. A free software version and a purchasable "enhanced"...

, Shareaza
Shareaza
Shareaza is a peer-to-peer file sharing client running under Microsoft Windows which supports the gnutella, Gnutella2 , eDonkey, BitTorrent, FTP, HTTP and HTTPS network protocols and handles magnet links, ed2k links, and the now deprecated gnutella and Piolet links...

, DC++ and Valknut
Valknut (software)
Valknut is a program that uses the Direct Connect protocol. It is compatible with other DC clients, such as the original DC from Neomodus, DC++ and derivatives. Valknut also interoperates with all common DC hub software....

.

External links

  • http://www.codeproject.com/cs/algorithms/thexcs.asp – Tiger Tree Hash (TTH) source code in C# – by Gil Schmidt
  • http://sourceforge.net/projects/tigertree/ – Tiger Tree Hash (TTH) implementations in C and Java
  • RHash, an open source
    Open source
    The term open source describes practices in production and development that promote access to the end product's source materials. Some consider open source a philosophy, others consider it a pragmatic methodology...

    command-line tool, which can calculate TTH and magnet links with TTH.