Comparison of data structures
Encyclopedia
In computer science
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

, a search data structure is any data structure
Data structure
In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks...

 that allows the efficient retrieval of specific items from a set of items, such as a specific record
Record (computer science)
In computer science, a record is an instance of a product of primitive data types called a tuple. In C it is the compound data in a struct. Records are among the simplest data structures. A record is a value that contains other values, typically in fixed number and sequence and typically indexed...

 from a database
Database
A database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality , in a way that supports processes requiring this information...

.

The simplest, most general, and least efficient search structure is merely an unordered sequential list of all the items. Locating the desired item in such a list, by the linear search
Linear search
In computer science, linear search or sequential search is a method for finding a particular value in a list, that consists of checking every one of its elements, one at a time and in sequence, until the desired one is found....

 method, inevitably requires a number of operations proportional to the number n of items, in the worst case
Worst case complexity
In computer science, the worst-case complexity measures the resources an algorithm requires in the worst-case...

 as well as in the average case. Useful search data structures allow faster retrieval; however, they are limited to queries of some specific kind. Moreover, since the cost of building such structures is at least proportional to n, they only pay off if several queries are to be performed on the same database (or on a database that changes little between queries).

Static search structures are designed for answering many queries
Information retrieval
Information retrieval is the area of study concerned with searching for documents, for information within documents, and for metadata about documents, as well as that of searching structured storage, relational databases, and the World Wide Web...

 on a fixed database; dynamic structures also allow insertion, deletion, or modification of items between successive queries. In the dynamic case, one must also consider the cost of fixing the search structure to account for the changes in the database.

Classification

The simplest kind of query is to locate a record that has a specific field (the key) equal to a specified value v. Other common kinds of query are "find the item with smallest (or largest) key value", "find the item with largest key value not exceeding v", "find all items with key values between specified bounds vmin and vmax".

In certain databases the key values may be points in some multi-dimensional space. For example, the key may be a geographic position (latitude
Latitude
In geography, the latitude of a location on the Earth is the angular distance of that location south or north of the Equator. The latitude is an angle, and is usually measured in degrees . The equator has a latitude of 0°, the North pole has a latitude of 90° north , and the South pole has a...

 and longitude
Longitude
Longitude is a geographic coordinate that specifies the east-west position of a point on the Earth's surface. It is an angular measurement, usually expressed in degrees, minutes and seconds, and denoted by the Greek letter lambda ....

) on the Earth
Earth
Earth is the third planet from the Sun, and the densest and fifth-largest of the eight planets in the Solar System. It is also the largest of the Solar System's four terrestrial planets...

. In that case, common kinds of queries are find the record with a key closest to a given point v", or "find all items whose key lies at a given distance from v", or "find all items within a specified region R of the space".

A common special case of the latter are simultaneous range queries on two or more simple keys, such as "find all employee records with salary between 50,000 and 100,000 and hired between 1995 and 2007".

Single ordered keys

  • Array if the key values span a moderately compact interval.
  • Priority-sorted list; see linear search
    Linear search
    In computer science, linear search or sequential search is a method for finding a particular value in a list, that consists of checking every one of its elements, one at a time and in sequence, until the desired one is found....

  • Key-sorted array; see binary search
  • Self-balancing binary search tree
    Self-balancing binary search tree
    In computer science, a self-balancing binary search tree is any node based binary search tree that automatically keeps its height small in the face of arbitrary item insertions and deletions....

  • Hash table
    Hash table
    In computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys , to their associated values . Thus, a hash table implements an associative array...


Asymptotic amortized worst-case analysis

In this table, the asymptotic
Asymptotic analysis
In mathematical analysis, asymptotic analysis is a method of describing limiting behavior. The methodology has applications across science. Examples are...

 notation O(f(n)) means "not exceeding some fixed multiple of f(n) in the worst case."
Insert Delete Search Find maximum Space usage
Unsorted array O(1) O(1) O(n) O(n) O(n)
Value-indexed array O(1) O(1) O(1) O(n) O(n)
Sorted array
Sorted array
A sorted array is an array data structure in which each element is sorted in numerical, alphabetical, or some other order, and placed at equally spaced addresses in computer memory. It is typically used in computer science to implement static lookup tables to hold multiple values which has the same...

O(n) O(n) O(log n) O(1) O(n)
Unsorted linked list
Linked list
In computer science, a linked list is a data structure consisting of a group of nodes which together represent a sequence. Under the simplest form, each node is composed of a datum and a reference to the next node in the sequence; more complex variants add additional links...

O(1)* O(1)* O(n) O(n) O(n)
Sorted linked list O(1)* O(1)* O(n) O(1) O(n)
Self-balancing binary tree O(log n) O(log n) O(log n) O(log n) O(n)
Heap
Heap (data structure)
In computer science, a heap is a specialized tree-based data structure that satisfies the heap property: if B is a child node of A, then key ≥ key. This implies that an element with the greatest key is always in the root node, and so such a heap is sometimes called a max-heap...

O(log n) O(log n)** O(n) O(1) O(n)
Hash table
Hash table
In computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys , to their associated values . Thus, a hash table implements an associative array...

O(1) O(1) O(1) O(n) O(n)

 * The cost to add or delete an element into a known location in the list (i.e. if you have an iterator to the location) is O(1). If you don't know the location, then you need to traverse the list to the location of deletion/insertion, which takes O(n) time.
 ** The deletion cost is O(log n) for the minimum or maximum, O(n) for an arbitrary element.

This table is only an approximate summary; for each data structure there are special situations and variants that may lead to different costs. Also two or more data structures can be combined to obtain lower costs.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK