In
computer scienceComputer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...
, an
associative array (also called a
map or a
dictionary) is an
abstract data typeIn computing, an abstract data type is a mathematical model for a certain class of data structures that have similar behavior; or for certain data types of one or more programming languages that have similar semantics...
composed of a
collectionIn computer science, a collection is a grouping of some variable number of data items that have some shared significance to the problem being solved and need to be operated upon together in some controlled fashion. Generally, the data items will be of the same type or, in languages supporting...
of (key,value) pairs, such that each possible key appears at most once in the collection.
Operations associated with this data type allow :
- the addition of pairs to the collection,
- the removal of pairs from the collection,
- the modification of the values of existing pairs, and
- the lookup of the value associated with a particular key.
The
dictionary problem is the task of designing a
data structureIn computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks...
that implements an associative array. A standard solution to the dictionary problem is a
hash tableIn computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys , to their associated values . Thus, a hash table implements an associative array...
; in some cases it is also possible to solve the problem using directly addressed arrays,
binary search treeIn computer science, a binary search tree , which may sometimes also be called an ordered or sorted binary tree, is a node-based binary tree data structurewhich has the following properties:...
s, or other more specialized structures.
Many programming languages include associative arrays as primitive data types, and they are available in software libraries for many others.
Content-addressable memoryContent-addressable memory is a special type of computer memory used in certain very high speed searching applications. It is also known as associative memory, associative storage, or associative array, although the last term is more often used for a programming data structure...
is a form of direct hardware-level support for associative arrays.
Associative arrays have many applications including such fundamental programming patterns as
memoizationIn computing, memoization is an optimization technique used primarily to speed up computer programs by having function calls avoid repeating the calculation of results for previously processed inputs...
and the
decorator patternIn object-oriented programming, the decorator pattern is a design pattern that allows behaviour to be added to an existing object dynamically.-Introduction:...
.
Operations
In an associative array, the association between a key and a value is often known as a "binding", and the same word "binding" may also be used to refer to the process of creating a new association.
The operations that are usually defined for an associative array are:
- Add or insert: add a new (key,value) pair to the collection, binding the new key to its new value. The arguments to this operation are the key and the value.
- Reassign: replace the value in one of the (key,value) pairs that are already in the collection, binding an old key to a new value. As with an insertion, the arguments to this operation are the key and the value.
- Remove or delete: remove a (key,value) pair from the collection, unbinding a given key from its value. The argument to this operation is the key.
- Lookup: find the value (if any) that is bound to a given key. The argument to this operation is the key, and the value is returned from the operation. If no value is found, some associative array implementations raise an exception
Exception handling is a programming language construct or computer hardware mechanism designed to handle the occurrence of exceptions, special conditions that change the normal flow of program execution....
.
In addition, associative arrays may also include other operations such as determining the number of bindings or constructing an
iteratorIn computer programming, an iterator is an object that enables a programmer to traverse a container. Various types of iterators are often provided via a container's interface...
to loop over all the bindings. Usually, for such an operation, the order in which the bindings are returned may be arbitrary.
A multimap generalizes an associative array by allowing multiple values to be associated with a single key. A
bidirectional mapIn computer science, a bidirectional map is an associative data structure in which both types can be used as key.-External links:* http://www.boost.org/doc/libs/1_47_0/libs/bimap/doc/html/index.html...
is a related abstract data type in which the bindings operate in both directions: each value must be associated with a unique key, and a second lookup operation takes a value as argument and looks up the key associated with that value.
Example
Suppose that the set of loans made by a library is to be represented in a data structure. Each book in a library may be checked out only by a single library patron at a time. However, a single patron may be able to check out multiple books. Therefore, the information about which books are checked out to which patrons may be represented by an associative array, in which the books are the keys and the patrons are the values. For instance (using the notation from
PythonPython is a general-purpose, high-level programming language whose design philosophy emphasizes code readability. Python claims to "[combine] remarkable power with very clear syntax", and its standard library is large and comprehensive...
in which a binding is represented by placing a colon between the key and the value), the current checkouts may be represented by an associative array
{
"Great Expectations": "John",
"Pride and Prejudice": "Alice",
"Wuthering Heights": "Alice"
}
A lookup operation with the key "Great Expectations" in this array would return the name of the person who checked out that book, John. If John returns his book, that would cause a deletion operation in the associative array, and if Pat checks out another book, that would cause an insertion operation, leading to a different state:
{
"Pride and Prejudice": "Alice",
"The Brothers Karamazov": "Pat",
"Wuthering Heights": "Alice"
}
In this new state, the same lookup as before, with the key "Great Expectations", would raise an exception, because this key is no longer present in the array.
Implementation
For dictionaries with very small numbers of bindings, it may make sense to implement the dictionary using an
association listIn computer programming and particularly in Lisp, an association list, often referred to as an alist, is a linked list in which each list element comprises a key and a value. The association list is said to associate the value with the key...
, a
linked listIn computer science, a linked list is a data structure consisting of a group of nodes which together represent a sequence. Under the simplest form, each node is composed of a datum and a reference to the next node in the sequence; more complex variants add additional links...
of bindings. With this implementation, the time to perform the basic dictionary operations is linear in the total number of bindings; however, it is easy to implement and the constant factors in its running time are small. Another very simple implementation technique, usable when the keys are restricted to a narrow range of integers, is direct addressing into an array: the value for a given key
k is stored at the array cell
A[
k], or if there is no binding for
k then the cell stores a special
sentinel valueIn computer programming, a sentinel value is a special value whose presence guarantees termination of a loop that processes structured data...
that indicates the absence of a binding. As well as being simple, this technique is fast: each dictionary operation takes constant time. However, the space requirement for this structure is the size of the entire keyspace, making it impractical unless the keyspace is small.
The most frequently used general purpose implementation of an associative array is with a
hash tableIn computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys , to their associated values . Thus, a hash table implements an associative array...
: an array of bindings, together with a
hash functionA hash function is any algorithm or subroutine that maps large data sets to smaller data sets, called keys. For example, a single integer can serve as an index to an array...
that maps each possible key into an array index. The basic idea of a hash table is that the binding for a given key is stored at the position given by applying the hash function to that key, and that lookup operations are performed by looking at that cell of the array and using the binding found there. However, hash table based dictionaries must be prepared to handle collisions that occur when two keys are mapped by the hash function to the same index, and many different collision resolution strategies have been developed for dealing with this situation, often based either on
open addressingOpen addressing, or closed hashing, is a method of collision resolution in hash tables. With this method a hash collision is resolved by probing, or searching through alternate locations in the array until either the target record is found, or an unused array slot is found, which indicates that...
(looking at a sequence of hash table indices instead of a single index, until finding either the given key or an empty cell) or on hash chaining (storing a small association list instead of a single binding in each hash table cell).
Dictionaries may also be stored in
binary search treeIn computer science, a binary search tree , which may sometimes also be called an ordered or sorted binary tree, is a node-based binary tree data structurewhich has the following properties:...
s or in data structures specialized to a particular type of keys such as
radix treeIn computer science, a radix tree is a space-optimized trie data structure where each node with only one child is merged with its child. The result is that every internal node has at least two children. Unlike in regular tries, edges can be labeled with sequences of characters as well as single...
s,
trieIn computer science, a trie, or prefix tree, is an ordered tree data structure that is used to store an associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the...
s,
Judy arrayIn computer science and software engineering, a Judy array is a data structure that has high performance, low memory usage and implements an associative array. Unlike normal arrays, Judy arrays may be sparse, that is, they may have large ranges of unassigned indices. They can be used for storing...
s, or
van Emde Boas treeA van Emde Boas tree , also known as a vEB tree, is a tree data structure which implements an associative array with m-bit integer keys. It performs all operations in O time...
s, but these implementation methods are less efficient than hash tables as well as placing greater restrictions on the types of data that they can handle. The advantages of these alternative structures come from their ability to handle operations beyond the basic ones of an associative array, such as finding the binding whose key is the closest to a queried key, when the query is not itself present in the set of bindings.
Language support
Associative arrays can be implemented in any programming language as a package and many language systems provide them as part of their standard library. In some languages, they are not only built into the standard system, but have special syntax, often using array-like subscripting.
Built-in syntactic support for associative arrays was introduced by
SNOBOLSNOBOL is a generic name for the computer programming languages developed between 1962 and 1967 at AT&T Bell Laboratories by David J. Farber, Ralph E. Griswold and Ivan P. Polonsky, culminating in SNOBOL4...
4, under the name "table".
MUMPSMUMPS , or alternatively M, is a programming language created in the late 1960s, originally for use in the healthcare industry. It was designed for the production of multi-user database-driven applications...
made multi-dimensional associative arrays, optionally persistent, its key data structure.
SETLSETL is a very-high level programming language based on the mathematical theory of sets. It was originally developed by Jack Schwartz at the NYU Courant Institute of Mathematical Sciences in the late 1960s....
supported them as one possible implementation of sets and maps. Most modern scripting languages, starting with AWK and including
PerlPerl is a high-level, general-purpose, interpreted, dynamic programming language. Perl was originally developed by Larry Wall in 1987 as a general-purpose Unix scripting language to make report processing easier. Since then, it has undergone many changes and revisions and become widely popular...
,
TclTcl is a scripting language created by John Ousterhout. Originally "born out of frustration", according to the author, with programmers devising their own languages intended to be embedded into applications, Tcl gained acceptance on its own...
,
JavaScriptJavaScript is a prototype-based scripting language that is dynamic, weakly typed and has first-class functions. It is a multi-paradigm language, supporting object-oriented, imperative, and functional programming styles....
,
PythonPython is a general-purpose, high-level programming language whose design philosophy emphasizes code readability. Python claims to "[combine] remarkable power with very clear syntax", and its standard library is large and comprehensive...
,
RubyRuby is a dynamic, reflective, general-purpose object-oriented programming language that combines syntax inspired by Perl with Smalltalk-like features. Ruby originated in Japan during the mid-1990s and was first developed and designed by Yukihiro "Matz" Matsumoto...
, and Lua, support associative arrays as a primary container type. In many more languages, they are available as library functions without special syntax.
In
SmalltalkSmalltalk is an object-oriented, dynamically typed, reflective programming language. Smalltalk was created as the language to underpin the "new world" of computing exemplified by "human–computer symbiosis." It was designed and created in part for educational use, more so for constructionist...
,
Objective-CObjective-C is a reflective, object-oriented programming language that adds Smalltalk-style messaging to the C programming language.Today, it is used primarily on Apple's Mac OS X and iOS: two environments derived from the OpenStep standard, though not compliant with it...
,
.NETThe .NET Framework is a software framework that runs primarily on Microsoft Windows. It includes a large library and supports several programming languages which allows language interoperability...
, Python, and
REALbasicRealbasic is the object-oriented dialect of the BASIC programming language used in Real Studio, a programming environment, developed and commercially marketed by Real Software, Inc of Austin, Texas for Mac OS X, Microsoft Windows, 32-bit x86 Linux and the web.- Language features :RB is a strongly...
they are called
dictionaries; in Perl and Ruby they are called
hashes; in
C++C++ is a statically typed, free-form, multi-paradigm, compiled, general-purpose programming language. It is regarded as an intermediate-level language, as it comprises a combination of both high-level and low-level language features. It was developed by Bjarne Stroustrup starting in 1979 at Bell...
,
JavaJava is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities...
, and
GoGo is a compiled, garbage-collected, concurrent programming language developed by Google Inc.The initial design of Go was started in September 2007 by Robert Griesemer, Rob Pike, and Ken Thompson. Go was officially announced in November 2009. In May 2010, Rob Pike publicly stated that Go was being...
they are called
maps (see map (C++),
unordered map (C++)unordered_map is a class template representing a hash table in the C++ Technical Report 1 and the C++11 standard. It is similar to the hash_map class template of the original STL which was also included in several implementations of the C++ Standard Library unordered_map is a class template...
, and ); in
Common LispCommon Lisp, commonly abbreviated CL, is a dialect of the Lisp programming language, published in ANSI standard document ANSI INCITS 226-1994 , . From the ANSI Common Lisp standard the Common Lisp HyperSpec has been derived for use with web browsers...
and
Windows PowerShellWindows PowerShell is Microsoft's task automation framework, consisting of a command-line shell and associated scripting language built on top of, and integrated with the .NET Framework...
, they are called
hash tables (since both typically use this implementation). In
PHPPHP is a general-purpose server-side scripting language originally designed for web development to produce dynamic web pages. For this purpose, PHP code is embedded into the HTML source document and interpreted by a web server with a PHP processor module, which generates the web page document...
, all arrays can be associative, except that the keys are limited to integers and strings. In JavaScript, all objects behave as associative arrays. In Lua, they are called
tables, and are used as the primitive building block for all data structures. In
Visual FoxProVisual FoxPro is a data-centric object-oriented and procedural programming language produced by Microsoft. It is derived from FoxPro which was developed by Fox Software beginning in 1984. Fox Technologies merged with Microsoft in 1992, after which the software acquired further features and the...
, they are called
Collections.
External links