All Topics  
Associative array

 

   Email Print
   Bookmark   Link






 

Associative array



 
 
An associative array (also associative container, map, mapping, hash, dictionary, finite map, and in query-processing an index or index file) is an abstract data type
Abstract data type

In computing, an abstract data type is a specification of a set of data and the set of operations that can be performed on the data. Such a data type is abstract in the sense that it is independent of various concrete implementations....
 composed of a collection
Collection (computing)

In computer science, a collection is a grouping of some variable number of data items that have some shared significance to the problem being solved and need to be operated upon together in some controlled fashion....
 of unique keys and a collection of values, where each key is associated with one value (or set of values). The operation of finding the value associated with a key is called a lookup or indexing, and this is the most important operation supported by an associative array.






Discussion
Ask a question about 'Associative array'
Start a new discussion about 'Associative array'
Answer questions from other users
Full Discussion Forum



Encyclopedia


An associative array (also associative container, map, mapping, hash, dictionary, finite map, and in query-processing an index or index file) is an abstract data type
Abstract data type

In computing, an abstract data type is a specification of a set of data and the set of operations that can be performed on the data. Such a data type is abstract in the sense that it is independent of various concrete implementations....
 composed of a collection
Collection (computing)

In computer science, a collection is a grouping of some variable number of data items that have some shared significance to the problem being solved and need to be operated upon together in some controlled fashion....
 of unique keys and a collection of values, where each key is associated with one value (or set of values). The operation of finding the value associated with a key is called a lookup or indexing, and this is the most important operation supported by an associative array. The relationship between a key and its value is sometimes called a mapping
Map (mathematics)

In mathematics and related technical fields, the term map or mapping is often a synonym for Function . Thus, for example, a partial map is a partial function, and a total map is a total function....
 or binding. For example, if the value associated with the key "bob" is 7, we say that our array maps "bob" to 7. Associative arrays are very closely related to the mathematical concept of a function
Function (mathematics)

The mathematical concept of a function expresses dependence between two quantities, one of which is known and the other which is produced. A function associates a single output to each input element drawn from a fixed Set , such as the real numbers , although different inputs may have the same output....
 with a finite domain. As a consequence, a common and important use of associative arrays is in memoization
Memoization

In computing, memoization is an Optimization technique used primarily to speed up computer programs by having Subroutine avoid repeating the calculation of results for previously-processed inputs....
.

From the perspective of a computer programmer, an associative array can be viewed as a generalization of an array
Array

In computer science, an array is a data structure consisting of a group of element s that are accessed by index . In most programming languages each element has the same data type and the array occupies a contiguous area of computer memory....
. While a regular array maps an index
Index (information technology)

In computer science, an index can be:# an integer which identifies an array element# a pointer data element.# a data structure that enables sublinear-time lookup...
 to an arbitrary data type
Data type

A data type in programming languages is an attribute of a data which tells the computer something about the kind of data it is. This involves setting constraints on the datum, such as what values it can take and what operations may be performed upon it....
 such as integer
Integer

The integers are natural numbers including 0 and their negative and non-negative numberss . They are numbers that can be written without a fractional or decimal component, and fall within the set ....
s, other primitive type
Primitive type

In computer science, primitive type can refer to either of the following concepts:* a basic type is a data type provided by a programming language as a basic building block....
s, or even objects
Object (computer science)

In its simplest embodiment, an object is an allocated region of storage. Since programming languages use variable#Computer_programmings to access objects, the terms object and variable are often used interchangeably....
, an associative array's keys can be arbitrarily typed. The values of an associative array do not need to be the same type, although this is dependent on the programming language
Programming language

A programming language is a machine-readable artificial language designed to express computations that can be performed by a machine, particularly a computer....
.

The operations that are usually defined for an associative array are:
  • Add: Bind a new key to a new value
  • Reassign: Bind an old key to a new value
  • Remove: Unbind a key from a value and remove the key from the key set
  • Lookup: Find the value (if any) that is bound to a key


Examples

One can think of a telephone book
Telephone directory

A telephone directory is a listing of telephone subscribers in a geographical area or subscribers to services provided by the organization that publishes the directory....
 as an example of an associative array, where names are the keys and phone numbers are the values. Using the usual array-like notation, we might write
telephone['ada'] = '01-1234-56'
telephone['charles'] = '02-4321-56'


and so on. These entries can be thought of as two records in a database table:

name telephone
ada 01-1234-56
charles 02-4321-56


To retrieve the element from the associative array, we use a similar notation i.e.
x = telephone['ada']
y = telephone['charles']


Another example would be a dictionary
Dictionary

A dictionary is a book of Alphabetical order listed words in a specific language, with definitions, etymologies, pronunciations, and other information; or a book of alphabetically listed words in one language with their equivalents in another, also known as a lexicon....
 where words are the keys and definitions are the values.
dictionary['toad'] = 'four legged amphibian'
dictionary['cow'] = 'female four-legged domesticated mammalian ruminant'


Since a database equivalent is that of a table containing precisely two fields - key and value - we can use an associative array to store any information which can be held in this form.
capital['england'] = 'london'
capital['france'] = 'paris'

bossof['subeditor'] = 'editor' bossof['reporter'] = 'subeditor'

salary['editor'] = 50000 salary['reporter'] = 30000


Data structures for associative arrays

Associative arrays are usually used when lookup is the most frequent operation. For this reason, implementations are usually designed to allow speedy lookup, at the expense of slower insertion and a larger storage footprint than other data structures (such as association lists).

Efficient representations

There are two main efficient data structures used to represent associative arrays, the hash table
Hash table

In computer science, a hash table, or a hash map, is a data structure that associates Unique key with value .The primary operation that hash functions support efficiently is a lookup: given a key , find the corresponding value ....
 and the self-balancing binary search tree
Self-balancing binary search tree

In computer science, a self-balancing binary search tree or height-balanced binary search tree is a binary search tree that attempts to keep its height, or the number of levels of nodes beneath the root, as small as possible at all times, automatically....
 (such as a red-black tree
Red-black tree

A red-black tree is a type of self-balancing binary search tree, a data structure used in computer science, typically used to implement associative arrays....
 or an AVL tree
AVL tree

In computer science, an AVL tree is a self-balancing binary search tree, and it is the first such data structure to be invented. In an AVL tree, the tree height of the two child node subtrees of any node differ by at most one; therefore, it is also said to be height-balanced tree....
). Skip list
Skip list

A skip list is a probabilistic data structure, based on multiple parallel, sorted linked lists, with algorithmic efficiency comparable to a binary search tree ....
s are also an alternative, though relatively new and not as widely used. B-tree
B-tree

In computer science, a B-tree is a tree data structure that keeps data sorted and allows searches, insertions, and deletions in logarithmic Amortized analysis....
s (and variants) can also be used, and are commonly used when the associative array is too large to reside entirely in memory, for instance in a simple database. Relative advantages and disadvantages include:

  • Asymptotic operation performance: Hash tables have faster average lookup and insertion time, O(1) compared to a balanced binary search tree's T(log n), while balanced trees have faster worst-case lookup and insertion time, O(log n) as compared to T(n). Skip lists have O(n) worst-case and O(log n) average-case operation times, but with less insertion and deletion overhead in practice than balanced binary trees.
  • Ordering preservation: Balanced binary trees and skip lists preserve ordering — allowing one to efficiently iterate over the keys in order or to efficiently locate an association whose key is nearest to a given value. Hash tables do not preserve ordering and therefore cannot perform these operations as efficiently (they require the data to be sorted in a separate step).
  • Range queries: Balanced binary trees can be easily adapted to efficiently assign a single value to a large ordered range of keys, or to count the number of keys in an ordered range. (With n elements in the array and performing the operation on a contiguous range of m keys, a balanced binary tree will take O(log n + m) time while a hash table would need O(n) time as it needs to search the entire table.)
  • Allocation behavior: Hash tables with open addressing store all data in a large contiguous block of memory that is reallocated infrequently, whereas tree allocations perform many small, frequent allocations. As a result hash tables may be difficult to allocate in a fragmented heap, and conversely trees may cause heap fragmentation. Trees are also more vulnerable to inefficiencies in allocators.
  • Compactness: Hash tables can have more compact storage for small value types, especially when the values are bits.
  • Persistency: There are simple persistent
    Persistent data structure

    In computing, a persistent data structure is a data structure which always preserves the previous version of itself when it is modified; such data structures are effectively immutable, as their operations do not update the structure in-place, but instead always yield a new updated structure....
     versions of balanced binary trees, which are especially prominent in functional languages.
  • Supporting new key types: Building a hash table requires a reasonable hash function
    Hash function

    A hash function is any algorithm or function which converts a large, possibly variable-sized amount of data into a small datum, usually a single integer that may serve as an array index into an array....
     for the key type, which can be difficult to write well, while balanced binary trees and skip lists only require a total ordering on the keys.


Sometimes simple implementations of one data structure or the other have disadvantages that can be overcome by better design. For example:

  • Hash tables that use untrusted input as keys may be vulnerable to denial-of-service attack
    Denial-of-service attack

    A denial-of-service attack or distributed denial-of-service attack is an attempt to make a computer resource unavailable to its intended users....
    s where an untrusted user supplies data intended to generate large numbers of collisions. This may be overcome by choosing hash functions at random from a universal family
    Universal hashing

    Universal hashing is a randomized algorithm for selecting a hash function F with the following property: for any two distinct inputs x and y, the probability that F=F is the same as if F was a random function....
    , or by hashing untrusted input with a cryptographic hash function
    Cryptographic hash function

    A cryptographic hash function is a algorithm that takes an arbitrary block of data and returns a fixed-size bit string, the hash value, such that an accidental or intentional change to the data will almost certainly change the hash value....
     before insertion.
  • Simple balanced trees waste space on pointers and allocation metadata; these problems can be mitigated by storing multiple elements in each node and by using memory pools.


Association lists

A simple but generally inefficient type of associative array is an association list,

often called an alist for short, which simply stores a linked list
Linked list

In computer science, a linked list is one of the fundamental data structures, and can be used to implement other data structures. It consists of a sequence of node s, each containing arbitrary data Field s and one or two reference s pointing to the next and/or previous nodes....
 of key-value pairs. Each lookup does a linear search
Linear search

In computer science, linear search is a search algorithm, also known as sequential search, that is suitable for searching a list of data for a particular value....
 through the list looking for a key match. Such a data structure is commonly used in Lisp/Scheme.

Advantages of association lists include:
  • It need only be known how to test keys for equality — which is minimal for maps supporting the four basic operations — while the above alternatives require a linear order comparison or a hash function.
  • For small associative arrays, common in some applications, association lists can take less time and space than other data structures.
  • Insertions are done in constant time by adding the new association to the head of the list.


Specialized representations

If the keys have a specific type, one can often use specialized data structures to gain performance. For example, integer-keyed maps can be implemented using Patricia trees or Judy array
Judy array

In computer science and software engineering, a Judy array is a complex but very fast associative array data structure for storing and looking up values using integer or string keys....
s, and are useful space-saving replacements for sparse arrays. Because this type of data structure can perform longest-prefix matching, they're particularly useful in applications where a single value is assigned to most of a large range of keys with a common prefix except for a few exceptions, such as in routing table
Routing table

In computer networking a routing table, or Routing Information Base , is an electronic table or database type object that is stored in a router or a networked computer....
s.

String-keyed maps can avoid extra comparisons during lookups by using trie
Trie

In computer science, a trie, or prefix tree, is an ordered tree data structure data structure that is used to store an associative array where the keys are usually string s....
s.

Multimap

A variation of the map (associative array) is the multimap, which is the same as map data structures, but allows a key to be mapped to more than one value. Formally, a multimap can be thought of as a regular associative array that maps unique keys to nonempty multisets of values, although actual implementation may vary. C++
C++

C++ is a general-purpose programming language. It is regarded as a middle-level language, as it comprises a combination of both high-level programming language and low-level programming language language features....
's Standard Template Library
Standard Template Library

The Standard Template Library is a Library partially included in the C++ C++ standard library. It provides Container s, iterators, algorithms, and Function objects....
 provides the "multimap" container
Standard Template Library

The Standard Template Library is a Library partially included in the C++ C++ standard library. It provides Container s, iterators, algorithms, and Function objects....
 for the sorted multimap, SGI's STL provides the "hash_multimap" container, which implements a multimap using a hash table, and some varieties of LPC have built-in multimap support.

Language support


Associative arrays can be implemented in any programming language as a package and many language systems provide them as part of their standard library. In some languages, they are not only built into the standard system, but have special syntax, often using array-like subscripting.

Built-in syntactic support for associative arrays was introduced by Snobol4, under the name "table". MUMPS
MUMPS

MUMPS , or alternatively M, is a programming language created in the late 1960s, originally for use in the Health care. It was designed for the production of multi-user database-driven applications....
 made multi-dimensional associative arrays, optionally persistent, its key data structure. SETL
SETL

SETL is a very-high level programming language based on the mathematical theory of sets. It was originally developed by Jack Schwartz at the New York University Courant Institute of Mathematical Sciences in the late 1960's....
 supported them as one possible implementation of sets and maps. Most modern scripting languages, starting with awk and including Perl
Perl

In computer programming, Perl is a high-level programming language, List of programming languages by category, Interpreter , dynamic programming language....
, tcl
Tcl

Tcl is a scripting language created by John Ousterhout. Originally "born out of frustration"?according to the author?with programmers devising their own languages intended to be embedded into applications, Tcl quickly gained wide acceptance on its own and is generally thought to be easy to learn, but powerful in competent hands....
, Javascript
JavaScript

JavaScript is a scripting language widely used for client-side web development. It was the originating Programming language dialect of the ECMAScript standard....
, Python
Python (programming language)

Python is a general-purpose high-level programming language. Its design philosophy emphasizes code readability. Python's core syntax and semantics are Minimalism , while the standard library is large and comprehensive....
, and Ruby
Ruby (programming language)

Ruby is a dynamic programming language, reflection , general purpose object-oriented programming language that combines syntax inspired by Perl with Smalltalk-like features....
, support associative arrays as a primary container type.

In many more languages, they are available as library functions without special syntax.

Associative arrays have a variety of names. In Smalltalk, Objective-C, .NET
.NET Framework

The Microsoft .NET Framework is a software framework that is available with several Microsoft Windows operating systems. It includes a large Library of coded solutions to prevent common programming problems and a virtual machine that manages the execution of programs written specifically for the Software framework....
, Python and REALbasic
REALbasic

REALbasic is an object-oriented dialect of the BASIC programming language developed and commercially marketed by REAL Software, Inc in Austin, Texas for Mac OS X, Microsoft Windows, and Linux....
 they are called dictionaries; in Perl and Ruby they are called hashes; in C++
C++

C++ is a general-purpose programming language. It is regarded as a middle-level language, as it comprises a combination of both high-level programming language and low-level programming language language features....
 and Java
Java (programming language)

Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java ....
 they are called maps (see stdmap
Map (C++ container)

The class template std::map is a standard C++ Standard Template Library#Containers. It is a sorted associative array of unique keys and associated data....
 and ) and in Common Lisp
Common Lisp

Common Lisp, commonly abbreviated CL, is a dialect of the Lisp programming language, published in American National Standards Institute standard document Information Technology - Programming Language - Common Lisp, formerly X3.226-1994 ....
 and Windows PowerShell
Windows PowerShell

Windows PowerShell is an extensible command line interface shell and associated scripting language from Microsoft. It was released in 2006 and is currently available for Windows XP SP2/SP3, Windows Server 2003, Windows Vista and is included in Windows Server 2008 as well as Windows 7 as an optional feature....
 they are called hashtables (since both typically use this implementation). In PHP
PHP

PHP is a scripting language originally designed for producing dynamic web pages. It has evolved to include a command line interface capability and can be used in Standalone software Graphical user interface....
 and JavaScript
JavaScript

JavaScript is a scripting language widely used for client-side web development. It was the originating Programming language dialect of the ECMAScript standard....
 all arrays can be associative, except that the keys are limited to integers and strings. In Visual FoxPro they are called Collections.

In the scripting language Lua
Lua programming language

In computing, Lua is a lightweight languages, Reflection , imperative programming and functional programming programming language, designed as a scripting language with extension language semantics as a primary goal....
, associative arrays, called tables, are used as the primitive building block for all data structures, even arrays. Likewise, in JavaScript, all objects are associative arrays. In MUMPS, the associative arrays are typically stored as B-trees.

External links