Dynamic array

# Dynamic array

Discussion

Encyclopedia
In computer science
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

, a dynamic array, growable array, resizable array, dynamic table, or array list is a random access
Random access
In computer science, random access is the ability to access an element at an arbitrary position in a sequence in equal time, independent of sequence size. The position is arbitrary in the sense that it is unpredictable, thus the use of the term "random" in "random access"...

, variable-size list data structure
Data structure
In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks...

that allows elements to be added or removed. It is supplied with standard libraries in many modern mainstream programming languages.

A dynamic array is not the same thing as a dynamically-allocated array, which is a fixed-size array whose size is fixed when the array is allocated, although a dynamic array may use such a fixed-size array as a back end.

## Bounded-size dynamic arrays and capacity

The simplest dynamic array is constructed by allocating a fixed-size array and then dividing it into two parts: the first stores the elements of the dynamic array and the second is reserved, or unused. We can then add or remove elements at the end of the dynamic array in constant time by using the reserved space, until this space is completely consumed. The number of elements used by the dynamic array contents is its logical size or size, while the size of the underlying array is called the dynamic array's capacity, which is the maximum possible size without relocating data.

In applications where the logical size is bounded, the fixed-size data structure suffices. This may be short-sighted, when problems with the array filling up turn up later. It is best to put resize code into any array, to respond to new conditions. Then choosing initial capacity is optimization, not getting the program to run. Resizing the underlying array is an expensive task, typically involving copying the entire contents of the array.

## Geometric expansion and amortized cost

To avoid incurring the cost of resizing many times, dynamic arrays resize by a large amount, such as doubling in size, and use the reserved space for future expansion. The operation of adding an element to the end might work as follows:

function insertEnd(dynarray a, element e)
if (a.size = a.capacity)
// resize a to twice its current capacity:
a.capacity ← a.capacity * 2
// (copy the contents to the new memory location here)
a[a.size] ← e
a.size ← a.size + 1

As n elements are inserted, the capacities form a geometric progression
Geometric progression
In mathematics, a geometric progression, also known as a geometric sequence, is a sequence of numbers where each term after the first is found by multiplying the previous one by a fixed non-zero number called the common ratio. For example, the sequence 2, 6, 18, 54, ... is a geometric progression...

. Expanding the array by any constant proportion ensures that inserting n elements takes O(n)
Big O notation
In mathematics, big O notation is used to describe the limiting behavior of a function when the argument tends towards a particular value or infinity, usually in terms of simpler functions. It is a member of a larger family of notations that is called Landau notation, Bachmann-Landau notation, or...

time overall, meaning that each insertion takes amortized
Amortized analysis
In computer science, amortized analysis is a method of analyzing algorithms that considers the entire sequence of operations of the program. It allows for the establishment of a worst-case bound for the performance of an algorithm irrespective of the inputs by looking at all of the operations...

constant time. The value of this proportion a leads to a time-space tradeoff: the average time per insertion operation is about a/(a−1), while the number of wasted cells is bounded above by (a−1)n. The choice of a depends on the library or application: some textbooks use a = 2, but Java's ArrayList implementation uses a = 3/2 and the C implementation of Python
Python (programming language)
Python is a general-purpose, high-level programming language whose design philosophy emphasizes code readability. Python claims to "[combine] remarkable power with very clear syntax", and its standard library is large and comprehensive...

's list data structure uses a = 9/8.

Many dynamic arrays also deallocate some of the underlying storage if its size drops below a certain threshold, such as 30% of the capacity. This threshold must be strictly smaller than 1/a in order to support mixed sequences of insertions and removals with amortized constant cost.

Dynamic arrays are a common example when teaching amortized analysis
Amortized analysis
In computer science, amortized analysis is a method of analyzing algorithms that considers the entire sequence of operations of the program. It allows for the establishment of a worst-case bound for the performance of an algorithm irrespective of the inputs by looking at all of the operations...

.

## Performance

The dynamic array has performance similar to an array, with the addition of new operations to add and remove elements from the end:
• Getting or setting the value at a particular index (constant time)
• Iterating over the elements in order (linear time, good cache performance)
• Inserting or deleting an element in the middle of the array (linear time)
• Inserting or deleting an element at the end of the array (constant amortized time)

Dynamic arrays benefit from many of the advantages of arrays, including good locality of reference
Locality of reference
In computer science, locality of reference, also known as the principle of locality, is the phenomenon of the same value or related storage locations being frequently accessed. There are two basic types of reference locality. Temporal locality refers to the reuse of specific data and/or resources...

and data cache utilization, compactness (low memory use), and random access
Random access
In computer science, random access is the ability to access an element at an arbitrary position in a sequence in equal time, independent of sequence size. The position is arbitrary in the sense that it is unpredictable, thus the use of the term "random" in "random access"...

. They usually have only a small fixed additional overhead for storing information about the size and capacity. This makes dynamic arrays an attractive tool for building cache-friendly data structures.

In computer science, a linked list is a data structure consisting of a group of nodes which together represent a sequence. Under the simplest form, each node is composed of a datum and a reference to the next node in the sequence; more complex variants add additional links...

s, dynamic arrays have faster indexing (constant time versus linear time) and typically faster iteration due to improved locality of reference; however, dynamic arrays require linear time to insert or delete at an arbitrary location, since all following elements must be moved, while linked lists can do this in constant time. This disadvantage is mitigated by the gap buffer
Gap buffer
A gap buffer In computer science is a dynamic array that allows efficient insertion and deletion operations clustered near the same location. Gap buffers are especially common in text editors, where most changes to the text occur at or near the current location of the cursor. The text is stored in...

and tiered vector variants discussed under Variants below. Also, in a highly-fragmented
Fragmentation (computer)
In computer storage, fragmentation is a phenomenon in which storage space is used inefficiently, reducing storage capacity and in most cases reducing the performance. The term is also used to denote the wasted space itself....

memory region, it may be expensive or impossible to find contiguous space for a large dynamic array, whereas linked lists do not require the whole data structure to be stored contiguously.

A balanced tree
Self-balancing binary search tree
In computer science, a self-balancing binary search tree is any node based binary search tree that automatically keeps its height small in the face of arbitrary item insertions and deletions....

can store a list while providing all operations of both dynamic arrays and linked lists reasonably efficiently, but both insertion at the end and iteration over the list are slower than for a dynamic array, in theory and in practice, due to non-contiguous storage and tree traversal/manipulation overhead.

## Variants

Gap buffer
Gap buffer
A gap buffer In computer science is a dynamic array that allows efficient insertion and deletion operations clustered near the same location. Gap buffers are especially common in text editors, where most changes to the text occur at or near the current location of the cursor. The text is stored in...

s are similar to dynamic arrays but allow efficient insertion and deletion operations clustered near the same arbitrary location. Some deque
Deque
In computer science, a double-ended queue is an abstract data structure that implements a queue for which elements can only be added to or removed from the front or back...

implementations use array deques, which allow amortized constant time insertion/removal at both ends, instead of just one end.

Goodrich presented a dynamic array algorithm called Tiered Vectors that provided O(n1/2) performance for order preserving insertions or deletions from the middle of the array.

Hashed Array Tree
Hashed array tree
In computer science, a hashed array tree is a dynamic array algorithm published by Edward Sitarski in 1996. Whereas simple dynamic array data structures based on geometric expansion waste linear space, where n is the number of elements in the array, hashed array trees waste only order n1/2...

(HAT) is a dynamic array algorithm published by Sitarski in 1996. Hashed Array Tree wastes order n1/2 amount of storage space, where n is the number of elements in the array. The algorithm has O(1) amortized performance when appending a series of objects to the end of a Hashed Array Tree.

In a 1999 paper, Brodnik et al. describe a tiered dynamic array data structure, which wastes only n1/2 space for n elements at any point in time, and they prove a lower bound showing that any dynamic array must waste this much space if the operations are to remain amortized constant time. Additionally, they present a variant where growing and shrinking the buffer has not only amortized but worst-case constant time.

Bagwell (2002) presented the VList
VList
In computer science, the VList is a persistent data structure designed by Phil Bagwell in 2002 that combines the fast indexing of arrays with the easy extension of cons-based linked lists....

algorithm, which can be adapted to implement a dynamic array.

## Language support

C++
C++
C++ is a statically typed, free-form, multi-paradigm, compiled, general-purpose programming language. It is regarded as an intermediate-level language, as it comprises a combination of both high-level and low-level language features. It was developed by Bjarne Stroustrup starting in 1979 at Bell...

's `std::vector` is an implementation of dynamic arrays, as are the `ArrayList` classes supplied with the Java
Java (programming language)
Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities...

API and the .NET Framework
.NET Framework
The .NET Framework is a software framework that runs primarily on Microsoft Windows. It includes a large library and supports several programming languages which allows language interoperability...

. The generic `List<>` class supplied with version 2.0 of the .NET Framework is also implemented with dynamic arrays. Python
Python (programming language)
Python is a general-purpose, high-level programming language whose design philosophy emphasizes code readability. Python claims to "[combine] remarkable power with very clear syntax", and its standard library is large and comprehensive...

's `list` datatype implementation is a dynamic array. Delphi and D
D (programming language)
The D programming language is an object-oriented, imperative, multi-paradigm, system programming language created by Walter Bright of Digital Mars. It originated as a re-engineering of C++, but even though it is mainly influenced by that language, it is not a variant of C++...

implement dynamic arrays at the language's core. Many scripting languages such as Perl
Perl
Perl is a high-level, general-purpose, interpreted, dynamic programming language. Perl was originally developed by Larry Wall in 1987 as a general-purpose Unix scripting language to make report processing easier. Since then, it has undergone many changes and revisions and become widely popular...

and PHP
PHP
PHP is a general-purpose server-side scripting language originally designed for web development to produce dynamic web pages. For this purpose, PHP code is embedded into the HTML source document and interpreted by a web server with a PHP processor module, which generates the web page document...

offer dynamic arrays as a built-in primitive data type.