In
computer scienceComputer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...
, a
data structure is a particular way of storing and organizing data in a computer so that it can be used
efficientlyIn computer science, efficiency is used to describe properties of an algorithm relating to how much of various types of resources it consumes. Algorithmic efficiency can be thought of as analogous to engineering productivity for a repeating or continuous process, where the goal is to reduce...
.
Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks. For example,
B-treeIn computer science, a B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree is a generalization of a binary search tree in that a node can have more than two children...
s are particularly well-suited for implementation of databases, while
compilerA compiler is a computer program that transforms source code written in a programming language into another computer language...
implementations usually use
hash tableIn computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys , to their associated values . Thus, a hash table implements an associative array...
s to look up identifiers.
Data structures are used in almost every program or software system. Data structures provide a means to manage huge amounts of data efficiently, such as large
databaseA database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality , in a way that supports processes requiring this information...
s and
internet indexing serviceWeb indexing includes back-of-book-style indexes to individual websites or an intranet, and the creation of keyword metadata to provide a more useful vocabulary for Internet or onsite search engines...
s. Usually, efficient data structures are a key to designing efficient algorithms. Some formal design methods and
programming languageA programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely....
s emphasize data structures, rather than algorithms, as the key organizing factor in software design.
Overview
- An array
In computer science, an array data structure or simply array is a data structure consisting of a collection of elements , each identified by at least one index...
stores a number of elements of the same type in a specific order. They are accessed using an integer to specify which element is required (although the elements may be of almost any type). Arrays may be fixed-length or expandable.
- Record
In computer science, a record is an instance of a product of primitive data types called a tuple. In C it is the compound data in a struct. Records are among the simplest data structures. A record is a value that contains other values, typically in fixed number and sequence and typically indexed...
(also called tuple or struct) Records are among the simplest data structureIn computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks...
s. A record is a value that contains other values, typically in fixed number and sequence and typically indexed by names. The elements of records are usually called fields or members.
- A hash
In computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys , to their associated values . Thus, a hash table implements an associative array...
or dictionary or map is a more flexible variation on a record, in which name-value pairs can be added and deleted freely.
- Union. A union type definition will specify which of a number of permitted primitive types may be stored in its instances, e.g. "float or long integer". Contrast with a record
In computer science, a record is an instance of a product of primitive data types called a tuple. In C it is the compound data in a struct. Records are among the simplest data structures. A record is a value that contains other values, typically in fixed number and sequence and typically indexed...
, which could be defined to contain a float and an integer; whereas, in a union, there is only one value at a time.
- A tagged union
In computer science, a tagged union, also called a variant, variant record, discriminated union, or disjoint union, is a data structure used to hold a value that could take on several different, but fixed types. Only one of the types can be in use at any one time, and a tag field explicitly...
(also called a variantVariant is a data type in certain programming languages, particularly Visual Basic and C++ when using the Component Object Model.In Visual Basic the Variant data type is a tagged union that can be used to represent any other data type except fixed-length string type and...
, variant record, discriminated union, or disjoint union) contains an additional field indicating its current type, for enhanced type safety.
- A set is an abstract data structure that can store specific values, without any particular order
In mathematics, a sequence is an ordered list of objects . Like a set, it contains members , and the number of terms is called the length of the sequence. Unlike a set, order matters, and exactly the same elements can appear multiple times at different positions in the sequence...
, and no repeated values. Values themselves are not retrieved from sets, rather one tests a value for membership to obtain a boolean "in" or "not in".
- An object
In computer science, an object is any entity that can be manipulated by the commands of a programming language, such as a value, variable, function, or data structure...
contains a number of data fields, like a record, and also a number of program code fragments for accessing or modifying them. Data structures not containing code, like those above, are called plain old data structure.
Many others are possible, but they tend to be further variations and compounds of the above.
Basic principles
Data structures are generally based on the ability of a computer to fetch and store data at any place in its memory, specified by an address—a bit string that can be itself stored in memory and manipulated by the program. Thus the
recordIn computer science, a record is an instance of a product of primitive data types called a tuple. In C it is the compound data in a struct. Records are among the simplest data structures. A record is a value that contains other values, typically in fixed number and sequence and typically indexed...
and array data structures are based on computing the addresses of data items with arithmetic operations; while the
linked data structureIn computer science, a linked data structure is a data structure which consists of a set of data records linked together and organized by references ....
s are based on storing addresses of data items within the structure itself. Many data structures use both principles, sometimes combined in non-trivial ways (as in
XOR linkingAn XOR linked list is a data structure used in computer programming. They take advantage of the bitwise exclusive disjunction operation, here denoted by ⊕, to decrease storage requirements for doubly linked lists. An ordinary doubly linked list stores addresses of the previous and next list items...
)
The implementation of a data structure usually requires writing a set of
proceduresIn computer science, a subroutine is a portion of code within a larger program that performs a specific task and is relatively independent of the remaining code....
that create and manipulate instances of that structure. The efficiency of a data structure cannot be analyzed separately from those operations. This observation motivates the theoretical concept of an
abstract data typeIn computing, an abstract data type is a mathematical model for a certain class of data structures that have similar behavior; or for certain data types of one or more programming languages that have similar semantics...
, a data structure that is defined indirectly by the operations that may be performed on it, and the mathematical properties of those operations (including their space and time cost).
Language support
Most
assembly languageAn assembly language is a low-level programming language for computers, microprocessors, microcontrollers, and other programmable devices. It implements a symbolic representation of the machine codes and other constants needed to program a given CPU architecture...
s and some low-level languages, such as
BCPLBCPL is a procedural, imperative, and structured computer programming language designed by Martin Richards of the University of Cambridge in 1966.- Design :...
, lack support for data structures. Many high-level programming languages, and some higher-level assembly languages, such as MASM, on the other hand, have special syntax or other built-in support for certain data structures, such as vectors (one-dimensional
arrayIn computer science, an array type is a data type that is meant to describe a collection of elements , each selected by one or more indices that can be computed at run time by the program. Such a collection is usually called an array variable, array value, or simply array...
s) in the
CC is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....
language or multi-dimensional arrays in
PascalPascal is an influential imperative and procedural programming language, designed in 1968/9 and published in 1970 by Niklaus Wirth as a small and efficient language intended to encourage good programming practices using structured programming and data structuring.A derivative known as Object Pascal...
.
Most programming languages feature some sorts of library mechanism that allows data structure implementations to be reused by different programs. Modern languages usually come with standard libraries that implement the most common data structures. Examples are the
C++C++ is a statically typed, free-form, multi-paradigm, compiled, general-purpose programming language. It is regarded as an intermediate-level language, as it comprises a combination of both high-level and low-level language features. It was developed by Bjarne Stroustrup starting in 1979 at Bell...
Standard Template LibraryThe Standard Template Library is a C++ software library which later evolved into the C++ Standard Library. It provides four components called algorithms, containers, functors, and iterators. More specifically, the C++ Standard Library is based on the STL published by SGI. Both include some...
, the
Java Collections FrameworkThe Java collections framework is a set of classes and interfaces that implement commonly reusable collection data structures.Although it is a framework, it works in a manner of a library...
, and
MicrosoftMicrosoft Corporation is an American public multinational corporation headquartered in Redmond, Washington, USA that develops, manufactures, licenses, and supports a wide range of products and services predominantly related to computing through its various product divisions...
's
.NET FrameworkThe .NET Framework is a software framework that runs primarily on Microsoft Windows. It includes a large library and supports several programming languages which allows language interoperability...
.
Modern languages also generally support
modular programmingModular programming is a software design technique that increases the extent to which software is composed of separate, interchangeable components called modules by breaking down program functions into modules, each of which accomplishes one function and contains everything necessary to accomplish...
, the separation between the interface of a library module and its implementation. Some provide
opaque data typeIn computer science, an opaque data type is a user defined data type used like built-in data type. It is incompletely defined in an interface, so that ordinary client programs can only manipulate data of that type by calling procedures that have access to the missing information.-Overview:Opaque...
s that allow clients to hide implementation details.
Object-oriented programming languageThis is a list of object-oriented programming programming languages.-Languages with object-oriented features:*ABAP*Ada 95*AmigaE*BETA*Blue*Boo*C++*C#*COBOL*Cobra*ColdFusion*Common Lisp*COOL*CorbaScript*Clarion*CLU*Curl*D*Dylan*E*Eiffel...
s, such as
C++C++ is a statically typed, free-form, multi-paradigm, compiled, general-purpose programming language. It is regarded as an intermediate-level language, as it comprises a combination of both high-level and low-level language features. It was developed by Bjarne Stroustrup starting in 1979 at Bell...
,
JavaJava is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities...
and
.NET FrameworkThe .NET Framework is a software framework that runs primarily on Microsoft Windows. It includes a large library and supports several programming languages which allows language interoperability...
use
classesIn object-oriented programming, a class is a construct that is used as a blueprint to create instances of itself – referred to as class instances, class objects, instance objects or simply objects. A class defines constituent members which enable these class instances to have state and behavior...
for this purpose.
Many known data structures have
concurrentIn computer science, a concurrent data structure is aparticular way of storing and organizing data for access bymultiple computing threads on a computer.Historically, such data structures were used on uniprocessor...
versions that allow multiple computing threads to access the data structure simultaneously.
See also
- List of data structures
- Plain old data structure
- Concurrent data structure
In computer science, a concurrent data structure is aparticular way of storing and organizing data for access bymultiple computing threads on a computer.Historically, such data structures were used on uniprocessor...
- Data model
A data model in software engineering is an abstract model, that documents and organizes the business data for communication between team members and is used as a plan for developing applications, specifically how data is stored and accessed....
- Dynamization
In computer science, Dynamization is the process of transforming a static data structure into a dynamic one. Although static data structures may provide very good functionality and fast queries, their utility is limited because of their inability to grow/shrink fast, thus making them inapplicable...
- Linked data structure
In computer science, a linked data structure is a data structure which consists of a set of data records linked together and organized by references ....
- Persistent data structure
In computing, a persistent data structure is a data structure which always preserves the previous version of itself when it is modified; such data structures are effectively immutable, as their operations do not update the structure in-place, but instead always yield a new updated structure...
Further readings
- Peter Brass, Advanced Data Structures, Cambridge University Press
Cambridge University Press is the publishing business of the University of Cambridge. Granted letters patent by Henry VIII in 1534, it is the world's oldest publishing house, and the second largest university press in the world...
, 2008.
- Donald Knuth
Donald Ervin Knuth is a computer scientist and Professor Emeritus at Stanford University.He is the author of the seminal multi-volume work The Art of Computer Programming. Knuth has been called the "father" of the analysis of algorithms...
, The Art of Computer ProgrammingThe Art of Computer Programming is a comprehensive monograph written by Donald Knuth that covers many kinds of programming algorithms and their analysis....
, vol. 1. Addison-WesleyAddison-Wesley was a book publisher in Boston, Massachusetts, best known for its textbooks and computer literature. As well as publishing books, Addison-Wesley also distributed its technical titles through the Safari Books Online e-reference service...
, 3rd edition, 1997.
- Dinesh Mehta and Sartaj Sahni
Prof. Sartaj Kumar Sahni is an Indian computer scientist, now based in the USA, and is one of the pioneers in the field of data structures. He is a distinguished professor in the Department of Computer and Information Science and Engineering at the University of Florida.- Biography :Sahni received...
Handbook of Data Structures and Applications, Chapman and HallChapman & Hall was a British publishing house in London, founded in the first half of the 19th century by Edward Chapman and William Hall. Upon Hall's death in 1847, Chapman's cousin Frederic Chapman became partner in the company, of which he became sole manager upon the retirement of Edward...
/CRC PressThe CRC Press, LLC is a publishing group which specializes in producing technical books. While many of their books relate to engineering, science and mathematics, their scope also includes books on business, forensics and information technology...
, 2007.
- Niklaus Wirth
Niklaus Emil Wirth is a Swiss computer scientist, best known for designing several programming languages, including Pascal, and for pioneering several classic topics in software engineering. In 1984 he won the Turing Award for developing a sequence of innovative computer languages.-Biography:Wirth...
, Algorithms and Data Structures, Prentice HallPrentice Hall is a major educational publisher. It is an imprint of Pearson Education, Inc., based in Upper Saddle River, New Jersey, USA. Prentice Hall publishes print and digital content for the 6-12 and higher-education market. Prentice Hall distributes its technical titles through the Safari...
, 1985.
External links