In
computer scienceComputer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...
, a
set is an abstract data structure that can store certain values, without any particular
orderIn mathematics, a sequence is an ordered list of objects . Like a set, it contains members , and the number of terms is called the length of the sequence. Unlike a set, order matters, and exactly the same elements can appear multiple times at different positions in the sequence...
, and no repeated values. It is a computer implementation of the
mathematicalMathematics is the study of quantity, space, structure, and change. Mathematicians seek out patterns and formulate new conjectures. Mathematicians resolve the truth or falsity of conjectures by mathematical proofs, which are arguments sufficient to convince other mathematicians of their validity...
concept of a
finite set. Unlike most other collection types, rather than retrieving a specific element from a set, one typically tests a value for membership in a set.
Some set data structures are designed for
static or
frozen sets that do not change after they are constructed. Static sets allow only query operations on their elements — such as checking whether a given value is in the set, or enumerating the values in some arbitrary order. Other variants, called
dynamic or
mutable sets, allow also the insertion and/or deletion of elements from the set.
A set can be implemented in many ways. For example, one can use a list, ignoring the order of the elements and taking care to avoid repeated values. Sets are often implemented using various flavors of
treesIn computer science, a tree is a widelyused data structure that emulates a hierarchical tree structure with a set of linked nodes.Mathematically, it is an ordered directed tree, more specifically an arborescence: an acyclic connected graph where each node has zero or more children nodes and at...
,
trieIn computer science, a trie, or prefix tree, is an ordered tree data structure that is used to store an associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the...
s, or hash tables.
A set can be seen, and implemented, as a (partial)
associative arrayIn computer science, an associative array is an abstract data type composed of a collection of pairs, such that each possible key appears at most once in the collection....
, in which the value of each keyvalue pair has the
unit typeIn the area of mathematical logic, and computer science known as type theory, a unit type is a type that allows only one value . The carrier associated with a unit type can be any singleton set. There is an isomorphism between any two such sets, so it is customary to talk about the unit type and...
.
In
type theoryIn mathematics, logic and computer science, type theory is any of several formal systems that can serve as alternatives to naive set theory, or the study of such formalisms in general...
, sets are generally identified with their
indicator function: accordingly, a set of values of type
may be denoted by
or
. (Subtypes and subsets may be modeled by refinement types, and quotient sets may be replaced by
setoidIn mathematics, a setoid is a set equipped with an equivalence relation.Setoids are studied especially in proof theory and in typetheoretic foundations of mathematics. Often in mathematics, when one defines an equivalence relation on a set, one immediately forms the quotient set...
s.) The characteristic function
F of a set
S is defined as:
In theory, many other abstract data structures can be viewed as set structures with additional operations and/or additional
axiomIn traditional logic, an axiom or postulate is a proposition that is not proven or demonstrated but considered either to be selfevident or to define and delimit the realm of analysis. In other words, an axiom is a logical statement that is assumed to be true...
s imposed on the standard operations. For example, an abstract heap can be viewed as a set structure with a
min(S)
operation that returns the element of smallest value.
Core settheoretical operations
One may define the operations of the
algebra of setsThe algebra of sets develops and describes the basic properties and laws of sets, the settheoretic operations of union, intersection, and complementation and the relations of set equality and set inclusion...
:

union(S,T)
: returns the unionIn set theory, the union of a collection of sets is the set of all distinct elements in the collection. The union of a collection of sets S_1, S_2, S_3, \dots , S_n\,\! gives a set S_1 \cup S_2 \cup S_3 \cup \dots \cup S_n. Definition :...
of sets S and T.

intersection(S,T)
: returns the intersectionIn mathematics, the intersection of two sets A and B is the set that contains all elements of A that also belong to B , but no other elements....
of sets S and T.

difference(S,T)
: returns the difference of sets S and T.

subset(S,T)
: a predicate that tests whether the set S is a subsetIn mathematics, especially in set theory, a set A is a subset of a set B if A is "contained" inside B. A and B may coincide. The relationship of one set being a subset of another is called inclusion or sometimes containment...
of set T.
Static sets
Typical operations that may be provided by a static set structure
S are:

is_element_of(x,S)
: checks whether the value x is in the set S.

is_empty(S)
: checks whether the set S is empty.

size(S)
or cardinality(S)
: returns the number of elements in S.

iterateIn computer programming, an iterator is an object that enables a programmer to traverse a container. Various types of iterators are often provided via a container's interface...
(S)
: returns a function that returns one more value of S at each call, in some arbitrary order.

enumerate(S)
: returns a list containing the elements of S in some arbitrary order.

build(x_{1},x_{2},…,x_{n},)
: creates a set structure with values x_{1},x_{2},…,x_{n}.

create_from(collection)
: creates a new set structure containing all the elements of the given collectionIn computer science, a collection is a grouping of some variable number of data items that have some shared significance to the problem being solved and need to be operated upon together in some controlled fashion. Generally, the data items will be of the same type or, in languages supporting...
or all the elements returned by the given iteratorIn computer programming, an iterator is an object that enables a programmer to traverse a container. Various types of iterators are often provided via a container's interface...
.
Dynamic sets
Dynamic set structures typically add:

create
: creates a new, initially empty set structure.

create_with_capacity(n)
: creates a new set structure, initially empty but capable of holding up to n elements.

add(S,x)
: adds the element x to S, if it is not present already.

remove(S, x)
: removes the element x from S, if it is present.

capacity(S)
: returns the maximum number of values that S can hold.
Some set structures may allow only some of these operations. The cost of each operation will depend on the implementation, and possibly also on the particular values stored in the set, and the order in which they are inserted.
Additional operations
There are many other operations that can (in principle) be defined in terms of the above, such as:

pop(S)
: returns an arbitrary element of S, deleting it from S.

mapIn many programming languages, map is the name of a higherorder function that applies a given function to each element of a list, returning a list of results. They are examples of both catamorphisms and anamorphisms...
(F,S)
: returns the set of distinct values resulting from applying function F to each element of S.

filterIn functional programming, filter is a higherorder function that processes a data structure in some order to produce a new data structure containing exactly those elements of the original data structure for which a given predicate returns the boolean value true.Example:In Haskell, the code...
(P,S)
: returns the subset containing all elements of S that satisfy a given predicate P.

foldIn functional programming, fold – also known variously as reduce, accumulate, compress, or inject – are a family of higherorder functions that analyze a recursive data structure and recombine through use of a given combining operation the results of recursively processing its...
(A_{0},F,S)
: returns the value A_{S} after applying A_{i+1} := F(A_{i}, e)
for each element e of S.

clear(S)
: delete all elements of S.

equal(S_{1}, S_{2})
: checks whether the two given sets are equal (i.e. contain all and only the same elements).

hash(S)
: returns a hash value for the static set S such that if equal(S_{1}, S_{2})
then hash(S_{1}) = hash(S_{2})
Other operations can be defined for sets with elements of a special type:

sum(S)
: returns the sum of all elements of S for some definition of "sum". For example, over integers or reals, it may be defined as fold(0, add, S)
.

nearest(S,x)
: returns the element of S that is closest in value to x (by some metricIn mathematics, a metric or distance function is a function which defines a distance between elements of a set. A set with a metric is called a metric space. A metric induces a topology on a set but not all topologies can be generated by a metric...
).
Implementations
Sets can be implemented using various
data structureIn computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks...
s, which provide different time and space tradeoffs for various operations. Some implementations are designed to improve the efficiency of very specialized operations, such as
nearest
or
union
. Implementations described as "general use" typically strive to optimize the
element_of
,
add
, and
delete
operations.
Sets are commonly implemented in the same way as associative arrays, namely, a
selfbalancing binary search treeIn computer science, a selfbalancing binary search tree is any node based binary search tree that automatically keeps its height small in the face of arbitrary item insertions and deletions....
for sorted sets (which has O(log n) for most operations), or a
hash tableIn computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys , to their associated values . Thus, a hash table implements an associative array...
for unsorted sets (which has O(1) averagecase, but O(n) worstcase, for most operations). A sorted linear hash table may be used to provide deterministically ordered sets.
Other popular methods include arrays. In particular a subset of the integers 1..
n can be implemented efficiently as an
nbit
bit array, which also support very efficient union and intersection operations. A Bloom map implements a set probabilistically, using a very compact representation but risking a small chance of false positives on queries.
The Boolean set operations can be implemented in terms of more elementary operations (
pop
,
clear
, and
add
), but specialized algorithms may yield lower asymptotic time bounds. If sets are implemented as sorted lists, for example, the naive algorithm for
union(S,T)
will take code proportional to the length
m of
S times the length
n of
T; whereas a variant of the
list merging algorithmMerge algorithms are a family of algorithms that run sequentially over multiple sorted lists, typically producing more sorted lists as output. This is wellsuited for machines with tape drives...
will do the job in time proportional to
m+
n. Moreover, there are specialized set data structures (such as the unionfind data structure) that are optimized for one or more of these operations, at the expense of others.
Language support
One of the earliest languages to support sets was Pascal; many languages now include it, whether in the core language or in a
standard libraryA standard library for a programming language is the library that is conventionally made available in every implementation of that language. In some cases, the library is described directly in the programming language specification; in other cases, the contents of the standard library are...
.
 Java
Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer lowlevel facilities...
offers the interfaceIn the field of computer science, an interface is a tool and concept that refers to a point of interaction between components, and is applicable at the level of both hardware and software...
to support sets (with the class implementing it using a hash table), and the subinterface to support sorted sets (with the class implementing it using a binary search tree).
 Apple's Foundation framework
The Foundation Kit, or just Foundation for short, is an ObjectiveC framework in the OpenStep specification. It provides basic classes such as wrapper classes and data structure classes. This framework uses the prefix NS .NSObject:...
(part of CocoaCocoa is Apple's native objectoriented application programming interface for the Mac OS X operating system and—along with the Cocoa Touch extension for gesture recognition and animation—for applications for the iOS operating system, used on Apple devices such as the iPhone, the iPod Touch, and...
) provides the ObjectiveCObjectiveC is a reflective, objectoriented programming language that adds Smalltalkstyle messaging to the C programming language.Today, it is used primarily on Apple's Mac OS X and iOS: two environments derived from the OpenStep standard, though not compliant with it...
classes NSSet
, NSMutableSet
, NSCountedSet
, NSOrderedSet
, and NSMutableOrderedSet
. The CoreFoundation APIs provide the CFSet and CFMutableSet types for use in CC is a generalpurpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....
.
 Python
Python is a generalpurpose, highlevel programming language whose design philosophy emphasizes code readability. Python claims to "[combine] remarkable power with very clear syntax", and its standard library is large and comprehensive...
has builtin set
and frozenset
types since 2.4, and since Python 3.0 and 2.7, supports nonempty set literals using a curlybracket syntax, e.g.: { x, y, z }
.
 The .NET Framework
The .NET Framework is a software framework that runs primarily on Microsoft Windows. It includes a large library and supports several programming languages which allows language interoperability...
provides the generic HashSet
and SortedSet
classes that implement the generic ISet
interface.
 Ruby
Ruby is a dynamic, reflective, generalpurpose objectoriented programming language that combines syntax inspired by Perl with Smalltalklike features. Ruby originated in Japan during the mid1990s and was first developed and designed by Yukihiro "Matz" Matsumoto...
's standard library includes a set
module which contains Set
and SortedSet
classes that implement sets using hash tables, the latter allowing iteration in sorted order.
 OCaml's standard library contains a
Set
module, which implements a functional set data structure using binary search trees.
 The GHC
The Glorious Glasgow Haskell Compilation System, more commonly known as the Glasgow Haskell Compiler or GHC, is an open source native code compiler for the functional programming language Haskell. The lead developers are Simon Peyton Jones and Simon Marlow...
implementation of HaskellHaskell is a standardized, generalpurpose purely functional programming language, with nonstrict semantics and strong static typing. It is named after logician Haskell Curry. In Haskell, "a function is a firstclass citizen" of the programming language. As a functional programming language, the...
provides a Data.Set
module, which implements a functional set data structure using binary search trees.
 The Tcl
Tcl is a scripting language created by John Ousterhout. Originally "born out of frustration", according to the author, with programmers devising their own languages intended to be embedded into applications, Tcl gained acceptance on its own...
TcllibTcllib is a collection of packages available for the Tcl programming language. Tcllib is distributed in both source code as well as precompiled binary formats...
package provides a set module which implements a set data structure based upon TCL lists.
As noted in the previous section, in languages which do not directly support sets but do support
associative arrayIn computer science, an associative array is an abstract data type composed of a collection of pairs, such that each possible key appears at most once in the collection....
s, sets can be emulated using associative arrays, by using the elements as keys, and using a dummy value as the values, which are ignored.
In C++
In
C++C++ is a statically typed, freeform, multiparadigm, compiled, generalpurpose programming language. It is regarded as an intermediatelevel language, as it comprises a combination of both highlevel and lowlevel language features. It was developed by Bjarne Stroustrup starting in 1979 at Bell...
, the
Standard Template LibraryThe Standard Template Library is a C++ software library which later evolved into the C++ Standard Library. It provides four components called algorithms, containers, functors, and iterators. More specifically, the C++ Standard Library is based on the STL published by SGI. Both include some...
(STL) provides the
set
template class, which implements a sorted set using a binary search tree;
SGISilicon Graphics International Corp. , is an American manufacturer of computer hardware and software, including highperformance computing solutions, x86based servers for datacenter deployment, and visualization products...
's STL also provides the
hash_set
template class, which implements a set using a hash table.
In sets, the elements themselves are the keys, in contrast to sequenced containers, where elements are accessed using their (relative or absolute) position. Set elements must have a strict weak ordering.
Some of the member functions in C++ and their description is given in the table below:
set
member functions
Signature(s)  Description 
iterator begin; 
Returns an iterator to the first element of the set. 
iterator end; 
Returns an iterator just before the end of the set. 
bool empty const; 
Checks if the set container is empty (i.e. has a size of 0) 
iterator find(const key_type &x) const; 
Searches the container for an element x and if found, returns an iterator to it, or else returns an iterator to set::end 

Inserts an element into the set. The first version returns pair, with its member pair::first set to an iterator pointing to either the newly inserted element or to the element that already had its same value in the set. The pair::second element in the pair is set to true if a new element was inserted or false if an element with the same value existed. And the second version returns an iterator either pointing to newly inserted element or to element having same value in set. 
void clear; 
Removes all elements in the set, making its size 0. 
Template parameters
Here
value_type
and
iterator
are member types defined in set containers. Firstly, value to be used to initialize inserted element. Then position of first element compared for the operation.It actually does not give the position where element is to be inserted but just an indication of possible insertion position in the container so as to make efficient insertion operation. Template type can be any type of input iterator. Iterators specify a range of elements and copies of these in range [first, last) are inserted in set.
Multiset
A variation of the set is the
multisetIn mathematics, the notion of multiset is a generalization of the notion of set in which members are allowed to appear more than once...
or
bag, which is the same as a set data structure, but allows repeated ("equal") values. It is possible for objects in computer science to be considered "equal" under some
equivalence relationIn mathematics, an equivalence relation is a relation that, loosely speaking, partitions a set so that every element of the set is a member of one and only one cell of the partition. Two elements of the set are considered equivalent if and only if they are elements of the same cell...
but still distinct under another relation. Some types of multiset implementations will store distinct equal objects as separate items in the data structure; while others will collapse it down to one version (the first one encountered) and keep a positive integer count of the multiplicity of the element.
 C++'s Standard Template Library provides the
multiset
class for the sorted multiset, and SGI's STL provides the hash_multiset
class, which implements a multiset using a hash table.
 For Java
Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer lowlevel facilities...
, thirdparty libraries provide multiset functionality:
 Apache Commons Collections provides the
Bag
and SortedBag
interfaces, with implementing classes like HashBag
and TreeBag
.
 Google Collections provides the
Multiset
interface, with implementing classes like HashMultiset
and TreeMultiset
.
 Apple provides the
NSCountedSet
class as part of CocoaCocoa is Apple's native objectoriented application programming interface for the Mac OS X operating system and—along with the Cocoa Touch extension for gesture recognition and animation—for applications for the iOS operating system, used on Apple devices such as the iPhone, the iPod Touch, and...
, and the CFBag
and CFMutableBag
types as part of CoreFoundation.
 Python's
Python is a generalpurpose, highlevel programming language whose design philosophy emphasizes code readability. Python claims to "[combine] remarkable power with very clear syntax", and its standard library is large and comprehensive...
standard library includes collections.Counter
, which is similar to a multiset.
Where a multiset data structure is not available, a workaround is to use a regular set, but override the equality predicate of its items to always return "not equal" on distinct objects (however, such will still not be able to store multiple occurrences of the same object) or use an
associative arrayIn computer science, an associative array is an abstract data type composed of a collection of pairs, such that each possible key appears at most once in the collection....
mapping the values to their integer multiplicities (this will not be able to distinguish between equal elements at all).