Algebraic data type

In computer programming, an algebraic data type (sometimes also called a variant type) is a datatype each of whose values is data from other datatypes wrapped in one of the constructors of the datatype. Any wrapped datum is an argument to the constructor. In contrast to other datatypes, the constructor is not executed, and the only way to operate on the data is to unwrap the constructor using pattern matching.

The most common algebraic data type is a list with two constructors: Nil or [] for an empty list, and Cons (an abbreviation of construct), ::, or : for the combination of a new element with a shorter list (for example, (Cons 1 '(2 3 4)) or 1:[2,3,4]).
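
As a minimal sketch, the two-constructor list can be written in Haskell as follows. The names List, toBuiltin and len are illustrative choices made here, not part of any standard library.

```haskell
-- A list is either empty (Nil) or an element combined with a shorter list (Cons).
data List a = Nil | Cons a (List a)

-- Convert to Haskell's built-in list, whose constructors are [] and (:).
toBuiltin :: List a -> [a]
toBuiltin Nil         = []
toBuiltin (Cons x xs) = x : toBuiltin xs

-- Count the elements by unwrapping one constructor at a time.
len :: List a -> Int
len Nil         = 0
len (Cons _ xs) = 1 + len xs
```

For instance, toBuiltin (Cons 1 (Cons 2 Nil)) evaluates to [1,2], the same list as 1:2:[].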

Special cases of algebraic types are product types, i.e. records (only one constructor); sum types or tagged unions (many constructors with a single argument); and enumerated types (many constructors with no arguments). Algebraic types are one kind of composite type (i.e. a type formed by combining other types).
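
The three special cases might look as follows in Haskell. This is only a sketch: the types Point, Value and Color and the function describe are hypothetical names introduced for illustration.

```haskell
-- Product type (record): one constructor with several fields.
data Point = Point { px :: Int, py :: Int }
  deriving (Show, Eq)

-- Sum type (tagged union): several constructors, each with a single argument.
data Value = IntVal Int | TextVal String
  deriving (Show, Eq)

-- Enumerated type: several constructors with no arguments.
data Color = Red | Green | Blue
  deriving (Show, Eq, Enum, Bounded)

-- Pattern matching recovers which constructor wrapped the data.
describe :: Value -> String
describe (IntVal n)  = "integer " ++ show n
describe (TextVal s) = "text " ++ s
```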

An algebraic data type may also be an abstract data type (ADT) if it is exported from a module without its constructors. Values of such a type can only be manipulated using functions defined in the same module as the type itself.

In set theory, the equivalent of an algebraic data type is a disjoint union: a set whose elements are pairs consisting of a tag (equivalent to a constructor) and an object of a type corresponding to the tag (equivalent to the constructor arguments).

An example


For example, in Haskell we can define a new algebraic data type, Tree:

data Tree = Empty
          | Leaf Int
          | Node Tree Tree


Here, Empty, Leaf and Node are called data constructors. Tree is a type constructor (in this case a nullary one). In the rest of this article, constructor shall mean data constructor. Similarly, in OCaml syntax the above example may be written:

type tree = Empty
          | Leaf of int
          | Node of tree * tree


In most languages that support algebraic data types, it's possible to define polymorphic types. Examples are given later in this article.

Somewhat like a function, a data constructor is applied to arguments of an appropriate type, yielding a value of the data type to which the constructor belongs. For instance, the data constructor Leaf is logically a function Int -> Tree, meaning that giving an integer as an argument to Leaf produces a value of the type Tree. As Node takes two arguments of the type Tree itself, the datatype is recursive.
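
The function-like behaviour of constructors can be sketched as below. The deriving clause and the names leaves and example are additions made here for illustration; they are not part of the definition given earlier.

```haskell
data Tree = Empty
          | Leaf Int
          | Node Tree Tree
  deriving (Show, Eq)

-- Leaf can be passed around like any function of type Int -> Tree.
leaves :: [Tree]
leaves = map Leaf [1, 2, 3]

-- Node takes two Trees, so trees can be nested to any depth.
example :: Tree
example = Node (Leaf 1) (Node (Leaf 2) Empty)
```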

Operations on algebraic data types can be defined by using pattern matching to retrieve the arguments. For example, consider a function to find the depth of a Tree, given here in Haskell:


depth :: Tree -> Int
depth Empty = 0
depth (Leaf n) = 1
depth (Node l r) = 1 + max (depth l) (depth r)


Thus, a Tree given to depth may have been constructed using any of Empty, Leaf or Node, and the function must match each of them to deal with all cases. In the case of Node, the pattern extracts the subtrees l and r for further processing.
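
The same three-case shape carries over to other operations on Tree. The sketch below repeats depth so that it is self-contained and adds sumLeaves, a hypothetical companion function that uses the integer bound by the Leaf pattern.

```haskell
data Tree = Empty | Leaf Int | Node Tree Tree

depth :: Tree -> Int
depth Empty      = 0
depth (Leaf _)   = 1
depth (Node l r) = 1 + max (depth l) (depth r)

-- Hypothetical companion function: add up the integers stored in the leaves.
sumLeaves :: Tree -> Int
sumLeaves Empty      = 0
sumLeaves (Leaf n)   = n
sumLeaves (Node l r) = sumLeaves l + sumLeaves r
```

For the tree Node (Leaf 1) (Node (Leaf 2) Empty), both depth and sumLeaves evaluate to 3.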

Explanation


What is happening is that we have a datatype, which can be “one of several types of things.” Each “type of thing” is associated with an identifier called a constructor, which can be thought of as a kind of tag for that kind of data. Each constructor can carry with it a different type of data. A constructor could carry no data at all (e.g. "Empty" in the example above), carry one piece of data (e.g. “Leaf” has one Int value), or multiple pieces of data (e.g. “Node” has two Tree values).

When we want to do something with a value of this Tree algebraic data type, we deconstruct it using a process known as pattern matching. It involves matching the data with a series of patterns. The example function "depth" above pattern-matches its argument with three patterns. When the function is called, it finds the first pattern that matches its argument, performs any variable bindings that are found in the pattern, and evaluates the expression corresponding to the pattern.

Each pattern has a form that resembles the structure of some possible value of this datatype. The first pattern above simply matches values of the constructor Empty. The second pattern above matches values of the constructor Leaf. Patterns are recursive, so the data associated with that constructor is in turn matched against the pattern “n”. Here, a lowercase identifier is a pattern that matches any value and binds it to a variable of that name: the variable “n” is bound to the integer stored in the Leaf and can be used in the expression to be evaluated.

The recursion in the patterns in this example is trivial, but a more complex recursive pattern would be something like Node (Node (Leaf 4) x) (Node y (Node Empty z)). Recursive patterns several layers deep are used, for example, in balancing red-black trees, which involves cases that require looking at colors several layers deep.
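
A sketch of nested patterns on the Tree type follows. The function names matchesExample and rotateRight are hypothetical. The rotation is a simplified illustration of looking two layers deep; it is not a red-black balancing routine.

```haskell
data Tree = Empty | Leaf Int | Node Tree Tree
  deriving (Show, Eq)

-- Uses the multi-layer pattern quoted in the text and returns the bound subtrees.
matchesExample :: Tree -> Maybe (Tree, Tree, Tree)
matchesExample (Node (Node (Leaf 4) x) (Node y (Node Empty z))) = Just (x, y, z)
matchesExample _                                                = Nothing

-- Looks two layers deep: rotate right only when the left child is itself a Node.
rotateRight :: Tree -> Tree
rotateRight (Node (Node a b) c) = Node a (Node b c)
rotateRight t                   = t
```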

The example above is operationally equivalent to the following pseudocode:

if data.constructor == Empty:
    return 0
else if data.constructor == Leaf:
    let n = data.field1
    return 1
else if data.constructor == Node:
    let l = data.field1
    let r = data.field2
    return 1 + max (depth l) (depth r)

Comparing this with pattern matching points out some of the advantages of algebraic data types and pattern matching. First is type safety. The pseudocode above relies on the diligence of the programmer not to access field2 when the constructor is a Leaf, for example. Also, the type of field1 is different for Leaf and Node (for Leaf it is Int; for Node it is Tree), so the type system would have difficulty assigning it a static type safely in a traditional record data structure. In pattern matching, however, the type of each extracted value is checked against the types declared by the relevant constructor, and the number of values that can be extracted is determined by the constructor, so these problems do not arise.

Second, in pattern matching, the compiler statically checks that all cases are handled. If one of the cases of the “depth” function above were missing, the compiler would issue a warning, indicating that a case is not handled. This task may seem easy for the simple patterns above, but with many complicated recursive patterns it becomes difficult for the average human (or for a compiler, if it has to check arbitrary nested if-else constructs). Similarly, there may be patterns that never match (i.e. they are already covered by previous patterns); the compiler can check for these too and issue warnings, as they may indicate an error in reasoning.
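
As a sketch of how these checks can be requested, assuming the GHC compiler (the warning flags below are GHC-specific, and the function isEmpty is a hypothetical example):

```haskell
{-# OPTIONS_GHC -Wincomplete-patterns -Woverlapping-patterns #-}

data Tree = Empty | Leaf Int | Node Tree Tree

-- All three constructors are covered, so no warning is expected here.
isEmpty :: Tree -> Bool
isEmpty Empty      = True
isEmpty (Leaf _)   = False
isEmpty (Node _ _) = False

-- Deleting the Node equation above should make GHC report a non-exhaustive match.
-- Adding a second `isEmpty Empty = ...` equation should be reported as redundant.
```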

Do not confuse these patterns with the regular expression patterns used in string pattern matching. The purpose is similar (to check whether a piece of data matches certain constraints and, if so, to extract relevant parts of it for processing), but the mechanism is very different. This kind of pattern matching on algebraic data types matches on the structural properties of an object rather than on the character sequence of strings.
Theory
A general algebraic data type is a possibly recursive sum type of product types. Each constructor tags a product type to separate it from others, or if there is only one constructor, the data type is a product type. Further, the parameter types of a constructor are the factors of the product type. A parameterless constructor corresponds to the empty product. If a datatype is recursive, the entire sum of products is wrapped in a recursive type, and each constructor also rolls the datatype into the recursive type.

For example, the Haskell datatype:


data List a = Nil | Cons a (List a)


is represented in type theory as

$\lambda \alpha. \mu \beta. 1 + \alpha \times \beta$

with constructors $\mathrm{nil}_\alpha = \mathrm{roll}\ (\mathrm{inl}\ \langle\rangle)$ and $\mathrm{cons}_\alpha\ x\ l = \mathrm{roll}\ (\mathrm{inr}\ \langle x, l\rangle)$.

The Haskell List datatype can also be represented in type theory in a slightly different form, as follows:

$\mu \phi. \lambda \alpha. 1 + \alpha \times \phi\ \alpha$.

(Note how the $\mu$ and $\lambda$ constructs are reversed relative to the original.) The original formation specified a type function whose body was a recursive type; the revised version specifies a recursive function on types. (We use the type variable $\phi$ to suggest a function rather than a "base type" like $\beta$, since $\phi$ is like a Greek "f".) Note that we must also now apply the function $\phi$ to its argument type $\alpha$ in the body of the type.
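
The sum-of-products reading of List can be sketched directly in Haskell, using Either for the sum, () for the empty product, and pairs for binary products. The names Roll, nil, cons and toBuiltin are illustrative assumptions; Haskell's data declarations perform this wrapping implicitly.

```haskell
-- Roll wraps one layer of "1 + a * List a": Either is the sum, () the empty
-- product, and the pair (a, List a) the binary product.
newtype List a = Roll (Either () (a, List a))

-- The empty list: the left injection of the empty product, rolled up.
nil :: List a
nil = Roll (Left ())

-- A non-empty list: the right injection of an (element, rest) pair, rolled up.
cons :: a -> List a -> List a
cons x l = Roll (Right (x, l))

-- Unrolling one layer at a time recovers an ordinary Haskell list.
toBuiltin :: List a -> [a]
toBuiltin (Roll (Left ()))      = []
toBuiltin (Roll (Right (x, l))) = x : toBuiltin l
```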

For the purposes of the List example, these two formulations are not significantly different; but the second form allows one to express so-called nested data types, i.e., those where the recursive type differs parametrically from the original. (For more information on nested data types, see the works of Richard Bird, Lambert Meertens and Ross Paterson.)
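
A small sketch of a nested data type in Haskell follows. The type Perfect and its constructors Zero and Succ are illustrative names chosen here. The recursive occurrence is applied to (a, a) rather than to a, so each Succ layer doubles the number of stored elements.

```haskell
-- The recursive occurrence differs parametrically from the type being defined:
-- Perfect (a, a) appears inside the definition of Perfect a.
data Perfect a = Zero a | Succ (Perfect (a, a))

-- The recursive call is at type Perfect (a, a) (polymorphic recursion),
-- so the explicit type signature cannot be omitted.
size :: Perfect a -> Int
size (Zero _) = 1
size (Succ p) = 2 * size p
```

For instance, Succ (Succ (Zero ((1,2),(3,4)))) holds four elements, and size returns 4 for it.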
Programming languages with algebraic data types
The following programming languages have algebraic data types as a first-class notion:
  • F#
  • Haskell
  • Hope
  • Miranda
  • Mythryl
  • OCaml
  • Scala
  • Standard ML
  • Visual Prolog
  • Nemerle

See also
  • Tagged union
  • Disjoint union
  • Type theory
  • Generalized algebraic data type
  • Initial algebra