All Topics  
Type inference

 

   Email Print
   Bookmark   Link






 

Type inference



 
 
Type inference, or implicit typing, refers to the ability to deduce automatically the type of a value in a programming language
Programming language

A programming language is a machine-readable artificial language designed to express computations that can be performed by a machine, particularly a computer....
. It is a feature present in some strongly
Strongly-typed programming language

In computer science and computer programming, the term strong typing is used to describe those situations where programming languages specify one or more restrictions on how operations involving values having different data types can be intermixed....
 statically typed languages. It is often characteristic of — but not limited to — functional programming languages in general. Some languages that include type inference are: Ada
Ada (programming language)

Ada is a structured programming, statically typed, Imperative programming, and Object-oriented programming high-level language computer programming programming language, extended from Pascal and other languages....
, BitC
BitC

BitC is a programming language currently being developed by researchers at the Johns Hopkins University and , as part of the Coyotos project. The language has two primary objectives:...
, Boo, C# 3.0, Cayenne, Clean, Cobra
Cobra (programming language from Cobra Language LLC)

Cobra is an object-oriented programming language produced by Cobra Language LLC. Cobra is designed by Chuck Esterbrook, and runs on the Microsoft .NET and Mono frameworks....
, D
D (programming language)

The D programming language, also known simply as D, is an Object-oriented programming, Imperative programming, Multi-paradigm programming language system programming language by Walter Bright of Digital Mars....
, Epigram, F#, Haskell
Haskell (programming language)

Haskell is a standardized, purely functional programming language with non-strict programming language, named after logician Haskell Curry. The goals of the language are described as:...
, ML, Nemerle
Nemerle

Nemerle is a high level language static typing programming language for the Microsoft .NET platform. It offers functional programming, object-oriented and imperative programming features....
, OCaml, Oxygene, Scala, and Visual Basic .NET 9.0
Visual Basic .NET

Visual Basic , formerly called Visual Basic .NET , is an object-oriented programming computer language that can be viewed as an evolution of Microsoft Visual Basic implemented on the .NET Framework....
.






Discussion
Ask a question about 'Type inference'
Start a new discussion about 'Type inference'
Answer questions from other users
Full Discussion Forum



Encyclopedia


Type inference, or implicit typing, refers to the ability to deduce automatically the type of a value in a programming language
Programming language

A programming language is a machine-readable artificial language designed to express computations that can be performed by a machine, particularly a computer....
. It is a feature present in some strongly
Strongly-typed programming language

In computer science and computer programming, the term strong typing is used to describe those situations where programming languages specify one or more restrictions on how operations involving values having different data types can be intermixed....
 statically typed languages. It is often characteristic of — but not limited to — functional programming languages in general. Some languages that include type inference are: Ada
Ada (programming language)

Ada is a structured programming, statically typed, Imperative programming, and Object-oriented programming high-level language computer programming programming language, extended from Pascal and other languages....
, BitC
BitC

BitC is a programming language currently being developed by researchers at the Johns Hopkins University and , as part of the Coyotos project. The language has two primary objectives:...
, Boo, C# 3.0, Cayenne, Clean, Cobra
Cobra (programming language from Cobra Language LLC)

Cobra is an object-oriented programming language produced by Cobra Language LLC. Cobra is designed by Chuck Esterbrook, and runs on the Microsoft .NET and Mono frameworks....
, D
D (programming language)

The D programming language, also known simply as D, is an Object-oriented programming, Imperative programming, Multi-paradigm programming language system programming language by Walter Bright of Digital Mars....
, Epigram, F#, Haskell
Haskell (programming language)

Haskell is a standardized, purely functional programming language with non-strict programming language, named after logician Haskell Curry. The goals of the language are described as:...
, ML, Nemerle
Nemerle

Nemerle is a high level language static typing programming language for the Microsoft .NET platform. It offers functional programming, object-oriented and imperative programming features....
, OCaml, Oxygene, Scala, and Visual Basic .NET 9.0
Visual Basic .NET

Visual Basic , formerly called Visual Basic .NET , is an object-oriented programming computer language that can be viewed as an evolution of Microsoft Visual Basic implemented on the .NET Framework....
. This feature is planned for Fortress, C++0x
C++0x

C++0x is the planned new Open standard for the C++. It is intended to replace the existing C++ standard, ISO/IEC 14882, which was published in 1998 and updated in 2003....
 and Perl 6
Perl 6

Perl 6 is a planned major revision to the Perl programming language. It is a language specification which introduces elements of many modern and historical languages....
. The ability to infer types automatically makes many programming tasks easier, leaving the programmer free to omit type annotations
Type signature

Type signature is a term that is used in computer programming. A type signature defines the inputs and outputs for a subroutine or method ....
 while maintaining some level of type safety. Explicitly converting to another data type is called "casting" (or a "cast").

Nontechnical explanation


In most programming languages, all values have a type
Data type

A data type in programming languages is an attribute of a data which tells the computer something about the kind of data it is. This involves setting constraints on the datum, such as what values it can take and what operations may be performed upon it....
 which describes the kind of data a particular value describes. In some languages, the type is known only at runtime
Runtime

In computer science, runtime or run time describes the operation of a computer program, the duration of its execution, from beginning to termination ....
; these languages are dynamically typed. In other languages, the type is known at compile time
Compile time

In computer science, compile time refers to either the operations performed by a compiler , programming language requirements that must be met by source code for it to be successfully compiled , or properties of the program that can be reasoned about at compile time....
; these languages are statically typed. In statically typed languages, the input and output types of functions and local variable
Local variable

In computer science, a local variable is a variable that is given local scope . Such a variable is accessible only from the subroutine or statement block in which it is declared....
s ordinarily must be explicitly provided by type annotations. For example, in C
C (programming language)

C is a general-purpose computer programming language originally developed in 1972 by Dennis Ritchie at the Bell Telephone Laboratories to implement the Unix operating system....
: int addone(int x) The beginning of this function definition, int addone(int x) declares that addone is a function which takes one argument, an integer
Integer (computer science)

In computer science, the term integer is used to refer to a data type which represents some finite subset of the mathematical integers. These are also known as integral data types....
, and returns an integer. int result; declares that the local variable result is an integer. In a proposed language where type inference is available, the code might be written like this instead: addone(x) This looks very similar to how code is written in a dynamically typed language, yet all types are known at compile time. In the imaginary language in which the last example is written, + always takes two integers and returns one integer (which is how it works in, for example, OCaml). From this, the type inferencer can infer that the value of x+1 is an integer, therefore result is an integer, therefore the return value of addone is an integer. Similarly, since + requires that both of its arguments be integers, x must be an integer, and therefore addone accepts one integer as an argument.

However, in the subsequent line, result2 is calculated by adding a decimal "1.0" with floating-point arithmetic, causing a conflict in the use of x for both integer and floating-point expressions. Such a situation would generate a compile-time error message. In older languages, result2 might have been implicitly declared as a floating-point variable, from implicitly converting integer x in the expression, simply because a decimal point was accidentally placed after the integer 1. Such a situation shows the difference between type inference, which does not involve type conversion, and implicit type conversion, which forces data to the higher-precision data type, often without restrictions.

Technical description


Type inference refers to the ability to deduce automatically, either partially or fully, the type of a value derived from the eventual evaluation of an expression. As this process is systematically performed at compile time, the compiler is often able to infer the type of a variable or the type signature
Type signature

Type signature is a term that is used in computer programming. A type signature defines the inputs and outputs for a subroutine or method ....
 of a function, without explicit type annotations having been given. In many cases, it is possible to omit type annotations from a program completely if the type inference system is robust enough, or the program or language simple enough.

To obtain the information required to infer correctly the type of an expression lacking an explicit type annotation, the compiler either gathers this information as an aggregate and subsequent reduction of the type annotations given for its subexpressions (which may themselves be variables or functions), or through an implicit understanding of the type of various atomic values (e.g., : Unit
Unit type

A unit type is a mathematical type theory that allows only one value .The carrier associated with a unit type can be any singleton set. There is an isomorphism between any two such sets, so it is customary to talk about the unit type and ignore the details of its value....
; true : Bool; 42 : Integer; 3.14159 : Real; etc.). It is through recognition of the eventual reduction of expressions to implicitly typed atomic values that the compiler for a type inferring language is able to compile a program completely without type annotations. In the case of highly complex forms of higher order programming
Higher order programming

Higher-order programming is a style of computer programming that exploits the theoretical ability to use functions as values; it is usually instantiated with, or borrowed from, models of computation like the lambda calculus which make heavy use of higher-order functions....
 and polymorphism, it is not always possible for the compiler to infer as much, however, and type annotations are occasionally necessary for disambiguation.

Example

For example, let us consider the Haskell
Haskell (programming language)

Haskell is a standardized, purely functional programming language with non-strict programming language, named after logician Haskell Curry. The goals of the language are described as:...
 function map, which applies a procedure to each element of a list, and may be defined as: map f [] = [] map f (first:rest) = f first : map f rest

From this, it is evident that the function map takes a list as its second argument, that its first argument f is a function that can be applied to the type of elements of that list, and that the result of map is constructed as a list with elements that are results of f. So assuming that a list contains elements of the same type, we can reliably construct a type signature map (a -> b) -> [a] -> [b] where the syntax "a -> b" denotes a function that takes an a as its parameter and produces a b. "a -> b -> c" is equivalent to "a -> (b -> c)".

Note that the inferred type of map is parametrically polymorphic
Type polymorphism

In computer science, polymorphism is a programming language feature that allows values of different data types to be handled using a uniform interface....
: The type of the arguments and results of f are not inferred, but left as type variables, and so map can be applied to functions and lists of various types, as long as the actual types match in each invocation.

Hindley–Milner type inference algorithm

The common algorithm used to perform type inference is the one now commonly referred to as Hindley–Milner, Damas–Milner algorithm. It has been referred to in the past as polymorphic type checking or Algorithm W.

The origin of this algorithm is the type inference algorithm for the simply typed lambda calculus
Simply typed lambda calculus

The simply typed lambda calculus is a typed lambda calculus of the lambda calculus with only one type combinator: . It is the canonical and simplest example of a typed lambda calculus....
, which was devised by Haskell B. Curry and Robert Feys
Robert Feys

Robert Feys was a logician and philosopher.In 1958 Feys and Haskell B. Curry devised the type inference algorithm for the simply typed lambda calculus ....
 in 1958. In 1969 Roger Hindley extended this work and proved that their algorithm always inferred the most general type. In 1978 Robin Milner
Robin Milner

Arthur John Robin Gorell Milner Fellow of the Royal Society FRSE is a prominent British computer scientist....
 , independently of Hindley's work, provided an equivalent algorithm, Algorithm W. In 1982 Luis Damas finally proved that Milner's algorithm is complete and extended it to support systems with polymorphic references.

The algorithm

The algorithm proceeds in two steps. First, we need to generate a number of equations to solve, then we need to solve them.

Generating the equations
The algorithm used for generating the equations is similar to a regular type checker, so let's consider first a regular type checker for the typed lambda calculus
Typed lambda calculus

A typed lambda calculus is a typed formalism that uses the lambda-symbol to denote anonymous function abstraction. Typed lambda calculi are foundational programming languages and are the base of typed functional programming languages such as ML programming language and Haskell and, more indirectly, typed imperative programming....
 given by

and

where is a primitive expression (such as "3") and is a primitive type (such as "Integer").

We want to construct a function which maps a pair , where is a type environment and is a term, to some type . We assume that this function is already defined on primitives. The other cases are:

  • where is in
  • where and
  • where


Note that so far we do not specify what to do when we fail to meet the various conditions. This is because, in the simple type checking algorithm, the check simply fails whenever anything goes wrong.

Now, we develop a more sophisticated algorithm that can deal with type variables and constraints on them. Therefore, we extend the set T of primitive types to include an infinite supply of variables, denoted by lowercase Greek letters

See for more details.

Solving the equations
Solving the equations proceeds by unification
Unification

In mathematical logic, in particular as applied to computer science, a unification of two terms is a join with respect to a specialisation order....
. This is—perhaps surprisingly—a rather simple algorithm. The function operates on type equations and returns a structure called a "substitution". A substitution is simply a mapping from type variables to types. Substitutions can be composed and extended in the obvious ways.

Unifying the empty set of equations is easy enough: , where is the identity substitution.

Unifying a variable with a type goes this way: , where is the substitution composition operator, and is the set of remaining constraints with the new substitution applied to it.

Of course, .

The interesting case remains as .

A simple example would be a[i] = b[i] (assume this to be c-like syntax for this example). First Hindley-Milner would find that i must be of type int, further more that 'a' must be an "array of " and 'b' must an "array of ". Now, since there is an assignment of to , must be of the same type (assuming no implicit type conversions) as . In the very least must be a supertype of .

See also

  • Duck typing
    Duck typing

    In computer programming, duck typing is a style of dynamic typing in which an object's current set of Method s and properties determines the valid semantics, rather than its inheritance from a particular class or implementation of a specific interface....
    , an analogous concept in languages with dynamic or weak typing


External links

  • by Roger Hindley explaining the history of type inference
  • of Hindley-Milner in Perl 5, by Nikita Borisov (via Internet Archive
    Internet Archive

    The Internet Archive is a nonprofit organization dedicated to building and maintaining a free and openly accessible online digital library, including an archive site of the World Wide Web....
    , version Sep 112005)