Type safety
Encyclopedia
In computer science
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

, type safety is the extent to which a programming language
Programming language
A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely....

 discourages or prevents type errors. A type error is erroneous or undesirable program behaviour caused by a discrepancy between differing data type
Data type
In computer programming, a data type is a classification identifying one of various types of data, such as floating-point, integer, or Boolean, that determines the possible values for that type; the operations that can be done on values of that type; the meaning of the data; and the way values of...

s. Type safety is sometimes alternatively considered to be a property of a computer program rather than the language in which that program is written; that is, some languages have type-safe facilities that can be circumvented by programmers who adopt practices that exhibit poor type safety. The formal type-theoretic
Type theory
In mathematics, logic and computer science, type theory is any of several formal systems that can serve as alternatives to naive set theory, or the study of such formalisms in general...

 definition of type safety is considerably stronger than what is understood by most programmers.

Type enforcement
Type enforcement
The concept of type enforcement in the field of information technology is related to access control. Implementing TE, gives priority to “mandatory access control” over “discretionary access control” . Access clearance is first given to a subject accessing objects based on rules defined in an...

 can be static, catching potential errors at compile time
Compile time
In computer science, compile time refers to either the operations performed by a compiler , programming language requirements that must be met by source code for it to be successfully compiled , or properties of the program that can be reasoned about at compile time.The operations performed at...

, or dynamic, associating type information with values at run-time and consulting them as needed to detect imminent errors, or a combination of both.

The behaviors classified as type errors by a given programming language are usually those that result from attempts to perform operations on values
Value (computer science)
In computer science, a value is an expression which cannot be evaluated any further . The members of a type are the values of that type. For example, the expression "1 + 2" is not a value as it can be reduced to the expression "3"...

 that are not of the appropriate data type
Data type
In computer programming, a data type is a classification identifying one of various types of data, such as floating-point, integer, or Boolean, that determines the possible values for that type; the operations that can be done on values of that type; the meaning of the data; and the way values of...

. This classification is partly based on opinion; some language designers and programmers argue that any operation not leading to program crashes, security flaws or other obvious failures is legitimate and need not be considered an error, while others consider any contravention of the programmer's explicit intent (as communicated via typing annotations) to be erroneous and not "type-safe."

In the context of static (compile-time) type systems, type safety usually involves (among other things) a guarantee that the eventual value of any expression
Expression (programming)
An expression in a programming language is a combination of explicit values, constants, variables, operators, and functions that are interpreted according to the particular rules of precedence and of association for a particular programming language, which computes and then produces another value...

 will be a legitimate member of that expression's static type. The precise requirement is more subtle than this — see, for example, subtype
Subtype
In programming language theory, subtyping or subtype polymorphism is a form of type polymorphism in which a subtype is a datatype that is related to another datatype by some notion of substitutability, meaning that program constructs, typically subroutines or functions, written to operate on...

 and polymorphism for complications.

Type safety is closely linked to memory safety
Memory safety
Memory safety is a concern in software development that aims to avoid software bugs that cause security vulnerabilities dealing with random-access memory access, such as buffer overflows and dangling pointers....

, a restriction on the ability to copy arbitrary bit patterns from one memory location to another. For instance, in an implementation of a language that has some type , such that some sequence of bits (of the appropriate length) does not represent a legitimate member of , if that language allows data to be copied into a variable
Variable (programming)
In computer programming, a variable is a symbolic name given to some known or unknown quantity or information, for the purpose of allowing the name to be used independently of the information it represents...

 of type , then it is not type-safe because such an operation might assign a non- value to that variable. Conversely, if the language is type-unsafe to the extent of allowing an arbitrary integer to be used as a pointer, then it is clearly not memory-safe.

Most statically typed languages provide a degree of type safety that is strictly stronger than memory safety, because their type systems enforce the proper use of abstract data type
Abstract data type
In computing, an abstract data type is a mathematical model for a certain class of data structures that have similar behavior; or for certain data types of one or more programming languages that have similar semantics...

s defined by programmers even when this is not strictly necessary for memory safety or for the prevention of any kind of catastrophic failure.

Definitions

Type-safe code accesses only the memory locations it is authorized to access. (For this discussion, type safety specifically refers to memory type safety and should not be confused with type safety in a broader respect.) For example, type-safe code cannot read values from another object's private fields.

Robin Milner
Robin Milner
Arthur John Robin Gorell Milner FRS FRSE was a prominent British computer scientist.-Life, education and career:...

 provided the following slogan to describe type safety:
"Well-typed programs never go wrong."

The appropriate formalization of this slogan depends on the style of formal semantics used for a particular language. In the context of denotational semantics
Denotational semantics
In computer science, denotational semantics is an approach to formalizing the meanings of programming languages by constructing mathematical objects which describe the meanings of expressions from the languages...

, type safety means that the value of an expression that is well-typed, say with type τ, is a bona fide member of the set corresponding to τ.

In 1994, Andrew Wright
Andrew Wright
Andrew Wright is a Canadian multimedia artist from Ottawa, Ontario. He is best known for his work with video and large-scale photography. He holds a specialist degree in Art and Art History from the University of Toronto and a Masters in Fine Arts at the University of Windsor...

 and Matthias Felleisen
Matthias Felleisen
Matthias Felleisen is a computer science professor and an author of German background.Felleisen is currently a Trustee Professor in the College of Computer and Information Science at Northeastern University in Boston, Massachusetts. In the past he has taught at Rice University after receiving his...

 formulated what is now the standard definition and proof technique for type safety in languages defined by operational semantics
Operational semantics
In computer science, operational semantics is a way to give meaning to computer programs in a mathematically rigorous way. Operational semantics are classified into two categories: structural operational semantics formally describe how the individual steps of a computation take place in a...

. Under this approach, type safety is determined by two properties of the semantics of the programming language:

(Type-) preservation: "Well typedness" of programs remains invariant under the transition rules (i.e. evaluation rules or reduction rules) of the language.
Progress: A well typed program never gets "stuck", i.e., never gets into an undefined state where no further transitions are possible.

These properties do not exist in a vacuum; they are linked to the semantics of the programming language they describe, and there is a large space of varied languages that can fit these criteria, since the notion of "well typed" program is part of the static semantics of the programming language and the notion of "getting stuck" (or "going wrong") is a property of its dynamic semantics.

Vijay Saraswat provides the following definition:
"A language is type-safe if the only operations that can be performed on data in the language are those sanctioned by the type of the data."

Relation to other forms of safety

Type safety is ultimately aimed at excluding other problems, eg:-
  • Prevention of Illegal operations. For example, we can identify an expression 3 / "Hello, World" as invalid, because the rules of arithmetic
    Arithmetic
    Arithmetic or arithmetics is the oldest and most elementary branch of mathematics, used by almost everyone, for tasks ranging from simple day-to-day counting to advanced science and business calculations. It involves the study of quantity, especially as the result of combining numbers...

     do not specify how to divide an integer
    Integer
    The integers are formed by the natural numbers together with the negatives of the non-zero natural numbers .They are known as Positive and Negative Integers respectively...

     by a string
    String (computer science)
    In formal languages, which are used in mathematical logic and theoretical computer science, a string is a finite sequence of symbols that are chosen from a set or alphabet....

    . As discussed below, strong typing offers more safety, but generally does not guarantee complete safety.
  • Memory safety
    Memory safety
    Memory safety is a concern in software development that aims to avoid software bugs that cause security vulnerabilities dealing with random-access memory access, such as buffer overflows and dangling pointers....

    • Wild pointers can arise when a pointer to one type object is treated as a pointer to another type. For instance, the size of an object depends on the type, so if a pointer is incremented under the wrong credentials, it will end up pointing at some random area of memory.
    • Buffer overflow
      Buffer overflow
      In computer security and programming, a buffer overflow, or buffer overrun, is an anomaly where a program, while writing data to a buffer, overruns the buffer's boundary and overwrites adjacent memory. This is a special case of violation of memory safety....

      - Out-of bound writes can corrupt the contents of object already present on the heap. This can occur when a larger object of one type is crudely copied into smaller object of another type.
  • Logic error
    Logic error
    In computer programming, a logic error is a bug in a program that causes it to operate incorrectly, but not to terminate abnormally . A logic error produces unintended or undesired output or other behavior, although it may not immediately be recognized as such.Logic errors occur in both compiled...

    s originating in the semantics
    Semantics
    Semantics is the study of meaning. It focuses on the relation between signifiers, such as words, phrases, signs and symbols, and what they stand for, their denotata....

     of different types. For instance, inches and millimeters may both be stored as integers, but should not be substituted for each other or added. A type system can enforce two different types of integer for them.

Type-safe and type-unsafe languages

Type safety is usually a requirement for any toy language
Toy language
A toy language is a term for a computer programming language that is not considered to fulfill the robustness or completeness requirement of a computer programming language. As such it is not considered a suitable language for creating solid and reliable programs for use in production environments....

 proposed in academic programming language research. Many languages, on the other hand, are too big for human-generated type-safety proofs, as they often require checking thousands of cases. Nevertheless, some languages such as Standard ML
Standard ML
Standard ML is a general-purpose, modular, functional programming language with compile-time type checking and type inference. It is popular among compiler writers and programming language researchers, as well as in the development of theorem provers.SML is a modern descendant of the ML...

, which has rigorously defined semantics, have been proved to meet one definition of type safety. Some other languages such as Haskell
Haskell (programming language)
Haskell is a standardized, general-purpose purely functional programming language, with non-strict semantics and strong static typing. It is named after logician Haskell Curry. In Haskell, "a function is a first-class citizen" of the programming language. As a functional programming language, the...

 are believed to meet some definition of type safety, provided certain "escape" features are not used (for example Haskell's unsafePerformIO, used to "escape" from the usual restricted environment in which I/O is possible, circumvents the type system and so can be used to break type safety.) Type punning
Type punning
In computer science, type punning is a common term for any programming technique that subverts or circumvents the type system of a programming language in order to achieve an effect that would be difficult or impossible to achieve within the bounds of the formal language.In C and C++, constructs...

 is another example of such an "escape" feature. Regardless of the properties of the language definition, certain errors may occur at run-time due to bugs in the implementation, or in linked libraries
Library (computer science)
In computer science, a library is a collection of resources used to develop software. These may include pre-written code and subroutines, classes, values or type specifications....

 written in other languages; such errors could render a given implementation type unsafe in certain circumstances. An early version of Sun's Java Virtual Machine was vulnerable to this sort of problem.

Type safety and strong typing

Type safety is synonymous with one of the many definitions of strong typing; but type safety and dynamic typing are mutually compatible. A dynamically typed language such as Smalltalk
Smalltalk
Smalltalk is an object-oriented, dynamically typed, reflective programming language. Smalltalk was created as the language to underpin the "new world" of computing exemplified by "human–computer symbiosis." It was designed and created in part for educational use, more so for constructionist...

 can be seen as a strongly typed language with a very permissive type system where any syntactically correct program is well-typed; as long as its dynamic semantics ensures that no such program ever "goes wrong" in an appropriate sense, it satisfies the definition above and can be called type-safe.

Ada

Ada was designed to be suitable for embedded system
Embedded system
An embedded system is a computer system designed for specific control functions within a larger system. often with real-time computing constraints. It is embedded as part of a complete device often including hardware and mechanical parts. By contrast, a general-purpose computer, such as a personal...

s, device driver
Device driver
In computing, a device driver or software driver is a computer program allowing higher-level computer programs to interact with a hardware device....

s and other forms of system programming
System programming
System programming is the activity of programming system software. The primary distinguishing characteristic of systems programming when compared to application programming is that application programming aims to produce software which provides services to the user System programming (or systems...

, but also to encourage type safe programming. To resolve these conflicting goals, Ada confines type-unsafety to a certain set of special constructs whose names usually begin with the string Unchecked_. Unchecked_Deallocation can be effectively banned from a unit of Ada text by applying pragma Pure to this unit. It is expected that programmers will use Unchecked_ constructs very carefully and only when necessary; programs that do not use them are type safe.

The SPARK programming language
SPARK programming language
SPARK is a formally-defined computer programming language based on the Ada programming language, intended to be secure and to support the development of high integrity software used in applications and systems where predictable and highly reliable operation is essential either for reasons of safety...

 is a subset of Ada eliminating all its potential ambiguities and insecurities while at the same time adding statically checked contracts
Design by contract
Design by contract , also known as programming by contract and design-by-contract programming, is an approach to designing computer software...

 to the language features available. SPARK avoids the issues with dangling pointers by disallowing allocation at run time entirely.

C

The C programming language
C (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....

 is typesafe in limited contexts; for example, a compile-time error is generated when an attempt is made to convert a pointer to one type of structure to a pointer to another type of structure, unless an explicit cast is used. However, a number of very common operations are non-typesafe; for example, the usual way to print an integer is something like printf("%d", 12), where the %d tells printf at run-time to expect an integer argument. (Something like printf("%s", 12), which erroneously tells the function to expect a pointer to a character-string, will be accepted by compilers, but will produce undefined results.) In addition, C, like Ada, provides unspecified or undefined explicit conversions; and unlike in Ada, idioms that use these conversions are very common, and have helped to give C a type-unsafe reputation. For example, the standard way to allocate memory on the heap is to invoke a memory allocation-function (malloc) with an argument indicating how many bytes are required. The function returns an untyped pointer (type void *), which the calling code must cast to the appropriate pointer type. Expressions such as (struct foo *) malloc(sizeof(struct foo)) are therefore quite common.

C++

C++
C++
C++ is a statically typed, free-form, multi-paradigm, compiled, general-purpose programming language. It is regarded as an intermediate-level language, as it comprises a combination of both high-level and low-level language features. It was developed by Bjarne Stroustrup starting in 1979 at Bell...

 is more type-safe than C when used properly (avoiding the use of void pointers and casting between pointers of two types). Features of C++ that promote code that is less at risk of type errors include:
  • The new operator returns a pointer of a specific type based on the operand, versus the void pointer from C's malloc
    Malloc
    C dynamic memory allocation refers to performing dynamic memory allocation in the C via a group of functions in the C standard library, namely malloc, realloc, calloc and free....

    .
  • Certain code written in C that uses void pointers can be rewritten using C++ templates
    Template (programming)
    Templates are a feature of the C++ programming language that allow functions and classes to operate with generic types. This allows a function or class to work on many different data types without being rewritten for each one....

     to give a type to an argument whose type is variable.
  • #define constants (no type) can be rewritten as const
    Constant (programming)
    In computer programming, a constant is an identifier whose associated value cannot typically be altered by the program during its execution...

     (typed) variables.
  • #define macro procedures (calculation having no specific type) can be rewritten as inline
    Inline function
    In various versions of the C and C++ programming languages, an inline function is a function upon which the compiler has been requested to perform inline expansion...

     (returning strict type) functions. Flexibility of accepting and returning wide typerange of data types can be attained by function overloading.
  • Additional casting operators such as dynamic cast
    Dynamic cast
    In the C++ programming language, the dynamic_cast operator is a part of the run-time type information system that performs a typecast. Unlike an ordinary C-style typecast, a type safety check is performed at runtime, and if the types are not compatible, an exception will be thrown or a null...

    , which allow for more specific checks than a C static cast
    Static cast
    In C++ type conversion, the static_cast operator changes expressions of one static type to objects and values of another static type.-Syntax:static_cast ;...

    .

C#

C# is type-safe. It has support for untyped pointers, but this must be accessed using the "unsafe" keyword which can be prohibited at the compiler level. It has inherent support for run-time cast validation. Casts can be validated by using the "as" keyword that will return a null reference if the cast is invalid, or by using a C-style cast that will throw an exception if the cast is invalid. See C Sharp Conversion Operators.

Undue reliance on the object type (from which all other types are derived) runs the risk of defeating the purpose of the C# type system. It is usually better practice to abandon object references in favour of generics, similar to templates in C++ and generics in Java
Generics in Java
Generics are a facility of generic programming that was added to the Java programming language in 2004 as part of J2SE 5.0. They allow "a type or method to operate on objects of various types while providing compile-time type safety." A common use of this feature is when using a Java Collection...

.

Cyclone

Cyclone is a type-safe language. It does not require a virtual machine or garbage collection to achieve type safety during runtime. The syntax is very similar to C.

Standard ML

SML
Standard ML
Standard ML is a general-purpose, modular, functional programming language with compile-time type checking and type inference. It is popular among compiler writers and programming language researchers, as well as in the development of theorem provers.SML is a modern descendant of the ML...

 has rigorously defined semantics and is known to be type-safe. However, some implementations of SML, including Standard ML of New Jersey
Standard ML of New Jersey
Standard ML of New Jersey is a compiler and programming environment for Standard ML. Aside from its runtime system, which is written in C, SML/NJ is written in Standard ML...

 (SML/NJ), its syntactic variant Mythryl and Mlton
MLton
MLton is an open source, whole-program optimizing compiler for the Standard ML programming language.MLton aims to produce fast executables, and to encourage rapid prototyping and modular programming by eliminating performance penalties often associated with the use of high-level language...

, provide libraries that offer certain unsafe operations. These facilities are often used in conjunction with those implementations' foreign function interface
Foreign function interface
A foreign function interface is a mechanism by which a program written in one programming language can call routines or make use of services written in another. The term comes from the specification for Common Lisp, which explicitly refers to the language features for inter-language calls as...

s to interact with non-ML code (such as C libraries) that may require data laid out in specific ways. Another example is the SML/NJ interactive toplevel
Read-eval-print loop
A read–eval–print loop , also known as an interactive toplevel, is a simple, interactive computer programming environment. The term is most usually used to refer to a Lisp interactive environment, but can be applied to command line shells and similar environments for F#, Smalltalk, Standard ML,...

 itself, which must use unsafe operations to execute ML code entered by the user.

Pascal

Pascal
Pascal (programming language)
Pascal is an influential imperative and procedural programming language, designed in 1968/9 and published in 1970 by Niklaus Wirth as a small and efficient language intended to encourage good programming practices using structured programming and data structuring.A derivative known as Object Pascal...

has had a number of type safety requirements, some of which are kept in some compilers. Where a Pascal compiler dictates "strict typing", two variables cannot be assigned to each other unless they are either compatible (such as conversion of integer to real) or assigned to the identical subtype. For example, if you have the following code fragment:


type
TwoTypes = record
I: Integer;
Q: Real;
end;

DualTypes = record
I: Integer;
Q: Real;
end;

var
T1, T2: TwoTypes;
D1, D2: DualTypes;


Under strict typing, a variable defined as TwoTypes is not compatible with DualTypes (because they are not identical, even though the components of that user defined type are identical) and an assignment of T1 := D2; is illegal. An assignment of T1 := T2; would be legal because the subtypes they are defined to are identical. However, an assignment such as T1.Q := D1.Q; would be legal.

Examples

The following simple C++ program illustrates that C++ permits operations that are type-unsafe:
  1. include

using namespace std;

int main
{
int ival = 5; // A four-byte integer (on most processors)
void *pval = &ival; // Store the address of ival in an untyped pointer
double dval = *((double*)pval); // Convert it to a double pointer and get the value at that address
cout << dval << endl; // Output the double (this will be garbage and not 5!)
return 0;
}

Even though pval does contain the correct address of ival, when pval is cast from a void pointer to a double pointer the resulting double value is not 5, but an undefined garbage value. On the machine code level, this program has explicitly prevented the processor from performing the correct conversion from a four-byte integer to an eight-byte floating-point value. When the program is run it will output a garbage floating-point value and possibly raise a memory exception. Thus, C++ (and C) allow type-unsafe code.

The next example shows a slightly more complex type safety issue in C++ involving object pointer conversion.

  1. include

using namespace std;

class Parent {};

class Child1 : public Parent
{
public:
int a;
};

class Child2 : public Parent
{
public:
double b;
};

int main
{
Child1 c1;
c1.a = 5;
Child2 c2;
c2.b = 2.4;
Parent *p = &c1; // Down-conversion is safe
Child1 *pc1 = (Child1*)p; // Up-conversion is not safe
cout << pc1->a << endl; // ...but this time it's okay
p = &c2; // Safe
pc1 = (Child1*)p; // Not safe
cout << pc1->a << endl; // This time it breaks!
return 0;
}

The two child classes have members of different types. When a pointer to a less-specific parent class is converted to a pointer to a more-specific child class, the resulting pointer may or may not point to a valid object of its type. In the first conversion, the pointer is valid, and in the second, it is not. Again, a garbage value is printed and a memory exception may be raised.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK