Reference (computer science)
Encyclopedia
In computer science
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

, a reference is a value
Value (computer science)
In computer science, a value is an expression which cannot be evaluated any further . The members of a type are the values of that type. For example, the expression "1 + 2" is not a value as it can be reduced to the expression "3"...

 that enables a program to indirectly access a particular data
Data
The term data refers to qualitative or quantitative attributes of a variable or set of variables. Data are typically the results of measurements and can be the basis of graphs, images, or observations of a set of variables. Data are often viewed as the lowest level of abstraction from which...

 item, such as a variable or a record
Record (computer science)
In computer science, a record is an instance of a product of primitive data types called a tuple. In C it is the compound data in a struct. Records are among the simplest data structures. A record is a value that contains other values, typically in fixed number and sequence and typically indexed...

, in the computer
Computer
A computer is a programmable machine designed to sequentially and automatically carry out a sequence of arithmetic or logical operations. The particular sequence of operations can be changed readily, allowing the computer to solve more than one kind of problem...

's memory or in some other storage device
Data storage device
thumb|200px|right|A reel-to-reel tape recorder .The magnetic tape is a data storage medium. The recorder is data storage equipment using a portable medium to store the data....

. The reference is said to refer to the data item, and accessing those data is called dereferencing
Dereference operator
The dereference operator or indirection operator, denoted by "*" , is a unary operator found in C-like languages that include pointer variables. It operates on a pointer variable, and returns an l-value equivalent to the value at the pointer address. This is called "dereferencing" the pointer...

 the reference.

A reference is distinct from the data itself. Typically, a reference is the physical address
Physical address
In computing, a physical address, also real address, or binary address, is the memory address that is represented in the form of a binary number on the address bus circuitry in order to enable the data bus to access a particular storage cell of main memory.In a computer with virtual memory, the...

 of where the data is stored in memory or in the storage device. For this reason, a reference is often called a pointer or address, and is said to point to the data. However a reference may also be the offset (difference) between the datum's address and some fixed "base" address, or an index into an array.

The concept of reference must not be confused with other values (keys
Unique key
In relational database design, a unique key can uniquely identify each row in a table, and is closely related to the Superkey concept. A unique key comprises a single column or a set of columns. No two distinct rows in a table can have the same value in those columns if NULL values are not used...

or identifier
Identifier
An identifier is a name that identifies either a unique object or a unique class of objects, where the "object" or class may be an idea, physical [countable] object , or physical [noncountable] substance...

s
) that uniquely identify the data item, but give access to it only through a non-trivial lookup
Lookup
In computing, lookup usually refers to searching a data structure for an item that satisfies some specified property. For example, variable lookup performed by a language interpreter, virtual machine or other similar engine usually consists of performing certain...

 operation in some table data structure
Table (database)
In relational databases and flat file databases, a table is a set of data elements that is organized using a model of vertical columns and horizontal rows. A table has a specified number of columns, but can have any number of rows...

.

References are widely used in programming
Computer programming
Computer programming is the process of designing, writing, testing, debugging, and maintaining the source code of computer programs. This source code is written in one or more programming languages. The purpose of programming is to create a program that performs specific operations or exhibits a...

, especially to efficiently pass large or mutable data as arguments to procedures
Subroutine
In computer science, a subroutine is a portion of code within a larger program that performs a specific task and is relatively independent of the remaining code....

, or to share such data among various uses. In particular, a reference may point to a variable or record that contains references to other data. This idea is the basis of indirect addressing and of many linked data structure
Linked data structure
In computer science, a linked data structure is a data structure which consists of a set of data records linked together and organized by references ....

s, such as linked list
Linked list
In computer science, a linked list is a data structure consisting of a group of nodes which together represent a sequence. Under the simplest form, each node is composed of a datum and a reference to the next node in the sequence; more complex variants add additional links...

s.

A reference may be compared to a street address, such as "12 Main Street" or "three houses down the road on the left side". Going to the building with that address is analogous to dereferencing the reference. The name "Bob and Joe's Car Shop" might be a unique identifier for the same building, but cannot be compared to a data reference, because finding the building with that name requires a non-trivial search or a lookup in some directory. A reference stored in a data record can be compared to a sign on that shop saying "For tire service please go to 20 Cross Street". Passing a reference to a subroutine, instead of the data, is like giving your friends the address of that shop, instead of taking Bob and Joe and all their tools to your friend's home.

Benefits

References increase flexibility in where objects can be stored, how they are allocated, and how they are passed between areas of code. As long as we can access a reference to the data, we can access the data through it, and the data itself need not be moved. They also make sharing of data between different code areas easier; each keeps a reference to it.

The mechanism of references, if varying in implementation, is a fundamental programming language feature common to nearly all modern programming languages. Even some languages that support no direct use of references have some internal or implicit use. For example, the call by reference
Evaluation strategy
In computer science, an evaluation strategy is a set of rules for evaluating expressions in a programming language. Emphasis is typically placed on functions or operators: an evaluation strategy defines when and in what order the arguments to a function are evaluated, when they are substituted...

 calling convention can be implemented with either explicit or implicit use of references.

Examples

Pointers are the most primitive. Due to their intimate relationship with the underlying hardware, they are one of the most powerful and efficient types of references. However, also due to this relationship, pointers require a strong understanding by the programmer of the details of memory architecture. Because pointers store a memory location's address, instead of a value directly, inappropriate use of pointers can lead to undefined behavior in a program. Smart pointer
Smart pointer
In computer science, a smart pointer is an abstract data type that simulates a pointer while providing additional features, such as automatic garbage collection or bounds checking. These additional features are intended to reduce bugs caused by the misuse of pointers while retaining efficiency...

s are opaque data structures
Opaque pointer
In computer programming, an opaque pointer is a special case of an opaque data type, a datatype that is declared to be a pointer to a record or data structure of some unspecified type....

 that act like pointers but can only be accessed through particular methods.

File handles, or handles, are a type of reference used to abstract file content. It usually represents both the file itself, as when requesting a lock
Lock (computer science)
In computer science, a lock is a synchronization mechanism for enforcing limits on access to a resource in an environment where there are many threads of execution. Locks are one way of enforcing concurrency control policies.-Types:...

 on the file, and a specific position within the file's content, as when reading a file.

In distributed computing
Distributed computing
Distributed computing is a field of computer science that studies distributed systems. A distributed system consists of multiple autonomous computers that communicate through a computer network. The computers interact with each other in order to achieve a common goal...

, the reference may contain more than an address or identifier; it may also include an embedded specification of the network protocols used to locate and access the referenced object, the way information is encoded or serialized. Thus, for example, a WSDL
Web Services Description Language
The Web Services Description Language is an XML-based language that is used for describing the functionality offered by a Web service. A WSDL description of a web service provides a machine-readable description of how the service can be called, what parameters it expects and what data structures...

 description of a remote web service can be viewed as a form of reference; it includes a complete specification of how to locate and bind to a particular web service
Web service
A Web service is a method of communication between two electronic devices over the web.The W3C defines a "Web service" as "a software system designed to support interoperable machine-to-machine interaction over a network". It has an interface described in a machine-processable format...

. A reference to a live distributed object
Live distributed object
- Definitions :The term live distributed object refers to a running instance of a distributed multi-party protocol, viewed from the object-oriented perspective, as an entity that has a distinct identity, may encapsulate internal state and threads of execution, and that exhibits a well-defined...

 is another example: it is a complete specification for how to construct a small software component called a proxy that will subsequently engage in a peer-to-peer interaction, and through which the local machine may gain access to data that is replicated or exists only as a weakly consistent message stream. In all these cases, the reference includes the full set of instructions, or a recipe, for how to access the data; in this sense, it serves the same purpose as an identifier or address in memory.

Formal representation

More generally, a reference can be considered as a piece of data that allows unique retrieval of another piece of data. This includes primary keys in database
Database
A database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality , in a way that supports processes requiring this information...

s and keys in an associative array
Associative array
In computer science, an associative array is an abstract data type composed of a collection of pairs, such that each possible key appears at most once in the collection....

. If we have a set of data D, any well-defined (single-valued) function from D onto D ∪ {null} defines a type of reference, where null is the image of a piece of data not referring to anything meaningful.

An alternative representation of such a function is a directed graph called a reachability graph. Here, each datum is represented by a vertex and there is an edge from u to v if the datum in u refers to the datum in v. The maximum out-degree is one. These graphs are valuable in garbage collection
Garbage collection (computer science)
In computer science, garbage collection is a form of automatic memory management. The garbage collector, or just collector, attempts to reclaim garbage, or memory occupied by objects that are no longer in use by the program...

, where they can be used to separate accessible from inaccessible objects.

External and internal storage

In many data structures, large, complex objects are composed of smaller objects. These objects are typically stored in one of two ways:
  1. With internal storage, the contents of the smaller object are stored inside the larger object.
  2. With external storage, the smaller objects are allocated in their own location, and the larger object only stores references to them.


Internal storage is usually more efficient, because there is a space cost for the references and dynamic allocation metadata, and a time cost associated with dereferencing a reference and with allocating the memory for the smaller objects. Internal storage also enhances locality of reference
Locality of reference
In computer science, locality of reference, also known as the principle of locality, is the phenomenon of the same value or related storage locations being frequently accessed. There are two basic types of reference locality. Temporal locality refers to the reuse of specific data and/or resources...

 by keeping different parts of the same large object close together in memory. However, there are a variety of situations in which external storage is preferred:
  • If the data structure is recursive, meaning it may contain itself. This cannot be represented in the internal way.
  • If the larger object is being stored in an area with limited space, such as the stack, then we can prevent running out of storage by storing large component objects in another memory region and referring to them using references.
  • If the smaller objects may vary in size, it's often inconvenient or expensive to resize the larger object so that it can still contain them.
  • References are often easier to work with and adapt better to new requirements.


Some languages, such as Java
Java (programming language)
Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities...

, Smalltalk
Smalltalk
Smalltalk is an object-oriented, dynamically typed, reflective programming language. Smalltalk was created as the language to underpin the "new world" of computing exemplified by "human–computer symbiosis." It was designed and created in part for educational use, more so for constructionist...

, Python
Python (programming language)
Python is a general-purpose, high-level programming language whose design philosophy emphasizes code readability. Python claims to "[combine] remarkable power with very clear syntax", and its standard library is large and comprehensive...

, and Scheme, do not support internal storage. In these languages, all objects are uniformly accessed through references.

Language support

In assembly language
Assembly language
An assembly language is a low-level programming language for computers, microprocessors, microcontrollers, and other programmable devices. It implements a symbolic representation of the machine codes and other constants needed to program a given CPU architecture...

s, the first languages used, it is typical to express references using either raw memory addresses or indexes into tables. These work, but are somewhat tricky to use, because an address tells you nothing about the value it points to, not even how large it is or how to interpret it; such information is encoded in the program logic. The result is that misinterpretations can occur in incorrect programs, causing bewildering errors.

One of the earliest opaque references was that of the Lisp language cons cell
Cons
In computer programming, cons is a fundamental function in most dialects of the Lisp programming language. cons constructs memory objects which hold two values or pointers to values. These objects are referred to as cells, conses, non-atomic s-expressions , or pairs...

, which is simply a record
Object composition
In computer science, object composition is a way to combine simple objects or data types into more complex ones...

 containing two references to other Lisp objects, including possibly other cons cells. This simple structure is most commonly used to build singly linked list
Linked list
In computer science, a linked list is a data structure consisting of a group of nodes which together represent a sequence. Under the simplest form, each node is composed of a datum and a reference to the next node in the sequence; more complex variants add additional links...

s, but can also be used to build simple binary tree
Binary tree
In computer science, a binary tree is a tree data structure in which each node has at most two child nodes, usually distinguished as "left" and "right". Nodes with children are parent nodes, and child nodes may contain references to their parents. Outside the tree, there is often a reference to...

s and so-called "dotted lists", which terminate not with a null reference but a value.

Another early language, Fortran, does not have an explicit representation of references, but does use them implicitly in its call-by-reference calling semantics.

The pointer is still one of the most popular types of references today. It is similar to the assembly representation of a raw address, except that it carries a static datatype which can be used at compile-time to ensure that the data it refers to is not misinterpreted. However, because C has a weak type system
Weak typing
In computer science, weak typing is a property attributed to the type systems of some programming languages. It is the opposite of strong typing, and consequently the term weak typing has as many different meanings as strong typing does.One of the more common definitions states that weakly typed...

 which can be violated using casts (explicit conversions between various pointer types and between pointer types and integers), misinterpretation is still possible, if more difficult. Its successor C++
C++
C++ is a statically typed, free-form, multi-paradigm, compiled, general-purpose programming language. It is regarded as an intermediate-level language, as it comprises a combination of both high-level and low-level language features. It was developed by Bjarne Stroustrup starting in 1979 at Bell...

 tried to increase type safety
Type safety
In computer science, type safety is the extent to which a programming language discourages or prevents type errors. A type error is erroneous or undesirable program behaviour caused by a discrepancy between differing data types...

 of pointers with new cast operators and smart pointers in its standard library
C++ standard library
In C++, the C++ Standard Library is a collection of classes and functions, which are written in the core language and part of the C++ ISO Standard itself...

, but still retained the ability to circumvent these safety mechanisms for compatibility.

A number of popular mainstream languages today such as Eiffel
Eiffel (programming language)
Eiffel is an ISO-standardized, object-oriented programming language designed by Bertrand Meyer and Eiffel Software. The design of the language is closely connected with the Eiffel programming method...

, Java
Java (programming language)
Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities...

, C#, and Visual Basic
Visual Basic
Visual Basic is the third-generation event-driven programming language and integrated development environment from Microsoft for its COM programming model...

 have adopted a much more opaque type of reference, usually referred to as simply a reference. These references have types like C pointers indicating how to interpret the data they reference, but they are typesafe in that they cannot be interpreted as a raw address and unsafe conversions are not permitted.

Fortran

A Fortran
Fortran
Fortran is a general-purpose, procedural, imperative programming language that is especially suited to numeric computation and scientific computing...

 reference is best thought of as an alias of another object, such as a scalar variable or a row or column of an array. There is no syntax to dereference the reference or manipulate the contents of the referent directly. Fortran references can be null. As in other languages, these references facilitate the processing of dynamic structures, such as linked lists, queues, and trees.

Functional languages

In all of the above settings, the concept of mutable variables, data that can be modified, often makes implicit use of references. In Standard ML
Standard ML
Standard ML is a general-purpose, modular, functional programming language with compile-time type checking and type inference. It is popular among compiler writers and programming language researchers, as well as in the development of theorem provers.SML is a modern descendant of the ML...

, Objective Caml
Objective Caml
OCaml , originally known as Objective Caml, is the main implementation of the Caml programming language, created by Xavier Leroy, Jérôme Vouillon, Damien Doligez, Didier Rémy and others in 1996...

, and many other functional languages, most values are persistent: they cannot be modified by assignment. Assignable "reference cells" serve the unavoidable purposes of mutable references in imperative languages, and make the capability to be modified explicit. Such reference cells can hold any value, and so are given the polymorphic type α ref, where α is to be replaced with the type of value pointed to. These mutable references can be pointed to different objects over their lifetime. For example, this permits building of circular data structures. The reference cell is functionally equivalent to an array of length 1.

To preserve safety and efficient implementations, references cannot be type-cast in ML, nor can pointer arithmetic be performed. It is important to note that in the functional paradigm, many structures that would be represented using pointers in a language like C are represented using other facilities, such as the powerful algebraic datatype mechanism. The programmer is then able to enjoy certain properties (such as the guarantee of immutability) while programming, even though the compiler often uses machine pointers "under the hood".

Symbolic references

Some languages, like Perl
Perl
Perl is a high-level, general-purpose, interpreted, dynamic programming language. Perl was originally developed by Larry Wall in 1987 as a general-purpose Unix scripting language to make report processing easier. Since then, it has undergone many changes and revisions and become widely popular...

, support symbolic references, which are just string values that contain the names of variables. When a value that is not a regular reference is dereferenced, Perl considers it to be a symbolic reference and gives the variable with the name given by the value. PHP
PHP
PHP is a general-purpose server-side scripting language originally designed for web development to produce dynamic web pages. For this purpose, PHP code is embedded into the HTML source document and interpreted by a web server with a PHP processor module, which generates the web page document...

 has a similar feature in the form of its $$var syntax.

See also

  • Abstraction (computer science)
    Abstraction (computer science)
    In computer science, abstraction is the process by which data and programs are defined with a representation similar to its pictorial meaning as rooted in the more complex realm of human life and language with their higher need of summarization and categorization , while hiding away the...

  • Autovivification
    Autovivification
    Autovivification is a distinguishing feature of the Perl programming language involving the dynamic creation of data structures. Autovivification is the automatic creation of a variable reference when an undefined value is dereferenced...

  • Bounded pointer
    Bounded pointer
    In computer science a bounded pointer is a pointer that is augmented with additional information that enable the storage bounds within which it may point to be deduced...

  • Dereferenceable Uniform Resource Identifier
    Dereferenceable Uniform Resource Identifier
    A dereferenceable Uniform Resource Identifier or dereferenceable URI is a resource retrieval mechanism that uses any of the internet protocols A dereferenceable Uniform Resource Identifier or dereferenceable URI is a resource retrieval mechanism that uses any of the internet protocols A...

  • Magic cookie
    Magic cookie
    A magic cookie or just cookie for short, is a token or short packet of data passed between communicating programs, where the data is typically not meaningful to the recipient program. The contents are opaque and not usually interpreted until the recipient passes the cookie data back to the sender...

  • Pointer (computing)
  • Variable (programming)
    Variable (programming)
    In computer programming, a variable is a symbolic name given to some known or unknown quantity or information, for the purpose of allowing the name to be used independently of the information it represents...

  • Weak reference
    Weak reference
    In computer programming, a weak reference is a reference that does not protect the referenced object from collection by a garbage collector . An object referenced only by weak references is considered unreachable and so may be collected at any time...


External links

  • Pointer Fun With Binky Introduction to pointers in a 3 minute educational video - Stanford Computer Science Education Library
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK