Reference (computer science)



 
 
In computer science, a reference is an object containing information about how to locate and access a particular data item, as opposed to containing the data itself. Accessing the value referred to by a reference is called dereferencing it. References are fundamental to constructing many data structures (such as linked lists) and in exchanging information between different parts of a program.

A reference may be compared to the address of a house. Typically, it is a small identifier, memory or network address, with which it is possible to find a potentially much larger object. Finding a house based on its address is analogous to dereferencing a reference.

In a more complicated example, suppose you leave a forwarding address in your old house each time you move. A person could visit your first house, then follow the forwarding address to the next house, and so on until they finally find your current house. This is analogous to how references are used in singly linked lists.
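
The forwarding-address analogy can be sketched in a few lines of Python. This is only an illustration: the `House` class, the `find_current_house` function, and the street addresses are hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class House:
    """A node: some data plus a reference to the next node (or None)."""
    address: str
    forwarding: Optional["House"] = None  # reference to the next house, if any


def find_current_house(first: House) -> House:
    """Follow forwarding references until a house has none left."""
    house = first
    while house.forwarding is not None:
        house = house.forwarding  # dereference: move on to the referenced node
    return house


# Build the chain back to front so each house can refer to the next one.
current = House("9 Elm Road")
second = House("4 Oak Avenue", forwarding=current)
first = House("1 Main Street", forwarding=second)

found = find_current_house(first)
```

Here `found` is the very same object as `current`, reached purely by following references from `first`.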

Another benefit of house addresses is that they're much easier to deal with than actual houses. Say you want to be able to easily locate people on your street based on their last name. One way to do this is to use a large crane to physically pick up and rearrange all the houses based on the last names of the residents. A much easier solution is to make a list of addresses of people on your street and sort it by their last names. References have the same benefit: it is possible to manipulate references to data without actually having to modify the data itself, which in some cases can be much more efficient.
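
As a rough sketch of the address-list idea, the following Python fragment sorts a list of references by last name while leaving the underlying objects untouched. The `Household` class and the sample names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Household:
    last_name: str
    street_number: int


# The "houses" stay where they are; this list fixes their physical order.
street = [
    Household("Nguyen", 1),
    Household("Abbott", 2),
    Household("Moreau", 3),
]

# sorted() builds a new list holding references to the same objects,
# ordered by last name. No Household is copied or modified.
by_name = sorted(street, key=lambda h: h.last_name)

names_in_street_order = [h.last_name for h in street]
names_in_sorted_order = [h.last_name for h in by_name]
```

Only the small list of references is rearranged; `by_name[0]` and `street[1]` refer to one and the same `Household`.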

Benefits


References increase flexibility in where objects can be stored, how they are allocated, and how they are passed between areas of code. As long as we can access a reference to the data, we can access the data through it, and the data itself need not be moved. They also make sharing of data between different code areas easier; each keeps a reference to it.
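
A minimal Python sketch of two code areas sharing one piece of data through references might look as follows. The `Shop` and `Auditor` classes and the `inventory` dictionary are hypothetical and exist only for this illustration.

```python
# One mutable object, created once.
inventory = {"apples": 3}


class Shop:
    def __init__(self, stock):
        self.stock = stock  # stores a reference, not a copy


class Auditor:
    def __init__(self, stock):
        self.stock = stock  # a second reference to the same dict

    def count(self, item):
        return self.stock.get(item, 0)


shop = Shop(inventory)
auditor = Auditor(inventory)

# A change made through one reference is visible through the other,
# because the data itself was never duplicated or moved.
shop.stock["apples"] -= 1
seen_by_auditor = auditor.count("apples")
```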

The mechanism of references, though it varies in implementation, is a fundamental feature common to nearly all modern programming languages. Even some languages that support no direct use of references make some internal or implicit use of them. For example, the call by reference calling convention can be implemented with either explicit or implicit use of references.

Examples


Pointers are the most primitive and error-prone type of reference, but also one of the most powerful and efficient; a pointer stores only the address of an object in memory. Smart pointers are opaque data structures that act like pointers but can only be accessed through particular methods.

File handles, or handles, are a type of reference used to abstract file content. A handle usually represents both the file itself, as when requesting a lock on the file, and a specific position within the file's content, as when reading a file.
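
The following Python sketch shows a handle carrying a position within the file. It assumes that the file object returned by Python's built-in `open()` is an acceptable stand-in for a file handle, and it creates a temporary scratch file so that it can run on its own.

```python
import os
import tempfile

# Create a small scratch file so the example is self-contained.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"hello world")

# The file object returned by open() acts as the handle here.
with open(path, "rb") as handle:
    first = handle.read(5)    # reads b"hello" and advances the position
    position = handle.tell()  # the handle remembers where it is in the file
    handle.seek(6)            # move the position without reading anything
    rest = handle.read()      # reads from the new position to the end

os.remove(path)
```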

Dereferenceable URIs are a type of reference used to abstract entities on the World Wide Web, especially when working with the Linked Data Web.

In distributed computing, the reference may contain more than an address or identifier; it may also include an embedded specification of the network protocols used to locate and access the referenced object, and of the way information is encoded or serialized. Thus, for example, a WSDL description of a remote web service can be viewed as a form of reference; it includes a complete specification of how to locate and bind to a particular web service. A reference to a live distributed object is another example: it is a complete specification for how to construct a small software component called a proxy that will subsequently engage in a peer-to-peer interaction, and through which the local machine may gain access to data that is replicated or exists only as a weakly consistent message stream. In all these cases, the reference includes the full set of instructions, or a recipe, for how to access the data; in this sense, it serves the same purpose as an identifier or address in memory.
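
A URI is a far simpler case than a WSDL description or a live distributed object reference, but it already shows a reference that names its own access protocol. The sketch below uses Python's standard `urllib.parse.urlsplit` to take one apart; the URI itself is a made-up example, and nothing is fetched over the network.

```python
from urllib.parse import urlsplit

# A hypothetical URI used only for illustration.
reference = "http://example.org/people/alice"

parts = urlsplit(reference)
protocol = parts.scheme   # how to talk to the remote side
location = parts.netloc   # where the referenced resource lives
identifier = parts.path   # which resource is meant at that location
```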

Formal representation


More generally, a reference can be considered as a piece of data that allows unique retrieval of another piece of data. This includes primary keys in databases and keys in an associative array. If we have a set of data D, any well-defined (single-valued) function from D to D ∪ {null} defines a type of reference, where null is the image of a piece of data not referring to anything meaningful.

An alternative representation of such a function is a directed graph called a reachability graph. Here, each datum is represented by a vertex and there is an edge from u to v if the datum in u refers to the datum in v. The maximum out-degree is one. These graphs are valuable in garbage collection, where they can be used to separate accessible from inaccessible objects.
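
As an illustration, a reachability graph with out-degree at most one can be stored as a mapping from each datum to the datum it refers to, with `None` standing in for null. The sketch below separates accessible from inaccessible data starting from a set of roots. The node names, the `refers_to` mapping, and the `reachable` function are hypothetical, and real garbage collectors are considerably more involved.

```python
from typing import Dict, Iterable, Optional, Set

# Each datum refers to at most one other datum (out-degree <= 1).
# None plays the role of null.
refers_to: Dict[str, Optional[str]] = {
    "a": "b",
    "b": "c",
    "c": None,
    "x": "y",  # x and y refer to each other, but nothing else reaches them
    "y": "x",
}


def reachable(roots: Iterable[str], graph: Dict[str, Optional[str]]) -> Set[str]:
    """Return every datum reachable by following references from the roots."""
    seen: Set[str] = set()
    for node in roots:
        current: Optional[str] = node
        # Follow the single outgoing edge until null or an already seen datum.
        while current is not None and current not in seen:
            seen.add(current)
            current = graph[current]
    return seen


accessible = reachable(["a"], refers_to)
inaccessible = set(refers_to) - accessible
```

With `"a"` as the only root, the cycle formed by `"x"` and `"y"` ends up in `inaccessible` even though each of the two is still referred to by the other.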

External and internal storage


In many data structures, large, complex objects are composed of smaller objects. These objects are typically stored in one of two ways:

  1. With internal storage, the contents of the smaller object are stored inside the larger object.
  2. With external storage, the smaller objects are allocated in their own location, and the larger object only stores references to them.


Internal storage is usually more efficient, because there is a space cost for the references and dynamic allocation metadata, and a time cost associated with dereferencing a reference and with allocating the memory for the smaller objects. Internal storage also enhances locality of reference by keeping different parts of the same large object close together in memory. However, there are a variety of situations in which external storage is preferred:

  • If the data structure is recursive, meaning it may contain itself, it cannot be represented with internal storage.
  • If the larger object is being stored in an area with limited space, such as the stack, then we can prevent running out of storage by storing large component objects in another memory region and referring to them using references.
  • If the smaller objects may vary in size, it's often inconvenient or expensive to resize the larger object so that it can still contain them.
  • References are often easier to work with and adapt better to new requirements.


Some languages, such as Java and Scheme, do not support internal storage. In these languages, all objects are uniformly accessed through references.
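
Python likewise accesses all objects through references, so it can only illustrate the external-storage side of this comparison. The sketch below shows the recursive case: a tree node cannot contain a whole node of its own type inside itself, so it stores references to separately allocated children. The `TreeNode` class and the `total` function are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TreeNode:
    value: int
    # References to separately allocated child nodes, or None if absent.
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None


def total(node: Optional[TreeNode]) -> int:
    """Sum all values by following child references recursively."""
    if node is None:
        return 0
    return node.value + total(node.left) + total(node.right)


leaf = TreeNode(4)
root = TreeNode(1, left=TreeNode(2, left=leaf), right=TreeNode(3))
tree_sum = total(root)
```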

Language support


In assembly languages, the first languages used, it is typical to express references using either raw memory addresses or indexes into tables. These work, but are somewhat tricky to use, because an address tells you nothing about the value it points to, not even how large it is or how to interpret it; such information is encoded in the program logic. The result is that misinterpretations can occur in incorrect programs, causing bewildering errors.

One of the earliest opaque references was that of the Lisp programming language cons cell, which is simply a record containing two references to other Lisp objects, including possibly other cons cells. This simple structure is most commonly used to build singly linked lists, but can also be used to build simple binary trees and so-called "dotted lists", which terminate not with a null reference but a value.
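
The cons cell can be imitated in Python with a two-field record. This is only a sketch: the `Cons` type, the `NIL` constant, and the `to_python_list` helper are hypothetical names, and the field names `car` and `cdr` follow Lisp tradition.

```python
from typing import Any, NamedTuple


class Cons(NamedTuple):
    """A record of two references, named after Lisp's car and cdr."""
    car: Any
    cdr: Any


NIL = None  # stands in for Lisp's null reference

# Proper list (1 2 3): each cdr refers to the next cell, the last one to NIL.
proper = Cons(1, Cons(2, Cons(3, NIL)))

# Dotted list (1 2 . 3): the final cdr is a value rather than NIL.
dotted = Cons(1, Cons(2, 3))

# Simple binary tree: car and cdr refer to the left and right subtrees.
tree = Cons(Cons("a", "b"), Cons("c", "d"))


def to_python_list(cell):
    """Collect the car of each cell of a proper list."""
    items = []
    while cell is not NIL:
        items.append(cell.car)
        cell = cell.cdr
    return items


items = to_python_list(proper)
```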

Another early language, Fortran, does not have an explicit representation of references, but does use them implicitly in its call-by-reference calling semantics.

The C pointer is still one of the most popular types of references today. It is similar to the assembly representation of a raw address, except that it carries a static datatype which can be used at compile-time to ensure that the data it refers to is not misinterpreted. However, because C has a weak type system which can be violated using casts (explicit conversions between various pointer types and between pointer types and integers), misinterpretation is still possible, if more difficult. Its successor C++ tried to increase type safety of pointers with new cast operators and smart pointers in its standard library, but still retained the ability to circumvent these safety mechanisms for compatibility.

A number of popular mainstream languages today, such as Java, C#, and Visual Basic, have adopted a much more opaque type of reference, usually referred to as simply a reference. Like C pointers, these references have types indicating how to interpret the data they reference, but they are typesafe in that they cannot be interpreted as a raw address and unsafe conversions are not permitted.

Fortran


A Fortran reference is best thought of as an alias of another object, such as a scalar variable or a row or column of an array. There is no syntax to dereference the reference or manipulate the contents of the referent directly. Fortran references can be null. As in other languages, these references facilitate the processing of dynamic structures, such as linked lists, queues, and trees.

Functional languages


In all of the above settings, the concept of mutable variables, data that can be modified, often makes implicit use of references. In Standard ML, OCaml, and many other functional languages, most values are persistent: they cannot be modified by assignment. Assignable "reference cells" serve the unavoidable purposes of mutable references in imperative languages, and make the capability to be modified explicit. Such reference cells can hold any value, and so are given the polymorphic type 'a ref, where 'a is to be replaced with the type of value pointed to. These mutable references can be pointed to different objects over their lifetime. For example, this permits building of circular data structures. The reference cell is functionally equivalent to an array of length 1.
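
The behaviour of a reference cell can be approximated in Python. The `Ref` class below is a hypothetical stand-in, not ML's actual implementation; its `get` and `set` methods play the roles of ML's `!r` and `r := v`. The sketch also shows the one-element list equivalent and a small circular structure.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")


class Ref(Generic[T]):
    """A hypothetical stand-in for an ML reference cell."""

    def __init__(self, contents: T) -> None:
        self.contents = contents

    def get(self) -> T:  # plays the role of !r in ML
        return self.contents

    def set(self, value: T) -> None:  # plays the role of r := value in ML
        self.contents = value


counter = Ref(0)
alias = counter  # a second name for the same cell
counter.set(counter.get() + 1)
seen_through_alias = alias.get()

# A one-element list can be used in the same way.
cell: List[int] = [0]
cell[0] = cell[0] + 1

# Circular structure: create the cell first, then point it back at its owner.
node = {"name": "loop", "next": Ref(None)}
node["next"].set(node)
is_circular = node["next"].get() is node
```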

To preserve safety and efficient implementations, references cannot be type-cast in ML, nor can pointer arithmetic be performed. It is important to note that in the functional paradigm, many structures that would be represented using pointers in a language like C are represented using other facilities, such as the powerful algebraic datatype mechanism. The programmer is then able to enjoy certain properties (such as the guarantee of immutability) while programming, even though the compiler often uses machine pointers "under the hood".

See also


  • Abstraction (computer science)
  • Pointer (computing)
  • Magic cookie
  • Weak reference
  • Bounded pointer
  • Autovivification
  • Dereferenceable Uniform Resource Identifier


External links

  • Introduction to pointers in a 3 minute educational video - Stanford Computer Science Education Library