All Topics  
Reference counting

 

   Email Print
   Bookmark   Link






 

Reference counting



 
 
In computer science
Computer science

Computer science is the study of the theoretical foundations of information and computation, and of practical techniques for their implementation and application in computer systems....
, reference counting is a technique of storing the number of references, pointers, or handles to a resource such as an object or block of memory. It is typically used as a means of deallocating objects which are no longer referenced.

rence counting is often known as a garbage collection
Garbage collection (computer science)

In computer science, garbage collection is a form of automatic memory management. The garbage collector, or just collector, attempts to reclaim garbage , or memory used by Object that will never be accessed or mutated again by the Application software....
 algorithm where each object contains a count of the number of reference
Reference

A reference is a relation between Object in which one object designates by linking to another object. Such relations as these may occur in a variety of domains, including logic, computer science, time, art and scholarship....
s to it held by other objects.






Discussion
Ask a question about 'Reference counting'
Start a new discussion about 'Reference counting'
Answer questions from other users
Full Discussion Forum



Encyclopedia


In computer science
Computer science

Computer science is the study of the theoretical foundations of information and computation, and of practical techniques for their implementation and application in computer systems....
, reference counting is a technique of storing the number of references, pointers, or handles to a resource such as an object or block of memory. It is typically used as a means of deallocating objects which are no longer referenced.

Use in garbage collection

Reference counting is often known as a garbage collection
Garbage collection (computer science)

In computer science, garbage collection is a form of automatic memory management. The garbage collector, or just collector, attempts to reclaim garbage , or memory used by Object that will never be accessed or mutated again by the Application software....
 algorithm where each object contains a count of the number of reference
Reference

A reference is a relation between Object in which one object designates by linking to another object. Such relations as these may occur in a variety of domains, including logic, computer science, time, art and scholarship....
s to it held by other objects. If an object's reference count reaches zero, the object has become inaccessible, and it is put on a list of objects to be destroyed.

Simple reference counts require frequent updates. Whenever a reference is destroyed or overwritten, the reference count of the object it references is decremented, and whenever one is created or copied, the reference count of the object it references is incremented.

Reference counting is also used in disk operating systems and distributed systems, where full non-incremental tracing garbage collection is too time consuming because of the size of the object graph and slow access speed.

Advantages and disadvantages

The main advantage of reference counting over tracing garbage collection is that objects are reclaimed as soon as they can no longer be referenced, and in an incremental fashion, without long pauses for collection cycles and with clearly defined lifetime of every object. In real-time applications or systems with limited memory, this is important to maintain responsiveness. Reference counting is also among the simplest forms of garbage collection to implement. It also allows for effective management of non-memory resources such as operating system objects, which are often much scarcer than memory (tracing GC systems use finalizers for this, but the delayed reclamation may cause problems). Weighted reference counts
Reference counting

In computer science, reference counting is a technique of storing the number of references, pointers, or handles to a resource such as an object or block of memory....
 are a good solution for garbage collecting a distributed system.

Reference counts are also useful information to use as input to other runtime optimizations. For example, systems that depend heavily on immutable object
Immutable object

In object-oriented computer programming and Functional_programming programming, an immutable object is an object whose state cannot be modified after it is created....
s such as many functional programming languages can suffer an efficiency penalty due to frequent copies. However, if we know an object has only one reference (as most do in many systems), and that reference is lost at the same time that a similar new object is created (as in the string append statement str ← str + "a"), we can replace the operation with a mutation on the original object.

Reference counting in naive form has two main disadvantages over the tracing garbage collection, both of which require additional mechanisms to ameliorate:
  • The frequent updates it involves are a source of inefficiency. While tracing garbage collectors can impact efficiency severely via context switching
    Context switch

    A context switch is the computing process of storing and restoring the State of a Central processing unit such that multiple Process es can share a single CPU resource....
     and cache line faults, they collect relatively infrequently, while accessing objects is done continually. Also, less importantly, reference counting requires every memory-managed object to reserve space for a reference count. In tracing garbage collectors, this information is stored implicitly in the references that refer to that object, saving space, although tracing garbage collectors, particularly incremental ones, can require additional space for other purposes.
  • The naive algorithm described above can't handle reference cycles, an object which refers directly or indirectly to itself. A mechanism relying purely on reference counts will never consider cyclic chains of objects for deletion, since their reference count is guaranteed to stay nonzero. Methods for dealing with this issue exist but can also increase the overhead and complexity of reference counting — on the other hand, these methods need only be applied to data that might form cycles, often a small subset of all data. One such method is the use of weak reference
    Weak reference

    In computer programming, a weak reference is a Reference that does not protect the referent object from collection by a garbage collection . An object referenced only by weak references is considered Unreachable memory and so may be collected at any time....
    s.


Graph interpretation

When dealing with garbage collection schemes, it's often helpful to think of the reference graph, which is a directed graph
Directed graph

A directed graph or digraph is a pair G= of:* a Set V, whose element are called vertices or nodes,* a set A of ordered pairs of vertices, called arcs, directed edges, or arrows....
 where the vertices
Vertex (graph theory)

In graph theory, a vertex or node is the fundamental unit out of which graphs are formed: an undirected graph consists of a set of vertices and a set of edges , while a directed graph consists of a set of vertices and a set of arcs ....
 are objects and there is an edge from an object A to an object B if A holds a reference to B. We also have a special vertex or vertices representing the local variables and references held by the runtime system, and no edges ever go to these nodes, although edges can go from them to other nodes.

In this context, the simple reference count of an object is the in-degree of its vertex. Deleting a vertex is like collecting an object. It can only be done when the vertex has no incoming edges, so it does not affect the out-degree of any other vertices, but it can affect the in-degree of other vertices, causing their corresponding objects to be collected as well.

The connected component containing the special vertex contains the objects that can't be collected, while other connected components of the graph only contain garbage. By the nature of reference counting, each of these garbage components must contain at least one cycle.

Dealing with inefficiency of updates

Incrementing and decrementing reference counts every time a reference is created or destroyed can significantly impede performance. Not only do the operations take time, but they damage cache
CPU cache

A CPU cache is a cache used by the central processing unit of a computer to reduce the average time to access computer storage. The cache is a smaller, faster memory which stores copies of the data from the most frequently used main memory locations....
 performance and can lead to pipeline bubbles. Even read-only operations like calculating the length of a list require a large number of reads and writes for reference updates with naive reference counting.

One simple technique is for the compiler to combine a number of nearby reference updates into one. This is especially effective for references which are created and quickly destroyed. Care must be taken, however, to put the combined update at the right position so that a premature free is avoided.

The Deutsch-Bobrow method of reference counting capitalizes on the fact that most reference count updates are in fact generated by references stored in local variables. It ignores these references, only counting references in data structures, but before an object with reference count zero can be deleted, the system must verify with a scan of the stack and registers that no other reference to it still exists.

Another technique devised by Henry Baker
Henry Baker (computer scientist)

Henry G. Baker is a computer scientist who has made contributions in Garbage collection , functional programming languages, and linear logic. He was also one of the founders of Symbolics....
 involves deferred increments, in which references which are stored in local variables do not immediately increment the corresponding reference count, but instead defer this until it is necessary. If such a reference is destroyed quickly, then there is no need to update the counter. This eliminates a large number of updates associated with short-lived references. However, if such a reference is copied into a data structure, then the deferred increment must be performed at that time. It is also critical to perform the deferred increment before the object's count drops to zero, resulting in a premature free.

A dramatic decrease in the overhead on counter updates was obtained by and . Their update coalescing method eliminates more than 99% of the counter updates for typical Java benchmarks. In addition, they eliminate the need for atomic operations during pointer updates on parallel processors. Finally, they present an enhanced algorithm that may run concurrently with multithreaded applications employing only fine synchronization. See the for more.

Blackburn and McKinley's ulterior reference counting combines deferred reference counting with a copying nursery, observing that the majority of pointer mutations occur in young objects. This algorithm achieves throughput comparable with the fastest generational copying collectors with the low bounded pause times of reference counting.

More work on improving performance of reference counting collectors can be found in . In particular, he advocates the use of and .

Dealing with reference cycles

There are a variety of ways of handling the problem of detecting and collecting reference cycles. One is that a system may explicitly forbid reference cycles. In some systems like filesystems this is a common solution. Cycles are also sometimes ignored in systems with short lives and a small amount of cyclic garbage, particularly when the system was developed using a methodology of avoiding cyclic data structures wherever possible, typically at the expense of efficiency.

Another solution is to periodically use a tracing garbage collector to reclaim cycles. Since cycles typically constitute a relatively small amount of reclaimed space, the collection cycles can be spaced much farther apart than with an ordinary tracing garbage collector.

Bacon describes a cycle-collection algorithm for reference counting systems with some similarities to tracing systems, including the same theoretical time bounds, but that takes advantage of reference count information to run much more quickly and with less cache damage. It's based on the observation that an object cannot appear in a cycle until its reference count is decremented to a nonzero value. All objects which this occurs to are put on a roots list, and then periodically the program searches through the objects reachable from the roots for cycles. It knows it has found a cycle when decrementing all the reference counts on a cycle of references brings them all down to zero. An enhanced version of this algorithm by Paz et al. is able to run concurrently with other operations and improve its efficiency by using the update coalescing method of Levanoni and Petrank. See the for more.

Variants of reference counting

Although it's possible to augment simple reference counts in a variety of ways, often a better solution can be found by performing reference counting in a fundamentally different way. Here we describe some of the variants on reference counting and their benefits and drawbacks.

Weighted reference counting

In weighted reference counting, we assign each reference a weight, and each object tracks not the number of references referring to it, but the total weight of the references referring to it. The initial reference to a newly-created object has a large weight, such as 216. Whenever this reference is copied, half of the weight goes to the new reference, and half of the weight stays with the old reference. Because the total weight does not change, the object's reference count does not need to be updated.

Destroying a reference decrements the total weight by the weight of that reference. When the total weight becomes equal to the partial weight, all references have been destroyed. If an attempt is made to copy a reference with a weight of 1, we have to "get more weight" by adding to the total weight and then adding this new weight to our reference, and then split it.

The property of not needing to access a reference count when a reference is copied is particularly helpful when the object's reference count is expensive to access, for example because it is in another process, on disk, or even across a network. It can also help increase concurrency by avoiding many threads locking a reference count to increase it. Thus, weighted reference counting is most useful in parallel, multiprocess, database, or distributed applications.

The primary problem with simple weighted reference counting is that destroying a reference still requires accessing the reference count, and if many references are destroyed this can cause the same bottlenecks we seek to avoid. Some adaptations of weighted reference counting seek to avoid this by attempting to give weight back from a dying reference to one which is still active.

Weighted reference counting was independently devised by Bevan, in the paper Distributed garbage collection using reference counting, and Watson, in the paper An efficient garbage collection scheme for parallel computer architectures, both in 1987.

Indirect reference counting

In indirect reference counting, it is necessary to keep track of who the reference was obtained from. This means that two references are kept to the object: a direct one which is used for invocations; and an indirect one which forms part of a diffusion tree, such as in the Dijkstra-Scholten algorithm
Dijkstra-Scholten Algorithm

The Dijkstra-Scholten algorithm is an algorithm for detecting Termination analysis in a distributed system. The algorithm was proposed by Dijkstra and Scholten in 1980....
, which allows a garbage collector to identify dead objects. This approach prevents an object from being discarded prematurely.

Examples of use


COM

Microsoft's Component Object Model
Component Object Model

Component Object Model is an interface standard for software componentry introduced by Microsoft in 1993. It is used to enable interprocess communication and dynamic object creation in a large range of programming languages....
 (COM) makes pervasive use of reference counting. In fact, the three methods that all COM objects must provide (in the IUnknown
IUnknown

In programming, the IUnknown interface is the fundamental interface in the Component Object Model . The published mandates that COM objects must minimally implement this interface....
 interface) all increment or decrement the reference count. Much of the Windows Shell
Windows Shell

The Microsoft Windows Shell is the main GUI in Windows. The Windows shell includes well-known Windows components such as the Taskbar and the Start menu....
 and many Windows applications (including MS Internet Explorer, MS Office
Microsoft Office

Microsoft Office is a popular set of interrelated desktop applications, servers and services. Microsoft Office is collectively referred to as an office suite, for the Microsoft Windows and Mac OS X operating systems....
, and countless third-party products) are built on COM, demonstrating the viability of reference counting in large-scale systems.

One primary motivation for reference counting in COM is to enable interoperability across different programming languages and runtime systems. A client need only know how to invoke object methods in order to manage object life cycle; thus, the client is completely abstracted from whatever memory allocator the implementation of the COM object uses. As a typical example, a Visual Basic
Visual Basic

'Visual Basic' is the third-generation programming language event-driven programming and integrated integrated development environment from Microsoft for its Component Object Model programming model....
 program using a COM object is agnostic towards whether that object was allocated (and must later be deallocated) by a C++ allocator or another Visual Basic component.

However, this support for heterogeneity has a major cost: it requires correct reference count management by all parties involved. While high-level languages like Visual Basic manage reference counts automatically, C/C++ programmers are entrusted to increment and decrement reference counts at the appropriate time. C++ programs can and should avoid the task of managing reference counts manually by using smart pointer
Smart pointer

In computer science, a smart pointer is an abstract data type that simulates a pointer while providing additional features, such as garbage collection or bounds checking....
s. Bugs caused by incorrect reference counting in COM systems are notoriously hard to resolve, especially because the error may occur in an opaque, third-party component.

Microsoft has abandoned reference counting in favor of tracing garbage collection for the .NET Framework
.NET Framework

The Microsoft .NET Framework is a software framework that is available with several Microsoft Windows operating systems. It includes a large Library of coded solutions to prevent common programming problems and a virtual machine that manages the execution of programs written specifically for the Software framework....
.

Cocoa

Apple's Cocoa
Cocoa (API)

Cocoa is one of Apple Inc.'s native object-oriented application program environment for the Mac OS X operating system. It is one of four major Application programming interfaces available for Mac OS X; the others are Carbon , POSIX , and Java platform....
 framework (and related frameworks, such as Core Foundation
Core Foundation

Core Foundation is a C application programming interface in Mac OS X, and is a mix of low-level routines and wrapper functions. Most of it is available in an open source project called CF-Lite that can be used to write cross-platform applications for Mac OS X, Linux, and Microsoft Windows ....
) use manual reference counting, much like COM
Component Object Model

Component Object Model is an interface standard for software componentry introduced by Microsoft in 1993. It is used to enable interprocess communication and dynamic object creation in a large range of programming languages....
. However, as of Mac OS X v10.5
Mac OS X v10.5

Mac OS X version 10.5 "Leopard" is the sixth Software version of Mac OS X, Apple Inc. desktop and server operating system for Apple Macintosh computers, and the successor to Mac OS X v10.4 "Tiger"....
, Cocoa when used with Objective C 2.0 also has automatic garbage collection.

Delphi

One language that uses reference counting for garbage collection is Delphi. Delphi is not a completely garbage collected language, in that user-defined types must still be manually allocated and deallocated. It does provide automatic collection, however, for a few built-in types, such as strings, dynamic arrays, and interfaces, for ease of use and to simplify the generic database functionality. It is important to note that it is up to the programmer to decide whether to use the built-in types or not; Delphi programmers have complete access to low-level memory management like in C/C++. So all potential cost of Delphi's reference counting can, if desired, be easily circumvented.

Some of the reasons reference counting may have been preferred to other forms of garbage collection in Delphi include:

  • The general benefits of reference counting, such as prompt collection.
  • Cycles either cannot occur or do not occur in practice using only the small set of garbage-collected built-in types.
  • The overhead in code size required for reference counting is very small (typically a single LOCK INC or LOCK DEC instruction, which ensures atomicity in any environment), and no separate thread of control is needed for collection as would be needed for a tracing garbage collector.
  • Many instances of the most commonly used garbage-collected type, the string, have a short lifetime, since they are typically intermediate values in string manipulation.
  • The reference count of a string is checked before mutating a string. This allows reference count 1 strings to be mutated directly whilst higher reference count strings are copied before mutation. This allows the general behaviour of old style pascal strings to be preserved whilst eliminating the cost of copying the string on every assignment.
  • Because garbage-collection is only done on built-in types, reference counting can be efficiently integrated into the library routines used to manipulate each datatype, keeping the overhead needed for updating of reference counts low.


GObject

The GObject
GObject

The GLib Object System, or GObject, is a free software software library that provides a portable object system and transparent cross-language interoperability....
 object-oriented programming framework implements reference counting on its base types, including weak reference
Weak reference

In computer programming, a weak reference is a Reference that does not protect the referent object from collection by a garbage collection . An object referenced only by weak references is considered Unreachable memory and so may be collected at any time....
s. Reference incrementing and decrementing uses atomic operation
Atomic operation

An atomic operation in computer science refers to a set of Instruction s that can be combined so that they appear to the rest of the system to be a single operation with only two possible outcomes: success or failure....
s for thread safety. A significant amount of the work in writing bindings to GObject from high-level languages lies in adapting GObject reference counting to work with the language's own memory management system.

PHP

PHP
PHP

PHP is a scripting language originally designed for producing dynamic web pages. It has evolved to include a command line interface capability and can be used in Standalone software Graphical user interface....
 uses a reference counting mechanism for its internal variable management. Since PHP 5.3 it implements the algorithm from Bacon's above mentioned paper. PHP allows you turn on and off the cycle collection with user-level functions. It also allows you to manually force the purging mechanism to be run.

Python

Python
Python (programming language)

Python is a general-purpose high-level programming language. Its design philosophy emphasizes code readability. Python's core syntax and semantics are Minimalism , while the standard library is large and comprehensive....
 also uses reference counting and offers cycle detection as well. See .

Squirrel

Squirrel
Squirrel programming language

Squirrel is a high level Imperative programming/Object oriented language programming language, designed to be a light-weight scripting language that fits in the size, memory bandwidth, and real-time requirements of applications like video games....
 also uses reference counting and offers cycle detection as well. This tiny language is relatively unknown outside the video game industry; however, it is a concrete example of how reference counting can be practical and efficient (especially in realtime environments).

External links