All Topics  
Garbage collection (computer science)

 

   Email Print
   Bookmark   Link






 

Garbage collection (computer science)



 
 
In computer science
Computer science

Computer science is the study of the theoretical foundations of information and computation, and of practical techniques for their implementation and application in computer systems....
, garbage collection (GC) is a form of automatic memory management
Memory management

Memory management is the act of managing computer memory. In its simpler forms, this involves providing ways to allocate portions of memory to programs at their request, and freeing it for reuse when no longer needed....
. The garbage collector, or just collector, attempts to reclaim garbage
Garbage (computer science)

Garbage, in the context of computer science, refers to object s, data, or other regions of the memory of a computer system , which will not be used in any future computation by the system, or by a program running on it....
, or memory used by objects
Object (computer science)

In its simplest embodiment, an object is an allocated region of storage. Since programming languages use variable#Computer_programmings to access objects, the terms object and variable are often used interchangeably....
 that will never be accessed or mutated again by the application
Application software

Application software is any tool that functions and is operated by means of a computer, with the purpose of supporting or improving the software user 's work....
. Garbage collection was invented by John McCarthy
John McCarthy (computer scientist)

John McCarthy , is an United States computer scientist and cognitive scientist who received the Turing Award in 1971 for his major contributions to the field of Artificial Intelligence ....
 around 1959 to solve the problems of manual memory management in Lisp.

Garbage collection is often portrayed as the opposite of manual memory management
Manual memory management

In computer science, manual memory management refers to the usage of manual instructions by the programmer in order to identify and deallocate unused objects, or garbage ....
, which requires the programmer to specify which objects to deallocate and return to the memory system.






Discussion
Ask a question about 'Garbage collection (computer science)'
Start a new discussion about 'Garbage collection (computer science)'
Answer questions from other users
Full Discussion Forum



Encyclopedia


In computer science
Computer science

Computer science is the study of the theoretical foundations of information and computation, and of practical techniques for their implementation and application in computer systems....
, garbage collection (GC) is a form of automatic memory management
Memory management

Memory management is the act of managing computer memory. In its simpler forms, this involves providing ways to allocate portions of memory to programs at their request, and freeing it for reuse when no longer needed....
. The garbage collector, or just collector, attempts to reclaim garbage
Garbage (computer science)

Garbage, in the context of computer science, refers to object s, data, or other regions of the memory of a computer system , which will not be used in any future computation by the system, or by a program running on it....
, or memory used by objects
Object (computer science)

In its simplest embodiment, an object is an allocated region of storage. Since programming languages use variable#Computer_programmings to access objects, the terms object and variable are often used interchangeably....
 that will never be accessed or mutated again by the application
Application software

Application software is any tool that functions and is operated by means of a computer, with the purpose of supporting or improving the software user 's work....
. Garbage collection was invented by John McCarthy
John McCarthy (computer scientist)

John McCarthy , is an United States computer scientist and cognitive scientist who received the Turing Award in 1971 for his major contributions to the field of Artificial Intelligence ....
 around 1959 to solve the problems of manual memory management in Lisp.

Garbage collection is often portrayed as the opposite of manual memory management
Manual memory management

In computer science, manual memory management refers to the usage of manual instructions by the programmer in order to identify and deallocate unused objects, or garbage ....
, which requires the programmer to specify which objects to deallocate and return to the memory system. However, many systems use a combination of the two approaches, and there are other techniques being studied (such as region inference
Region inference

Region inference is a memory management method for computer programming. It is an alternative to manual memory management and Garbage collection ....
) to solve the same fundamental problem.

Description

The basic principles of garbage collection are:
  1. Find data objects in a program that cannot be accessed in the future
  2. Reclaim the resources used by those objects


By making manual memory deallocation unnecessary (and often forbidding it), garbage collection frees the programmer from having to worry about releasing objects that are no longer needed, which can otherwise consume a significant amount of design effort. It also aids programmers in their efforts to make programs more stable, because it prevents several classes of runtime errors. For example, it prevents dangling pointer
Dangling pointer

Dangling pointers and wild pointers in computer programming are data pointer that do not point to a valid object of the appropriate type. Dangling pointers arise when an object is deleted or deallocated, without modifying the value of the pointer, so that the pointer still points to the memory location of the deallocated memory....
 errors, where a reference to a deallocated object is used. (The pointer still points to the location in memory where the object or data was, even though the object or data has since been deleted and the memory may now be used for other purposes, creating a dangling pointer. This can, and often does, lead to storage violation
Storage violation

A storage violation occurs when a task modifies computer storage that it does not own....
 errors that are extremely difficult to detect) Many computer language
Programming language

A programming language is a machine-readable artificial language designed to express computations that can be performed by a machine, particularly a computer....
s require garbage collection, either as part of the language specification (e.g. Java
Java (programming language)

Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java ....
, C#, and most scripting languages) or effectively for practical implementation (e.g. formal languages like lambda calculus
Lambda calculus

In mathematical logic and computer science, lambda calculus, also written as ?-calculus, is a formal system designed to investigate function definition, function application and recursion....
); these are said to be garbage collected languages. Other languages were designed for use with manual memory management, but have garbage collected implementations (e.g., C
C (programming language)

C is a general-purpose computer programming language originally developed in 1972 by Dennis Ritchie at the Bell Telephone Laboratories to implement the Unix operating system....
, C++
C++

C++ is a general-purpose programming language. It is regarded as a middle-level language, as it comprises a combination of both high-level programming language and low-level programming language language features....
). Delphi
Object Pascal

Object Pascal refers to a branch of Object-oriented programming derivatives of Pascal , mostly known as the primary programming language of CodeGear Delphi....
 versions support garbage collected dynamic arrays, long strings, variants and interfaces. Some languages, like Ada
Ada (programming language)

Ada is a structured programming, statically typed, Imperative programming, and Object-oriented programming high-level language computer programming programming language, extended from Pascal and other languages....
 and Modula-3
Modula-3

In Computer science, Modula-3 is a programming language conceived as a successor to an upgraded version of Modula-2. While it has been influential in research circles it has not been adopted widely in industry....
, allow both garbage collection and manual memory management
Manual memory management

In computer science, manual memory management refers to the usage of manual instructions by the programmer in order to identify and deallocate unused objects, or garbage ....
 to co-exist in the same application by using separate heaps for collected and manually managed objects, or yet others like D
D (programming language)

The D programming language, also known simply as D, is an Object-oriented programming, Imperative programming, Multi-paradigm programming language system programming language by Walter Bright of Digital Mars....
, which is garbage collected but allows the user to manually delete objects and also entirely disable garbage collection when speed is required. In any case, it is far easier to implement garbage collection as part of the language's compiler
Compiler

A compiler is a computer program that transforms source code written in a programming language into another computer language . The most common reason for wanting to transform source code is to create an executable program....
 and runtime
Runtime

In computer science, runtime or run time describes the operation of a computer program, the duration of its execution, from beginning to termination ....
 system, but post hoc GC systems exist, including ones that do not require recompilation. The garbage collector will almost always be closely integrated with the memory allocator
Dynamic memory allocation

In computer science, dynamic memory allocation is the allocation of computer storage storage for use in a computer program during the runtime of that program....
.

Benefits


Garbage collection frees the programmer from manually dealing with memory allocation and deallocation. As a result, certain categories of bugs are eliminated or substantially reduced:

  • Dangling pointer bugs, which occur when a piece of memory is freed while there are still pointers to it, and one of those pointers is used.
  • Double free bugs, which occur when the program attempts to free a region of memory that is already free.
  • Certain kinds of memory leak
    Memory leak

    In computer science, a memory leak is a particular type of unintentional memory consumption by a computer program where the program fails to release dynamic memory when no longer needed....
    s
    , in which a program fails to free memory that is no longer referenced by any variable, leading, over time, to memory exhaustion.


Researchers draw a distinction between "physical" and "logical" memory leaks. In a physical memory leak, the last pointer to a region of allocated memory is removed, but the memory is not freed. In a logical memory leak, a region of memory is still referenced by a pointer, but is never actually used. Garbage collectors generally can do nothing about logical memory leaks. Novice programmers sometimes believe that garbage collection makes memory leaks impossible, not realizing that logical leaks are still possible.

In languages that provide dynamic allocation, garbage collection is crucial to memory safety
Memory safety

Memory safety is a concern in software development that aims to avoid software bugs that cause Vulnerability dealing with RAM memory access, such as Buffer overflow and Dangling pointer....
 and often to the associated property of type safety
Type safety

In computer science, type safety is a property of some programming languages that is defined differently by different communities, but most definitions involve the use of a type system to prevent certain erroneous or undesirable program behavior ....
. This in turn improves the security
Computer security

Computer security is a branch of technology known as information security as applied to computers. The objective of computer security can include protection of information from theft or corruption, or the preservation of availability, as defined in the security policy....
 of the language by preventing a wide class of security vulnerabilities
Vulnerability (computing)

In computer security, the term vulnerability is applied to a weakness in a system which allows an attacker to violate the integrity of that system....
 based on over-writing memory in unexpected ways.

Tracing garbage collectors

Tracing garbage collectors are the most common type of garbage collector. They first determine which objects are reachable (or potentially reachable), and then discard all remaining objects.

Reachability of an object

Informally, a reachable object can be defined as an object for which there exists some variable in the program environment that leads to it, either directly or through references from other reachable objects. More precisely, objects can be reachable in only two ways:
  1. A distinguished set of objects are assumed to be reachable—these are known as the roots. Typically, these include all the objects referenced from anywhere in the call stack
    Call stack

    In computer science, a call stack is a dynamic Stack data structure that stores information about the active subroutines of a computer program....
     (that is, all local variables and parameters in the functions currently being invoked), and any global variables.
  2. Anything referenced from a reachable object is itself reachable; more formally, reachability is a transitive closure
    Transitive closure

    In mathematics, the transitive closure of a binary relation R on a Set X is the smallest transitive relation on X that contains R....
    .


The reachability definition of "garbage" is not optimal, insofar as the last time a program uses an object could be long before that object falls out of the environment scope. A distinction is sometimes drawn between syntactic garbage, those objects the program cannot possibly reach, and semantic garbage, those objects the program will in fact never again use. For example: Object x = new Foo; Object y = new Bar; x = new Quux; /* at this point, we know that the Foo object will * never be accessed - it is syntactic garbage */

if(x.check_something) System.exit(0); /* in the above block, y *could* be semantic garbage, * but we won't know until x.check_something returns * some value - if it returns at all */ The problem of precisely identifying semantic garbage can easily be shown to be partially decidable
Decision problem

In computability theory and computational complexity theory, a decision problem is a question in some formal system with a yes-or-no answer, depending on the values of some input parameters....
: a program that allocates an object X, runs an arbitrary input program P, and uses X if and only if P finishes would require a semantic garbage collector to solve the halting problem
Halting problem

In computability theory , the halting problem is a decision problem which can be stated as follows: given a description of a computer program and a finite input, decide whether the program finishes running or will run forever, given that input....
. Although conservative heuristic methods for semantic garbage detection remain an active research area, essentially all practical garbage collectors focus on syntactic garbage.

Another complication with this approach is that, in languages with both reference type
Reference type

In Object-oriented programming, a reference type is a data type that can only be accessed by Reference . Unlike objects of value types, objects of reference types cannot be directly embedded into composite type and are always Dynamic memory allocation....
s and unboxed value type
Value type

In object-oriented programming, a value type is a data type that can exist outside dynamic memory allocation. Unlike reference types, value types can be directly embedded into composite type....
s, the garbage collector needs to somehow be able to distinguish which variables on the stack or fields in an object are regular values and which are references; since in memory, an integer and a reference might look alike. This often runs into problems with generic programming
Generic programming

Generic programming is a style of computer programming in which algorithms are written in terms of to-be-specified-later types that are then instantiated when needed for specific types provided as parameters and was pioneered by Ada which appeared in 1983....
, where you may want to have a generic container class be able to hold various types of elements (e.g. the programmers want to have a list that can hold integers or references, or a mix of them). The garbage collector then needs to know whether to treat the element as a reference and follow it, or whether it is a primitive value. One common solution is the use of Tagged pointer
Tagged pointer

In computer science, a tagged pointer is a common example of a tagged union, where the primary type of data to be stored in the union is a pointer....
s.

Basic algorithm

Tracing collectors are so called because they trace through the working set of memory. These garbage collectors perform collection in cycles. A cycle is started when the collector decides (or is notified) that it needs to reclaim memory, which happens most often when the system is low on memory. The original method involves a naive mark-and-sweep in which the entire memory set is touched several times.

Naive mark-and-sweep
In the naive mark-and-sweep method, each object in memory has a flag (typically a single bit) reserved for garbage collection use only. This flag is always cleared (counter-intuitively), except during the collection cycle. The first stage of collection sweeps the entire 'root set', marking each accessible object as being 'in-use'. All objects transitively accessible from the root set are marked, as well. Finally, each object in memory is again examined; those with the in-use flag still cleared are not reachable by any program or data, and their memory is freed. (For objects which are marked in-use, the in-use flag is cleared again, preparing for the next cycle.)

This method has several disadvantages, the most notable being that the entire system must be suspended during collection; no mutation of the working set can be allowed. This will cause programs to 'freeze' periodically (and generally unpredictably), making real-time and time-critical applications impossible. In addition, the entire working memory must be examined, much of it twice, potentially causing problems in paged memory systems.

Tri-colour marking
Because of these pitfalls, most modern tracing garbage collectors implement some variant of the tri-colour marking abstraction
Abstraction (computer science)

In computer science, abstraction is a mechanism and practice to reduce and factor out details so that one can focus on a few concepts at a time....
, but simple collectors (such as the mark-and-sweep collector) often do not make this abstraction explicit. Tri-colour marking works as follows:

  1. Create initial white, grey, and black sets; these sets will be used to maintain progress during the cycle. Initially the white set or condemned set is the set of objects that are candidates for having their memory recycled. The black set is the set of objects that cheaply can be proven to have no references to objects in the white set; in many implementations the black set starts off empty. The grey set is all the remaining objects that may or may not have references to objects in the white set (and elsewhere). These sets partition
    Partition of a set

    In mathematics, a partition of a Set X is a division of X into non-overlapping "parts" or "blocks" or "cells" that cover all of X....
     memory; every object in the system, including the root set, is in precisely one set.
  2. Pick an object from the grey set. Blacken this object (move it to the black set), by greying all the white objects it references directly.
  3. Repeat the previous step until the grey set is empty.
  4. When there are no more objects in the grey set, then all the objects remaining in the white set are provably not reachable and the storage occupied by them can be reclaimed.


The tri-colour marking algorithm preserves an important invariant:
No black object points directly to a white object.
This ensures that the white objects can be safely destroyed once the grey set is empty. (Some variations on the algorithm do not preserve the tricolour invariant but they use a modified form for which all the important properties hold.)

The tri-colour method has an important advantage: it can be performed 'on-the-fly', without halting the system for significant time periods. This is accomplished by marking objects as they are allocated and during mutation, maintaining the various sets. By monitoring the size of the sets, the system can perform garbage collection periodically, rather than as-needed. Also, the need to touch the entire working set each cycle is avoided.

Implementation strategies

In order to implement the basic tri-colour algorithm, several important design decisions must be made, which can significantly affect the performance characteristics of the garbage collector.

Moving vs. non-moving
Once the unreachable set has been determined, the garbage collector may simply release the unreachable objects and leave everything else as it is, or it may copy some or all of the reachable objects into a new area of memory, updating all references to those objects as needed. These are called "non-moving" and "moving" garbage collectors, respectively.

At first, a moving GC strategy may seem inefficient and costly compared to the non-moving approach, since much more work would appear to be required on each cycle. In fact, however, the moving GC strategy leads to several performance advantages, both during the garbage collection cycle itself and during actual program execution:
  • No additional work is required to reclaim the space freed by dead objects; the entire region of memory from which reachable objects were moved can be considered free space. In contrast, a non-moving GC must visit each unreachable object and somehow record that the memory it alone occupied is available.
  • Similarly, new objects can be allocated very quickly. Since large contiguous regions of memory are usually made available by the moving GC strategy, new objects can be allocated by simply incrementing a 'free memory' pointer. A non-moving strategy may, after some time, lead to a heavily fragmented
    Fragmentation (computer)

    In computer storage, fragmentation is a phenomenon in which storage space is used inefficiently, reducing storage capacity. The term is also used to denote the wasted space itself....
     heap, requiring expensive consultation of "free lists" of small available blocks of memory in order to allocate new objects.
  • If an appropriate traversal order is used, objects that refer to each other frequently can be moved very close to each other in memory, increasing the likelihood that they will be located in the same cache line or virtual memory
    Virtual memory

    Virtual memory is a computer system technique which gives an application program the impression that it has contiguous working memory , while in fact it may be physically fragmented and may even overflow on to disk storage....
     page. This can significantly speed up access to these objects through these references.


One disadvantage of a moving garbage collector is that it only allows access through references that are managed by the garbage collected environment, and does not allow pointer arithmetic. This is because any native pointers to objects will be invalidated when the garbage collector moves the object (they become dangling pointer
Dangling pointer

Dangling pointers and wild pointers in computer programming are data pointer that do not point to a valid object of the appropriate type. Dangling pointers arise when an object is deleted or deallocated, without modifying the value of the pointer, so that the pointer still points to the memory location of the deallocated memory....
s). For interoperability
Interoperability

Interoperability is a property referring to the ability of diverse systems and organizations to work together . The term is often used in a technical systems engineering sense, or alternatively in a broad sense, taking into account social, political, and organizational factors that impact system to system performance....
 with native code, the garbage collector must copy the object contents to a location outside of the garbage collected region of memory. An alternative approach is to pin the object in memory, preventing the garbage collector from moving it and allowing the memory to be directly shared with native pointers (and possibly allowing pointer arithmetic).

Copying vs. mark-and-sweep vs. mark-and-don't-sweep
To further refine the distinction, tracing collectors can also be divided by considering how the three sets of objects (white, grey, and black) are maintained during a collection cycle.

The most straightforward approach is the semi-space collector, which dates to 1969. In this moving GC scheme, memory is partitioned into a "from space" and "to space". Initially, objects are allocated into "to space" until they become full and a collection is triggered. At the start of a collection, the "to space" becomes the "from space", and vice versa. The objects reachable from the root set are copied from the "from space" to the "to space". These objects are scanned in turn, and all objects that they point to are copied to "to space", until all reachable objects have been copied to "to space". Once the program continues execution, new objects are once again allocated from the "to space" until it is once again full and the process is repeated. This approach has the advantage of conceptual simplicity (the three object color sets are implicitly constructed during the copying process), but the disadvantage that a (possibly) very large contiguous region of free memory is necessarily required on every collection cycle. This technique is also known as stop-and-copy. Cheney's algorithm
Cheney's algorithm

Cheney's algorithm, first described in a 1970 Association for Computing Machinery paper by C.J. Cheney, is a method of garbage collection in computer software systems....
 is an improvement on the semi-space collector.

A mark and sweep garbage collector maintains a bit (or two) with each object to record whether it is white or black; the grey set is either maintained as a separate list (such as the process stack) or using another bit. As the reference tree is traversed during a collection cycle (the "mark" phase), these bits are manipulated by the collector to reflect the current state. A final "sweep" of the memory areas then frees white objects. The mark and sweep strategy has the advantage that, once the unreachable set is determined, either a moving or non-moving collection strategy can be pursued; this choice of strategy can even be made at runtime, as available memory permits. It has the disadvantage of "bloating" objects by a small amount.

A mark and don't sweep garbage collector, like the mark-and-sweep, maintains a bit with each object to record whether it is white or black; the gray set is either maintained as a separate list (such as the process stack) or using another bit. There are two key differences here. First, black and white mean different things than they do in the mark and sweep collector. In a "mark and don't sweep" system, all reachable objects are always black. An object is marked black at the time it is allocated, and it will stay black even if it becomes unreachable. A white object is unused memory and may be allocated. Second, the interpretation of the black/white bit can change. Initially, the black/white bit may have the sense of (0=white, 1=black). If an allocation operation ever fails to find any available (white) memory, that means all objects are marked used (black). The sense of the black/white bit is then inverted (for example, 0=black, 1=white). Everything becomes white. This momentarily breaks the invariant that reachable objects are black, but a full marking phase follows immediately, to mark them black again. Once this is done, all unreachable memory is white. No "sweep" phase is necessary.

Generational GC (aka Ephemeral GC)
It has been empirically observed that in many programs, the most recently created objects are also those most likely to become unreachable quickly (known as infant mortality or the generational hypothesis). A generational GC divides objects into generations and, on most cycles, will place only the objects of a subset of generations into the initial white (condemned) set. Furthermore, the runtime system maintains knowledge of when references cross generations by observing the creation and overwriting of references. When the garbage collector runs, it may be able to use this knowledge to prove that some objects in the initial white set are unreachable without having to traverse the entire reference tree. If the generational hypothesis holds, this results in much faster collection cycles while still reclaiming most unreachable objects.

In order to implement this concept, many generational garbage collectors use separate memory regions for different ages of objects. When a region becomes full, those few objects that are referenced from older memory regions are promoted (copied) up to the next highest region, and the entire region can then be overwritten with fresh objects. This technique permits very fast incremental garbage collection, since the garbage collection of only one region at a time is all that is typically required.

Generational garbage collection is a heuristic
Heuristic (computer science)

In computer science, a heuristic algorithm, or simply a heuristic, is an algorithm that is able to produce an acceptable solution to a problem in many practical scenarios, but for which there is no formal proof of its correctness....
 approach, and some unreachable objects may not be reclaimed on each cycle. It may therefore occasionally be necessary to perform a full mark and sweep or copying garbage collection to reclaim all available space. In fact, runtime systems for modern programming languages (such as Java
Java (programming language)

Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java ....
 and the .NET Framework
.NET Framework

The Microsoft .NET Framework is a software framework that is available with several Microsoft Windows operating systems. It includes a large Library of coded solutions to prevent common programming problems and a virtual machine that manages the execution of programs written specifically for the Software framework....
) usually use some hybrid of the various strategies that have been described thus far; for example, most collection cycles might look only at a few generations, while occasionally a mark-and-sweep is performed, and even more rarely a full copying is performed to combat fragmentation. The terms "minor cycle" and "major cycle" are sometimes used to describe these different levels of collector aggression.

Stop-the-world vs. incremental vs. concurrent
Simple stop-the-world garbage collectors completely halt execution of the program to run a collection cycle, thus guaranteeing that new objects are not allocated and objects do not suddenly become unreachable while the collector is running. This has the obvious disadvantage that the program can perform no useful work while a collection cycle is running (sometimes called the "embarrassing pause").

Incremental garbage collectors are designed to reduce this disruption by interleaving their work with activity from the main program. Careful design is necessary to ensure that the main program does not interfere with the garbage collector and vice versa; for example, when the program needs to allocate a new object, the runtime system may either need to suspend it until the collection cycle is complete, or somehow notify the garbage collector that there exists a new, reachable object.

Precise vs. conservative and internal pointers
Some collectors running in a particular environment can correctly identify all pointers (references) in an object; these are called "precise" (also "exact" or "accurate") collectors, the opposite being a "conservative" or "partly conservative" collector. Conservative collectors have to assume that any bit pattern in memory could be a pointer if (when interpreted as a pointer) it would point into any allocated object. Thus, conservative collectors may have some false negatives, where storage is not released because of accidental fake pointers, but this is rarely a significant drawback in practice, unless the program has to handle data that is or seems to be random and could easily be a pointer. Naturally 64bit systems suffer less from this problem than 32bit systems, since the possibility for a 64bit pattern to form a valid pointer is obviously smaller than for a 32bit pattern as the domain of a 64bit value is much greater than a 32bit value. Thus it is less probable for a "random" 64bit value to point to a region on system's memory. Whether a precise collector is practical usually depends on the type safety properties of the programming language in question. An example for which a conservative garbage collector would be needed is the C programming language, which allows typed (non-void) pointers to be type cast into untyped (void) pointers, and vice versa.

A related issue concerns internal pointers, or pointers to fields within an object. If the semantics of a language allow internal pointers, then there may be many different addresses that can refer to the same object, which complicates determining whether an object is garbage or not. An example for this is the C++ programming language, in which multiple inheritance can cause pointers to base objects to have different addresses.

Performance implications

Tracing garbage collectors require some implicit runtime overhead
Computational overhead

In computer science, overhead is generally considered any combination of excess or indirect computation time, memory, bandwidth, or other resources that are required to attain a particular goal....
 that may be beyond the control of the programmer, and can sometimes lead to performance problems. For example, commonly used stop-the-world garbage collectors, which pause program execution at arbitrary times, may make garbage collection inappropriate for some embedded system
Embedded system

An embedded system is a special-purpose computer system designed to perform one or a few dedicated functions, often with real-time computing constraints....
s, high-performance server
Server (computing)

A server is a computer program that provides services to other computer programs , in the same or other computer. The physical computer that runs a server program is also often referred to as server....
 software, and applications with real-time
Real-time computing

In computer science, real-time computing is the study of Computer hardware and computer software systems that are subject to a "real-time constraint"?i.e., operational deadlines from event to system response....
 needs.

Manual heap allocation:
  • search for best/first-fit block of sufficient size
  • free list maintenance


Garbage collection:
  • locate reachable objects
  • copy reachable objects for moving collectors
  • read/write barriers for incremental collectors
  • search for best/first-fit block and free list maintenance for non-moving collectors


It is difficult to compare the two cases directly, as their behavior depends on the situation. For example, in the best case for a garbage collecting system, allocation just increments a pointer, but in the best case for manual heap allocation, the allocator maintains freelists of specific sizes and allocation only requires following a pointer. However, this size segregation usually cause a large degree of external fragmentation, which can have an adverse impact on cache behaviour. Memory allocation in a garbage collected language may be implemented using heap allocation behind the scenes (rather than simply incrementing a pointer), so the performance advantages listed above don't necessarily apply in this case. In some situations, most notably embedded systems, it is possible to avoid both garbage collection and heap management overhead by preallocating pools of memory and using a custom, lightweight scheme for allocation/deallocation.

The overhead of write barriers is more likely to be noticeable in an imperative
Imperative programming

In computer science, imperative programming is a programming paradigm that describes computation in terms of statement s that change a program state ....
-style program which frequently writes pointers into existing data structures than in a functional
Functional programming

In computer science, functional programming is a programming paradigm that treats computation as the evaluation of function s and avoids program state and immutable object data....
-style program which constructs data only once and never changes them.

Some advances in garbage collection can be understood as reactions to performance issues. Early collectors were stop-the-world collectors, but the performance of this approach was distracting in interactive applications. Incremental collection avoided this disruption, but at the cost of decreased efficiency due to the need for barriers. Generational collection techniques are used with both stop-the-world and incremental collectors to increase performance; the trade-off is that some garbage is not detected as such for longer than normal.

Reference counting


Reference counting is a form of automatic memory management where each object has a count of the number of references to it. An object's reference count is incremented when a reference to it is created, and decremented when a reference is destroyed. The object's memory is reclaimed when the count reaches zero.

There are two major disadvantages to reference counting:
  • If two or more objects refer to each other, they can create a cycle whereby neither will be collected as their mutual references never let their reference counts become zero. Some garbage collection systems (like the one in CPython
    CPython

    CPython is the default, most-widely used implementation of the Python . It is written in C . In addition to CPython, there are two other production-quality Python implementations: Jython, written in Java , and IronPython, which is written for the Common Language Runtime, as well as several experimental implementations....
    ) using reference counting use specific cycle-detecting algorithms to deal with this issue.
  • In naïve implementations, each assignment of a reference and each reference falling out of scope often require modifications of one or more reference counters. However, optimizations to this are described in the literature
    Reference counting

    In computer science, reference counting is a technique of storing the number of references, pointers, or handles to a resource such as an object or block of memory....
    . When used in a multithreaded environment, these modifications (increment and decrement) may need to be interlocked. This is usually a very expensive operation.


Escape analysis

Escape analysis
Escape analysis

In programming language compiler optimization theory, escape analysis is a method for determining the dynamic scope of pointers. It is related to pointer analysis and shape analysis ....
 can be used to convert heap allocations to stack allocations, thus reducing the amount of work needed to be done by the garbage collector.

Availability


Generally speaking, higher-level programming languages
High-level programming language

In computing, a high-level programming language is a programming language with strong Abstraction from the details of the computer. In comparison to low-level programming languages, it may use natural language elements, be easier to use, or more Porting across platforms....
 are more likely to have garbage collection as a standard feature. In languages that do not have built in garbage collection, it can often be added through a library, as with the Boehm garbage collector
Boehm garbage collector

In computer science, the Boehm-Demers-Weiser garbage collector, often simply known as Boehm GC, is a conservative Garbage collection for C and C++, which is used by many projects that are implemented in C or C++, as well as by runtime environments for a number of other languages, including the GNU Compiler for Java runtime environment...
 for C and C++. This approach is not without drawbacks, such as changing object creation and destruction mechanisms.

Most functional programming languages, such as ML
ML programming language

ML is a general-purpose functional programming language developed by Robin Milner and others in the late 1970s at the University of Edinburgh, whose syntax is inspired by ISWIM....
, Haskell
Haskell (programming language)

Haskell is a standardized, purely functional programming language with non-strict programming language, named after logician Haskell Curry. The goals of the language are described as:...
, and APL, have garbage collection built in. Lisp
Lisp programming language

Lisp is a family of computer programming languages with a long history and a distinctive, fully parenthesized syntax. Originally specified in 1958, Lisp is the second-oldest high-level programming language in widespread use today; only Fortran is older....
, which introduced functional programming
Functional programming

In computer science, functional programming is a programming paradigm that treats computation as the evaluation of function s and avoids program state and immutable object data....
, is especially notable for introducing this mechanism.

Other dynamic languages, such as Ruby
Ruby (programming language)

Ruby is a dynamic programming language, reflection , general purpose object-oriented programming language that combines syntax inspired by Perl with Smalltalk-like features....
 (but not Perl
Perl

In computer programming, Perl is a high-level programming language, List of programming languages by category, Interpreter , dynamic programming language....
, or PHP
PHP

PHP is a scripting language originally designed for producing dynamic web pages. It has evolved to include a command line interface capability and can be used in Standalone software Graphical user interface....
, which use reference counting), also tend to use GC. Object-oriented programming languages such as Smalltalk, Java
Java (programming language)

Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java ....
 and ECMAScript
ECMAScript

ECMAScript is a scripting language, standardized by Ecma International in the ECMA-262 Specification . The language is widely used on the World Wide Web, and is often confused with JavaScript or JScript, the two major Programming language dialect from which ECMAScript was standardized....
 usually provide integrated garbage collection, a notable exception being C++
C++

C++ is a general-purpose programming language. It is regarded as a middle-level language, as it comprises a combination of both high-level programming language and low-level programming language language features....
.

Historically, languages intended for beginners, such as BASIC
BASIC

In computer programming, BASIC is a family of high-level programming languages. The Dartmouth BASIC was designed in 1964 by John George Kemeny and Thomas Eugene Kurtz at Dartmouth College in New Hampshire, United States to provide computer access to non-science students....
 and Logo
Logo (programming language)

Logo is a computer programming language used for functional programming. It is an adaptation and dialect of the Lisp language; some have called it Lisp without the S-expression....
, have often used garbage collection for variable-length data types, such as strings and lists, so as not to burden programmers with manual memory management. On early microcomputers, with their limited memory and slow processors, garbage collection could often cause apparently random, inexplicable pauses in the midst of program operation.

Limited environments


Garbage collection is rarely used on embedded or real-time systems because of the perceived need for very tight control over the use of limited resources. However, garbage collectors compatible with such limited environments have been developed. The Microsoft .NET Micro Framework
.NET Micro Framework

The .NET Micro Framework is a Microsoft .NET platform for extremely resource-constrained devices. It includes a small version of the .NET common language runtime and supports development in C Sharp and debugging , both using Microsoft Visual Studio....
 and Java Platform, Micro Edition
Java Platform, Micro Edition

In computing, the Java Platform, Micro Edition or Java ME is a specification of a subset of the Java platform aimed at providing a certified collection of Java Application Programming Interfaces for the development of software for tiny, small and resource-constrained devices....
 are embedded software platforms that, like their larger cousins, include garbage collection.

See also

  • Cheney's algorithm
    Cheney's algorithm

    Cheney's algorithm, first described in a 1970 Association for Computing Machinery paper by C.J. Cheney, is a method of garbage collection in computer software systems....
  • Memory management
    Memory management

    Memory management is the act of managing computer memory. In its simpler forms, this involves providing ways to allocate portions of memory to programs at their request, and freeing it for reuse when no longer needed....
  • Reversible computing
    Reversible computing

    Reversible computing, sometimes called non-destructive computing includes any computational process that is reversible, i.e., time-invertible function, meaning that a time-reversed version of the process could exist within the same general dynamical system as the original process....
  • Memory leak
    Memory leak

    In computer science, a memory leak is a particular type of unintentional memory consumption by a computer program where the program fails to release dynamic memory when no longer needed....
  • Weak reference
    Weak reference

    In computer programming, a weak reference is a Reference that does not protect the referent object from collection by a garbage collection . An object referenced only by weak references is considered Unreachable memory and so may be collected at any time....


External links

  • Publications by the OOPS group at the University of Texas at Austin
    University of Texas at Austin

    The University of Texas at Austin is a public university research university located in Austin, Texas, Texas, United States, and is the flagship#University campuses institution of University of Texas System....
    : http://www.cs.utexas.edu/users/oops/papers.html
  • by Hans Boehm (the site also links to a number of articles on garbage collection in general)
  • has a comprehensive description of the GC used in Microsoft .Net Framework
  • by Edsger W. Dijkstra and Leslie Lamport
    Leslie Lamport

    Dr. Leslie Lamport is an United States computer science. A graduate of the Bronx High School of Science, he received a Bachelor's degree in mathematics from the Massachusetts Institute of Technology in 1960, and Master's degree and Doctor of Philosophy degrees in mathematics from Brandeis University, respectively in 1963 and 1972....
     and A.J.Martin and C.S.Scholten and E.F.M.Steffens
  • Richard Jones and Rafael Lins, Garbage Collection: Algorithms for Automatic Dynamic Memory Management, Wiley and Sons (1996), ISBN 0-471-94148-4
  • by H. Lieberman and C. Hewitt, MIT Artificial Intelligence Laboratory