Pointer swizzling
Encyclopedia
In computer science
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

, pointer swizzling is the conversion of references based on name or position to direct pointer references. It is typically performed during the deserialization (loading) of a relocatable object from disk, such as an executable file or pointer-based data structure
Data structure
In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks...

. The reverse operation, replacing pointers with position-independent symbols or positions, is sometimes referred to as unswizzling, and is performed during serialization
Serialization
In computer science, in the context of data storage and transmission, serialization is the process of converting a data structure or object state into a format that can be stored and "resurrected" later in the same or another computer environment...

 (saving).

Example

For example, suppose we have the following linked list
Linked list
In computer science, a linked list is a data structure consisting of a group of nodes which together represent a sequence. Under the simplest form, each node is composed of a datum and a reference to the next node in the sequence; more complex variants add additional links...

 data structure:


struct node {
int data;
struct node *next;
};


We can easily create a linked list data structure in memory using such an object, but when we attempt to save it to disk we run into trouble. Directly saving the pointer values won't work on most architectures, because the next time we load it the memory positions the nodes now use may be in use by other data. One way of dealing with this is to assign a unique id number to each node and then unswizzle the pointers by turning them into a field indicating the id number of the next node:


struct node_saved {
int data;
int id_number;
int id_number_of_next_node;
};


We can save these records to disk in any order, and no information will be lost. Other options include saving the file offset of the next node or a number indicating its position in the sequence of saved records.

When we go to load these nodes, however, we quickly discover that attempting to find a node based on its number is cumbersome and inefficient. We'd like our original data structure back so we can simply follow next pointers to traverse the list. To do this, we perform pointer swizzling, finding the address of each node and turning the id_number_of_next_node fields back into direct pointers to the right node.

Methods of unswizzling

There are a potentially unlimited number of forms into which a pointer can be unswizzled, but some of the most popular include:
  • The offset of the pointed-to object in the file
  • The index of the pointed-to object in some sequence of records
  • A unique identifier possessed by the pointed-to object, such as a person's social security number
    Social Security number
    In the United States, a Social Security number is a nine-digit number issued to U.S. citizens, permanent residents, and temporary residents under section 205 of the Social Security Act, codified as . The number is issued to an individual by the Social Security Administration, an independent...

    ; in databases, all pointers are unswizzled in this manner (see foreign key
    Foreign key
    In the context of relational databases, a foreign key is a referential constraint between two tables.A foreign key is a field in a relational table that matches a candidate key of another table...

    )

Potential security weaknesses

For security, such methods must be implemented with a great deal of caution. In particular, an attacker's presentation of a specially crafted file may allow access to addresses outside of the expected and proper bounds. In systems with weak memory protection this can lead to exposure of confidential data or modification of code likely to be executed. If the system does not implement guards against execution of data the system may be severely compromised by the installation of various kinds of malware
Malware
Malware, short for malicious software, consists of programming that is designed to disrupt or deny operation, gather information that leads to loss of privacy or exploitation, or gain unauthorized access to system resources, or that otherwise exhibits abusive behavior...

.

Methods of protection include verifications prior to releasing the data to an application:
  • That an offset does not leave the bounds of the data read.
  • That a table of indexes and the records pointed to is similarly constrained.
  • That identifiers are both unique, and if sensitive, encrypted.
  • That all variable–length data is restrained to lengths not exceeding the actual allocation.
  • That allocations are of reasonable size
  • That allocations made that are not loaded with data read are cleared, or loaded with some specific pattern.


Methods of swizzling

Swizzling in the general case can be complicated. The reference graph
Graph (data structure)
In computer science, a graph is an abstract data structure that is meant to implement the graph and hypergraph concepts from mathematics.A graph data structure consists of a finite set of ordered pairs, called edges or arcs, of certain entities called nodes or vertices...

 of pointers might contain an arbitrary number of cycle
Cycle (graph theory)
In graph theory, the term cycle may refer to a closed path. If repeated vertices are allowed, it is more often called a closed walk. If the path is a simple path, with no repeated vertices or edges other than the starting and ending vertices, it may also be called a simple cycle, circuit, circle,...

s; this complicates maintaining a mapping from the old unswizzled values to the new addresses. Associative array
Associative array
In computer science, an associative array is an abstract data type composed of a collection of pairs, such that each possible key appears at most once in the collection....

s are useful for maintaining the mapping, while algorithms such as breadth-first search
Breadth-first search
In graph theory, breadth-first search is a graph search algorithm that begins at the root node and explores all the neighboring nodes...

 help to traverse the graph, although both of these require extra storage. Various serialization
Serialization
In computer science, in the context of data storage and transmission, serialization is the process of converting a data structure or object state into a format that can be stored and "resurrected" later in the same or another computer environment...

 libraries
Library (computer science)
In computer science, a library is a collection of resources used to develop software. These may include pre-written code and subroutines, classes, values or type specifications....

 provide general swizzling systems. In many cases, however, swizzling can be performed with simplifying assumptions, such as a tree
Tree (data structure)
In computer science, a tree is a widely-used data structure that emulates a hierarchical tree structure with a set of linked nodes.Mathematically, it is an ordered directed tree, more specifically an arborescence: an acyclic connected graph where each node has zero or more children nodes and at...

 or list structure of references.

The different types of swizzling are:
  • Automatic swizzling
  • On-demand swizzling
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK