Read-copy-update
In computer operating systems, read-copy-update (RCU) is a synchronization mechanism implementing a kind of mutual exclusion that can sometimes be used as an alternative to a readers-writer lock. It allows extremely low overhead, wait-free reads. However, RCU updates can be expensive, as they must leave the old versions of the data structure in place to accommodate pre-existing readers. These old versions are reclaimed after all pre-existing readers finish their accesses.

Note that RCU does not implement mutual exclusion in the conventional sense: RCU readers can and do run concurrently with RCU updates. RCU's variant of mutual exclusion is in space, with RCU readers accessing old versions of data being concurrently updated, rather than in time, as is the case for conventional concurrency-control mechanisms.

Overview

RCU features read-side critical sections, which are normally delimited by rcu_read_lock and rcu_read_unlock. Any statement that is not within an RCU read-side critical section is said to be in a quiescent state, and such statements are not permitted to hold references to RCU-protected data structures. Any time period during which each thread resides at least once in a quiescent state is called a grace period. By definition, any RCU read-side critical section in existence at the beginning of a given grace period must complete before the end of that grace period, which constitutes the fundamental guarantee provided by RCU. It turns out that this guarantee can be provided with extremely small read-side overheads; in fact, in the limiting case that is actually realized by server-class Linux-kernel builds, the read-side overhead is exactly zero.
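To make these definitions concrete, the following minimal sketch shows an RCU reader in Linux-kernel-style C, using the rcu_read_lock, rcu_read_unlock, and rcu_dereference primitives described later in this article. The structure struct foo, its fields, and the global pointer gbl_foo are hypothetical examples, not part of any kernel API.

#include <linux/rcupdate.h>   /* rcu_read_lock(), rcu_dereference(), ... */

/* Hypothetical RCU-protected structure and global pointer. */
struct foo {
    int a;
    int b;
};
static struct foo *gbl_foo;

int read_foo_a(void)
{
    struct foo *p;
    int ret = -1;

    rcu_read_lock();                /* begin read-side critical section */
    p = rcu_dereference(gbl_foo);   /* fetch the RCU-protected pointer  */
    if (p)
        ret = p->a;                 /* p is guaranteed valid only here  */
    rcu_read_unlock();              /* end critical section             */
    return ret;                     /* this code is now quiescent again */
}

Everything between rcu_read_lock and rcu_read_unlock is one read-side critical section; a grace period that begins while this critical section is in progress cannot end until the matching rcu_read_unlock.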

RCU's fundamental guarantee may be used by splitting updates into removal and reclamation phases. The removal phase removes references to data items within a data structure (possibly by replacing them with references to new versions of these data items), and can run concurrently with RCU read-side critical sections. It is safe to run the removal phase concurrently with RCU readers because the semantics of modern CPUs guarantee that readers will see either the old or the new version of the data structure rather than a partially updated reference. Once a grace period has elapsed, there can no longer be any readers referencing the old version, so it is then safe for the reclamation phase to free (reclaim) the data items that made up that old version.

Splitting an update into removal and reclamation phases allows the updater to perform the removal phase immediately, and to defer the reclamation phase until all readers active during the removal phase have completed; in other words, until a grace period has elapsed. Only readers that are active during the removal phase need be considered, because any reader starting after the removal phase will be unable to gain a reference to the removed data items, and therefore cannot be disrupted by the reclamation phase.

So the typical RCU update sequence goes something like the following:
  1. Ensure that all readers accessing RCU-protected data structures carry out their references from within an RCU read-side critical section.
  2. Remove pointers to a data structure, so that subsequent readers cannot gain a reference to it.
  3. Wait for a grace period to elapse, so that all previous readers (which might still have pointers to the data structure removed in the prior step) will have completed their RCU read-side critical sections.
  4. At this point, there cannot be any readers still holding references to the data structure, so it now may safely be reclaimed (e.g., freed). Garbage collectors, where available, may be used to perform this step.


In the above procedure, the updater performs both the removal and the reclamation step, but it is often helpful for an entirely different thread to do the reclamation. Reference counting can be used to let the reader perform removal, so even if the same thread performs both the update step (step 2 above) and the reclamation step (step 4 above), it is often helpful to think of them separately. A sketch of this sequence follows.
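As a hedged illustration of steps 2 through 4, the following Linux-kernel-style sketch replaces the hypothetical RCU-protected structure from the reader sketch above (struct foo and gbl_foo) and then reclaims the old version. The updater is assumed to be serialized against other updaters by means not shown here.

#include <linux/rcupdate.h>   /* rcu_assign_pointer(), synchronize_rcu() */
#include <linux/slab.h>       /* kmalloc(), kfree() */
#include <linux/errno.h>

int update_foo(int new_a, int new_b)
{
    struct foo *new_fp, *old_fp;

    new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
    if (!new_fp)
        return -ENOMEM;
    new_fp->a = new_a;
    new_fp->b = new_b;

    old_fp = gbl_foo;                     /* remember the old version        */
    rcu_assign_pointer(gbl_foo, new_fp);  /* step 2: remove readers' path to
                                           * the old version                 */
    synchronize_rcu();                    /* step 3: wait for a grace period */
    kfree(old_fp);                        /* step 4: reclaim the old version */
    return 0;
}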

Uses

As of early 2008, there were almost 2,000 uses of the RCU API within the Linux kernel, including the networking protocol stacks and the memory-management system.
Since 2006, researchers have applied RCU and similar techniques to a number of problems, including management of metadata used in dynamic analysis, managing the lifetime of clustered objects, managing object lifetime in the K42 research operating system, and optimizing software transactional memory implementations. DragonFly BSD uses a technique similar to RCU that most closely resembles Linux's SRCU implementation.

Advantages and disadvantages

The ability to wait until all readers are done allows RCU readers to use much lighter-weight synchronization, in some cases absolutely no synchronization at all. In contrast, in more conventional lock-based schemes, readers must use heavy-weight synchronization in order to prevent an updater from deleting the data structure out from under them. This is because lock-based updaters typically update data items in place, and must therefore exclude readers. In contrast, RCU-based updaters typically take advantage of the fact that writes to single aligned pointers are atomic on modern CPUs, allowing atomic insertion, removal, and replacement of data items in a linked structure without disrupting readers. Concurrent RCU readers can then continue accessing the old versions, and can dispense with the atomic operations, memory barriers, and cache misses that are so expensive on modern SMP computer systems, even in the absence of lock contention. The lightweight nature of RCU's read-side primitives provides additional advantages beyond excellent performance, scalability, and real-time response. For example, they provide immunity to most deadlock and livelock conditions. (RCU-based deadlocks are still possible, for example by executing a statement that blocks until a grace period completes within an RCU read-side critical section.)
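For instance, the Linux kernel's list_add_rcu primitive publishes a new list element with a single ordered pointer store, so a concurrent reader traversing the list sees either the old list or the new one, never a half-linked element. In the following hedged sketch, the element type struct item and the lock item_lock are hypothetical; only the kernel primitives are real APIs.

#include <linux/rculist.h>    /* list_add_rcu() and other RCU list primitives */
#include <linux/slab.h>       /* kmalloc() */
#include <linux/spinlock.h>
#include <linux/errno.h>

struct item {
    struct list_head list;
    int value;
};
static LIST_HEAD(item_list);
static DEFINE_SPINLOCK(item_lock);   /* serializes updaters only */

int add_item(int value)
{
    struct item *p = kmalloc(sizeof(*p), GFP_KERNEL);

    if (!p)
        return -ENOMEM;
    p->value = value;
    spin_lock(&item_lock);
    list_add_rcu(&p->list, &item_list);  /* atomic from readers' viewpoint */
    spin_unlock(&item_lock);
    return 0;
}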

Of course, RCU also has disadvantages. For example, RCU is a specialized technique that works best in situations with mostly reads and few updates, but is often less applicable to update-only workloads. For another example, although the fact that RCU readers and updaters may execute concurrently is what enables the lightweight nature of RCU's read-side primitives, some algorithms may not be amenable to read/update concurrency.

Despite well over a decade of experience with RCU, the exact extent of its applicability is still a research topic.

Patents

The technique is covered by U.S. software patent 5,442,758, issued August 15, 1995 and assigned to Sequent Computer Systems, as well as by patents 5,608,893, 5,727,528, 6,219,690, and 6,886,162. The now-expired US Patent 4,809,168 covers a closely related technique. RCU is also the topic of one claim in the SCO v. IBM lawsuit.

Sample RCU interface

RCU is available in a number of operating systems, and was added to the Linux kernel in October 2002. User-level implementations such as liburcu are also available.

The implementation of RCU in version 2.6 of the Linux kernel is among the better-known RCU implementations, and will be used as an inspiration for the RCU API in the remainder of this article. The core API (application programming interface) is quite small:
  • rcu_read_lock: Used by a reader to mark the beginning of an RCU read-side critical section. RCU-protected data structures accessed within that critical section are guaranteed not to be reclaimed for its full duration.

  • rcu_read_unlock: Used by a reader to inform the reclaimer that the reader is exiting an RCU read-side critical section. Note that RCU read-side critical sections may be nested and/or overlapping.

  • synchronize_rcu: Blocks until all pre-existing RCU read-side critical sections on all CPUs have completed. Note that synchronize_rcu will not necessarily wait for any subsequent RCU read-side critical sections to complete. For example, consider the following sequence of events:


         CPU 0                    CPU 1                     CPU 2
   -----------------      -------------------------    ---------------
1. rcu_read_lock()
2.                        enters synchronize_rcu()
3.                                                      rcu_read_lock()
4. rcu_read_unlock()
5.                        exits synchronize_rcu()
6.                                                      rcu_read_unlock()

Since synchronize_rcu is the API that must figure out when readers are done, its implementation is key to RCU. For RCU to be useful in all but the most read-intensive situations, synchronize_rcu's overhead must also be quite small.

Alternatively, instead of blocking, synchronize_rcu may register a callback to be invoked after all ongoing RCU read-side critical sections have completed. This callback variant is called call_rcu in the Linux kernel; a sketch of its use appears after this list.

  • rcu_assign_pointer: The updater uses this function to assign a new value to an RCU-protected pointer, in order to safely communicate the change in value from the updater to the reader. This function returns the new value, and also executes any memory-barrier instructions required for a given CPU architecture. Perhaps more importantly, it serves to document which pointers are protected by RCU.

  • rcu_dereference: The reader uses rcu_dereference to fetch an RCU-protected pointer, which returns a value that may then be safely dereferenced. It also executes any memory-barrier instructions needed for a given CPU architecture. The value returned by rcu_dereference is valid only within the enclosing RCU read-side critical section. As with rcu_assign_pointer, an important function of rcu_dereference is to document which pointers are protected by RCU.
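As a hedged sketch of the callback form mentioned above, the following Linux-kernel-style code frees the hypothetical struct foo from the earlier examples with call_rcu instead of blocking in synchronize_rcu. In the Linux kernel, call_rcu takes a struct rcu_head embedded in the protected structure together with a callback that receives a pointer to that rcu_head; the field name rcu and the helper functions here are illustrative.

#include <linux/kernel.h>     /* container_of() */
#include <linux/rcupdate.h>   /* call_rcu(), struct rcu_head */
#include <linux/slab.h>       /* kfree() */

struct foo {
    int a;
    int b;
    struct rcu_head rcu;      /* gives the RCU infrastructure a place to
                               * queue the deferred callback             */
};

static void free_foo_cb(struct rcu_head *head)
{
    /* Recover the enclosing structure from the embedded rcu_head. */
    struct foo *fp = container_of(head, struct foo, rcu);

    kfree(fp);
}

/* Called after the old version has been unpublished; returns immediately,
 * and free_foo_cb runs once a grace period has elapsed. */
static void retire_foo(struct foo *old_fp)
{
    call_rcu(&old_fp->rcu, free_foo_cb);
}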


Taken together, these APIs mediate communication among the reader, updater, and reclaimer.

The RCU infrastructure observes the time sequence of rcu_read_lock, rcu_read_unlock, synchronize_rcu, and call_rcu invocations in order to determine when (1) synchronize_rcu invocations may return to their callers and (2) call_rcu callbacks may be invoked. Efficient implementations of the RCU infrastructure make heavy use of batching in order to amortize their overhead over many uses of the corresponding APIs.

What is a simple implementation of RCU?

RCU has extremely simple "toy" implementations that can aid understanding of RCU. This section presents one such "toy" implementation that works in a non-preemptive environment.

void rcu_read_lock(void) { }

void rcu_read_unlock(void) { }

void call_rcu(void (*callback)(void *), void *arg)
{
    /* add the callback/arg pair to a list */
}

void synchronize_rcu(void)
{
    int cpu;

    /* Run briefly on each CPU in turn, forcing a context switch on
     * every CPU and thereby waiting out a grace period. */
    for_each_cpu(cpu)
        schedule_current_task_to(cpu);

    /* A grace period has now elapsed, so run the deferred callbacks
     * (pseudo-code): */
    for each entry in the call_rcu list
        entry->callback(entry->arg);
}

You can ignore rcu_assign_pointer and rcu_dereference without missing much. But here they are anyway.

#define rcu_assign_pointer(p, v) ({ \
    smp_wmb(); /* order prior initialization before pointer publication */ \
    (p) = (v); \
})

#define rcu_dereference(p) ({ \
    typeof(p) _value = (p); \
    smp_read_barrier_depends(); /* a no-op on all but DEC Alpha */ \
    (_value); \
})

Note that rcu_read_lock and rcu_read_unlock do absolutely nothing. This is the great strength of classic RCU in a non-preemptive kernel: read-side overhead is precisely zero, as the memory barrier is actually needed only on DEC Alpha CPUs. And there is absolutely no way that rcu_read_lock can participate in a deadlock cycle, cause a realtime process to miss its scheduling deadline, precipitate priority inversion, or result in high lock contention.

The implementation of synchronize_rcu moves the caller of synchronize_rcu to each CPU, thus blocking until all CPUs have been able to perform the context switch. Since there can be no preemption points within an RCU read-side critical section in this non-preemptive environment, if a given CPU executes a context switch (to schedule another process), we know that it must have completed all preceding RCU read-side critical sections. Once all CPUs have executed a context switch, all preceding RCU read-side critical sections will have completed.

Analogy with reader-writer locking

Although RCU can be used in many different ways, a very common use of RCU is analogous to reader-writer locking. The following side-by-side code display shows how closely related reader-writer locking (on the left) and RCU (on the right) can be.

struct el {                                struct el {
  struct list_head lp;                       struct list_head lp;
  long key;                                  long key;
  spinlock_t mutex;                          spinlock_t mutex;
  int data;                                  int data;
  /* Other data fields */                    /* Other data fields */
};                                         };
DEFINE_RWLOCK(listmutex);                  DEFINE_SPINLOCK(listmutex);
LIST_HEAD(head);                           LIST_HEAD(head);

int search(long key, int *result)          int search(long key, int *result)
{                                          {
  struct el *p;                              struct el *p;

  read_lock(&listmutex);                     rcu_read_lock();
  list_for_each_entry(p, &head, lp) {        list_for_each_entry_rcu(p, &head, lp) {
    if (p->key == key) {                       if (p->key == key) {
      *result = p->data;                         *result = p->data;
      read_unlock(&listmutex);                   rcu_read_unlock();
      return 1;                                  return 1;
    }                                          }
  }                                          }
  read_unlock(&listmutex);                   rcu_read_unlock();
  return 0;                                  return 0;
}                                          }

int delete(long key)                       int delete(long key)
{                                          {
  struct el *p;                              struct el *p;

  write_lock(&listmutex);                    spin_lock(&listmutex);
  list_for_each_entry(p, &head, lp) {        list_for_each_entry(p, &head, lp) {
    if (p->key == key) {                       if (p->key == key) {
      list_del(&p->lp);                          list_del_rcu(&p->lp);
      write_unlock(&listmutex);                  spin_unlock(&listmutex);
                                                 synchronize_rcu();
      kfree(p);                                  kfree(p);
      return 1;                                  return 1;
    }                                          }
  }                                          }
  write_unlock(&listmutex);                  spin_unlock(&listmutex);
  return 0;                                  return 0;
}                                          }

The differences between the two approaches are quite small. Read-side locking moves to rcu_read_lock and rcu_read_unlock, update-side locking moves from a reader-writer lock to a simple spinlock, and a synchronize_rcu precedes the kfree.

However, there is one potential catch: the read-side and update-side critical sections can now run concurrently. In many cases, this will not be a problem, but it is necessary to check carefully regardless. For example, if multiple independent list updates must be seen as a single atomic update, converting to RCU will require special care.

Also, the presence of synchronize_rcu means that the RCU version of delete can now block. If this is a problem, call_rcu could be used in place of synchronize_rcu, in the manner of call_rcu(kfree, p) from the toy implementation above. This is especially useful in combination with reference counting.
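For instance, assuming struct el were given an embedded struct rcu_head field (called rcu in this hedged sketch; it does not appear in the listing above), the Linux kernel's kfree_rcu helper, which queues a kfree via call_rcu, would let delete return without blocking:

#include <linux/rcupdate.h>   /* struct rcu_head, kfree_rcu() */
#include <linux/rculist.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

/* As in the listing above, but with an rcu_head added so that the
 * free can be deferred instead of blocking in synchronize_rcu(). */
struct el {
    struct list_head lp;
    long key;
    spinlock_t mutex;
    int data;
    struct rcu_head rcu;      /* added field, not in the listing above */
};
static DEFINE_SPINLOCK(listmutex);
static LIST_HEAD(head);

int delete(long key)
{
    struct el *p;

    spin_lock(&listmutex);
    list_for_each_entry(p, &head, lp) {
        if (p->key == key) {
            list_del_rcu(&p->lp);
            spin_unlock(&listmutex);
            kfree_rcu(p, rcu);    /* deferred kfree(); never blocks */
            return 1;
        }
    }
    spin_unlock(&listmutex);
    return 0;
}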

Explanation of name

The name comes from the way that RCU is used to update a linked structure in place.
A thread wishing to do this uses the following steps:
  • create a new structure,
  • copy the data from the old structure into the new one, and save a pointer to the old structure,
  • modify the new, copied structure,
  • update the global pointer to refer to the new structure, and then
  • sleep until the operating system kernel determines that there are no readers left using the old structure, for example, in the Linux kernel, by using synchronize_rcu.


When the thread which made the copy is awakened by the kernel, it can safely deallocate the old structure.
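This procedure differs from the earlier updater sketch only in the explicit copy step, which is what gives the technique its name. A hedged Linux-kernel-style sketch, again using the hypothetical struct foo and gbl_foo:

#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/string.h>     /* memcpy() */
#include <linux/errno.h>

struct foo {
    int a;
    int b;
};
static struct foo *gbl_foo;   /* assumed to already point at a valid structure */

int update_foo_b(int new_b)
{
    struct foo *new_fp, *old_fp;

    new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);   /* create a new structure */
    if (!new_fp)
        return -ENOMEM;
    old_fp = gbl_foo;
    memcpy(new_fp, old_fp, sizeof(*new_fp));         /* copy the old data      */
    new_fp->b = new_b;                               /* modify the copy        */
    rcu_assign_pointer(gbl_foo, new_fp);             /* update the global pointer */
    synchronize_rcu();      /* "sleep" until no readers hold old_fp */
    kfree(old_fp);          /* now safe to deallocate the old structure */
    return 0;
}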

So the structure is read concurrently with a thread copying in order to do an update, hence the name "read-copy update". The abbreviation "RCU" was one of many contributions by the Linux community. Other names for similar techniques include passive serialization and MP defer by VM/XA programmers and generations by K42 and Tornado programmers.

History

Techniques and mechanisms resembling RCU have been independently invented multiple times:
  1. H. T. Kung and Q. Lehman described use of garbage collectors to implement RCU-like access to a binary search tree.
  2. Udi Manber and Richard Ladner extended Kung's and Lehman's work to non-garbage-collected environments by deferring reclamation until all threads running at removal time have terminated, which works in environments that do not have long-lived threads.
  3. Richard Rashid et al. described a lazy translation lookaside buffer (TLB) implementation that deferred reclaiming virtual-address space until all CPUs flushed their TLB, which is similar in spirit to some RCU implementations.
  4. J. Hennessy, D. Osisek, and J. Seigh were granted US Patent 4,809,168 in 1989 (since lapsed). This patent describes an RCU-like mechanism that was apparently used in VM/XA on IBM mainframes.
  5. William Pugh described an RCU-like mechanism that relied on explicit flag-setting by readers.
  6. Aju John proposed an RCU-like implementation where updaters simply wait for a fixed period of time, under the assumption that readers would all complete within that fixed time, as might be appropriate in a hard real-time system. Van Jacobson proposed a similar scheme in 1993 (verbal communication).
  7. J. Slingwine and P. E. McKenney received US Patent 5,442,758 in August 1995, which describes RCU as implemented in DYNIX/ptx and later in the Linux kernel.
  8. B. Gamsa, O. Krieger, J. Appavoo, and M. Stumm described an RCU-like mechanism used in the University of Toronto Tornado research operating system and the closely related IBM Research K42 research operating systems.
  9. Rusty Russell and Phil Rumpf described RCU-like techniques for handling unloading of Linux kernel modules.
  10. D. Sarma added RCU to version 2.5.43 of the Linux kernel in October 2002.
  11. Robert Colvin et al. formally verified a lazy concurrent list-based set algorithm that resembles RCU.

See also

  • Concurrency control
  • Multiversion concurrency control
  • Contention
  • Deadlock
  • Lock (software engineering)
  • Lock-free and wait-free algorithms
  • Memory barrier
  • Mutual exclusion
  • Non-blocking synchronization
  • Pre-emptive multitasking
  • Priority inversion
  • Real-time computing
  • Starvation
  • Synchronization

