All Topics  
Asynchronous I/O

 

   Email Print
   Bookmark   Link






 

Asynchronous I/O



 
 
Asynchronous I/O, or non-blocking I/O, is a form of input/output
Input/output

In computing, input/output, or I/O, refers to the communication between an information processing system , and the outside world ? possibly a human, or another information processing system....
 processing that permits other processing to continue before the transmission has finished.

Input and output (I/O) operations on a computer can be extremely slow compared to the processing of data. An I/O device can incorporate mechanical devices which must physically move, such as a hard drive seeking a track to read or write; extremely slow compared to merely moving electrons.






Discussion
Ask a question about 'Asynchronous I/O'
Start a new discussion about 'Asynchronous I/O'
Answer questions from other users
Full Discussion Forum



Encyclopedia


Asynchronous I/O, or non-blocking I/O, is a form of input/output
Input/output

In computing, input/output, or I/O, refers to the communication between an information processing system , and the outside world ? possibly a human, or another information processing system....
 processing that permits other processing to continue before the transmission has finished.

Input and output (I/O) operations on a computer can be extremely slow compared to the processing of data. An I/O device can incorporate mechanical devices which must physically move, such as a hard drive seeking a track to read or write; extremely slow compared to merely moving electrons. For example, during a disk operation that takes ten milliseconds to perform, a processor that is clocked at one gigahertz could have performed ten million instruction-processing cycles.

A simple approach to I/O would be to start the access and then wait for it to complete. But such an approach (called synchronous I/O or blocking I/O) would block the progress of a program while the communications is in progress, leaving system resources idle. When a program makes many I/O operations, this means that the processor can spend almost all of its time idle waiting for I/O operations to complete.

Alternatively, it is possible, but more complicated to predicate, to start the communication and then perform processing that does not require that the I/O has completed. This approach is called asynchronous input/output. Any task that actually depends on the I/O having completed (this includes both using the input values and critical operations that claim to assure that a write operation has been completed) still needs to wait for the I/O operation to complete, and thus is still blocked, but other processing which does not have a dependency on the I/O operation can continue.

Many operating system functions exist to implement asynchronous I/O at many levels. In fact, one of the main functions of all but the most rudimentary of operating systems is to perform at least some form of basic asynchronous I/O, though this may not be particularly apparent to the operator or programmer. In the simplest software solution, the hardware device status is polled
Polling (computer science)

Polling, or polled operation, in computer science, refers to actively sampling the status of an external device by a client program as a synchronous activity....
 at intervals to detect whether the device is ready for its next operation. (The CP/M
CP/M

CP/M is an operating system originally created for Intel 8080/Intel 8085 based microcomputers by Gary Kildall of Digital Research. Initially confined to single tasking on 8-bit processors and no more than 64 kilobytes of memory, later versions of CP/M added multi-user variations, and were migrated to 16-bit processors....
 operating system could be built this way, for example. Its system call
System call

In computing, a system call is the mechanism used by an application program to request service from the kernel based on the Monolithic_kernel or to system servers on operating systems based on the microkernel-structure....
 semantics did not require any more elaborate I/O structure than this, though most implementations were more complex, and thereby more efficient.) DMA
Direct memory access

Direct memory access is a feature of modern computers and microprocessors that allows certain hardware subsystems within the computer to access system Computer storage for reading and/or writing independently of the central processing unit....
 can greatly increase the efficiency of a polling-based system, and hardware interrupts can eliminate the need for polling entirely. Multitasking
Multitasking

Multitasking may refer to any of the following:*Computer multitasking - the apparent simultaneous performance of two or more tasks by a computer's central processing unit...
 operating systems can exploit the functionality provided by hardware interrupts, whilst hiding the complexity of interrupt handling from the user. Spooling
Spooling

In computer science, spooling refers to a process of transferring data by placing it in a temporary working area where another program may access it for processing at a later point in time....
 was one of the first forms of multitasking designed to exploit asynchronous I/O. Finally, multithreading
Thread (computer science)

In computer science, a thread of execution is a Fork of a computer program into two or more Concurrency running task s. The implementation of threads and process es differs from one operating system to another, but in most cases, a thread is contained inside a process....
 and explicit asychronous I/O APIs within user processes can exploit asynchronous I/O further, at the cost of extra software complexity.

Asynchronous I/O is used to improve throughput, latency, and/or responsiveness.

Forms


All forms of asynchronous I/O open applications up to potential resource conflicts and associated failure. Careful programming (often using mutex, semaphores, etc.) is required to prevent this.

When exposing asynchronous I/O to applications there are a few broad classes of implementation. The form of the API provided to the application does not necessarily correspond with the mechanism actually provided by the operating system, emulations are possible. Furthermore, more than one method may be used by a single application, depending on its needs and the desires of its programmer(s). Many operating systems provide more than one of these mechanisms, it is possible that some may provide all of them. The major forms are:

Process

Available in early Unix. With a multitasking
Computer multitasking

In computing, multitasking is a method by which multiple tasks, also known as Computer process, share common processing resources such as a Central processing unit....
 operating system each I/O flow can be allocated to a separate process or sub-process. This is a heavyweight solution, and coordinating the flows can be difficult. Because of the natural process isolation it may not even be possible to get the desired behavior; the attempt to partition the task for asynchronous I/O may do more harm than good.

Polling

Variations:
  • Error if it cannot be done yet (reissue later)
  • Report when it can be done without blocking (then issue it)


Available in traditional Unix. Its major problem is that it can waste CPU time polling repeatedly when there is nothing else for the issuing process to do, reducing the time available for other processes. Reducing this waste by sleeping a bit before the next poll results in increased latency of I/O operations, reducing throughput and responsiveness. Also, if there is more than one polling place in a process it is highly likely that not all potential I/O will be considered at each point (else why even have multiple polling places), at best affecting only latency and/or throughput and at worst inducing difficult to reproduce failures. It can also be difficult or impossible to even find all blocking code points in the process to refit them with polling versions, preventing the process from fully satisfying whatever criteria prompted the move toward asynchronous I/O in the first place.

Select(/poll) loops

Available in BSD Unix
Unix

Unix is a computer operating system originally developed in 1969 by a group of American Telephone & Telegraph employees at Bell Labs, including Ken Thompson , Dennis Ritchie, Douglas McIlroy, and Joe Ossanna....
, and almost anything else with a TCP/IP protocol stack that either utilizes or is modeled after the BSD implementation. A variation on the theme of polling, a select loop uses the select
Select (Unix)

select is a system call for Polling the status of multiple file descriptors. In C programming, it is declared in the header file sys/select.h or unistd.h....
system call to sleep
Sleep (operating system)

A computer program may sleep, which places it into an process state for a period of time. Eventually the expiration of an interval timer, or the receipt of a signal or interrupt causes the program to resume execution....
 until a condition occurs on a file descriptor
File descriptor

In computer programming, a file descriptor is an abstract key for accessing a file. The term is generally used in POSIX operating systems. In Microsoft Windows terminology and in the context of the stdio.h, "file handle" is preferred, though the latter case is technically a different object ....
 (e.g., when data is available for reading), a timeout occurs, or a signal
Signal (computing)

A signal is a limited form of inter-process communication used in Unix, Unix-like, and other POSIX-compliant operating systems. Essentially it is an asynchronous notification sent to a Process in order to notify it of an event that occurred....
 is received (e.g., when a child process dies). By examining the return parameters of the select call, the loop finds out which file descriptor has changed and executes the appropriate code. Often, for ease of use, the select loop will be implemented as an event loop
Event loop

In computer science, the event loop, message dispatcher, message loop or message pump is a programming construct that waits for and dispatches Event-driven programming or Message passing in a Computer program....
, perhaps using callback functions
Callback (computer science)

In computer programming, a callback is executable code that is passed as an argument to other code. It allows a lower-level abstraction layer to call a subroutine defined in a higher-level layer....
; the situation lends itself particularly well to event-driven programming
Event-driven programming

In computer programming, event-driven programming or event-based programming is a programming paradigm in which the Program flow is determined by event s — i.e., sensor outputs or user actions or Message passing from other programs or Thread_....
.

While this method is reliable and relatively efficient, it depends heavily on the Unix
Unix

Unix is a computer operating system originally developed in 1969 by a group of American Telephone & Telegraph employees at Bell Labs, including Ken Thompson , Dennis Ritchie, Douglas McIlroy, and Joe Ossanna....
 paradigm that "everything is a file"; any blocking I/O that does not involve a file descriptor will block the process. The select loop also relies on being able to involve all I/O in the central select call; libraries that conduct their own I/O are particularly problematic in this respect.

The select loop doesn't reach the ultimate system efficiencies possible with, say, the completion queues method because the semantics of the select call, allowing as it does for per-call tuning of the acceptable event set, consumes some amount of time per invocation traversing the selection array. This creates little overhead for user applications which might have open one file descriptor for the windowing system and a few for open files, but becomes more of a problem as the number of potential event sources grows, and can hinder development of many-client server applications; other asynchronous methods may be noticeably more efficient in such cases. Some Unixes provide system-specific calls with better scaling; for example, epoll in Linux
Linux

Linux is a generic term referring to Unix-like computer operating systems based on the Linux kernel. Their development is one of the most prominent examples of free and open source software collaboration; typically all the underlying source code can be used, freely modified, and redistributed by anyone under the terms of the GNU GPL license...
 (which fills the return selection array with only those event sources on which an event has occurred), kqueue in FreeBSD
FreeBSD

FreeBSD is a Unix-like free software operating system descended from AT&T Unix via the Berkeley Software Distribution branch through the 386BSD and Berkeley Software Distribution#4.4BSD and descendants operating systems....
, and /dev/poll in Solaris
Solaris Operating System

Solaris is a Unix-based operating system introduced by Sun Microsystems in 1992 as the successor to SunOS.Solaris is known for its scalability, especially on SPARC systems, and for originating many innovative features such as DTrace and ZFS....
.

SVR3
UNIX System V

Unix System V, commonly abbreviated SysV , is one of the versions of the Unix operating system. It was originally developed by AT&T and first released in 1983....
 Unix
Unix

Unix is a computer operating system originally developed in 1969 by a group of American Telephone & Telegraph employees at Bell Labs, including Ken Thompson , Dennis Ritchie, Douglas McIlroy, and Joe Ossanna....
 provided the poll system call. Arguably better-named than select, for the purposes of this discussion it is essentially the same thing. SVR4 Unixes (and thus POSIX
POSIX

POSIX or "Portable Operating System Interface" is the collective name of a family of related standardizations specified by the Institute of Electrical and Electronics Engineers to define the application programming interface , along with shell and utilities interfaces for software compatible with variants of the Unix operating system, altho...
) offer both calls.

Signals (interrupts)

Available in BSD and POSIX
POSIX

POSIX or "Portable Operating System Interface" is the collective name of a family of related standardizations specified by the Institute of Electrical and Electronics Engineers to define the application programming interface , along with shell and utilities interfaces for software compatible with variants of the Unix operating system, altho...
 Unix. I/O is issued asynchronously, and when it is complete a signal
Signal (computing)

A signal is a limited form of inter-process communication used in Unix, Unix-like, and other POSIX-compliant operating systems. Essentially it is an asynchronous notification sent to a Process in order to notify it of an event that occurred....
 (interrupt
Interrupt

In computing, an interrupt is an asynchronous communication signal from hardware indicating the need for attention or a synchronous event in software indicating the need for a change in execution....
) is generated. As in low-level kernel programming, the facilities available for safe use within the signal handler are limited, and the main flow of the process could have been interrupted at nearly any point, resulting in inconsistent data structures as seen by the signal handler. The signal handler is usually not able to issue further asynchronous I/O by itself.

The signal approach, though relatively simple to implement within the OS, brings to the application program the unwelcome baggage associated with writing an operating system's kernel interrupt system. Its worst characteristic is that every blocking (synchronous) system call is potentially interruptible; the programmer must usually incorporate retry code at each call.

Callback functions

Available in Mac OS
Mac OS

Mac OS is the trademarked name for a series of graphical user interface-based operating systems developed by Apple Inc. for their Macintosh line of computer systems....
 (pre-Mac OS X
Mac OS X

Mac OS X is a line of computer operating systems developed, marketed, and sold by Apple Inc., and since 2002 has been included with all new Macintosh computer systems....
), VMS
OpenVMS

OpenVMS , previously known as VAX-11/VMS, VAX/VMS or VMS, is the name of a high-end computer server operating system that runs on the VAX and DEC Alpha families of computers, developed by Digital Equipment Corporation of Maynard, Massachusetts, Massachusetts , and most recently on Hewlett-Packard systems built around the In...
 and Windows
Microsoft Windows

Microsoft Windows is a series of software operating systems and graphical user interfaces produced by Microsoft. Microsoft first introduced an operating environment named Windows in November 1985 as an add-on to MS-DOS in response to the growing interest in graphical user interfaces ....
. Bears many of the characteristics of the signal method as it is fundamentally the same thing, though rarely recognized as such. The difference is that each I/O request usually can have its own completion function, whereas the signal system has a single callback.

A potential problem is that stack depth can grow unmanageably, as an extremely common thing to do when one I/O is finished is to schedule another. If this should be satisfied immediately, the first callback
Callback (computer science)

In computer programming, a callback is executable code that is passed as an argument to other code. It allows a lower-level abstraction layer to call a subroutine defined in a higher-level layer....
 is not 'unwound' off the stack before the next one is invoked. Systems to prevent this (like 'mid-ground' scheduling of new work) add complexity and reduce performance. The separation of textual (code) and time (event) flows provides fertile ground for errors.

Light-weight processes/threads

Light-weight processes (LWP
Light-weight process

In computer operating systems, a light-weight process is a means of achieving computer multitasking. In contrast to a regular process, an LWP shares all its logical address space and system resources with other process; in contrast to a thread , a light-weight process has its own private process identifier and Parent process with other pro...
s) or threads
Thread (computer science)

In computer science, a thread of execution is a Fork of a computer program into two or more Concurrency running task s. The implementation of threads and process es differs from one operating system to another, but in most cases, a thread is contained inside a process....
 are available in more modern Unixes. Like the process method, but without the data isolation that hampers coordination of the flows. This lack of isolation introduces its own problems, usually requiring kernel-provided synchronization mechanisms and thread-safe
Thread-safe

Thread safety is a computer programming concept applicable in the context of multi-threaded programs. A piece of code is thread-safe if it functions correctly during simultaneous execution by multiple threads....
 libraries. Each LWP/thread itself uses traditional blocking synchronous I/O. The requisite separate per-thread stack may preclude large-scale implementations using very large numbers of threads. The separation of textual (code) and time (event) flows provides fertile ground for errors.

Completion queues/ports

Available in Microsoft Windows
Microsoft Windows

Microsoft Windows is a series of software operating systems and graphical user interfaces produced by Microsoft. Microsoft first introduced an operating environment named Windows in November 1985 as an add-on to MS-DOS in response to the growing interest in graphical user interfaces ....
, Solaris
Solaris Operating System

Solaris is a Unix-based operating system introduced by Sun Microsystems in 1992 as the successor to SunOS.Solaris is known for its scalability, especially on SPARC systems, and for originating many innovative features such as DTrace and ZFS....
 and DNIX
DNIX

DNIX was a Unix-like real-time operating system operating system from the Swedish company Dataindustrier AB . A version called ABCenix was also developed for the ABC1600 computer from Luxor....
. I/O requests are issued asynchronously, but notifications of completion are provided via a synchronizing queue mechanism in the order they are completed. Usually associated with a state-machine
Finite state machine

A finite state machine or finite state automaton or simply a state machine, is a model of behavior composed of a finite number of state s, transitions between those states, and actions....
 structuring of the main process (event-driven programming
Event-driven programming

In computer programming, event-driven programming or event-based programming is a programming paradigm in which the Program flow is determined by event s — i.e., sensor outputs or user actions or Message passing from other programs or Thread_....
), which can bear little resemblance to a process that does not use asynchronous I/O or that uses one of the other forms, hampering code reuse. Does not require additional special synchronization mechanisms or thread-safe
Thread-safe

Thread safety is a computer programming concept applicable in the context of multi-threaded programs. A piece of code is thread-safe if it functions correctly during simultaneous execution by multiple threads....
 libraries, nor are the textual (code) and time (event) flows separated.

Event flags

Available in VMS
OpenVMS

OpenVMS , previously known as VAX-11/VMS, VAX/VMS or VMS, is the name of a high-end computer server operating system that runs on the VAX and DEC Alpha families of computers, developed by Digital Equipment Corporation of Maynard, Massachusetts, Massachusetts , and most recently on Hewlett-Packard systems built around the In...
. Bears many of the characteristics of the completion queue method as it is essentially a completion queue of depth one. To simulate the effect of queue 'depth', an additional event flag is required for each potential unprocessed (but completed) event, or event information can be lost. Waiting for the next available event in such a clump requires synchronizing mechanisms that may not scale well to larger numbers of potentially parallel events.

Implementation


The vast majority of general-purpose computing hardware relies entirely upon two methods of implementing asynchronous I/O: polling, and interrupts. Usually both methods are used together, the balance depends heavily upon the design of the hardware and its required performance characteristics. (It should be noted that DMA
Direct memory access

Direct memory access is a feature of modern computers and microprocessors that allows certain hardware subsystems within the computer to access system Computer storage for reading and/or writing independently of the central processing unit....
 is not itself another independent method, it is merely a means by which more work can be done per poll or interrupt.)

Pure polling systems are entirely possible, small microcontrollers (such as systems using the PIC
PIC microcontroller

PIC is a family of Harvard architecture microcontrollers made by Microchip Technology, derived from the PIC1640 originally developed by General Instrument's Microelectronics Division....
) are often built this way. CP/M
CP/M

CP/M is an operating system originally created for Intel 8080/Intel 8085 based microcomputers by Gary Kildall of Digital Research. Initially confined to single tasking on 8-bit processors and no more than 64 kilobytes of memory, later versions of CP/M added multi-user variations, and were migrated to 16-bit processors....
 systems could also be built this way (though rarely were), with or without DMA. Also, when the utmost performance is necessary for only a few tasks, at the expense of any other potential tasks, polling may also be appropriate as the overhead of taking interrupts may be unwelcome. (Servicing an interrupt requires time [and space] to save at least part of the processor state, along with the time required to resume the interrupted task.)

Most general-purpose computing systems rely heavily upon interrupts. A pure interrupt system may be possible, though usually some component of polling is also required as it is very common for multiple potential sources of interrupts to share a common interrupt signal line, in which case polling is used within the device driver
Device driver

In computing, a device driver or software driver is a computer program allowing higher-level computer programs to interact with a hardware device....
 to resolve the actual source. (This resolution time also contributes to an interrupt system's performance penalty. Over the years a great deal of work has been done to try to minimize the overhead associated with servicing an interrupt. Current interrupt systems are rather lackadaisical when compared to some highly-tuned earlier ones, but the general increase in hardware performance has greatly mitigated this.)

Hybrid approaches are also possible, wherein an interrupt can trigger the beginning of some burst of asynchronous I/O, and polling is used within the burst itself. This technique is common in high-speed device drivers, such as network or disk, where the time lost in returning to the pre-interrupt task is greater than the time until the next required servicing. (Common I/O hardware in use these days relies heavily upon DMA and large data buffers to make up for a relatively poorly-performing interrupt system. These characteristically use polling inside the driver loops, and can exhibit tremendous throughput. Ideally the per-datum polls are always successful, or at most repeated a small number of times.)

At one time this sort of hybrid approach was common in disk and network drivers where there was not DMA or significant buffering available. Because the desired transfer speeds were faster even than could tolerate the minimum four-operation per-datum loop (bit-test, conditional-branch-to-self, fetch, and store), the hardware would often be built with automatic wait state
Wait state

A wait state is a delay experienced by a computer central processing unit when accessing external computer storage or another device that is slow to respond....
 generation on the I/O device, pushing the data ready poll out of software and onto the processor's fetch or store hardware and reducing the programmed loop to two operations. (In effect using the processor itself as a DMA engine.) The 6502
MOS Technology 6502

The MOS Technology 6502 is an 8-bit microprocessor that was designed by Chuck Peddle and Bill Mensch for MOS Technology in 1975. When it was introduced, it was the least expensive full-featured central processing unit on the market by a considerable margin, costing less than one-sixth the price of competing designs from larger companies such...
 processor offered an unusual means to provide a three-element per-datum loop as it had a hardware pin that, when asserted, would cause the processor's Overflow bit to be set directly. (Obviously one would have to take great care in the hardware design to avoid overriding the Overflow bit outside of the device driver!)

Synthesis


Using only these two tools (polling, and interrupts), all the other forms of asynchronous I/O discussed above may be (and in fact, are) synthesized.

In an environment such as a Java Virtual Machine
Java Virtual Machine

A Java Virtual Machine is a set of computer software programs and data structures which use a virtual machine model for the execution of other computer programs and Scripting language....
 (JVM), asynchronous I/O can be synthesized even though the environment the JVM is running in may not offer it at all. This is due to the interpreted nature of the JVM. The JVM may poll (or take an interrupt) periodically to institute an internal flow of control change, effecting the appearance of multiple simultaneous processes, at least some of which presumably exist in order to perform asynchronous I/O. (Of course, at the microscopic level the parallelism may be rather coarse and exhibit some non-ideal characteristics, but on the surface it will appear to be as desired.)

That, in fact, is the problem with using polling in any form to synthesize a different form of asynchronous I/O. Every CPU cycle that is a poll is wasted, and lost to overhead rather than accomplishing a desired task. Every CPU cycle that is not a poll represents an increase in latency of reaction to pending I/O. Striking an acceptable balance between these two opposing forces is difficult. (This is why hardware interrupt systems were invented in the first place.)

The trick to maximize efficiency is to minimize the amount of work that has to be done upon reception of an interrupt in order to awaken the appropriate application. Secondarily (but perhaps no less important) is the method the application itself uses to determine what it needs to do.

Particularly problematic (for application efficiency) are the exposed polling methods, including the select/poll mechanisms. Though the underlying I/O events they are interested in are in all likelihood interrupt-driven, the interaction to this mechanism is polled and can consume a large amount of time in the poll. This is particularly true of the potentially large-scale polling possible through select (and poll). Interrupts map very well to Signals, Callback functions, Completion Queues, and Event flags, such systems can be very efficient.

See also

  • IOCP, Input/Output Completion Ports.
  • Concurrent Input/Output, a new file systems feature introduced in JFS2
    JFS2

    Journaled File System or JFS is a 64-bit journaling filesystem created by International Business Machines. It is available as free software under the terms of the GNU General Public License ....
     in May 2003


External links

  • ; a survey of asynchronous I/O methods with emphasis on scaling – by Dan Kegel
  • Article "" by M. Tim Jones
  • Article "" by Willy Zwaenepoel, Khaled Elmeleegy, Anupam Chanda and Alan L. Cox
  • by Mark Russinovich
    Mark Russinovich

    Mark Russinovich is a software engineer and author who works for Microsoft as a Technical fellow. He is a regular contributor to TechNet Magazine and Windows IT Pro magazine on the subject of the Architecture of Windows 2000 and was co-author of Inside Windows 2000 ....