Pipeline (software)

Overview
In software engineering, a pipeline consists of a chain of processing elements (processes, threads, coroutines, etc.), arranged so that the output of each element is the input of the next. Usually some amount of buffering is provided between consecutive elements. The information that flows in these pipelines is often a stream of records, bytes or bits.

The concept is also called the pipes and filters design pattern. It was named by analogy to a physical pipeline.
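
As an illustration of the chaining described above, the following is a minimal sketch in Python, using generators as the processing elements. The element names (read_lines, strip_blank, to_upper) are hypothetical and chosen only for this example; each element consumes the records produced by the one before it.

```python
from typing import Iterable, Iterator, List


def read_lines(text: str) -> Iterator[str]:
    """Source element: emit one record (line) at a time."""
    for line in text.splitlines():
        yield line


def strip_blank(lines: Iterable[str]) -> Iterator[str]:
    """Filter element: drop empty records."""
    for line in lines:
        if line.strip():
            yield line


def to_upper(lines: Iterable[str]) -> Iterator[str]:
    """Filter element: transform each record."""
    for line in lines:
        yield line.upper()


def run_pipeline(text: str) -> List[str]:
    # The output of each element is the input of the next.
    return list(to_upper(strip_blank(read_lines(text))))
```

Because generators are lazy, each record travels through the whole chain before the next one is read, so no element needs to hold the complete stream.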

Multiprocessed pipelines


Pipelines are often implemented in a multitasking OS, by launching all elements at the same time as processes, and automatically servicing the data read requests by each process with the data written by the upstream process. In this way, the CPU will be naturally switched among the processes by the scheduler so as to minimize its idle time. In other common models, elements are implemented as lightweight threads or as coroutines to reduce the OS overhead often involved with processes. Depending upon the OS, threads may be scheduled directly by the OS or by a thread manager. Coroutines are always scheduled by a coroutine manager of some form.
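
The process-based model can be sketched with Python's subprocess module. This is only an illustration under stated assumptions: the two stage programs (PRODUCER and SORTER) are hypothetical one-liners run by the Python interpreter itself, so that the example does not depend on any particular external command being installed.

```python
import subprocess
import sys

# Hypothetical stage programs, written inline so the sketch is self-contained.
PRODUCER = "for w in ['pear', 'apple', 'fig']: print(w)"
SORTER = "import sys; sys.stdout.writelines(sorted(sys.stdin))"


def run_two_stage_pipeline() -> str:
    # Launch both elements at once; an OS pipe connects them.
    producer = subprocess.Popen([sys.executable, "-c", PRODUCER],
                                stdout=subprocess.PIPE)
    sorter = subprocess.Popen([sys.executable, "-c", SORTER],
                              stdin=producer.stdout,
                              stdout=subprocess.PIPE, text=True)
    # Close our copy of the read end so that only the sorter holds it.
    producer.stdout.close()
    output, _ = sorter.communicate()
    producer.wait()
    return output
```

Both child processes exist at the same time, and the operating system's scheduler decides when each one runs; the parent only wires the descriptors together and collects the final output.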

Usually, read and write requests are blocking operations, which means that the execution of the source process, upon writing, is suspended until all data could be written to the destination process, and, likewise, the execution of the destination process, upon reading, is suspended until at least some of the requested data could be obtained from the source process. In a simple linear pipeline, where data flows in one direction only, this cannot lead to a deadlock, where both processes would wait indefinitely for each other to respond, since at least one of the two processes will soon thereafter have its request serviced by the operating system, and continue to run.
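
The blocking behaviour of a read can be observed in a small sketch, here using a thread as the destination and an anonymous pipe created with os.pipe. The function name blocking_demo and the 0.2 second delay are arbitrary choices for the example.

```python
import os
import threading
import time


def blocking_demo() -> list:
    read_fd, write_fd = os.pipe()
    events = []

    def consumer():
        # os.read blocks until at least some data is available.
        data = os.read(read_fd, 1024)
        events.append(("read", data))

    t = threading.Thread(target=consumer)
    t.start()
    time.sleep(0.2)  # meanwhile the consumer is suspended inside os.read
    events.append(("write", b"hello"))
    os.write(write_fd, b"hello")
    t.join()
    os.close(read_fd)
    os.close(write_fd)
    return events
```

The "read" event can only be recorded after the write has happened, because the consumer stays suspended until the pipe contains data.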

For performance, most operating systems implementing pipes use pipe buffers, which allow the source process to provide more data than the destination process is currently able or willing to receive. On many Unix and Unix-like operating systems, a separate command, typically called "buffer", is also available which implements a pipe buffer of potentially much larger and configurable size. This command can be useful if the destination process is significantly slower than the source process, but the source process should nevertheless complete its task as soon as possible. For example, the source process might be a command which reads an audio track from a CD, and the destination process a command which compresses the waveform audio data to a format like MP3. In this case, buffering the entire track in a pipe buffer would allow the CD drive to spin down more quickly, and enable the user to remove the CD from the drive before the encoding process has finished.

Such a buffer command can be implemented using available operating system primitives for reading and writing data. Wasteful busy waiting can be avoided by using facilities such as poll or select or multithreading.
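
A rough sketch of such a buffering loop, based on select, is given below. It is not the implementation of any existing "buffer" command; the function name buffer_copy and the chunk size are assumptions made for the example, and it relies on select working on pipe descriptors, which holds on Unix-like systems but not on Windows.

```python
import os
import select


def buffer_copy(in_fd: int, out_fd: int, chunk: int = 65536) -> None:
    """Copy in_fd to out_fd, holding unwritten data in memory (Unix only)."""
    os.set_blocking(out_fd, False)
    pending = bytearray()
    eof = False
    while not eof or pending:
        readers = [] if eof else [in_fd]
        writers = [out_fd] if pending else []
        # select sleeps until a descriptor is ready, so there is no busy waiting.
        readable, writable, _ = select.select(readers, writers, [])
        if readable:
            data = os.read(in_fd, chunk)
            if data:
                pending += data
            else:
                eof = True  # the source closed its end of the pipe
        if writable:
            try:
                written = os.write(out_fd, pending)
            except BlockingIOError:
                written = 0  # destination filled up again; retry later
            del pending[:written]
```

The loop keeps reading from the source as long as data arrives, regardless of how slowly the destination drains, so the source can finish early. The in-memory buffer here is unbounded; a practical tool would presumably cap its size.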

VM/CMS and MVS


CMS Pipelines is a port of the pipeline idea to VM/CMS and MVS systems. It supports much more complex pipeline structures than Unix shells, with steps taking multiple input streams and producing multiple output streams. (Such functionality is supported by the Unix kernel, but few programs use it as it makes for complicated syntax and blocking modes, although some shells do support it via arbitrary file descriptor assignment.) Due to the different nature of IBM mainframe operating systems, it implements many steps inside CMS Pipelines which in Unix are separate external programs, but can also call separate external programs for their functionality. Also, due to the record-oriented nature of files on IBM mainframes, pipelines operate in a record-oriented, rather than stream-oriented manner.

Pseudo-pipelines


On single-tasking operating systems, the processes of a pipeline have to be executed one by one in sequential order; thus the output of each process must be saved to a temporary file, which is then read by the next process. Since there is no parallelism or CPU switching, this version is called a "pseudo-pipeline".

For example, the command line interpreter of MS-DOS ('COMMAND.COM') provides pseudo-pipelines with a syntax superficially similar to that of Unix pipelines. The command "dir | sort | more" would have been executed like this (albeit with more complicated temporary file names):
  1. Create temporary file 1.tmp
  2. Run command "dir", redirecting its output to 1.tmp
  3. Create temporary file 2.tmp
  4. Run command "sort", redirecting its input to 1.tmp and its output to 2.tmp
  5. Run command "more", redirecting its input to 2.tmp, and presenting its output to the user
  6. Delete 1.tmp and 2.tmp, which are no longer needed
  7. Return to the command prompt

All temporary files are stored in the directory pointed to by %TEMP%, or the current directory if %TEMP% isn't set.
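
The sequential scheme listed above can be imitated in a short Python sketch. It is not how COMMAND.COM was written; the two commands (LIST_CMD and SORT_CMD) are hypothetical stand-ins for "dir" and "sort", and only two stages are shown instead of three.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-ins for "dir" and "sort".
LIST_CMD = [sys.executable, "-c", "print('b.txt'); print('a.txt')"]
SORT_CMD = [sys.executable, "-c",
            "import sys; sys.stdout.writelines(sorted(sys.stdin))"]


def pseudo_pipeline(first, second) -> str:
    # Create a temporary file and run the first command to completion,
    # redirecting its output into that file.
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            subprocess.run(first, stdout=tmp, check=True)
        # Only now start the second command, with its input redirected
        # from the temporary file.
        with open(tmp_path) as tmp:
            done = subprocess.run(second, stdin=tmp, stdout=subprocess.PIPE,
                                  text=True, check=True)
        return done.stdout
    finally:
        # Delete the temporary file, which is no longer needed.
        os.remove(tmp_path)
```

The second command does not start until the first has exited, which is the defining difference from a true pipeline.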

Thus, pseudo-pipes acted like true pipes with a pipe buffer of unlimited size (disk space limitations notwithstanding), with the significant restriction that a receiving process could not read any data from the pipe buffer until the sending process had finished completely. Besides causing disk traffic that would have been unnecessary under multitasking operating systems (unless a hard disk cache such as SMARTDRV was installed), this implementation also made pipes unsuitable for applications requiring real-time response, such as interactive use (where the user enters commands that the first process in the pipeline receives via stdin, and the last process in the pipeline presents its output to the user via stdout).

Also, commands that produce a potentially infinite amount of output, such as the yes command, cannot be used in a pseudo-pipeline, since they would run until the temporary disk space is exhausted, so the following processes in the pipeline could not even start to run.

Object pipelines


Besides byte stream-based pipelines, there are also object pipelines. In an object pipeline, the processes output objects instead of text, thereby removing the string parsing tasks that are common in UNIX shell scripts. Windows PowerShell uses this scheme and transfers .NET objects. Channels, found in the Limbo programming language, are another example of this metaphor.
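
The idea can be illustrated by analogy in Python; this sketch shows neither PowerShell nor Limbo. The FileInfo record and the two stages are hypothetical names invented for the example. Each stage receives structured objects and reads their fields directly, where a text pipeline would have to split and parse lines.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class FileInfo:
    name: str
    size: int


def larger_than(items: Iterable[FileInfo], limit: int) -> Iterator[FileInfo]:
    # Fields are accessed directly; nothing is parsed out of a text line.
    return (item for item in items if item.size > limit)


def names_by_size(items: Iterable[FileInfo]) -> List[str]:
    # Sort on the numeric field itself, not on its textual representation.
    return [item.name for item in sorted(items, key=lambda i: i.size)]
```

For instance, names_by_size(larger_than(files, 50)) filters and orders a list of FileInfo objects without converting any of them to text along the way.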

Pipelines in GUIs


Graphical environments such as RISC OS and ROX Desktop also make use of pipelines. Rather than providing a save dialog box containing a file manager to let the user specify where a program should write data, RISC OS and ROX provide a save dialog box containing an icon (and a field to specify the name). The destination is specified by dragging and dropping the icon. The user can drop the icon anywhere an already-saved file could be dropped, including onto icons of other programs. If the icon is dropped onto a program's icon, that program is loaded and the contents that would otherwise have been saved are passed in on the new program's standard input stream.

For instance, a user browsing the world-wide web might come across a .gz compressed image which they want to edit and re-upload. Using GUI pipelines, they could drag the link to their de-archiving program, drag the icon representing the extracted contents to their image editor, edit it, open the save as dialog, and drag its icon to their uploading software.

Conceptually, this method could be used with a conventional save dialog box, but this would require the user's programs to have an obvious and easily-accessible location in the filesystem that can be navigated to. In practice, this is often not the case, so GUI pipelines are rare.

Other considerations


The name 'pipeline' comes from a rough analogy with physical plumbing, in that a pipeline usually allows information to flow in only one direction, much as water flows in a pipe.

Pipes and filters can be viewed as a form of functional programming, using byte streams as data objects; more specifically, they can be seen as a particular form of monad for I/O.
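
The functional reading can be sketched as plain function composition over byte strings. The example below covers only that composition view and does not model the monadic structure mentioned above; the helper compose and the two filters are hypothetical names.

```python
from functools import reduce
from typing import Callable

Filter = Callable[[bytes], bytes]


def compose(*filters: Filter) -> Filter:
    """Chain filters left to right, in the order a shell pipeline would."""
    return lambda data: reduce(lambda acc, f: f(acc), filters, data)


def upper(data: bytes) -> bytes:
    return data.upper()


def sort_lines(data: bytes) -> bytes:
    return b"\n".join(sorted(data.split(b"\n")))
```

Here compose(upper, sort_lines) builds a new filter from two existing ones without either of them knowing about the other, which is the property that makes filters reusable.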

The concept of pipeline is also central to the Cocoon web development framework, where it allows a source stream to be modified before eventual display.

This pattern encourages the use of text streams as the input and output of programs. This reliance on text has to be accounted for when creating graphical shells for text programs.

History


Process pipelines were invented by Douglas McIlroy, one of the designers of the first Unix shells, and greatly contributed to the popularity of that operating system. The pipeline can be considered the first non-trivial instance of software componentry.

The idea was eventually ported to other operating systems, such as DOS, OS/2, Windows NT, BeOS, and Mac OS X.

See also

  • Anonymous pipe
  • Component-based software engineering
  • GStreamer for a multimedia framework built on plugin pipelines
  • Named pipe, an operating system construct intermediate to anonymous pipe and file.
  • Pipeline (computing) for other computer-related versions of the concept.
  • Pipeline (Unix) for details specific to Unix.
  • Plumber - "intelligent pipes" developed as part of Plan 9
  • Programming in the large
  • Software design pattern
  • XML pipeline for processing of XML files

External links