Pipeline (Unix)
Encyclopedia
In Unix-like
Unix-like
A Unix-like operating system is one that behaves in a manner similar to a Unix system, while not necessarily conforming to or being certified to any version of the Single UNIX Specification....

 computer operating system
Operating system
An operating system is a set of programs that manage computer hardware resources and provide common services for application software. The operating system is the most important type of system software in a computer system...

s (and, to some extent, Microsoft Windows
Microsoft Windows
Microsoft Windows is a series of operating systems produced by Microsoft.Microsoft introduced an operating environment named Windows on November 20, 1985 as an add-on to MS-DOS in response to the growing interest in graphical user interfaces . Microsoft Windows came to dominate the world's personal...

), a pipeline is the original software pipeline
Pipeline (software)
In software engineering, a pipeline consists of a chain of processing elements , arranged so that the output of each element is the input of the next. Usually some amount of buffering is provided between consecutive elements...

: a set of process
Process (computing)
In computing, a process is an instance of a computer program that is being executed. It contains the program code and its current activity. Depending on the operating system , a process may be made up of multiple threads of execution that execute instructions concurrently.A computer program is a...

es chained by their standard streams
Standard streams
In Unix and Unix-like operating systems , as well as certain programming language interfaces, the standard streams are preconnected input and output channels between a computer program and its environment when it begins execution...

, so that the output of each process (stdout) feeds directly as input (stdin) to the next one. Each connection is implemented by an anonymous pipe
Anonymous pipe
In computer science, an anonymous pipe is a simplex FIFO communication channel that may be used for one-way interprocess communication . An implementation is often integrated into the operating system's file IO subsystem...

. Filter program
Filter (Unix)
In Unix and Unix-like operating systems, a filter is a program that gets most of its data from its standard input and writes its main results to its standard output . Unix filters are often used as elements of pipelines...

s are often used in this configuration.

The concept was invented by Douglas McIlroy
Douglas McIlroy
Malcolm Douglas McIlroy is a mathematician, engineer, and programmer. As of 2007 he is an Adjunct Professor of Computer Science at Dartmouth College. Dr...

 for Unix shell
Unix shell
A Unix shell is a command-line interpreter or shell that provides a traditional user interface for the Unix operating system and for Unix-like systems...

s and it was named by analogy to a physical pipeline
Pipeline transport
Pipeline transport is the transportation of goods through a pipe. Most commonly, liquids and gases are sent, but pneumatic tubes that transport solid capsules using compressed air are also used....

.

Unix pipeline can be thought of as left associative infix operation
Infix notation
Infix notation is the common arithmetic and logical formula notation, in which operators are written infix-style between the operands they act on . It is not as simple to parse by computers as prefix notation or postfix notation Infix notation is the common arithmetic and logical formula notation,...

 whose operands are programs with parameters. Programatically all programs in pipeline run at the same time (in parallel), but, looking at syntax, it can be thought that one runs after another (note, that parallelism is actually emulated; for how just see how pipelines are implemented later on this page). It is a functional composition. One can be reminded of functional programming
Functional programming
In computer science, functional programming is a programming paradigm that treats computation as the evaluation of mathematical functions and avoids state and mutable data. It emphasizes the application of functions, in contrast to the imperative programming style, which emphasizes changes in state...

, where data is passed from one function to another (as their input or output).

Simple example


ls -l | less


In this example, ls
Ls
In computing, ls is a command to list files in Unix and Unix-like operating systems. ls is specified by POSIX and the Single UNIX Specification.- History :An ls utility appeared in the original version of AT&T UNIX...

is the Unix directory lister, and less
Less (Unix)
less is a terminal pager program on Unix, Windows, and Unix-like systems used to view the contents of a text file one screen at a time. It is similar to more, but has the extended capability of allowing both forward and backward navigation through the file...

is an interactive text pager
Terminal pager
A terminal pager, or paging program, is a computer program used to view the contents of a text file moving down the file one line or one screen at a time. Some, but not all, pagers allow movement up a file. A popular cross-platform terminal pager is more. more can move forwards and backwards in...

 with searching capabilities.
The pipeline lets the user scroll up and down a directory listing that may not fit on the screen.

Pipelines ending in less (or more, a similar text pager) are among the most commonly used.
They let the user navigate potentially large amounts of text (constrained only by available memory) which otherwise would have scrolled past the top of the terminal and been lost.
Put differently, they relieve programmers from the burden of implementing text pagers in their applications: they can pipe output through less, or assume that the user will do so when needed.

Complex example

Below is an example of a pipeline that implements a kind of spell checker
Spell checker
In computing, a spell checker is an application program that flags words in a document that may not be spelled correctly. Spell checkers may be stand-alone capable of operating on a block of text, or as part of a larger application, such as a word processor, email client, electronic dictionary,...

 for the web
World Wide Web
The World Wide Web is a system of interlinked hypertext documents accessed via the Internet...

 resource indicated by a URL
Uniform Resource Locator
In computing, a uniform resource locator or universal resource locator is a specific character string that constitutes a reference to an Internet resource....

. An explanation of what it does follows.


curl "http://en.wikipedia.org/wiki/Pipeline_(Unix)" |
sed 's/[^a-zA-Z ]/ /g' |
tr 'A-Z ' 'a-z\n' |
grep '[a-z]' |
sort -u |
comm -23 - <(sort /usr/share/dict/words) |
less

  1. curl
    CURL
    cURL is a computer software project providing a library and command-line tool for transferring data using various protocols. The cURL project produces two products, libcurl and cURL...

    obtains the HTML
    HTML
    HyperText Markup Language is the predominant markup language for web pages. HTML elements are the basic building-blocks of webpages....

     contents of a web page (could use wget on some systems).
  2. sed
    Sed
    sed is a Unix utility that parses text and implements a programming language which can apply transformations to such text. It reads input line by line , applying the operation which has been specified via the command line , and then outputs the line. It was developed from 1973 to 1974 as a Unix...

    replaces all characters (from the web page's content) that are not spaces or letters, with spaces. (Newline
    Newline
    In computing, a newline, also known as a line break or end-of-line marker, is a special character or sequence of characters signifying the end of a line of text. The name comes from the fact that the next character after the newline will appear on a new line—that is, on the next line below the...

    s are preserved.)
  3. tr changes all of the uppercase letters into lowercase and converts the spaces in the lines of text to newlines (each 'word' is now on a separate line).
  4. grep
    Grep
    grep is a command-line text-search utility originally written for Unix. The name comes from the ed command g/re/p...

    includes only lines that contain at least one lowercase alphabetical character (removing any blank lines).
  5. sort
    Sort (Unix)
    sort is a standard Unix command line program that prints the lines of its input or concatenation of all files listed in its argument list in sorted order. Sorting is done based on one or more sort keys extracted from each line of input. By default, the entire input is taken as sort key...

    sorts the list of 'words' into alphabetical order, and the -u switch removes duplicates.
  6. comm finds lines in common between two files, -23 suppresses lines unique to the second file, and those that are common to both, leaving only those that are found only in the first file named. The - in place of a filename causes comm to use its standard input (from the pipe line in this case). sort /usr/share/dict/words sorts the contents of the words file alphabetically, as comm expects, and <( ... ) outputs the results to a temporary file (via process substitution
    Process substitution
    In computing, process substitution is a form of inter-process communication that allows the input or output of a command to appear as a file. The command is substituted in-line, where a file name would normally occur, by the command shell...

    ), which comm reads. The result is a list of words (lines) that are not found in /usr/share/dict/words.
  7. less
    Less (Unix)
    less is a terminal pager program on Unix, Windows, and Unix-like systems used to view the contents of a text file one screen at a time. It is similar to more, but has the extended capability of allowing both forward and backward navigation through the file...

    allows the user to page through the results.


The special character "|" tells the command line interpreter shell
Shell (computing)
A shell is a piece of software that provides an interface for users of an operating system which provides access to the services of a kernel. However, the term is also applied very loosely to applications and may include any software that is "built around" a particular component, such as web...

 to pipe the output from the previous command in the line into the next command in the line. That is, the output of the curl command is given as the input of the sed command, etc.

Pipelines in command line interfaces

All widely used Unix and Windows shells have a special syntax construct for the creation of pipelines. In typical usage one writes the filter commands in sequence, separated by the ASCII
ASCII
The American Standard Code for Information Interchange is a character-encoding scheme based on the ordering of the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that use text...

 vertical bar
Vertical bar
The vertical bar is a character with various uses in mathematics, where it can be used to represent absolute value, among others; in computing and programming and in general typography, as a divider not unlike the interpunct...

 character "|" (which, for this reason, is often called "pipe character"). The shell starts the processes and arranges for the necessary connections between their standard streams (including some amount of buffer
Buffer (computer science)
In computer science, a buffer is a region of a physical memory storage used to temporarily hold data while it is being moved from one place to another. Typically, the data is stored in a buffer as it is retrieved from an input device or just before it is sent to an output device...

 storage).

Error stream

By default, the standard error streams ("stderr") of the processes in a pipeline are not passed on through the pipe; instead, they are merged and directed to the console. However, many shells have additional syntax for changing this behaviour. In the csh
C shell
The C shell is a Unix shell that was created by Bill Joy while a graduate student at University of California, Berkeley in the late 1970s. It has been distributed widely, beginning with the 2BSD release of the BSD Unix system that Joy began distributing in 1978...

 shell, for instance, using "|&" instead of "| " signifies that the standard error stream too should be merged with the standard output and fed to the next process. The Bourne Shell
Bourne shell
The Bourne shell, or sh, was the default Unix shell of Unix Version 7 and most Unix-like systems continue to have /bin/sh - which will be the Bourne shell, or a symbolic link or hard link to a compatible shell - even when more modern shells are used by most users.Developed by Stephen Bourne at AT&T...

 can also merge standard error, using 2>&1, as well as redirect it to a different file.

Pipemill

In the most commonly used simple pipelines the shell connects a series of sub-processes via pipes, and executes external commands within each sub-process. Thus the shell itself is doing no direct processing of the data flowing through the pipeline.

However, it's possible for the shell to perform processing directly. This construct generally looks something like:

command | while read var1 var2 ...; do
# process each line, using variables as parsed into $var1, $var2, etc
# (note that this is a subshell: var1, var2 etc will not be available
# after the while loop terminates)
done

... which is referred to as a "pipemill" (since the while is "milling" over the results from the initial command.)

When using programs such as ssh, stdin is being passed to the remote command.
As such the default standard input handling of ssh drains the remaining hosts from the while loop.
To prevent this, redirect stdin from /dev/null. (< /dev/null)
In the case of ssh, you may also use -n to prevent ssh
Secure Shell
Secure Shell is a network protocol for secure data communication, remote shell services or command execution and other secure network services between two networked computers that it connects via a secure channel over an insecure network: a server and a client...

 reading from stdin.

Creating pipelines programmatically

Pipelines can be created under program control. The Unix pipe system call
System call
In computing, a system call is how a program requests a service from an operating system's kernel. This may include hardware related services , creating and executing new processes, and communicating with integral kernel services...

 asks the operating system to construct a new anonymous pipe
Anonymous pipe
In computer science, an anonymous pipe is a simplex FIFO communication channel that may be used for one-way interprocess communication . An implementation is often integrated into the operating system's file IO subsystem...

 object. This results in two new, opened file descriptors in the process: the read-only end of the pipe, and the write-only end. The pipe ends appear to be normal, anonymous file descriptor
File descriptor
In computer programming, a file descriptor is an abstract indicator for accessing a file. The term is generally used in POSIX operating systems...

s, except that they have no ability to seek.

To avoid deadlock
Deadlock
A deadlock is a situation where in two or more competing actions are each waiting for the other to finish, and thus neither ever does. It is often seen in a paradox like the "chicken or the egg"...

 and exploit parallelism, the Unix process with one or more new pipes will then, generally, call fork to create new processes. Each process will then close the end(s) of the pipe that it will not be using before producing or consuming any data. Alternatively, a process might create a new thread and use the pipe to communicate between them.

Named pipe
Named pipe
In computing, a named pipe is an extension to the traditional pipe concept on Unix and Unix-like systems, and is one of the methods of inter-process communication. The concept is also found in Microsoft Windows, although the semantics differ substantially...

s
may also be created using mkfifo or mknod and then presented as the input or output file to programs as they are invoked. They allow multi-path pipes to be created, and are especially effective when combined with standard error redirection, or with tee.

Implementation

In most Unix-like systems, all processes of a pipeline are started at the same time, with their streams appropriately connected, and managed by the scheduler
Scheduling (computing)
In computer science, a scheduling is the method by which threads, processes or data flows are given access to system resources . This is usually done to load balance a system effectively or achieve a target quality of service...

 together with all other processes running on the machine. An important aspect of this, setting Unix pipes apart from other pipe implementations, is the concept of buffering
Buffer (computer science)
In computer science, a buffer is a region of a physical memory storage used to temporarily hold data while it is being moved from one place to another. Typically, the data is stored in a buffer as it is retrieved from an input device or just before it is sent to an output device...

: a sending program may produce 5000 bytes per second
Second
The second is a unit of measurement of time, and is the International System of Units base unit of time. It may be measured using a clock....

, and a receiving program may only be able to accept 100 bytes per second, but no data is lost. Instead, the output of the sending program is held in a queue. When the receiving program is ready to read data, the operating system sends its data from the queue, then removes that data from the queue. If the queue buffer fills up, the sending program is suspended (blocked) until the receiving program has had a chance to read some data and make room in the buffer. In Linux, the size of the buffer is 65536 bytes.

Network pipes

Tools like netcat
Netcat
Netcat is a computer networking service for reading from and writing network connections using TCP or UDP. Netcat is designed to be a dependable “back-end” device that can be used directly or easily driven by other programs and scripts...

 and socat can connect pipes to TCP/IP sockets
Internet socket
In computer networking, an Internet socket or network socket is an endpoint of a bidirectional inter-process communication flow across an Internet Protocol-based computer network, such as the Internet....

.

History

The pipeline concept and the vertical-bar notation was invented by Douglas McIlroy
Douglas McIlroy
Malcolm Douglas McIlroy is a mathematician, engineer, and programmer. As of 2007 he is an Adjunct Professor of Computer Science at Dartmouth College. Dr...

, one of the authors of the early command shells
Unix shell
A Unix shell is a command-line interpreter or shell that provides a traditional user interface for the Unix operating system and for Unix-like systems...

, after he noticed that much of the time they were processing the output of one program as the input to another. His ideas were implemented in 1973 when Ken Thompson
Ken Thompson
Kenneth Lane Thompson , commonly referred to as ken in hacker circles, is an American pioneer of computer science...

 added pipes to the UNIX
Unix
Unix is a multitasking, multi-user computer operating system originally developed in 1969 by a group of AT&T employees at Bell Labs, including Ken Thompson, Dennis Ritchie, Brian Kernighan, Douglas McIlroy, and Joe Ossanna...

 operating system. The idea was eventually ported to other operating systems, such as DOS
DOS
DOS, short for "Disk Operating System", is an acronym for several closely related operating systems that dominated the IBM PC compatible market between 1981 and 1995, or until about 2000 if one includes the partially DOS-based Microsoft Windows versions 95, 98, and Millennium Edition.Related...

, OS/2
OS/2
OS/2 is a computer operating system, initially created by Microsoft and IBM, then later developed by IBM exclusively. The name stands for "Operating System/2," because it was introduced as part of the same generation change release as IBM's "Personal System/2 " line of second-generation personal...

, Microsoft Windows
Microsoft Windows
Microsoft Windows is a series of operating systems produced by Microsoft.Microsoft introduced an operating environment named Windows on November 20, 1985 as an add-on to MS-DOS in response to the growing interest in graphical user interfaces . Microsoft Windows came to dominate the world's personal...

, and BeOS
BeOS
BeOS is an operating system for personal computers which began development by Be Inc. in 1991. It was first written to run on BeBox hardware. BeOS was optimized for digital media work and was written to take advantage of modern hardware facilities such as symmetric multiprocessing by utilizing...

, often with the same notation.

Although developed independently, Unix pipes are similar to, and were preceded by, the 'communication files' developed by Ken Lochner in the 1960s for the Dartmouth Time Sharing System
Dartmouth Time Sharing System
The Dartmouth Time-Sharing System, or DTSS for short, was the first large-scale time-sharing system to be implemented successfully. DTSS was inspired by a PDP-1-based time-sharing system at Bolt, Beranek and Newman. In 1962, John Kemeny and Thomas Kurtz at Dartmouth College submitted a grant for...

.

The robot in the icon for Apple
Apple Computer
Apple Inc. is an American multinational corporation that designs and markets consumer electronics, computer software, and personal computers. The company's best-known hardware products include the Macintosh line of computers, the iPod, the iPhone and the iPad...

's Automator
Automator (software)
Automator is an application developed by Apple for Mac OS X that implements point-and-click creation of workflows for automating repetitive tasks into batches for quicker alteration, thus saving time and effort over human intervention to manually change each file separately...

, which also uses a pipeline concept to chain repetitive commands together, holds a pipe in homage to the original Unix concept.

Other operating systems

This feature of Unix
Unix
Unix is a multitasking, multi-user computer operating system originally developed in 1969 by a group of AT&T employees at Bell Labs, including Ken Thompson, Dennis Ritchie, Brian Kernighan, Douglas McIlroy, and Joe Ossanna...

 was borrowed by other operating systems, such as Taos and MS-DOS
MS-DOS
MS-DOS is an operating system for x86-based personal computers. It was the most commonly used member of the DOS family of operating systems, and was the main operating system for IBM PC compatible personal computers during the 1980s to the mid 1990s, until it was gradually superseded by operating...

, and eventually became the pipes and filters design pattern
Pipeline (software)
In software engineering, a pipeline consists of a chain of processing elements , arranged so that the output of each element is the input of the next. Usually some amount of buffering is provided between consecutive elements...

 of software engineering
Software engineering
Software Engineering is the application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software, and the study of these approaches; that is, the application of engineering to software...

.

See also

  • Anonymous pipe
    Anonymous pipe
    In computer science, an anonymous pipe is a simplex FIFO communication channel that may be used for one-way interprocess communication . An implementation is often integrated into the operating system's file IO subsystem...

    , a FIFO
    FIFO
    FIFO is an acronym for First In, First Out, an abstraction related to ways of organizing and manipulation of data relative to time and prioritization...

     structure used for interprocess communication
  • GStreamer
    GStreamer
    GStreamer is a pipeline-based multimedia framework written in the C programming language with the type system based on GObject.GStreamer allows a programmer to create a variety of media-handling components, including simple audio playback, audio and video playback, recording, streaming and editing...

    , a pipeline-based multimedia framework
  • Hartmann pipeline
    Hartmann pipeline
    A Hartmann pipeline is an extension of the Unix pipeline concept, providing for more complex paths, multiple input/output streams, and other features. It is an example and extension of Pipeline programming....

  • Named pipe
    Named pipe
    In computing, a named pipe is an extension to the traditional pipe concept on Unix and Unix-like systems, and is one of the methods of inter-process communication. The concept is also found in Microsoft Windows, although the semantics differ substantially...

     persistent pipes used for interprocess communication
  • Pipeline (computing) for other computer-related pipelines.
  • Pipeline (software)
    Pipeline (software)
    In software engineering, a pipeline consists of a chain of processing elements , arranged so that the output of each element is the input of the next. Usually some amount of buffering is provided between consecutive elements...

     for the general software engineering concept.
  • Redirection (computing)
  • Tee (command), a general command for tapping data from a pipeline
  • XML pipeline
    XML pipeline
    In software, an XML Pipeline is formed when XML processes, especially XML transformations and XML validations, are connected together....

    for processing of XML files

External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK