Parallel (software)
GNU parallel is a command-line driven utility for Linux or other Unix-like operating systems that allows the user to run shell commands and scripts in parallel. GNU parallel is free software, written in Perl. It is available under the terms of the GNU General Public License, version 3 (GPLv3).

Usage

The most common usage is to replace a shell loop such as

(for x in `cat list` ; do
do_something $x
done) | process_output

with

cat list | parallel do_something | process_output

where the file list contains the arguments for do_something, one per line, and the final | process_output stage may be omitted.
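To make concrete what parallel automates, here is a plain-bash sketch (not how GNU parallel itself is implemented) that runs a hypothetical do_something over a list, first sequentially and then by starting every call as a background job. The naive background version has none of parallel's safeguards: it starts all jobs at once and its output order is unpredictable, which is why the demo pipes it through sort.

```shell
#!/usr/bin/env bash
# do_something is a hypothetical stand-in for a slow command.
do_something() {
  sleep 0.1                    # pretend this is expensive work
  printf 'done %s\n' "$1"
}

# Sequential version: one job at a time, like the for loop above.
run_sequential() {
  while IFS= read -r x; do
    do_something "$x"
  done
}

# Naive parallel version: every job is started in the background,
# then "wait" blocks until all of them have finished.
# No job limit, and output may arrive in any order.
run_background() {
  while IFS= read -r x; do
    do_something "$x" &
  done
  wait
}

printf '%s\n' a b c | run_sequential
printf '%s\n' a b c | run_background | sort
```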

Scripts using parallel are often easier to read than scripts using pexec.

Other features of parallel include:
  • grouping of standard output and standard error, so that the output of jobs running in parallel does not get mixed together;
  • keeping the output in the same order as the input (with the -k option);
  • correct handling of file names containing special characters such as spaces, single quotes, double quotes, ampersands, and UTF-8 encoded characters.
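One way to get grouped, input-ordered output is to buffer each job's output separately and print the buffers in input order once the jobs finish. The sketch below does this in plain bash with one temporary file per job; job() is a hypothetical command that writes to both streams, and this is an illustration of the idea rather than a description of parallel's internals.

```shell
#!/usr/bin/env bash
# job is hypothetical: it finishes after a random delay and writes
# one line to stdout and one line to stderr.
job() {
  sleep "0.$(( RANDOM % 3 ))"
  echo "out $1"
  echo "err $1" >&2
}

run_grouped_ordered() {
  local tmpdir i=0 n arg
  tmpdir=$(mktemp -d) || return 1
  while IFS= read -r arg; do
    # Each job writes stdout and stderr to its own file,
    # so its lines are never interleaved with another job's.
    job "$arg" >"$tmpdir/$i" 2>&1 &
    i=$((i + 1))
  done
  wait
  # Print the buffered output in input order, one job at a time.
  for ((n = 0; n < i; n++)); do
    cat "$tmpdir/$n"
  done
  rm -rf "$tmpdir"
}

printf '%s\n' x y z | run_grouped_ordered
```

Regardless of which job finishes first, the output always lists x, then y, then z, each with its stdout line followed by its stderr line.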

Early versions of parallel ran 9 jobs at a time by default; with -j+0, parallel detects the number of CPUs and runs one job per CPU, which later versions do by default.
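The idea behind -j+0 (at most one running job per CPU) can be sketched in plain bash with a counter of running jobs and wait -n, which blocks until any one background job exits. This is an assumption-laden illustration, not parallel's code: it needs bash 4.3 or later for wait -n, uses nproc from GNU coreutils with getconf as a fallback, and run_limited is a hypothetical name.

```shell
#!/usr/bin/env bash
# Number of online CPUs: nproc (GNU coreutils) or getconf as a fallback.
cpu_count() {
  nproc 2>/dev/null || getconf _NPROCESSORS_ONLN
}

# run_limited MAX CMD...: run "CMD arg" for each input line,
# with at most MAX jobs running at the same time.
run_limited() {
  local max_jobs=$1 running=0 arg
  shift
  while IFS= read -r arg; do
    if (( running >= max_jobs )); then
      wait -n                  # block until any one job exits
      running=$((running - 1))
    fi
    "$@" "$arg" &
    running=$((running + 1))
  done
  wait                         # wait for the remaining jobs
}

slots=$(cpu_count)
echo "running with $slots job slots"
seq 1 8 | run_limited "$slots" echo item | sort -n -k2
```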

An introductory video on GNU Parallel is available on Wikimedia Commons.

Examples


find . -name "*.foo" | parallel grep bar


The above is roughly equivalent to:


grep bar $(find . -name "*.foo")


Both commands search all files in the current directory and its subdirectories whose names end in .foo for occurrences of the string bar; the parallel version runs a separate grep for each file, several at a time. Unlike the command substitution, which splits file names at spaces, the parallel command works as expected unless a file name contains a newline. To avoid this limitation, one may use:


find . -name "*.foo" -print0 | parallel -0 grep bar


The above command uses find's -print0 option, a GNU extension, to separate file names with the null character, and the -0 option of parallel to read that null-separated list.
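The difference between the three ways of passing file names on can be checked directly. This plain-bash sketch creates a scratch directory holding one name with a space and one with a newline, then counts how many "names" each method sees; the helper function names are hypothetical, and no parallel installation is needed.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
# One file name with a space, one with an embedded newline.
touch "$dir/a b.foo" "$dir/c"$'\n'"d.foo"

# Like grep bar $(find ...): the output is split on spaces,
# tabs and newlines.
count_words() {
  local n=0 f
  for f in $(find "$dir" -name '*.foo'); do n=$((n + 1)); done
  echo "$n"
}

# Like plain "find | parallel": one name per line.
count_lines() {
  find "$dir" -name '*.foo' | wc -l | tr -d ' '
}

# Like "find -print0 | parallel -0": names separated by NUL bytes.
count_nul() {
  local n=0 f
  while IFS= read -r -d '' f; do n=$((n + 1)); done \
    < <(find "$dir" -name '*.foo' -print0)
  echo "$n"
}

echo "command substitution: $(count_words) words"
echo "one name per line:    $(count_lines) lines"
echo "NUL-separated:        $(count_nul) names"
```

For the two files, the counts are 4, 3 and 2: only the NUL-separated form sees exactly two names.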

find . -name "*.foo" | parallel -X mv {} /tmp/trash


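The -X option makes parallel replace {} with as many of the arguments as fit on one command line, so mv is started only a few times, each time with a batch of file names, instead of once per file.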
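The batching behaviour of -X can be sketched in plain bash. Real parallel sizes each batch by the permitted command-line length; to keep the example small, this sketch assumes a fixed batch size instead, and run_batched is a hypothetical name. The demo prefixes the template with echo so that it prints the commands instead of moving anything.

```shell
#!/usr/bin/env bash
# run_batched SIZE TEMPLATE...: read arguments one per line and run
# TEMPLATE with the literal word {} replaced by up to SIZE arguments.
run_batched() {
  local size=$1; shift
  local -a before=() after=() batch=()
  local seen=0 word arg
  # Split the command template around the {} placeholder.
  for word in "$@"; do
    if [[ $word == '{}' ]]; then seen=1
    elif (( seen )); then after+=("$word")
    else before+=("$word"); fi
  done
  while IFS= read -r arg; do
    batch+=("$arg")
    if (( ${#batch[@]} == size )); then
      "${before[@]}" "${batch[@]}" "${after[@]}"
      batch=()
    fi
  done
  # Run the final, partial batch (nothing runs on empty input).
  if (( ${#batch[@]} )); then
    "${before[@]}" "${batch[@]}" "${after[@]}"
  fi
}

printf '%s\n' f1 f2 f3 f4 f5 | run_batched 2 echo mv {} /tmp/trash
```

With a batch size of 2, five file names produce three mv commands: two with two files each and a final one with a single file.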


find . -maxdepth 1 -type f -name "*.ogg" | parallel -X -r cp -v -p {} /home/media


The command above does the same as:


cp -v -p *.ogg /home/media


However, the former command, which combines find, parallel and cp, will not fail with an "Argument list too long" error when *.ogg expands to more file names than fit on a single command line, and with -r it does not run cp at all if there are no matching files.
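Both pitfalls of the plain glob can be observed in bash. In a scratch directory with no .ogg files, an unmatched *.ogg is passed through literally (unless the nullglob option is set), so cp would be asked to copy a file named "*.ogg"; find, by contrast, prints nothing, so a consumer that skips empty input, as parallel -r does, never runs cp. The sketch also prints ARG_MAX, the system limit on the combined size of arguments and environment passed to exec(), which a very large glob expansion exceeds. The function names are hypothetical.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1            # a directory with no .ogg files

# Without nullglob, an unmatched pattern stays as the literal "*.ogg".
glob_args() {
  local -a args=(*.ogg)
  printf '%s\n' "${#args[@]}" "${args[0]-}"
}

# find prints nothing here, so there is nothing for cp to process.
find_args() {
  find . -maxdepth 1 -type f -name '*.ogg' | wc -l | tr -d ' '
}

glob_args
find_args
# Limit on the size of arguments plus environment for exec().
getconf ARG_MAX
```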

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 