Message Passing Interface
Message Passing Interface (MPI) is a standardized and portable message-passing system designed by a group of researchers from academia and industry to function on a wide variety of parallel computers. The standard defines the syntax and semantics of a core of library routines useful to a wide range of users writing portable message-passing programs in Fortran 77 or the C programming language. Several well-tested and efficient implementations of MPI exist, including some that are free or in the public domain. These fostered the development of a parallel software industry, and thereby encouraged the development of portable and scalable large-scale parallel applications.

History

The message passing interface effort began in the summer of 1991 when a small group of researchers started discussions at a mountain retreat in Austria. Out of that discussion came a Workshop on Standards for Message Passing in a Distributed Memory Environment, held on April 29–30, 1992 in Williamsburg, Virginia.
At this workshop the basic features essential to a standard message-passing interface were discussed, and a working group was established to continue the standardization process. Jack Dongarra, Rolf Hempel, Tony Hey, and David W. Walker put forward a preliminary draft proposal in November 1992; this was known as MPI1. In November 1992, a meeting of the MPI working group was held in Minneapolis, at which it was decided to place the standardization process on a more formal footing. The MPI working group met every six weeks throughout the first nine months of 1993. The draft MPI standard was presented at the Supercomputing '93 conference in November 1993. After a period of public comment, which resulted in some changes to MPI, version 1.0 of MPI was released in June 1994. These meetings and the email discussion together constituted the MPI Forum, membership of which has been open to all members of the high-performance computing community.

The MPI effort involved about 80 people from 40 organizations, mainly in the United States and Europe. Most of the major vendors of concurrent computers were involved in MPI along with researchers from universities, government laboratories, and industry.

The MPI standard defines the syntax and semantics of a core of library routines useful to a wide range of users writing portable message-passing programs in Fortran and C. The MPI effort was conducted in a spirit similar to that of the High-Performance Fortran Forum (HPFF).

MPI provides parallel hardware vendors with a clearly defined base set of routines that can be efficiently implemented. As a result, hardware vendors can build upon this collection of standard low-level routines to create higher-level routines for the distributed-memory communication environment supplied with their parallel machines. MPI provides a simple-to-use portable interface for the basic user, yet one powerful enough to allow programmers to use the high-performance message-passing operations available on advanced machines.

In an effort to create a “true” standard for message passing, researchers incorporated into MPI the most useful features of several systems, rather than choosing one system to adopt as the standard. Features were taken from systems by IBM, Intel, nCUBE, PVM, Express, P4 and PARMACS. The message-passing paradigm is attractive because of its wide portability: it can be used for communication on distributed-memory and shared-memory multiprocessors, networks of workstations, and combinations of these elements. The paradigm will not be made obsolete by increases in network speed or by architectures combining shared- and distributed-memory components.

Support for MPI meetings came in part from ARPA and the US National Science Foundation under grant ASC-9310330, NSF Science and Technology Center Cooperative Agreement number CCR-8809615, and from the Commission of the European Community through Esprit Project P6643. The University of Tennessee also made financial contributions to the MPI Forum.

Overview

MPI is a language-independent communications protocol used to program parallel computers. Both point-to-point and collective communication are supported. MPI "is a message-passing application programmer interface, together with protocol and semantic specifications for how its features must behave in any implementation." MPI's goals are high performance, scalability, and portability. MPI remains the dominant model used in high-performance computing today.

MPI is not sanctioned by any major standards body; nevertheless, it has become a de facto standard for communication among processes that model a parallel program running on a distributed-memory system. Actual distributed-memory supercomputers, such as computer clusters, often run such programs. The principal MPI-1 model has no shared-memory concept, and MPI-2 has only a limited distributed shared-memory concept. Nonetheless, MPI programs are regularly run on shared-memory computers. Designing programs around the MPI model (as opposed to explicit shared-memory models) has advantages when running on NUMA architectures, since MPI encourages memory locality.

Although MPI belongs in layers 5 and higher of the OSI Reference Model, implementations may cover most layers, with sockets and the Transmission Control Protocol (TCP) used in the transport layer.

Most MPI implementations consist of a specific set of routines (i.e., an API) directly callable from C, C++, Fortran, and any language able to interface with such libraries, including C#, Java, and Python. The advantages of MPI over older message-passing libraries are portability (because MPI has been implemented for almost every distributed-memory architecture) and speed (because each implementation is in principle optimized for the hardware on which it runs).

MPI uses Language Independent Specifications (LIS) for calls and language bindings. The first MPI standard specified ANSI C and Fortran-77 bindings together with the LIS. The draft was presented at Supercomputing 1994 (November 1994) and finalized soon thereafter. About 128 functions constitute the MPI-1.3 standard, which was released as the final end of the MPI-1 series in 2008.

At present, the standard has several popular versions: version 1.3 (commonly abbreviated MPI-1), which emphasizes message passing and has a static runtime environment, and MPI-2.2 (MPI-2), which includes new features such as parallel I/O, dynamic process management, and remote memory operations. MPI-2's LIS specifies over 500 functions and provides language bindings for ANSI C, ANSI C++, and ANSI Fortran (Fortran 90). Object interoperability was also added to allow easier mixed-language message-passing programming. A side effect of standardizing MPI-2, completed in 1996, was clarifying the MPI-1 standard, creating MPI-1.2.

MPI-2 is mostly a superset of MPI-1, although some functions have been deprecated. MPI-1.3 programs still work under MPI implementations compliant with the MPI-2 standard.

MPI is often compared with Parallel Virtual Machine (PVM), which is a popular distributed environment and message passing system developed in 1989, and which was one of the systems that motivated the need for standard parallel message passing. Threaded shared-memory programming models (such as Pthreads and OpenMP) and message-passing programming (MPI/PVM) can be considered complementary programming approaches, and can occasionally be seen together in applications, e.g. in servers with multiple large shared-memory nodes.

Functionality

The MPI interface is meant to provide essential virtual topology, synchronization, and communication functionality between a set of processes (that have been mapped to nodes/servers/computer instances) in a language-independent way, with language-specific syntax (bindings), plus a few language-specific features. MPI programs always work with processes, but programmers commonly refer to the processes as processors. Typically, for maximum performance, each CPU (or core in a multi-core machine) will be assigned just a single process. This assignment happens at runtime through the agent that starts the MPI program, normally called mpirun or mpiexec.

MPI library functions include, but are not limited to, point-to-point rendezvous-type send/receive operations, choosing between a Cartesian or graph-like logical process topology, exchanging data between process pairs (send/receive operations), combining partial results of computations (gather and reduce operations), synchronizing nodes (barrier operation), as well as obtaining network-related information such as the number of processes in the computing session, the current processor identity that a process is mapped to, neighboring processes accessible in a logical topology, and so on. Point-to-point operations come in synchronous, asynchronous, buffered, and ready forms, to allow both relatively stronger and weaker semantics for the synchronization aspects of a rendezvous-send. Many outstanding operations are possible in asynchronous mode, in most implementations.

MPI-1 and MPI-2 both enable implementations that overlap communication and computation, but practice and theory differ. MPI also specifies thread-safe interfaces, which have cohesion and coupling strategies that help avoid hidden state within the interface. It is relatively easy to write multithreaded point-to-point MPI code, and some implementations support such code. Multithreaded collective communication is best accomplished with multiple copies of communicators, as described below.

Concepts

MPI provides a rich range of abilities. The following concepts help in understanding and provide context for all of those abilities, and help the programmer decide what functionality to use in their application programs. Four of MPI's eight basic concepts are unique to MPI-2.

Communicator

Communicator objects connect groups of processes in the MPI session. Each communicator gives each contained process an independent identifier and arranges its contained processes in an ordered topology. MPI also has explicit groups, but these are mainly good for organizing and reorganizing groups of processes before another communicator is made. MPI understands single-group intracommunicator operations, and bilateral intercommunicator communication. In MPI-1, single-group operations are most prevalent. Bilateral operations mostly appear in MPI-2, where they include collective communication and dynamic in-process management.

Communicators can be partitioned using several MPI commands. These commands include a graph-coloring-type algorithm called MPI_COMM_SPLIT, which can efficiently derive topological and other logical subgroupings.
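
A sketch of such a split, assuming an arbitrary choice of color (rank parity) and key (the original rank), neither of which comes from the text above:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int world_rank, sub_rank;
    MPI_Comm subcomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Processes passing the same "color" land in the same new communicator;
       "key" orders the ranks within it. */
    MPI_Comm_split(MPI_COMM_WORLD, world_rank % 2, world_rank, &subcomm);

    MPI_Comm_rank(subcomm, &sub_rank);
    printf("world rank %d has rank %d in its subgroup\n", world_rank, sub_rank);

    MPI_Comm_free(&subcomm);
    MPI_Finalize();
    return 0;
}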

Point-to-point basics

A number of important MPI functions involve communication between two specific processes. A popular example is MPI_Send, which allows one specified process to send a message to a second specified process. Point-to-point operations, as these are called, are particularly useful in patterned or irregular communication, for example, a data-parallel architecture in which each processor routinely swaps regions of data with specific other processors between calculation steps, or a master-slave architecture in which the master sends new task data to a slave whenever the prior task is completed.

MPI-1 specifies both blocking and non-blocking point-to-point communication mechanisms, as well as the so-called 'ready-send' mechanism, whereby a send request can be made only when the matching receive request has already been made.
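
A minimal sketch of the non-blocking forms, assuming two cooperating ranks, a single integer payload, and an arbitrary tag of 0 (all illustrative choices):

#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, sendval, recvval;
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank < 2) {
        int partner = 1 - rank;
        sendval = rank;
        /* Post the receive and the send without blocking... */
        MPI_Irecv(&recvval, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(&sendval, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &reqs[1]);
        /* ...then wait for both to complete before using recvval. */
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    }

    MPI_Finalize();
    return 0;
}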

Collective basics

Collective functions involve communication among all processes in a process group (which can mean the entire process pool or a program-defined subset). A typical function is the MPI_Bcast call (short for "broadcast"). This function takes data from one node and sends it to all processes in the process group. A reverse operation is the MPI_Reduce call, which takes data from all processes in a group, performs an operation (such as summing), and stores the results on one node. Reduce is often useful at the start or end of a large distributed calculation, where each processor operates on a part of the data and then combines it into a result.
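
A short sketch combining the two calls; the broadcast value and the per-rank contribution are arbitrary illustrations:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, value = 0, contribution, total = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        value = 42;                     /* data only the root knows initially */

    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);   /* now every rank has it */

    contribution = value + rank;        /* each rank computes its own part */
    MPI_Reduce(&contribution, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of contributions: %d\n", total);

    MPI_Finalize();
    return 0;
}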

Other operations perform more sophisticated tasks, such as MPI_Alltoall, which rearranges n items of data from each processor such that the nth node gets the nth item of data from each.

Derived datatypes

Many MPI functions require that you specify the type of data which is sent between processors. This is because these functions pass variables, not defined types. If the data type is a standard one, such as int, char, double, etc., you can use predefined MPI datatypes such as MPI_INT, MPI_CHAR, MPI_DOUBLE.

Here is an example in C in which every processor sends an array of 100 ints to the root with MPI_Gather:

int array[100];
int root = 0, total_p, *receive_array;   /* root rank chosen as 0 here */

MPI_Comm_size(comm, &total_p);           /* number of processes in comm */
receive_array = malloc(total_p * 100 * sizeof(*receive_array));  /* requires <stdlib.h> */
MPI_Gather(array, 100, MPI_INT, receive_array, 100, MPI_INT, root, comm);


However, you may instead wish to send the data as one block, as opposed to 100 ints. To do this, define a "contiguous block" derived data type:

MPI_Datatype newtype;
MPI_Type_contiguous(100, MPI_INT, &newtype);   /* 100 contiguous ints treated as one unit */
MPI_Type_commit(&newtype);                     /* a datatype must be committed before use */
MPI_Gather(array, 1, newtype, receive_array, 1, newtype, root, comm);


A class or a data structure cannot be passed using a predefined data type. MPI_Type_create_struct creates an MPI derived data type from MPI predefined data types, as follows:

int MPI_Type_create_struct(int count, int blocklen[], MPI_Aint disp[], MPI_Datatype type[], MPI_Datatype *newtype)


where count is the number of blocks and also the number of entries in blocklen[], disp[], and type[]:
  • blocklen[] – number of elements in each block (array of integers)
  • disp[] – byte displacement of each block (array of integers)
  • type[] – type of the elements in each block (array of handles to datatype objects)


The disp[] array is needed because processors require variables to be aligned in specific ways in memory. For example, a char is one byte and can go anywhere in memory; a short is 2 bytes, so it goes at even memory addresses; a long is typically 4 or 8 bytes and goes at addresses divisible by its size; and so on. The compiler tries to accommodate this alignment in a class or data structure by padding the variables. The safest way to find the distances between different variables in a data structure is to obtain their addresses with MPI_Get_address and subtract the address of the start of the structure, which yields each element's displacement from the start of the data structure.

Given the following data structures:


typedef struct {
    int f;
    short p;
} A;

typedef struct {
    A a;
    int pp, vp;
} B;


Here's the C code for building an MPI-derived data type:

void define_MPI_datatype()
{
    /* The first and last elements mark the beginning and end of the data structure */
    int blocklen[6] = {1, 1, 1, 1, 1, 1};
    MPI_Aint disp[6];
    MPI_Datatype newtype;
    MPI_Datatype type[6] = {MPI_LB, MPI_INT, MPI_SHORT, MPI_INT, MPI_INT, MPI_UB};
    B findsize[2]; /* An array of two is needed to establish the upper bound of the data structure */
    MPI_Aint findsize_addr, a_addr, f_addr, p_addr, pp_addr, vp_addr, UB_addr;
    int error;

    MPI_Get_address(&findsize[0], &findsize_addr);
    MPI_Get_address(&(findsize[0]).a, &a_addr);
    MPI_Get_address(&((findsize[0]).a).f, &f_addr);
    MPI_Get_address(&((findsize[0]).a).p, &p_addr);
    MPI_Get_address(&(findsize[0]).pp, &pp_addr);
    MPI_Get_address(&(findsize[0]).vp, &vp_addr);
    MPI_Get_address(&findsize[1], &UB_addr);

    /* Displacements are measured from the start of the structure */
    disp[0] = a_addr - findsize_addr;
    disp[1] = f_addr - findsize_addr;
    disp[2] = p_addr - findsize_addr;
    disp[3] = pp_addr - findsize_addr;
    disp[4] = vp_addr - findsize_addr;
    disp[5] = UB_addr - findsize_addr;

    error = MPI_Type_create_struct(6, blocklen, disp, type, &newtype);
    MPI_Type_commit(&newtype);
}

One-sided communication

MPI-2 defines three one-sided communication operations, Put, Get, and Accumulate, which are respectively a write to remote memory, a read from remote memory, and a reduction operation on the same memory across a number of tasks. Also defined are three different methods to synchronize this communication (global, pairwise, and remote locks), as the specification does not guarantee that these operations have taken place until a synchronization point.

These types of call can often be useful for algorithms in which synchronization would be inconvenient (e.g. distributed matrix multiplication), or where it is desirable for tasks to be able to balance their load while other processors are operating on data.
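
A hedged sketch of Put with fence synchronization; the window layout and the choice of fence-based (global) synchronization are illustrative assumptions, not prescribed by the text above:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, nprocs;
    int *winbuf = NULL;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Only rank 0 exposes memory; the other ranks expose an empty window. */
    if (rank == 0)
        MPI_Alloc_mem(nprocs * sizeof(int), MPI_INFO_NULL, &winbuf);
    MPI_Win_create(winbuf, rank == 0 ? nprocs * (MPI_Aint)sizeof(int) : 0,
                   sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);                                  /* open the access/exposure epoch */
    MPI_Put(&rank, 1, MPI_INT, 0, rank, 1, MPI_INT, win);   /* write into rank 0's window */
    MPI_Win_fence(0, win);                                  /* complete the epoch */

    if (rank == 0) {
        int i;
        for (i = 0; i < nprocs; i++)
            printf("slot %d = %d\n", i, winbuf[i]);
        MPI_Free_mem(winbuf);
    }

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}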

Dynamic process management

The key aspect is "the ability of an MPI process to participate in the creation of new MPI processes or to establish communication with MPI processes that have been started separately." The MPI-2 specification describes three main interfaces by which MPI processes can dynamically establish communications: MPI_Comm_spawn, MPI_Comm_accept/MPI_Comm_connect, and MPI_Comm_join. The MPI_Comm_spawn interface allows an MPI process to spawn a number of instances of the named MPI process. The newly spawned set of MPI processes forms a new MPI_COMM_WORLD intracommunicator but can communicate with the parent through the intercommunicator the function returns. MPI_Comm_spawn_multiple is an alternate interface that allows the different instances spawned to be different binaries with different arguments.
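
A minimal parent-side sketch, assuming a hypothetical child executable named "worker" whose processes each send one integer back over the returned intercommunicator:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Comm children;
    int i, result, errcodes[4];

    MPI_Init(&argc, &argv);

    /* Spawn four copies of "worker"; the count and executable name are illustrative. */
    MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                   0, MPI_COMM_WORLD, &children, errcodes);

    /* The children are ranks 0..3 of the remote group of the intercommunicator. */
    for (i = 0; i < 4; i++) {
        MPI_Recv(&result, 1, MPI_INT, i, 0, children, MPI_STATUS_IGNORE);
        printf("child %d reported %d\n", i, result);
    }

    MPI_Finalize();
    return 0;
}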

I/O

The parallel I/O feature is sometimes called MPI-IO, and refers to a set of functions designed to abstract I/O management on distributed systems to MPI, and allow files to be easily accessed in a patterned way using the existing derived datatype functionality.
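
A brief sketch of the MPI-IO calls, assuming an illustrative file name and a simple fixed-offset layout in which each rank writes its own rank number:

#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank;
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    /* Each process writes one int at a disjoint byte offset. */
    MPI_File_write_at(fh, rank * (MPI_Offset)sizeof(int), &rank, 1, MPI_INT,
                      MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}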

The little research that has been done on this feature indicates that achieving good performance can be difficult. For example, some implementations of sparse matrix-vector multiplication using the MPI I/O library are disastrously inefficient.

'Classical' cluster and supercomputer implementations

The MPI implementation language is not constrained to match the language or languages it seeks to support at runtime. Most implementations combine C, C++ and assembly language, and target C, C++, and Fortran programmers. Bindings are available for many other languages, including Perl, Python, R, Ruby, Java, CL.

The initial implementation of the MPI 1.x standard was MPICH, from Argonne National Laboratory (ANL) and Mississippi State University. IBM also was an early implementor, and most early-90s supercomputer companies either commercialized MPICH or built their own implementation. LAM/MPI from Ohio Supercomputer Center was another early open implementation. ANL has continued developing MPICH for over a decade, and now offers MPICH 2, implementing the MPI-2.1 standard. LAM/MPI and a number of other MPI efforts recently merged to form Open MPI. Many other efforts are derivatives of MPICH, LAM, and other works, including, but not limited to, commercial implementations from HP, Intel, and Microsoft.

Python

MPI Python implementations include: pyMPI, mpi4py, pypar, MYMPI, and the MPI submodule in ScientificPython. PyMPI is notable because it is a variant Python interpreter, while pypar, MYMPI, and ScientificPython's module are import modules. They make it the coder's job to decide where the call to MPI_Init belongs. Recently the well-known Boost C++ Libraries acquired Boost.MPI, which includes the MPI Python bindings. This is of particular help for mixing C++ and Python.

Objective Caml

The OCamlMPI module implements a large subset of MPI functions and is in active use in scientific computing. An eleven-thousand-line OCaml program was "MPI-ified" using the module, with an additional 500 lines of code and slight restructuring, and ran with excellent results on up to 170 nodes in a supercomputer.

Java

Although Java does not have an official MPI binding, several groups attempt to bridge the two, with different degrees of success and compatibility. One of the first attempts was Bryan Carpenter's mpiJava, essentially a set of Java Native Interface (JNI) wrappers to a local C MPI library, resulting in a hybrid implementation with limited portability, which also has to be compiled against the specific MPI library being used.

However, this original project also defined the mpiJava API (a de facto MPI API for Java that closely followed the equivalent C++ bindings) which other subsequent Java MPI projects adopted. An alternative, less-used API is MPJ API, designed to be more object-oriented and closer to Sun Microsystems' coding conventions. Beyond the API, Java MPI libraries can be either dependent on a local MPI library, or implement the message passing functions in Java, while some like P2P-MPI also provide peer-to-peer functionality and allow mixed-platform operation.

Some of the most challenging parts of Java/MPI arise from Java characteristics such as the lack of explicit pointers and the linear memory address space for its objects, which make transferring multidimensional arrays and complex objects inefficient. Workarounds usually involve transferring one line at a time and/or performing explicit de-serialization and casting at both sending and receiving ends, simulating C or Fortran-like arrays by the use of a one-dimensional array, and pointers to primitive types by the use of single-element arrays, thus resulting in programming styles quite far from Java conventions.

Another Java message passing system is MPJ Express. Recent versions can be executed in cluster and multicore configurations. In the cluster configuration, it can execute parallel Java applications on clusters and clouds. Here Java sockets or specialized I/O interconnects like Myrinet can support messaging between MPJ Express processes. In the multicore configuration, a parallel Java application is executed on multicore processors. In this mode, MPJ Express processes are represented by Java threads.

Common Language Infrastructure

The two managed Common Language Infrastructure (CLI) .NET implementations are Pure Mpi.NET and MPI.NET, a research effort at Indiana University licensed under a BSD-style license. It is compatible with Mono, and can make full use of underlying low-latency MPI network fabrics.

Hardware implementations

MPI hardware research focuses on implementing MPI directly in hardware, for example via processor-in-memory, building MPI operations into the microcircuitry of the RAM chips in each node. By implication, this approach is independent of the language, OS, or CPU, but cannot be readily updated or removed.

Another approach has been to add hardware acceleration to one or more parts of the operation, including hardware processing of MPI queues and using RDMA to directly transfer data between memory and the network interface without CPU or OS kernel intervention.

Example program

Here is a "Hello World" program in MPI written in C. In this example, we send a "hello" message to each processor, manipulate it trivially, return the results to the main process, and print the messages.


/*
  "Hello World" MPI Test Program
*/
#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define BUFSIZE 128
#define TAG 0

int main(int argc, char *argv[])
{
    char idstr[32];
    char buff[BUFSIZE];
    int numprocs;
    int myid;
    int i;
    MPI_Status stat;

    MPI_Init(&argc, &argv); /* all MPI programs start with MPI_Init; all 'N' processes exist thereafter */
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs); /* find out how big the SPMD world is */
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);     /* and this process's rank within it */

    /* At this point, all programs are running equivalently; the rank distinguishes
       the roles of the programs in the SPMD model, with rank 0 often used specially... */
    if (myid == 0)
    {
        printf("%d: We have %d processors\n", myid, numprocs);
        for (i = 1; i < numprocs; i++)
        {
            sprintf(buff, "Hello %d! ", i);
            MPI_Send(buff, BUFSIZE, MPI_CHAR, i, TAG, MPI_COMM_WORLD);
        }
        for (i = 1; i < numprocs; i++)
        {
            MPI_Recv(buff, BUFSIZE, MPI_CHAR, i, TAG, MPI_COMM_WORLD, &stat);
            printf("%d: %s\n", myid, buff);
        }
    }
    else
    {
        /* receive from rank 0: */
        MPI_Recv(buff, BUFSIZE, MPI_CHAR, 0, TAG, MPI_COMM_WORLD, &stat);
        sprintf(idstr, "Processor %d ", myid);
        strncat(buff, idstr, BUFSIZE - 1);
        strncat(buff, "reporting for duty\n", BUFSIZE - 1);
        /* send to rank 0: */
        MPI_Send(buff, BUFSIZE, MPI_CHAR, 0, TAG, MPI_COMM_WORLD);
    }

    MPI_Finalize(); /* MPI programs end with MPI_Finalize; this is a weak synchronization point */
    return 0;
}


The runtime environment for the MPI implementation used (often called mpirun or mpiexec) spawns multiple copies of the program, with the total number of copies determining the number of process ranks in MPI_COMM_WORLD, which is an opaque descriptor for communication between the set of processes. A single-program, multiple-data (SPMD) programming model is thereby facilitated, but not required; many MPI implementations allow multiple, different executables to be started in the same MPI job. Each process has its own rank, the total number of processes in the world, and the ability to communicate between them either with point-to-point (send/receive) communication or by collective communication among the group. It is enough for MPI to provide an SPMD-style program with MPI_COMM_WORLD, its own rank, and the size of the world to allow algorithms to decide what to do. In more realistic situations, I/O is more carefully managed than in this example. MPI does not guarantee how POSIX I/O would actually work on a given system, but it commonly does work, at least from rank 0.

MPI uses the notion of process rather than processor. Program copies are mapped to processors by the MPI runtime. In that sense, the parallel machine can map to one physical processor, or to N processors, where N is the total number of processors available, or to something in between. For maximum parallel speedup, more physical processors are used. This example adjusts its behavior to the size of the world N, so it also seeks to scale to the runtime configuration without compilation for each size variation, although runtime decisions might vary depending on the absolute amount of concurrency available.
MPI-2 adoption
Adoption of MPI-1.2 has been universal, particularly in cluster computing, but acceptance of MPI-2.1 has been more limited. Issues include:
  1. MPI-2 implementations include I/O and dynamic process management, and the size of the middleware is substantially larger. Most sites that use batch scheduling systems cannot support dynamic process management. MPI-2's parallel I/O is well accepted.
  2. Many MPI-1.2 programs were developed before MPI-2. Portability concerns initially slowed adoption of MPI-2, although wider support has lessened this.
  3. Many MPI-1.2 applications use only a subset of that standard (16-25 functions) with no real need for MPI-2 functionality.

Future
Some aspects of MPI's future appear solid; others less so. The MPI Forum reconvened in 2007 to clarify some MPI-2 issues and explore developments for a possible MPI-3.

Like Fortran, MPI is ubiquitous in technical computing, and it is taught and used widely.

Architectures are changing, with greater internal concurrency (multi-core), better fine-grain concurrency control (threading, affinity), and more levels of memory hierarchy. Multithreaded programs can take advantage of these developments more easily than single-threaded applications. This has already yielded separate, complementary standards for symmetric multiprocessing, namely OpenMP. MPI-2 defines how standard-conforming implementations should deal with multithreaded issues, but does not require that implementations be multithreaded, or even thread-safe. Few multithreaded-capable MPI implementations exist. Multi-level concurrency completely within MPI is an opportunity for the standard.

Improved fault tolerance within MPI would have clear benefits for the growing trend of grid computing.
See also
  • MPICH
  • Open MPI
  • OpenMP
  • OpenHMPP - HPC Open Standard for Manycore Programming
  • Intel MPI
  • HP MPI
  • Microsoft Message Passing Interface
  • SHMEM
  • Global Arrays
  • Unified Parallel C
  • Co-array Fortran
  • occam (programming language)
  • Linda (coordination language)
  • X10 (programming language)
  • Parallel Virtual Machine
  • Calculus of communicating systems
  • Calculus of Broadcasting Systems
  • Actor model
  • Interconnect Driven Server
  • DDT - debugging tool for MPI programs
  • Bulk Synchronous Parallel (BSP programming)
  • Partitioned global address space
  • Caltech Cosmic Cube
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 