Distributed computing

Distributed computing

Overview
Distributed computing is a field of computer science
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

 that studies distributed systems. A distributed system consists of multiple autonomous computer
Computer
A computer is a programmable machine designed to sequentially and automatically carry out a sequence of arithmetic or logical operations. The particular sequence of operations can be changed readily, allowing the computer to solve more than one kind of problem...

s that communicate through a computer network
Computer network
A computer network, often simply referred to as a network, is a collection of hardware components and computers interconnected by communication channels that allow sharing of resources and information....

. The computers interact with each other in order to achieve a common goal. A computer program
Computer program
A computer program is a sequence of instructions written to perform a specified task with a computer. A computer requires programs to function, typically executing the program's instructions in a central processor. The program has an executable form that the computer can use directly to execute...

 that runs in a distributed system is called a distributed program, and distributed programming is the process of writing such programs.

Distributed computing also refers to the use of distributed systems to solve computational problems. In distributed computing, a problem is divided into many tasks, each of which is solved by one or more computers.

The word distributed in terms such as "distributed system", "distributed programming", and "distributed algorithm" originally referred to computer networks where individual computers were physically distributed within some geographical area.
Discussion
Ask a question about 'Distributed computing'
Start a new discussion about 'Distributed computing'
Answer questions from other users
Full Discussion Forum
 
Unanswered Questions
Recent Discussions
Encyclopedia
Distributed computing is a field of computer science
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

 that studies distributed systems. A distributed system consists of multiple autonomous computer
Computer
A computer is a programmable machine designed to sequentially and automatically carry out a sequence of arithmetic or logical operations. The particular sequence of operations can be changed readily, allowing the computer to solve more than one kind of problem...

s that communicate through a computer network
Computer network
A computer network, often simply referred to as a network, is a collection of hardware components and computers interconnected by communication channels that allow sharing of resources and information....

. The computers interact with each other in order to achieve a common goal. A computer program
Computer program
A computer program is a sequence of instructions written to perform a specified task with a computer. A computer requires programs to function, typically executing the program's instructions in a central processor. The program has an executable form that the computer can use directly to execute...

 that runs in a distributed system is called a distributed program, and distributed programming is the process of writing such programs.

Distributed computing also refers to the use of distributed systems to solve computational problems. In distributed computing, a problem is divided into many tasks, each of which is solved by one or more computers.

Introduction


The word distributed in terms such as "distributed system", "distributed programming", and "distributed algorithm" originally referred to computer networks where individual computers were physically distributed within some geographical area. The terms are nowadays used in a much wider sense, even referring to autonomous processes
Process (computing)
In computing, a process is an instance of a computer program that is being executed. It contains the program code and its current activity. Depending on the operating system , a process may be made up of multiple threads of execution that execute instructions concurrently.A computer program is a...

 that run on the same physical computer and interact with each other by message passing.

While there is no single definition of a distributed system, the following defining properties are commonly used:
  • There are several autonomous computational entities, each of which has its own local memory.
  • The entities communicate with each other by message passing
    Message passing
    Message passing in computer science is a form of communication used in parallel computing, object-oriented programming, and interprocess communication. In this model, processes or objects can send and receive messages to other processes...

    .

In this article, the computational entities are called computers or nodes
Node (networking)
In communication networks, a node is a connection point, either a redistribution point or a communication endpoint . The definition of a node depends on the network and protocol layer referred to...

.

A distributed system may have a common goal, such as solving a large computational problem. Alternatively, each computer may have its own user with individual needs, and the purpose of the distributed system is to coordinate the use of shared resources or provide communication services to the users.

Other typical properties of distributed systems include the following:
  • The system has to tolerate failures in individual computers.
  • The structure of the system (network topology, network latency, number of computers) is not known in advance, the system may consist of different kinds of computers and network links, and the system may change during the execution of a distributed program.
  • Each computer has only a limited, incomplete view of the system. Each computer may know only one part of the input.

Parallel and distributed computing


Distributed systems are groups of networked computers, which have the same goal for their work.
The terms "concurrent computing
Concurrent computing
Concurrent computing is a form of computing in which programs are designed as collections of interacting computational processes that may be executed in parallel...

", "parallel computing
Parallel computing
Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently . There are several different forms of parallel computing: bit-level,...

", and "distributed computing" have a lot of overlap, and no clear distinction exists between them. The same system may be characterised both as "parallel" and "distributed"; the processors in a typical distributed system run concurrently in parallel. Parallel computing may be seen as a particular tightly-coupled form of distributed computing, and distributed computing may be seen as a loosely-coupled form of parallel computing. Nevertheless, it is possible to roughly classify concurrent systems as "parallel" or "distributed" using the following criteria:
  • In parallel computing, all processors have access to a shared memory
    Shared memory
    In computing, shared memory is memory that may be simultaneously accessed by multiple programs with an intent to provide communication among them or avoid redundant copies. Depending on context, programs may run on a single processor or on multiple separate processors...

    . Shared memory can be used to exchange information between processors.
  • In distributed computing, each processor has its own private memory (distributed memory
    Distributed memory
    In computer science, distributed memory refers to a multiple-processor computer system in which each processor has its own private memory. Computational tasks can only operate on local data, and if remote data is required, the computational task must communicate with one or more remote processors...

    ). Information is exchanged by passing messages between the processors.

The figure on the right illustrates the difference between distributed and parallel systems. Figure (a) is a schematic view of a typical distributed system; as usual, the system is represented as a network topology in which each node is a computer and each line connecting the nodes is a communication link. Figure (b) shows the same distributed system in more detail: each computer has its own local memory, and information can be exchanged only by passing messages from one node to another by using the available communication links. Figure (c) shows a parallel system in which each processor has a direct access to a shared memory.

The situation is further complicated by the traditional uses of the terms parallel and distributed algorithm that do not quite match the above definitions of parallel and distributed systems; see the section Theoretical foundations below for more detailed discussion. Nevertheless, as a rule of thumb, high-performance parallel computation in a shared-memory multiprocessor uses parallel algorithms while the coordination of a large-scale distributed system uses distributed algorithms.

History


The use of concurrent processes that communicate by message-passing has its roots in operating system
Operating system
An operating system is a set of programs that manage computer hardware resources and provide common services for application software. The operating system is the most important type of system software in a computer system...

 architectures studied in the 1960s. The first widespread distributed systems were local-area networks such as Ethernet
Ethernet
Ethernet is a family of computer networking technologies for local area networks commercially introduced in 1980. Standardized in IEEE 802.3, Ethernet has largely replaced competing wired LAN technologies....

 that was invented in the 1970s.

ARPANET
ARPANET
The Advanced Research Projects Agency Network , was the world's first operational packet switching network and the core network of a set that came to compose the global Internet...

, the predecessor of the Internet
Internet
The Internet is a global system of interconnected computer networks that use the standard Internet protocol suite to serve billions of users worldwide...

, was introduced in the late 1960s, and ARPANET e-mail
E-mail
Electronic mail, commonly known as email or e-mail, is a method of exchanging digital messages from an author to one or more recipients. Modern email operates across the Internet or other computer networks. Some early email systems required that the author and the recipient both be online at the...

 was invented in the early 1970s. E-mail became the most successful application of ARPANET, and it is probably the earliest example of a large-scale distributed application. In addition to ARPANET, and its successor, the Internet, other early worldwide computer networks included Usenet
Usenet
Usenet is a worldwide distributed Internet discussion system. It developed from the general purpose UUCP architecture of the same name.Duke University graduate students Tom Truscott and Jim Ellis conceived the idea in 1979 and it was established in 1980...

 and FidoNet
FidoNet
FidoNet is a worldwide computer network that is used for communication between bulletin board systems. It was most popular in the early to mid 1990s, prior to the introduction of easy and affordable access to the Internet...

 from 1980s, both of which were used to support distributed discussion systems.

The study of distributed computing became its own branch of computer science in the late 1970s and early 1980s. The first conference in the field, Symposium on Principles of Distributed Computing
Symposium on Principles of Distributed Computing
The Symposium on Principles of Distributed Computing is an academic conference in the field of distributed computing organised annually by the Association for Computing Machinery ....

 (PODC), dates back to 1982, and its European counterpart International Symposium on Distributed Computing
International Symposium on Distributed Computing
The International Symposium on Distributed Computing is an annual academic conference for refereed presentations, whose focus is the theory, design, analysis, implementation, and application of distributed systems and networks. The Symposium is organized in association with the European...

 (DISC) was first held in 1985.

Applications


There are two main reasons for using distributed systems and distributed computing. First, the very nature of the application may require the use of a communication network that connects several computers. For example, data is produced in one physical location and it is needed in another location.

Second, there are many cases in which the use of a single computer would be possible in principle, but the use of a distributed system is beneficial for practical reasons. For example, it may be more cost-efficient to obtain the desired level of performance by using a cluster
Cluster (computing)
A computer cluster is a group of linked computers, working together closely thus in many respects forming a single computer. The components of a cluster are commonly, but not always, connected to each other through fast local area networks...

 of several low-end computers, in comparison with a single high-end computer. A distributed system can be more reliable than a non-distributed system, as there is no single point of failure
Single point of failure
A single point of failure is a part of a system that, if it fails, will stop the entire system from working. They are undesirable in any system with a goal of high availability or reliability, be it a business practice, software application, or other industrial system.-Overview:Systems can be made...

. Moreover, a distributed system may be easier to expand and manage than a monolithic uniprocessor system.

Examples of distributed systems and applications of distributed computing include the following:
  • Telecommunication
    Telecommunication
    Telecommunication is the transmission of information over significant distances to communicate. In earlier times, telecommunications involved the use of visual signals, such as beacons, smoke signals, semaphore telegraphs, signal flags, and optical heliographs, or audio messages via coded...

     networks:
    • Telephone network
      Telephone network
      A telephone network is a telecommunications network used for telephone calls between two or more parties.There are a number of different types of telephone network:...

      s and cellular network
      Cellular network
      A cellular network is a radio network distributed over land areas called cells, each served by at least one fixed-location transceiver known as a cell site or base station. When joined together these cells provide radio coverage over a wide geographic area...

      s.
    • Computer network
      Computer network
      A computer network, often simply referred to as a network, is a collection of hardware components and computers interconnected by communication channels that allow sharing of resources and information....

      s such as the Internet
      Internet
      The Internet is a global system of interconnected computer networks that use the standard Internet protocol suite to serve billions of users worldwide...

      .
    • Wireless sensor networks.
    • Routing algorithms.
  • Network applications:
    • World wide web
      World Wide Web
      The World Wide Web is a system of interlinked hypertext documents accessed via the Internet...

       and peer-to-peer networks.
    • Massively multiplayer online game
      Massively multiplayer online game
      A massively multiplayer online game is a multiplayer video game which is capable of supporting hundreds or thousands of players simultaneously. By necessity, they are played on the Internet, and usually feature at least one persistent world. They are, however, not necessarily games played on...

      s and virtual reality
      Virtual reality
      Virtual reality , also known as virtuality, is a term that applies to computer-simulated environments that can simulate physical presence in places in the real world, as well as in imaginary worlds...

       communities.
    • Distributed databases and distributed database management system
      Distributed database management system
      A distributed database management system is a software system that permits the management of a distributed database and makes the distribution transparent to the users. A distributed database is a collection of multiple, logically interrelated databases distributed over a computer network...

      s.
    • Network file system
      Distributed file system
      Network file system may refer to:* A distributed file system, which is accessed over a computer network* Network File System , a specific brand of distributed file system...

      s.
    • Distributed information processing systems such as banking systems and airline reservation systems.
  • Real-time process control:
    • Aircraft
      Aircraft
      An aircraft is a vehicle that is able to fly by gaining support from the air, or, in general, the atmosphere of a planet. An aircraft counters the force of gravity by using either static lift or by using the dynamic lift of an airfoil, or in a few cases the downward thrust from jet engines.Although...

       control systems.
    • Industrial control systems
      Industrial Control Systems
      Industrial control system is a general term that encompasses several types of control systems used in industrial production, including supervisory control and data acquisition systems, distributed control systems , and other smaller control system configurations such as skid-mounted programmable...

      .
  • Parallel computation:
    • Scientific computing, including cluster computing
      Cluster Computing
      Cluster Computing: the Journal of Networks, Software Tools and Applications is a journal for parallel processing, distributed computing systems, and computer communication networks....

       and grid computing
      Grid computing
      Grid computing is a term referring to the combination of computer resources from multiple administrative domains to reach a common goal. The grid can be thought of as a distributed system with non-interactive workloads that involve a large number of files...

       and various volunteer computing
      Volunteer computing
      Volunteer computing is a type of distributed computing in which computer owners donate their computing resources to one or more "projects".-History:...

       projects; see the list of distributed computing projects.
    • Distributed rendering in computer graphics.

Models


Many tasks that we would like to automate by using a computer are of question–answer type: we would like to ask a question and the computer should produce an answer. In theoretical computer science
Theoretical computer science
Theoretical computer science is a division or subset of general computer science and mathematics which focuses on more abstract or mathematical aspects of computing....

, such tasks are called computational problem
Computational problem
In theoretical computer science, a computational problem is a mathematical object representing a collection of questions that computers might want to solve. For example, the problem of factoring...

s. Formally, a computational problem consists of instances together with a solution for each instance. Instances are questions that we can ask, and solutions are desired answers to these questions.

Theoretical computer science seeks to understand which computational problems can be solved by using a computer (computability theory
Computability theory (computer science)
Computability is the ability to solve a problem in an effective manner. It is a key topic of the field of computability theory within mathematical logic and the theory of computation within computer science...

) and how efficiently (computational complexity theory
Computational complexity theory
Computational complexity theory is a branch of the theory of computation in theoretical computer science and mathematics that focuses on classifying computational problems according to their inherent difficulty, and relating those classes to each other...

). Traditionally, it is said that a problem can be solved by using a computer if we can design an algorithm
Algorithm
In mathematics and computer science, an algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function. Algorithms are used for calculation, data processing, and automated reasoning...

 that produces a correct solution for any given instance. Such an algorithm can be implemented as a computer program
Computer program
A computer program is a sequence of instructions written to perform a specified task with a computer. A computer requires programs to function, typically executing the program's instructions in a central processor. The program has an executable form that the computer can use directly to execute...

 that runs on a general-purpose computer: the program reads a problem instance from input
Information
Information in its most restricted technical sense is a message or collection of messages that consists of an ordered sequence of symbols, or it is the meaning that can be interpreted from such a message or collection of messages. Information can be recorded or transmitted. It can be recorded as...

, performs some computation, and produces the solution as output
Output
Output is the term denoting either an exit or changes which exit a system and which activate/modify a process. It is an abstract concept, used in the modeling, system design and system exploitation.-In control theory:...

. Formalisms such as random access machine
Random access machine
In computer science, random access machine is an abstract machine in the general class of register machines. The RAM is very similar to the counter machine but with the added capability of 'indirect addressing' of its registers...

s or universal Turing machine
Universal Turing machine
In computer science, a universal Turing machine is a Turing machine that can simulate an arbitrary Turing machine on arbitrary input. The universal machine essentially achieves this by reading both the description of the machine to be simulated as well as the input thereof from its own tape. Alan...

s can be used as abstract models of a sequential general-purpose computer executing such an algorithm.

The field of concurrent and distributed computing studies similar questions in the case of either multiple computers, or a computer that executes a network of interacting processes: which computational problems can be solved in such a network and how efficiently? However, it is not at all obvious what is meant by “solving a problem” in the case of a concurrent or distributed system: for example, what is the task of the algorithm designer, and what is the concurrent or distributed equivalent of a sequential general-purpose computer?

The discussion below focusses on the case of multiple computers, although many of the issues are the same for concurrent processes running on a single computer.

Three viewpoints are commonly used:
Parallel algorithms in shared-memory model
  • All computers have access to a shared memory. The algorithm designer chooses the program executed by each computer.
  • One theoretical model is the parallel random access machine
    Parallel Random Access Machine
    In computer science, Parallel Random Access Machine is a shared memory abstract machine. As its name indicates, the PRAM was intended as the parallel computing analogy to the random access machine...

    s (PRAM) that are used. However, the classical PRAM model assumes synchronous access to the shared memory.
  • A model that is closer to the behavior of real-world multiprocessor machines and takes into account the use of machine instructions, such as Compare-and-swap
    Compare-and-swap
    In computer science, the compare-and-swap CPU instruction is a special instruction that atomically compares the contents of a memory location to a given value and, only if they are the same, modifies the contents of that memory location to a given new value...

     (CAS), is that of asynchronous shared memory. There is a wide body of work on this model, a summary of which can be found in the literature.


Parallel algorithms in message-passing model
  • The algorithm designer chooses the structure of the network, as well as the program executed by each computer.
  • Models such as Boolean circuits and sorting network
    Sorting network
    A sorting network is an abstract mathematical model of a network of wires and comparator modules that is used to sort a sequence of numbers. Each comparator connects two wires and sorts the values by outputting the smaller value to one wire, and the larger to the other...

    s are used. A Boolean circuit can be seen as a computer network: each gate is a computer that runs an extremely simple computer program. Similarly, a sorting network can be seen as a computer network: each comparator is a computer.

Distributed algorithms in message-passing model
  • The algorithm designer only chooses the computer program. All computers run the same program. The system must work correctly regardless of the structure of the network.
  • A commonly used model is a graph
    Graph (mathematics)
    In mathematics, a graph is an abstract representation of a set of objects where some pairs of the objects are connected by links. The interconnected objects are represented by mathematical abstractions called vertices, and the links that connect some pairs of vertices are called edges...

     with one finite-state machine per node.


In the case of distributed algorithms, computational problems are typically related to graphs. Often the graph that describes the structure of the computer network is the problem instance. This is illustrated in the following example.

An example


Consider the computational problem of finding a coloring of a given graph G. Different fields might take the following approaches:
Centralized algorithms
  • The graph G is encoded as a string, and the string is given as input to a computer. The computer program finds a coloring of the graph, encodes the coloring as a string, and outputs the result.

Parallel algorithms
  • Again, the graph G is encoded as a string. However, multiple computers can access the same string in parallel. Each computer might focus on one part of the graph and produce a colouring for that part.
  • The main focus is on high-performance computation that exploits the processing power of multiple computers in parallel.

Distributed algorithms
  • The graph G is the structure of the computer network. There is one computer for each node of G and one communication link for each edge of G. Initially, each computer only knows about its immediate neighbours in the graph G; the computers must exchange messages with each other to discover more about the structure of G. Each computer must produce its own colour as output.
  • The main focus is on coordinating the operation of an arbitrary distributed system.


While the field of parallel algorithms has a different focus than the field of distributed algorithms, there is a lot of interaction between the two fields. For example, the Cole–Vishkin algorithm for graph colouring was originally presented as a parallel algorithm, but the same technique can also be used directly as a distributed algorithm.

Moreover, a parallel algorithm can be implemented either in a parallel system (using shared memory) or in a distributed system (using message passing). The traditional boundary between parallel and distributed algorithms (choose a suitable network vs. run in any given network) does not lie in the same place as the boundary between parallel and distributed systems (shared memory vs. message passing).

Complexity measures



In parallel algorithms, yet another resource in addition to time and space is the number of computers. Indeed, often there is a trade-off between the running time and the number of computers: the problem can be solved faster if there are more computers running in parallel (see speedup). If a decision problem can be solved in polylogarithmic time by using a polynomial number of processors, then the problem is said to be in the class NC
NC (complexity)
In complexity theory, the class NC is the set of decision problems decidable in polylogarithmic time on a parallel computer with a polynomial number of processors. In other words, a problem is in NC if there exist constants c and k such that it can be solved in time O using O parallel processors...

. The class NC can be defined equally well by using the PRAM formalism or Boolean circuits – PRAM machines can simulate Boolean circuits efficiently and vice versa.

In the analysis of distributed algorithms, more attention is usually paid on communication operations than computational steps. Perhaps the simplest model of distributed computing is a synchronous system where all nodes operate in a lockstep fashion. During each communication round, all nodes in parallel (1) receive the latest messages from their neighbours, (2) perform arbitrary local computation, and (3) send new messages to their neighbours. In such systems, a central complexity measure is the number of synchronous communication rounds required to complete the task.

This complexity measure is closely related to the diameter of the network. Let D be the diameter of the network. On the one hand, any computable problem can be solved trivially in a synchronous distributed system in approximately 2D communication rounds: simply gather all information in one location (D rounds), solve the problem, and inform each node about the solution (D rounds).

On the other hand, if the running time of the algorithm is much smaller than D communication rounds, then the nodes in the network must produce their output without having the possibility to obtain information about distant parts of the network. In other words, the nodes must make globally consistent decisions based on information that is available in their local neighbourhood. Many distributed algorithms are known with the running time much smaller than D rounds, and understanding which problems can be solved by such algorithms is one of the central research questions of the field.

Other commonly used measures are the total number of bits transmitted in the network (cf. communication complexity
Communication complexity
The notion of communication complexity was introduced by Yao in 1979,who investigated the following problem involving two separated parties . Alice receives an n-bit string x and Bob another n-bit string y, and the goal is for one of them to compute a certain function f with the least amount of...

).

Other problems


Traditional computational problems take the perspective that we ask a question, a computer (or a distributed system) processes the question for a while, and then produces an answer and stops. However, there are also problems where we do not want the system to ever stop. Examples of such problems include the dining philosophers problem
Dining philosophers problem
In computer science, the dining philosophers problem is an example problem often used in concurrent algorithm design to illustrate synchronization issues and techniques for resolving them....

 and other similar mutual exclusion
Mutual exclusion
Mutual exclusion algorithms are used in concurrent programming to avoid the simultaneous use of a common resource, such as a global variable, by pieces of computer code called critical sections. A critical section is a piece of code in which a process or thread accesses a common resource...

 problems. In these problems, the distributed system is supposed to continuously coordinate the use of shared resources so that no conflicts or deadlock
Deadlock
A deadlock is a situation where in two or more competing actions are each waiting for the other to finish, and thus neither ever does. It is often seen in a paradox like the "chicken or the egg"...

s occur.

There are also fundamental challenges that are unique to distributed computing. The first example is challenges that are related to fault-tolerance. Examples of related problems include consensus problems
Consensus (computer science)
Consensus is a problem in distributed computing that encapsulates the task of group agreement in the presence of faults.In particular, any process in the group may fail at any time. Consensus is fundamental to core techniques in fault tolerance, such as state machine replication.- Problem...

, Byzantine fault tolerance
Byzantine fault tolerance
Byzantine fault tolerance is a sub-field of fault tolerance research inspired by the Byzantine Generals' Problem, which is a generalized version of the Two Generals' Problem....

, and self-stabilisation.

A lot of research is also focused on understanding the asynchronous nature of distributed systems:
  • Synchronizers
    Synchronizer (algorithm)
    In computer science, a synchronizer is an algorithm that can be used to run a synchronous algorithm on top of an asynchronous processor network, so enabling the asynchronous system to run as a synchronous network....

     can be used to run synchronous algorithms in asynchronous systems.
  • Logical clock
    Logical clock
    A logical clock is a mechanism for capturing chronological and causal relationships in a distributed system.Logical clock algorithms of note are:* Lamport timestamps, which are monotonically increasing software counters....

    s provide a causal happened-before
    Happened-before
    In computer science, the happened-before relation is a means of ordering events based on the potential causal relationship of pairs of events in a concurrent system, especially asynchronous distributed systems...

     ordering of events.
  • Clock synchronization
    Clock synchronization
    Clock synchronization is a problem from computer science and engineering which deals with the idea that internal clocks of several computers may differ. Even when initially set accurately, real clocks will differ after some amount of time due to clock drift, caused by clocks counting time at...

     algorithms provide globally consistent physical time stamps.

Properties of distributed systems


So far the focus has been on designing a distributed system that solves a given problem. A complementary research problem is studying the properties of a given distributed system.

The halting problem
Halting problem
In computability theory, the halting problem can be stated as follows: Given a description of a computer program, decide whether the program finishes running or continues to run forever...

 is an analogous example from the field of centralised computation: we are given a computer program and the task is to decide whether it halts or runs forever. The halting problem is undecidable
Undecidable problem
In computability theory and computational complexity theory, an undecidable problem is a decision problem for which it is impossible to construct a single algorithm that always leads to a correct yes-or-no answer....

 in the general case, and naturally understanding the behaviour of a computer network is at least as hard as understanding the behaviour of one computer.

However, there are many interesting special cases that are decidable. In particular, it is possible to reason about the behaviour of a network of finite-state machines. One example is telling whether a given network of interacting (asynchronous and non-deterministic) finite-state machines can reach a deadlock. This problem is PSPACE-complete
PSPACE-complete
In complexity theory, a decision problem is PSPACE-complete if it is in the complexity class PSPACE, and every problem in PSPACE can be reduced to it in polynomial time...

, i.e., it is decidable, but it is not likely that there is an efficient (centralised, parallel or distributed) algorithm that solves the problem in the case of large networks.

Architectures


Various hardware and software architectures are used for distributed computing. At a lower level, it is necessary to interconnect multiple CPUs with some sort of network, regardless of whether that network is printed onto a circuit board or made up of loosely-coupled devices and cables. At a higher level, it is necessary to interconnect processes
Process (computing)
In computing, a process is an instance of a computer program that is being executed. It contains the program code and its current activity. Depending on the operating system , a process may be made up of multiple threads of execution that execute instructions concurrently.A computer program is a...

 running on those CPUs with some sort of communication system.

Distributed programming typically falls into one of several basic architectures or categories: client–server, 3-tier architecture, n-tier architecture
Multitier architecture
In software engineering, multi-tier architecture is a client–server architecture in which the presentation, the application processing, and the data management are logically separate processes. For example, an application that uses middleware to service data requests between a user and a database...

, distributed object
Distributed object
The term distributed objects usually refers to software modules that are designed to work together, but reside either in multiple computers connected via a network or in different processes inside the same computer. One object sends a message to another object in a remote machine or process to...

s, loose coupling
Loose coupling
In computing and systems design a loosely coupled system is one where each of its components has, or makes use of, little or no knowledge of the definitions of other separate components. The notion was introduced into organizational studies by Karl Weick...

, or tight coupling.
  • Client–server: Smart client code contacts the server for data then formats and displays it to the user. Input at the client is committed back to the server when it represents a permanent change.
  • 3-tier architecture: Three tier systems move the client intelligence to a middle tier so that stateless clients can be used. This simplifies application deployment. Most web applications are 3-Tier.
  • n-tier architecture
    Multitier architecture
    In software engineering, multi-tier architecture is a client–server architecture in which the presentation, the application processing, and the data management are logically separate processes. For example, an application that uses middleware to service data requests between a user and a database...

    : n-tier refers typically to web applications which further forward their requests to other enterprise services. This type of application is the one most responsible for the success of application server
    Application server
    An application server is a software framework that provides an environment in which applications can run, no matter what the applications are or what they do...

    s.
  • Tightly coupled (clustered): refers typically to a cluster of machines that closely work together, running a shared process in parallel. The task is subdivided in parts that are made individually by each one and then put back together to make the final result.
  • Peer-to-peer
    Peer-to-peer
    Peer-to-peer computing or networking is a distributed application architecture that partitions tasks or workloads among peers. Peers are equally privileged, equipotent participants in the application...

    : an architecture where there is no special machine or machines that provide a service or manage the network resources. Instead all responsibilities are uniformly divided among all machines, known as peers. Peers can serve both as clients and servers.
  • Space based: refers to an infrastructure that creates the illusion (virtualization) of one single address-space. Data are transparently replicated according to application needs. Decoupling in time, space and reference is achieved.


Another basic aspect of distributed computing architecture is the method of communicating and coordinating work among concurrent processes. Through various message passing protocols, processes may communicate directly with one another, typically in a master/slave relationship. Alternatively, a "database-centric" architecture
Database-centric architecture
Database-centric architecture or data-centric architecture has several distinct meanings, generally relating to software architectures in which databases play a crucial role. Often this description is meant to contrast the design to an alternative approach...

 can enable distributed computing to be done without any form of direct inter-process communication
Inter-process communication
In computing, Inter-process communication is a set of methods for the exchange of data among multiple threads in one or more processes. Processes may be running on one or more computers connected by a network. IPC methods are divided into methods for message passing, synchronization, shared...

, by utilizing a shared database
Database
A database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality , in a way that supports processes requiring this information...

.

See also


Further reading


Books ISBN 0471453242.: Java Distributed Computing by Jim Faber, 1998 ISBN 0471036005.

Articles.

Conference Papers
  • C. Rodríguez, M. Villagra and B. Barán, , Bionetics2007, pp. 66–69, 2007.