Distributed operating system
A distributed operating system is the logical aggregation of operating system software over a collection of independent, networked, communicating, and spatially disseminated computational nodes. Individual system nodes each hold a discrete software subset of the global aggregate operating system. Each node-level software subset is a composition of two distinct provisioners of services.

The first is a ubiquitous minimal kernel, or microkernel, situated directly above each node’s hardware. The microkernel provides only the necessary mechanisms for a node's functionality. Second is a higher-level collection of system management components, providing all necessary policies for a node's individual and collaborative activities. This collection of management components exists immediately above the microkernel, and below any user applications or APIs that might reside at higher levels.

These two entities, the microkernel and the management components collection, work together. They support the global system’s goal of seamlessly integrating all network-connected resources and processing functionality into an efficient, available, and unified system. This seamless integration of individual nodes into a global system is referred to as transparency, or single system image, describing the illusion provided to users that the global system appears as a singular, local computational entity.

A system within a system

A distributed operating system is an operating system. This statement may be trivial, but it is not always overt and obvious, because the distributed operating system is such an integral part of the distributed system. The idea is analogous to the consideration of a square: a square might not immediately be recognized as a rectangle. Although possessing all requisite attributes defining a rectangle, a square’s additional attributes and specific configuration provide a disguise. At its core, the distributed operating system provides only the essential services and minimal functionality required of an operating system, but its additional attributes and particular configuration make it different. The distributed operating system fulfills its role as an operating system, and does so in a manner indistinguishable from a centralized, monolithic operating system. That is, although distributed in nature, it supports a single system image through the implementation of transparency; or, more simply said, the system’s appearance as a singular, local entity.

Distributed operating system essentials

A distributed operating system can be illustrated as a container suggesting minimal operating system functionality and scope. This container completely covers all disseminated hardware resources, defining the system level. The container extends across the system, supporting a layer of modular software components existing in the user level. These software components supplement the distributed operating system with a configurable set of added services, usually integrated within the monolithic operating system (and the system level). This division of minimal system-level function from additional user-level modular services provides a “separation of mechanism and policy.” Mechanism and policy can be simply interpreted as "how something is done" versus "why something is done," respectively. Achieving this separation allows for an exceptionally loosely coupled, flexible, and scalable distributed operating system.

The kernel

The kernel is a minimal, but complete set of node-level utilities necessary for access to a node’s underlying hardware and resources. These mechanisms provide the complete set of “building-blocks” essential for node operation; mainly low-level allocation, management, and disposition of a node’s resources, processes, communication, and I/O management support functions. These functions are made possible by exposing a concise, yet comprehensive array of primitive mechanisms and services. The kernel is arguably the primary consideration in a distributed operating system; however, within the kernel, the subject of foremost importance is that of a well-structured and highly efficient communications sub-system.

In a distributed operating system, the kernel is often defined by a relatively minimal architecture. A kernel of this design is referred to as a microkernel. The microkernel usually contains only the mechanisms and services which, if otherwise removed, would render a node or the global system functionally inoperable. The minimal nature of the microkernel strongly enhances a distributed operating system’s modular potential. It is generally the case that the microkernel is implemented directly above its node’s hardware and resources; it is also common for a kernel to be identically replicated over all nodes in a system. The combination of a microkernel’s minimal design and ubiquitous node coverage enhances the global system's extensibility, and the ability to dynamically introduce new nodes or services.

System management components

A node’s system management components are a collection of software server processes that define the policies of a system node. These components are the composite of a node’s system software not directly required within the kernel. These software services support all of the needs of the node; namely communication, process and resource management, reliability, performance, and security to mention just a few. In this capacity, system management components compare directly to the centralized operating software of a single-entity system.

However, these system management components have additional challenges with respect to supporting a node's responsibilities to the global system. In addition, the system management components accept the defensive responsibilities of reliability, availability, and persistence inherent in the distributed operating system. Quite often, any effort to realize a high level of success in a particular area incites conflict with similar efforts in other areas. Therefore, a consistent approach, balanced perspective, and a deep understanding of the overall system and its goals can help mitigate some complexity, and assist in quickly identifying potential points of diminishing returns. This is an example of why the separation of policy and mechanism is so critical.

Working together as an operating system

The architecture and design of a distributed operating system is specifically aligned with realizing both individual node and global system goals. Any architecture or design must be approached in a manner consistent with separating policy and mechanism. In doing so, a distributed operating system attempts to provide a highly efficient and reliable distributed computing framework allowing for an absolute minimal user awareness of the underlying command and control efforts.

The multi-level collaboration between a kernel and the system management components, and in turn between the distinct nodes in a distributed operating system is the functional challenge of the distributed operating system. This is the point in the system that must maintain a perfect harmony of purpose, and simultaneously maintain a complete disconnect of intent from implementation. This challenge is the distributed operating system's opportunity, to produce the foundation and framework for a reliable, efficient, available, robust, extensible, and scalable system. However, this opportunity comes at a very high cost in complexity.

The price of complexity

In a distributed operating system, the exceptional degree of inherent complexity could easily render the entire system anathema to any user. As such, the logical price of realizing a distributed operating system must be calculated in terms of overcoming vast amounts of complexity in many areas, and on many levels. This calculation includes the depth, breadth, and range of design investment and architectural planning required in achieving even the most modest implementation.

These design and development considerations are critical and unforgiving. For instance, a deep understanding of a distributed operating system’s overall architectural and design detail is required at an exceptionally early point. There is an exhaustive array of design considerations inherent in the development of a distributed operating system. Each of these design considerations can potentially affect many of the others to a significant degree. This leads to a massive effort in balancing the approach, in terms of the individual design considerations and many of their permutations. As an aid in this effort, most designers rely strongly on the immense amount of documented experience and research in distributed computing, which exists and continues to grow today.

Perspectives: past, present, and future

Research and experimentation efforts began in earnest in the 1970s and continued through the 1990s, with focused interest peaking in the late 1980s. A number of distributed operating systems were introduced during this period; however, very few of these implementations achieved even modest commercial success.

The subject of distributed operating systems however, has a much richer historical perspective. This is especially evident when considering distributed operating system design issues severally, and with respect to some of the primordial strides taken towards their realization. There are several instances of fundamental and pioneering implementations of primitive distributed operating system component concepts dating back to the early 1950s. Some of these very early individual steps were not focused directly on distributed computing, and at the time, many may not have realized their important impact. These pioneering efforts laid important groundwork, and inspired continued research in areas related to distributed computing.

Beginning in the mid-1970s, many significant research efforts produced important advances in distributed computing. These breakthroughs provided a solid, stable foundation for the continued efforts through the 1990s, mentioned earlier. Considering the modern distributed operating system and its future, one must look no further than the current challenges of many-core and multi-processor science. The accelerating proliferation of multi-processor and multi-core processor systems research has led to a resurgence of the distributed operating system concept. Many of these research efforts are investigating interesting, exciting, and plausible paradigms impacting the future of distributed computing.

The nature of distribution

The unique nature of the distributed operating system is both subtle and complex. A distributed operating system’s hardware infrastructure elements are not centralized; that is, the elements do not have tight proximity to one another at a single location. A given system’s structural elements could reside in various rooms within a building, or in various buildings around the world. This spatial dissemination defines the system’s decentralization; however, the distributed operating system is distributed, not simply decentralized.

This distinction is the source of the subtlety and complexity. While decentralized systems and distributed operating systems are both spatially diverse, it is the specific manner of, and relative degree of, linkage between the elements, or nodes, in the systems that differentiates the two. In the case of these two types of operating system, these linkages are the lines of communication between the nodes of the system.

Three basic distributions

To better illustrate this point, examine three system architectures: centralized, decentralized, and distributed. In this examination, consider three tightly related aspects of their structure: organization, connection, and control. Organization describes a system's physical arrangement characteristics, connection involves the associations among constituent structural entities, and control correlates the manner, necessity, and rationale of the earlier two considerations.

Organization

Firstly, the subject of organization. A centralized system is organized most simply: basically one real level of structure, where all constituent elements are highly influenced by, and ultimately dependent upon, this organization. The decentralized system is a more federated structure, composed of multiple levels where subsets of a system’s entities unite. These entity subsets in turn unite at higher levels, in the direction of, and ultimately culminating at, the central master element. The (purely) distributed system has no discernible concept of, or indeed any necessity for, levels; it is purely an autonomous collection of discrete elements.

It is important to note that all of these systems are distributed, in that they comprise separate and distinct constituent elements connected together to form a system. This is a generic idea of the distributed organization of system elements; however, a distributed system is a quite specific entity unto itself. It is this distributed system concept that will be approached in the following sub-sections.

Connection

Association linkages between elements are the second consideration. In each case, physical association is inextricably linked (or disconnected) to conceptual organization. The centralized system has its constituent members directly united to a central master entity. One could conceptualize holding a bunch of balloons, each on a string, with the hand being the central figure. A decentralized system (or network system) incorporates a single-step direct, or multi-step indirect, path between any given constituent element and the central entity. This can be understood by thinking of a corporate organizational chart: the first level connects directly, and lower levels connect indirectly through successively higher levels (with no lateral “dotted” lines). Finally, the distributed operating system has no inherent pattern; direct and indirect connections are possible between any two given elements of the system. Consider the 1970s phenomena of “string art” or a spirograph drawing as a fully connected system, and the spider’s web or the Interstate Highway System between U.S. cities as examples of a partially connected system.

Control

Notice that the centralized and decentralized systems have distinctly directed flows of connection towards the central entity, while the distributed operating system is in no way influenced specifically by virtue of its organization. This is the pivotal notion of the third consideration. What correlations exist between a system’s organization and its associations? In all three cases, it is an extremely delicate balance between the administration of processes and the scope and extensibility of those processes. More simply said, it is about the control required or dictated by a particular organization and its collection of connections.

Notice that in the directed systems (centralized and decentralized) there is more control, therefore easing the administration of processes, but constraining their possible scope of influence. On the other hand, the distributed operating system — without directed connections — is much more difficult to control, but is effectively limited in extensible scope only by the capabilities of its individual, autonomous, and interdependent nodes. The associations of the distributed operating system conform only to the needs imposed by its many design considerations, and not in any way by organizational limitations.

Transparency

Transparency is the quality of a distributed operating system to be seen and understood as a single-system image, and it is the greatest overriding consideration in the high-level conceptual design of a distributed operating system. While a simple concept, the consideration of transparency directly affects decision making in every aspect of the design of a distributed operating system. Depending on the degree to which transparency is implemented into a system, certain requirements and/or restrictions may be imposed upon the many other design considerations, and their relationships.

Transparency allows a user to accomplish a system-related objective with absolute minimal knowledge of the particular internal details related to the objective. A system or application may expose as much, or as little transparency in a given area of functionality as deemed necessary. That is to say, the degree to which transparency is implemented can vary between subsets of functionality in a system or application. There are many specific areas of a system that can benefit from transparency; access, location, performance, naming, and migration to name a few.

For example, a distributed operating system may present access to a hard drive as "C:" and access to a DVD as "G:". The user does not require any knowledge of device drivers or direct memory access techniques possibly used behind the scenes; both devices work the same way, from the user's perspective. This example demonstrates a high level of transparency, and shows how low-level details are made somewhat "invisible" to the user. On the other hand, if a user desires to access another system or server, a host name or IP address may be required, along with a remote-machine user login and password. This would indicate a low degree of transparency, as detailed knowledge is required of the user in order to accomplish this task.
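
As an illustrative sketch only (the resolver class and its name table are hypothetical, not drawn from any particular system), the following Python fragment shows how a single access mechanism can hide whether a resource is local or remote:

    # Hypothetical sketch of access/location transparency: the caller uses one
    # uniform name ("G:") and never learns whether the resource is local or remote.
    class ResourceResolver:
        def __init__(self):
            # This mapping of uniform names to backends is invented for
            # illustration; a real system would populate it dynamically.
            self._backends = {
                "C:": {"kind": "local", "path": "/dev/sda1"},
                "G:": {"kind": "remote", "host": "fileserver.example", "share": "dvd"},
            }

        def open(self, name):
            backend = self._backends[name]
            if backend["kind"] == "local":
                return f"local handle for {backend['path']}"
            # The remote case uses the same call signature; the network hop is hidden.
            return f"remote handle for //{backend['host']}/{backend['share']}"

    resolver = ResourceResolver()
    print(resolver.open("C:"))  # user code is identical for both devices
    print(resolver.open("G:"))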

Generally, transparency and user-required knowledge form an inverse relation. As transparency is designed and implemented into various areas of a system, great care must be taken not to adversely affect other areas of transparency and other basic design concerns. Transparency, as a design concept, is one of the grand challenges in the design of a distributed operating system, as it is a factor in the necessity for a complete upfront understanding of the system.
  • Location transparency - Location transparency comprises two distinct sub-aspects of transparency, naming transparency and user mobility. Naming transparency requires that nothing in the physical or logical references to any system entity should expose any indication of the entity's location, or its local or remote relationship to the user. User mobility requires the consistent referencing of system entities, regardless of the system location from which the reference originates. Transparency dictates that the relative location of a system entity—either local or remote—must be both invisible to, and undetectable by the user. [DCD Pradeep, pg 20]

  • Access transparency - Local and remote system entities must remain indistinguishable when viewed through the user interface. The distributed operating system maintains this perception through the exposure of a single access mechanism for a system entity, regardless of that entity being local or remote to the user. Transparency dictates that any differences in methods of accessing any particular system entity—either local or remote—must be both invisible to, and undetectable by the user. [TLD, pg 84]

  • Migration transparency - Logical resources and physical processes that the system migrates from one location to another, in an attempt to maximize efficiency, reliability, availability, or security, should be moved automatically, controlled solely by the system. There are a myriad of possible reasons for migration; in any such event, the entire process of migration—before, during, and after—should occur without user knowledge or interaction. Transparency dictates that both the need for, and the execution of, any system entity migration must be both invisible to, and undetectable by the user. [DOS, pg 24]

  • Replication transparency - A system's elements or components may need to be copied to strategic remote points in the system in an effort to possibly increase efficiencies through better proximity, or provide for improved reliability through the duplication of a back-up. This duplication of a system entity and its subsequent movement to a remote system location may occur for any number of possible reasons; in any event, the entire process—before, during, and after—should occur without user knowledge or interaction. Transparency dictates that both the necessity and execution of replication, as well as the existence of replicated entities throughout the system must be both invisible to, and undetectable by the user. [DCP, pg 16]

  • Concurrency transparency - The distributed operating system allows for simultaneous use of system resources by multiple users and processes, which are kept completely unaware of the concurrent usage. Transparency dictates that both the necessity for concurrency, and the multiplexed usage of system resources, must be both invisible to, and undetectable by the user. [DSC, pg 23]

  • Failure transparency - In the event of a partial system failure, the system is responsible for the automatic, rapid, and accurate detection and orchestration of a remedy. These measures should exhibit minimal user imposition, and should initiate and execute without user knowledge or interaction. Transparency dictates that users and processes be exposed to absolute minimal imposition as a result of partial system failure; and any system-employed techniques of detection and recovery must be both invisible to, and undetectable by the user. [DSA, pg 30]

  • Performance Transparency - In any event where parts of the system experience significant delay or load imbalance, the system is responsible for the automatic, rapid, and accurate detection and orchestration of a remedy. These measures should exhibit minimal user imposition, and should initiate and execute without user knowledge or interaction. While reasonable and predictable performance are important goals in these situations, there should be no expressed or implied concepts of fairness or equality among affected users or processes. Transparency dictates that users and processes be exposed to absolute minimal imposition as a result of performance delay or load imbalance; and any system-employed techniques of detection and recovery must be both invisible to, and undetectable by the user. [DCD, pg 23]

  • Size/Scale transparency - A system's geographic reach, number of nodes, level of node capability, or any changes therein should exist without any required user knowledge or interaction. Transparency dictates that system and node composition, quality, or changes to either must be both invisible to, and undetectable by the user. [DCD, pg 23]

  • Revision transparency - Systems occasionally have need for system-software version changes and changes to the internal implementation of system infrastructure. While a user may ultimately become aware of, or discover the availability of, new system functions or services, their implementation should in no way be the prompt for this discovery. Transparency dictates that the implementation of system-software version changes and changes to internal system infrastructure must be both invisible to, and undetectable by the user; except as revealed by administrators of the system. [DSA, pg 30]

  • Control transparency - All system information, constants, properties, configuration settings, etc. should be completely consistent in appearance, connotation, and denotation to all users and software applications aware of them. [TLD, pg 84]

  • Data transparency - No system data-entity should expose itself as peculiar with respect to its location or purpose in the system, as a result of user interaction. [TLD, pg 85]

  • Parallelism transparency - Arguably the most difficult aspect of transparency, and described by Tanenbaum as the "Holy grail" for distributed system designers. A system's parallel execution of a task among various processes throughout the system should occur without any required user knowledge or interaction. Transparency dictates that both the need for, and the execution of parallel processing must be both invisible to, and undetectable by the user. [DOS, pg 24]

Inter-process communication

Inter-process communication (IPC) is the implementation of general communication, process interaction, and dataflow between threads and/or processes, both within a system node and between all nodes in a distributed operating system. The distributed nature of a system's nodes and the multi-level considerations of intra-node and inter-node requirements provide the base-line for high-level IPC design considerations. However, IPC in a distributed operating system is a low-level implementation. IPC is the low-level critical complement to the high-level concept of transparency. Many of the requirements and restrictions imposed on a system as a result of transparency will be accomplished directly or indirectly through IPC. In this sense, IPC is the greatest underlying concept in the low-level design considerations of a distributed operating system.
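
A minimal sketch of the message-passing style of IPC, using Python's standard multiprocessing module; the node roles and message contents are invented for illustration, and a real distributed operating system would carry such messages across the network transparently:

    # Minimal message-passing IPC sketch. The "nodes" here are processes on
    # one machine; interaction happens via messages, not shared state.
    from multiprocessing import Process, Queue

    def worker(inbox, outbox):
        request = inbox.get()                # block until a message arrives
        outbox.put(f"processed {request}")   # reply with a message

    if __name__ == "__main__":
        to_worker, from_worker = Queue(), Queue()
        node = Process(target=worker, args=(to_worker, from_worker))
        node.start()
        to_worker.put("job-42")
        print(from_worker.get())  # -> "processed job-42"
        node.join()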

Process management

Process management provides policies and mechanisms for effective and efficient sharing of a system's distributed processing resources between that system's distributed processes. These policies and mechanisms support operations involving the allocation and de-allocation of processes and ports to processors, as well as provisions to run, suspend, migrate, halt, or resume execution of processes. While these distributed operating system resources and the operations on them can be either local or remote with respect to each other, the distributed operating system must still maintain complete state of, and synchronization over, all processes in the system; and do so in a manner completely consistent from the user's unified system perspective.

As an example, load balancing is a common process management function. One consideration of load balancing is which process should be moved. The kernel may have several mechanisms, one of which might be priority-based choice. This mechanism in the kernel defines what can be done; in this case, choose a process based on some priority. The system management components would have policies implementing the decision making for this context. One of these policies would define what priority means, and how it is to be used to choose a process in this instance.
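
A hedged sketch of this division in Python (all names and data invented): the mechanism knows only how to select a process given a priority function, while interchangeable policies define what priority means:

    # Sketch of mechanism vs. policy. The "kernel" mechanism knows only how
    # to pick a process given a priority function; the management component
    # supplies the policy that defines what priority means.
    def select_process(processes, priority):   # mechanism: how selection works
        return max(processes, key=priority)

    processes = [
        {"pid": 1, "cpu_time": 40, "memory_mb": 120},
        {"pid": 2, "cpu_time": 90, "memory_mb": 30},
    ]

    # Two interchangeable policies: why one process is preferred over another.
    by_cpu_usage = lambda p: p["cpu_time"]
    by_memory    = lambda p: p["memory_mb"]

    print(select_process(processes, by_cpu_usage)["pid"])  # -> 2
    print(select_process(processes, by_memory)["pid"])     # -> 1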

Resource management

System resources such as memory, files, and devices are distributed throughout a system, and at any given moment any of a system's nodes may have light or idle workloads. Load sharing and load balancing require many policy-oriented decisions, ranging from finding idle CPUs to deciding when to move a process, and which process to move. Many algorithms exist to aid in these decisions; however, this calls for a second level of decision-making policy in choosing the algorithm best suited for the scenario, and the conditions surrounding the scenario.
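
As a minimal sketch, assuming invented node names, load figures, and an idle threshold, a simple load-sharing policy might choose the least-loaded idle node as follows:

    # A deliberately simple load-sharing decision (data and threshold invented):
    # find candidate idle nodes, then choose the least-loaded one as the target.
    nodes = {"node-a": 0.85, "node-b": 0.10, "node-c": 0.40}  # load averages
    IDLE_THRESHOLD = 0.50

    idle = {name: load for name, load in nodes.items() if load < IDLE_THRESHOLD}
    target = min(idle, key=idle.get) if idle else None
    print(target)  # -> "node-b"; a real system must also decide when and what to move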

Reliability

One of the basic tenets of distributed operating systems is a high level of reliability. This quality attribute of a distributed operating system has become a staple expectation. Reliability is most often considered from the perspectives of availability and security of a system's hardware, services, and data. Issues arising from availability failures or security violations are considered faults. Faults are physical or logical defects that can cause errors in the system. For a system to be reliable, it must somehow overcome the adverse effects of faults.

There are three general methods for dealing with faults: fault avoidance, fault tolerance, and fault detection and recovery. Fault avoidance comprises the proactive measures taken to minimize the occurrence of faults. These proactive measures can be in the form of transactions, replicated resources and processes, and primary back-ups of complete servers. Fault tolerance is the ability of a system to continue some meaningful level of operation in the face of a fault. In the event a fault does occur, the system should detect the fault and have the capability to respond quickly and effectively to recover full functionality. In any event, any actions taken should make every effort to preserve the single system image.
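
A hedged sketch of fault detection and recovery through replication (the server objects and their states are invented for illustration): the caller tries the primary, detects the fault, and fails over to a replica, preserving its single-system view of "the service":

    # Sketch of fault tolerance through replication (all names invented).
    class ServerDown(Exception):
        pass

    def call(server, request):
        if server["up"]:
            return f"{server['name']} handled {request}"
        raise ServerDown(server["name"])

    replicas = [{"name": "primary", "up": False},
                {"name": "backup",  "up": True}]

    def reliable_call(request):
        for server in replicas:
            try:
                return call(server, request)
            except ServerDown:
                continue  # fault detected; recover by trying the next replica
        raise RuntimeError("all replicas failed")

    print(reliable_call("read file"))  # -> "backup handled read file"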

Performance

Performance is arguably the quintessential computing concern, and in the distributed operating system it is no different. Many benchmark metrics exist for performance: throughput, job completions per unit time, system utilization, etc. Each of these benchmarks is more meaningful in describing some scenarios, and less so in others. With respect to a distributed operating system, this consideration most often distills to a balance between process parallelism and IPC. Managing the task granularity of parallelism in a sensible relation to the messages required for its support is extremely effective. Also, identifying when it is more beneficial to migrate a process to its data, rather than copy the data, is effective as well. Many process and resource management algorithms in this space work to maximize performance.
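
A back-of-envelope sketch of the migrate-versus-copy decision, with invented transfer sizes, bandwidth, and setup costs: ship whichever is cheaper to move, the process image or the data it needs:

    # Sketch of the migrate-vs-copy decision (all numbers invented).
    def transfer_cost(size_mb, bandwidth_mb_s, setup_s):
        return setup_s + size_mb / bandwidth_mb_s

    migrate = transfer_cost(size_mb=8,    bandwidth_mb_s=100, setup_s=0.5)  # process image
    copy    = transfer_cost(size_mb=2048, bandwidth_mb_s=100, setup_s=0.1)  # remote data
    print("migrate process" if migrate < copy else "copy data")  # -> migrate process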

Synchronization

Cooperating concurrent processes have an inherent need for synchronization. Three basic situations define the scope of this need (see the sketch following the list):
  • one or more processes must synchronize at a given point for one or more other processes to continue,
  • one or more processes must wait for an asynchronous condition in order to continue,
  • or a process must establish mutually exclusive access to a shared resource.
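
The following sketch illustrates the three situations with Python's threading primitives; in a distributed operating system the same patterns would span nodes rather than threads, typically built on message passing:

    # Sketch of the three synchronization situations using threading primitives.
    import threading

    barrier = threading.Barrier(2)   # 1) synchronize at a given point
    ready   = threading.Event()      # 2) wait for an asynchronous condition
    lock    = threading.Lock()       # 3) mutual exclusion on a shared resource
    counter = 0

    def worker():
        global counter
        barrier.wait()        # rendezvous with the other thread
        ready.wait()          # block until the condition is signalled
        with lock:            # critical section: one thread at a time
            counter += 1

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    ready.set()               # signal the asynchronous condition
    for t in threads:
        t.join()
    print(counter)            # -> 2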


There are a multitude of algorithms available for these scenarios, and each has many variations. Unfortunately, whenever synchronization is required, the opportunity for process deadlock usually exists.

Flexibility

Flexibility in a distributed operating system is made possible through the modular characteristics of the microkernel. With the microkernel presenting an absolute minimal—but complete—set of primitives and basic functionally cohesive services, the higher-level management components can be composed in a similar cohesive manner. This capability leads to exceptional flexibility in the management components collection; but more importantly, it allows the opportunity to dynamically swap, upgrade, or install additional instances of components above the kernel.
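
A minimal sketch of this dynamic composition (the component names and interfaces are invented): the kernel-facing interface stays fixed while a management component is swapped at run time:

    # Sketch of swapping a management component above a fixed kernel interface.
    class RoundRobinScheduler:
        def pick(self, runnable):
            return runnable[0]

    class ShortestJobScheduler:
        def pick(self, runnable):
            return min(runnable, key=lambda p: p["burst"])

    components = {"scheduler": RoundRobinScheduler()}   # initial composition

    jobs = [{"pid": 1, "burst": 9}, {"pid": 2, "burst": 3}]
    print(components["scheduler"].pick(jobs)["pid"])    # -> 1

    components["scheduler"] = ShortestJobScheduler()    # dynamic swap/upgrade
    print(components["scheduler"].pick(jobs)["pid"])    # -> 2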

Pioneering inspirations

With a cursory glance around the internet, or a modest perusal of pertinent writings, one could very easily gain the notion that computer operating systems were a new phenomenon in the mid-twentieth century. In fact, important research in operating systems was already being conducted at that time. While early exploration into operating systems took place in the years leading to 1950, shortly afterward highly advanced research began on new systems to conquer new problems. In the first decade of the second half of the 20th century, many new questions were asked, many new problems were identified, and many solutions were developed and working for years in controlled production environments.

Aboriginal distributed computing

The DYSEAC (1954)

One of the first solutions to these new questions was the DYSEAC, a self-described general-purpose synchronous computer that, at this point in history, exhibited signs of being much more than general-purpose. In one of the earliest publications of the Association for Computing Machinery, in April 1954, a researcher at the National Bureau of Standards – now the National Institute of Standards and Technology (NIST) – presented a detailed implementation design specification of the DYSEAC. Without carefully reading the entire specification, one could be misled by summary language in the introduction as to the nature of this machine. The initial section of the introduction advises that major emphasis will be focused upon the requirements of the intended applications, and that these applications would require flexible communication. However, the suggestion that the external devices could be typewriters, magnetic media, and CRTs, together with the term “input-output operation” being used more than once, could quickly limit any paradigm of this system to that of a complex centralized “ensemble.” Seemingly saving the best for last, the author eventually describes the true nature of the system.

While this more detailed description elevates the perception of the system, the best that can be distilled from it is some semblance of decentralized control. The avid reader, persevering in the investigation, would reach a point at which the real nature of the system is divulged.
This is one of the earliest examples of a computer with distributed control. Dept. of the Army reports show it was certified reliable and passed all acceptance tests in April 1954. It was completed and delivered on time, in May 1954. In addition, was it mentioned that this was a portable computer? It was housed in a tractor-trailer, and had 2 attendant vehicles and 6 tons of refrigeration capacity.

Multi-programming abstraction

The Lincoln TX-2 (1957)

Described as an input-output system of experimental nature, the Lincoln TX-2 placed a premium on flexibility in its association of simultaneously operational input-output devices. The design of the TX-2 was modular, supporting a high degree of modification and expansion, as well as flexibility in the operating and programming of its devices. The system employed the Multiple-Sequence Program Technique.

This technique allowed multiple program counters to each associate with one of 32 possible sequences of program code. These explicitly prioritized sequences could be interleaved and executed concurrently, affecting not only the computation in progress, but also the control flow of sequences and the switching of devices as well. Much of the discussion centers on the complexity and sophistication of the devices' sequencing capabilities.

Similar to the previous system, the TX-2 discussion has a distinct decentralized theme until it is revealed that efficiencies in system operation are gained when separate programmed devices are operated simultaneously. It is also stated that the full power of the central unit can be utilized by any device; and it may be used for as long as the device's situation requires. In this, we see the TX-2 as another example of a system exhibiting distributed control, its central unit not having dedicated control.

Memory access abstraction

Intercommunicating Cells, Basis for a Distributed Logic Computer (1962)

One early memory access paradigm was Intercommunicating Cells, where a cell is composed of a collection of memory elements. A memory element was basically an electronic flip-flop or relay, capable of two possible values. Within a cell there are two types of elements, symbol and cell elements. Each cell structure stores data in a string of symbols, consisting of a name and a set of associated parameters. Consequently, a system's information is linked through various associations of cells.

Intercommunicating Cells fundamentally breaks from tradition in that it has no counters or any concept of addressing memory. The theory contends that addressing is a wasteful and non-valuable level of indirection. Information is accessed in two ways, direct and cross-retrieval. Direct retrieval looks to a name and returns a parameter set. Cross-retrieval projects through parameter sets and returns a set of names containing the given subset of parameters. This would be similar to a modified hash table data structure that would allow for multiple values (parameters) for each key (name).
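
A hedged sketch of that analogy in Python (the cell names and parameters are invented): direct retrieval maps a name to its parameter set, while cross-retrieval returns every name whose parameters contain a queried subset:

    # Sketch of the modified-hash-table analogy: direct retrieval maps a name
    # to its parameter set; cross-retrieval scans parameter sets and returns
    # every name whose stored parameters contain the queried subset.
    cells = {
        "alpha": {"red", "small"},
        "beta":  {"red", "large"},
        "gamma": {"blue", "small"},
    }

    def direct_retrieval(name):
        return cells[name]

    def cross_retrieval(params):
        return {name for name, stored in cells.items() if params <= stored}

    print(sorted(direct_retrieval("alpha")))          # -> ['red', 'small']
    print(sorted(cross_retrieval({"red"})))           # -> ['alpha', 'beta']
    print(sorted(cross_retrieval({"red", "small"})))  # -> ['alpha']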
Cellular memory would have many advantages:
  • A major portion of a system's logic is distributed within the associations of information stored in the cells,
  • This flow of information association is somewhat guided by the act of storing and retrieving,
  • The time required for storage and retrieval is mostly constant and completely unrelated to the size and fill-factor of the memory,
  • Cells are logically indistinguishable, making them both flexible to use and relatively simple to extend in size.


This early research into alternative memory describes a configuration ideal for the distributed operating system. The constant-time projection through memory for storage and retrieval would be inherently atomic and exclusive. The cellular memory's intrinsic distributed characteristics would be an invaluable benefit; however, the impact on the user, hardware/devices, or application programming interfaces is uncertain. It is distinctly obvious that these early researchers had a distributed system concept in mind.

Coherent memory abstraction

Algorithms for scalable synchronization on shared-memory multiprocessors

File System abstraction

Measurements of a distributed file system

Memory coherence in shared virtual memory systems

Transaction abstraction

Transactions

Sagas

Transactional Memory

Composable memory transactions

Transactional memory: architectural support for lock-free data structures

Software transactional memory for dynamic-sized data structures

Software transactional memory

Persistence abstraction

OceanStore: an architecture for global-scale persistent storage

Coordinator abstraction

Weighted voting for replicated data

Consensus in the presence of partial synchrony

Reliability abstraction

Sanity checks

The Byzantine Generals Problem

Fail-stop processors: an approach to designing fault-tolerant computing systems

Recoverability

Distributed snapshots: determining global states of distributed systems

Optimistic recovery in distributed systems

Replicated model extended to a component object model

Architectural Design of E1 Distributed Operating System

The Cronus distributed operating system

Design and development of MINIX distributed operating system

Complexity/Trust exposure through accepted responsibility

Scale and performance in the Denali isolation kernel

Multi/Many-core focused systems

The multikernel: a new OS architecture for scalable multicore systems
Corey: an Operating System for Many Cores

Distributed processing over extremes in heterogeneity

Helios: heterogeneous multiprocessing with satellite kernels

Effective and stable in multiple levels of complexity

Tessellation: Space-Time Partitioning in a Manycore Client OS

See also

  • Network operating system (NOS)
  • Single system image (SSI)
  • Operating System Projects
  • List of operating systems
  • Comparison of operating systems
  • Computer systems architecture
  • Disk operating system (DOS)
  • Multikernel
  • MINIX
  • Distributed computing
  • Operating system
  • Andrew S. Tanenbaum
  • List of important publications in concurrent, parallel, and distributed computing
  • Edsger W. Dijkstra Prize in Distributed Computing
  • List of distributed computing conferences
  • List of distributed computing projects
