Dependability
Dependability is a value showing the reliability of a person to others because of his or her integrity, truthfulness, and trustworthiness, traits that can encourage someone to depend on him or her.

The wider use of this noun is in systems engineering.

Dependability as applied to a computer system is defined by the IFIP 10.4 Working Group on Dependable Computing and Fault Tolerance as:
"[..] the trustworthiness of a computing system which allows reliance to be justifiably placed on the service it delivers [..]"

An alternative and broader definition is provided by IEC IEV 191-02-03:
"dependability (is) the collective term used to describe the availability performance and its influencing factors : reliability performance, maintainability performance and maintenance support performance"


This definition was developed by Technical Committee 56 Dependability of the International Electrotechnical Commission (IEC). The committee also develops and maintains International Standards in the field of dependability. The standards provide systematic methods and tools for dependability assessment and management of equipment, services and systems throughout their life cycles.

This concept can be further extended to encompass mechanisms to increase and maintain the Dependability of a system. Dependability can be thought of as being composed of three elements:
  • Attributes - A way to assess the Dependability of a system
  • Threats - An understanding of the things that can affect the Dependability of a system
  • Means - Ways to increase the Dependability of a system

History

The field of Dependability grew out of previous related fields such as fault tolerance and system reliability in the 1960s. As interest in these fields increased during the 1970s and the early part of the 1980s, the term reliability began to be overloaded and was being used outside of its originally intended definition, as a measurement of failures in a system, to encompass more diverse measures which would now come under other classifications such as safety, integrity, etc. Jean-Claude Laprie thus coined the term Dependability to encompass these related disciplines in the early 1980s.

The field of Dependability has evolved from these beginnings to be an internationally active field of research. This research is fostered by a number of prominent international conferences, notably the International Conference on Dependable Systems and Networks, the International Symposium on Reliable Distributed Systems and the International Symposium on Software Reliability Engineering.

The original definition of dependability for a computing system gathers the following attributes or non-functional requirements:
  • Availability: readiness for correct service
  • Reliability: continuity of correct service
  • Maintainability: ability to undergo modifications and repairs

and combines them with the concepts of Threats and Failures to create Dependability.

This definition was further enhanced to incorporate Safety and Security.

Attributes

Attributes are qualities of a system. These can be assessed to determine its overall dependability using qualitative or quantitative measures. Avizienis et al. define the following Dependability Attributes:
  • Availability - readiness for correct service
  • Reliability - continuity of correct service
  • Safety - absence of catastrophic consequences on the user(s) and the environment
  • Integrity - absence of improper system alteration
  • Maintainability - ability for a process to undergo modifications and repairs


As these definitions suggest, only Availability and Reliability are quantifiable by direct measurement, whilst the others are more subjective. For instance, Safety cannot be measured directly via metrics but is a subjective assessment that requires judgmental information to be applied to give a level of confidence, whilst Reliability can be measured as failures over time.
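
As a rough illustration of the two directly measurable attributes, the following Python sketch computes a failure rate (failures over time) and an availability figure (fraction of the observation period in service) from an outage log. The log format, the names `Outage`, `failure_rate` and `availability`, and the sample figures are assumptions made for this example; they are not taken from any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outage:
    """One recorded failure: when service stopped and when it resumed (hours)."""
    start_h: float
    end_h: float


def failure_rate(outages, observed_hours):
    """Failures per hour of observation (reliability as failures over time)."""
    return len(outages) / observed_hours


def availability(outages, observed_hours):
    """Fraction of the observation period during which the service was up."""
    downtime = sum(o.end_h - o.start_h for o in outages)
    return (observed_hours - downtime) / observed_hours


# Hypothetical log: 1000 hours observed, two outages totalling 5 hours.
log = [Outage(100.0, 102.0), Outage(640.0, 643.0)]
print(failure_rate(log, 1000.0))   # 0.002
print(availability(log, 1000.0))   # 0.995
```

No comparable calculation exists for Safety, which is why the text treats it as an assessment rather than a measurement.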

Confidentiality, i.e. the absence of unauthorized disclosure of information, is also used when addressing security. Security is a composite of Confidentiality, Integrity, and Availability. Security is sometimes classed as an attribute, but the current view is to aggregate it together with dependability and treat Dependability as a composite term called Dependability and Security.

In practice, applying security measures to the appliances of a system generally improves its dependability by limiting the number of externally originated errors.

Threats

Threats are things that can affect a system and cause a drop in Dependability. There are three main terms that must be clearly understood:
  • Fault: A fault (which is usually referred to as a bug for historic reasons) is a defect in a system. The presence of a fault in a system may or may not lead to a failure. For instance, although a system may contain a fault, its input and state conditions may never cause this fault to be executed, so no error occurs and the fault never manifests as a failure.
  • Error: An error is a discrepancy between the intended behaviour of a system and its actual behaviour inside the system boundary. Errors occur at runtime when some part of the system enters an unexpected state due to the activation of a fault. Since errors are generated from invalid states, they are hard to observe without special mechanisms, such as debuggers or debug output to logs.
  • Failure: A failure is an instance in time when a system displays behaviour that is contrary to its specification. An error may not necessarily cause a failure. For instance, an exception may be thrown by a system, but this may be caught and handled using fault tolerance techniques so that the overall operation of the system still conforms to the specification.


It is important to note that Failures are recorded at the system boundary. They are basically Errors that have propagated to the system boundary and have become observable.
Faults, Errors and Failures operate according to a mechanism. This mechanism is sometimes known as a Fault-Error-Failure chain. As a general rule a fault, when activated, can lead to an error (which is an invalid state) and the invalid state generated by an error may lead to another error or a failure (which is an observable deviation from the specified behaviour at the system boundary).

Once a fault is activated an error is created. An error may act in the same way as a fault in that it can create further error conditions, therefore an error may propagate multiple times within a system boundary without causing an observable failure. If an error propagates outside the system boundary a failure is said to occur. A failure is basically the point at which it can be said that a service is failing to meet its specification. Since the output data from one service may be fed into another, a failure in one service may propagate into another service as a fault so a chain can be formed of the form: Fault leading to Error leading to Failure leading to Error, etc.
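
The chain can be made concrete with a small Python sketch. Everything in it is hypothetical: `mean` contains a deliberate fault (no guard for an empty list) that stays dormant for most inputs, and `report_service` plays the role of the system boundary. The sketch also assumes a specification under which returning "mean=n/a" for empty input counts as correct service.

```python
def mean(values):
    # FAULT (deliberate, for illustration): no guard for an empty list.
    # The fault stays dormant as long as callers pass non-empty input.
    return sum(values) / len(values)


def report_service(values, tolerate=True):
    """System boundary: callers only observe what is returned or raised here."""
    try:
        return f"mean={mean(values):.1f}"
    except ZeroDivisionError:
        # ERROR: the fault was activated and produced an invalid state.
        if tolerate:
            # Handled inside the boundary, so no failure is observed.
            return "mean=n/a"
        # The error crosses the boundary: an observable FAILURE.
        raise


print(report_service([2, 4]))          # fault dormant
print(report_service([]))              # error occurs, then is handled
try:
    report_service([], tolerate=False)
except ZeroDivisionError:
    print("failure observed at the boundary")
```

If another service consumed the output of `report_service`, the exception raised in the last call would reach it as a fault of its own, which is how the chain continues from one service to the next.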

Means

Since the mechanism of a Fault-Error-Failure chain is understood, it is possible to construct means to break these chains and thereby increase the dependability of a system.
Four means have been identified so far:
  1. Prevention
  2. Removal
  3. Forecasting
  4. Tolerance


Fault Prevention deals with preventing faults being incorporated into a system. This can be accomplished by use of development methodologies and good implementation techniques.

Fault Removal can be sub-divided into two sub-categories: Removal During Development and Removal During Use.

Removal during development requires verification so that faults can be detected and removed before a system is put into production. Once a system has been put into production, a mechanism is needed to record failures and remove the underlying faults via a maintenance cycle.

Fault Forecasting predicts likely faults so that they can be removed or their effects can be circumvented.

Fault Tolerance deals with putting mechanisms in place that will allow a system to still deliver the required service in the presence of faults, although that service may be at a degraded level.
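
One common way to tolerate a fault is to fall back to a simpler alternative when the primary path fails. The Python sketch below is a minimal illustration under assumed names (`fetch_with_fallback`, `live_price`, `cached_price`); it shows only the fallback idea, not a complete fault tolerance scheme.

```python
def fetch_with_fallback(primary, fallback):
    """Return (value, degraded); degraded is True if the fallback was used."""
    try:
        return primary(), False
    except Exception:
        # The primary path has failed; switch to the fallback so that the
        # caller still receives a service, possibly of lower quality.
        return fallback(), True


def live_price():
    # Hypothetical primary service that is currently faulty.
    raise TimeoutError("price feed unreachable")


def cached_price():
    # Hypothetical fallback: last known value, possibly stale.
    return 41.5


value, degraded = fetch_with_fallback(live_price, cached_price)
print(value, degraded)   # 41.5 True
```

Returning the `degraded` flag alongside the value lets the caller know that the service was delivered at a reduced level rather than hiding that fact.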

Dependability means are intended to reduce the number of failures presented to the user of a system. Failures are traditionally recorded over time and it is useful to understand how their frequency is measured so that the effectiveness of means can be assessed.

Dependability of information systems and survivability

Recent work on dependability takes advantage of structured information systems, e.g. with SOA, to introduce a more efficient ability, survivability, thus taking into account the degraded services that an information system sustains or resumes after a non-maskable failure.

The flexibility of current frameworks encourages system architects to enable reconfiguration mechanisms that refocus the available, safe resources to support the most critical services rather than over-provisioning to build a failure-proof system.
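
A reconfiguration policy of this kind can be sketched in Python as a greedy allocation of the surviving capacity to services in order of criticality. The function name `reconfigure`, the service names, and the numeric criticality and demand values are all assumptions chosen for illustration; real frameworks use richer models.

```python
def reconfigure(services, capacity):
    """Assign surviving capacity to services in order of criticality.

    services: list of (name, criticality, demand); higher criticality wins.
    Returns the names of the services that are kept running.
    """
    running = []
    by_criticality = sorted(services, key=lambda s: s[1], reverse=True)
    for name, _criticality, demand in by_criticality:
        if demand <= capacity:
            running.append(name)
            capacity -= demand
    return running


# Hypothetical situation: a failure leaves only 6 units of safe capacity.
services = [("billing", 3, 4), ("reporting", 1, 3), ("auth", 5, 2), ("search", 2, 3)]
print(reconfigure(services, capacity=6))   # ['auth', 'billing']
```

The less critical services are dropped, so the system survives in a degraded form instead of needing enough spare capacity to run everything after any failure.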

With the generalisation of networked information systems, accessibility was introduced to give greater importance to users' experience.

To take into account the level of performance, the measurement of performability is defined as "quantifying how well the object system performs in the presence of faults over a specified period of time".
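
One simple way to put a number on this, offered here only as an assumed illustration and not as the definition used in the literature, is the time-weighted mean of delivered performance relative to the nominal level. The Python sketch below uses a hypothetical function `performability` and invented throughput figures.

```python
def performability(intervals, nominal):
    """Time-weighted mean of delivered performance, as a fraction of nominal.

    intervals: list of (duration_h, delivered_throughput) pairs.
    """
    total_time = sum(duration for duration, _ in intervals)
    delivered = sum(duration * rate for duration, rate in intervals)
    return delivered / (total_time * nominal)


# Hypothetical 10-hour window: 8 h at full rate, 1.5 h degraded, 0.5 h down.
window = [(8.0, 100.0), (1.5, 40.0), (0.5, 0.0)]
print(performability(window, nominal=100.0))   # 0.86
```

Unlike availability, which would only distinguish up from down, this figure also reflects the hours spent running at a reduced level.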

See also

  • Safety engineering
  • Fault-tolerance
  • Fault injection
  • List of system quality attributes
  • Formal methods
  • Dependable Systems and Networks Conference

Books

  • J.C. Laprie, Dependability: Basic Concepts and Terminology Springer-Verlag, 1992. ISBN 0387822968

Research projects

  • DESEREC, DEpendability and Security by Enhanced REConfigurability, FP6/IST integrated project 2006-2008
  • NODES, Network on DEpendable Systems
  • ESFORS, European security Forum for Web Services, Software, and Systems, FP6/IST coordination action
  • HIDENETS HIghly DEpendable ip-based NETworks and Services, FP6/IST targeted project 2006-2008
  • RESIST FP6/IST Network of Excellence 2006-2007
  • RODIN Rigorous Open Development Environment for Complex Systems FP6/IST targeted project 2004-2007
  • SERENITY System Engineering for Security and Dependability, FP6/IST integrated project 2006-2008
  • Willow Survivability Architecture, and STILT, System for Terrorism Intervention and Large-scale Teamwork 2002-2004
  • ANIKETOS Dependable and Secure Service Composition, FP7/IST integrated project 2010-2014
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 