Fault tree analysis
Fault tree analysis (FTA) is a top-down, deductive failure analysis in which an undesired state of a system is analyzed using Boolean logic to combine a series of lower-level events. This analysis method is mainly used in safety engineering and reliability engineering to determine the probability of a safety accident or a particular system-level (functional) failure.

In aerospace, the more general term "system failure condition" is used for the "undesired state" or top event of the fault tree. These conditions are classified by the severity of their effects. The most severe conditions require the most extensive fault tree analysis. These system failure conditions and their classification are often determined beforehand in the functional hazard analysis.

FTA can be used to:
  • understand the logic leading to the top event / undesired state.
  • show compliance with the (input) system safety / reliability requirements.
  • prioritize the contributors leading to the top event, creating the critical equipment/parts/events lists for different importance measures.
  • monitor and control the safety performance of a complex system (e.g., is it still safe to fly an aircraft if fuel valve x is not working? For how long may it fly with this valve stuck closed?).
  • minimize and optimize resources.
  • assist in designing a system. The FTA can be used as a design tool that helps to create (output / lower level) requirements.
  • function as a diagnostic tool to identify and correct causes of the top event. It can help with the creation of diagnostic manuals / processes.

History

Fault Tree Analysis (FTA) was originally developed in 1962 at Bell Laboratories by H.A. Watson, under a U.S. Air Force Ballistics Systems Division contract to evaluate the Minuteman I Intercontinental Ballistic Missile (ICBM) Launch Control System. The use of fault trees has since gained widespread support, and FTA is often used as a failure analysis tool by reliability experts. Following the first published use of FTA in the 1962 Minuteman I Launch Control Safety Study, Boeing and AVCO expanded use of FTA to the entire Minuteman II system in 1963-1964. FTA received extensive coverage at a 1965 System Safety Symposium in Seattle sponsored by Boeing and the University of Washington. Boeing began using FTA for civil aircraft design around 1966. In 1970, the U.S. Federal Aviation Administration (FAA) published a change to 14 CFR 25.1309 airworthiness regulations for transport aircraft in the Federal Register at 35 FR 5665 (1970-04-08). This change adopted failure probability criteria for aircraft systems and equipment and led to widespread use of FTA in civil aviation.

Within the nuclear power industry, the U.S. Nuclear Regulatory Commission began using probabilistic risk assessment (PRA) methods including FTA in 1975, and significantly expanded PRA research following the 1979 incident at Three Mile Island. This eventually led to the 1981 publication of the NRC Fault Tree Handbook NUREG–0492, and mandatory use of PRA under the NRC's regulatory authority.

Fault Tree Analysis (FTA) attempts to model and analyze failure processes of engineering and biological systems. FTA is basically composed of logic diagrams that display the state of the system and is constructed using graphical design techniques. Originally, engineers were responsible for the development of Fault Tree Analysis, as a deep knowledge of the system under analysis is required.

FTA is often defined as another part, or technique, of reliability engineering. Although both model the same major aspect, they arose from two different perspectives. Reliability engineering was, for the most part, developed by mathematicians, while FTA, as stated above, was developed by engineers.

Fault Tree Analysis usually involves events arising from hardware wear-out, material failure, or malfunctions, or combinations of such contributions, with a hardware or system failure rate assigned to branches or cut sets. Failure rates are typically derived from substantiated historical data, such as the mean time between failures of the components, unit, subsystem, or function. Predicted data may also be assigned. Assigning a software failure rate is elusive. Because software is a vital part of system operation, it is usually assumed to function as intended. A software fault tree is meaningful only when considered in the system context, since software is an instruction set to the hardware or overall system. Because basic software events do not fail in the physical sense, predicting the manifestation of software faults or coding errors with any accuracy is not possible unless assumptions are made. Predicting and assigning human error rates is not the primary intent of a fault tree analysis, but it may be attempted to gain some insight into what happens with improper human input or intervention at the wrong time.

FTA can be used as a valuable design tool, can identify potential accidents, and can eliminate costly design changes. It can also be used as a diagnostic tool, predicting the most likely system failure in a system breakdown. FTA is used in safety and reliability engineering and in all major fields of engineering.

Methodology

FTA methodology is described in several industry and government standards, including NRC NUREG–0492 for the nuclear power industry, an aerospace-oriented revision to NUREG–0492 for use by NASA, SAE ARP4761 for civil aerospace, and MIL–HDBK–338 for military systems. IEC standard IEC 61025 is intended for cross-industry use and has been adopted as European Norm EN 61025.

Since no system is perfect, dealing with subsystem faults is a necessity: any working system will eventually have a fault somewhere. However, the probability of complete or partial success is greater than the probability of complete or partial failure. Assembling an FTA is thus not as tedious as assembling a success tree, which can be very time-consuming.

Because assembling an FTA can still be costly and cumbersome, the preferred method is to analyze subsystems. Dealing with smaller systems reduces the chance of error and the amount of analysis required. Afterward, the subsystem analyses are integrated to form an analysis of the whole system.

An undesired effect is taken as the root ('top event') of a tree of logic. The logic used to arrive at the right top events can be diverse. One type of analysis that can help with this is the functional hazard analysis (FHA), based on Aerospace Recommended Practice. There should be only one top event, and all concerns must tree down from it. Then, each situation that could cause that effect is added to the tree as a series of logic expressions. When fault trees are labeled with actual numbers for failure probabilities (which in practice are often unavailable because of the expense of testing), computer programs can calculate failure probabilities from fault trees.
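As a rough illustration of how a program might compute a top-event probability from a labeled tree, the sketch below walks a small tree recursively. The nested-tuple representation, the function name `evaluate`, and the example numbers are all assumptions made for illustration. It also assumes that every basic event is independent and appears only once in the tree, which real FTA tools do not require.

```python
from math import prod

def evaluate(node):
    """Return the probability of a node in a hypothetical fault tree.

    A node is either a number (basic event probability) or a tuple
    (gate, children) where gate is "AND" or "OR".
    Assumes independent, non-repeated basic events.
    """
    if isinstance(node, (int, float)):
        return float(node)
    gate, children = node
    probs = [evaluate(child) for child in children]
    if gate == "AND":
        # All inputs must occur: multiply the probabilities.
        return prod(probs)
    if gate == "OR":
        # At least one input occurs: complement of "none occur".
        return 1.0 - prod(1.0 - p for p in probs)
    raise ValueError(f"unknown gate type: {gate}")

# Hypothetical top event: both redundant units fail, OR a common fault occurs.
tree = ("OR", [("AND", [1e-3, 1e-3]), 1e-5])
top_probability = evaluate(tree)
```

When the same basic event feeds several gates, this simple product rule no longer holds, which is one reason practical tools work with cut sets instead.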

The tree is usually written out using conventional logic gate symbols. The route through a tree between an event and an initiator in the tree is called a cut set. The shortest credible way through the tree from fault to initiating event is called a minimal cut set. Put another way, a cut set is a combination of basic events that together cause the top event, and a minimal cut set is one from which no event can be removed without losing that property.
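To make the idea of minimal cut sets concrete, here is a minimal sketch of a top-down expansion. It treats a cut set as a combination of basic events that together cause the top event. The dictionary layout, the gate names (`TOP`, `G1`, `G2`), and the event names (`A`, `B`, `C`) are hypothetical, and only AND and OR gates are handled.

```python
from itertools import product

def cut_sets(node, gates):
    """Expand a node into a list of cut sets (frozensets of basic events)."""
    if node not in gates:
        # Anything that is not a gate is treated as a basic event.
        return [frozenset([node])]
    gate_type, inputs = gates[node]
    child_sets = [cut_sets(i, gates) for i in inputs]
    if gate_type == "OR":
        # Any one input suffices: collect the inputs' cut sets.
        return [cs for sets in child_sets for cs in sets]
    if gate_type == "AND":
        # All inputs are needed: merge one cut set from each input.
        return [frozenset().union(*combo) for combo in product(*child_sets)]
    raise ValueError(f"unknown gate type: {gate_type}")

def minimal_cut_sets(node, gates):
    """Drop every cut set that strictly contains another cut set."""
    sets = set(cut_sets(node, gates))
    return {s for s in sets if not any(other < s for other in sets)}

# Hypothetical tree: TOP = (A AND (B OR C)) OR C
gates = {
    "TOP": ("OR", ["G1", "C"]),
    "G1": ("AND", ["A", "G2"]),
    "G2": ("OR", ["B", "C"]),
}
result = minimal_cut_sets("TOP", gates)
```

In this example the cut set {A, C} is discarded because C alone already causes the top event, leaving {A, B} and {C}.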

Some industries use both fault trees and event trees (see Probabilistic Risk Assessment). An event tree starts from an undesired initiator (loss of critical supply, component failure, etc.) and follows possible further system events through to a series of final consequences. As each new event is considered, a new node is added to the tree with a split of probabilities of taking either branch. The probabilities of a range of 'top events' arising from the initial event can then be seen.
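The branching described above can be sketched as follows. This is only an illustration under simplifying assumptions: each later event is a two-way split (works or fails), the branch probabilities are independent of earlier branches, and every path is followed to the end. The function name, the branch names, and the numbers are hypothetical.

```python
from itertools import product

def event_tree_outcomes(initiator_probability, branch_failure_probs):
    """Return {path: probability}, where a path is a tuple of booleans
    (True = that branch failed), one entry per branch in order."""
    names = list(branch_failure_probs)
    outcomes = {}
    for states in product((False, True), repeat=len(names)):
        p = initiator_probability
        for name, failed in zip(names, states):
            q = branch_failure_probs[name]
            # Multiply by the probability of taking this branch.
            p *= q if failed else (1.0 - q)
        outcomes[states] = p
    return outcomes

# Hypothetical initiator followed by two protective systems.
outcomes = event_tree_outcomes(1e-2, {"alarm": 0.1, "sprinkler": 0.2})
worst_case = outcomes[(True, True)]  # both protective systems fail
```

The path probabilities sum to the initiator probability, which is a useful sanity check on any event tree built this way.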

Classic programs include the Electric Power Research Institute's (EPRI) CAFTA software, which is used by many US nuclear power plants and by a majority of US and international aerospace manufacturers, and the Idaho National Laboratory's SAPHIRE, which is used by the U.S. Government to evaluate the safety and reliability of nuclear reactors, the Space Shuttle, and the International Space Station. Outside the US, the software RiskSpectrum is a popular tool for fault tree and event tree analysis and is licensed for use at almost half of the world's nuclear power plants for probabilistic safety assessment.

Graphic Symbols

The basic symbols used in FTA are grouped as events, gates, and transfer symbols. Minor variations may be used in FTA software.

Event Symbols

Event symbols are used for primary events and intermediate events. Primary events are not further developed on the fault tree. Intermediate events are found at the output of a gate. The event symbols are shown below:

The primary event symbols are typically used as follows:
  • Basic event - failure or error in a system component or element (example: switch stuck in open position)
  • Initiating event - an external event (example: bird strike to aircraft)
  • Undeveloped event - an event about which insufficient information is available, or which is of no consequence
  • Conditioning event - conditions that restrict or affect logic gates (example: mode of operation in effect)

An intermediate event gate can be used immediately above a primary event to provide more room to type the event description.
FTA is a top-down approach.

Gate Symbols

Gate symbols describe the relationship between input and output events. The symbols are derived from Boolean logic symbols:

The gates work as follows:
  • OR gate - the output occurs if any input occurs
  • AND gate - the output occurs only if all inputs occur (inputs are independent)
  • Exclusive OR gate - the output occurs if exactly one input occurs
  • Priority AND gate - the output occurs if the inputs occur in a specific sequence specified by a conditioning event
  • Inhibit gate - the output occurs if the input occurs under an enabling condition specified by a conditioning event

Transfer Symbols

Transfer symbols are used to connect the inputs and outputs of related fault trees, such as the fault tree of a subsystem to its system.

Basic Mathematical Foundation

Events in a fault tree are associated with statistical probabilities. For example, component failures typically occur at some constant failure rate λ (a constant hazard function). In this simplest case, failure probability depends on the rate λ and the exposure time t:
P = 1 - exp(-λt)
P ≈ λt, λt < 0.1


A fault tree is often normalized to a given time interval, such as a flight hour or an average mission time. Event probabilities depend on the relationship of the event hazard function to this interval.
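A short numerical sketch of the two formulas above, P = 1 - exp(-λt) and its approximation P ≈ λt, may help show how close they are for small λt. The function names and the example rate and exposure time are assumptions chosen for illustration.

```python
import math

def failure_probability(rate, exposure_time):
    """Exact value of P = 1 - exp(-rate * exposure_time).

    Uses expm1 so that very small probabilities keep their precision.
    """
    return -math.expm1(-rate * exposure_time)

def failure_probability_approx(rate, exposure_time):
    """Approximation P ≈ rate * exposure_time, intended for rate * time < 0.1."""
    return rate * exposure_time

# Hypothetical component: 1e-5 failures per hour over a 10-hour flight.
rate, hours = 1e-5, 10.0
exact = failure_probability(rate, hours)
approx = failure_probability_approx(rate, hours)
```

The approximation is never smaller than the exact value, so using it errs on the conservative side.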

Unlike conventional logic gate diagrams in which inputs and outputs hold the binary values of TRUE (1) or FALSE (0), the gates in a fault tree output probabilities related to the set operations of Boolean logic. The probability of a gate's output event depends on the input event probabilities.

An AND gate represents a combination of independent events. That is, the probability of any input event to an AND gate is unaffected by any other input event to the same gate. In set-theoretic terms, this is equivalent to the intersection of the input event sets, and the probability of the AND gate output is given by:
P(A and B) = P(A ∩ B) = P(A) P(B)


An OR gate, on the other hand, corresponds to set union:
P(A or B) = P(A ∪ B) = P(A) + P(B) - P(A ∩ B)


Since failure probabilities on fault trees tend to be small (less than 0.01), P(A ∩ B) usually becomes a very small error term, and the output of an OR gate may be conservatively approximated by assuming that the inputs are mutually exclusive events:
P(A or B) ≈ P(A) + P(B), P(A ∩ B) ≈ 0


An exclusive OR gate with two inputs represents the probability that one or the other input, but not both, occurs:
P(A xor B) = P(A) + P(B) - 2P(A ∩ B)


Again, since P(A ∩ B) usually becomes a very small error term, the exclusive OR gate has limited value in a fault tree.
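The gate formulas above can be checked numerically with a few one-line functions. This is a sketch for two inputs only, and it assumes the inputs are independent so that P(A ∩ B) = P(A) P(B). The function names and the input probabilities are hypothetical.

```python
def p_and(p_a, p_b):
    """AND gate: P(A ∩ B) = P(A) P(B) for independent inputs."""
    return p_a * p_b

def p_or(p_a, p_b):
    """OR gate: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)."""
    return p_a + p_b - p_and(p_a, p_b)

def p_or_rare(p_a, p_b):
    """OR gate approximation that neglects P(A ∩ B)."""
    return p_a + p_b

def p_xor(p_a, p_b):
    """Exclusive OR gate: P(A) + P(B) - 2 P(A ∩ B)."""
    return p_a + p_b - 2.0 * p_and(p_a, p_b)

# Hypothetical input probabilities, both small as is typical in a fault tree.
a, b = 1e-3, 2e-3
exact_or = p_or(a, b)         # about 0.002998
approx_or = p_or_rare(a, b)   # about 0.003
exclusive = p_xor(a, b)       # about 0.002996
```

With inputs this small, the three OR-type results differ only in the sixth decimal place, which illustrates why the exclusive OR gate adds little in practice.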

Analysis

Many different approaches can be used to model an FTA, but the most common can be summarized in a few steps. Remember that a fault tree is used to analyze a single fault event, and that one and only one event can be analyzed in a single fault tree. Even though the “fault” may vary dramatically, an FTA follows the same procedure for any event, be it a delay of 0.25 msec in the generation of electrical power or the random, unintended launch of an ICBM.

FTA involves five steps:
  1. Define the undesired event to study
    • The undesired event can be hard to define, although some events are easy and obvious to observe. An engineer with broad knowledge of the system design, or a system analyst with an engineering background, is the best person to help define and number the undesired events. Each undesired event is then used to build one FTA; no two events are used in the same FTA.
  2. Obtain an understanding of the system
    • Once the undesired event is selected, all causes with a probability of 0 or more of affecting the undesired event are studied and analyzed. Getting exact numbers for the probabilities leading to the event is usually impossible because doing so may be very costly and time-consuming. Computer software is used to study probabilities, which may make the system analysis less costly.
      System analysts can help with understanding the overall system. System designers have full knowledge of the system, and this knowledge is very important for not missing any cause affecting the undesired event. For the selected event, all causes are then numbered and sequenced in order of occurrence and used in the next step, constructing the fault tree.
  3. Construct the fault tree
    • After selecting the undesired event and analyzing the system so that all the causes (and, if possible, their probabilities) are known, the fault tree can be constructed. The fault tree is based on AND and OR gates, which define its major characteristics.
  4. Evaluate the fault tree
    • After the fault tree has been assembled for a specific undesired event, it is evaluated and analyzed for any possible improvement; in other words, the risk is assessed and ways to improve the system are sought. This step serves as an introduction to the final step, which is to control the hazards identified. In short, this step identifies all possible hazards that directly or indirectly affect the system.
  5. Control the hazards identified
    • This step is very specific and differs largely from one system to another, but the main point is always that, after the hazards are identified, all possible methods are pursued to decrease their probability of occurrence.

Comparison with other analytical methods

FTA is a deductive, top-down method aimed at analyzing the effects of initiating faults and events on a complex system. This contrasts with failure mode and effects analysis (FMEA), which is an inductive, bottom-up analysis method aimed at analyzing the effects of single component or function failures on equipment or subsystems. FTA is very good at showing how resistant a system is to single or multiple initiating faults. It is not good at finding all possible initiating faults. FMEA is good at exhaustively cataloging initiating faults and identifying their local effects. It is not good at examining multiple failures or their effects at a system level. FTA considers external events; FMEA does not. In civil aerospace the usual practice is to perform both FTA and FMEA, with a failure mode effects summary (FMES) as the interface between FMEA and FTA.

Alternatives to FTA include the dependence diagram (DD), also known as a reliability block diagram (RBD), and Markov analysis. A dependence diagram is equivalent to a success tree analysis (STA), the logical inverse of an FTA, and depicts the system using paths instead of gates. DD and STA produce the probability of success (i.e., avoiding a top event) rather than the probability of a top event.
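The inverse relationship between a success model and a fault tree can be illustrated with a small calculation. The sketch below assumes a simple series/parallel arrangement of independent blocks, which is only one common way to draw a dependence diagram. The function names, the layout (two redundant pumps feeding one valve), and the reliabilities are hypothetical.

```python
from math import prod

def series_success(reliabilities):
    """A series path succeeds only if every block succeeds."""
    return prod(reliabilities)

def parallel_success(reliabilities):
    """Redundant blocks succeed unless every block fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)

# Hypothetical system: two redundant pumps (0.99 each) in series with a valve (0.999).
pumps = parallel_success([0.99, 0.99])
p_success = series_success([pumps, 0.999])

# The matching fault tree: top event = (pump 1 fails AND pump 2 fails) OR valve fails.
p_pumps_fail = 0.01 * 0.01
p_top_event = 1.0 - (1.0 - p_pumps_fail) * (1.0 - 0.001)
```

Under these assumptions the two views agree: the probability of success and the probability of the top event sum to one.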

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 