Safety engineering


 
 

Safety engineering is an applied science strongly related to systems engineering and its subset, System Safety Engineering. Safety engineering assures that a life-critical system behaves as needed even when pieces fail.

In the real world the term "safety engineering" refers to any act of accident prevention by a person qualified in the field. Safety engineering is often reactive to adverse events, also described as "incidents", as reflected in accident statistics. This arises largely because of the complexity and difficulty of collecting and analysing data on "near misses".

Increasingly, the safety review is being recognised as an important risk management tool. Failure to identify risks to safety, and the resulting inability to address or "control" these risks, can result in massive costs, both human and economic. The multidisciplinary nature of safety engineering means that a very broad array of professionals is actively involved in accident prevention or safety engineering.

The majority of those practicing safety engineering are employed in industry to keep workers safe on a day-to-day basis. See the publication Scope and Function of the Safety Profession.

Safety engineers distinguish different extents of defective operation: A failure is "the inability of a system or component to perform its required functions within specified performance requirements", while a fault is "a defect in a device or component, for example: a short circuit or a broken wire". System-level failures are caused by lower-level faults, which are ultimately caused by basic component faults. (Some texts reverse or confuse these two terms. See NUREG-0492, page V-1.) The unexpected failure of a device that was operating within its design limits is a primary failure, while the expected failure of a component stressed beyond its design limits is a secondary failure. A device which appears to malfunction because it has responded as designed to a bad input is suffering from a command fault. A critical fault endangers one or a few people. A catastrophic fault endangers, harms or kills a significant number of people.

Safety engineers also identify different modes of safe operation: A probabilistically safe system has no single point of failure, and enough redundant sensors, computers and effectors so that it is very unlikely to cause harm (usually "very unlikely" means, on average, less than one human life lost in a billion hours of operation). An inherently safe system is a clever mechanical arrangement that cannot be made to cause harm; this is obviously the best arrangement, but it is not always possible. A fail-safe system is one that cannot cause harm when it fails. A fault-tolerant system can continue to operate with faults, though its operation may be degraded in some fashion.

These terms combine to describe the safety needed by systems: For example, most biomedical equipment is only "critical", and often another identical piece of equipment is nearby, so it can be merely "probabilistically fail-safe". Train signals can cause "catastrophic" accidents (imagine chemical releases from tank-cars) and are usually "inherently safe". Aircraft "failures" are "catastrophic" (at least for their passengers and crew) so aircraft are usually "probabilistically fault-tolerant". Without any safety features, nuclear reactors might have "catastrophic failures", so real nuclear reactors are required to be at least "probabilistically fail-safe", and some, such as pebble bed reactors, are "inherently fault-tolerant".

The process

Ideally, safety engineers take an early design of a system, analyze it to find what faults can occur, and then propose safety requirements in design specifications up front, along with changes to existing systems, to make the system safer. In an early design stage, a fail-safe system can often be made acceptably safe with a few sensors and some software to read them. Probabilistic fault-tolerant systems can often be made by using more, but smaller and less expensive, pieces of equipment.

Far too often, rather than actually influencing the design, safety engineers are assigned to prove that an existing, completed design is safe. If a safety engineer then discovers significant safety problems late in the design process, correcting them can be very expensive. This type of error has the potential to waste large sums of money.

The exception to this conventional approach is the way some large government agencies approach safety engineering from a more proactive and proven process perspective. This is known as System Safety. The System Safety philosophy, supported by the System Safety Society and many other organisations, is to be applied to complex and critical systems, such as commercial airliners, military aircraft, munitions and complex weapon systems, spacecraft and space systems, rail and transportation systems, air traffic control systems, and other complex and safety-critical industrial systems. The proven System Safety methods and techniques are intended to prevent, eliminate and control hazards and risks through designed influences by a collaboration of key engineering disciplines and product teams. Software safety is a fast-growing field, since the functionality of modern systems is increasingly being put under the control of software. The whole concept of system safety and software safety, as a subset of systems engineering, is to influence safety-critical systems designs by conducting several types of hazard analyses to identify risks and to specify design safety features and procedures to strategically mitigate risk to acceptable levels before the system is certified.

Additionally, failure mitigation can go beyond design recommendations, particularly in the area of maintenance. There is an entire realm of safety and reliability engineering known as "Reliability Centered Maintenance" (RCM), which is a discipline that is a direct result of analyzing potential failures within a system and determining maintenance actions that can mitigate the risk of failure. This methodology is used extensively on aircraft and involves understanding the failure modes of the serviceable replaceable assemblies in addition to the means to detect or predict an impending failure. Every automobile owner is familiar with this concept when they take in their car to have the oil changed or brakes checked. Even filling up one's car with gas is a simple example of a failure mode (failure due to fuel starvation), a means of detection, and a maintenance action (fill 'er up!).

For large-scale complex systems, hundreds if not thousands of maintenance actions can result from the failure analysis. These maintenance actions are based on conditions (e.g., gauge reading or leaky valve), hard conditions (e.g., a component is known to fail after 100 hrs of operation with 95% certainty), or require inspection to determine the maintenance action (e.g., metal fatigue). The Reliability Centered Maintenance concept then analyzes each individual maintenance item for its risk contribution to safety, mission, operational readiness, or cost to repair if a failure does occur. Then all of the maintenance actions are bundled into maintenance intervals so that maintenance does not occur around the clock but at regular intervals. This bundling process introduces further complexity, as it might stretch some maintenance cycles, thereby increasing risk, but shorten others, thereby potentially reducing risk. The end result is a comprehensive maintenance schedule, purpose-built to reduce operational risk and ensure acceptable levels of operational readiness and availability.
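The bundling step can be illustrated with a small sketch. It assumes a deliberately simple rule, in which each task is moved to the nearest allowed maintenance interval; the task names, ideal intervals, and allowed intervals are hypothetical, and a real RCM programme would weigh each task's risk contribution rather than just its distance to a slot.

```python
def bundle(tasks, slots):
    """Assign each task to the nearest allowed interval (in operating hours).

    tasks: dict mapping task name -> ideal interval in hours (hypothetical data)
    slots: list of allowed interval lengths in hours
    Returns a dict mapping task name -> (assigned interval, effect on the cycle).
    """
    plan = {}
    for name, ideal in tasks.items():
        slot = min(slots, key=lambda s: abs(s - ideal))
        if slot > ideal:
            change = "stretched"   # longer than ideal: risk may increase
        elif slot < ideal:
            change = "shortened"   # shorter than ideal: risk may decrease
        else:
            change = "unchanged"
        plan[name] = (slot, change)
    return plan


# Hypothetical maintenance items and their ideal intervals.
tasks = {"oil change": 90, "brake inspection": 550, "filter swap": 100}
for name, (slot, change) in bundle(tasks, [100, 500]).items():
    print(f"{name}: every {slot} h ({change})")
```

Stretched items are the ones worth reviewing first, since the text notes that stretching a maintenance cycle increases risk.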

Analysis techniques

The two most common fault modeling techniques are called "failure modes and effects analysis" and "fault tree analysis". These techniques are just ways of finding problems and of making plans to cope with failures, as in Probabilistic Risk Assessment (PRA or PSA). One of the earliest complete studies using PRA techniques on a commercial nuclear plant was the Reactor Safety Study (RSS), edited by Prof. Norman Rasmussen (see WASH-1400).

Failure modes and effects analysis

In the technique known as "failure mode and effects analysis" (FMEA), an engineer starts with a block diagram of a system. The safety engineer then considers what happens if each block of the diagram fails. The engineer then draws up a table in which failures are paired with their effects and an evaluation of the effects. The design of the system is then corrected, and the table adjusted until the system is not known to have unacceptable problems. It is very helpful to have several engineers review the failure modes and effects analysis.
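The table described above can be sketched as a small data structure. This is only an illustration: the blocks, failure modes, and effects are hypothetical, the "critical" and "catastrophic" labels reuse this article's terms, and the "minor" label and the `mitigated` flag are added assumptions.

```python
from dataclasses import dataclass

# Severity labels ordered from least to most severe. "critical" and
# "catastrophic" follow the article's usage; "minor" is an added assumption.
SEVERITY_ORDER = ["minor", "critical", "catastrophic"]


@dataclass
class FmeaRow:
    block: str               # block of the system diagram
    failure_mode: str        # how the block fails
    effect: str              # what happens to the system as a result
    severity: str            # evaluation of the effect
    mitigated: bool = False  # set once the design has been corrected


def open_problems(rows, unacceptable=("critical", "catastrophic")):
    """Return unmitigated rows with unacceptable severity, worst first."""
    bad = [r for r in rows if r.severity in unacceptable and not r.mitigated]
    return sorted(bad, key=lambda r: SEVERITY_ORDER.index(r.severity), reverse=True)


# Hypothetical rows for an imaginary pump-and-tank system.
table = [
    FmeaRow("level sensor", "stuck low", "tank overfills", "critical"),
    FmeaRow("relief valve", "stuck closed", "tank ruptures", "catastrophic"),
    FmeaRow("status lamp", "burnt out", "no indication", "minor"),
]

for row in open_problems(table):
    print(f"{row.block}: {row.failure_mode} -> {row.effect} [{row.severity}]")
```

The design loop in the text corresponds to correcting the design, marking rows as mitigated, and repeating until `open_problems` returns an empty list.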

Fault tree analysis

First, a little history to put FTA into perspective. It came out of work on the Minuteman Missile System. All the digital circuits used in the Minuteman Missile System were designed and tested extensively. The failure probabilities as well as the failure modes were well understood and documented for each circuit. GTE/Sylvania, one of the prime contractors, is believed to have discovered that the probability of failure for various components was easily constructed from the Boolean expressions for those components. [Note that there was one complex digital system constructed by GTE/Sylvania about that time with no logic diagrams, only pages of Boolean expressions. These worked out nicely because logic diagrams are designed to be read left to right, the way the engineer creates the design. But when they fail, the technicians must read them from right to left.] In any case, this analysis of hardware led to the use of the same symbology and thinking for what (with additional symbols) is now known as a Fault Tree. Note that the de Morgan's equivalent of a fault tree is the success tree.

In the technique known as "fault tree analysis", an undesired effect is taken as the root ('top event') of a tree of logic. There should be only one Top Event and all concerns must tree down from it. This is also a consequence of another Minuteman Missile System requirement that all analysis be Top Down. By fiat there was to be no bottom-up analysis. Then, each situation that could cause that effect is added to the tree as a series of logic expressions. When fault trees are labeled with actual numbers about failure probabilities, which are often in practice unavailable because of the expense of testing, computer programs can calculate failure probabilities from fault trees.
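As a rough illustration of such a calculation, the following sketch evaluates a fault tree built from AND and OR gates. It assumes that all basic events are statistically independent and that no basic event appears in more than one branch; the tuple representation, the function names, and the probabilities are hypothetical and are not taken from any particular tool.

```python
from math import prod


def basic(p):
    """A basic event with failure probability p."""
    return ("basic", p)


def and_gate(*children):
    """Output event occurs only if every child event occurs."""
    return ("and", children)


def or_gate(*children):
    """Output event occurs if at least one child event occurs."""
    return ("or", children)


def probability(node):
    """Probability of the event at this node, assuming independent children."""
    kind, payload = node
    if kind == "basic":
        return payload
    child_probs = [probability(child) for child in payload]
    if kind == "and":
        return prod(child_probs)
    if kind == "or":
        return 1.0 - prod(1.0 - p for p in child_probs)
    raise ValueError(f"unknown node kind: {kind!r}")


# Hypothetical top event: both redundant pumps fail, OR the power supply fails.
top_event = or_gate(and_gate(basic(0.01), basic(0.01)), basic(0.001))
print(f"P(top event) = {probability(top_event):.7f}")
```

In this example the single power-supply event dominates the result, which is the kind of insight a numerically labeled fault tree is meant to provide.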

The Tree is usually written out using conventional logic gate symbols. The route through a Tree between an event and an initiator in the tree is called a Cutset. The shortest credible way through the tree from Fault to initiating Event is called a Minimal Cutset.
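Cut sets can be derived mechanically from the gate structure. The sketch below assumes the common textbook reading, in which a cut set is a set of basic events that together cause the top event and a minimal cut set is one that contains no smaller cut set; the tree representation and the event names are hypothetical.

```python
from itertools import product


def cut_sets(node):
    """Return the cut sets of a fault tree as a set of frozensets of event names."""
    kind, payload = node
    if kind == "basic":
        return {frozenset([payload])}
    child_sets = [cut_sets(child) for child in payload]
    if kind == "or":
        # Any cut set of any child is enough to trigger an OR gate.
        return set().union(*child_sets)
    if kind == "and":
        # An AND gate needs one cut set from every child at the same time.
        return {frozenset().union(*combo) for combo in product(*child_sets)}
    raise ValueError(f"unknown node kind: {kind!r}")


def minimal(sets):
    """Keep only cut sets that have no proper subset among the cut sets."""
    return {s for s in sets if not any(other < s for other in sets)}


# Hypothetical tree: loss of cooling.
tree = ("or", [
    ("and", [("basic", "pump A fails"), ("basic", "pump B fails")]),
    ("basic", "power lost"),
    ("and", [("basic", "power lost"), ("basic", "pump A fails")]),
])

for cs in sorted(minimal(cut_sets(tree)), key=len):
    print(sorted(cs))
```

A minimal cut set containing a single event, such as "power lost" here, identifies a single point of failure.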

Some industries use both Fault Trees and Event Trees (see Probabilistic Risk Assessment). An Event Tree starts from an undesired initiator (loss of critical supply, component failure, etc.) and follows possible further system events through to a series of final consequences. As each new event is considered, a new node on the tree is added with a split of probabilities of taking either branch. The probabilities of a range of 'top events' arising from the initial event can then be seen.
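The branching described above can be sketched as follows. The example assumes that each later event is a safeguard that either works or fails, independently of the others; the safeguard names and probabilities are hypothetical.

```python
def end_states(safeguards, path=(), prob=1.0):
    """Enumerate the final consequences of an event tree.

    safeguards: list of (name, probability that the safeguard works)
    Returns a dict mapping each path of branch labels to its probability,
    conditional on the initiating event having occurred.
    """
    if not safeguards:
        return {path: prob}
    (name, p_works), rest = safeguards[0], safeguards[1:]
    states = {}
    # Each new event adds a node that splits the probability between two branches.
    states.update(end_states(rest, path + (f"{name}: works",), prob * p_works))
    states.update(end_states(rest, path + (f"{name}: fails",), prob * (1.0 - p_works)))
    return states


# Hypothetical safeguards following a loss-of-supply initiator.
safeguards = [("backup pump", 0.99), ("isolation valve", 0.9)]
for path, prob in end_states(safeguards).items():
    print(" / ".join(path), f"{prob:.4f}")
```

Multiplying each path probability by the frequency of the initiating event would give the frequency of each final consequence.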

Classic programs include the Electric Power Research Institute's (EPRI) CAFTA software, which is used by almost all the US nuclear power plants and by a majority of US and international aerospace manufacturers, and the Idaho National Laboratory's SAPHIRE, which is used by the U.S. Government to evaluate the safety and reliability of nuclear reactors, the Space Shuttle, and the International Space Station.

Safety certification

Usually a failure in safety-certified systems is acceptable if, on average, less than one life per 10^9 hours of continuous operation is lost to failure. Most Western nuclear reactors, medical equipment, and commercial aircraft are certified to this level. The cost versus loss of lives has been considered appropriate at this level (by the FAA for aircraft under the Federal Aviation Regulations).
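To give a feel for what one life per billion operating hours means at fleet scale, the following sketch multiplies the rate out. The fleet size and annual utilisation are hypothetical numbers chosen only for the arithmetic.

```python
TARGET_RATE = 1e-9  # lives lost per hour of operation (the certification level above)


def expected_losses(units, hours_per_unit_per_year, years, rate=TARGET_RATE):
    """Expected lives lost across a fleet, assuming a constant average rate."""
    return units * hours_per_unit_per_year * years * rate


# Hypothetical fleet: 1,000 units, each operated 3,000 hours per year.
per_year = expected_losses(units=1000, hours_per_unit_per_year=3000, years=1)
print(f"expected lives lost per year: {per_year:.4f}")
print(f"roughly one life every {1 / per_year:.0f} years")
```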

Preventing failure

Probabilistic fault tolerance: adding redundancy to equipment and systems


Once a failure mode is identified, it can usually be prevented entirely by adding extra equipment to the system. For example, nuclear reactors contain dangerous radiation, and nuclear reactions can cause so much heat that no substance might contain them. Therefore reactors have emergency core cooling systems to keep the temperature down, shielding to contain the radiation, and engineered barriers (usually several, nested, surmounted by a containment building) to prevent accidental leakage.

Most biological organisms have a certain amount of redundancy: multiple organs, multiple limbs, etc.

For any given failure, a fail-over or redundancy can almost always be designed and incorporated into a system.
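The effect of redundancy on failure probability can be sketched numerically. The example assumes that the redundant units fail independently, which common-cause failures (shared power, shared software, shared environment) can violate in practice; the probabilities are hypothetical.

```python
def all_fail(p_single, n):
    """Probability that all n redundant units fail, assuming independence."""
    return p_single ** n


def units_needed(p_single, target):
    """Smallest number of independent units whose joint failure probability
    does not exceed the target."""
    if not (0.0 < p_single < 1.0 and 0.0 < target < 1.0):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    n = 1
    while all_fail(p_single, n) > target:
        n += 1
    return n


# Hypothetical unit that fails on demand 1% of the time.
print(all_fail(0.01, 2))
print(units_needed(0.01, 1e-5))
```

Each added unit multiplies the joint failure probability by the single-unit probability, which is why a few inexpensive redundant units can outperform one very reliable unit.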

When does safety stop, where does reliability begin?

Assume there is a new design for a submarine. In the first case, as the prototype of the submarine is being moved to the testing tank, the main hatch falls off. This would be easily defined as an unreliable hatch. Now the submarine is submerged to 10,000 feet, whereupon the hatch falls off again, and all on board are killed. The failure is the same in both cases, but in the second case it becomes a safety issue. Most people tend to judge risk on the basis of the likelihood of occurrence. Other people judge risk on the basis of their magnitude of regret, and are likely unwilling to accept risk no matter how unlikely the event. The former make good reliability engineers, the latter make good safety engineers.

Now let us say there is a need to design a Humvee with a rocket launcher attached. The reliability engineer could make a good case for installing launch switches all over the vehicle, making it very likely someone can reach one and launch the rocket. The safety engineer could make an equally compelling case for putting only two switches at opposite ends of the vehicle which must both be thrown to launch the rocket, thus ensuring the likelihood of an inadvertent launch is small. An additional irony is that it is unlikely that the two engineers can reconcile their differences, in which case a manager who doesn't understand the technology could choose one design over the other based on other criteria, like cost of manufacturing.
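The tension between the two designs can be made concrete with a toy model. It assumes that each switch independently fails to operate on demand with probability `q` and independently trips spuriously with probability `s`; both numbers, and the switch counts, are hypothetical.

```python
def any_of(n, q, s):
    """n switches, any one of which launches (the reliability engineer's design)."""
    return {
        "fail_to_launch": q ** n,                  # every switch must fail
        "inadvertent_launch": 1 - (1 - s) ** n,    # one spurious trip is enough
    }


def all_of(n, q, s):
    """n switches that must all be thrown (the safety engineer's design)."""
    return {
        "fail_to_launch": 1 - (1 - q) ** n,        # one failed switch blocks launch
        "inadvertent_launch": s ** n,              # every switch must trip spuriously
    }


q, s = 0.05, 0.001  # hypothetical per-switch probabilities
print("six switches, any one launches:", any_of(6, q, s))
print("two switches, both required:   ", all_of(2, q, s))
```

Under these assumptions the first design is far less likely to fail when a launch is wanted, and the second is far less likely to launch when it is not, which is exactly the disagreement described above.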

Inherent fail-safe design

When adding equipment is impractical (usually because of expense), then the least expensive form of design is often "inherently fail-safe". The typical approach is to arrange the system so that ordinary single failures cause the mechanism to shut down in a safe way. (For nuclear power plants, this is termed a passively safe design, although more than ordinary failures are covered.)

One of the most common fail-safe systems is the overflow tube in baths and kitchen sinks. If the valve sticks open, rather than causing an overflow and damage, the tank spills into an overflow.

Another common example is that in an elevator the cable supporting the car keeps spring-loaded brakes open. If the cable breaks, the brakes grab rails, and the elevator cabin does not fall.

Inherent fail-safes are common in medical equipment, traffic and railway signals, communications equipment, and safety equipment.

Containing Failure

It is also common practice to plan for the failure of safety systems through containment and isolation methods. The use of isolating valves, also known as the block and bleed manifold, is very common in isolating pumps, tanks, and control valves that may fail or need routine maintenance. In addition, nearly all tanks containing oil or other hazardous chemicals are required to have containment barriers set up around them to contain 100% of the volume of the tank in the event of a catastrophic tank failure. Similarly, long pipelines have remote-closing valves periodically installed in the line so that in the event of failure, the entire pipeline is not lost. The goal of all such containment systems is to provide means of limiting the damage done by a failure to a small localized area.

See also

  • Earthquake engineering

Related concepts

  • Public safety
  • Safety engineer
  • System safety
  • Nuclear safety
  • Life-critical (also safety-critical)
  • Reliability engineering
  • Reliability theory
  • Reliability theory of aging and longevity
  • Human reliability
  • Risk assessment
  • Risk management
  • Air brake (rail)
  • Biomedical engineering
  • SAPHIRE
  • Some of the techniques of safety engineering have been applied to the field of security engineering.
  • Redundancy (engineering)
  • Double switching
  • Workplace safety
  • DO-178B
  • DO-254
  • ARP4761
  • Hazard analysis
  • Hazop
  • Process Safety Management
