Cascading failure

 







 




 
 











A cascading failure is a failure in a system of interconnected parts in which the failure of a part can trigger the failure of successive parts.

Cascading failure in power transmission


Cascading failure is common in power grids when one of the elements fails (completely or partially) and shifts its load to nearby elements in the system. Those nearby elements are then pushed beyond their capacity, so they become overloaded and shift their load onto other elements. Cascading failure is a common effect seen in high voltage systems, where a single point of failure (SPF) on a fully loaded or slightly overloaded system results in a sudden spike across all nodes of the system. This surge current can induce the already overloaded nodes into failure, setting off more overloads and thereby taking down the entire system in a very short time.

This failure process cascades through the elements of the system like a ripple on a pond and continues until substantially all of the elements in the system are compromised and/or the system becomes functionally disconnected from the source of its load. For example, under certain conditions a large power grid can collapse after the failure of a single transformer.
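The load-shifting process described above can be sketched as a small simulation. This is only an illustrative toy model, not a power-flow calculation: it assumes that a failed element splits its load equally among its surviving neighbours, and the names simulate_cascade and LINE, as well as all loads and capacities, are hypothetical.

```python
def simulate_cascade(loads, capacities, neighbors, initial_failure):
    """Return the elements that fail, in order, after `initial_failure` trips.

    Toy assumption: a failed element hands its whole load, in equal shares,
    to the neighbours that are still working.
    """
    loads = list(loads)            # work on a copy
    failed = []
    pending = [initial_failure]
    while pending:
        node = pending.pop(0)
        if node in failed:
            continue
        failed.append(node)
        alive = [n for n in neighbors[node] if n not in failed]
        if alive:
            share = loads[node] / len(alive)
            for n in alive:
                loads[n] += share
        # With no surviving neighbour the load is simply shed (disconnected).
        loads[node] = 0.0
        for n in alive:
            if loads[n] > capacities[n] and n not in pending:
                pending.append(n)
    return failed


# Four elements in a row; each one is connected to its direct neighbours.
LINE = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

heavy = simulate_cascade([80] * 4, [100] * 4, LINE, initial_failure=0)
light = simulate_cascade([20] * 4, [100] * 4, LINE, initial_failure=0)
print(heavy)   # every element fails when the system runs close to capacity
print(light)   # only the initial element fails when there is enough headroom
```

In the heavily loaded case each failure pushes the next element past its capacity, so the failure travels down the whole row; in the lightly loaded case the neighbour absorbs the extra load and the cascade stops at once.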

Monitoring the operation of a system in real time, and judicious disconnection of parts, can help stop a cascade. Another common technique is to calculate a safety margin for the system by computer simulation of possible failures, to establish safe operating levels below which none of the calculated scenarios is predicted to cause cascading failure, and to identify the parts of the network which are most likely to cause cascading failures.
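The simulation approach can be illustrated with a minimal sketch. It assumes a hypothetical ring of five elements, an equal-share load redistribution rule, and made-up loads and capacities; the names BASE_LOADS, CAPACITIES, RING, cascade_size, scenario_sizes, highest_safe_level and critical_elements are all invented for this example. Real studies use far more detailed network models.

```python
BASE_LOADS = [90, 40, 40, 40, 40]   # hypothetical loads at 100 % demand
CAPACITIES = [100] * 5              # hypothetical capacity of each element
RING = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}


def cascade_size(loads, first):
    """Number of elements that end up failed if element `first` trips."""
    loads = list(loads)
    failed = set()
    queue = [first]
    while queue:
        node = queue.pop()
        if node in failed:
            continue
        failed.add(node)
        alive = [n for n in RING[node] if n not in failed]
        for n in alive:
            loads[n] += loads[node] / len(alive)   # equal-share assumption
        loads[node] = 0
        queue.extend(n for n in alive if loads[n] > CAPACITIES[n])
    return len(failed)


def scenario_sizes(level_pct):
    """Simulate every single-element failure at the given demand level."""
    loads = [b * level_pct / 100 for b in BASE_LOADS]
    return {i: cascade_size(loads, i) for i in RING}


def highest_safe_level(levels_pct):
    """Highest tested level at which no single failure spreads further."""
    safe = [p for p in levels_pct if max(scenario_sizes(p).values()) == 1]
    return max(safe) if safe else None


def critical_elements(level_pct):
    """Elements whose failure takes at least one other element down."""
    return sorted(i for i, n in scenario_sizes(level_pct).items() if n > 1)


print(highest_safe_level([80, 90, 100]))
print(critical_elements(100))
```

In this toy network the critical parts at full demand are the two neighbours of the heavily loaded element: when either of them fails, its load pushes the heavily loaded element over its limit. The sketch only tests the levels it is given and does not prove that every lower level is also safe.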

Examples

  • Blackout in northeast America in 1965
  • Blackout in northeast America in 2003
  • Blackout in Italy in 2003
  • Blackout in London in 2003


Cascading failure in computer networks


Cascading failures can also occur in computer networks (such as the Internet) in which network traffic is severely impaired or halted to or between larger sections of the network, caused by failing or disconnected hardware or software. In this context, the cascading failure is known by the term cascade failure. A cascade failure can affect large groups of people and systems.

The cause of a cascade failure is usually the overloading of a single, crucial router or node, which causes the node to go down, even briefly. It can also be caused by taking a node down for maintenance or upgrades. In either case, traffic is routed to or through another (alternative) path. This alternative path, as a result, becomes overloaded, causing it to go down, and so on. It will also affect systems which depend on the node for regular operation.
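The rerouting mechanism can be sketched in a few lines. The example assumes a hypothetical five-node topology, fewest-hops routing, and a node that goes down as soon as its background load plus the rerouted demand exceeds its capacity; all names and numbers are invented for illustration and do not describe any real routing protocol.

```python
from collections import deque


def shortest_path(links, down, src, dst):
    """Breadth-first search for a fewest-hops path that avoids `down` nodes."""
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in links[node]:
            if nxt not in previous and nxt not in down:
                previous[nxt] = node
                queue.append(nxt)
    return None


def route_until_stable(links, capacity, background, src, dst, demand, down=()):
    """Reroute `demand` until no router on the path is overloaded.

    Returns the list of paths tried (None when no path is left) and the
    set of nodes that are down at the end.
    """
    down = set(down)
    history = []
    while True:
        path = shortest_path(links, down, src, dst)
        history.append(path)
        if path is None:
            return history, down
        overloaded = [n for n in path[1:-1]
                      if background[n] + demand > capacity[n]]
        if not overloaded:
            return history, down
        down.update(overloaded)


# Hypothetical topology: A is the primary router, B and C form a longer detour.
LINKS = {"isp": ["A", "B"], "A": ["isp", "web"], "B": ["isp", "C"],
         "C": ["B", "web"], "web": ["A", "C"]}
CAPACITY = {"A": 100, "B": 60, "C": 60}
BACKGROUND = {"A": 10, "B": 30, "C": 30}

normal = route_until_stable(LINKS, CAPACITY, BACKGROUND, "isp", "web", 50)
maintenance = route_until_stable(LINKS, CAPACITY, BACKGROUND, "isp", "web",
                                 50, down={"A"})
print(normal)        # traffic stays on the primary router
print(maintenance)   # the detour overloads and the destination is cut off
```

With router A in service the demand fits on the primary path. With A taken down for maintenance, the same demand lands on B and C, which were sized only for their own background traffic, so both fail and no path remains.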

Symptoms


The symptoms of a cascade failure are easy to see: packet loss and high network latency, not just to single systems, but to whole sections of a network or the Internet. The high latency and packet loss are caused by nodes that fail to operate due to congestion collapse: they are still present in the network, but little or no useful communication goes through them. As a result, routes can still be considered valid without actually providing communication.
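Because a route can look valid while carrying almost nothing, a monitoring check has to measure delivery rather than mere presence. The following sketch classifies a route from a series of probe round-trip times; the function name classify_route and the thresholds (20 % loss, 500 ms) are assumptions chosen for illustration, not values from any standard.

```python
from statistics import median


def classify_route(probe_rtts_ms, max_loss=0.2, max_rtt_ms=500.0):
    """Classify a route from probe results.

    `probe_rtts_ms` holds one entry per probe: the round-trip time in
    milliseconds, or None if the probe was lost.
    """
    if not probe_rtts_ms:
        raise ValueError("need at least one probe")
    answered = [rtt for rtt in probe_rtts_ms if rtt is not None]
    if not answered:
        return "unreachable"
    loss = 1 - len(answered) / len(probe_rtts_ms)
    if loss > max_loss or median(answered) > max_rtt_ms:
        # The route still answers, but too slowly or too rarely to be useful.
        return "degraded"
    return "healthy"


print(classify_route([20, 22, 21, 19]))
print(classify_route([900, None, None, 1200]))
print(classify_route([None, None]))
```

The middle case is the one described above: the route still exists and occasionally answers, yet it should not be treated as a working path.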

If enough routes go down because of a cascade failure, a complete section of the network or internet can become unreachable. Although undesired, this can help speed up the recovery from this failure as connections will time out, and other nodes will give up trying to establish connections to the section(s) that have become cut off, decreasing load on the involved nodes.

A common thing to see during a cascade failure is a walking failure, where sections go down, causing the next section to fail, after which the first section comes back up. This ripple can make several passes through the same sections or connecting nodes before stability is restored.
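A walking failure can be reproduced with a very small model. The sketch assumes that the surviving sections share the total demand equally, that an overloaded section is down for exactly one time step, and that demand shrinks by a fixed factor per step as connections time out; the function walking_failure and all numbers are hypothetical.

```python
def walking_failure(total_demand, capacity, sections, decay, max_ticks=20):
    """Return, per time step, the sorted list of sections that are up."""
    up = set(sections[1:])           # the first section has just failed
    timeline = []
    for _ in range(max_ticks):
        timeline.append(sorted(up))
        if len(up) == len(sections):
            break                    # everything is back: stability restored
        per_section = total_demand / len(up) if up else 0
        overloaded = {s for s in up if per_section > capacity}
        # Overloaded sections go down; sections that were down come back up.
        up = (set(sections) - up) | (up - overloaded)
        total_demand *= decay        # clients time out and give up
    return timeline


timeline = walking_failure(total_demand=100, capacity=80,
                           sections=["A", "B"], decay=0.9)
for step, sections_up in enumerate(timeline):
    print(step, sections_up)
```

In this run the failure passes back and forth between the two sections twice before the demand has dropped far enough for one section to carry it alone, at which point both sections stay up.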

History


Cascade failures are a relatively recent development, with the massive increase in traffic and the high interconnectivity between systems and networks. The term was first applied in this context in the late 1990s by a Dutch IT professional and has slowly become a relatively common term for this kind of large-scale failure.

Example

The animation shown here illustrates an example of a connecting node between a local ISP and its Internet backbone being overloaded.

Initially, the traffic that would normally go through the node is stopped. Systems and users get errors about not being able to reach hosts. Usually, the redundant systems of an ISP respond very quickly, choosing another path through a different backbone. The routing path through this alternative route is longer, with more hops, and therefore goes through more systems that normally do not process the amount of traffic suddenly offered.

This can cause one or more systems along the alternative route to go down, creating similar problems of their own.

Related systems are also affected in this case. As an example, DNS resolution might fail, and the loss of what would normally allow systems to be interconnected might break connections that are not even directly involved in the actual systems that went down. This, in turn, may cause seemingly unrelated nodes to develop problems, which can cause another cascade failure all on its own.
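How a failure reaches seemingly unrelated systems can be shown by walking a dependency map. The map below is entirely hypothetical (the service names and their dependencies are invented), and the sketch treats any dependency as a hard one, which real systems often soften with caches or fallbacks.

```python
def affected_by(failed, depends_on):
    """Return every system that directly or indirectly depends on `failed`."""
    affected = set()
    frontier = [failed]
    while frontier:
        current = frontier.pop()
        for system, dependencies in depends_on.items():
            if current in dependencies and system not in affected:
                affected.add(system)
                frontier.append(system)
    return affected


# Hypothetical dependency map: each system lists what it needs to operate.
DEPENDS_ON = {
    "dns": {"backbone-link"},
    "mail": {"dns"},
    "web-shop": {"dns", "database"},
    "database": set(),
    "payment": {"web-shop"},
}

print(sorted(affected_by("backbone-link", DEPENDS_ON)))
```

Here the payment system never touches the failed link directly, yet it is affected because it depends on the web shop, which depends on DNS.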

Other examples of cascading failure


Analogues exist in biology, where cascade-like effects mean that a small reaction can have system-wide implications. One example is the release of toxins caused by a small ischaemic attack, which kill off far more cells than the initial damage, resulting in more toxins being released. Current research aims to find a way to block this cascade in stroke patients to minimize the damage.

Another example is the Cockcroft-Walton generator, which can also experience cascade failures wherein one failed diode can result in all the diodes failing in a fraction of a second.

Yet another example of this effect in a scientific experiment was the implosion in 2001 of several thousand fragile glass photomultiplier tubes used in the Super-Kamiokande experiment, where the shock wave caused by the failure of a single detector appears to have triggered the implosion of the other detectors in a chain reaction.

See also

  • Butterfly effect
  • Byzantine failure
  • Chain reaction
  • Congestion collapse
  • Cascading rollback

