Incident management
Incident management refers to the activities of an organization to identify, analyze and correct hazards. For instance, a fire in a factory is a risk that has been realized, that is, an incident that has happened. An Incident Response Team (IRT) or an Incident Management Team (IMT), designated for the task either beforehand or on the spot, then manages the organization through the incident.

Usually, as part of the wider management process in private organizations, incident management is followed by a post-incident analysis, which determines why the incident happened despite precautions and controls. This information is then used as feedback to further develop the security policy and/or its practical implementation.
In the USA, the National Incident Management System (NIMS), developed by the Department of Homeland Security, integrates effective practices in emergency management into a comprehensive national framework.

Computer security incident management

A specific case of incident management is computer security incident management, which is most often handled by a Computer Security Incident Response Team (CSIRT). For example, if an organization discovers that an intruder has gained unauthorized access to a computer system, the CSIRT would analyze the situation, determine the breadth of the compromise, and take corrective action. Computer forensics is one task included in this process.

Incident Management Process, as defined by ITIL

ITIL V3 defines an incident as an unplanned interruption to an IT service or a reduction in the quality of an IT service. Failure of a configuration item that has not yet impacted service is also an incident, for example the failure of one disk from a mirror set.
ITIL V2 defines an incident as an event which is not part of the standard operation of a service and which causes, or may cause, disruption to or a reduction in the quality of services and customer productivity.
The objective of incident management is to restore normal operations as quickly as possible with the least possible impact on either the business or the user, at a cost-effective price.

The Incident Manager is a functional role and not a position. Incident management provides the external customer with a focal point for leadership and drive during an event by ensuring that commitments are followed up and that information flows adequately. This means presenting to the customer an entity that accepts ownership of their problem.

The objective of Incident Management during an incident is to restore service as quickly as possible. The objective is not to make a system perfect. If service can be restored by a temporary workaround more quickly than by correcting the underlying root cause of the issue, then that is acceptable. After service restoration, correction of the underlying root causes is handled by the Problem Management team through a process called Root Cause Analysis (RCA). An example of service restoration by temporary workaround is the one carried out on Apollo 13.

The primary focus of Incident Management is to ensure a prompt recovery of the system by supervising and directing internal or external resources. Prompt system recovery and minimization of any impact on the customer have priority over unreasonably long and intensive data collection for the root cause investigation of the event.

Incidents can be classified into three primary categories: software (applications), hardware, and service requests. (Note that service requests are not always regarded as incidents, but rather as requests for change. However, the handling of failures and the handling of service requests are similar, and both are therefore included in the definition and scope of the incident management process.)

ITIL separates incident management into six basic components:
  • Incident detection and recording
  • Classification and initial support
  • Investigation and diagnosis
  • Resolution and recovery
  • Incident closure
  • Ownership, monitoring, tracking, and communication (monitoring the progress of the resolution of the incident and keeping those who are affected by the incident up to date with the status)
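
The components above can be read as a simple lifecycle: five sequential stages, with ownership and tracking running alongside all of them. The following is a minimal sketch of that reading in Python, not an ITIL-prescribed data model. The class names, stage names and the example incident are all illustrative assumptions.

```python
from enum import Enum


class Stage(Enum):
    """The five sequential components from the list (names are illustrative)."""
    DETECTED = "incident detection and recording"
    CLASSIFIED = "classification and initial support"
    INVESTIGATING = "investigation and diagnosis"
    RESOLVED = "resolution and recovery"
    CLOSED = "incident closure"


# Enum members iterate in definition order, which is the lifecycle order.
ORDER = list(Stage)


class Incident:
    """Hypothetical incident record; the owner and log model the sixth component."""

    def __init__(self, summary: str, owner: str):
        self.summary = summary
        self.owner = owner                          # ownership is held throughout
        self.stage = Stage.DETECTED
        self.log = [(Stage.DETECTED, "recorded")]   # tracking and communication trail

    def advance(self, note: str) -> Stage:
        """Move to the next stage and record a status note for those affected."""
        if self.stage is Stage.CLOSED:
            raise ValueError("incident is already closed")
        self.stage = ORDER[ORDER.index(self.stage) + 1]
        self.log.append((self.stage, note))
        return self.stage


incident = Incident("mail server unreachable", owner="service desk")
incident.advance("classified as hardware")
incident.advance("failed disk identified")
incident.advance("service restored from the mirror")
incident.advance("user confirmed, incident closed")
print(incident.stage.name)  # CLOSED
```

Keeping the log on the incident itself means every stage change leaves a status note, which is one way to model the monitoring and communication component.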


From the ITIL point of view, the activities of Incident Management are:
  • Take ownership of an incident and act as the primary level of escalation
  • Provide a prompt recovery of the business within the specified service level agreement (SLA)
  • Ensure that the focus on incident resolution is not diverted by other activities
  • Escalate incidents: functional escalation (the support of higher technical skills is needed to solve the problem) and hierarchical escalation (a manager with more authority is consulted to take decisions that are beyond the competencies assigned to this level)
  • Send incident notifications to the customer (documents that contain detailed information)
  • Set up and lead conference calls or bridge communication between all involved parties
  • Keep track of, and record, the timelines
  • Act as an interface towards other technicians, customer technical staff and other groups within the organization
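
The distinction between functional and hierarchical escalation can be illustrated with a short Python sketch. The Ticket fields and the escalation function are hypothetical. The rule that a breached SLA triggers hierarchical escalation is an assumption of this sketch, not something stated by ITIL or by the list above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Ticket:
    """Hypothetical ticket record used only for this illustration."""
    opened_at: datetime
    sla: timedelta            # agreed time to restore service
    needs_specialist: bool    # the current level lacks the technical skills
    needs_authority: bool     # the decision exceeds this level's competencies


def escalation(ticket: Ticket, now: datetime) -> Optional[str]:
    """Return the kind of escalation to raise, or None if none is needed."""
    # Assumption: running past the SLA is treated as a management matter.
    sla_breached = now - ticket.opened_at > ticket.sla
    if ticket.needs_authority or sla_breached:
        return "hierarchical"   # consult a manager with more authority
    if ticket.needs_specialist:
        return "functional"     # bring in higher technical skills
    return None


opened = datetime(2024, 1, 1, 9, 0)
ticket = Ticket(opened, timedelta(hours=4), needs_specialist=True, needs_authority=False)
print(escalation(ticket, opened + timedelta(hours=1)))  # functional
print(escalation(ticket, opened + timedelta(hours=5)))  # hierarchical
```

The same ticket changes from functional to hierarchical escalation once the four-hour SLA in the example has passed.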


An Incident Manager should be able to:
  • understand any incident or fault at least at a basic level, in order to engage the appropriate competences (resources)
  • drive the restoration team to gather sufficient information to start an analysis
  • maintain a general overview of the incident (keeping the focus on restoration via a workaround)
  • understand the functionality of multiple areas (RAN, Core Network, VAS, BSS/OSS)

Incident management software systems

Incident management software systems are designed for collecting consistent, documented incident report data. Many of these products include features to automate the approval process of an incident report or case investigation. Additionally, incident report systems automatically send notifications and assign tasks and escalations to the appropriate individuals depending on the incident type, priority, time, status and custom criteria. Modern products allow administrators to configure the incident report forms as needed, create analysis reports and set access controls on the data.
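
Rule-based notification of this kind can be sketched in a few lines of Python. This is an illustration of the general idea only and does not describe any particular product. The Report fields, the priority scale, the rules and the recipient names are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Report:
    """Hypothetical incident report with two routing criteria."""
    kind: str        # e.g. "hardware", "software", "service request"
    priority: int    # 1 = highest (illustrative scale)


@dataclass
class Rule:
    matches: Callable[[Report], bool]   # criterion configured by an administrator
    notify: str                         # hypothetical recipient or group name


RULES: List[Rule] = [
    Rule(lambda r: r.priority == 1, "incident-manager"),
    Rule(lambda r: r.kind == "hardware", "hardware-team"),
    Rule(lambda r: r.kind == "software", "application-team"),
]


def recipients(report: Report) -> List[str]:
    """Collect every recipient whose rule matches; fall back to the service desk."""
    matched = [rule.notify for rule in RULES if rule.matches(report)]
    return matched or ["service-desk"]


print(recipients(Report("hardware", 1)))         # ['incident-manager', 'hardware-team']
print(recipients(Report("service request", 3)))  # ['service-desk']
```

Holding the rules in a plain list keeps the criteria configurable without changing the routing function itself.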

See also

  • National Incident Management System in the USA
  • Coordinated Regional Incident Management in the Netherlands
  • O’Callaghan, Katherine Mary, "Incident Management: Human Factors and Minimising Mean Time to Restore", Ph.D. Thesis, Australian Catholic University, 2010.

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 