Professional Documents
Culture Documents
Session Number: 3
Making the Most of Alarms as a Layer of Protection
Todd Stauffer
Director Alarm Management Services, exida LLC
Abstract
Alarms and operator response are one of the first layers of protection in
preventing a plant upset from escalating into a hazardous event. This paper
discusses practices and procedures for maximizing the risk reduction of this
layer when it is considered in a layer of protection analysis. It reviews how the
performance of the alarm system can impact the probability of failure on
demand (PFD) when the alarm is in service. Key recommendations will be
drawn from the new ISA-18.2 standard on alarm management.
Introduction
Alarm management and functional safety go hand-in-hand. Alarms play a
significant role in maintaining plant safety. They are a means of risk reduction
(layer of protection) to prevent the occurrence of a process hazard, as shown in
Figure 1. The performance of the alarm system can impact the design of the
safety instrumented system (SIS) by limiting the level of risk reduction that can
be credited to an alarm. This effects the integrity requirement of any Safety
Instrumented Function (SIF) that is used in conjunction with an alarm.
Figure 2 Risk Reduction through the use of multiple protection layers [3]
Thus poor alarm management could reduce the protective capability of this
layer or eliminate it altogether, which could mean that that the actual risk
reduction no longer meets or exceeds the company-defined tolerable risk level.
This could have a ripple effect on the Safety Integrity Level (SIL) requirements
for numerous SIFs throughout the plant. The higher the SIL level, the more
complicated and expensive is the Safety Instrumented System (SIS). A higher
SIL will also require more frequent proof testing, which adds cost and can be
burdensome in many plants.
Safety Control Systems Conference IDC Technologies (May 2010)
Alarm Management
In June of 2009 the International Society of Automation (ISA) released the
standard ANSI/ISA-18.2, Management of Alarm Systems for the Process
Industries (ISA-18.2) [4]. ISA-18.2 provides a framework for the successful
design, implementation, operation and maintenance of alarm systems in a
process plant. It contains guidance on how to address the most common alarm
management problems and on how to sustain the performance of the alarm
system over time. The standard is expected to be recognized and generally
accepted good engineering practice (RAGAGEP) by both insurance
companies and regulatory agencies.
ISA-18.2 prescribes following a lifecycle approach, similar to the functional
safety standard IEC 61511/ ISA-84 [5,6]. Following the alarm management
lifecycle helps achieve optimum alarm system performance and thus is
absolutely critical to making the most of alarms as a layer or protection.
J
Philosophy
Identification
Rationalization
Management
of Change
Detailed Design
Audit
Implementation
Operation
H
Monitoring &
Assessment
Maintenance
It is interesting to note that the tank high level alarm in the Buncefield depot
incident would not have qualified as an IPL since the alarm was not
independent from the initiating event (the failure of the associated tank level
measurement).
In a LOPA the frequency of a potentially dangerous event is calculated by
multiplying the probability of failure on demand (PFD) of each individual layer of
protection times the frequency of the initiating event. In the example LOPA of
Figure 4, the likelihood of a fire occurring after the release of flammable
materials is calculated assuming that the initiating event (the loss of jacket
cooling water) occurs once every two years. In this example the operator
response to alarm layer was assigned a PFD of 0.2.
Initiating Event
Loss of Cooling
Water
Outcome
Fire
0.3
0.07
2.10E-05
Fire
0.2
0.01
0.5 / yr
No Event
Determining PFD
The PFD for the operators response to an alarm can be determined by adding
two separate contributions:
1) the probability that the alarm fails to annunciate, and
2) the probability that the operator fails to successfully detect, diagnose, and
respond to the alarm correctly and within the allowable time.
To analyze the PFD of the operator response to alarm layer we must first look
at the sequence of events which would make for a successful operator
response. The first step after the initiating event is the triggering / annunciation
of the alarm. If a failure were to occur in the hardware or software associated
with the alarm (the sensor, the control logic for triggering the alarm, or the
operator interface) then the alarm would never be annunciated. This represents
the probability of failure on demand that the alarm is annunciated.
Once the alarm is annunciated, a series of steps must be performed by the
operator to bring the process back to the normal operating range (reference) as
shown in Figure 5.
b)
c)
Category
Description
Probability
that Operator
responds
successfully
PFD
SIL
90%
0.1
99%
0.01
0%
1.0
Protection Layer
PFD
Control Loop
1.0 x 10-1
0.5 to 1.0
1.0 x 10-1
which is the time between the initiating event and occurrence of the hazardous
event, then the alarm plus operator action has failed as a layer protection.
Therefore a key requirement for a Safety IPL alarm to be valid is
t Detect, Diagnose, & Respond + t Deadtime < t Process Safety Time.
It is also important to define a minimum operator response time, which is the
minimum amount of time that must be provided to the operator for them to
respond to any alarm in the system. This parameter can vary from 10 to 30
minutes depending upon company operation and the time required to carry out
the alarm response actions. If the operator must leave the control room to go
out into the field or work with an outside operator, then the minimum response
time will need to reflect this. In some cases an operator activity analysis may
be required to set the minimum operator response time.
During a LOPA, it is important to factor the operator response time into the
analysis. Comparing the process safety time for the hazardous event to the
minimum operator response time is a good first step to determine whether an
alarm can be used as a layer of protection. If the process safety time is not
greater than the minimum operator response time, preferably by 5 minutes or
more, then the alarm will not likely make it as an IPL.
t Process Safety Time > t Minimum Operator Response Time
In a high pressure event, the process safety time may be short (on the order of
30 seconds), which would preclude the use of operator intervention and dictate
the need for an automated protection response. One company has set a
minimum operator response time of 10 minutes and prescribed that any alarm
which has a process safety time of less than 10 minutes cannot be claimed as
a layer of protection (PFD = 1.0).
In some cases it is possible to adjust the alarm limit (setpoint) to increase the
time available for the operator so that it is greater than the minimum operator
response time.
t Detect, Diagnose, & Respond > t Minimum Operator Response Time
Adjusting the alarm limit can have significant tradeoffs. Setting the alarm limit
closer to the normal operating conditions will provide the operator with greater
time to respond, but may result in nuisance alarms under some operating
conditions. The occurrence of nuisance alarms can reduce the operators
confidence in the alarm and affect the probability that they would initiate the
required actions in the event of a genuine alarm. Setting the alarm limit closer
to the consequence threshold maintains the operators confidence in the alarm,
but affects the probability that the operator would complete the required action
in time.
10
11
12
13
Target Value
~6 (average)
~12 (average)
~1 (average)
~2 (average)
Metric
Target Value
~<1%
~<1%
10
~<1%
Stale Alarms
14
Figure 9. Measuring Alarm Load on the Operator (Avg # of Alarms /10 mins)
A related metric, shown in Table 3, is the percentage of ten minute intervals in
which the operator received more than ten alarms (which indicates the
presence of an alarm flood). If operators are overloaded with alarms during an
upset, then it decreases the likelihood that they will respond correctly in the
event of a safety IPL alarm. Overall performance has a direct impact on the
operators ability to successfully respond to individual alarms. A poorly
performing alarm system correlates to the Response Unlikely / Unreliable
level shown in Table 1.
In addition to measuring overall alarm system performance, it is also
recommended to analyze the performance of safety IPL alarms as a group.
Information such as the number of times a safety IPL alarm has been triggered
can be used to evaluate and validate the assumptions of initiating event
frequency and whether the alarm is really a valid layer of protection. Looking at
how often safety IPL alarms are suppressed (via shelving or by design) is also
important for verifying the integrity of the alarm.
Conclusion
Alarm system design and performance has a significant impact on the ability of
the operator to maintain plant safety. In particular it affects the probability of
failure on demand of alarms and operator intervention when used as a layer of
protection. The presence of nuisance alarms, alarm floods and poorly designed
HMI screens (from a human factors point of view) will have a direct effect on
whether the operator can detect, diagnose and respond to an alarm in time.
Some practitioners might go as far as downgrading the risk reduction provided
by an alarm protection layer if the alarm system does not meet certain
performance criteria!
Following the alarm management lifecycle and recommendations of ISA-18.2
will help get the most of out of alarms as a layer of protection. Because of the
interaction between functional safety design and alarm management,
practioners are urged to take a holistic approach leading to increased plant
safety, reduced risk, and better operational performance.
15
References
1. The Buncefield Investigation www.buncefieldinvestigation.gov.uk/reports/index.htm
2. Stauffer, T., Sands, N., and Dunn, D., Get a Life(cycle)! Connecting
Alarm Management and Safety Instrumented Systems ISA Safety &
Security Symposium (2010).
3. ANSI/ISA-84.00.01-2004 Part 3 (IEC 61511-3 Mod) Functional Safety:
Safety Instrumented Systems for the Process Industry Sector- Part 3
4. ANSI/ISA-18.2-2009 Management of Alarm Systems for the Process
Industries.
5. ANSI/ISA-84.00.01-2004 Part 1 (IEC 61511-1 Mod) Functional Safety:
Safety Instrumented Systems for the Process Industry Sector.
6. Stauffer, T., Sands, N., and Dunn, D., Alarm Management and ISA-18
A Journey, Not a Destination Texas A&M Instrumentation Symposium
(2010).
7. Timms, C.R., How to Achieve 90% of the Gain Without Too Much Pain,
Measurement + Control, May 2004.
8. Marszal, E. and Scharpf, E. Safety Integrity Level Selection. ISA (2002)
9. Kletz, Trevor A., What Went Wrong: Case Histories of Process Plant
Disasters Fourth Edition, Gulf Publishing Co., 1999.
10. Nimmo, I., The Operator as IPL, Hydrocarbon Engineering, September
2005.
11. EEMUA (2007), Alarm Systems: A Guide to Design, Management and
Procurement Edition 2. The Engineering Equipment and Materials
Users Association.
12. Swain, A.D & Gutterman, H.E., Handbook of Human Reliability Analysis
with Emphasis on Nuclear Power Plant Applications, NUREG/CR-1278,
Albaquerque, NM, Sandia National Laboratories, 1983.
13. Williams, J.C., A data-based method for addressing and reducing
human error performance, IEEE Conference on Human Factors in
Nuclear Power, Monterey, CA, 436-450, June 1988.
14. Zapata, R. and P. Andow, Reducing the Severity of Alarm Floods,
Proceedings, Honeywell Users Group Americas Symposium 2008,
Honeywell, Phoenix, Ariz (2008)
15. Brown, N., Alarm Management / The EEMUA Guidelines in Practice,
Measurement + Control, May 2003.
16. Stauffer, T, and Hatch, D., Saved by the Bell: Using Alarm Management
to make Your Plant Safer, exida, 2009
16