
Evaluating Relief Valve Reliability When Extending the Test and Maintenance Interval
Stanley A. Urbanik
Process Safety & Fire Protection Engineering, DuPont, Wilmington, DE 19898
Published online 5 November 2004 in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/prs.10030
A growing number of programs provide processes for evaluating and extending the
time between major process shutdowns and maintenance. RBI (risk-based inspections) and Six Sigma processes are two examples of methods currently in use to
help drive shutdown extension programs. The relief
valves on these processes often go along with the shutdown extension programs without a thorough understanding of the impact on relief valve reliability. Using
an analysis for a stand-by system, the change in PFD
(probability of failure on demand) of the relief system
as a function of change in test interval is shown.
Human error is another factor that compromises relief
valve reliability. Some arguments are offered that suggest reduced inspections increase reliability because
there are fewer opportunities for human error. This
study addresses the human error question and shows
that the human error contribution is constant while the
relief valve is in service. Finally, the increase in PFD
when increasing the test interval is discussed in terms
of the potential increase in risk. The increased risk as a
percentage change should be communicated and understood by those deciding to extend shutdown intervals. © 2004 American Institute of Chemical Engineers
Process Saf Prog 23: 191–196, 2004
INTRODUCTION

The purpose of this paper is to propose the basic questions and process safety issues associated with the testing and maintenance of relief valve systems. Ultimately, chemical manufacturing businesses should be aware of the temporary increase in risk generated by the extension of test and maintenance intervals for relief valve systems.
The key features of process safety risk are described in Guidelines for Chemical Process Quantitative Risk Analysis [3]. Relief valve reliability is often a key component in the safe operation of chemical plants. Some larger sites can have literally hundreds of relief valves and their reliability plays an important role in the safe operation of these plants.

Process Safety Progress (Vol. 23, No. 3)

However, relief valves have limitations. Recent proposed changes to the ASME pressure vessel protection
code [4] provide for chemical process evaluation and
modification with safety instrumented systems in place
of relief valves. In addition, new risk evaluation methods, such as layer of protection analysis [2], require the
correct choice of relief valve PFD (probability of failure
on demand) for correct risk reduction recommendations.
This work does not propose any framework or value
system for judging risk for these activities. It attempts
only to ask a few questions about the characteristics
and key features of relief valve systems. The objective
is examined in the two Process Safety Management
Questions presented in the text. The hope is that the thought and discussion generated through these questions lead to a better understanding of relief valve system reliability.
This is a report of work in progress. There are no final conclusions or recommendations.
STANDBY SAFETY SYSTEMS

In general, safety systems are either active or standby. A standby system is one that is held in reserve [1] until needed. A standby system will be successful if it starts when needed, and runs for the prescribed length of time. Because this type of system is in
the standby mode, it can never be the initiator for a
scenario. Therefore its reliability characteristic is a
probability.
The basic relationship that defines the standby system is

$$\text{Undependability} = \text{PFOD} + \text{PFTR}.$$

The undependability of the standby system is equal to the probability of failure to start on demand plus the
probability the standby system does not run for the
expected mission time.
In many standby systems such as an auxiliary diesel
generator, the mission time is important to the analysis.
In the study of relief systems, we are mainly concerned
with the failure of the relief system to perform when the
overpressure occurs and we assume that the relief system will remain open as necessary for the duration of
the overpressure event. However, there have been instances when a relief system became plugged or partially plugged while open and relieving. The system
failed to complete the mission and a dangerous overpressure occurred. For the most part, if the relief system
opens, it functions for the duration of the overpressure.
With this simplifying assumption the basic characteristic of relief system reliability is the unavailability. The undependability then becomes the PFOD (or PFD) and is equal to the average unavailability ($\bar{a}$) of the system.
For a standby system, the average unavailability is
given by the following expression:

$$\bar{a} = \frac{\int_0^T \mathrm{ROF}(t)\,a(t)\,dt}{\int_0^T \mathrm{ROF}(t)\,dt} \tag{1}$$

where ROF(t) is the rate of failure of the initiating event challenging the standby system, a(t) is the unavailability as a function of time, and T is the test interval for the standby system. In most systems we assume that the initiating event is equally likely to occur on any given day. Therefore the rate of failure of the initiating event is constant over the interval of interest and Eq. 1 becomes

$$\bar{a} = \frac{1}{T}\int_0^T a(t)\,dt \tag{2}$$

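If the failure rate of the standby system is constant over the test interval, so that the unavailability grows approximately linearly with time since the last test, $a(t) \approx \lambda t$, Eq. 2 becomes the familiar

$$\bar{a} = \frac{\lambda T}{2}, \tag{3}$$

where $\lambda$ is the failure rate of the standby system and is equal to the inverse of the mean time to failure of the system (MTTF).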
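
As a rough numerical check of Eqs. 1 to 3, the following Python sketch integrates a(t) over one test interval with a simple trapezoidal rule. The function name `average_unavailability`, the step count, and the exponential form 1 - exp(-λt) used for comparison are illustrative assumptions, not part of the original analysis; the values λ = 0.001/yr and T = 3 years are the ones used later in the paper.

```python
import math


def average_unavailability(a, T, rof=lambda t: 1.0, steps=10_000):
    """Eq. 1: integral of ROF(t)*a(t) over [0, T] divided by the
    integral of ROF(t) over [0, T], using the trapezoidal rule.
    With the default constant ROF this reduces to Eq. 2."""
    h = T / steps
    num = 0.0
    den = 0.0
    for i in range(steps + 1):
        t = i * h
        w = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        num += w * rof(t) * a(t) * h
        den += w * rof(t) * h
    return num / den


lam = 0.001  # failure rate, per year
T = 3.0      # test interval, years

# Linear unavailability a(t) = lam * t, which leads to Eq. 3.
linear = average_unavailability(lambda t: lam * t, T)

# Assumed exponential form for a constant failure rate, for comparison.
exponential = average_unavailability(lambda t: 1.0 - math.exp(-lam * t), T)

print(linear, lam * T / 2.0, exponential)
```

For these values the linear and exponential forms agree closely, which is why Eq. 3 is usually adequate when λT is small.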
Relief Valves as Standby Systems
Relief valve systems fall into the classification of standby systems because the only times we know they are working are when the process challenges them and when they are tested and maintained. For the purposes of this discussion, the relief system consists of the inlet nozzle and piping to the relief valve, the relief valve, and the discharge piping from the valve outlet to the first major piping change (or to the atmosphere). This would also include any valve(s) in the inlet or discharge piping that could be incorrectly left in the closed position. For the relief system to be available, all of these components must work as designed.
Many standby system evaluations include the test
and maintenance time in the calculation of average
unavailability. This is not the case for relief valve systems. In the majority of cases, the relief valve systems
are tested and maintained while the process is down.
Therefore, the fact that the relief valve system is unavailable while the process is down is not an issue.
HUMAN ERROR

Equation 2 above describes the PFD of the standby system while the system is in operation. It specifically
describes the deterioration in the availability of the
system to perform when challenged. This deterioration
could be the result of corrosion, plugging, or any other
mechanism that compromises the system while it is in
the standby mode. However, Eq. 2 does not account for
the contribution to the PFD from human error. This
includes all the ways that the system could be put into
service incorrectly or reassembled incorrectly in the
shop. This kind of error does not change with time; it is
constant. Once the valve is rebuilt or installed incorrectly it remains in that state until it is removed for
testing and maintenance, or it is found in an overpressure accident. Consider the following example:
• Say we had 100 relief valves in the shop ready to be put in service on the process.
• I purposely compromise one of the valves by installing the wrong (much higher) spring force.
• I tell you that one of the 100 valves will not work and you have one chance to pick the bad valve before installing the 100 valves on the process. What are the chances that you will pick the bad valve?
The obvious answer is 0.01, or you have one chance in
100 to select the bad valve.
Now let's say that you do not know for sure that any
of the valves are bad because I do not say anything
about the bad valve. All 100 valves are put in service on
the process. At some time after the process is up and
running the process experiences an overpressure event
that could be relieved by one of the relief valves.
Assuming that this is an ideal process with no deterioration of the relief systems, what are the chances that
the overpressure scenario involves the bad valve? The
answer is still 0.01 or one in 100.
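
The thought experiment above can be illustrated with a small seeded simulation. This is only a sketch under stated assumptions: the function name `fraction_of_demands_on_bad_valve` is hypothetical, each overpressure demand is assumed equally likely to fall on any of the 100 valves, and no deterioration is modeled.

```python
import random


def fraction_of_demands_on_bad_valve(n_valves=100, n_bad=1,
                                     n_demands=200_000, seed=1):
    """Estimate the chance that a demand lands on a compromised valve.

    Assumption: every demand challenges one valve chosen uniformly at
    random, and the set of bad valves never changes while in service.
    """
    rng = random.Random(seed)
    bad = set(range(n_bad))  # arbitrary labels for the compromised valves
    hits = sum(1 for _ in range(n_demands)
               if rng.randrange(n_valves) in bad)
    return hits / n_demands


estimate = fraction_of_demands_on_bad_valve()
print(f"estimated chance of hitting the bad valve: {estimate:.4f}")
```

The estimate stays near 0.01 no matter when the demand occurs, because time never enters the calculation. That is the sense in which the human error contribution is constant.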
This component of unavailability, attributed to human error, is included in the integral expression for average unavailability as a constant. Equation 2 now becomes

$$\bar{a} = \frac{1}{T}\int_0^T \left[HE + a(t)\right]dt \tag{4}$$

where HE is the constant human error component.

EVALUATION OF THE RELIEF SYSTEM

To evaluate the PFD of the relief system, Eq. 4 allows the inclusion of human error as well as the systematic error from process-specific considerations. For this discussion, consider Figure 1. Figure 1 is a graph of PFD (average unavailability of the relief system) vs. time. Intuitively, we know that in most chemical manufacturing applications, the relief system availability will deteriorate with time. This figure graphically represents this process. The straight line labeled $\lambda_1$ is the failure rate of the relief system under study. It is straight and constant because we assume that the system operates in the constant (normal life) portion of the failure rate curve. In this example, $\lambda_1 = 0.001$/yr. This value represents an MTTF of 1,000 years. In addition, the human error probability is assumed to be 0.002 or, for every 1,000 relief valves tested, repaired, and reinstalled in the process, two valves will be compromised.

Figure 1. Plot of PFD vs. time (years).

For a 3-year test interval, Equation 4 now becomes

$$\bar{a} = \frac{1}{3}\int_0^3 \left(0.002 + 0.001t\right)dt$$

By performing the integral and evaluating, we obtain

$$\bar{a} = \frac{0.002(3) - 0.002(0) + \tfrac{1}{2}(0.001)(3^2) - \tfrac{1}{2}(0.001)(0^2)}{3}$$

$$\bar{a} = 0.0035.$$
Therefore, the average PFD for this relief system with a
3-year test interval, an MTTF of 1,000 years, and a
background human error probability of 0.002 is 0.0035.
Now, the test interval for this relief system is extended to 8 years. Referring again to Figure 1, and using 8 years to evaluate the integral, the same $\lambda$ (0.001/yr), and the same background human error probability, the average PFD is 0.006.
A comparison of the two results yields

$$\mathrm{PFD}_{8\ \text{years}} = 1.71 \times \mathrm{PFD}_{3\ \text{years}}.$$
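
The two evaluations above can be reproduced in a few lines of Python. This is a minimal sketch that assumes the linear form a(t) = λt, for which Eq. 4 integrates to HE + λT/2; the function name `average_pfd` is an illustrative choice.

```python
def average_pfd(he, lam, T):
    """Closed form of Eq. 4 with a(t) = lam * t: HE + lam * T / 2."""
    return he + lam * T / 2.0


HE = 0.002    # constant human error probability
LAM = 0.001   # failure rate, per year (MTTF of 1,000 years)

pfd_3 = average_pfd(HE, LAM, 3)   # 3-year test interval
pfd_8 = average_pfd(HE, LAM, 8)   # 8-year test interval
ratio = pfd_8 / pfd_3

print(f"{pfd_3:.4f} {pfd_8:.4f} {ratio:.2f}")  # prints: 0.0035 0.0060 1.71
```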
Intuitively, the ratio between the two values of PFD should be 2.67 because the test interval increase from 3 to 8 years is a factor of 2.67. The inclusion of the human error component accounts for the lower factor between the two values. As the value of the human error component approaches zero, the factor between the two values of PFD would approach that obtained by using the common relationship for PFD (Eq. 3).

Figure 2. Plots of PFD vs. time (years).

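
To see how the human error term dampens the ratio, the sketch below evaluates the 8-year to 3-year PFD ratio for several human error probabilities. Only HE = 0.002 comes from the paper. The other HE values are arbitrary illustrations, and `pfd_ratio` is a hypothetical helper that assumes the closed form HE + λT/2.

```python
def pfd_ratio(he, lam=0.001, t_old=3.0, t_new=8.0):
    """Ratio of average PFD at the new test interval to the old one,
    using the closed form HE + lam * T / 2 for each interval."""
    return (he + lam * t_new / 2.0) / (he + lam * t_old / 2.0)


for he in (0.002, 0.001, 0.0001, 0.0):
    print(f"HE = {he:<7} ratio = {pfd_ratio(he):.2f}")
```

With HE = 0 the ratio equals 8/3, the value implied by Eq. 3 alone.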
PROCESS SAFETY MANAGEMENT QUESTION #1

A business evaluating the prospects of extending the time between maintenance shutdowns of a process that uses relief valve systems to protect against overpressure and loss of containment should want to know if it can do so in a way that minimizes the risk (at least from an overpressure perspective). It should also want to know what, if anything, the data from previous relief system testing say about extending test intervals.
For any given system such as that for Figure 1, the first question might be: If I change from a 3-year test interval to 8 years, what must I conclude from the data to say the PFD for 8 years is the same as or similar to the PFD for 3 years?
Figure 2 presents the answer graphically. For the PFD to be equal for both test intervals, the $\lambda$ for the 8-year test interval must improve. Said another way, the MTTF must increase from 1,000 to 2,667 years. This calculation is shown below:
For $\mathrm{PFD}_{8\ \text{years}}$ equal to $\mathrm{PFD}_{3\ \text{years}}$, what should be the value of $\lambda_{8\ \text{years}}$?
Using the integrated expression for $\mathrm{PFD}_{8\ \text{years}}$ and equating it to the $\mathrm{PFD}_{3\ \text{years}}$ value of 0.0035 produces the following expression to solve for the required $\lambda_{8\ \text{years}}$:

$$\frac{0.002(8) - 0 + \tfrac{1}{2}\lambda(8^2) - 0}{8} = 0.0035.$$

$$0.016 + 32\lambda = 0.028.$$

$$32\lambda = 0.012.$$

$$\lambda = 0.000375.$$

or, the MTTF = 2,667 years.
Of interest here is that the same result is obtained using Eq. 3. Human error is constant throughout the test interval regardless of the length of the test interval, so it contributes equally to both sides of the comparison and cancels. Only the failure rate (or mean time to failure) needs to change in relation to the change in test interval. Therefore, to specify the new target MTTF, Eq. 3 is sufficient.
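
The target failure rate can also be obtained by solving the closed form of Eq. 4 for λ, and then cross-checked against the Eq. 3 shortcut. This is a sketch under the same linear a(t) = λt assumption; `required_failure_rate` is an illustrative name.

```python
def required_failure_rate(target_pfd, he, T):
    """Solve target_pfd = he + lam * T / 2 for lam."""
    return 2.0 * (target_pfd - he) / T


# Route 1: full expression including the human error term.
lam_8 = required_failure_rate(target_pfd=0.0035, he=0.002, T=8)
mttf_8 = 1.0 / lam_8

# Route 2: Eq. 3 only. Equal PFD means lam_old * T_old = lam_new * T_new.
lam_8_eq3 = 0.001 * 3 / 8

print(f"lambda = {lam_8:.6f}/yr, MTTF = {mttf_8:.0f} years")
print(f"Eq. 3 shortcut gives lambda = {lam_8_eq3:.6f}/yr")
```

Both routes give the same failure rate because the human error term appears on both sides of the equality.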

PROCESS SAFETY MANAGEMENT QUESTION #2

Will the overall risk of a facility (plant site) change much if the PFD of the relief systems increases because
of increases in the test interval?
Consider the following situation:
• A plant site currently has 500 relief valve systems in process applications.
• The plant leadership feels that any overpressure event that results in a loss of containment of the process material is a serious event regardless of any consequences after the loss of containment.
• The plant leadership feels that the chances of an overpressure event leading to loss of containment are rather remote, but they ask you, the Process Safety Management Coordinator, how often the plant might experience a loss of containment from an overpressure scenario involving a relief valve system failure.
• Rather conveniently, you have just completed an order-of-magnitude assessment using an industry-standard method, LOPA (layer of protection analysis). Your results show the following:
• Each relief valve system protects one unique scenario. There are 500 scenarios total.
• Of these, one scenario has an expected frequency of once every 10,000 years. The other 499 scenarios have expected frequencies of once every 1,000,000 years each.
• You report this to the plant staff and they conclude that the facility is in good shape. They also conclude that the recent recommendation to increase relief valve test intervals from 1 to 2 years is OK because it seems that the likelihood of an overpressure event is still remote. However, they have some concern about the one scenario with the one in 10,000 year occurrence. They decide to conservatively continue the current practice of testing the relief valve system once per year for this scenario.
How do you respond?
We should try to present a realistic picture of the risk
of a loss of containment from relief valve system failure.
The tendency is to believe this risk is not a concern
because the 1,000,000-year events are very remote.
However, in total the expected frequency for a loss of
containment is the sum of all these scenarios. The
current state is
• 499 scenarios with an occurrence of once every 1,000,000 years. Each scenario will occur 0.000001/year. The sum of all these frequencies is 0.000499/yr.
• One scenario with an occurrence once every 10,000 years. This scenario occurs 0.0001/yr.
• The sum of all 500 frequencies is 0.000599/yr.
• Therefore, on average there will be one loss of containment event every 1,669 years.
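
The current-state arithmetic above amounts to summing the scenario frequencies and inverting the total. A minimal Python sketch (variable names are illustrative):

```python
import math

# One scenario at once per 10,000 years, 499 at once per 1,000,000 years.
scenario_frequencies = [1.0 / 10_000] + [1.0 / 1_000_000] * 499

total_frequency = math.fsum(scenario_frequencies)  # events per year
years_between_events = 1.0 / total_frequency

print(f"site frequency: {total_frequency:.6f}/yr")
print(f"about one event every {years_between_events:.0f} years")
```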
This result may be surprising to some. Actually, for a facility this size, this may be an acceptable result. The point is that in total all the individual frequencies of scenarios of interest are additive when considering the impact to a large production facility.
Now what happens to the 499 relief valve systems
with the extended test intervals of 2 years?
The simple analysis of PFD for a standby system
where the test interval doubles from 1 year to 2 years
suggests the PFD will double. Because we were using
LOPA for the estimate of scenario frequencies this included the normal practice of assigning a PFD of 0.01
for a properly maintained relief valve system. The PFD
increases to 0.02. Now each of the 1,000,000-year scenarios is a 500,000-year scenario. The impact on the
average expected time to a loss of containment event:
• 499 scenarios with an occurrence of once every 500,000 years. Each scenario will occur 0.000002/yr. The sum of all of these frequencies is 0.000998/yr.
• The one scenario that occurs 0.0001/yr remains the same because the test interval and the PFD are the same.
• The sum of all frequencies is 0.001098/yr.
• With the new test interval for 499 relief valve systems there will be a loss of containment event every 911 years.
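
The effect of the longer test interval can be sketched by scaling each scenario frequency by the ratio of its new relief valve PFD to the base LOPA value of 0.01. This assumes, as the example does, that scenario frequency is proportional to the relief valve PFD. The function and variable names are illustrative.

```python
import math

BASE_PFD = 0.01  # PFD assumed in the original LOPA estimate


def site_frequency(scenarios):
    """Sum scenario frequencies after rescaling each one by
    (current relief valve PFD) / BASE_PFD.

    scenarios: iterable of (frequency at BASE_PFD in 1/yr, current PFD).
    """
    return math.fsum(freq * pfd / BASE_PFD for freq, pfd in scenarios)


# 1-year test interval everywhere: all PFDs at the base value.
before = site_frequency([(1e-4, 0.01)] + [(1e-6, 0.01)] * 499)

# 2-year interval for 499 systems (PFD doubles); the one higher-frequency
# scenario stays on the 1-year interval.
after = site_frequency([(1e-4, 0.01)] + [(1e-6, 0.02)] * 499)

print(f"before: {before:.6f}/yr, one event every {1 / before:.0f} years")
print(f"after:  {after:.6f}/yr, one event every {1 / after:.0f} years")
```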
Conclusion: It does make a difference. There is an
increase in risk in this simplified example unless we
can conclude from the relief valve test data that the PFD
is the same, or essentially the same.
SUMMARY AND CONCLUSIONS

1. Process Safety Management Question #1 says the important characteristic of the relief valve system is the MTTF (mean time to failure) of the relief valve system. The PFD (probability of failure on demand) is a calculated value dependent on the inherent characteristics of the system and the test interval. The test interval is independent and controllable.
2. Process Safety Management Question #2 suggests that the testing and maintenance intervals for relief valve systems are a PSM management of change issue. Each relief valve system should be judged in the context of the process system or unit operation it protects. Ultimate risk evaluations need to include the challenge rate to the relief valve system, the PFD of the relief valve system, and the consequences if these systems do not work.
PATH FORWARD

If this analysis is correct, current studies and test data collection protocols should be designed to evaluate the
MTTF of the relief valve systems, not the PFD. Current
thinking is to research the statistics for this type of data
collection and propose guidelines for the evaluation of
relief valve systems.
A vision of the evaluation capabilities would include:
• Test data collection process that identifies the current state MTTF of the relief valve system under current test interval conditions.
• Specifies the new MTTF required by the desired new test interval.
• Provides confidence limits for the new MTTF. This gives a sense of the risk in going to the new test interval.
• Where necessary, identifies test intervals that are already too long for the required PFD of the chemical process.
LITERATURE CITED

1. ABS Group Inc., Course 207: Accident frequency analysis methods, Section 17, Analysis of Standby Safety Systems.
2. Center for Chemical Process Safety (CCPS), Layer of protection analysis: simplified risk assessment, American Institute of Chemical Engineers, CCPS, New York, 2001.
3. CCPS, Guidelines for chemical process quantitative risk analysis, Second Edition, American Institute of Chemical Engineers, CCPS, New York, 2000.
4. American Society of Mechanical Engineers (ASME), Pressure vessels with overpressure protection by system design, Section VIII, Divisions 1 and 2, ASME Code Case 2211, The 1995 Boiler and Pressure Vessel Code, ASME, New York, 1995.
