Reliabilitycenteredmaintenanceparaslideshare 121129181739 Phpapp01

Reliability Centered Maintenance (RCM)
Evolution of Maintenance
At the very beginning, Maintenance was an appendix to Operations / Production:
It existed only to fix failures when they happened.
These were the days of absolute
Corrective Maintenance
As time went by, it was noticed that many failures
follow an almost regular pattern, occurring after an
average period. Therefore, one could choose regular
intervals to fix the equipment BEFORE the failure:
Preventive Maintenance
Also known as Time Based Maintenance.
However, very often these failures happen at irregular
intervals. To avoid an unwanted failure, the Preventive
Maintenance intervals are shortened. If the equipment
condition were known, the maintenance could be done later.
Technological development made it possible to identify
failure symptoms:
Predictive Maintenance
Also known as Condition Based Maintenance.
Many pieces of equipment operate only sporadically (alarms,
stand-by equipment, etc.). However, we must be sure that
they are ready to run. Their faults are "hidden failures". Detecting and
preventing hidden failures is called:
Detective Maintenance
The different failure modes mean that there is no
single approach among Corrective, Preventive and
Predictive Maintenance programs.
The correct balance will give in return better
equipment reliability, thus the name:
Reliability Centered Maintenance

[Cartoon] Grandmother: "Remember, my kid, Prevention is better than Cure...." Grandchild: "Take it easy, grandma, not always!"
Reliability Centered Maintenance (RCM)
John Moubray 1949-2004

After graduating as a mechanical engineer in 1971, John Moubray worked
for two years as a maintenance planner in a packaging plant and for one
year as a commercial field engineer for a major oil company.
In 1974, he joined a large multi-disciplinary management consulting
company. He worked for this company for twelve years, specializing in the
development and implementation of manual and computerized
maintenance management systems for a wide variety of clients in the
mining, manufacturing and electric utility sectors.
He began working on RCM in 1981, and from 1986 he was
dedicated full time to RCM, founding Aladon LLC, which
he led until his premature death in 2004.
Today, John Moubray's name is considered synonymous with RCM.
Its origins
What about a failure rate of 0.00006/event?
Quite good, no?
This was the average failure rate of commercial flight
takeoffs in the 1950s. Two thirds of these failures were caused by
equipment failures.
Today, this would mean 2 accidents per day, involving planes
with more than 100 passengers!!!
That's why Reliability Centered Maintenance began
in Aeronautical Engineering. Pretty soon, the Nuclear,
Military and Oil & Gas industries also began to
use RCM concepts and implement them in their
facilities.
Reliability and Availability
Reliability
Reliability is a broad term that focuses on the ability of a product
to perform its intended function. Mathematically speaking,
reliability can be defined as the probability that an item will
continue to perform its intended function without failure for a
specified period of time under stated conditions.
Reliability is a performance expectation.
It's usually defined at design.
Availability
Depends upon Operation uptime and Operating cycle.
Availability is a performance result.
Equipment history will tell us the availability.
Bibliography: Kardec, Alan and Nascif, Julio - Manutenção - Função Estratégica, Editora Qualitymark
MTBF = Mean Time Between Failures
MTTR = Mean Time To Repair
A first definition:
Availability = MTBF / (MTBF + MTTR)
Availability definitions
MTBF = Mean Time Between Failures
MTTR = Mean Time To Repair
MTBM = Mean Time Between Maintenance actions
M = Mean Maintenance Downtime (including preventive and planned corrective downtime)
Inherent Availability: considers only corrective downtime
Inherent Availability = MTBF / (MTBF + MTTR)
Achieved Availability: considers corrective and preventive maintenance
Achieved Availability = MTBM / (MTBM + M)
Operational Availability: ratio of the system uptime to total time
Operational Availability = Uptime / Operation Cycle
[Figure: timeline of a 947-day operation cycle with four uptime periods of 250, 360, 200 and 120 days, separated by three downtimes of 9, 6 and 2 days]
MTBF = (250 + 360 + 200 + 120) / 4 = 232.5 days
MTTR = (9 + 6 + 2) / 3 = 5.67 days
Availability = 232.5 / (232.5 + 5.67) = 97.62 %
[Figure: timeline of a 947-day operation cycle with four uptime periods of 180, 400, 120 and 233 days, separated by three downtimes of 7, 4 and 3 days]
MTBF = (180 + 400 + 120 + 233) / 4 = 233.25 days
MTTR = (7 + 4 + 3) / 3 = 4.67 days
Availability = 233.25 / (233.25 + 4.67) = 98.04 %
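The two timeline calculations above can be reproduced with a few lines of Python. This is only a sketch: the function name `availability_from_history` and the list-based inputs are illustrative choices, not part of the original slides.

```python
def availability_from_history(uptimes, downtimes):
    """Return (MTBF, MTTR, availability) from lists of uptime and downtime periods.

    MTBF is the mean of the uptime periods, MTTR the mean of the downtime
    periods, and availability is MTBF / (MTBF + MTTR), as in the slides.
    """
    mtbf = sum(uptimes) / len(uptimes)
    mttr = sum(downtimes) / len(downtimes)
    return mtbf, mttr, mtbf / (mtbf + mttr)


# Periods in days, taken from the two examples above.
first = availability_from_history([250, 360, 200, 120], [9, 6, 2])
second = availability_from_history([180, 400, 120, 233], [7, 4, 3])

print(f"First history:  MTBF={first[0]:.2f} d, MTTR={first[1]:.2f} d, A={first[2]:.4f}")
print(f"Second history: MTBF={second[0]:.2f} d, MTTR={second[1]:.2f} d, A={second[2]:.4f}")
```

Both histories span the same 947 days with nearly the same MTBF, so the difference in availability comes almost entirely from the shorter repairs in the second one.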
Achieved Availability = MTBM/ (MTBM+M)
To improve Availability:
Improve MTBM:
Reduce Preventive programs to a minimum, or define Preventive intervals as precisely as possible
Use Predictive techniques whenever possible
Implement Maintenance Engineering (RCM, TPM...)
Minimize M:
Implement Maintenance Engineering (Planning, Logistics...)
Improve personnel technical skills (training)
Develop Integrated Planning (Maintenance + Operations + HSE + Inspection + ...)
Improve Productivity
Productivity Improvement Factors:
Detailed work planning
Equipment delivered to Maintenance as clean as possible
Checklist at the end of Maintenance activities
Complete and comprehensive Equipment data available
Supplies available on job site
Skilled personnel
Availability benchmark
Translating percentages into daily routine...
Availability %           Downtime per year   Downtime per month*   Downtime per week
90%                      36.5 days           72 hours              16.8 hours
95%                      18.25 days          36 hours              8.4 hours
98%                      7.30 days           14.4 hours            3.36 hours
99%                      3.65 days           7.20 hours            1.68 hours
99.5%                    1.83 days           3.60 hours            50.4 min
99.8%                    17.52 hours         86.4 min              20.16 min
99.9% ("three nines")    8.76 hours          43.2 min              10.1 min
99.95%                   4.38 hours          21.6 min              5.04 min
99.99% ("four nines")    52.6 min            4.32 min              1.01 min
99.999% ("five nines")   5.26 min            25.9 s                6.05 s
99.9999% ("six nines")   31.5 s              2.59 s                0.605 s
* Assuming a 30-day month.
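The downtime figures follow directly from multiplying the unavailability (1 minus availability) by the length of the period. The sketch below assumes a 365-day year, a 30-day month (inferred from the 72 hours shown for 90%) and a 7-day week; the names `PERIOD_HOURS` and `downtime_hours` are illustrative.

```python
# Period lengths in hours. The 30-day month is an assumption inferred from
# the table (10% of 720 hours = 72 hours).
PERIOD_HOURS = {"year": 365 * 24, "month": 30 * 24, "week": 7 * 24}


def downtime_hours(availability_percent, period):
    """Allowed downtime, in hours, for a given availability and period."""
    unavailability = 1 - availability_percent / 100
    return unavailability * PERIOD_HOURS[period]


for level in (90, 99, 99.9, 99.99):
    per_year = downtime_hours(level, "year")
    per_month = downtime_hours(level, "month")
    per_week = downtime_hours(level, "week")
    print(f"{level}%: {per_year:.2f} h/year, {per_month:.2f} h/month, {per_week:.3f} h/week")
```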
Maintenance Programs costs
Maintenance Program                 Cost US$/HP/year
Corrective (unplanned)              17 to 18
Preventive                          11 to 13
Predictive / Planned Corrective     7 to 9
Source: NMW Chicago

Benchmarking the balance between maintenance programs
Maintenance activities    Share (%)
Corrective actions        28
Preventive actions        36
Predictive actions        19
Maintenance studies       17
Source: NMW Chicago
Definitions
Failure rate (λ)
Failure rate (λ) is defined as the reciprocal of MTBF:
$\lambda(t) = \frac{1}{MTBF}$
Reliability: R(t)
Let P(t) be the probability of failure between 0 and t; reliability is defined as:
$R(t) = 1 - P(t)$
Bibliography: Lafraia, João Ricardo - Manual de Confiabilidade, Mantenabilidade e Disponibilidade, Editora Qualitymark
Some math...
Considering the failure rate (λ) constant, it can be proven (see www.weibull.com)
that R(t), the probability of having operated without failure until instant t, is given by:
$R(t) = e^{-\lambda t}$
This reinforces the idea that Reliability is a function of time; it isn't a single
number. So, it's incorrect to state: "This equipment has a 0.97 reliability
factor...". We should rather say: "This equipment has 97% reliability for
running, let's say, 240 days..."
Tricks and tips...
Historically, a piece of equipment has 4 failures per year. What is the
reliability of this equipment for a 100-day run?
$\lambda = 4/365 = 0.011$/day, so $R(100) = e^{-0.011 \times 100} = e^{-1.1} = 0.333 = 33.3\%$
The probability of having no failure within 100 days is 33.3%.
Some upgrades have been made, so the failure rate is now 2 per year
(meaning that MTBF has doubled). What is the reliability for a 100-day
run?
$\lambda = 2/365 = 0.0055$/day, so $R(100) = e^{-0.0055 \times 100} = e^{-0.55} = 0.577 = 57.7\%$
The probability of having no failure within 100 days is 57.7%.
As seen, doubling MTBF doesn't double reliability.
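The same calculation can be sketched in Python for the constant-failure-rate model. The function name `reliability` is illustrative. Note that the code keeps the unrounded failure rates, so the third decimal differs slightly from the hand calculation, which rounds λ first.

```python
import math


def reliability(failure_rate, t):
    """R(t) = exp(-lambda * t) for a constant failure rate (same time unit as t)."""
    return math.exp(-failure_rate * t)


# Failure rates per day, from 4 and 2 failures per year.
r_before = reliability(4 / 365, 100)
r_after = reliability(2 / 365, 100)

print(f"4 failures/year: R(100 days) = {r_before:.3f}")
print(f"2 failures/year: R(100 days) = {r_after:.3f}")
print(f"Ratio after/before = {r_after / r_before:.2f}")
```

Because halving λ takes the square root of R(t), the gain from doubling MTBF depends on where the equipment starts: it is large for low reliabilities and small for reliabilities already close to 1.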
Tricks and tips...
Historically, a piece of equipment has an MTBF of 200 days. To improve
its reliability for a 100-day run by 10%, by what percentage
should MTBF be improved?
$\lambda = 1/200 = 0.005$/day, so $R(100) = e^{-0.005 \times 100} = e^{-0.5} = 0.607 = 60.7\%$
To improve this reliability by 10%, the new reliability should be:
$R(100) = 1.1 \times 0.607 = 0.668 = e^{-\lambda \times 100}$
$\ln 0.668 = -\lambda \times 100$, so $-0.403 = -\lambda \times 100$ and $\lambda = 0.00403$/day
$1/MTBF = 0.00403$, so $MTBF = 248$ days
248/200 = 1.24, so MTBF should improve by 24%
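Solving $R(t) = e^{-t/MTBF}$ for MTBF gives $MTBF = -t / \ln R(t)$, which can be checked numerically. The sketch below uses an illustrative helper, `required_mtbf`, and carries no intermediate rounding, so it lands at about 247 days, an improvement of roughly 24% over the original 200 days.

```python
import math


def required_mtbf(target_reliability, t):
    """MTBF needed to reach a target reliability over a run of length t,
    assuming a constant failure rate: MTBF = -t / ln(R)."""
    return -t / math.log(target_reliability)


current_mtbf = 200.0                                # days
run_length = 100.0                                  # days
current_r = math.exp(-run_length / current_mtbf)    # about 0.607
target_r = 1.1 * current_r                          # 10% better reliability

new_mtbf = required_mtbf(target_r, run_length)
improvement = new_mtbf / current_mtbf - 1

print(f"Required MTBF: {new_mtbf:.1f} days ({improvement:.1%} improvement)")
```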
Tricks and tips...
According to the manufacturer, a piece of equipment has a 90%
reliability for running one year. If you want 95%
confidence that it will not fail, how long can it run
before it undergoes Preventive maintenance or
some predictive technique?
$0.9 = e^{-\lambda \times 365}$, so $\ln 0.9 = -\lambda \times 365$ and $-0.1054 = -\lambda \times 365$
$\lambda = 2.89 \times 10^{-4}$/day
$0.95 = e^{-\lambda t}$, so $\ln 0.95 = -\lambda t$ and $-0.0513 = -2.89 \times 10^{-4} \times t$
$t = 177.5$ days
For practical purposes, this equipment could be placed in a
six-monthly preventive / predictive program.
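The two steps above (back out λ from a quoted reliability, then find the run length that keeps reliability above a threshold) can be sketched as follows. The helper names are illustrative, and the constant-failure-rate assumption is the same one used in the slides.

```python
import math


def failure_rate_from_reliability(r, t):
    """Constant failure rate implied by reliability r over a period t."""
    return -math.log(r) / t


def time_to_reliability(target_r, failure_rate):
    """Run length after which reliability has dropped to target_r."""
    return -math.log(target_r) / failure_rate


lam = failure_rate_from_reliability(0.90, 365)   # per day
interval = time_to_reliability(0.95, lam)        # days

print(f"lambda = {lam:.3e} per day")
print(f"Maintenance interval for 95% reliability: {interval:.1f} days")
```

Without intermediate rounding the interval comes out a fraction of a day longer than the hand calculation, which does not change the practical conclusion of a roughly six-monthly program.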
Tricks and Tips...
System in series
Let P1=5%, P2=10% and P3=20% be the failure probabilities of the components of
this system in a certain period. What is the reliability of this system, in series?
This system will run provided that ALL its components run. So, their reliabilities
are multiplied.
R1 = 1 - P1 = 1 - 0.05 = 0.95
R2 = 1 - P2 = 1 - 0.10 = 0.90
R3 = 1 - P3 = 1 - 0.20 = 0.80
R = R1 x R2 x R3 = 0.95 x 0.90 x 0.80 = 0.6840 = 68.4%
System failure probability = 31.6%
The system failure probability is higher than that of any individual component. The system
reliability is lower than that of any component.
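A minimal Python sketch of the series rule, assuming independent component failures as the slides do. The function name `series_reliability` is illustrative; `math.prod` requires Python 3.8 or later.

```python
import math


def series_reliability(failure_probs):
    """Reliability of components in series: the product of (1 - P_i)."""
    return math.prod(1 - p for p in failure_probs)


r_series = series_reliability([0.05, 0.10, 0.20])
print(f"Series reliability: {r_series:.4f}, failure probability: {1 - r_series:.4f}")
```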
System in parallel
[Figure: components 1, 2 and 3 connected in parallel]
Let P1=5%, P2=10% and P3=20% be the failure probabilities of the components of this
system, in parallel, in a given period. What is the reliability of the system, in parallel?
This system will run until ALL components fail. In this case, the failure probabilities
are multiplied.
P = P1 x P2 x P3 = 0.05 x 0.10 x 0.20 = 0.0010
R = 1 - P = 0.999 = 99.9%
System failure probability = 0.1%
The system failure probability is lower than that of any component. The system reliability is higher
than that of any component.
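The parallel rule can be sketched the same way, again assuming independent failures. The function name `parallel_reliability` is illustrative; `math.prod` requires Python 3.8 or later.

```python
import math


def parallel_reliability(failure_probs):
    """Reliability of components in parallel: 1 minus the product of the P_i."""
    return 1 - math.prod(failure_probs)


r_parallel = parallel_reliability([0.05, 0.10, 0.20])
print(f"Parallel reliability: {r_parallel:.4f}, failure probability: {1 - r_parallel:.4f}")
```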
Mixed systems
[Figure: components 1, 2 and 3 in series, in parallel with components 4 and 5 in series]
If P1=10%, P2=5%, P3=15%, P4=2% and P5=20%, what is the system reliability?
Branch 1-2-3:
R1 = 1 - 0.10 = 0.90
R2 = 1 - 0.05 = 0.95
R3 = 1 - 0.15 = 0.85
R123 = 0.90 x 0.95 x 0.85 = 0.7268, so P123 = 0.2733
Branch 4-5:
R4 = 1 - 0.02 = 0.98
R5 = 1 - 0.20 = 0.80
R45 = 0.98 x 0.80 = 0.7840, so P45 = 0.2160
System:
Psystem = P123 x P45 = 0.2733 x 0.2160 = 0.0590
Rsystem = 1 - 0.0590 = 0.941 = 94.1%
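A mixed system can be evaluated by combining the series and parallel rules. The sketch below assumes the layout implied by the calculation (branch 1-2-3 in series, branch 4-5 in series, the two branches in parallel) and independent failures; the helper names are illustrative.

```python
import math


def series(reliabilities):
    """Series block: all components must work."""
    return math.prod(reliabilities)


def parallel(reliabilities):
    """Parallel block: fails only if every branch fails."""
    return 1 - math.prod(1 - r for r in reliabilities)


# Failure probabilities of components 1 to 5, from the example above.
p = {1: 0.10, 2: 0.05, 3: 0.15, 4: 0.02, 5: 0.20}
r = {k: 1 - v for k, v in p.items()}

r_123 = series([r[1], r[2], r[3]])
r_45 = series([r[4], r[5]])
r_system = parallel([r_123, r_45])

print(f"R123 = {r_123:.4f}, R45 = {r_45:.4f}, Rsystem = {r_system:.4f}")
```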
Redundancy
[Figure: pumps A, B and C in parallel]
Pumps A, B and C are the feed pumps of a plant. To
operate at full capacity, at least
two of these three pumps must be running. The failure
probability of each one is 10%. What is the
reliability of running this plant at full production?
Failure probability is P = 0.1 (10%), and reliability is R = 1 - 0.1 = 0.9 (90%)
Three pumps in parallel, so:
$(R + P)^3 = R^3 + 3R^2P + 3RP^2 + P^3 = 0.9^3 + 3 \times 0.9^2 \times 0.1 + 3 \times 0.9 \times 0.1^2 + 0.1^3$
$(R + P)^3 = 0.729 + 0.243 + 0.027 + 0.001$
Three running: 0.729
Two running and one off: 0.243
One running and two off: 0.027
None running: 0.001
Reliability = 0.729 + 0.243 = 0.972 = 97.2 %
No full production = 0.027 + 0.001 = 0.028 = 2.8 %
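The binomial expansion above generalizes to any "at least k out of n identical units" arrangement. The following is a sketch with an illustrative function name, assuming identical and independent units; `math.comb` requires Python 3.8 or later.

```python
from math import comb


def k_out_of_n_reliability(k, n, r):
    """Probability that at least k of n identical, independent units
    (each with reliability r) are running."""
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


r_full = k_out_of_n_reliability(2, 3, 0.9)
print(f"At least 2 of 3 pumps running: {r_full:.3f}")
print(f"No full production: {1 - r_full:.3f}")
```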
Redundancy
[Figure: pumps A, B and C in parallel]
Pumps A, B and C are the feed pumps of a plant.
Pump A's flow rate is 2,000 gpm, pump B's is
1,800 gpm and pump C's is 1,700 gpm. To
operate, the plant needs a feed rate of at least 3,600
gpm. Reliabilities are: RA=0.95, RB=0.90 and
RC=0.85. What is the plant reliability?
As the plant needs at least 3,600 gpm, only these cases supply enough flow (B and C together deliver only 3,500 gpm, and no single pump is enough):
A, B and C running: 0.95 x 0.90 x 0.85 = 0.72675
A and B running, C off: 0.95 x 0.90 x (1 - 0.85) = 0.12825
A and C running, B off: 0.95 x (1 - 0.90) x 0.85 = 0.08075
Plant reliability = 0.72675 + 0.12825 + 0.08075 = 0.93575 ≈ 93.6%
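When the units are not identical, the cases can be enumerated mechanically: list every on/off combination, keep those that meet the required flow, and add their probabilities. The sketch below does this for the three pumps; the data layout and the name `plant_reliability` are illustrative, and failures are assumed independent.

```python
from itertools import product

# Pump name -> (flow in gpm, reliability), from the example above.
PUMPS = {"A": (2000, 0.95), "B": (1800, 0.90), "C": (1700, 0.85)}


def plant_reliability(pumps, required_flow):
    """Sum the probabilities of all running/failed combinations whose
    total flow meets the required flow."""
    names = list(pumps)
    total = 0.0
    for states in product([True, False], repeat=len(names)):
        flow = sum(pumps[n][0] for n, up in zip(names, states) if up)
        if flow < required_flow:
            continue
        prob = 1.0
        for n, up in zip(names, states):
            r = pumps[n][1]
            prob *= r if up else 1 - r
        total += prob
    return total


r_plant_flow = plant_reliability(PUMPS, 3600)
print(f"Plant reliability: {r_plant_flow:.5f}")
```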
[Figures: systems in series; systems in parallel]
System and Component Redundancy
[Figure: Component Redundancy (two A units in parallel, in series with two B units in parallel) and System Redundancy (two A-B series chains in parallel)]
Which of these systems would have the better overall reliability
(let's assume all components have the same reliability R)?
Component redundancy:
AA and BB subsystem reliability: $1 - (1-R)^2 = 1 - 1 + 2R - R^2 = 2R - R^2$
System reliability: $R_{\text{component redundancy}} = (2R - R^2)^2$
System redundancy:
AB and AB subsystem reliability: $R^2$
System reliability: $R_{\text{system redundancy}} = 1 - (1 - R^2)^2 = 1 - 1 + 2R^2 - R^4 = 2R^2 - R^4$
Comparison:
$R_{\text{comp red}} - R_{\text{syst red}} = (2R - R^2)^2 - (2R^2 - R^4) = 4R^2 - 4R^3 + R^4 - 2R^2 + R^4$
$R_{\text{comp red}} - R_{\text{syst red}} = 2R^4 - 4R^3 + 2R^2 = 2R^2(R^2 - 2R + 1) = 2R^2(R-1)^2 \ge 0$
$R_{\text{comp red}} \ge R_{\text{syst red}}$
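The inequality can also be checked numerically. This sketch evaluates both closed-form expressions for a few values of R; the function names are illustrative.

```python
def component_redundancy(r):
    """Two parallel pairs (AA and BB) in series: (2R - R^2)^2."""
    return (2 * r - r**2) ** 2


def system_redundancy(r):
    """Two series chains (AB and AB) in parallel: 2R^2 - R^4."""
    return 2 * r**2 - r**4


for r in (0.5, 0.8, 0.9, 0.99):
    comp = component_redundancy(r)
    syst = system_redundancy(r)
    print(f"R={r}: component={comp:.4f}, system={syst:.4f}, difference={comp - syst:.4f}")
```

The difference shrinks as R approaches 1, in line with the factor $(R-1)^2$ in the derivation.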
Active and Passive Redundancy
Active Redundancy:
Both pieces of equipment operate at the same time, sharing the load. If one fails, the other one carries the load alone.
Passive Redundancy:
One piece of equipment operates while the other one is on stand-by, starting up after the first one fails, depending on a switching system.
Getting closer to real world...
In systems with active redundancy, all redundant components are in
operation and share the load with the main component. Upon
failure of one component, the surviving components carry the load,
and as a result, the failure rate of the surviving components may
increase.
The reliability of an active, shared-load, parallel system of two units can be
calculated as follows:
$R(t) = e^{-2\lambda_1 t} + \frac{2\lambda_1}{2\lambda_1 - \lambda_2}\left(e^{-\lambda_2 t} - e^{-2\lambda_1 t}\right)$
where $\lambda_1$ is the failure rate of each unit when both are working and
$\lambda_2$ is the failure rate of the surviving unit when the other one has
failed.
If $2\lambda_1 = \lambda_2$, then:
$R(t) = e^{-2\lambda_1 t}\left(1 + 2\lambda_1 t\right)$
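In a system with active redundancy, the reliability of each of the two components for
100 days is R = 0.96 when sharing the load. If one component fails, the
surviving one will have a 50% increase in its failure rate. What is the system
reliability for 100 days?
$R(100) = 0.96 = e^{-\lambda_1 \times 100}$, so $\ln 0.96 = -100\,\lambda_1$ and $\lambda_1 = 0.00041$/day
$\lambda_2 = 1.5 \times \lambda_1 = 0.000615$/day
$R(100) = e^{-2 \times 0.00041 \times 100} + \frac{2 \times 0.00041}{2 \times 0.00041 - 0.000615}\left(e^{-0.000615 \times 100} - e^{-2 \times 0.00041 \times 100}\right)$
$R(100) = e^{-0.082} + 4\left(e^{-0.0615} - e^{-0.082}\right)$
$R(100) = 0.9213 + 4 \times (0.9404 - 0.9213)$
$R(100) = 0.9977$
If there were no increase in failure rate, the system reliability would be $1 - (1 - 0.96)^2 = 0.9984$. The difference looks
like nothing, but it means a 30.5% decrease in the equivalent system MTBF ($-100/\ln R$ drops from about 62,450 days to about 43,430 days)!!!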
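The load-sharing example can be sketched in Python. The expression used here is the standard two-unit load-sharing result for constant failure rates, $R(t) = e^{-2\lambda_1 t} + \frac{2\lambda_1}{2\lambda_1 - \lambda_2}(e^{-\lambda_2 t} - e^{-2\lambda_1 t})$, which reproduces the numbers of the worked example; the function name and the handling of the special case $2\lambda_1 = \lambda_2$ are illustrative choices.

```python
import math


def load_sharing_reliability(lam1, lam2, t):
    """Two-unit active redundancy with load sharing.

    lam1: failure rate of each unit while both are running.
    lam2: failure rate of the surviving unit after the other has failed.
    """
    a = 2 * lam1
    if math.isclose(a, lam2):
        # Limit of the general expression when 2*lam1 equals lam2.
        return math.exp(-a * t) * (1 + a * t)
    return math.exp(-a * t) + a / (a - lam2) * (math.exp(-lam2 * t) - math.exp(-a * t))


lam1 = -math.log(0.96) / 100                                 # per day, from R(100) = 0.96
r_shared = load_sharing_reliability(lam1, 1.5 * lam1, 100)   # survivor 50% more stressed
r_no_increase = load_sharing_reliability(lam1, lam1, 100)    # survivor unaffected

print(f"With 50% higher failure rate on the survivor: {r_shared:.4f}")
print(f"With no increase: {r_no_increase:.4f}")
```

Keeping λ unrounded gives 0.9976 rather than 0.9977 for the load-sharing case; the no-increase case reduces to the ordinary parallel result 0.9984.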
The redundant or back-up components in passive or standby systems start
operating only when one or more primary components fail. The back-up components remain dormant
until needed.
For two identical components (primary and back-up) the formula is:
$R(t) = e^{-\lambda t}(1 + \lambda t)$, considering a perfect switch
If the reliability of the switch is less than one, the reliability of the system is
affected by the switching mechanism and is reduced accordingly:
$R(t) = e^{-\lambda t}(1 + R_{sw}\lambda t)$, where $R_{sw}$ is the switch reliability
The reliability of a standby system consisting of one primary component with
constant failure rate $\lambda_1$ and a backup component with constant failure rate $\lambda_2$ is
given by:
$R(t) = e^{-\lambda_1 t} + \frac{\lambda_1}{\lambda_2 - \lambda_1}\left(e^{-\lambda_1 t} - e^{-\lambda_2 t}\right)$, considering a perfect switch
Two feed pumps in a nuclear power plant are connected in
stand-by mode. One is active and one is on standby. The
power plant will have to shut down if both feed pumps fail. If
the time between failures of each pump has an exponential
distribution with MTBF = 28,000 hours, and the failure rate of
the switching mechanism $\lambda_{sw}$ is $10^{-6}$ per hour, what is the probability that
the power plant will not have to shut down due to a pump
failure in 10,000 hours?
$R(t) = e^{-\lambda t}(1 + R_{sw}\lambda t)$, with $\lambda = 1/MTBF$
Switch reliability:
$R_{sw} = e^{-10^{-6} \times 10^{4}} = e^{-0.01} = 0.9900$
$R(10000) = e^{-\frac{10000}{28000}}\left(1 + 0.9900 \times \frac{10000}{28000}\right)$
$R(10000) = e^{-0.3571}(1 + 0.3536)$
$R(10000) = 0.6997 \times 1.3536$
$R(10000) = 0.9471$
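The standby calculation can be sketched as below, using the formula quoted in the slides for two identical units with an imperfect switch. The function name `standby_reliability` and its default of a perfect switch are illustrative.

```python
import math


def standby_reliability(failure_rate, t, switch_reliability=1.0):
    """Two identical units in standby redundancy:
    R(t) = exp(-lambda*t) * (1 + R_sw * lambda * t)."""
    x = failure_rate * t
    return math.exp(-x) * (1 + switch_reliability * x)


mission = 10_000                          # hours
pump_rate = 1 / 28_000                    # per hour, from MTBF = 28,000 h
r_switch = math.exp(-1e-6 * mission)      # switch failure rate 1e-6 per hour

r_standby = standby_reliability(pump_rate, mission, r_switch)
r_single = math.exp(-pump_rate * mission)

print(f"Switch reliability: {r_switch:.4f}")
print(f"Single pump: {r_single:.4f}, standby pair: {r_standby:.4f}")
```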
Bathtub Curve
Early Life (Burn-in, infant mortality)
large number of new component failures, which decreases with time
Useful Life
small number of apparently random failures during working life
(λ constant)
Wear-out
increasing number of failures with time as components wear out
Bathtub Curve
Early Life:
sub-standard materials
often caused by poor / variable manufacturing and poor quality control
prevented by effective quality control, burn-in, run-in and debugging techniques
weak components eventually replaced by good ones
probabilistic treatment less important
Useful Life:
random or chance failures
may be caused by unpredictable sudden stress accumulations, outside and inside the components, beyond the design strength
over sufficiently long periods, the frequency of occurrence (λ) is approximately constant
failure rate used extensively in Safety & Reliability analyses
Wear-out period:
symptom of component ageing
prediction is important for replacement and maintenance policy
Different bathtub curves
[Figure: failure-rate patterns and the percentage of each one]
These statistics are from the aeronautical industry. In a process plant, like a refinery, do you think the percentage of each one would be about the same?
Different bathtub curves
Which of these curves would be applicable to:
A pump?
An electronic instrument?
A tire?
Failure modes
Common sense tells us that the best way to optimize plant availability is to
implement some Preventive maintenance.
Preventive maintenance means fixing or replacing pieces of equipment and/or
components at fixed intervals. The useful lifespan of equipment may be calculated with
Statistical Failure Analysis, enabling the Maintenance Department to implement Preventive
Programs.
This is true for some simple pieces of equipment and components, which may have a
prevailing failure mode. Many components in contact with process fluids have a regular
lifespan, as does cyclic equipment, due to fatigue and corrosion.
But for many pieces of equipment there's no connection between reliability and time.
Furthermore, as seen in the Reliability curves, defining the optimum interval for Preventive
maintenance may be a hard task. Besides, fixing or even replacing the equipment may
bring you back to the Infant Mortality period...
[Figure: failure rate versus time. Annotations: "Here begins the wear-out period. Failures are likely to happen"; "Let's define Preventive maintenance here"; "Preventive maintenance may cause failures earlier...."; "The failure likelihood is earlier!!!!"]
Turnarounds
Turnarounds are often seen by Operations as a unique opportunity to have all
problems solved and all equipment fixed.
Meanwhile, for Maintenance, a Turnaround (TAR) is a huge event, consuming time, resources and money,
in which ONLY what CANNOT be done on the run,
during normal operation, should be done.
Frequently, Maintenance is asked to perform General Maintenance on ALL rotating
equipment of a Unit during its Turnaround. As a matter of fact, if this equipment has
spares, this General Maintenance should be done outside the TAR.
Why does Operations want everything to be done during the TAR?
1) Because Ops doesn't have enough confidence that it will be done during routine
maintenance.
2) Because they don't feel comfortable running with a piece of equipment momentarily without
a spare, the same way that, when we have a flat tire, we drive on the spare tire
just far enough to reach the tire repair shop.
Turnarounds
1) Ops doesn't have enough confidence that it will be done during routine maintenance.
To improve TAR results, reversing the vicious cycle below, Maintenance
management has to improve Routine Maintenance!
[Figure: two cycles.
Vicious cycle: Many pieces of equipment left to TAR; Too much to be done during TAR; TAR won't be able to perform all that has to be done; Many pieces of equipment left to Routine Maintenance.
Virtuous cycle: Good routine maintenance; Unit running well; No excess of equipment to be done during TAR; TAR will carry out all services needed.]
Turnarounds
2) Because they don't feel comfortable running with a piece of equipment momentarily without
a spare, the same way that, when we have a flat tire, we drive on the spare tire
just far enough to reach the tire repair shop.
Consider these two pumps in Passive Redundancy
(one is on stand-by). Assume that during the first
100 h after a General Maintenance such a pump
has a 70% reliability, and after this, for a one-year
period, it runs with 97% reliability (which are
reasonable assumptions!!!).
If General Maintenance is performed within a Preventive or Predictive Program, during
normal operation, then during the repair time the unit will be running on a single
pump, with a 97% reliability.
If both pumps undergo General Maintenance during the TAR, then during the first 100
hours the system reliability (considering a perfect switch) would be about 95% (using the
$R(t) = e^{-\lambda t}(1 + \lambda t)$ formula with $\lambda t = -\ln 0.70 = 0.357$, so $R = 0.70 \times 1.357 = 0.950$). So, the unit would run for a period of time with two
available pumps, but with an overall reliability lower than if it were running with only
one pump!

RCM Implementation Flowchart
1. Will the failure directly affect Health, Safety or Environment? If Yes, go to 4. If No, go to 2.
2. Will the failure adversely affect the Mission, Vision and Core Values of the Company? If Yes, go to 4. If No, go to 3.
3. Will the failure cause major economic losses (harm to systems and/or machines)? If Yes, go to 4. If No: Run-to-fail?
4. Is there some cost-effective Monitoring Technology available? If Yes: deploy Monitoring techniques (Predictive Maintenance). If No, go to 5.
5. Are there regular failure patterns (time intervals)? If Yes: Preventive Maintenance. If No: re-design the system, accept the failure risk, or install redundancy.
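One way to read the flowchart above is as a short chain of yes/no checks. The sketch below encodes that reading; the branch order is an assumption based on the questions and outcomes listed in the flowchart, and the function and parameter names are hypothetical.

```python
def rcm_strategy(affects_hse, affects_mission, major_economic_loss,
                 monitoring_available, regular_failure_pattern):
    """Pick a maintenance approach following one reading of the flowchart.

    The first three flags are the consequence questions; the last two
    decide between predictive, preventive and re-design.
    """
    # No relevant consequence at all: candidate for run-to-fail.
    if not (affects_hse or affects_mission or major_economic_loss):
        return "run-to-fail"
    # Consequences matter: prefer condition monitoring when it is cost-effective.
    if monitoring_available:
        return "predictive maintenance"
    # Otherwise use time-based tasks if failures follow a regular pattern.
    if regular_failure_pattern:
        return "preventive maintenance"
    return "re-design, accept failure risk, or install redundancy"


print(rcm_strategy(False, False, False, True, True))
print(rcm_strategy(True, False, False, False, True))
```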

Another RCM Implementation Flowchart
If this thing breaks, will it be noticed?
  No: Can preventing it from breaking reduce the likelihood of multiple failures?
    Yes: Prevent it breaking
    No: Check to see if it is broken
  Yes: If this thing breaks, will it hurt someone or the environment?
    Yes: Can preventing it from breaking reduce the risk to the environment and safety?
      Yes: Prevent it breaking
      No: Re-design it
    No: If this thing breaks, will it slow or stop production?
      Yes: Is it cheaper to prevent it breaking than the loss of production?
        Yes: Prevent it breaking
        No: Let it break
      No: Is it cheaper to prevent it breaking than to fix it?
        Yes: Prevent it breaking
        No: Let it break
