
RELIABILITY ENGINEERING UNIT

ASST4403
Lecture 27-28 AVAILABILITY MODELLING
Learning outcomes
Explain the framework of availability related to reliability, maintainability and maintenance
Interpret and analyse different times for availability and downtime
Understand the mathematical basis for availability
Apply the most commonly used availability measures
Articulate the system availability assessment methods
Predict the availability of simple systems
Framework of availability related to reliability, maintainability and maintenance
Dependability: framework of reliability, availability, maintainability, etc.
AS IEC 60300.1-2004
Some definitions
Dependability is a collective term used to describe the
availability performance and its influencing factors:
reliability performance, maintainability performance and
maintenance support performance
Maintainability performance is the ability of an item under
given conditions of use, to be retained in, or restored to a
state in which it can perform a required function, when
maintenance is performed under given conditions and
using stated procedures and resources
Maintenance support performance is the ability of a
maintenance organization, under given conditions, to
provide upon demand, the resources required to
maintain an item, under a given maintenance policy
Availability (of repairable or maintained systems): definition
The ability of an item to be in a state to perform a
required function under given conditions at a given
instant of time or over a given time interval, assuming
that the required external resources are provided
Probability that an item will be available when required
Proportion of total time that the item is available for use
Review of Reliability
$$R(t) = e^{-t/\mathrm{MTBF}}$$
$$t = 30 \text{ years}$$
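A minimal Python sketch of the constant-failure-rate model above. The MTBF of 100 years is a hypothetical value chosen only to illustrate the 30-year mission time; it is not data from the slides.

```python
import math


def reliability(t: float, mtbf: float) -> float:
    """R(t) = exp(-t / MTBF) for a constant failure rate (exponential) model."""
    if mtbf <= 0:
        raise ValueError("MTBF must be positive")
    return math.exp(-t / mtbf)


# Hypothetical MTBF of 100 years, evaluated at t = 30 years
print(round(reliability(30, 100), 3))  # 0.741
```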
[Figure: worked reliability block diagram (RBD) example, Steps 1 to 6. System A (POD A, Modem POD A, Ring Pairs A1 and A2) and System B (POD B, Modem POD B, Ring Pairs B1 and B2), with an SCM (16 wires) and a junction (incl. connectors); block reliabilities 0.950, 0.741, 0.861, 0.659 and 0.942; overall system reliability at Step 6: 0.689]
Reliability and Confidence
Reliability is the probability, at a specified confidence level, that a device
or system will perform its intended function for a given interval of time
under specified operating conditions.
What is the relationship between confidence and reliability?
Eg Haul Pack Operational Capability
Task: Haul Pack required to travel into the operational area (AO), return and dump its load
Success criteria: Haul Pack successfully travels to the AO, returns and dumps its load
Mission Phases:
Haul Pack Available → Haul Pack Starts → Haul Pack Transits to & from AO → Haul Pack Dumps Load
Eg Haul Pack Operational Capability
Pr(HPA) = Probability that HP is Available = 0.7
Pr(HPS) = Probability that HP Starts = 0.95
Pr(HPT) = Probability that HP Transits = 0.9
Pr(HPD) = Probability that HP Dumps Load = 0.8
HP Available (HPA) → HP Starts (HPS) → HP Transits to & from AO (HPT) → HP Dumps Load (HPD)
Pr(Mission Success) = Pr(HPA) x Pr(HPS) x Pr(HPT) x Pr(HPD)
Eg - HP Operational Capability
Pr(HPA) = Probability that HP is Available = 0.7
Pr(HPS) = Probability that HP Starts = 0.95
Pr(HPT) = Probability that HP Transits to AO = 0.9
Pr(HPD) = Probability that HP Dumps Load = 0.8
Pr(HPA)  Pr(HPS)  Pr(HPT)  Pr(HPD)  Pr(Mission Success)
0.700    0.950    0.900    0.800    0.479
Pr(Mission Success) = Pr(HPA) x Pr(HPS) x Pr(HPT) x Pr(HPD) = 0.479
Eg - HP Operational Capability
What if the probabilities of each event are increased?
What is the impact on mission success?
Pr(HPA) = Probability that HP is Available from 0.7 to 0.8
Pr(HPS) = Probability that HP Starts from 0.95 to 0.975
Pr(HPT) = Probability that HP Transits from 0.9 to 0.95
Pr(HPD) = Probability that HP Dumps from 0.8 to 0.9
Eg - HP Operational Capability
Pr(HPA) = Probability that HP is Available = 0.8
Pr(HPS) = Probability that HP Starts = 0.975
Pr(HPT) = Probability that HP Transits = 0.95
Pr(HPD) = Probability that HP Dumps = 0.9
Pr(HPA)  Pr(HPS)  Pr(HPT)  Pr(HPD)  Pr(Mission Success)
0.700    0.950    0.900    0.800    0.479
0.800    0.975    0.950    0.900    0.667
Pr(Mission Success) = Pr(HPA) x Pr(HPS) x Pr(HPT) x Pr(HPD) = 0.667
Eg - HP Operational Capability
Case 1 - 47.9% of missions succeed
Case 2 - 66.7% of missions succeed
Pr(HPA)  Pr(HPS)  Pr(HPT)  Pr(HPD)  Pr(Mission Success)
0.700    0.950    0.900    0.800    0.479
0.800    0.975    0.950    0.900    0.667
Increase in Capability: 39.29%
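The mission-success arithmetic can be checked with a short Python sketch. It multiplies the four phase probabilities from the two cases, which (as in the slide formula) assumes the phases succeed or fail independently; the 39.29% figure comes from the unrounded products.

```python
import math


def mission_success(*phase_probs: float) -> float:
    """Every mission phase must succeed; with independent phases the
    probabilities multiply (series logic)."""
    return math.prod(phase_probs)


# Pr(HPA), Pr(HPS), Pr(HPT), Pr(HPD) from the two cases on the slides
case1 = mission_success(0.7, 0.95, 0.9, 0.8)
case2 = mission_success(0.8, 0.975, 0.95, 0.9)
increase = case2 / case1 - 1
print(f"{case1:.3f} {case2:.3f} {increase:.2%}")  # 0.479 0.667 39.29%
```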
Eg - HP Operational Capability
Question?
What is the effect of improved reliability and maintainability on cost and
capability?
Data:
Require Haul Packs for four (4) Mine Sites
12 HPs are required to be available per Mine Site
Each HP costs $2 million
Eg - HP Operational Capability
MTBF = 10 hours
MTTR = 5 hours
Inherent Availability = MTBF/(MTBF+MTTR)
Inherent Availability of Each HP = 67%
HPs required for each Mine Site = 12 / 0.67 = 17.9, rounded up to 18
Total Cost of Task Forces
No. of HPs per Mine Site x No. of Mine Sites x Unit Cost
= 18 x 4 x $2M
= $144M
Eg - HP Operational Capability
MTBF = 20 hours (double original Baseline)
MTTR = 5 hours (same as original Baseline)
Inherent Availability of Each HP = 80%
HPs required for each Mine Site = 12 / 0.8 = 15
Total Cost of Task Forces
No. of HPs per Mine Site x No. of Mine Sites x Unit Cost
= 15 x 4 x $2M
= $120M
Eg - HP Operational Capability
MTBF = 20 hours (double original Baseline)
MTTR = 2.5 hours (half the original Baseline)
Inherent Availability of Each HP = 20 / 22.5 = 88.9%
HPs required for each Mine Site = 12 / 0.889 = 13.5, rounded up to 14
Total Cost of Task Forces
No. of HPs per Mine Site x No. of Mine Sites x Unit Cost
= 14 x 4 x $2M
= $112M
Eg - HP Operational Capability
Capability Cost Comparison
                       Baseline System   Case 1   Case 2
Data
MTBF (hrs)             10                20       20
MTTR (hrs)             5                 5        2.5
Availability           67%               80%      88.9%
Capability
No. of HPs per site    18                15       14
Cost ($M)              $144M             $120M    $112M
A $32M saving (Baseline vs Case 2), and this includes procurement costs only
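The fleet-sizing steps (inherent availability, HPs per site rounded up, total procurement cost) can be reproduced with the Python sketch below. It uses exact fractions so that a value such as 12 / (2/3) is exactly 18 and is not pushed to 19 by floating-point rounding before the ceiling is taken; the helper name `fleet` is illustrative only.

```python
import math
from fractions import Fraction

SITES = 4          # mine sites
REQUIRED = 12      # HPs that must be available per site
UNIT_COST_M = 2    # $ million per HP


def fleet(mtbf: str, mttr: str):
    """Return (A_i, HPs per site, total cost in $M) using exact fractions."""
    a_i = Fraction(mtbf) / (Fraction(mtbf) + Fraction(mttr))
    per_site = math.ceil(REQUIRED / a_i)   # round up: a part HP cannot be bought
    return a_i, per_site, per_site * SITES * UNIT_COST_M


for name, mtbf, mttr in [("Baseline", "10", "5"), ("Case 1", "20", "5"), ("Case 2", "20", "2.5")]:
    a_i, n, cost = fleet(mtbf, mttr)
    print(f"{name}: A_i={float(a_i):.1%}, HPs/site={n}, cost=${cost}M")
```

Because only procurement cost is modelled, any operating or support-cost savings would come on top of the $32M difference.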
Types of availability to be discussed
Inherent availability, $A_i$
Achieved availability, $A_a$
Operational availability, $A_o$
Different times for availability and downtime
What is time?
All approaches to availability are time related
TT = total time, calendar time period
OT=operating time per given total calendar time
ST=standby time (not operating but assumed operable)
TCMT=Total corrective maintenance time
TPMT = Total preventive maintenance time
TALDT = Total administrative & logistics delay time
Adapted from the Defence Reliability Management Course, 2/2005
Breakdown of downtime
Supply delay: total delay time in obtaining necessary spare parts or
components for the repair
Maintenance delay: time spent waiting for maintenance resources or facilities
(Supply and maintenance delays are influenced by external factors, not part of the system.)
Repair time (inherent repair time): sum of the following
Access time
Diagnosis
Repair or replacement
Validation and alignment
Factors influencing downtime
Main factors are equipment design and maintenance philosophy
Active repair is determined by the design
Passive elements are governed by maintenance philosophy
Key design areas: access, adjustment, built-in test equipment, display & indicators, handling & ergonomics,
interchangeability, least replaceable assembly (LRA), mounting, redundancy, test points
Maintenance strategies
Some mathematical basics for availability
A note before we head on
Some of the slides that follow in this topic contain quite a few mathematical
expressions and formulas. The author intends them as reference material
for the convenience of participants.
Different measures of availability
The (point) availability of a system at time t is the probability that the system is working at time t:
$$A(t) = P(\text{system is working at time } t)$$
The (average) interval or mission availability in the time interval $(t_1, t_2)$ is
$$A_{av}(t_1, t_2) = \frac{1}{t_2 - t_1}\int_{t_1}^{t_2} A(u)\,du$$
which can be interpreted as the mean proportion of time in the interval where the system is able to function. When $t_1 = 0$, $t_2 = t$, we have
$$A_{av}(t) = \frac{1}{t}\int_{0}^{t} A(u)\,du$$
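To make the interval-availability integral concrete, the Python sketch below evaluates it numerically with the trapezoidal rule. As an example input it uses the standard single-component result $A(t) = \mu/(\lambda+\mu) + \lambda/(\lambda+\mu)\,e^{-(\lambda+\mu)t}$ (constant failure rate $\lambda$, constant repair rate $\mu$, component up at $t = 0$); the rates are hypothetical, and the numerical value is compared with the closed-form integral for $t_1 = 0$.

```python
import math


def a_exp(t: float, lam: float, mu: float) -> float:
    """Instantaneous availability of one component, exponential up/down times."""
    s = lam + mu
    return mu / s + lam / s * math.exp(-s * t)


def interval_availability(a, t1: float, t2: float, steps: int = 10_000) -> float:
    """A_av(t1, t2) = (1/(t2 - t1)) * integral of A(t) over [t1, t2] (trapezoidal rule)."""
    h = (t2 - t1) / steps
    total = 0.5 * (a(t1) + a(t2)) + sum(a(t1 + k * h) for k in range(1, steps))
    return total * h / (t2 - t1)


lam, mu = 0.01, 0.5     # hypothetical failure and repair rates per hour
avg = interval_availability(lambda t: a_exp(t, lam, mu), 0.0, 100.0)

# Closed form for t1 = 0, t2 = t: mu/s + lam/(s^2 t) * (1 - exp(-s t)), s = lam + mu
s = lam + mu
exact = mu / s + lam / (s**2 * 100.0) * (1 - math.exp(-s * 100.0))
print(round(avg, 6), round(exact, 6))
```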
The long run availability of a system is
$$A_{av} = \lim_{t\to\infty} \frac{1}{t}\int_{0}^{t} A(u)\,du$$
which can be interpreted as the average proportion of a long period of time where the system is able to function
The limiting or steady-state availability is, when the limit exists,
$$A = \lim_{t\to\infty} A(t)$$
Note: when this limit exists, $A = A_{av}$
$$A_{op} = 1 - \frac{\text{mean total planned downtime} + \text{mean total unplanned downtime}}{\text{mission period}}$$
The operational availability $A_{op}$ is the mean proportion of a mission period the system is able to perform its intended function
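A one-function Python sketch of the operational availability formula; the 720 h mission and the planned and unplanned downtimes are hypothetical numbers used only to show the arithmetic.

```python
def operational_availability(mission_period: float,
                             planned_downtime: float,
                             unplanned_downtime: float) -> float:
    """A_op = 1 - (mean total planned + mean total unplanned downtime) / mission period."""
    if mission_period <= 0:
        raise ValueError("mission period must be positive")
    return 1 - (planned_downtime + unplanned_downtime) / mission_period


# Hypothetical 30-day (720 h) mission with 10 h planned and 14 h unplanned downtime
print(round(operational_availability(720, 10, 14), 4))  # 0.9667
```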
Long run average availability
A failed item is replaced to an as-good-as-new condition
Up-times $T_1, T_2, \ldots, T_n$ are independent and identically distributed (iid) with mean time to failure MTTF
Down-times $D_1, D_2, \ldots, D_n$ are independent and identically distributed (iid) with mean downtime MDT
As $n \to \infty$, we have the long run average availability
$$A_{av} = \frac{E(T)}{E(T) + E(D)} = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MDT}}$$
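A small simulation sketch makes the long-run result concrete: generate many iid up/down cycles and compare the fraction of up-time with MTTF/(MTTF + MDT). The exponential up-times, uniform down-times and the values MTTF = 100 h, MDT = 5 h are hypothetical choices; the distributions were picked deliberately different because only the means enter the formula.

```python
import random


def simulated_long_run_availability(mttf: float, mdt: float,
                                    cycles: int = 100_000, seed: int = 42) -> float:
    """Fraction of total time spent up over many as-good-as-new renewal cycles."""
    rng = random.Random(seed)
    # Up-times: exponential with mean MTTF (hypothetical distribution choice)
    up = sum(rng.expovariate(1.0 / mttf) for _ in range(cycles))
    # Down-times: uniform on [0, 2*MDT], which also has mean MDT
    down = sum(rng.uniform(0.0, 2.0 * mdt) for _ in range(cycles))
    return up / (up + down)


mttf, mdt = 100.0, 5.0
print(round(simulated_long_run_availability(mttf, mdt), 3))
print(round(mttf / (mttf + mdt), 3))  # 0.952
```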
Inherent availability
$$A_{inh} = \lim_{t\to\infty} A(t) = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}$$
Inherent availability is based solely on the failure distribution and the repair-time distribution (repair time is less than the total downtime)
Equipment design parameter, on the basis of which trade-offs between reliability and maintainability can be made
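Because $A_{inh}$ depends only on the ratio MTTR/MTTF, the reliability/maintainability trade-off can be shown in a few lines of Python: doubling MTTF or halving MTTR gives the same inherent availability. The baseline of MTTF = 100 h and MTTR = 10 h is hypothetical.

```python
def inherent_availability(mttf: float, mttr: float) -> float:
    """A_inh = MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)


base = inherent_availability(100, 10)              # hypothetical design, hours
more_reliable = inherent_availability(200, 10)     # double MTTF
more_maintainable = inherent_availability(100, 5)  # halve MTTR
print(f"{base:.4f} {more_reliable:.4f} {more_maintainable:.4f}")  # 0.9091 0.9524 0.9524
```

Which option is cheaper to achieve is then a design and cost question, not an availability one.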
Exponential availability model of a component
Assuming constant failure rate $\lambda$ (exponential time to failure) and constant repair rate $\mu$ (exponential time to repair), where $\lambda = 1/\mathrm{MTTF}$ and $\mu = 1/\mathrm{MTTR}$
Steady-state availability
$$A = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}} = \frac{\mu}{\lambda + \mu}$$
Instantaneous availability
$$A(t) = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu}e^{-(\lambda + \mu)t}$$
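The Python sketch below evaluates the instantaneous availability $A(t) = \mu/(\lambda+\mu) + \lambda/(\lambda+\mu)\,e^{-(\lambda+\mu)t}$ with $\lambda = 1/\mathrm{MTTF}$ and $\mu = 1/\mathrm{MTTR}$, showing that it starts at 1 and decays towards the steady-state value MTTF/(MTTF + MTTR). MTTF = 1000 h and MTTR = 10 h are hypothetical.

```python
import math


def availability(t: float, mttf: float, mttr: float) -> float:
    """A(t) for one component with exponential up- and down-times, up at t = 0."""
    lam, mu = 1 / mttf, 1 / mttr
    s = lam + mu
    return mu / s + lam / s * math.exp(-s * t)


mttf, mttr = 1000.0, 10.0            # hypothetical values, hours
steady = mttf / (mttf + mttr)        # equals mu / (lam + mu)
for t in (0, 5, 10, 50):
    print(t, round(availability(t, mttf, mttr), 5))
print("steady state", round(steady, 5))
```

The transient dies out on the time scale $1/(\lambda+\mu)$, which is close to MTTR when MTTF is much larger than MTTR.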
Some of the most commonly applied availability measures
If we are only concerned with corrective
maintenance
Adapted from the Defence Reliability Management Course, 2/2005
Inherent availability
$A_i$ is the availability when we are only concerned with corrective maintenance, assuming no preventive maintenance and no administrative & logistic delay time
$$A_i = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}} \quad\text{or, in terms of time,}\quad A_i = \frac{OT}{OT + TCM}$$
where OT = operating time, TCM = total corrective maintenance time
$A_i$ is primarily a function of design
Adapted from the Defence Reliability Management Course, 2/2005
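The two forms of $A_i$ agree when MTBF and MTTR are estimated from the same record (MTBF = OT / failures, MTTR = TCM / failures). The Python sketch below checks this on a hypothetical record of 2000 h operating time, 8 failures and 40 h of corrective maintenance.

```python
def inherent_availability_means(mtbf: float, mttr: float) -> float:
    """A_i = MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)


def inherent_availability_times(ot: float, tcm: float) -> float:
    """A_i = OT / (OT + TCM)."""
    return ot / (ot + tcm)


# Hypothetical record: 2000 h operating time, 8 failures, 40 h corrective maintenance
ot, failures, tcm = 2000.0, 8, 40.0
mtbf, mttr = ot / failures, tcm / failures
print(round(inherent_availability_means(mtbf, mttr), 4))  # 0.9804
print(round(inherent_availability_times(ot, tcm), 4))     # 0.9804
```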
Example of inherent system availability
Assume the system had been running for two years and you had been monitoring the failures.
If you had 20 failures, what would the MTBF be?
What would the inherent availability be if the mean time to repair was 4 hours?
MTBF = ____
$$A_{inh} = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}$$
$A_{inh}$ = ____
Reproduced courtesy of Mark Mackenzie
If we are concerned with both corrective and preventive maintenance
Adapted from the Defence Reliability Management Course, 2/2005
Achieved availability
$A_a$ is the availability when we are concerned with both corrective and preventive maintenance, assuming no administrative & logistic delay time
$$A_a = \frac{\mathrm{MTBM}}{\mathrm{MTBM} + \mathrm{MMT}} \quad\text{or, in terms of time,}\quad A_a = \frac{OT}{OT + TCM + TPM}$$
where
MTBM = mean time between maintenance
MMT = mean maintenance time
OT = operating time
TCM = total corrective maintenance
TPM = total preventive maintenance
$A_a$ is now a function of both design and preventive maintenance (which may also be partly a function of design)
Adapted from the Defence Reliability Management Course, 2/2005
Example of achieved availability
A generator runs non-stop for 3 months and fails 3 times.
Total corrective maintenance = 25 hrs, and the generator is serviced once, which takes 5 hrs.
The achieved availability is
$$A_a = \frac{OT}{OT + TCM + TPM} = \frac{3 \times 31 \times 24}{3 \times 31 \times 24 + 25 + 5} = 98.7\%$$
Adapted from the Defence Reliability Management Course, 2/2005
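The generator example can be reproduced with a short Python sketch, taking OT = 3 x 31 x 24 h as on the slide.

```python
def achieved_availability(ot: float, tcm: float, tpm: float) -> float:
    """A_a = OT / (OT + TCM + TPM)."""
    return ot / (ot + tcm + tpm)


ot = 3 * 31 * 24          # three months of non-stop running, as on the slide (hours)
a_a = achieved_availability(ot, tcm=25, tpm=5)
print(f"{a_a:.1%}")       # 98.7%
```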
Is more frequent preventive maintenance better for availability?
C.E. Ebeling (1997), An Introduction to Reliability and Maintainability Engineering, McGraw-Hill, Boston
If we also take into account administration and logistics
Adapted from the Defence Reliability Management Course, 2/2005
Operational availability
$A_o$ is the availability when we are concerned with corrective & preventive maintenance and also administrative & logistic delay time
$$A_o = \frac{\mathrm{MTBM}}{\mathrm{MTBM} + \mathrm{MMT} + \mathrm{MALDT}} \quad\text{or, in terms of time,}\quad A_o = \frac{OT}{OT + TCM + TPM + TALDT}$$
where
MTBM = mean time between maintenance
MMT = mean maintenance time
MALDT, TALDT = mean/total administrative and logistic delay time
OT = operating time
TCM, TPM = total corrective/preventive maintenance
$A_o$ now measures both the design and the organisational effectiveness
Adapted from the Defence Reliability Management Course, 2/2005
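A Python sketch placing the three measures side by side. All figures are hypothetical; they only illustrate how adding preventive maintenance time and then administrative and logistic delay time changes the result.

```python
def inherent(mtbf: float, mttr: float) -> float:
    """A_i = MTBF / (MTBF + MTTR): corrective maintenance only."""
    return mtbf / (mtbf + mttr)


def achieved(mtbm: float, mmt: float) -> float:
    """A_a = MTBM / (MTBM + MMT): corrective and preventive maintenance."""
    return mtbm / (mtbm + mmt)


def operational(mtbm: float, mmt: float, maldt: float) -> float:
    """A_o = MTBM / (MTBM + MMT + MALDT): adds administrative and logistic delay."""
    return mtbm / (mtbm + mmt + maldt)


# Hypothetical figures in hours, for illustration only
a_i = inherent(mtbf=500.0, mttr=5.0)
a_a = achieved(mtbm=400.0, mmt=6.0)
a_o = operational(mtbm=400.0, mmt=6.0, maldt=50.0)
print(f"A_i={a_i:.3f}  A_a={a_a:.3f}  A_o={a_o:.3f}")  # A_i=0.990  A_a=0.985  A_o=0.877
```

For the same MTBM and MMT, $A_o$ can never exceed $A_a$, since MALDT is non-negative; here the logistic delay dominates the loss.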
What do the factors mean for operational availability?
What can we do about each of these?
Adapted from the Defence Reliability Management Course, 2/2005
Example of system availability
Is an inherent availability of 99% good enough?
What about the administrative and logistics delay time (ALDT)? What if the lead time for pump parts was one week? Mean down time (MDT) = 172 hrs
What if there was preventive maintenance or scheduled maintenance? MTBM = 504 hrs
$$A_o = \frac{\mathrm{MTBM}}{\mathrm{MTBM} + \mathrm{MDT}}$$
$A_o$ = ____
Reproduced courtesy of Mark Mackenzie
System availability assessment methods
Most common approaches
RBD
FTA
Markov methods
Flow networks
Petri Nets
Monte Carlo simulation
Availability modeling of simple systems
Series systems
[Figure: blocks R1 and R2 in series]
Assuming constant failure rate $\lambda_i$ (exponential time to failure) and constant repair rate $\mu_i$ (exponential time to repair), where $\mu_i = 1/\mathrm{MTTR}_i$
Steady-state availability
$$A = \frac{\mu_1\mu_2}{\mu_1\mu_2 + \lambda_1\mu_2 + \lambda_2\mu_1 + \lambda_1\lambda_2}$$
Generally for n components
$$A = \prod_{i=1}^{n} \frac{\mu_i}{\lambda_i + \mu_i}$$
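A Python sketch of the series formula: the general product $\prod \mu_i/(\lambda_i+\mu_i)$ is checked against the expanded two-component expression $\mu_1\mu_2/(\mu_1\mu_2 + \lambda_1\mu_2 + \lambda_2\mu_1 + \lambda_1\lambda_2)$. Components are treated as independent, and the rates are hypothetical.

```python
import math


def series_availability(lams, mus):
    """Steady state of independent components in series: product of mu_i/(lam_i + mu_i)."""
    return math.prod(mu / (lam + mu) for lam, mu in zip(lams, mus))


lam1, lam2 = 0.001, 0.002   # hypothetical failure rates per hour
mu1, mu2 = 0.5, 0.25        # hypothetical repair rates per hour (MTTR 2 h and 4 h)
two = mu1 * mu2 / (mu1 * mu2 + lam1 * mu2 + lam2 * mu1 + lam1 * lam2)
print(round(series_availability([lam1, lam2], [mu1, mu2]), 6), round(two, 6))
```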
Parallel systems
[Figure: blocks R1 and R2 in parallel]
Assuming constant failure rate (exponential time to failure) and constant repair rate (exponential time to repair), where $\mu = 1/\mathrm{MTTR}$
Steady-state availability*)
$$A = \frac{\mu^2 + 2\lambda\mu}{\mu^2 + 2\lambda\mu + 2\lambda^2}$$
Generally for n components (each with its own repair team)
$$A = 1 - \prod_{i=1}^{n} \frac{\lambda_i}{\lambda_i + \mu_i}$$
*) Assuming $\lambda_1 = \lambda_2 = \lambda$, $\mu_1 = \mu_2 = \mu$. Series repair, i.e. a single repair team, is assumed
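The two-unit parallel formula $(\mu^2 + 2\lambda\mu)/(\mu^2 + 2\lambda\mu + 2\lambda^2)$ can be derived from a birth-death Markov chain on the number of failed units, with both units active and one repair team. The Python sketch below does that numerically and compares it with the case where each unit has its own team, $1 - (\lambda/(\lambda+\mu))^2$. The helper name `birth_death_steady_state` and the rates are hypothetical.

```python
def birth_death_steady_state(n, fail_rate, repair_rate):
    """Steady-state probabilities p[j] of j failed units in a birth-death chain
    where j -> j+1 at fail_rate(j) and j -> j-1 at repair_rate(j)."""
    weights = [1.0]
    for j in range(n):
        weights.append(weights[-1] * fail_rate(j) / repair_rate(j + 1))
    total = sum(weights)
    return [w / total for w in weights]


lam, mu = 0.01, 0.5   # hypothetical rates per hour

# Two active units: (2 - j) units can fail; one repair team works at rate mu
p = birth_death_steady_state(2, lambda j: (2 - j) * lam, lambda j: mu)
a_single_team = 1 - p[2]          # system is down only when both units are down
a_formula = (mu**2 + 2 * lam * mu) / (mu**2 + 2 * lam * mu + 2 * lam**2)

# For comparison: each unit has its own repair team (independent units)
a_two_teams = 1 - (lam / (lam + mu)) ** 2
print(round(a_single_team, 6), round(a_formula, 6), round(a_two_teams, 6))
```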
Standby systems
[Figure: Unit 1 and Unit 2 in a standby configuration with a sensor]
Assuming constant failure rate (exponential time to failure) and constant repair rate (exponential time to repair), where $\mu = 1/\mathrm{MTTR}$
Steady-state availability*)
$$A = \frac{\mu^2 + \lambda\mu}{\mu^2 + \lambda\mu + \lambda^2}$$
*) Assuming $\lambda_1 = \lambda_2 = \lambda$, $\mu_1 = \mu_2 = \mu$.
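The standby formula $(\mu^2 + \lambda\mu)/(\mu^2 + \lambda\mu + \lambda^2)$ follows from the same kind of birth-death chain if only the operating unit can fail. The Python sketch below assumes perfect switching, no failures of the idle standby unit and one repair team (assumptions for illustration), and compares the result with active parallel redundancy under a single repair team. The rates are hypothetical.

```python
def birth_death_steady_state(n, fail_rate, repair_rate):
    """Steady-state probabilities p[j] of j failed units in a birth-death chain
    where j -> j+1 at fail_rate(j) and j -> j-1 at repair_rate(j)."""
    weights = [1.0]
    for j in range(n):
        weights.append(weights[-1] * fail_rate(j) / repair_rate(j + 1))
    total = sum(weights)
    return [w / total for w in weights]


lam, mu = 0.01, 0.5   # hypothetical rates per hour

# Cold standby: only the operating unit can fail (rate lam while a unit is up);
# one repair team works at rate mu
p = birth_death_steady_state(2, lambda j: lam, lambda j: mu)
a_standby = 1 - p[2]
a_formula = (mu**2 + lam * mu) / (mu**2 + lam * mu + lam**2)

# Active parallel pair with one repair team, for comparison
a_active = (mu**2 + 2 * lam * mu) / (mu**2 + 2 * lam * mu + 2 * lam**2)
print(round(a_standby, 6), round(a_formula, 6), round(a_active, 6))
```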
K-out-of-n systems
[Figure: Units 1, 2 and 3 in a k-out-of-n configuration]
Assuming constant failure rate (exponential time to failure) and constant repair rate (exponential time to repair), where $\mu = 1/\mathrm{MTTR}$
Steady-state availability*), e.g. n = 3, k = 2 (single repair team)
$$A = \frac{\mu^3 + 3\lambda\mu^2}{\mu^3 + 3\lambda\mu^2 + 6\lambda^2\mu + 6\lambda^3}$$
For general n and k (each unit with its own repair team)
$$A = \frac{1}{(1 + \lambda/\mu)^n}\sum_{i=0}^{n-k}\binom{n}{i}\left(\frac{\lambda}{\mu}\right)^i$$
*) Assuming $\lambda_1 = \lambda_2 = \lambda_3 = \lambda$, $\mu_1 = \mu_2 = \mu_3 = \mu$.
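A Python sketch of k-out-of-n availability (system up while at least k of n identical, always-active units are up) under two repair assumptions. With one repair team, a birth-death chain on the number of failed units reproduces $(\mu^3 + 3\lambda\mu^2)/(\mu^3 + 3\lambda\mu^2 + 6\lambda^2\mu + 6\lambda^3)$ for n = 3, k = 2. With one team per unit, the units are independent and the binomial sum applies. Function names and rates are hypothetical.

```python
from math import comb


def k_out_of_n_independent(n: int, k: int, lam: float, mu: float) -> float:
    """At least k of n units up; each unit has its own repair team."""
    x = lam / mu
    return sum(comb(n, i) * x**i for i in range(n - k + 1)) / (1 + x) ** n


def k_out_of_n_single_team(n: int, k: int, lam: float, mu: float) -> float:
    """Same system with one repair team: birth-death chain on failed units."""
    weights = [1.0]
    for j in range(n):   # all n - j working units stay energised and can fail
        weights.append(weights[-1] * (n - j) * lam / mu)
    return sum(weights[: n - k + 1]) / sum(weights)


lam, mu = 0.01, 0.5   # hypothetical rates per hour
closed_single = (mu**3 + 3 * lam * mu**2) / (mu**3 + 3 * lam * mu**2 + 6 * lam**2 * mu + 6 * lam**3)
print(round(k_out_of_n_single_team(3, 2, lam, mu), 6), round(closed_single, 6))
print(round(k_out_of_n_independent(3, 2, lam, mu), 6))
```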
Example: ship missile system
One radar is in standby. All units have a 2 hr repair time.
Q: find the steady-state availability of the system, excluding the missiles and disregarding switching failure
A: With $\mu = 1/2 = 0.5$ per hour, the availability of the radar system (standby pair, $\lambda = 0.001$) is
$$A_{radar} = \frac{\mu^2 + \lambda\mu}{\mu^2 + \lambda\mu + \lambda^2} = \frac{0.5^2 + 0.5 \times 0.001}{0.5^2 + 0.5 \times 0.001 + 0.001^2} = 0.999996$$
The availability of the launch and guidance system is
$$A_{LG} = \frac{\mu}{\lambda + \mu} = \frac{0.5}{0.5 + 0.0013} = 0.9974$$
The system availability is
$$A_{sys} = A_{radar}\,A_{LG} = 0.999996 \times 0.9974 = 0.997396$$
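A Python check of the ship missile example, using the rates that appear in the worked answer ($\lambda_{radar} = 0.001$, $\lambda_{LG} = 0.0013$, $\mu = 0.5$ per hour). Carrying $A_{LG}$ unrounded gives a system availability of about 0.997403; the 0.997396 on the slide comes from multiplying by the rounded 0.9974.

```python
lam_radar, lam_lg = 0.001, 0.0013   # failure rates per hour, as used in the example
mu = 1 / 2                          # 2 h repair time -> 0.5 repairs per hour

a_radar = (mu**2 + lam_radar * mu) / (mu**2 + lam_radar * mu + lam_radar**2)  # standby pair
a_lg = mu / (lam_lg + mu)                                                     # single unit
a_sys = a_radar * a_lg                                                        # series
print(round(a_radar, 6), round(a_lg, 4), round(a_sys, 6))  # 0.999996 0.9974 0.997403
```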
Monte Carlo simulation
Benefits
The designer can be confident that the system has
specified reliability for the drift of component
characteristics, provided all the analytical results are
inside specifications;
It is suitable for computerized design;
Any probability distribution can be simulated;
Simulated results are usually near to optimum;
No complex mathematical treatments are needed.
An advantage of Monte Carlo simulation is that the
events in the RBD do not have to be combined
analytically, since the simulation itself takes into account
whether each block is failed or functional
Key elements
Identification of the probability distribution for each
design parameter;
Identification of random variable generation for design
parameters based on the given probability distribution by
computer;
Identification of the probability distribution, its mean and
variance of system performance by simulation.
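The key elements above can be sketched in Python for a two-block series RBD: exponential up- and down-time distributions are chosen for each block (an assumption), random state histories are generated, and the system availability is estimated by sampling random instants and checking whether every block is up. The (MTTF, MTTR) pairs and helper names are hypothetical; the estimate is compared with the analytical product of block availabilities.

```python
import bisect
import random


def state_changes(mttf: float, mttr: float, horizon: float, rng: random.Random) -> list:
    """Times at which one block switches state, starting 'up' at t = 0."""
    changes, t, up = [], 0.0, True
    while t < horizon:
        t += rng.expovariate(1.0 / mttf if up else 1.0 / mttr)
        changes.append(t)
        up = not up
    return changes


def is_up(changes: list, t: float) -> bool:
    """An even number of switches up to time t means the block is up."""
    return bisect.bisect_right(changes, t) % 2 == 0


def series_availability_mc(components, horizon=200_000.0, samples=50_000, seed=1):
    """Estimate steady-state availability of blocks in series by random sampling."""
    rng = random.Random(seed)
    histories = [state_changes(mttf, mttr, horizon, rng) for mttf, mttr in components]
    hits = 0
    for _ in range(samples):
        t = rng.uniform(0.1 * horizon, horizon)   # skip the initial transient
        if all(is_up(h, t) for h in histories):
            hits += 1
    return hits / samples


blocks = [(100.0, 5.0), (200.0, 10.0)]   # hypothetical (MTTF, MTTR) pairs in hours
estimate = series_availability_mc(blocks)
analytic = (100.0 / 105.0) * (200.0 / 210.0)
print(round(estimate, 3), round(analytic, 3))
```

The simulation never combines the blocks analytically; changing `all(...)` to another success rule (for example, any block up for a parallel RBD) is enough to model a different structure.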
Limitations
Mathematical models for simulation are required;
All the system components need to be included in
order to obtain reasonable analytical results;
A large number of replicas of the system are
simulated.
Example:
[Figure: failure and repair logic for a typical system]
[Figure: simulation results of depot stock levels]
Markov model
A type of state-space analysis technique
To determine system availability performance with
probability of state transitions from failed state to
operating state and vice versa
A component in a system is assumed to be in either a
failed or a functioning state
Probabilities of failure and of returning to an available state
are of interest
Particularly useful for maintained systems, for which an RBD
may not be directly applicable
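A minimal Python sketch of the Markov approach for a 1-out-of-2 active redundant system with identical units: states 0 (both up), 1 (one up) and 2 (both down), with a single repair team assumed. The Kolmogorov forward equations are integrated with a classic fourth-order Runge-Kutta scheme from P(0) = (1, 0, 0); the instantaneous availability is P0(t) + P1(t), which settles at $(\mu^2 + 2\lambda\mu)/(\mu^2 + 2\lambda\mu + 2\lambda^2)$. The rates and step size are hypothetical.

```python
def derivatives(p, lam, mu):
    """dP/dt for states 0 (both up), 1 (one up), 2 (both down); one repair team."""
    p0, p1, p2 = p
    d0 = -2 * lam * p0 + mu * p1
    d1 = 2 * lam * p0 - (lam + mu) * p1 + mu * p2
    d2 = lam * p1 - mu * p2
    return (d0, d1, d2)


def instantaneous_availability(t_end, lam, mu, dt=0.01):
    """Integrate from P(0) = (1, 0, 0) with classic RK4; return P0 + P1 at t_end."""
    p = (1.0, 0.0, 0.0)
    for _ in range(round(t_end / dt)):
        k1 = derivatives(p, lam, mu)
        k2 = derivatives(tuple(x + dt / 2 * k for x, k in zip(p, k1)), lam, mu)
        k3 = derivatives(tuple(x + dt / 2 * k for x, k in zip(p, k2)), lam, mu)
        k4 = derivatives(tuple(x + dt * k for x, k in zip(p, k3)), lam, mu)
        p = tuple(x + dt / 6 * (a + 2 * b + 2 * c + d)
                  for x, a, b, c, d in zip(p, k1, k2, k3, k4))
    return p[0] + p[1]


lam, mu = 0.01, 0.5   # hypothetical rates per hour
for t in (1.0, 5.0, 20.0, 100.0):
    print(t, round(instantaneous_availability(t, lam, mu), 6))
```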
Example (1-out-of-2 active redundant system), AS IEC 61165
When the two components are identical
The solution
The instantaneous availability, $A_{S0}(t)$, is [expression not recovered]
The unavailability for some specific $\lambda$ and $\mu$ is shown [figure not recovered]
SUMMARY
Inherent
Achieved
Operational