
Mulugeta D. Abera

8. RELIABILITY, MAINTAINABILITY AND AVAILABILITY


In discussing the concepts of reliability and maintainability, two terms must be understood
precisely. These are: -

Failure: - defined as non-conformance to some defined performance criterion.


Quality: - defined as conformance to specification.

Based on the definitions of failure and quality, we are now in a position to define the two most
important terms in maintenance.
Reliability: - is defined as the probability that an item will perform a required function under
stated conditions for a stated period of time.
Maintainability: - is defined as the probability that a failed item will be restored to operational
effectiveness within a given period of time when the repair action is performed in accordance
with prescribed procedures. It is also defined as the probability of repair in a given time.

Why do we need to talk of reliability and maintainability? The following are among the reasons.
a) Reliability
- Determines frequency of repair
- Fixes spares requirements
- Determines loss of revenue/customer satisfaction
b) Maintainability
- Affects training, test equipment, manpower

8.1 - REASONS FOR INTEREST IN THE CONCEPTS OF RELIABILITY AND MAINTAINABILITY (R AND M)
The major reasons for the interest in R and M are the following: -
Complexity: - As machinery, equipment or instruments become more complex, failures and their
causes become more intrinsic to the system. Such failures may not result from a clearly definable
failure of a component part. The likely causes of failure can be: -
- A combination of drift conditions or
- Unforeseen characteristics of components

Hence failures are more difficult to diagnose and are less likely to be foreseen by the designer.
Therefore reliability and maintainability concepts should be incorporated to lessen failure
occurrences and enhance system performance.
Mass production: - This requires a very high degree of control over material procurement,
manufacture, assembly, engineering changes etc. Along with the labor involved, these items
require sophisticated systems of control and good quality assurance techniques to prevent
manufacturing related failures.
Cost and Tolerances: - A product is designed to meet a production cost objective, which imposes
severe restrictions. These restrictions in turn lead to the calculation of tolerance margins which
must satisfy the requirements. Thus the probability of tolerance related failure in the field is
increased.
Maintenance: - Field diagnosis and repair costs are much greater than those incurred in the
factory. As a result reductions in failure rate and repair time justify a reasonable investment,
hence the interest in reliability and maintainability.
Taken together, these reasons make reliability and maintainability factors which have to be
considered properly during design, manufacture and operation.

8.2 - ACTIVITIES INVOLVED IN ACHIEVING GOOD RELIABILITY AND MAINTAINABILITY
There are three main stages of activity that result in good R and M of a system.
Design stage: - At this stage many parameters need to be brought together to introduce reliability
and maintainability features into the system. The following need to be considered adequately in
order to introduce good R and M.
- Reduction of complexity
- Use of standard and proven methods
- Duplication of modules to increase fault tolerance
- Derating: the practice of using components of a higher stress rating than the minimum
requirement
- Prototype testing
- Subsequent feedback of information into the design
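The benefit of duplicating modules can be sketched numerically. The example below assumes identical, independent modules in active parallel, so that the system works as long as at least one module works. The function name and the module reliability of 0.90 are illustrative assumptions, not values taken from this chapter.

```python
def parallel_reliability(module_reliability: float, copies: int) -> float:
    """Reliability of `copies` identical, independent modules in active parallel.

    The system fails only if every copy fails, so R_sys = 1 - (1 - R)**n.
    """
    if not 0.0 <= module_reliability <= 1.0:
        raise ValueError("module_reliability must lie in [0, 1]")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 - (1.0 - module_reliability) ** copies


if __name__ == "__main__":
    # Assumed module reliability of 0.90 over the mission time.
    for n in (1, 2, 3):
        print(n, round(parallel_reliability(0.90, n), 4))
```

Under these assumptions each added copy reduces the probability of system failure by a factor of (1 - R), which is why duplication is listed as a design-stage measure.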

Manufacture stage: - the following considerations have direct effect on failure rate and should
be well accounted for during manufacture
- Control of materials, methods etc
- Control of work standards
- Assembling and commissioning procedures
Operation stage: - In this stage the following guidelines should be observed carefully
- Following adequate operating and maintenance instructions
- Use of preventive maintenance
- Feedback of accurate failure information to design and manufacture
A wide spectrum of engineering and management activities is involved in achieving good R and
M. It should be noted that R and M cannot be added after the design and manufacture stages by
inspection and testing. Good reliability and maintainability features should be incorporated in the
design and manufacturing stages. No amount of calculation, inspection, prediction etc. will
ever enhance reliability and maintainability above the level fixed at the design stage.
The activities undertaken to introduce good R and M during the design, manufacturing and
operation periods are interdependent, as shown in the figure below.
As can be observed, the design reliability is never achieved in practice because what is achieved
in manufacture, operation and servicing always falls short of the theoretical design.

Figure 8.1 Interdependence of design, manufacture and operation activities



8.3 - INTER-DEPENDENCE OF RELIABILITY AND MAINTAINABILITY


Reliability and Maintainability are interdependent for three basic reasons
1. Design and assurance activities required to achieve R and M in many cases are the same
2. Maintainability is a parameter that greatly contributes to the reliability of a system. For
example a system whose reliability is partly dependent on a degree of duplication
(redundancy) is more reliable if the repair time (i.e. maintainability) of the failed
redundant unit is improved.
3. Both R and M contribute to the overall availability of the system. Availability is achieved
by a combination of the two parameters and a tradeoff between them.
The costs of R and M, and the relation between these costs and the cost of maintenance, form a
complex interaction about which it is difficult to generalize. Money spent on maintainability
reduces repair time, which in turn enhances reliability. Improved reliability reduces maintenance
costs, whereas money spent on preventive maintenance may enhance reliability.
This interdependence between reliability, maintainability and maintenance is shown in the
diagram below.

Figure 8.2 Interrelationship of system effectiveness and costs



8.4 - RELIABILITY
With increasing complexity of equipment, the consequences of failure have become more
expensive. While the repair or replacement of faulty equipment may involve unexpected costs,
its unavailability when needed may have even more serious consequences, not to mention the
potentially catastrophic nature of some failures. This has led to the concept of reliability.
Reliability is defined as the probability that a device will perform its intended function for a
specified period of time under stated conditions. The terms used in this definition need some
attention.
- The term intended function used to describe equipment performance makes it
possible to identify what constitutes non-conformance (failure) of the equipment.
- Performance under stated conditions refers to operational and environmental
conditions or stresses that the equipment may experience during its useful lifetime.
Operational conditions vary from one piece of equipment to another, so it is important
that these conditions be fully specified.
- The definition of reliability involves a time constraint which is not unusual. No
product lasts forever; therefore its reliability under fully specified conditions of use
should be defined in terms of time.
Careful consideration of reliability and maintainability factors at the design stage helps in
predicting the expected life of a facility/plant, the availability of a facility/plant and the expected
maintenance workload.

8.5 - FAILURE AND FAILURE FEATURES


A comprehensive definition of failure is non-conformance to some defined performance criterion.
What constitutes non-conformance in one case may not be considered a failure in another
situation, so failure must be defined with care.
For the different non-conformance conditions, words like defect, malfunction, fault, failure and
reject should be defined. The definitions of these terms include and exclude failures by cause,
degree or use. Given a specific definition of failure, there is no ambiguity in the definitions of
quality and reliability.
The classifications of failure are indicated as: -
By cause

- Production related failures


- Stress related failures
- Misuse failure
- Inherent weakness failure
- Wear-out failure
- Maintenance induced failure
By suddenness
- Immediate failure
- Gradual degradation failure
By degree
- Catastrophic failure
- Intermittent failure
- Partial failure
By result
- Critical failure
- Major failure
- Minor failure
By definition
- Applicable to the specification
- Not applicable
If the reliability of a system is to be determined, it is necessary to determine the failure rate
of the item. The definition of failure mode totally determines the system reliability and
dictates the data required at the component level.

8.6 - FUNCTIONAL FAILURE


To be functional, equipment has to fulfill all the functions set by the user and must also satisfy
the performance standard. The functions can be divided into two categories: primary and
secondary functions.
Primary functions: - These include functions such as speed, output, product quality etc.,
which are the functions for which the equipment was procured. The primary functions are the main
reasons for the existence of the equipment.

Secondary functions: - These include functions like safety, control, operational efficiency,
compliance with environmental regulations etc. These are the functions which the equipment is
expected to fulfill in addition to the primary functions.

Any incident that prevents the equipment from meeting the performance standard is regarded as
some kind of failure. When the equipment is unable to fulfill a function to a standard of operation
set by the user, the equipment is in a state of failure, or there is a functional failure.
The performance standard used to define a functional failure is set by the user. To determine
this performance standard there should be a common understanding among the various people
involved, not only maintenance personnel. Setting the standard is a question in which many
should participate.
For example, consider a leak problem in a hydraulic system. The safety officer has his own
definition of failure, and so do the maintenance man and the equipment operator/production
manager. This is shown diagrammatically below. When do we say the system has failed? Is it
when, due to leakage, a pool of oil has formed around the machine/equipment? Or is it when
the consumption of oil increases or the equipment stops functioning properly? To answer this
question a standard of performance has to be set logically.

Figure 8.3 Performance standards related to oil leakage

Points to be considered in setting performance standards (PS)


- PS must be clearly established before the failure occurs
- PS used to define failure must be set by operation (production) and maintenance
personnel working together.
- PS defines proactive responses required to avoid failure
8.7 - FAILURE MODE
A failure mode is any event which is likely to cause a functional failure of equipment. The failure
mode shows the connection between the failed state and the events which cause it. There are
three types of failure modes.
Falling capacity: - When the capacity of the equipment falls below the desired
performance, it is called falling capacity of equipment. The main causes of a decrease in
capacity are:
- Deterioration due to wear and tear
- Dirt
- Disassembly (falling apart)
- Human error
Increase in desired performance: - When the desired performance rises above the initial capacity
of the equipment, the equipment fails. The main causes of an increase in desired performance
are:
- Sustained, deliberate overloading
- Sustained, unintentional overloading
- Sudden, unintentional overloading
- Incorrect process materials which are out of the specifications
Initial incapacity: - Equipment that is incapable of doing what it is expected to do from the
outset has initial incapacity, and is unfit for operation.
8.8 - FAILURE EFFECTS
Failure effects describe what happens when a failure mode occurs. The following must be
considered to describe the effects of a failure
Evidence of failure:
Is the failure obvious to operating crew?
Is the failure accompanied by evident physical effects?
Does the equipment stop functioning due to failure?
Safety and environment hazards:
Is it possible that someone could get hurt?
Is there a breach of environmental regulations and standards?
Production effects:
Is process/service stoppage caused?
How is production/service affected?
How long is the downtime associated with the failure?
Secondary effects:
How is product quality affected?
Is customer service and satisfaction affected?
What is the increase in the operating cost?
What secondary damages are caused?
Corrective action:
What must be done to repair the failure?
What resources are required for the repair?
To make a comprehensive failure mode and effects analysis one needs information about
the modes and effects, which is obtained from various sources including:
- The manufacturer/supplier of the equipment
- Other users of the equipment
- People who operate and maintain the equipment
8.9 - THE WHOLE LIFE EQUIPMENT FAILURE PROFILE (THE BATHTUB CURVE)
In terms of failure, the whole life of equipment can be divided into three distinct periods:
a) Infant mortality period or early failure
b) Useful life period
c) Wear-out period
During the infant mortality period the failure rate is high owing to the presence of weak and
substandard components. As these components drop out one by one, the failure rate keeps
decreasing until a relatively low, more or less constant level is reached at time t1. Time t1 is the
beginning of the useful life period. In the time interval between t1 and t2, known as the useful
life period, only random failures occur, which are unpredictable and cannot be prevented. The
wear-out period, beginning at time t2, is characterized by a rapidly rising failure rate as more
and more components break down.

The failure rate curve commonly known as the bathtub curve is the sum of three separate
overlapping failure rate distributions known as burn-in (early failure), random failure and
wear-out failure.
- The decreasing failure rate, known as early failure, infant mortality or burn-in, is
usually related to poor manufacturing, start-up, assembly, storage and quality control.
- The constant failure rate, known as useful life or random failure, is stress related.
- The increasing failure rate, known as wear-out, is due to damage-causing wear
processes.
The superposition of these failure rates results in the curve commonly known as the bathtub
curve.
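The superposition of the three failure rates can be illustrated with a short sketch. The chapter does not say which distributions make up the curve; the example assumes, purely for illustration, that each contribution has a Weibull failure rate (shape below 1 for early failures, equal to 1 for random failures, above 1 for wear-out). All parameter values and function names are hypothetical.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull failure rate h(t) = (shape/scale) * (t/scale)**(shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def bathtub_rate(t: float) -> float:
    """Sum of early, random and wear-out failure rates at time t (hours).

    Every parameter value here is an illustrative assumption.
    """
    early = weibull_hazard(t, shape=0.5, scale=2000.0)      # decreasing with time
    random_ = weibull_hazard(t, shape=1.0, scale=10000.0)   # constant, 1e-4 per hour
    wear_out = weibull_hazard(t, shape=5.0, scale=20000.0)  # increasing with time
    return early + random_ + wear_out


if __name__ == "__main__":
    for t in (10, 100, 1000, 5000, 10000, 20000, 30000):
        print(f"t = {t:>6} h  failure rate = {bathtub_rate(t):.6f} per h")
```

With these assumed parameters the printed rate is high at first, falls to a low and nearly flat level, and rises again late in life, which is the bathtub shape.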

Figure 8.4 The bathtub curve


Reasons for burn-in failures
- Inadequate quality control
- Inadequate manufacturing methods
- Substandard materials and workmanship
- Wrong startup and installation
- Inadequate processes and human error
- Inadequate handling methods
Reasons for useful life failures
- Unexplainable causes

- Human error, abuse, natural failures


- Undetectable failures
- Low safety factors
- Higher random stress than expected
Reasons for wear-out failures
- Inadequate maintenance
- Wear due to friction
- Wear due to aging
- Wrong overhaul practices
- Corrosion failures
8.10 - DOWN TIME AND REPAIR TIME
Failure effect descriptions should help in deciding on the operational and non-operational
consequences of failure when undertaking repair. These consequences are usually expressed as the
amount of downtime and repair time. Although downtime and repair time overlap greatly, they
are not the same. To distinguish between them, the following terms must be understood.
- Mean down time
- Repair time
- Realization time
- Access time
- Diagnostic time
- Spare procurement time
- Replacement time
- Checkout time
- Alignment time
- Logistic time
- Administrative time
8.11 - FAILURE RATE AND MEAN TIME BETWEEN FAILURES
Failure rate:
Assume a batch of N items out of which a number K have failed at time t. The total cumulative
time T can be evaluated in one of the following ways.
i) If it is assumed that each failed item is replaced as it fails, the cumulative time is

$T = N\,t$

ii) If items are not replaced as they fail (the non-replacement condition), the cumulative
time is given by

$T = t_1 + t_2 + \dots + t_K + (N - K)\,t$

Where $t_i$ = time of occurrence of the ith failure


Definition
For a stated period in the life of an item, the ratio of the total number of failures to the total
cumulative observed time is defined as the observed failure rate.

$\hat{\lambda} = \dfrac{K}{T}$

Where $\hat{\lambda}$ is the failure rate of the N items observed


The true failure rate of an item could only be determined from observations made over an infinite
time, which is practically impossible, so the observed failure rate gives an estimate of the actual
failure rate.

$\hat{\lambda}$ is expressed in
- Percentage failures per 1000 h
- Failures/h
- Failures/$10^6$ h
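The two ways of accumulating time and the resulting observed failure rate can be sketched as follows. The sketch assumes the cumulative time is N times t when failed items are replaced, and the sum of the individual failure times plus (N - K) times t when they are not. The function names and the test data are hypothetical.

```python
from typing import Sequence


def cumulative_time_with_replacement(n_items: int, t: float) -> float:
    """T = N * t: each failed item is replaced at once, so N items are always running."""
    return n_items * t


def cumulative_time_without_replacement(
    n_items: int, t: float, failure_times: Sequence[float]
) -> float:
    """T = t_1 + ... + t_K + (N - K) * t: failed items stop accumulating time."""
    k = len(failure_times)
    if k > n_items or any(ti > t for ti in failure_times):
        raise ValueError("failure times must belong to the batch and not exceed t")
    return sum(failure_times) + (n_items - k) * t


def observed_failure_rate(n_failures: int, cumulative_time: float) -> float:
    """Observed failure rate = K / T."""
    return n_failures / cumulative_time


if __name__ == "__main__":
    failures = [120.0, 340.0, 610.0]  # hypothetical failure times in hours
    T = cumulative_time_without_replacement(10, 1000.0, failures)
    rate = observed_failure_rate(len(failures), T)
    print(T)           # 8070.0
    print(rate * 1e6)  # the same rate expressed in failures per 10^6 h
```

Multiplying the rate per hour by 10^6 only changes the unit in which the same observed value is reported.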
Mean Time between Failures (MTBF)
Definition
For a stated period in the life of an item, the mean value of the length of time between
consecutive failures, computed as the ratio of the total cumulative observed time to the total
number of failures, is defined as the MTBF.

$\hat{\theta} = \dfrac{T}{K}$

Where $\hat{\theta}$ is the observed MTBF. Applying the above equation, it can be seen that

$\hat{\theta} = \dfrac{1}{\hat{\lambda}}$

MTBF is the average of the values of (t), the times between consecutive failures.

Figure 8.5 Mean time between failures


Mean time to failures (MTTF)
For a stated period in the life of an item the ratio of cumulative time to the total number of
failures is defined as MTTF.

$\text{MTTF} = \dfrac{T}{K}$

The difference between MTBF and MTTF is in their usage


- MTTF is applied to items that are not repairable
- MTBF is applied to items that are repairable
- MTBF excludes downtime, therefore it is the mean up-time between failures
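The point that MTBF counts only up-time can be made concrete with a small sketch. It assumes a hypothetical operating log for one repairable item, in which each entry records when the item started running and when it next failed. The gaps between entries are repair downtime and are left out of the total.

```python
from typing import Sequence, Tuple


def mtbf_from_log(up_intervals: Sequence[Tuple[float, float]]) -> float:
    """MTBF = total up-time / number of failures.

    Each tuple is (start_of_operation, time_of_failure) in hours. The gaps
    between tuples are downtime and are deliberately not counted.
    """
    if not up_intervals:
        raise ValueError("at least one failure is needed to estimate MTBF")
    total_up_time = sum(failed - started for started, failed in up_intervals)
    return total_up_time / len(up_intervals)


if __name__ == "__main__":
    # Hypothetical log: the item runs, fails, is repaired, then runs again.
    log = [(0.0, 400.0), (420.0, 900.0), (930.0, 1250.0)]
    print(mtbf_from_log(log))  # 400.0
```

Dividing the full elapsed time of 1250 h by three failures would give a larger figure, because it would wrongly include the 50 h of repair downtime.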
Mean life
Definition
Mean life is defined as the mean of the times to failure where each item is allowed to fail.
MTBF and MTTF can be calculated over any period of time, whereas mean life must include the
failure of every item.

8.12 - RELIABILITY FUNCTION


Consider the probability of an item failing in the interval between t and t+dt. Given the failure
rate $\lambda(t)$, the probability that the item fails in the interval t to t+dt, provided it has
survived until time t, is given by the conditional probability

$P(B \mid A) = \lambda(t)\,dt$

Where A is survival up to time t, with survival probability $P(A)$ given by the reliability
$R(t)$, and B is the item failing between time t and t+dt.
The unconditional probability of failure in the interval t to t+dt is $f(t)\,dt$, where $f(t)$ is
the failure probability density function. This probability is obtained by the multiplication
theorem, which states that

$P(AB) = P(A)\,P(B \mid A)$

$P(AB) = f(t)\,dt$ is given by

$f(t)\,dt = R(t)\,\lambda(t)\,dt$

Therefore

$f(t) = \lambda(t)\,R(t)$

From which we obtain the failure rate to be

$\lambda(t) = \dfrac{f(t)}{R(t)}$

The probability that the item fails between the running times 0 and t is given by

$F(t) = \int_0^t f(t)\,dt = 1 - R(t)$

Differentiating both sides of the above equation yields

$f(t) = -\dfrac{dR(t)}{dt}$

Substituting for $f(t)$ in the equation for the failure rate we get

$\lambda(t) = -\dfrac{1}{R(t)}\,\dfrac{dR(t)}{dt}$

Integrating both sides yields

$-\int_0^t \lambda(t)\,dt = \int_1^{R(t)} \dfrac{dR(t)}{R(t)}$

Where $\lambda(t)$ is integrated with respect to time from 0 to t and $1/R(t)$ is integrated with
respect to $R(t)$.
When $t = 0$, $R(t) = 1$, i.e. the item is 100% reliable; at time t the reliability is $R(t)$.
Hence the interval of integration for $1/R(t)$ is between 1 and R(t).

Integrating the above equation yields

$-\int_0^t \lambda(t)\,dt = \ln R(t)\,\Big|_1^{R(t)} = \ln R(t)$

From which the reliability function is obtained to be

$R(t) = \exp\left(-\int_0^t \lambda(t)\,dt\right)$

Assuming constant failure rate

$R(t) = e^{-\lambda t}$

Or, writing MTBF for $1/\lambda$,

$R(t) = e^{-t/\text{MTBF}}$
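The reliability function can be checked numerically. The sketch below assumes the general result that R(t) is the exponential of minus the integral of the failure rate from 0 to t, and evaluates that integral with the trapezoidal rule. The function name, the step count and the example failure rates are illustrative assumptions.

```python
import math
from typing import Callable


def reliability(failure_rate: Callable[[float], float], t: float, steps: int = 10_000) -> float:
    """R(t) = exp(-integral from 0 to t of lambda(u) du), using the trapezoidal rule."""
    h = t / steps
    area = 0.5 * (failure_rate(0.0) + failure_rate(t))
    area += sum(failure_rate(i * h) for i in range(1, steps))
    return math.exp(-area * h)


if __name__ == "__main__":
    lam = 1e-4  # assumed constant failure rate, failures per hour
    print(reliability(lambda u: lam, 1000.0), math.exp(-lam * 1000.0))

    # Assumed linearly increasing failure rate, as in a wear-out period.
    print(reliability(lambda u: 2e-8 * u, 1000.0))
```

For a constant failure rate the numerical result agrees with the closed form exp(-lambda t); for a rate that changes with time only the integral form applies.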
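To determine MTBF, consider the (N-K) items that survived at t. Let (N-K) be $N_s(t)$, then

$R(t) = \dfrac{N_s(t)}{N}$

In each interval dt the time accumulated is $N_s(t)\,dt$. At time $t = \infty$, the total time
accumulated will be $\int_0^\infty N_s(t)\,dt$.

Hence MTBF will be given by

$\text{MTBF} = \int_0^\infty \dfrac{N_s(t)}{N}\,dt$

From which one obtains the MTBF to be

$\text{MTBF} = \int_0^\infty R(t)\,dt$

For the case of constant failure rate

$\text{MTBF} = \int_0^\infty e^{-\lambda t}\,dt = \dfrac{1}{\lambda}$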
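The constant failure rate case can be verified numerically. The sketch assumes that MTBF is the integral of R(t) from zero to infinity, with R(t) = exp(-lambda t), and truncates the integral at an upper limit where R(t) is negligible. The function name, failure rate and limits are illustrative assumptions.

```python
import math
from typing import Callable


def mtbf_from_reliability(
    reliability: Callable[[float], float], upper: float, steps: int = 100_000
) -> float:
    """Approximate the integral of R(t) from 0 to `upper` with the trapezoidal rule."""
    h = upper / steps
    area = 0.5 * (reliability(0.0) + reliability(upper))
    area += sum(reliability(i * h) for i in range(1, steps))
    return area * h


if __name__ == "__main__":
    lam = 1e-3  # assumed constant failure rate, failures per hour
    estimate = mtbf_from_reliability(lambda t: math.exp(-lam * t), upper=20.0 / lam)
    print(estimate, 1.0 / lam)  # the two values should agree closely
```

Truncating at 20/lambda leaves out only the tail where R(t) is below exp(-20), so the estimate stays very close to 1/lambda.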

8.13 - MAINTAINABILITY FUNCTION

Various measures are used in maintainability analysis

- Mean time to repair (MTTR)


- Mean preventive maintenance time (MPMT)
- Mean maintenance downtime (MMD)

In addition to these measures, maintainability functions are used to predict the probability that
a repair starting at $t = 0$ will be completed in time $t$.

Determination of MTTR:

For the exponential case the MTTR is given by

$\text{MTTR} = \dfrac{1}{\mu}$

Where $\mu$ is the repair rate. In many practical applications determination of MTTR is not easy.
MTTR is the mean of the distribution of equipment repair time and can be estimated from

$\text{MTTR} = \dfrac{\sum_{i=1}^{n} \lambda_i T_i}{\sum_{i=1}^{n} \lambda_i}$

Where

- $T_i$ is the time needed to repair the equipment when the ith part fails
- $\lambda_i$ is the constant failure rate of the ith repairable part of the equipment
- $n$ is the number of repairable parts in the equipment
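A minimal sketch of the MTTR estimate follows. It assumes the estimate is the failure-rate-weighted average of the individual repair times, that is, the sum of each part's failure rate times its repair time, divided by the sum of the failure rates. The function name and the part data are hypothetical.

```python
from typing import Sequence


def weighted_mttr(failure_rates: Sequence[float], repair_times: Sequence[float]) -> float:
    """MTTR = sum(lambda_i * T_i) / sum(lambda_i) over the repairable parts."""
    if len(failure_rates) != len(repair_times) or not failure_rates:
        raise ValueError("need exactly one repair time per failure rate")
    total_rate = sum(failure_rates)
    weighted_time = sum(rate * time for rate, time in zip(failure_rates, repair_times))
    return weighted_time / total_rate


if __name__ == "__main__":
    rates = [0.002, 0.001, 0.001]  # hypothetical failures per hour for three parts
    times = [1.0, 4.0, 2.0]        # hypothetical repair hours when each part fails
    print(weighted_mttr(rates, times))  # close to 2.0 hours
```

The part that fails most often dominates the result, so shortening its repair time reduces MTTR more than shortening the repair of a rarely failing part.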

Maintainability:

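The maintainability function for any distribution is defined by

$M(t) = \int_0^t f_r(t)\,dt$

Where

$t$ = time

$M(t)$ = maintainability function (probability that a repair action will be finished by time t)

$f_r(t)$ = probability density function of the repair time

The exponential probability distribution function is widely used in maintainability work to
represent repair times. It is expressed by:

$f_r(t) = \dfrac{1}{\text{MTTR}} \exp\left(-\dfrac{t}{\text{MTTR}}\right)$

$M(t) = \int_0^t \dfrac{1}{\text{MTTR}} \exp\left(-\dfrac{t}{\text{MTTR}}\right) dt$

$M(t) = 1 - \exp\left(-\dfrac{t}{\text{MTTR}}\right)$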
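For exponentially distributed repair times the maintainability function has the closed form M(t) = 1 - exp(-t/MTTR), which the following sketch evaluates. The function name and the MTTR of 2 hours are illustrative assumptions.

```python
import math


def maintainability(t: float, mttr: float) -> float:
    """M(t) = 1 - exp(-t / MTTR) for exponentially distributed repair times."""
    if t < 0 or mttr <= 0:
        raise ValueError("t must be non-negative and mttr must be positive")
    return 1.0 - math.exp(-t / mttr)


if __name__ == "__main__":
    mttr = 2.0  # assumed mean time to repair, hours
    for t in (0.5, 1.0, 2.0, 4.0, 8.0):
        print(f"P(repair done within {t} h) = {maintainability(t, mttr):.3f}")
```

Under this model only about 63% of repairs are completed within one MTTR, which is worth remembering when an MTTR figure is used for planning.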

Mean maintenance downtime (MMD) is expressed by

$\text{MMD} = \text{MAMT} + \text{LDT} + \text{ADT}$

Where

- ADT - Administrative delay time
- LDT - Logistic delay time
- MAMT - Mean active maintenance time, or mean time needed to perform preventive
and corrective maintenance tasks

8.14 - AVAILABILITY

Availability is the proportion of time for which equipment is available for use (up-time). It is
the probability that equipment, when used under stated conditions and in an ideal support
environment, will operate satisfactorily at any given time. This is referred to as the inherent
availability or the steady state availability.

$A_{ss} = \dfrac{\mu}{\lambda + \mu}$

Where: $A_{ss}$ - steady state availability

$\mu$ - System constant repair rate

$\lambda$ - System constant failure rate

Similarly, system steady-state unavailability is defined as

$U_{ss} = 1 - A_{ss} = \dfrac{\lambda}{\lambda + \mu}$

Substituting $\lambda = 1/\text{MTBF}$ and $\mu = 1/\text{MTTR}$, the steady state availability is

$A_{ss} = \dfrac{\text{MTBF}}{\text{MTBF} + \text{MTTR}}$

In this calculation preventive maintenance downtime, supply downtime, queuing
downtime and administrative downtime are excluded.
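The steady state availability can be computed from the rates or from the mean times. The sketch below assumes the standard forms A = mu / (lambda + mu) and A = MTBF / (MTBF + MTTR), with lambda = 1/MTBF and mu = 1/MTTR. The function names and numbers are illustrative assumptions.

```python
def availability_from_rates(failure_rate: float, repair_rate: float) -> float:
    """Steady state availability A = mu / (lambda + mu)."""
    if failure_rate < 0 or repair_rate <= 0:
        raise ValueError("failure_rate must be >= 0 and repair_rate must be > 0")
    return repair_rate / (failure_rate + repair_rate)


def availability_from_times(mtbf: float, mttr: float) -> float:
    """Steady state availability A = MTBF / (MTBF + MTTR)."""
    if mtbf <= 0 or mttr < 0:
        raise ValueError("mtbf must be > 0 and mttr must be >= 0")
    return mtbf / (mtbf + mttr)


if __name__ == "__main__":
    mtbf, mttr = 1000.0, 10.0  # assumed values, hours
    a = availability_from_times(mtbf, mttr)
    print(f"availability   = {a:.4f}")
    print(f"unavailability = {1.0 - a:.4f}")
    print(f"same from rates: {availability_from_rates(1.0 / mtbf, 1.0 / mttr):.4f}")
```

Both functions give the same value because they are the same expression written in different variables. Halving MTTR has nearly the same effect on unavailability as doubling MTBF, which is the tradeoff between R and M described earlier in the chapter.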
