/ Reliability Engineering
Introduction
You’ve spent a lot of time and money specifying and installing your
production equipment. Now you want to maximize the availability for
production. But how do you keep all the advantages of the system?
How do you maintain the efficiency and reliability?
We know from studies by United Airlines and the US Navy that the
bathtub curve applies to some components, but most complex
machines fail in a random manner. There is no well-defined wear-out
period, so it makes little sense to do preventive replacement or
overhaul on them. That leaves predictive maintenance (PdM) or
condition-based maintenance (CBM). The basic premise of CBM is to detect a
potential failure in its early stages so that we have time to take action
to fix it in a planned manner. We want to manage the failure rather
than let the failure manage us.
Let’s design a CBM system for a typical motor-drive system.
Instead of picking what technologies we want to use and then trying
to apply them to our system, we’ll use what I call a machine centered
healthcare approach. We’ll find out how the machine can fail and find
tests to detect those failures.
Machine Centered Approach
A machine centered approach looks at the machine first, and by
asking a series of questions, helps you decide how to maintain the
machine’s health. What tests should be done? What routine PM
should be done? How can we make the overhaul/no-overhaul
decision? It has six steps:
Figure 1. Motor-drive System Boundary.
At first glance, you might say that a motor-drive system’s primary
function is to provide power to a shaft. In reality, its primary function
is more complex than that. Actually, it must provide a minimum
amount of torque at a specific RPM. It also may be required to
accelerate the load from stop to operating speed, control speed
ramp-up and ramp-down, and respond to control signals from a control
system. Other possible functions (Table 1) might be to maintain
power factor, minimize distortion of the source waveform or maintain
phase balance.
Don’t try to use this example without referring to your own system.
It is neither universal nor complete. You may have functions not listed
and some of the ones listed may not be applicable. Use this as a
starting point.
Note that some functions may have more than one way they can fail.
The table lists failures both upstream and downstream of the motor and
drive. Some of these may be due to the supply transformer, but they
can impact the motor-drive system.
Decide which of these failures is significant
Now that you have a list of possible failures, you want to decide
which ones you should worry about first. Limited resources mean we
can’t manage them all. Some failures are so unlikely that you won’t
think to worry about them; others have such a low consequence that
their impact and cost are minor. But both of these might be significant
over time.
The significance of a failure is the combination of two factors:
frequency and impact. A small failure that occurs often can have the
same impact as a large failure that occurs infrequently. For example,
a bearing may fail once a month (a chronic problem) because the
drive belts are too tight. The cost to replace is $2,000. Over a period
of five years, that’s $120,000. But because each individual failure is
not major, it doesn’t receive the same visibility as a single failure
costing $120,000 that only occurs once in five years (an acute
problem). But they both cost the same.
The best way to determine how often a failure occurs and its
impact is from the machinery history. Search for both chronic and
acute problems. Try to determine their costs including labor, parts and
downtime. Figure out how often they occur. Multiply cost times
frequency and rank by the result.
However, we can do it without the history. I’ve had success in the
past using a subjective evaluation. Make a list of the failures and ask two
questions about each of them: how often does this occur and what’s the
impact on production when it does? Make it up as a questionnaire.
Possible answers are listed below (Table 1. Criticality Survey). This may
sound simplistic but it works.
Table 1. Criticality Survey.
You should also ask if the failure is a safety issue. Does it have the
potential to cause injury to personnel or machinery? If so, it’s
automatically the highest priority.
Now send the questionnaires to a cross section of maintenance,
production and management personnel. When you get them back,
average the scores for each item.
By taking the score for frequency (1 to 5) and multiplying it by the
score for effect (1 to 5) you’ll get a composite score for each failure
in the range of 1 to 25. Rank the list by the composite score. The
higher the composite score, the greater the significance of the failure.
Now you have to make a judgment call: which failures should
you worry about? Any failure that presents a safety hazard
automatically goes to the top of the list. Often, only a few will have a
high rank and you can concentrate on them. Other times, most will
have a high rank. This is where your knowledge of the machine and
professional judgment come into play.
Try to avoid failures
The best solution for these potential failures is to eliminate them.
It’s probably not realistic to consider major design changes for
deployed machines but we may be able to make minor changes. I
was once involved in a case where reducing the number of belts and
the belt tension on a fan eliminated bearing failures.
This is where time-based maintenance can be effective. Periodic
and correct lubrication of bearings can eliminate failures. Replacing
the fluorescent lights in a body shop can improve customer
satisfaction by improving workmanship and color match. There also
may be tasks required by regulatory bodies.
Decide how we can get an early warning
For those failures that we can’t avoid, we ask, “How can we detect
the failure before it occurs?” Most failures show symptoms before
they happen. Find the symptoms of failure for each failure mode. A
pump may have to be run faster because of a worn impeller. A motor
may draw more amps because of misalignment or a seal that is too
tight. A coupling may be hot because of misalignment or lack of
lubrication. The delta-T across a heat exchanger may decrease as it
becomes fouled. The wall thickness of a pipe may decrease because
of flow-induced corrosion. Make a list of symptoms for each failure.
It is useful to think about the process of failing as what’s called a
PF Curve (Figure 2). This curve illustrates the relationship between
the failure symptom and time. There is a point where the symptom
has developed enough for its change to be detected with confidence.
This is called the “P” point or potential failure point. The value of the
symptom when the process is no longer capable of meeting
requirements is called the “F” point or functional failure point. Note
that the “F” point is not necessarily the point of mechanical breakup.
Figure 2. PF Curve.
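To make the P and F points concrete, here is a minimal Python sketch. It assumes a symptom that rises as the machine degrades (overall vibration level, for example) and two alarm values, p_threshold and f_threshold, that you would have to set for your own machine. The function name, the thresholds and the readings are all made up for illustration, and a single reading over the threshold is a cruder test than "detected with confidence" really implies.

```python
from typing import Optional, Sequence, Tuple


def find_pf_points(
    times: Sequence[float],
    values: Sequence[float],
    p_threshold: float,
    f_threshold: float,
) -> Tuple[Optional[float], Optional[float]]:
    """Return (p_time, f_time) for a symptom trend.

    Assumption: the symptom increases as the condition worsens.
    p_time is the first time the symptom reaches p_threshold (potential failure).
    f_time is the first time it reaches f_threshold (functional failure).
    Either value is None if that threshold is never reached.
    """
    p_time = None
    f_time = None
    for t, v in zip(times, values):
        if p_time is None and v >= p_threshold:
            p_time = t
        if v >= f_threshold:
            f_time = t
            break  # nothing after functional failure matters for this sketch
    return p_time, f_time


# Hypothetical weekly vibration readings (units are arbitrary here).
weeks = [0, 1, 2, 3, 4, 5, 6, 7]
vibration = [2.0, 2.1, 2.0, 2.6, 3.4, 4.8, 6.5, 9.0]

p_week, f_week = find_pf_points(weeks, vibration, p_threshold=2.5, f_threshold=6.0)
print("P point at week", p_week)
print("F point at week", f_week)
```

The gap between the two points (three weeks in this made-up data) is the time you have to plan the repair, so the test interval needs to be short enough to catch the symptom inside that window.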
Doing the tests without putting all the information together is not
effective. I recommend that each machine have one or two
individuals assigned to monitor its health (the machine’s personal
trainer?). They should be trained in assessing all the information
provided by the tests. Notice I didn’t say, “Trained to evaluate the
data.” They don’t have to analyze the data (vibration spectra); they
just have to understand the results (information) of that analysis.
They should receive the results of the tests along with any other
pertinent information on a regular basis. Then they can use that
information to manage the machine. They can use it to adjust
lubrication intervals and to decide when adjustments or part
replacement are needed. And that overhaul? They may decide it’s not
needed after all.
Summary
What we want to do is to maximize the effectiveness of the
technology in improving machinery reliability. We need to assess
machine health based on several measures. And we should only do
those tests and tasks that are cost effective from the point of view of
the machine. The question is how do we decide what to do? I
propose we follow a systematic process to identify that.
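One piece of that process, the criticality ranking from the survey, is simple enough to sketch in a few lines of Python. The sketch averages each respondent's frequency and effect scores (1 to 5), multiplies the averages to get the composite score, and puts any safety issue at the top of the list. The function name, the data layout and the sample failures and scores are assumptions made up for illustration, not results from a real survey.

```python
from statistics import mean
from typing import Dict, List


def rank_failures(survey: Dict[str, dict]) -> List[dict]:
    """Rank failures by composite score, safety issues first.

    survey maps a failure name to a dict with:
      "frequency": list of 1-5 scores, one per respondent
      "effect":    list of 1-5 scores, one per respondent
      "safety":    True if the failure is a safety issue
    """
    ranked = []
    for name, answers in survey.items():
        frequency = mean(answers["frequency"])  # average across respondents
        effect = mean(answers["effect"])
        ranked.append(
            {
                "failure": name,
                "composite": frequency * effect,  # falls in the range 1 to 25
                "safety": answers["safety"],
            }
        )
    # Safety issues sort first; within each group, highest composite first.
    ranked.sort(key=lambda row: (not row["safety"], -row["composite"]))
    return ranked


# Hypothetical survey results from two respondents.
survey = {
    "bearing failure": {"frequency": [4, 4], "effect": [3, 3], "safety": False},
    "winding insulation breakdown": {"frequency": [1, 2], "effect": [5, 5], "safety": False},
    "missing coupling guard": {"frequency": [1, 1], "effect": [2, 2], "safety": True},
}

for row in rank_failures(survey):
    print(row["failure"], row["composite"], "SAFETY" if row["safety"] else "")
```

The ranking is only a starting point; deciding where to draw the line between failures you manage and failures you accept is still a judgment call.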