
Lean RW Final.qxp 4/16/2007 10:09 AM Page 307

/ Reliability Engineering

Six Steps to a Healthy Machine


By: Jim Taylor of Machinery Management Solutions, Inc.

Introduction
You’ve spent a lot of time and money specifying and installing your
production equipment. Now you want to maximize the availability for
production. But how do you keep all the advantages of the system?
How do you maintain the efficiency and reliability?

We know from studies by United Airlines and the US Navy that the bathtub curve applies to some components, but most complex machines fail in a random manner. There is no well-defined wear-out period, so it makes little sense to do preventive replacement or overhaul. That leaves predictive maintenance (PdM) or condition-based maintenance (CBM). The basic premise of CBM is to detect a potential failure in its early stages so that we have time to take action to fix it in a planned manner. We want to manage the failure rather than let the failure manage us.

Let’s design a CBM system for a typical motor-drive system. Instead of picking what technologies we want to use and then trying to apply them to our system, we’ll use what I call a machine-centered healthcare approach. We’ll find out how the machine can fail and find tests to detect those failures.

Machine Centered Approach
A machine-centered approach looks at the machine first and, by asking a series of questions, helps you decide how to maintain the machine’s health. What tests should be done? What routine PM should be done? How can we make the overhaul/no-overhaul decision? It has six steps:

1. List the functions of the system.
2. List the possible failures to meet those functions.
3. Decide which of these failures are significant.
4. Decide how we can get an early warning.
5. Select a suite of tests to detect those early warning signs.
6. Collect the results of the tests at one decision point.

List the functions of the system.
To ask what the failures are, first we need to know what the machine is supposed to do. What are its primary and secondary functions? For this example, we’ll look at the system (Figure 1. Motor-drive System Boundary.) from the secondary of the supply transformer to the motor output shaft.

Figure 1. Motor-drive System Boundary.

At first glance, you might say that a motor-drive system’s primary function is to provide power to a shaft. In reality, its primary function is more complex than that: it must provide a minimum amount of torque at a specific RPM. It may also be required to accelerate the load from stop to operating speed, control speed ramp-up and ramp-down, and respond to control signals from a control system. Other possible functions (Table 1.) might be to maintain power factor, minimize distortion of the source waveform, or maintain phase balance.

2007 Conference Proceedings 307



Table 1. Motor-Drive System Functions.

Don’t try to use this example without referring to your own system.
It is neither universal nor complete. You may have functions not listed
and some of the ones listed may not be applicable. Use this as a
starting point.

List the possible failures to meet those functions
Once you’ve decided what the machine’s functions are, ask what can happen to prevent it from meeting each function (Table 2.). In the case of the motor-drive system, the answer might be the motor not starting, poor speed control, high total harmonic distortion (THD) reflected back into the power supply, or a number of other possible failures.

At this point, don’t consider whether the failure is likely or has much impact. You’re just brainstorming for a complete list.
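Steps 1 and 2 amount to building a simple register that ties each function to the ways it can fail. Below is a minimal sketch in Python that uses a plain dictionary for the register. The function and failure names paraphrase the motor-drive example above. The pairing of each failure with a particular function is an illustrative assumption and does not come from Tables 1 or 2. In keeping with the brainstorming advice, the register records failures without filtering them by likelihood or impact.

```python
# A minimal function -> functional-failure register for steps 1 and 2.
# Names paraphrase the motor-drive example; which failure belongs to
# which function is an illustrative assumption, not Table 1 or Table 2.
from collections import defaultdict

functions = [
    "Provide minimum torque at a specific RPM",
    "Accelerate the load from stop to operating speed",
    "Control speed ramp-up and ramp-down",
    "Respond to control signals",
    "Minimize distortion of the source waveform",
]

# Brainstorming stage: record every failure, with no likelihood or impact filter yet.
failures = defaultdict(list)
failures["Accelerate the load from stop to operating speed"].append("Motor does not start")
failures["Control speed ramp-up and ramp-down"].append("Poor speed control")
failures["Minimize distortion of the source waveform"].append(
    "High THD reflected back into the power supply"
)


def functions_without_failures(funcs, register):
    """Return the functions for which no failure has been brainstormed yet."""
    return [f for f in funcs if not register.get(f)]


if __name__ == "__main__":
    for func in functions:
        print(func, "->", failures.get(func, []))
    print("Still to brainstorm:", functions_without_failures(functions, failures))
```

A function with no failures listed against it is a prompt to keep brainstorming. It is not evidence that the function cannot fail.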

Table 2. Functional failures.


Note that some functions may fail in more than one way. The table lists failures both upstream and downstream of the motor and drive. Some of these may be due to the supply transformer, but they can still affect the motor-drive system.

Decide which of these failures are significant
Now that you have a list of possible failures, you want to decide which ones to worry about first. Limited resources mean we can’t manage them all. Some failures are so unlikely that you won’t think to worry about them. Others have such a low consequence that their impact and cost are minor. But both of these might be significant over time.

The significance of a failure is the combination of two factors: frequency and impact. A small failure that occurs often can have the same impact as a large failure that occurs infrequently. For example, a bearing may fail once a month (a chronic problem) because the drive belts are too tight. The cost to replace it is $2,000. Over a period of five years (60 failures), that’s $120,000. Because each individual failure is not major, it doesn’t receive the same visibility as a single $120,000 failure that occurs only once in five years (an acute problem). Yet they both cost the same.

The best way to determine how often a failure occurs and what its impact is comes from the machinery history. Search for both chronic and acute problems. Try to determine their costs, including labor, parts and downtime. Figure out how often they occur. Multiply cost by frequency and rank by the result.

However, we can do it without the history. I’ve had success in the past using a subjective evaluation. Make a list of the failures and ask two questions about each of them: How often does this occur, and what’s the impact on production when it does? Set it up as a questionnaire. Possible answers are shown below (Table 1. Criticality Survey.). This may sound simplistic, but it works.

Table 1. Criticality Survey.

You should also ask if the failure is a safety issue. Does it have the potential to cause injury to personnel or damage to machinery? If so, it’s automatically the highest priority.

Now send the questionnaires to a cross section of maintenance, production and management personnel. When you get them back, average the scores for each item.

Multiply the score for frequency (1 to 5) by the score for effect (1 to 5) to get a composite score for each failure in the range of 1 to 25. Rank the list by the composite score. The higher the composite score, the greater the significance of the failure.

Now you have to make a judgment call: which failures should you worry about? Any failure that presents a safety hazard automatically goes to the top of the list. Often, only a few will have a high rank and you can concentrate on them. Other times most will have a high rank. This is where your knowledge of the machine and your professional judgment come into play.

Try to avoid failures
The best solution for these potential failures is to eliminate them. It’s probably not realistic to consider major design changes for deployed machines, but we may be able to make minor changes. I was once involved in a case where reducing the number of belts and the belt tension on a fan eliminated bearing failures.

This is where time-based maintenance can be effective. Periodic and correct lubrication of bearings can eliminate failures. Replacing the fluorescent lights in a body shop can improve customer satisfaction by improving workmanship and color match. There may also be tasks required by regulatory bodies.

Decide how we can get an early warning.
For those failures that we can’t avoid, we ask, “How can we detect the failure before it occurs?” Most failures show symptoms before they happen. Find the symptoms of failure for each failure mode. A pump may have to be run faster because of a worn impeller. A motor may draw more amps because of misalignment or a seal that is too tight. A coupling may be hot because of misalignment or lack of lubrication. The delta-T across a heat exchanger may decrease as it becomes fouled. The wall thickness of a pipe may decrease because of flow-induced corrosion. Make a list of symptoms for each failure.

It is useful to think about the process of failing as what’s called a PF Curve (Figure 2). This curve illustrates the relationship between the failure symptom and time. At some point the symptom has developed enough for its change to be detected with confidence. This is called the “P” point, or potential failure point. The value of the symptom when the process is no longer capable of meeting requirements is called the “F” point, or functional failure point. Note that the “F” point is not necessarily the point of mechanical breakup.
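Returning to step 3, the two ranking methods described there can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool. The failure names, survey scores and safety flag are hypothetical. The only figures taken from the text are those of the chronic and acute cases, which reuse $2,000 per month and $120,000 once in five years.

The history-based ranking multiplies cost by frequency. The survey-based ranking works in four steps:

1. Average each respondent’s 1-to-5 scores for frequency and for effect.
2. Multiply the two averages into a composite score between 1 and 25.
3. Force any safety hazard to the top of the list.
4. Rank everything else by composite score.

```python
# Sketch of the two significance rankings from step 3.
# All names and numbers are hypothetical except the chronic/acute
# example, which reuses the $2,000/month and $120,000/5-year figures.
from dataclasses import dataclass
from statistics import mean


def annual_cost(cost_per_failure: float, failures_per_year: float) -> float:
    """History-based significance: cost times frequency."""
    return cost_per_failure * failures_per_year


# Chronic: $2,000 failure every month. Acute: $120,000 once in five years.
chronic = annual_cost(2_000, 12)       # 24,000 per year
acute = annual_cost(120_000, 1 / 5)    # also 24,000 per year


@dataclass
class SurveyItem:
    failure: str
    frequency_scores: list[int]  # 1 (rare) .. 5 (frequent), one per respondent
    effect_scores: list[int]     # 1 (minor) .. 5 (severe), one per respondent
    safety_hazard: bool = False

    def composite(self) -> float:
        """Average each question across respondents, then multiply (range 1..25)."""
        return mean(self.frequency_scores) * mean(self.effect_scores)


def rank(items: list[SurveyItem]) -> list[SurveyItem]:
    """Safety hazards first, then descending composite score."""
    return sorted(items, key=lambda i: (not i.safety_hazard, -i.composite()))


survey = [
    SurveyItem("Poor speed control", [4, 4, 3], [2, 3, 2]),
    SurveyItem("High THD into supply", [2, 2, 3], [3, 3, 4]),
    SurveyItem("Motor does not start", [1, 2, 1], [5, 5, 4], safety_hazard=True),
]

if __name__ == "__main__":
    print(f"chronic ${chronic:,.0f}/yr vs acute ${acute:,.0f}/yr")
    for item in rank(survey):
        flag = "(safety)" if item.safety_hazard else ""
        print(f"{item.failure}: {item.composite():.1f}", flag)
```

In this hypothetical data, "Motor does not start" has the lowest composite score. It still ranks first because it is flagged as a safety hazard, which mirrors the rule that safety issues automatically take the highest priority.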


Figure 2. PF Curve.

We want to find a parameter to measure that meets several criteria. First, it should be a reliable indicator of machine health; that is, it should be a reliable measure of the symptom. Second, it should be economical to measure. Third, it should give sufficient warning so that there is time to react.
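The third criterion, sufficient warning, can be checked against the PF curve. The sketch below is a minimal illustration, not a standard method, and it rests on three assumptions:

- You already have periodic readings of one symptom; here that is a hypothetical coupling temperature rise.
- A detection threshold stands in for the P point.
- A limit stands in for the F point.

The sketch reports the warning time between the two crossings and compares it with the time needed to plan and carry out the repair. The readings, thresholds and lead time are all invented for illustration.

```python
# Estimate the warning time (P to F) from periodic readings of one symptom.
# Readings, thresholds and lead time are hypothetical illustrations.
from typing import Optional, Sequence, Tuple


def first_crossing(readings: Sequence[Tuple[float, float]], threshold: float) -> Optional[float]:
    """Return the first time at which the symptom reaches the threshold, or None."""
    for t, value in readings:
        if value >= threshold:
            return t
    return None


def warning_time(readings, p_threshold: float, f_threshold: float) -> Optional[float]:
    """Time between detection (P point) and functional failure (F point)."""
    p = first_crossing(readings, p_threshold)
    f = first_crossing(readings, f_threshold)
    if p is None or f is None:
        return None
    return f - p


# (week, coupling temperature rise above ambient in deg C), hypothetical data.
history = [(0, 10.0), (2, 10.5), (4, 12.0), (6, 15.0), (8, 19.0), (10, 25.0), (12, 32.0)]

pf = warning_time(history, p_threshold=15.0, f_threshold=30.0)
lead_time_needed = 3  # weeks to plan, order parts and schedule the repair (assumed)

if __name__ == "__main__":
    print(f"P-F interval: {pf} weeks")
    enough = pf is not None and pf >= lead_time_needed
    print("enough warning" if enough else "not enough warning")
```

Because the crossings are found only at the sampled times, the estimate is no finer than the measurement interval. The measurement interval also has to be well short of the P-F interval, or the warning can be missed entirely.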

Select a suite of tests to detect those early warning signs.
With a list of symptoms, you’re now in a position to select tests that measure or detect each symptom. For each symptom, try to get as many independent tests as possible. The more information you have, the more confident you’ll be in your call. You should have at least two tests for each failure that can confirm each other and avoid false positives (or negatives).

Table 2. Possible Tests for Symptoms.

As you’re considering tests, don’t limit yourself to high-tech methods. Process parameters are also valuable. And one of the most valuable tests is the operator and maintainer. An experienced person, familiar with the machine and making a conscious effort to sense a particular effect, can be very effective at assessing the health of a machine.

(Appendix A. is a matrix of tests vs. equipment that may give you some ideas as to what and how to test. This list is a work in progress and any inputs or comments will be welcome.)


Collect the results of the tests at one decision point
Many plants have some form of condition assessment program in place. But as a rule, those programs operate in relative isolation. The people responsible for them work to maximize the efficiency of applying the technology. They optimize the technology rather than the machine.

Doing the tests without putting all the information together is not
effective. I recommend that each machine have one or two
individuals assigned to monitor its health (the machine’s personal
trainer?). They should be trained in assessing all the information
provided by the tests. Notice I didn’t say, “Trained to evaluate the
data.” They don’t have to analyze the data (vibration spectra); they
just have to understand the results (information) of that analysis.

They should receive the results of the tests along with any other
pertinent information on a regular basis. Then they can use that
information to manage the machine. They can use it to adjust
lubrication intervals, decide when adjustments are needed or part
replacement is indicated. And that overhaul? They may decide it’s not
needed after all.
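Here is a minimal sketch of that single decision point. It assumes each program (vibration, oil analysis, thermography, operator rounds) already reports a simple normal or alert result for each machine. The program names, machine IDs and the two-alert rule are illustrative assumptions. The rule echoes the earlier advice that at least two tests should confirm each other before action is taken.

```python
# Gather results from separate condition programs into one view per machine.
# Program names, machine IDs and the two-confirmation rule are assumptions.
from collections import defaultdict

# (machine, program, result) as each program might report it.
results = [
    ("FP-1", "vibration", "alert"),
    ("FP-1", "oil analysis", "alert"),
    ("FP-1", "thermography", "normal"),
    ("FP-2", "vibration", "alert"),
    ("FP-2", "oil analysis", "normal"),
    ("FP-2", "operator rounds", "normal"),
]


def consolidate(rows):
    """Group results by machine so one person sees all of them together."""
    by_machine = defaultdict(dict)
    for machine, program, result in rows:
        by_machine[machine][program] = result
    return by_machine


def recommendation(program_results: dict[str, str], confirmations: int = 2) -> str:
    """Act only when independent programs agree; otherwise keep watching."""
    alerts = [p for p, r in program_results.items() if r == "alert"]
    if len(alerts) >= confirmations:
        return "plan corrective work: " + ", ".join(sorted(alerts))
    if alerts:
        return "watch closely, seek confirmation: " + ", ".join(alerts)
    return "no action"


if __name__ == "__main__":
    for machine, program_results in sorted(consolidate(results).items()):
        print(machine, "->", recommendation(program_results))
```

The person assigned to the machine still makes the call. The sketch only shows how results from separate programs can reach that person in one place.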

One power company has an overhaul interval for feed pumps of five years. When the five-year overhaul comes up, instead of immediately overhauling the pump, a planner collects all the operational and condition-based information and reviews it together. He can then decide to overhaul immediately, defer for one year or defer for two years. At the end of that deferral, he repeats the process. They now overhaul, on average, every nine years.
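The planner’s review can be pictured as a loop that runs each time an overhaul falls due. In the sketch below, the three outcomes (overhaul now, defer one year, defer two years) come from the example. The condition grades and the rule that maps them to an outcome are hypothetical placeholders for the planner’s judgment. The sequence of grades is invented, so the result says nothing about the nine-year average reported above.

```python
# Review-and-defer loop for a time-based overhaul, as in the feed pump example.
# The grades and the grade -> decision mapping are hypothetical stand-ins for
# the planner's review of operating and condition data.
BASE_INTERVAL_YEARS = 5
DEFERRAL_YEARS = {"good": 2, "fair": 1, "poor": 0}  # 0 means overhaul now


def years_until_overhaul(grades_at_each_review: list[str]) -> int:
    """Years from the last overhaul until the review that calls for one."""
    years = BASE_INTERVAL_YEARS
    for grade in grades_at_each_review:
        deferral = DEFERRAL_YEARS[grade]
        if deferral == 0:
            return years
        years += deferral
    raise ValueError("review history ends before an overhaul was called for")


if __name__ == "__main__":
    # Hypothetical reviews: good at year 5, fair at year 7, poor at year 8.
    print(years_until_overhaul(["good", "fair", "poor"]))
```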

Summary
What we want to do is maximize the effectiveness of the technology in improving machinery reliability. We need to assess machine health based on several measures. And we should only do those tests and tasks that are cost-effective from the point of view of the machine. The question is: how do we decide what to do? I propose we follow a systematic process to identify that.

In summary, the process is:


1. List the functions of the system.
2. List the possible failures to meet those functions.
3. Decide which of these failures are significant.
4. Decide how we can get an early warning.
5. Select a suite of tests to detect those early warning signs.
6. Collect the results of the tests at one decision point.
