CONTENTS

Quantity of Data
Multi-Modal Failures
Confidence Limits
Censoring of Sample Data
COMPARISON WITH HAZARD PLOTTING
CONCLUSIONS
TWO CYCLE WEIBULL PAPER
PROGRESSIVE EXAMPLE OF WEIBULL PLOTTING
ESTIMATION OF WEIBULL LOCATION PARAMETER
INTRODUCTION
These notes give a brief introduction to Weibull analysis and its potential contribution to
equipment maintenance and lifing policies. Statistical terminology has been avoided wherever possible and those terms which are used are explained, albeit briefly. Weibull analysis
originated from a paper, Reference 1, published in 1951 by a Swedish mechanical engineer, Professor Waloddi Weibull. His original paper did little more than propose a multi-parameter distribution, but it became widely appreciated and was shown by Pratt and Whitney in 1967 to
have some application to the analysis of defect data.
1.1 Information Sources
The definitive statistical text on Weibull is cited at Reference 2, and publications closer to the working level are given at References 3 and 4. A set of British Standards, BS 5760 Parts 1 to 3, covering a broad spectrum of reliability activities, is being issued. Part 1 on Reliability Programme Management was issued in 1979 but is of little value here except for its comments on the difficulties of obtaining adequate data. Part 2, Reference 5, contains valuable guidance for the application of Weibull analysis, although this may be difficult to extract. Part 3 of the Standard contains authentic practical examples illustrating the principles established in Parts 1 and 2. One further source of information is an I Mech E paper by Sherwin and Lees at Reference 6. Part 1 of this paper is a good review of current Weibull theory and Part 2 provides some insight into the practical problems inherent in its use.
DATA
The basic elements in defect data analysis comprise a population, from which some sample is taken in the form of times to failure (here time is taken to mean any appropriate measure of utilisation).
The most difficult part of this process is the acquisition of trustworthy data. No amount of elegance in the statistical treatment of the data will enable sound judgements to be made from
invalid data.
Weibull analysis requires times to failure. This is higher quality data than a knowledge of the
number of failures in an interval. A failure must be a defined event and preferably objective
rather than some subjectively assessed degradation in performance. A typical sample, therefore, might at its most superficial level comprise a collection of individual times to failure for
the equipment under investigation.
2.1
Quality of Data
The quality of data is a most difficult feature to assess and yet its importance cannot be overstated. When there is a choice between a relatively large amount of dubious data and a relatively small amount of sound data, the latter is always preferred. The quality problem has
several facets:
The data should be a statistically random sample of the population. Exactly what
this means in terms of the hardware will differ in each case. Clearly the modification state of equipments may be relevant to the failures being experienced and
failure data which cannot be allocated to one or other modification is likely to be
misleading. By an examination of the source of the data the user must satisfy
himself that it contains no bias, or else recognise such a bias and confine the deductions accordingly. For example, data obtained from one user unit for an item experiencing failures of a nature which may be influenced by the quality of
maintenance, local operating conditions/practices or any other idiosyncrasy of that
unit may be used providing the conclusions drawn are suitably confined to the unit
concerned.
A less obvious data quality problem concerns the measure of utilisation to be used;
it must not only be the appropriate one for the equipment as a whole, but it must
also be appropriate for the major failure modes. As will be seen later, an analysis at
equipment level can be totally misleading if there are several significant failure
modes each exhibiting their own type of behaviour. The view of the problem at
equipment level may give a misleading indication of the counter-strategies to be
employed. The more meaningful deeper examination will not be possible unless
the data contains mode information at the right depth and degree of integrity.
It is necessary to know any other details which may have a bearing on the failure
sensitivity of the equipment; for example the installed position of the failures
which comprise the sample. There are many factors which may render elements of
a sample unrepresentative including such things as misuse or incorrect diagnosis.
Quantity of Data
Whereas the effects of poor quality are insidious, the effects of inadequate quantity of data are
more apparent and can, in part, be countered. To see how this may be done it is necessary to
examine one of the statistical characteristics used in Weibull analysis. An equipment undergoing in-service failures will exhibit a cumulative distribution function (F(t)), which is the distribution in time of the cumulative failure pattern or cumulative percent failed as a function of
time, as indicated by the sample.
Consider a sample of 5 failures (sample size n = 5). The symbol i is used to indicate the failure
number once the failure times are ranked in ascending order; so here i will take the integer
values 1 to 5 inclusive. Suppose the 5 failure times are 2, 7, 13, 19 and 27 cycles. Now the first
failure at 2 cycles may be thought to correspond to an F(t) value of i/n, where i = 1 and n = 5.
ie F(t) at 2 cycles = 1/5 = 0.2 or 20%
Similarly for the second failure time of 7 cycles, the corresponding F(t) is 40% and so on. On
this basis, this data is suggesting that the fifth failure at 27 cycles corresponds to a cumulative
percent failed of 100%. In other words, on the basis of this sample, 100% of the population will
fail by 27 cycles. Clearly this is unrealistic. A further sample of 10 items may contain one or
more which exceed a 27 cycle life. A much larger sample of 1000 items may well indicate that rather than correspond to a 100% cumulative failure, 27 cycles corresponds to some lesser cumulative failure of, say, 85 or 90%.
This problem of small sample bias is best overcome as follows:
Sample Size Less Than 50. A table of Median Ranks has been calculated which gives
a best estimate of the F(t) value corresponding to each failure time in the sample.
This table is issued with these notes. It indicates that in the example just considered, the F(t) values corresponding to the 5 ascending failure times quoted are not 20%, 40%, 60%, 80% and 100%, but are 12.9%, 31.4%, 50%, 68.6% and 87.1%. It is this latter set of F(t) values which should be plotted against the corresponding ranked failure times on a Weibull plot. Median rank values give the best estimate for the primary Weibull parameter and are best suited to some later work on confidence limits.
Sample Size Less Than 100. For sample sizes less than 100, in the absence of Median Rank tables the true median rank values can be adequately approximated using Benard's Approximation:
F(t) = (i - 0.3)/(n + 0.4)
Sample Sizes Greater Than 100. Above a sample size of about 100 the problem of
small sample bias is insignificant and the F(t) values may be calculated from the
expression for the Mean Ranks:
i/(n + 1)
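These three rules are simple arithmetic and can be sketched in a few lines. The illustration below is an addition to these notes, using the 5-failure example above; it shows how closely Benard's Approximation reproduces the tabulated median ranks of 12.9%, 31.4%, 50%, 68.6% and 87.1%.

```python
def benard(i, n):
    """Benard's approximation to the median rank for the ith of n ordered failures."""
    return (i - 0.3) / (n + 0.4)

def mean_rank(i, n):
    """Mean rank, adequate for sample sizes above about 100."""
    return i / (n + 1)

n = 5
for i in range(1, n + 1):
    # naive i/n overstates F(t); Benard's value tracks the median rank tables
    print(f"i={i}: naive {i/n:6.1%}  Benard {benard(i, n):7.2%}  mean rank {mean_rank(i, n):7.2%}")
```

Note that the naive estimate i/n gives 100% at the last failure, whereas Benard's value for i = 5 is about 87%, in line with the tables.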
3.1 The Weibull cumulative distribution function takes the form:

F(t) = 1 - exp{ -[(t - g)/h]^b }

where b is the shape parameter, h the scale parameter (characteristic life) and g the location parameter. Taking logarithms twice, this rearranges to:

log log [1/(1 - F(t))] = b log(t - g) - b log h + constant
It follows that if F(t) can be plotted against t (corresponding failure times) on paper which has
a reciprocal double log scale on one axis and a log scale on the other, and that data forms a
straight line, then the data can be modelled by Weibull and the parameters extracted from the
plot. A piece of 2 cycle Weibull paper (Chartwell Graph Data Ref C6572) is shown at Annex A
and this is simply a piece of graph paper constructed such that its vertical scale is a double log
reciprocal and its horizontal scale is a conventional log.
The mechanics of the plot are described progressively using the following example and the
associated illustrations in plots 1 to 12 of Annex B.
Assemble the data in ascending order and tabulate it against the corresponding F(t)
values for a sample size of 10, obtained from the Median Rank tables. The tabulation is shown at table 1 (Annex B).
Mark the appropriate time scale on the horizontal axis on a piece of Weibull paper
(plot 2).
Plot on the Weibull paper the ranked hours at failure (ti) on the horizontal axis
against the corresponding F(t) value on the vertical axis (plot 3).
If the points constitute a reasonable straight line then construct that line. Note
that real data frequently snakes about the straight line due to scatter in the data;
this is not a problem providing the snaking motion is clearly to either side of the
line. When determining the position of the line give more weight to the later
points rather than the early ones; this is necessary both because of the effects of
cumulation and because the Weibull paper tends to give a disproportionate emphasis to the early points which should be countered where these are at variance with
the subsequent points. Do not attempt to draw more than one straight line
through the data and do not construct a straight line where there is manifestly a
curve. In this example the fitting of the line presents no problem (plot 4). Note also, on the matter of how much data is required for a Weibull plot, that any 4 or so of the pieces of data used here would give an adequate straight line. In such circumstances 4 points may well be enough. Generally, 7 or so points would be a reasonable minimum, depending on their shape once plotted.
The fact that the data produced a straight line when initially plotted enables 2 statements to be made: the data can adequately be modelled by a Weibull distribution, and the sample is unlikely to be seriously multi-modal, since a mixture of failure modes would tend to produce a curved or cranked plot.
The next step is to construct a perpendicular from the Estimation Point in the top
left hand corner of the paper to the plotted line (plot 5).
Once the plotted line is obtained, information based on the sample can be
extracted. For example, plot 6 illustrates that this data is indicating that a 400 hour
life would result in about 15% of in-service failures for these equipments. Conversely, an acceptable level of in-service failure may be converted into a life; for
example it can be seen from plot 6 that an acceptable level of in-service failure of
say, 30% would correspond to a life of about 550 hours, and so on.
At plot 7 a scale for the estimate of the Shape Parameter b, is highlighted. This
scale can be seen to range from 0.5 to 5, although b values outside this range are
possible.
At plot 11 the evaluation of the proportion failed corresponding to the mean of the distribution of the times to failure (P) is shown to be 52.5% using the point of intersection of the perpendicular and the P scale. This value is inserted in the F(t) scale and its intersection with the plotted line determines the estimated mean of the distribution of the times to failure. In this case this is about 740 hours.
One additional piece of information which can be easily extracted also is the
median life; that is to say the life corresponding to 50% mortality. This is shown at
plot 12 to be about 720 hours, based on this sample.
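The graphical fit just described can be imitated numerically: a least-squares line through the points ln ln[1/(1 − F)] versus ln t has slope b, and h follows from the intercept. The sketch below is an added illustration, not part of the original procedure; note that a least-squares fit weights all points equally, whereas the graphical method deliberately favours the later points, so the two will not agree exactly. The data are those of Table 1 (Annex B).

```python
import math

# Ranked failure hours and median ranks from Table 1 (Annex B), sample size 10
t = [300, 410, 500, 600, 660, 750, 825, 900, 1050, 1200]
F = [0.067, 0.162, 0.259, 0.355, 0.452, 0.548, 0.645, 0.741, 0.838, 0.933]

# Linearised form: ln ln(1/(1-F)) = b*ln(t) - b*ln(h)
x = [math.log(ti) for ti in t]
y = [math.log(math.log(1 / (1 - Fi))) for Fi in F]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
    / sum((xi - xbar) ** 2 for xi in x)
h = math.exp(xbar - ybar / b)   # intercept = -b*ln(h), so ln(h) = xbar - ybar/b

print(f"shape b ~ {b:.2f}, characteristic life h ~ {h:.0f} hours")
```

The result is close to the graphical estimates of b = 2.4 and h = 830 hours; the equal weighting gives a slightly higher b than the value read from the plot.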
4.1 Concept of Hazard
Before examining the significance of the Weibull shape parameter b it is necessary to know
something of the concept of hazard and the 3 so-called failure regimes. The parameter of interest here is the hazard rate, h(t). This is the conditional probability that an equipment will fail
in a given interval of unit time given that it has survived until that interval of time. It is, therefore, the instantaneous failure rate and can in general be thought of as a measure of the probability of failure, where this probability varies with the time the item has been in service. The
3 failure regimes are defined in terms of hazard rate and not, as is a common misconception, in
terms of failure rate.
The 3 regimes are often thought of in the form of the so-called bath-tub curve; this is a valid
concept for the behaviour of a system over its whole life but is a misleading model for the vast
majority of components and, more importantly, their individual failure modes (see References 5 and 7). An individual mode is unlikely to exhibit more than one of the 3 characteristics of
decreasing, constant or increasing hazard.
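The three regimes follow directly from the Weibull hazard function, h(t) = (b/h)(t/h)^(b-1), a standard result not derived in these notes. The short check below is an added illustration with invented values; eta stands for the scale parameter h of the notes, to avoid a name clash with the hazard function itself.

```python
def hazard(t, b, eta):
    """Weibull hazard rate: (b/eta) * (t/eta)**(b - 1)."""
    return (b / eta) * (t / eta) ** (b - 1)

for b in (0.5, 1.0, 3.0):
    early = hazard(10.0, b, 100.0)    # hazard early in life
    late = hazard(200.0, b, 100.0)    # hazard late in life
    if late < early:
        trend = "decreasing"
    elif late == early:
        trend = "constant"
    else:
        trend = "increasing"
    print(f"b = {b}: hazard is {trend}")
```

A b below 1 gives a hazard falling with time in service, b = 1 gives a constant hazard, and b above 1 a rising one, which is the basis of the interpretations that follow.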
Shape Parameter Less Than Unity.
A b value of less than unity indicates that the item or failure mode may be characterised by the
first regime of decreasing hazard. This is sometimes termed the early failure or infant mortality period and it is a common fallacy that such failures are unavoidable. The distribution of
times to failure will follow a hyper-exponential distribution in which the instantaneous probability of failure is decreasing with time in service. This hyper-exponential distribution
models a concentration of failure times at each end of the time scale; many items fail early or
else go on to a substantial life, whilst relatively few fail between the extremes. The extent to
which b is below 1 is a measure of the severity of the early failures; 0.9 for example would be a
relatively weak early failure effect, particularly if the sample size, and therefore the confidence, is low. If there is a single or a predominant failure mode with a b < 1, then clearly component
lifing is inappropriate since the replacement is more likely to fail than the replaced item. Just
as importantly, a b < 1 gives a powerful indication of the causes of these failures, which are classically attributed to two deficiencies. First, such failures result from poor quality control in the
manufacturing process or some other mechanism which permits the installation of low
quality components. It is for this reason that burn-in programmes are the common counterstrategy to poor quality control for electronic components which would otherwise generate
an unacceptably high initial in-service level of failure. The second primary cause of infant mortality is an inadequate standard of maintenance activity, and here the analysis is pointing to a
lack of quality rather than quantity in the work undertaken. The circumstance classically associated with infant mortality problems is the introduction of a new equipment, possibly of
new design, which is unfamiliar to its operators and its maintainers. Clearly in such situations, the high initial level of unreliability should decrease with the dissemination of experience and the replacement of weakling components with those of normal standard. The
problem of infant mortality has been shown to be much more prevalent than might have been
anticipated. In one particular study (Part 2 of Reference 6) it was found to be the dominant
failure regime on a variety of mechanical components of traditional design.
Figure 1 Probability Density Function for a Shape Parameter of 2

Figure 2 Probability Density Function for a Shape Parameter of 3.4

[Figure 3: probability density function for a very high shape parameter, b of about 6 or 7, with a replacement life marked at t0]
Where analysis of the parameters indicates a pdf of the form shown below, of which a very high b, say about 6 or 7, is just one element, then clearly a strategy to replace at t0 might be highly satisfactory, particularly if it is a critical component, since the evidence suggests there will be no in-service failures once that life is introduced (Figure 3).
The initiation of increasing hazard conditions and their rate of increase may be a function of
the maintenance policy adopted and the operating conditions imposed on the equipment.
Some General Comments on b
The Weibull shape parameter provides a clear indication of which failure regime is the appropriate one for the mode under investigation and quantifies the degree of decreasing or increasing hazard. It can be used therefore, to indicate which counter-strategies are most likely to
succeed and aids interpretation of the physics of failure. It can also be used to quantify the
effects of any modifications or maintenance policy changes. Although the use of median ranks
provides the best estimate of b by un-biasing the sample data, it is important to remember that
the confidence which can be placed on the b estimate for any given failure mode is primarily a
function of the sample size and quality of the data for that mode.
[Figures 4 to 6: probability density functions for b = 2.4, marking the characteristic life h = 830 hours (63.2% failed), the mean of about 740 hours (52.7% failed) and the median life of about 720 hours (50% failed)]

4.2 Under constant hazard conditions (b = 1), h corresponds directly to the mean
time between failures (MTBF) for a repairable equipment or a mean time to failure (MTTF)
for a non-repairable equipment, and is therefore the inverse of the constant hazard failure rate.
This is the only circumstance in which h may be termed an MTBF/MTTF.
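The relationships behind these figures are standard Weibull results, not stated explicitly in the notes: the mean of the distribution is g + h·Γ(1 + 1/b) and the median is g + h·(ln 2)^(1/b); for b = 1 the mean reduces to h, the MTBF/MTTF case just described. A quick check against the example values (b = 2.4, h = 830 hours) is sketched below as an added illustration.

```python
import math

def weibull_mean(b, h, g=0.0):
    """Mean of a Weibull distribution: g + h * Gamma(1 + 1/b)."""
    return g + h * math.gamma(1 + 1 / b)

def weibull_median(b, h, g=0.0):
    """Median life: g + h * (ln 2)**(1/b)."""
    return g + h * math.log(2) ** (1 / b)

print(weibull_mean(2.4, 830))    # close to the 740 hours read from the plot
print(weibull_median(2.4, 830))  # close to the 720 hours read from the plot
print(weibull_mean(1.0, 830))    # b = 1: the mean equals h, the MTBF case
```

The computed mean and median agree with the figures of about 740 and 720 hours obtained graphically earlier.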
Figure 7 Representing Points on a Curve using Weibull Paper

4.3 The significance of g is that it is some value of time by which the complete distribution of times to failure is shifted, normally to the right, hence the term location. In the earlier example the distribution with g = 0 is shown at Figure 4. If, however, g had taken some positive value, say 425 hours, then this value must be added to all the times to failure extracted from the subsequent analysis of the straight line, and Figure 4 would have changed to that illustrated at Figure 8.
Here two thirds of the population do not fail until 1245 hours and most importantly the g
value or minimum life value has shifted the time origin such that no failures are anticipated in
the first 425 hours of service. The existence of a positive location parameter is therefore a
highly desirable feature in any equipment and the initial plot should always be examined for a
potential concave form.
A further example of a 3-parameter Weibull plot is given at Annex D.
[Figure 8: probability density function for b = 2.4 with location parameter g = 425 hours; the 63.2% (characteristic life, h = 830) point is shifted right accordingly]
5.1 Scatter
The problem of scatter in the original data and the resultant snaking effect this can produce
has been briefly mentioned. At Annex E, however, is a plot using 11 pieces of real data which
illustrates a severe case of snaking. It is possible to plot a line and an attempt has been made in
this case which gives the necessary added weight to later points. The difficulty is obvious; it is
necessary to satisfy yourself that you are seeing true snaking about a straight line caused by
scatter of the points about the line and not some other phenomenon.
5.2 Extrapolation
Successful Weibull plotting relies on having historical failure data. Inaccuracies will arise if the span in time of that data is not significantly greater than the mean of the distribution of times to failure. If data obtained over an inadequate range is used as a basis for extrapolation (ie extending the plotted line significantly), estimates of the 3 parameters are likely to be inaccurate and may well fail to reveal characteristics of later life such as a bi-modal wear-out phenomenon. The solution is comprehensive data at the right level.
5.3 Multi-Modal Failures
The difficulty of multi-modal failures has been mentioned previously. In the same way that the distribution of times to failure for a single mode will be a characteristic of that mode, so the more modes there are contributing to the failure data, the more the individual characteristics are masked. The combined behaviour of a number of failure modes often tends to look like constant hazard (b = 1.0). In some cases this has been found to be so even when the modes themselves have all had a high wear-out characteristic (b = 3 or 4). This tendency is strongest when there are many modes none of which is dominant. Hence a knowledge of the failure regimes of the individual failure modes of an equipment is more useful in formulating a maintenance policy than that of the failure regime of the equipment itself. The solution once again is data precise enough to identify the
characteristics of all the significant failure modes. A Weibull plot using data gathered at equipment level may or may not indicate multi-modal behaviour. The most frequent manifestation
of such behaviour is a convex or cranked plot as shown in Figure 9.
The cranked plot shown above should not normally be drawn since it implies the existence of
2 failure regimes, one following the other in time. This is rarely the case; in general the bi- or
multi-modal plots will be found to be mixed along both lines, because the distributions of
times to failure themselves overlap. This is illustrated in Figure 10.
[Figure 10: overlapping probability density functions; mode 1 with b < 1, hence infant mortality, and mode 2 with b > 1, showing time dependent failures]
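The masking effect can be demonstrated by simulation. The sketch below is purely illustrative and not drawn from the notes; the two modes and their parameters are invented. It pools failure times from a hypothetical infant mortality mode (b = 0.7) and a wear-out mode (b = 3), assigns plotting positions by Benard's Approximation, and fits a single straight line to the pooled sample; the fitted b comes out well below the wear-out value of 3.

```python
import math
import random

random.seed(1)

# Two invented modes: infant mortality (b = 0.7, short scale) and
# wear-out (b = 3, longer scale), pooled in equal numbers
times = [random.weibullvariate(200, 0.7) for _ in range(200)] + \
        [random.weibullvariate(1000, 3.0) for _ in range(200)]
times.sort()

n = len(times)
# Benard plotting positions, then least squares on the linearised form
x = [math.log(t) for t in times]
y = [math.log(math.log(1 / (1 - (i - 0.3) / (n + 0.4)))) for i in range(1, n + 1)]
xbar, ybar = sum(x) / n, sum(y) / n
b_fit = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) \
        / sum((xi - xbar) ** 2 for xi in x)
print(f"fitted b for the pooled sample: {b_fit:.2f}")
```

A single-line fit to mixed-mode data therefore conceals the strong wear-out behaviour of the second mode, which is the point made above.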
5.4 Confidence Limits
As was pointed out earlier, most forms of analysis will give a false impression of accuracy and Weibull is no exception, particularly when the sample size is less than 50. The limitations of the
data are best recognised by the construction of suitable confidence limits on the original plot.
The confidence limits normally employed are the 95% lower confidence limit (LCL) or 5%
Ranks, and the 5% upper confidence limit (UCL) or 95% Ranks, although other levels of confidence can be used. With these notes are tables of LCL and UCL ranks which can be seen to be a
function solely of sample size. The technique for using these ranks consists of entering the
vertical axis of the Weibull plot at the ith F(t) value quoted in the tables for the appropriate
sample size. A straight horizontal line should be drawn from the point of entry to intersect the
line constructed from the data. From the point of intersection, move vertically up (for a lower
limit) or down (for an upper limit) until horizontal with the corresponding ith plotted point.
The technique is shown at Plot 1 of Annex F for the lower bound using the same example as in
Annex B. The first value obtained from the table for a sample size of 10 is 0.5; this cannot be
used since it does not intersect the plotted line. The next value is 3.6 and this is shown in Plot 1
to generate point (1) on the lower bound. The third point of entry is at 8.7 and this is shown to
produce point (2) which is level with the third plotted point for the straight line, and so on.
The primary use of this lower bound curve constructed through the final set of points is that it
is a visual statement of how bad this equipment might be and still give rise to the raw data
observed, with 95% confidence. Hence it can be said here that although the best estimate for h
is 830 hours, we can be only 95% confident, based on the data used, that the true h is greater
than or equal to 615 hours. Similarly at Plot 2, which shows the construction of a 95% upper
bound, we can be 95% confident that the true h is less than or equal to 1040 hours. These 2
statements can be combined to give symmetrical 90% confidence limits of between 615 and
1040 hours. This range can only be reduced by either diminishing the confidence level (and
therefore increasing the risks of erroneous deduction) or by increasing the quantity of data.
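The tabulated ranks can be reproduced from first principles: the q-rank for the ith ordered failure in a sample of n is the value of F(t) at which the binomial probability of observing i or more failures equals q (0.5 for median ranks, 0.05 and 0.95 for the bounds). The bisection sketch below is my construction, added for illustration; the published tables should be preferred in practice.

```python
from math import comb

def rank_quantile(i, n, q):
    """Solve sum_{k=i..n} C(n,k) p^k (1-p)^(n-k) = q for p by bisection."""
    def cum(p):
        return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(i, n + 1))
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if cum(mid) < q:     # cum(p) increases with p
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def median_rank(i, n):
    return rank_quantile(i, n, 0.5)

def lcl_5(i, n):             # 5% Ranks (95% lower confidence limit)
    return rank_quantile(i, n, 0.05)

def ucl_95(i, n):            # 95% Ranks (5% upper confidence limit)
    return rank_quantile(i, n, 0.95)
```

As a check, this reproduces the values quoted above for a sample of 10: a 5% rank of about 0.5% for the first failure, 3.6% for the second and 8.7% for the third, and the familiar median rank of 12.9% for the first of 5 failures.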
5.5 Censoring of Sample Data

Where a sample contains both failures and censorings (items withdrawn from observation before failure), each failure is assigned a mean order number mi in place of the integer failure number i:

mi = m(i-1) + Ni, where Ni = (n + 1 - m(i-1)) / (1 + ki)

and ki is the number of items remaining on test immediately before the ith failure.
Mean order number values are determined only for failures. Once the first censoring occurs at 65, all subsequent mi values are non-integers. The median rank values at column (e) are taken from the median rank tables using linear interpolation when necessary. For purposes of comparison only, the equivalent median ranks obtained from Benard's Approximation, (i - 0.3)/(n + 0.4), are included at column (f). These are obtained by substituting mi for i in the standard expression. These can be seen to be largely in agreement with the purer figures in column (e). Finally, 5% LCL and 95% UCL figures are included at columns (g) and (h). These are obtained from the tables using linear interpolation where necessary.
The median rank figures in column (e) are plotted on Weibull paper against the corresponding failure times at column (a) in the normal way. The plot is illustrated at Plot 1 of Annex G, and produces b, h and g estimates without difficulty. For completeness, Plot 2 shows the 5% LCL and 95% UCL curves; a 90% confidence range for h of between 90 and 148 units of time is obtained.
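The mean order number recursion lends itself to a few lines of code. The sketch below is an added illustration; the event pattern is hypothetical but chosen to be consistent with the worked figures of the Annex G table (n = 16, three failures before the first censoring, twelve survivors at the fourth failure).

```python
def failure_ranks(events):
    """events: list of 'f' (failure) or 'c' (censoring) flags in ascending
    order of time. Returns (mean order number, Benard median rank %) per failure."""
    n = len(events)
    m = 0.0           # previous mean order number
    remaining = n     # items still on test before the current event
    out = []
    for kind in events:
        if kind == "f":
            # increment Ni = (n + 1 - m_prev) / (1 + survivors before this failure)
            m += (n + 1 - m) / (remaining + 1)
            out.append((round(m, 2), round(100 * (m - 0.3) / (n + 0.4), 2)))
        remaining -= 1
    return out

events = list("fffcffccccccfcfc")   # hypothetical: 7 failures, 9 censorings
for m, rank in failure_ranks(events):
    print(m, rank)
```

With this pattern the mean order numbers run 1, 2, 3, then about 4.08, 5.16, 7.53 and 10.69, matching the Annex G figures.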
CONCLUSIONS
The ability of the Weibull distribution to model failure situations of many types, including those where non-constant hazard conditions apply, makes it one of the most generally useful distributions for analysing failure data. The information it provides, both in terms of the modelled distribution of times to failure and the prevailing failure regime, is fundamental to the
selection of a successful maintenance strategy, whether or not component lifing is an element
in that strategy.
Weibull's use of median ranks helps overcome the problems inherent in small samples. The
degree of risk associated with small samples can be quantified using confidence limits and this
can be done for complete or multiply-censored data. Weibull plots can quantify the risks associated with a proposed lifing policy and can indicate the likely distribution of failure arisings.
In addition, they may well indicate the presence of more than one failure mode. However,
Weibull is not an autonomous process for providing instant solutions; it must be used in conjunction with a knowledge of the mechanics of the failures under study. The final point to be
made is that Weibull, like all such techniques, relies upon data of adequate quantity and
quality; this is particularly true of multi-modal failure patterns.
REFERENCES
1.
2.
3.
4.
5.
6.
7.
8.
ANNEX A
ANNEX B

Table 1. Ranked failure data and corresponding median ranks (sample size n = 10)

Failure Number (i)   Ranked Hours at Failure (ti)   Median Rank, Cumulative % Failed F(t)
 1                    300                            6.7
 2                    410                           16.2
 3                    500                           25.9
 4                    600                           35.5
 5                    660                           45.2
 6                    750                           54.8
 7                    825                           64.5
 8                    900                           74.1
 9                   1050                           83.8
10                   1200                           93.3
ANNEX C

1. Plot the data initially, observing a concave curve when viewed from the bottom right hand corner.

2. Select 2 extreme points on the vertical scale (say a and b), and determine the corresponding failure times (t1 and t3).

3. Divide the physical distance between points a and b in half without regard for the scale of the vertical axis, and so obtain point c.

4. Determine the failure time corresponding to point c (ie t2).

5. The estimate of the location parameter is given by:

   g = t2 - [(t3 - t2)(t2 - t1)] / [(t3 - t2) - (t2 - t1)]

[Figure: sketch of a concave Weibull plot marking points a, b and c and the corresponding times t1, t2 and t3]
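The construction reduces to a single formula, easily checked in code. The function below is an added sketch, verified here against the t1, t2 and t3 values of the Annex D example (810, 1500 and 4000 hours).

```python
def location_estimate(t1, t2, t3):
    """Estimate of the Weibull location parameter g from three times read off
    the concave plot, where t2 corresponds to the halfway point on the paper."""
    return t2 - ((t3 - t2) * (t2 - t1)) / ((t3 - t2) - (t2 - t1))

g = location_estimate(810, 1500, 4000)
print(round(g))   # 547 hours, as in the Annex D worked example
```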
ANNEX D
Steps:
1.
2.
3.
4.
5.
6.
Failure Number (i)   Ranked Hours at Failure (ti)   Median Rank, Cumulative % Failed F(t)
1                    1000                            9.4
2                    1300                           22.8
3                    1550                           36.4
4                    1850                           50.0
5                    2100                           63.6
6                    2450                           77.2
7                    3000                           90.6
From Plot 2:

t1 = 810 hours
t2 = 1500 hours
t3 = 4000 hours

g = t2 - [(t3 - t2)(t2 - t1)] / [(t3 - t2) - (t2 - t1)]
  = 1500 - [(4000 - 1500)(1500 - 810)] / [(4000 - 1500) - (1500 - 810)]
  = 1500 - 953
  = 547 hours

Replot using:

1000 - 547 = 453
1300 - 547 = 753
1550 - 547 = 1003
1850 - 547 = 1303
2100 - 547 = 1553
2450 - 547 = 1903
3000 - 547 = 2453
[Figure: probability density function for the re-fitted distribution, b = 1.9, g = 547 hours, with the 63.2% point marked at 1560 hours]
ANNEX E
ANNEX F
ANNEX G
Data and calculations for the multiply-censored sample (n = 16; seven failures, nine censorings):

(a) Failure   (b) Censoring   (c) Survivors   (d) Mean Order   (e) Median   (f) Benard's   (g) 5% Rank     (h) 95% Rank
Times ti      Times ci        ki              Number mi        Ranks %      Approx %       Lower Bound %   Upper Bound %
31.7          -               16              1                4.2          4.27           0.3             17
39.2          -               15              2                10.2         10.37          2.2             26
57.5          -               14              3                16.3         16.46          5.3             34
-             65.0            -               -                -            -              -               -
65.8          -               12              4.08             22.89        23.05          9.32            42.48
70.0          -               11              5.16             29.49        29.63          13.8            49.12
-             75.0            -               -                -            -              -               -
-             75.0            -               -                -            -              -               -
-             84.2            -               -                -            -              -               -
-             87.5            -               -                -            -              -               -
-             88.3            -               -                -            -              -               -
-             101.7           -               -                -            -              -               -
105.8         -               4               7.53             44.03        44.09          25.65           64.18
-             109.2           -               -                -            -              -               -
110.0         -               2               10.69            63.31        63.35          43.14           80.45
-             130.0           -               -                -            -              -               -

eg m4 = 3 + (16 + 1 - 3)/(1 + 12) = 4.08