
# Applied Reliability Page 1

APPLIED RELIABILITY
Techniques for Reliability Analysis
with Applied Reliability Tools (ART) and JMP Software

## AM216 Class 5 Notes

Santa Clara University

STAT-TECH
Spring 2010

## AM216 Class 5 Notes

- Accelerated Testing (continued from Class 4 Notes)
  - Accelerated Test Example (Analysis in JMP)
  - Sample Sizes for Accelerated Testing
- System Models
  - Series System
  - Parallel System
  - Analysis of Complex Systems
  - Standby Redundancy
- Defective Subpopulations
  - Graphical Analysis
  - Mortals and Immortals
  - Models
  - Case Study
  - Class Project Example
- Modeling the Field Reliability
  - Evolution of Methods
  - General Reliability Model
  - AMD Example

System Models
Series System

Consider a system made up of n components in series. If the i-th component has reliability $R_i(t)$, the system reliability is the product of the individual reliabilities, that is,

$$R_s(t) = R_1(t)\,R_2(t)\cdots R_n(t)$$

which we denote with the capital pi symbol for multiplication:

$$R_s(t) = \prod_{i=1}^{n} R_i(t)$$

$$F_s(t) = 1 - \prod_{i=1}^{n}\bigl[1 - F_i(t)\bigr]$$

The system failure rate is the sum of the individual component failure rates. The system failure rate is higher than the highest individual failure rate.

System Models
Parallel System

Consider a system made up of n components in parallel. The system CDF is the product of the individual CDFs, that is,

$$F_s(t) = \prod_{i=1}^{n} F_i(t)$$

$$R_s(t) = 1 - \prod_{i=1}^{n}\bigl[1 - R_i(t)\bigr]$$

System failure rates are no longer additive and must be calculated using basic definitions. For identical components, the system failure rate is smaller than the component failure rate.
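As a minimal sketch of the two product rules, assuming independent components, the functions below compute series and parallel system reliability. The function names are illustrative, and the inputs (25 components at 0.99 in series, 3 components at 0.95 in parallel) are borrowed from the class project in these notes.

```python
from math import prod


def series_reliability(reliabilities):
    """Series system: every component must work, so reliabilities multiply."""
    return prod(reliabilities)


def parallel_reliability(reliabilities):
    """Active parallel system: it fails only if all components fail,
    so the unreliabilities (CDFs) multiply."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Inputs taken from the class project (independence is an assumption).
series_25 = series_reliability([0.99] * 25)
parallel_3 = parallel_reliability([0.95] * 3)
print(f"series of 25 at 0.99:  {series_25:.4f}")
print(f"parallel of 3 at 0.95: {parallel_3:.6f}")
```

A long series chain erodes reliability quickly even when each component is very reliable, while a few components in parallel push it close to 1.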

## System Failure Rate

Two Parallel Components
A component has CDF $F(t)$ and a failure rate $h(t)$. Two components are used in parallel in a system. Determine the failure rate of the system.

SOLUTION
The CDF for the two components in parallel is $F_s(t) = F^2(t)$ and the PDF, by differentiation, is $f_s(t) = 2F(t)f(t)$. The failure rate of the system is

$$h_s(t) = \frac{f_s(t)}{1 - F_s(t)} = \frac{2F(t)f(t)}{1 - F^2(t)} = \frac{2F(t)f(t)}{[1 - F(t)][1 + F(t)]} = \frac{2F(t)}{1 + F(t)}\,h(t)$$

The result shows that the system failure rate is a factor 2F/(1+F) times the component failure rate. The smaller the component CDF, the bigger the improvement. Redundancy makes a larger difference in early life, and much less difference later on.
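To make the size of the 2F/(1+F) factor concrete, here is a small sketch that evaluates it at a few times for an exponential component. The failure rate `lam = 0.01` per hour and the chosen times are illustrative assumptions, not values from the notes.

```python
from math import exp


def parallel_pair_hazard_factor(F):
    """Ratio h_s(t)/h(t) for two identical components in parallel: 2F/(1+F)."""
    return 2.0 * F / (1.0 + F)


lam = 0.01  # assumed constant component failure rate per hour (illustrative)
for t in (10, 100, 400):
    F = 1.0 - exp(-lam * t)  # exponential component CDF at time t
    factor = parallel_pair_hazard_factor(F)
    print(f"t={t:4d} h  F={F:.3f}  h_s/h={factor:.3f}")
```

The factor starts near zero when F is small and approaches 1 as F approaches 1, which is the early-life benefit described above.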

Class Project
System Models

A) A component has reliability R(t) = 0.99. Twenty-five components in series form a system. Calculate the system reliability.

B) A component has reliability R(t) = 0.95. Three components in parallel form a system. Calculate the system reliability.

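For components in series:

[Figure: blocks A and B connected in series.]

For components in parallel:

[Figure: blocks A and B connected in parallel.]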

Example of Series-Parallel System: Big Rig

[Figure: a big rig (trailer and cab) with components labeled A through J, shown with its reliability block diagram (RBD).]


Class Project
Complex Systems

A system consists of seven units: A, B, C, D, E, G, H. For the system to function, unit A, and either unit B or C, and either D and E together or G and H together, must be working. Draw the reliability block diagram for this setup.

Write the equation for the CDF of the system in terms of the individual component reliabilities, that is, the $R_i$, where i = A, B, C, ..., G, H. Hint: Consider the three subsystems: A alone; B with C; and D, E, G, H.

## Standby Versus Active Redundancy

In contrast to active parallel redundancy, standby redundancy keeps the second component idle until needed. Assuming perfect switching and no degradation of the idle component, standby redundancy results in higher reliability and lower maintenance costs than active parallel redundancy. An illustration, assuming exponentially distributed failure times, is shown below.
[Figure: System Failure Rates (2 Components). Failure rate (0 to 0.012) versus time (0 to 400) for the single, parallel, and standby configurations.]
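The comparison of single, active parallel, and standby configurations can be sketched numerically for exponential components. The component failure rate `lam = 0.01` is an assumption chosen for illustration. The standby formula relies on the standard result that, with perfect switching, the system life is the sum of two exponential lives, so $R(t) = e^{-\lambda t}(1 + \lambda t)$.

```python
from math import exp


def hazard_single(lam, t):
    """One exponential component: constant failure rate."""
    return lam


def hazard_parallel(lam, t):
    """Two identical exponential components in active parallel: h * 2F/(1+F)."""
    F = 1.0 - exp(-lam * t)
    return lam * 2.0 * F / (1.0 + F)


def hazard_standby(lam, t):
    """Two-unit standby with perfect switching and no idle degradation.
    R(t) = exp(-lam*t) * (1 + lam*t) and f(t) = lam**2 * t * exp(-lam*t),
    so h(t) = lam**2 * t / (1 + lam*t)."""
    return lam * lam * t / (1.0 + lam * t)


lam = 0.01  # assumed component failure rate (illustrative)
for t in (50, 100, 200, 400):
    print(t,
          round(hazard_single(lam, t), 5),
          round(hazard_parallel(lam, t), 5),
          round(hazard_standby(lam, t), 5))
```

At every positive time the standby failure rate is below the active parallel rate, which in turn is below the single-component rate.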

## Series, Parallel Reliability in ART

In ART, select System Reliability... Enter the necessary information. Click OK.

Reliability Experiment
Consider . . .

We test 100 units for 1,000 hours. There are 30 failures by 500 hours, but no more by the end of test.

Question: Are we dealing with two populations or just censored data?

Question: If we continue the test, will we see only a few more failures, or will the other 70 fail with the same life distribution?

Defect Models
Mortals versus Immortals

The usual assumption in reliability analysis is that all units can fail for a specific mechanism. If a defective subpopulation exists, only the fraction of units containing the defect is susceptible to failure. These are called mortals.

Units without the fatal flaw do not fail. These are called immortals.

The model for the total population of mortals and immortals becomes:

Reliability analysis focuses on the life distribution of the defective subpopulation and the mortal fraction.

Example of a Defective
Subpopulation

A Processing Problem

Suppose we have 25 wafers in a lot, but only two wafers are contaminated with mobile ions due to a processing error.

If components are assembled from the 25 wafers, assuming equal yield per wafer, only 2/25 = 8% of the components can have the fatal defect that makes failure possible.

The components from the non-contaminated wafers will not fail for this mechanism since they are defect free; that is, we have a defective subpopulation.

Spotting a Defective
Subpopulation
Graphical Analysis
Assume that a specified failure mode follows a lognormal distribution.

Plot the data on lognormal graph paper. If, instead of following a straight line, the points seem to curve away from the cumulative percent axis, it's a signal that a defective subpopulation may be present.

If the test is run long enough, expect the plot to bend over and become asymptotic to the cumulative percent line that represents the proportion of defectives in the sample.

Defective Subpopulations
Graphical Analysis
[Figure: plot based on total sample (mortals and immortals).]

[Figure: plot based only on mortal subpopulation.]


Defect Model
Mortals and Immortals

The observed CDF $F_{obs}(t)$ is

$$F_{obs}(t) = p\,F_m(t)$$

where $F_m(t)$ is the CDF of the mortals and p is the fraction of mortals (units with the fatal defect) in the total sample.

For example, if there are 25% mortals in the population, and the mortal CDF at time t is 40%, then we would expect to observe about 0.25 × 0.40 = 0.10, or 10% failures, in the total random sample at time t.
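The scaling of the mortal CDF by the mortal fraction can be written as a one-line function. This is a minimal sketch; the function name and the sample size of 200 used for the expected failure count are illustrative assumptions.

```python
def observed_cdf(mortal_fraction, mortal_cdf):
    """Observed CDF for the whole sample: F_obs(t) = p * F_m(t)."""
    if not (0.0 <= mortal_fraction <= 1.0 and 0.0 <= mortal_cdf <= 1.0):
        raise ValueError("both inputs must be probabilities in [0, 1]")
    return mortal_fraction * mortal_cdf


# Values from the example: 25% mortals, mortal CDF of 40% at time t.
f_obs = observed_cdf(0.25, 0.40)
sample_size = 200  # hypothetical sample size, for illustration only
print(f"observed CDF = {f_obs:.2f}; expected failures = {f_obs * sample_size:.0f}")
```

Because $F_m(t)$ can never exceed 1, the observed CDF levels off at p, which is why the probability plot for the full sample bends over.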

Major Computer
Manufacturer Reliability Data
Gate Oxide Fails

| Time (hours) | 24 | 48 | 168 | 500 | 1000 |
|---|---|---|---|---|---|
| Rejects | 201 | 23 | 1 | 1 | 1 |
| Sample Size | 58,000 | 57,392 | 10,000 | 2,000 | 1,999 |

Analysis by Company Using Lognormal Distribution
T50: 1.149E32 hours; Sigma: 26.175

What Do These
Numbers Mean?

Analysis by Company Using Lognormal Distribution
T50: 1.149E32 hours; Sigma: 26.175

The plus and minus 3 sigma range of the time-to-failure distribution extends from 33 seconds to 1.66E62 years!

It takes seconds to get to 0.1% cumulative failures, but over 412,000 hours (that is, 47 years) to get to 1.00%!

Assuming everything can fail is misleading and unnecessary.
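The figures quoted for this fit can be checked directly from the lognormal percentile formula $t_p = T_{50}\,e^{\sigma z_p}$. The sketch below uses only the Python standard library; the conversion of 8,760 hours per year is an assumption.

```python
from math import exp, log
from statistics import NormalDist

T50_HOURS = 1.149e32   # lognormal median from the company's analysis
SIGMA = 26.175         # lognormal shape parameter from the same analysis
HOURS_PER_YEAR = 8760  # assumed conversion (365 days)


def time_at_z(z):
    """Time (hours) at standard normal deviate z: T50 * exp(sigma * z)."""
    return exp(log(T50_HOURS) + SIGMA * z)


def time_at_cum_fraction(p):
    """Time (hours) at which the fitted lognormal CDF reaches p."""
    return time_at_z(NormalDist().inv_cdf(p))


minus3_seconds = time_at_z(-3.0) * 3600
plus3_years = time_at_z(3.0) / HOURS_PER_YEAR
t_01pct_seconds = time_at_cum_fraction(0.001) * 3600
t_1pct_hours = time_at_cum_fraction(0.01)

print(f"-3 sigma: {minus3_seconds:.0f} s, +3 sigma: {plus3_years:.3g} years")
print(f"0.1% at {t_01pct_seconds:.1f} s, 1% at {t_1pct_hours:,.0f} h")
```

A sigma this large stretches the distribution over dozens of orders of magnitude, which is the symptom of forcing a single lognormal onto data that contain immortals.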

Modeling with
Defective Subpopulations

The same data, assuming 99% of the failures have occurred by 48 hours, can be modeled by a defective subpopulation fraction of 227/58,000 = 0.39% and a lognormal distribution of failure times for the mortals with T50 = 10.6 hours and sigma = 0.68.

Practically 100% of failures occur by 168 hours. Any failures thereafter are probably not related to the defective subpopulation. For example, handling-induced failures are a possibility.
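As a rough check on this defect model, the sketch below evaluates the mortal lognormal CDF at the early readout times and converts it to an expected number of rejects out of the 227 assumed mortals. Treating the 227 total rejects as the full mortal count in the starting sample of 58,000 follows the slide; the function name is illustrative.

```python
from math import log
from statistics import NormalDist

MORTALS = 227                      # total rejects, taken as the mortal count
P_MORTAL = MORTALS / 58_000        # defective fraction, about 0.39%
T50_MORTAL = 10.6                  # hours, lognormal median for mortals
SIGMA_MORTAL = 0.68                # lognormal shape for mortals


def mortal_cdf(t_hours):
    """Lognormal CDF of the mortal subpopulation at time t."""
    return NormalDist().cdf(log(t_hours / T50_MORTAL) / SIGMA_MORTAL)


for t in (24, 48, 168):
    fm = mortal_cdf(t)
    print(f"t={t:3d} h  mortal CDF={fm:.4f}  "
          f"expected rejects={MORTALS * fm:.1f}  observed CDF={P_MORTAL * fm:.5f}")
```

The expected count at 24 hours comes out close to the 201 rejects observed, and the mortal CDF is about 99% at 48 hours and essentially 100% at 168 hours.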

Defective Subpopulation
Models
If we don't consider mortals vs. immortals, we will incorrectly assume that all units can fail.

Projections of field reliability will be biased unless we identify the limited defective units.

Statistical Reliability
Analysis and Modeling:
A Case Study

## Analysis of Reliability Data with Failures from a Defective Subpopulation

Reliability Study
Background

One lot of a device type with initial burn-in results at 168 hours, 125 °C:

failures

Since other lots, with similar manufacturing, might have escaped to a few customers, we needed to assess the field impact.

We were able to impound this lot, containing about 300 devices not burned in.

Reliability Study
Design

Two static stresses:
179 Units: 125 °C ambient
90 Units: 150 °C ambient
30 Units: Control

Frequent readouts at 2, 4, 8, 16, 32, 48, 68, 92, 116 hours

Purpose of Study
Reliability Modeling

applies

parameters)

factors

Determine recovery kinetics with and without bake
- Is 24 hours at 150 °C necessary?
- Do devices recover at room temperature?

Modeling Procedure
Statistical Analysis Plan

Analyze cumulative percent failures plot versus time, both linear and probability plots.

Estimate fraction mortals for stress cells. Test for significant difference.

Plot fallout of mortals (reduced sample size) on lognormal probability graph. Check for linearity and equality of slopes.

Run maximum likelihood analysis. Test for equality of shape factors (sigmas). Estimate single sigma. Estimate median life T50 for both cells.

Check model fit against original data.


Reliability Study
Bake Recoverable Failures

[Figure: Linear Plot of Cumulative Failures Versus Time. Cumulative percent (0% to 80%) versus stress time in power-on hours (0 to 120) for the 150 °C and 125 °C cells. Sample sizes: 150 °C = 90; 125 °C = 179.]

Reliability Study
Bake Recoverable Failures

[Figure: Probability Plots (No Adjustment for Mortals). Standard normal variate Z versus ln(time to failure) for the 150 °C and 125 °C cells. Sample sizes: 150 °C = 90; 125 °C = 179.]

Reliability Study
Bake Recoverable Failures

[Figure: Probability Plot (Adjusted for Mortals). Standard normal variate Z versus ln(time to failure) for the 150 °C and 125 °C cells. Mortal sample sizes: 150 °C = 64; 125 °C = 113.]

APL PROGRAM FOR MLE
GENLNEST

ENTER NUMBER OF CELLS: 2
CHOOSE CONF. LIMIT FOR BOUND IN PERCENT: 90

ENTER START AND ENDPOINT OF ALL READOUT INTERVALS (INCLUDE ZEROS)
SPREAD 2 4 8 16 32 48 68 92 116
ENTER CORRESPONDING NUMBERS OF FAILS PER INTERVAL (INCLUDE ZEROS)
34 6 21 2 0 0 0 1 0
ENTER TIMES ALL FAILED UNITS WERE REMOVED FROM TEST (INCLUDING END OF TEST)
116
ENTER CORRESPONDING NUMBERS REMOVED
0

ENTER START AND ENDPOINT OF ALL READOUT INTERVALS (INCLUDE ZEROS)
SPREAD 2 4 8 16 32 48 68 92 116
ENTER CORRESPONDING NUMBERS OF FAILS PER INTERVAL (INCLUDE ZEROS)
5 0 36 8 42 7 3 4 3
ENTER TIMES ALL FAILED UNITS WERE REMOVED FROM TEST (INCLUDING END OF TEST)
16 116
ENTER CORRESPONDING NUMBERS REMOVED
2 3

CELL   T50     SIGMA   MU      VARIANCE SIGMA   VARIANCE MU   COVARIANCE MU SIGMA
1      1.90    1.208   .444    .0322            .0 373e-1     .643e-2
2      15.08   1.060   2.714   .0059            .0 104e-3     .266e-5

ESTIMATE BOUNDS (90 PERCENT CONFIDENCE)

CELL   NUM. ON TEST   NUM. FAIL   T50 LOW   T50 UP   SIGMA LOW   SIGMA UP
1      64             64          1.38      2.63     .909        1.508
2      113            108         12.74     17.86    .933        1.187

WANT EQUAL T50S OR SIGMAS OR BOTH IN SOME CELLS (Y/N)?
Y
CELLS: 1 2
TYPE 1 FOR EQUAL SIGMAS, 2 FOR EQUAL MUS, 3 FOR BOTH THE SAME: 1

THE ASSUMPTION OF EQUAL SIGMAS CAN NOT BE REJECTED AT THE 95 PERCENT LEVEL.
UNDER THIS ASSUMPTION, RESULTS LIKE OBSERVED OCCUR ABOUT 41.9 PERCENT OF THE TIME.
(THE SMALLER THIS PERCENT, THE LESS LIKELY THE ASSUMPTION.)

MAXIMUM LIKELIHOOD ESTIMATES

CELL   T50     SIGMA   MU      VARIANCE SIGMA   VARIANCE MU   COVARIANCE MU SIGMA
1      2.02    1.090   .704    .0051            .0 247e-2     .538e-3
2      15.08   1.090   1.713   .0051            .0 110e-2     .250e-5

ESTIMATE BOUNDS (90 PERCENT CONFIDENCE)

CELL   NUM. ON TEST   NUM. FAIL   T50 LOW   T50 UP   SIGMA LOW   SIGMA UP
1      64             64          1.56      2.63     .972        1.207
2      113            108         12.68     17.54    .972        1.207

WANT EQUAL T50S OR SIGMAS OR BOTH IN SOME CELLS (Y/N)?
N

Reliability Study
Bake Recoverable Failures

[Figure: Model Fit to Actual. Cumulative percent failures (0% to 80%) versus time in power-on hours (0 to 140), showing the observed data and the MLE fits for the 150 °C and 125 °C cells.]

## Projection to Field Conditions

Acceleration Statistics

Estimate acceleration factor between two stress cells: AF = 15.08 / 2.02 = 7.465

Estimate activation energy, based on junction temperatures (Tj) 35 °C above ambient: EA = 1.375 eV

Estimate field T50 based on Tj at 55 °C ambient: field T50 = 18,288 hours

Using field T50, sigma = 1.090, lognormal distribution:
- project fallout and failure rates for various mortal fractions
- use customer field data to determine which mortal fraction applies
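The projection steps can be reproduced with the Arrhenius model, assuming the same 35 °C junction rise applies in the field as in the stress cells. In the sketch below the Boltzmann constant value, the function names, and the example mortal fractions are assumptions; the T50 and sigma values come from the maximum likelihood results above.

```python
from math import exp, log
from statistics import NormalDist

K_EV = 8.617e-5          # Boltzmann constant in eV/K
JUNCTION_RISE_C = 35.0   # Tj above ambient, from the slide


def junction_kelvin(ambient_c):
    """Junction temperature in kelvin for a given ambient in Celsius."""
    return ambient_c + JUNCTION_RISE_C + 273.15


t50_125, t50_150, sigma = 15.08, 2.02, 1.090  # MLE results (equal sigmas)

# Acceleration factor between the two stress cells and activation energy.
af_cells = t50_125 / t50_150
ea = K_EV * log(af_cells) / (1 / junction_kelvin(125) - 1 / junction_kelvin(150))

# Acceleration from 125 C stress to 55 C field ambient, and field median life.
af_field = exp(ea / K_EV * (1 / junction_kelvin(55) - 1 / junction_kelvin(125)))
field_t50 = t50_125 * af_field


def field_fallout(mortal_fraction, t_hours):
    """Projected cumulative fraction failing in the field: p * lognormal CDF."""
    return mortal_fraction * NormalDist().cdf(log(t_hours / field_t50) / sigma)


print(f"AF={af_cells:.3f}  EA={ea:.3f} eV  field T50={field_t50:,.0f} h")
for p in (0.05, 0.20, 0.66):  # example mortal fractions
    print(f"p={p:.2f}: fallout at 10,000 h = {field_fallout(p, 10_000):.2%}")
```

Small differences from the slide's 18,288 hours come from rounding of the constants. Comparing projected fallout curves for several mortal fractions against customer field data is what identifies the applicable fraction.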

## Projection to Field Use

Bake Recoverable Fails

[Figure: Projected Field Fallout with Various Mortal Percentages. Cumulative percent (0% to 20%) versus time in field (0 to 10 K hours), with one curve each for mortal percentages of 5%, 10%, 20%, 30%, 40%, 50%, and 66%.]

A Note of Caution
Analysis When Mortals Are Present

Because the analysis took into account the presence of a defective subpopulation, the parameter estimates were accurate. The two customers, notified of the affected lots, used the analysis for decisions on how to treat the remaining product in the field.

If the assessment is not done correctly and there is a low incidence of mortals, the T50s and sigmas for a lognormal distribution may become very large and inaccurate.

A Side Benefit
Screening a Wearout Mechanism

Note that it may be possible to screen a wearout failure mechanism if only a subpopulation of the units is mortal for that mechanism and sufficient acceleration is obtainable.

See the Trindade paper "Can Burn-in Screen Wearout Mechanism? Reliability Models of Defective Subpopulations - A Case Study" in the 29th Annual Proceedings of Reliability Physics Symposium (1991).

Class Project
Defect Models
50 components are put on stress. Readouts are at 10, 25, 50, 100, 200, 500, and 1,000 hours. The failure counts at the respective readouts are 2, 2, 4, 5, 4, 3, and 0.

1. Estimate the CDF for all units using the table below with n = 50.

| Time | Cum # Fails | CDF Est All Units (%) |
|---|---|---|
| 10 | 2 | |
| 25 | 4 | |
| 50 | 8 | |
| 100 | 13 | |
| 200 | 17 | |
| 500 | 20 | |
| 1000 | 20 | |

the next page.

Do the data appear to be distributed according to a Weibull distribution, or does a defect model seem possible?

## Weibull Probability Paper


Note: Percent Failure scale on Weibull Probability paper is faint. Values are 99.9, 98.0, 90.0,
70.0, 50.0, 30.0, 20.0, 10.0, 5.0, 2.0, 1.0, 0.5, 0.2, 0.1, etc.

Class Project
Defect Model Estimates

Characteristic Life (c) __________

Shape Parameter (m) __________

$$F(t) = 1 - e^{-(t/c)^m}$$

How could we confirm that the Weibull model for the mortal population fits the data? We estimate the CDF at three times and compare to observations.

| Time | Mortal CDF (Weibull Model) | Mortal Fraction | Model CDF for All Units | Empirical CDF All Units |
|---|---|---|---|---|
| 25 | 0.221 | 0.4 | | |
| 100 | 0.632 | 0.4 | | |
| 1000 | 1.000 | 0.4 | | |

Defective Subpopulations in ART
Enter failure information (readout times, cumulative failures) into columns. Under ART, select Defective Subpopulations. Enter the required information. Click OK.

System Models

## A General Model for the Field Reliability of Integrated Circuits

## An Evolution in the Projection of Field Failure Rates

## Failure Rate Calculations

Primitive Method

Assumptions
Constant failure rate
Single overall activation
energy
Ambient temperatures
No separation of failure modes

Primitive Method
Problems with Calculations

Example

100 units are stressed for 1,000 hours at 125 °C. Assume no self-heating. One unit fails at 10 hours for a mechanism with EA of 1.0 eV. A second unit fails at 500 hours for a failure mechanism with EA of 0.5 eV.

Overall average activation energy: 0.75 eV
Acceleration factor (125 °C to 55 °C): AF = 106
IFR (constant) at 55 °C:

[1E9 × 2/(10 + 500 + 98 × 1000)]/AF = 192 FITs


Primitive Method
Comparative Calculation

Mechanism 1: EA = 1.0 eV, AF = 501
IFR (constant) at 55 °C:
[1E9/(10 + 500 + 98 × 1000)]/AF = 20 FITs

Mechanism 2: EA = 0.5 eV, AF = 22
IFR (constant) at 55 °C:
[1E9/(10 + 500 + 98 × 1000)]/AF = 461 FITs

Total IFR = 481 FITs
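The two calculations can be compared side by side. This sketch assumes the Arrhenius model with ambient temperatures (no self-heating, as stated in the example) and uses the rounded acceleration factors from the slides for the FIT values; the function names and the Boltzmann constant value are assumptions.

```python
from math import exp

K_EV = 8.617e-5  # Boltzmann constant in eV/K


def arrhenius_af(ea_ev, stress_c=125.0, use_c=55.0):
    """Arrhenius acceleration factor between stress and use temperatures."""
    t_stress = stress_c + 273.15
    t_use = use_c + 273.15
    return exp(ea_ev / K_EV * (1.0 / t_use - 1.0 / t_stress))


def fits(n_failures, device_hours, af):
    """Constant failure rate at use conditions, in failures per 1E9 hours."""
    return 1e9 * n_failures / device_hours / af


# Two failures at 10 h and 500 h, 98 survivors at 1,000 h.
device_hours = 10 + 500 + 98 * 1000

# Rounded acceleration factors as given on the slides.
fit_average_ea = fits(2, device_hours, 106)   # single averaged EA = 0.75 eV
fit_mech_1 = fits(1, device_hours, 501)       # EA = 1.0 eV
fit_mech_2 = fits(1, device_hours, 22)        # EA = 0.5 eV

for ea in (0.75, 1.0, 0.5):
    print(f"EA={ea} eV -> AF={arrhenius_af(ea):.1f}")
print(f"averaged EA: {fit_average_ea:.0f} FITs; "
      f"by mechanism: {fit_mech_1:.0f} + {fit_mech_2:.0f} FITs")
```

Averaging the activation energies understates the projected rate by more than a factor of two here, because the low-EA mechanism dominates at use temperature.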


## Failure Rate Calculations

Later Improved Method

Long-term life modeled with activation energy specific to failure mechanisms

Temperature acceleration calculated with junction temperatures

Later Method
Problems

modeled

Competing failure modes not adequately modeled with constant failure rate

Zero rejects and unidentified mechanisms often not treated

Bathtub curve approximated in flat region only because of constant failure rate

An Alternative Model

## Three categories of possible failures:

Test Escapes
Defective Subpopulations
Competing Failure Mechanisms

## The three Ds:

Defective

Deficient

Quality issue

Inadequate testing at manufacturer, or damage after testing and prior to customer receipt

Rejects discovered at customer; mistakenly called reliability failures

Assume zero in model


Defective Subpopulations

Only a proportion of the total population is at risk of failure. Defective units are called mortals. The ones without the defect are called immortals.

Defective subpopulations are generally associated with processing problems.

There are physical reasons why defective subpopulations should exist.

Always question the assumption (common in the traditional approach) that any observed failure type will eventually affect all other devices.

Competing Risks

units.

We call these mechanisms competing risks because several different types may exist and any one can cause the unit to fail.

These mechanisms are typically associated with design, processing, or material problems.

We model the failures using Weibull or lognormal distributions.

## General Reliability Model

FT Fe Fd 1 FN

where
$F_N = 1 - R_1 R_2 \cdots R_N$

mechanisms.

Zero rejects and unidentified mechanisms are included.

Generates complete bathtub curve!


## General Reliability Model in Use at AMD

AMD Reliability Brochure 1994 Data

## AMD Reliability Brochure 1994 Data


Appendix

Class Project
System Models

A) A component has reliability R(t) = 0.99. Twenty-five components in series form a system. Calculate the system reliability.

$R_s(t) = 0.99^{25} = 0.7778$ or 77.78%

B) A component has reliability R(t) = 0.95. Three components in parallel form a system. Calculate the system reliability.

$R_s(t) = 1 - (1 - 0.95)^3 = 0.9999$ or 99.99%


Class Project
Complex Systems

A system consists of seven units: A, B, C, D, E, G, H. For the system to function, unit A, and either unit B or C, and either D and E together or G and H together, must be working. Draw the reliability block diagram for this setup.

[Figure: reliability block diagram with A in series with the parallel pair B and C, in series with the parallel arrangement of the series pairs D-E and G-H.]

Write the equation for the CDF of the system in terms of the individual component reliabilities, that is, the $R_i$, where i = A, B, C, ..., G, H. Hint: Consider the three subsystems: A alone; B with C; and D, E, G, H.

1) $R_A$
2) $R_{BC} = 1 - (1 - R_B)(1 - R_C)$
3) $R_{DEGH} = 1 - (1 - R_{DE})(1 - R_{GH}) = 1 - (1 - R_D R_E)(1 - R_G R_H)$

The system CDF is

$$F_S = 1 - R_S = 1 - R_A\,R_{BC}\,R_{DEGH}$$
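The subsystem decomposition translates directly into code. This is a minimal sketch assuming independent units; the common reliability of 0.9 for every unit is a hypothetical value chosen only to exercise the formula.

```python
def parallel(*reliabilities):
    """Reliability of independent blocks in active parallel."""
    all_fail = 1.0
    for r in reliabilities:
        all_fail *= (1.0 - r)
    return 1.0 - all_fail


def system_cdf(r):
    """System CDF for: A in series with (B or C) in series with (D-E or G-H).
    `r` maps unit name to its reliability at the time of interest."""
    r_bc = parallel(r["B"], r["C"])
    r_degh = parallel(r["D"] * r["E"], r["G"] * r["H"])
    return 1.0 - r["A"] * r_bc * r_degh


# Hypothetical input: every unit has reliability 0.9.
example = {name: 0.9 for name in "ABCDEGH"}
print(f"system CDF = {system_cdf(example):.4f}")
```

With these inputs the single unit A dominates the system unreliability, since both redundant subsystems are far more reliable than any one unit.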

Class Project
Defect Models
1. Estimate the proportion defective p and the number of mortals in the sample. Fill in the mortal CDF column in the table below.

| Time | Cum # Fails | CDF Est All Units (%) | CDF Est Mortals (%) |
|---|---|---|---|
| 10 | 2 | 2/50 = 4% | |
| 25 | 4 | 4/50 = 8% | |
| 50 | 8 | 8/50 = 16% | |
| 100 | 13 | 13/50 = 26% | |
| 200 | 17 | 17/50 = 34% | |
| 500 | 20 | 20/50 = 40% | |
| 1000 | 20 | 20/50 = 40% | |

2. Plot the data for the mortal subpopulation on the same sheet of paper. Does the fit look reasonable?

percentile.

5. Estimate the shape parameter m by drawing a line perpendicular to the best-fit-by-eye line through the estimation point on the Weibull paper and reading the beta estimation scale.

Class Project
Defect Model Example

n = 50

| Time | Cum # Fails | CDF Est All Units (%) | CDF Est Mortals (%) |
|---|---|---|---|
| 10 | 2 | 2/50 = 4% | 2/20 = 10% |
| 25 | 4 | 4/50 = 8% | 4/20 = 20% |
| 50 | 8 | 8/50 = 16% | 8/20 = 40% |
| 100 | 13 | 13/50 = 26% | 13/20 = 65% |
| 200 | 17 | 17/50 = 34% | 17/20 = 85% |
| 500 | 20 | 20/50 = 40% | 20/20 = 100% |
| 1000 | 20 | 20/50 = 40% | 20/20 = 100% |

Estimated mortal fraction, p: 0.40 or 40%

The CDF estimate for mortals is based on the sample size of the defective subpopulation.

## Weibull Probability Plot


Class Project
Defect Model Example
Model Check

Characteristic Life (c): 100

Shape Parameter (m): 1.0

$$F(t) = 1 - e^{-(t/c)^m}$$

| Time | Mortal CDF (Weibull Model) | Mortal Fraction | Model CDF for All Units | Empirical CDF All Units |
|---|---|---|---|---|
| 25 | 0.221 | 0.4 | 0.088 | 0.08 |
| 100 | 0.632 | 0.4 | 0.253 | 0.26 |
| 1000 | 1.000 | 0.4 | 0.400 | 0.40 |
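The model check in the table can be reproduced in a few lines. This sketch uses the graphical estimates from the example (c = 100, m = 1.0, p = 0.40) and the cumulative failure counts from the project data; the function and variable names are illustrative.

```python
from math import exp


def weibull_cdf(t, c, m):
    """Weibull CDF with characteristic life c and shape parameter m."""
    return 1.0 - exp(-((t / c) ** m))


p, c, m, n = 0.40, 100.0, 1.0, 50          # estimates from the example
cum_fails = {25: 4, 100: 13, 1000: 20}     # cumulative failures at each readout

for t, fails in cum_fails.items():
    mortal = weibull_cdf(t, c, m)
    model_all = p * mortal                 # defect model: p times mortal CDF
    empirical = fails / n                  # observed fraction of all units
    print(f"t={t:5d}  mortal={mortal:.3f}  model={model_all:.3f}  empirical={empirical:.2f}")
```

Model and empirical values agree to within about one percentage point at all three times, which supports the fitted defect model.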

Class Project
Defect Model
p x Weibull CDF Plot

[Figure: Defect Model Example. CDF (0 to 0.9) versus time (0 to 1000 hours).]