
RELIABILITY ENGINEERING UNIT

ASST4403
Lecture 12: DATA ANALYSIS

Learning outcomes
Present data visually and numerically, e.g. as a histogram
Identify distributions from data, e.g. by means of a histogram
Perform simple linear regression

How confident are we? How much can we trust the results? How well have we done?

Identifying candidate distributions

Estimating parameters

Confidence intervals and goodness of fit

Histogram

Frequency distribution
Frequency distribution: data presented as class intervals and their corresponding frequencies
Range: the difference between the largest and the smallest data values
Number of classes (bins): by Sturges' rule, select a bin size such that there are about 1 + log2 n non-empty bins (n is the number of data values)
Class midpoint: the average of the class endpoints
Relative frequency: the ratio of the frequency of a class interval to the total frequency
Cumulative frequency: the running total of the class frequencies in the frequency distribution
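A minimal Python sketch of these definitions, assuming that Sturges' count 1 + log2 n is rounded to the nearest integer (this reproduces the lecture's figures of 7 bins for n = 60 and 6 bins for n = 35). The function names are hypothetical:

```python
import math

def sturges_bins(n: int) -> int:
    """Sturges' rule: about 1 + log2(n) classes, rounded to the nearest integer."""
    return round(1 + math.log2(n))

def class_width(data_range: float, n_bins: int) -> float:
    """Width of equal-size classes that together cover the data range."""
    return data_range / n_bins

def class_midpoint(lower: float, upper: float) -> float:
    """Class midpoint: the average of the class endpoints."""
    return (lower + upper) / 2

# Figures from the two examples in this lecture
print(sturges_bins(60), round(class_width(2.3, sturges_bins(60)), 2))  # 7 0.33
print(sturges_bins(35), round(class_width(2005, sturges_bins(35))))    # 6 334
```

Other texts round 1 + log2 n up instead; with that choice n = 35 would give 7 bins rather than 6.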

Example: 5-year house loan interest rates


Lower End   Upper End   Frequency
    –          6.5          0
   6.5         6.6          8
   6.6         6.7          0
   6.7         6.8          1
   6.8         6.9          0
   6.9         7.0          0
   7.0         7.1         15
   7.1         7.2          0
   7.2         7.3         14
   7.3         7.4          0
   7.4         7.5          0
   7.5         7.6          3
   7.6         7.7          0
   7.7         7.8          3
   7.8         7.9          0
   7.9         8.0          0
   8.0         8.1          9
   8.1         8.2          0
   8.2         8.3          3
   8.3         8.4          0
   8.4         8.5          0
   8.5         8.6          2
   8.6         8.7          1
   8.7         8.8          0
   8.8          –           1

[Figure: histogram of frequency against interest rate for the classes above]

n = 60 data values, range = 2.3, class width = 0.1, number of classes (bins) = 25

(Using Sturges' rule instead, the number of classes (bins) is 1 + log2 n = 1 + log2 60 ≈ 6.9 ≈ 7, so the class width should be about 2.3/7 ≈ 0.33.)
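Relative and cumulative frequencies for this table follow directly from the frequency column. A minimal Python sketch (variable names are illustrative):

```python
from itertools import accumulate

# Class frequencies of the interest-rate example, in class order (25 classes)
freq = [0, 8, 0, 1, 0, 0, 15, 0, 14, 0, 0, 3, 0, 3, 0, 0, 9, 0, 3, 0, 0, 2, 1, 0, 1]

total = sum(freq)                     # total frequency, n
relative = [f / total for f in freq]  # relative frequency of each class
cumulative = list(accumulate(freq))   # running total of the class frequencies

print(total, cumulative[-1], round(sum(relative), 10))  # 60 60 1.0
```

The last cumulative frequency always equals n, and the relative frequencies always sum to 1, which makes both handy checks on a hand-built table.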

Histogram
A histogram is a graph of relative frequencies that represents the underlying distribution (PDF). To construct a histogram:
1. Sort the data in ascending order.
2. Find the data range (max - min).
3. Decide on the number of intervals (bins) of equal size and on the bin size (by trial and error; there is no best number). Sturges' rule: select a bin size such that there are about 1 + log2 n non-empty bins (n is the number of samples).
4. Group the data into bins and count the frequency of each bin.
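The four steps above can be sketched in Python as follows. This is a minimal sketch, assuming equal-width classes that start at the minimum value, Sturges' rule (rounded to the nearest integer) as the default number of bins, and half-open classes [lower, upper) with the maximum value placed in the last class; the function name build_histogram is hypothetical. It is applied here to the 35 failure times from the example later in this lecture:

```python
import math

def build_histogram(data, n_bins=None):
    """Equal-width histogram; returns (edges, counts).

    Class i covers [edges[i], edges[i+1]); the last class also includes the maximum.
    Assumes the data contain at least two distinct values (non-zero range).
    """
    values = sorted(data)                     # step 1: sort in ascending order
    lo, hi = values[0], values[-1]
    data_range = hi - lo                      # step 2: range = max - min
    if n_bins is None:                        # step 3: Sturges' rule by default
        n_bins = round(1 + math.log2(len(values)))
    width = data_range / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for x in values:                          # step 4: group into bins and count
        i = min(int((x - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
        counts[i] += 1
    return edges, counts

failure_hours = [20, 31, 36, 47, 98, 157, 182, 185, 210, 210, 214, 221, 246, 247,
                 279, 284, 289, 300, 400, 401, 428, 438, 442, 467, 499, 552, 553,
                 597, 767, 796, 1024, 1297, 1476, 1563, 2025]
edges, counts = build_histogram(failure_hours)
print(len(counts), counts)  # 6 [18, 10, 2, 2, 2, 1]
```

Because these classes start at the minimum (20 h) with width about 334 h rather than at round numbers, the counts differ from the lecture's 6-class table, but they show the same right-skewed shape.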


Reproduced courtesy of Jo Sikorska

Histogram
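$$f(t) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(t-\mu)^2}{2\sigma^2}\right)$$

[Figure: histogram overlaid with the normal PDF f(t)]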
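To read a histogram against a PDF such as the normal density above, the bars are usually rescaled so that their total area is 1: each class's relative frequency is divided by the class width. A minimal Python sketch, with hypothetical function names and hypothetical counts used only to show the scaling; in practice μ and σ would be estimated from the data (e.g. with statistics.mean and statistics.stdev):

```python
import math

def normal_pdf(t, mu, sigma):
    """Normal PDF f(t) = exp(-(t - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))."""
    return math.exp(-((t - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def density_heights(counts, width):
    """Bar heights on a density scale: relative frequency / class width.

    The bars then have total area 1, so they can share axes with f(t).
    """
    n = sum(counts)
    return [c / (n * width) for c in counts]

# Hypothetical class counts, used only to illustrate the scaling
heights = density_heights([2, 5, 3], width=0.5)
print(heights)  # [0.4, 1.0, 0.6]
```

Each bar height can then be compared with normal_pdf evaluated at the class midpoint.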


Histogram

[Figure: histogram overlaid with a fitted PDF f(t)]


Example: proper histogram classes


The following 35 failure times (in operating hours) were obtained from field data over a 6-month period. Construct a histogram and discuss the underlying distribution.
20 31 36 47 98 157 182 185 210 210 214 221 246 247 279 284 289 300 400 401 428 438 442 467 499 552 553 597 767 796 1024 1297 1476 1563 2025


3 classes (too few)


Lower End   Upper End   Frequency
    –          999          30
  1000        1999           4
  2000          –            1

[Figure: histogram of frequency against failure time for the 3 classes above]

17 classes (too many)


Lower End   Upper End   Frequency
    –           20           1
    20         138           4
   138         256           9
   256         374           4
   374         492           6
   492         610           4
   610         728           0
   728         846           2
   846         964           0
   964        1082           1
  1082        1200           0
  1200        1318           1
  1318        1436           0
  1436        1554           1
  1554        1672           1
  1672        1790           0
  1790          –            1

[Figure: histogram of frequency against failure time for the 17 classes above]

6 classes (proper)
Lower End   Upper End   Frequency
    –          399          18
   400         799          12
   800        1199           1
  1200        1599           3
  1600        1999           0
  2000          –            1

[Figure: histogram of frequency against failure time for the 6 classes above]

An exponential distribution?

n = 35 data values, range = 2005, class width = 400, number of classes (bins) = 6. (Using Sturges' rule, the number of classes (bins) is 1 + log2 n = 1 + log2 35 ≈ 6.1 ≈ 6, so the class width should be about 2005/6 ≈ 334.)
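The three class choices above can be reproduced with a short Python sketch. It assumes the tables were produced with Excel's FREQUENCY-style binning, in which each listed upper end is an inclusive class boundary and values above the last boundary fall into a final overflow class; the helper name excel_frequency is hypothetical:

```python
from bisect import bisect_left

def excel_frequency(data, bins):
    """FREQUENCY-style counts: class 0 holds x <= bins[0], class i holds
    bins[i-1] < x <= bins[i], and a final overflow class holds x > bins[-1]."""
    counts = [0] * (len(bins) + 1)
    for x in data:
        counts[bisect_left(bins, x)] += 1   # first boundary >= x
    return counts

failure_hours = [20, 31, 36, 47, 98, 157, 182, 185, 210, 210, 214, 221, 246, 247,
                 279, 284, 289, 300, 400, 401, 428, 438, 442, 467, 499, 552, 553,
                 597, 767, 796, 1024, 1297, 1476, 1563, 2025]

three = excel_frequency(failure_hours, [999, 1999])                      # too few
seventeen = excel_frequency(failure_hours, [20 + 118 * k for k in range(16)])  # too many
six = excel_frequency(failure_hours, [399, 799, 1199, 1599, 1999])       # proper

print(three)      # [30, 4, 1]
print(seventeen)  # [1, 4, 9, 4, 6, 4, 0, 2, 0, 1, 0, 1, 0, 1, 1, 0, 1]
print(six)        # [18, 12, 1, 3, 0, 1]
```

The 17-class boundaries are 20, 138, ..., 1790 (width 118); most of those classes hold 0 or 1 failures, which is why that choice is too fine to reveal the shape.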


Example: original data for a histogram

Example: class interval and frequency for a histogram

Example: class interval and relative frequency for a histogram

Example: histogram
A normal distribution?


Simple regression
Regression: the process of constructing a mathematical model (function) to predict or determine one variable from another
Simple regression = linear regression with two variables
Dependent variable: the variable to be predicted, y
Independent (explanatory) variable: the predictor, x
How well does the line fit? Find the coefficient of correlation r: the closer |r| is to 1, the better the fit.
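As a minimal sketch of how r measures fit, the Python function below computes the sample (Pearson) correlation coefficient from its definition r = Sxy / sqrt(Sxx Syy); the name pearson_r is hypothetical. A perfect increasing line gives r = 1 and a perfect decreasing line gives r = -1, so it is the magnitude |r| that should be close to 1:

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation of x and y
    sxx = sum((a - mx) ** 2 for a in x)                   # variation of x
    syy = sum((b - my) ** 2 for b in y)                   # variation of y
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]))  # 1.0  (y = 2x + 1, rising)
print(pearson_r([1, 2, 3, 4], [9, 7, 5, 3]))  # -1.0 (falling line)
```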

Determining the equation of the regression line


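$y = mx + b$

where m is the slope of the line and b is the y-intercept of the line. We are trying to determine these two parameters to form the model. The slope is the tangent of the angle θ that the line makes with the x-axis: $m = \tan\theta$.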

Example

http://phoenix.phys.clemson.edu/tutorials/excel/regression.html

How to calculate/find m and b

n = number of data points
r = correlation coefficient
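The slide's formulas did not survive extraction. As a reference, the standard least-squares expressions are below (the notation may differ from the original slide), followed by a check against the income example later in this lecture, where n = 12, Σx = 592, Σy = 1692, Σx² = 33044, Σy² = 260136 and Σxy = 92038:

```latex
m = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2},
\qquad
b = \frac{\sum y_i - m\sum x_i}{n},
\qquad
r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}
         {\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}}

m = \frac{12(92038) - 592(1692)}{12(33044) - 592^2} = \frac{102792}{46064} \approx 2.2315,
\qquad
b = \frac{1692 - 2.2315(592)}{12} \approx 30.91,
\qquad
r = \frac{102792}{\sqrt{46064 \times 258768}} \approx 0.9415
```

These agree with the SLOPE, INTERCEPT and CORREL results in the Excel example.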


Doing linear regression using Excel

http://phoenix.phys.clemson.edu/tutorials/excel/regression.html

Example
Individual   Annual income ($000)   Weekly time on national direct calls (minutes)
 1           23                      69
 2           29                      95
 3           29                     102
 4           35                     118
 5           42                     126
 6           46                     125
 7           50                     138
 8           54                     178
 9           64                     156
10           66                     184
11           76                     176
12           78                     225

             Value          Excel formula
Slope        2.231503994    =SLOPE(C2:C13,B2:B13)
Intercept    30.91246961    =INTERCEPT(C2:C13,B2:B13)
r            0.941506251    =CORREL(C2:C13,B2:B13)


[Figure: scatter plot of weekly time on national direct calls (min) against annual income ($000)]
