BASIC STATISTICS TOOLKIT

BASIC STATISTICS
A. CHAPTER OBJECTIVES
B. INTRODUCTION
C. SAMPLING
D. FUNDAMENTALS OF IMPROVEMENT
E. MEASURES OF CENTRAL TENDENCY
F. MEASURES OF VARIABILITY
G. NORMAL PROBABILITY TEST
H. HANDLING NON-NORMAL DATA
I. SAMPLING DISTRIBUTIONS
J. CENTRAL LIMIT THEOREM
K. STANDARD NORMAL PROBABILITY DISTRIBUTION
CHAPTER OBJECTIVES
INTRODUCTION
We want information on the entire population.
Since this is not always possible nor economically
realistic, we use statistics.
Statistics enable us to take a sample from the population
and estimate a characteristic about the population.
It is important to differentiate the sample statistics
(estimations) from the population parameters.
SAMPLING
A sample is a portion (subset) of a larger population from which information is required.
Sample can yield information that can be used to predict characteristics
of a population.
Sample statistics provide an estimate of the population parameters.
The figure below displays basic symbols for sample statistics versus
population parameters.
Sample Statistics vs. Population Parameters
Statistics (estimates)               Parameters
$\bar{X}$ = Sample Mean              $\mu$ = Population Mean
s = Sample Standard Deviation        $\sigma$ = Population Standard Deviation
FUNDAMENTALS OF IMPROVEMENT
Variability & stability are used to determine the status of a
process.
We use the mean ($\mu$) to determine if the process is on target.
We use the standard deviation ($\sigma$) to determine variability.
Stability helps us to determine how well a process
performs over time.
Stability is represented by a constant mean and
predictable variability over time.
Every process displays variation; some display controlled
variation while others display uncontrolled variation (Walter
Shewhart).
VARIATION
Control chart A displays controlled variation; stable &
consistent pattern of variation over time.
Control chart B displays uncontrolled variation; variation
that changes over time.
[Figure: X-Bar Chart for Process A and X-Bar Chart for Process B; x-axis: Sample Number, y-axis: Sample Mean. Process A: UCL = 77.20, $\bar{X}$ = 70.91, LCL = 64.62. Process B: UCL = 77.27, $\bar{X}$ = 70.98, LCL = 64.70.]
VARIATION - cont.
Variation will be present in any process and can be tolerated if:
The variation of the output is relatively small compared to the
process specifications and the process is on target.
The process is stable over time.
OLD VERSUS NEW WAY OF THINKING
New Way - reduce variation/focus on nominal.
Old Way - as long as the process is within specifications,
everything is ok.
[Figure: Old Thinking vs. New Thinking; each panel plots Cost against LSL, Nom, and USL. Under Old Thinking the whole region between LSL and USL is marked Acceptable.]
DATA ANALYSIS TASK

Our data analysis task is to:

Determine if the process is stable.
If the process is not stable, we should identify and remove
the cause(s) of instability.
If the process is stable, we should:
Estimate the magnitude of the total variability.
Identify the sources of the variability.
Reduce the variability.
We will now review some basic statistical concepts to
assist us in our task of data analysis.
MEASURE OF CENTRAL TENDENCY

We will review 3 common measures of central tendency:
Mean
Median
Mode
MEAN
Mean ($\mu$ - population, $\bar{X}$ - sample) is the arithmetic average of the data
values ($X_1, X_2, X_3, \ldots, X_n$), which is expressed as follows.
Population - $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
Sample - $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
A sample yields an estimate $\bar{X}$ for the true mean of a population ($\mu$) from which a sample is randomly drawn.
The mean is the most commonly used measure of location
(Central Tendency).
The mean reflects the influence of all values, but is
strongly influenced by extreme values.
MEDIAN
Single value from data set that measures the central item
in the data.
Single item is the middle most or most central item in data
set.
Half the items lie above this point; other half lie below it.
Reflects 50% rank or center number after data set has
been sorted from high to low.
Median is robust to extreme values.
MODE
Represents most frequently observed data of a sample.
As a result, it is not representative of all the data.
Most useful for data collected using a nominal or ordinal scale
(for example, a customer survey with a rating scale of 1 to 5).
Can be used to identify the most important interval when
data is classified by frequencies (histogram).
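The three measures of central tendency can be compared on a small data set. The sketch below uses Python's standard `statistics` module; the sample values are hypothetical and chosen only to show how one extreme value pulls the mean away from the median and mode.

```python
import statistics

# Hypothetical right-skewed sample (illustrative values, not from Distskew.mtw).
data = [62, 63, 63, 64, 65, 66, 68, 72, 85, 112]

mean = statistics.mean(data)      # arithmetic average; pulled upward by 112
median = statistics.median(data)  # middle of the sorted data; robust to extremes
mode = statistics.mode(data)      # most frequently observed value

print(f"mean={mean}, median={median}, mode={mode}")
```

For this right-skewed sample the order is mode, then median, then mean, which matches the skewed-to-right case described for these measures.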
MEAN, MEDIAN, MODE
For a normal distribution, the mean, median & mode coincide.
When data is skewed to the left or right, the mode is the highest point, and the median is located between the mode & the mean.
For skewed data, the median is more representative of the location of the distribution.
[Figure: three curves marking Mode, Median, and Mean: symmetrical distribution, skewed to left, and skewed to right.]
APPLICATION EXERCISE
Minitab can easily calculate the mean & median as follows:

1. Open file Distskew.mtw located in Gbdata.
2. Stat>Basic Statistics>Display Descriptive Statistics.
3. In the dialog box:
Double click on C1, C2 and C3 to enter into variables
box.
Dialog Window
Click OK.
SESSION WINDOW
Results for: DISTSKEW.MTW
Descriptive Statistics: Norm, Pos Skew, Neg Skew
Variable   N    N*  Mean    SE Mean  StDev   Minimum  Q1      Median  Q3      Maximum
Norm       500  0   70.000  0.447    10.000  29.824   63.412  69.977  76.653  103.301
Pos Skew   500  0   70.000  0.447    10.000  62.921   63.647  65.695  72.821  130.366
Neg Skew   500  0   70.000  0.447    10.000  1.866    67.891  73.783  76.290  77.106
Point out Mean, Median, Minimum and Maximum

Next slide will run graphs for same data
Minitab can also display a histogram of these same data sets as follows:
1. Graph>Histogram.
2. In the Histogram dialog window
Click Simple
Click OK
3. In the Histogram-Simple dialog window:
Double click on C1, C2 and C3 to enter into the
Graph Variables box.
Dialog Window
Select Simple, then click OK.
Click OK.
HISTOGRAMS
[Figure: Histogram of Norm, Histogram of Pos Skew, and Histogram of Neg Skew; x-axis: data value, y-axis: Frequency.]
To tile the graphs: in Project Manager > Graphs, select all 3 graphs, right click the selected graphs, and select Tile.
MEASURES OF VARIABILITY
Mean, median & mode tell us only part of what we need to
know about the characteristics of data.
We must also measure dispersion (spread) or variability.
Three measures of variability will be reviewed: range,
variance and standard deviation.
RANGE
Difference between highest & lowest observed values.

Easy to understand; usefulness as a measure of
dispersion is limited.
Considers only the highest & lowest values; fails to take
into account other data in set.
Heavily influenced by extreme values.
Inefficient for samples larger than about n = 10.
Generally used for developing control chart limits on
process control charts.
VARIANCE
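Variance ($\sigma^2$ population, $s^2$ sample) - sum of the squared
distances between the mean and each item, divided by the
total number of elements in the population (N) or, for a sample, by n - 1.
Formulas:
Population: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
Sample: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Variance is expressed in the square of the units (squared dollars), which is not easily interpreted.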
We have to make a change in the variance to compute a
useful measure of deviation.
Measure is called standard deviation.
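The difference between the population and sample formulas (dividing by N versus n - 1) can be checked numerically. This is a minimal sketch using Python's standard `statistics` module; the four data values are hypothetical.

```python
import statistics

# Hypothetical measurements; mean is 10, squared deviations sum to 90.
data = [4.0, 7.0, 13.0, 16.0]

pop_var = statistics.pvariance(data)   # divides by N      -> 90 / 4
samp_var = statistics.variance(data)   # divides by n - 1  -> 90 / 3
pop_sd = statistics.pstdev(data)       # square root of the population variance
samp_sd = statistics.stdev(data)       # square root of the sample variance

print(pop_var, samp_var, round(pop_sd, 3), round(samp_sd, 3))
```

The standard deviations are back in the original units, which is why they are easier to interpret than the variances.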
STANDARD DEVIATION
Standard deviation ($\sigma$ population, s sample) - quantifies
data variability and is the square root of the variance.
Enables us to determine where the values of a frequency
distribution are located relative to the mean.
This can be shown with a normal curve and related
probability areas.
In these cases we can say that:

About 68% of the values in the population will fall within plus
or minus 1 standard deviation of the mean.
About 95% of the values in the population will fall within plus
or minus 2 standard deviations of the mean.
About 99.73% of the values in the population will fall within
plus or minus 3 standard deviations of the mean.
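These percentages follow directly from the standard normal cumulative distribution. The sketch below recomputes them with `statistics.NormalDist` from the Python standard library; the helper name `fraction_within` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def fraction_within(k):
    """Fraction of a normal population within +/- k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"+/- {k} sigma: {100 * fraction_within(k):.2f}%")
```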
STANDARD DEVIATION
Formulas:
Population: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
Sample: $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
An important Six Sigma principle indicates that the total variation (variance) of
a process output variable can be partitioned into the variation due to the
process inputs as follows:
If $\sigma^2_{Total}$ = variance of the process output;
$\sigma^2_{X1}$ = variance due to input variable x1;
$\sigma^2_{X2}$ = variance due to input variable x2;
Then, $\sigma^2_{Total} = \sigma^2_{X1} + \sigma^2_{X2}$
So, $\sigma_{Total} = \sqrt{\sigma^2_{X1} + \sigma^2_{X2}}$
NORMAL PROBABILITY TEST
We can test whether a given data set can be described as normal with a normal probability test.
If the distribution is close to normal, the points on the normal probability plot will fall close to a straight line.
NORMAL PROBABILITY TEST
Open Distskew.mtw, located in the Gbdata file, and proceed as follows:
1. Stat>Basic Statistics>Normality Test
2. Using the dialog window, produce 3 separate Normal
Probability Plots; C1, C2 and C3 (Test for normality Anderson-Darling).
Note: If the Normality test shows a P-Value equal to or
less than 0.05, the data is NOT represented by a
normal distribution.
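For readers without Minitab, the Anderson-Darling statistic and an approximate p-value can be sketched in plain Python. The function names are hypothetical, and the p-value formula is a commonly published approximation (attributed to D'Agostino and Stephens) for the case where mean and standard deviation are estimated from the data; that Minitab uses the same approximation is an assumption here, although it does reproduce the P-Value of 0.328 reported in this deck for AD = 0.418 and N = 500.

```python
import math
from statistics import NormalDist, mean, stdev

def ad_p_value(a2, n):
    """Approximate p-value for an Anderson-Darling normality statistic a2."""
    a = a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)  # small-sample adjustment
    if a >= 0.6:
        return math.exp(1.2937 - 5.709 * a + 0.0186 * a * a)
    if a > 0.34:
        return math.exp(0.9177 - 4.279 * a - 1.38 * a * a)
    if a > 0.2:
        return 1.0 - math.exp(-8.318 + 42.796 * a - 59.938 * a * a)
    return 1.0 - math.exp(-13.436 + 101.14 * a - 223.73 * a * a)

def anderson_darling_normal(data):
    """Return (A-squared, approximate p-value) for a test of normality."""
    x = sorted(data)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))  # normal fitted to the sample
    cdf = [fitted.cdf(v) for v in x]
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - total / n
    return a2, ad_p_value(a2, n)

# Hypothetical skewed data: 50 evenly spaced quantiles of an exponential curve.
skewed = [-math.log(1 - (i - 0.5) / 50) for i in range(1, 51)]
a2, p = anderson_darling_normal(skewed)
print("nonnormal" if p <= 0.05 else "no evidence against normality")
```

The decision rule is the one stated in the note above: a p-value at or below 0.05 leads to the conclusion that the data is not represented by a normal distribution.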
Normality Test Dialog Window
Note: We use the Anderson-Darling test.
Run once each for Norm, Pos Skew, and Neg Skew.
NORMAL PROBABILITY TEST RESULTS

[Figure: Probability Plot of Norm, Probability Plot of Neg Skew, and Probability Plot of Pos Skew (Normal); x-axis: data value, y-axis: Percent.]
Plot      Mean   StDev  N    AD      P-Value
Norm      70.00  10.00  500  0.418   0.328
Neg Skew  70.00  10.00  500  44.491  <0.005
Pos Skew  70.00  10.00  500  46.489  <0.005
What are the results?
If the Normality test shows a P-Value equal to or less than 0.05, the data is NOT represented by a normal distribution.
MYSTERY DISTRIBUTION
Following the previous procedure, generate a normal probability plot for the
Mystery variable in C4 column of the Distskew.mtw file.
P-Value is less than 0.05; distribution is nonnormal.
NOTE: Observe the bimodal distribution in the plot.
Graphical Summary
Next, create a histogram and descriptive statistics summary of the Mystery data located in C4 as follows.
Stat>Basic Statistics>Graphical Summary
Enter C4 into variables window.
Graphical Summary Dialog Window
Click OK
RESULT
Minitab provides a histogram along with the related
descriptive statistics.
Summary for Mystery
[Figure: histogram of Mystery with 95% confidence intervals for the mean and median.]
Anderson-Darling Normality Test
A-Squared      27.11
P-Value <      0.005
Mean           100.00
StDev          32.38
Variance       1048.78
Skewness       0.00716
Kurtosis       -1.63184
N              500
Minimum        41.77
1st Quartile   68.69
Median         104.20
3rd Quartile   130.81
Maximum        162.82
95% Confidence Interval for Mean     97.15   102.85
95% Confidence Interval for Median   82.78   117.66
95% Confidence Interval for StDev    30.49   34.53
HANDLING NONNORMAL DATA
Nonnormal distribution is common for some measurements. Minitab can be utilized to analyze the capability or performance of a process using nonnormal data.
First, you should attempt to determine the cause(s) of nonnormal data.
Typical examples:
Two different machines provide a bimodal distribution. As a result, analyze the data for each machine separately.
Data comes from an unstable process. As a result, the process must be stabilized before reliable statistical results can be obtained.
HANDLING NONNORMAL DATA
In instances where the process is stable and predictable and the data proves to be nonnormal, there are a couple of options:
Normalize the data via a transformation (transformations are beyond the scope of our training).
Utilize a nonnormal probability model (Weibull, lognormal, exponential, etc.) to analyze overall capability (Pp, Ppk, PPU, and PPL).
Prior to using any data we analyze the normality.
Verify if the data are normal using Minitab (Stat>Basic Statistics>Normality Test).
If the data proves to be nonnormal:
Utilize Minitab's Stat > Quality Tools > Individual Distribution Identification.
This allows you to evaluate the optimal distribution for your data based on probability plots and goodness-of-fit tests prior to conducting a capability analysis study.
NONNORMAL DATA EXERCISE
Open New Project (close without save)
File > New > Project > OK
Open Cltest.mtw, located in your Gbdata file, and follow along as we go through this exercise.
First we will determine normality of data using:
Stat>Basic Statistics>Normality Test
Select Variable C3 Dist3
[Figure: Probability Plot of Dist3 (Normal); x-axis: Dist3, y-axis: Percent. Mean = 0.9100, StDev = 0.8654, N = 500, AD = 19.095, P-Value < 0.005.]
Results: P value < 0.005 indicates data is nonnormal. As a result we will proceed to identify the optimal distribution for our data.
INDIVIDUAL DISTRIBUTION IDENTIFICATION
Using Cltest.mtw, perform an Individual Distribution Identification test as follows:
1. Stat > Quality Tools > Individual Distribution Identification
2. In the dialog window:
Enter C3 Dist3 into single column.
Select Specify.
Use the default distributions (Normal, Exponential, Weibull, Gamma).
Note: We will use these settings to simplify the example. We could have used the Use all distributions option to look at 10 additional distributions. However, in this instance we know that one of these will best fit the distribution.
Click OK.
INDIVIDUAL DISTRIBUTION IDENTIFICATION

Review the session window - Goodness of Fit Test section.
Point out that 3 distributions exhibit a good fit to the fitted line as identified by the Anderson-Darling (AD) statistic.
The AD statistic is a measure of how far the plot points fall from the fitted line in a probability plot.
The smaller the AD statistic the better the fit!
Point out the p-value and note that a p-value greater than alpha (.05) means the test gives no evidence against that distribution.
Distribution   AD      P
Exponential    1.032   0.109
Review distribution curves on each and point out specifically the Exponential, Weibull and Gamma distributions.
Each plot is similar, the difference being the confidence interval (outside lines).
Statistically the Exponential is the best fit.
Remember: the p-value is the probability of seeing a fit at least this poor if the data really came from that distribution; it is not the probability that the data is from that distribution.
CAPABILITY ANALYSIS
Now that we are comfortable with the probability that our data fits the
Exponential distribution we are able to perform a capability analysis.
Using Cltest.mtw, we will perform a capability analysis for Nonnormal
data and fit data with Exponential distribution.
1. Stat > Quality Tools > Capability Analysis > Nonnormal
2. In the dialog window
Enter C3 Dist3 into single column.
In Fit data with:
Select Distribution
Select Exponential
Lower spec: enter 0
Upper spec: enter 3
Click OK
Next slide has graphs
Exponential Distribution Model
Pp = 0.50
Ppk = 0.44
Exp. Overall Performance = 37000.5 PPM
[Figure: Process Capability of Dist3, calculations based on Exponential distribution model; histogram with LSL and USL marked.]
Process Data
LSL           0.00000
Target        *
USL           3.00000
Sample Mean   0.90997
Sample N      500
Mean          0.90997
Overall Capability
Pp            0.50
PPL           1.00
PPU           0.44
Ppk           0.44
Exp. Overall Performance
PPM < LSL     0.0
PPM > USL     37000.5
PPM Total     37000.5
Observed Performance
PPM < LSL     0
PPM > USL     32000
PPM Total     32000
We are able to predict the long term process capability.
Note: Short term capability is not calculated for nonnormal data.
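The reported values can be approximated by hand. The sketch below assumes the percentile-based definitions often used for nonnormal capability (the 0.135th, 50th and 99.865th percentiles of the fitted distribution take the place of the mean and the 3-sigma points) and an exponential model with no threshold parameter. Whether Minitab computes its output exactly this way is an assumption; the function name is hypothetical.

```python
import math

def exponential_capability(mean, lsl, usl):
    """Percentile-based overall capability for an exponential model (assumed method)."""
    def quantile(p):
        # Inverse CDF of an exponential distribution with the given mean.
        return -mean * math.log(1.0 - p)

    x_low, x_med, x_high = quantile(0.00135), quantile(0.5), quantile(0.99865)
    pp = (usl - lsl) / (x_high - x_low)
    ppl = (x_med - lsl) / (x_med - x_low)
    ppu = (usl - x_med) / (x_high - x_med)
    ppk = min(ppl, ppu)
    ppm_above_usl = 1e6 * math.exp(-usl / mean)  # expected parts per million above USL
    return pp, ppl, ppu, ppk, ppm_above_usl

pp, ppl, ppu, ppk, ppm = exponential_capability(mean=0.90997, lsl=0.0, usl=3.0)
print(f"Pp={pp:.2f} PPL={ppl:.2f} PPU={ppu:.2f} Ppk={ppk:.2f} PPM>USL={ppm:.1f}")
```

With the sample mean of 0.90997 and spec limits of 0 and 3 from the exercise, these definitions give Pp, PPL, PPU and Ppk that round to the values shown in the output above.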
SAMPLING DISTRIBUTIONS
If we selected 10 groups of 25 samples from a continuous process
& computed the mean length and standard deviation of the length of
each sample group, the mean and standard deviation of each
sample group would be different.
Sampling distribution of the mean - a probability distribution of all
the possible means of the samples.
A sampling distribution can be partially described by its mean and
standard deviation.
Rather than say "standard deviation of the distribution of sample
means" we call it the standard error of the mean.
Standard error indicates the size of the chance error and the accuracy
we will likely get if we use a sample statistic to estimate a
population parameter.
EXAMINING SAMPLING DISTRIBUTIONS
Population A is a distribution with mean ($\mu$) and standard deviation ($\sigma$).
In B we take ongoing samples of 10 and calculate the mean & standard deviation for each sample.
The sample means would not be the same as the population mean.
C is a distribution of all the means from every sample taken.
This distribution is called the sampling distribution of the mean.
[Figure: panels A (population), B (individual samples), and C (sampling distribution of the mean).]
SAMPLING DISTRIBUTION OF THE MEAN
The sampling distribution has a mean equal to the population mean ($\mu_{\bar{X}} = \mu$).
The sampling distribution has a standard deviation (a standard error) equal to the population standard deviation divided by the square root of the sample size ($\sigma_{\bar{X}} = \sigma / \sqrt{n}$).
The sampling distribution is normally distributed (exactly when the population is normal, approximately otherwise for large enough samples).
The equation for the standard error (standard deviation) of the mean for an infinite population is:
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
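The standard error formula is easy to verify against the Distskew.mtw session output earlier in this chapter, which reports StDev = 10.000, N = 500 and SE Mean = 0.447. A minimal sketch (the function name is hypothetical):

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: standard deviation divided by sqrt(sample size)."""
    return sigma / math.sqrt(n)

se = standard_error(10.0, 500)
print(round(se, 3))
```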
CENTRAL LIMIT THEOREM
The mean of the sampling distribution will equal the population mean
even if the population is non-normal (regardless of
sample size).
The relationship between the shape of the population distribution
and the shape of the sampling distribution of the mean is
called the Central Limit Theorem.
This theorem is perhaps the most important theorem in all
statistical inference and is the basis upon which control charts
work.
It assures us that the form of the distribution of sample means
approaches the form of the normal distribution as the sample
size increases.
CENTRAL LIMIT THEOREM

What this means is that...
If you have a group of data whose distribution has any shape, and you create subgroups out of that data, the distribution of the averages of those subgroups will be narrower and more normally shaped.
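A short simulation illustrates the narrowing. This sketch draws hypothetical data from a strongly skewed (exponential) distribution, averages it in subgroups of 5, and compares the spread of the subgroup means with the spread of the individual values. The seed, distribution and sizes are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical skewed population: 10,000 exponential values with mean 1.
population = [random.expovariate(1.0) for _ in range(10_000)]

subgroup_size = 5
means = [
    statistics.mean(population[i:i + subgroup_size])
    for i in range(0, len(population), subgroup_size)
]

sd_individuals = statistics.stdev(population)
sd_means = statistics.stdev(means)
ratio = sd_means / sd_individuals  # expected to be near 1 / sqrt(5), about 0.45

print(round(sd_individuals, 3), round(sd_means, 3), round(ratio, 3))
```

A histogram of `means` would also look noticeably more symmetric than a histogram of `population`, which is the second half of the statement above.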
GRAPHICAL EXERCISE
Turn to the population graphical
exercise located on the next page of
your student manual and proceed as
follows:
1. Select 2 dots (at random).
2. Using the selected dots, draw a
new dot in-between the two.
3. Repeat steps 1 and 2 until all
preprinted dots are used only
once.
4. Circle the new dots, ignoring the
original dots.
GRAPHICAL EXERCISE
Questions:
1. Is the spread of the new population different from the original?
2. What about the shape?
3. What differences are there between the original population of dots and the population resulting from the subgroup?
CONTROL CHART EXERCISE

This exercise looks at the effects of the central limit theorem on 2 different SPC charts using the same data.
1. Open Cenlimit.mtw, located in your Gbdata file.
2. Perform 3 analyses:
A. Choose Stat > Control Charts > Variables Charts for Individuals > Individuals.
Variables: select C1
B. Choose Stat > Control Charts > Variables Charts for Subgroups > Xbar.
All observations for a chart are in one column
Select C1 Output
Subgroup sizes: enter 5
Select Xbar Options
Select Storage tab
Select Point Plotted (stores the subgroup mean in the worksheet for analysis)
C. Choose Stat > Basic Statistics > Display Descriptive Statistics
In field Variables: select C1 Output and C2 PPOI1
Click Graphs
Uncheck First quartile and Third Quartile
3. Students are to investigate the upper and lower control limits.
How do they compare?
Why the difference; after all, it's the same data?
Provide enough time for students to review, and allow students to display and discuss their opinions of the results before going to the next slide!
RESULTS OF CONTROL CHART EXERCISE

[Figure: Xbar Chart of Output (x-axis: Sample, y-axis: Sample Mean): UCL = 80.70, $\bar{X}$ = 68.28, LCL = 55.86. I Chart of Output (x-axis: Observation, y-axis: Individual Value): UCL = 96.59, $\bar{X}$ = 68.28, LCL = 39.97.]
Control limits are tighter on the x bar chart.
Standard deviation is smaller on the x bar chart than on the individual chart.
Descriptive Statistics
Variable  N    N*  Mean    SE Mean  StDev  Minimum  Median  Maximum
Output    150  0   68.280  0.776    9.498  43.000   68.000  92.000
PPOI1     30   0   68.280  0.858    4.701  58.000   67.600  80.800
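The tighter limits on the Xbar chart follow from the standard error. As a simplified sketch, 3-sigma limits for subgroup means use sigma divided by the square root of the subgroup size. The function name and the numbers below are hypothetical; real control charts estimate sigma from moving ranges or within-subgroup variation, so chart output will not match this simple ratio exactly.

```python
import math

def three_sigma_limits(center, sigma, subgroup_size=1):
    """Return (LCL, UCL) for a chart of subgroup means of the given size."""
    half_width = 3.0 * sigma / math.sqrt(subgroup_size)
    return center - half_width, center + half_width

# Hypothetical process: center 100, sigma 10.
print(three_sigma_limits(100.0, 10.0))      # individuals chart
print(three_sigma_limits(100.0, 10.0, 4))   # Xbar chart, subgroups of 4
```

With subgroups of 4 the limits are half as wide as for individuals, because the standard error is sigma divided by 2.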
TEAM EXERCISE - CENTRAL LIMIT APPLICATION
Working as a team, you will analyze 2 different populations and 2 datasets containing the means of the subgroups from the 2 populations. Be
prepared to display and discuss results.
DO NOT LOOK AT THE RESULTS IN YOUR STUDENT MANUAL
UNTIL YOU COMPLETE THE EXERCISE.
1. Open the file Cltest.mtw located in Gbdata.
2. Analyze columns C1 and C2 against C7 and C8.
Note: using column C5 as a subgroup reference, 2 data-sets
containing the mean of the subgroups were created (mean 1 and 2).
3. Use a flipchart or computer to investigate, note and report:
Mean and standard deviation of these groups; what is the difference?
What is the relation between the standard deviation of the individuals (C1 and C2) and the standard deviation of the means (C7 and C8)?
Create a normal probability plot for both sets and compare.
TEAM EXERCISE RESULTS

Session Window:
Descriptive Statistics: Dist1, Dist2, Mean1, Mean2
Variable  N    N*  Mean     SE Mean  StDev    Minimum  Median   Maximum
Dist1     500  0   0.90016  0.00445  0.09952  0.56399  0.89696  1.24185
Dist2     500  0   0.90005  0.00291  0.06497  0.62989  0.91351  0.99842
Mean1     100  0   0.90016  0.00408  0.04082  0.79356  0.90219  0.97541
Mean2     100  0   0.90005  0.00311  0.03106  0.82731  0.90221  0.97392

DIST1 vs. MEAN1 - Normal Plot
[Figure: Probability Plot of Dist1 and Probability Plot of Mean1 (Normal); y-axis: Percent.]
Plot   Mean    StDev    N    AD     P-Value
Dist1  0.9002  0.09952  500  0.213  0.852
Mean1  0.9002  0.04082  100  0.404  0.348

DIST2 vs. MEAN2 - Normal Plot
[Figure: Probability Plot of Dist2 and Probability Plot of Mean2 (Normal); y-axis: Percent.]
Plot   Mean    StDev    N    AD      P-Value
Dist2  0.9001  0.06497  500  10.132  <0.005
Mean2  0.9001  0.03106  100  0.478   0.232
STANDARD NORMAL PROBABILITY DISTRIBUTION
In Chapter 8, you were introduced to the standard normal table for determining
the area under the normal curve and how to determine a Sigma level (Z value).
We can also use the normal table to compute the probability (area under the
curve) of being within a certain distance (i.e., spec limits) from the mean in units
of standard deviation (Z values).
The Z standard transform equation produces a value from a distribution where
mean = 0 and $\sigma$ = 1:
$Z = \frac{X - \bar{X}}{\sigma}$
The Z value indicates how far the number is from the mean in units of standard
deviations.
For estimating a process yield, we can substitute the upper and lower spec
limits for X in the equation. We can then calculate the proportion of product that is
out-of-spec.
Z TRANSFORM EXAMPLE
Let's determine estimates of the proportion of the normal curve that is outside of the upper and lower specs where:
Mean = 1.03
$\sigma$ = .0573
LSL = .90
USL = 1.10
These spec limits are displayed below.
Z TRANSFORM EXAMPLE
To perform the calculations, let's proceed as follows.
1. Calculate the Z score for each specification limit (upper and lower).
$Z_{LSL} = \frac{LSL - \bar{X}}{\sigma} = \frac{.9 - 1.03}{.0573} = -2.27$
$Z_{USL} = \frac{USL - \bar{X}}{\sigma} = \frac{1.1 - 1.03}{.0573} = 1.22$
2. Calculate the areas below the lower specification and above the upper
specification using the normal table.
Table A (Area Under the Standardized Normal Curve) located in the Gbdata
file gives us an area of .0116 (1.16%) for a Z value of 2.27 (disregard the
negative sign) and it gives us an area of .1112 (11.12%) for a Z value of 1.22.
If we add these 2 areas under the curve together, we get 12% (.0116 + .1112 =
.1228 or 12%).
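The table lookup can be cross-checked in code. This sketch repeats the calculation for the example values (mean 1.03, standard deviation .0573, LSL .90, USL 1.10) using `statistics.NormalDist` from the Python standard library. Because it does not round Z to two decimals before the lookup, the areas differ slightly from the table values in the fourth decimal place.

```python
from statistics import NormalDist

mean, sigma = 1.03, 0.0573
lsl, usl = 0.90, 1.10

z_lower = (lsl - mean) / sigma   # Z score of the lower spec limit
z_upper = (usl - mean) / sigma   # Z score of the upper spec limit

std_normal = NormalDist()        # mean 0, standard deviation 1
area_below_lsl = std_normal.cdf(z_lower)
area_above_usl = 1.0 - std_normal.cdf(z_upper)
total_out_of_spec = area_below_lsl + area_above_usl

print(round(z_lower, 2), round(z_upper, 2), round(total_out_of_spec, 3))
```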
Z TRANSFORM EXAMPLE
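This is shown graphically below.
[Figure: normal curve centered at $\bar{X}$ = 1.03; the tail below LSL = .9 (Z = -2.27) has area .0116 and the tail above USL = 1.1 (Z = 1.22) has area .1112.]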
Z TRANSFORM EXAMPLE - cont.
3. Determine the sigma level (Z score).
To calculate the sigma level (Z score) for this process, we proceed as
follows.
Add the percentages (areas) of the upper and lower specification limits
(.0116 + .1112 = .1228).
Using Minitab's Inverse Cumulative Distribution function, we will convert
the area outside of the specification to a Z score. This means, calculate
the Z score (sigma level) for .1228 based upon the standard normal
distribution that has a mean = 0 and a $\sigma$ = 1.
Note: You could also use any of the other previous methods reviewed to
determine the sigma value (i.e., normal table, etc.).

To calculate a Z score (Sigma Level) for .1228, proceed as follows.
Calc>Probability Distributions>Normal
Click on Inverse Cumulative Probability with mean =0, standard
deviation=1
Click on input constant and enter .1228
Click OK
Result:
P(X <= x) = .1228
x = -1.1611 (Sigma Level)
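The same inverse cumulative probability is available outside Minitab. A minimal sketch with `statistics.NormalDist.inv_cdf` from the Python standard library:

```python
from statistics import NormalDist

# Z score whose cumulative (left-tail) probability is .1228
# on a standard normal distribution (mean 0, standard deviation 1).
z = NormalDist(mu=0.0, sigma=1.0).inv_cdf(0.1228)
print(round(z, 4))
```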

The Sigma level of -1.1611 is graphically displayed below.
[Figure: standard normal curve with the area .1228 shaded to the left of Z = -1.161.]
INDIVIDUAL Z TRANSFORM EXERCISE

Now it's your turn! Given the following situation, determine the
required probability using the Z transform. Be prepared to explain
and display your results.
DO NOT LOOK AT THE RESULTS IN YOUR
STUDENT MANUAL UNTIL YOU HAVE
COMPLETED THE EXERCISE!
The Finance Director for a company claims that the sum of the monthly
customer payments, in millions of dollars, received by Accounts
Payable on the first day of the month is normally distributed with a mean of
$10.1 million and a standard deviation of $2.6 million.
A. Find the probability that, on the first day of a randomly selected month, the
payment receipts would be less than $6 million.
B. Find the probability that, on the first day of a randomly selected month, the
payment receipts would be between $6 million and $14 million.
RESULTS OF INDIVIDUAL
Z TRANSFORM EXERCISE
Results of A are as follows:
First, calculate the Z value for X = $6 million.
$Z = \frac{X - \bar{X}}{\sigma} = \frac{6 - 10.1}{2.6} \approx -1.57$
Next, referring to Table A (Gbdata file), Z = -1.57 is equal to an area of .0582 or 5.82%.
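The area shown below lies between Z = -1.57 and the left-hand tail
(disregard the minus sign when reading Table A).
[Figure: normal curve centered at 10.1 with the left tail below X = 6 (Z = -1.57) shaded; area = .0582.]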
Results of B are as follows:
First, calculate the Z value for X = $6 million.
Z = -1.57 (as indicated in A)
Then, calculate the Z value for X = $14 million as follows.
$Z = \frac{X - \bar{X}}{\sigma} = \frac{14 - 10.1}{2.6} = 1.5$
Next, using Table A, calculate the probabilities for each of the Z values.
Z = -1.57 is equal to an area of .0582. This is the area between Z = -1.57 and
the left-hand tail.
Z = 1.5 is equal to an area of .0668. This is the area between Z = 1.5 and the
right-hand tail.
Lastly, add the 2 probabilities together and subtract from 1 to determine
the area between Z = -1.57 and Z = 1.5.
.0582 (area between Z = -1.57 and the left-hand tail)
.0668 (area between Z = 1.5 and the right-hand tail)
.1250 (total area below Z = -1.57 and above Z = 1.5)
1 - .1250 = .875 or 87.5% (area between Z = -1.57 and Z = 1.5)
This is depicted graphically as shown.
[Figure: normal curve centered at 10.1; the tail below X = 6 (Z = -1.57) has area .0582 and the tail above X = 14 (Z = 1.5) has area .0668.]
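Both answers can be checked without the table. This sketch models the receipts as a normal distribution with mean 10.1 and standard deviation 2.6 (in millions of dollars), as given in the exercise, using `statistics.NormalDist`. Small differences from the table-based answers come from rounding Z to two decimals before the lookup.

```python
from statistics import NormalDist

receipts = NormalDist(mu=10.1, sigma=2.6)  # millions of dollars

p_below_6 = receipts.cdf(6.0)                        # part A
p_between = receipts.cdf(14.0) - receipts.cdf(6.0)   # part B

print(round(p_below_6, 4), round(p_between, 4))
```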
