Basic Statistics

Statistical Lingo
You may have come to this presentation because you really like statistics, but there's also the possibility that you'd rather be somewhere else, like maybe playing golf at a fancy resort or something. The irony is that sports probably refer to statistics more than any other segment of our society.

Virtually everyone has a pretty good understanding of what is meant by the word average. A golfer who shot rounds of 78, 84, and 87 could compute her average, or what statisticians would call her mean (μ or x̄).

NOTE: In general, a Greek letter (such as μ) is used when an entire population's data is being analyzed, but for a sample, the corresponding regular letter (such as x̄) is used.

Her mean is computed like this:

78 + 84 + 87 = 249
249 ÷ 3 = 83
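
As a minimal sketch (Python is our choice here; the slides themselves rely on Minitab), the same arithmetic looks like this. The list name `scores` and the helper `mean` are illustrative only.

```python
# Golf scores from the example: a sample of three rounds.
scores = [78, 84, 87]

def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

print(mean(scores))  # 83.0
```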

Each of these scores deviates from the average (83) by some amount:

78 - 83 = -5
84 - 83 = +1
87 - 83 = +4

These deviations can be combined to calculate what is called a standard deviation.

But if we want to calculate the standard deviation, we can't simply add the deviations up: they'll cancel each other out and we'll get zero (-5 + 1 + 4 = 0). On the other hand, squaring the deviations prevents that problem:

(-5)² = 25
(+1)² = 1
(+4)² = 16
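
Here is a small sketch of those two steps in Python, assuming the same three scores. It shows that raw deviations from the mean sum to zero while their squares do not; the variable names are illustrative.

```python
scores = [78, 84, 87]
avg = sum(scores) / len(scores)  # 83.0

# How far each score sits from the mean.
deviations = [x - avg for x in scores]  # [-5.0, 1.0, 4.0]

# Deviations from the mean always cancel out, so their sum is zero.
print(sum(deviations))  # 0.0

# Squaring removes the signs, so the values can no longer cancel.
squared = [d ** 2 for d in deviations]
print(squared)  # [25.0, 1.0, 16.0]
```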

Then we can add the squares up; this gives an estimate of how much variation is present. (Adding up the squares of the differences like this is called the sum of the squares.)

25 + 1 + 16 = 42

Then we divide this sum by the number of scores in the list (N) minus 1. (This is because we only have a sample of this person's golf scores; if we had all of their golf scores, we would simply divide by N.)

42 ÷ (3 - 1) = 42 ÷ 2 = 21
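
To make the N versus N - 1 distinction concrete, here is a sketch with three hypothetical helper functions (`sum_of_squares`, `sample_variance`, `population_variance`). The population version is shown only for comparison; for the golf example the sample version is the one that applies.

```python
def sum_of_squares(values):
    """Sum of the squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)

def sample_variance(values):
    """Divide by N - 1: the data is only a sample of the population."""
    return sum_of_squares(values) / (len(values) - 1)

def population_variance(values):
    """Divide by N: the data is the entire population."""
    return sum_of_squares(values) / len(values)

scores = [78, 84, 87]
print(sum_of_squares(scores))       # 42.0
print(sample_variance(scores))      # 21.0
print(population_variance(scores))  # 14.0
```

Dividing by the smaller N - 1 makes the sample estimate a little larger, which compensates for a sample usually understating the spread of the full population.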

If we just leave it like this, it's called the variance (σ² or s²). If we take the square root (which cancels out the fact that we squared the deviations earlier), we'll get the standard deviation (σ or s). (Also, we divided by 2 because that's the number of data points in the sample minus 1.)

s² = 21
s = √21 ≈ 4.6
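
A short Python sketch of the square-root step, assuming the same sample. It cross-checks the hand calculation against `statistics.stdev`, which computes the sample (N - 1) standard deviation.

```python
import math
import statistics

scores = [78, 84, 87]
avg = sum(scores) / len(scores)
variance = sum((x - avg) ** 2 for x in scores) / (len(scores) - 1)  # 21.0

# Taking the square root undoes the earlier squaring of the deviations.
std_dev = math.sqrt(variance)
print(round(std_dev, 1))  # 4.6

# Cross-check against the standard library's sample standard deviation.
print(math.isclose(std_dev, statistics.stdev(scores)))  # True
```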

Another common term is the median. It's the middle value of the data once it is sorted, and it is relatively insensitive to extreme values in the set. Real estate folks might refer to a median income level for an area: it's virtually unaffected by Bill Gates moving into (or out of) the neighborhood. For the scores 78, 84, and 87, the median is 84.
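
The Bill Gates point can be checked with a quick sketch. The income figures below are made up purely for illustration; only the golf scores come from the slides.

```python
import statistics

# Golf example: the middle of the sorted scores 78, 84, 87.
print(statistics.median([78, 84, 87]))  # 84

# Hypothetical neighborhood incomes, in thousands of dollars.
incomes = [48, 52, 55, 61, 70]
print(statistics.median(incomes))  # 55

# A billionaire moves in: the mean explodes, the median barely moves.
with_billionaire = incomes + [1_000_000]
print(statistics.median(with_billionaire))  # 58.0
print(statistics.mean(with_billionaire) > 100_000)  # True
```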

In a few short slides, we've covered a number of the most frequently used statistical terms:
mean (x̄): 249 / 3 = 83
deviation: -5, +1, +4
variance (s²): (25 + 1 + 16) / 2 = 42 / 2 = 21
standard deviation (s): √21 ≈ 4.6
median: 84

Of course, if you had to manually compute:
an average
a deviation for each data point
the square of each deviation
a sum of the squares
a variance
a standard deviation
a median
every time you got some data, things could get crazy, especially if there's a lot of data. Thankfully, we have Minitab.
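
Minitab is the tool this presentation uses. As an illustration only (not a reproduction of Minitab's output), Python's standard `statistics` module can produce every item on that list in one call. The helper name `describe` is hypothetical.

```python
import statistics

def describe(values):
    """Compute the hand-calculated statistics from the slides in one call."""
    data = sorted(values)
    avg = statistics.mean(data)
    return {
        "N": len(data),
        "mean": avg,
        "deviations": [x - avg for x in values],
        "sum_of_squares": sum((x - avg) ** 2 for x in values),
        "variance": statistics.variance(data),  # sample variance (divides by N - 1)
        "stdev": statistics.stdev(data),        # square root of the sample variance
        "median": statistics.median(data),
        "minimum": data[0],
        "maximum": data[-1],
    }

summary = describe([78, 84, 87])
for name, value in summary.items():
    print(f"{name:>15}: {value}")
```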
Getting Basic Stats From Minitab

1. Enter whatever data you want to analyze into a column in Minitab.
2. Click on Stat, then on Basic Statistics, then on Display Descriptive Statistics.
3. In the box labeled Variable, indicate the column containing the data.
4. Click on the button labeled Graphs.
5. Check Graphical summary.
6. Click OK.
7. Click OK.
Minitab will provide a graphical summary of the data. We'll break it down piece by piece to explain all the information displayed.
Does the data fit a normal distribution well enough to assume normality? (p < 0.05: no; p > 0.05: yes, there is no evidence against normality.)

If a set of data is normally distributed, it means that when it is plotted as a histogram it has a symmetric, bell-shaped distribution.

[Figure: three example histograms, labeled Normal, Not Normal, and Not Normal]

If data is normally distributed, it allows for a number of predictions and analytical methods that would otherwise not be valid. For example, the mean and standard deviation can be used to predict the odds of values falling within certain ranges (like within specified tolerances).
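
As a sketch of that kind of prediction, the code below treats the golf example (mean 83, standard deviation about 4.6) as if it were a normal process; that normality is an assumption. The tolerance limits of 75 and 91 are hypothetical values chosen for illustration. The fraction inside the limits is the normal CDF at the upper limit minus the CDF at the lower limit.

```python
from statistics import NormalDist

# Mean 83 and standard deviation 4.6, assumed to describe a normal process.
process = NormalDist(mu=83, sigma=4.6)

# Hypothetical tolerance limits, chosen only for illustration.
lower_limit, upper_limit = 75, 91

# Fraction of values expected inside the limits = CDF(upper) - CDF(lower).
inside = process.cdf(upper_limit) - process.cdf(lower_limit)
print(f"{inside:.1%} expected within tolerance")
```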
Mean: The average value of all the data points. (If calculated using a sample of data from a population, it may be written x̄; if calculated using all the data in the population, it may be written μ.)

StDev: The standard deviation of all the data points. It can be thought of as roughly the average distance of the data points from the mean: the larger the standard deviation, the greater the variation. (If calculated using a sample of data from a population, it's usually written s; if calculated using all the data in the population, it's usually written σ.)

Variance: Equal to the standard deviation squared (s² for a sample, σ² for a population).
Skewness: A measure of asymmetry; the further from zero, the more skewed the data. For example, if a distribution has a long tail at its upper end, skewness will likely be positive. Typically, the skewness value will range from -3 to +3.

Kurtosis: A number reflecting how closely the shape of the sample data resembles a normal distribution. A strongly negative kurtosis indicates a distribution that is flatter than normal; a strongly positive kurtosis indicates a distribution that is more peaked than normal. The kurtosis value is approximately zero for a normal distribution.
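
For readers curious how these two numbers can be computed, here is a sketch using one common bias-adjusted pair of formulas (the same ones behind Excel's SKEW and KURT functions). We are assuming, not asserting, that Minitab reports the same definitions, so check against your own output before relying on it. The helper names are hypothetical.

```python
import math

def _standardize(values):
    """Return N and each value's distance from the mean in units of s."""
    n = len(values)
    avg = sum(values) / n
    s = math.sqrt(sum((x - avg) ** 2 for x in values) / (n - 1))
    return n, [(x - avg) / s for x in values]

def skewness(values):
    """Bias-adjusted sample skewness (needs at least 3 values, not all equal)."""
    if len(values) < 3:
        raise ValueError("skewness needs at least 3 values")
    n, z = _standardize(values)
    return n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)

def kurtosis(values):
    """Bias-adjusted excess kurtosis, near 0 for normal data (needs at least 4 values)."""
    if len(values) < 4:
        raise ValueError("kurtosis needs at least 4 values")
    n, z = _standardize(values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * sum(v ** 4 for v in z) - correction

print(skewness([78, 84, 87]))     # negative: the longer tail is on the low side
print(kurtosis([1, 2, 3, 4, 5]))  # close to -1.2: flatter than a normal curve
```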
N: The number of data points used in the creation of this summary.
Minimum: The lowest data point in the sample.
1st Quartile: The value below which 25% of the data points fall.
Median: The value below which 50% of the data points fall.
3rd Quartile: The value below which 75% of the data points fall.
Maximum: The highest data point in the sample.
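
A sketch of these six summary values in Python, using a hypothetical sample of seven golf scores. Quartile conventions differ between tools; `method="exclusive"` interpolates at position (N + 1)p, which we believe matches Minitab's rule, but treat small differences from Minitab as possible.

```python
import statistics

# Hypothetical sample of golf scores, used only for illustration.
scores = [78, 84, 87, 80, 85, 82, 90]

data = sorted(scores)
q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")

print("N:", len(data))            # 7
print("Minimum:", data[0])        # 78
print("1st Quartile:", q1)        # 80.0
print("Median:", median)          # 84.0
print("3rd Quartile:", q3)        # 87.0
print("Maximum:", data[-1])       # 90
```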
Confidence Intervals: Because we only gave Minitab a sample of data from a presumably larger population, it can only estimate what the entire population is like. Minitab can help us understand how good our estimates of things like the mean (Mu, μ), the standard deviation (Sigma, σ), and the median are. It does this by calculating an interval within which it is 95% confident that each parameter actually lies for the whole population.

The vertical line part way through each of the red boxes is the calculated mean (top) and median (bottom) for the sample of data entered. Around these points, Minitab calculates an interval within which it is 95% confident that the population mean and median actually lie.

For example, in the case of the top red bar, the vertical line in the middle of the bar shows a mean of about 50.6. While this is probably not the EXACT mean for the population, based on the number of data points and the amount of variation they exhibit, it can be estimated with good confidence (95%) that the population mean falls somewhere between 48.9 and 52.3.
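
The interval for the mean can be sketched with the usual t-based formula, mean ± t × s / √N. We assume this is the kind of interval Minitab shows for the mean. Python's standard library has no t-distribution, so the critical value is passed in; 4.303 is the two-sided 95% value for N - 1 = 2 degrees of freedom, as found in a t-table. The function name is hypothetical.

```python
import math
import statistics

def mean_confidence_interval(values, t_critical):
    """Two-sided interval: mean +/- t * s / sqrt(N).

    t_critical must be looked up for N - 1 degrees of freedom.
    """
    n = len(values)
    avg = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    margin = t_critical * standard_error
    return avg - margin, avg + margin

# Golf scores: N = 3, so 2 degrees of freedom; the 95% two-sided t value is about 4.303.
low, high = mean_confidence_interval([78, 84, 87], t_critical=4.303)
print(f"95% CI for the mean: {low:.1f} to {high:.1f}")  # 95% CI for the mean: 71.6 to 94.4
```

The interval is very wide because three rounds give little information; more data points or less variation would narrow it, just as the slide describes.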
Histogram of the data (with Minitab's best estimate of the normal curve that fits the data)

NOTE: Data points with values lower than Q1 - 1.5(Q3 - Q1) or greater than Q3 + 1.5(Q3 - Q1) are considered outliers and appear as individual dots.

The Box and Whisker plot divides the data into quarters: the box runs from the 1st quartile to the 3rd quartile with a line at the median, and the whiskers cover the lowest and highest quarters (apart from any outliers).
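
The outlier rule in the note can be sketched as follows. The scores are hypothetical (one unusually bad round), the helper name `find_outliers` is ours, and the quartiles use the same (N + 1)p convention assumed earlier for Minitab.

```python
import statistics

def find_outliers(values):
    """Return the values outside Q1 - 1.5(Q3 - Q1) .. Q3 + 1.5(Q3 - Q1)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in values if x < lower_fence or x > upper_fence]

# Hypothetical scores with one unusually bad round.
print(find_outliers([78, 84, 87, 80, 85, 82, 110]))  # [110]
```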
Once you have the basic stats, what's next?

Given a normally distributed process with a mean = 83 and std dev = 4.6:

About 68% of the population will be captured within one standard deviation of the mean (78.4 to 87.6).
About 95% of the population will be captured within two standard deviations of the mean (73.8 to 92.2).
About 99.73% of the population will be captured within three standard deviations of the mean (69.2 to 96.8).

[Figure: normal curve centered at 83, showing about 34% of the population on each side of the mean within one standard deviation, about 13.6% on each side between one and two standard deviations, and about 2.1% on each side between two and three standard deviations]
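
These percentages can be checked with a short sketch based on the standard library's `NormalDist`, assuming the process really is normal with mean 83 and standard deviation 4.6.

```python
from statistics import NormalDist

# The process from the slide: mean 83, standard deviation 4.6, assumed normal.
process = NormalDist(mu=83, sigma=4.6)

shares = []
for k in (1, 2, 3):
    low = process.mean - k * process.stdev
    high = process.mean + k * process.stdev
    share = process.cdf(high) - process.cdf(low)  # probability of landing in [low, high]
    shares.append(share)
    print(f"within {k} std dev ({low:.1f} to {high:.1f}): {share:.2%}")
```

It should print about 68.27%, 95.45%, and 99.73%, the more precise values behind the rounded 68/95/99.73 figures.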
Note that the three items mentioned (shape, mean, and standard deviation) help to characterize the process [or the performance of a process]. It's somewhat like when you ship a box for overnight delivery: the courier wants to know the length, width, height, and weight of the box. That information characterizes the box for them. In other words, they know what to expect when they come to get it.

Understanding a process this well has some rather powerful implications. For example, once you know the mean and standard deviation of a process that's normally distributed, predicting the percentage of times something will fall above or below any given value (like a tolerance limit) is relatively easy. In other words, we can tell how often the process will perform properly. That's the topic of another tool time: Process Capability.
