Chapter 3
Describing Data: Numerical
Chapter Goals
After completing this chapter, you should be able to:
Chapter Topics
Central Tendency: Arithmetic Mean, Median, Mode
Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
Mean: arithmetic average, \( \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \)
Median: midpoint of ranked values
Mode: most frequently observed value
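Arithmetic Mean
Population mean (x_1, ..., x_N are the population values, N is the population size):
\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N} \]
Sample mean (x_1, ..., x_n are the observed values, n is the sample size):
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]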
Values 1, 2, 3, 4, 5 (plotted on a 0 to 10 axis): Mean = 3
\[ \frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3 \]
Values 1, 2, 3, 4, 10 (plotted on a 0 to 10 axis): Mean = 4
\[ \frac{1 + 2 + 3 + 4 + 10}{5} = \frac{20}{5} = 4 \]
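The two means above can be checked with a few lines of Python. This is a minimal sketch using the standard library; the variable names are illustrative. It shows how replacing the largest value with 10 pulls the mean from 3 to 4.

```python
from statistics import mean

values = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]  # same data, largest value replaced by 10

mean_values = mean(values)
mean_with_outlier = mean(with_outlier)

print(mean_values)        # 3
print(mean_with_outlier)  # 4
```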
Median
[Figure: two dot plots on a 0 to 10 axis; Median = 3 in both]
Note that \( \frac{n+1}{2} \) is not the value of the median, only the position of the median in the ranked data
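The position rule can be sketched in Python as follows. The function name `median_by_position` is hypothetical, and averaging the two middle values when the position ends in .5 is the usual convention, assumed here rather than taken from the slide.

```python
def median_by_position(data):
    """Return (position, median), where position = (n + 1) / 2 in the ranked data."""
    if not data:
        raise ValueError("data must not be empty")
    ranked = sorted(data)
    n = len(ranked)
    position = (n + 1) / 2            # 1-based position, not the median itself
    lower = ranked[int(position) - 1]
    if position.is_integer():         # odd n: the median is an observed value
        return position, lower
    upper = ranked[int(position)]     # even n: average the two middle values
    return position, (lower + upper) / 2


odd_case = median_by_position([1, 2, 3, 4, 10])
even_case = median_by_position([1, 2, 3, 4])
print(odd_case)   # (3.0, 3)
print(even_case)  # (2.5, 2.5)
```

With the values 1, 2, 3, 4, 10 the position is 3 and the median is still 3, even though the largest value is far from the rest.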
Mode
[Figure: dot plot on a 0 to 14 axis] Mode = 9
[Figure: dot plot on a 0 to 6 axis] No Mode
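A small sketch of both cases in Python. The two data sets are hypothetical stand-ins for the plotted values, which did not survive in the text, and `find_modes` is an illustrative helper that returns an empty list when no value repeats.

```python
from collections import Counter


def find_modes(data):
    """Return the most frequent value(s), or [] when every value occurs once."""
    if not data:
        return []
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:                  # nothing repeats: no mode
        return []
    return sorted(v for v, c in counts.items() if c == highest)


with_mode = [1, 3, 5, 9, 9, 9, 12, 14]   # hypothetical data
without_mode = [0, 1, 2, 3, 4, 5, 6]     # hypothetical data

print(find_modes(with_mode))     # [9]
print(find_modes(without_mode))  # []
```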
Review Example
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Review Example:
Summary Statistics
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Sum: $3,000,000
Mean: $3,000,000 / 5 = $600,000
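As a rough check, the three measures of central tendency for the five house prices can be computed with Python's standard `statistics` module. The variable names are illustrative.

```python
from statistics import mean, median, mode

house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

price_mean = mean(house_prices)
price_median = median(house_prices)
price_mode = mode(house_prices)

print(price_mean)    # 600000
print(price_median)  # 300000
print(price_mode)    # 100000
```

The single $2,000,000 house pulls the mean well above the median, which is why the median is often preferred for data like these.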
Shape of a Distribution
Measures of shape
Symmetric or skewed
[Figure: three distribution shapes: Left-Skewed, Symmetric (Mean = Median), Right-Skewed]
Measures of Variability
Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
[Figure: distributions with the same center, different variation]
Range
Example:
[Figure: dot plot on a 0 to 14 axis] Range = 14 - 1 = 13
[Figure: two dot plots, each spanning 7 to 12] Range = 12 - 7 = 5 for both
Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
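The outlier effect above can be reproduced directly. This is a minimal sketch; `value_range` is an illustrative helper name, and the lists are built from the counts in the two data sets listed above.

```python
def value_range(data):
    """Range = largest value minus smallest value."""
    return max(data) - min(data)


# 11 ones, 8 twos, 4 threes, then the last two values
base = [1] * 11 + [2] * 8 + [3] * 4
data_without_outlier = base + [4, 5]
data_with_outlier = base + [4, 120]

print(value_range(data_without_outlier))  # 4
print(value_range(data_with_outlier))     # 119
```

Changing a single observation moves the range from 4 to 119, although the other 24 values are identical.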
Interquartile Range
Example:
X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70
(25% of the observations fall in each of the four intervals)
Interquartile range = 57 - 30 = 27
Quartiles
[Figure: ranked data divided at Q1, Q2, and Q3 into four segments of 25% each]
The first quartile, Q1, is the value for which 25% of the
observations are smaller and 75% are larger
Q2 is the same as the median (50% are smaller, 50% are
larger)
Only 25% of the observations are greater than the third
quartile
Quartile Formulas
Find a quartile by determining the value in the
appropriate position in the ranked data, where
First quartile position: Q1 = 0.25(n+1)
Second quartile position: Q2 = 0.50(n+1) (the median position)
Third quartile position: Q3 = 0.75(n+1)
Quartiles
(n = 9)
Q1 is in the 0.25(9+1) = 2.5 position of the ranked data, so use the value halfway between the 2nd and 3rd values: Q1 = 12.5
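The position rule can be sketched in Python as below. The sample values are hypothetical (the slide's ranked data did not survive in the text) and were chosen so that the 2nd and 3rd values are 12 and 13. Linear interpolation for fractional positions is an assumption that generalizes the "halfway between" step, and `quartile_by_position` is an illustrative name.

```python
def quartile_by_position(data, fraction):
    """Value at position fraction * (n + 1) in the ranked data (1-based)."""
    ranked = sorted(data)
    n = len(ranked)
    position = fraction * (n + 1)
    lower_index = int(position)            # whole part of the position
    weight = position - lower_index        # fractional part
    if lower_index < 1 or lower_index > n or (weight > 0 and lower_index == n):
        raise ValueError("position falls outside the ranked data")
    lower = ranked[lower_index - 1]
    if weight == 0:
        return lower
    upper = ranked[lower_index]
    return lower + weight * (upper - lower)   # interpolate between neighbours


sample = [11, 12, 13, 16, 16, 17, 18, 21, 22]  # hypothetical ranked data, n = 9

q1 = quartile_by_position(sample, 0.25)  # position 2.5
q2 = quartile_by_position(sample, 0.50)  # position 5
q3 = quartile_by_position(sample, 0.75)  # position 7.5
print(q1, q2, q3)  # 12.5 16 19.5
```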
Population Variance
Population variance:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]
Where
\( \mu \) = population mean
N = population size
\( x_i \) = ith value of the variable x
Sample Variance
Sample variance:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
Where
\( \bar{x} \) = arithmetic mean
n = sample size
\( x_i \) = ith value of the variable x
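To make the difference between the two divisors concrete, here is a minimal sketch that treats the same small, hypothetical data set first as a whole population (divide by N) and then as a sample (divide by n - 1). The function names are illustrative.

```python
def population_variance(values):
    """Average squared deviation from the mean, dividing by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Squared deviations from the sample mean, dividing by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [1, 2, 3, 4, 5]  # hypothetical values

pop_var = population_variance(data)
samp_var = sample_variance(data)
print(pop_var)   # 2.0
print(samp_var)  # 2.5
```

The standard library offers the same two calculations as `statistics.pvariance` and `statistics.variance`.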
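Population standard deviation:
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \]
Sample standard deviation:
\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]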
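Calculation Example:
Sample Standard Deviation
Sample Data (x_i): 10 12 14 15 17 18 18 24
n = 8, Mean = \( \bar{x} \) = 16
\[ s = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + \cdots + (24-16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} \approx 4.3095 \]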
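The calculation can be checked step by step in Python. This is a minimal sketch with illustrative variable names. Computed directly from the eight listed values, the squared deviations sum to 130, which gives s = sqrt(130/7), approximately 4.3095.

```python
import math

sample = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(sample)
x_bar = sum(sample) / n                                 # 16.0
squared_deviations = [(x - x_bar) ** 2 for x in sample]
sum_of_squares = sum(squared_deviations)                # 130.0
s = math.sqrt(sum_of_squares / (n - 1))

print(x_bar)           # 16.0
print(sum_of_squares)  # 130.0
print(round(s, 4))     # 4.3095
```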
Measuring variation
[Figure: distribution with a small standard deviation]
[Figure: three data sets plotted on an 11 to 21 axis]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.926
Data C: Mean = 15.5, s = 4.570
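Chebyshev's Theorem
For any population with mean \( \mu \) and standard deviation \( \sigma \), and k > 1, the percentage of observations that fall within the interval \( [\mu \pm k\sigma] \) is at least
\[ 100\left[1 - \frac{1}{k^2}\right]\% \]
Examples:
At least 55.6% of the observations fall within \( \mu \pm 1.5\sigma \) (k = 1.5)
At least 75% fall within \( \mu \pm 2\sigma \) (k = 2)
At least 88.9% fall within \( \mu \pm 3\sigma \) (k = 3)
For a bell-shaped distribution (the empirical rule), approximately 68%, 95%, and 99.7% of the observations lie within \( \mu \pm 1\sigma \), \( \mu \pm 2\sigma \), and \( \mu \pm 3\sigma \), respectively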
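Chebyshev's lower bound, 100[1 - 1/k^2]%, is easy to tabulate. This is a minimal sketch; the function name `chebyshev_lower_bound` is illustrative, and the values of k are chosen only as examples.

```python
def chebyshev_lower_bound(k):
    """Minimum percentage of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the theorem requires k > 1")
    return 100 * (1 - 1 / k ** 2)


bound_k2 = chebyshev_lower_bound(2)
bound_k3 = chebyshev_lower_bound(3)

for k in (1.5, 2, 3):
    print(k, round(chebyshev_lower_bound(k), 1))
# 1.5 55.6
# 2 75.0
# 3 88.9
```

The bound holds for any distribution, so it is much looser than the 95% and 99.7% figures that apply to bell-shaped data.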
Coefficient of Variation
\[ CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\% \]
Comparing Coefficient of Variation
Stock A:
Average price last year = $50
Standard deviation = $5
\[ CV_A = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\% \]
Stock B:
Average price last year = $100
Standard deviation = $5
\[ CV_B = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\% \]
Both stocks have the same standard deviation, but stock B is less variable relative to its price
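The stock comparison can be reproduced with a short helper. This is a minimal sketch; `coefficient_of_variation` is an illustrative name, not a standard library function.

```python
def coefficient_of_variation(std_dev, mean_value):
    """Standard deviation expressed as a percentage of the mean."""
    if mean_value == 0:
        raise ValueError("the mean must be non-zero")
    return 100 * std_dev / mean_value


cv_a = coefficient_of_variation(5, 50)    # Stock A
cv_b = coefficient_of_variation(5, 100)   # Stock B

print(cv_a)  # 10.0
print(cv_b)  # 5.0
```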
Using Excel
Click OK
Excel output: Microsoft Excel descriptive statistics output, using the house price data:
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Weighted Mean
\[ \bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum w_i} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{\sum w_i} \]
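A minimal sketch of the weighted mean in Python. The values and weights below are hypothetical (for instance, three course grades weighted by credit hours), and `weighted_mean` is an illustrative helper name.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


grades = [4.0, 3.0, 2.0]   # hypothetical x_i
credits = [3, 4, 3]        # hypothetical w_i

result = weighted_mean(grades, credits)
print(result)  # 3.0
```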
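For data grouped into K classes with frequencies \( f_i \) and class midpoints \( m_i \):
Population mean:
\[ \mu = \frac{\sum_{i=1}^{K} f_i m_i}{N} \quad \text{where } N = \sum_{i=1}^{K} f_i \]
Sample mean:
\[ \bar{x} = \frac{\sum_{i=1}^{K} f_i m_i}{n} \quad \text{where } n = \sum_{i=1}^{K} f_i \]
Population variance:
\[ \sigma^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \mu)^2}{N} \]
Sample variance:
\[ s^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \bar{x})^2}{n - 1} \]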
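The grouped-data formulas for the sample case can be sketched as follows. The class midpoints and frequencies are hypothetical, and the function name is illustrative. The result is an approximation, because every observation in a class is treated as if it sat at the class midpoint.

```python
def grouped_sample_stats(midpoints, frequencies):
    """Approximate sample mean and variance from class midpoints and frequencies."""
    n = sum(frequencies)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n
    variance = sum(f * (m - mean) ** 2
                   for f, m in zip(frequencies, midpoints)) / (n - 1)
    return mean, variance


midpoints = [5, 15, 25]    # hypothetical m_i
frequencies = [2, 5, 3]    # hypothetical f_i, so n = 10

grouped_mean, grouped_variance = grouped_sample_stats(midpoints, frequencies)
print(grouped_mean)                # 16.0
print(round(grouped_variance, 3))  # 54.444
```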
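Population covariance:
\[ Cov(x, y) = \sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N} \]
Sample covariance:
\[ Cov(x, y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \]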
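Interpreting Covariance
Cov(x,y) > 0: x and y tend to move in the same direction
Cov(x,y) < 0: x and y tend to move in opposite directions
Cov(x,y) = 0: x and y have no linear relationship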
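Coefficient of Correlation
Population correlation coefficient:
\[ \rho = \frac{Cov(x, y)}{\sigma_X \sigma_Y} \]
Sample correlation coefficient:
\[ r = \frac{Cov(x, y)}{s_X s_Y} \]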
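The sample covariance and correlation formulas can be sketched in plain Python. The paired data below are hypothetical and the function names are illustrative.

```python
import math


def sample_covariance(xs, ys):
    """Sum of (x_i - x_bar)(y_i - y_bar), divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """r = Cov(x, y) / (s_x * s_y); s_x^2 is the covariance of x with itself."""
    s_x = math.sqrt(sample_covariance(xs, xs))
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]   # hypothetical data

cov_xy = sample_covariance(x, y)
r = sample_correlation(x, y)
print(cov_xy)       # 1.5
print(round(r, 4))  # 0.7746
```

Dividing by both standard deviations removes the units, so r always lies between -1 and +1 regardless of how x and y are measured.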
Features of
Correlation Coefficient, r
Unit free
[Figure: scatter plots of Y against X illustrating r = -1, r = -.6, r = +1, r = 0, r = +.3, r = 0]
Select
Tools/Data Analysis
Click OK . . .
r = .733
There is a relatively strong positive linear relationship between test score #1 and test score #2
\[ \hat{y} = b_0 + b_1 x \]
\[ b_1 = \frac{Cov(x, y)}{s_x^2} = r \frac{s_y}{s_x} \]
\[ b_0 = \bar{y} - b_1 \bar{x} \]
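The slope and intercept formulas can be sketched in Python as follows. The paired data are hypothetical, and `fit_line` is an illustrative helper, not a library function.

```python
def fit_line(xs, ys):
    """Return (b0, b1) using b1 = Cov(x, y) / s_x^2 and b0 = y_bar - b1 * x_bar."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    if var_x == 0:
        raise ValueError("x must vary to estimate a slope")
    b1 = cov_xy / var_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


x = [1, 2, 3, 4, 5]   # hypothetical data
y = [2, 4, 5, 4, 5]   # hypothetical data

b0, b1 = fit_line(x, y)
print(round(b0, 4), round(b1, 4))  # 2.2 0.6
```

For these values the fitted line is approximately y = 2.2 + 0.6x, so each one-unit increase in x is associated with an estimated 0.6 increase in y.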
Chapter Summary
Symmetric, skewed