
2/12/2013

Testing for Normality

Compare a normal curve based on the mean and standard deviation with the actual data.

Are the actual data statistically different than the computed normal curve?

There are several methods of assessing whether data are normally distributed or not. They fall into two broad categories: graphical and statistical. The most common are:

Graphical
  Q-Q probability plots
  Cumulative frequency (P-P) plots
Statistical
  Kolmogorov-Smirnov test
  W/S test
  Shapiro-Wilk test

Q-Q plots display the observed values against normally distributed data (represented by the line).

Normally distributed data fall along the line.
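The coordinates behind a Q-Q plot can be sketched as follows. This is only an illustration: the function name `qq_points` and the plotting position (i - 0.5) / n are assumptions (several conventions exist), and the sample is the population density data used in the later slides.

```python
from statistics import NormalDist, mean, stdev

# Population density values from the village example later in these slides.
density = [4.13, 4.53, 4.69, 4.76, 4.77, 4.96, 4.97, 5.00, 5.04,
           5.10, 5.25, 5.36, 5.94, 6.06, 6.19, 6.30, 7.73]

def qq_points(sample):
    """Return (theoretical, observed) pairs for a normal Q-Q plot."""
    xs = sorted(sample)
    n = len(xs)
    # Normal curve with the sample's own mean and standard deviation.
    fitted = NormalDist(mean(xs), stdev(xs))
    # Plotting position (i - 0.5) / n is one common choice (an assumption here).
    return [(fitted.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(xs, start=1)]

points = qq_points(density)
for theoretical, observed in points:
    print(f"{theoretical:6.2f}  {observed:5.2f}")
```

If the data were normal, the pairs would lie close to the line observed = theoretical; points that drift away from it in the tails are what the plot is meant to reveal.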


Statistical tests for normality are more precise since actual probabilities are calculated.

The Kolmogorov-Smirnov and Shapiro-Wilk tests for normality calculate the probability of obtaining sample data like these if the sample was drawn from a normal population.

The hypotheses used are:

H0: The sample data are not significantly different than a normal population.
Ha: The sample data are significantly different than a normal population.

Tests of Normality

              Kolmogorov-Smirnov(a)         Shapiro-Wilk
              Statistic   df     Sig.       Statistic   df     Sig.
Age           .110        1048   .000       .931        1048   .000

a. Lilliefors Significance Correction

Tests of Normality

              Kolmogorov-Smirnov(a)         Shapiro-Wilk
              Statistic   df     Sig.       Statistic   df     Sig.
TOTAL_VALU    .283        149    .000       .463        149    .000

a. Lilliefors Significance Correction

Tests of Normality

              Kolmogorov-Smirnov(a)         Shapiro-Wilk
              Statistic   df     Sig.       Statistic   df     Sig.
Z100          .071        100    .200*      .985        100    .333

*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction

Typically, we are interested in finding a difference between groups. When we are, we look for small probabilities.
If the probability of finding an event is rare (less than 5%) and we actually find it, that is of interest.
When testing normality, we are not looking for a difference. In effect, we want our data set to be NO DIFFERENT than normal.
So when testing for normality:
Probabilities > 0.05 mean the data are normal.
Probabilities < 0.05 mean the data are NOT normal.

Kolmogorov-Smirnov Normality Test
Based on comparing the observed frequencies and the expected frequencies.
Expected frequencies are estimated using z-scores.
Cumulative expected frequencies are determined as:
For negative z-scores, take the probability directly from the z-table.
For positive z-scores, use (1 - the probability from the z-table).
Kolmogorov-Smirnov D statistic:

$$D_i = \left| \mathrm{rel}\,F_i - \mathrm{rel}\,\hat{F}_i \right| \quad\text{and}\quad D'_i = \left| \mathrm{rel}\,F_{i-1} - \mathrm{rel}\,\hat{F}_i \right|$$

Use absolute values.


For Discrete Data:

Height   Observed    Cumulative Observed   Calculation   Cumulative Relative
(in)     Frequency   Frequency                           Observed Frequency (Fi)
62       0           0                     (0 / 70)      0.0000
63       2           2                     (2 / 70)      0.0286
64       2           4                     (4 / 70)      0.0571
65       3           7                     .             0.1000
66       5           12                    .             0.1714
67       4           16                    .             0.2286
68       6           22                    .             0.3143
69       5           27                    .             0.3857
70       8           35                    .             0.5000
71       7           42                    .             0.6000
72       7           49                    .             0.7000
73       10          59                    .             0.8429
74       6           65                    .             0.9286
75       3           68                    .             0.9714
76       2           70                    .             1.0000
77       0           70                    .             1.0000
78       0           70                    .             1.0000

Mean = 70.17
Standard deviation = 3.31
n = 70

Important: Sort the data from smallest to largest.

The cumulative relative observed frequencies (Fi) are also called the cumulative percentages and will be used later.

Height     Probability     Calculation     Cumulative Relative
Z-score    from Z-Table                    Expected Frequency (F̂i)
-2.47      0.0068          .               0.0068
-2.17      0.0150          .               0.0150
-1.86      0.0314          .               0.0314
-1.56      0.0594          .               0.0594
-1.26      0.1038          .               0.1038
-0.96      0.1736          .               0.1736
-0.66      0.2546          .               0.2546
-0.35      0.3632          .               0.3632
-0.05      0.4801          .               0.4801
 0.25      0.4013          (1 - 0.4013)    0.5987
 0.55      0.2912          (1 - 0.2912)    0.7088
 0.85      0.1977          (1 - 0.1977)    0.8023
 1.16      0.1230          (1 - 0.1230)    0.8770
 1.46      0.0721          (1 - 0.0721)    0.9279
 1.76      0.0392          (1 - 0.0392)    0.9608
 2.06      0.0197          (1 - 0.0197)    0.9803
 2.37      0.0089          (1 - 0.0089)    0.9911

Remember to use (1 - p) for all positive z-scores. Only use (1 - p) when calculating cumulative probabilities!
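The (1 - p) rule can be checked with a short sketch. It assumes the printed z-table lists the tail area beyond |z|, which is what the probabilities on the slide suggest; the helper name `cumulative_from_table` is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def cumulative_from_table(z):
    """Cumulative expected frequency from a tail-area style z-table lookup."""
    # Tail area beyond |z|: the number a one-tailed z-table would list.
    tail = STD_NORMAL.cdf(-abs(z))
    # Negative z: use the table value directly. Positive z: use (1 - p).
    return tail if z <= 0 else 1.0 - tail

for z in (-2.47, -0.05, 0.25, 2.37):
    print(z, round(cumulative_from_table(z), 4))
```

The result is the same as the standard normal cumulative distribution function evaluated at z, so the two-case rule is only a consequence of how the table is printed.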

Height   Z-score                Height
(in)     Calculation            Z-score
62       (62 - 70.17) / 3.31    -2.47
63       (63 - 70.17) / 3.31    -2.17
64       (64 - 70.17) / 3.31    -1.86
65       .                      -1.56
66       .                      -1.26
67       .                      -0.96
68       .                      -0.66
69       .                      -0.35
70       .                      -0.05
71       .                       0.25
72       .                       0.55
73       .                       0.85
74       .                       1.16
75       .                       1.46
76       .                       1.76
77       .                       2.06
78       .                       2.37

Mean = 70.17
Standard deviation = 3.31
n = 70

Height   Observed    Cumulative   Cumulative Relative   Cumulative Relative   Di       D'i
(in)     Frequency   Observed     Observed              Expected
                     Frequency    Frequency (Fi)        Frequency (F̂i)
62       0           0            0.0000                0.0068                0.0068   0.0068
63       2           2            0.0286                0.0150                0.0136   0.0150
64       2           4            0.0571                0.0314                0.0257   0.0028
65       3           7            0.1000                0.0594                0.0406   0.0023
66       5           12           0.1714                0.1038                0.0676   0.0038
67       4           16           0.2286                0.1736                0.0550   0.0022
68       6           22           0.3143                0.2546                0.0597   0.0260
69       5           27           0.3857                0.3632                0.0225   0.0489
70       8           35           0.5000                0.4801                0.0199   0.0944
71       7           42           0.6000                0.5987                0.0013   0.0987
72       7           49           0.7000                0.7088                0.0088   0.1088
73       10          59           0.8429                0.8023                0.0406   0.1023
74       6           65           0.9286                0.8770                0.0516   0.0341
75       3           68           0.9714                0.9279                0.0435   0.0007
76       2           70           1.0000                0.9608                0.0392   0.0106
77       0           70           1.0000                0.9803                0.0197   0.0197
78       0           70           1.0000                0.9911                0.0089   0.0089

Mean = 70.17
Standard deviation = 3.31
n = 70

Di Max = 0.0676        D'i Max = 0.1088
D = 0.1088 (use the larger of the 2 values)
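The hand calculation for the height data can be retraced with a short sketch. It takes the mean and standard deviation as stated on the slides and rounds each z-score to two decimals to imitate a printed z-table; the function name `ks_d_from_frequencies` is hypothetical.

```python
from statistics import NormalDist

heights = list(range(62, 79))                       # 62 .. 78 inches
freqs = [0, 2, 2, 3, 5, 4, 6, 5, 8, 7, 7, 10, 6, 3, 2, 0, 0]
MEAN, SD = 70.17, 3.31                              # values stated on the slides

def ks_d_from_frequencies(values, counts, mu, sigma):
    """Return (max Di, max D'i) for a sorted frequency table."""
    n = sum(counts)
    std_normal = NormalDist()
    cumulative = 0
    prev_f = 0.0                                    # rel F(i-1), zero before the first row
    d_max = d_prime_max = 0.0
    for value, count in zip(values, counts):
        cumulative += count
        f_obs = cumulative / n                      # cumulative relative observed frequency
        z = round((value - mu) / sigma, 2)          # two decimals, as in a z-table lookup
        f_exp = std_normal.cdf(z)                   # cumulative relative expected frequency
        d_max = max(d_max, abs(f_obs - f_exp))
        d_prime_max = max(d_prime_max, abs(prev_f - f_exp))
        prev_f = f_obs
    return d_max, d_prime_max

d_max, d_prime_max = ks_d_from_frequencies(heights, freqs, MEAN, SD)
d = max(d_max, d_prime_max)                         # use the larger of the two values
print(round(d_max, 4), round(d_prime_max, 4), round(d, 4))
```

Under these assumptions the two maxima should agree with the slide values of about 0.0676 and 0.1088. Dropping the two-decimal rounding of z changes the result slightly, which is one reason software output can differ from a hand calculation.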

[Figure: table of critical values for D, annotated "Our calculated D value falls about here" and "Our probability falls within this range."]


Remember that LARGE probabilities denote normally distributed data.

Dcritical = 0.106
D = 0.1088

Since 0.1088 > 0.106, reject Ho.
The sample data set is significantly different than normal (D = 0.1088, 0.05 > p > 0.025).
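The decision step can be sketched in code. The slides read Dcritical from a printed table; the sketch below instead uses the large-sample approximation 0.886 / sqrt(n) that is commonly quoted for the Lilliefors-corrected test at alpha = 0.05 and n > 30. That the slide's table is a Lilliefors table is an assumption, although the approximation does give about 0.106 for n = 70.

```python
import math

def lilliefors_critical_05(n):
    """Approximate critical D at alpha = 0.05 for n > 30 (assumed formula)."""
    if n <= 30:
        raise ValueError("approximation intended for n > 30; use a table instead")
    return 0.886 / math.sqrt(n)

d_observed = 0.1088                       # computed D for the height data
d_critical = lilliefors_critical_05(70)   # about 0.106
reject_h0 = d_observed > d_critical       # D larger than critical: reject Ho

print(round(d_critical, 3), reject_h0)
```

For small samples such as the 17 villages in the next example, the approximation does not apply and the tabled value should be used.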

From SPSS:

Tests of Normality

              Kolmogorov-Smirnov(a)         Shapiro-Wilk
              Statistic   df     Sig.       Statistic   df     Sig.
VAR00001      .110        70     .036       .965        70     .045

a. Lilliefors Significance Correction

Normally Distributed Data

              Kolmogorov-Smirnov(a)         Shapiro-Wilk
              Statistic   df     Sig.       Statistic   df     Sig.
Asthma Cases  .069        72     .200*      .988        72     .721

*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction

Non-Normally Distributed Data

              Kolmogorov-Smirnov(a)         Shapiro-Wilk
              Statistic   df     Sig.       Statistic   df     Sig.
Average PM10  .142        72     .001       .841        72     .000

a. Lilliefors Significance Correction

Normally distributed data (asthma cases): in this case the probabilities are greater than 0.05 (the typical alpha level), so we accept H0: these data are not different from normal.

Non-normally distributed data (average PM10): in this case the probabilities are less than 0.05 (the typical alpha level), so we reject H0: these data are significantly different from normal.

Important: As the sample size increases, normality parameters become MORE restrictive and it becomes harder to declare that the data are normally distributed.


For Continuous Data:

Village           Population   Observed     Z-score   Z             Expected     Di       D'i
                  Density      Cumulative             Probability   Cumulative
                               Frequency                            Frequency
Aranza            4.13         0.0588       -1.40     0.0808        0.0808       0.0220   0.0808
Corupo            4.53         0.1176       -0.94     0.1736        0.1736       0.0560   0.1148
San Lorenzo       4.69         0.1764       -0.75     0.2266        0.2266       0.0502   0.1090
Cheranatzicurin   4.76         0.2352       -0.67     0.2514        0.2514       0.0162   0.0750
Nahuatzen         4.77         0.2940       -0.66     0.2546        0.2546       0.0394   0.0194
Pomacuaran        4.96         0.3528       -0.44     0.3300        0.3300       0.0228   0.0360
Sevina            4.97         0.4116       -0.43     0.3336        0.3336       0.0780   0.0192
Arantepacua       5.00         0.4704       -0.39     0.3483        0.3483       0.1221   0.0633
Cocucho           5.04         0.5292       -0.35     0.3632        0.3632       0.1660   0.1072
Charapan          5.10         0.5880       -0.28     0.3897        0.3897       0.1983   0.1395
Comachuen         5.25         0.6468       -0.10     0.4602        0.4602       0.1866   0.1278
Pichataro         5.36         0.7056        0.02     0.4920        0.5080       0.1976   0.1388
Quinceo           5.94         0.7644        0.69     0.2451        0.7549       0.0095   0.0493
Nurio             6.06         0.8232        0.83     0.2033        0.7967       0.0265   0.0323
Turicuaro         6.19         0.8820        0.98     0.1635        0.8365       0.0455   0.0133
Urapicho          6.30         0.9408        1.11     0.1335        0.8665       0.0743   0.0155
Capacuaro         7.73         0.9996        2.76     0.0029        0.9971       0.0025   0.0563

For the observed cumulative frequency, simply divide each observation's rank by the sample size: for Aranza 1/17 = 0.0588, for Corupo 0.0588 + 0.0588 = 0.1176, for San Lorenzo 0.0588 + 0.0588 + 0.0588 = 0.1764, etc.

Mean = 5.34
SD = 0.866
N = 17

Di Max = 0.1983        D'i Max = 0.1395

D = 0.1983 (use the larger of the 2 values)

Dcritical = 0.207 (from the table)
D = 0.1983 (computed)
Since 0.1983 < 0.207, accept Ho.
The sample data set is not significantly different than normal (D = 0.1983, 0.10 > p > 0.05).
From SPSS:

Tests of Normality

              Kolmogorov-Smirnov(a)         Shapiro-Wilk
              Statistic   df     Sig.       Statistic   df     Sig.
VAR00001      .197        17     .077       .883        17     .035

a. Lilliefors Significance Correction

Notice that there is disagreement between the Kolmogorov-Smirnov and the Shapiro-Wilk tests.
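The hand-computed D of 0.1983 and the SPSS statistic of .197 differ slightly. A plausible explanation, offered here as an assumption rather than something the slides state, is rounding: the hand calculation rounds the z-scores and the cumulative frequencies, while software works with unrounded values. The sketch below computes D without rounding; the function name `ks_d` is hypothetical.

```python
from statistics import NormalDist, mean, stdev

density = [4.13, 4.53, 4.69, 4.76, 4.77, 4.96, 4.97, 5.00, 5.04,
           5.10, 5.25, 5.36, 5.94, 6.06, 6.19, 6.30, 7.73]

def ks_d(sample):
    """Kolmogorov-Smirnov D against a normal curve fitted to the sample."""
    xs = sorted(sample)                        # the data must be sorted first
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))   # sample mean and standard deviation
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f_exp = fitted.cdf(x)                  # expected cumulative frequency
        d_i = abs(i / n - f_exp)               # Di: compare with rel F(i)
        d_prime_i = abs((i - 1) / n - f_exp)   # D'i: compare with rel F(i-1)
        d = max(d, d_i, d_prime_i)
    return d

d = ks_d(density)
print(round(d, 3))
```

With unrounded values the statistic comes out close to the .197 reported by SPSS, and the conclusion from the K-S test is unchanged.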


W/S Test for Normality
A fairly simple test that requires only the sample standard deviation and the data range.
Based on the q statistic, which is the studentized range, or the range expressed in standard deviation units.
Should not be confused with the Shapiro-Wilk test.

$$q = \frac{w}{s}$$

where q is the test statistic, w is the range of the data and s is the standard deviation.

Village           Population Density
Aranza            4.13
Corupo            4.53
San Lorenzo       4.69
Cheranatzicurin   4.76
Nahuatzen         4.77
Pomacuaran        4.96
Sevina            4.97
Arantepacua       5.00
Cocucho           5.04
Charapan          5.10
Comachuen         5.25
Pichataro         5.36
Quinceo           5.94
Nurio             6.06
Turicuaro         6.19
Urapicho          6.30
Capacuaro         7.73

Standard deviation (s) = 0.866
Range (w) = 3.6
n = 17

$$q = \frac{w}{s} = \frac{3.6}{0.866} = 4.16$$

q critical range = 3.06 to 4.31

The W/S test uses a critical range. IF the calculated value falls WITHIN the range, then accept Ho. IF the calculated value falls outside the range, then reject Ho.
Since 4.16 falls between 3.06 and 4.31, we accept Ho.

Since we have a critical range, it is difficult to determine a probability range for our results. Therefore we simply state our alpha level.
The sample data set is not significantly different than normal (W/S = 4.16, p > 0.05).
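The W/S calculation is short enough to sketch directly. The critical range is the one quoted on the slide for this sample; the function name `ws_statistic` is hypothetical.

```python
from statistics import stdev

density = [4.13, 4.53, 4.69, 4.76, 4.77, 4.96, 4.97, 5.00, 5.04,
           5.10, 5.25, 5.36, 5.94, 6.06, 6.19, 6.30, 7.73]

def ws_statistic(sample):
    """q = w / s: the data range expressed in standard deviation units."""
    w = max(sample) - min(sample)   # range of the data
    s = stdev(sample)               # sample standard deviation
    return w / s

Q_LOWER, Q_UPPER = 3.06, 4.31       # critical range taken from the slide

q = ws_statistic(density)
accept_h0 = Q_LOWER <= q <= Q_UPPER  # inside the range: accept Ho
print(round(q, 2), accept_h0)
```

The critical range depends on the sample size and alpha level, so the two constants would need to be looked up again for any other data set.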


D'Agostino Test for Normality
A very powerful test for departures from normality.
Based on the D statistic, which gives an upper and lower critical value.

$$D = \frac{T}{\sqrt{n^3\,SS}} \quad\text{where}\quad T = \sum_{i=1}^{n}\left(i - \frac{n+1}{2}\right) X_i$$

where D is the test statistic, SS is the sum of squares of the data (the squared deviations from the mean), n is the sample size, and i is the order or rank of observation X.
First the data are ordered from smallest to largest or largest to smallest.

Use the next lower n on the table if your sample size is NOT listed.

Village           Population   i     Mean
                  Density            Deviates^2
Aranza            4.13         1     1.46410
Corupo            4.53         2     0.65610
San Lorenzo       4.69         3     0.42250
Cheranatzicurin   4.76         4     0.33640
Nahuatzen         4.77         5     0.32490
Pomacuaran        4.96         6     0.14440
Sevina            4.97         7     0.13690
Arantepacua       5.00         8     0.11560
Cocucho           5.04         9     0.09000
Charapan          5.10         10    0.05760
Comachuen         5.25         11    0.00810
Pichataro         5.36         12    0.00040
Quinceo           5.94         13    0.36000
Nurio             6.06         14    0.51840
Turicuaro         6.19         15    0.72250
Urapicho          6.30         16    0.92160
Capacuaro         7.73         17    5.71210

Mean = 5.34        SS = 11.9916

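$$(4.13 - 5.34)^2 = (-1.21)^2 = 1.46410$$

$$\frac{n+1}{2} = \frac{17+1}{2} = 9$$

$$T = \sum (i - 9)\,X_i$$

$$T = (1-9)\,4.13 + (2-9)\,4.53 + (3-9)\,4.69 + \dots + (17-9)\,7.73 = 63.23$$

$$D = \frac{63.23}{\sqrt{(17^3)(11.9916)}} = 0.26050$$

D critical = 0.2587, 0.2860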

If the calculated value falls within the critical range, accept Ho.
Since 0.2587 < D = 0.26050 < 0.2860, accept Ho.
The sample data set is not significantly different than normal (D = 0.26050, p > 0.05).
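The D'Agostino calculation can be sketched as below, following the formula on the slide with the data sorted from smallest to largest. The critical values are the ones quoted on the slide; the function name `dagostino_d` is hypothetical.

```python
import math
from statistics import mean

density = [4.13, 4.53, 4.69, 4.76, 4.77, 4.96, 4.97, 5.00, 5.04,
           5.10, 5.25, 5.36, 5.94, 6.06, 6.19, 6.30, 7.73]

def dagostino_d(sample):
    """D = T / sqrt(n^3 * SS), with T = sum((i - (n + 1) / 2) * X_i)."""
    xs = sorted(sample)                       # rank 1 is the smallest value
    n = len(xs)
    centre = (n + 1) / 2                      # 9 for n = 17
    t = sum((i - centre) * x for i, x in enumerate(xs, start=1))
    m = mean(xs)
    ss = sum((x - m) ** 2 for x in xs)        # sum of squared deviations from the mean
    return t / math.sqrt(n ** 3 * ss)

D_LOWER, D_UPPER = 0.2587, 0.2860             # critical values from the slide

d = dagostino_d(density)
accept_h0 = D_LOWER < d < D_UPPER             # inside the critical range: accept Ho
print(round(d, 5), accept_h0)
```

The computed D sits close to the lower critical value, so small changes to the extreme observations can move it in or out of the critical range.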

[Figure: Values of D as Capacuaro's population density is increased.]

[Figure: Values of D as Aranza's population density is decreased.]
Decreasing Aranza's population density slightly made the data set more symmetrical.

Which normality test should I use?

Kolmogorov-Smirnov:
  Not sensitive to problems in the tails.
  Works reasonably well with data sets < 50.
Shapiro-Wilk:
  Doesn't work well if several values in the data set are the same.
  Works best for data sets with > 50, but can be used with smaller data sets.
  The better of the two SPSS tests.
W/S:
  Simple, but effective.
  Not available in SPSS.
D'Agostino:
  Probably the most powerful of all the normality tests.
  Not available in SPSS.

Normality tests under differing conditions

                        Kolmogorov-Smirnov     Shapiro-Wilk
Data Type        n      Statistic   Prob       Statistic   Prob
Random Normal    30     0.116       0.200      0.987       0.962
Random Normal    100    0.059       0.200      0.986       0.382
Random Normal    1000   0.019       0.200      0.998       0.386
Random Normal    2000   0.018       0.120      0.999       0.154

Different normality tests produce vastly different probabilities. This is due to where in the distribution (central, tails) or what moment (skewness, kurtosis) they are examining.

Normality Test       Statistic   Calculated Value   Probability
Anderson-Darling     A           1.0958             0.006219
Cramer-von Mises     W           0.1776             0.009396
Shapiro-Francia      W           0.9186             0.013690
Shapiro-Wilk         W           0.9224             0.014770
Kolmogorov-Smirnov   D           0.1577             0.023850
Pearson chi-square   P           12.5000            0.051700

[Figure: Q-Q Plots]

Notice that as the sample size increases, the probabilities decrease. In other words, it gets harder to meet the normality assumption as the sample size increases since even small differences are detected.


When is non-normality a problem?

Normality can be a problem when the sample size is small (< 50).
Highly skewed data create problems.
Highly leptokurtic data are problematic, but not as much as skewed data.
Normality becomes a serious concern when there is activity in the tails of the data set.
  Outliers are a problem.
  Clumps of data in the tails are worse.

Final Words Concerning Normality Testing:

1. Since it IS a test, state a null and alternate hypothesis.
2. If you perform a normality test, do not ignore the results.
3. If the data are not normal, use non-parametric tests.
4. If the data are normal, use parametric tests.

AND MOST IMPORTANTLY:
5. If you have groups of data, you MUST test each group for normality.
