Testing for Normality

Compare a normal curve based on the mean and standard deviation with the actual data: are the actual data statistically different from the computed normal curve?

There are several methods of assessing whether data are normally distributed or not. They fall into two broad categories: graphical and statistical. The most common are:

Graphical
- Q-Q probability plots
- Cumulative frequency (P-P) plots

Statistical
- Kolmogorov-Smirnov test
- W/S test
- Shapiro-Wilk test

Q-Q plots display the observed values against normally distributed data (represented by the line). Normally distributed data fall along the line.

2/12/2013

Statistical tests for normality are more precise since actual probabilities are calculated.
Tests of Normality
       Kolmogorov-Smirnov        Shapiro-Wilk
       Statistic  df    Sig.     Statistic  df    Sig.
Age    .110       1048  .000     .931       1048  .000
The hypotheses used are:

H0: The sample data are not significantly different than a normal population.
Ha: The sample data are significantly different than a normal population.

The Kolmogorov-Smirnov and Shapiro-Wilk tests for normality calculate the probability of obtaining a sample at least this different from normal if it had been drawn from a normal population.

Tests of Normality
             Kolmogorov-Smirnov        Shapiro-Wilk
             Statistic  df   Sig.      Statistic  df   Sig.
TOTAL_VALU   .283       149  .000      .463       149  .000

Tests of Normality
             Kolmogorov-Smirnov        Shapiro-Wilk
             Statistic  df   Sig.      Statistic  df   Sig.
Z100         .071       100  .200*     .985       100  .333
* This is a lower bound of the true significance.
Typically, we are interested in finding a difference between groups. When we are, we look for small probabilities: if an event is rare (probability less than 5%) and we actually find it, that is of interest.

When testing normality, we are not looking for a difference. In effect, we want our data set to be NO DIFFERENT than normal.

So when testing for normality:
- Probabilities > 0.05 mean the data are not significantly different from normal.
- Probabilities < 0.05 mean the data are NOT normal.

Kolmogorov-Smirnov Normality Test
- Based on comparing the observed frequencies and the expected frequencies.
- Expected frequencies are estimated using z-scores.
- Cumulative expected frequencies are determined from a z-table that lists the area in the tail beyond |z|:
  - For negative z-scores, take the number directly from the z-table.
  - For positive z-scores, use (1 - the probability from the table).
- Kolmogorov-Smirnov D statistic (use absolute values):
  $D_i = \left|\mathrm{rel}\,F_i - \mathrm{rel}\,\hat{F}_i\right|$ and $D'_i = \left|\mathrm{rel}\,F_{i-1} - \mathrm{rel}\,\hat{F}_i\right|$
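The lookup rule above can be sketched with Python's standard library. This is an illustration, not part of the original method: it assumes the printed z-table gives the one-tailed area beyond |z| (which matches the probabilities used in the worked examples), and it replaces the table with `math.erfc`. The function names `z_table_tail` and `cumulative_expected` are hypothetical.

```python
import math

def z_table_tail(z: float) -> float:
    """One-tailed area beyond |z|: the number a printed z-table would list."""
    return 0.5 * math.erfc(abs(z) / math.sqrt(2))

def cumulative_expected(z: float) -> float:
    """Cumulative relative expected frequency for a z-score.

    Negative z: use the tabled probability directly.
    Positive z: use 1 - the tabled probability.
    """
    p = z_table_tail(z)
    return p if z < 0 else 1 - p

for z in (-2.47, -0.05, 0.25, 2.37):
    print(f"z = {z:5.2f}   table = {z_table_tail(z):.4f}   cumulative = {cumulative_expected(z):.4f}")
```

Both branches give the standard normal cumulative probability; the split into two cases only mirrors how a one-tailed table is read by hand.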
For Discrete Data:

Important: Sort the data from smallest to largest.

Height  Observed   Cumulative observed  Cumulative relative observed
(in)    frequency  frequency            frequency (F_i)
62       0          0                   0.0000
63       2          2                   0.0286
64       2          4                   0.0571
65       3          7                   0.1000
66       5         12                   0.1714
67       4         16                   0.2286
68       6         22                   0.3143
69       5         27                   0.3857
70       8         35                   0.5000
71       7         42                   0.6000
72       7         49                   0.7000
73      10         59                   0.8429
74       6         65                   0.9286
75       3         68                   0.9714
76       2         70                   1.0000
77       0         70                   1.0000
78       0         70                   1.0000

The cumulative relative observed frequency is the cumulative observed frequency divided by n: (0 / 70), (2 / 70), (4 / 70), and so on.

Mean = 70.17
Standard deviation = 3.31
n = 70
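A minimal Python sketch of these first columns, assuming the frequency table above is the whole data set. `itertools.accumulate` builds the running totals, and the mean and sample standard deviation are recovered by expanding the frequencies back into individual observations.

```python
import statistics
from itertools import accumulate

# Sorted heights (in) and their observed frequencies, from the table above.
heights = list(range(62, 79))
observed = [0, 2, 2, 3, 5, 4, 6, 5, 8, 7, 7, 10, 6, 3, 2, 0, 0]

n = sum(observed)
cumulative = list(accumulate(observed))      # cumulative observed frequency
rel_F = [c / n for c in cumulative]          # cumulative relative observed frequency

# Expand the frequency table into raw observations to get the mean and sample SD.
raw = [h for h, f in zip(heights, observed) for _ in range(f)]
mean = statistics.mean(raw)
sd = statistics.stdev(raw)

for h, f, c, F in zip(heights, observed, cumulative, rel_F):
    print(f"{h}  {f:2d}  {c:2d}  {F:.4f}")
print(f"n = {n}, mean = {mean:.2f}, sd = {sd:.2f}")
```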
Next, convert each height to a z-score using the mean (70.17) and standard deviation (3.31), with n = 70:

$z = \frac{X - \bar{X}}{s}$, e.g. (62 - 70.17) / 3.31 = -2.47, (63 - 70.17) / 3.31 = -2.17, (64 - 70.17) / 3.31 = -1.86, and so on.

Then look up each z-score in the z-table to obtain the cumulative relative expected frequency (F-hat_i):

Height  Z-score  Probability    Cumulative relative
(in)             from z-table   expected frequency (F-hat_i)
62      -2.47    0.0068         0.0068
63      -2.17    0.0150         0.0150
64      -1.86    0.0314         0.0314
65      -1.56    0.0594         0.0594
66      -1.26    0.1038         0.1038
67      -0.96    0.1685         0.1685
68      -0.66    0.2546         0.2546
69      -0.35    0.3632         0.3632
70      -0.05    0.4801         0.4801
71       0.25    0.4013         0.5987 = (1 - 0.4013)
72       0.55    0.2912         0.7088 = (1 - 0.2912)
73       0.85    0.1977         0.8023 = (1 - 0.1977)
74       1.16    0.1230         0.8770 = (1 - 0.1230)
75       1.46    0.0721         0.9279 = (1 - 0.0721)
76       1.76    0.0392         0.9608 = (1 - 0.0392)
77       2.06    0.0197         0.9803 = (1 - 0.0197)
78       2.37    0.0089         0.9911 = (1 - 0.0089)

Remember to use (1 - p) for all positive z-scores. Only use (1 - p) when calculating cumulative probabilities!
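The z-score and expected-frequency columns can be reproduced as follows. This sketch assumes the reported mean (70.17) and standard deviation (3.31), rounds z to two decimals as a printed table would, and uses `math.erfc` in place of the z-table; `z_score` and `normal_cdf` are hypothetical helper names.

```python
import math

MEAN, SD = 70.17, 3.31          # values reported in the notes
heights = list(range(62, 79))   # 62 to 78 inches

def z_score(x: float) -> float:
    # Rounded to two decimals, the precision of a printed z-table.
    return round((x - MEAN) / SD, 2)

def normal_cdf(z: float) -> float:
    # Standard normal cumulative probability. For z < 0 this equals the tabled
    # tail area; for z > 0 it equals 1 - the tabled tail area.
    return 0.5 * math.erfc(-z / math.sqrt(2))

z_scores = [z_score(h) for h in heights]
rel_F_hat = [normal_cdf(z) for z in z_scores]

for h, z, F_hat in zip(heights, z_scores, rel_F_hat):
    print(f"{h}  {z:5.2f}  {F_hat:.4f}")
```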
Next, compare the observed and expected cumulative relative frequencies in each row:

$D_i = \left|\mathrm{rel}\,F_i - \mathrm{rel}\,\hat{F}_i\right|$ and $D'_i = \left|\mathrm{rel}\,F_{i-1} - \mathrm{rel}\,\hat{F}_i\right|$

Start with $D_i$, which compares the two values in the same row. For example, at 66 in: |0.1714 - 0.1038| = 0.0676.
Then compute $D'_i = \left|\mathrm{rel}\,F_{i-1} - \mathrm{rel}\,\hat{F}_i\right|$, which compares each expected value with the observed cumulative relative frequency in the previous row (for the first row, rel F_0 = 0). For example, at 72 in: |0.6000 - 0.7088| = 0.1088.
Height  Observed   Cumulative  Cumulative relative  Cumulative relative    D_i     D'_i
(in)    frequency  observed    observed             expected
                   frequency   frequency (F_i)      frequency (F-hat_i)
62       0          0          0.0000               0.0068                 0.0068  0.0068
63       2          2          0.0286               0.0150                 0.0136  0.0150
64       2          4          0.0571               0.0314                 0.0257  0.0028
65       3          7          0.1000               0.0594                 0.0406  0.0023
66       5         12          0.1714               0.1038                 0.0676  0.0038
67       4         16          0.2286               0.1685                 0.0601  0.0029
68       6         22          0.3143               0.2546                 0.0597  0.0260
69       5         27          0.3857               0.3632                 0.0225  0.0489
70       8         35          0.5000               0.4801                 0.0199  0.0944
71       7         42          0.6000               0.5987                 0.0013  0.0987
72       7         49          0.7000               0.7088                 0.0088  0.1088
73      10         59          0.8429               0.8023                 0.0406  0.1023
74       6         65          0.9286               0.8770                 0.0516  0.0341
75       3         68          0.9714               0.9279                 0.0435  0.0007
76       2         70          1.0000               0.9608                 0.0392  0.0106
77       0         70          1.0000               0.9803                 0.0197  0.0197
78       0         70          1.0000               0.9911                 0.0089  0.0089

Mean = 70.17
Standard deviation = 3.31
n = 70

Max D_i = 0.0676; max D'_i = 0.1088
D = 0.1088 (use the larger of the 2 values)
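The whole discrete-data calculation, from frequencies to D, can be sketched in a few lines. As before this is an illustration under stated assumptions: z is rounded to two decimals with the reported mean and SD, the normal CDF comes from `math.erfc` instead of a printed table, and 0.106 is simply the critical value quoted in these notes, not something the code derives.

```python
import math
from itertools import accumulate

# Frequency table from the notes: heights 62-78 in, n = 70.
heights = list(range(62, 79))
observed = [0, 2, 2, 3, 5, 4, 6, 5, 8, 7, 7, 10, 6, 3, 2, 0, 0]
MEAN, SD = 70.17, 3.31

n = sum(observed)
rel_F = [c / n for c in accumulate(observed)]                       # observed cumulative
z_scores = [round((h - MEAN) / SD, 2) for h in heights]             # z-table precision
rel_F_hat = [0.5 * math.erfc(-z / math.sqrt(2)) for z in z_scores]  # expected cumulative

# D_i compares row i with itself; D'_i compares expected row i with observed
# row i - 1 (rel F_0 = 0 for the first row).
previous = [0.0] + rel_F[:-1]
D_i = [abs(F - F_hat) for F, F_hat in zip(rel_F, rel_F_hat)]
D_prime_i = [abs(F_prev - F_hat) for F_prev, F_hat in zip(previous, rel_F_hat)]

D = max(max(D_i), max(D_prime_i))   # use the larger of the two maxima
D_CRITICAL = 0.106                  # critical value quoted in the notes

print(f"max D_i = {max(D_i):.4f}, max D'_i = {max(D_prime_i):.4f}, D = {D:.4f}")
print("reject H0" if D > D_CRITICAL else "accept H0")
```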
Remember that LARGE probabilities (p > 0.05) denote data that are not significantly different from normal.

D critical = 0.106
D = 0.1088 (computed)
Since 0.1088 > 0.106, reject H0.
The sample data set is significantly different than normal (D = 0.1088, 0.05 > p > 0.025).
From SPSS:
Tests of Normality
            Kolmogorov-Smirnov       Shapiro-Wilk
            Statistic  df   Sig.     Statistic  df   Sig.
VAR00001    .110       70   .036     .965       70   .045

Tests of Normality
               Kolmogorov-Smirnov       Shapiro-Wilk
               Statistic  df   Sig.     Statistic  df   Sig.
Asthma Cases   .069       72   .200*    .988       72   .721
* This is a lower bound of the true significance.

In this case the probabilities are greater than 0.05 (the typical alpha level), so we accept H0: these data are not different from normal.

Tests of Normality
               Kolmogorov-Smirnov       Shapiro-Wilk
               Statistic  df   Sig.     Statistic  df   Sig.
Average PM10   .142       72   .001     .841       72   .000

In this case the probabilities are less than 0.05 (the typical alpha level), so we reject H0: these data are significantly different from normal.

Important: As the sample size increases, normality tests become MORE restrictive and it becomes harder to declare that the data are normally distributed.
For Continuous Data:

Village          Population  Observed    Z-score  Z            Expected    D_i     D'_i
                 density     cumulative           probability  cumulative
                             frequency                         frequency
Aranza           4.13        0.0588      -1.40    0.0808       0.0808      0.0220  0.0808
Corupo           4.53        0.1176      -0.94    0.1736       0.1736      0.0560  0.1148
San Lorenzo      4.69        0.1764      -0.75    0.2266       0.2266      0.0502  0.1090
Cheranatzicurin  4.76        0.2352      -0.67    0.2514       0.2514      0.0162  0.0750
Nahuatzen        4.77        0.2940      -0.66    0.2546       0.2546      0.0394  0.0194
Pomacuaran       4.96        0.3528      -0.44    0.3300       0.3300      0.0228  0.0360
Sevina           4.97        0.4116      -0.43    0.3336       0.3336      0.0780  0.0192
Arantepacua      5.00        0.4704      -0.39    0.3483       0.3483      0.1221  0.0633
Cocucho          5.04        0.5292      -0.35    0.3632       0.3632      0.1660  0.1072
Charapan         5.10        0.5880      -0.28    0.3897       0.3897      0.1983  0.1395
Comachuen        5.25        0.6468      -0.10    0.4602       0.4602      0.1866  0.1278
Pichataro        5.36        0.7056       0.02    0.4920       0.5080      0.1976  0.1388
Quinceo          5.94        0.7644       0.69    0.2451       0.7549      0.0095  0.0493
Nurio            6.06        0.8232       0.83    0.2033       0.7967      0.0265  0.0323
Turicuaro        6.19        0.8820       0.98    0.1635       0.8365      0.0455  0.0133
Urapicho         6.30        0.9408       1.11    0.1335       0.8665      0.0743  0.0155
Capacuaro        7.73        0.9996       2.76    0.0029       0.9971      0.0025  0.0563

For the observed cumulative frequency, simply divide each observation's rank by the sample size: for Aranza 1/17 = 0.0588, for Corupo 0.0588 + 0.0588 = 0.1176, for San Lorenzo 0.0588 + 0.0588 + 0.0588 = 0.1764, etc.

Mean = 5.34
SD = 0.866
N = 17
Max D_i = 0.1983; max D'_i = 0.1395
D = 0.1983 (use the larger of the 2 values)

D critical = 0.207 (from the table)
D = 0.1983 (computed)
Since 0.1983 < 0.207, accept H0.
The sample data set is not significantly different than normal (D = 0.1983, 0.10 > p > 0.05).
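The same computation for the continuous village data, sketched in Python under the same assumptions (z rounded to two decimals, normal CDF from `math.erfc` instead of a printed table, critical value 0.207 hard-coded from the notes). One difference: the observed cumulative frequency is computed exactly as i/17 rather than by repeatedly adding the rounded 0.0588, so D comes out at about 0.1985 instead of 0.1983; the conclusion is the same.

```python
import math
import statistics

# Population densities for the 17 villages, sorted from smallest to largest.
density = [4.13, 4.53, 4.69, 4.76, 4.77, 4.96, 4.97, 5.00, 5.04,
           5.10, 5.25, 5.36, 5.94, 6.06, 6.19, 6.30, 7.73]

n = len(density)
mean = statistics.mean(density)
sd = statistics.stdev(density)

rel_F = [(i + 1) / n for i in range(n)]   # observed cumulative, i/n
previous = [i / n for i in range(n)]      # rel F_{i-1}, starting at 0
z_scores = [round((x - mean) / sd, 2) for x in density]
rel_F_hat = [0.5 * math.erfc(-z / math.sqrt(2)) for z in z_scores]

D_i = [abs(F - Fh) for F, Fh in zip(rel_F, rel_F_hat)]
D_prime_i = [abs(Fp - Fh) for Fp, Fh in zip(previous, rel_F_hat)]
D = max(max(D_i), max(D_prime_i))

D_CRITICAL = 0.207  # from the table, as quoted in the notes
print(f"mean = {mean:.2f}, sd = {sd:.3f}, D = {D:.4f}")
print("reject H0" if D > D_CRITICAL else "accept H0")
```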
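From SPSS:
Tests of Normality
            Kolmogorov-Smirnov       Shapiro-Wilk
            Statistic  df   Sig.     Statistic  df   Sig.
VAR00001    .197       17   .077     .883       17   .035

Notice that there is disagreement between the Kolmogorov-Smirnov and the Shapiro-Wilk tests: the Kolmogorov-Smirnov probability (.077) is greater than 0.05, while the Shapiro-Wilk probability (.035) is less than 0.05.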
W/S Test for Normality
- A fairly simple test that requires only the sample standard deviation and the data range.
- Based on the q statistic, which is the studentized range, or the range expressed in standard deviation units: $q = \frac{w}{s}$
- Should not be confused with the Shapiro-Wilk test.

Village          Population density
Aranza           4.13
Corupo           4.53
San Lorenzo      4.69
Cheranatzicurin  4.76
Nahuatzen        4.77
Pomacuaran       4.96
Sevina           4.97
Arantepacua      5.00
Cocucho          5.04
Charapan         5.10
Comachuen        5.25
Pichataro        5.36
Quinceo          5.94
Nurio            6.06
Turicuaro        6.19
Urapicho         6.30
Capacuaro        7.73

Standard deviation (s) = 0.866
Range (w) = 7.73 - 4.13 = 3.6
n = 17

$q = \frac{w}{s} = \frac{3.6}{0.866} = 4.16$

The W/S test uses a critical range. IF the calculated value falls WITHIN the range, then accept H0. IF the calculated value falls outside the range, then reject H0.

Since 4.16 falls between 3.06 and 4.31, we accept H0.

Since we have a critical range, it is difficult to determine a probability range for our results. Therefore we simply state our alpha level.

The sample data set is not significantly different than normal (W/S = 4.16, p > 0.05).
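The W/S calculation is short enough to sketch directly in Python. The critical range (3.06 to 4.31) is hard-coded from these notes for this example at alpha = 0.05; the code does not look it up.

```python
import statistics

# Population densities for the 17 villages.
density = [4.13, 4.53, 4.69, 4.76, 4.77, 4.96, 4.97, 5.00, 5.04,
           5.10, 5.25, 5.36, 5.94, 6.06, 6.19, 6.30, 7.73]

s = statistics.stdev(density)        # sample standard deviation
w = max(density) - min(density)      # range
q = w / s                            # studentized range, w/s

LOWER, UPPER = 3.06, 4.31            # critical range quoted in the notes
decision = "accept H0" if LOWER <= q <= UPPER else "reject H0"
print(f"w = {w:.2f}, s = {s:.3f}, w/s = {q:.2f} -> {decision}")
```

Because the decision is a two-sided range check rather than a single cutoff, both a very compact sample (small w/s) and one with a very long tail (large w/s) are flagged.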
D'Agostino Test for Normality
- A very powerful test for departures from normality.
- Based on the D statistic, which gives an upper and lower critical value.

$D = \frac{T}{\sqrt{n^3 \, SS}}$, where $T = \sum_{i=1}^{n} \left(i - \frac{n+1}{2}\right) X_i$

Use the next lower n on the table if your sample size is NOT listed.

i   Village          Population density  Squared deviate from the mean
1   Aranza           4.13                1.46410
2   Corupo           4.53                0.65610
3   San Lorenzo      4.69                0.42250
4   Cheranatzicurin  4.76                0.33640
5   Nahuatzen        4.77                0.32490
6   Pomacuaran       4.96                0.14440
7   Sevina           4.97                0.13690
8   Arantepacua      5.00                0.11560
9   Cocucho          5.04                0.09000
10  Charapan         5.10                0.05760
11  Comachuen        5.25                0.00810
12  Pichataro        5.36                0.00040
13  Quinceo          5.94                0.36000
14  Nurio            6.06                0.51840
15  Turicuaro        6.19                0.72250
16  Urapicho         6.30                0.92160
17  Capacuaro        7.73                5.71210

Mean = 5.34   SS = 11.9916

$\frac{n+1}{2} = \frac{17+1}{2} = 9$

$T = \sum (i - 9) X_i = (1 - 9)4.13 + (2 - 9)4.53 + (3 - 9)4.69 + \cdots + (17 - 9)7.73 = 63.23$

$D = \frac{63.23}{\sqrt{(17^3)(11.9916)}} = 0.26050$

D critical = 0.2587, 0.2860
If the calculated value falls within the critical range, accept H0.
Since 0.2587 < D = 0.26050 < 0.2860, accept H0.
The sample data set is not significantly different than normal (D = 0.26050, p > 0.05).
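A Python sketch of the D'Agostino D statistic for the same data, following the formula above. The critical values 0.2587 and 0.2860 are hard-coded from the notes for this example rather than looked up, and the data must already be sorted so that i is the rank of X_i.

```python
import math
import statistics

# Population densities, sorted from smallest (Aranza) to largest (Capacuaro).
density = [4.13, 4.53, 4.69, 4.76, 4.77, 4.96, 4.97, 5.00, 5.04,
           5.10, 5.25, 5.36, 5.94, 6.06, 6.19, 6.30, 7.73]

n = len(density)
mean = statistics.mean(density)
ss = sum((x - mean) ** 2 for x in density)   # sum of squared deviates (SS)

# T = sum over i of (i - (n + 1)/2) * X_i, with i the 1-based rank of X_i.
middle = (n + 1) / 2
T = sum((i - middle) * x for i, x in enumerate(density, start=1))

D = T / math.sqrt(n ** 3 * ss)

LOWER, UPPER = 0.2587, 0.2860   # critical values quoted in the notes
decision = "accept H0" if LOWER < D < UPPER else "reject H0"
print(f"SS = {ss:.4f}, T = {T:.2f}, D = {D:.5f} -> {decision}")
```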
[Figure: values of D as Capacuaro's population density is increased, and values of D as Aranza's population density is decreased.]
Decreasing Aranza's population density slightly made the data set more symmetrical.

Which normality test should I use?
Kolmogorov-Smirnov:
- Not sensitive to problems in the tails.
- Works reasonably well with data sets < 50.

Shapiro-Wilk:
- Doesn't work well if several values in the data set are the same.
- Works best for data sets with > 50, but can be used with smaller data sets.
- The better of the two SPSS tests.

W/S:
- Simple, but effective.
- Not available in SPSS.

D'Agostino:
- Probably the most powerful of all the normality tests.
- Not available in SPSS.
Normality tests under differing conditions

Data type       n      Kolmogorov-Smirnov     Shapiro-Wilk
                       Statistic  Prob        Statistic  Prob
Random Normal   30     0.116      0.200       0.987      0.962
Random Normal   100    0.059      0.200       0.986      0.382
Random Normal   1000   0.019      0.200       0.998      0.386
Random Normal   2000   0.018      0.120       0.999      0.154

Normality Test        Statistic  Calculated Value  Probability
Anderson-Darling      A          1.0958            0.006219
Cramér-von Mises      W          0.1776            0.009396
Shapiro-Francia       W          0.9186            0.013690
Shapiro-Wilk          W          0.9224            0.014770
Kolmogorov-Smirnov    D          0.1577            0.023850
Pearson chi-square    P          12.5000           0.051700

Notice that as the sample size increases, the probabilities decrease. In other words, it gets harder to meet the normality assumption as the sample size increases, since even small differences are detected.
When is non-normality a problem?
- Normality can be a problem when the sample size is small (< 50).
- Highly skewed data create problems.
- Highly leptokurtic data are problematic, but not as much as skewed data.
- Normality becomes a serious concern when there is activity in the tails of the data set.
  - Outliers are a problem.
  - Clumps of data in the tails are worse.

Final Words Concerning Normality Testing:
1. Since it IS a test, state a null and alternate hypothesis.
2. If you perform a normality test, do not ignore the results.
3. If the data are not normal, use nonparametric tests.
4. If the data are normal, use parametric tests.
AND MOST IMPORTANTLY:
5. If you have groups of data, you MUST test each group for normality.