
ANOVA

ANALYSIS OF VARIANCE
ANOVA

Invented in the 1920s by R.A. Fisher, who wanted to detect significant differences between different types of plants.

Since 1970 it has been the most widely used statistical method in psychology research.

Its applications are highly varied: psychology, biology, sociology, economics.
ANOVA versus the t test
The t test compares the means of two populations to see whether there is a significant difference between them.

The basic idea of ANOVA is the same as that of the t test, except that here we can compare the means of several statistical populations.

In fact, ANOVA lets us quantify the impact of one or more influencing factors on a variable of interest.
Example: We want to determine how the productivity of postal workers can be improved.

We consider 4 types of intervention:

An incentive system based on ratings given by superiors
Recognition from superiors
The rating system plus recognition from superiors
Salary deductions for breaches of work discipline
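The t test cannot simultaneously test the differences in productivity among all the groups formed according to these criteria.

We could run t tests comparing the mean productivities two at a time: $C_4^2 = 6$ tests. However, every extra test is another chance of a Type I error, so the overall risk of a false positive across the six tests is well above the nominal significance level α.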
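A rough calculation shows the size of the problem. The sketch below assumes the six tests are independent and each is run at α = 0.05; pairwise tests that share groups are in fact correlated, so the result is only an approximation of the familywise error rate. The function name `familywise_error` is our own.

```python
from math import comb

def familywise_error(alpha: float, k: int) -> float:
    """Approximate probability of at least one Type I error when all
    pairwise t tests among k groups are run at level alpha.
    Assumes the tests are independent (a simplification)."""
    m = comb(k, 2)                 # number of pairwise comparisons, C(k, 2)
    return 1 - (1 - alpha) ** m

print(comb(4, 2))                            # 6 pairwise tests for 4 groups
print(round(familywise_error(0.05, 4), 3))   # about 0.265
```

So six tests at 5% each carry roughly a one-in-four chance of at least one spurious "significant" difference, which is why a single F test is preferred.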
TERMINOLOGY

Dependent variable (response variable)
the variable being studied
Independent/explanatory variable (factor)
a variable that influences the dependent variable
Factor level (treatment)
a particular value of the factor
Residual variation
random influences on the dependent variable

Example 1
We want to determine how the crop yield is influenced by the type of fertilizer used. A farmer uses 3 types of fertilizer, denoted A, B and C.
Response variable: the yield
Factor: the type of fertilizer
Treatments: fertilizers A, B and C

Example 2
We analyze how the price of bonds is determined by the interest rate they pay. We study bonds that pay rates of 6%, 8% and 10%.
Response variable: the bond price
Factor: the interest rate
Treatments: 6%, 8% or 10%
ANOVA MODELS

By the number of factors:
one-factor ANOVA models
multi-factor ANOVA models

By the nature of the factor levels:
fixed effect models
random effect models
mixed effect models
One-factor ANOVA
One-Way ANOVA (One-Factor ANOVA)

A single independent variable X whose values split the data into several groups: $X_1, \ldots, X_k$.
We want to see whether there is a significant difference between the values of the dependent variable Y across the groups formed by the grouping variable X.
In practice, the observations are the values of Y within each of the k groups formed by the values of X.
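Input table for ANOVA

Value of the independent variable | Number of observations in the group | Observed values of the dependent variable Y in the group | Population | Sample mean
$X_1$ | $n_1$ | $y_{11}, y_{12}, \ldots, y_{1n_1}$ | $Y_1 \sim N(\mu_1, \sigma_1^2)$ | $\bar{y}_1 = \frac{1}{n_1}\sum_{j=1}^{n_1} y_{1j}$
$X_2$ | $n_2$ | $y_{21}, y_{22}, \ldots, y_{2n_2}$ | $Y_2 \sim N(\mu_2, \sigma_2^2)$ | $\bar{y}_2 = \frac{1}{n_2}\sum_{j=1}^{n_2} y_{2j}$
... | ... | ... | ... | ...
$X_k$ | $n_k$ | $y_{k1}, y_{k2}, \ldots, y_{kn_k}$ | $Y_k \sim N(\mu_k, \sigma_k^2)$ | $\bar{y}_k = \frac{1}{n_k}\sum_{j=1}^{n_k} y_{kj}$

Sample size: $n = n_1 + n_2 + \cdots + n_k$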
Hypotheses in ANOVA
The overall population mean is estimated by the overall sample mean:
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij} = \frac{1}{n}\sum_{i=1}^{k} n_i\,\bar{y}_i$$
The hypotheses are: $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$

$H_A$: not $H_0$ (at least two means are unequal)
If the null hypothesis is not rejected, we conclude that the grouping factor has no significant influence on the variable of interest.
The basic idea in testing the ANOVA hypotheses is the rule for adding variances: the total variation is decomposed into the variation between groups (the systematic factor) and the variation within groups (the random factor).
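The addition rule can be verified in a few lines, using the notation of the input table ($y_{ij}$ is observation j of group i, $\bar{y}_i$ the group mean, $\bar{y}$ the overall mean). Adding and subtracting the group mean inside the total sum of squares, the cross term vanishes because deviations from a group mean sum to zero:

$$
\begin{aligned}
SS_{total} &= \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left[(y_{ij}-\bar{y}_i)+(\bar{y}_i-\bar{y})\right]^2 \\
&= \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij}-\bar{y}_i)^2}_{SSE}
 + \underbrace{\sum_{i=1}^{k} n_i(\bar{y}_i-\bar{y})^2}_{SST}
 + 2\sum_{i=1}^{k}(\bar{y}_i-\bar{y})\underbrace{\sum_{j=1}^{n_i}(y_{ij}-\bar{y}_i)}_{=\,0} \\
&= SSE + SST
\end{aligned}
$$

Here SST is the between-groups sum of squares, matching the naming used in the ANOVA table of this section.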
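Analysis of variance table
ANOVA Table

Source of variation | SS | df | MS | F
Between groups (systematic factor) | $SST = \sum_{i=1}^{k} n_i(\bar{y}_i - \bar{y})^2$ | $k-1$ | $MST = \frac{SST}{k-1}$ | $F = \frac{MST}{MSE}$
Within groups (random factor) | $SSE = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2$ | $n-k$ | $MSE = \frac{SSE}{n-k}$ |
Total | $SS_{total} = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y})^2$ | $n-1$ | |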
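The table can be reproduced with a short Python sketch using only the standard library. The function `one_way_anova` and the toy data are hypothetical illustrations (they are not the shelf-location data from the example below); the names follow the table, where SST is the between-groups sum of squares.

```python
def one_way_anova(groups):
    """Compute the one-way ANOVA table.

    groups: list of lists, one list of y-values per level X_1, ..., X_k.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Between groups (systematic factor): sum_i n_i (ybar_i - ybar)^2
    sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups (random factor): sum_i sum_j (y_ij - ybar_i)^2
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    # Total: sum_i sum_j (y_ij - ybar)^2, equal to SST + SSE
    ss_total = sum((y - grand_mean) ** 2 for g in groups for y in g)

    mst = sst / (k - 1)
    mse = sse / (n - k)
    return {"SST": sst, "SSE": sse, "SStotal": ss_total,
            "df": (k - 1, n - k, n - 1),
            "MST": mst, "MSE": mse, "F": mst / mse}


table = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # toy data
print(table["SST"], table["SSE"], table["SStotal"], table["F"])  # 54.0 6.0 60.0 27.0
```

Note that SStotal is computed independently here, so the printed values also confirm the decomposition 60 = 54 + 6.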
The F test (Fisher)
The decision is based on the F test: we compare the F statistic computed in the ANOVA table with the critical value, i.e. the upper α quantile of the F distribution with (k-1, n-k) degrees of freedom.
If $F > F_{\alpha;\,k-1;\,n-k}$, we reject the null hypothesis and conclude, at confidence level $1-\alpha$, that the grouping factor has a significant influence on the variable of interest.
Critical value in Excel: $F_{\alpha;\,k-1;\,n-k}$ = FINV(α, k-1, n-k) (in Excel 2010 and later, F.INV.RT takes the same arguments).
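Outside Excel, the same critical value can be computed directly. The sketch below uses only the Python standard library: it evaluates the F cumulative distribution through the regularized incomplete beta function (continued-fraction method in the style of Numerical Recipes) and inverts it by bisection. The function names are our own; if SciPy is available, `scipy.stats.f.ppf(1 - alpha, k - 1, n - k)` returns the same number.

```python
from math import exp, lgamma, log

def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),           # even step
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):  # odd step
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_cdf(f, d1, d2):
    """P(F <= f) for an F distribution with (d1, d2) degrees of freedom."""
    if f <= 0:
        return 0.0
    return reg_inc_beta(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))

def f_critical(alpha, d1, d2):
    """Upper critical value F_{alpha; d1; d2}, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < 1.0 - alpha:  # widen the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f_cdf(mid, d1, d2) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(f_critical(0.05, 2, 15), 2))  # 3.68, e.g. k = 3 groups and n = 18 observations
```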
Multiple comparisons: the Tukey-Kramer procedure

If the ANOVA shows a significant difference between the values of the variable of interest across the k groups, the next step is a multiple comparison procedure to determine which groups differ.

The Tukey-Kramer procedure is a post-hoc procedure.


Tukey-Kramer procedure
Compute the differences $|\bar{y}_i - \bar{y}_j|$, $i \ne j$, for all $C_k^2 = \frac{k(k-1)}{2}$ pairs of means.

Determine the critical distance with the formula
$$DC = Q_U \sqrt{\frac{MSE}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$
where $Q_U$ is the upper quantile of the studentized range distribution with k degrees of freedom in the numerator and n-k degrees of freedom in the denominator.
Compare the computed distances $|\bar{y}_i - \bar{y}_j|$, $i \ne j$, with the critical value DC defined above.

If for some pair i, j we have $|\bar{y}_i - \bar{y}_j| > DC$, then the means $\bar{y}_i$ and $\bar{y}_j$ are significantly different.
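The comparison rule can be sketched in Python as follows. The function `tukey_kramer` is our own illustration: $Q_U$ is passed in from a table rather than computed, and the means, sizes and MSE in the usage lines are made-up numbers. The value 3.67 is the studentized range quantile for k = 3 and n - k = 15 at α = 0.05 as given in common tables; check it against the table you use.

```python
from itertools import combinations
from math import sqrt

def tukey_kramer(means, sizes, mse, q_u):
    """Compare every pair of group means with the Tukey-Kramer critical distance.

    means, sizes: sample mean and size of each group (ybar_i, n_i)
    mse: MSE from the one-way ANOVA table
    q_u: upper quantile of the studentized range distribution (from a table)
    Returns (i, j, |ybar_i - ybar_j|, DC, significant) for each pair, 1-based.
    """
    results = []
    for i, j in combinations(range(len(means)), 2):  # all k(k-1)/2 pairs, i < j
        diff = abs(means[i] - means[j])
        dc = q_u * sqrt(mse / 2 * (1 / sizes[i] + 1 / sizes[j]))  # critical distance DC
        results.append((i + 1, j + 1, diff, dc, diff > dc))
    return results


# Hypothetical summary statistics for k = 3 groups of 6 observations each
for i, j, diff, dc, significant in tukey_kramer([10.0, 6.0, 5.5], [6, 6, 6], mse=4.0, q_u=3.67):
    print(f"groups {i} and {j}: |diff| = {diff:.2f}, DC = {dc:.2f}, significant: {significant}")
```

With equal group sizes DC is the same for every pair; the per-pair formula only matters when the $n_i$ differ, which is the case Tukey-Kramer was designed for.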
[Table: Critical values of the studentized range distribution for α = 0.05 (5%)]
Example
The manager of a chain of stores wants to determine whether the shelf location of a product has a significant influence on its sales. Three possible shelf locations are considered: zone 1 (top shelf), zone 2 (middle shelf) and zone 3 (bottom shelf).

A random sample of 18 stores is selected: 6 with the product in zone 1, 6 in zone 2 and 6 in zone 3.

After one month, the product's sales (in thousands of dollars) were recorded for each store.
How to perform ANOVA in Excel

1. Enter the data in Excel.

2. Run the ANOVA procedure via Tools > Data Analysis > ANOVA: Single Factor (in newer Excel versions, Data Analysis is on the Data tab once the Analysis ToolPak add-in is enabled).
3. Select the data range, optionally change the significance level, and click OK.
4. Excel output
Conclusions

We can state, with 95% confidence, that the shelf location of the product significantly influences its sales.

Moreover, comparing the mean sales for each location, we can state that mean sales for products placed on the top shelf are significantly higher than mean sales in the other zones.

Consequently, the manager should decide to place the product on the top shelf.
Very important!
When we compare population means using ANOVA, three conditions must be met:

Independence and randomness in selecting the samples
Normality: the samples in each group are drawn from normal populations
Homogeneity of variance: the variances of the k groups are assumed to be equal
Randomized block ANOVA
(one-factor ANOVA with repeated observations)
F test for randomized blocks
As in one-way ANOVA, we test the equality of the means of several populations for different levels of the factor variable...
...but we also want to control for the variation due to a secondary factor.
The levels of this second factor are called blocks.
Notation: r = number of rows (blocks), c = number of columns (groups)
Assumptions
1. Normality
The populations are normally distributed
2. Homogeneity of variance
The populations have equal variances
3. Independence of errors
The samples are selected randomly and independently
Partitioning the total variation

Total variation (SST) = Variation among groups (SSA) + Variation among blocks (SSBL) + Random variation (SSE)
Sum of Squares for Blocking
SST = SSA + SSBL + SSE

$$SSBL = c\sum_{i=1}^{r}(\bar{Y}_{i.} - \bar{Y})^2$$
Where:
c = number of groups
r = number of blocks
$\bar{Y}_{i.}$ = mean of all values in block i
$\bar{Y}$ = grand mean (mean of all data values)
Partitioning the Variation
Total variation can now be split into three parts:

SST = SSA + SSBL + SSE

SST (total) and SSA (among groups) are computed as in One-Way ANOVA, where they were called SStotal and SST respectively; then
SSE = SST - (SSA + SSBL)
Mean Squares

$MSBL = \frac{SSBL}{r-1}$ = Mean square blocking
$MSA = \frac{SSA}{c-1}$ = Mean square among groups
$MSE = \frac{SSE}{(r-1)(c-1)}$ = Mean square error
Randomized Block ANOVA Table
Source of Variation | SS | df | MS | F ratio
Among Treatments | SSA | c-1 | MSA | MSA/MSE
Among Blocks | SSBL | r-1 | MSBL | MSBL/MSE
Error | SSE | (r-1)(c-1) | MSE |
Total | SST | rc-1 | |
c = number of populations; rc = sum of the sample sizes from all populations
r = number of blocks; df = degrees of freedom
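Blocking Test
$H_0: \mu_{1.} = \mu_{2.} = \mu_{3.} = \cdots$
$H_1$: Not all block means are equal

$F = \frac{MSBL}{MSE}$
Blocking test: df1 = r - 1, df2 = (r - 1)(c - 1)

Reject $H_0$ if $F > F_U$, the upper-tail critical value of the F distribution with (df1, df2) degrees of freedom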
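Main Factor Test
$H_0: \mu_{.1} = \mu_{.2} = \mu_{.3} = \cdots = \mu_{.c}$
$H_1$: Not all population means are equal

$F = \frac{MSA}{MSE}$
Main factor test: df1 = c - 1, df2 = (r - 1)(c - 1)

Reject $H_0$ if $F > F_U$, the upper-tail critical value of the F distribution with (df1, df2) degrees of freedom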
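Both F ratios can be computed with a short Python sketch, assuming a complete r x c table with exactly one observation per block and group. The function name `randomized_block_anova`, the data layout (rows = blocks, columns = groups) and the toy numbers are our assumptions; SSA is computed as in one-way ANOVA with r observations per group, and SSE by subtraction, as in the partition above.

```python
def randomized_block_anova(data):
    """Randomized block ANOVA for an r x c table.

    data[i][j] = observation for block i (row) and group j (column).
    Returns the sums of squares and the two F ratios.
    """
    r, c = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (r * c)
    block_means = [sum(row) / c for row in data]                       # Y_i.
    group_means = [sum(row[j] for row in data) / r for j in range(c)]  # Y_.j

    sst = sum((x - grand) ** 2 for row in data for x in row)   # total
    ssa = r * sum((m - grand) ** 2 for m in group_means)        # among groups
    ssbl = c * sum((m - grand) ** 2 for m in block_means)       # among blocks
    sse = sst - (ssa + ssbl)                                    # error, by subtraction

    msa = ssa / (c - 1)
    msbl = ssbl / (r - 1)
    mse = sse / ((r - 1) * (c - 1))
    return {"SST": sst, "SSA": ssa, "SSBL": ssbl, "SSE": sse,
            "F_groups": msa / mse,   # compare with F_U for (c-1, (r-1)(c-1)) df
            "F_blocks": msbl / mse}  # compare with F_U for (r-1, (r-1)(c-1)) df


print(randomized_block_anova([[1, 3, 5], [3, 5, 10]]))  # toy data: 2 blocks, 3 groups
```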
The Tukey Procedure

To test which population means are significantly different, e.g.: $\mu_1 = \mu_2 \ne \mu_3$
Done after rejection of equal means in the randomized block ANOVA design
Allows pair-wise comparisons
Compare absolute mean differences with the critical range
The Tukey Procedure (continued)

$$\text{Critical Range} = Q_U\sqrt{\frac{MSE}{r}}$$

Compare: is $|\bar{x}_{.j} - \bar{x}_{.j'}| > \text{Critical Range}$? For example $|\bar{x}_{.1} - \bar{x}_{.2}|$, $|\bar{x}_{.1} - \bar{x}_{.3}|$, $|\bar{x}_{.2} - \bar{x}_{.3}|$, etc.
If the absolute mean difference is greater than the critical range, then there is a significant difference between that pair of means at the chosen level of significance.
Example
Six gastronomy experts must rate 4 restaurants on the quality of their service.
The experts give each restaurant a score from 1 to 100.
Can we say that there is a significant difference between the four restaurants in terms of the scores they received?
Is there any difference in the way the 6 experts score?
How to perform ANOVA in Excel

1. Enter the data in Excel.

2. Run the ANOVA procedure via Tools > Data Analysis > ANOVA: Two Factor Without Replication.
3. Excel output
Conclusions

We can state, with 95% confidence, that there is a significant difference between the 4 restaurants in terms of the scores given by the 6 experts.

Moreover, there is a significant difference between the mean scores given by the experts, i.e. some experts generally give higher scores than others.
Factorial Design:
Two-Way ANOVA
Examines the effect of:
Two factors of interest on the dependent variable
e.g., percent carbonation and line speed on the soft drink bottling process
Interaction between the different levels of these two factors
e.g., does the effect of one particular carbonation level depend on the level at which the line speed is set?
Two-Way ANOVA
Assumptions

Populations are normally distributed
Populations have equal variances
Independent random samples are drawn
Two-Way ANOVA
Sources of Variation
Two factors of interest: A and B
r = number of levels of factor A
c = number of levels of factor B
n' = number of replications for each cell
n = total number of observations in all cells (n = rcn')
$X_{ijk}$ = value of the kth observation of level i of factor A and level j of factor B
Two-Way ANOVA
Sources of Variation (continued)

SST = SSA + SSB + SSAB + SSE

Sum of squares | Source | Degrees of freedom
SST | Total variation | n - 1
SSA | Factor A variation | r - 1
SSB | Factor B variation | c - 1
SSAB | Variation due to interaction between A and B | (r - 1)(c - 1)
SSE | Random variation (Error) | rc(n' - 1)
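Two Factor ANOVA Equations

Total Variation:
$$SST = \sum_{i=1}^{r}\sum_{j=1}^{c}\sum_{k=1}^{n'}(X_{ijk} - \bar{\bar{X}})^2$$

Factor A Variation:
$$SSA = cn'\sum_{i=1}^{r}(\bar{X}_{i..} - \bar{\bar{X}})^2$$

Factor B Variation:
$$SSB = rn'\sum_{j=1}^{c}(\bar{X}_{.j.} - \bar{\bar{X}})^2$$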
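Two Factor ANOVA Equations (continued)

Interaction Variation:
$$SSAB = n'\sum_{i=1}^{r}\sum_{j=1}^{c}(\bar{X}_{ij.} - \bar{X}_{i..} - \bar{X}_{.j.} + \bar{\bar{X}})^2$$

Sum of Squares Error:
$$SSE = \sum_{i=1}^{r}\sum_{j=1}^{c}\sum_{k=1}^{n'}(X_{ijk} - \bar{X}_{ij.})^2$$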
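Two Factor ANOVA Equations (continued)

where:
$\bar{\bar{X}} = \frac{\sum_{i=1}^{r}\sum_{j=1}^{c}\sum_{k=1}^{n'} X_{ijk}}{rcn'}$ = Grand Mean
$\bar{X}_{i..} = \frac{\sum_{j=1}^{c}\sum_{k=1}^{n'} X_{ijk}}{cn'}$ = Mean of ith level of factor A (i = 1, 2, ..., r)
$\bar{X}_{.j.} = \frac{\sum_{i=1}^{r}\sum_{k=1}^{n'} X_{ijk}}{rn'}$ = Mean of jth level of factor B (j = 1, 2, ..., c)
$\bar{X}_{ij.} = \frac{\sum_{k=1}^{n'} X_{ijk}}{n'}$ = Mean of cell ij
r = number of levels of factor A
c = number of levels of factor B
n' = number of replications in each cell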
Mean Square Calculations
$MSA = \frac{SSA}{r-1}$ = Mean square factor A

$MSB = \frac{SSB}{c-1}$ = Mean square factor B

$MSAB = \frac{SSAB}{(r-1)(c-1)}$ = Mean square interaction

$MSE = \frac{SSE}{rc(n'-1)}$ = Mean square error
Two-Way ANOVA:
The F Test Statistic
F Test for Factor A Effect
$H_0: \mu_{1..} = \mu_{2..} = \mu_{3..} = \cdots$
$H_1$: Not all $\mu_{i..}$ are equal
$F = \frac{MSA}{MSE}$; reject $H_0$ if $F > F_U$

F Test for Factor B Effect
$H_0: \mu_{.1.} = \mu_{.2.} = \mu_{.3.} = \cdots$
$H_1$: Not all $\mu_{.j.}$ are equal
$F = \frac{MSB}{MSE}$; reject $H_0$ if $F > F_U$

F Test for Interaction Effect
$H_0$: the interaction of A and B is equal to zero
$H_1$: the interaction of A and B is not zero
$F = \frac{MSAB}{MSE}$; reject $H_0$ if $F > F_U$
Two-Way ANOVA
Summary Table
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Squares | F Statistic
Factor A | SSA | r - 1 | MSA = SSA/(r - 1) | MSA/MSE
Factor B | SSB | c - 1 | MSB = SSB/(c - 1) | MSB/MSE
AB (Interaction) | SSAB | (r - 1)(c - 1) | MSAB = SSAB/[(r - 1)(c - 1)] | MSAB/MSE
Error | SSE | rc(n' - 1) | MSE = SSE/[rc(n' - 1)] |
Total | SST | n - 1 | |
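The summary table can be computed with a short Python sketch for a balanced design, where every cell has the same number n' of replications. The function `two_way_anova` and the toy data are hypothetical; the formulas are the Two Factor ANOVA Equations above, with the grand mean and the factor-level means taken as averages of cell means, which is valid only when all cells have equal size.

```python
def two_way_anova(data):
    """Two-way ANOVA with n' replications per cell (balanced design).

    data[i][j] is the list of the n' observations X_ijk for level i of
    factor A and level j of factor B.
    """
    r, c, n_rep = len(data), len(data[0]), len(data[0][0])
    cell = [[sum(data[i][j]) / n_rep for j in range(c)] for i in range(r)]  # X_ij.
    grand = sum(map(sum, cell)) / (r * c)                                  # grand mean
    a_means = [sum(cell[i]) / c for i in range(r)]                         # X_i..
    b_means = [sum(cell[i][j] for i in range(r)) / r for j in range(c)]    # X_.j.

    ssa = c * n_rep * sum((m - grand) ** 2 for m in a_means)
    ssb = r * n_rep * sum((m - grand) ** 2 for m in b_means)
    ssab = n_rep * sum((cell[i][j] - a_means[i] - b_means[j] + grand) ** 2
                       for i in range(r) for j in range(c))
    sse = sum((x - cell[i][j]) ** 2
              for i in range(r) for j in range(c) for x in data[i][j])

    mse = sse / (r * c * (n_rep - 1))
    return {"SSA": ssa, "SSB": ssb, "SSAB": ssab, "SSE": sse,
            "F_A": ssa / (r - 1) / mse,
            "F_B": ssb / (c - 1) / mse,
            "F_AB": ssab / ((r - 1) * (c - 1)) / mse}


# Toy data: r = 2 levels of A, c = 2 levels of B, n' = 2 replications per cell
print(two_way_anova([[[1, 3], [5, 7]], [[2, 4], [10, 12]]]))
```

For these toy numbers SSA + SSB + SSAB + SSE = 18 + 72 + 8 + 8 = 106, which equals SST computed directly from the eight observations.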
Features of Two-Way ANOVA
F Test
Degrees of freedom always add up:
n - 1 = rc(n' - 1) + (r - 1) + (c - 1) + (r - 1)(c - 1)
Total = error + factor A + factor B + interaction
The denominator of the F test is always the same (MSE), but the numerator differs
The sums of squares always add up:
SST = SSE + SSA + SSB + SSAB
Total = error + factor A + factor B + interaction
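Expanding the right-hand side confirms the degrees-of-freedom identity, with n' replications per cell and n = rcn' observations in total:

$$
\begin{aligned}
rc(n'-1) + (r-1) + (c-1) + (r-1)(c-1)
&= (rcn' - rc) + (r - 1) + (c - 1) + (rc - r - c + 1) \\
&= rcn' - 1 = n - 1
\end{aligned}
$$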
Examples:
Interaction vs. No Interaction
[Figure: mean response plotted against factor A levels, with one line per factor B level (levels 1, 2 and 3), in two panels: "Interaction is present" and "No interaction". With no interaction the lines are parallel; when interaction is present they are not.]

Multiple Comparisons:
The Tukey Procedure
Unless there is a significant interaction, you can determine the levels that are significantly different using the Tukey procedure.
Consider all absolute mean differences and compare them to the calculated critical range.
Example: absolute differences for factor A, assuming three levels:
$|\bar{X}_{1..} - \bar{X}_{2..}|$, $|\bar{X}_{1..} - \bar{X}_{3..}|$, $|\bar{X}_{2..} - \bar{X}_{3..}|$
Critical Range for Factor A:
$$\text{Critical Range} = Q_U\sqrt{\frac{MSE}{cn'}}$$
(where $Q_U$ is from Table E.10 with r and rc(n' - 1) d.f.)

Critical Range for Factor B:
$$\text{Critical Range} = Q_U\sqrt{\frac{MSE}{rn'}}$$
(where $Q_U$ is from Table E.10 with c and rc(n' - 1) d.f.)
