Model
- Linear regression
- Poisson regression
- Cox model
- Logistic regression
Logistic regression
Models the relationship between a set of variables x_i and a dichotomous (binary) variable Y. The x_i may be:
- dichotomous (yes/no)
- categorical (social class, ...)
- continuous (age, ...)
A dichotomous outcome is the most common situation in biology and epidemiology.
We code the dichotomous variable as:
- 0: no disease, survived, non-smoker, female, ...
- 1: disease, died, smoker, male, ...
We code with 1 the outcome we want to predict.
Linear regression
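$$y = \beta_0 + \beta_1 x + \varepsilon$$
The error $\varepsilon$ is assumed to be normally distributed, with mean 0 and constant variance (homogeneity of variance).
With a binary dependent variable, y = 0 or y = 1:
- y = 1: positive response, probability P
- y = 0: negative response, probability Q = (1 - P)
The error can then take only two values for a given x, so it cannot be normally distributed:
$$\varepsilon = 1 - \beta_0 - \beta_1 x \quad \text{when } y = 1$$
$$\varepsilon = 0 - \beta_0 - \beta_1 x \quad \text{when } y = 0$$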
Applications
- Compare two (success) probabilities with correction for prognostic factors (clinical trials)
- Determine which risk factors are important and which are not (epidemiology)
- Determine the dose-response relation (toxicology)
Example 1
                 outcome
                 CHD +     CHD -     total
smoking +        17 (a)    7 (c)     24 (m)
smoking -        9 (b)     27 (d)    36 (n)
total            26 (r)    34 (s)    60 (N)
Analysis: Student's t test for proportions, comparing p(CHD+) in smokers with p(CHD+) in non-smokers: t = 3.53, p < 0.01; or the χ² test.
[Figure: CHD (0/1) plotted against smoking (0/1)]
Odds of CHD among smokers:
$$\frac{a/m}{c/m} = \frac{a}{c} = \frac{17}{7} = 2.429$$
A smoker is 2.429 times as likely to have CHD as not to have CHD.
Odds of CHD among non-smokers:
$$\frac{b/n}{d/n} = \frac{b}{d} = \frac{9}{27} = 0.333$$
A non-smoker is 0.333 times as likely to have CHD as not to have CHD.
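The two odds can be checked with a few lines of Python. This is a minimal sketch: the counts are the cells a, b, c, d of the Example 1 table, and the variable names are illustrative rather than part of the original slides.

```python
# Cell counts from the Example 1 table (labels a, b, c, d as in the slides).
a, c = 17, 7     # smokers: CHD+, CHD-
b, d = 9, 27     # non-smokers: CHD+, CHD-

# The row totals m and n cancel, so the odds reduce to a/c and b/d.
odds_smokers = a / c
odds_nonsmokers = b / d

print(f"odds (smokers)     = {odds_smokers:.3f}")
print(f"odds (non-smokers) = {odds_nonsmokers:.3f}")
```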
Odds
Odds for an event:
$$\text{odds} = \frac{p}{1 - p}$$
$$\log(\text{odds}) = \log\left(\frac{p}{1 - p}\right)$$
p is the probability that the event occurs. The greater the odds of an event, the greater the probability that the event occurs.
Logit transformation
The logit transformation gives a linear relationship between the log odds of the observed event and the value of the independent variable x:
$$\log(\text{odds}) = \log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x$$
The model is similar to the simple linear regression model, but:
- the distribution is binomial, not normal
- the coefficients are not estimated in the same way as in the linear regression model (maximum likelihood instead of least squares)
Logit transformation
Logit:
- the natural logarithm (ln) of the odds that the observed event (coded 1) occurs
- also written as log odds
- the logit scale is continuous and behaves much like a z-score scale
p = 0.50, logit = 0
p = 0.70, logit = 0.85
p = 0.30, logit = -0.85
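The logit values listed for p = 0.30, 0.50 and 0.70 can be reproduced, up to rounding, with a small helper. This is a sketch only; the function name `logit` is a choice made for this illustration.

```python
import math

def logit(p: float) -> float:
    """Natural log of the odds p / (1 - p); defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

for p in (0.30, 0.50, 0.70):
    print(f"p = {p:.2f}  logit = {logit(p):+.3f}")
```

The scale is symmetric around p = 0.50: probabilities equally far above and below 0.50 give logits of equal size and opposite sign.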
$$p = \frac{e^{b_0 + b_1 x}}{1 + e^{b_0 + b_1 x}}$$
where e ≈ 2.718 and p = P(y = 1).
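As a rough illustration of this formula, the sketch below turns a pair of coefficients into a predicted probability. The function name is hypothetical, and the coefficient values (b0 = -1.099, b1 = 1.986) are borrowed from the smoking example used elsewhere in these slides, with x = 1 for smokers and x = 0 for non-smokers.

```python
import math

def predicted_probability(b0: float, b1: float, x: float) -> float:
    """P(y = 1) for predictor value x under the logistic model."""
    z = b0 + b1 * x
    return math.exp(z) / (1.0 + math.exp(z))

# Illustrative coefficients taken from the smoking example.
for x in (0, 1):
    print(f"x = {x}: p = {predicted_probability(-1.099, 1.986, x):.3f}")
```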
Interpretation of coefficients b0 and b1
b0:
- needed for the equation, of little interest for interpretation
- the value of the log odds when the predictor equals 0
b1:
- a measure of the association between the predictor and the log odds of the event of interest
- b1 > 0: positive association
- b1 = 0: no association
- b1 < 0: negative association
Interpretation of coefficient b1
b1 is the amount by which the log odds of the event of interest change when the predictor x changes by one unit.
Example: person 1 has x = k and person 2 has x = k + 1.
$$\log(\text{odds}_2) = b_0 + b_1 (k + 1) = b_0 + b_1 k + b_1$$
$$\log(\text{odds}_1) = b_0 + b_1 k$$
The log odds of the event for person 2 (x = k + 1) therefore differ from the log odds for person 1 (x = k) by exactly b1:
$$b_1 = \log(\text{odds}_2) - \log(\text{odds}_1) = \log\left(\frac{\text{odds}_2}{\text{odds}_1}\right) = \log(\text{OR})$$
$$\text{OR} = e^{b_1}$$
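The derivation can be checked numerically: whatever the starting value k, moving from x = k to x = k + 1 shifts the log odds by b1. The sketch below uses hypothetical coefficients (b0 = -2.0, b1 = 0.5) chosen only for illustration.

```python
import math

def log_odds(b0: float, b1: float, x: float) -> float:
    """Log odds of the event at predictor value x."""
    return b0 + b1 * x

b0, b1 = -2.0, 0.5   # hypothetical coefficients, for illustration only

for k in (0, 3, 10):
    diff = log_odds(b0, b1, k + 1) - log_odds(b0, b1, k)
    print(f"k = {k:2d}: difference in log odds = {diff:.3f}")

odds_ratio = math.exp(b1)
print(f"OR = exp(b1) = {odds_ratio:.3f}")
```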
- b1 = 0: the odds and probability of the event of interest are the same for all values of x ($e^{b_1}$ = OR = 1)
- b1 > 0: the odds and probability of the event of interest increase as x increases ($e^{b_1}$ = OR > 1)
- b1 < 0: the odds and probability of the event of interest decrease as x increases ($e^{b_1}$ = OR < 1)
The odds of CHD are 7.29 times as high in smokers as in non-smokers.
Odds of the event of interest when the risk factor is present: (a/m) / (c/m) = a/c
Odds of the event of interest when the risk factor is absent: (b/n) / (d/n) = b/d
Odds ratio: (a/c) / (b/d) = ad/bc
Interpretation of coefficients
Odds (smokers) = 2.429, ln(odds) = 0.887
Odds (non-smokers) = 0.333, ln(odds) = -1.099
The model for this example is
$$\ln\left(\frac{p}{1 - p}\right) = b_0 + b_1 x$$
For non-smokers (x = 0):
$$\ln\left(\frac{p}{1 - p}\right) = b_0 + b_1 \cdot 0 = b_0$$
The estimate of the intercept b0 is equal to the log odds for non-smokers:
$$\ln\left(\frac{p}{1 - p}\right) = b_0 = -1.099$$
Interpretation of coefficients
The estimate of the slope is the difference between the log odds for smokers and the log odds for non-smokers:
$$b_1 = 0.887 - (-1.099) = 1.986$$
$$\frac{\text{Odds}_{\text{smokers}}}{\text{Odds}_{\text{non-smokers}}} = \frac{e^{(-1.099 + 1.986)}}{e^{-1.099}} = e^{1.986} = 7.286$$
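For a single binary predictor, both coefficients can be recovered by hand from the 2x2 table. The following sketch assumes the Example 1 cell counts (a = 17, b = 9, c = 7, d = 27) and is not a substitute for fitting the model in SPSS; it only shows where the numbers come from.

```python
import math

a, b, c, d = 17, 9, 7, 27   # cell counts from the Example 1 table

b0 = math.log(b / d)          # log odds for non-smokers (x = 0)
b1 = math.log(a / c) - b0     # log odds for smokers minus log odds for non-smokers
odds_ratio = math.exp(b1)

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, OR = {odds_ratio:.3f}")
# The same odds ratio follows directly from the cross-product ad / bc.
print(f"ad/bc = {a * d / (b * c):.3f}")
```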
Example 1 in SPSS
- Point to the variable labeled chd and move it to the box labeled Dependent Variable by clicking the arrow.
- Point to the variable labeled smoking and move it to the box labeled Covariates by clicking the arrow.
- Method: Enter.
- In the menu, click Options.
- Check CI for exp(B) and click Continue.
- Then click OK.
Classification Table (a, b)
                                 Predicted CHD
Observed                         0       1       Percentage Correct
Step 0   CHD 0                   34      0       100.0
         CHD 1                   26      0       0.0
         Overall Percentage                      56.7
a. Constant is included in the model.
b. The cut value is 0.500.
The Block 0 output is for a model that includes only the intercept (which SPSS calls the constant). Given the base rates of the two CHD outcomes (34/60 = 56.7% without CHD, 26/60 = 43.3% with CHD), and no other information, the best strategy is to predict, for every case, that the subject does not have CHD. Using that strategy, you would be correct 56.7% of the time.
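The Block 0 percentage can be reproduced from the two observed counts alone. A minimal sketch, assuming only the counts shown in the classification table (34 without CHD, 26 with CHD); the variable names are illustrative.

```python
n_no_chd, n_chd = 34, 26          # observed counts from the classification table
n_total = n_no_chd + n_chd

# With no predictors, every case is assigned to the more frequent category.
predicted = 0 if n_no_chd >= n_chd else 1
accuracy = max(n_no_chd, n_chd) / n_total

print(f"predict CHD = {predicted} for everyone; correct {accuracy:.1%} of the time")
```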
                   B        S.E.     Wald     df    Sig.     Exp(B)
Step 0  Constant   -0.268   0.261    1.060    1     0.303    0.765
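The constant-only row can be reconstructed from the base rates. The sketch below assumes the usual large-sample standard error of a log odds, sqrt(1/events + 1/non-events); that formula is not stated in the slides, so treat it as an assumption that happens to match the SPSS values here.

```python
import math

n_chd, n_no_chd = 26, 34

b0 = math.log(n_chd / n_no_chd)              # log odds of CHD in the whole sample
se = math.sqrt(1 / n_chd + 1 / n_no_chd)     # assumed large-sample S.E. of a log odds
wald = (b0 / se) ** 2

print(f"B = {b0:.3f}, S.E. = {se:.3f}, Wald = {wald:.3f}, Exp(B) = {math.exp(b0):.3f}")
```

Exp(B) is simply the overall odds of CHD, 26/34.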
Omnibus Tests of Model Coefficients gives us a Chi-Square of 12.645 on 1 df, significant beyond 0.001. This is a test of the null hypothesis that adding the smoking variable to the model has not significantly increased our ability to predict the CHD in our subjects.
Omnibus Tests of Model Coefficients
                 Chi-square   df   Sig.
Step 1   Step    12.645       1    0.000
         Block   12.645       1    0.000
         Model   12.645       1    0.000
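The omnibus chi-square is the drop in -2 log likelihood between the intercept-only model and the model with smoking. With one binary predictor the fitted probabilities equal the observed proportions in each group, so the statistic can be sketched directly from the Example 1 counts; the helper name below is hypothetical.

```python
import math

def log_likelihood(events: int, non_events: int) -> float:
    """Binomial log-likelihood of a group evaluated at its observed proportion."""
    n = events + non_events
    return events * math.log(events / n) + non_events * math.log(non_events / n)

ll_null = log_likelihood(26, 34)                            # intercept only
ll_model = log_likelihood(17, 7) + log_likelihood(9, 27)    # smokers + non-smokers

chi_square = -2 * ll_null - (-2 * ll_model)
p_value = math.erfc(math.sqrt(chi_square / 2))   # chi-square upper tail, valid for df = 1

print(f"-2LL(null) = {-2 * ll_null:.3f}, -2LL(model) = {-2 * ll_model:.3f}")
print(f"chi-square = {chi_square:.3f}, p = {p_value:.5f}")
```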
Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      69.463              0.190                  0.255
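Both pseudo R-square values can be approximated from the -2 log likelihood in the table and the base rates. The sketch assumes the standard definitions of the Cox & Snell and Nagelkerke statistics, which are presumed, not confirmed by the slides, to be what SPSS reports.

```python
import math

n = 60
neg2ll_model = 69.463                     # from the Model Summary table
# -2 log likelihood of the intercept-only model, from the base rates 26/60 and 34/60.
neg2ll_null = -2 * (26 * math.log(26 / n) + 34 * math.log(34 / n))

cox_snell = 1 - math.exp(-(neg2ll_null - neg2ll_model) / n)
max_cox_snell = 1 - math.exp(-neg2ll_null / n)   # upper bound of Cox & Snell
nagelkerke = cox_snell / max_cox_snell

print(f"Cox & Snell R^2 = {cox_snell:.3f}, Nagelkerke R^2 = {nagelkerke:.3f}")
```

Nagelkerke rescales Cox & Snell by its maximum attainable value, so that the statistic can reach 1.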
Step 1 (a)            B        df
PUSENJE (smoking)     1.986    1
Constant              -1.099   1
$$\text{Wald } \chi^2 = \left(\frac{\text{coefficient}}{\text{SE}}\right)^2$$
df = 1, $\chi^2_{0.05;\,1} = 3.841$
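A rough check of the Wald test for smoking is sketched below. The S.E. column did not survive in this table, so the standard error is recomputed here under an assumption: the large-sample formula for a log odds ratio from a 2x2 table, applied to the Example 1 counts. The resulting Wald value is therefore this sketch's own, not a figure quoted from the SPSS output.

```python
import math

a, b, c, d = 17, 9, 7, 27                 # Example 1 cell counts
b1 = math.log((a * d) / (b * c))          # slope for smoking
# Assumption: large-sample standard error of a log odds ratio from a 2x2 table.
se_b1 = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

wald = (b1 / se_b1) ** 2
critical_value = 3.841                    # chi-square, alpha = 0.05, df = 1

print(f"b1 = {b1:.3f}, SE = {se_b1:.3f}, Wald = {wald:.3f}")
print("significant at 0.05" if wald > critical_value else "not significant at 0.05")
```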
PUSENJE (smoking): B = 1.986
$$\text{OR} = e^{1.986} = 7.286$$
Non-smokers: p = 0.333 / (1 + 0.333) = 0.250 = 25%. The probability that a non-smoker will have CHD is 25%.
Smokers: p = 2.429 / (1 + 2.429) = 0.708 = 70.8%. The probability that a smoker will have CHD is 70.8%.
Example 2 in SPSS
CHD: risk factor age
Variables in the Equation
                                95% C.I. for EXP(B)
Step 1 (a)   B        df        Lower     Upper
AGE          1.810    1         1.931     19.338
Constant     -1.299   1
b0 = -1.299, b1 = 1.810
$$\text{OR} = e^{b_1} = e^{1.810} = 6.111$$
The odds of CHD are 6.11 times as high in people older than 50 as in people younger than 50.
Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      71.437              0.163                  0.219
Example 2 in SPSS
CHD: risk factor smoking
Variables in the Equation
                                       95% C.I. for EXP(B)
Step 1 (a)            B        df      Lower     Upper
PUSENJE (smoking)     1.986    1       2.286     23.223
Constant              -1.099   1
$$\text{OR} = e^{1.986} = 7.286$$
The odds of CHD are 7.29 times as high in smokers as in non-smokers.
Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      69.463              0.190                  0.255
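The 95% confidence interval for Exp(B) is obtained by exponentiating B plus or minus 1.96 standard errors. In the sketch below the standard error is an assumption: it is recomputed from the Example 1 cell counts (17, 9, 7, 27) because the S.E. column is not shown in this table.

```python
import math

b1 = 1.986                                 # B for PUSENJE (smoking) from the table
# Assumption: S.E. recomputed from the Example 1 cell counts (17, 9, 7, 27).
se_b1 = math.sqrt(1 / 17 + 1 / 9 + 1 / 7 + 1 / 27)
z = 1.96                                   # normal quantile for a 95% interval

lower = math.exp(b1 - z * se_b1)
upper = math.exp(b1 + z * se_b1)
print(f"OR = {math.exp(b1):.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

The interval agrees with the SPSS limits (2.286 and 23.223) up to rounding of B and z. Because it excludes 1, the association is significant at the 0.05 level.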
Example 2 in SPSS
CHD: risk factor obesity
Variables in the Equation
                                95% C.I. for EXP(B)
Step 1 (a)   B        df        Lower     Upper
OBESITY      1.176    1         1.096     9.581
Constant     -0.734   1
$$\text{OR} = e^{1.176} = 3.241$$
The odds of CHD are 3.24 times as high in obese people as in non-obese people.
Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      77.415              0.075                  0.101
Example 2 in SPSS
CHD: risk factor cholesterol
$$\text{OR} = e^{0.696} = 2.005$$
When cholesterol increases by one unit (1 mmol/L), the odds of CHD increase 2.005-fold.
Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      73.490              0.134                  0.179
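For a continuous predictor the odds ratio refers to a one-unit change; for a change of several units the coefficient is multiplied before exponentiating. A small sketch, using the cholesterol coefficient 0.696 from the slide and a hypothetical helper name:

```python
import math

b1 = 0.696     # coefficient for cholesterol (per 1 mmol/L), from OR = e^0.696

def odds_ratio_for_change(b1: float, delta: float) -> float:
    """Odds ratio associated with a change of `delta` units in the predictor."""
    return math.exp(b1 * delta)

for delta in (0.5, 1, 2):
    print(f"change of {delta} mmol/L: OR = {odds_ratio_for_change(b1, delta):.3f}")
```

The effect is multiplicative on the odds scale: a 2 mmol/L increase corresponds to the one-unit odds ratio squared, not doubled.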
Example 2
- In the menu, click Options.
- Check CI for exp(B) and Hosmer-Lemeshow goodness-of-fit, then click Continue.
- Then click OK.
Example 2
- Point to the variable labeled chd and move it to the box labeled Dependent Variable by clicking the arrow.
- Point to the variables labeled smoking, obesity, age and cholesterol and move them to the box labeled Covariates by clicking the arrow.
- Method: Enter.
The Hosmer-Lemeshow test examines the null hypothesis that there is a linear relationship between the predictor variables and the log odds of the criterion variable, i.e. that the model fits the data. Here Sig. = 0.694 > 0.05, so the null hypothesis is not rejected.
Hosmer and Lemeshow Test
Step   Chi-square   df   Sig.
1      5.583        8    0.694
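The Sig. value is the upper-tail probability of the chi-square distribution at the observed statistic. For an even number of degrees of freedom this tail has a closed form, which the sketch below uses so that no statistics library is needed; the function name is hypothetical.

```python
import math

def chi2_upper_tail_even_df(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with an even number of df."""
    if df <= 0 or df % 2:
        raise ValueError("the closed form used here requires even, positive df")
    half = x / 2.0
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms

p_value = chi2_upper_tail_even_df(5.583, 8)
print(f"Hosmer-Lemeshow p = {p_value:.3f}")
```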
               one-predictor model      four-predictor model
               OR        p              OR        p
smoking        7.286     < 0.05         7.899     < 0.05
obesity        3.241     < 0.05         3.084     > 0.05
age            6.111     < 0.05         5.027     < 0.05
cholesterol    2.005     < 0.05         1.369     > 0.05
Variables not in the Equation
                        Score     df   Sig.
Step 1   OBESITY        3.769     1    0.052
         AGE            9.234     1    0.002
         CHOLESTE       6.060     1    0.014
         Overall        12.654    3    0.005
Step 2   OBESITY        3.247     1    0.072
         CHOLESTE       1.262     1    0.261
         Overall        4.106     2    0.128
Variables in the Equation, Step 1: 95% C.I. for EXP(B), predictors in table order
Row   Lower    Upper
1     2.388    71.358
2     1.054    1.719
3     1.031    1.198
4     0.446    2.107
df = 1 for each of the five terms.
Hosmer and Lemeshow Test
Step   Chi-square   df   Sig.
1      2.687        8    0.952
2      4.078        8    0.850
3      6.346        8    0.609
Variables in the Equation
                                                                 95% C.I. for EXP(B)
                       S.E.     Wald      df   Sig.    Exp(B)    Lower     Upper
Step 1 (a)  YEARS      0.024    12.268    1    0.000   1.089     1.038     1.142
            Constant   1.339    12.558    1    0.000   0.009
Step 2 (b)  SMOKING    0.784    10.724    1    0.001   13.016    2.802     60.461
            YEARS      0.029    12.337    1    0.000   1.106     1.046     1.171
            Constant   1.763    14.451    1    0.000   0.001
Step 3 (c)  SMOKING    0.854    8.973     1    0.003   12.910
            BMI        0.125    5.681     1    0.017   1.347
            YEARS      0.034    9.515     1    0.002   1.110
            Constant   4.365    11.402    1    0.001   0.000
a. Variable(s) entered on step 1: YEARS.
b. Variable(s) entered on step 2: SMOKING.
c. Variable(s) entered on step 3: BMI.