Binary Logistic Regression with SPSS
Karl L. Wuensch
Dept of Psychology
East Carolina University
Download the Instructional Document
Go to http://core.ecu.edu/psyc/wuenschk/SPSS/SPSS-MV.htm.
Click on Binary Logistic Regression.
Save to desktop.
Open the document.
When to Use Binary Logistic Regression

The criterion variable is dichotomous.
Predictor variables may be categorical or
continuous.
If predictors are all continuous and nicely
distributed, may use discriminant function
analysis.
If predictors are all categorical, may use
logit analysis.
Wuensch & Poteat, 1998

Cats being used as research subjects.
Stereotaxic surgery.
Subjects pretend they are on university
research committee.
Complaint filed by animal rights group.
Vote to stop or continue the research.
Purpose of the Research
Cosmetic
Theory Testing
Meat Production
Veterinary
Medical
Predictor Variables
Gender
Ethical Idealism (9-point Likert)
Ethical Relativism (9-point Likert)
Purpose of the Research
Model 1: Decision = Gender

Decision 0 = stop, 1 = continue
Gender 0 = female, 1 = male
Model is: logit = $\ln(\text{ODDS}) = \ln\left(\frac{\hat{Y}}{1-\hat{Y}}\right) = a + bX$
$\hat{Y}$ is the predicted probability of the event which is coded with 1 (continue the research) rather than with 0 (stop the research).
Iterative Maximum Likelihood Procedure
SPSS starts with arbitrary regression coefficients.
Tinkers with the regression coefficients to find those which best reduce error.
Converges on final model.
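The iterative idea can be sketched in a few lines. The slides do not say which algorithm SPSS uses internally, so this sketch assumes plain Newton-Raphson steps on the log likelihood; the grouped counts are the gender-by-decision frequencies reported in the crosstabulation later in this deck, and the function name `fit_logistic` is hypothetical.

```python
import math

# (gender code, number voting to continue, group size), from the crosstab
GROUPS = [(0, 60, 200), (1, 68, 115)]

def fit_logistic(groups, max_iter=25, tol=1e-8):
    """Fit logit = a + b*x by Newton-Raphson (an assumed algorithm)."""
    a, b = 0.0, 0.0  # arbitrary starting coefficients
    for _ in range(max_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, events, n in groups:
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            resid = events - n * p        # observed minus expected events
            w = n * p * (1.0 - p)         # weight in the information matrix
            g_a += resid
            g_b += resid * x
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        # Solve the 2x2 system (information matrix) * step = gradient
        det = h_aa * h_bb - h_ab * h_ab
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a += step_a
        b += step_b
        if max(abs(step_a), abs(step_b)) < tol:
            break  # coefficients have stopped changing: converged
    return a, b

a, b = fit_logistic(GROUPS)
print(round(a, 3), round(b, 3))
```

Under these assumptions the coefficients settle near the intercept and gender slope that SPSS reports for Model 1.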
SPSS
Bring the data into SPSS
http://core.ecu.edu/psyc/wuenschk/SPSS/Logistic.sav
Analyze, Regression, Binary Logistic
Decision → Dependent
Gender → Covariate(s), OK
Look at the Output

Case Processing Summary
Unweighted Cases (a)                        N     Percent
Selected Cases    Included in Analysis    315       100.0
                  Missing Cases             0          .0
                  Total                   315       100.0
Unselected Cases                            0          .0
Total                                     315       100.0
a. If weight is in effect, see classification table for the total number of cases.
We have 315 cases.
Block 0 Model, Odds

Look at Variables in the Equation.
The model contains only the intercept
(constant, B0), a function of the marginal
distribution of the decisions.
Variables in the Equation
                     B       S.E.    Wald      df   Sig.   Exp(B)
Step 0   Constant    -.379   .115    10.919    1    .001   .684

$\ln(\text{ODDS}) = \ln\left(\frac{\hat{Y}}{1-\hat{Y}}\right) = -.379$
Exponentiate Both Sides

Exponentiate both sides of the equation:
$e^{-.379} = .684 = \text{Exp}(B_0)$ = odds of deciding to continue the research.
$\frac{\hat{Y}}{1-\hat{Y}} = \text{Exp}(-.379) = .684 = \frac{128}{187}$
128 voted to continue the research, 187 to stop it.
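As a quick check on the arithmetic, the intercept of the intercept-only model can be recovered directly from the vote counts. This is a minimal sketch; the variable names are made up for illustration.

```python
import math

continue_votes = 128
stop_votes = 187

odds = continue_votes / stop_votes      # odds of voting to continue
b0 = math.log(odds)                     # intercept = natural log of the odds
p_continue = continue_votes / (continue_votes + stop_votes)

print(round(odds, 3), round(b0, 3), round(p_continue, 3))
# prints: 0.684 -0.379 0.406
```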
Probabilities
Randomly select one participant.

P(votes continue) = 128/315 = 40.6%
P(votes stop) = 187/315 = 59.4%
Odds = 40.6/59.4 = .684
Repeatedly sample one participant and guess how that person will vote.
Humans vs. Goldfish

Humans Match Probabilities
(suppose p = .7, q = .3)

.7(.7) + .3(.3) = .49 + .09 = .58
Goldfish Maximize Probabilities
.7(1) = .70
The goldfish win!
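The two guessing strategies can be compared for any p with a short sketch. The function names are hypothetical; the formulas are the ones worked out above.

```python
def matching_accuracy(p):
    """Expected hit rate when guesses are made in proportion to p and q."""
    q = 1.0 - p
    return p * p + q * q

def maximizing_accuracy(p):
    """Expected hit rate when the more likely outcome is always guessed."""
    return max(p, 1.0 - p)

print(round(matching_accuracy(0.7), 2), round(maximizing_accuracy(0.7), 2))
# prints: 0.58 0.7
```

Maximizing never does worse than matching, and the two tie only when p = .5 (or when p is 0 or 1).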
SPSS Model 0 vs. Goldfish

Look at the Classification Table for Block 0.
Classification Table (a, b)
                                  Predicted decision
Observed                          stop    continue    Percentage Correct
Step 0   decision   stop          187     0           100.0
                    continue      128     0           .0
         Overall Percentage                           59.4
a. Constant is included in the model.
b. The cut value is .500
SPSS Predicts STOP for every participant.

SPSS is as smart as a Goldfish here.
Block 1 Model
Gender has now been added to the
model.
Model Summary: -2 Log Likelihood = how
poorly model fits the data.
Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      399.913 (a)         .078                   .106
a. Estimation terminated at iteration number 3 because parameter estimates changed by less than .001.
Block 1 Model
For intercept only, -2LL = 425.566.
Add gender and -2LL = 399.913.
Omnibus Tests: Drop in -2LL = 25.653 = Model $\chi^2$.
df = 1, p < .001.
Omnibus Tests of Model Coefficients
                  Chi-square   df   Sig.
Step 1   Step     25.653       1    .000
         Block    25.653       1    .000
         Model    25.653       1    .000
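The drop in -2LL can be reproduced from the vote counts alone. The sketch below relies on the fact that a model with one dichotomous predictor predicts each group's observed proportion exactly; the counts come from the gender-by-decision crosstabulation in this deck, and the helper names are hypothetical.

```python
import math

def neg2ll(cells):
    """-2 log likelihood when each group's predicted probability equals
    its observed proportion; cells = [(events, non_events), ...]."""
    total = 0.0
    for events, non_events in cells:
        p = events / (events + non_events)
        total += events * math.log(p) + non_events * math.log(1.0 - p)
    return -2.0 * total

def chi2_upper_tail_df1(x):
    """P(chi-square with 1 df > x), via the complementary error function."""
    return math.erfc(math.sqrt(x / 2.0))

intercept_only = neg2ll([(128, 187)])          # everyone in one group
with_gender = neg2ll([(60, 140), (68, 47)])    # women, then men
drop = intercept_only - with_gender
p_value = chi2_upper_tail_df1(drop)

print(round(with_gender, 3), round(drop, 3), p_value < 0.001)
```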

$\ln(\text{odds}) = -.847 + 1.217 \cdot \text{Gender}$
$\text{ODDS} = e^{a + b \cdot \text{Gender}}$
Variables in the Equation
                      B        S.E.    Wald      df   Sig.   Exp(B)
Step 1 (a)  gender    1.217    .245    24.757    1    .000   3.376
            Constant  -.847    .154    30.152    1    .000   .429
a. Variable(s) entered on step 1: gender.
Odds, Women
$\text{ODDS} = e^{-.847 + 1.217(0)} = e^{-.847} = 0.429$
A woman is only .429 as likely to decide to continue the research as she is to decide to stop it.
Odds, Men
$\text{ODDS} = e^{-.847 + 1.217(1)} = e^{.37} = 1.448$
A man is 1.448 times as likely to vote to continue the research as to stop the research.
Odds Ratio
$\frac{\text{male odds}}{\text{female odds}} = \frac{1.448}{.429} = 3.376 = e^{1.217}$
1.217 was the B (slope) for Gender; 3.376 is the Exp(B), that is, the exponentiated slope, the odds ratio.
The odds of voting to continue the research are 3.376 times as large for men as for women.
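The odds for each gender and their ratio follow directly from the two coefficients. This is a small sketch using the rounded values of a and b printed in the output, so results can differ from SPSS in the third decimal.

```python
import math

a, b = -0.847, 1.217   # rounded intercept and gender slope from the output

def odds(gender):
    """Predicted odds of voting to continue; gender is 0 (female) or 1 (male)."""
    return math.exp(a + b * gender)

odds_women = odds(0)
odds_men = odds(1)
odds_ratio = odds_men / odds_women   # algebraically equal to exp(b)

print(round(odds_women, 3), round(odds_men, 3), round(odds_ratio, 2))
```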
Convert Odds to Probabilities

For our women, $\hat{Y} = \frac{\text{ODDS}}{1 + \text{ODDS}} = \frac{0.429}{1.429} = 0.30$
For our men, $\hat{Y} = \frac{\text{ODDS}}{1 + \text{ODDS}} = \frac{1.448}{2.448} = 0.59$
Classification
Decision Rule: If Prob(event) $\geq$ Cutoff, then predict event will take place.
By default, SPSS uses .5 as Cutoff.
For every man, Prob(continue) = .59,
predict he will vote to continue.
For every woman Prob(continue) = .30,
predict she will vote to stop it.
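The classification step can be sketched as two small functions: one converts odds to a probability, the other applies the cutoff. The function names are hypothetical, and the sketch assumes that a probability exactly equal to the cutoff counts as predicting the event.

```python
def odds_to_probability(odds):
    """Convert odds to a probability: ODDS / (1 + ODDS)."""
    return odds / (1.0 + odds)

def classify(probability, cutoff=0.5):
    """Predict 'continue' (the event) when probability reaches the cutoff."""
    return "continue" if probability >= cutoff else "stop"

p_men = odds_to_probability(1.448)
p_women = odds_to_probability(0.429)

print(round(p_men, 2), classify(p_men))       # prints: 0.59 continue
print(round(p_women, 2), classify(p_women))   # prints: 0.3 stop
```

Passing a different `cutoff` shows how the predictions shift; for example, `classify(p_women, cutoff=0.25)` would predict a vote to continue.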
Overall Success Rate

Look at the Classification Table
Classification Table (a)
                                  Predicted decision
Observed                          stop    continue    Percentage Correct
Step 1   decision   stop          140     47          74.9
                    continue      60      68          53.1
         Overall Percentage                           66.0
a. The cut value is .500
$\frac{140 + 68}{315} = \frac{208}{315} = 66\%$
SPSS beat the Goldfish!
Sensitivity
P (correct prediction | event did occur)
P (predict Continue | subject voted to Continue)
Of all those who voted to continue the research, for how many did we correctly predict that?
$\frac{68}{68 + 60} = \frac{68}{128} = 53\%$
Specificity
P (correct prediction | event did not occur)
P (predict Stop | subject voted to Stop)
Of all those who voted to stop the research, for how many did we correctly predict that?
$\frac{140}{140 + 47} = \frac{140}{187} = 75\%$
False Positive Rate

P (incorrect prediction | predicted occurrence)
P (subject voted to Stop | we predicted Continue)
Of all those for whom we predicted a vote to Continue the research, how often were we wrong?
$\frac{47}{47 + 68} = \frac{47}{115} = 41\%$
False Negative Rate

P (incorrect prediction | predicted nonoccurrence)
P (subject voted to Continue | we predicted Stop)
Of all those for whom we predicted a vote to Stop the research, how often were we wrong?
$\frac{60}{140 + 60} = \frac{60}{200} = 30\%$
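All four rates come from the same four cells of the classification table, so they can be computed together. The sketch below follows the definitions used on these slides, where the false positive and false negative rates are conditioned on the prediction rather than on the actual vote; the function name is hypothetical.

```python
def classification_rates(tn, fp, fn, tp):
    """tn: voted stop, predicted stop      fp: voted stop, predicted continue
    fn: voted continue, predicted stop     tp: voted continue, predicted continue"""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "false_positive_rate": fp / (fp + tp),   # wrong among predicted Continue
        "false_negative_rate": fn / (fn + tn),   # wrong among predicted Stop
        "overall": (tp + tn) / (tp + tn + fp + fn),
    }

rates = classification_rates(tn=140, fp=47, fn=60, tp=68)
for name, value in rates.items():
    print(name, round(value, 2))
```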
Pearson $\chi^2$
Analyze, Descriptive Statistics, Crosstabs
Gender → Rows; Decision → Columns
Crosstabs Statistics
Statistics, Chi-Square, Continue
Crosstabs Cells
Cells, Observed Counts, Row Percentages
Continue, OK
Crosstabs Output
59% & 30% match the predictions from the logistic regression.
gender * decision Crosstabulation
                                      decision
gender                             stop     continue   Total
Female   Count                     140      60         200
         % within gender           70.0%    30.0%      100.0%
Male     Count                     47       68         115
         % within gender           40.9%    59.1%      100.0%
Total    Count                     187      128        315
         % within gender           59.4%    40.6%      100.0%
Crosstabs Output
Likelihood Ratio $\chi^2$ = 25.653, as with logistic.
Chi-Square Tests
                       Value        df   Asymp. Sig. (2-sided)
Pearson Chi-Square     25.685 (b)   1    .000
Likelihood Ratio       25.653       1    .000
N of Valid Cases       315
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 46.73.
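Both chi-square statistics can be computed by hand from the observed counts. This is a minimal sketch with a hypothetical function name; it assumes no cell is empty, because the likelihood ratio formula takes the log of each observed count.

```python
import math

def chi_square_tests(table):
    """Return (Pearson chi-square, likelihood ratio chi-square) for a table
    of observed counts given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    pearson = 0.0
    likelihood_ratio = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            pearson += (observed - expected) ** 2 / expected
            likelihood_ratio += 2.0 * observed * math.log(observed / expected)
    return pearson, likelihood_ratio

# Rows: female, male. Columns: stop, continue.
pearson, lr = chi_square_tests([[140, 60], [47, 68]])
print(round(pearson, 3), round(lr, 3))
```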
Model 2: Decision = Idealism, Relativism, Gender
Decision → Dependent
Gender, Idealism, Relatvsm → Covariate(s)
Click Options and check Hosmer-Lemeshow goodness of fit and CI for exp(B) 95%.
Continue, OK.
Comparing Nested Models

With only intercept and gender, -2LL = 399.913.
Adding idealism and relativism dropped -2LL to 346.503, a drop of 53.41.
$\chi^2(2) = 399.913 - 346.503 = 53.41$, p = ?
Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      346.503             .222                   .300

Obtain p
Transform, Compute
Target Variable = p
Numeric Expression = 1 - CDF.CHISQ(53.41,2)
OK
Data Editor, Variable View
Set Decimal Points to 5 for p
Data Editor, Data View
p = .00000, so report p < .0001
Adding the ethical ideology variables significantly improved the model.
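The same upper-tail probability can be obtained outside SPSS. The sketch below uses the closed-form expression that holds when the degrees of freedom are even, which covers the 2-df test here; the function name is hypothetical and the function is not a general replacement for CDF.CHISQ.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(chi-square > x) for positive even df:
    exp(-x/2) * sum over i < df/2 of (x/2)**i / i!"""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

p = chi2_upper_tail_even_df(53.41, 2)
print(p < 0.0001)   # prints: True
```

The value is far below .0001, which is why SPSS displays .00000 at five decimals.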
Hosmer-Lemeshow
$H_0$: predictions made by the model fit perfectly with observed group memberships.
Cases are arranged in order by their predicted probability on the criterion.
Then divided into ten bins with approximately equal n.
This gives ten rows in the table.
For each bin and each event, we have number of observed cases and expected number predicted from the model.
Contingency Table for Hosmer and Lemeshow Test
                decision = stop        decision = continue
Step 1   Bin    Observed   Expected    Observed   Expected    Total
         1      29         29.331      3          2.669       32
         2      30         27.673      2          4.327       32
         3      28         25.669      4          6.331       32
         4      20         23.265      12         8.735       32
         5      22         20.693      10         11.307      32
         6      15         18.058      17         13.942      32
         7      15         15.830      17         16.170      32
         8      10         12.920      22         19.080      32
         9      12         9.319       20         22.681      32
         10     6          4.241       21         22.759      27
Note expected freqs decline in first column, rise in second.
The nonsignificant chi-square is indicative of good fit of data with linear model.
Hosmer and Lemeshow Test
Step   Chi-square   df   Sig.
1      8.810        8    .359
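The Hosmer-Lemeshow chi-square is the usual sum of (observed - expected)² / expected over all twenty cells of the contingency table. The sketch below recomputes it from the observed and expected frequencies shown in the SPSS output; the names are hypothetical.

```python
# (observed, expected) pairs for the ten bins, copied from the output
stop_cells = [(29, 29.331), (30, 27.673), (28, 25.669), (20, 23.265),
              (22, 20.693), (15, 18.058), (15, 15.830), (10, 12.920),
              (12, 9.319), (6, 4.241)]
continue_cells = [(3, 2.669), (2, 4.327), (4, 6.331), (12, 8.735),
                  (10, 11.307), (17, 13.942), (17, 16.170), (22, 19.080),
                  (20, 22.681), (21, 22.759)]

def hosmer_lemeshow(*columns):
    """Sum (O - E)^2 / E over every cell in every column."""
    return sum((o - e) ** 2 / e for column in columns for o, e in column)

hl = hosmer_lemeshow(stop_cells, continue_cells)
df = len(stop_cells) - 2   # number of bins minus 2
print(round(hl, 2), df)
```

The result agrees with the chi-square in the output to within rounding of the printed expected frequencies.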
Hosmer-Lemeshow
There are problems with this procedure.
Not even Hosmer and Lemeshow
recommend it these days.
Even with good fit the test may be significant if sample sizes are large.
Even with poor fit the test may not be significant if sample sizes are small.
Linearity of the Logit

We have assumed that the log odds are
related to the predictors in a linear fashion.
Use the Box-Tidwell test to evaluate this
assumption.
For each continuous predictor, compute
the natural log.
Include in the model interactions between
each predictor and its natural log.
Box-Tidwell
If an interaction is significant, there is a
problem.
For the troublesome predictor, try
including the square of that predictor.
That is, add a polynomial component to
the model.
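The extra terms needed for the Box-Tidwell test are just each continuous predictor multiplied by its own natural log. This is a minimal sketch of computing such a term before fitting the model; the scores are hypothetical, and the predictor must be positive for the log to exist.

```python
import math

def box_tidwell_term(x):
    """Return x * ln(x), the interaction of a predictor with its natural log."""
    if x <= 0:
        raise ValueError("natural log requires a positive predictor value")
    return x * math.log(x)

idealism_scores = [2.0, 5.5, 7.0]   # hypothetical scores, not from the data file
idealism_by_ln = [box_tidwell_term(x) for x in idealism_scores]
print([round(v, 3) for v in idealism_by_ln])
```

Each new column would then be entered alongside the original predictors, and only its significance test is of interest.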
See T-Test versus Binary Logistic Regression

Variables in the Equation
                                      B        S.E.     Wald      df   Sig.   Exp(B)
Step 1 (a)  gender                    1.147    .269     18.129    1    .000   3.148
            idealism                  1.130    1.921    .346      1    .556   3.097
            relatvsm                  1.656    2.637    .394      1    .530   5.240
            idealism by idealism_LN   -.652    .690     .893      1    .345   .521
            relatvsm by relatvsm_LN   -.479    .949     .254      1    .614   .620
            Constant                  -5.015   5.877    .728      1    .393   .007
a. Variable(s) entered on step 1: gender, idealism, relatvsm, idealism * idealism_LN, relatvsm * relatvsm_LN.
No Problem Here.
Model 3: Decision =
Idealism, Relativism, Gender, Purpose
Need 4 dummy variables to code the five
purposes.
Consider the Medical group a reference
group.
Dummy variables are: Cosmetic, Theory,
Meat, Veterin.
0 = not in this group, 1 = in this group.
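Dummy coding with Medical as the reference group can be sketched as follows. The variable names match the ones listed above; the function itself is hypothetical and only illustrates the coding scheme.

```python
DUMMIES = ("cosmetic", "theory", "meat", "veterin")

def dummy_code(purpose):
    """Return the four 0/1 dummy variables for one case.
    The reference group, 'medical', is coded 0 on all four."""
    purpose = purpose.lower()
    if purpose != "medical" and purpose not in DUMMIES:
        raise ValueError("unknown purpose: " + purpose)
    return {name: int(purpose == name) for name in DUMMIES}

print(dummy_code("theory"))
# prints: {'cosmetic': 0, 'theory': 1, 'meat': 0, 'veterin': 0}
print(dummy_code("medical"))
# prints: {'cosmetic': 0, 'theory': 0, 'meat': 0, 'veterin': 0}
```

Because Medical is all zeros, each dummy's coefficient compares that purpose with medical research.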
Add the Dummy Variables

Add to the Covariates: Cosmetic, Theory,
Meat, Veterin.
OK
Block 0
Look at Variables not in the Equation.
Score is how much -2LL would drop if a
single variable were added to the model
with intercept only.
Variables not in the Equation
                                  Score     df   Sig.
Step 0   Variables   gender       25.685    1    .000
                     idealism     47.679    1    .000
                     relatvsm     7.239     1    .007
                     cosmetic     .003      1    .955
                     theory       2.933     1    .087
                     meat         .556      1    .456
                     veterin      .013      1    .909
         Overall Statistics       77.665    7    .000
Effect of Adding Purpose

Our previous model had -2LL = 346.503.
Adding Purpose dropped -2LL to 338.060.
Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      338.060             .243                   .327

$\chi^2(4) = 8.443$, p = .0766.
But I make planned comparisons (with medical reference group) anyhow!
Classification Table
YOU calculate the sensitivity, specificity,
false positive rate, and false negative rate.
Classification Table (a)
                                  Predicted decision
Observed                          stop    continue    Percentage Correct
Step 1   decision   stop          152     35          81.3
                    continue      54      74          57.8
         Overall Percentage                           71.7
a. The cut value is .500
Answer Key
Sensitivity = 74/128 = 58%

Specificity = 152/187 = 81%
False Positive Rate = 35/109 = 32%
False Negative Rate = 54/206 = 26%
Wald Chi-Square
A conservative test of the unique
contribution of each predictor.
Presented in Variables in the Equation.
Alternative: drop one predictor from the model, observe the increase in -2LL, test via $\chi^2$.
Variables in the Equation
                                                                   95.0% C.I. for EXP(B)
                      B        Wald      df   Sig.   Exp(B)   Lower    Upper
Step 1 (a)  gender    1.255    20.586    1    .000   3.508    2.040    6.033
            idealism  -.701    37.891    1    .000   .496     .397     .620
            relatvsm  .326     6.634     1    .010   1.386    1.081    1.777
            cosmetic  -.709    2.850     1    .091   .492     .216     1.121
            theory    -1.160   7.346     1    .007   .314     .136     .725
            meat      -.866    4.164     1    .041   .421     .183     .966
            veterin   -.542    1.751     1    .186   .581     .260     1.298
            Constant  2.279    4.867     1    .027   9.766
a. Variable(s) entered on step 1: gender, idealism, relatvsm, cosmetic, theory, meat, veterin.
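The confidence limits for Exp(B) can be reconstructed from B and the Wald statistic. This sketch assumes the single-df Wald chi-square is (B / S.E.)², so the standard error can be backed out as |B| / sqrt(Wald), and it uses 1.96 as the normal critical value; the function name is hypothetical.

```python
import math

def exp_b_interval(b, wald, z=1.96):
    """Return (Exp(B), lower limit, upper limit), recovering S.E. from Wald."""
    se = abs(b) / math.sqrt(wald)
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

odds_ratio, lower, upper = exp_b_interval(1.255, 20.586)   # gender row
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2))
```

Because the inputs are rounded to three decimals, the limits agree with the printed ones only to about two decimals.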
Odds Ratios Exp(B)

Odds of approval more than cut in half (.496) for
each one point increase in Idealism.
Odds of approval multiplied by 1.39 for each one
point increase in Relativism.
Odds of approval if purpose is Theory Testing
are only .314 what they are for Medical
Research.
Odds of approval if purpose is Agricultural Research are only .421 what they are for Medical Research.
Inverted Odds Ratios

Some folks have problems with odds
ratios less than 1.
Just invert the odds ratio.
For example, 1/.421 = 2.38.
That is, respondents were more than two times more likely to approve the medical research than the research designed to feed the poor in the third world.
Classification Decision Rule

Consider a screening test for Cancer.
Which is the more serious error?
False Positive: test says you have cancer, but you do not.
False Negative: test says you do not have cancer, but you do.
Want to reduce the False Negative rate?
Classification Decision Rule

Options
Classification Cutoff = .4, Continue, OK
Effect of Lowering Cutoff

YOU calculate the Sensitivity, Specificity,
False Positive Rate, and False Negative
Rate for the model with the cutoff at .4.
Fill in the table on page 15 of the handout.
Answer Key
SAS Rules
See, on page 16 of the handout, how easy
SAS makes it to see the effect of changing
the cutoff.
SAS classification tables remove bias (using a jackknifed classification procedure); SPSS does not have this feature.
Presenting the Results

See the handout.
Interaction Terms
May want to standardize continuous
predictor variables.
Compute the interaction terms or
Let Logistic compute them.
Deliberation and Physical Attractiveness in a Mock Trial
Subjects are mock jurors in a criminal trial.
For half, the defendant is plain; for the other half, physically attractive.
Half recommend a verdict with no deliberation, half deliberate first.
Get the Data

Bring Logistic2x2x2.sav into SPSS.
Each row is one cell in 2x2x2 contingency
table.
Could do a logit analysis, but will do
logistic regression instead.
Tell SPSS to weight cases by Freq: Data, Weight Cases.
Dependent = Guilty.
Covariates = Delib, Plain.
In left pane highlight Delib and Plain.
Then click >a*b> to create the interaction term.
Under Options, ask for the Hosmer-Lemeshow test and confidence intervals on the odds ratios.
Significant Interaction
The interaction is large and significant
(odds ratio of .030), so we shall ignore the
main effects.
Variables in the Equation
                                                             95.0% C.I. for EXP(B)
                            Wald     df   Sig.   Exp(B)   Lower    Upper
Step 1 (a)  Delib           3.697    1    .054   .338     .112     1.021
            Plain           4.204    1    .040   3.134    1.052    9.339
            Delib by Plain  8.075    1    .004   .030     .003     .338
            Constant        .037     1    .847   1.077
a. Variable(s) entered on step 1: Delib, Plain, Delib * Plain.
Use Crosstabs to test the conditional effects of Plain at each level of Delib.
Split file by Delib.
Analyze, Crosstabs.
Rows = Plain, Columns = Guilty.
Statistics, Chi-square, Continue.
Cells, Observed Counts and Column
Percentages.
Continue, OK.
Rows = Plain, Columns = Guilty
For those who did deliberate, the odds of a guilty verdict are 1/29 when the defendant was plain and 8/22 when she was attractive, yielding a conditional odds ratio of 0.09483.
Plain * Guilty Crosstabulation (a)
                                       Guilty
Plain                               No       Yes      Total
Attractive   Count                  22       8        30
             % within Plain         73.3%    26.7%    100.0%
Plain        Count                  29       1        30
             % within Plain         96.7%    3.3%     100.0%
Total        Count                  51       9        60
             % within Plain         85.0%    15.0%    100.0%
a. Delib = Yes
For those who did not deliberate, the odds of a guilty verdict are 27/8 when the defendant was plain and 14/13 when she was attractive, yielding a conditional odds ratio of 3.1339.
Plain * Guilty Crosstabulation (a)
                                       Guilty
Plain                               No       Yes      Total
Attractive   Count                  13       14       27
             % within Plain         48.1%    51.9%    100.0%
Plain        Count                  8        27       35
             % within Plain         22.9%    77.1%    100.0%
Total        Count                  21       41       62
             % within Plain         33.9%    66.1%    100.0%
a. Delib = No
Interaction Odds Ratio

The interaction odds ratio is simply the ratio of these conditional odds ratios, that is, .09483/3.1339 = 0.030.
Among those who did not deliberate, the plain defendant was found guilty significantly more often than the attractive defendant, $\chi^2(1, N = 62) = 4.353$, p = .037.
Among those who did deliberate, the attractive defendant was found guilty significantly more often than the plain defendant, $\chi^2(1, N = 60) = 6.405$, p = .011.
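The conditional odds ratios and the interaction odds ratio can be recomputed from the cell counts in the two crosstabulations. This is a minimal sketch; the function and variable names are hypothetical.

```python
def odds_ratio(plain_guilty, plain_not_guilty, attr_guilty, attr_not_guilty):
    """Odds of a guilty verdict for the plain defendant divided by the
    odds of a guilty verdict for the attractive defendant."""
    return (plain_guilty / plain_not_guilty) / (attr_guilty / attr_not_guilty)

or_deliberated = odds_ratio(1, 29, 8, 22)        # Delib = Yes
or_no_deliberation = odds_ratio(27, 8, 14, 13)   # Delib = No
interaction_or = or_deliberated / or_no_deliberation

print(round(or_deliberated, 5), round(or_no_deliberation, 4),
      round(interaction_or, 3))
# prints: 0.09483 3.1339 0.03
```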
Interaction Between Continuous and Dichotomous Predictor
Interaction Falls Short of Significance
Standardizing Predictors
Most helpful with continuous predictors.
Especially when you want to compare the relative contributions of predictors in the model.
Also useful when the predictor is
measured in units that are not intrinsically
meaningful.
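Standardizing a predictor means converting it to z scores before entering it in the model. The sketch below uses the sample standard deviation (n - 1 in the denominator); the scores are hypothetical and the function name is made up for illustration.

```python
import statistics

def standardize(values):
    """Convert raw scores to z scores: (x - mean) / sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

raw_scores = [2.0, 4.0, 6.0, 8.0]   # hypothetical predictor values
z_scores = standardize(raw_scores)
print([round(z, 3) for z in z_scores])
```

With standardized predictors, each Exp(B) describes the change in odds for a one standard deviation increase, which puts predictors on a common footing.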
Predicting Retention in ECU's Engineering Program
Practice Your New Skills

Try the exercises in the handout.
