
Prateek Shukla

122065

ATBA HW
Collaborated with: Navneet, Harish

Sol 8.1 : BANK


Part a
Estimated Equations

The Regression Model

Input variables        Coefficient     Std. Error     p-value      Odds
Constant term          -14.1875391      6.12205267    0.02047934   *
TotExp/Assets           79.96391296    39.26251602    0.04168537   *
TotLns&Lses/Assets       9.17319965     6.86388016    0.1814038    9635.410156

a) Logit as a function of predictors

Logit = -14.1875391 + 79.96391296 * TotExp/Assets + 9.17319965 * TotLns&Lses/Assets

b) Odds as a function of predictors

Odds(Financial condition = 1) = e^(-14.1875391 + 79.96391296 * TotExp/Assets + 9.17319965 * TotLns&Lses/Assets)

Here, Financial condition = 1 means that the bank is financially weak.

c) Probability as a function of predictors

P(Financial condition = 1) = odds / (1 + odds)
= e^(-14.1875391 + 79.96391296 * TotExp/Assets + 9.17319965 * TotLns&Lses/Assets) / (1 + e^(-14.1875391 + 79.96391296 * TotExp/Assets + 9.17319965 * TotLns&Lses/Assets))

Here, Financial condition = 1 means that the bank is financially weak.
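The three functions above can be written as a small Python sketch (coefficient values are taken from the regression table; the function and argument names are illustrative):

```python
import math

# Coefficients from the regression table above
B0 = -14.1875391   # constant term
B1 = 79.96391296   # TotExp/Assets
B2 = 9.17319965    # TotLns&Lses/Assets

def logit(tot_exp_assets, tot_lns_lses_assets):
    """Log-odds that Financial condition = 1 (financially weak)."""
    return B0 + B1 * tot_exp_assets + B2 * tot_lns_lses_assets

def odds(tot_exp_assets, tot_lns_lses_assets):
    """Odds of being financially weak: e^logit."""
    return math.exp(logit(tot_exp_assets, tot_lns_lses_assets))

def prob(tot_exp_assets, tot_lns_lses_assets):
    """Probability of being financially weak: odds / (1 + odds)."""
    o = odds(tot_exp_assets, tot_lns_lses_assets)
    return o / (1 + o)
```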

Part b)

Input variables        Coefficient    Value for new bank
Constant term          -14.187539
TotExp/Assets           79.963913     0.11
TotLns&Lses/Assets       9.1731997    0.6

logit = 0.112411
odds  = 1.118973
prob  = 0.528073

Since the probability (0.528) is above the cutoff value (0.5), the bank is classified as financially weak.
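The new bank's scores can be checked numerically; a minimal sketch using the coefficients and input values from the table above:

```python
import math

# New bank's predictor values (from the table above)
tot_exp_assets = 0.11
tot_lns_lses = 0.6

logit = -14.187539 + 79.963913 * tot_exp_assets + 9.1731997 * tot_lns_lses
odds = math.exp(logit)
prob = odds / (1 + odds)

# prob is about 0.528 > 0.5, so the bank is classified as financially weak
classification = "financially weak" if prob > 0.5 else "financially strong"
```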
Part c) Cutoff Value
Cutoff value based on probability: c = 0.5
Cutoff value based on odds: c / (1 - c) = 1
Cutoff value based on logit: log(odds) = 0
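These cutoff conversions are a one-liner each; a quick sketch:

```python
import math

c = 0.5                               # cutoff on the probability scale
odds_cutoff = c / (1 - c)             # same cutoff on the odds scale
logit_cutoff = math.log(odds_cutoff)  # same cutoff on the logit scale
# c = 0.5 gives an odds cutoff of 1 and a logit cutoff of 0, as stated above
```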

Part d)
If we increase TotExp/Assets by 1, keeping TotLns&Lses/Assets constant, the odds of being financially weak will increase by a factor of e^79.963913.

Part e)
Here, if we decrease the cutoff value of the probability, the error due to misclassification of financially
strong bank as weak is reduced. Thus reducing the misclassification costs. Hence, the cutoff value should
be decreased.


Sol 8.4 : EBAY


Part a)
Categorical predictors were combined by examining the average of the binary outcome variable (Competitive?) for each level, using a pivot table.
11 variables were created from Category:
Category               Count of Category   Average of Competitive?   Merged name
Antique/Art/Craft      177                 0.564971751               Cat_1
Automotive             178                 0.353932584               Cat_2
Books                  54                  0.5                       Cat_3
Business/Industrial    18                  0.666666667               Cat_4
Clothing/Accessories   119                 0.504201681               Cat_3
Coins/Stamps           37                  0.297297297               Cat_5
Collectibles           239                 0.577405858               Cat_1
Computer               36                  0.666666667               Cat_4
Electronics            55                  0.8                       Cat_6
EverythingElse         17                  0.235294118               Cat_7
Health/Beauty          64                  0.171875                  Cat_8
Home/Garden            102                 0.656862745               Cat_4
Jewelry                82                  0.365853659               Cat_2
Music/Movie/Game       403                 0.602977667               Cat_9
Photography            13                  0.846153846               Cat_10
Pottery/Glass          20                  0.35                      Cat_2
SportingGoods          124                 0.725806452               Cat_11
Toys/Hobbies           234                 0.52991453                Cat_3

4 variables were created from Duration:

Duration   Count of Duration   Average of Competitive?   Merged name
1          23                  0.52173913                1_10
3          213                 0.450704225               3_7
5          466                 0.686695279               5_5
7          967                 0.489141675               3_7
10         303                 0.544554455               1_10

5 variables were created from endDay:

endDay   Count of endDay   Average of Competitive?   Merged name
Sun      338               0.485207101               Sun_wed
Mon      548               0.673357664               Mon
Tue      171               0.532163743               Tue
Wed      75                0.48                      Sun_wed
Thu      202               0.603960396               Thu
Fri      287               0.466898955               Fri_Sat
Sat      351               0.427350427               Fri_Sat
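The pivot-table step can be reproduced programmatically. A minimal pure-Python sketch on hypothetical (Category, Competitive?) records; the records here are illustrative, not the actual eBay data:

```python
from collections import defaultdict

# Hypothetical (Category, Competitive?) records, for illustration only
records = [
    ("Electronics", 1), ("Electronics", 1), ("Electronics", 0),
    ("Health/Beauty", 0), ("Health/Beauty", 0), ("Health/Beauty", 1),
]

# Pivot: count and average of the binary outcome per category
groups = defaultdict(list)
for category, competitive in records:
    groups[category].append(competitive)

pivot = {cat: (len(vals), sum(vals) / len(vals)) for cat, vals in groups.items()}
# Categories with similar averages are then merged (e.g. into Cat_1 ... Cat_11)
```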


Part b)
Logistic regression was run after partitioning the dataset. The total number of variables was 25.
The Regression Model

Input variables       Coefficient    Std. Error   p-value      Odds          90% Confidence Interval
Constant term         -0.44009131    0.27366763   0.10780817   0.643977617   (0.613180812, 0.674774421)
Category_2_Cat_10     -0.79104191    1.10404444   0.47368601   0.45337218    (0.07375304, 2.78695416)
Category_2_Cat_11     -1.3078928     0.46053472   0.00451215   0.27038923    (0.12676696, 0.57673019)
Category_2_Cat_2      -0.86424637    0.27521288   0.00168785   0.42136899    (0.26795635, 0.6626147)
Category_2_Cat_3      -0.60280806    0.22884195   0.00843438   0.54727268    (0.37560415, 0.79740179)
Category_2_Cat_4      -0.22523223    0.3231107    0.48575616   0.79833078    (0.46921137, 1.35830486)
Category_2_Cat_5      -1.30768478    0.66424638   0.0489905    0.27044547    (0.09069322, 0.80646324)
Category_2_Cat_6       0.54232538    0.54925162   0.32345164   1.72000182    (0.69690031, 4.24509287)
Category_2_Cat_7      -1.61604917    0.72925997   0.02669066   0.19868211    (0.0598703, 0.65933508)
Category_2_Cat_8      -2.5017364     0.63963985   0.00009185   0.08194258    (0.02861425, 0.23465879)
Category_2_Cat_9      -0.16735646    0.22549526   0.45798263   0.84589803    (0.58376133, 1.22574663)
currency_GBP           3.23823524    0.72436929   0.00000781   25.48870277   (7.74272871, 83.90763092)
currency_US            0.39725608    0.24818772   0.10945947   1.48773682    (0.98908371, 2.23778939)
sellerRating          -0.00002281    0.00001591   0.15177883   0.99997717    (0.999951, 1.00000334)
Duration_2_3_7        -0.12177611    0.27831584   0.66171509   0.88534659    (0.56014204, 1.39935672)
Duration_2_5_5         0.55042392    0.3270472    0.09237305   1.73398793    (1.01255739, 2.96942592)
End Day_2_Mon          0.50146151    0.22710043   0.02723699   1.65113258    (1.13645589, 2.3988955)
End Day_2_Sun_wed     -0.26748273    0.21167931   0.2063656    0.76530355    (0.54028195, 1.08404422)
End Day_2_Thu         -1.93684411    0.68460619   0.00466738   0.14415817    (0.04675094, 0.44451681)
End Day_2_Tue         -0.09799619    0.28015536   0.72649455   0.90665233    (0.5718888, 1.43737459)
ClosePrice             0.14427601    0.01442795   0            1.15520287    (1.12811053, 1.18294597)
OpenPrice             -0.15849413    0.01547789   0            0.85342801    (0.83197492, 0.87543422)

Residual df                   1161
Residual Dev.                 1097.416016
% Success in training data    53.33896872
# Iterations used             11
Multiple R-squared            0.32867715
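The Odds and 90% confidence-interval columns can be reproduced from the coefficient and standard error (odds = e^coef, CI = e^(coef ± z*SE) with z about 1.645 for a 90% interval). A sketch, checked against the currency_GBP row above:

```python
import math

Z90 = 1.6448536  # two-sided 90% standard-normal quantile

def odds_and_ci(coef, se, z=Z90):
    """Odds ratio e^coef and its 90% confidence interval."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)

# currency_GBP row from the table above
odds, lo, hi = odds_and_ci(3.23823524, 0.72436929)
# matches the reported 25.4887 with CI (7.7427, 83.9076)
```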

Part c) Logistic regression excluding closing price

The Regression Model

Input variables       Coefficient    Std. Error   p-value      Odds
Constant term          0.3490935     0.22879937   0.12706903   *
Category_2_Cat_10      1.13370371    0.82462531   0.16919015   3.10714316
Category_2_Cat_11      0.4893876     0.32210025   0.12867084   1.6313169
Category_2_Cat_2      -0.84495026    0.2280754    0.00021164   0.42957872
Category_2_Cat_3      -0.43912375    0.19728073   0.02602204   0.64460099
Category_2_Cat_4       0.02400566    0.27618703   0.9307366    1.02429616
Category_2_Cat_5      -1.14571631    0.51691252   0.02665997   0.31799605
Category_2_Cat_6       0.8775385     0.46306103   0.05808158   2.40497255
Category_2_Cat_7      -1.58548605    0.62581003   0.01129316   0.20484819
Category_2_Cat_8      -2.71096039    0.55936682   0.00000126   0.06647293
Category_2_Cat_9      -0.43224972    0.20738421   0.03713341   0.64904726
currency_GBP           1.89229739    0.48742363   0.0001035    6.63459349
currency_US           -0.1914172     0.19648322   0.32994908   0.82578802
sellerRating          -0.00004362    0.00001338   0.0011125    0.99995637
Duration_2_3_7         0.07272565    0.21553141   0.73579663   1.0754354
Duration_2_5_5         0.91389531    0.26471996   0.00055581   2.49401855
End Day_2_Mon          0.7601704     0.19846725   0.00012804   2.13864064
End Day_2_Sun_wed     -0.09534682    0.178409     0.59304523   0.90905762
End Day_2_Thu         -0.92989129    0.4477064    0.03780052   0.39459661
End Day_2_Tue          0.11647422    0.23402157   0.61869043   1.1235286
OpenPrice             -0.00549316    0.00296826   0.06422202   0.99452192

Comparison between the models (Parts b & c)

PART B - Classification Confusion Matrix (validation)
           Predicted 1   Predicted 0
Actual 1   319           116
Actual 0   73            281

PART B - Error Report
Class     # Cases   # Errors   % Error
1         435       116        26.67
0         354       73         20.62
Overall   789       189        23.95

PART C - Classification Confusion Matrix (validation)
           Predicted 1   Predicted 0
Actual 1   299           136
Actual 0   160           194

PART C - Error Report
Class     # Cases   # Errors   % Error
1         435       136        31.26
0         354       160        45.20
Overall   789       296        37.52

On comparing the errors, model (b) is better, with lower error rates for both classes.
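The error reports above follow directly from the confusion matrices; a small sketch (class 1 = competitive auction):

```python
def error_report(n11, n10, n01, n00):
    """% error per class and overall from a confusion matrix.
    n11 = actual 1 predicted 1, n10 = actual 1 predicted 0,
    n01 = actual 0 predicted 1, n00 = actual 0 predicted 0."""
    e1 = 100 * n10 / (n11 + n10)     # class-1 error
    e0 = 100 * n01 / (n01 + n00)     # class-0 error
    overall = 100 * (n10 + n01) / (n11 + n10 + n01 + n00)
    return round(e1, 2), round(e0, 2), round(overall, 2)

print(error_report(319, 116, 73, 281))   # Part b -> (26.67, 20.62, 23.95)
print(error_report(299, 136, 160, 194))  # Part c -> (31.26, 45.2, 37.52)
```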


[Validation lift and decile charts for PART B and PART C]

On comparing the validation lift charts (fit), model (b) is again better: its lift curve lies well above the average (baseline) line. The decile-chart performance of model (b) is also better than that of model (c).
Part d)

Input variable   Coefficient   Std. Error   p-value   Odds        90% Confidence Interval
ClosePrice       0.144276      0.014428     0         1.1552029   (1.1281105, 1.182946)

The coefficient for ClosePrice is 0.144276, indicating that if we increase ClosePrice by 1, keeping all other predictors constant, the odds of the auction being competitive increase by a factor of e^0.144276, or 1.1552029. The p-value is very low, indicating that ClosePrice is statistically significant in the model, and we reject the hypothesis that its coefficient is zero. The 90% CI for the odds ratio is also practically tight (between 1.1281 and 1.1829). ClosePrice has the sixth-largest impact on the odds among the predictors. Practically, too, ClosePrice can be an important determinant of auction competitiveness.
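As a numerical illustration of this odds multiplier (the starting probability of 0.5 is hypothetical):

```python
import math

odds_factor = math.exp(0.144276)  # about 1.1552, the reported odds multiplier

# Hypothetical auction currently at p = 0.5 (odds = 1); a unit increase in
# ClosePrice, other predictors held constant, moves the probability to:
new_odds = (0.5 / (1 - 0.5)) * odds_factor
new_p = new_odds / (1 + new_odds)  # about 0.536
```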

Part e) Best fit to training data
Stepwise and exhaustive searches were used to create the best subset. The criterion for best-subset selection was Mallows' Cp (closest to the number of coefficients).
Variables finally used in the models:

EXHAUSTIVE SEARCH: Constant term, Category_2_Cat_11, Category_2_Cat_2, Category_2_Cat_3, Category_2_Cat_5, Category_2_Cat_7, Category_2_Cat_8, currency_GBP, Duration_2_5_5, End Day_2_Mon, End Day_2_Thu, ClosePrice, OpenPrice

STEPWISE SELECTION: Constant term, Category_2_Cat_11, Category_2_Cat_2, Category_2_Cat_3, Category_2_Cat_7, Category_2_Cat_8, currency_GBP, Duration_2_5_5, End Day_2_Mon, End Day_2_Thu, ClosePrice, OpenPrice

Comparing fit:
The residual deviance of the exhaustive model was slightly lower and its multiple R-squared slightly higher, indicating a better overall fit. On the training-data lift charts, the exhaustive model also had slightly higher decile-to-global-mean ratios in the initial deciles. Thus the exhaustive model fits the training data better.

Exhaustive Search
Decile   Mean        Std.Dev.
1        0.9830508   0.1290809
2        0.9322034   0.2655668
3        0.940678    0.2362264
4        0.6610169   0.4733641
5        0.5338983   0.4988496
6        0.6340042   0.48362
7        0.0863347   0.2785075
8        0.3305085   0.4703962
9        0.1355932   0.3332136
10       0.107438    0.3096693

Residual Dev.        1106.0862
Multiple R-squared   0.3233733

Stepwise Selection
Decile   Mean        Std.Dev.
1        0.9830508   0.1290809
2        0.9237288   0.2786319
3        0.9491525   0.2196861
4        0.6694915   0.4734399
5        0.5508475   0.4982734
6        0.6133475   0.487722
7        0.0935734   0.2907412
8        0.3269774   0.467282
9        0.1271186   0.3234802
10       0.107438    0.3096693

Residual Dev.        1110.1195
Multiple R-squared   0.320906

[Training lift and decile charts: Exhaustive Search vs. Stepwise Selection]

Part f)

Exhaustive Search - Validation Data Scoring, Classification Confusion Matrix
           Predicted 1   Predicted 0
Actual 1   318           117
Actual 0   86            268

Exhaustive Search - Error Report
Class     # Cases   # Errors   % Error
1         435       117        26.90
0         354       86         24.29
Overall   789       203        25.73

Stepwise Selection - Validation Data Scoring, Classification Confusion Matrix
           Predicted 1   Predicted 0
Actual 1   320           115
Actual 0   83            271

Stepwise Selection - Error Report
Class     # Cases   # Errors   % Error
1         435       115        26.44
0         354       83         23.45
Overall   789       198        25.10

On comparing the errors, the stepwise selection model is slightly better, with lower errors for both classes as well as overall. (Refer to part e for the predictors used.)
Part g) Dangers of the best predictive model (Stepwise)

- The overall error is on the higher side (25.1%).
- Errors for both classes are high, so the model will be costly to use if the cost of mistakes is high.
- The model is not totally free of the random idiosyncrasies of the validation dataset and may not work equally well on data outside it, as the dataset covers only a specific period.

Part h) The best-fitting model and the best predictive model are different, and the criterion for selecting each is different. The best-fitting model is chosen based on performance on the training dataset, while the best predictive model is chosen based on performance on the validation dataset. A highly overfitted model may look like the best fit on the training dataset yet perform poorly on the validation dataset.
Part i) Accurate Classification (using the stepwise model)
On varying the cutoff probability and computing the % error for each cutoff value, the overall error was found to be minimum (23.45%) at cutoff = 0.38. At this cutoff, the error for class 0 is higher than that for class 1.

Cutoff Prob. Val.   % Error     % Error     % Error
for Success         (Class 1)   (Class 0)   (Overall)
0.1                  2.76       88.14       41.06
0.2                  4.83       77.12       37.26
0.3                  9.43       59.89       32.07
0.33                11.95       57.06       32.19
0.36                14.02       46.05       28.39
0.37                15.17       36.44       24.71
0.38                16.32       32.20       23.45
0.39                17.01       31.92       23.70
0.4                 18.16       31.64       24.21
0.5                 26.44       23.45       25.10
0.6                 34.02       15.25       25.60
0.7                 43.91        4.24       26.11
0.8                 55.17        3.67       32.07
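The cutoff sweep can be reproduced with a short loop; a sketch on hypothetical scored records (predicted probability, actual class), not the actual validation data:

```python
# Hypothetical (predicted probability, actual class) pairs, for illustration
scored = [(0.9, 1), (0.7, 1), (0.45, 1), (0.6, 0), (0.35, 0), (0.1, 0)]

def pct_errors(scored, cutoff):
    """% error for class 1, class 0, and overall when predicting
    class 1 whenever the predicted probability >= cutoff."""
    e1 = sum(1 for p, y in scored if y == 1 and p < cutoff)
    n1 = sum(1 for _, y in scored if y == 1)
    e0 = sum(1 for p, y in scored if y == 0 and p >= cutoff)
    n0 = sum(1 for _, y in scored if y == 0)
    return 100 * e1 / n1, 100 * e0 / n0, 100 * (e1 + e0) / (n1 + n0)

for c in (0.3, 0.38, 0.5, 0.7):
    print(c, [round(e, 2) for e in pct_errors(scored, c)])
```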

Part j)
The Regression Model

Input variables       Coefficient   Std. Error   p-value     Odds
Constant term         -0.5004671    0.1240727    5.491E-05   *
Category_2_Cat_11     -1.335864     0.4328027    0.002025    0.2629309
Category_2_Cat_2      -0.787796     0.2410092    0.0010803   0.4548462
Category_2_Cat_3      -0.5711761    0.188437     0.0024364   0.5648607
Category_2_Cat_7      -1.3216676    0.7154731    0.0647089   0.2666902
Category_2_Cat_8      -2.3378339    0.632952     0.0002212   0.0965365
currency_GBP           2.9857163    0.7010819    2.056E-05   19.80068
Duration_2_5_5         0.6854923    0.2099336    0.0010936   1.9847486
End Day_2_Mon          0.5395377    0.1944691    0.0055301   1.7152138
End Day_2_Thu         -1.9865052    0.6607946    0.002645    0.137174
ClosePrice             0.1392541    0.013746                 1.1494161
OpenPrice             -0.1524749    0.0146576                0.8585804

In order to have a competitive auction, the following conditions would be helpful:

- Currency: GBP
- Duration: 5 days
- End day: Monday
- Opening price: as low as possible
