122065
ATBA HW
Collaborated with: Navneet, Harish
Input variables        Coefficient    Std. Error     p-value       Odds
Constant term          -14.1875391    6.12205267     0.02047934    *
TotExp/Assets           79.96391296   39.26251602    0.04168537    *
TotLns&Lses/Assets       9.17319965    6.86388016    0.1814038     9635.410156
Prateek Shukla
122065
Part b)
Input variables        Coefficient    Variables for new bank
Constant term          -14.187539
TotExp/Assets           79.963913     0.11
TotLns&Lses/Assets       9.1731997    0.6

logit    0.112411
odds     1.118973
prob     0.528073
Since the probability (0.528) is greater than the cutoff value (0.5), the new bank is classified as financially weak.
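The scoring of the new bank can be reproduced with a few lines of Python. This is a minimal sketch: the constant names and the `score_bank` helper are illustrative, and the coefficients are the ones from the fitted model above.

```python
import math

# Coefficients of the fitted model
INTERCEPT = -14.187539
B_TOTEXP = 79.963913   # TotExp/Assets
B_TOTLNS = 9.1731997   # TotLns&Lses/Assets

def score_bank(tot_exp_assets: float, tot_lns_assets: float) -> tuple:
    """Return (logit, odds, probability) of the bank being financially weak."""
    logit = INTERCEPT + B_TOTEXP * tot_exp_assets + B_TOTLNS * tot_lns_assets
    odds = math.exp(logit)      # odds = e^logit
    prob = odds / (1 + odds)    # probability = odds / (1 + odds)
    return logit, odds, prob

logit, odds, prob = score_bank(0.11, 0.6)
label = "financially weak" if prob > 0.5 else "financially strong"
print(f"logit={logit:.6f}  odds={odds:.6f}  prob={prob:.6f}  -> {label}")
```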
Part c) Cutoff Value
Cutoff value based on probability (c) = 0.5
Cutoff value based on odds = c / (1 - c) = 1
Cutoff value based on logit = log(odds) = 0
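Because probability, odds and logit are all increasing functions of each other, the same cutoff can be expressed on any of the three scales and gives the same classification. A small sketch of the conversion (the function name is illustrative):

```python
import math

def cutoff_on_all_scales(c: float) -> dict:
    """Express a probability cutoff c as the equivalent odds and logit cutoffs."""
    if not 0 < c < 1:
        raise ValueError("cutoff must be strictly between 0 and 1")
    odds_cutoff = c / (1 - c)
    return {"prob": c, "odds": odds_cutoff, "logit": math.log(odds_cutoff)}

print(cutoff_on_all_scales(0.5))  # {'prob': 0.5, 'odds': 1.0, 'logit': 0.0}
```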
Part d)
If we increase TotExp/Assets by 1, keeping TotLns&Lses/Assets constant, the odds of being financially weak increase by a factor of e^79.963913 (approximately 5.3 × 10^34).
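TotExp/Assets is a ratio (0.11 for the new bank), so a one-unit increase is likely far larger than any realistic change; the multiplier for smaller increments may be easier to interpret. A sketch under that assumption, using the fact that in a logistic model the odds change by exp(coefficient × change) regardless of the starting point (the helper name is illustrative):

```python
import math

B_TOTEXP = 79.963913  # coefficient of TotExp/Assets

def odds_multiplier(coef: float, delta: float) -> float:
    """Factor by which the odds change when a predictor rises by `delta`,
    all other predictors held fixed: exp(coef * delta)."""
    return math.exp(coef * delta)

for delta in (1.0, 0.1, 0.01):
    print(f"TotExp/Assets +{delta}: odds x {odds_multiplier(B_TOTEXP, delta):.4g}")
```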
Part e)
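Lowering the probability cutoff classifies more banks as financially weak. This reduces the number of financially weak banks misclassified as strong (at the price of flagging more strong banks as weak). Since misclassifying a weak bank as strong is the costlier error, this reduces the expected misclassification cost. Hence, the cutoff value should be decreased.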
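The direction of the adjustment follows from the standard two-class expected-cost rule: labelling a bank "strong" costs p × (cost of missing a weak bank) in expectation, and labelling it "weak" costs (1 - p) × (cost of a false alarm), so "weak" is the cheaper label once p exceeds the ratio computed below. A minimal sketch; the cost values are hypothetical.

```python
def cost_minimizing_cutoff(cost_weak_as_strong: float, cost_strong_as_weak: float) -> float:
    """Probability cutoff that minimizes expected misclassification cost.

    Classify a bank as weak when P(weak) exceeds the returned value.
    """
    return cost_strong_as_weak / (cost_strong_as_weak + cost_weak_as_strong)

# Hypothetical costs (illustration only)
print(cost_minimizing_cutoff(1.0, 1.0))  # equal costs: 0.5
print(cost_minimizing_cutoff(5.0, 1.0))  # missing a weak bank 5x as costly: about 0.167
```

With equal costs the rule reduces to the default 0.5 cutoff; any extra cost on missed weak banks pushes the cutoff below 0.5.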
Category Name (new category assigned to each of the 18 original categories, in table order):
Cat_1, Cat_2, Cat_3, Cat_4, Cat_3, Cat_5, Cat_1, Cat_4, Cat_6, Cat_7, Cat_8, Cat_4, Cat_2, Cat_9, Cat_10, Cat_2, Cat_11, Cat_3
Duration   Count   Average of Competitive?   Category Name
1            23    0.52173913                1_10
3           213    0.450704225               3_7
5           466    0.686695279               5_5
7           967    0.489141675               3_7
10          303    0.544554455               1_10
End day   Count of endDay   Average of Competitive?   Category Name
Sun       338               0.485207101               Sun_wed
Mon       548               0.673357664               Mon
Tue       171               0.532163743               Tue
Wed        75               0.48                      Sun_wed
Thu       202               0.603960396               Thu
Fri       287               0.466898955               Fri_Sat
Sat       351               0.427350427               Fri_Sat
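The share of competitive auctions in each merged group can be recomputed from the pivot-table counts and averages. A minimal sketch: the dictionaries and the `pooled_rates` helper are illustrative, and the end-day rows are assumed to be in Sun-to-Sat order, the order implied by the Sun_wed and Fri_Sat group names.

```python
# (count, average of Competitive?) for each level, taken from the pivot tables
duration = {1: (23, 0.52173913), 3: (213, 0.450704225), 5: (466, 0.686695279),
            7: (967, 0.489141675), 10: (303, 0.544554455)}
duration_group = {1: "1_10", 3: "3_7", 5: "5_5", 7: "3_7", 10: "1_10"}

# Assumed Sun..Sat row order, as implied by the group names
end_day = {"Sun": (338, 0.485207101), "Mon": (548, 0.673357664),
           "Tue": (171, 0.532163743), "Wed": (75, 0.48),
           "Thu": (202, 0.603960396), "Fri": (287, 0.466898955),
           "Sat": (351, 0.427350427)}
end_day_group = {"Sun": "Sun_wed", "Mon": "Mon", "Tue": "Tue", "Wed": "Sun_wed",
                 "Thu": "Thu", "Fri": "Fri_Sat", "Sat": "Fri_Sat"}

def pooled_rates(levels, groups):
    """Merge levels into groups; return {group: (auctions, share competitive)}."""
    totals = {}
    for level, (n, avg) in levels.items():
        competitive = round(n * avg)          # back out the count of competitive auctions
        n_sum, c_sum = totals.get(groups[level], (0, 0))
        totals[groups[level]] = (n_sum + n, c_sum + competitive)
    return {g: (n, c / n) for g, (n, c) in totals.items()}

for name, levels, groups in [("Duration", duration, duration_group),
                             ("End day", end_day, end_day_group)]:
    for group, (n, rate) in pooled_rates(levels, groups).items():
        print(f"{name} {group}: {n} auctions, {rate:.3f} competitive")
```

Merging only levels with similar rates (for example durations 1 and 10) keeps the grouped dummies close to the behaviour of the original levels while reducing the number of predictors.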
Part b)
Logistic regression was run after partitioning the dataset. The total number of variables was 25.
The Regression Model
Input variables       Coefficient   Std. Error   p-value      Odds          90% CI lower   90% CI upper
Constant term         -0.44009131   0.27366763   0.10780817   0.643977617   0.613180812    0.674774421
Category_2_Cat_10     -0.79104191   1.10404444   0.47368601   0.45337218    0.07375304     2.78695416
Category_2_Cat_11     -1.3078928    0.46053472   0.00451215   0.27038923    0.12676696     0.57673019
Category_2_Cat_2      -0.86424637   0.27521288   0.00168785   0.42136899    0.26795635     0.6626147
Category_2_Cat_3      -0.60280806   0.22884195   0.00843438   0.54727268    0.37560415     0.79740179
Category_2_Cat_4      -0.22523223   0.3231107    0.48575616   0.79833078    0.46921137     1.35830486
Category_2_Cat_5      -1.30768478   0.66424638   0.0489905    0.27044547    0.09069322     0.80646324
Category_2_Cat_6       0.54232538   0.54925162   0.32345164   1.72000182    0.69690031     4.24509287
Category_2_Cat_7      -1.61604917   0.72925997   0.02669066   0.19868211    0.0598703      0.65933508
Category_2_Cat_8      -2.5017364    0.63963985   0.00009185   0.08194258    0.02861425     0.23465879
Category_2_Cat_9      -0.16735646   0.22549526   0.45798263   0.84589803    0.58376133     1.22574663
currency_GBP           3.23823524   0.72436929   0.00000781   25.48870277   7.74272871     83.90763092
currency_US            0.39725608   0.24818772   0.10945947   1.48773682    0.98908371     2.23778939
sellerRating          -0.00002281   0.00001591   0.15177883   0.99997717    0.999951       1.00000334
Duration_2_3_7        -0.12177611   0.27831584   0.66171509   0.88534659    0.56014204     1.39935672
Duration_2_5_5         0.55042392   0.3270472    0.09237305   1.73398793    1.01255739     2.96942592
End Day_2_Mon          0.50146151   0.22710043   0.02723699   1.65113258    1.13645589     2.3988955
End Day_2_Sun_wed     -0.26748273   0.21167931   0.2063656    0.76530355    0.54028195     1.08404422
End Day_2_Thu         -1.93684411   0.68460619   0.00466738   0.14415817    0.04675094     0.44451681
End Day_2_Tue         -0.09799619   0.28015536   0.72649455   0.90665233    0.5718888      1.43737459
ClosePrice             0.14427601   0.01442795   0            1.15520287    1.12811053     1.18294597
OpenPrice             -0.15849413   0.01547789   0            0.85342801    0.83197492     0.87543422

Residual df                    1161
Residual Dev.                  1097.416016
% Success in training data     53.33896872
# Iterations used              11
Multiple R-squared             0.32867715
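The p-value, odds and 90% confidence interval in each row can be reproduced from the coefficient and its standard error. A minimal sketch, assuming the p-value is a two-sided Wald test and the interval is exp(b ± z·SE) with z the 95th percentile of the standard normal; this matches, for example, the Category_2_Cat_10 and ClosePrice rows. It uses only the standard library (`statistics.NormalDist`, Python 3.8+); the helper name is illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1

def coefficient_summary(coef, se, level=0.90):
    """Return (p-value, odds, CI lower, CI upper) for one logistic coefficient."""
    z_crit = STD_NORMAL.inv_cdf(0.5 + level / 2)       # about 1.645 for 90%
    z_stat = coef / se                                 # Wald z statistic
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z_stat)))    # two-sided
    odds = math.exp(coef)
    return (p_value, odds,
            math.exp(coef - z_crit * se), math.exp(coef + z_crit * se))

rows = [("Category_2_Cat_10", -0.79104191, 1.10404444),
        ("ClosePrice", 0.14427601, 0.01442795)]
for name, b, se in rows:
    p, odds, lo, hi = coefficient_summary(b, se)
    print(f"{name}: p={p:.6f} odds={odds:.6f} 90% CI=({lo:.6f}, {hi:.6f})")
```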
Part c) Logistic regression excluding closing price
The Regression Model
Input variables       Coefficient   Std. Error   p-value      Odds
Constant term          0.3490935    0.22879937   0.12706903   *
Category_2_Cat_10      1.13370371   0.82462531   0.16919015   3.10714316
Category_2_Cat_11      0.4893876    0.32210025   0.12867084   1.6313169
Category_2_Cat_2      -0.84495026   0.2280754    0.00021164   0.42957872
Category_2_Cat_3      -0.43912375   0.19728073   0.02602204   0.64460099
Category_2_Cat_4       0.02400566   0.27618703   0.9307366    1.02429616
Category_2_Cat_5      -1.14571631   0.51691252   0.02665997   0.31799605
Category_2_Cat_6       0.8775385    0.46306103   0.05808158   2.40497255
Category_2_Cat_7      -1.58548605   0.62581003   0.01129316   0.20484819
Category_2_Cat_8      -2.71096039   0.55936682   0.00000126   0.06647293
Category_2_Cat_9      -0.43224972   0.20738421   0.03713341   0.64904726
currency_GBP           1.89229739   0.48742363   0.0001035    6.63459349
currency_US           -0.1914172    0.19648322   0.32994908   0.82578802
sellerRating          -0.00004362   0.00001338   0.0011125    0.99995637
Duration_2_3_7         0.07272565   0.21553141   0.73579663   1.0754354
Duration_2_5_5         0.91389531   0.26471996   0.00055581   2.49401855
End Day_2_Mon          0.7601704    0.19846725   0.00012804   2.13864064
End Day_2_Sun_wed     -0.09534682   0.178409     0.59304523   0.90905762
End Day_2_Thu         -0.92989129   0.4477064    0.03780052   0.39459661
End Day_2_Tue          0.11647422   0.23402157   0.61869043   1.1235286
OpenPrice             -0.00549316   0.00296826   0.06422202   0.99452192
PART B

Classification confusion matrix
              Predicted 1   Predicted 0
Actual 1          319           116
Actual 0           73           281

Error report
Class     # Cases   # Errors   % Error
1            435       116      26.67
0            354        73      20.62
Overall      789       189      23.95

PART C

Classification confusion matrix
              Predicted 1   Predicted 0
Actual 1          299           136
Actual 0          160           194

Error report
Class     # Cases   # Errors   % Error
1            435       136      31.26
0            354       160      45.20
Overall      789       296      37.52
Comparing the errors, model b is better, with lower errors for both classes and overall (23.95% vs 37.52%).
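The error reports follow directly from the confusion matrices: a class's errors are its off-diagonal counts, and the overall error pools all cases. A minimal sketch, assuming the first row/column is class 1 (the 435-case class) and the second is class 0; the helper name is illustrative.

```python
def error_report(confusion):
    """confusion[actual][predicted] holds counts.
    Returns {class: (# cases, # errors, % error)} plus an 'Overall' row."""
    report, total_cases, total_errors = {}, 0, 0
    for actual, row in confusion.items():
        cases = sum(row.values())
        errors = cases - row[actual]           # everything off the diagonal
        report[actual] = (cases, errors, round(100 * errors / cases, 2))
        total_cases += cases
        total_errors += errors
    report["Overall"] = (total_cases, total_errors,
                         round(100 * total_errors / total_cases, 2))
    return report

# Validation confusion matrices: {actual: {predicted: count}}
model_b = {1: {1: 319, 0: 116}, 0: {1: 73, 0: 281}}
model_c = {1: {1: 299, 0: 136}, 0: {1: 160, 0: 194}}

for name, cm in [("model b", model_b), ("model c", model_c)]:
    print(name, error_report(cm))
```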
[Figure: validation lift charts, PART B and PART C]
Comparing the validation lift charts, model b is again better: its sorted (cumulative) lift curve lies well above the average baseline. The decile-wise lift chart for model b is also better than that for model c.
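A cumulative lift chart sorts cases by predicted probability, highest first, and plots the running count of actual 1s against the "average" line expected from picking cases at random; a good model's curve stays above that line. A minimal sketch on hypothetical scores for ten auctions (not the homework data):

```python
def cumulative_gains(probs, actuals):
    """Sort cases by predicted probability (highest first) and return the running
    count of actual 1s, plus the 'average' baseline expected from random ordering."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    base_rate = sum(actuals) / len(actuals)
    gains, baseline, running = [], [], 0
    for k, i in enumerate(order, start=1):
        running += actuals[i]
        gains.append(running)
        baseline.append(base_rate * k)
    return gains, baseline

# Hypothetical validation scores for 10 auctions (illustration only)
probs   = [0.95, 0.10, 0.80, 0.40, 0.70, 0.20, 0.90, 0.30, 0.60, 0.05]
actuals = [1,    0,    1,    0,    1,    0,    1,    1,    0,    0]
gains, baseline = cumulative_gains(probs, actuals)
print(gains)
print(baseline)
```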
Part d)
Input variables   Coefficient   Std. Error   p-value   Odds        90% Confidence Interval
ClosePrice        0.144276      0.014428     0         1.1552029   1.1281105 to 1.182946
The coefficient for ClosePrice is 0.144276: if ClosePrice increases by 1, keeping all other predictors constant, the odds of the auction being competitive increase by a factor of e^0.144276 = 1.1552029. The p-value is very low, indicating that ClosePrice is statistically significant in the model, so we reject the hypothesis that its coefficient is zero. The 90% confidence interval for the odds is also narrow (1.1281 to 1.1829) and lies entirely above 1. ClosePrice also has the sixth largest impact on the odds among the predictors. Practically, too, the closing price can be an important determinant of auction competitiveness.
Part e) Best fit to training data
Stepwise selection and exhaustive search were used to find the best subset of predictors. The best subset was chosen using Mallows' Cp: the subset whose Cp was nearest to its number of coefficients.
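The selection rule can be written as "pick the candidate minimizing |Cp - number of coefficients|". A minimal sketch; the candidate subsets and Cp values below are hypothetical and only illustrate the rule.

```python
def pick_by_cp(candidates):
    """candidates: list of (predictors, number_of_coefficients, cp).
    Return the subset whose Mallows' Cp is closest to its number of coefficients."""
    return min(candidates, key=lambda cand: abs(cand[2] - cand[1]))

# Hypothetical Cp values for four nested subsets (illustration only);
# the number of coefficients counts the constant term
candidates = [
    (("ClosePrice", "OpenPrice"), 3, 45.2),
    (("ClosePrice", "OpenPrice", "currency_GBP"), 4, 20.7),
    (("ClosePrice", "OpenPrice", "currency_GBP", "End Day_2_Mon"), 5, 5.4),
    (("ClosePrice", "OpenPrice", "currency_GBP", "End Day_2_Mon",
      "Duration_2_5_5"), 6, 3.1),
]
best = pick_by_cp(candidates)
print(best[0], "coefficients:", best[1], "Cp:", best[2])
```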
Variables used in the final models:

Input variables      EXHAUSTIVE SEARCH   STEPWISE SELECTION
Constant term        yes                 yes
Category_2_Cat_11    yes                 yes
Category_2_Cat_2     yes                 yes
Category_2_Cat_3     yes                 yes
Category_2_Cat_5     yes                 no
Category_2_Cat_7     yes                 yes
Category_2_Cat_8     yes                 yes
currency_GBP         yes                 yes
Duration_2_5_5       yes                 yes
End Day_2_Mon        yes                 yes
End Day_2_Thu        yes                 yes
ClosePrice           yes                 yes
OpenPrice            yes                 yes
Comparing fit:
The exhaustive search model has a slightly lower residual deviance (1106.09 vs 1110.12) and a slightly higher multiple R-squared (0.3234 vs 0.3209), indicating a better overall fit. On the training-data decile-wise lift charts the two models are very close: the first decile is identical, and the exhaustive model has a slightly higher decile-to-global-mean ratio in the second decile. Thus the exhaustive model fits the training data better.
                     Exhaustive Search          Stepwise Selection
Decile               Mean        Std.Dev.       Mean        Std.Dev.
1                    0.9830508   0.1290809      0.9830508   0.1290809
2                    0.9322034   0.2655668      0.9237288   0.2786319
3                    0.940678    0.2362264      0.9491525   0.2196861
4                    0.6610169   0.4733641      0.6694915   0.4734399
5                    0.5338983   0.4988496      0.5508475   0.4982734
6                    0.6340042   0.48362        0.6133475   0.487722
7                    0.0863347   0.2785075      0.0935734   0.2907412
8                    0.3305085   0.4703962      0.3269774   0.467282
9                    0.1355932   0.3332136      0.1271186   0.3234802
10                   0.107438    0.3096693      0.107438    0.3096693

Residual Dev.        1106.0862                  1110.1195
Multiple R-squared   0.3233733                  0.320906
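The decile-to-global-mean ratio (decile-wise lift) is each decile's mean response divided by the overall mean response. A sketch computing it for both models from the decile means above, assuming the global mean equals the "% Success in training data" (53.33896872%) reported for the Part b model, i.e. that both searches used the same training partition:

```python
GLOBAL_MEAN = 0.5333896872  # assumed: "% Success in training data" from Part b, as a fraction

# Mean of actual Competitive? in each decile (decile 1 = highest predicted probability)
exhaustive = [0.9830508, 0.9322034, 0.940678, 0.6610169, 0.5338983,
              0.6340042, 0.0863347, 0.3305085, 0.1355932, 0.107438]
stepwise = [0.9830508, 0.9237288, 0.9491525, 0.6694915, 0.5508475,
            0.6133475, 0.0935734, 0.3269774, 0.1271186, 0.107438]

def decile_lift(decile_means, global_mean):
    """Decile-wise lift: each decile's mean response divided by the global mean."""
    return [m / global_mean for m in decile_means]

lift_ex = decile_lift(exhaustive, GLOBAL_MEAN)
lift_sw = decile_lift(stepwise, GLOBAL_MEAN)
for d, (a, b) in enumerate(zip(lift_ex, lift_sw), start=1):
    better = "exhaustive" if a > b else "stepwise" if b > a else "tie"
    print(f"decile {d:2d}: exhaustive {a:.3f}  stepwise {b:.3f}  ({better})")
```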
[Figure: lift charts, Exhaustive Search and Stepwise Selection]
Part f)
Exhaustive Search

Classification confusion matrix
              Predicted 1   Predicted 0
Actual 1          318           117
Actual 0           86           268

Error report
Class     # Cases   # Errors   % Error
1            435       117      26.90
0            354        86      24.29
Overall      789       203      25.73

Stepwise Selection

Classification confusion matrix
              Predicted 1   Predicted 0
Actual 1          320           115
Actual 0           83           271

Error report
Class     # Cases   # Errors   % Error
1            435       115      26.44
0            354        83      23.45
Overall      789       198      25.10
Comparing the errors, the stepwise selection model is slightly better, with lower errors for both classes as well as overall (25.10% vs 25.73%). (Refer to Part e for the predictors used.)
Part g) Dangers of the best predictive model (Stepwise)
Part h) The best-fitting model and the best predictive model are different, because the criterion for selecting the best model differs in each case. The best-fitting model is selected based on its performance on the training dataset, whereas the best predictive model is selected based on its performance on the validation dataset. A highly overfitted model may look like the best-fitting model on the training dataset yet perform poorly on the validation dataset.
Part i) Accurate classification (stepwise model)
By varying the cutoff probability and recording the % error at each cutoff value, the total error was found to be minimum (23.45%) at a cutoff of 0.38. At this cutoff, the error for class 0 (32.20%) is higher than that for class 1 (16.32%).
Cutoff prob. value   % Error    % Error    % Error
for success          class 1    class 0    overall
0.1                   2.76      88.14      41.06
0.2                   4.83      77.12      37.26
0.3                   9.43      59.89      32.07
0.33                 11.95      57.06      32.19
0.36                 14.02      46.05      28.39
0.37                 15.17      36.44      24.71
0.38                 16.32      32.20      23.45
0.39                 17.01      31.92      23.70
0.4                  18.16      31.64      24.21
0.5                  26.44      23.45      25.10
0.6                  34.02      15.25      25.60
0.7                  43.91       4.24      26.11
0.8                  55.17       3.67      32.07
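The cutoff search can be checked programmatically: pick the cutoff with the lowest overall error, and confirm that each overall error is the class-size-weighted average of the two class errors (435 class-1 and 354 class-0 validation cases, from the error reports). A sketch with illustrative names:

```python
CLASS_SIZES = (435, 354)   # validation cases in class 1 and class 0

# cutoff -> (% error class 1, % error class 0, % error overall), from the table
error_by_cutoff = {
    0.1: (2.76, 88.14, 41.06), 0.2: (4.83, 77.12, 37.26), 0.3: (9.43, 59.89, 32.07),
    0.33: (11.95, 57.06, 32.19), 0.36: (14.02, 46.05, 28.39), 0.37: (15.17, 36.44, 24.71),
    0.38: (16.32, 32.20, 23.45), 0.39: (17.01, 31.92, 23.70), 0.4: (18.16, 31.64, 24.21),
    0.5: (26.44, 23.45, 25.10), 0.6: (34.02, 15.25, 25.60), 0.7: (43.91, 4.24, 26.11),
    0.8: (55.17, 3.67, 32.07),
}

def overall_error(err1, err0, sizes=CLASS_SIZES):
    """Overall % error as the class-size-weighted average of the class errors."""
    n1, n0 = sizes
    return (err1 * n1 + err0 * n0) / (n1 + n0)

best_cutoff = min(error_by_cutoff, key=lambda c: error_by_cutoff[c][2])
print("best cutoff:", best_cutoff, "overall % error:", error_by_cutoff[best_cutoff][2])
for c, (e1, e0, reported) in error_by_cutoff.items():
    print(f"{c:<5} reported {reported:6.2f}  recomputed {overall_error(e1, e0):6.2f}")
```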
Part j)
The Regression Model
Input variables       Coefficient   Std. Error   p-value     Odds
Constant term         -0.5004671    0.1240727    5.491E-05   *
Category_2_Cat_11     -1.335864     0.4328027    0.002025    0.2629309
Category_2_Cat_2      -0.787796     0.2410092    0.0010803   0.4548462
Category_2_Cat_3      -0.5711761    0.188437     0.0024364   0.5648607
Category_2_Cat_7      -1.3216676    0.7154731    0.0647089   0.2666902
Category_2_Cat_8      -2.3378339    0.632952     0.0002212   0.0965365
currency_GBP           2.9857163    0.7010819    2.056E-05   19.80068
Duration_2_5_5         0.6854923    0.2099336    0.0010936   1.9847486
End Day_2_Mon          0.5395377    0.1944691    0.0055301   1.7152138
End Day_2_Thu         -1.9865052    0.6607946    0.002645    0.137174
ClosePrice             0.1392541    0.013746     0           1.1494161
OpenPrice             -0.1524749    0.0146576    0           0.8585804
Recommended settings for a competitive auction (based on the model above):
Currency        GBP
Duration        5 days
End day         Monday
Opening price   As low as possible
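The combined effect of these settings on the odds can be read off the Part j coefficients: switching several dummies from 0 to 1 multiplies the odds by exp of the sum of their coefficients (relative to the reference categories), and each one-unit drop in OpenPrice multiplies the odds by exp(0.1524749). A minimal sketch, holding all other predictors fixed; the helper name is illustrative.

```python
import math

# Coefficients of the stepwise model (Part j)
COEF = {
    "currency_GBP": 2.9857163,
    "Duration_2_5_5": 0.6854923,
    "End Day_2_Mon": 0.5395377,
    "OpenPrice": -0.1524749,
}

def odds_factor(dummies):
    """Multiplicative change in the odds of a competitive auction when each listed
    dummy switches from 0 to 1, all other predictors held fixed."""
    return math.exp(sum(COEF[name] for name in dummies))

combined = odds_factor(["currency_GBP", "Duration_2_5_5", "End Day_2_Mon"])
per_unit_lower_open = math.exp(-COEF["OpenPrice"])  # odds factor per 1-unit lower OpenPrice
print(f"GBP + 5-day + Monday: odds x {combined:.1f}")
print(f"each 1-unit lower opening price: odds x {per_unit_lower_open:.4f}")
```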