
Prateek Shukla

122065

ATBA HW
Collaborated with: Navneet, Harish

Sol 8.1 : BANK


Part a
Estimated Equations

The Regression Model

Input variables        Coefficient     Std. Error     p-value      Odds
Constant term          -14.1875391      6.12205267    0.02047934   *
TotExp/Assets           79.96391296    39.26251602    0.04168537   *
TotLns&Lses/Assets       9.17319965     6.86388016    0.1814038    9635.410156

a) Logit as a function of predictors

Logit = -14.1875391 + 79.96391296 * TotExp/Assets + 9.17319965 * TotLns&Lses/Assets

b) Odds as a function of predictors

Odds(Financial condition = 1) = e^(-14.1875391 + 79.96391296 * TotExp/Assets + 9.17319965 * TotLns&Lses/Assets)

Here, Financial condition = 1 means that the bank is financially weak.

c) Probability as a function of predictors

P(Financial condition = 1) = odds / (1 + odds)
= e^(-14.1875391 + 79.96391296 * TotExp/Assets + 9.17319965 * TotLns&Lses/Assets) / (1 + e^(-14.1875391 + 79.96391296 * TotExp/Assets + 9.17319965 * TotLns&Lses/Assets))

Here, Financial condition = 1 means that the bank is financially weak.
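The three functions above can be written as a small Python sketch (coefficient values are taken from the regression table; the function and argument names are illustrative):

```python
import math

# Coefficients from the regression table above
B0 = -14.1875391   # constant term
B1 = 79.96391296   # TotExp/Assets
B2 = 9.17319965    # TotLns&Lses/Assets

def logit(tot_exp_assets, tot_lns_lses_assets):
    """Log-odds that Financial condition = 1 (financially weak)."""
    return B0 + B1 * tot_exp_assets + B2 * tot_lns_lses_assets

def odds(tot_exp_assets, tot_lns_lses_assets):
    """Odds of being financially weak: e^logit."""
    return math.exp(logit(tot_exp_assets, tot_lns_lses_assets))

def prob(tot_exp_assets, tot_lns_lses_assets):
    """Probability of being financially weak: odds / (1 + odds)."""
    o = odds(tot_exp_assets, tot_lns_lses_assets)
    return o / (1 + o)
```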

Part b)

Input variables        Coefficient    Value for new bank
Constant term          -14.187539
TotExp/Assets           79.963913     0.11
TotLns&Lses/Assets       9.1731997    0.6

logit = 0.112411
odds  = 1.118973
prob  = 0.528073

Since the probability (0.528) is above the cutoff value (0.5), the bank is classified as financially weak.
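The new bank's scores can be checked numerically; a minimal sketch using the coefficients and input values from the table above:

```python
import math

# New bank's predictor values (from the table above)
tot_exp_assets = 0.11
tot_lns_lses = 0.6

logit = -14.187539 + 79.963913 * tot_exp_assets + 9.1731997 * tot_lns_lses
odds = math.exp(logit)
prob = odds / (1 + odds)

# prob is about 0.528 > 0.5, so the bank is classified as financially weak
classification = "financially weak" if prob > 0.5 else "financially strong"
```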
Part c) Cutoff Value
Cutoff value based on probability: c = 0.5
Cutoff value based on odds: c / (1 - c) = 1
Cutoff value based on logit: log(odds) = 0
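These cutoff conversions are a one-liner each; a quick sketch:

```python
import math

c = 0.5                               # cutoff on the probability scale
odds_cutoff = c / (1 - c)             # same cutoff on the odds scale
logit_cutoff = math.log(odds_cutoff)  # same cutoff on the logit scale
# c = 0.5 gives an odds cutoff of 1 and a logit cutoff of 0, as stated above
```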

Part d)
If we increase TotExp/Assets by 1, keeping TotLns&Lses/Assets constant, the odds of being financially weak will increase by a factor of e^79.963913.

Part e)
Here, if we decrease the cutoff value of the probability, the error due to misclassification of financially
strong bank as weak is reduced. Thus reducing the misclassification costs. Hence, the cutoff value should
be decreased.


Sol 8.4 : EBAY


Part a)
Categorical predictors were combined by examining the average of the binary outcome variable (Competitive?) for each level, using a pivot table.
11 variables were created from Category:
Category               Count of Category   Average of Competitive?   Merged name
Antique/Art/Craft      177                 0.564971751               Cat_1
Automotive             178                 0.353932584               Cat_2
Books                  54                  0.5                       Cat_3
Business/Industrial    18                  0.666666667               Cat_4
Clothing/Accessories   119                 0.504201681               Cat_3
Coins/Stamps           37                  0.297297297               Cat_5
Collectibles           239                 0.577405858               Cat_1
Computer               36                  0.666666667               Cat_4
Electronics            55                  0.8                       Cat_6
EverythingElse         17                  0.235294118               Cat_7
Health/Beauty          64                  0.171875                  Cat_8
Home/Garden            102                 0.656862745               Cat_4
Jewelry                82                  0.365853659               Cat_2
Music/Movie/Game       403                 0.602977667               Cat_9
Photography            13                  0.846153846               Cat_10
Pottery/Glass          20                  0.35                      Cat_2
SportingGoods          124                 0.725806452               Cat_11
Toys/Hobbies           234                 0.52991453                Cat_3

4 variables were created from Duration:

Duration   Count of Duration   Average of Competitive?   Merged name
1          23                  0.52173913                1_10
3          213                 0.450704225               3_7
5          466                 0.686695279               5_5
7          967                 0.489141675               3_7
10         303                 0.544554455               1_10

5 variables were created from endDay:

endDay   Count of endDay   Average of Competitive?   Merged name
Sun      338               0.485207101               Sun_wed
Mon      548               0.673357664               Mon
Tue      171               0.532163743               Tue
Wed      75                0.48                      Sun_wed
Thu      202               0.603960396               Thu
Fri      287               0.466898955               Fri_Sat
Sat      351               0.427350427               Fri_Sat
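The pivot-table step can be reproduced programmatically. A minimal pure-Python sketch on hypothetical (Category, Competitive?) records; the records here are illustrative, not the actual eBay data:

```python
from collections import defaultdict

# Hypothetical (Category, Competitive?) records, for illustration only
records = [
    ("Electronics", 1), ("Electronics", 1), ("Electronics", 0),
    ("Health/Beauty", 0), ("Health/Beauty", 0), ("Health/Beauty", 1),
]

# Pivot: count and average of the binary outcome per category
groups = defaultdict(list)
for category, competitive in records:
    groups[category].append(competitive)

pivot = {cat: (len(vals), sum(vals) / len(vals)) for cat, vals in groups.items()}
# Categories with similar averages are then merged (e.g. into Cat_1 ... Cat_11)
```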


Part b)
Logistic regression was run after partitioning the dataset. The total number of variables was 25.
The Regression Model

Input variables       Coefficient    Std. Error   p-value      Odds          90% Confidence Interval
Constant term         -0.44009131    0.27366763   0.10780817   0.643977617   (0.613180812, 0.674774421)
Category_2_Cat_10     -0.79104191    1.10404444   0.47368601   0.45337218    (0.07375304, 2.78695416)
Category_2_Cat_11     -1.3078928     0.46053472   0.00451215   0.27038923    (0.12676696, 0.57673019)
Category_2_Cat_2      -0.86424637    0.27521288   0.00168785   0.42136899    (0.26795635, 0.6626147)
Category_2_Cat_3      -0.60280806    0.22884195   0.00843438   0.54727268    (0.37560415, 0.79740179)
Category_2_Cat_4      -0.22523223    0.3231107    0.48575616   0.79833078    (0.46921137, 1.35830486)
Category_2_Cat_5      -1.30768478    0.66424638   0.0489905    0.27044547    (0.09069322, 0.80646324)
Category_2_Cat_6       0.54232538    0.54925162   0.32345164   1.72000182    (0.69690031, 4.24509287)
Category_2_Cat_7      -1.61604917    0.72925997   0.02669066   0.19868211    (0.0598703, 0.65933508)
Category_2_Cat_8      -2.5017364     0.63963985   0.00009185   0.08194258    (0.02861425, 0.23465879)
Category_2_Cat_9      -0.16735646    0.22549526   0.45798263   0.84589803    (0.58376133, 1.22574663)
currency_GBP           3.23823524    0.72436929   0.00000781   25.48870277   (7.74272871, 83.90763092)
currency_US            0.39725608    0.24818772   0.10945947   1.48773682    (0.98908371, 2.23778939)
sellerRating          -0.00002281    0.00001591   0.15177883   0.99997717    (0.999951, 1.00000334)
Duration_2_3_7        -0.12177611    0.27831584   0.66171509   0.88534659    (0.56014204, 1.39935672)
Duration_2_5_5         0.55042392    0.3270472    0.09237305   1.73398793    (1.01255739, 2.96942592)
End Day_2_Mon          0.50146151    0.22710043   0.02723699   1.65113258    (1.13645589, 2.3988955)
End Day_2_Sun_wed     -0.26748273    0.21167931   0.2063656    0.76530355    (0.54028195, 1.08404422)
End Day_2_Thu         -1.93684411    0.68460619   0.00466738   0.14415817    (0.04675094, 0.44451681)
End Day_2_Tue         -0.09799619    0.28015536   0.72649455   0.90665233    (0.5718888, 1.43737459)
ClosePrice             0.14427601    0.01442795   0            1.15520287    (1.12811053, 1.18294597)
OpenPrice             -0.15849413    0.01547789   0            0.85342801    (0.83197492, 0.87543422)

Residual df                   1161
Residual Dev.                 1097.416016
% Success in training data    53.33896872
# Iterations used             11
Multiple R-squared            0.32867715
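The Odds and 90% confidence-interval columns can be reproduced from the coefficient and standard error (odds = e^coef, CI = e^(coef ± z*SE) with z about 1.645 for a 90% interval). A sketch, checked against the currency_GBP row above:

```python
import math

Z90 = 1.6448536  # two-sided 90% standard-normal quantile

def odds_and_ci(coef, se, z=Z90):
    """Odds ratio e^coef and its 90% confidence interval."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)

# currency_GBP row from the table above
odds, lo, hi = odds_and_ci(3.23823524, 0.72436929)
# matches the reported 25.4887 with CI (7.7427, 83.9076)
```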

Part c) Logistic regression excluding closing price

The Regression Model

Input variables       Coefficient    Std. Error   p-value      Odds
Constant term          0.3490935     0.22879937   0.12706903   *
Category_2_Cat_10      1.13370371    0.82462531   0.16919015   3.10714316
Category_2_Cat_11      0.4893876     0.32210025   0.12867084   1.6313169
Category_2_Cat_2      -0.84495026    0.2280754    0.00021164   0.42957872
Category_2_Cat_3      -0.43912375    0.19728073   0.02602204   0.64460099
Category_2_Cat_4       0.02400566    0.27618703   0.9307366    1.02429616
Category_2_Cat_5      -1.14571631    0.51691252   0.02665997   0.31799605
Category_2_Cat_6       0.8775385     0.46306103   0.05808158   2.40497255
Category_2_Cat_7      -1.58548605    0.62581003   0.01129316   0.20484819
Category_2_Cat_8      -2.71096039    0.55936682   0.00000126   0.06647293
Category_2_Cat_9      -0.43224972    0.20738421   0.03713341   0.64904726
currency_GBP           1.89229739    0.48742363   0.0001035    6.63459349
currency_US           -0.1914172     0.19648322   0.32994908   0.82578802
sellerRating          -0.00004362    0.00001338   0.0011125    0.99995637
Duration_2_3_7         0.07272565    0.21553141   0.73579663   1.0754354
Duration_2_5_5         0.91389531    0.26471996   0.00055581   2.49401855
End Day_2_Mon          0.7601704     0.19846725   0.00012804   2.13864064
End Day_2_Sun_wed     -0.09534682    0.178409     0.59304523   0.90905762
End Day_2_Thu         -0.92989129    0.4477064    0.03780052   0.39459661
End Day_2_Tue          0.11647422    0.23402157   0.61869043   1.1235286
OpenPrice             -0.00549316    0.00296826   0.06422202   0.99452192

Comparison between the models (Parts b & c)

PART B - Classification Confusion Matrix (validation)
           Predicted 1   Predicted 0
Actual 1   319           116
Actual 0   73            281

PART B - Error Report
Class     # Cases   # Errors   % Error
1         435       116        26.67
0         354       73         20.62
Overall   789       189        23.95

PART C - Classification Confusion Matrix (validation)
           Predicted 1   Predicted 0
Actual 1   299           136
Actual 0   160           194

PART C - Error Report
Class     # Cases   # Errors   % Error
1         435       136        31.26
0         354       160        45.20
Overall   789       296        37.52

On comparing the errors, model (b) is better, with lower error rates for both classes.
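The error reports above follow directly from the confusion matrices; a small sketch (class 1 = competitive auction):

```python
def error_report(n11, n10, n01, n00):
    """% error per class and overall from a confusion matrix.
    n11 = actual 1 predicted 1, n10 = actual 1 predicted 0,
    n01 = actual 0 predicted 1, n00 = actual 0 predicted 0."""
    e1 = 100 * n10 / (n11 + n10)     # class-1 error
    e0 = 100 * n01 / (n01 + n00)     # class-0 error
    overall = 100 * (n10 + n01) / (n11 + n10 + n01 + n00)
    return round(e1, 2), round(e0, 2), round(overall, 2)

print(error_report(319, 116, 73, 281))   # Part b -> (26.67, 20.62, 23.95)
print(error_report(299, 136, 160, 194))  # Part c -> (31.26, 45.2, 37.52)
```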


[Validation lift and decile charts for PART B and PART C]

On comparing the validation lift charts (fit), model (b) is again better: its lift curve lies well above the average (baseline) line. The decile-chart performance of model (b) is also better than that of model (c).
Part d)

Input variable   Coefficient   Std. Error   p-value   Odds        90% Confidence Interval
ClosePrice       0.144276      0.014428     0         1.1552029   (1.1281105, 1.182946)

The coefficient for ClosePrice is 0.144276, indicating that if we increase ClosePrice by 1, keeping all other predictors constant, the odds of the auction being competitive increase by a factor of e^0.144276, or 1.1552029. The p-value is very low, indicating that ClosePrice is statistically significant in the model, and we reject the hypothesis that its coefficient is zero. The 90% CI for the odds ratio is also practically tight (between 1.1281 and 1.1829). ClosePrice has the sixth-largest impact on the odds among the predictors. Practically, too, ClosePrice can be an important determinant of auction competitiveness.
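As a numerical illustration of this odds multiplier (the starting probability of 0.5 is hypothetical):

```python
import math

odds_factor = math.exp(0.144276)  # about 1.1552, the reported odds multiplier

# Hypothetical auction currently at p = 0.5 (odds = 1); a unit increase in
# ClosePrice, other predictors held constant, moves the probability to:
new_odds = (0.5 / (1 - 0.5)) * odds_factor
new_p = new_odds / (1 + new_odds)  # about 0.536
```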

Part e) Best fit to training data
Stepwise and exhaustive searches were used to create the best subset. The criterion for best-subset selection was Mallows' Cp (closest to the number of coefficients).
Variables finally used in the models:

EXHAUSTIVE SEARCH: Constant term, Category_2_Cat_11, Category_2_Cat_2, Category_2_Cat_3, Category_2_Cat_5, Category_2_Cat_7, Category_2_Cat_8, currency_GBP, Duration_2_5_5, End Day_2_Mon, End Day_2_Thu, ClosePrice, OpenPrice

STEPWISE SELECTION: Constant term, Category_2_Cat_11, Category_2_Cat_2, Category_2_Cat_3, Category_2_Cat_7, Category_2_Cat_8, currency_GBP, Duration_2_5_5, End Day_2_Mon, End Day_2_Thu, ClosePrice, OpenPrice

Comparing fit:
The residual deviance of the exhaustive model was slightly lower and its multiple R-squared slightly higher, indicating a better overall fit. On the training-data lift charts, the exhaustive model also had slightly higher decile-to-global-mean ratios in the initial deciles. Thus the exhaustive model fits the training data better.

Exhaustive Search
Decile   Mean        Std.Dev.
1        0.9830508   0.1290809
2        0.9322034   0.2655668
3        0.940678    0.2362264
4        0.6610169   0.4733641
5        0.5338983   0.4988496
6        0.6340042   0.48362
7        0.0863347   0.2785075
8        0.3305085   0.4703962
9        0.1355932   0.3332136
10       0.107438    0.3096693

Residual Dev.        1106.0862
Multiple R-squared   0.3233733

Stepwise Selection
Decile   Mean        Std.Dev.
1        0.9830508   0.1290809
2        0.9237288   0.2786319
3        0.9491525   0.2196861
4        0.6694915   0.4734399
5        0.5508475   0.4982734
6        0.6133475   0.487722
7        0.0935734   0.2907412
8        0.3269774   0.467282
9        0.1271186   0.3234802
10       0.107438    0.3096693

Residual Dev.        1110.1195
Multiple R-squared   0.320906

[Training lift and decile charts: Exhaustive Search vs. Stepwise Selection]

Part f)

Exhaustive Search - Validation Data Scoring, Classification Confusion Matrix
           Predicted 1   Predicted 0
Actual 1   318           117
Actual 0   86            268

Exhaustive Search - Error Report
Class     # Cases   # Errors   % Error
1         435       117        26.90
0         354       86         24.29
Overall   789       203        25.73

Stepwise Selection - Validation Data Scoring, Classification Confusion Matrix
           Predicted 1   Predicted 0
Actual 1   320           115
Actual 0   83            271

Stepwise Selection - Error Report
Class     # Cases   # Errors   % Error
1         435       115        26.44
0         354       83         23.45
Overall   789       198        25.10

On comparing the errors, the stepwise selection model is slightly better, with lower errors for both classes as well as overall. (Refer to part e for the predictors used.)
Part g) Dangers of the best predictive model (Stepwise)

- The overall error is on the higher side (25.1%).
- Errors for both classes are high, so the model will be costly to use if the cost of mistakes is high.
- The model is not totally free of the random idiosyncrasies of the validation dataset and may not work equally well on data outside it, as the dataset covers only a specific period.

Part h) The best-fitting model and the best predictive model are different, and the criterion for selecting each is different. The best-fitting model is chosen based on performance on the training dataset, while the best predictive model is chosen based on performance on the validation dataset. A highly overfitted model may look like the best fit on the training dataset yet perform poorly on the validation dataset.
Part i) Accurate Classification (using the stepwise model)
On varying the cutoff probability and computing the % error for each cutoff value, the overall error was found to be minimum (23.45%) at cutoff = 0.38. At this cutoff, the error for class 0 is higher than that for class 1.

Cutoff Prob. Val.   % Error     % Error     % Error
for Success         (Class 1)   (Class 0)   (Overall)
0.1                  2.76       88.14       41.06
0.2                  4.83       77.12       37.26
0.3                  9.43       59.89       32.07
0.33                11.95       57.06       32.19
0.36                14.02       46.05       28.39
0.37                15.17       36.44       24.71
0.38                16.32       32.20       23.45
0.39                17.01       31.92       23.70
0.4                 18.16       31.64       24.21
0.5                 26.44       23.45       25.10
0.6                 34.02       15.25       25.60
0.7                 43.91        4.24       26.11
0.8                 55.17        3.67       32.07
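The cutoff sweep can be reproduced with a short loop; a sketch on hypothetical scored records (predicted probability, actual class), not the actual validation data:

```python
# Hypothetical (predicted probability, actual class) pairs, for illustration
scored = [(0.9, 1), (0.7, 1), (0.45, 1), (0.6, 0), (0.35, 0), (0.1, 0)]

def pct_errors(scored, cutoff):
    """% error for class 1, class 0, and overall when predicting
    class 1 whenever the predicted probability >= cutoff."""
    e1 = sum(1 for p, y in scored if y == 1 and p < cutoff)
    n1 = sum(1 for _, y in scored if y == 1)
    e0 = sum(1 for p, y in scored if y == 0 and p >= cutoff)
    n0 = sum(1 for _, y in scored if y == 0)
    return 100 * e1 / n1, 100 * e0 / n0, 100 * (e1 + e0) / (n1 + n0)

for c in (0.3, 0.38, 0.5, 0.7):
    print(c, [round(e, 2) for e in pct_errors(scored, c)])
```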

Part j)
The Regression Model

Input variables       Coefficient   Std. Error   p-value     Odds
Constant term         -0.5004671    0.1240727    5.491E-05   *
Category_2_Cat_11     -1.335864     0.4328027    0.002025    0.2629309
Category_2_Cat_2      -0.787796     0.2410092    0.0010803   0.4548462
Category_2_Cat_3      -0.5711761    0.188437     0.0024364   0.5648607
Category_2_Cat_7      -1.3216676    0.7154731    0.0647089   0.2666902
Category_2_Cat_8      -2.3378339    0.632952     0.0002212   0.0965365
currency_GBP           2.9857163    0.7010819    2.056E-05   19.80068
Duration_2_5_5         0.6854923    0.2099336    0.0010936   1.9847486
End Day_2_Mon          0.5395377    0.1944691    0.0055301   1.7152138
End Day_2_Thu         -1.9865052    0.6607946    0.002645    0.137174
ClosePrice             0.1392541    0.013746                 1.1494161
OpenPrice             -0.1524749    0.0146576                0.8585804

In order to have a competitive auction, the following conditions would be helpful:

- Currency: GBP
- Duration: 5 days
- End day: Monday
- Opening price: as low as possible
