
Pilgrim Bank Case - Group 5, Section D

The 95 percent confidence interval for mean profitability (Z = 1.96, n = 31634, s = 272.84) is:

108.496 < Population Mean < 114.5094

The margin of error around the mean is small (±3.01), so the sample mean can be assumed to be representative of the population mean.

Number of offline users: $n_1 = 27781$, $\bar{x}_1 = 110.79$, $s_1 = 271.301$
Number of online users: $n_2 = 3853$, $\bar{x}_2 = 116.67$, $s_2 = 283.66$

Null Hypothesis: $\mu_1 = \mu_2$
Alternative Hypothesis: $\mu_1 \neq \mu_2$

$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$$

Z = -1.212, and Z-critical at the 5% level of significance is ±1.96. Since Z lies in the acceptance region, the null hypothesis is not rejected. Hence, there is no significant difference in profitability between online and offline users.

We run a regression with online/offline as the independent variable and profitability as the dependent variable. The output summary is:

Regression Statistics
Multiple R            0.00705
R Square              4.97E-05
Adjusted R Square     1.81E-05
Standard Error        272.8369
Observations          31634

ANOVA
              Df      SS          MS          F           Significance F
Regression    1       117039.3    117039.3    1.572264    0.209887815
Residual      31632   2.35E+09    74439.99
Total         31633   2.35E+09

              Coefficients   Standard Error   t Stat      P-value
Intercept     110.7862       1.636956         67.67821    0
X Variable 1  5.880591       4.689842         1.253899    0.209888

The R Square value is very low, indicating low predictive power of the model. The F-value analysis is unable to reject the null hypothesis even at a level of significance of 0.2. The t-stat is also unable to reject the null hypothesis. These indicate that there is no significant relationship between online/offline status and profitability.

To find the relation between the demographics of the customers and their profitability, we perform a regression analysis. To enable the regression analysis, the following steps are taken:
1) Independent variables are online/offline, age, income, tenure and district.
2) Online/offline and age groups are well recoded as numbers.
3) Income and district are not recoded well. For example, income bucket 2 is 15000-19999, which is a range of 5000, but bucket 8 is 100000-124999, which is a range of 25000. So the difference between INC1 and INC2 is not the same as the difference between INC7 and INC8. For District, the numbers 1100, 1200 and 1300 do not specify any order.
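The confidence interval and the two-sample Z statistic reported earlier can be checked with a few lines of Python. This is only a sketch: the function names are illustrative, and because the write-up does not state the sample mean directly, it is taken here as the midpoint of the reported interval (an assumption).

```python
import math


def mean_confidence_interval(mean, s, n, z=1.96):
    """Return (low, high, margin) for a large-sample interval around the mean."""
    margin = z * s / math.sqrt(n)
    return mean - margin, mean + margin, margin


def two_sample_z(x1, s1, n1, x2, s2, n2, hypothesized_diff=0.0):
    """Z statistic for the difference of two means with unpooled variances."""
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return ((x1 - x2) - hypothesized_diff) / standard_error


# Assumption: the sample mean is the midpoint of the reported interval.
sample_mean = (108.496 + 114.5094) / 2

low, high, margin = mean_confidence_interval(sample_mean, s=272.84, n=31634)

# Offline users are group 1, online users are group 2, as in the text.
z = two_sample_z(110.79, 271.301, 27781, 116.67, 283.66, 3853)

print(f"margin of error: {margin:.2f}")
print(f"interval: {low:.3f} to {high:.3f}")
print(f"Z: {z:.3f}, reject at 5%: {abs(z) > 1.96}")
```

With the figures quoted in the write-up, the margin comes out close to 3.01 and Z close to -1.212, which agrees with the reported values.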

To take care of these ordinal and categorical variables, we introduce dummy variables. The data columns are 9Profit, 9Online, 9Age, Inc1 to Inc6, 9Tenure, District 1100 and District 1200.

Summary Output:

Regression Statistics
Multiple R            0.238754288
R Square              0.05700361
Adjusted R Square     0.056548656
Standard Error        274.7420229
Observations          22812

ANOVA
              Df      SS            MS            F             Significance F
Regression    11      104034494.7   9457681.333   125.2952173   2.2721E-280
Residual      22800   1721016485    75483.17916
Total         22811   1825050979

                Coefficients    Standard Error   t Stat         P-value
Intercept       45.14865564     7.602563975      5.938609105    2.91616E-09
Online          18.48084654     5.513874201      3.351698982    0.000804478
Age             18.34772056     1.248717884      14.6932472     1.18902E-48
Inc1            -97.65083161    7.173800148      -13.61214832   4.95138E-42
Inc2            -96.57716494    10.39736316      -9.288620919   1.69877E-20
Inc3            -87.47907503    6.370705674      -13.73145763   9.73548E-43
Inc4            -86.43948587    6.677232896      -12.94540526   3.40129E-38
Inc5            -81.75628053    6.561064698      -12.46082523   1.59396E-35
Inc6            -57.63190799    4.914322483      -11.72733539   1.13843E-31
Tenure          4.10893714      0.235904993      17.41776247    1.65591E-67
District 1100   -6.805374381    7.788332957      0.873790889    0.382241346
District 1200   15.31440957     5.543695045      2.762491344    0.005740803

Observation:
1) The ANOVA analysis shows that the model is significant, with a very low p-value.
2) The t-stat analysis yields p-values which show that all the variables except District 1100 are significant at α = 0.05.
3) Residual plots for each variable were observed and no non-linear relation was evident.
4) The R-Square value of the model is very low at only 5.7%. However, the model can still be useful because the probability skew graph shows that 50% of customers are unprofitable. Using this model we can predict a few unprofitable customers and hence try to make them profitable.
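The dummy-variable step can be sketched in Python as follows. This is an illustration only: the function name and the sample rows are hypothetical, and treating district 1300 as the baseline is an assumption inferred from the fact that the output lists only District 1100 and District 1200.

```python
def dummy_encode(values, levels, baseline):
    """Return one 0/1 indicator column per level, omitting the baseline level."""
    columns = {}
    for level in levels:
        if level == baseline:
            continue  # the baseline is represented by all indicators being 0
        columns[level] = [1 if value == level else 0 for value in values]
    return columns


# Hypothetical district codes for five customers (not taken from the case data).
districts = [1100, 1300, 1200, 1200, 1300]

# Assumption: district 1300 is the omitted baseline category.
encoded = dummy_encode(districts, levels=[1100, 1200, 1300], baseline=1300)

for level, column in encoded.items():
    print(f"District {level}: {column}")
```

Leaving one level out avoids perfect collinearity with the intercept, so each district coefficient is read as a difference in profit relative to the omitted district.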
