
CBA Batch 8/ TERM- II/ AY 2017-18

Statistical Analysis 2: Assignment 1


Ravinderpal S Wasu (71710004)

1. (a)
Reasons for disagreeing with Jack's comments:
- Although the R-squared value is low, the model can still provide some explanation through the other variables.
- The coefficients of the other predictor variables could still be statistically significant.
- The other variables could still indicate the trend and provide a good prediction.
- A low R-squared mainly affects the precision of the prediction.

(b) Situations such as the ones below could show a high R-squared.

- Weather forecasting data: temperature rising
- Speed of vehicles in the city: average speed decreasing due to increase in traffic
- Plotting the daily rise in temperature against the decrease in average speed might show a strong correlation and a good fit to a linear regression line, yet there need not be any cause-and-effect relationship between the two.
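
The point in 1(b) can be illustrated with a minimal sketch. The two series below are synthetic and purely illustrative (an assumption, not real weather or traffic data): both simply trend over time, and that alone is enough to give a high R-squared.

```python
import math

# Synthetic, illustrative data (an assumption, not measured values):
# temperature drifts up and average city speed drifts down over 30 days.
days = range(30)
temperature = [25.0 + 0.20 * d + 0.5 * math.sin(d) for d in days]
avg_speed = [40.0 - 0.30 * d + 0.8 * math.cos(2 * d) for d in days]

def r_squared(x, y):
    """R-squared of a simple linear regression of y on x (squared Pearson r)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

r2 = r_squared(temperature, avg_speed)
print(f"R-squared of speed on temperature: {r2:.3f}")
```

Neither series is generated from the other; the shared time trend is what drives the fit.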

(c) Various techniques are:

- Residuals vs Fitted plot
- Plot to check whether the data are normal or not
- Residuals plot: root values
- Residuals plot: standardized values

2. (a)
Name     SAT    Age   Tenure   MBA (Yes = 1, No = 0)   GRI
Bob      1042   35    5        0                       1
Putney   1355   32    2        1                       1

Summary from linear model regression

At the 95% confidence level, SAT and GRI are significant.

Name     RET
Bob      -2.642 + (0.005735*1042) + (-2.110*1) = 1.2287
Putney   -2.642 + (0.005735*1355) + (-2.110*1) = 3.0189

At the 80% confidence level, GRI, SAT, AGE and TENURE are significant.

Name     RET
Bob      -2.642 + (0.005735*1042) + (-0.0688*35) + (-0.1187*5) + (-2.110*1) = -1.7776
Putney   -2.642 + (0.005735*1355) + (-0.0688*32) + (-0.1187*2) + (-2.110*1) = 0.5799

(b) Putney is expected to obtain higher returns than Bob, as the predicted return values for Putney are higher than Bob's under both models.
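
The predictions in 2(a) can be checked with a short sketch. It follows the same approach as the answer: it plugs each manager's values into the printed (rounded) coefficients and keeps only the predictors treated as significant, rather than refitting a reduced model. The names `COEF`, `managers` and `predict_ret` are illustrative only. Because the coefficients are rounded, treat the output as a check on the arithmetic rather than as the reported figures.

```python
# Coefficients as printed in the answer (rounded values).
INTERCEPT = -2.642
COEF = {"SAT": 0.005735, "AGE": -0.0688, "TENURE": -0.1187, "GRI": -2.110}

managers = {
    "Bob": {"SAT": 1042, "AGE": 35, "TENURE": 5, "GRI": 1},
    "Putney": {"SAT": 1355, "AGE": 32, "TENURE": 2, "GRI": 1},
}

def predict_ret(manager, predictors):
    """Predicted RET using only the listed predictors."""
    return INTERCEPT + sum(COEF[p] * manager[p] for p in predictors)

results = {}
for name, values in managers.items():
    results[name] = {
        "95%": predict_ret(values, ["SAT", "GRI"]),
        "80%": predict_ret(values, ["SAT", "AGE", "TENURE", "GRI"]),
    }
    print(name, {level: round(ret, 4) for level, ret in results[name].items()})
```
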

3. (a) As seen above, at the 95% confidence level (5% significance level), SAT and GRI have a significant effect on RET.
If Bob had attended Princeton, his SAT score would probably have been 1355.
His RET would then be better: 3.0189 compared with his current RET of 1.2287.

(b) At the 10% significance level, GRI and SAT have a significant effect on RET.
If Bob managed a Growth fund instead of a Growth and Income fund (GRI = 0), his return would be:

RET Bob = -2.642 + (0.005735*1042) + (-2.110*0) = 3.3387

Hence, his return would be 3.3387 - 1.2287 = 2.11 higher (compared with the 95% model value).

4. (a)
The coefficient of MBA is 18.1% and has a negative impact.
Compared with the other coefficients this is high, but on its own it is not very high.
It indicates that managers with an MBA degree would perform worse than those without one, holding other factors such as Age, Tenure and SAT score constant.

(b) No, non-MBA managers are not taking comparatively more risks to get higher returns. The
coefficient would not be negative in the table.

5. (a)
As per the regression, the p-value of the Age coefficient is 10.005%.

So the lowest significance level at which Age is significant is 10.006%.

(b)
As per the regression, Age has a negative effect. Hence the probability of younger managers
delivering better returns is high.
Survivor bias would certainly influence (dampen) the effect seen in 5 (a).

6. (a) As MBA and Tenure are not significant at the 15% significance level, they are eliminated.

New Regression line is y = -2.5839+(-2.111*GRI)+(0.006242*SAT)+(-0.09595*AGE)

Linearity: In the Residuals vs Fitted plot, the red line is almost parallel to the horizontal axis, indicating
linearity.

Homoskedasticity: There is also very little change in the spread of the residuals across the fitted values. Hence our
assumption of constant variance in the residuals is supported.

(b) Regression 1: with all 5 variables

Regression 2: without MBA and Tenure

Observations:

i. Age has a negative effect on returns. Its p-value is lower here, 0.01 compared with
0.106 in the original regression.
ii. It is more impactful with the 5 variables

7. (a)
As per the R code output, Growth funds have returns that are 2.312% higher than Growth
and Income funds.

The variation in the residuals is also constant about the fitted line, with no trend.
This supports homoskedasticity.

(b) t-test using Excel

t-Test: Two-Sample Assuming Equal Variances

                                GFund       GIFund
Mean                            0.395924    -1.91593
Variance                        79.86433    57.7226
Observations                    327         213
Pooled Variance                 71.13933
Hypothesized Mean Difference    0
df                              538
t Stat                          3.112946
P(T<=t) one-tail                0.000975
t Critical one-tail             1.647691
P(T<=t) two-tail                0.001951
t Critical two-tail             1.964383

Two-tailed test

H0: mu_GR - mu_GRI = 0
H1: mu_GR - mu_GRI != 0

The null hypothesis is that the average returns of Growth funds and Growth and Income funds are the same.
We reject the null hypothesis, as the p-value of the two-tailed test is 0.00195, which is less
than the 5% significance level (0.05).

One-tailed test

H0: mu_GR - mu_GRI <= 0
H1: mu_GR - mu_GRI > 0

Here the null hypothesis is that the average return of Growth funds is less than or equal to the average
return of Growth and Income funds.
Here too we reject the null hypothesis at the 5% significance level, as the p-value is 0.000975.
This implies that Growth funds have better returns than Growth and Income funds.
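
The Excel figures in 7(b) can be reproduced from the summary statistics alone. The sketch below recomputes the pooled variance and t statistic of an equal-variance two-sample t-test; the critical value is taken from the Excel table rather than computed, and the variable names are illustrative.

```python
import math

# Summary statistics from the Excel output
mean_g, var_g, n_g = 0.395924, 79.86433, 327      # Growth funds
mean_gi, var_gi, n_gi = -1.91593, 57.7226, 213    # Growth and Income funds

# Equal-variance (pooled) two-sample t-test
df = n_g + n_gi - 2
pooled_var = ((n_g - 1) * var_g + (n_gi - 1) * var_gi) / df
std_err = math.sqrt(pooled_var * (1 / n_g + 1 / n_gi))
t_stat = (mean_g - mean_gi) / std_err

# Two-tail critical value at 5%, copied from the Excel table
T_CRIT_TWO_TAIL = 1.964383
reject_h0 = abs(t_stat) > T_CRIT_TWO_TAIL

print(f"df = {df}, pooled variance = {pooled_var:.4f}, t = {t_stat:.4f}")
print("Reject H0 at 5% (two-tailed):", reject_h0)
```
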

8. (a)
SAT score of a Princeton alum = 1355; GRI = 0 (Growth fund)

Estimated RET: y = -6.461 + (-2.278*0) + (0.006*1355) = 1.669

(b) Yes. At the 95% confidence level:

Standard error = 8.401
DF = 537
t-critical = 1.964 (TINV(0.05, 537))
LL = 1.669 - (1.964*8.40) = -14.8286
UL = 1.669 + (1.964*8.40) = 18.1666
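
The interval in 8(b) is the estimate plus or minus the critical t value times the standard error. A minimal sketch of that arithmetic follows, using the rounded standard error of 8.40 that appears in the LL and UL lines; the variable names are illustrative.

```python
# Values taken from the answer above
estimate = 1.669    # estimated RET for SAT = 1355, GRI = 0
std_error = 8.40    # standard error as used in the LL/UL calculation
t_crit = 1.964      # two-tailed critical value, TINV(0.05, 537)

margin = t_crit * std_error
lower = estimate - margin
upper = estimate + margin

print(f"95% interval: [{lower:.4f}, {upper:.4f}]")
```

The interval is wide relative to the estimate and includes zero.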

(c)

Below are the calculations for a return of 1.5%.

t-value for 1.5 = (1.5 - 1.673)/0.7232 = -0.239

P(t >= -0.239) = 1 - P(t < -0.239) = 0.5944

There is a 0.5944 probability (59.44%) that the mean return of the funds is greater than the
1.5% benchmark.
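
The probability in 8(c) can be checked approximately. The sketch below assumes that with 537 degrees of freedom the t distribution is close enough to the standard normal that the normal CDF can stand in for it. That substitution is an assumption made here to stay within the standard library, not the method used in the answer.

```python
import math

benchmark = 1.5       # benchmark return (%)
mean_return = 1.673   # estimated mean return (%)
std_error = 0.7232    # standard error of the mean return

t_value = (benchmark - mean_return) / std_error

def normal_cdf(z):
    """Standard normal CDF, used here as a large-df approximation to the t CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# P(T >= t_value) = 1 - P(T < t_value)
prob_above = 1.0 - normal_cdf(t_value)

print(f"t = {t_value:.3f}, P(mean return > 1.5%) is about {prob_above:.4f}")
```
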

9. A larger sample implies more degrees of freedom. The residual error reduces and the precision of
prediction improves. Overall, this leads to more efficiency in capturing the variance in the response
variable and in prediction.

10. (a)
Regressing Age on GRI as the only predictor, the significance of GRI is 0.05.
Age = 43.2446 + 1.732*GRI

With GRI = 0 for Growth funds and GRI = 1 for Growth and Income funds, this implies that the
average age of fund managers managing Growth and Income funds is higher by 1.732 years (all
other variables kept constant).

(b)
With GRI, TENURE and SAT held constant, taking the baseline GRI = 0, MBA = 0, TENURE = 0, SAT = 0:
Age = 32.186 + (1.424*GRI) + (-1.879*MBA) + (0.9424*TENURE) + (0.0077*SAT)
Average age of a fund manager with an MBA (MBA = 1, other predictors at the baseline) = 32.186 - 1.879 = 30.307
Observation: managers with an MBA are 1.879 years younger than those without one.
Held constant: GRI, SAT, TENURE

11.
As seen in the above regression, MBA is excluded as it is not significant at the 80% confidence level.
Tenure and Age have a negative impact on Return.
SAT has a positive impact on Return.

CONCLUSION:
Ms. Putney is the right choice for selection and is expected to deliver a higher rate of
return.

Q2. Nano Project
Identify a small problem related to day-to-day work, in which you want to either understand the
relationship between two variables or predict one of the variables. In either case, formulate
the problem you want to attack. Collect the necessary dataset to answer the question.
Apply the tools and techniques discussed in class (Regression Analysis). You have to discuss the
results in both a statistical and a business framework.

Please submit the problem description, data description, and R file used to analyse the data, along with
results and discussion. You may write the problem description, results, and discussion on paper
and submit a scanned copy of it. But you have to submit the data description and data file (in Excel,
csv or txt format) along with running R code.

Solution:
Driving from Hyderabad to Pune and from Pune to Hyderabad.

Description: The idea was to observe whether distance travelled depends on:

- Road conditions
- Traffic conditions
- Driver gender
- Stops taken on the route

Data captured and legend:

1) Driver – Male = 1, Female = 0
2) Road condition – Good = 1 and Bad = 0
3) Traffic condition – Heavy = 1 and Light = 0
4) Stops taken – Stopped = 1, No stop = 0

While driving to and fro, I collected data for the metrics below:

1) KM reading at every time interval (generally 15 mins)
2) Max speed touched in that 15-min stretch
3) Calculated average speed and distance covered in that 15-min stretch

Data Excel file: PuneTrip.xlsx

Linear regression performed with R code: SA2_Assignment1_71710004_NanoProject.R

Response variable: DistanceTravelled

Predictor variables: Driver + TrafficCond + RoadCond + Stop + X.Avg.Speed


Observations

Trip1 distance = 0.86020+(-2.22841*Stop)+(0.2205*X.Avg.Speed)


At the 10% significance level, only Stop and X.Avg.Speed are statistically significant.

The assumptions of linearity and homoscedasticity are satisfied.

The data are normal.

The regression was then rerun after dropping the coefficients that are not significant.

Conclusion:
The plots for the distance travelled are linear.
Average speed and stops are significant.

The variance in the residuals is constant, so homoscedasticity is satisfied.

Trip1 distance = 0.86020+(-2.22841*Stop)+(0.2205*X.Avg.Speed)

Trip2 distance = 1.859+(-3.493*Stop)+(0.22635*X.Avg.Speed)
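
The two fitted equations can be applied directly to a new 15-minute stretch. The sketch below is only an illustration: the function name `predict_distance` and the 60 average-speed input are hypothetical, and the units are assumed to be those of the columns in PuneTrip.xlsx.

```python
# Fitted coefficients from the two trip regressions above
TRIP_MODELS = {
    "Trip1": {"intercept": 0.86020, "stop": -2.22841, "avg_speed": 0.2205},
    "Trip2": {"intercept": 1.859, "stop": -3.493, "avg_speed": 0.22635},
}

def predict_distance(trip, avg_speed, stopped):
    """Predicted distance for one 15-minute stretch (stopped: 1 = stop, 0 = no stop)."""
    model = TRIP_MODELS[trip]
    return (model["intercept"]
            + model["stop"] * stopped
            + model["avg_speed"] * avg_speed)

# Hypothetical input: average speed of 60, with and without a stop
for trip in TRIP_MODELS:
    no_stop = predict_distance(trip, avg_speed=60, stopped=0)
    with_stop = predict_distance(trip, avg_speed=60, stopped=1)
    print(f"{trip}: no stop = {no_stop:.2f}, with stop = {with_stop:.2f}")
```

In both models a stop lowers the predicted distance by the size of the Stop coefficient, whatever the average speed.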
