IIMB Term 1 DS Dean Answers

1. Identify the variables that should be used by Easwaran Iyer for admitting students.
To identify the variables Easwaran should use, we run a regression analysis with salary as the dependent variable.
The following properties of the data are known:

Variable               Type          Data Points Missing?
Gender                 Categorical   No
Percent_SSC            Numerical     No
Board_SSC              Categorical   No
Percent_HSC            Numerical     No
Board_HSC              Categorical   No
Stream_HSC             Categorical   No
Percent_Degree         Numerical     No
Course_Degree          Categorical   No
Experience_Yrs         Interval      No
Entrance_Test          Categorical   Yes
Percentile_ET          Numerical     Yes
Percent_MBA            Numerical     No
Specialization_MBA     Categorical   No
Marks_Communication    Numerical     No
Marks_Projectwork      Numerical     No
Marks_BOCA             Numerical     No
Placement              Categorical   No
Salary                 Numerical     No
The correlation matrix of all the numerical variables (except salary, which is
the dependent variable) is given below. It was computed in R.
Variable                   (1)   (2)   (3)   (4)   (5)   (6)   (7)   (8)   (9)
(1) Experience_Yrs        100%   18%   14%    6%    5%   -6%   15%   21%    1%
(2) Marks_Projectwork      18%  100%   18%    4%   13%   23%   28%   35%    9%
(3) Marks_BOCA             14%   18%  100%   34%   15%   28%   19%   43%   29%
(4) Percentile_ET           6%    4%   34%  100%   17%    9%   19%   30%   32%
(5) Percent_HSC             5%   13%   15%   17%  100%   31%   23%   32%   33%
(6) Percent_Degree         -6%   23%   28%    9%   31%  100%   35%   41%   33%
(7) Marks_Communication    15%   28%   19%   19%   23%   35%  100%   73%   44%
(8) Percent_MBA            21%   35%   43%   30%   32%   41%   73%  100%   49%
(9) Percent_SSC             1%    9%   29%   32%   33%   33%   44%   49%  100%
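To read the matrix more quickly, the off-diagonal pairs can be ranked by absolute correlation. The sketch below hard-codes the percentages from the matrix above; the names NAMES, CORR and top_pairs are illustrative and are not part of the original R output.

```python
# Correlation percentages copied from the matrix above
# (rows and columns are in the same order as NAMES).
NAMES = [
    "Experience_Yrs", "Marks_Projectwork", "Marks_BOCA", "Percentile_ET",
    "Percent_HSC", "Percent_Degree", "Marks_Communication", "Percent_MBA",
    "Percent_SSC",
]
CORR = [
    [100, 18, 14, 6, 5, -6, 15, 21, 1],
    [18, 100, 18, 4, 13, 23, 28, 35, 9],
    [14, 18, 100, 34, 15, 28, 19, 43, 29],
    [6, 4, 34, 100, 17, 9, 19, 30, 32],
    [5, 13, 15, 17, 100, 31, 23, 32, 33],
    [-6, 23, 28, 9, 31, 100, 35, 41, 33],
    [15, 28, 19, 19, 23, 35, 100, 73, 44],
    [21, 35, 43, 30, 32, 41, 73, 100, 49],
    [1, 9, 29, 32, 33, 33, 44, 49, 100],
]


def top_pairs(names, corr, k=3):
    """Return the k variable pairs with the largest absolute correlation (in %)."""
    pairs = [
        (abs(corr[i][j]), names[i], names[j])
        for i in range(len(names))
        for j in range(i + 1, len(names))  # upper triangle only, skip the diagonal
    ]
    pairs.sort(reverse=True)
    return [(a, b, r) for r, a, b in pairs[:k]]


for a, b, r in top_pairs(NAMES, CORR):
    print(f"{a} vs {b}: {r}%")
```

The strongest pair is Marks_Communication and Percent_MBA at 73%, which is worth keeping in mind if both are ever considered as predictors together.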
We also observed that a few data points are missing for the entrance test and
the entrance test percentile. However, our analysis concerns the salary of the
placed students, and no data points are missing for those students. The
regression therefore uses only the records of students who were placed.
After these preliminary steps, let's estimate the regression parameters. We
used StatTools for this purpose. We excluded the Placement variable because,
with only placed students in the sample, it adds no new information.
The output is as follows:
Regression Table                            Coefficient   Standard Error   t-Value   p-Value
Constant                                    121087.1      49303.35         2.456     0.0147
Marks_Communication                         2620.877      638.9087         4.102     < 0.0001
Gender (F)                                  42246.35      12486.27         3.383     0.0008
Experience_Yrs                              20264.24      8333.768         2.432     0.0157
Specialization_MBA (Marketing & Finance)    10182.79      32686.56         0.312     0.7557
Specialization_MBA (Marketing & HR)         21459.02      33046.8          0.649     0.5167
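As a quick consistency check on the table, each t-value should equal the coefficient divided by its standard error. The sketch below recomputes them from the figures above; the names ROWS and t_value are illustrative only.

```python
# (coefficient, standard error, reported t-value) copied from the regression table above.
ROWS = {
    "Constant": (121087.1, 49303.35, 2.456),
    "Marks_Communication": (2620.877, 638.9087, 4.102),
    "Gender (F)": (42246.35, 12486.27, 3.383),
    "Experience_Yrs": (20264.24, 8333.768, 2.432),
    "Specialization_MBA (Marketing & Finance)": (10182.79, 32686.56, 0.312),
    "Specialization_MBA (Marketing & HR)": (21459.02, 33046.8, 0.649),
}


def t_value(coefficient, std_error):
    """t-statistic for the null hypothesis that the coefficient is zero."""
    return coefficient / std_error


for name, (coef, se, reported) in ROWS.items():
    print(f"{name}: t = {t_value(coef, se):.3f} (reported {reported})")
```

All six recomputed values agree with the reported ones to three decimals, which suggests the table was reconstructed in the right row order.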
However, when we diagnosed the model, we found that it is not valid because
the errors are not normally distributed.
We therefore need to change the model. Let's try ln(salary) as the dependent
variable.
The output with this change is as follows:
Regression Table                            Coefficient   Standard Error   t-Value     p-Value    VIF           R-Square
Constant                                    11.985        0.1542           77.712078   < 0.0001
Marks_Communication                         0.0086        0.002            4.3008496   < 0.0001   1.07762809    0.072036067
Gender (F)                                  -0.133        0.0392           3.381449    0.0008     1.082456251   0.076175135
Course_Degree (Engineering)                 0.1632        0.061            2.6734392   0.0080     1.028763158   0.027958969
Specialization_MBA (Marketing & Finance)    0.0417        0.1025           0.4064739   0.6847     8.513941538   0.882545588
Specialization_MBA (Marketing & HR)         -0.077        0.104            0.736492    0.4621     8.570918457   0.88332639
(VIF and R-Square are the multicollinearity-checking columns.)
Let's check the validity of the model:

Errors are normal, as seen in the histogram of residuals:
[Figure: Histogram of Residuals; y-axis: Frequency]
The residual fit diagram also validates the model. There is no pattern and the
points are randomly distributed. Clearly, heteroscedasticity is absent.
[Figure: Scatterplot of Fit vs ln(salary); x-axis: ln(salary), y-axis: Fit]
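The multicollinearity columns of the ln(salary) regression table are linked by the standard identity VIF = 1 / (1 - R-Square), where R-Square comes from regressing each predictor on the others. The sketch below checks the reported figures against that identity; CHECKS and vif are illustrative names.

```python
# (R-Square, reported VIF) from the multicollinearity columns of the ln(salary) regression.
CHECKS = {
    "Marks_Communication": (0.072036067, 1.07762809),
    "Gender (F)": (0.076175135, 1.082456251),
    "Course_Degree (Engineering)": (0.027958969, 1.028763158),
    "Specialization_MBA (Marketing & Finance)": (0.882545588, 8.513941538),
    "Specialization_MBA (Marketing & HR)": (0.88332639, 8.570918457),
}


def vif(r_squared):
    """Variance inflation factor from the auxiliary-regression R-Square."""
    return 1.0 / (1.0 - r_squared)


for name, (r2, reported) in CHECKS.items():
    print(f"{name}: VIF = {vif(r2):.4f} (reported {reported})")
```

The two specialization dummies have VIFs of about 8.5, well above the other predictors. A common rule of thumb treats values above 5 or 10 as a sign of multicollinearity, so these two variables overlap heavily with each other.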
Also, the output shows that the Specialization_MBA variables are not
statistically significant at the 5% significance level (95% confidence level).
Thus, the new solution is:
Regression Table               Coefficient   Standard Error   t-Value     p-Value    VIF           R-Square
Constant                       11.985        0.1542           77.712078   < 0.0001
Marks_Communication            0.0086        0.002            4.3008496   < 0.0001   1.07762809    0.072036067
Gender (F)                     -0.133        0.0392           3.381449    0.0008     1.082456251   0.076175135
Course_Degree (Engineering)    0.1632        0.061            2.6734392   0.0080     1.028763158   0.027958969
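The selection step that drops the specialization dummies amounts to keeping only predictors whose p-value is below 0.05. A minimal sketch, using the p-values reported for the ln(salary) model; P_VALUES, ALPHA and significant are illustrative names, and "< 0.0001" is stored as its upper bound.

```python
ALPHA = 0.05  # 5% significance level used in the text

# p-values from the ln(salary) regression; "< 0.0001" is stored as 0.0001 (an upper bound).
P_VALUES = {
    "Marks_Communication": 0.0001,
    "Gender (F)": 0.0008,
    "Course_Degree (Engineering)": 0.0080,
    "Specialization_MBA (Marketing & Finance)": 0.6847,
    "Specialization_MBA (Marketing & HR)": 0.4621,
}


def significant(p_values, alpha=ALPHA):
    """Names of predictors whose p-value is below alpha."""
    return [name for name, p in p_values.items() if p < alpha]


kept = significant(P_VALUES)
dropped = [name for name in P_VALUES if name not in kept]
print("kept:", kept)
print("dropped:", dropped)
```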
Thus, the dean should use communication marks, gender, and course degree
(engineering or not) as his main decision variables.
The corresponding R-square, F-test, and error values are:
Stepwise Regression for ln(salary)

Summary
Multiple R   R-Square   Adjusted R-square   Std. Err. of Estimate
0.4053       0.1643     0.1479              0.280884336

ANOVA Table    Degrees of Freedom   Sum of Squares   Mean of Squares   F
Explained      5                    3.9546           0.79092           10.02486
Unexplained    255                  20.118           0.078896
The p-value corresponding to the F-value of about 10 is less than 0.0001; hence
the model as a whole is statistically significant.
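The summary statistics follow from the ANOVA sums of squares by standard formulas, which gives a way to cross-check the output. The sketch below assumes the explained sum of squares is 3.9546 and the unexplained sum of squares is 20.118 on 255 degrees of freedom; the variable names are illustrative.

```python
import math

# Values as read from the ANOVA output above (assumed readings of the extracted digits).
SS_EXPLAINED = 3.9546
MS_EXPLAINED = 0.79092
SS_UNEXPLAINED = 20.118
DF_UNEXPLAINED = 255

df_explained = round(SS_EXPLAINED / MS_EXPLAINED)      # number of predictors in the model
ms_unexplained = SS_UNEXPLAINED / DF_UNEXPLAINED       # mean square error
f_ratio = MS_EXPLAINED / ms_unexplained                # overall F statistic
r_squared = SS_EXPLAINED / (SS_EXPLAINED + SS_UNEXPLAINED)
n = df_explained + DF_UNEXPLAINED + 1                  # implied number of observations
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / DF_UNEXPLAINED
std_err = math.sqrt(ms_unexplained)                    # standard error of estimate

print(f"df explained = {df_explained}, n = {n}")
print(f"F = {f_ratio:.3f}, R-square = {r_squared:.4f}")
print(f"adjusted R-square = {adj_r_squared:.4f}, std. err. = {std_err:.4f}")
```

Under these readings the recomputed F, R-square, adjusted R-square and standard error agree with the reported 10.02, 0.1643, 0.1479 and 0.2809.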
2. Parameters such as MBA marks will not be available at the time of admission. How should these parameters be incorporated while building the model?
Only variables that are available at the time of admission should be included in the analysis.
The following table lists all the variables and whether each is available at the time of admission.
Variable               Available at time of admission?   Include?
Gender                 Yes                               Yes
Percent_SSC            Yes                               Yes
Board_SSC              Yes                               Yes
Percent_HSC            Yes                               Yes
Board_HSC              Yes                               Yes
Stream_HSC             Yes                               Yes
Percent_Degree         Yes                               Yes
Course_Degree          Yes                               Yes
Experience_Yrs         Yes                               Yes
Entrance_Test          Yes                               Yes
Percentile_ET          Yes                               Yes
Percent_MBA            No                                No
Specialization_MBA     No                                No
Marks_Communication    No                                No
Marks_Projectwork      No                                No
Marks_BOCA             No                                No
Placement              No                                No
Salary                 Dependent Variable                Dependent Variable
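The screening rule in the table can be expressed as a simple filter over the variable list. This is only a sketch of that bookkeeping; AVAILABLE_AT_ADMISSION, DEPENDENT and candidate_predictors are illustrative names.

```python
# Availability flags copied from the table above; Salary is the dependent variable.
AVAILABLE_AT_ADMISSION = {
    "Gender": True,
    "Percent_SSC": True,
    "Board_SSC": True,
    "Percent_HSC": True,
    "Board_HSC": True,
    "Stream_HSC": True,
    "Percent_Degree": True,
    "Course_Degree": True,
    "Experience_Yrs": True,
    "Entrance_Test": True,
    "Percentile_ET": True,
    "Percent_MBA": False,
    "Specialization_MBA": False,
    "Marks_Communication": False,
    "Marks_Projectwork": False,
    "Marks_BOCA": False,
    "Placement": False,
}
DEPENDENT = "Salary"


def candidate_predictors(availability):
    """Variables that may enter an admission-time model."""
    return [name for name, available in availability.items() if available]


predictors = candidate_predictors(AVAILABLE_AT_ADMISSION)
print(f"{len(predictors)} candidate predictors for {DEPENDENT}: {predictors}")
```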
Thus, the new solution, after removing communication marks, is as follows. Note
that the coefficients shown are carried over unchanged from the previous model;
strictly, the regression should be re-estimated without Marks_Communication.

Regression Table               Coefficient   Standard Error   t-Value     p-Value    VIF           R-Square
Constant                       11.985        0.1542           77.712078   < 0.0001
Gender (F)                     -0.133        0.0392           3.381449    0.0008     1.082456251   0.076175135
Course_Degree (Engineering)    0.1632        0.061            2.6734392   0.0080     1.028763158   0.027958969
3. What is the impact of academic performance (percentage marks in different board exams) on the salary earned at the time of graduation?
Regression output with SSC marks:
Regression Table   Coefficient    Standard Error   t-Value       p-Value
Constant           199867.5001    32318.03541      6.18439511    < 0.0001
Percent_SSC        1140.117736    486.0677216      2.345594421   0.0196
Thus, salary is positively related to SSC marks. Each one-percentage-point
increase in SSC marks is associated with an increase in salary of about INR 1,140.
However, the model is not strictly valid because the errors are not normally
distributed. Hence, the above result holds only if we assume normal errors.
[Figure: Histogram of Residuals; y-axis: Frequency]
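To make the SSC slope concrete, the fitted line can be evaluated at two marks levels. The sketch below uses the constant and Percent_SSC coefficient from the regression output with SSC marks; the 70% and 80% inputs are arbitrary illustration values, and the function name is hypothetical.

```python
INTERCEPT = 199867.5001   # constant from the regression with SSC marks
SLOPE_SSC = 1140.117736   # INR per percentage point of SSC marks


def predicted_salary(percent_ssc):
    """Predicted salary in INR from the linear model Salary = a + b * Percent_SSC."""
    return INTERCEPT + SLOPE_SSC * percent_ssc


# Difference between two hypothetical students who are 10 percentage points apart.
gap = predicted_salary(80) - predicted_salary(70)
print(f"predicted at 70%: {predicted_salary(70):,.0f} INR")
print(f"predicted at 80%: {predicted_salary(80):,.0f} INR")
print(f"gap for 10 points: {gap:,.0f} INR")
```

A 10-point difference in SSC marks corresponds to ten times the slope, about INR 11,400, subject to the normality caveat noted above.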
4. What are your final recommendations to Easwaran Iyer for admitting students to the MBA program?
The final valid equation from our analysis depends on the gender and course
degree of the applicant. However, the R-square value is only 16.43%.
The equation is:
ln(Salary) = 11.985 - 0.133 Female_Gender + 0.1632 Engineering_Degree
Thus, salaries have been higher for male students and for engineering degree
holders, and Easwaran should look out for applicants who are male and hold an
engineering degree. Easwaran should be cautious, however, because these two
variables explain only about 16% of the variation in salary. Additional
variables that are not in the provided data should be explored. For example,
salary may depend on the applicant's prior work function and industry.
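Because the dependent variable is ln(Salary), each coefficient on a 0/1 indicator translates into a percentage effect of exp(b) - 1 rather than a fixed rupee amount. The sketch below evaluates the equation above under that standard interpretation; the function names are illustrative.

```python
import math

B0 = 11.985          # constant
B_FEMALE = -0.133    # coefficient on Female_Gender
B_ENGINEER = 0.1632  # coefficient on Engineering_Degree


def predicted_salary(female, engineer):
    """Predicted salary in INR for 0/1 indicator inputs, by exponentiating ln(Salary)."""
    return math.exp(B0 + B_FEMALE * female + B_ENGINEER * engineer)


def percent_effect(coefficient):
    """Percentage change in predicted salary when a 0/1 indicator goes from 0 to 1."""
    return (math.exp(coefficient) - 1) * 100


print(f"baseline (male, non-engineer): {predicted_salary(0, 0):,.0f} INR")
print(f"female effect: {percent_effect(B_FEMALE):.1f}%")
print(f"engineering effect: {percent_effect(B_ENGINEER):.1f}%")
```

Read this way, the model predicts salaries roughly 12.5% lower for female students and roughly 17.7% higher for engineering graduates, holding the other indicator fixed.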
