Type
Categorical
Numerical
Categorical
Numerical
Categorical
Categorical
Numerical
Categorical
Interval
Categorical
Numerical
Numerical
Categorical
Numerical
Numerical
Numerical
Categorical
Numerical
Also, the correlation matrix of all the numerical variables except salary (the dependent variable) is given below; it was computed in R.
Correlation matrix (columns (1)-(9) follow the same order as the rows):

                          (1)   (2)   (3)   (4)   (5)   (6)   (7)   (8)   (9)
(1) Experience_Yrs       100%   18%   14%    6%    5%   -6%   15%   21%    1%
(2) Marks_Projectwork     18%  100%   18%    4%   13%   23%   28%   35%    9%
(3) Marks_BOCA            14%   18%  100%   34%   15%   28%   19%   43%   29%
(4) Percentile_ET          6%    4%   34%  100%   17%    9%   19%   30%   32%
(5) Percent_HSC            5%   13%   15%   17%  100%   31%   23%   32%   33%
(6) Percent_Degree        -6%   23%   28%    9%   31%  100%   35%   41%   33%
(7) Marks_Communication   15%   28%   19%   19%   23%   35%  100%   73%   44%
(8) Percent_MBA           21%   35%   43%   30%   32%   41%   73%  100%   49%
(9) Percent_SSC            1%    9%   29%   32%   33%   33%   44%   49%  100%
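The matrix above was produced in R. As a rough sketch of what each entry means, the Pearson correlation for every pair of columns can be computed directly from the column values. The dict-of-lists layout and the numbers below are invented toy data for illustration, not the case data set.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Return {(name_i, name_j): r} for every ordered pair of columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Hypothetical toy data, NOT the case data set.
columns = {
    "Marks_Communication": [50, 60, 70, 80, 90],
    "Percent_MBA": [55, 62, 66, 75, 80],
    "Experience_Yrs": [0, 2, 1, 0, 3],
}
corr = correlation_matrix(columns)
for (a, b), r in corr.items():
    if a < b:  # print each unordered pair once
        print(f"{a} vs {b}: {r:.0%}")
```

The matrix is symmetric with 100% on the diagonal, which is why the table above can be read either by row or by column.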
We also observed that a few data points are missing for the entrance test and for the entrance-test percentile. However, our analysis concerns the salary of placed students, and on inspecting the data we found that no data points are missing for the placed students. The regression therefore uses only the records of students who were placed.
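A minimal sketch of this filtering step, assuming each student record is a dict, that missing values are stored as None, and that placement status is coded with the hypothetical string "Placed". The records are invented for illustration.

```python
# Hypothetical records: field names mirror the case variables, values are invented.
students = [
    {"Placement": "Placed", "Percentile_ET": 85.0, "Salary": 270000},
    {"Placement": "Not Placed", "Percentile_ET": None, "Salary": None},
    {"Placement": "Placed", "Percentile_ET": 72.5, "Salary": 240000},
    {"Placement": "Not Placed", "Percentile_ET": 60.0, "Salary": None},
]

def missing_counts(rows, fields):
    """Count the None values in each requested field."""
    return {f: sum(1 for r in rows if r.get(f) is None) for f in fields}

# Keep only placed students, as done before running the regression.
placed = [r for r in students if r["Placement"] == "Placed"]

print("all students:", missing_counts(students, ["Percentile_ET"]))
print("placed only: ", missing_counts(placed, ["Percentile_ET"]))
```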
After these preliminary steps, let us estimate the regression parameters; we used StatTools for this purpose. We also excluded the placement variable, as it adds no new information (every student in the regression sample is placed).
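StatTools performs the estimation internally. As a sketch of the same ordinary least squares calculation (not StatTools' own code), the coefficients, standard errors, and t-values follow from the normal equations; dummy variables such as Gender (F) simply enter as 0/1 columns. The data in the usage lines are invented.

```python
import math

def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]

def ols(X, y):
    """OLS fit; each row of X must already start with 1 for the constant.
    Returns (coefficients, standard errors, t-values)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y[r] - sum(X[r][i] * beta[i] for i in range(k)) for r in range(n)]
    s2 = sum(e * e for e in resid) / (n - k)  # residual mean square
    se = [math.sqrt(s2 * inv[i][i]) for i in range(k)]
    t = [b / s for b, s in zip(beta, se)]  # t-value = coefficient / standard error
    return beta, se, t

# Invented toy data: y = 1 + 2*x1 + 0.5*dummy plus small fixed noise.
x1 = list(range(10))
dummy = [i % 2 for i in range(10)]
noise = [0.1, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1, 0.05, -0.05, 0.0]
y = [1 + 2 * a + 0.5 * d + e for a, d, e in zip(x1, dummy, noise)]
X = [[1.0, a, d] for a, d in zip(x1, dummy)]
beta, se, t = ols(X, y)
print("coefficients:", [round(b, 4) for b in beta])
```

The t-Value column in the StatTools tables is likewise the coefficient divided by its standard error; for Marks_Communication, 2620.877 / 638.9087 is about 4.102.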
Following is the solution:
Regression Table

Variable                                   Coefficient   Standard Error   t-Value   p-Value
Constant                                   121087.1      49303.35         2.456     0.0147
Marks_Communication                        2620.877      638.9087         4.102     < 0.0001
Gender (F)                                 42246.35      12486.27         3.383     0.0008
Experience_Yrs                             20264.24      8333.768         2.432     0.0157
Specialization_MBA (Marketing & Finance)   10182.79      32686.56         0.312     0.7557
Specialization_MBA (Marketing & HR)        21459.02      33046.8          0.649     0.5167
However, diagnosing the model shows that it is not valid, because the errors (residuals) are not normally distributed.
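As a numerical complement to the residual diagnostics, one common normality check (swapped in here as an illustration, not necessarily what StatTools reports) is the Jarque-Bera test, which combines residual skewness and kurtosis. Under normality the statistic is approximately chi-square with 2 degrees of freedom, whose upper-tail probability is exactly exp(-JB/2). The residuals in the test are invented.

```python
import math

def jarque_bera(residuals):
    """Return (JB statistic, approximate p-value) for a list of residuals.
    Uses biased (population) moments; p-value from the chi-square(2) tail exp(-JB/2)."""
    n = len(residuals)
    m = sum(residuals) / n
    m2 = sum((e - m) ** 2 for e in residuals) / n
    m3 = sum((e - m) ** 3 for e in residuals) / n
    m4 = sum((e - m) ** 4 for e in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return jb, math.exp(-jb / 2)

# A strongly right-skewed residual pattern (invented) gives a tiny p-value,
# i.e. normality is rejected.
jb, p = jarque_bera([0.0] * 50 + [10.0] * 5)
print(f"JB = {jb:.1f}, p = {p:.2e}")
```

A small p-value (below 0.05) is evidence against normal errors, which is the situation described for the salary model.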
We therefore need to change the model. Let us try ln(salary) as the dependent variable.
Following is the output with this change:
Regression Table (dependent variable: ln(salary); the VIF and R-Square columns are from the multicollinearity check)

Variable                                   Coefficient   Standard Error   t-Value     p-Value    VIF           R-Square
Constant                                   11.985        0.1542           77.712078   < 0.0001
Marks_Communication                        0.0086        0.002            4.3008496   < 0.0001   1.07762809    0.072036067
Gender (F)                                 -0.133        0.0392           3.381449    0.0008     1.082456251   0.076175135
Course_Degree (Engineering)                0.1632        0.061            2.6734392   0.0080     1.028763158   0.027958969
Specialization_MBA (Marketing & Finance)   0.0417        0.1025           0.4064739   0.6847     8.513941538   0.882545588
Specialization_MBA (Marketing & HR)        -0.077        0.104            0.736492    0.4621     8.570918457   0.88332639
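The VIF column can be reproduced from the R-Square column: each R-Square comes from regressing one explanatory variable on the others, and VIF = 1 / (1 - R-Square). The sketch below uses the values from the table; the cut-off of 5 is a common rule of thumb (some texts use 10), not a threshold stated in the report.

```python
def vif(r_squared):
    """Variance inflation factor from the auxiliary-regression R-square."""
    return 1.0 / (1.0 - r_squared)

# Auxiliary R-square values copied from the multicollinearity check above.
aux_r2 = {
    "Marks_Communication": 0.072036067,
    "Gender (F)": 0.076175135,
    "Course_Degree (Engineering)": 0.027958969,
    "Specialization_MBA (Marketing & Finance)": 0.882545588,
    "Specialization_MBA (Marketing & HR)": 0.88332639,
}

RULE_OF_THUMB = 5.0  # assumed threshold for "high" collinearity
for name, r2 in aux_r2.items():
    v = vif(r2)
    flag = "high" if v > RULE_OF_THUMB else "ok"
    print(f"{name:42s} VIF = {v:.4f} ({flag})")
```

Only the two Specialization_MBA dummies exceed the threshold (VIF about 8.5): each is largely explained by the other explanatory variables.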
The residual-versus-fit diagram also supports the model: there is no pattern, and the points are randomly scattered. Clearly, heteroscedasticity is absent (the error variance is roughly constant).
[Figure: Scatterplot of Fit vs ln(salary); x-axis: ln(salary), y-axis: Fit]
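The diagram is a visual check. A numerical counterpart, swapped in here for illustration and not part of the original analysis, is a simplified Breusch-Pagan test in Koenker's n·R² form that uses the fitted values as the only auxiliary regressor: squared residuals are regressed on the fitted values, and n·R² is compared with a chi-square distribution with 1 degree of freedom, whose upper tail is erfc(sqrt(x / 2)). The data in the test are invented.

```python
import math

def breusch_pagan_fitted(fitted, residuals):
    """Simplified Breusch-Pagan (Koenker n*R^2) check with the fitted values as regressor.
    Returns (LM statistic, p-value from chi-square with 1 df)."""
    n = len(fitted)
    u = [e * e for e in residuals]  # squared residuals
    mf, mu = sum(fitted) / n, sum(u) / n
    sfu = sum((f - mf) * (v - mu) for f, v in zip(fitted, u))
    sff = sum((f - mf) ** 2 for f in fitted)
    suu = sum((v - mu) ** 2 for v in u)
    if sff == 0 or suu == 0:
        return 0.0, 1.0  # no variation: nothing to explain
    r2 = sfu * sfu / (sff * suu)  # R^2 of a one-regressor fit = squared correlation
    lm = n * r2
    return lm, math.erfc(math.sqrt(lm / 2))

fitted = [float(f) for f in range(1, 41)]
# Residual spread that grows with the fitted value (heteroscedastic, invented).
funnel = [0.1 * f * (-1) ** i for i, f in enumerate(fitted)]
print(breusch_pagan_fitted(fitted, funnel))
```

A large p-value is consistent with constant error variance (no funnel shape in the residual plot).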
The output also shows that the two Specialization_MBA variables are not statistically significant at the 5% significance level (95% confidence): their p-values are 0.6847 and 0.4621.
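The significance decision compares each p-value with 0.05. As a sketch, assuming the residual degrees of freedom are large enough for the standard normal to approximate the t distribution, the two-sided p-value is erfc(|t| / sqrt(2)). The t-values are taken from the table; the approximate p-values land close to the StatTools values (0.6847, 0.4621, 0.0080).

```python
import math

def two_sided_p_normal(t):
    """Two-sided p-value for a t-statistic via the standard normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2))

t_values = {
    "Specialization_MBA (Marketing & Finance)": 0.4064739,
    "Specialization_MBA (Marketing & HR)": 0.736492,
    "Course_Degree (Engineering)": 2.6734392,
}
for name, t in t_values.items():
    p = two_sided_p_normal(t)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name}: p = {p:.4f} ({verdict} at the 5% level)")
```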
Thus, after dropping the two Specialization_MBA variables, the new solution is:
Regression Table (dependent variable: ln(salary); the VIF and R-Square columns are from the multicollinearity check)

Variable                      Coefficient   Standard Error   t-Value     p-Value    VIF           R-Square
Constant                      11.985        0.1542           77.712078   < 0.0001
Marks_Communication           0.0086        0.002            4.3008496   < 0.0001   1.07762809    0.072036067
Gender (F)                    -0.133        0.0392           3.381449    0.0008     1.082456251   0.076175135
Course_Degree (Engineering)   0.1632        0.061            2.6734392   0.0080     1.028763158   0.027958969
Thus, the dean should use communication marks, gender, and course degree (whether or not it is engineering) as the main decision variables.
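Because the dependent variable is ln(salary), a coefficient b corresponds to a change of roughly 100·b percent in salary per unit of the variable, or exactly 100·(exp(b) - 1) percent. A sketch using the final coefficients from the table; the helper names are illustrative, and the prediction simply exponentiates the fitted log value without any retransformation-bias correction.

```python
import math

# Coefficients of the final ln(salary) model, copied from the regression table.
COEF = {
    "Constant": 11.985,
    "Marks_Communication": 0.0086,
    "Gender (F)": -0.133,
    "Course_Degree (Engineering)": 0.1632,
}

def pct_effect(b):
    """Exact percentage change in salary for a one-unit change in a regressor of a log model."""
    return 100 * (math.exp(b) - 1)

def predict_salary(marks_communication, female, engineering):
    """Point prediction exp(fitted ln salary); no retransformation-bias correction."""
    ln_s = (COEF["Constant"]
            + COEF["Marks_Communication"] * marks_communication
            + COEF["Gender (F)"] * (1 if female else 0)
            + COEF["Course_Degree (Engineering)"] * (1 if engineering else 0))
    return math.exp(ln_s)

for name in ("Marks_Communication", "Gender (F)", "Course_Degree (Engineering)"):
    print(f"{name}: {pct_effect(COEF[name]):+.2f}% salary per unit")
```

Holding the other variables fixed, these coefficients imply about 0.86% more salary per communication mark, about 12.5% lower salary for female students, and about 17.7% higher salary for engineering graduates.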
The corresponding R-square, F-test, and standard-error values are:
Stepwise Regression for ln(salary)

Summary
Multiple R   R-Square   Adjusted R-Square   Std. Err. of Estimate
0.4053       0.1643     0.1479              0.280884336

ANOVA Table
              Degrees of Freedom   Sum of Squares   Mean of Squares   F-Ratio
Explained     5                    3.9546           0.79092           10.02486
Unexplained   255                  20.118           0.078896
The p-value corresponding to the F-value of about 10 is less than 0.0001; hence the model is statistically significant overall.
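The F-value and the adjusted R-square can be checked from the ANOVA and summary numbers: F is the explained mean square divided by the unexplained mean square, and adjusted R-square penalizes R-square for the number of explanatory variables k. In this sketch, k = 5 and n = 261 are inferred from the degrees of freedom (5 explained, 255 unexplained); since the extracted output was partly garbled, treat them as assumptions.

```python
def f_statistic(ss_explained, df_explained, ss_unexplained, df_unexplained):
    """F = explained mean square / unexplained mean square."""
    return (ss_explained / df_explained) / (ss_unexplained / df_unexplained)

def adjusted_r2(r2, n, k):
    """Adjusted R-square for n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Assumed inputs read from the ANOVA table; n = 5 + 255 + 1.
F = f_statistic(3.9546, 5, 20.118, 255)
adj = adjusted_r2(0.1643, n=261, k=5)
print(f"F = {F:.3f}, adjusted R-square = {adj:.4f}")
```

Computing the exact p-value needs the F distribution (for example from a statistics library), which is why it is not reproduced here.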
Include?
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
No
No
No
No
No
No
Dependent Variable
Regression Table (the VIF and R-Square columns are from the multicollinearity check)

Variable                      Coefficient   Standard Error   t-Value     p-Value    VIF           R-Square
Constant                      11.985        0.1542           77.712078   < 0.0001
Gender (F)                    -0.133        0.0392           3.381449    0.0008     1.082456251   0.076175135
Course_Degree (Engineering)   0.1632        0.061            2.6734392   0.0080     1.028763158   0.027958969
Regression Table

Variable      Coefficient   Standard Error   t-Value       p-Value
Constant      199867.5001   32318.03541      6.18439511    < 0.0001
Percent_SSC   1140.117736   486.0677216      2.345594421   0.0196
Thus, salary is directly related to the SSC percentage: for every one-percentage-point increase in SSC marks, salary increases by about 1,140 INR.
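With a single explanatory variable, the least-squares estimates have a closed form: slope = Sxy / Sxx and intercept = mean(y) - slope · mean(x). A sketch of that calculation (the test data are invented), together with the reported SSC equation used for prediction; `predicted_salary` is a hypothetical helper, and the difference between two predictions one percentage point apart recovers the slope of about 1,140 INR.

```python
import math

def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return (a, b, se_b, t_b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((v - a - b * u) ** 2 for u, v in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return a, b, se_b, b / se_b

def predicted_salary(percent_ssc):
    """Reported equation: Salary = 199867.5001 + 1140.117736 * Percent_SSC."""
    return 199867.5001 + 1140.117736 * percent_ssc

# Salary difference between two students one SSC percentage point apart (INR).
print(round(predicted_salary(71) - predicted_salary(70), 3))
```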
However, this model is not valid, because its errors are not normally distributed. The above conclusion therefore holds only if we assume that the errors are normal.
[Figure: Histogram of Residuals; y-axis: Frequency]
Thus, salaries have been higher for male students and for engineering-degree holders, so Easwaran should keep an eye out for applicants who are engineers and male. Easwaran should nevertheless be cautious, because these variables explain only about 16% of the variation in salary (R-square of 0.1643 for ln(salary)). Other variables, not provided in the data, therefore need to be explored; for example, salaries may depend on the work function and the industry of the applicants.