
Aakash Bhatia

MBA535 (Analytical Tools for Decision-Making)

Marist College

Week 8 Assessment
Question 1: The managing partner of an advertising agency believes that his company's
sales are related to the industry sales. He uses Microsoft Excel's Data Analysis tool to
analyze the last 4 years of quarterly data (i.e., n = 16) with the following results:
Regression Statistics
Multiple R              0.802
R Square                0.643
Adjusted R Square       0.618
Standard Error SYX      0.9224
Observations            16

ANOVA
              df    SS        MS        F        Sig.F
Regression     1    21.497    21.497    25.27    0.000
Error         14    11.912     0.851
Total         15    33.409

Predictor    Coef        StdError    t Stat    P-value
Intercept    3.962       1.440       2.75      0.016
Industry     0.040451    0.008048    5.03      0.000

Durbin-Watson Statistic    1.59

a) What is the value of the quantity that the least squares regression line minimizes?
Explain your answer.
The least squares regression line (LSRL) is the straight line that describes how a
response variable Y changes as an explanatory variable X changes. The error (residual)
for each observation is the observed value minus the predicted value. The least squares
regression line of Y on X is the line that makes the sum of the squares of the vertical
distances of the data points from the line as small as possible, so the quantity it
minimizes is the error sum of squares (SSE).
Total sum of squares = Regression sum of squares + Error sum of squares
33.409 = 21.497 + SSE
Thus, SSE = 33.409 - 21.497
= 11.912
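As a quick arithmetic check of part a), the following sketch recomputes SSE from the ANOVA totals and confirms that it agrees with the Error MS entry. The variable names are illustrative and not part of the original output.

```python
# Values read from the ANOVA output in Question 1.
sst = 33.409    # total sum of squares
ssr = 21.497    # regression sum of squares
df_error = 14   # n - 2, with n = 16 quarterly observations

# The least squares line minimizes the error sum of squares.
sse = sst - ssr

# Dividing by the error degrees of freedom should reproduce the Error MS (0.851).
mse = sse / df_error

print(f"SSE = {sse:.3f}")
print(f"MSE = {mse:.3f}")
```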
b) What is the prediction of Y for a quarter in which X = 120? Show how you obtain your
answer.
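Define the risk of a predictor g as
$$R(g) = E\left[(Y - g(X))^2\right],$$
where E is the expected value with respect to the joint distribution f(x, y). Condition on
X = x and let
$$r(x) = E(Y \mid X = x) = \int y \, f(y \mid x) \, dy$$
be the regression function. Let $\epsilon = Y - r(X)$. Then
$$E(\epsilon) = E\big[E[Y - r(X) \mid X]\big] = 0,$$
and we can write
$$Y = r(X) + \epsilon.$$
The fitted line from the output is $\hat{Y} = 3.962 + 0.040451X$. Putting X = 120 into the
equation, we get
$\hat{Y}$ = 3.962 + 0.040451(120) = 3.962 + 4.854 = 8.816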
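The substitution in part b) can be sketched in a few lines. The function name `predict` is a hypothetical helper, and the coefficients are the Intercept and Industry values from the output.

```python
def predict(intercept: float, slope: float, x: float) -> float:
    """Point prediction from a fitted simple linear regression."""
    return intercept + slope * x

b0 = 3.962      # Intercept coefficient from the Excel output
b1 = 0.040451   # Industry coefficient from the Excel output

y_hat = predict(b0, b1, 120)
print(f"Predicted Y at X = 120: {y_hat:.3f}")  # 8.816
```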
c) What is the value for the coefficient of determination?
The coefficient of determination is the proportion of the variation in Y that is
explained by the regression on X, that is, the ratio of the explained variation to the
total variation:
$r^2$ = SSR / SST = 21.497 / 33.409 = 0.643 (given in the question as R Square)
d) What is the value of the correlation coefficient?
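The correlation coefficient measures the direction and strength of the linear association
between Y and X. The formula for computing r is:
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
where n is the number of pairs of data.

The raw data are not given, so r is obtained from $r^2$, taking the sign of the slope
(positive here):
r = +Sqrt(0.643)
= 0.802 (given in the question as Multiple R)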

Question 2: An investment specialist claims that if one holds a portfolio that moves in
the opposite direction to the market index like the S&P 500, then it is possible to reduce
the variability of the portfolio's return. In other words, one can create a portfolio with
positive returns but less exposure to risk.
A sample of 26 years of S&P 500 index and a portfolio consisting of stocks of
private prisons, which are believed to be negatively related to the S&P 500 index, is
collected. A regression analysis was performed by regressing the returns of the prison
stocks portfolio (Y) on the returns of S&P 500 index (X) to prove that the prison stocks
portfolio is negatively related to the S&P 500 index at a 5% level of significance. The
results are given in the following EXCEL output.
             Coefficients    Standard Error    T Stat     P-value
Intercept    4.8660          0.3574            13.6136    8.7932E-13
S&P          -0.5025         0.0716            -7.0186    2.94942E-07

a) To test whether the prison stocks portfolio is negatively related to the S&P 500 index,
the appropriate null and alternative hypotheses are, respectively,
A) $H_0: \beta_1 \ge 0$ vs. $H_1: \beta_1 < 0$
B) $H_0: \beta_1 \le 0$ vs. $H_1: \beta_1 > 0$
C) $H_0: r \ge 0$ vs. $H_1: r < 0$
D) $H_0: r \le 0$ vs. $H_1: r > 0$
Ans A) $H_0: \beta_1 \ge 0$ vs. $H_1: \beta_1 < 0$
b) To test whether the prison stocks portfolio is negatively related to the S&P 500 index,
what is the measured value of the test statistic?
-7.0186 (as given in the table)

c) To test whether the prison stocks portfolio is negatively related to the S&P 500 index,
what is the p-value of the associated test statistic?
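The p-value reported by Excel for the slope (2.94942E-07) is two-tailed. The alternative
is one-sided and the t statistic is negative, as H1 predicts, so the p-value is half of it:
2.94942E-07 / 2 = 1.47471E-07 (t test on the slope)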

d) Which of the following will be a correct conclusion? Explain your answer.


A) We cannot reject the null hypothesis and, therefore, conclude that there is sufficient
evidence to show that the prisons stock portfolio and S&P 500 index are negatively
related.
B) We can reject the null hypothesis and, therefore, conclude that there is sufficient
evidence to show that the prisons stock portfolio and S&P 500 index are negatively
related.
C) We cannot reject the null hypothesis and, therefore, conclude that there is not
sufficient evidence to show that the prisons stock portfolio and S&P 500 index are
negatively related.
D) We can reject the null hypothesis and conclude that there is not sufficient
evidence to show that the prisons stock portfolio and S&P 500 index are
negatively related.
The test is of $H_0: \beta_1 \ge 0$ vs. $H_1: \beta_1 < 0$. The one-tailed p-value,
2.94942E-07 / 2 = 1.47471E-07, is far below the 5% level of significance, so we reject
the null hypothesis. The estimated slope (-0.5025) and the t statistic (-7.0186) are both
negative, so the evidence points in the direction of the alternative: the two returns
are negatively related.
Ans B) We can reject the null hypothesis and, therefore, conclude that there is sufficient
evidence to show that the prisons stock portfolio and S&P 500 index are negatively
related.
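The conversion from Excel's two-tailed p-value to the one-tailed p-value used in parts c) and d) can be sketched as below. The helper name `lower_tail_p_value` is hypothetical, and the rule assumes the usual symmetric t distribution.

```python
def lower_tail_p_value(t_stat: float, two_tailed_p: float) -> float:
    """One-tailed p-value for H1: slope < 0, derived from a two-tailed p-value."""
    half = two_tailed_p / 2
    # If t lies on the side H1 predicts (negative), the one-tailed p-value is
    # half the two-tailed value; otherwise it is the complement of that half.
    return half if t_stat < 0 else 1 - half

t_stat = -7.0186      # T Stat for S&P from the output
p_two = 2.94942e-07   # two-tailed P-value for S&P from the output
alpha = 0.05          # level of significance stated in the question

p_one = lower_tail_p_value(t_stat, p_two)
reject_h0 = p_one < alpha

print(f"one-tailed p-value = {p_one:.5e}")
print(f"reject H0 at 5%: {reject_h0}")
```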
Question 3: It is believed that GPA (grade point average, based on a four point scale)
should have a positive linear relationship with ACT scores. Given below is the Excel
output from regressing GPA on ACT scores using a data set of 8 randomly chosen
students from a Big-Ten university.
Regressing GPA on ACT
Regression Statistics
Multiple R           0.7598
R Square             0.5774
Adjusted R Square    0.5069
Standard Error       0.2691
Observations         8

ANOVA
              df    SS        MS        F         Significance F
Regression    1     0.5940    0.5940    8.1986    0.0286
Residual      6     0.4347    0.0724
Total         7     1.0287

             Coefficients    Standard Error    t Stat    P-value    Lower 95%    Upper 95%
Intercept    0.5681          0.9284            0.6119    0.5630     -1.7036      2.8398
ACT          0.1021          0.0356            2.8633    0.0286     0.0148       0.1895

a) The interpretation of the coefficient of determination in this regression is


A) 57.74% of the total variation of ACT scores can be explained by GPA.
B) ACT scores account for 57.74% of the total fluctuation in GPA.
C) GPA accounts for 57.74% of the variability of ACT scores.
D) None of the above.
As given in the question, $r^2$ = 0.5774, and GPA is the dependent variable, so the answer
is B) ACT scores account for 57.74% of the total fluctuation in GPA.
b) What is the value of the measured test statistic to test whether there is any linear
relationship between GPA and ACT?

2.8633
c) What is the predicted average value of GPA when ACT = 20? Show how you obtain
your answer.
.5681 + .1021(20) = 2.6101
Thus, 2.61
d) What are the decision and conclusion on testing whether there is any linear
relationship at 1% level of significance between GPA and ACT scores? Explain your
answer.
The p-value for the slope (0.0286) is above the 1% significance level (0.0286 > 0.01), so
we do not reject the null hypothesis $H_0: \beta_1 = 0$. Thus the answer is:
Do not reject the null hypothesis; hence, there is not sufficient evidence to show that ACT
scores and GPA are linearly related.
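In simple linear regression the slope t test and the ANOVA F test are the same test, with $t^2 = F$ and identical p-values. The sketch below checks this against the Question 3 output and applies the decision rule from part d). The 5% comparison is included only to show that the conclusion depends on the chosen level; the variable names are illustrative.

```python
# Values read from the Question 3 output.
t_stat = 2.8633    # t Stat for ACT
f_stat = 8.1986    # F from the ANOVA table
p_value = 0.0286   # P-value for ACT (equal to Significance F)

# In simple regression, the squared slope t statistic equals F (up to rounding).
t_squared = t_stat ** 2

# Decision rule: reject H0 only when the p-value is below the chosen level.
reject_at_1_percent = p_value < 0.01
reject_at_5_percent = p_value < 0.05

print(f"t^2 = {t_squared:.4f}, F = {f_stat}")
print(f"reject at 1%: {reject_at_1_percent}, reject at 5%: {reject_at_5_percent}")
```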

Question 4: It is believed that the average number of hours spent studying per day
(HOURS) during undergraduate education should have a positive linear relationship with
the starting salary (SALARY, measured in thousands of dollars per month) after
graduation. Given below is the Excel output from regressing starting salary on number of
hours spent studying per day for a sample of 51 students.
Note: Some of the numbers in the output are purposely erased.
Regression Statistics
Multiple R           0.8857
R Square             0.7845
Adjusted R Square    0.7801
Standard Error       1.3704
Observations         51

ANOVA
              df    SS          MS          F           Significance F
Regression    1     335.0472    335.0473    178.3859
Residual                        1.8782
Total         50    427.0798

             Coefficients    Standard Error    t Stat     P-value      Lower 95%    Upper 95%
Intercept    -1.8940         0.4018            -4.7134    2.051E-05    -2.7015      -1.0865
Hours        0.9795          0.0733            13.3561    5.944E-18    0.8321       1.1269

a) What is the estimated average change in salary (in thousands of dollars) as a result of
spending an extra hour per day studying?
0.9795
b) What is the value of the measured t-test statistic to test whether average SALARY
depends linearly on HOURS?
13.3561
c) What is the p-value of the measured F-test statistic to test whether HOURS affects
SALARY?
5.944E-18
d) What are the degrees of freedom for testing whether HOURS affects SALARY?
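1 and 49: the F test has 1 numerator degree of freedom and n - 2 = 51 - 2 = 49
denominator degrees of freedom (the equivalent t test on the slope has 49).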

e) What is the error sum of squares (SSE) of the above regression? Show how you obtain
your answer.
SSE = SST - SSR
= 427.0798 - 335.0472
= 92.0326
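The erased ANOVA entries can be cross-checked against the values that remain. This sketch recomputes SSE and then confirms that it reproduces the Residual MS and the F statistic shown in the output; the variable names are illustrative.

```python
# Values that survive in the Question 4 output.
sst = 427.0798   # Total SS
ssr = 335.0472   # Regression SS
n = 51           # Observations

df_resid = n - 2        # residual degrees of freedom in simple regression
sse = sst - ssr         # error (residual) sum of squares
mse = sse / df_resid    # should match the Residual MS entry (1.8782)
f_stat = ssr / mse      # MSR / MSE, with MSR = SSR / 1; should match F (178.3859)

print(f"SSE = {sse:.4f}")
print(f"MSE = {mse:.4f}")
print(f"F   = {f_stat:.2f}")
```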
f) The 90% confidence interval for the average change in SALARY (in thousands of
dollars) as a result of spending an extra hour per day studying is
A) wider than [-2.70159, -1.08654].
B) narrower than [-2.70159, -1.08654].
C) wider than [0.8321927, 1.12697].
D) narrower than [0.8321927, 1.12697].
Explain your reasoning.
Ans. D) narrower than [0.8321927, 1.12697]
A 90% confidence interval is narrower than a 95% confidence interval built from the same
data, because a lower confidence level uses a smaller critical t value. As the precision of
the interval increases (i.e., its width decreases), the confidence that it contains the
true slope decreases, since a shorter range is less likely to cover it.
In the table, the 95% interval for the Hours coefficient runs from 0.8321 (Lower 95%) to
1.1269 (Upper 95%). Since 90% is less than 95%, the 90% confidence interval for the
average change in SALARY (in thousands of dollars) from spending an extra hour per day
studying must be narrower than [0.8321927, 1.12697].
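The width comparison can be illustrated numerically. The critical values below are standard t-table values for 49 degrees of freedom and are an assumption of this sketch, since the original output does not list them; the helper `interval` is hypothetical.

```python
b1 = 0.9795      # Hours coefficient from the output
se_b1 = 0.0733   # its standard error from the output

# Assumed t-table critical values for df = 49 (not given in the output).
t_95 = 2.0096    # two-sided 95% confidence
t_90 = 1.6766    # two-sided 90% confidence

def interval(estimate: float, se: float, t_crit: float) -> tuple[float, float]:
    """Symmetric confidence interval: estimate +/- t_crit * se."""
    half_width = t_crit * se
    return estimate - half_width, estimate + half_width

ci_95 = interval(b1, se_b1, t_95)
ci_90 = interval(b1, se_b1, t_90)
width_95 = ci_95[1] - ci_95[0]
width_90 = ci_90[1] - ci_90[0]

print(f"95% CI: [{ci_95[0]:.4f}, {ci_95[1]:.4f}]")
print(f"90% CI: [{ci_90[0]:.4f}, {ci_90[1]:.4f}]")
print(f"90% interval is narrower: {width_90 < width_95}")
```

The 95% interval reproduces the Lower 95% and Upper 95% entries to within rounding, and the 90% interval sits strictly inside it.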
g) To test the claim that average SALARY depends positively on HOURS against the null
hypothesis that average SALARY does not depend linearly on HOURS, what is the p-value
of the test statistic? What are the results of the test? Explain your answer.
The hypotheses are $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 > 0$. The test is one-tailed and the
t statistic (13.3561) is positive, as H1 predicts, so the p-value is half of the two-tailed
value in the table:
5.944E-18 / 2 = 2.972E-18
A zero population correlation coefficient between a pair of random variables means that
there is no linear relationship between them. An appropriate test is the t test on the
population correlation coefficient, which in simple regression is equivalent to the t test
on the slope.
Here $r^2$ = 0.7845 is large: about 78% of the variation in SALARY is explained by HOURS.
The p-value is far smaller than any conventional significance level, so we reject the null
hypothesis and conclude that there is sufficient evidence that average SALARY depends
positively on HOURS.

Question 5: The management of a chain of electronics stores would like to develop a model
for predicting the weekly sales (in thousands of dollars) for individual stores based on the
number of customers who made purchases. A random sample of 12 stores yields the
following results:
Customers    Sales (Thousands of Dollars)
907          11.20
926          11.05
713           8.21
741           9.21
780           9.42
898          10.08
510           6.73
529           7.02
460           6.12
872           9.52
650           7.53
603           7.25

a) Estimate a linear regression. What are the values of the estimated intercept and slope?
Show how you obtain your answer.
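All values were calculated using Excel and the following formulas.

Regression formula:
Regression equation: $\hat{y} = a + bx$
Slope: $b = \dfrac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^2 - (\sum X)^2}$
Intercept: $a = \dfrac{\sum Y - b\sum X}{N}$
where x and y are the variables, b is the slope of the regression line, a is the intercept
(the point where the regression line crosses the y axis), N is the number of values, X is
the first score, Y is the second score, $\sum XY$ is the sum of the products of the first and
second scores, $\sum X$ is the sum of the first scores, $\sum Y$ is the sum of the second
scores, and $\sum X^2$ is the sum of the squared first scores.

This relationship can be represented by the equation $y = b_0 + b_1 x$, where $b_0$ is the
y-intercept and $b_1$ is the slope. For these data, N = 12, $\sum X$ = 8589, $\sum Y$ = 103.34,
$\sum XY$ = 76997.25, and $\sum X^2$ = 6450413. Putting the values in the formulas, we get
Slope (b) = (12(76997.25) - (8589)(103.34)) / (12(6450413) - (8589)^2)
= 36379.74 / 3634035 = 0.0100108, or 0.01001
Intercept (a) = (103.34 - 0.0100108(8589)) / 12 = 1.4464
Regression equation (y):
1.446 + 0.01x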
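The slope and intercept formulas in part a) can be reproduced from the raw data without Excel. This is a minimal sketch using only the sums defined above; the function name `least_squares` is a hypothetical helper.

```python
customers = [907, 926, 713, 741, 780, 898, 510, 529, 460, 872, 650, 603]
sales = [11.20, 11.05, 8.21, 9.21, 9.42, 10.08, 6.73, 7.02, 6.12, 9.52, 7.53, 7.25]

def least_squares(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope: (N*SumXY - SumX*SumY) / (N*SumX2 - SumX^2)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: (SumY - slope*SumX) / N
    intercept = (sum_y - slope * sum_x) / n
    return intercept, slope

b0, b1 = least_squares(customers, sales)
print(f"intercept = {b0:.4f}, slope = {b1:.5f}")
```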
b) What is the value of the coefficient of determination?
Coefficient of determination: $r^2 = r \times r$.
X values: 907, 926, 713, 741, 780, 898, 510, 529, 460, 872, 650, 603
Y values: 11.20, 11.05, 8.21, 9.21, 9.42, 10.08, 6.73, 7.02, 6.12, 9.52, 7.53, 7.25
$r^2$ = 0.9453
c) What is the value of the coefficient of correlation?
X values: 907, 926, 713, 741, 780, 898, 510, 529, 460, 872, 650, 603
Y values: 11.20, 11.05, 8.21, 9.21, 9.42, 10.08, 6.73, 7.02, 6.12, 9.52, 7.53, 7.25
Putting the values in the equation for the correlation coefficient:
$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}}$$
r = 0.9723
d) What is the value of the standard error of the estimate?
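The formula for the standard error of the estimate is:
$$S_{YX} = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}$$
With SSE = 1.7561 and n = 12:
$S_{YX}$ = Sqrt(1.7561 / 10) = 0.4191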
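The three quantities in parts b) to d) all follow from the same sums of squares, which the sketch below computes directly from the data. It uses the deviation form of the formulas, which is algebraically equivalent to the raw-sum form; the variable names are illustrative.

```python
import math

customers = [907, 926, 713, 741, 780, 898, 510, 529, 460, 872, 650, 603]
sales = [11.20, 11.05, 8.21, 9.21, 9.42, 10.08, 6.73, 7.02, 6.12, 9.52, 7.53, 7.25]

n = len(customers)
x_bar = sum(customers) / n
y_bar = sum(sales) / n

# Sums of squares and cross-products about the means.
ssx = sum((x - x_bar) ** 2 for x in customers)
sst = sum((y - y_bar) ** 2 for y in sales)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(customers, sales))

b1 = sxy / ssx
b0 = y_bar - b1 * x_bar

# Error sum of squares from the fitted residuals.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(customers, sales))

r_squared = 1 - sse / sst                           # coefficient of determination
r = math.copysign(math.sqrt(r_squared), b1)         # r takes the sign of the slope
s_yx = math.sqrt(sse / (n - 2))                     # standard error of the estimate

print(f"r^2 = {r_squared:.4f}, r = {r:.4f}, S_YX = {s_yx:.4f}")
```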
e) Which of the following is the correct null hypothesis for testing whether the number of
customers who make purchases affects weekly sales?
A) $H_0: \beta_0 = 0$
B) $H_0: \beta_1 = 0$
C) H0 : = 0
D) H0 : = 0
Ans B) $H_0: \beta_1 = 0$
f) What is the value of the t test statistic when testing whether the number of customers
who make purchases affects weekly sales?

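The test statistic for the slope is $t = b_1 / S_{b_1}$, where $S_{b_1} = S_{YX} / \sqrt{SSX}$
and $SSX = \sum (X - \bar{X})^2 = (n - 1)s_X^2$.

Sample variance of X = 27530.5682

Mean of X = 715.75

Mean of Y = 8.6117

Sample variance of Y = 2.9187

SSX = 11(27530.5682) = 302836.25
$S_{b_1}$ = 0.41905 / Sqrt(302836.25) = 0.41905 / 550.306 = 0.00076149
Thus, putting these values in the formula:

t = 0.0100108 / 0.00076149
= 13.1464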
g) What are the degrees of freedom of the t test statistic when testing whether the number
of customers who make purchases affects weekly sales?

Using the formula and the values from the previous parts, we get
df = n - 2 = 12 - 2 = 10
h) Construct a 95% confidence interval for the change in average weekly sales when the
number of customers who make purchases increases by one. Show how you obtain your
answer.

The critical value is taken from the t table with n - 2 = 10 degrees of freedom:
$t_{0.025,\,10}$ = 2.2281.

Formula:
$$b_1 \pm t_{\alpha/2,\,n-2} \, S_{b_1}$$
where $b_1$ is the estimated slope, $S_{b_1} = S_{YX} / \sqrt{SSX}$ is its standard error,
$\alpha$ = 1 - (Confidence Level/100), and $t_{\alpha/2,\,n-2}$ is the t-table value.

0.010011 ± 2.2281(0.0007615) = 0.010011 ± 0.001697

95% confidence interval for the change in average weekly sales when the number of
customers who make purchases increases by one:
0.0083 to 0.0117 (thousands of dollars), generated from Excel using the TINV function

i) Construct a 95% confidence interval for the average weekly sales when the number of
customers who make purchases is 600. Show how you obtain your answer.

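Formula:
$$\hat{Y} \pm t_{\alpha/2,\,n-2} \, S_{YX} \sqrt{h}, \qquad h = \frac{1}{n} + \frac{(X - \bar{X})^2}{SSX}$$
where $\hat{Y}$ is the predicted value at X, $S_{YX}$ is the standard error of the estimate,
$\alpha$ = 1 - (Confidence Level/100), and $t_{\alpha/2,\,n-2}$ = 2.2281 is the t-table value
for 10 degrees of freedom.

$\hat{Y}$ = 1.4464 + 0.0100108(600) = 7.4529
h = 1/12 + (600 - 715.75)^2 / 302836.25 = 0.0833 + 0.0442 = 0.1276
7.4529 ± 2.2281(0.4191)Sqrt(0.1276) = 7.4529 ± 0.3335

95% confidence interval for the average weekly sales when the number of customers who
make purchases is 600:
7.1194 to 7.7864 thousands of dollars, generated from Excel using the TINV function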

j) Construct a 95% prediction interval for the weekly sales of a store that has 600
purchasing customers. Show how you obtain your answer.

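Formula:
$$\hat{Y} \pm t_{\alpha/2,\,n-2} \, S_{YX} \sqrt{1 + h}, \qquad h = \frac{1}{n} + \frac{(X - \bar{X})^2}{SSX}$$
where $\hat{Y}$ is the predicted value at X, $S_{YX}$ is the standard error of the estimate,
$\alpha$ = 1 - (Confidence Level/100), and $t_{\alpha/2,\,n-2}$ = 2.2281 is the t-table value
for 10 degrees of freedom. The extra 1 under the square root accounts for the variation of
an individual store around the average.

7.4529 ± 2.2281(0.4191)Sqrt(1 + 0.1276) = 7.4529 ± 0.9915

95% prediction interval for the weekly sales of a store that has 600 purchasing customers:
6.4614 to 8.4444 thousands of dollars, generated from Excel using the TINV function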
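Parts f) to j) can be reproduced together from the raw data. The sketch below is one way to do it under a stated assumption: the critical value 2.2281 is the standard t-table value for a two-sided 95% interval with 10 degrees of freedom, hard-coded here rather than obtained from Excel's TINV. The helper `intervals_at` and the variable names are hypothetical.

```python
import math

customers = [907, 926, 713, 741, 780, 898, 510, 529, 460, 872, 650, 603]
sales = [11.20, 11.05, 8.21, 9.21, 9.42, 10.08, 6.73, 7.02, 6.12, 9.52, 7.53, 7.25]

T_CRIT = 2.2281  # assumed t-table value: two-sided 95%, df = n - 2 = 10

n = len(customers)
x_bar = sum(customers) / n
y_bar = sum(sales) / n
ssx = sum((x - x_bar) ** 2 for x in customers)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(customers, sales))

b1 = sxy / ssx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(customers, sales))
s_yx = math.sqrt(sse / (n - 2))

# Part f): t statistic for H0: beta_1 = 0.
se_b1 = s_yx / math.sqrt(ssx)
t_stat = b1 / se_b1

# Part h): confidence interval for the slope.
slope_ci = (b1 - T_CRIT * se_b1, b1 + T_CRIT * se_b1)

def intervals_at(x0: float):
    """Return (confidence interval for the mean, prediction interval) at x0."""
    y_hat = b0 + b1 * x0
    h = 1 / n + (x0 - x_bar) ** 2 / ssx
    ci_half = T_CRIT * s_yx * math.sqrt(h)        # mean response
    pi_half = T_CRIT * s_yx * math.sqrt(1 + h)    # individual response
    return (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)

# Parts i) and j): intervals at 600 customers.
mean_ci, pred_int = intervals_at(600)

print(f"t = {t_stat:.4f}")
print(f"slope CI: {slope_ci[0]:.4f} to {slope_ci[1]:.4f}")
print(f"mean CI at 600: {mean_ci[0]:.4f} to {mean_ci[1]:.4f}")
print(f"prediction interval at 600: {pred_int[0]:.4f} to {pred_int[1]:.4f}")
```

The prediction interval is always wider than the confidence interval at the same X, because it covers a single store rather than the average over all stores with that number of customers.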
