Session 4 Learning Journal question


The data in the Excel file AdvertSales.xlsx were collected by a small company and concern the company's weekly sales () following various weekly advertising expenditures (). Perform a correlation and simple linear regression analysis.

INSTRUCTIONS

Create the required SPSS output and paste it where prompted. In the text below, *** marks where you have to type a numerical value read from the SPSS output. Text in bold indicates where you have to choose the appropriate comment based upon the supporting output, e.g. we do/do not reject H0.

DELETE THE ABOVE INSTRUCTIONS WHEN YOU HAVE COMPLETED THE LEARNING JOURNAL QUESTION.

DELETE SENTENCES IN CAPITAL LETTERS WHEN YOU HAVE COMPLETED THE LEARNING JOURNAL QUESTION.

BEFORE PERFORMING THE REGRESSION WE NEED TO PERFORM SOME EDA. THIS IS FOR DATA SCREENING PURPOSES AND TO FAMILIARISE OURSELVES WITH THE DATA. AT THE MOMENT, DO YOU EVEN KNOW: HOW MANY DATA POINTS YOU HAVE? WHAT THE MEAN ADVERTISING EXPENDITURE IS? HOW VARIABLE THE WEEKLY SALES ARE?
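As a cross-check on the SPSS output, the three questions above can be answered in a few lines of Python. This is only a sketch: the two lists are made-up placeholder values, not the contents of AdvertSales.xlsx, and the variable names are illustrative.

```python
import statistics

# Hypothetical placeholder values; replace with the real columns
# exported from AdvertSales.xlsx.
advertising = [900, 1000, 1100, 1200, 1300, 1400, 1500, 1600]
sales = [4100, 4300, 4700, 4800, 5300, 5400, 5900, 6000]

n_points = len(sales)                            # how many data points?
mean_advertising = statistics.mean(advertising)  # mean advertising expenditure
sd_sales = statistics.stdev(sales)               # sample SD (n - 1 divisor)

print(f"N = {n_points}")
print(f"Mean advertising = {mean_advertising:.1f}")
print(f"SD of sales = {sd_sales:.1f}")
```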

Methods of Enquiry Business Statistics Activity Leader: Dr Iain Weir (iain.weir@uwe.ac.uk)

SALES IS THE VARIABLE WE ARE ULTIMATELY TRYING TO PREDICT. IT IS THE VARIABLE THAT IS RANDOM: ADVERTISING IS CONTROLLED BY THE COMPANY AND THUS PROBABLY NOT AS INTERESTING! BEGIN WITH SOME EDA OF JUST THE SALES VARIABLE.

Sales Exploratory Data Analysis

PASTE CASE PROCESSING SUMMARY TABLE HERE

From the above we see that we have *** observations of sales.

PASTE BOXPLOTS HERE

The boxplot of sales reveals there are *** outliers. Visually the boxplot of sales is/is not consistent with normal data.

PASTE DESCRIPTIVES TABLE HERE

The mean weekly sales figure is ***. The 95% confidence interval of this mean is from *** to ***. The median weekly sales figure is ***. The best weekly sales figure the company had is ***. The worst weekly sales figure was ***. The standard deviation of sales is ***; this tells us that approximately 95% of the time the company's weekly sales will be between the mean ± 2 standard deviations, i.e. *** to ***. The sales skewness is/is not consistent with normality. The sales kurtosis is/is not consistent with normality.
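The range and the shape checks can be reproduced by hand as a sanity check. The sketch below assumes the approximate 95% range is the mean plus or minus two standard deviations, uses the usual bias-adjusted sample formulas for skewness and kurtosis, and applies an assumed rule of thumb that a statistic within two standard errors of zero is consistent with normality. The data are made-up placeholders.

```python
import math
import statistics

# Hypothetical placeholder values, not the AdvertSales.xlsx data.
sales = [4100, 4300, 4700, 4800, 5300, 5400, 5900, 6000]

n = len(sales)
mean = statistics.mean(sales)
sd = statistics.stdev(sales)  # sample standard deviation

# Rough 95% range for individual weekly sales: mean +/- 2 SD.
low, high = mean - 2 * sd, mean + 2 * sd

# Bias-adjusted sample skewness and excess kurtosis.
z = [(x - mean) / sd for x in sales]
skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

# Standard errors of skewness and kurtosis under normality.
se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

# Assumed rule of thumb: |statistic / SE| < 2 is consistent with normality.
skew_ok = abs(skew / se_skew) < 2
kurt_ok = abs(kurt / se_kurt) < 2

print(f"Approximate 95% range: {low:.0f} to {high:.0f}")
print(f"Skewness {skew:.2f} (SE {se_skew:.2f}), consistent: {skew_ok}")
print(f"Kurtosis {kurt:.2f} (SE {se_kurt:.2f}), consistent: {kurt_ok}")
```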

PASTE PERCENTILES TABLE HERE

IN A PRESENTATION THAT INCLUDES THE BOXPLOT YOU MIGHT WANT TO QUOTE SOME OF THE FOLLOWING:

The lower quartile is ***. The upper quartile is ***.

OR IT MAY BE OF INTEREST TO QUOTE:

95% of the time sales are over ***.
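The quartiles can be checked with Python's standard library. In this sketch the "exclusive" method interpolates at position p(n + 1) in the sorted data, which is assumed here to correspond to the weighted-average definition in the percentiles table; Tukey's hinges may differ slightly. The data are made-up placeholders.

```python
import statistics

# Hypothetical placeholder values, not the AdvertSales.xlsx data.
sales = [4100, 4300, 4700, 4800, 5300, 5400, 5900, 6000]

# n=4 returns the three cut points: lower quartile, median, upper quartile.
lower_q, median, upper_q = statistics.quantiles(sales, n=4, method="exclusive")

print(f"Lower quartile = {lower_q}")
print(f"Median         = {median}")
print(f"Upper quartile = {upper_q}")
```

With only a handful of observations, extreme percentiles such as the 5th are poorly determined, so this sketch stops at the quartiles.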

PASTE HISTOGRAM HERE

IN A PRESENTATION YOU COULD DISPLAY THIS HISTOGRAM WHILST DISCUSSING LOCATION, DISPERSION AND SHAPE AND/OR QUOTING STATISTICS FROM THE TABLES ABOVE.

Here we see a histogram of our weekly sales data. We can see that sales range roughly from as low as *** to as high as ***. The average weekly sales figure is approximately ***. The sales vary over a range of ***. The data is fairly symmetrical/slightly negatively skewed/slightly positively skewed.

PASTE NORMAL Q-Q PLOT HERE

The above plot does/does not give us faith that the data is normal as the points are/are not nicely entwined around the straight line.

EDIT THE TESTS OF NORMALITY TABLE IN SPSS TO REMOVE THE KOLMOGOROV-SMIRNOV TESTS.

PASTE SHAPIRO-WILK TESTS OF NORMALITY TABLE HERE

The Shapiro-Wilk (S-W) statistic does/does not give evidence of departure from normality (S-W(***) = ***, p = ***).

WHILST SALES IS THE RANDOM VARIABLE OF MAIN INTEREST, WE SHOULD STILL SCREEN ADVERTISING AND KNOW SOMETHING ABOUT ITS DISTRIBUTION.

Advertising Exploratory Data Analysis

PASTE CASE PROCESSING SUMMARY TABLE HERE

From the above we see that we have *** observations of advertising.

PASTE BOXPLOTS HERE

The boxplot of advertising reveals there are *** outliers.

PASTE DESCRIPTIVES TABLE HERE

The mean weekly advertising figure is ***. The median weekly advertising figure is ***. The most spent on advertising in a week was ***. The least spent on advertising in a week was ***.

PASTE HISTOGRAM HERE

Here we see a histogram of our weekly advertising expenditure. We can see that roughly from *** to *** is spent on advertising each week. The average amount of weekly advertising is approximately ***. The amount spent on advertising varies over a range of ***.

WE SHALL NOW CONSIDER THE RELATIONSHIP BETWEEN SALES AND ADVERTISING.

PASTE SALES V ADVERTISING SCATTERPLOT HERE

THINK ABOUT WHICH OF THE TWO VARIABLES IS THE DEPENDENT VARIABLE AND THUS SHOULD BE PLOTTED ON THE Y AXIS!

From the above we can see the following. There appears to be a negative/no/a positive correlation between sales and advertising. The relationship between sales and advertising is linear/curved. The variability of sales is/is not constant over advertising. Thus it appears that simple linear regression is/is not appropriate for this data.

PASTE CORRELATION TABLE HERE


The Pearson correlation coefficient value of *** confirms what was apparent from the graph; there appears to be a very weak/weak/moderate/strong/very strong positive/negative correlation between the two variables.

There is/is not a significant correlation between sales and advertising (r=***, N=***, p=***).
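For reference, the Pearson coefficient and its test statistic can be computed directly from the sums of squares. This is a sketch on made-up placeholder data; the helper name pearson_r is illustrative, and the p-value itself is left to SPSS because it requires the t distribution.

```python
import math

# Hypothetical placeholder values, not the AdvertSales.xlsx data.
advertising = [900, 1000, 1100, 1200, 1300, 1400, 1500, 1600]
sales = [4100, 4300, 4700, 4800, 5300, 5400, 5900, 6000]

def pearson_r(x, y):
    """Pearson correlation: Sxy / sqrt(Sxx * Syy)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

n = len(sales)
r = pearson_r(advertising, sales)

# Test statistic for H0: no correlation. It is compared with a
# t distribution on n - 2 degrees of freedom to obtain the p-value.
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))

print(f"r = {r:.3f}, N = {n}, t = {t_stat:.2f} on {n - 2} df")
```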

PASTE MODEL SUMMARY OUTPUT HERE


From the above we can see that the model fits the data reasonably well; ***% of the variation in the sales values can be explained by the fitted line together with the advertising values. Conversely, roughly ***% of the variation is not explained by the linear regression.

The standard deviation of sales around their expected values is ***.
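The two model summary quantities quoted above come from splitting the total variation in sales into explained and unexplained parts. A minimal sketch on made-up placeholder data follows; fit_line is an illustrative helper, not an SPSS function.

```python
import math

# Hypothetical placeholder values, not the AdvertSales.xlsx data.
advertising = [900, 1000, 1100, 1200, 1300, 1400, 1500, 1600]
sales = [4100, 4300, 4700, 4800, 5300, 5400, 5900, 6000]

def fit_line(x, y):
    """Least-squares (intercept, gradient) for y = b0 + b1 * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

b0, b1 = fit_line(advertising, sales)
n = len(sales)
mean_sales = sum(sales) / n

residuals = [y - (b0 + b1 * x) for x, y in zip(advertising, sales)]
sse = sum(e ** 2 for e in residuals)             # unexplained variation
sst = sum((y - mean_sales) ** 2 for y in sales)  # total variation

r_squared = 1 - sse / sst                      # proportion explained
unexplained_pct = 100 * sse / sst              # percentage not explained
std_error_estimate = math.sqrt(sse / (n - 2))  # SD of sales about the line

print(f"R squared = {100 * r_squared:.1f}%")
print(f"Unexplained = {unexplained_pct:.1f}%")
print(f"Std. error of the estimate = {std_error_estimate:.1f}")
```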

PASTE MODEL COEFFICIENTS OUTPUT HERE


From the above we can see the following.


The intercept is ***. This is/is not significantly different to zero (p=***). The gradient is ***. This is/is not significantly different to zero (p=***). The expected sales value is given by:

Sales = *** + *** advertising

Thus we can see that for each one-unit increase in advertising, the sales value is expected to increase by ***. The 95% confidence interval for this expected increase is *** to ***. The intercept for this example could be interpreted as the sales value (***) when there is no advertising. However, this is extrapolation and thus cannot be relied upon!
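The coefficients and the confidence interval for the gradient can be reproduced from first principles. The sketch below uses made-up placeholder data and a hard-coded critical value, t(0.975) = 2.447 on 6 degrees of freedom, which is valid only for these 8 placeholder points; with the real data the degrees of freedom are N - 2.

```python
import math

# Hypothetical placeholder values, not the AdvertSales.xlsx data.
advertising = [900, 1000, 1100, 1200, 1300, 1400, 1500, 1600]
sales = [4100, 4300, 4700, 4800, 5300, 5400, 5900, 6000]

# Assumed critical value: t(0.975) on n - 2 = 6 degrees of freedom.
T_CRIT = 2.447

n = len(sales)
mean_x = sum(advertising) / n
mean_y = sum(sales) / n
sxx = sum((x - mean_x) ** 2 for x in advertising)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(advertising, sales))

gradient = sxy / sxx                    # expected change in sales per unit of advertising
intercept = mean_y - gradient * mean_x  # fitted sales at zero advertising (extrapolation)

# Residual standard deviation, then the standard error of the gradient.
sse = sum((y - (intercept + gradient * x)) ** 2
          for x, y in zip(advertising, sales))
s = math.sqrt(sse / (n - 2))
se_gradient = s / math.sqrt(sxx)

ci_low = gradient - T_CRIT * se_gradient
ci_high = gradient + T_CRIT * se_gradient

print(f"Sales = {intercept:.1f} + {gradient:.3f} * advertising")
print(f"95% CI for gradient: {ci_low:.3f} to {ci_high:.3f}")
```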

PASTE MODEL CASE DIAGNOSTICS OUTPUT HERE

From the above we can see that the ***th weekly observation does not fit the model too well. For this week the model predicted sales of *** but actual sales were *** less/more.
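Casewise diagnostics amount to dividing each residual by the residual standard deviation and listing the cases that exceed a cut-off. In the sketch below the cut-off of 2 is an assumption, the function name is illustrative, and the placeholder data have one week made deliberately low so that something is flagged.

```python
import math

# Hypothetical placeholder values; week 4 is deliberately low.
advertising = [900, 1000, 1100, 1200, 1300, 1400, 1500, 1600]
sales = [4100, 4300, 4700, 4300, 5300, 5400, 5900, 6000]

def casewise_diagnostics(x, y, threshold=2.0):
    """Return (week, predicted, residual, std_residual) for poorly fitting cases.

    The default threshold of 2 standard deviations is an assumption.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    predicted = [b0 + b1 * a for a in x]
    residuals = [b - p for b, p in zip(y, predicted)]
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return [
        (week, p, e, e / s)
        for week, (p, e) in enumerate(zip(predicted, residuals), start=1)
        if abs(e / s) > threshold
    ]

for week, predicted, residual, z in casewise_diagnostics(advertising, sales):
    direction = "less" if residual < 0 else "more"
    print(f"Week {week}: predicted {predicted:.0f}, "
          f"actual was {abs(residual):.0f} {direction} (std residual {z:.2f})")
```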

PASTE HISTOGRAM OF STD RESIDUALS HERE

The fitted normal curve does/does not match the observed residuals well. Thus the normality assumption does/does not seem reasonable.

PASTE Q-Q PLOT HERE


The plotted points do/do not follow the straight line fairly well. Thus the normality assumption is/is not met.

PASTE STD RESIDUAL V STD PREDICTED VALUE PLOT HERE


From the above we can/cannot see a relationship between the residuals and the predicted values. Thus the fitted model is/is not consistent with the assumption of linearity.

PASTE FITTED LINE WITH 95% CI PLOT HERE

The above plot gives us a visual idea of the predicted sales for various advertising expenditure. We can see that as you approach the extreme advertising values, the 95% confidence interval gets narrower/wider, indicating that the accuracy of our expected prediction is less/more.
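The shape of the confidence band follows from the standard error of the fitted mean, which depends on how far the advertising value is from its average. The sketch below evaluates the half-width of the interval at three points, using made-up placeholder data and an assumed critical value t(0.975) = 2.447 on 6 degrees of freedom that applies only to these 8 points.

```python
import math

# Hypothetical placeholder values, not the AdvertSales.xlsx data.
advertising = [900, 1000, 1100, 1200, 1300, 1400, 1500, 1600]
sales = [4100, 4300, 4700, 4800, 5300, 5400, 5900, 6000]

# Assumed critical value: t(0.975) on n - 2 = 6 degrees of freedom.
T_CRIT = 2.447

n = len(sales)
mean_x = sum(advertising) / n
mean_y = sum(sales) / n
sxx = sum((x - mean_x) ** 2 for x in advertising)
gradient = sum((x - mean_x) * (y - mean_y)
               for x, y in zip(advertising, sales)) / sxx
intercept = mean_y - gradient * mean_x
sse = sum((y - (intercept + gradient * x)) ** 2
          for x, y in zip(advertising, sales))
s = math.sqrt(sse / (n - 2))

def ci_half_width(x0):
    """Half-width of the 95% CI for expected sales at advertising level x0."""
    return T_CRIT * s * math.sqrt(1 / n + (x0 - mean_x) ** 2 / sxx)

for x0 in (900, 1250, 1600):
    print(f"advertising {x0}: expected sales +/- {ci_half_width(x0):.1f}")
```

The (x0 - mean_x) term is zero at the average advertising value and grows as x0 moves away from it in either direction.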

Suppose you have been asked to predict the weekly sales for advertising expenditures of 1500 and 1800. Add these values at the bottom of the data set. Rerun the regression saving predicted values and 95% confidence interval.

PASTE SCREENSHOT OF DATA VIEW WITH PREDICTION + CI HERE

State the predictions:

For an advertising expenditure of 1500, the predicted sales is *** with a 95% confidence interval from *** to ***.

For an advertising expenditure of 1800, the predicted sales is *** with a 95% confidence interval from *** to ***.

Predictions from this model should be good as the R² value is high (***%). However, predictions from extrapolation outside of the observed advertising expenditure data range (minimum = *** to maximum = ***) cannot be trusted. Thus we have reservations about the prediction for an advertising expenditure of 1500/1800.
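The point predictions and the extrapolation check can be sketched as follows. The data are made-up placeholders whose maximum happens to be 1600, so which of the two requests counts as extrapolation here need not match the real data; fit_line and predict are illustrative helper names.

```python
# Hypothetical placeholder values, not the AdvertSales.xlsx data.
advertising = [900, 1000, 1100, 1200, 1300, 1400, 1500, 1600]
sales = [4100, 4300, 4700, 4800, 5300, 5400, 5900, 6000]

def fit_line(x, y):
    """Least-squares (intercept, gradient) for y = b0 + b1 * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def predict(x0, x, y):
    """Return (predicted sales, True if x0 lies inside the observed x range)."""
    b0, b1 = fit_line(x, y)
    return b0 + b1 * x0, min(x) <= x0 <= max(x)

for spend in (1500, 1800):
    estimate, in_range = predict(spend, advertising, sales)
    note = "" if in_range else " (extrapolation: treat with caution)"
    print(f"advertising {spend}: predicted sales {estimate:.0f}{note}")
```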

SAVE THE COMPLETED LEARNING JOURNAL QUESTION AS THIS WILL FORM PART OF YOUR LEARNING JOURNAL SUBMISSION
