
Page 1 of 7

Trident University International


Ph.D. B.A. Program Class RES600:
Introductory to Data Analysis

Professor: Dr. Truel
Student: Anh Tran
E-mail: Anh.NTran@my.trident.edu
Phone: 714-904-6209


Subject: Case 4 for Module 4: Causality and inference: Structuring the well-formed hypothesis

Date: 2-Dec-2013

From: A. Tran




References
1. Bryant, Adam. "Investing It: Duffers Need Not Apply." The New York Times,
May 31, 1998, Section 3, page 1.
2. Indiana University Knowledge Base. "In SPSS, how do I find outliers in my
regression?" Indiana University, 2013. URL: http://kb.iu.edu/data/afho.html.




A. Introduction
The purposes of this report are 1) to review the value of Adam Bryant's article [1] from a
statistical standpoint and 2) to recommend whether the study should have been conducted
differently, with the technical reasons for that recommendation.



B. Analysis

In 1998, Adam Bryant [1] wrote an interesting article titled "Investing It:
Duffers Need Not Apply," proposing that there was a significant relationship between a
company's stock performance and the golf skill of that company's CEO. According to
Bryant [1], the support for this hypothesis is the correlation among the mean golf
handicaps of CEOs in three groups: good, average, and below-average
stock performance.

Because Mr. Bryant's proposal in this article is interesting and very entertaining,
it is important to examine the data and his approach carefully from a statistical
standpoint.

After an extensive review of the article, three main issues with Mr. Bryant's thesis
were identified:

First, by selecting the Golf Digest survey data, which were collected from golf
handicaps self-reported by the magazine's readers, Mr. Bryant introduced a sampling frame
error, in which certain elements of the population are excluded or the population is not
accurately represented. In addition, the sample of 51 CEOs who voluntarily reported their golf
handicaps may not represent the population of all CEOs who play golf.

Second, even assuming that the sample was collected properly from a statistical
standpoint and that there was a significant correlation between the CEOs' golf handicaps
and their companies' stock performance, correlation alone is not sufficient to infer causality
between these variables, for example that a good CEO golf handicap causes good stock
performance. Therefore, Bryant's approach was flawed in this regard.

Third, in Bryant's article, Mr. Crystal deleted seven points from his analysis, as
shown in Figure 4 (observation numbers 45, 46, 47, 48, 49, 50, and 51 in the SPSS
Handicap_stockrate.sav data file), without providing any statistically sound criterion for
rejecting these data.

It is common that, when a sample of N observations of a variable is obtained,
some observations appear to differ markedly from the others. If mistakes
in the survey technique are identified, these observations, so-called outliers, can simply be
discarded. If no mistakes are found, as in this case, a statistical criterion must be used to
identify observations that can be considered for discard.

To determine whether Mr. Crystal's decision to discard these seven observations is
valid, the influence of the discarded data on the correlation between company stock
performance and CEO golf handicap is examined through linear regression. From
Figures 1 and 2, it is clear that the discarded observations have a strong effect on the
correlation. For the full data set (51 observations), there is a very weak correlation
(corr = -0.04172) between the two variables. For the reduced data set (44 observations), there
is a much stronger correlation (corr = -0.41448). This conclusion is also supported by the
descriptive statistics for the mean CEO golf handicap in the three groups of good,
average, and below-average stock performance, as shown in Figure 3. From Figure 3, the
44-observation sample shows a monotonic increase in CEO golf handicap from 12.42
to 14.56 to 17.22 across the three groups of good, average, and below-average stock
performance. However, the full sample shows a non-monotonic pattern, with values going
from 17.10 to 13.71 to 16.87 across the same stock performance groups.
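
The comparison above can be sketched in a few lines of Python. This is only an illustration of how a handful of unusual points can change a Pearson correlation; the numbers are hypothetical and are not the values from the article or from Handicap_stockrate.sav, and the helper name pearson_r is an assumption made for this sketch.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical illustration data, NOT the survey data from the article.
# The last two points play the role of the discarded observations.
handicap = [5, 8, 10, 12, 15, 18, 20, 3, 4]
stock_rating = [90, 80, 75, 60, 55, 40, 30, 10, 15]

r_full = pearson_r(handicap, stock_rating)
r_reduced = pearson_r(handicap[:-2], stock_rating[:-2])  # drop the last two points
print(f"full: r = {r_full:.3f}, reduced: r = {r_reduced:.3f}")
```

With the two unusual points removed, the correlation in this toy data becomes much stronger, which mirrors the pattern reported for the 51-observation and 44-observation data sets.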


Figure 1
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.41448
R Square 0.171794
Adjusted R Square 0.152075
Standard Error 19.09969
Observations 44





Figure 2


Figure 3
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.04172
R Square 0.00174
Adjusted R Square -0.0186
Standard Error 25.3818
Observations 51



From the analyses above, it may be correct that some of these observations are outliers.
However, it is a flaw to discard observations without justification; a statistical criterion must
be used to identify points that can be considered for rejection.

In SPSS, one criterion for identifying outliers [2] is to compare the centered leverage of each
observation, which lies between 0 and (n-1)/n, with the threshold 2*p/n, where n is the
number of observations and p is the number of independent (explanatory) variables.
Observations whose leverage is greater than 2*p/n are candidates for discard.
For these data (p = 1, n = 51), 2*p/n is 0.03922.
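
As a minimal sketch of this criterion, assuming a simple regression with a single predictor, the centered leverage of observation i can be computed as (x_i - mean)^2 / sum_j (x_j - mean)^2 and compared with 2*p/n. The function names and the sample values below are hypothetical; SPSS computes the leverages itself, so this only illustrates the arithmetic.

```python
def centered_leverages(x):
    """Centered leverage of each observation for one predictor:
    h_i = (x_i - mean)^2 / sum_j (x_j - mean)^2."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((v - mean_x) ** 2 for v in x)
    return [(v - mean_x) ** 2 / sxx for v in x]

def flag_high_leverage(x, p=1):
    """Return (1-based observation numbers above 2*p/n, the threshold)."""
    threshold = 2 * p / len(x)
    leverages = centered_leverages(x)
    flagged = [i for i, h in enumerate(leverages, start=1) if h > threshold]
    return flagged, threshold

# Threshold for the report's data set: p = 1 predictor, n = 51 observations.
print(round(2 * 1 / 51, 5))  # 0.03922

# Hypothetical predictor values, NOT the article's data.
x = [10, 11, 12, 13, 14, 15, 16, 17, 18, 40]
flagged, threshold = flag_high_leverage(x)
print(flagged, threshold)
```

In the toy data only the last value sits far from the mean of the predictor, so it is the only observation whose leverage exceeds the threshold.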

The observations and their calculated centered leverages are shown in Figure 4. The SPSS
criterion shows that the last ten observations can be considered outliers. Therefore, Mr.
Crystal's decision to discard the last seven observations is valid from the standpoint of
statistical influence. Note also that the percentage of observations discarded cannot by
itself be used to invalidate the selected data.


Figure 4



Given that Mr. Crystal's data selection is valid and that a correlation is observed
between the CEOs' golf handicaps and their companies' stock performance, the next question
is whether the differences in mean golf handicap among the stock performance
groups are statistically significant. To address this question, an ANOVA test is conducted.

Figure 5 shows the ANOVA results for the 51-observation and 44-observation data sets. From
Figure 5, the p-values of the ANOVA F-test for the 51-observation and 44-observation data sets
are 0.156 and 0.077, respectively. Both p-values are greater than 0.05. Therefore, there is
not sufficient evidence to reject the null hypothesis that all group means are equal. The
ANOVA results also show that the outliers have a noticeable influence on the outcome: the p-value
decreases from 0.156 for the 51-observation data set to 0.077 for the 44-observation data set.
In brief, we can claim that Bryant's conclusion is invalid.
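
The F statistic behind such a one-way ANOVA can be sketched as follows. This is an illustration under stated assumptions, not the SPSS procedure used for Figure 5: the group values are hypothetical, the function name one_way_anova is invented for this sketch, and the p-value step is left out because it requires the F distribution (for example, scipy.stats.f_oneway returns both the statistic and the p-value).

```python
def one_way_anova(groups):
    """Return (F statistic, between-groups df, within-groups df)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical handicaps for three stock performance groups (not the article's data).
good = [10, 12, 14, 13]
average = [13, 15, 14, 16]
below_average = [15, 18, 17, 19]

f_stat, df_b, df_w = one_way_anova([good, average, below_average])
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

The null hypothesis of equal group means is rejected at the 0.05 level only when the p-value for this F statistic, with the stated degrees of freedom, falls below 0.05.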



Figure 5

The independent-samples t-tests between group 1 (below-average stock performance)
and group 2 (average stock performance), and between group 2 (average stock performance)
and group 3 (good stock performance), lead to the same conclusion: there is not enough
evidence to reject the null hypothesis, and there is no statistically significant difference
between the handicap means of these groups. The t-test results are shown in Figure 6.
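
For reference, the pooled-variance form of the independent-samples t statistic can be sketched as below. It assumes equal variances in the two groups, which is only one of the variants SPSS reports; the data and the function name pooled_t are hypothetical and are not taken from Figure 6.

```python
from math import sqrt

def pooled_t(a, b):
    """Return (t statistic, degrees of freedom) for two independent samples,
    assuming equal population variances."""
    n_a, n_b = len(a), len(b)
    mean_a = sum(a) / n_a
    mean_b = sum(b) / n_b
    var_a = sum((v - mean_a) ** 2 for v in a) / (n_a - 1)
    var_b = sum((v - mean_b) ** 2 for v in b) / (n_b - 1)

    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df

# Hypothetical handicaps for two adjacent groups (not the article's data).
group_1 = [15, 18, 17, 19]
group_2 = [13, 15, 14, 16]

t_stat, df = pooled_t(group_1, group_2)
print(f"t({df}) = {t_stat:.3f}")
```

The resulting t statistic is compared with the t distribution on the stated degrees of freedom to obtain the two-sided p-value.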

It is not clear why a nationally respected newspaper such as the New York
Times published this article, since its conclusion is not well supported. However, it is
suspected that the entertainment value of a story linking two popular subjects, business
success and golf success, may have given the NYT editors enough reason to publish it.






Figure 6
