
Name: Johnny Medina Lopez

JHU Advanced Academic Programs


Graduate Program for Applied Economics
Statistics 440.605.82
Summer 2017
HW UNIT #9 DUE TUES July 18th by 11:59 PM (EST)

Please read each question carefully and answer it completely. (Note that although you are not
always required to show work on the homework assignments, you may be required to do so on
exams.)

#1. In the EXCEL file for this week you are given data taken from the World Bank's Doing Business
website: http://www.doingbusiness.org/. You will consider whether the means for total tax rate are the
same across the three regions for which you have data.
(a) Please conduct a four-step ANOVA hypothesis test for the test described above, set α = 0.04. [Note
that if you run this test in EXCEL you will want to re-format how the data is reported.]

Answer:
Step 1:
The hypotheses of interest in an ANOVA are as follows:
H0: μ1 = μ2 = μ3
H1: Means are not all equal.
Step 2:
At a significance level of α = 0.04 (96% confidence), we reject the null hypothesis if p < 0.04.
The degrees of freedom are df1 = 2 and df2 = 27, and at the 0.04 significance level the critical value is
F = 3.635; hence we reject the null hypothesis if F > 3.635.

Step 3:
ANOVA table is as follows

ANOVA

                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   1396.769          2   698.384       5.736   .008
Within Groups    3287.546         27   121.761
Total            4684.315         29

Step 4:
Since p = 0.008 < 0.04 and F = 5.736 > 3.635, we reject the null hypothesis. The mean total tax rate
differs significantly for at least one of the three regions.
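
As a cross-check on Steps 2 to 4, the following minimal Python sketch recomputes the F statistic, p-value, and critical value from the sums of squares in the table. It relies on the fact that, when the numerator degrees of freedom equal 2, the upper-tail probability of the F distribution has the closed form (1 + 2F/df2)^(-df2/2). The function names are illustrative and are not part of the assignment.

```python
# Sketch: one-way ANOVA decision from the sums of squares reported above.
# The closed forms below are valid only when the numerator df equals 2.

def f_upper_tail_df1_2(f_stat, df2):
    """P(F > f_stat) for an F distribution with (2, df2) degrees of freedom."""
    return (1.0 + 2.0 * f_stat / df2) ** (-df2 / 2.0)

def f_critical_df1_2(alpha, df2):
    """Value c with P(F > c) = alpha for an F distribution with (2, df2) df."""
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)

alpha = 0.04
ss_between, df_between = 1396.769, 2
ss_within, df_within = 3287.546, 27

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within
p_value = f_upper_tail_df1_2(f_stat, df_within)
f_crit = f_critical_df1_2(alpha, df_within)
reject_h0 = f_stat > f_crit

print(f"F = {f_stat:.3f}, p = {p_value:.4f}, F crit = {f_crit:.3f}, reject H0: {reject_h0}")
```

With more than two numerator degrees of freedom no such simple formula applies, and a statistics package or F table would be needed instead.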

(b) If appropriate based on your results from (a), please conduct post-hoc testing for pairwise comparisons
as described as Fisher's LSD in 13.3 on p. 565 [i.e. please use the procedure finding the specific LSD.]
You do not need to conduct four step hypothesis tests for these comparisons, just please highlight
which pairs reject the null hypothesis of no difference. Continue to use α = 0.04. When would such a
post-hoc procedure not be appropriate with such an ANOVA model? [Hint: Be sure to first discuss
specifically how the null hypothesis between the ANOVA F and the LSD tests differ. Then, feel free to go
further and discuss other concerns researchers have about post-hoc tests in general.]

Answer:
The results of Fisher's LSD are as follows.

Multiple Comparisons

                                  Mean Difference                        96% Confidence Interval
(I) GP           (J) GP           (I-J)             Std. Error   Sig.    Lower Bound   Upper Bound
ASIA             CENTRAL/S AMER   -15.91000*        4.93479      .003    -26.5584      -5.2616
                 SUB SAH AFRICA    -3.52000         4.93479      .482    -14.1684       7.1284
CENTRAL/S AMER   ASIA              15.91000*        4.93479      .003      5.2616      26.5584
                 SUB SAH AFRICA    12.39000*        4.93479      .018      1.7416      23.0384
SUB SAH AFRICA   ASIA               3.52000         4.93479      .482     -7.1284      14.1684
                 CENTRAL/S AMER   -12.39000*        4.93479      .018    -23.0384      -1.7416

*. The mean difference is significant at the 0.04 level.

From the above table, we can see that there is no significant difference between the means of ASIA and
SUB SAH AFRICA. The remaining two pairs possess a significant difference between their means.

Following one-way analysis of variance (ANOVA), you may want to explore further and compare the
mean of one group with the mean of another. One way to do this is by using Fisher's Least Significant
Difference (LSD) test.

Fisher's LSD test begins like the Bonferroni multiple comparison test. It takes the square root of the
Residual Mean Square from the ANOVA and treats that as the pooled SD. Taking into account the
sample sizes of the two groups being compared, it computes a standard error of the difference between
those two means. It then computes a t ratio by dividing the difference between means by the standard
error of that difference. When computing a P value and confidence interval, Fisher's LSD test does not
account for multiple comparisons. In this respect, it is quite different from the Bonferroni, Tukey and
Dunnett methods. Fisher's LSD test is basically a set of individual t tests. The only difference is that
rather than computing the pooled SD from only the two groups being compared, it computes the pooled SD
from all the groups. If all groups are sampled from populations with the same SD, using all the data to
compute the pooled SD (usually) gives a more accurate value for the SD, and this shows up as more
degrees of freedom.
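
The LSD procedure described above can be sketched in Python as follows. The sketch assumes 10 observations per region, an assumption that is consistent with the 29 total degrees of freedom and the reported standard error of 4.93479 but is not stated in the output. It also backs the critical t value out of the half-width of a reported 96% confidence interval rather than looking it up, so it illustrates the mechanics under those assumptions rather than replacing the software output.

```python
import math

# Values taken from the ANOVA and LSD tables above.
ms_within = 3287.546 / 27      # residual mean square, used as the pooled variance
n_per_group = 10               # ASSUMPTION: equal group sizes of 10
mean_diffs = {
    ("ASIA", "CENTRAL/S AMER"): -15.91,
    ("ASIA", "SUB SAH AFRICA"): -3.52,
    ("CENTRAL/S AMER", "SUB SAH AFRICA"): 12.39,
}

# Standard error of the difference between two group means.
std_error = math.sqrt(ms_within * (1 / n_per_group + 1 / n_per_group))

# ASSUMPTION: the LSD is recovered from a reported interval, since
# half-width = (upper - lower) / 2 = t_crit * std_error = LSD.
lsd = (-5.2616 - (-26.5584)) / 2
t_crit = lsd / std_error

# A pair is significant when its t ratio exceeds the critical t in absolute
# value, which is the same as |mean difference| > LSD.
t_ratios = {pair: diff / std_error for pair, diff in mean_diffs.items()}
significant = [pair for pair, t in t_ratios.items() if abs(t) > t_crit]

print(f"SE = {std_error:.5f}, t crit = {t_crit:.3f}, LSD = {lsd:.4f}")
for pair in significant:
    print("significant:", pair)
```

Because every comparison uses the same pooled variance and the same group size, a single LSD value serves for all three pairs.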

(c) Describe the issues of finding the appropriate Type I Error for your post-hoc results. How would
you apply the appropriate Bonferroni correction to the tests you ran in (b)?

From the LSD results in (b), two pairs of groups show a significant difference in their means, while for
one pair there is no significant difference, so we fail to reject the null hypothesis for that pair.
Overall ANOVA results suggest that there is a significant difference between the groups. However, each
LSD comparison is tested at α = 0.04 on its own, so the chance of at least one Type I error across the
three comparisons is larger than 0.04.

The Bonferroni adjustment will always provide strong control of the family-wise error rate. This means
that, whatever the nature and number of the tests, or the relationships between them, if their assumptions
are met, it will ensure that the probability of having even one erroneous significant result among all tests
is at most α, your original error level. It is therefore always available.

Bonferroni post-hoc results are as follows.

Multiple Comparisons

Bonferroni
                                  Mean Difference                        96% Confidence Interval
(I) GP           (J) GP           (I-J)             Std. Error   Sig.    Lower Bound   Upper Bound
ASIA             CENTRAL/S AMER   -15.91000*        4.93479       .010   -28.9808      -2.8392
                 SUB SAH AFRICA    -3.52000         4.93479      1.000   -16.5908       9.5508
CENTRAL/S AMER   ASIA              15.91000*        4.93479       .010     2.8392      28.9808
                 SUB SAH AFRICA    12.39000         4.93479       .055     -.6808      25.4608
SUB SAH AFRICA   ASIA               3.52000         4.93479      1.000    -9.5508      16.5908
                 CENTRAL/S AMER   -12.39000         4.93479       .055   -25.4608        .6808

*. The mean difference is significant at the 0.04 level.
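
One way to apply the Bonferroni correction by hand is to compare each unadjusted LSD p-value with α divided by the number of comparisons, or equivalently to multiply each p-value by that number and cap it at 1. The sketch below does both, using the rounded "Sig." values from the LSD table as inputs. Because those inputs are rounded to three decimals, the adjusted values can differ in the last digit from the software output above.

```python
# Sketch: Bonferroni correction applied to the three pairwise comparisons.
alpha = 0.04
lsd_p_values = {   # rounded "Sig." values from the LSD table
    ("ASIA", "CENTRAL/S AMER"): 0.003,
    ("ASIA", "SUB SAH AFRICA"): 0.482,
    ("CENTRAL/S AMER", "SUB SAH AFRICA"): 0.018,
}
m = len(lsd_p_values)            # number of pairwise comparisons
per_test_alpha = alpha / m       # Bonferroni threshold for each comparison

# Without a correction, the family-wise Type I error rate can be as large
# as m * alpha (Bonferroni inequality).
fwer_bound_uncorrected = min(1.0, m * alpha)

# Equivalent view: inflate each p-value and compare with the original alpha.
adjusted_p = {pair: min(1.0, m * p) for pair, p in lsd_p_values.items()}
still_significant = [pair for pair, p in lsd_p_values.items() if p < per_test_alpha]

print(f"per-test alpha = {per_test_alpha:.4f}, uncorrected bound = {fwer_bound_uncorrected:.2f}")
for pair, p in adjusted_p.items():
    print(pair, f"adjusted p = {p:.3f}")
```

Under this correction only the ASIA versus CENTRAL/S AMER comparison stays significant, which agrees with the starred rows in the Bonferroni table.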

#2. Consider the information for #25 on p. 575. Note that you are provided this data in the EXCEL
file/worksheet midwestgas. [question begins with "The price drivers pay for gasoline"]
(a) Please conduct a four-step ANOVA hypothesis test as described in the textbook question (i.e. that
means are equal across brands, set α = 0.05.) IMPORTANT: If you run this test using the Data Analysis
Toolpak in EXCEL, make sure you report the correct F-statistic as EXCEL will report two F-statistics,
see pp. 573 and 595 to see the example from the textbook. Also, make sure you read the instructions on p.
595 to be sure you select the correct command in EXCEL to correspond to the randomized block
design.

Answer
Step 1:
The hypotheses of interest in an ANOVA are as follows:
H0: Means are equal across rows (cities).
H1: Means across rows are not all equal.
H0: Means are equal across columns (brands).
H1: Means across columns are not all equal.

Step 2:
At a significance level of α = 0.05 (95% confidence), we reject the null hypothesis if p < 0.05.
The degrees of freedom for rows are df1 = 10 and df2 = 20, and at the 0.05 significance level the critical
value is F = 2.348; hence we reject the null hypothesis for rows if F > 2.348.
The degrees of freedom for columns are df1 = 2 and df2 = 20, and at the 0.05 significance level the
critical value is F = 3.493; hence we reject the null hypothesis for columns if F > 3.493.

Step 3:
ANOVA table is as follows

Anova: Two-Factor Without Replication

SUMMARY            Count   Sum     Average    Variance
Akron, OH          3       11.38   3.793333   0.001033
Cincinnati, OH     3       11.42   3.806667   0.006033
Cleveland, OH      3       11.61   3.87       0.0004
Columbus, OH       3       11.32   3.773333   0.000233
Ft. Wayne, IN      3       11.54   3.846667   0.000433
Indianapolis, IN   3       11.56   3.853333   0.000233
Lansing, MI        3       11.96   3.986667   0.003033
Lexington, KY      3       11.36   3.786667   3.33E-05
Louisville, KY     3       11.41   3.803333   0.001033
Muncie, IN         3       11.48   3.826667   0.000233
Toledo, OH         3       11.38   3.793333   0.008233

Shell              11      41.8    3.8        0.00468
BP                 11      42.29   3.844545   0.004867
Marathon           11      42.33   3.848182   0.003856

ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Rows                  0.108006   10   0.010801   8.298487   3.52E-05   2.347878
Columns               0.015836    2   0.007918   6.083818   0.008632   3.492828
Error                 0.02603    20   0.001302

Total                 0.149873   32

Step 4:
For both the row and column comparisons, the p-value is less than the significance level of 0.05 and
the test statistic is greater than the critical F value. Hence there is a significant difference between
the means of the rows (cities), and there is also a significant difference between the means of the
columns (brands).
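
The row and column sums of squares in the Excel output can be rebuilt from the Sum column of the SUMMARY table, which makes it easier to see which F statistic belongs to the brands. The sketch below is a minimal illustration: the error sum of squares is taken from the Excel output rather than recomputed (the raw prices are not reproduced here), and the closed-form p-value applies only to the brand test because its numerator degrees of freedom equal 2.

```python
# Sketch: randomized block ANOVA rebuilt from the sums in the SUMMARY table.
city_sums = [11.38, 11.42, 11.61, 11.32, 11.54, 11.56,
             11.96, 11.36, 11.41, 11.48, 11.38]     # rows (blocks)
brand_sums = [41.80, 42.29, 42.33]                  # Shell, BP, Marathon (columns)
ss_error = 0.02603                                  # taken from the Excel output

n_cities, n_brands = len(city_sums), len(brand_sums)
grand_total = sum(city_sums)
correction = grand_total ** 2 / (n_cities * n_brands)

# Each city sum covers n_brands prices; each brand sum covers n_cities prices.
ss_rows = sum(s ** 2 for s in city_sums) / n_brands - correction
ss_cols = sum(s ** 2 for s in brand_sums) / n_cities - correction

df_rows, df_cols = n_cities - 1, n_brands - 1
df_error = df_rows * df_cols
ms_error = ss_error / df_error

f_rows = (ss_rows / df_rows) / ms_error   # cities (blocks)
f_cols = (ss_cols / df_cols) / ms_error   # brands (the test the question asks for)

# Closed-form upper-tail probability, valid because df_cols == 2.
p_cols = (1 + 2 * f_cols / df_error) ** (-df_error / 2)

print(f"SS rows = {ss_rows:.6f}, SS columns = {ss_cols:.6f}")
print(f"F rows = {f_rows:.3f}, F columns = {f_cols:.3f}, p columns = {p_cols:.6f}")
```

The recomputed F values agree with the Excel output to about three decimals; the small differences come from using the rounded error sum of squares.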

(b) Explain what the blocks are in this experiment and how they change the ANOVA model from a
randomized/observational design such as described in 13.2.

Answer:
The completely randomized design is probably the simplest experimental design, in terms of data analysis
and convenience. With this design, participants are randomly assigned to treatments. With a randomized
block design, the experimenter divides participants into subgroups called blocks, such that the variability
within blocks is less than the variability between blocks. Then, participants within each block are
randomly assigned to treatment conditions. Because this design reduces variability and potential
confounding, it produces a better estimate of treatment effects. In this experiment the blocks are the
11 cities (the rows of the ANOVA table) and the treatments are the three gasoline brands (the columns),
so variation between cities is removed from the error term.
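
To illustrate how blocking changes the model, the sketch below compares the brand F test under the randomized block design with the test that would result if the same prices were analyzed as a one-way design that ignores cities. It uses only the sums of squares from the Excel output in (a). In the one-way version the city variation is absorbed into the error term, so the error sum of squares becomes SS rows + SS error with 30 degrees of freedom. The helper function is illustrative and relies on the closed form for the F critical value when the numerator degrees of freedom equal 2.

```python
# Sketch: effect of blocking on the brand F test.
ss_cities, ss_brands, ss_error = 0.108006, 0.015836, 0.02603
n_cities, n_brands = 11, 3
alpha = 0.05

def f_critical_df1_2(alpha, df2):
    """Critical F for (2, df2) degrees of freedom; closed form for df1 == 2."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)

ms_brands = ss_brands / (n_brands - 1)

# Randomized block design: city variation is removed from the error term.
df_error_blocked = (n_cities - 1) * (n_brands - 1)
f_blocked = ms_brands / (ss_error / df_error_blocked)
crit_blocked = f_critical_df1_2(alpha, df_error_blocked)

# One-way analysis ignoring cities: city variation stays in the error term.
df_error_oneway = n_cities * n_brands - n_brands
f_oneway = ms_brands / ((ss_cities + ss_error) / df_error_oneway)
crit_oneway = f_critical_df1_2(alpha, df_error_oneway)

print(f"blocked: F = {f_blocked:.3f} vs critical {crit_blocked:.3f}")
print(f"one-way: F = {f_oneway:.3f} vs critical {crit_oneway:.3f}")
```

With these numbers the brand effect is significant only when cities are treated as blocks, which is the practical payoff of the blocked design.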

#3. You are given the following pmf for X and Y:

            X
        0      1      2
    0   0.05   0.10   0.03
Y   1   0.21   0.11   0.19
    2   0.08   0.15   0.08

For this pmf please find:


(a) The marginal pmfs for X and Y.
(b) E(X) and E(Y).
(c) The conditional distributions for (X|Y) X given Y and (Y|X) Y given X.
(d) E(Y|X) and Var(Y|X).
(e) Show the Variance Decomposition for the Variance of Y (you must compute all three to
appropriately show this):
Var(Y) = Var_X[E(Y|X)] + E_X[Var(Y|X)]
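
A minimal Python sketch of how parts (a) through (e) could be computed is shown below. It assumes the table is read with rows indexed by Y and columns indexed by X, as laid out above; the variable names are illustrative. It is offered as a way to check hand calculations, not as the required written solution.

```python
# Sketch: marginals, conditional moments, and the variance decomposition.
# joint[y][x] = P(X = x, Y = y); rows are Y and columns are X, as in the table.
joint = [
    [0.05, 0.10, 0.03],
    [0.21, 0.11, 0.19],
    [0.08, 0.15, 0.08],
]
values = [0, 1, 2]

# (a) Marginal pmfs.
p_x = [sum(joint[y][x] for y in values) for x in values]
p_y = [sum(joint[y][x] for x in values) for y in values]

# (b) Expectations, plus Var(Y) for part (e).
e_x = sum(x * p_x[x] for x in values)
e_y = sum(y * p_y[y] for y in values)
var_y = sum(y ** 2 * p_y[y] for y in values) - e_y ** 2

# (c) Conditional distributions: cond_y_given_x[x][y] and cond_x_given_y[y][x].
cond_y_given_x = [[joint[y][x] / p_x[x] for y in values] for x in values]
cond_x_given_y = [[joint[y][x] / p_y[y] for x in values] for y in values]

# (d) Conditional mean and variance of Y for each value of X.
e_y_given_x = [sum(y * cond_y_given_x[x][y] for y in values) for x in values]
var_y_given_x = [
    sum(y ** 2 * cond_y_given_x[x][y] for y in values) - e_y_given_x[x] ** 2
    for x in values
]

# (e) Var(Y) = Var_X[E(Y|X)] + E_X[Var(Y|X)].
var_of_cond_mean = sum(p_x[x] * (e_y_given_x[x] - e_y) ** 2 for x in values)
mean_of_cond_var = sum(p_x[x] * var_y_given_x[x] for x in values)

print("p_x =", p_x, "p_y =", p_y)
print("E(X) =", e_x, "E(Y) =", e_y, "Var(Y) =", var_y)
print("decomposition:", var_of_cond_mean, "+", mean_of_cond_var)
```

The two printed components should sum to Var(Y) up to floating-point rounding, which is the identity part (e) asks to demonstrate.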
