
Mindanao State University

Iligan Institute of Technology

Experimental Design

Problem Set
Student Instructor

Asaad, Al-Ahmadgaid B.
alstat.weebly.com alstatr.blogspot.com

Lopez, Rosadelima Visorbo

June 20, 2012

I.

Questions
3-1 The tensile strength of Portland cement is being studied. Four different mixing techniques can be used economically. The following data have been collected:

Mixing
Technique    Tensile Strength (lb/in^2)
1            3129   3000   2865   2890
2            3200   3300   2975   3150
3            2800   2900   2985   3050
4            2600   2700   2600   2765

(a) Test the hypothesis that mixing techniques affect the strength of the cement. Use $\alpha = 0.05$.
(b) Construct a graphical display as described in Section 3-5.3 to compare the mean tensile strengths for the four mixing techniques. What are your conclusions?
(c) Use the Fisher LSD method with $\alpha = 0.05$ to make comparisons between pairs of means.
(d) Construct a normal probability plot of the residuals. What conclusion would you draw about the validity of the normality assumption?
(e) Plot the residuals versus the predicted tensile strength. Comment on the plot.
(f) Prepare a scatter plot of the results to aid the interpretation of the results of this experiment.

3-2
(a) Rework part (b) of Problem 3-1 using Duncan's multiple range test with $\alpha = 0.05$. Does this make any difference in your conclusions?
(b) Rework part (b) of Problem 3-1 using Tukey's test with $\alpha = 0.05$. Do you get the same conclusions from Tukey's test that you did from the graphical procedure and/or Duncan's multiple range test?
(c) Explain the difference between the Tukey and Duncan procedures.

3-3 Reconsider the experiment in Problem 3-1. Find a 95 percent confidence interval on the mean tensile strength of the Portland cement produced by each of the four mixing techniques. Also find a 95 percent confidence interval on the difference in means for techniques 1 and 3. Does this aid you in interpreting the results of the experiment?

II.

Computational and Graphical Section


3-1 The tensile strength of Portland cement is being studied. Four different mixing techniques can be used economically. The following data have been collected:

Mixing
Technique    Tensile Strength (lb/in^2)      Totals    Averages
1            3129   3000   2865   2890       11884     2971.00
2            3200   3300   2975   3150       12625     3156.25
3            2800   2900   2985   3050       11735     2933.75
4            2600   2700   2600   2765       10665     2666.25

(a) Test the hypothesis that mixing techniques affect the strength of the cement. Use $\alpha = 0.05$.

I. Hypotheses: $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$; $H_1$: some means are different
II. Level of significance: $\alpha = 0.05$
III. Test Statistic: $F_0 = MS_{Treatments} / MS_E$
IV. Rejection Region: reject $H_0$ if $F_0 > F_{0.05,3,12} = 3.49$
V. Computation: the results are summarized in the ANOVA table.

ANOVA Table
Source    Sum of Squares    Degrees of Freedom    Mean Square    F        P-Value
Model     489740.19         3                     163246.73      12.73    0.0005
Error     153908.25         12                    12825.69
Total     643648.44         15
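The entries of the ANOVA table can be reproduced from the treatment totals. The following is a sketch that assumes the standard one-way fixed-effects formulas with $a = 4$ treatments, $n = 4$ replicates and $N = 16$ observations. The symbols $y_{i.}$ (treatment totals) and $y_{..}$ (grand total, 46909) are notation introduced here for illustration.

```latex
\begin{aligned}
SS_T &= \sum_{i=1}^{4}\sum_{j=1}^{4} y_{ij}^{2} - \frac{y_{..}^{2}}{N}
      = 138172041 - \frac{(46909)^{2}}{16} = 643648.44 \\
SS_{Treatments} &= \frac{1}{n}\sum_{i=1}^{4} y_{i.}^{2} - \frac{y_{..}^{2}}{N}
      = \frac{(11884)^{2} + (12625)^{2} + (11735)^{2} + (10665)^{2}}{4} - \frac{(46909)^{2}}{16}
      = 489740.19 \\
SS_E &= SS_T - SS_{Treatments} = 643648.44 - 489740.19 = 153908.25 \\
F_0 &= \frac{SS_{Treatments}/(a-1)}{SS_E/(N-a)}
      = \frac{489740.19/3}{153908.25/12}
      = \frac{163246.73}{12825.69} = 12.73
\end{aligned}
```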

The computed F-value of 12.73 exceeds the tabulated value, $F_{0.05,3,12} = 3.49$, and the p-value (0.0005) is below the 0.05 level of significance. We therefore reject the null hypothesis and conclude that the mixing technique significantly affects the strength of the cement.

(b) Construct a graphical display as described in Section 3-5.3 to compare the mean tensile strengths for the four mixing techniques. What are your conclusions?

Dashed lines in the plot, by color:

Red: Mean of Treatment 4 (2666.25)
Pink: Grand Mean (2931.81)
Brown: Mean of Treatment 3 (2933.75)
Green: Mean of Treatment 1 (2971.00)
Blue: Mean of Treatment 2 (3156.25)

Based on the plot, and on the data, we conclude that $\mu_1$ and $\mu_3$ are the same; see also the plot for question 3-1 (f). Moreover, $\mu_4$ differs from $\mu_1$ and $\mu_3$, $\mu_2$ differs from $\mu_1$ and $\mu_3$, and $\mu_2$ and $\mu_4$ are different.

How was the plot made? First, draw a Student t distribution with $N - 1 = 15$ degrees of freedom. Then mark the four treatment means on the x-axis. Because the means are far too large to appear on the t scale, they are first converted to t-values using

$$t_i = \frac{\bar{y}_{i.} - \bar{\bar{y}}}{s_{\bar{y}} / \sqrt{4}},$$

where $\bar{\bar{y}}$ and $s_{\bar{y}}$ are the mean and the standard deviation of the four treatment averages. You can confirm this in the R Codes Section.

(c) Use the Fisher LSD method with $\alpha = 0.05$ to make comparisons between pairs of means.

$$LSD = t_{0.025,12}\sqrt{\frac{2 MS_E}{n}} = 2.179\sqrt{\frac{2(12825.69)}{4}} = 174.495$$

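Thus, any pair of treatment averages that differ in absolute value by more than 174.495 implies that the corresponding pair of population means is significantly different. The differences in averages are

$\bar{y}_{1.} - \bar{y}_{2.} = 2971.00 - 3156.25 = -185.25^{*}$
$\bar{y}_{1.} - \bar{y}_{3.} = 2971.00 - 2933.75 = 37.25$
$\bar{y}_{1.} - \bar{y}_{4.} = 2971.00 - 2666.25 = 304.75^{*}$
$\bar{y}_{2.} - \bar{y}_{3.} = 3156.25 - 2933.75 = 222.50^{*}$
$\bar{y}_{2.} - \bar{y}_{4.} = 3156.25 - 2666.25 = 490.00^{*}$
$\bar{y}_{3.} - \bar{y}_{4.} = 2933.75 - 2666.25 = 267.50^{*}$

The starred values indicate pairs of means that are significantly different.

Data Layout for Fisher LSD Method (Treatments A, B, C, D are mixing techniques 1, 2, 3, 4)
Group    Treatment    Means
a        B            3156.25
b        A            2971.00
b        C            2933.75
c        D            2666.25

Means with the same letter are not significantly different, at $\alpha = 0.05$.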
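The LSD comparison can also be checked in a few lines of base R. This is only a sketch: the means, error mean square and degrees of freedom are copied from the analysis above, and the variable names (`means`, `lsd`, `lsd_table`) are made up for illustration. Because `qt()` returns 2.178813 rather than the tabled 2.179, the cutoff comes out as about 174.48 instead of 174.495.

```r
# Treatment means and ANOVA quantities copied from the solution above
means <- c(T1 = 2971.00, T2 = 3156.25, T3 = 2933.75, T4 = 2666.25)
mse <- 12825.69    # error mean square
df_error <- 12     # error degrees of freedom
n <- 4             # replicates per treatment
alpha <- 0.05

# Fisher LSD: t_{alpha/2, df} * sqrt(2 * MSE / n)
lsd <- qt(1 - alpha / 2, df_error) * sqrt(2 * mse / n)

# All pairwise differences; combn() returns one column per pair
idx <- combn(names(means), 2)
diffs <- unname(means[idx[1, ]] - means[idx[2, ]])

lsd_table <- data.frame(
  pair = paste(idx[1, ], "-", idx[2, ]),
  difference = diffs,
  significant = abs(diffs) > lsd   # TRUE when the pair differs by more than the LSD
)

print(round(lsd, 4))
print(lsd_table)
```

Only the T1 - T3 row should come back as not significant, which matches the letter grouping in the layout above.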

(d) Construct a normal probability plot of the residuals. What conclusion would you draw about the validity of the normality assumption?

Nothing in the plot is unusual. The residuals appear to satisfy the normality assumption, since the points stay within the 95 percent confidence bands.

(e) Plot the residuals versus the predicted tensile strength. Comment on the plot.

The points show a slight outward-opening funnel (megaphone) pattern. It is not pronounced, but it suggests that the error variance may not be constant.

(f) Prepare a scatter plot of the results to aid the interpretation of the results of this experiment.

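3-2. (a) Rework part (b) of Problem 3-1 using Duncan's multiple range test with $\alpha = 0.05$. Does this make any difference in your conclusions?

Ranking the treatment averages in ascending order, we have

$\bar{y}_{4.} = 2666.25$, $\bar{y}_{3.} = 2933.75$, $\bar{y}_{1.} = 2971.00$, $\bar{y}_{2.} = 3156.25$

The standard error of each average is

$$S_{\bar{y}_{i.}} = \sqrt{\frac{MS_E}{n}} = \sqrt{\frac{12825.69}{4}} = 56.63$$

From the table of significant ranges for 12 degrees of freedom and $\alpha = 0.05$, we obtain $r_{0.05}(2, 12)$, $r_{0.05}(3, 12)$ and $r_{0.05}(4, 12)$. Thus, the least significant ranges are

$$R_p = r_{0.05}(p, 12)\, S_{\bar{y}_{i.}}, \quad p = 2, 3, 4,$$

which are approximately $R_2 = 174.48$, $R_3 = 182.63$ and $R_4 = 187.57$ (the critical ranges reported in the R Codes Section).

The comparisons yield

2 vs. 4: 3156.25 - 2666.25 = 490.00 > $R_4$
2 vs. 3: 3156.25 - 2933.75 = 222.50 > $R_3$
2 vs. 1: 3156.25 - 2971.00 = 185.25 > $R_2$
1 vs. 4: 2971.00 - 2666.25 = 304.75 > $R_3$
1 vs. 3: 2971.00 - 2933.75 = 37.25 < $R_2$
3 vs. 4: 2933.75 - 2666.25 = 267.50 > $R_2$

From the analysis, there are significant differences between all pairs of means except 1 and 3.

Data Layout for Duncan's Multiple Range Test
Group    Treatment    Means
a        B            3156.25
b        A            2971.00
b        C            2933.75
c        D            2666.25

Means with the same letter are not significantly different, at $\alpha = 0.05$.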
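Duncan's critical ranges do not have to be read from a printed table. The sketch below assumes that the significant range for a span of p ordered means is the studentized-range quantile at protection level (1 - alpha)^(p - 1), which is how the ranges in the R Codes Section appear to be obtained. The variable names are hypothetical and the numeric inputs are copied from the analysis above.

```r
# Quantities copied from the ANOVA above
mse <- 12825.69
df_error <- 12
n <- 4
alpha <- 0.05
se_mean <- sqrt(mse / n)                 # standard error of a treatment average

# Assumed form of Duncan's significant ranges for spans p = 2, 3, 4
p <- 2:4
r_duncan <- qtukey((1 - alpha)^(p - 1), nmeans = p, df = df_error)
crit_range <- r_duncan * se_mean         # least significant ranges R_2, R_3, R_4

# Order the means from largest to smallest and compare every pair
means <- sort(c(T1 = 2971.00, T2 = 3156.25, T3 = 2933.75, T4 = 2666.25),
              decreasing = TRUE)
idx <- combn(seq_along(means), 2)
span <- idx[2, ] - idx[1, ] + 1          # number of ordered means the pair spans
diffs <- unname(means[idx[1, ]] - means[idx[2, ]])

duncan_table <- data.frame(
  pair = paste(names(means)[idx[1, ]], "vs", names(means)[idx[2, ]]),
  difference = diffs,
  critical_range = crit_range[span - 1],
  significant = diffs > crit_range[span - 1]
)

print(round(crit_range, 4))
print(duncan_table)
```

Each difference is judged against the range for its own span, so the widest comparison (T2 vs T4) uses the largest critical value.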

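This agrees with the conclusion from the LSD method: for these data, Duncan's multiple range test and the LSD method lead to identical conclusions.

(b) Rework part (b) of Problem 3-1 using Tukey's test with $\alpha = 0.05$. Do you get the same conclusions from Tukey's test that you did from the graphical procedure and/or Duncan's multiple range test?

$$T_{0.05} = q_{0.05}(4, 12)\sqrt{\frac{MS_E}{n}} = 4.20\sqrt{\frac{12825.69}{4}} = 237.825$$

Thus, any pair of treatment averages that differ in absolute value by more than 237.825 implies that the corresponding pair of population means is significantly different. The four treatment averages are

$\bar{y}_{1.} = 2971.00$, $\bar{y}_{2.} = 3156.25$, $\bar{y}_{3.} = 2933.75$, $\bar{y}_{4.} = 2666.25$

and the differences in averages are

$\bar{y}_{1.} - \bar{y}_{2.} = 2971.00 - 3156.25 = -185.25$
$\bar{y}_{1.} - \bar{y}_{3.} = 2971.00 - 2933.75 = 37.25$
$\bar{y}_{1.} - \bar{y}_{4.} = 2971.00 - 2666.25 = 304.75^{*}$
$\bar{y}_{2.} - \bar{y}_{3.} = 3156.25 - 2933.75 = 222.50$
$\bar{y}_{2.} - \bar{y}_{4.} = 3156.25 - 2666.25 = 490.00^{*}$
$\bar{y}_{3.} - \bar{y}_{4.} = 2933.75 - 2666.25 = 267.50^{*}$

The starred values indicate pairs of means that are significantly different. The conclusions are not the same. In Duncan's test, the mean of Treatment 4 differs from the means of Treatments 1, 2, and 3, and the mean of Treatment 2 differs from the means of Treatments 1 and 3. In Tukey's test, however, the mean of Treatment 2 is not significantly different from the means of Treatments 1 and 3, although these pairs were found to be different by the graphical method and the Fisher LSD method.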
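Tukey's single cutoff can be reproduced with `qtukey()` from base R. This is a sketch with made-up variable names, using the means and error mean square from the analysis above. `qtukey()` returns roughly 4.199 instead of the tabled 4.20, so the cutoff is about 237.75 rather than 237.825; the pairs flagged as significant are the same.

```r
# Quantities copied from the ANOVA above
means <- c(T1 = 2971.00, T2 = 3156.25, T3 = 2933.75, T4 = 2666.25)
mse <- 12825.69
df_error <- 12
n <- 4
alpha <- 0.05

# One critical value for every pair: q_alpha(a, f) * sqrt(MSE / n)
q_crit <- qtukey(1 - alpha, nmeans = length(means), df = df_error)
hsd <- q_crit * sqrt(mse / n)

idx <- combn(names(means), 2)
diffs <- unname(means[idx[1, ]] - means[idx[2, ]])

tukey_table <- data.frame(
  pair = paste(idx[1, ], "vs", idx[2, ]),
  difference = diffs,
  significant = abs(diffs) > hsd
)

print(round(c(q = q_crit, hsd = hsd), 3))
print(tukey_table)
```

Because every pair is held to the same, larger cutoff, only the three comparisons involving Treatment 4 remain significant.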

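(c) Explain the difference between the Tukey and Duncan procedures.

Tukey's test uses a single critical value for every pairwise comparison, based on the studentized range statistic for all $a$ means. Duncan's test uses several critical values: the significant range depends on how many ordered means lie between the two means being compared. Both procedures multiply a tabled range value by the standard error of a treatment average. Because Tukey's test controls the experimentwise error rate, it is the more conservative of the two and declares fewer differences significant.

3-3 Reconsider the experiment in Problem 3-1. Find a 95 percent confidence interval on the mean tensile strength of the Portland cement produced by each of the four mixing techniques. Also find a 95 percent confidence interval on the difference in means for techniques 1 and 3. Does this aid you in interpreting the results of the experiment?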

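Each interval has the form

$$\bar{y}_{i.} - t_{0.025,12}\sqrt{\frac{MS_E}{n}} \le \mu_i \le \bar{y}_{i.} + t_{0.025,12}\sqrt{\frac{MS_E}{n}}, \qquad t_{0.025,12}\sqrt{\frac{MS_E}{n}} = 2.179\sqrt{\frac{12825.69}{4}} = 123.39$$

Treatment 1: $2971.00 \pm 123.39$
Thus, the desired 95 percent confidence interval is $2847.61 \le \mu_1 \le 3094.39$

Treatment 2: $3156.25 \pm 123.39$
Thus, the desired 95 percent confidence interval is $3032.86 \le \mu_2 \le 3279.64$

Treatment 3: $2933.75 \pm 123.39$
Thus, the desired 95 percent confidence interval is $2810.36 \le \mu_3 \le 3057.14$

Treatment 4: $2666.25 \pm 123.39$
Thus, the desired 95 percent confidence interval is $2542.86 \le \mu_4 \le 2789.64$

Treatment 1 - Treatment 3:

$$\bar{y}_{1.} - \bar{y}_{3.} \pm t_{0.025,12}\sqrt{\frac{2 MS_E}{n}} = (2971.00 - 2933.75) \pm 2.179\sqrt{\frac{2(12825.69)}{4}} = 37.25 \pm 174.495$$

Thus, the desired 95 percent confidence interval on the difference between Treatments 1 and 3 is $-137.245 \le \mu_1 - \mu_3 \le 211.745$

These intervals give the range in which each population mean is estimated to lie with 95 percent confidence. The interval on $\mu_1 - \mu_3$ contains zero, which agrees with the earlier finding that techniques 1 and 3 do not differ significantly.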
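The interval arithmetic can be scripted in base R as a check. This sketch uses hypothetical variable names and the pooled error mean square from the ANOVA, so every treatment gets the same half-width. It uses the exact `qt()` quantile, so the limits may differ in the second decimal from values computed with the tabled 2.179.

```r
# Quantities copied from the ANOVA above
means <- c(T1 = 2971.00, T2 = 3156.25, T3 = 2933.75, T4 = 2666.25)
mse <- 12825.69
df_error <- 12
n <- 4
t_crit <- qt(0.975, df_error)            # two-sided 95 percent critical value

# Intervals on the individual treatment means (pooled standard error)
half_width <- t_crit * sqrt(mse / n)
ci_means <- data.frame(
  mean = means,
  lower = means - half_width,
  upper = means + half_width
)

# Interval on the difference mu_1 - mu_3
diff_13 <- unname(means["T1"] - means["T3"])
half_width_diff <- t_crit * sqrt(2 * mse / n)
ci_diff <- c(lower = diff_13 - half_width_diff,
             upper = diff_13 + half_width_diff)

print(round(ci_means, 2))
print(round(ci_diff, 2))
```

A difference interval that straddles zero, as this one does, is the interval counterpart of a non-significant pairwise test.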

III.

R Codes Section
Note: The code for questions 3-1 (c), (d), and so on depends on the data and model created in 3-1 (a). To avoid errors, run the code for each question in order, starting from question 3-1 (a).

#(3-1.a) Test the hypothesis that mixing techniques affect the strength of
#the cement. Use alpha = 0.05.
#INPUT
TensileData <- read.table(header = TRUE, text = "
Treatment Observations Predicted
A 3129 2971
A 3000 2971
A 2865 2971
A 2890 2971
B 3200 3156.25
B 3300 3156.25
B 2975 3156.25
B 3150 3156.25
C 2800 2933.75
C 2900 2933.75
C 2985 2933.75
C 3050 2933.75
D 2600 2666.25
D 2700 2666.25
D 2600 2666.25
D 2765 2666.25")
attach(TensileData)
Model <- aov(Observations ~ Treatment, data = TensileData)
summary(Model)
#OUTPUT
#             Df Sum Sq Mean Sq F value   Pr(>F)
# Treatment    3 489740  163247   12.73 0.000489 ***
# Residuals   12 153908   12826
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#(3-1.b) Construct a graphical display as described in Section 3-5.3 to
#compare the mean tensile strengths for the four mixing techniques. What are
#your conclusions?
#INPUT
library(ggplot2)
x <- seq(-4.5, 4.5, length = 90)
xval <- c(2666.25, 2933.75, 2971, 3156.25)          #treatment means
xvaltrn <- (xval - mean(xval))/(sd(xval)/sqrt(4))   #means converted to t-values
tvalues <- dt(x, 15)
vlines <- data.frame(xint = c(xvaltrn, mean(xvaltrn)), grp = letters[1:5])
#opts(), theme_text() and theme_rect() are no longer available in ggplot2;
#ggtitle(), theme(), element_text() and element_rect() are used instead
ggplot(data.frame(x = x, tvalues = tvalues), aes(x = x, y = tvalues)) +
  geom_point() +
  geom_polygon(fill = "purple", colour = "purple", alpha = 0.5) +
  geom_point(fill = "purple", colour = "purple", alpha = 0.2, pch = 21) +
  geom_vline(data = vlines, aes(xintercept = xint, colour = grp),
             linetype = "dashed", size = 1) +
  theme_bw() +
  xlab(bquote(bold('x values with intercept of Average Tensile Strength (lb/in'^'2'*')'))) +
  ylab(expression(bold(P(x)))) +
  ggtitle(expression(bold("Scaled t Distribution"))) +
  theme(plot.title = element_text(size = 20, colour = "darkblue"),
        panel.border = element_rect(size = 2, colour = "red", fill = NA))
#OUTPUT
#Refer to question 3-1 (b) of Computational and Graphical Section.

#(3-1.c) Use the Fisher LSD method with alpha = 0.05 to make comparisons between
#pairs of means.
#INPUT
library(agricolae)
#recent agricolae versions print the results only when console = TRUE
LSD.test(Model, "Treatment", console = TRUE)
#OUTPUT
# Study:
# LSD t Test for Observations
# Mean Square Error:  12825.69
# Treatment,  means and individual ( 95 %) CI
#   Observations  std.err replication      LCL      UCL
# A      2971.00 60.27852           4 2839.664 3102.336
# B      3156.25 67.98820           4 3008.116 3304.384
# C      2933.75 54.13621           4 2815.797 3051.703
# D      2666.25 40.48534           4 2578.040 2754.460
# alpha: 0.05 ; Df Error: 12
# Critical Value of t: 2.178813
# Least Significant Difference 174.4798
# Means with the same letter are not significantly different.
# Groups, Treatments and means
# a  B  3156.25
# b  A  2971
# b  C  2933.75
# c  D  2666.25

#(3-1.d) Construct a normal probability plot of the residuals. What
#conclusion would you draw about the validity of the normality assumption?
#INPUT
#Make sure you run attach(TensileData) first
Residuals <- Observations - Predicted
library(ggplot2)
library(MASS)
df <- data.frame(x = sort(Residuals), y = qnorm(ppoints(length(Residuals))))
probs <- c(0.01, 0.05, seq(0.1, 0.9, by = 0.1), 0.95, 0.99)
qprobs <- qnorm(probs)
xl <- quantile(Residuals, c(0.25, 0.75))
yl <- qnorm(c(0.25, 0.75))
slope <- diff(yl)/diff(xl)
int <- yl[1] - slope * xl[1]
#Maximum-likelihood fitting of univariate distributions, from MASS
fd <- fitdistr(Residuals, "normal")
#estimated percentiles for the fitted normal
xp_hat <- fd$estimate[1] + qprobs*fd$estimate[2]
#variance of the estimated percentiles
v_xp_hat <- fd$sd[1]^2 + qprobs^2*fd$sd[2]^2 + 2*qprobs*fd$vcov[1,2]
xpl <- xp_hat + qnorm(0.025)*sqrt(v_xp_hat) #lower bound
xpu <- xp_hat + qnorm(0.975)*sqrt(v_xp_hat) #upper bound
df.bound <- data.frame(xp = xp_hat, xpl = xpl, xpu = xpu, nquant = qprobs)
#The above code was originally programmed by Julie B at stackoverflow.com.
#Link to her stackoverflow profile:
#http://stackoverflow.com/users/1200228/julie-b
#Link to the posted question in stackoverflow:
#http://stackoverflow.com/questions/3929611/recreate-minitab-normal-probability-plot
ggplot(data = df, aes(x = x, y = y)) +
  geom_point(colour = "darkred", size = 3) +
  geom_abline(intercept = int, slope = slope, colour = "purple", size = 2, alpha = 0.5) +
  scale_y_continuous(limits = range(qprobs), breaks = qprobs, labels = 100*probs) +
  geom_line(data = df.bound, aes(x = xpl, y = qprobs), colour = "darkgreen",
            alpha = 0.5, size = 1) +
  geom_line(data = df.bound, aes(x = xpu, y = qprobs), colour = "darkgreen",
            alpha = 0.5, size = 1) +
  xlab(expression(bold("Residuals"))) +
  ylab(expression(bold("Normal % Probability"))) +
  theme_bw() +
  #ggtitle()/theme() replace opts(), which is no longer available in ggplot2
  ggtitle(expression(bold("Normal Probability Plot of Residuals"))) +
  theme(plot.title = element_text(size = 20, colour = "darkblue"),
        panel.border = element_rect(size = 2, colour = "red", fill = NA))
#OUTPUT
#Refer to question 3-1 (d) of Computational and Graphical Section.

#(3-1.e) Plot the residuals versus the predicted tensile strength. Comment on
#the plot.
#INPUT
#Residuals comes from the code for 3-1 (d); run that first
library(colorRamps)
ggplot(data = TensileData, aes(x = Predicted, y = Residuals)) +
  ylim(c(-210, 210)) +
  geom_point(aes(size = 3, colour = matlab.like(16))) +
  theme_bw() +
  xlab(expression(bold("Predicted Values"))) +
  ylab(expression(bold("Residuals"))) +
  #ggtitle()/theme() replace opts(), which is no longer available in ggplot2
  ggtitle(expression(bold("Residuals versus Fitted"))) +
  theme(plot.title = element_text(colour = "darkblue", size = 20),
        panel.border = element_rect(size = 2, colour = "red", fill = NA),
        legend.position = "none")
#OUTPUT
#Refer to question 3-1 (e) of Computational and Graphical Section.

#(3-1.f) Prepare a scatter plot of the results to aid the interpretation of
#the results of this experiment
#INPUT
ggplot(data = TensileData, aes(factor(Treatment), y = Observations)) +
  geom_point(colour = "darkred", size = 3) +
  geom_boxplot(aes(fill = factor(Treatment))) +
  xlab(expression(bold("Mixing Technique"))) +
  ylab(expression(bold("Strength"))) +
  theme_bw() +
  #ggtitle()/theme() replace opts(), which is no longer available in ggplot2
  ggtitle(bquote(bold('Mean of Tensile Strength (lb/in'^'2'*') by Treatment'))) +
  theme(plot.title = element_text(size = 20, colour = "darkblue"),
        panel.border = element_rect(size = 2, colour = "red", fill = NA),
        legend.position = "none")
#OUTPUT
#Refer to question 3-1 (f) of Computational and Graphical Section.

#(3-2.a) Rework part (b) of Problem 3-1 using Duncan's multiple range test
#with alpha = 0.05. Does this make any difference in your conclusions?
#INPUT
#recent agricolae versions print the results only when console = TRUE
duncan.test(Model, "Treatment", console = TRUE)
#OUTPUT
# Study:
# Duncan's new multiple range test
# for Observations
# Mean Square Error:  12825.69
# Treatment,  means
#   Observations  std.err replication
# A      2971.00 60.27852           4
# B      3156.25 67.98820           4
# C      2933.75 54.13621           4
# D      2666.25 40.48534           4
# alpha: 0.05 ; Df Error: 12
# Critical Range
#        2        3        4
# 174.4798 182.6303 187.5686
# Means with the same letter are not significantly different.
# Groups, Treatments and means
# a  B  3156.25
# b  A  2971
# b  C  2933.75
# c  D  2666.25

#(3-2.b) Rework part (b) of Problem 3-1 using Tukey's test with alpha = 0.05. Do
#you get the same conclusions from Tukey's test that you did from the
#graphical procedure and/or Duncan's multiple range test?
#INPUT
TukeyHSD(Model)
#OUTPUT
#   Tukey multiple comparisons of means
#     95% family-wise confidence level
# Fit: aov(formula = Observations ~ Treatment, data = TensileData)
# $Treatment
#        diff        lwr        upr     p adj
# B-A  185.25  -52.50029  423.00029 0.1493561
# C-A  -37.25 -275.00029  200.50029 0.9652776
# D-A -304.75 -542.50029  -66.99971 0.0115923
# C-B -222.50 -460.25029   15.25029 0.0693027
# D-B -490.00 -727.75029 -252.24971 0.0002622
# D-C -267.50 -505.25029  -29.74971 0.0261838
