b)
> library(MASS)
1: 0.07 9.0
3: 0.09 9.0
5: 0.08 9.0
7: 0.16 7.0
9: 0.17 7.0
31:
Read 30 items
> y = dat[,1]
> x = dat[,2]
>
> box.opt = box.c$x[which.max(box.c$y)] # choose the x value which has the maximum y value
For the study of the concentration of a solution (Y) over time (X), the transformation that leads to the
smallest value of SSE corresponds to λ̂ = 0.02. The lower and upper limits of the interval for λ can also
be considered; they are -0.1 and 0.1, respectively.
Because this interval contains 0, and based on the ladder of power transformations table (which lists the
most common powers used in the Box-Cox transformation), the logarithmic transformation was chosen
(λ = 0 corresponds to the logarithm by definition).
λ      y′ = y^λ    Name
2      y^2         Square
1      y           Original (no transformation)
1/2    √y          Square root
0      log(y)      Logarithmic
-1/2   1/√y        Reciprocal square root
-1     1/y         Reciprocal
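The selection of λ described above can be sketched as follows. This is a minimal illustration, not the original analysis: the data are simulated (hypothetical), and the interval uses the usual chi-square cutoff on the profile log-likelihood, which is assumed to be how the limits were read off the Box-Cox plot.

```r
library(MASS)  # provides boxcox()

# Hypothetical data: a concentration that decays exponentially with time.
set.seed(1)
x <- rep(c(1, 3, 5, 7, 9), each = 3)
y <- exp(1.5 - 0.45 * x + rnorm(length(x), sd = 0.05))

# Profile log-likelihood of lambda for the model y ~ x (plot suppressed).
box.c <- boxcox(lm(y ~ x), lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

# Point estimate: the lambda with the largest log-likelihood.
box.opt <- box.c$x[which.max(box.c$y)]

# Approximate 95% interval: every lambda whose log-likelihood lies within
# qchisq(0.95, 1) / 2 of the maximum.
in.ci <- box.c$y > max(box.c$y) - qchisq(0.95, df = 1) / 2
box.ci <- range(box.c$x[in.ci])

box.opt
box.ci
```

If the interval contains 0, the logarithm is the natural choice from the ladder of powers.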
c)
### Model logarithmic fit ###
> summary(slogfit1)
Coefficients:
Response: log10(y)
Source      df     SS     MS     F
Regression  1      SSR    MSR    F*
Error       n-2    SSE    MSE    -
Total       n-1    SST    -      -
Table 3. ANOVA table for the problem.
Source      df     SS      MS       F
Regression  1      4.582   4.582    1838.2
Error       13     0.032   0.0025   -
Total       14     4.614   -        -
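As a check on the arithmetic, the mean squares and F* can be recomputed from the sums of squares and degrees of freedom in the table. The inputs below are the rounded table values, so the recomputed F* differs slightly from the reported 1838.2, which was presumably obtained from unrounded sums of squares.

```r
# Sums of squares and degrees of freedom as reported in the ANOVA table.
ssr <- 4.582; df.reg <- 1
sse <- 0.032; df.err <- 13

msr <- ssr / df.reg      # mean square for regression
mse <- sse / df.err      # mean square error
f.star <- msr / mse      # F* statistic

# Decomposition check: SST = SSR + SSE, on 1 + 13 = 14 degrees of freedom.
sst <- ssr + sse

# P-value of F* on (1, 13) degrees of freedom.
p.value <- pf(f.star, df.reg, df.err, lower.tail = FALSE)

round(c(MSR = msr, MSE = mse, SST = sst), 4)
f.star
p.value
```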
Hypotheses:
e)
> pred = predict(slogfit1) # prediction
Figure 3. Normal qqplot for the SLR model between the logarithm of the solution concentration and time.
Figure 2 shows some ups and downs (curvature) in the points, but it is hard to say that a trend
exists. There may be slight variability in the error variances, but this can only be concluded by
performing the Breusch-Pagan test. Likewise, it is hard to conclude that there is any negative
autocorrelation of the error terms. Some increases and decreases can be seen, but they may not be
strong enough to conclude that the errors are dependent in the population. A Durbin-Watson test would
be required to draw further conclusions. As for outliers, no value deviates much from the studentized
residual = 0 line. Only two points have a studentized residual larger than 1.5, which is not enough for
them to be considered outliers. A test for outliers using the Bonferroni correction would be needed to
reach a more accurate conclusion.
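The two follow-up checks mentioned above can be sketched in base R. The data here are simulated (hypothetical) and only stand in for the log-concentration fit; the Durbin-Watson statistic is computed by hand and is meaningful only if the observations are in time order.

```r
# Hypothetical data standing in for log10(concentration) versus time.
set.seed(2)
x <- rep(c(1, 3, 5, 7, 9), each = 3)
y <- 0.65 - 0.195 * x + rnorm(length(x), sd = 0.05)
fit <- lm(y ~ x)

n <- length(y)
p <- length(coef(fit))   # number of regression coefficients (2)

# Bonferroni outlier test on the studentized deleted residuals:
# flag case i if |t_i| exceeds t(1 - alpha / (2n); n - p - 1).
alpha <- 0.05
t.del <- rstudent(fit)
t.crit <- qt(1 - alpha / (2 * n), df = n - p - 1)
outliers <- which(abs(t.del) > t.crit)

# Durbin-Watson statistic: values near 2 suggest no first-order
# autocorrelation, values well below 2 positive autocorrelation.
e <- resid(fit)
dw <- sum(diff(e)^2) / sum(e^2)

t.crit
outliers
dw
```

With n = 15 the Bonferroni critical value is well above 3, so residuals of about 1.5 are far from being flagged.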
According to Figure 3, although the judgment is somewhat subjective, the points follow the straight line
reasonably well. The largest deviations are the three points close to -1, but even these discrepancies
are not large. All the other points lie almost on the straight line. Therefore, no concerns are raised
about the normality assumption, and it is reasonable to treat the model errors as normally distributed.
Problem 3.1
a)
> ###Data###
>
>
According to the shape of the plot, there seems to be no reason for concern about the equal variance
assumption. No pattern can be observed, since the points are randomly spread out. Likewise, since there
is no particular trend, neither negative nor positive autocorrelation is apparent. As a result, it is
reasonable to treat the error terms as independent.
b)
> qqnorm(student, pch=16); abline(a=0, b=1)
Figure 5. Normality plot of the studentized residuals for the soccer data.
> shapiro.test(student)
data: student
Hypotheses:
From R code
W* = 0.9190
P-value = 0.1859
Statistical decision:
Since the Shapiro-Wilk test statistic is W* = 0.9190 with a P-value of 0.1859, which is greater than
α = 0.05 (assumed), we fail to reject H0 at the 0.05 level.
Practical conclusion:
Since the test fails to reject H0 in favor of Ha at the significance level α = 0.05, there is no evidence
against normality, and it is reasonable to treat the model errors {e_i} as normally distributed.
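The decision rule used above can be sketched as follows. The soccer data are not reproduced here, so the data are simulated (hypothetical), and `student` is assumed to hold the studentized residuals of the fitted model `fit1`; which R function produced it in the original is not shown.

```r
# Hypothetical data; only the variable names mirror the transcript.
set.seed(3)
x <- runif(15, 0, 10)
y <- 2 + 0.5 * x + rnorm(15)
fit1 <- lm(y ~ x)

# Studentized (deleted) residuals; rstandard() would give the
# internally studentized version instead.
student <- rstudent(fit1)

# Shapiro-Wilk test: H0 says the residuals come from a normal distribution.
sw <- shapiro.test(student)
alpha <- 0.05
decision <- if (sw$p.value > alpha) "fail to reject H0" else "reject H0"

sw$statistic   # W
sw$p.value
decision
```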
c)
> library(lmtest)
data: fit1
Statistical hypotheses:
Statistical conclusion:
Since the test fails to reject H0 in favor of Ha at the significance level α = 0.05, there is no evidence
that the homogeneity of variance assumption for the model errors is violated.
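For reference, a Breusch-Pagan statistic can be computed by hand from an auxiliary regression. The sketch below uses simulated (hypothetical) data and the studentized (Koenker) form, n times the R-squared of the squared residuals regressed on the predictor. Which form the original output used is not shown, so this is an assumption.

```r
# Hypothetical data with constant error variance.
set.seed(4)
x <- runif(20, 0, 10)
y <- 2 + 0.5 * x + rnorm(20)
fit1 <- lm(y ~ x)

n <- length(y)
e2 <- resid(fit1)^2

# Auxiliary regression of the squared residuals on the predictor.
aux <- lm(e2 ~ x)

# Studentized Breusch-Pagan statistic: n * R^2, compared with a
# chi-square distribution on 1 degree of freedom (one predictor).
bp <- n * summary(aux)$r.squared
p.value <- pchisq(bp, df = 1, lower.tail = FALSE)

alpha <- 0.05
decision <- if (p.value > alpha) "fail to reject H0" else "reject H0"

bp
p.value
decision
```

H0 here is constant error variance, so a large P-value is consistent with the equal variance assumption.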
The residuals shown in Figure 4 are randomly spread out without any specific tendency (no linear or
polynomial trend). This is a good indication that the error variance is constant, which agrees with the
Breusch-Pagan test. However, there are fewer points above the error mean = 0 line than below it.
Nevertheless, the vertical spread seems to be the same above and below the mean = 0 line and does not
seem to increase or decrease. More data might be required to draw a firmer conclusion from the plot.