
Ridho Annabil Ghazy 05211640000115

AB-D
4.1 Breakfast Cereals Case Study
a. Which variables are quantitative/numerical? Which are ordinal? Which are nominal?

Numerical:
1. Calories
2. Protein
3. Fat
4. Sodium
5. Fiber
6. Carbo
7. Sugars
8. Potass
9. Vitamins

Ordinal:
1. Shelf

Nominal:
1. Mfr
2. Type

b. Compute the mean, median, min, max, and standard deviation for each of the quantitative variables. This can be
done through R’s sapply() function (e.g., sapply(data, mean, na.rm = TRUE)).
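A minimal R sketch of this step, assuming the data have been read into a data frame (for example with `read.csv("Cereals.csv")`; the file name is an assumption) and using a hypothetical helper `summary_stats()`. It applies `sapply()` to the numeric columns only, because `mean()` and `sd()` are not meaningful for text columns such as `name`, `mfr`, and `type`.

```r
# Sketch: mean, median, min, max and standard deviation of every numeric column.
# summary_stats() is a hypothetical helper; the input is any data frame.
summary_stats <- function(df) {
  num <- df[sapply(df, is.numeric)]          # drop text columns such as name, mfr, type
  stats <- sapply(num, function(x) c(
    mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    min    = min(x, na.rm = TRUE),
    max    = max(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE)
  ))
  t(stats)                                   # one row per variable, one column per statistic
}
```

Calling `summary_stats(cereals)` on such a data frame returns one row per numeric variable. If `shelf` is stored as a number, it also appears in the table even though it is ordinal.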

R:

Output:

c. Use R to plot a histogram for each of the quantitative variables. Based on the
histograms and summary statistics, answer the following questions:
i. Which variables have the largest variability?
ii. Which variables seem skewed?
iii. Are there any values that seem extreme?

R:

Output:

1) Based on the standard deviation (SD) of each variable, the variables with the largest variability are sodium and potass.
2) Skewed variables: protein, fat, potass, vitamins, rating.
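The histograms and the spread and skewness judgments above can be cross-checked numerically with a sketch like the following. The helper names are hypothetical, and `skewness()` is a hand-written moment estimator because base R has no built-in skewness function; a positive value indicates a long right tail.

```r
# Sketch: histograms plus a numeric check of spread and skewness.
# skewness() is a simple moment-based estimator written for this example.
skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5                  # > 0: long right tail, < 0: long left tail
}

spread_and_skew <- function(df) {
  num <- df[sapply(df, is.numeric)]
  cbind(sd   = sapply(num, sd, na.rm = TRUE),
        skew = sapply(num, skewness))        # one row per numeric variable
}

plot_histograms <- function(df) {
  num <- df[sapply(df, is.numeric)]
  old <- par(mfrow = c(3, 4))                # 3 x 4 grid of panels (arbitrary layout)
  on.exit(par(old))
  for (v in names(num)) hist(num[[v]], main = v, xlab = v)
  invisible(NULL)
}
```

`plot_histograms(cereals)` draws the panels, and `spread_and_skew(cereals)` returns the standard deviation and skewness of each numeric column.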
d. Use R to plot a side-by-side boxplot comparing the calories in hot vs. cold cereals. What does this
plot show us?
R:

The boxplot shows that hot cereals have 100 calories, while cold cereals range from 50 to 160 calories.
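A sketch of this boxplot in R, assuming the columns are named `calories` and `type` and that `type` is coded `C` (cold) and `H` (hot); both helper names are hypothetical. With `plot = FALSE`, `boxplot()` returns the five statistics per group (lower whisker, lower hinge, median, upper hinge, upper whisker) instead of drawing, so the comparison above can be read off directly.

```r
# Sketch: side-by-side boxplot of calories for cold vs. hot cereals.
# Assumes columns `calories` and `type` (type coded "C" = cold, "H" = hot).
plot_calories_by_type <- function(df) {
  boxplot(calories ~ type, data = df,
          xlab = "Cereal type (C = cold, H = hot)", ylab = "Calories")
}

calorie_box_stats <- function(df) {
  # plot = FALSE returns the boxplot statistics per group without drawing
  boxplot(calories ~ type, data = df, plot = FALSE)
}
```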

e. Use R to plot a side-by-side boxplot of consumer rating as a function of the shelf height. If we
were to predict consumer rating from shelf height, does it appear that we need to keep all three
categories of shelf height?

R:
Output:

The boxplot shows that shelf 1 and shelf 3 have nearly the same ratings, so these two shelves could be merged into a single category for prediction. Keeping all three categories of shelf height therefore does not appear necessary.
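A sketch of this comparison, assuming columns named `rating` and `shelf` (coded 1, 2, 3); the helpers, including `merge_shelves()`, are hypothetical. The group medians summarize each box, and the recoding pools shelves 1 and 3 as proposed above.

```r
# Sketch: consumer rating by shelf height, and a two-level recoding of shelf.
# Assumes columns `rating` and `shelf`; all helper names are hypothetical.
plot_rating_by_shelf <- function(df) {
  boxplot(rating ~ shelf, data = df, xlab = "Shelf", ylab = "Consumer rating")
}

median_rating_by_shelf <- function(df) {
  tapply(df$rating, df$shelf, median, na.rm = TRUE)   # one median per shelf level
}

merge_shelves <- function(shelf) {
  # pool shelves 1 and 3, which showed similar ratings; keep shelf 2 separate
  factor(ifelse(shelf == 2, "shelf 2", "shelf 1 or 3"))
}
```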

f. Compute the correlation table for the quantitative variables (function cor()). In addition, generate a
matrix plot for these variables (function plot(data)).
i. Which pair of variables is most strongly correlated?
ii. How can we reduce the number of variables based on these correlations?
iii. How would the correlations change if we normalized the data first?
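One way to answer question i in R is sketched below. It uses `cor()` with `use = "pairwise.complete.obs"` so that missing values do not turn entries of the table into `NA`, and it searches only the upper triangle so that each pair is counted once. The helper `strongest_pair()` is hypothetical; calling `plot()` on the numeric columns would produce the matrix plot.

```r
# Sketch: correlation table of the numeric columns and its most strongly correlated pair.
# strongest_pair() is a hypothetical helper.
strongest_pair <- function(df) {
  num <- df[sapply(df, is.numeric)]
  r <- cor(num, use = "pairwise.complete.obs")     # tolerate missing values
  upper <- r
  upper[lower.tri(upper, diag = TRUE)] <- NA       # count each pair once, skip the diagonal
  idx <- which(abs(upper) == max(abs(upper), na.rm = TRUE), arr.ind = TRUE)[1, ]
  list(table = round(r, 2),
       pair  = c(rownames(r)[idx[1]], colnames(r)[idx[2]]),
       r     = r[idx[1], idx[2]])
}
```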

R:
Output:

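i. Strongest positive correlation: fiber and potass (0.91); strongest negative correlation: sugars and rating (-0.76). The most strongly correlated pair overall is fiber and potass.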
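ii. Use Principal Component Analysis (PCA), since the variables are quantitative, and keep only the most influential variables. Strongly correlated pairs such as fiber and potass carry largely overlapping information, so they can be represented by fewer variables.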
iii.
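For question iii, the relevant fact is that the Pearson correlation does not change when a variable is shifted by a constant and divided by a positive constant, which is exactly what z-score normalization does. The sketch below (hypothetical helper `normalization_effect()`) compares the correlation table before and after `scale()`; the largest absolute difference should be zero up to floating-point rounding.

```r
# Sketch: compare the correlation table before and after z-score normalization.
# normalization_effect() is a hypothetical helper; it returns the largest absolute difference.
normalization_effect <- function(df) {
  num <- df[sapply(df, is.numeric)]
  num <- num[complete.cases(num), , drop = FALSE]  # same rows for both tables
  r_raw  <- cor(num)
  r_norm <- cor(scale(num))                        # scale(): subtract mean, divide by sd
  max(abs(r_raw - r_norm))
}
```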
R:

Output:

g. Consider the first PC of the analysis of the 13 numerical variables in Table 4.11. Describe briefly
what this PC represents.

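Extracting the principal components from the data gives, for the first PC, a standard deviation of 83.7641, a proportion of variance of 0.5395, and a cumulative proportion of 0.5395. The variable with the greatest influence on the first PC is sodium: because the variables were not normalized, PC1 is dominated by sodium, one of the variables with the largest variability (part c), so it essentially measures the sodium content of a cereal.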
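A sketch of this PCA in R using `prcomp()`; the helper `first_pc()` is hypothetical. Rows with missing values are dropped because `prcomp()` does not accept them, so the numbers may differ slightly from Table 4.11 depending on how missing values were handled there. With `normalize = FALSE` the variables keep their original scales, which is why a high-variance variable such as sodium can dominate PC1; with `normalize = TRUE` every variable is scaled to unit variance first.

```r
# Sketch: PCA on the numeric columns and the loadings of the first PC.
# first_pc() is a hypothetical helper; normalize = FALSE keeps the raw scales.
first_pc <- function(df, normalize = FALSE) {
  num <- df[sapply(df, is.numeric)]
  num <- num[complete.cases(num), , drop = FALSE]  # prcomp() fails on missing values
  pca <- prcomp(num, scale. = normalize)
  list(importance = summary(pca)$importance[, 1],  # sd, proportion, cumulative proportion of PC1
       loadings   = sort(abs(pca$rotation[, 1]), decreasing = TRUE))
}
```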
