
The candy color is qualitative: it is based on attributes or characteristics.

The number of candies per bag is quantitative: it is based on numerical measures.

CANDIES (how many)
Red 716
Orange 698
Yellow 726
Green 710
Purple 701
Total 3551

BAGS (# of bags)
60

Summary statistics:

Column  n   Mean    Std. dev.  Median  Range  Min  Max  Q1  Q3    IQR  Mode
Total   60  59.183  3.110      59      14     50   64   58  61.5  3.5  59
Outliers/Fences
Total candies:
Lower fence = 58 - 1.5(3.5) = 52.75
Upper fence = 61.5 + 1.5(3.5) = 66.75
A bag is an outlier if it has fewer than 52.75 or more than 66.75 candies.
Mine was not an outlier.
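
The fence arithmetic can be checked with a short sketch. The quartiles are taken from the summary statistics, and the bag count of 61 is the value reported for my bag later in this project; the function name is only illustrative.

```python
# Quartiles from the summary statistics table.
q1, q3 = 58.0, 61.5
iqr = q3 - q1                      # interquartile range

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr


def is_outlier(value, lower, upper):
    """Return True when a value falls outside the fences."""
    return value < lower or value > upper


my_bag = 61                        # candies in my bag, as reported in the project
print(lower_fence, upper_fence, is_outlier(my_bag, lower_fence, upper_fence))
```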

Shape of distribution
Skittles Project 3
1- Height cannot/should not be used to determine the number of skittles in a bag.
2- Candy is the response variable (Y)
3- Height is the explanatory variable (X)

Is there a relationship between the two variables?


r (correlation coefficient) = 0.17042887
R-sq = 0.029046
n = 60, CV (critical value) = 0.361
0.17 < 0.361
Because r is less than the critical value, there is NOT a significant linear relationship.
Regression equation.
y= 0.1288 x + 50.7137
y = 0.1288 (63.5) + 50.7137 = 58.8925
Was it appropriate to use the regression equation? No, because there is no significant linear relationship.

Regression output.
R-sq = 0.0290
2.9% of the variation in the number of candies can be explained by height.
Height is not a good predictor of y (candies) because R-sq is very small.
Yao Ming's height of 90 inches is outside the scope of the model; therefore, using the equation to predict
for him would not be appropriate.
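
The significance check, the prediction at 63.5 inches, and R-sq can be reproduced with a small sketch. All numbers are copied from this section; the variable and function names are illustrative only.

```python
# Values reported in this section.
r = 0.17042887
critical_value = 0.361             # critical value for n = 60
slope, intercept = 0.1288, 50.7137

# A linear relationship is called significant only when |r| exceeds the critical value.
significant = abs(r) > critical_value

# Coefficient of determination: share of variation in candies explained by height.
r_squared = r ** 2


def predict_candies(height_inches):
    """Evaluate the fitted line y = 0.1288 x + 50.7137."""
    return slope * height_inches + intercept


predicted = predict_candies(63.5)
print(significant, round(r_squared, 4), round(predicted, 4))
```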
Systematic sampling

x   y
52  64
57  70
58  61
59  80
61  65
62  66

Correlation coefficient
r = 0.1457
n = 6, CV (critical value) = 0.811
0.1457 < 0.811
Regression equation
y = 0.2759 x + 51.620
Is there a significant linear relationship in the smaller data set?
There is NOT a significant linear relationship because r is less than the CV.
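
As a cross-check, the sketch below recomputes r and the least-squares line from the twelve values listed under Systematic sampling. It assumes the values form six (x, y) pairs in the order listed, with the first value of each pair as x; under that assumption the results agree with the r and the equation reported above.

```python
from math import sqrt

# Assumption: the twelve sampled values are six (x, y) pairs in listed order.
pairs = [(52, 64), (57, 70), (58, 61), (59, 80), (61, 65), (62, 66)]

n = len(pairs)
mean_x = sum(x for x, _ in pairs) / n
mean_y = sum(y for _, y in pairs) / n

# Sums of squares and cross-products about the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
syy = sum((y - mean_y) ** 2 for _, y in pairs)

r = sxy / sqrt(sxx * syy)          # Pearson correlation coefficient
slope = sxy / sxx                  # least-squares slope
intercept = mean_y - slope * mean_x

critical_value = 0.811             # critical value for n = 6, from the text
significant = abs(r) > critical_value

print(round(r, 4), round(slope, 4), round(intercept, 3), significant)
```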
Project 4
Problem 1: Suppose you are going to randomly select two Skittles from the bag YOU purchased.
(a) What is the probability that both Skittles are purple if you select them with replacement? Give your
answer correct to four decimal places. (4 points)
.3115 x .3115 = .0970
(b) What is the probability that both Skittles are purple if you select them without replacement? Give
your answer correct to four decimal places. (4 points)
.3115 x .3000 = .0935
(c) What is the probability that at least one Skittle is purple if you select them with replacement? (4
points)
P(no purple) = .6885 x .6885 = .4740
P(at least one purple) = 1 - .4740 = .5260
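
The three answers to Problem 1 follow from the multiplication rule and the complement rule. This sketch uses the two proportions exactly as they appear in the answers above (.3115 for the first draw, .3000 for the second draw without replacement); the variable names are illustrative.

```python
# Proportions as given in the answers above.
p_purple = 0.3115                  # chance a single Skittle from my bag is purple
p_second_purple = 0.3000           # chance the second is purple once one purple is removed

# (a) Independent draws: multiply the same probability twice.
both_with_replacement = p_purple * p_purple

# (b) Dependent draws: the second probability changes.
both_without_replacement = p_purple * p_second_purple

# (c) Complement rule: 1 - P(neither Skittle is purple).
at_least_one = 1 - (1 - p_purple) ** 2

print(both_with_replacement, both_without_replacement, at_least_one)
```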
Problem 2: Suppose all of the Skittles in the class data set are combined into one large bowl and you are
going to randomly select one Skittle.
(a) What is the probability that you select a green Skittle? (4 points)
710/3551 = .1999
(b) What is the probability that you select a Skittle that is NOT green? (4 points)
1-0.1999 = .8001
(c) What is the probability that you select a Skittle that is red OR yellow? (4 points)
(716 + 726)/3551 = .4061
(d) What is the probability that you select a Skittle that is orange GIVEN that it is a secondary color
(secondary colors are green, orange and purple)? (4 points)
698/(710 + 698 + 701) = .3310
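
The Problem 2 answers can be recomputed from the class color counts. The counts are the ones used in this project (red = 716 is the value used in part (c)); the dictionary layout and names are just one way to organize them.

```python
# Class color counts used in this project.
counts = {"red": 716, "orange": 698, "yellow": 726, "green": 710, "purple": 701}
total = sum(counts.values())

# (a) Single event and (b) its complement.
p_green = counts["green"] / total
p_not_green = 1 - p_green

# (c) Red and yellow are mutually exclusive, so their counts add.
p_red_or_yellow = (counts["red"] + counts["yellow"]) / total

# (d) Conditional probability: restrict the sample space to secondary colors.
secondary = counts["green"] + counts["orange"] + counts["purple"]
p_orange_given_secondary = counts["orange"] / secondary

print(total, p_green, p_not_green, p_red_or_yellow, p_orange_given_secondary)
```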

Problem 3: Suppose all of the Skittles in the class data set are combined into one large bowl and you are
going to randomly select ten Skittles with replacement and count how many are yellow.
Yellow = 726 n = 10 Total = 3551
(a) Show that this meets the requirements of the binomial probability distribution and identify n and p.
(5 points)

n = 10 p = 726/3551 = .2044
This is a binomial experiment because:
The number of trials is fixed at 10.
The trials are independent.
There are two possible outcomes for each trial: yellow, or everything else (red, orange, green, purple).
The probability of success (yellow) is .2044 and the probability of everything else is .7956. The
probabilities are the same for each trial.
(b) What is the probability that exactly 4 of the 10 Skittles are yellow? (4 points)
binompdf(10, .2044, 4) = .0930
(c) For samples of size 10, what is the expected value and standard deviation for the number of yellow
skittles that will be included? (4 points)
Expected value (mean) = 10 x .2044 = 2.044
Standard deviation = square root of [10 x .2044 x (1-.2044)] = 1.275
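
The calculator's binompdf result, the expected value, and the standard deviation can be reproduced from the binomial formulas. This sketch uses the rounded p = .2044 from above; the function name is illustrative and stands in for the calculator function.

```python
from math import comb, sqrt

n = 10                             # number of Skittles drawn with replacement
p = 0.2044                         # probability of yellow, rounded as in the text


def binomial_pmf(k, trials, prob):
    """P(X = k) for a binomial random variable."""
    return comb(trials, k) * prob ** k * (1 - prob) ** (trials - k)


p_exactly_4 = binomial_pmf(4, n, p)
expected_value = n * p
std_dev = sqrt(n * p * (1 - p))

print(round(p_exactly_4, 4), round(expected_value, 3), round(std_dev, 3))
```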
Problem 4: For this problem, treat a 2.17 ounce bag of Skittles as an individual. Suppose the values for
our class data are the parameter values for all 2.17 ounce bags of Skittles. In other words, assume μ =
mean number of candies per bag in our class data set and σ = standard deviation of number of candies
per bag in our class data set (you computed these values in Part 2).
μ = 59.183 σ = 3.110 n = 32
(a) Describe the sampling distribution for the mean number of candies per bag for samples of 32 bags.
Include center, spread and shape. Note: The shape of the SAMPLING DISTRIBUTION is different from the
shape of the population, which you determined in Part 2 of the project. (5 points)
Center: 59.183
Spread: 3.11/square root of 32 = .5498
Shape: Approximately normal because n is greater than or equal to 30.
(b) What is the probability that the mean number of candies per bag for a sample of 32 bags is greater
than 58.5? (4 points)
z = (58.5 - 59.183)/.5498 = -1.24
Area to the left of z = -1.24 is .1075
1 - .1075 = .8925
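
The sampling-distribution calculation can be sketched with Python's standard library. Rounding z to two decimals mirrors a table lookup and gives the answer above; the unrounded z gives a value that differs slightly in the fourth decimal place. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 59.183, 3.110, 32   # class values treated as population parameters
cutoff = 58.5

# Standard deviation of the sample mean (standard error).
standard_error = sigma / sqrt(n)

z = (cutoff - mu) / standard_error
z_table = round(z, 2)              # two decimals, as when using a z table

# P(sample mean > cutoff) using the rounded z, then using the unrounded value.
prob_table = 1 - NormalDist().cdf(z_table)
prob_exact = 1 - NormalDist(mu, standard_error).cdf(cutoff)

print(round(standard_error, 4), z_table, round(prob_table, 4), round(prob_exact, 4))
```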
Project 5
Purpose and meaning of confidence interval
A confidence interval is an interval of numbers based on a point estimate that gives a range of likely
values for an unknown parameter. The purpose of a confidence interval is to state, with a given level of
confidence, that the population parameter lies between the lower bound and the upper bound.

Requirements for confidence interval for a population proportion

An approximately normal sampling distribution of p-hat: np(1-p) greater than or equal to 10
Independent trials: n less than or equal to 0.05N

Requirements for population mean

Sample data come from a SRS or randomized experiment
Sample size is small relative to the population size (n less than or equal to 0.05N)
The data come from a population that is normally distributed or the sample size is large (n
greater than or equal to 30)

99% confidence interval estimate for true proportion of yellow candies
Conditions: SRS? N greater than or equal to 30? Population normal or n greater than or equal to 30
Yellow = 726 n = 3551 p-hat = .2044 alpha/2 = .005 Z alpha/2 = 2.575
.2044 - 2.575 x square root of [.2044(1-.2044)/3551] = .1870
.2044 + 2.575 x square root of [.2044(1-.2044)/3551] = .2218
With 99% confidence, the true proportion of yellow candies is between .1870 and .2218, or 18.70% and
22.18%.
99% confidence interval estimate for true proportion of yellow candies in my bag of Skittles
Yellow = 15 n = 61 (total candies) p-hat = .2459 alpha/2 = .005 Z alpha/2 = 2.575
.2459 - 2.575 x square root of [.2459(1-.2459)/61] = .1039
.2459 + 2.575 x square root of [.2459(1-.2459)/61] = .3879
With 99% confidence, the true proportion of yellow candies in my bag is between .1039 and .3879, or 10.39%
and 38.79%.
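
Both proportion intervals use the same formula, p-hat plus or minus z times the standard error, so one helper covers them. This is a sketch using the rounded p-hat values and the critical value 2.575 given above; the function name is illustrative.

```python
from math import sqrt


def proportion_interval(p_hat, n, z_crit):
    """Return (lower, upper) bounds of a z-interval for a proportion."""
    margin = z_crit * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


Z_99 = 2.575                       # critical value for 99% confidence, from the text

class_low, class_high = proportion_interval(0.2044, 3551, Z_99)   # class data
bag_low, bag_high = proportion_interval(0.2459, 61, Z_99)         # my bag

print(round(class_low, 4), round(class_high, 4))
print(round(bag_low, 4), round(bag_high, 4))
```

The much wider interval for the single bag reflects its smaller sample size (61 candies versus 3551).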
95% confidence interval estimate for the true mean number of candies per bag
n = 60 (bags) s = 3.110 mean = 59.183 T alpha/2 = 2.000
59.183 - 2.000(3.110/square root of 60) = 58.380
59.183 + 2.000(3.110/square root of 60) = 59.986
With 95% confidence, the true mean number of candies per bag is between 58.380 and 59.986.
Calculated using T-Interval.
Based on the confidence interval calculated above, my bag of Skittles was NOT a likely value
for the population mean.
My bag = 61
Expected value between 58.380 and 59.986
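
The interval for the mean can be evaluated directly from the formula, mean plus or minus t times s over the square root of n. This sketch uses the values listed above, including the critical value t = 2.000 as given; names are illustrative.

```python
from math import sqrt

n = 60                             # bags in the class data
sample_mean = 59.183
s = 3.110                          # sample standard deviation
t_crit = 2.000                     # critical value as given in the text

margin = t_crit * s / sqrt(n)
lower = sample_mean - margin
upper = sample_mean + margin

my_bag = 61
inside_interval = lower <= my_bag <= upper

print(round(lower, 3), round(upper, 3), inside_interval)
```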

Part 6: Hypothesis Tests


Submit a paper that includes the following:
Explain in general the purpose and meaning of a hypothesis test. (4 points)
Hypothesis testing is a procedure based on sample results and probability that tests hypotheses about
the population. We cannot state with 100% certainty that a hypothesis statement is true; we can only
determine whether the sample data support the statement or not.
Using values for the class data that you computed in Part 2 of the project and a 0.05 significance level,
test the claim that 20% of all Skittles candies are red. Show all the steps (neatly written and scanned,
typed, or copied from StatCrunch) including:
1. the hypotheses with correct notation (4 points)
H0: p = .20
H1: p ≠ .20
2. the conditions for performing the hypothesis test, along with checking that they are met (hint: they
are not all met!) (5 points)

Simple random sample or data from a randomized experiment. The purchase of the bags was not
an SRS; it was a convenience sample.
np0(1-p0) ≥ 10: 3551 x .20 x (1-.20) = 568.16 ≥ 10
Sampled values are independent of each other (n ≤ .05N)

3. the test statistic (2 points)


1 proportion Z test.
n = 3551
z0 = .243
4. the p-value (2 points)
p-value = .808
5. the appropriate decision about the null hypothesis and an appropriate conclusion (4 points)
α = .05 .808 > .05
Do not reject the null hypothesis. There is insufficient evidence to reject the claim that 20% of the
Skittles are red.
6. Also describe the Type I and Type II errors for this test. (8 points)
Type I error: Reject the null hypothesis when the null hypothesis is actually true.
Reject that 20% of the Skittles are red (the null hypothesis), when 20% of the Skittles are actually red.
Type II error: Do not reject the null hypothesis when the alternative hypothesis is actually true.
Do not reject that 20% of the Skittles are red (the null hypothesis), when the proportion of red Skittles is
not equal to 20%.
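
The one-proportion z test can be sketched as follows. It assumes the red count of 716 used in Project 4 and a two-tailed alternative; the variable names are illustrative rather than calculator or StatCrunch output.

```python
from math import sqrt
from statistics import NormalDist

red, n = 716, 3551                 # red candies and total candies in the class data
p0 = 0.20                          # claimed proportion under the null hypothesis
alpha = 0.05

p_hat = red / n

# Normality condition: n * p0 * (1 - p0) should be at least 10.
condition_value = n * p0 * (1 - p0)

# Test statistic uses the null value p0 in the standard error.
z0 = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Two-tailed p-value: area in both tails beyond |z0|.
p_value = 2 * (1 - NormalDist().cdf(abs(z0)))

reject_null = p_value < alpha
print(round(condition_value, 2), round(z0, 3), round(p_value, 3), reject_null)
```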
Using values for the class data that you computed in Part 2 of the project and a 0.01 significance level,
test the claim that the mean number of candies in a bag of Skittles is more than 58. Show all the steps
(neatly written and scanned, typed, or copied from StatCrunch) including:
1. the hypotheses with correct notation (4 points)
H0: μ = 58
H1: μ > 58
2. the conditions for performing the hypothesis test, along with checking that they are met (hint: they
are not all met!) (5 points)

Simple random sample or data from a randomized experiment. The purchase of the bags was not
an SRS; it was a convenience sample.
No outliers and the data come from a normal population, OR the sample size is larger than 30
Sampled values are independent of each other

3. the test statistic (2 points)


T-Test
Mean = 59.183
Standard deviation = 3.11
t0 = 2.95
4. the p-value (2 points)
p-value = .002
5. the appropriate decision about the null hypothesis and an appropriate conclusion (4 points)
α = .01 .002 < .01
Reject the null hypothesis. There is sufficient evidence to conclude that the mean number of candies
per bag is more than 58.
6. Also interpret the p-value for this test. (4 points)
If the mean number of candies per bag is 58, the probability of observing a sample mean of 59.183 or
greater is .002.
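
The t test can be sketched with the standard library alone. Python's statistics module has no t distribution, so this example integrates the t density numerically with Simpson's rule to approximate the right-tail p-value; the helper names are hypothetical and this is not how a calculator or StatCrunch computes it.

```python
import math

n, sample_mean, s = 60, 59.183, 3.11
mu0 = 58                           # value of the mean under the null hypothesis
alpha = 0.01
df = n - 1

t0 = (sample_mean - mu0) / (s / math.sqrt(n))


def t_pdf(x, dof):
    """Density of Student's t distribution with dof degrees of freedom."""
    log_coeff = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                 - 0.5 * math.log(dof * math.pi))
    return math.exp(log_coeff - (dof + 1) / 2 * math.log1p(x * x / dof))


def upper_tail_t(t_stat, dof, steps=2000):
    """P(T > t_stat) for t_stat >= 0, via Simpson's rule on [0, t_stat]."""
    h = t_stat / steps             # steps must be even for Simpson's rule
    total = t_pdf(0.0, dof) + t_pdf(t_stat, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    return 0.5 - total * h / 3     # half the area lies to the right of zero


p_value = upper_tail_t(t0, df)     # right-tailed test (H1: mean > 58)
reject_null = p_value < alpha
print(round(t0, 2), round(p_value, 3), reject_null)
```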
