
Lesly Pineda

Part 2: Descriptive statistics


Skittles Project

[Figure: chart of class Skittles counts by color; x-axis: Colors]

Candy color is Qualitative/Nominal, no natural order.

Table describing Class Data results for Skittles by color:


Column Sum
Red 416
Orange 457
Yellow 406
Green 412
Purple 407
Total 2098

Quantitative Data: Number of Candies per bag


This is Quantitative/Ratio because we are counting how many candies are per bag. (volume)
(natural order means more, natural zero means none)
Summary statistics:
Column n Mean Std. dev. Unadj. std. dev. Min Q1 Median Q3 Max IQR Range Mode

Total 35 59.942857 2.3001644 2.2670668 54 59 60 61 65 2 11 60



Frequency of skittles per bag of candy: Histogram

Number of Skittles per bag



Frequency of skittles per bag of candy: boxplot

Lower Fence: 59 - 1.5(2) = 56
- Outliers: 54, 55

Upper Fence: 61 + 1.5(2) = 64
- Outlier: 65
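
The fence arithmetic above can be checked with a short sketch. It assumes only the Q1, Q3, and bag counts reported in this section; the variable and function names are illustrative, not part of the original StatCrunch output.

```python
# Quartiles taken from the summary statistics table above.
q1, q3 = 59, 61

iqr = q3 - q1                    # interquartile range: 2
lower_fence = q1 - 1.5 * iqr     # 56.0
upper_fence = q3 + 1.5 * iqr     # 64.0


def is_outlier(count):
    """Return True if a bag count lies outside the fences."""
    return count < lower_fence or count > upper_fence


# Counts mentioned in the text: the flagged class values and a bag of 57.
for count in (54, 55, 57, 65):
    print(count, "outlier" if is_outlier(count) else "not an outlier")
```

Values strictly below 56 or strictly above 64 are flagged, which matches the three outliers listed.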
From my observation, the quantitative data is bell shaped: roughly symmetric, with the highest
frequency in the middle and frequencies tailing off to the left and right. For the qualitative
data it is inappropriate to discuss shape, since candy color has no natural order.

My individual bag totals


red 13
orange 12
yellow 13
green 8
purple 11
Total: 57

My personal bag of Skittles had 57 candies, which fits in the class range (54 to 65) and falls
between the fences (56 and 64), so it is not an outlier.

Part 3: correlation and regression

Research question:

Can height be used to predict the number of candies that will be in a bag of Skittles you
purchase?
I don’t think height can be used to predict the number of candies that will be in a bag of Skittles.

- Number of candies per bag: response variable


- Height of the person: explanatory variable

Simple linear regression results:


Dependent Variable: Total
Independent Variable: Height
Total = 64.32367 - 0.066960037 Height
Sample size: 35
R (correlation coefficient) = -0.12087521
R-sq = 0.014610816
Estimate of error standard deviation: 2.3176362

Parameter estimates:
Parameter Estimate Std. Err. Alternative DF T-Stat P-value
Intercept 64.32367 6.2749803 ≠ 0 33 10.250816 <0.0001
Slope -0.066960037 0.095724999 ≠ 0 33 -0.69950418 0.4891

Analysis of variance table for regression model:


Source DF SS MS F-stat P-value
Model 1 2.6282771 2.6282771 0.4893061 0.4891
Error 33 177.25744 5.3714375
Total 34 179.88571
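
As a rough consistency check on the output above, the reported R-sq, T-stat, and F-stat can be recomputed from the other table entries. This is only a sketch using the numbers printed in the tables; the variable names are my own.

```python
import math

# Values copied from the regression output above.
slope, slope_se = -0.066960037, 0.095724999
ss_model, ss_total = 2.6282771, 179.88571

r_squared = ss_model / ss_total   # share of variation explained by the line
r = -math.sqrt(r_squared)         # r takes the sign of the (negative) slope
t_stat = slope / slope_se         # test statistic for H0: slope = 0
f_stat = t_stat ** 2              # in simple linear regression, F = t^2

print(f"R-sq = {r_squared:.6f}, r = {r:.4f}")
print(f"T-stat = {t_stat:.4f}, F-stat = {f_stat:.4f}")
```

The recomputed values should agree with the table up to rounding, which is a quick way to confirm that the numbers were copied correctly.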

There is no significant relationship between the two variables.

n = 35 (n > 30), critical value (CV) = 0.361

|r| = 0.12087521

r is negative, and |r| is not greater than the critical value 0.361, so no linear relation
exists.

r2 = 0.014610816

Only 1.46% of the variation in the number of candies per bag is explained by the
least-squares regression line.

This matches what I said at the beginning, height could not be used to predict number of candies.

Based on the regression equation, how many candies would be expected to be in a bag purchased by
someone who is 63.5 inches tall? Was it appropriate to use this regression equation to make this
prediction? Why or why not?

ŷ = -0.0669x + 64.32

ŷ = -0.0669(63.5) + 64.32

ŷ = 60.07

- No, it is inappropriate because there is no significant linear relationship between height
and the number of candies per bag, so the regression line should not be used to predict.

Assume there is a significant relationship between height and number of candies per bag. Would it be
appropriate to predict the number of candies in a bag purchased by retired Houston Rockets player Yao
Ming, who is 90 inches tall? Why or why not?

- No, it would be inappropriate. A regression model should not be used to make predictions
for values of the explanatory variable that are much larger or much smaller than those
observed, and a height of 90 inches would be outside the scope of the model.

Systematic sample:

X (height) Y (number of candies)
69 56
64 59
63 60
74 60
67 61
60 62
70 62

Regression equation: ŷ = -0.0960x + 66.4

a (slope) = -.0959915612

b (intercept) = 66.40400844

r2 = .0479957806

r = -.2190793934

Critical value (n = 7) = 0.754

|r| = .219 is less than the critical value .754, so there is no significant linear relation.
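
The slope, intercept, and r for the systematic sample can be reproduced from the seven data pairs with the usual least-squares formulas. The sketch below assumes the pairs exactly as listed in the table; the names are illustrative.

```python
import math

# Systematic sample from the table above: height (inches) and candies per bag.
heights = [69, 64, 63, 74, 67, 60, 70]
candies = [56, 59, 60, 60, 61, 62, 62]

n = len(heights)
mean_x = sum(heights) / n
mean_y = sum(candies) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in heights)
syy = sum((y - mean_y) ** 2 for y in candies)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, candies))

slope = sxy / sxx
intercept = mean_y - slope * mean_x
r = sxy / math.sqrt(sxx * syy)

critical_value = 0.754  # table value quoted in the text for n = 7
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}, r = {r:.4f}")
print("significant linear relation:", abs(r) > critical_value)
```

The comparison uses |r| rather than r, since the critical value applies to the size of the correlation regardless of sign.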

Project part 4: probability


Problem 1: Suppose all of the Skittles in the class data set are combined into one large bowl
and you are going to randomly select one Skittle.
a) What is the probability that you select a green Skittle?

There are 2098 total skittles. There are 412 green skittles.
412/2098 = .1964
There is a 19.64% chance that you will select a green Skittle out of the bowl.

b) What is the probability that you select a Skittle that is NOT green?

There are 2098 total skittles and 412 green skittles.


2098-412= 1686 “not green” skittles.
1686/2098= 80.36% chance that you select a skittle that is not green.

c) What is the probability that you select a Skittle that is red OR yellow?

There are 2098 total skittles, 416 of them are red, and 406 of them are yellow.
416+406=822, 822 skittles are red or yellow
822/2098 = .3918. There is a 39.18% chance that you select a red or yellow Skittle out of
the bowl.

d) What is the probability that you select a Skittle that is orange GIVEN that it is a secondary
color (secondary colors are green, orange and purple)?

There are 457 orange, 412 green, and 407 purple Skittles.
457 + 412 + 407 = 1276 secondary-color Skittles in total.
457/1276 = .3582. There is a 35.82% chance you will select a Skittle that is orange given
that it is a secondary color.
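
The four answers in Problem 1 can be verified with exact fractions. This is a sketch that assumes the class color totals from Part 2; the dictionary and variable names are my own.

```python
from fractions import Fraction

# Class totals by color, from the Part 2 table.
counts = {"red": 416, "orange": 457, "yellow": 406, "green": 412, "purple": 407}
total = sum(counts.values())  # 2098

p_green = Fraction(counts["green"], total)
p_not_green = 1 - p_green  # complement rule
# Red and yellow are disjoint events, so their counts simply add.
p_red_or_yellow = Fraction(counts["red"] + counts["yellow"], total)

# Conditional probability: restrict the sample space to secondary colors.
secondary = counts["green"] + counts["orange"] + counts["purple"]
p_orange_given_secondary = Fraction(counts["orange"], secondary)

results = [
    ("green", p_green),
    ("not green", p_not_green),
    ("red or yellow", p_red_or_yellow),
    ("orange given secondary", p_orange_given_secondary),
]
for label, prob in results:
    print(f"P({label}) = {float(prob):.4f}")
```

Working with fractions until the last step avoids the small rounding differences that appear when intermediate decimals are truncated.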
Problem 2: Suppose you are going to randomly select two Skittles from the bag YOU purchased.

a) What is the probability that both Skittles are purple if you select them with
replacement? Give your answer correct to four decimal places.

(11/57) x (11/57) = .0372

b) What is the probability that both Skittles are purple if you select them without
replacement? Give your answer correct to four decimal places.

(11/57) x (10/56) = .0345



c) What is the probability that the first skittle is purple, and the second skittle is not purple
if you select them with replacement?

(11/57) (1-11/57) = (11/57) (46/57) =.1557

d) What is the probability that at least one Skittle is purple if you select them with
replacement?

1-(46/57)(46/57) = .3487
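
The Problem 2 answers follow from the multiplication and complement rules. The sketch below assumes the counts from my own bag (11 purple out of 57); the names are illustrative.

```python
from fractions import Fraction

purple, total = 11, 57  # counts from the individual bag above
p = Fraction(purple, total)

# a) With replacement the draws are independent, so probabilities multiply.
both_with = p * p
# b) Without replacement the second draw has one fewer purple and one fewer candy.
both_without = p * Fraction(purple - 1, total - 1)
# c) Purple first, then not purple, with replacement.
purple_then_not = p * (1 - p)
# d) At least one purple = 1 - P(no purple in two draws), with replacement.
at_least_one = 1 - (1 - p) ** 2

for label, prob in [("a", both_with), ("b", both_without),
                    ("c", purple_then_not), ("d", at_least_one)]:
    print(f"{label}) {float(prob):.4f}")
```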

Problem 3: Suppose all of the Skittles in the class data set are combined into one large bowl
and you are going to randomly select ten Skittles with replacement and count how many are
yellow.

a) List the requirements of the binomial probability distribution and show that this meets
them, including identifying the values for n and p.

Fixed number of trials: n = number of Skittles being selected = 10
Independent trials, since the Skittles are being selected with replacement
Constant probability of success: p = 406/2098 = .1935
Two possible outcomes: yellow or not yellow
All four requirements are met, so the binomial distribution applies with n = 10 and p = .1935.

b) What is the probability that exactly 4 of the 10 Skittles are yellow?

Binompdf (10, 406/2098 ,4) = .0810346466

c) What is the probability that at most 2 of the 10 Skittles are yellow?

Binomcdf (10, 406/2098 , 2)= .6972881049

d) For samples of size 10, what is the expected value and standard deviation for the
number of yellow skittles that will be included?

Mean= 10 x (406/2098) = 1.935


Variance = 10 x .1935 x (1-.1935) = 1.5605
Standard deviation = sqrt(1.5605) = 1.25
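
The calculator results in Problem 3 can be reproduced from the binomial formula itself. This is a sketch using only the standard library; `binom_pmf` and `binom_cdf` are hypothetical helper names standing in for the calculator's Binompdf and Binomcdf.

```python
import math

n = 10
p = 406 / 2098  # probability that one randomly selected Skittle is yellow


def binom_pmf(k, n, p):
    """P(X = k) for a binomial random variable."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P(X <= k), summing the pmf from 0 through k."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


exactly_4 = binom_pmf(4, n, p)
at_most_2 = binom_cdf(2, n, p)
mean = n * p
std_dev = math.sqrt(n * p * (1 - p))

print(f"P(X = 4)  = {exactly_4:.4f}")
print(f"P(X <= 2) = {at_most_2:.4f}")
print(f"mean = {mean:.3f}, standard deviation = {std_dev:.2f}")
```

Note that the standard deviation is the square root of np(1-p); the product np(1-p) on its own is the variance.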

Project Part 5: Sampling Distributions and Confidence Intervals


Assume p = the proportion of yellow candies for all Skittles = 0.2. Describe the sampling distribution for
the proportion of yellow candies for samples of 85 candies, including center, spread, and shape (justify
your answers).

- Center: mean of p̂ = p = 0.2
- Spread: σ_p̂ = sqrt(0.2(1-0.2)/85) = .043
- Shape: approximately normal, since np(1-p) = 85(0.2)(1-0.2) = 13.6 is greater than or equal to 10.
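
The center, spread, and shape check for the sample proportion can be computed directly. This sketch assumes the stated p = 0.2 and n = 85; the variable names are my own.

```python
import math

p, n = 0.2, 85

center = p                            # mean of the sample proportion
spread = math.sqrt(p * (1 - p) / n)   # standard deviation of the sample proportion
normal_check = n * p * (1 - p)        # must be at least 10 for a normal shape

print(f"center = {center}, spread = {spread:.3f}")
print(f"np(1-p) = {normal_check:.1f}, approximately normal: {normal_check >= 10}")
```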

Explain in general the purpose and meaning of a confidence interval.

- A confidence interval for an unknown parameter consists of an interval of numbers based on a
point estimate. It gives a range of likely values for estimating a population parameter.

Using values for the class data that you computed in Part 2 of the project, construct a 99% confidence
interval estimate for the true proportion of yellow candies using the class data as your sample. Remember
that for this computation, n is the number of CANDIES for the entire class data. Include all your work,
showing the formula used and appropriate values inserted (neatly written and scanned or typed) or
including the appropriate calculator commands and inputs.

- x = 406 (number of yellow Skittles)
- n = 2098 (total number of Skittles)
- C = .99
- Calc > Stats > Tests > 1-PropZInt
- p̂ = .1935, interval: (.1713, .2157)

Give an appropriate interpretation of your interval

- We are 99% confident that the true proportion of yellow Skittles is between .1713 and .2157.

Based on your interval for the true proportion of yellow candies, was the proportion of yellow candies in
the single bag of candy you purchased a likely value for the true population proportion? Explain how you
know using actual values from your data and computations.

- No. The bag I purchased had 13 yellow Skittles out of a total of 57, for a proportion of
13/57 = .2281. That is above the upper limit of .2157, so it does not fall within the interval
(.1713, .2157).
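
The 1-PropZInt result and the comparison with my own bag can be sketched with the standard formula p̂ ± z*·sqrt(p̂(1-p̂)/n). The code assumes the class counts above and uses Python's `statistics.NormalDist` for the critical value instead of the calculator; the variable names are illustrative.

```python
import math
from statistics import NormalDist

x, n, confidence = 406, 2098, 0.99

p_hat = x / n
# Two-sided critical value: leaves (1 - confidence) / 2 in each tail.
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

bag_prop = 13 / 57  # yellow proportion in the individual bag
inside = lower <= bag_prop <= upper

print(f"p-hat = {p_hat:.4f}, interval = ({lower:.4f}, {upper:.4f})")
print(f"bag proportion = {bag_prop:.4f}, inside interval: {inside}")
```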

Assume μ = mean number of candies per bag for all 2.17 oz bags of Skittles = 60 candies and σ = standard
deviation of number of candies per bag for all 2.17 oz bags of Skittles = 2.5. Describe the sampling
distribution for the mean number of candies per bag for samples of 32 bags, including center spread, and
shape (justify your answers)

- Center: mean of x̅ = μ = 60
- Spread: standard deviation of x̅ = σ / sqrt(n) = 2.5 / sqrt(32) = .442
- Shape: approximately normal, since n = 32 is greater than or equal to 30.

Based on this sampling distribution, what is the probability that a sample of 32 bags will have a mean of
less than 59 candies per bag?

- Calc > Distr > normalcdf(-1E99, 59, 60, .442) = .0118
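
The normalcdf command above corresponds to the area to the left of 59 under a normal curve with mean 60 and standard deviation 2.5/sqrt(32). A minimal sketch, using `statistics.NormalDist` in place of the calculator:

```python
import math
from statistics import NormalDist

mu, sigma, n = 60, 2.5, 32

std_error = sigma / math.sqrt(n)   # standard deviation of the sample mean
sampling_dist = NormalDist(mu, std_error)
prob_below_59 = sampling_dist.cdf(59)

print(f"standard error = {std_error:.3f}")
print(f"P(sample mean < 59) = {prob_below_59:.4f}")
```

The result may differ from the calculator in the last decimal place because the calculator input used the rounded standard error .442.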

Using values you computed in Part 2 of the project, construct a 95% confidence interval estimate for the
true mean number of candies per bag using the class data as your sample, but for this computation, n is
the number of BAGS. Make sure you use the correct standard deviation from Part 2, which treats the class
data set as a sample. Include all your work, showing the formula used and appropriate values inserted
(neatly written and scanned or typed) or including the appropriate calculator commands and inputs.

- n = 35
- x̅ = 59.942857
- s = 2.3001644
- c = .95
- Calc > Stats > Tests > TInterval > Stats
- (59.153, 60.733)

Give an appropriate interpretation of your interval.

- We are 95% confident that the true mean number of Skittles per bag is between 59.153 and 60.733.
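
The TInterval result can be reproduced by hand with x̅ ± t*·s/sqrt(n). Python's standard library has no t-distribution, so this sketch assumes the critical value t* = 2.0322 read from a t-table for 34 degrees of freedom; that value and the variable names are assumptions, not calculator output.

```python
import math

n, mean, s = 35, 59.942857, 2.3001644

# Assumed t-table value for 95% confidence with df = n - 1 = 34.
t_star = 2.0322

margin = t_star * s / math.sqrt(n)
lower, upper = mean - margin, mean + margin

print(f"margin of error = {margin:.3f}")
print(f"95% confidence interval = ({lower:.3f}, {upper:.3f})")
```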

Based on your interval for the true mean number of candies per bag, was the total number of candies in
the single bag you purchased a likely value for the population mean? Explain how you know using actual
values from your data and computations.

- My bag contained 57 Skittles, which does not fall within the interval of likely values
(59.153 to 60.733) for the population mean.
- 57 is NOT a likely value.

Part 6: Hypothesis Tests


Submit a paper that includes the following (see rubric below):

- Explain in general the purpose and meaning of a hypothesis test. (4 points)

A hypothesis test is a procedure, based on sample results and probability, that tests
hypotheses about the population(s).

A hypothesis test examines two hypotheses about a population: the null hypothesis and the
alternative hypothesis. The null hypothesis is a statement to be tested that indicates no change,
effect, difference, or relationship in the population. It is assumed to be true until evidence
indicates otherwise, and it always contains “=”. The alternative hypothesis is a statement that we
are trying to find evidence to support.

- Using values for the class data that you computed in Part 2 of the project and a 0.05
significance level, test the claim that 20% of all Skittles candies are red. Show all the steps
(neatly written and scanned, typed, or copied from StatCrunch) including:
1. the hypotheses with correct notation (4 points)

H0: p = .20

H1: p ≠ .20

2. the conditions for performing the hypothesis test, along with checking that they are
met…hint: they are not all met! (5 points)

(Not met): The sample is a simple random sample.
(Met): np0(1-p0) ≥ 10
2098(.20)(1-.20) = 335.68 ≥ 10
(Met): The sample values are independent of each other (n ≤ .05N)
n = 2098 is less than 5% of the population of all Skittles.

3. the test statistic and supporting work (2 points)

Calc > Stat > Tests > 1-PropZTest
p0 = .20
x = 416
n = 2098
prop: ≠ p0

z0 = -.1965

4. the p-value (2 points)

P= .8442

5. the appropriate decision about the null hypothesis and an appropriate conclusion (4
points)

The p-value .8442 is greater than α = .05, so we do not reject the null hypothesis. There is
insufficient evidence to conclude that the proportion of red Skittles differs from .20.

6. Also interpret the p-value for this test. (4 points)

If the true proportion of red skittles is 20%, there is a probability of .844 that we would
obtain a sample proportion of 416/2098 = .1983 or more extreme (since it's a two-tailed
test).
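
The test statistic and p-value from the 1-PropZTest can be sketched with the formula z = (p̂ - p0) / sqrt(p0(1-p0)/n). The code assumes the class counts for red and uses `statistics.NormalDist` in place of the calculator; the names are illustrative.

```python
import math
from statistics import NormalDist

x, n, p0, alpha = 416, 2098, 0.20, 0.05

p_hat = x / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
# Two-tailed test: double the area in the tail beyond |z|.
p_value = 2 * NormalDist().cdf(-abs(z))

print(f"p-hat = {p_hat:.4f}, z = {z:.4f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```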
- Using values for the class data that you computed in Part 2 of the project and a 0.01
significance level, test the claim that the mean number of candies in a bag of Skittles is
more than 58.5. Show all the steps (neatly written and scanned, typed, or copied from
StatCrunch) including:
1. the hypotheses with correct notation (4 points)
H0: μ = 58.5
H1: μ > 58.5

2. the conditions for performing the hypothesis test, along with checking that they are
met…hint: they are not all met! (5 points)

(Not met): The sample is a simple random sample.


(Met): The sample comes from a normally distributed population with no outliers, or the sample
size is ≥ 30. There are outliers in our sample (54, 55, and 65), but the sample size is 35, so the
condition is satisfied.
(Met): (n ≤ .05N) since n = 35 is less than 5% of all the bags of Skittles in the world.

3. the test statistic and supporting work (2 points)

Calc > Stats > Tests > T-Test
μ0 = 58.5
x̅ = 59.9428
Sx = 2.3
n = 35
μ: > μ0

t = 3.71

4. the p-value or critical value (2 points)



p= .0004

5. the appropriate decision about the null hypothesis and an appropriate conclusion (4
points)

p = .0004 < .01, so reject the null hypothesis. There is sufficient evidence to conclude that the
mean number of candies per bag is more than 58.5.
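
The T-Test statistic can be recomputed as t = (x̅ - μ0) / (s / sqrt(n)). Because the standard library has no t-distribution, this sketch uses the critical-value approach instead of a p-value and assumes the one-tailed table value 2.441 for α = .01 with 34 degrees of freedom; that value and the names are assumptions.

```python
import math

n, mean, s, mu0 = 35, 59.942857, 2.3001644, 58.5

t_stat = (mean - mu0) / (s / math.sqrt(n))

# Assumed t-table value: right-tailed test, alpha = 0.01, df = n - 1 = 34.
t_critical = 2.441
reject_h0 = t_stat > t_critical

print(f"t = {t_stat:.3f}, critical value = {t_critical}")
print("reject H0" if reject_h0 else "do not reject H0")
```

Both approaches lead to the same decision: the test statistic is well beyond the critical value, just as the p-value is well below .01.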

6. Also describe the Type I and Type II errors for this test. (8 points)

Type I: Reject Ho when it's true = Conclude that the mean is greater than 58.5 when it
actually equals 58.5

Type II: Fail to reject Ho when it's false = Fail to conclude that the mean is greater than
58.5 when it actually is greater than 58.5

Part 7: Reflection
Our statistics class project was about Skittles: we wanted to know how many candies are in a bag.
Every student was to purchase one 2.17-ounce bag of Original Skittles and give the data to the
professor. For part one, I counted how many candies were in my bag, sorted them by color to get the
total, and reported my height in inches. For part two, we were given the class data and told to
display it in a Pareto chart, histogram, boxplot, and pie chart, to identify whether the data was
qualitative or quantitative, and to check for outliers in both the class data and our own bag data.
For part three, I analyzed the bivariate data on number of candies per bag and height to see if
there was a relationship between the two. The research question was “Can height be used to predict
the number of candies that will be in a bag of Skittles you purchase?” Part four was all about
probabilities; with the information from parts two and three, I was able to complete it. Part five
covered sampling distributions and confidence intervals. Part six, the last part of the project,
was hypothesis testing, and we were told to use the information from part two to get the answers.



This project pulls together many concepts that we learned in class this semester: organizing and
analyzing data, drawing conclusions using confidence intervals and hypothesis tests, and presenting
the work in an organized paper. As for how the math skills I applied in this project will impact
other classes, I want to go into the medical field, and statistics is a huge part of developing
medicine, so I think I’ll see some of what I learned in this class again one day.

This course showed how statistics really works in the real world: it takes hard work, and you can’t
get a single number wrong or else your results are wrong.
