Data Analysis Project Julia Packard

Julia Packard
Statistics 1040
Data Analysis Project

Part 1:
The goal of this project is to gather a categorical data set and a quantitative data
set, take two samples from each data set, and compare the samples with each other and to
the population in order to determine if the samples are an accurate representation of the
population. It also allows us to compare different sampling methods in order to determine
which are more accurate in relation to the population. My group chose to pull samples
from the body measurements data set.

Part 2:
The two sampling methods that our group chose were a simple random sample of
size 40 and a systematic sample of size 50. Both samples meet the criterion that
n > 35, and both use random selection. For our simple random sample, a number was
assigned to each person in the population, and 40 numbers were randomly selected. For
our systematic sample, we divided our population size, 507, by our sample size, 50, and
rounded down to get an increment of 10. A random number within that range should have
been selected as the starting point for our 10-person increments. Both samples and the
population were very close to 50:50; with this data set, however, our systematic sample
(sample 2) was slightly closer to the population proportion.
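
As a rough sketch, the two sampling procedures described above could be reproduced in Python as follows. The function names and the seed are hypothetical and only for illustration, and the simple random sample is drawn without replacement, which is an assumption about how the numbers were picked.

```python
import random

def simple_random_sample(population_size, n, rng):
    """Draw n distinct person numbers from 1..population_size."""
    return rng.sample(range(1, population_size + 1), n)

def systematic_sample(population_size, n, rng):
    """Pick a random start in the first interval, then take every k-th person."""
    k = population_size // n      # 507 // 50 = 10 (rounded down)
    start = rng.randint(1, k)     # random starting point from 1 to k, inclusive
    return [start + i * k for i in range(n)]

rng = random.Random(1040)         # hypothetical seed, only for repeatability
srs = simple_random_sample(507, 40, rng)
sys_sample = systematic_sample(507, 50, rng)
print(sorted(srs))
print(sys_sample)
```

With a population of 507 and an increment of 10, the largest person number the systematic sample can reach is 500, so every selected number stays inside the population.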
Categorical Population Data: Var 1 - Gender
Population size = 507
Coding: Male = 1, Female = 0

Categorical Data: Gender (1 = M, 0 = F)

Random Sample          Systematic Sample
Column #   Gender      Column #   Gender
283        0           10         1
12         1           20         1
125        1           30         1
43         1           40         1
199        1           50         1
389        0           60         1
181        1           70         1
25         1           80         1
406        0           90         1
228        1           100        1
29         1           110        1
126        1           120        1
358        0           130        1
144        1           140        1
280        0           150        1
152        1           160        1
378        0           170        1
219        1           180        1
326        0           190        1
287        0           200        1
376        0           210        1
406        0           220        1
80         1           230        1
167        1           240        1
304        0           250        0
450        0           260        0
92         1           270        0
223        1           280        0
251        0           290        0
147        1           300        0
477        0           310        0
50         1           320        0
172        1           330        0
504        0           340        0
475        0           350        0
264        0           360        0
438        0           370        0
301        0           380        0
398        0           390        0
121        1           400        0
                       410        0
                       420        0
                       430        0
                       440        0
                       450        0
                       460        0
                       470        0
                       480        0
                       490        0
                       500        0

Categorical Data: Gender (Var 1)

Sample 1: Simple Random Sample
n = 40
x = 20
p̂ = 0.50
[Figure: Sample 1 Pie Chart]
[Figure: Sample 1 Pareto Chart]
[Figure: Categorical Data Sample 1, Parts 4 and 5]

Sample 2: Systematic Sample
n = 50
x = 26
p̂ = 0.52
[Figure: Sample 2 Pie Chart]
[Figure: Sample 2 Pareto Chart]
[Figure: Categorical Data Sample 2, Parts 4 and 5]

Part 3:

The two sampling methods that our group chose were a simple random sample of
size 40 and a systematic sample of size 50. Both samples meet the criterion that
n > 35, and both use random selection. For our simple random sample, a number was
assigned to each person in the population, and 40 numbers were randomly selected. For
our systematic sample, we divided our population size, 507, by our sample size, 50, and
rounded down to get an increment of 10. A random number within that range should have
been selected as the starting point for our 10-person increments. The population
frequency histogram and box plot both show a slight skew to the right, which suggests
that the mean is larger than the median.

Quantitative Data: Weight
[Figure: Population Frequency Histogram]
[Figure: Population Box Plot]

Quantitative Population Data: Weight

μ = 69.1
σ = 13.3

Minimum: 42
Q1: 58.4
Median: 68.2
Q3: 78.9
Maximum: 116.4
Quantitative Data: Weight

Systematic Sample      Random Sample
Column #   Weight      Column #   Weight
10         62          324        55.7
20         72.4        117        66.8
30         63.2        69         70
40         68.6        365        63
50         74.9        142        108.6
60         86.8        387        62.2
70         101.6       159        66.4
80         74.5        67         64.6
90         75.5        300        69.1
100        84.5        187        89.1
110        64.1        119        93.2
120        82.7        380        59.8
130        85.9        252        63
140        64.5        195        92.7
150        72.7        111        72.3
160        102.3       195        92.7
170        96.8        404        55
180        82.5        282        54.4
190        73.6        310        74.3
200        93.2        371        44.8
210        72.3        479        64.5
220        76.6        465        52.7
230        72.3        441        53.9
240        90.9        344        48.6
250        59          504        71.8
260        54.2        455        54.5
270        82.5        166        65.9
280        55          292        50.2
290        58.5        402        50
300        69.1        46         82.6
310        74.3        469        67.3
320        52.3        263        50
330        58.2        349        53.6
340        58.2        226        94.3
350        73.2        436        58.6
360        105.2       369        50.2
370        60.2        201        77.7
380        59.8        425        58.2
390        54.6        96         102.3
400        56.8        37         59.5
410        55.5
420        59.1
430        55
440        56.6
450        54.5
460        60.9
470        63
480        72.3
490        61.4
500        75.5

Sample 1
[Figure: Sample 1 Frequency Histogram]
[Figure: Sample 1 Box Plot]

n = 40
x̄ = 67.1
s = 16.1

Minimum: 44.8
Q1: 54.45
Median: 63.75
Q3: 73.3
Maximum: 108.6

[Figure: Quantitative Data Sample 1, Parts 4 and 5]
[Figure: Quantitative Data Sample 1 Standard Deviation, Part 4]

Sample 2
[Figure: Sample 2 Frequency Histogram]
[Figure: Sample 2 Box Plot]

n = 50
x̄ = 70.8
s = 14

Minimum: 52.3
Q1: 59
Median: 70.7
Q3: 76.6
Maximum: 105.2

[Figure: Quantitative Data Sample 2, Parts 4 and 5]
[Figure: Quantitative Data Sample 2 Standard Deviation, Part 4]

Part 4:

For each of my samples, I used a confidence level of 95%, or α = 0.05. In
categorical data sample 1, my population proportion was 0.5128; according to the sample
interval, the population proportion should lie in the interval 0.345 < p < 0.655. The
sample 1 interval captures the population parameter.
In categorical data sample 2, my population proportion was 0.5128; according to
the sample interval, the population proportion should lie in the interval
0.382 < p < 0.658. The sample 2 interval captures the population parameter.
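
The two proportion intervals can be checked with a short Python sketch. It assumes the usual normal-approximation interval, p̂ ± z·sqrt(p̂(1 − p̂)/n); the function name `proportion_ci` is hypothetical and only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(x, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

ci_sample1 = proportion_ci(20, 40)   # simple random sample: x = 20, n = 40
ci_sample2 = proportion_ci(26, 50)   # systematic sample:    x = 26, n = 50

for label, (low, high) in (("Sample 1", ci_sample1), ("Sample 2", ci_sample2)):
    captured = low < 0.5128 < high   # 0.5128 is the population proportion
    print(f"{label}: {low:.3f} < p < {high:.3f}, captures 0.5128: {captured}")
```

Rounded to three decimals, this gives 0.345 < p < 0.655 for sample 1 and 0.382 < p < 0.658 for sample 2, and both intervals contain 0.5128.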
In quantitative data sample 1, my population mean was μ = 69.1; according to
my sample interval, the population mean lies in the interval 60.206 < μ < 73.994.
The sample 1 interval captures the population parameter.
In quantitative data sample 2, my population mean was μ = 69.1; according to
my sample interval, the population mean lies in the interval 66.82 < μ < 74.78. The
sample 2 interval captures the population parameter.
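
As a worked check of the sample 2 interval, here is a minimal Python sketch of a t-interval for a mean, x̄ ± t*·s/sqrt(n). The critical value 2.0096 is an assumption taken from a t-table (95% confidence, 49 degrees of freedom), and the function name `mean_ci` is hypothetical.

```python
from math import sqrt

def mean_ci(x_bar, s, n, t_star):
    """t-interval for a mean, given a critical value from a t-table."""
    margin = t_star * s / sqrt(n)
    return x_bar - margin, x_bar + margin

T_STAR_DF49 = 2.0096   # assumed t-table value: 95% confidence, df = 50 - 1 = 49

# Sample 2 summary statistics: x-bar = 70.8, s = 14, n = 50
low, high = mean_ci(70.8, 14, 50, T_STAR_DF49)
print(f"{low:.2f} < mu < {high:.2f}")   # 66.82 < mu < 74.78
```

The population mean of 69.1 falls inside this interval.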
In quantitative data sample 1, my σ = 14.048; according to the sample interval,
σ lies in the interval 14.7 < σ < 24.5. The sample 1 confidence interval for standard
deviation, though very close, does not capture the population parameter.
In quantitative data sample 2, my σ = 14.048; according to the sample interval,
σ lies in the interval 12.7 < σ < 19.8. The sample 2 confidence interval for standard
deviation captures the population parameter.

Part 5:
For categorical data sample 1:
H0: p = 0.50
H1: p ≠ 0.50
P-value = 0.8728
α = 0.05
Since 0.8728 is NOT less than 0.05, do NOT reject H0. There is not sufficient
evidence to conclude that the population proportion has changed from 0.50.

For categorical data sample 2:
H0: p = 0.48
H1: p ≠ 0.48
P-value = 0.9204
α = 0.05
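
For illustration, a two-sided one-proportion z-test can be sketched in Python as below. The null value used here, 0.5128 (the population proportion from Part 4), is an assumption on my part and differs from the 0.50 and 0.48 listed above; the function name `one_proportion_z_test` is also hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(x, n, p0):
    """Two-sided z-test of H0: p = p0 against H1: p != p0."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

P0 = 0.5128      # assumed null value: population proportion from Part 4
ALPHA = 0.05

z1, p1 = one_proportion_z_test(20, 40, P0)   # sample 1: x = 20, n = 40
z2, p2 = one_proportion_z_test(26, 50, P0)   # sample 2: x = 26, n = 50

for label, z, p in (("Sample 1", z1, p1), ("Sample 2", z2, p2)):
    decision = "reject H0" if p < ALPHA else "do not reject H0"
    print(f"{label}: z = {z:.2f}, P-value = {p:.4f}, {decision}")
```

Under that assumption the P-values come out at roughly 0.87 and 0.92, so neither test rejects H0 at α = 0.05.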

For quantitative data sample 1:

H0: μ = 69.1
H1: μ ≠ 69.1
P-value = 0.4379
α = 0.05

For quantitative data sample 2:

H0: μ = 69.1
H1: μ ≠ 69.1
P-value = 0.4002
α = 0.05

Given that all of my samples, in both my categorical and my quantitative data, are
an accurate representation of my population, I should not experience a Type I error. None
of my null hypotheses were rejected, so none of my tests can be an incorrect rejection
of the null hypothesis.

Part 6:
Statistics are something that we see every day, whether it be the likelihood
of winning the lottery, the number of yellow M&Ms in a pack, or even something as
simple as weight or gender among a population. Most of the time, when we are
presented with a statistic, we trust it without question. This project has taught me, above
all, to second-guess results. It has helped me to understand the difference between a
sample and a population, and how inaccurate a sample can be as a representation of a
population unless certain guidelines are met: a randomized sample and a sample size
n > 35.
As my hope is to go to nursing school, medical statistics are something that will
frequently cross my path. As a caregiver, it is important to me that I am providing my
patients with accurate information. It's like the toothbrush commercials that suggest that
4 out of 5 dentists use a particular toothpaste. Perhaps they only sampled five dentists,
or perhaps the dentists that were sampled volunteered to be sampled. If I am trying to
find out how many people in a population will experience side effects from a drug, it is
important that my sample is collected correctly.
In statistics, there are many signs that you have made a mistake, including
everything from outliers to unusually high or low numbers. There were several times
throughout the course of this project when my numbers did not match my graphs or
my distribution shapes. In one case, I had plugged in an incorrect number and ended up
dividing by 0. It was a red flag to me that I had done something very wrong; however, it
forced me to go back through my steps and problem-solve to find out where my mistake
was. As soon as I went back through the steps, I was able to correct my errors.
All in all, I normally don't connect math (at least the math classes I've
taken in the past) to real-life situations. Statistics, however, has been a completely
different story. There are so many places that statistics come up, whether we realize it or
not, from correlation in my psych class to trying to assign a group of people to certain
tasks. This class has been beneficial to me in realizing how inaccurate many statistics
actually are and what not to believe, as well as in preparing me for any kind of statistic
or hypothesis that I may need to test in the future.
