

Statistics Research Project
By: Gabriel Gonzalez
Statistics 1510
Professor Brian Jean
October 29, 2011
Taft College

Abstract

I conducted an experiment to find whether most human characteristics are similar to each other based on gender, and to determine the differences between different people. I drew a sample of 40 from the population of 2,627 individuals who were surveyed. Furthermore, I stated my hypotheses and methods, and tested my data values. To my surprise, most categories concluded with enough evidence to support a decision that did not require further testing. Overall, the experiment went well, with only a few conclusions stating there was not enough evidence to find a difference or relationship in the test.

Introduction

The motivation for conducting the experiment was based on assumptions made by people who thought a person's characteristics do not form an appropriate combination. In order to verify these assumptions, each category, such as weight, height, ring size, and shoe size, was calculated from our sample. My hypothesis states there is a relationship between a person's weight and height. For example, height and weight can be closely related to each other because a tall person may weigh more than a short person. In terms of ring and shoe size, both variables can also form an appropriate combination. Most of the time, though not always, people will have a bigger ring and shoe size if they are tall. Ring and shoe size can indicate whether a person is tall or short, husky or thin. Furthermore, I will conduct a study to find if height and weight will form an appropriate combination with ring and shoe size.

Over the years, women have fought for equal gender rights and wages. Although men and women may have the same position in the work field, my hypothesis states there is a significant difference in gross income based on gender. Historically, men have been placed in higher-paying positions, in part because most jobs are male dominated and also because women are often underestimated for their capabilities in the work field.

People will aspire to support political figures and laws pertaining to their own political party. Many people assume political parties are not affiliated with laws and presidential re-elections, but they certainly are. My hypothesis suggests there is indeed a relationship between political party and three variables: the response to President Obama being re-elected, being in favor of the health care bill, and being in favor of the death penalty.

Handedness, being in favor of the death penalty, and the amount of water consumed are clearly independent. A person's handedness does not affect whether they are in favor of the death penalty or how much water they will consume. For example, a person wouldn't say, "Because I am right handed, I will favor the death penalty and drink thirty ounces of water." This is erroneous, so hypothetically there would be no relationship between a person's handedness and the amount of water they consume, or whether or not they are in favor of the death penalty.

Methods

The data used for my experiment was collected randomly at Taft College, and also from friends and family. All the people involved in the report each surveyed ten people, and I combined all the surveys into one survey. I took a survey of ten randomly selected people and entered the information on the computer, thus adding to the database of people previously surveyed.

The population of interest consists of the 2,627 people surveyed for the experiment. From this population, I used the random number generator inside TC-Stats, a statistical package designed for the iPad, to generate a sample of forty people. The random number generator helped decrease the chance of error or bias, and it gives each person in the population an equal probability of being chosen. I then recorded the information from each person in my sample, which consists of the responses to the questions asked in the survey.

As I organized my data, I noticed a couple of errors and missing data among the information derived from my sample. Some people used the wrong formatting for height, while other people did not enter any number for water consumption. For water consumption, I entered a value of zero for any missing data, and for height I corrected the formatting and entered the correct value in inches.

Results

In terms of height, the graph in figure 1.1 is relatively bell shaped. The summary statistics acquired from TC-Stats in figure 1.2 show a mean of 67.275, a median of 67, and a standard deviation of 5.320. The mean is the appropriate measure of center, considering the data distribution is symmetrical.

Fig.1.1 All subjects in study.

Fig.1.2

In terms of weight, the histogram in figure 1.3 is relatively bell shaped and has a class width of 20. In terms of summary statistics, figure 1.4 shows a mean of 171, a median of 166.500, and a standard deviation of 44.708. Due to the symmetrical shape of the data, the appropriate measure of center is the mean.

Fig.1.3

Fig.1.4

The ring size among our sample was relatively symmetrical. In figure 1.5, the histogram is bell shaped with a class width of 20. The summary statistics in figure 1.6 indicate the data has a mean of 7.741, a median of 7, and a standard deviation of 2.501. Since the data is distributed symmetrically, the appropriate measure of center is the mean.

Fig. 1.5

Fig. 1.6

In terms of shoe size within our sample, the graph in figure 2.1 shows the data to be relatively bell shaped with a class width of 1. The summary statistics in figure 2.2 show a mean of 8.925, a median of 9, and a standard deviation of 2.474. The appropriate measure of center is also the mean.

Fig.2.1

Fig.2.2

The graph in figure 2.3 clearly shows the data is skewed right. Surprisingly, the most common gross incomes in our sample range between 0 and 30,000 a year. According to the summary statistics in figure 2.4, the mean is 50,313.512, the median is 42,000, and the standard deviation is 57,456.391. Because the data is skewed, the median is the appropriate measure of center.

Fig.2.3

Fig.2.4

According to the graph in figure 3.1, the largest group of people within my sample of forty was affiliated with the Republican political party. Looking at the political party choices available from the survey, almost half of the sample is Republican/conservative. Democrats were just 5% below the Republicans.

Fig.3.1

According to the graph in figure 3.2, nearly half of the sample of forty people chose not to re-elect President Obama. The other half were either undecided or approved of the re-election. Considering that nearly half of the sample were Republicans, party affiliation could have played a role in why they chose not to re-elect Obama.

Fig. 3.2

Based on figure 3.3, almost half of the people within our sample indicated they were in favor of the health care bill. The rest of the people from the sample were either against it or undecided. Political affiliation could have played a role in the decisions these people made.

Fig. 3.3

According to figure 3.4, 62.5% of the people within our sample were in favor of the death penalty. Furthermore, we can see a pattern form based on the responses of our sample. Considering that nearly half of the people surveyed in our sample were Republican, this can lead to a more conservative response on some of the questions asked in the survey.

Fig. 3.4


The graph in figure 4.1 shows that 80% of the people from the sample of forty use their right hand. To my surprise, the percentage is higher than I expected. Although more than half of the people from our sample were right handed, that would not affect the response to other questions in the survey.

Fig.4.1

According to the graph in figure 4.2, most people drank 32-42 ounces of water. The data distribution of the graph is skewed right. Furthermore, the summary statistics in figure 4.3 show a mean of 70.545, a median of 49, and a standard deviation of 62.817. Since the data distribution is skewed, the appropriate measure of center is the median.

Fig.4.2


Fig.4.3

Discussions

Based on the study I conducted, I was surprised by the results of the tests. Conducting the tests based on my hypotheses was sufficient to comment on the conclusions that were made. One of my hypotheses asked whether there is a relationship between a person's height and weight; the test concluded there is enough evidence to suggest a relationship between the two. Another hypothesis asked whether there is a difference in gross income based on gender; the conclusion was that there is not enough evidence to suggest a difference between the genders. Overall, my tests were thorough and most of the conclusions were satisfactory. I was able to find enough evidence in most of the tests conducted to answer my question of whether or not human characteristics differ from each other based on gender.


Appendix

Phase II

The sample size (n) is forty for all the preceding graphs and collected data. The sampling method I used is simple random sampling (TC-Stats Random Data; lower bound: 1, upper bound: 2500; insert numbers at row 1, stop inserting numbers at row 40). The numbers were then sorted in ascending order using (TC-Stats edit select column sort column A).
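As a rough illustration of the sampling step described above, the sketch below draws forty distinct row numbers between a lower and an upper bound and sorts them. It is written in Python rather than TC-Stats. The function name, the seed, and the choice to draw without replacement are assumptions made for illustration; TC-Stats may behave differently.

```python
import random


def draw_sample_rows(lower, upper, n, seed=None):
    """Return n distinct row numbers in [lower, upper], sorted ascending.

    Hypothetical helper: it imitates a random-number sample followed by
    a column sort, and is not the TC-Stats implementation.
    """
    rng = random.Random(seed)
    # random.sample draws without replacement, so no row repeats.
    rows = rng.sample(range(lower, upper + 1), n)
    return sorted(rows)


if __name__ == "__main__":
    # Bounds as entered in the appendix; the seed is arbitrary.
    print(draw_sample_rows(1, 2500, 40, seed=2011))
```

Because every row number in the range is equally likely to be picked, each surveyed person has the same chance of entering the sample, which is the property the Methods section relies on.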

The selected row numbers, sorted in ascending order:

211 356 630 745 911 1087 1099 1195
1200 1201 1257 1336 1358 1377 1409 1479
1547 1574 1612 1640 1689 1770 1808 1838
1943 1959 2028 2052 2111 2120 2150 2197
2199 2234 2243 2321 2429 2431 2497 2582

The height of every person surveyed from the sample

The sample size (n) for height is forty subjects chosen randomly from the population. The numbers are sorted in ascending order using (TC-Stats edit height sort column A). The graph is on page 4 (figure 1.1). The histogram looks moderately bell shaped when graphed. The box-and-whisker plot also shows the data being approximately bell shaped.

52 57 60 61 61 62 63 63
63 65 65 65 65 65 66 66
66 67 67 67 67 67 68 69
69 69 69 70 70 70 70 71
72 72 72 74 75 75 77 79

The five number summary for the box plot is Min=52, Q1=65, Med=67, Q3=70, and Max=79.

Summary statistics show a mean of 67.275, modes of 65 and 67, and a standard deviation of 5.320.

The range is H-L = 79-52 = 27.
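The height figures can be checked with a short script. This is a sketch in Python, not the TC-Stats procedure; it assumes the reported standard deviation is the sample standard deviation (dividing by n - 1), which is what `statistics.stdev` computes.

```python
import statistics

# Heights in inches, copied from the sorted list in this appendix.
heights = [
    52, 57, 60, 61, 61, 62, 63, 63,
    63, 65, 65, 65, 65, 65, 66, 66,
    66, 67, 67, 67, 67, 67, 68, 69,
    69, 69, 69, 70, 70, 70, 70, 71,
    72, 72, 72, 74, 75, 75, 77, 79,
]

mean = statistics.mean(heights)
median = statistics.median(heights)
modes = statistics.multimode(heights)      # every value tied for most frequent
sd = statistics.stdev(heights)             # sample SD (n - 1 in the denominator)
data_range = max(heights) - min(heights)   # H - L

print(f"mean={mean:.3f} median={median} modes={modes} "
      f"sd={sd:.3f} range={data_range}")
```

Under that assumption the script agrees with the values reported above: a mean of 67.275, a median of 67, modes of 65 and 67, a standard deviation of about 5.320, and a range of 27.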


The weight of every person surveyed from the sample

The sample size (n) for weight is forty individuals picked randomly from the population. The numbers are sorted in ascending order using (TC-Stats edit weight sort column A). The graph is on page 5 (figure 1.3). The graph seems to be approximately bell shaped, and we can further see this in the box plot.

93 108 108 110 121 125 130 130
132 135 135 138 145 147 150 150
156 160 160 166 167 170 174 175
175 180 186 195 200 210 210 210
210 214 215 225 245 250 250 280

The five number summary for the box plot is Min=93, Q1=135, Med=166.5, Q3=210, Max=280.

The summary statistics show the mean=171, mode=210, and standard deviation=44.708.

The range is H-L = 280-93 = 187.
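The five number summary for weight can be reproduced with the sketch below. The quartile rule used here (the median of the lower half and the median of the upper half) is an assumption; TC-Stats may use a different convention, although this one matches the reported values for this data set. The function name is hypothetical.

```python
import statistics


def five_number_summary(values):
    """Min, Q1, median, Q3, max using the median-of-halves quartile rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd count, the middle value is left out of both halves.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return (
        data[0],
        statistics.median(lower),
        statistics.median(data),
        statistics.median(upper),
        data[-1],
    )


# Weights copied from the sorted list above.
weights = [
    93, 108, 108, 110, 121, 125, 130, 130,
    132, 135, 135, 138, 145, 147, 150, 150,
    156, 160, 160, 166, 167, 170, 174, 175,
    175, 180, 186, 195, 200, 210, 210, 210,
    210, 214, 215, 225, 245, 250, 250, 280,
]

if __name__ == "__main__":
    print(five_number_summary(weights))
    print("mean:", statistics.mean(weights))
    print("range:", max(weights) - min(weights))
```

With forty values, each half holds twenty observations, so Q1 and Q3 are the averages of the 10th and 11th values within each half.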

The annual gross income of individuals surveyed from the sample

The sample size (n) for annual gross income is forty people selected randomly from the population. The numbers are sorted in ascending order using (TC-Stats edit gross income column sort column A). The graph is on page 7 (figure 2.3). The graph is heavily skewed right.

0 0 0 0 0 0 0 0
0 0 0 0 0 0 3000 15000
24000 27000 40000 40000 42000 45000 46000 48000
49000 49000 55000 60000 70000 75000 75000 79000
80000 96000 100000 120000 125000 150000 185000 250000


The box plot gives us a more accurate description of the data, and provides us with more evidence to suggest the data for this category is skewed right. The five number summary for the box plot is Min=0, Q1=0, Med=42000, Q3=75000, Max=250000.

The range is H-L = 250000-0 = 250000.

The summary statistics show the mean for this data is 50313.512, the mode is 0, and the standard deviation is 57456.391.

The amount of water in ounces consumed by individuals within our sample

The sample size (n) for ounces of water consumed by each person is forty, chosen randomly. The numbers are listed in ascending order using (TC-Stats edit water consumed sort column A). The graph is on page 10 (figure 4.2). The histogram shows the data skewed to the right.

0 0 2 8 12 12 16 16
24 24 30 32 32 32 32 36
36 40 48 48 48 50 50.7 56
60 60 64 64 72 72 100 120
128 128 128 144 160 200 240 256

The range is H-L = 256 - 0 = 256.

Summary statistics in TC-Stats show the Mean = 70.545, Mode = 32, and Standard Deviation = 62.817.

The five number summary for the box plot is Min = 2, Q1 = 32, Med = 49, Q3 = 100, Max = 256.
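The water statistics above can be compared against the listed data with the sketch below. One observation, which is an inference from the numbers and not something the report states: the reported mean, median, standard deviation, and minimum match the 38 non-zero readings, while the range uses all 40 values, including the two zeros entered for missing data. The variable names are illustrative.

```python
import statistics

# Ounces of water, copied from the sorted list above.
water = [
    0, 0, 2, 8, 12, 12, 16, 16,
    24, 24, 30, 32, 32, 32, 32, 36,
    36, 40, 48, 48, 48, 50, 50.7, 56,
    60, 60, 64, 64, 72, 72, 100, 120,
    128, 128, 128, 144, 160, 200, 240, 256,
]

# Assumption: the two zeros are the placeholders for missing answers.
nonzero = [x for x in water if x > 0]

mean_all = statistics.mean(water)
mean_nonzero = statistics.mean(nonzero)
median_nonzero = statistics.median(nonzero)
sd_nonzero = statistics.stdev(nonzero)     # sample SD (n - 1)

print(f"all 40 values:     mean={mean_all:.3f}")
print(f"38 non-zero values: mean={mean_nonzero:.3f} "
      f"median={median_nonzero} sd={sd_nonzero:.3f} min={min(nonzero)}")
```

Keeping the zeros would lower the mean to roughly 67.0 ounces, so whether missing answers are coded as zero or left out changes the summary noticeably.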


Phase III
1A. Is there a relationship between a person's height and weight? To determine the appropriate measurement of association, a scatter plot will be used.

Scatter plot: Shows an approximately positive linear association.

Height of each person

52 57 60 61 61 62 63 63
63 65 65 65 65 65 66 66
66 67 67 67 67 67 68 69
69 69 69 70 70 70 70 71
72 72 72 74 75 75 77 79

Weight of each person

93 108 108 110 121 125 130 130
132 135 135 138 145 147 150 150
156 160 160 166 167 170 174 175
175 180 186 195 200 210 210 210
210 214 215 225 245 250 250 280

H0: ρ = 0
HA: ρ > 0
α = 0.05
Test: Pearson's correlation.
Assumptions: Scatter plot indicates linear association.

Pearson's r = .6528
CI: (0.4284, 0.8014)
P-value = 4.655E-06/2 ≈ 0.0000
Decision: Reject the null hypothesis


Conclusion: There is enough evidence to suggest a positive association between a person's height and weight, and I am 95% confident that the true population correlation coefficient lies between 0.4284 and 0.8014.
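The confidence interval for the correlation can be reproduced from r and n alone. The sketch below uses the Fisher z-transformation, which is an assumption about how the interval was obtained; it does reproduce the reported bounds for r = .6528 and n = 40. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, confidence=0.95):
    """Confidence interval for a correlation via the Fisher z-transformation."""
    z = math.atanh(r)                       # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)             # standard error on the z scale
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Build the interval on the z scale, then transform back to r.
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for testing rho = 0."""
    return r * math.sqrt((n - 2) / (1 - r * r))


if __name__ == "__main__":
    low, high = fisher_ci(0.6528, 40)
    print(f"95% CI: ({low:.4f}, {high:.4f})")
    print(f"t = {t_statistic(0.6528, 40):.3f} with 38 degrees of freedom")
```

A t statistic above 5 with 38 degrees of freedom is far into the upper tail, which is consistent with the very small P-value reported for this test.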


1B. Is there a relationship between a person's shoe size and ring size? To determine the appropriate measurement of association, a scatter plot will be used.

Scatter plot: Shows an approximately positive linear association.

Ring Size

7.5 8 12 7 7 8 8 7
6 7 7.5 7 8 7 7 6
6 8 7 6 4 7 10 9
6 11 5

Shoe Size

11 9 10 4 11 12 9 9
6.5 10 11 6 13 7 10 8
8 9 15 8 8 6 9 6
10 7 8.5 8 10 8 5.5 15
11 9 5 9 6.5 10 12 7

H0: ρ = 0
HA: ρ > 0
α = 0.05
Test: Pearson's correlation.
Assumptions: Scatter plot indicates linear association.

Pearson's r = .6759
CI: (0.3982, 0.8401)
P-value = 0.0001/2 ≈ 0.0000
Decision: Reject the null hypothesis


Conclusion: There is enough evidence to suggest a positive association between a person's ring size and shoe size, and I am 95% confident that the true population correlation coefficient lies between 0.3982 and 0.8401.

2. Is there a difference in gross income based on gender?

Gender and Annual Gross Income

Male:
70000 15000 42000 0 40000 48000 100000 0
9600 0 49000 49000 0 125000 46000 40000
0 0 150000 80000 45000 0 120000 79000

Female:
60000 0 0 0 0 75000 250000 3000
0 0 75000 55000 0 24000 185000 27000

The parameter of interest is the mean, since I am comparing two groups, and the measurement scale is ratio.
Test: 2-sample t-test
Assumptions: X1 ~ N (violation: gross income of males), X2 ~ N

Due to the violation shown in the normal plot for men, I will use the non-parametric test:
Test: Wilcoxon Rank-Sum
H0: μM - μF = 0
HA: μM - μF > 0
α = 0.05
P-value: .3582


Decision: Fail to reject the null hypothesis
Conclusion: There is not enough evidence to suggest there is a difference in gross income based on gender.


3. Is there a relationship between political party and
A. If the respondent feels President Obama will be re-elected?

D = Democratic, I = Independent, O = Other, R = Republican, U = Undecided

Observed values
Political Party: R = 15, D = 13, I = 7, O = 5
President Obama Re-elected: Yes = 14, No = 18, U = 8

The data is categorical, and the measurement scale is nominal.
Test: χ² Test of Independence
H0: Political party and Obama re-elected are independent.
HA: Political party and Obama re-elected are dependent.
α = 0.05
Assumptions: Rows and columns are independent. Satisfies the properties of a multinomial experiment. All expected values are at least 1. No more than 20% of the expected values are less than 5.

Expected values
Political Party    President Obama Re-elected
14.5000            14.5000
15.5000            15.5000
7.5000             7.5000
2.5000             2.5000

Test Statistic: 5.9076
P-value: 0.1162
Decision: Fail to reject the null hypothesis
Conclusion: There is not enough evidence to suggest that political party and President Obama being re-elected are dependent.
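The expected values, test statistic, and P-value above can be reproduced with the sketch below. It assumes the two lists of observed counts were entered as the two columns of a 4 x 2 table, with a count of 0 in the fourth row of the response column. That layout is inferred from the reported expected values, not stated in the report. The function names are hypothetical, and the P-value formula is the closed form that applies only to 3 degrees of freedom.

```python
import math


def chi_square_independence(table):
    """Return (statistic, degrees of freedom, expected counts) for a table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    # Expected count = row total * column total / grand total.
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected


def chi_square_p_value_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))


# Assumed layout: column 1 = party counts (R, D, I, O),
# column 2 = response counts (Yes, No, U, and 0 for the empty cell).
table = [[15, 14], [13, 18], [7, 8], [5, 0]]

if __name__ == "__main__":
    stat, df, expected = chi_square_independence(table)
    print(f"statistic={stat:.4f} df={df} "
          f"p={chi_square_p_value_df3(stat):.4f}")
    for row in expected:
        print(row)
```

Under that layout, each expected value is half of its row total, which is why the two expected columns are identical.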


3. Is there a relationship between political party and
B. If the respondent is in favor of the Health Care Bill as passed?

D = Democratic, I = Independent, O = Other, R = Republican, U = Undecided

Observed values
Political Party: R = 15, D = 13, I = 7, O = 5
Health Care Bill: Yes = 16, No = 12, U = 12

The data is categorical, and the measurement scale is nominal.
Test: χ² Test of Independence
H0: Political party and Health Care Bill are independent.
HA: Political party and Health Care Bill are dependent.
α = 0.05
Assumptions: Rows and columns are independent. Satisfies the properties of a multinomial experiment. All expected values are at least 1. No more than 20% of the expected values are less than 5.

Expected values
Political Party    Health Care Bill
15.5000            15.5000
12.5000            12.5000
9.5000             9.5000
2.5000             2.5000

Test Statistic: 6.3880
P-value: 0.0942
Decision: Fail to reject the null hypothesis
Conclusion: There is not enough evidence to suggest that political party and being in favor of the Health Care Bill are dependent.


3. Is there a relationship between political party and
C. If the respondent is in favor of the death penalty?

D = Democratic, I = Independent, O = Other, R = Republican, U = Undecided

Observed values
Political Party: R = 15, D = 13, I = 7, O = 5
Death Penalty: Yes = 25, No = 14, U = 1

The data is categorical, and the measurement scale is nominal.
Test: χ² Test of Independence
H0: Political party and death penalty are independent.
HA: Political party and death penalty are dependent.
α = 0.05
Assumptions: Satisfies the properties of a multinomial experiment. All expected values are at least 1. No more than 20% of the expected values are less than 5.

Expected values
Political Party    Death Penalty
20.0000            20.0000
13.5000            13.5000
4.0000             4.0000
2.5000             2.5000

Test Statistic: 12.0370
P-value: 0.0073
Decision: Reject the null hypothesis
Conclusion: There is enough evidence to suggest that political party and being in favor of the death penalty are dependent.


4. Is there a relationship between handedness and
a. In favor of the death penalty?

R = Right, L = Left, A = Ambidextrous, U = Undecided

Observed values
Handedness: R = 32, L = 5, A = 3
Death Penalty: Yes = 25, No = 14, U = 1

The data is categorical, and the measurement scale is nominal.
Test: χ² Test of Independence
H0: Handedness and death penalty are independent.
HA: Handedness and death penalty are dependent.
α = 0.05
Assumptions: Satisfies the properties of a multinomial experiment. All expected values are at least 1. No more than 20% of the expected values are less than 5.

Expected values
Handedness    Death Penalty
28.5000       28.5000
9.5000        9.5000
2.0000        2.0000

Test Statistic: 6.1228
P-value: 0.0468
Decision: Reject the null hypothesis
Conclusion: There is enough evidence to suggest that handedness and being in favor of the death penalty are dependent.


4. Is there a relationship between handedness and
b. Amount of water consumed?

Handedness: R = 32, L = 5, A = 3
The water consumption data come from 32 right-handed, 5 left-handed, and 3 ambidextrous individuals.

One-way ANOVA
H0: μR = μL = μA
HA: At least one is not equal
α = 0.05
Assumption: X1 ~ N, X2 ~ N, X3 ~ N

Water consumed

32 16 2 16 12 48 24 50
64 128 12 36 0 50.7 60 144
32 8 48 64 72 32 240 30
128 100 56 72 120 36 40 160
32 48 0 24 128 60 256 200

(normal plot shows gross violation)

Test: Kruskal-Wallis
H0: μR = μL = μA
HA: At least one is not equal
α = 0.05
Assumptions: Gross violation

P-value = .0118
Decision: Reject H0
Conclusion: There is enough evidence to suggest there is a difference in the amount of water consumed based on handedness.

5. Is there a relationship between handedness and
b. Change in political party?

R = Right, L = Left, A = Ambidextrous

Observed values
Handedness: R = 32, L = 5, A = 3
Change in Political Party: Not Applicable = 34, Consider myself conservative but assoc. with tea party = 3, Consider myself liberal but assoc. with tea party = 3

The data is categorical, and the measurement scale is nominal.
Test: χ² Test of Independence
H0: Handedness and change in party are independent.
HA: Handedness and change in party are dependent.
α = 0.05
Assumptions: Satisfies the properties of a multinomial experiment. All expected values are at least 1. No more than 20% of the expected values are less than 5.

Expected values
Handedness    Change in party
33.0000       33.0000
4.0000        4.0000
3.0000        3.0000

Test Statistic: .5606
P-value: 0.7556
Decision: Fail to reject the null hypothesis
Conclusion: There is not enough evidence to suggest that handedness and change in political party are dependent.
