Statistics Research Project
By: Gabriel Gonzalez
Statistics 1510
Professor Brian Jean
October 29, 2011
Taft College
Abstract
I conducted a study to find whether most human characteristics are similar to each other based on gender and to determine the differences between people. From a population of 2,627 surveyed individuals, I drew a sample of 40. I then stated my hypotheses, described my methods, and tested my data values. To my surprise, most tests produced enough evidence to reach a decision without further testing. Overall, the study went well, with only a few tests concluding that there was not enough evidence to find a difference or relationship.

Introduction
The motivation for the study came from assumptions made by people who thought a person's characteristics do not form an appropriate combination. To check these assumptions, weight, height, ring size, and shoe size were examined in our sample. My hypothesis states that there is a relationship between a person's weight and height. For example, height and weight can be closely related because a tall person may weigh more than a short person. Ring size and shoe size can also form an appropriate combination. Most of the time, though not always, people will have a bigger ring and shoe size if they are tall. Ring and shoe size can depend on whether a person is tall or short, husky or thin. Furthermore, I will conduct a study to find whether height and weight form an appropriate combination with ring and shoe size.

Over the years, women have fought for equal gender rights and wages. Although men and women may hold the same position in the work field, my hypothesis states there is a significant difference in gross income based on gender. Historically, men have been placed in higher paying positions, in part because most jobs are male dominated and in part because women's capabilities in the work field are often underestimated.

People tend to support political figures and laws aligned with their own political party. Many people assume political parties are not affiliated with laws and presidential re-elections, but they certainly are. My hypothesis is that political party is closely associated with three variables: whether the respondent feels President Obama will be re-elected, whether the respondent is in favor of the health care bill, and whether the respondent is in favor of the death penalty.

Handedness, on the other hand, should be independent of both support for the death penalty and the amount of water consumed. A person's handedness does not affect whether they favor the death penalty or how much water they drink. For example, a person would not say, "Because I am right handed, I will favor the death penalty and drink thirty ounces of water." This would be erroneous, so my hypothesis is that there is no relationship between a person's handedness and the amount of water they consume, or whether they are in favor of the death penalty.

Methods
The data for my study were collected from randomly selected people at Taft College and from friends and family. Everyone involved in the project surveyed ten people, and all the surveys were combined into one data set. I surveyed ten randomly selected people and entered their information on the computer, adding it to the database of people previously surveyed. The population of interest consists of the 2,627 people surveyed. From this population, I used the random number generator in TC-Stats, a statistical package designed for the iPad, to select a sample of forty people. The random number generator helps decrease the chance of error or bias, and it gives each person in the population an equal probability of being chosen. I then recorded the information for each person in my sample, that is, their responses to the survey questions.

As I organized my data, I noticed a couple of errors and missing values in the information from my sample. Some people entered height in the wrong format, while others did not enter any number for water consumption. For water consumption, I entered zero for any missing value, and for height I corrected the formatting and entered the value in inches.

Results
In terms of height, the graph in figure 1.1 is relatively bell shaped. The summary statistics from TC-Stats in figure 1.2 show a mean of 67.275, a median of 67, and a standard deviation of 5.320. The mean is the appropriate measure of center because the data distribution is symmetrical.
Fig.1.1 All subjects in study.
Fig.1.2
According to figure 1.3, the histogram of weight is relatively bell shaped and has a class width of 20. The summary statistics in figure 1.4 show a mean of 171, a median of 166.500, and a standard deviation of 44.708. Because the data are roughly symmetrical, the appropriate measure of center is the mean.
Fig.1.3
Fig.1.4
The distribution of ring size in our sample was relatively symmetrical. In figure 1.5, the histogram is bell shaped with a class width of 20. The summary statistics in figure 1.6 show a mean of 7.741, a median of 7, and a standard deviation of 2.501. Since the data are distributed symmetrically, the appropriate measure of center is the mean.
Fig. 1.5
Fig. 1.6
In terms of shoe size within our sample, our graph in figure 2.1 shows the data to be relatively bell shaped with a class width of 1. Furthermore, we can see the data values in summary statistics located in figure 2.2 with a mean of 8.925, median= 9, and standard deviation of 2.474. The appropriate measurement of center will also be the mean.
Fig.2.1
Fig.2.2
The graph in figure 2.3 clearly shows that the gross income data are skewed right. Surprisingly, the most common gross income in our sample falls between 0 and 30,000 a year. According to the summary statistics in figure 2.4, the mean is 50,313.512, the median is 42,000, and the standard deviation is 57,456.391. Because the data are skewed, the median is the appropriate measure of center.
Fig.2.3
Fig.2.4
According to the graph in figure 3.1, the largest share of the people within my sample of forty were affiliated with the Republican political party. Looking at the political party choices available from the survey, almost half of the sample is Republican/conservative. Democrats were just 5% below the Republicans.
Fig.3.1
According to the graph in figure 3.2, nearly half of the sample of forty people chose not to re-elect President Obama. The other half were either undecided or approved of the re-election. Because nearly half of the sample were Republicans, party affiliation could have played a role in why they chose not to re-elect Obama.
Fig. 3.2
Based on figure 3.3, almost half of the people within our sample were in favor of the health care bill. The rest were either against it or undecided. Political affiliation could have played a role in these responses.
Fig. 3.3
According to figure 3.4, 62.5% of the people within our sample were in favor of the death penalty. A pattern emerges from the responses: because nearly half of the people in our sample were Republican, the answers to some of the survey questions may lean conservative.
Fig. 3.4
The graph in figure 4.1 shows that 80% of the people in the sample of forty are right handed, a higher percentage than I expected. Although more than half of the people in our sample were right handed, that should not affect the responses to other questions in the survey.
Fig.4.1
According to the graph in figure 4.2, most people drank 32-42 ounces of water. The distribution is skewed right. The summary statistics in figure 4.3 show a mean of 70.545, a median of 49, and a standard deviation of 62.817. Since the distribution is skewed, the appropriate measure of center is the median.
Fig.4.2
Fig.4.3
Discussions
I was surprised by the results of the tests in my study. Testing each hypothesis gave me enough to comment on the conclusions. One hypothesis asked whether there is a relationship between a person's height and weight; the test concluded that there is enough evidence to suggest a relationship between the two. Another asked whether there is a difference in gross income based on gender; the conclusion was that there is not enough evidence to suggest a difference between the genders. Overall, the tests were thorough, and most reached a clear conclusion. I found enough evidence in most of the tests to answer my question of whether human characteristics differ from each other and by gender.
Appendix
Phase II
The sample size (n) is forty for all the preceding graphs and collected data. The sampling method I used is a simple random sample (TC-Stats Random Data: lower bound 1, upper bound 2500, insert numbers at row 1, stop inserting numbers at row 40), using TC-Stats Edit, Select Column, Sort Column A.
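The sampling step can be sketched in a few lines of Python. This is only an illustration, not the TC-Stats procedure itself: the function name `draw_sample_ids` and the seed are hypothetical, and the sketch assumes sampling without replacement from respondents numbered 1 to the upper bound, which the report does not state explicitly.

```python
import random


def draw_sample_ids(upper_bound, sample_size, seed=None):
    """Pick `sample_size` distinct respondent numbers from 1..upper_bound.

    Assumption: sampling is without replacement, so no respondent can be
    selected twice. The result is sorted, mirroring the "Sort Column A" step.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, upper_bound + 1), sample_size))


# The appendix lists an upper bound of 2500 and a sample of 40 rows.
# The seed is arbitrary and only makes the illustration repeatable.
sample_ids = draw_sample_ids(upper_bound=2500, sample_size=40, seed=2011)
print(sample_ids)
```

Because every respondent number has the same chance of being drawn, this matches the equal-probability property described in the Methods section.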
The height of every person surveyed from the sample
The sample size (n) for height is forty subjects chosen randomly from the population.
Graph is on page 4. The histogram looks moderately bell shaped when graphed. The box-and-whisker plot also shows the data being approximately bell shaped. The five number summary for the box plot is Min=52, Q1=65, Med=67, Q3=70, and Max=79.
Summary statistics show the mean of 67.275, mode of 65 and 67, and the standard deviation is 5.320.
The range is H-L=79-52=27.
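As a check on the height summary, the same quantities can be computed with Python's standard `statistics` module. The sketch below assumes that the forty sorted values listed under question 1A in Phase III are the height column; the helper name `summarize` is hypothetical, and the quartiles use the module's default "exclusive" method, which may differ from the rule TC-Stats applies.

```python
import statistics

# Heights in inches. Assumption: these are the 40 sorted values printed
# under question 1A in Phase III of this report.
heights = [
    52, 57, 60, 61, 61, 62, 63, 63,
    63, 65, 65, 65, 65, 65, 66, 66,
    66, 67, 67, 67, 67, 67, 68, 69,
    69, 69, 69, 70, 70, 70, 70, 71,
    72, 72, 72, 74, 75, 75, 77, 79,
]


def summarize(values):
    """Return the sample size, center, spread, and five number summary."""
    ordered = sorted(values)
    q1, med, q3 = statistics.quantiles(ordered, n=4)  # "exclusive" method
    return {
        "n": len(ordered),
        "mean": statistics.mean(ordered),
        "stdev": statistics.stdev(ordered),      # sample standard deviation
        "modes": statistics.multimode(ordered),  # every most-common value
        "five_number": (ordered[0], q1, med, q3, ordered[-1]),
        "range": ordered[-1] - ordered[0],
    }


summary = summarize(heights)
print(summary)
```

Under that assumption, the mean, modes, five number summary, and range agree with the values reported above, and the standard deviation agrees to three decimal places.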
The weight of every person surveyed from the sample
The sample size (n) for weight is forty individuals picked randomly from the population.
Graph is on page 5. The graph seems to be approximately bell shaped and we can further see this in the box plot.
The five number summary for the box plot is Min=93, Q1=135, Med=166.5, Q3=210, Max=280.
0 0 0 0 0 0 0 0
0 0 0 0 0 0 3000 15000
The box plot gives us a more accurate description of the data, and provides more evidence that the data for this category are skewed right. The five number summary for the box plot is Min=0, Q1=0, Med=42000, Q3=75000, Max=250000.
The range is H-L=250000-0=250000.
The summary statistics show the mean for this data is 50313.512, the mode is 0, and the standard deviation is 57456.391.
The amount of water in ounces consumed by individuals within our sample
The sample size (n) for ounces of water consumed by each person is forty and was chosen randomly.
0 0 2 8 12 12 16 16
24 24 30 32 32 32 32 36
36 40 48 48 48 50 50.7 56
60 60 64 64 72 72 100 120
Summary statistics show the Mean=70.545, Mode=32, and Standard Deviation=62.817.
The five number summary for the box plot is Min=2, Q1=32, Med=49, Q3=100, Max=256.
Phase III
1A. Is there a relationship between a person's height and weight? To determine the appropriate measure of association, a scatter plot will be used.
52 57 60 61 61 62 63 63
63 65 65 65 65 65 66 66
66 67 67 67 67 67 68 69
69 69 69 70 70 70 70 71
72 72 72 74 75 75 77 79
H0: ρ = 0, HA: ρ > 0, α = 0.05. Test: Pearson's correlation. Assumptions: Scatter plot indicates linear association.
Pearson's r = 0.6528, CI: (0.4284, 0.8014), P-value = 4.655E-06/2 ≈ 0.0000. Decision: Reject the null hypothesis.
Conclusion: There is enough evidence to suggest a positive association between a person's height and weight, and I am 95% confident that the true correlation coefficient lies between 0.4284 and 0.8014.
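A confidence interval for a correlation coefficient is commonly built with the Fisher z transformation. The report does not say which method TC-Stats uses, so the sketch below is an assumption; the function name `pearson_ci` is hypothetical. It takes the reported r and the sample size and returns the interval on the correlation scale.

```python
import math
from statistics import NormalDist


def pearson_ci(r, n, confidence=0.95):
    """Approximate confidence interval for a Pearson correlation.

    Assumption: Fisher z method. atanh(r) is treated as roughly normal
    with standard error 1 / sqrt(n - 3), and the limits are transformed
    back to the correlation scale with tanh.
    """
    z = math.atanh(r)
    standard_error = 1.0 / math.sqrt(n - 3)
    critical = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return (math.tanh(z - critical * standard_error),
            math.tanh(z + critical * standard_error))


# Reported values for height and weight: r = 0.6528 with n = 40.
low, high = pearson_ci(r=0.6528, n=40)
print(f"95% CI: ({low:.4f}, {high:.4f})")
```

With r = 0.6528 and n = 40, this method gives limits within about 0.001 of the reported interval (0.4284, 0.8014), which suggests, but does not prove, that a similar method was used.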
1B. Is there a relationship between a person's shoe size and ring size? To determine the appropriate measure of association, a scatter plot will be used.
Ring Size
7.5 8 12 7 7 8 8 7
6 7 7.5 7 8 7 7 6
6 8 7 6 4 7 10 9
6 11 5
Shoe Size
11 9 10 4 11 12 9 9
6.5 10 11 6 13 7 10 8
8 9 15 8 8 6 9 6
10 7 8.5 8 10 8 5.5 15
11 9 5 9 6.5 10 12 7
H0: ρ = 0, HA: ρ > 0, α = 0.05. Test: Pearson's correlation. Assumptions: Scatter plot indicates linear association.
Pearson's r = 0.6759, CI: (0.3982, 0.8401), P-value = 0.0001/2 ≈ 0.0000. Decision: Reject the null hypothesis.
Conclusion: There is enough evidence to suggest a positive association between a person's ring size and shoe size, and I am 95% confident that the true correlation coefficient lies between 0.3982 and 0.8401.
2. Is there a difference in gross income based on gender?
Gender: Male
Annual Gross Income
70000 15000 42000 0 40000 48000 100000 0
9600 0 49000 49000 0 125000 46000 40000
0 0 250000 3000 0 0 185000 27000
The parameter of interest is the mean, since I am comparing two groups, and the measurement scale is ratio. Test: 2-sample t-test. Assumptions: X1~N: Violation; X2~N.
Gross income of Males
Because the normal plot for men shows a violation of normality, I will use a non-parametric test instead. Test: Wilcoxon Rank-Sum. H0: M-F=0, HA: M-F>0, α = 0.05. P-value: 0.3582
Decision: Fail to reject Conclusion: There is not enough evidence to suggest there is a difference in gross income based on gender.
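For reference, a rank-sum test can be sketched as follows. This is a minimal illustration under stated assumptions, not the TC-Stats implementation: it uses the normal approximation with a tie correction and no continuity correction, and the names `midranks` and `rank_sum_test` are hypothetical. The report does not say whether TC-Stats uses this approximation or the exact distribution. The tie correction matters for income data like this, where many values are 0.

```python
import math
from statistics import NormalDist


def midranks(values):
    """Rank values from 1 to N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # mean of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """One-sided rank-sum test of HA: values in x tend to exceed values in y.

    Returns (W, z, p) where W is the sum of the ranks of x in the pooled data.
    """
    pooled = list(x) + list(y)
    n1, n2, n = len(x), len(y), len(pooled)
    w = sum(midranks(pooled)[:n1])
    mean_w = n1 * (n + 1) / 2.0
    # Tie correction: sum of t^3 - t over each group of t tied values.
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_w = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w == 0:
        raise ValueError("all observations are tied; the test is undefined")
    z = (w - mean_w) / math.sqrt(var_w)
    p_value = 1.0 - NormalDist().cdf(z)  # upper tail for HA: x > y
    return w, z, p_value
```

Calling `rank_sum_test(male_incomes, female_incomes)` with the two income columns would return the rank sum for males, its z score, and a one-sided P-value to compare with α = 0.05.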
3. Is there a relationship between political party and
A. If the respondent feels President Obama will be re-elected?
D= Democratic, I= Independent, O= Other, R= Republican, U= Undecided
Observed Values
Political Party: R= 15, D= 13, I= 7, O= 5
President Obama Re-elected: Yes= 14, No= 18, U= 8
The data is categorical, and the measurement scale is nominal.
Test: χ² Test of Independence
H0: Political party and Obama re-elected are independent.
HA: Political party and Obama re-elected are dependent.
α = 0.05
Assumptions: Rows and columns are independent. Satisfies the properties of a multinomial experiment. All expected values are at least 1. No more than 20% of the expected values are less than 5.
Expected values
Political Party    President Obama Re-elected
14.5000            14.5000
15.5000            15.5000
7.5000             7.5000
2.5000             2.5000
Test Statistic: 5.9076
P-value: 0.1162
Decision: Fail to reject
Conclusion: There is not enough evidence to suggest political party and President Obama being re-elected are dependent.
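The expected values and test statistic above can be recomputed with a short sketch. The table layout is an inference from the reported expected values, not something the report states: the two columns are the party counts (15, 13, 7, 5) and the response counts (14, 18, 8), padded with 0 in the fourth row. The function names are hypothetical, and the P-value uses the closed-form upper tail of the chi-square distribution for whole-number degrees of freedom.

```python
import math


def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))


def chi_square_upper_tail(x, df):
    """P(chi-square with df degrees of freedom > x), for integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * terms
    terms = sum(half ** (i - 0.5) / math.gamma(i + 0.5)
                for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * terms


# Assumed layout: column 1 = party counts, column 2 = response counts.
table = [[15, 14], [13, 18], [7, 8], [5, 0]]
df = (len(table) - 1) * (len(table[0]) - 1)
statistic = chi_square_statistic(table)
p_value = chi_square_upper_tail(statistic, df)
print(expected_counts(table))
print(f"statistic = {statistic:.4f}, df = {df}, p = {p_value:.4f}")
```

Under this layout the expected values come out as 14.5, 15.5, 7.5, and 2.5 in both columns, and the statistic and P-value agree with the reported 5.9076 and 0.1162 to within rounding.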
3. Is there a relationship between political party and
B. If the respondent is in favor of the Health Care Bill as passed?
D= Democratic, I= Independent, O= Other, R= Republican, U= Undecided
Observed Values
Political Party: R= 15, D= 13, I= 7, O= 5
Health Care Bill: Yes= 16, No= 12, U= 12
The data is categorical, and the measurement scale is nominal.
Test: χ² Test of Independence
H0: Political party and Health Care Bill are independent.
HA: Political party and Health Care Bill are dependent.
α = 0.05
Assumptions: Rows and columns are independent. Satisfies the properties of a multinomial experiment. All expected values are at least 1. No more than 20% of the expected values are less than 5.
Expected values
Political Party    Health Care Bill
15.5000            15.5000
12.5000            12.5000
9.5000             9.5000
2.5000             2.5000
Test Statistic: 6.3880
P-value: 0.0942
Decision: Fail to reject
Conclusion: There is not enough evidence to suggest political party and being in favor of the Health Care Bill are dependent.
3. Is there a relationship between political party and
C. If the respondent is in favor of the death penalty?
D= Democratic, I= Independent, O= Other, R= Republican, U= Undecided
Observed Values
Political Party: R= 15, D= 13, I= 7, O= 5
Death Penalty: Yes= 25, No= 14, U= 1
The data is categorical, and the measurement scale is nominal.
Test: χ² Test of Independence
H0: Political party and Death penalty are independent.
HA: Political party and Death penalty are dependent.
α = 0.05
Assumptions: Satisfies the properties of a multinomial experiment. All expected values are at least 1. No more than 20% of the expected values are less than 5.
Expected values
Political Party    Death Penalty
20.0000            20.0000
13.5000            13.5000
4.0000             4.0000
2.5000             2.5000
Test Statistic: 12.0370
P-value: 0.0073
Decision: Reject null hypothesis
Conclusion: There is enough evidence to suggest political party and being in favor of the death penalty are dependent.
4. Is there a relationship between handedness and
a. In favor of the death penalty?
R= Right, L= Left, A= Ambidextrous, U= Undecided
Observed Values
Handedness: R= 32, L= 5, A= 3
Death Penalty: Yes= 25, No= 14, U= 1
The data is categorical, and the measurement scale is nominal.
Test: χ² Test of Independence
H0: Handedness and Death penalty are independent.
HA: Handedness and Death penalty are dependent.
α = 0.05
Assumptions: Satisfies the properties of a multinomial experiment. All expected values are at least 1. No more than 20% of the expected values are less than 5.
Expected values
Handedness    Death Penalty
28.5000       28.5000
9.5000        9.5000
2.0000        2.0000
Test Statistic: 6.1228
P-value: 0.0468
Decision: Reject null hypothesis
Conclusion: There is enough evidence to suggest handedness and being in favor of the death penalty are dependent.
4. Is there a relationship between handedness and
b. Amount of water consumed?
Handedness: R= 32, L= 5, A= 3
Water consumption was recorded for 32 right handed, 5 left handed, and 3 ambidextrous individuals.
Test: One-way ANOVA. H0: μR = μL = μA. HA: At least one mean is not equal. α = 0.05. Assumption: X1~N, X2~N, X3~N
Water consumed
32 16 2
16
12 48 24 50
64
128 12 36 0
50.7 60 144 32 8 48 64 72 32
240 30
160 32 48 0
24 128 60
256 200
P-value= 0.0118. Decision: Reject H0. Conclusion: There is enough evidence to suggest there is a difference in mean water consumption among the handedness groups.
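The F statistic behind a one-way ANOVA can be sketched as below. This is a generic illustration with a hypothetical function name, `one_way_anova_f`; it does not recompute the result above, because the listing of water values by handedness group is incomplete. Turning F into a P-value additionally requires the F distribution, which is left to a statistics package.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers.

    F compares the variation between group means with the variation
    of observations around their own group mean.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: group size times squared distance
    # of the group mean from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-group sum of squares: squared distance of each value
    # from its own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, means) for v in g)

    df_between = k - 1
    df_within = n_total - k
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within
```

With three handedness groups and forty people, the degrees of freedom would be 2 and 37.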
4. Is there a relationship between handedness and
c. Change in political party?
R= Right, L= Left, A= Ambidextrous
Observed Values
Handedness: R= 32, L= 5, A= 3
Change in Political Party: Not Applicable= 34, Consider myself conservative but assoc. with tea party= 3, Consider myself liberal but assoc. with tea party= 3
The data is categorical, and the measurement scale is nominal.
Test: χ² Test of Independence
H0: Handedness and change in party are independent.
HA: Handedness and change in party are dependent.
α = 0.05
Assumptions: Satisfies the properties of a multinomial experiment. All expected values are at least 1. No more than 20% of the expected values are less than 5.
Expected values
Handedness    Change in party
33.0000       33.0000
4.0000        4.0000
3.0000        3.0000
Test Statistic: 0.5606
P-value: 0.7556
Decision: Fail to reject
Conclusion: There is not enough evidence to suggest handedness and change in political party are dependent.