
Fall 2010 STAT 200 Assignment #1 Due: 5pm on Wednesday Oct 6, 2010

1. Newly hatched chicks were randomly allocated into six groups, and each group was given a different feed supplement (Anonymous (1948) Biometrika, 35, 214). Their weights in grams after six weeks are given along with feed types. The data are in the file chickweight.xls. Note: You may use Excel to obtain summary statistics and to plot graphs.

(a) Identify all the variables and indicate their associated types (quantitative or categorical). (1 mark)

(b) Create a histogram for the weight for all 71 chicks. Describe the shape, centre, and spread of the histogram. Include in your description statistics that represent the mean, median, interquartile range, and standard deviation. (6 marks)

(c) Randomly pick two feed types. Describe the procedure that you used to randomly select your two feed types. Your description needs to be clear enough that another student could follow it to pick their two feed types. And it needs to be random in that, if another student followed your procedure, every pair of feed types would have the same chance of being picked. (2 marks)

(d) Create two side-by-side boxplots for your two randomly selected feed types together with their 5-number summaries. (4 marks)

(e) Compare your two selected feed types and support your conclusions with the results from (d). (2 marks)

(f) How would you explain the shape of your histogram in (b), given your results in (d)? (1 mark)

(g) i. Is this an observational study or an experiment? Justify your answer. (1 mark)
ii. Identify the following: (4 marks)
- factor(s) (what variable(s) is/are being manipulated by the study investigator?)
- treatment(s)
- response variable (what is the outcome?)
- control group (if it does not exist, please indicate that it does not exist in your solution)
iii. Is blocking used in this study? Justify your answer. (1 mark)

SEE THE ATTACHED SOLUTION.
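
As an illustration of the kind of procedure question 1(c) asks for, here is a minimal Python sketch. It assumes the six feed types are listed by name; the labels below are hypothetical placeholders (the real names are in chickweight.xls), and `pick_two_feeds` is an illustrative helper, not part of the assignment. Drawing two items without replacement gives each of the 15 possible pairs the same chance.

```python
import random

# Hypothetical placeholder labels; replace with the feed names in chickweight.xls.
feed_types = ["feed 1", "feed 2", "feed 3", "feed 4", "feed 5", "feed 6"]


def pick_two_feeds(feeds, seed=None):
    """Draw two distinct feed types; every unordered pair is equally likely."""
    rng = random.Random(seed)  # pass a seed only if a reproducible draw is wanted
    return rng.sample(feeds, 2)  # sampling without replacement


chosen = pick_two_feeds(feed_types)
print("Selected feed types:", chosen)
```

An equivalent by-hand procedure is to number the feed types 1 to 6 and roll a fair die until two different numbers have appeared.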

2. A researcher wishes to determine if a new drug will reduce the symptoms of allergy sufferers. The researcher gives the new drug to 50 volunteers who suffer from allergies, and finds that 40 of them report a significant reduction in their allergy symptoms.

(a) Suggest three ways to improve the design of this study. (3 marks)
Answer: (1) Obtain a simple random sample (SRS) of allergy sufferers. (2) Use a control group. (3) Randomly assign the study participants to either the new-drug group or the control group. (4) Blind both the study participants and the researcher or the person who evaluates the data.

(b) Identify the population, the parameter of interest and the statistic. (3 marks)
Answer: population: all allergy sufferers; parameter: percent of people with improved symptoms in the population; statistic: percent of people with improved symptoms in the sample.

3. For a large introductory statistics course, the instructor wishes to study the effect of using clickers on course performance. He has never used clickers before, so he wants to try them out on a sample of his students. From the class, the instructor randomly chooses 10 students from each degree category (there are a total of 8 categories including Bachelor of Arts, Bachelor of Science, Bachelor of Pharmaceutical Sciences, Bachelor of Applied Science, etc.). The 80 students are recruited into the study, and they have a choice to use or not use clickers throughout the term. At the end of the term, the instructor claims that among the 80 students, there is a statistically significant difference in course performance between students who use clickers and those who don't.

(a) What sampling technique did the instructor use in selecting students to participate in the study? (1 mark)
Answer: Stratified sampling.

(b) Is the study an experiment? Justify your answer. (1.5 marks)
Answer: No. The instructor does not intervene in the use of clickers by the 80 students.

(c) Based on the results of the study, is it legitimate to conclude that using clickers causes an improvement in course performance? Justify your answer. (1.5 marks)
Answer: No. This is an observational study. One cannot establish a cause-and-effect relationship between the use of clickers and course performance from an observational study.

4. In a Canadian neighbourhood, 75% of households speak only English at home, and 5% of households speak only French at home.

(a) A household is randomly chosen from this neighbourhood. Consider the following two events:
A = the chosen household speaks only English at home
B = the chosen household speaks only French at home
Are the two events disjoint? Are they independent? Justify your answers probabilistically. (3 marks)
Answer: For a household that speaks only English at home, it is not possible that it speaks only French. The two events cannot happen together, i.e. $P(A \text{ and } B) = 0$. Hence, they are disjoint. The two events are not independent because
$P(A) \times P(B) = 0.75 \times 0.05 = 0.0375 \neq P(A \text{ and } B) = 0$.

(b) Based on the information given in this question, can the probability that a randomly chosen household speaks only Spanish at home equal 22%? Justify your answer probabilistically. (2 marks)
Answer: No. For a household that speaks only one language at home, it is not possible that it speaks another language. Therefore the events A, B and C (defined to be the event that a household speaks only Spanish) are disjoint. If $P(C) = 0.22$, then by the Addition Rule,
$P(A \text{ or } B \text{ or } C) = P(A) + P(B) + P(C) = 0.75 + 0.05 + 0.22 = 1.02$.
A probability cannot exceed 1. Therefore $P(C)$ cannot equal 0.22.

(c) Based on the information given in this question, is it legitimate to deduce that the percentage of households in this neighbourhood that speak both English and French at home equals $0.75 \times 0.05 = 0.0375$? Explain why or why not. (2 marks)
Answer: No. Firstly, there is insufficient information to deduce the probability that a household speaks English at home and the probability that a household speaks French at home. Secondly, it is incorrect to apply the Multiplication Rule here (multiplying the two probabilities 0.75 and 0.05, which represent two dependent events A and B, respectively).

(d) Four households are randomly chosen from the neighbourhood. Find the probability that three households speak only English at home and one speaks only French at home. (3 marks)
Answer: Assuming that the four households are independent, the probability is (by the Multiplication Rule)
$4 \times 0.75^3 \times 0.05 = 0.084375$.
The household that speaks only French at home can be the first, the second, the third or the fourth household, hence the constant 4 in the probability.
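
The arithmetic in parts (a), (b) and (d) can be checked numerically. The following is a minimal Python sketch; the variable names are illustrative, and the independence of the four households in (d) is the same assumption made in the answer.

```python
from math import comb, isclose

p_english = 0.75  # P(A): household speaks only English at home
p_french = 0.05   # P(B): household speaks only French at home

# (a) A and B are disjoint, so P(A and B) = 0.
# Independence would require P(A) * P(B) to equal P(A and B).
p_a_and_b = 0.0
product = p_english * p_french
events_independent = isclose(product, p_a_and_b)

# (b) If P(C) were 0.22, the three disjoint events would sum to more than 1.
total_with_spanish = p_english + p_french + 0.22
spanish_possible = total_with_spanish <= 1

# (d) comb(4, 1) counts which of the four households is the French-only one.
p_three_english_one_french = comb(4, 1) * p_english**3 * p_french

print("P(A)P(B) =", product, "| independent:", events_independent)
print("P(A)+P(B)+0.22 =", round(total_with_spanish, 2), "| possible:", spanish_possible)
print("P(three English, one French) =", round(p_three_english_one_french, 6))
```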

(e) Four households are randomly chosen from the neighbourhood. Given that at least one of the four households speaks only French at home, find the probability that exactly one speaks only French at home. (5 marks)
Answer: Let E represent the event that at least one of the four households speaks only French at home, and F represent the event that exactly one speaks only French at home. We are looking for $P(F \mid E) = P(E \text{ and } F) / P(E)$.
$P(E) = 1 - P(\text{none speaks only French at home}) = 1 - (1 - 0.05)^4 = 0.1855$.
$P(E \text{ and } F) = P(F) = 4 \times 0.05 \times (1 - 0.05)^3 = 0.1715$.
So $P(F \mid E) = 0.1715 / 0.1855 = 0.9244$.
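
The conditional probability in (e) can be reproduced with a small binomial calculation. This is a sketch under the same independence assumption as the answer; `binom_pmf` is an illustrative helper written here, not a library function.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_households = 4
p_french = 0.05  # P(a household speaks only French at home)

# E: at least one French-only household; F: exactly one French-only household.
p_e = 1 - binom_pmf(0, n_households, p_french)
p_f = binom_pmf(1, n_households, p_french)

# F is contained in E, so P(E and F) = P(F).
p_f_given_e = p_f / p_e

print(round(p_e, 4), round(p_f, 4), round(p_f_given_e, 4))
```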

5. (a) A lot of 50 refrigerators contains 30 refrigerators which are defective. Suppose 3 refrigerators are randomly selected from this lot, without replacement.
i. Find the probability that all three selected refrigerators are defective. (4 marks) [$\frac{30}{50} \times \frac{29}{49} \times \frac{28}{48} = 0.207$]
ii. Find the probability that the third selected refrigerator is defective if the first two selected refrigerators are defective. (1.5 marks) [After two defective refrigerators are removed, 28 of the remaining 48 are defective: $\frac{28}{48} = 0.583$]
iii. With no information as to whether the first two selected refrigerators are defective or not, find the probability that the third selected refrigerator is defective. (1.5 marks) [$\frac{30}{50} = 0.6$]

(b) Suppose we again select refrigerators from the lot in (a) at random, without replacement. In order for the probability that all the selected refrigerators are non-defective to be less than 0.1, what is the minimum number of refrigerators that we need to select? (3 marks) [$\frac{20}{50} \times \frac{19}{49} = 0.155$, $\frac{20}{50} \times \frac{19}{49} \times \frac{18}{48} = 0.058$, i.e. select 3]

(c) Suppose we select refrigerators from another lot. This lot is large enough that we can assume that our sampling is done with replacement. In this lot, 60% of refrigerators are defective. In order for the probability that at least one selected refrigerator is non-defective to be at least 0.9, what is the minimum number of refrigerators that we need to select? (3 marks) [P(at least one non-defective) $= 1 - 0.6^n \ge 0.9$ requires $0.6^n \le 0.1$; $0.6^4 = 0.1296$, $0.6^5 = 0.0778$, i.e. select 5.]
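
The calculations for question 5 can be checked with exact fractions. The sketch below works directly from the conditions as stated in the question: in (a)ii the first two selections are defective, so 28 defectives remain among 48, and in (c) the chance that all n selections are defective is 0.6^n. The helpers `prob_all_of_kind` and `min_draws` are illustrative names, not part of the assignment.

```python
from fractions import Fraction
from math import comb


def prob_all_of_kind(kind_count, lot_size, draws):
    """P(all `draws` items taken without replacement are of the given kind)."""
    prob = Fraction(1)
    for i in range(draws):
        prob *= Fraction(kind_count - i, lot_size - i)
    return prob


def min_draws(condition):
    """Smallest n >= 1 for which condition(n) is true."""
    n = 1
    while not condition(n):
        n += 1
    return n


# (a) i: all three selected refrigerators are defective.
p_all_three = prob_all_of_kind(30, 50, 3)

# (a) ii: third defective given the first two are defective.
p_third_given_two = Fraction(30 - 2, 50 - 2)

# (a) iii: third defective with no information, via the law of total
# probability over k = number of defectives among the first two.
p_third = sum(
    Fraction(comb(30, k) * comb(20, 2 - k), comb(50, 2)) * Fraction(30 - k, 48)
    for k in range(3)
)

# (b) smallest n with P(all n non-defective) < 0.1, without replacement.
n_b = min_draws(lambda n: prob_all_of_kind(20, 50, n) < Fraction(1, 10))

# (c) smallest n with P(at least one non-defective) >= 0.9, with replacement.
n_c = min_draws(lambda n: 1 - Fraction(3, 5) ** n >= Fraction(9, 10))

print(float(p_all_three), float(p_third_given_two), float(p_third), n_b, n_c)
```

Using `Fraction` avoids rounding in the intermediate products, so the threshold comparisons in (b) and (c) are exact.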
