Sampling Distributions
Statistics vs. Parameters
Ideally, we would like to know the parameter. In most cases, however, the population is too large for the parameter to be known exactly, so we have to estimate it with a suitable statistic computed from the sample.
Quick Review
p = population proportion
$\hat{p}$ = sample proportion (it is called p-hat)
$\mu$ = population mean
$\bar{x}$ = sample mean
Empirical rule: For variables with a normal (bell-shaped) distribution, ~68% of the values fall within +/- 1 standard deviation of the mean, and ~95% of the values fall within +/- 2 standard deviations of the mean.
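The two percentages in the empirical rule can be checked numerically. The sketch below uses the standard normal distribution from Python's standard library; the variable names are illustrative only.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
z = NormalDist(mu=0.0, sigma=1.0)

# Probability of falling within k standard deviations of the mean.
within_1_sd = z.cdf(1) - z.cdf(-1)
within_2_sd = z.cdf(2) - z.cdf(-2)

print(f"within 1 sd: {within_1_sd:.4f}")  # roughly 0.68
print(f"within 2 sd: {within_2_sd:.4f}")  # roughly 0.95
```

Because any normal variable can be standardized, the same two probabilities hold whatever the mean and standard deviation are.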
Repeated Samples
Imagine repeating this survey many times, and each time we record the sample proportion of those who have engaged in under-age drinking. What would the sampling distribution of $\hat{p}$ look like?
Sample (n=200)    Sample Proportion
1                 $\hat{p}_1$
2                 $\hat{p}_2$
3                 $\hat{p}_3$
4                 $\hat{p}_4$
5                 $\hat{p}_5$
...               ...
150,000           $\hat{p}_{150,000}$

$\hat{p}$ is a random variable.
[Figure: Sampling Dist. of $\hat{p}$ (vertical axis: Probability)]
Since $\hat{p}$ is simply X/n, it follows that the sampling distribution of $\hat{p}$ is the same as that of the binomial distribution divided by n.
Mean of $\hat{p}$: $E(\hat{p}) = E(X/n) = p$
Std. Dev. of $\hat{p}$: $sd(\hat{p}) = sd(X/n) = \frac{\sqrt{np(1-p)}}{n} = \sqrt{\frac{p(1-p)}{n}}$
Standard Error of $\hat{p}$: $se(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
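The mean and standard deviation formulas for the sample proportion can be illustrated with a small simulation. This is only a sketch: the true proportion 0.6 and the 5,000 repetitions are assumptions chosen for illustration and are not given in the slides; n = 200 matches the survey example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

p = 0.6             # hypothetical true proportion (assumption, not from the slides)
n = 200             # sample size, as in the survey example
num_samples = 5000  # hypothetical number of repeated samples


def sample_proportion(n, p):
    """Draw one sample of size n and return the proportion of successes."""
    successes = sum(1 for _ in range(n) if random.random() < p)
    return successes / n


# One p-hat per repeated sample.
p_hats = [sample_proportion(n, p) for _ in range(num_samples)]

sim_mean = statistics.mean(p_hats)
sim_sd = statistics.pstdev(p_hats)
theory_sd = (p * (1 - p) / n) ** 0.5  # sqrt(p(1-p)/n)

print(f"simulated mean of p-hat: {sim_mean:.4f} (theory: {p})")
print(f"simulated sd of p-hat:   {sim_sd:.4f} (theory: {theory_sd:.4f})")
```

With enough repetitions the simulated mean and standard deviation should land close to p and sqrt(p(1-p)/n).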
Recent studies have shown that about 20% of American adults fit the medical definition of being obese. A large medical clinic would like to estimate what percent of their patients are obese, so they take a random sample of 100 patients and find that 18 percent are obese. Suppose in truth, the same percentage holds for the patients of the medical clinic as for the general population, 20%. Give a numerical value of each of the following.
d. The mean of the sampling distribution of $\hat{p}$ = p = .2
e. The standard deviation of the sampling distribution of $\hat{p}$ = $\sqrt{\frac{p(1-p)}{n}}$ = .04
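The arithmetic for parts d and e can be reproduced in a few lines. The sketch below also evaluates the standard error formula with the observed 18%, as a point of comparison; the variable names are illustrative.

```python
import math

p = 0.20      # true proportion assumed in the problem statement
n = 100       # number of patients sampled
p_hat = 0.18  # observed sample proportion

mean_p_hat = p                                  # mean of the sampling distribution
sd_p_hat = math.sqrt(p * (1 - p) / n)           # uses the true p
se_p_hat = math.sqrt(p_hat * (1 - p_hat) / n)   # uses the observed p-hat

print(f"mean of p-hat: {mean_p_hat:.2f}")
print(f"sd of p-hat:   {sd_p_hat:.4f}")
print(f"se of p-hat:   {se_p_hat:.4f}")
```

The standard deviation uses the (usually unknown) true p, while the standard error substitutes the sample value, so the two differ slightly.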
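Consider a normally distributed random variable X with mean $\mu$ and standard deviation $\sigma$. The sampling distribution of the sample mean $\bar{x}$, for samples of size n, is normal with
Mean of $\bar{x}$: $E(\bar{x}) = \mu$
Std. Dev. of $\bar{x}$: $sd(\bar{x}) = \frac{\sigma}{\sqrt{n}}$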
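A short derivation shows where these two results come from. It assumes the observations $X_1, \dots, X_n$ are independent, each with mean $\mu$ and standard deviation $\sigma$; the slides do not state that assumption explicitly.

```latex
\begin{align*}
E(\bar{x}) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
            = \frac{1}{n}\cdot n\mu = \mu \\
\operatorname{Var}(\bar{x}) &= \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i)
            = \frac{n\sigma^{2}}{n^{2}} = \frac{\sigma^{2}}{n} \\
sd(\bar{x}) &= \sqrt{\operatorname{Var}(\bar{x})} = \frac{\sigma}{\sqrt{n}}
\end{align*}
```

The variance step is the one that relies on independence.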
What about for skewed or non-normal data?
[Figure: histogram of CDs]
Situation 3: Clearly CDs is a right-skewed data set. Suppose our population looked something like this. Let us take repeated samples from this population and see what the distribution of the sample mean looks like.
[Figure: histograms of the sample means, with panels n = 4, n = 8, n = 16, and n = 32]
Using that CD sample as the population, $\mu$ = 87.6, $\sigma$ = 87.8. The sample means from the previous slide had the following summary statistics:
Sample Size    Mean    Std. Deviation
n = 4          86.6    43.2
n = 8          86.8    30.9
n = 16         86.7    21.9
n = 32         86.6    15.6
Note that the mean remains essentially constant, and the std. deviation decreases as the sample size increases!
For non-normal data the sampling distribution of the sample mean is approximately normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.
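As a quick check, the standard deviations in the summary table can be compared with the value the formula predicts. The sketch below takes 87.8 as the population standard deviation of the CD data and copies the observed values from the table.

```python
import math

sigma = 87.8  # population standard deviation of the CD data (from the slide)

# Standard deviations of the sample means, as reported in the summary table.
observed_sd = {4: 43.2, 8: 30.9, 16: 21.9, 32: 15.6}

# Value predicted by sigma / sqrt(n) for each sample size.
predicted_sd = {n: sigma / math.sqrt(n) for n in observed_sd}

for n, obs in observed_sd.items():
    print(f"n = {n:2d}: observed {obs:5.1f}, sigma/sqrt(n) = {predicted_sd[n]:5.1f}")
```

The observed and predicted values agree to within about one unit for every sample size.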
Conditions!
The above is true if the sample size is large enough, usually n greater than 30 is sufficient.
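The behaviour described here can be reproduced with a small simulation. This is a sketch under a loud assumption: the CD data themselves are not available, so an exponential population with mean 87.6 stands in for them (it is right-skewed and has a standard deviation equal to its mean, close to the 87.8 on the slide). The 2,000 repetitions are likewise an arbitrary choice.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

population_mean = 87.6  # from the slide; the exponential shape is an assumption
num_samples = 2000      # hypothetical number of repeated samples per sample size


def sample_mean(n):
    """Mean of n draws from an exponential population with the given mean."""
    draws = [random.expovariate(1 / population_mean) for _ in range(n)]
    return sum(draws) / n


results = {}
for n in (4, 8, 16, 32):
    means = [sample_mean(n) for _ in range(num_samples)]
    # Store (mean of the sample means, standard deviation of the sample means).
    results[n] = (statistics.mean(means), statistics.pstdev(means))
    predicted = population_mean / math.sqrt(n)
    print(f"n = {n:2d}: mean {results[n][0]:6.1f}, "
          f"sd {results[n][1]:5.1f}, sigma/sqrt(n) {predicted:5.1f}")
```

The mean of the sample means should stay near the population mean while their spread shrinks roughly like $\sigma/\sqrt{n}$, even though the population is far from normal.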
What next?
We have shown that the sampling distribution of the sample proportion and the sampling distribution of the sample mean are both approximately normal under certain conditions.
Now we can use what we know about normal distributions to draw conclusions about $\hat{p}$ and $\bar{x}$!
Situation 4 demonstrates how to use the sampling distribution of p-hat to draw conclusions.
Situation 4: A certain antibiotic is known to cure 85% of strep bacteria infections. A scientist wants to make sure the drug does not lose its potency over time. He treats 100 strep patients with a 1 year old supply of the antibiotic. Let $\hat{p}$ be the proportion of individuals who are cured.
ASSUME the drug has NOT lost potency, and answer the following questions:
1. What is the sampling distribution of $\hat{p}$?
2. If we repeated this study many times we would expect 95% of $\hat{p}$ to fall within what interval?
3. What is the probability that more than 90% in the sample are cured?
4. Suppose the scientist observed a cure rate of only 75%, would he be justified in concluding the 1 year old drug is less effective?
1.
Then the sampling distribution of $\hat{p}$ is approximately normal with mean p = .85 and standard deviation $\sqrt{\frac{p(1-p)}{n}}$ = .036.
2.
If we repeated this study many times we would expect 95% of $\hat{p}$ to fall within what interval?
The empirical rule states that for a normally distributed variable ~95% of the values fall within +/- 2 standard deviations of the mean.
So 95% of the $\hat{p}$ should fall within .85 +/- 2*.036, or there is 95% probability that the proportion cured should be between 78% and 92%.
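The interval can be reproduced numerically. The sketch below computes the empirical-rule interval (2 standard deviations) and, for comparison, the interval with the exact normal multiplier for a central 95%; the variable names are illustrative.

```python
import math
from statistics import NormalDist

p = 0.85  # cure rate, assuming the drug has not lost potency
n = 100   # number of patients treated
sd = math.sqrt(p * (1 - p) / n)  # standard deviation of p-hat, about 0.036

# Empirical rule: roughly 95% of values within 2 standard deviations.
low_rule, high_rule = p - 2 * sd, p + 2 * sd

# Exact multiplier for a central 95% of a normal distribution (about 1.96).
z = NormalDist().inv_cdf(0.975)
low_exact, high_exact = p - z * sd, p + z * sd

print(f"empirical rule:   {low_rule:.3f} to {high_rule:.3f}")
print(f"exact multiplier: {low_exact:.3f} to {high_exact:.3f}")
```

Both versions round to the same 78% to 92% range, so the empirical rule is an adequate shortcut here.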
3.
What is the probability that more than 90% in the sample are cured?
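One way to approach this question is with the normal approximation from part 1. The slide does not record an answer, so the sketch below is an illustration rather than the author's solution.

```python
import math
from statistics import NormalDist

p = 0.85  # cure rate, assuming the drug has not lost potency
n = 100   # number of patients treated
sd = math.sqrt(p * (1 - p) / n)  # standard deviation of p-hat

# Approximate sampling distribution of p-hat.
p_hat_dist = NormalDist(mu=p, sigma=sd)

z_score = (0.90 - p) / sd                    # about 1.40
prob_more_than_90 = 1 - p_hat_dist.cdf(0.90)  # roughly 0.08

print(f"z-score: {z_score:.2f}")
print(f"P(p-hat > .90) is approximately {prob_more_than_90:.4f}")
```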
4.
Suppose the scientist observed a cure rate of only 75%, would he be justified in concluding the 1 year old drug is less effective?
In other words, assuming the cure rate is actually 85%, what is the chance he would observe a sample proportion equal to or less than 75%? What is P($\hat{p} \le .75$)?
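Z-score = [.75 - .85]/.036 = -2.80 (using the unrounded standard deviation .0357)
P($\hat{p} \le .75$) = P(Z < -2.80) = .0026
A probability this small means a 75% cure rate would be very unlikely if the drug were still 85% effective, so he would be justified in concluding that the 1 year old drug is less effective.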
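The z-score and tail probability can be checked in code. The sketch below also computes the exact binomial probability of 75 or fewer cures out of 100 as a comparison; that exact calculation is an addition for illustration and does not appear in the slides.

```python
import math
from statistics import NormalDist

p = 0.85  # cure rate, assuming the drug has not lost potency
n = 100   # number of patients treated
sd = math.sqrt(p * (1 - p) / n)  # standard deviation of p-hat

# Normal approximation, as on the slide.
z_score = (0.75 - p) / sd                # about -2.80
normal_prob = NormalDist().cdf(z_score)  # about 0.0026

# Exact binomial probability of at most 75 cures out of 100.
exact_prob = sum(
    math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(0, 76)
)

print(f"z-score: {z_score:.2f}")
print(f"normal approximation: {normal_prob:.4f}")
print(f"exact binomial:       {exact_prob:.4f}")
```

Both calculations give a small probability, which supports the same conclusion even though the two numbers are not identical.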
We will see some examples of how to use the sampling distribution of the sample mean in class activities, but the idea is similar.