
Langara College

Stat 2225
Introduction
The Practice of Statistics for Business and Economics is an introductory
text in Statistics at a level that is accessible to students who are taking
statistics as a first or second semester course. The book is divided into two
integral areas of concentration:

- Data analysis (Chapters 1 to 3), which deals with methods and
  strategies for organizing, describing and exploring data with graphs
  and numerical summaries, and with methods of producing data using
  random sampling.

- Probability and inference (Chapters 4 to 17), which deals with
  techniques for drawing conclusions from data using tools of probability
  to account for sample-to-sample variation. Note that Chapters 15
  through 17 are provided as optional companion chapters of the text.
  Make sure your copy of the text includes these optional chapters.

This course assumes that you are familiar with the topics in Chapters 1
through 9 on the course outline of the Stat 1181 course. However, we
will do a quick review of some of these topics and then follow up with
discussions of selected topics in Chapters 9 through 16 (refer to the
Stat 2225 course outline).
The text emphasizes a basic mantra: statistics is learned best by doing
statistical problems. Hence the text provides many exercises for practice.
However, because this is an introductory text, the mathematical rigor
needed to establish the validity of the underlying statistical concepts has
been largely left out. Read pages xxxv through xxxvii for the authors'
recommendations on how best to study the material in the text.
This course deals essentially with techniques of statistical inference, that
is, with using a fact about a sample to infer conclusions about a wider
population of interest. We begin our review of Stat 1181 material with
some basic vocabulary of statistical inference that you will find in
Sections 3.3 and 4.4.

- A population is the entire group of individuals (humans or objects)
  about which or from whom we want information.

- A sample is a part of the population that we actually examine in
  order to gather information.

Note that the population is defined in terms of our desire for knowledge. If
we want to draw a conclusion about homelessness in Vancouver, all
residents of that city are our population, even if we survey only residents of
Vancouver's Downtown Eastside.

Closely related to population and sample is the distinction between a
parameter and a statistic.
A parameter is a number that describes a population. A parameter is a
fixed number, but in practice we rarely know its value.
A statistic is a number that describes a sample. The value of a statistic is
known once we have taken a sample, but it can change from sample to
sample. We often use a statistic to estimate an unknown parameter.
For example, a GM automobile engineer may be interested in the mean fuel
efficiency (the number of litres of gasoline used per 100 kilometres driven) of
a new model of automobile. This parameter will never be known exactly, since it
is not practical for GM to put every such vehicle on the road and drive them all
under varying road conditions to collect the data needed to determine the
mean fuel efficiency. In reality, GM will test a random sample of the new
model vehicles and use the mean from that sample, a statistic, to estimate the
mean fuel efficiency of all vehicles of the new model.
Sampling variability refers to the fact that the value of a statistic varies
from one sample to the next. The variability of a statistic is described by
the spread of its sampling distribution. The sampling distribution of a
statistic is the distribution of values taken by the statistic in all possible
samples of the same size from the same population.

The variability of a statistic is often described by the standard deviation of
its sampling distribution. Statistics from larger samples tend to have
smaller standard deviations.
The Law of Large Numbers
Suppose you draw a random sample of independent observations from a
population with mean $\mu$. As the number of observations increases, the mean
$\bar{x}$ of the observed values gets closer and closer to $\mu$.

Notice that the law of large numbers describes the long-run behaviour of the
mean of a sample from a population as the sample increases in size. It is
intuitively clear that as more and more observations are taken, the sample
tends to look more and more like the population. So the mean of a
large sample will be close to the mean of the population.
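
The law of large numbers can be illustrated with a short simulation. The Python sketch below is only an illustration: the Normal population and the values $\mu = 25$ and $\sigma = 7$ are hypothetical choices, not taken from the text. It tracks the mean of the first $n$ observations as $n$ grows.

```python
import random

def running_means(mu, sigma, n_obs, seed=1):
    """Return the mean of the first i draws, for i = 1..n_obs,
    from a Normal(mu, sigma) population (hypothetical population)."""
    rng = random.Random(seed)
    total = 0.0
    means = []
    for i in range(1, n_obs + 1):
        total += rng.gauss(mu, sigma)
        means.append(total / i)
    return means

# Hypothetical population: mu = 25, sigma = 7.
means = running_means(mu=25.0, sigma=7.0, n_obs=100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>6}: mean of first n observations = {means[n - 1]:.3f}")
```

The printed means should settle closer and closer to 25 as $n$ increases, although any single run will wander a little on the way.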

The Mean and Standard Deviation of a Sample Mean

Suppose that $\bar{x}$ is the mean of a random sample of size $n$ from a large
population with mean $\mu$ and standard deviation $\sigma$. Then the mean of the
sampling distribution of $\bar{x}$ is $\mu$ and its standard deviation is $\sigma/\sqrt{n}$.
This last result has important implications for statistical inference.

- The mean of the distribution of $\bar{x}$ is $\mu$. That is, in repeated sampling,
  $\bar{x}$ will vary around $\mu$ as its center, but there is no systematic tendency
  to overestimate or underestimate $\mu$.

- Sample averages are less variable than individual values. They are
  even less variable as the sample size increases.
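
These two facts can be checked by simulation. The sketch below assumes a hypothetical Normal population with $\mu = 100$ and $\sigma = 15$ and samples of size $n = 25$ (none of these values come from the text). It compares the mean and standard deviation of many simulated sample means with $\mu$ and $\sigma/\sqrt{n}$.

```python
import math
import random
import statistics

def simulate_sample_means(mu, sigma, n, reps, seed=2):
    """Draw `reps` samples of size n from Normal(mu, sigma); return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]

# Hypothetical population and sample size.
mu, sigma, n = 100.0, 15.0, 25
xbars = simulate_sample_means(mu, sigma, n, reps=5_000)
theory_sd = sigma / math.sqrt(n)  # sigma / sqrt(n)

print(f"mean of sample means: {statistics.fmean(xbars):.2f} (theory: {mu})")
print(f"sd of sample means:   {statistics.stdev(xbars):.2f} (theory: {theory_sd})")
```

The simulated values will not match the theoretical ones exactly, but they should agree to within simulation error.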

The shape of a sampling distribution depends on the shape of the
population. If the population is Normal, then so is the sampling
distribution for any sample size n. What happens when the population is
not Normal? As the sample size increases, the sampling distribution
changes shape, looking less like that of the population and more like a
Normal distribution. When the sample size is large enough, the sampling
distribution is very close to Normal. This result is called the Central
Limit Theorem. The theorem describes the shape of the sampling
distribution of a sample statistic as the sample size increases. We state the
result for the distribution of a sample mean and the distribution of a
sample proportion separately:
The Central Limit Theorem (CLT) for a Sample Mean (see page 271):
Suppose an SRS of size $n$ is drawn from a population with mean $\mu$ and
standard deviation $\sigma$. When $n$ is large enough, the sampling distribution of
the sample mean $\bar{x}$ is approximately Normal with mean $\mu$ and standard
deviation $\sigma/\sqrt{n}$.

The CLT justifies why Normal distributions are common models for
observed data. Any variable that is the mean or sum of many small
influences will have approximately a Normal distribution.
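
One way to see the CLT at work is to sample from a strongly skewed population and watch the sampling distribution of the mean become more symmetric. The sketch below assumes an exponential population with mean 1, and uses skewness (0 for a Normal curve) as a rough measure of shape. Both are illustrative choices that do not come from the text.

```python
import random
import statistics

def sample_means_exponential(n, reps, seed=3):
    """Means of `reps` samples of size n from an Exponential(1) population."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]

def skewness(values):
    """Average cubed standard score; 0 for a perfectly symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

for n in (2, 10, 50):
    xbars = sample_means_exponential(n, reps=5_000)
    print(f"n = {n:>2}: skewness of sample means = {skewness(xbars):.2f}")
```

The skewness should shrink toward 0 as $n$ grows, which is consistent with the sampling distribution looking more and more like a Normal curve.
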
Example: Exercise 4.110, page 275.
Solution: (a) We want the probability that the score of a randomly chosen
single student is 21 or higher. This is given by the area under the
Normal curve to the right of 21.
Since the population of scores $X$ has mean $\mu = 18.6$ and standard
deviation $\sigma = 5.9$, its standard score is
$$Z = \frac{X - 18.6}{5.9}.$$
So $X = 21$ gives $Z = 0.41$. The area to the right of $Z = 0.41$ is
$1 - 0.6591 = 0.3409$. Thus the required probability is about 0.341.
(b) The mean of the sampling distribution of $\bar{x}$ is 18.6; the standard
deviation of the sampling distribution is $\sigma/\sqrt{n} = 5.9/\sqrt{50} = 0.834$.
(c) We want the probability that $\bar{x}$, the mean score for $n = 50$ students, is 21
or higher. This is the area under the graph of its sampling distribution to
the right of 21. The standard score of $\bar{x}$ is
$$Z = \frac{\bar{x} - 18.6}{0.834} = \frac{21 - 18.6}{0.834} = 2.88.$$
The area to the right of $Z = 2.88$ is $1 - 0.9980 = 0.0020$. Thus the required
probability is 0.0020.

Your Turn: Exercise Set 4.4, page 276 #s 4.113, 4.115, 4.117, 4.124

Solution: #4.113: 500,000,000 is a parameter; it is a number describing
all of the Apple songs sold in the given time interval. The number 5.6 is a
statistic since it is a number describing a sample of all past, present and
future iTunes transactions.
#4.115: 19 is a parameter since it is a number describing all of the businesses
in the US. The number 14 is a statistic because it is a number describing
the 100 businesses in North Dakota.
#4.117: (a) The sampling distribution of the mean of 3 weighings is a
Normal curve with a mean of 123 mg and a standard deviation of
$\sigma/\sqrt{n} = 0.08/\sqrt{3} = 0.046$ mg, since the distribution of individual
weighings is Normal.
(b) The standard score of the mean of 3 weighings is
$$Z = \frac{\bar{x} - 123}{0.046}.$$
So $P(\bar{x} \ge 124) = P(Z \ge 21.65) \approx 0$.

#4.124: (a) The sampling distribution of $\bar{x}$ is approximately Normal since
the mean is based on a sample of size 52 (weeks). The mean of the
distribution is 2.2 and the standard deviation is $\sigma/\sqrt{n} = 1.4/\sqrt{52} = 0.194$.
(b) The standard score of $\bar{x}$, the mean number of accidents at the
intersection, is
$$Z = \frac{\bar{x} - 2.2}{0.194}.$$
So $P(\bar{x} < 2) = P(Z < -1.03) = 0.1515$.
(c) Fewer than 100 per year means $\bar{x} < 100/52 = 1.92$. Moreover,
$P(\bar{x} < 1.92) = P(Z < -1.44) = 0.0749$.
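
The table lookups in #4.124 can be cross-checked in software. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `prob_mean_below` is made up for this illustration. Because it carries $\sigma/\sqrt{n}$ and $100/52$ without rounding, its answers differ slightly from the table-based values, most noticeably in part (c), where the unrounded probability comes out a little above 0.0749.

```python
import math
from statistics import NormalDist

def prob_mean_below(x, mu, sigma, n):
    """Return (z, P(xbar < x)), treating xbar as Normal(mu, sigma / sqrt(n))."""
    se = sigma / math.sqrt(n)      # standard deviation of the sample mean
    z = (x - mu) / se              # standard score of x
    return z, NormalDist().cdf(z)  # area to the left of z

# Values from #4.124: mu = 2.2, sigma = 1.4, n = 52 weeks.
z_b, p_b = prob_mean_below(2.0, mu=2.2, sigma=1.4, n=52)
z_c, p_c = prob_mean_below(100 / 52, mu=2.2, sigma=1.4, n=52)

print(f"(b) z = {z_b:.2f}, P = {p_b:.4f}")
print(f"(c) z = {z_c:.2f}, P = {p_c:.4f}")
```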

CLT for a Sample Proportion (refer to page 459)

Suppose an SRS of size $n$ is taken from a large population that contains
population proportion $p$ of successes. Let $x$ be the number of successes in
the sample and let $\hat{p}$ be the sample proportion of successes,
$$\hat{p} = \frac{x}{n}.$$
Then:
- For large sample sizes, the sampling distribution of $\hat{p}$ is
  approximately a Normal curve.
- The mean of the sampling distribution is $p$.
- The standard deviation of $\hat{p}$ is $\sqrt{\dfrac{p(1-p)}{n}}$.
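
The mean and standard deviation stated above can be checked with a small simulation. The values $p = 0.4$ and $n = 200$ below are hypothetical and are chosen only to illustrate the result.

```python
import math
import random
import statistics

def simulate_sample_proportions(p, n, reps, seed=4):
    """Return `reps` sample proportions, each from n independent
    success/failure trials with success probability p."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(reps)
    ]

# Hypothetical population proportion and sample size.
p, n = 0.4, 200
phats = simulate_sample_proportions(p, n, reps=4_000)
theory_sd = math.sqrt(p * (1 - p) / n)

print(f"mean of p-hat: {statistics.fmean(phats):.4f} (theory: {p})")
print(f"sd of p-hat:   {statistics.stdev(phats):.4f} (theory: {theory_sd:.4f})")
```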

Example: #8.1, page 459.

Solution
(a) The sample size is $n = 760$.

(b) Here success means a bank expects to acquire another bank
within five years. Thus the number of successes in the survey
responses is $x = 283$.

(c) $\hat{p} = \dfrac{283}{760} = 0.372$.

Exercise 8.3, page 461.

Solution
(a) $SE_{\hat{p}}$, the standard error of $\hat{p}$, is defined to be
$$SE_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}},$$
where $\hat{p}$ is the sample proportion of successes. Thus with
$\hat{p} = 283/760 = 0.372$ and $n = 760$, we get
$$SE_{\hat{p}} = \sqrt{\frac{0.372(1 - 0.372)}{760}} = 0.018.$$

(b) For a 95% confidence interval for the population proportion $p$,
use the formula
$$\hat{p} \pm z^* SE_{\hat{p}} = 0.372 \pm 1.96(0.018) = 0.372 \pm 0.035 = (0.337, 0.407).$$
(Note that $z^* = 1.96$ was determined using the Standard Normal table.)
(c) The confidence interval in (b) expressed in percents is
(33.7%, 40.7%).
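
The standard error and confidence interval in Exercise 8.3 can be reproduced as follows. This is a minimal sketch: the function name `proportion_ci` is made up for illustration, and $z^*$ is computed from the Standard Normal distribution rather than read from a table, so the endpoints may differ from the hand calculation in the third decimal place.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, conf_level=0.95):
    """Large-sample confidence interval for a population proportion.
    Returns (p_hat, standard error, (lower, upper))."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    margin = z_star * se
    return p_hat, se, (p_hat - margin, p_hat + margin)

# Values from Exercises 8.1 and 8.3: 283 successes among 760 responses.
p_hat, se, (low, high) = proportion_ci(283, 760)
print(f"p-hat = {p_hat:.3f}, SE = {se:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```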

Your Turn: Exercise 8.4, page 461.

Inference for a Population Mean (Sections 6.1-6.4)


The basic idea of inference is this: determine as accurately as possible the
value of a parameter of a population based on data from a random sample
of the population. To draw an inference about a population mean, one must
consider which of the following scenarios best describes the distribution of
the population:
A. the distribution is a Normal curve and its standard deviation is
predetermined or can be determined accurately from the sample data;

B. the distribution is a Normal curve but its standard deviation is not
known and cannot be determined accurately from the sample data;

C. the distribution of the population is not a Normal curve.

In practice, preliminary analysis of the sample data is needed to determine
which of the scenarios best describes the population of interest; we then
assume that the sample actually meets those conditions.

Scenario (or Assumption) A

A simple random sample of size $n$ is taken from a population with a known
standard deviation $\sigma$ and unknown mean $\mu$. Based on preliminary tests on
the sample data, it can be concluded that the sample is from a Normally
distributed population. If $\bar{x}$ is the mean of the random sample of size $n$,
then the sampling distribution of $\bar{x}$ is a Normal curve with mean $\mu$ and
standard deviation $\sigma/\sqrt{n}$.

Interval Estimate of a Normal Population Mean (See page 340)

It can be shown that a $100(1-\alpha)\%$ confidence interval estimate of
$\mu$ is given by the formula
$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}},$$
where $z^*$ is the point on the Standard Normal curve such that the area to
its right is $\alpha/2$.
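
The critical value $z^*$ is usually read from the Standard Normal table, but it can also be computed. The sketch below uses `NormalDist.inv_cdf` from Python's standard library; the helper name `z_star` is made up for illustration.

```python
from statistics import NormalDist

def z_star(conf_level):
    """Point on the Standard Normal curve with area alpha/2 to its right,
    where alpha = 1 - conf_level."""
    alpha = 1 - conf_level
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {z_star(level):.3f}")
```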

Exercise 6.9, page 346.

Solution
(a) The manager's mistake was using $\sigma = 0.8$ instead of
$\sigma/\sqrt{n} = 0.8/\sqrt{100} = 0.08$ for the standard deviation of the sampling
distribution of $\bar{x}$.

(b) A confidence interval estimates a population value, which is
unknown, rather than a sample value, which is known. He should
replace the words "sample mean" with "population mean".

(c) The manager's conclusion is invalid because the interval estimate
is about a population mean, not a population proportion. Based on
the study, the manager can be 95% confident that the mean
rating they would receive if all customers were surveyed would be
between 7.1432 and 7.4568 out of 10.

(d) The manager is confusing the sampling distribution of the mean
rating for 100 customers with the distribution of the individual
ratings of 100 randomly selected customers. The central limit
theorem states that the former is approximately Normal but does not say
anything about the latter, which may well be a skewed
distribution. In that case, a histogram of a random sample of 100
from such a population will not look like a Normal curve and is
more likely to be skewed.

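Exercise 6.24, page 350

Solution
(a) With $n = 24$ and $\sigma = 4.5$,
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{4.5}{\sqrt{24}} = 0.92 \text{ (kg)}.$$

(b) For a 95% confidence interval for $\mu$, the required formula is
$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}.$$
So with $\bar{x} = 61.9$, $z^* = 1.96$ and $\sigma/\sqrt{n} = 0.92$, we have
$$61.9 \pm 1.96(0.92) = (60.1, 63.7).$$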
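
The interval in Exercise 6.24 can be reproduced as follows. This is a minimal sketch of the z interval for a mean with known $\sigma$; the function name `z_interval` is made up for illustration.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf_level=0.95):
    """Confidence interval xbar +/- z* sigma / sqrt(n), for known sigma."""
    se = sigma / math.sqrt(n)
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return xbar - z * se, xbar + z * se

# Values from Exercise 6.24: xbar = 61.9 kg, sigma = 4.5 kg, n = 24.
low, high = z_interval(61.9, 4.5, 24)
print(f"95% CI for the mean: ({low:.1f}, {high:.1f})")
```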

Scenario A (continued): Test of Significance for a Normal Population Mean

Recall that the procedure for testing a hypothesis involves four steps:
1. State the null and alternative hypotheses.
2. Choose a suitable test statistic and determine its value from the
   sample data.
3. Obtain the P-value.
4. Compare the P-value with the specified level of significance and
   state your conclusion in the context of the specific setting.
To test the null hypothesis $H_0: \mu = \mu_0$ against a specified alternative
hypothesis $H_a$, the appropriate test statistic is
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}.$$
This statistic is called the one-sample z statistic. When $H_0$ is true, its
sampling distribution is the Standard Normal curve. We use this information to
determine the P-value corresponding to the test statistic for a specified $H_a$.
Study the summary on page 361 for computing a P-value for different
forms of $H_a$.

Exercise 6.49, page 363.

Solution:
(a) To test $H_0: \mu = \mu_0$ against $H_a: \mu > \mu_0$ with a test statistic $z = 1.9$,
the P-value $= P(Z > 1.9) = 1 - 0.9713 = 0.0287$.
(b) To test $H_0: \mu = \mu_0$ against $H_a: \mu < \mu_0$ with a test statistic $z = 1.9$,
the P-value $= P(Z < 1.9) = 0.9713$.
(c) To test $H_0: \mu = \mu_0$ against $H_a: \mu \ne \mu_0$ with a test statistic $z = 1.9$,
the P-value $= 2P(Z > 1.9) = 2(1 - 0.9713) = 0.0574$.
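
The right-tail, left-tail and two-tail areas used in parts (a) to (c) can be computed directly from the Standard Normal distribution. In the sketch below, the function name `p_value` and the labels `"greater"`, `"less"` and `"two-sided"` are made up for illustration.

```python
from statistics import NormalDist

def p_value(z, alternative):
    """P-value of a one-sample z statistic.
    alternative: 'greater' (Ha: mu > mu0), 'less' (Ha: mu < mu0),
    or 'two-sided' (Ha: mu != mu0)."""
    cdf = NormalDist().cdf
    if alternative == "greater":
        return 1 - cdf(z)              # area to the right of z
    if alternative == "less":
        return cdf(z)                  # area to the left of z
    if alternative == "two-sided":
        return 2 * (1 - cdf(abs(z)))   # both tails beyond |z|
    raise ValueError(f"unknown alternative: {alternative!r}")

for alt in ("greater", "less", "two-sided"):
    print(f"z = 1.9, Ha {alt}: P-value = {p_value(1.9, alt):.4f}")
```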

Exercise 6.68, page 369.

Solution:
(a) To test $H_0: \mu = 115$ against $H_a: \mu > 115$, given that $\sigma = 30$, $\bar{x} = 133.2$
and $n = 25$, the test statistic is
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{133.2 - 115}{30/\sqrt{25}} = 3.03.$$
Thus P-value $= P(Z > 3.03) = 1 - 0.9988 = 0.0012$.

(b) The test of significance procedure used in (a) assumes that the
sample is taken from a Normal population and that the population standard
deviation $\sigma = 30$ is known; it also assumes that the
students used in the study represent a random sample of US college
students. All three assumptions are desirable, but if the sample is not
representative, the conclusion drawn from it may not hold.
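
The calculation in part (a) can be checked with a short script. This is a sketch only: the function name `one_sample_z_test` is made up for illustration, and the significance level 0.05 used in the last step is an assumed value, since the solution above does not state one.

```python
import math
from statistics import NormalDist

def one_sample_z_test(xbar, mu0, sigma, n):
    """Return (z, P-value) for H0: mu = mu0 against Ha: mu > mu0,
    assuming a Normal population with known sigma."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    return z, 1 - NormalDist().cdf(z)

# Values from Exercise 6.68: xbar = 133.2, mu0 = 115, sigma = 30, n = 25.
z, p = one_sample_z_test(133.2, 115, 30, 25)
print(f"z = {z:.2f}, P-value = {p:.4f}")

alpha = 0.05  # assumed significance level, not given in the exercise
print("reject H0" if p < alpha else "do not reject H0")
```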
