
Chapter 1 Introduction

The teaching of theory (3 hours)

Objective:
1. master common statistical terms, such as homogeneity and variation; variable, population
and sample; the types of data; parameter and statistic; sampling and sampling error;
probability, etc.
2. know well what biostatistics is, the main applications and uses of biostatistics as a science,
and how to learn the subject well.
3. understand the scope of biostatistics and the association among medical statistics, health statistics
and vital statistics.

Emphasis:
1. master common statistical terms and their notations.
2. know well what biostatistics is, and the main applications and uses of biostatistics as a science.
3. understand the scope of biostatistics and the association among medical statistics, health statistics
and vital statistics.

Difficulty:
Homogeneity and variation; probability

contents:
1. what is biostatistics, and how to learn the subject well?
In Webster’s International Dictionary: biostatistics is a science dealing with the collection,
analysis, and presentation of masses of numerical data.
In Dictionary of Epidemiology: biostatistics is a science and art of dealing with variation in data
through collection, classification, and analysis in such a way as to obtain reliable results.
2. the applications of biostatistics as a science, such as finding the limits of normality; finding whether the
difference between means or proportions is significant or not; finding the correlation between
variables; and so on.
3. the scope of biostatistics, the main contents of the textbook, and the association among medical
statistics, health statistics and vital statistics.
4. common statistical terms, including homogeneity and variation; variable, observation unit,
observation, data; the types of data (quantitative and qualitative data); population and sample;
parameter and statistic; sampling and sampling error; probability; and the
notations of the terms.

Chapter 2 The process of statistical work

The teaching of theory (5 hours)


Objective
1. master the process of statistical work (collection of data, sorting or classification of data,
analysis of data) and the steps of drawing a frequency distribution table.
2. know well the sources and presentation of data, the methods of sorting
and analyzing data, and the uses of the frequency distribution table.
3. understand the association between scientific design and statistical conclusion in research
work.

Emphasis:
1. the process of statistical work.
2. the main sources and presentation of data.
3. the steps of drawing a frequency distribution table.
4. the methods for analysis of data: descriptive statistics and inferential statistics.

Difficulty:
The methods of inferential statistics.

contents:
1. The process of statistical work: collection of data, sorting or classification of data,
analysis of data.
2. Sources and presentation of data: records (the routine and ready-made information in medical
work); experiments on individuals in the laboratory; surveys or investigations in the community or other
specified sites, etc.
3. Sorting or classification of data:
first correct the mistakes that occurred in the original records, then classify the data.
Drawing frequency distribution tables is often used in this process.
4. Analysis of data. Descriptive statistics: statistical indices, statistical tables and graphs;
inferential statistics: estimation of population parameters and tests of hypotheses: t-test, Z-test,
χ² (Chi-Square) test, analysis of variance (ANOVA), linear correlation and regression, etc.
5. how to draw the frequency distribution table for continuous quantitative data and discrete
quantitative data: locate the maximal and minimal values; work out the range; estimate the
number of groups and the class interval; list the limits of the groups; then count the frequency in each group.
6. the uses of the frequency distribution table: find whether the distribution of the data is symmetrical
or asymmetric; find the characteristics of the frequency distribution, that is,
central tendency and tendency of dispersion; find extreme values easily; choose suitable
indices or methods to analyze the data.

the teaching of practice (2 hours)

Emphasis:
1. master the process of statistical work.
2. master the steps of drawing the frequency distribution table.
3. know well the uses of the frequency table.

Contents:
1. what is the process of statistical work?
2. review the steps of drawing a frequency distribution table for continuous quantitative data.
3. exercises on drawing a frequency table for given data.

The height values (cm) of 110 seven-year-old boys of a certain city in 1992 are listed below. Draw their frequency
distribution table, describe its characteristics and state its type of distribution.

112.4 117.2 122.7 123.0 113.0 110.8 118.2 108.2 118.9 118.1 123.5 118.3 120.3 116.2 114.7 119.7 114.8 119.6 113.2 120.0 119.7
116.8 119.8 122.5 119.7 120.7 114.3 122.0 117.0 122.5 119.8 122.9 128.0 121.5 126.1 117.7 124.1 129.3 121.8 112.7 120.2 120.8
126.6 120.0 130.5 120.0 121.5 114.3 124.1 117.2 124.4 116.4 119.0 117.1 114.9 129.1 118.4 113.2 116.0 120.4 112.3 114.9 124.4
112.2 125.2 116.3 125.8 121.0 115.4 121.2 117.9 120.1 118.4 122.8 120.1 112.4 118.5 113.0 120.8 114.8 123.8 119.1 122.8 120.7
117.4 126.2 122.1 125.2 118.0 120.7 116.3 125.1 120.5 114.3 123.1 122.4 110.3 119.3 125.0 111.5 116.8 125.6 123.2 119.5 120.5
127.1 120.6 132.5 116.3 130.8
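
The steps for drawing a frequency distribution table can be sketched in Python on the 110 height values above. This is only an illustrative sketch: the class interval of 2 cm, the starting limit derived from it, and the function name `frequency_table` are assumptions chosen for the example, not values prescribed by this outline.

```python
import math

# Heights (cm) of the 110 seven-year-old boys listed above
# (the value printed as "ll9.8" is read as 119.8).
heights = [
    112.4, 117.2, 122.7, 123.0, 113.0, 110.8, 118.2, 108.2, 118.9, 118.1, 123.5,
    118.3, 120.3, 116.2, 114.7, 119.7, 114.8, 119.6, 113.2, 120.0, 119.7,
    116.8, 119.8, 122.5, 119.7, 120.7, 114.3, 122.0, 117.0, 122.5, 119.8, 122.9,
    128.0, 121.5, 126.1, 117.7, 124.1, 129.3, 121.8, 112.7, 120.2, 120.8,
    126.6, 120.0, 130.5, 120.0, 121.5, 114.3, 124.1, 117.2, 124.4, 116.4, 119.0,
    117.1, 114.9, 129.1, 118.4, 113.2, 116.0, 120.4, 112.3, 114.9, 124.4,
    112.2, 125.2, 116.3, 125.8, 121.0, 115.4, 121.2, 117.9, 120.1, 118.4, 122.8,
    120.1, 112.4, 118.5, 113.0, 120.8, 114.8, 123.8, 119.1, 122.8, 120.7,
    117.4, 126.2, 122.1, 125.2, 118.0, 120.7, 116.3, 125.1, 120.5, 114.3, 123.1,
    122.4, 110.3, 119.3, 125.0, 111.5, 116.8, 125.6, 123.2, 119.5, 120.5,
    127.1, 120.6, 132.5, 116.3, 130.8,
]


def frequency_table(values, lower, width):
    """Count values into classes [lower + k*width, lower + (k+1)*width)."""
    n_groups = int((max(values) - lower) // width) + 1
    counts = [0] * n_groups
    for v in values:
        counts[int((v - lower) // width)] += 1
    return [(lower + k * width, counts[k]) for k in range(n_groups)]


# Step 1: locate the maximal and minimal values.
lo, hi = min(heights), max(heights)
# Step 2: work out the range.
value_range = hi - lo
# Step 3: choose the class interval (assumed here: 2 cm) and the first lower limit.
width = 2
lower = math.floor(lo / width) * width
# Step 4: list the limits of the groups and count the frequencies.
table = frequency_table(heights, lower, width)

print(f"min={lo}, max={hi}, range={value_range:.1f}")
for start, f in table:
    print(f"{start:>4}~  {f:>3}  {'*' * f}")
```

The row of asterisks printed beside each group gives a rough picture of the shape of the distribution; a different class interval would give a different number of groups.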

Chapter 3 The describing indices for quantitative data

The teaching of theory (6 hours)

Objective
1. master the names and conditions of applying the indices for describing the central
tendency of quantitative data.
2. master the names and conditions of applying the indices for describing the tendency of
dispersion of quantitative data.
3. know well the calculation of the describing indices for quantitative data.
4. understand the meaning and calculation of the geometric mean.

Emphasis:
1. master the names, meaning and conditions of applying the indices for describing the central
tendency of quantitative data.
2. master the names, meaning and conditions of applying the indices for describing the tendency
of dispersion of quantitative data.
3. how to choose suitable indices for given data.

Difficulty:
1. the meaning and calculation of percentile, median, quartile range, variance and standard
deviation.
2. the conditions of applying CV.

contents:
The indices for describing quantitative data include two parts: central tendency and tendency of
dispersion.
1. the indices for describing the central tendency of quantitative data, such as mean, median and
mode, etc.; the conditions for using these different indices.
The mean is applied when the data follow a symmetrical distribution, especially a normal distribution;
the median is applied to data with an asymmetric distribution (or not so evenly distributed), or when one or
more values at the ends are wide apart, etc.
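2. calculation of the mean for small and large samples:

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum x}{n} \quad \text{(for small sample)},$$

$$\bar{x} = \frac{\sum f x_g}{\sum f} = \frac{\sum f x_g}{n} \quad \text{(for large sample)},$$

where $x_g$ is the mid-value of every group in the frequency distribution table and $f$ is the frequency of that group.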

Calculation of the median for small and large samples:

For a small sample, when n is odd, the median is the middle value after all the observations
are arranged in ascending (or descending) order; when n is even, the median is the arithmetic mean
of the middle two values after the observations are arranged in order.
For a large sample, use the percentile equation below (the median is the 50th percentile, x = 50):

$$P_x = L + \frac{i}{f_x}\,(n \cdot x\% - f_L)$$

In the equation:
P_x: the x-th percentile;
L: lower limit of the group in which the percentile (median) lies;
i: class interval of the group in which the percentile (median) lies;
f_x: frequency of the group in which the percentile (median) lies;
f_L: cumulative frequency before the group in which the percentile (median) lies;
n: the number of observations in the sample.
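
The grouped-data formulas for the mean and the percentile can be illustrated with a short Python sketch. The frequency table used here (lower limits, frequencies, class interval of 10) is hypothetical and only serves to show the arithmetic; the function names are likewise assumptions.

```python
def grouped_mean(lowers, freqs, width):
    """Mean from a frequency table: sum(f * x_g) / n, x_g being the group mid-value."""
    n = sum(freqs)
    return sum(f * (lo + width / 2) for lo, f in zip(lowers, freqs)) / n


def grouped_percentile(lowers, freqs, width, pct):
    """P_x = L + i / f_x * (n * x% - f_L) for equal-width groups."""
    n = sum(freqs)
    target = n * pct / 100          # n * x%
    cum_before = 0                  # f_L: cumulative frequency before the group
    for lo, f in zip(lowers, freqs):
        if f > 0 and cum_before + f >= target:
            return lo + width * (target - cum_before) / f
        cum_before += f
    raise ValueError("pct must lie between 0 and 100")


# Hypothetical frequency table: groups 0~, 10~, 20~, 30~ with class interval 10.
lowers = [0, 10, 20, 30]
freqs = [2, 5, 2, 1]

mean = grouped_mean(lowers, freqs, 10)
median = grouped_percentile(lowers, freqs, 10, 50)
p25 = grouped_percentile(lowers, freqs, 10, 25)
p75 = grouped_percentile(lowers, freqs, 10, 75)
print(mean, median, p25, p75)
```

For this table the median falls in the group 10~, where L = 10, f_L = 2 and f_x = 5, so P50 = 10 + 10/5 × (5 − 2) = 16.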

3. the indices for describing the tendency of dispersion of quantitative data, such as range,
quartile range (Q), variance, standard deviation (SD), and coefficient of variation (CV), etc.;
the conditions for using these different indices.
Q is applied mainly to asymmetric distributions; variance and SD are applied to symmetric
distributions, especially normally distributed data; CV is applied to: ① compare the variation of
two groups of data which have different measurement units; ② compare the variation of two
groups of data whose means differ greatly.

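4. calculation of the quartile range, SD and CV for quantitative data:

$$Q = Q_U - Q_L = P_{75} - P_{25}$$

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}} = \sqrt{\frac{\sum x^2 - (\sum x)^2 / n}{n-1}} \quad \text{(for small sample)},$$

$$s = \sqrt{\frac{\sum f (x - \bar{x})^2}{n-1}} = \sqrt{\frac{\sum f x^2 - (\sum f x)^2 / n}{n-1}} \quad \text{(for large sample)},$$

where, for a large sample, x is the mid-value of each group and f is its frequency.

$$CV = \frac{s}{\bar{x}} \times 100\%$$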
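
As a check on the two equivalent forms of the SD formula for a small sample, and on the CV, here is a minimal Python sketch. The eight data values are hypothetical, and the function names are assumptions made for the illustration.

```python
import math


def sd_definition(xs):
    """s = sqrt( sum((x - mean)^2) / (n - 1) )"""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))


def sd_shortcut(xs):
    """s = sqrt( (sum(x^2) - (sum(x))^2 / n) / (n - 1) )"""
    n = len(xs)
    return math.sqrt((sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1))


def cv_percent(xs):
    """CV = s / mean * 100%"""
    return sd_definition(xs) / (sum(xs) / len(xs)) * 100


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical small sample, mean = 5
s1 = sd_definition(data)
s2 = sd_shortcut(data)
cv = cv_percent(data)
print(round(s1, 3), round(s2, 3), round(cv, 2))
```

Both functions give the same SD (about 2.138 here, since the sum of squared deviations is 32 and n − 1 = 7); the shortcut form is the one that is convenient on a hand calculator.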

the teaching of practice (4 hours)

Emphasis:
1. master the names and conditions of applying the indices for describing the central tendency of
quantitative data.
2. master the names and conditions of applying the indices for describing the tendency of dispersion
of quantitative data.
3. know well how to select suitable indices for given data and then calculate them.

Contents:
1. introduction to the calculator (fx-82TL): common calculation and statistical calculation; learn to
use the calculator to work out the indices for describing the characteristics of quantitative data.
2. calculate the mean and SD for data with a symmetrical distribution, and calculate the median and
quartile range for data with an asymmetrical distribution.
3. calculate the coefficient of variation (CV) for given data.

Chapter 4 Normal distribution and normal curve

The teaching of theory (4 hours)

Objective
1. master the concept of normal distribution and the characteristics of normal distribution.
2. know well the association between the interval of individual values and the area under the normal
curve.
3. understand the standardized normal distribution and the law of the area distribution under the
standard normal curve.
4. master the applications of normal distribution and learn to choose suitable methods to work
out the normal limits for a variable of medical data.
5. understand the principle and methods of Quality Control in medical study.

Emphasis:
1. the concept of normal distribution and the characteristics of normal distribution.
2. the law of the area distribution under the standard normal curve, and the association between
probability and the standard normal deviate (Z).
3. choosing suitable methods to work out the normal limits for given data.

Difficulty:
Transition from a normal distribution to the standard normal distribution; the relation between
probability and the standard normal deviate (Z).

contents:
1. the concept of normal distribution: the maximum number of frequencies lies in the middle,
with fewer at the extremes, decreasing smoothly towards both sides; a distribution of this nature or shape
is called a normal distribution (or Gaussian distribution).

2. the characteristics of the normal distribution or normal curve

(1) Centrality: the distribution is centred at “µ”; the curve is highest above “µ” on the
abscissa.
(2) Symmetry: the curve is symmetrical about the vertical line “x=µ”.

(3) The normal distribution has two parameters: µ is the location parameter, σ is the shape parameter.

(4) The normal curve has two inflexions, which lie at the two points where x=µ±σ.
(5) The total area under the normal curve is 1 or 100%, and the area is distributed according to a
fixed law.
3. the law of area distribution under the normal curve
(1) the area in the range of µ±1.96σ occupies 95% of the total area under the normal curve.

(2) the observations lying in the range of µ±1.96σ make up 95% of all the observations.
(3) if an observation (individual) is drawn from the population at random, the probability that it lies in the
range of µ±1.96σ is 95%.
4. the standard normal distribution follows the same law
The area under the standard normal curve is also distributed according to a fixed law. For ease of
application, statisticians have worked out a table showing the relation between the area and the “z” value of the standard
normal distribution (Appendix I, after P325).
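
The relation between the “z” value and the area under the standard normal curve, which the appendix table tabulates, can also be computed directly. The sketch below uses only the Python standard library (`math.erf`); the function names `phi` and `area_within` are chosen for this illustration.

```python
import math


def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def area_within(z):
    """Area under the standard normal curve between -z and +z."""
    return phi(z) - phi(-z)


for z in (1.0, 1.96, 2.58):
    print(f"area within ±{z}: {area_within(z) * 100:.2f}%")
```

The three values are approximately 68.27%, 95.00% and 99.01%, which is the law of area distribution used for µ±1σ, µ±1.96σ and µ±2.58σ.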
5. the main applications of normal distribution
(1) Finding normality limits:
① select a large number of “normal” persons at random to make a representative
sample. ② decide on one-tailed or two-tailed normality limits according to professional
knowledge. ③ decide on a suitable proportion: 80%, 90%, 95% or 99%. ④ select a suitable
method to work out the normal limits.
(2) Quality control: μ±3σ are the control lines, μ±2σ are the warning lines, μ is the central line.
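
A minimal sketch of both applications follows, assuming the variable is approximately normally distributed. The mean of 120 and SD of 5 are hypothetical values, and the function names `normal_limits` and `control_lines` are assumptions of this example; `statistics.NormalDist` is part of the Python standard library (3.8 and later).

```python
from statistics import NormalDist


def normal_limits(mean, sd, coverage=0.95, tails="two"):
    """Normality limits under a normal distribution.

    tails="two" gives (lower, upper); "upper" or "lower" gives a one-tailed limit.
    """
    if tails == "two":
        z = NormalDist().inv_cdf(0.5 + coverage / 2)   # 1.96 for 95%
        return mean - z * sd, mean + z * sd
    z = NormalDist().inv_cdf(coverage)                 # 1.645 for 95%
    if tails == "upper":
        return None, mean + z * sd
    if tails == "lower":
        return mean - z * sd, None
    raise ValueError("tails must be 'two', 'upper' or 'lower'")


def control_lines(mu, sigma):
    """Central line, warning lines (mu ± 2 sigma) and control lines (mu ± 3 sigma)."""
    return {
        "central": mu,
        "warning": (mu - 2 * sigma, mu + 2 * sigma),
        "control": (mu - 3 * sigma, mu + 3 * sigma),
    }


# Hypothetical sample summary: mean 120, SD 5.
two_sided = normal_limits(120, 5)
upper_only = normal_limits(120, 5, tails="upper")
lines = control_lines(120, 5)
print(two_sided, upper_only, lines)
```

With these hypothetical values the two-tailed 95% limits are about 110.2 to 129.8, while the one-tailed upper limit is about 128.2; the choice between them depends on professional knowledge, as stated in step ②.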

the teaching of practice (3 hours)

Emphasis:
1. master the association between probability and the standard normal deviate (Z).
2. master selecting suitable methods to work out the normal limits.

Contents:
1. how frequent are height values higher than 124 cm among the 110 boys aged 7 years?
2. what proportion of the height values lies between 116 and 122 cm among the boys?
3. within which range do 90% of the boys lie?
4. check whether the actual frequency is consistent with the theoretical frequency by
counting the numbers in the ranges of $\bar{x} \pm 1s$, $\bar{x} \pm 1.96s$ and $\bar{x} \pm 2.58s$.
5. work out the 95% normal limits of the height of the 110 boys aged 7 years.
6. the height of a 7-year-old boy is 110.2 cm; is the boy normal or abnormal if judged
by the 95% normal limits?

Chapter 5 The describing indices for qualitative data

The teaching of theory (5 hours)

Objective
1. master the concept and categories of relative number, the indices applied to describe
qualitative data.
2. know well the calculation of the indices, such as rate, proportion and ratio, etc.
3. master the items we should pay attention to when applying relative numbers.
4. master the difference between mortality rate or death rate (CDR) and case fatality rate (CFR),
and between incidence rate (IR) and prevalence rate (PR).
5. know well the indices in demography pertaining to vital events.
6. understand the analysis of dynamic time series data.

Emphasis:
1. master the indices applied to describe qualitative data.
2. master the items we should pay attention to when applying relative numbers.
3. differentiate mortality rate or death rate (CDR) from case fatality rate (CFR), and incidence rate (IR) from
prevalence rate (PR).

Difficulty:
The difference between rate and proportion when applying them.

contents:
1. review what qualitative data are, and introduce the relative number as the describing index for this
type of data.
2. categories of relative number: proportion, rate, ratio.
The items we should pay attention to when applying relative numbers:
(1) the denominator should not be too small when calculating a rate.
(2) do not confuse rate and proportion.
(3) calculate the total rate correctly.
(4) pay attention to whether two rates (or proportions) are comparable or not when comparing
them.
(5) when comparing two rates (or proportions), a statistical hypothesis should be tested.
5. common indices in vital statistics, including the indices in demography pertaining to vital
events, death events, and disease events, such as population size, proportion of population,
dependency ratio; mortality rate or death rate (CDR), case fatality rate (CFR); incidence rate (IR),
prevalence rate (PR).
6. what are dynamic time series data? And the indices for analyzing this type of data.
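
The difference between a proportion and a rate can be seen by computing both from the same counts. The sketch below uses the figures of the “total” row in the cancer table of this chapter's practice section (population 167090, deaths 715, deaths caused by cancer 90); the variable names are assumptions of this example.

```python
population = 167090      # total population
deaths = 715             # deaths from all causes
cancer_deaths = 90       # deaths caused by cancer

# Proportion: which part of all deaths is due to cancer (denominator = all deaths).
cancer_proportion_pct = cancer_deaths / deaths * 100

# Rate: how frequently cancer death occurs in the population (denominator = population).
cancer_death_rate_per_100k = cancer_deaths / population * 100_000

# Death rate from all causes, per 1000 population.
death_rate_per_1000 = deaths / population * 1000

print(f"proportion of cancer in total deaths: {cancer_proportion_pct:.2f}%")
print(f"death rate of cancer: {cancer_death_rate_per_100k:.2f} per 100,000")
print(f"death rate (all causes): {death_rate_per_1000:.2f} per 1000")
```

The proportion (12.59%, the value printed in the table) describes the composition of deaths, whereas the rate describes the intensity of the event in the population; the two answer different questions and must not be interchanged.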

the teaching of practice (3 hours)

Emphasis:
1. master the concept and categories of relative number, and the meaning of rate, proportion and
ratio.
2. master the items we should pay attention to when applying relative numbers; especially, do not
confuse rate and proportion when applying them in medical study.
3. know well how to calculate the indices.

Contents:
1. categories of relative number, and the meaning of rate, proportion and ratio.
2. the calculation of rate and proportion, and how to use them correctly.
3. the items we should pay attention to when applying relative numbers.
4. do the exercises listed below:
1. Population, deaths and deaths caused by cancer, by age group:

Age      Population  Deaths  Deaths caused  Proportion of cancer  Death rate of cancer  Age-specific
(years)                      by cancer      in total deaths (%)   (1/100,000)           death rate (‰)
0~       82920       ___     4              2.90                  ___                   ___
20~      ___         63      ___            19.05                 25.73                 ___
40~      28161       172     42             ___                   ___                   ___
60~      ___         ___     32             ___                   ___                   ___
total    167090      715     90             12.59                 ___                   ___

(1) Fill in the blanks in the table and describe the data in brief.
(2) Describe the data using the indices you have learned.
2. Through the survey of health service, we got the data: proportion of population in some area.

Age group  Male(%)  Female(%)    Age group  Male(%)  Female(%)
0~         4.2      4.0          45~        2.4      2.7
5~         3.2      3.1          50~        2.1      2.4
10~        4.4      4.2          55~        1.2      2.2
15~        5.5      5.3          60~        1.3      2.4
20~        5.1      5.2          65~        1.1      1.4
25~        6.0      6.1          70~        0.8      1.2
30~        4.3      4.5          75~        0.5      0.9
35~        3.2      3.3          80~        0.2      0.5
40~        2.3      2.5          85~        0.1      0.2

(1) calculate the proportion of elders. (2) calculate the dependency ratio.
(3) calculate the proportion of women aged 15~49 years old.

Chapter 6 statistical table and graph

The teaching of theory (3 hours)

Objective
1. know well the basic concepts of statistical table and statistical graph.
2. master the categories of statistical table and statistical graph, and what kind of data each is used for.
3. master the principles of drawing statistical tables and statistical graphs.
4. know well how to choose a suitable statistical graph to describe the data in research work.

Emphasis:
1. master the categories of statistical table and statistical graph, and what kind of data each is used for.
2. master the principles of drawing statistical tables and statistical graphs.

Difficulty:
How to choose a suitable statistical graph to describe the data.

contents:
1. Statistical tables and statistical graphs are important ways to describe or express data; they make the data
legible and clear at a glance.
A statistical table is a format which uses the form of a table to describe the data.
A statistical graph is a format which uses geometrical forms such as points, lines and areas to
describe the data.
2. the categories of statistical table include simple table and combined table; statistical graphs include bar
graph, histogram, proportion graph, line chart, scatter diagram, map diagram, etc.
3. what kind of data is each statistical graph applied to? We should select different graphs for different study
objectives and different types of data.
(1) bar graph: applied to discrete data; the height of the equal-width bars indicates the magnitude.
(2) histogram: applied to continuous data; the area of the rectangles indicates the frequency of each group.
(3) proportional graph (circle or percent bar): uses the length/area of a bar to indicate the proportion of each
part of one event, or uses the area of a sector to express the proportion of the different parts of the same event.
(4) line chart: generally applied to continuous data; it shows the rising, falling or fluctuating trend of an event
over a period of time, such as birth rate, death rate, cancer deaths, etc.
(5) scatter diagram: uses dots to show the nature of the correlation between two variables X and Y in
the same person(s) or group(s).
(6) map diagram: see pages 33-34.


4. the principles of drawing a statistical table.
(1) title: at the top middle. It expresses the main contents of the table and generally includes the time, the area and the
event.
(2) lines: not too many; generally 3 lines, namely the top line, the secondary line and the bottom line. Add
another line above the bottom line when there is a “total” row.
(3) attributes: on the left in a simple table, and on the left and top-middle in a combined table; indices should be written
under the second attributes.
(4) figures in the table: write in Arabic numerals; keep the same number of decimals for one index; do not leave
blanks in the table: fill in “0” if the value is zero, use “…” if the value is missing, and use “—” if the value does not exist.
(5) notes: if some figures need explaining, label them with “*” and explain the meaning at the bottom of the
table.
5. the principles of drawing a statistical graph.
(1) the title lies at the bottom middle; if there are several graphs in the same paper, number them Fig1, Fig2, Fig3, etc.
(2) generally, the ratio of height to width is 5:7 in bar graphs, histograms and scatter diagrams. The ordinate begins
from “0”; when necessary use “//” to mark a break in the axis.
(3) write the units of the attributes on the X-axis and Y-axis.
(4) if there are 2 or more attributes, use different lines or different colors to distinguish them, and
add a legend to explain them.

the teaching of practice (2 hours)

Emphasis:
1. master the categories of statistical table and statistical graph, and what kind of data each is used for.
2. master the principles of drawing statistical tables and statistical graphs.

Contents:
1. the categories of statistical table include simple table and combined table; statistical graphs include bar
graph, histogram, proportion graph, line chart, scatter diagram, map diagram, etc.
2. what kind of data is each statistical graph applied to? We should select different graphs for different study
objectives and different types of data.
3. the principles of drawing statistical tables and statistical graphs.
4. drawing a statistical table and statistical graph for given data.

Choose a suitable table or graph to describe the following data.

1. In the second national health service survey, we find that:
63.84% of urban women delivered their babies in hospital, 20.76% in maternal and child health service stations, 7.67% in
township hospitals, and 7.77% in other places. For rural women, 20.38% gave birth in hospital,
4.66% in maternal and child health service stations, 16.38% in township hospitals, and 58.58% in other places.
2. the mortality of three causes of death in some area in 1952 and 1992 (1/100,000)

Causes of death   1952    1992
tuberculosis      165.2   27.4
heart diseases    72.5    83.6
tumor             57.2    178.2

Chapter 7 standard error and the estimation of parameters

The teaching of theory (6 hours)

Objective:
1. master the meaning and calculation of the standard error of means and proportions.
2. master the difference between the standard deviation and the standard error of the mean.
3. master the meaning of the limits of desired confidence, especially the
95% limits of desired confidence.
4. know well the applications and uses of the SE of means and proportions.
5. know well the process of calculating the confidence interval of a population mean and proportion.

Emphasis:
1. master and comprehend the meaning of the standard error of means and proportions.
2. master and comprehend the meaning of the limits of desired confidence.

Difficulty:
How to use different equations to estimate the confidence interval of means and proportions.

contents:
1. The standard errors of the mean and of the proportion are important measures of chance variation.
Whatever the sampling procedure or the care taken while selecting the sample, the sample estimates
(statistics) will differ from the population parameters because of chance error or biological variability.
2. They are measurements of chance variation and sampling error, which reflect the difference between the sample
mean or proportion and the population mean or proportion. So do not regard the standard error as an error or mistake.
3. Calculation of the standard error of the mean.
⑴ To calculate the SE, find the mean (μ) of the sample means and then the differences of the individual sample means
from this grand mean. Use the following formula, in which n is the number of sample means:

$$S_{\bar{X}} = \sqrt{\frac{\sum (\bar{X} - \mu)^2}{n - 1}}$$

⑵ Usually only one large sample is drawn and its standard deviation S is calculated. Then the SE of the mean is
calculated by the following formula, in which n is the sample size:

$$S_{\bar{X}} = \frac{S}{\sqrt{n}}$$

⑶ The above formula shows that SE and SD are closely related.
4. Applications and uses of the SE of the mean
⑴ to work out the limits of desired confidence within which the population mean would lie.
⑵ to determine whether the sample is drawn from a known population or not when the population mean is
known.
⑶ to calculate the desired confidence limits, that is to say, to estimate the population parameters.
5. Estimation of the limits of desired confidence of the population mean.
⑴ The t distribution method: used when the population SD is unknown and the sample
size is small. The 95% confidence limits are given below, where ν = n − 1 is the degrees of freedom:

$$(\bar{x} - t_{0.05/2,\nu}\, s_{\bar{x}},\ \ \bar{x} + t_{0.05/2,\nu}\, s_{\bar{x}})$$

⑵ The normal distribution method, which covers two situations. The first is when the population SD is known and
the sample is large enough; according to the standard normal distribution, with $\sigma_{\bar{x}} = \sigma/\sqrt{n}$:

$$\bar{x} \pm z_{\alpha/2}\, \sigma_{\bar{x}}$$

⑶ The second is when the population SD is unknown but the sample is large enough (n>50); according to the standard normal
distribution, the 95% confidence limits are:

$$\bar{x} \pm 1.96\, s_{\bar{x}}$$
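
The two methods can be compared with a small Python sketch. The sample summaries (mean 4.8, SD 0.9, n = 100 or n = 10) are hypothetical, and the critical value t = 2.262 for ν = 9 is taken from a t table rather than computed; the function names are assumptions of this example.

```python
import math


def standard_error(sd, n):
    """SE of the mean: s / sqrt(n)."""
    return sd / math.sqrt(n)


def confidence_limits(mean, sd, n, critical):
    """mean ± critical * SE, where critical is a z or t value."""
    se = standard_error(sd, n)
    return mean - critical * se, mean + critical * se


# Large sample (n > 50): normal distribution method, z = 1.96 for 95%.
large = confidence_limits(mean=4.8, sd=0.9, n=100, critical=1.96)

# Small sample: t distribution method, t(0.05/2, 9) = 2.262 from a t table.
small = confidence_limits(mean=4.8, sd=0.9, n=10, critical=2.262)

print("n=100:", tuple(round(v, 3) for v in large))
print("n=10: ", tuple(round(v, 3) for v in small))
```

With the same mean and SD, the small sample gives much wider limits, both because the SE is larger and because the t value exceeds 1.96.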
6. The standard error of a proportion is calculated by the formula below, where p is the sample proportion and q = 1 − p:

$$SE_P = \sqrt{\frac{p \times q}{n}}$$

7. Applications and uses of SEP
⑴ to find the confidence limits of the population proportion when the sample proportion is known.
⑵ to determine whether a sample is drawn from the known population or not when the population proportion is
known.
⑶ to find the standard error of the difference between two proportions in order to judge its statistical significance.
8. The standard error of the difference between two proportions, denoted SE(p1−p2), is calculated by the
following formula:

$$SE(p_1 - p_2) = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$
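
A short sketch of the two formulas follows. The figures (20% of 100 and 30% of 100) are the ones used in the typhoid exercise of this chapter's practice section; the function names are assumptions of this example.

```python
import math


def se_proportion(p, n):
    """SE of a proportion: sqrt(p * q / n), with q = 1 - p."""
    return math.sqrt(p * (1 - p) / n)


def se_difference(p1, n1, p2, n2):
    """SE of the difference between two proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


se1 = se_proportion(0.20, 100)
se_diff = se_difference(0.20, 100, 0.30, 100)
print(f"SE of 20% with n=100: {se1 * 100:.2f}%")
print(f"SE of the difference: {se_diff * 100:.2f}%")
```

The observed difference (10 percentage points here) is then compared with its standard error to judge whether it is statistically significant.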

the teaching of practice (3 hours)

Emphasis:
1. master the difference between the standard deviation and the standard error of the mean.
2. master and comprehend the process of calculating the limits of desired confidence.
3. master the meaning of the limits of desired confidence.

Contents:
1. Think and answer: try to summarize the difference between the 95% normal limits and the 95% confidence
limits (hints: meaning, formula, and application).
2. Calculate and analyze the data:
The total cholesterol (mmol/L) of 50 male adults aged 40-50 is as follows:
4.47 3.37 6.14 3.95 3.56 4.23 4.31 4.71 5.69 4.12 4.56 4.37 5.39 6.30 5.21 7.22 5.54 3.93 5.21 6.51
5.18 5.77 4.79 5.12 5.20 5.10 4.70 4.74 3.50 4.69 4.38 4.89 6.25 5.32 4.63 3.61 4.44 4.43 4.25 4.03
4.50 4.25 4.03 5.85 4.09 3.35 4.08 4.79 5.30 4.97
(1) Calculate the SE.
(2) Estimate the 95% and 99% confidence limits of the population mean, compare the two intervals
and explain the difference.
3. If typhoid mortality in a sample of 100 is 20% and in another sample of 100 it is 30%, find the
standard error of the difference between the two proportions.

Chapter 8 Design of experiment and sampling techniques in a survey.

The teaching of theory (4 hours)

Objective
1. know well the process of experimental study.
2. master the essential factors and basic principles of design of experiment.
3. master the methods of design of experiment, such as paired design, completely random design, randomized
block design, etc.
4. master the sampling techniques in a survey.
5. understand the methods of multistage sampling and multiphase sampling.

Emphasis:
1. master the essential factors and basic principles of design of experiment.
2. master the methods of design of experiment, such as paired design, completely random design, randomized
block design, etc.
3. understand experimental error and how to reduce or eliminate it.
4. master the sampling techniques in a survey.

Difficulty:
1. the three essential factors and four basic principles of design of experiment.
2. how to control experimental error.

Contents:
1. The process of design of experiment.
(1) Definition of the problem: define the problem you intend to study.
(2) Aims and objective: define the aims and objective of the study.
(3) Review of literature: critically review the literature on the problem under study.
(4) Hypothesis: state your hypothesis or assumption about the problem.
(5) Plan of action: prepare an overall plan or design of your study. Steps of the plan:
definition of the population under study; selection of the sample; specifying the nature of the study; ruling out
observer and instrument error; recording of data; work schedule.
2. The three important elements of design of experiment are study subjects, treatment (study factor) and
experimental effect.
(1) Study subjects are the units to which the treatment is applied.
(2) Treatment is the specific experimental condition which is applied to the study subjects.
(3) Experimental effect is a characteristic measured after the treatment has been applied to the study subjects.
3. The four principles of design of experiment are control, randomization, replication and equilibrium.
4. The common methods of design of experiment.
(1) Paired design: pair the study subjects according to the main factors that will not be probed in the study,
then randomly allocate the two subjects of every pair to the control group and the trial group.
(2) Completely random design: randomly allocate homogeneous study subjects to several trial groups.
(3) Randomized block design: divide the study subjects into different blocks according to the main factors
that will not be probed in the study, then randomly allocate the study subjects of every block to the trial groups.
5. The sampling techniques in a survey.
(1) Simple random sampling: a sampling procedure that assures that every object in the population has an equal
chance of being selected. The method is applicable when the population is small, homogeneous and readily
available.
(2) Systematic sampling: from the sampling frame, a starting point is chosen at random, and thereafter units are selected at
regular intervals. Suppose that the N units in the population are numbered 1 to N in some order. To select a
systematic sample of n units, let k≈N/n; then every k-th unit is selected, commencing with a randomly chosen
number between 1 and k.
(3) Stratified sampling: the whole population is divided into several subgroups or strata and then units are
selected randomly from each stratum.
(4) Cluster sampling: the entire population is divided into groups, or clusters, and several clusters are selected
at random; then all observations contained in the selected clusters are the study objects.
(5) Multistage sampling: this method refers to sampling procedures carried out in several stages using
random sampling techniques. It is employed in large country surveys. In the first stage, districts
are chosen at random in all the states, followed by randomly chosen villages and units, respectively.
(6) Multiphase sampling: part of the information is collected from the whole sample and part from the
subsample.
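
The first four sampling techniques can be sketched with Python's standard `random` module. The sampling frame of 100 numbered units, the strata and the clusters below are hypothetical, and the function names are assumptions made for this illustration.

```python
import random


def simple_random_sample(units, n, rng):
    """Every unit has an equal chance of being selected."""
    return rng.sample(units, n)


def systematic_sample(units, n, rng):
    """Random start among the first k units, then every k-th unit (k = N // n)."""
    k = len(units) // n
    start = rng.randrange(k)
    return units[start::k][:n]


def stratified_sample(strata, n_per_stratum, rng):
    """Select units at random within each stratum."""
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Select whole clusters at random and keep all of their units."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]


rng = random.Random(42)                    # fixed seed so the sketch is repeatable
frame = list(range(1, 101))                # hypothetical frame: units numbered 1..100

srs = simple_random_sample(frame, 10, rng)
systematic = systematic_sample(frame, 10, rng)
stratified = stratified_sample({"urban": frame[:60], "rural": frame[60:]}, 5, rng)
cluster = cluster_sample({"A": [1, 2], "B": [3, 4], "C": [5, 6]}, 2, rng)
print(srs, systematic, stratified, cluster, sep="\n")
```

In practice the number of units taken from each stratum is often made proportional to the stratum size; equal numbers are used here only to keep the sketch short.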
6. Experimental error.
(1) Systematic errors are reproducible inaccuracies that are consistently in the same direction, for example those due to
an instrument, observer or method that is biased throughout the experiment.
(2) Random errors are statistical fluctuations in the measured data due to some incidental or uncontrolled
factors.

The teaching of practice (2 hours)

Emphasis:
1. know well the procedure of experimental study.
2. master the essential factors and basic principles of design of experiment.
3. master the methods of design of experiment, such as paired design, completely random design, randomized
block design, etc.
4. master the sampling techniques in a survey.

Contents:
1. what are the essential factors in an experimental study?
2. how to use design methods such as paired design, completely random design and randomized block design in
practice.

Exercise:
1. Does salted drinking water affect the blood pressure of mice? Please point out the study subjects, treatment
(study factor) and experimental effect in the experiment.
2. Following the above, if 20 mice and water containing 1% NaCl are provided, how would you design this experimental study?
Chapter 9 significance of difference in means(testing statistical hypothesis)

The teaching of theory (6 hours)

Objective
1. know well the objective and principle of testing a statistical hypothesis.
2. master the methods of testing a statistical hypothesis for data from different designs.
3. master the basic process of testing a statistical hypothesis.
4. master type Ⅰ and type Ⅱ error and the meaning of the power of a test.
5. master the criteria for applying different methods of statistical test.
6. understand the association between CI and statistical test.
7. understand the normality test and the test of equality of variances.

Emphasis:
1. master the basic process of testing a statistical hypothesis.
2. master the methods of testing a statistical hypothesis for data from different designs, such as t-test, Z-test, etc.
3. master the criteria for applying different methods of statistical test.
4. master type Ⅰ and type Ⅱ error and the meaning of the power of a test.

Difficulty:
1. the principle of testing a statistical hypothesis.
2. the meaning of the null hypothesis (H0).

contents:
1. What is testing of statistical hypotheses? It is the process or method of inferring, from the sample data, whether
population parameters are the same or not. Use an example to show the objective of testing a hypothesis.
2. The principle of testing a hypothesis: first suppose that the population parameters are the same (null hypothesis), then
use the sample data to calculate the test statistic, and use it to find the probability P of obtaining a difference at least
as large as the one observed if the null hypothesis is true. If the probability is large, we do not reject the null
hypothesis; if the probability is very small (generally P<0.05 or 0.01), we reject it.
3. Two hypotheses in testing: there are two possible reasons for a difference between the mean of the
sample and the mean of the population: ① the sample came from the known population, i.e., the difference is due to
chance; ② the sample did not come from the known population but from another population, so the difference is not
due to chance, and the two populations are in fact different.
Corresponding to the two reasons, we have two hypotheses. H0 (called the null hypothesis) states that
there is no difference between the mean of the population from which the sample came and the known population mean. If this
hypothesis is true, we can infer that the present difference between the sample mean and the known
population mean is due to chance or sampling error. The other hypothesis is H1 (called the alternative
hypothesis), stating that the mean of the population from which the sample came is different from the known population mean.
If H1 is true, we can infer that the present difference between the two means exists in fact and is not only
due to sampling error.
4. The process of statistical hypothesis testing.
① Establish the hypotheses and the level of significance; ② choose a suitable test and calculate the test
statistic: t-test, Z-test or F-test for quantitative data; χ2-test or Z-test for qualitative data; and so on;
③ determine the P value and draw the conclusion. Taking the t-test as an example: if |t|≥t(α,ν), then P≤α;
reject H0 and accept H1; at the level α, we conclude that the difference is statistically significant. If
|t|<t(α,ν), then P>α; do not reject H0; at the level α, we conclude that the difference is not statistically
significant and may be due to chance.
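5. Different test methods for data from different designs.
① One-sample test:
$$t = \frac{\bar{X} - \mu_0}{S_{\bar{X}}} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}, \quad \nu = n - 1$$
② Paired-sample test (x is the difference within each pair, n the number of pairs):
$$t = \frac{\bar{x}}{S_{\bar{x}}} = \frac{\bar{x}}{S_x/\sqrt{n}}, \quad \nu = n - 1$$
③ Two independent samples' test:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{S_{\bar{X}_1 - \bar{X}_2}}, \quad \nu = n_1 + n_2 - 2$$
$$S_{\bar{X}_1 - \bar{X}_2} = \sqrt{S_c^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}, \quad S_c^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$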

When the sample sizes are larger than 30, the Z-test can be used in place of the t-test for each of these
designs. We should select the method or formula according to the conditions: the type of data, the design, and
the sample size.
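
The three t statistics in item 5 can be sketched in code. The following is a minimal illustration in Python using only the standard library; the function names are hypothetical and are not taken from the textbook, and the critical value t(α,ν) still has to be looked up in a t table.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t = (mean - mu0) / (S / sqrt(n)), with nu = n - 1."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.mean(sample) - mu0) / standard_error
    return t, n - 1


def paired_t(first, second):
    """Paired design: a one-sample t-test of the within-pair differences against 0."""
    differences = [a - b for a, b in zip(first, second)]
    return one_sample_t(differences, 0.0)


def two_sample_t(x1, x2):
    """Two independent samples with a pooled variance, nu = n1 + n2 - 2."""
    n1, n2 = len(x1), len(x2)
    pooled_variance = (
        (n1 - 1) * statistics.variance(x1) + (n2 - 1) * statistics.variance(x2)
    ) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    t = (statistics.mean(x1) - statistics.mean(x2)) / standard_error
    return t, n1 + n2 - 2
```

For instance, with the made-up sample 1, 2, 3, 4, 5 and μ0 = 2, one_sample_t returns t ≈ 1.414 with ν = 4, which is then compared with the tabulated t(α,ν).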
6. Criteria for applying the t-test: ① random samples; ② quantitative data; ③ the variable is normally
distributed; ④ the population variances of the groups being compared are homogeneous (equal). In general, if
the sample size is less than 30 we apply the t-test; otherwise we can use the Z-test (for large samples).
7. Type I error, type II error and the power of a test.
When H0 is true but is rejected on the basis of our sample, we commit a type I error. If α=0.05, we would
theoretically commit this type of error in 5 out of 100 samples. When H0 is false but is not rejected on the
basis of our sample, we commit a type II error. The probability of a type II error is β, which is usually
unknown. In general, for a fixed sample size, when α increases, β decreases. Power of a test: if the difference
between two means really exists, the power is the probability that the test detects it at a given level α; it
is denoted 1-β. If 1-β=0.9, we expect to conclude that the difference is statistically significant in 90 out
of 100 such tests.
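
The relationship between α, β and power described in item 7 can be illustrated numerically. The sketch below assumes a one-sided Z test with a known population standard deviation, which is a simplification chosen for illustration (in practice β depends on the unknown true difference); the function names are hypothetical.

```python
from statistics import NormalDist


def z_test_power(delta, sigma, n, alpha):
    """Power (1 - beta) of a one-sided Z test for a true mean difference delta.

    Assumes a known population standard deviation sigma and sample size n.
    """
    standard_normal = NormalDist()
    z_critical = standard_normal.inv_cdf(1 - alpha)
    shift = delta * n ** 0.5 / sigma  # true difference in standard-error units
    return 1 - standard_normal.cdf(z_critical - shift)


def type_two_error(delta, sigma, n, alpha):
    """beta = 1 - power."""
    return 1 - z_test_power(delta, sigma, n, alpha)
```

When the true difference is zero, the rejection probability equals α; for a fixed non-zero difference, raising α or increasing n lowers β.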
8. The association between the CI and hypothesis testing: we can also use the confidence interval to test the
significance of the difference between means, but the confidence interval cannot give us the exact P value (see
the textbook).
9. The normality test (see chapter 14: computer software for analyzing of data) and the test for equality of
variances, or variance ratio test (see page 151 of the textbook).

the teaching of practice (3 hours)

Emphasis:
1.master the basic process of testing statistical hypothesis.
2. master the methods of testing statistical hypothesis for different designed data.
3.master the Criteria of applying different methods of statistical test.

Contents:
1.the process of testing statistical hypothesis.①establish hypotheses and the level of significance;②choose
suitable method for testing, and calculate the testing statistics;③Judge the P value and infer the conclusion.
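2. Different test methods for data from different designs.
① One-sample test:
$$t = \frac{\bar{X} - \mu_0}{S_{\bar{X}}}, \quad \nu = n - 1$$
② Paired-sample test (x is the difference within each pair):
$$t = \frac{\bar{x}}{S_{\bar{x}}}, \quad \nu = n - 1$$
③ Two independent samples' test:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{S_{\bar{X}_1 - \bar{X}_2}}, \quad \nu = n_1 + n_2 - 2$$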

When the sample sizes are larger than 30, the Z-test is applied. We should select the method or formula
according to the conditions: the type of data, the design, and the sample size.
3. Criteria for applying the t-test: ① random samples; ② quantitative data; ③ the variable is normally
distributed; ④ the population variances of the groups being compared are homogeneous (equal). In general, if
the sample size is less than 30 we apply the t-test; otherwise we can use the Z-test (for large samples).
4. Review the basic concepts of statistical hypothesis testing using examples.
When H0 is true but is rejected on the basis of our sample, we commit a type I error; when H0 is false but is
not rejected on the basis of our sample, we commit a type II error. The probability of a type II error is β,
which is usually unknown. In general, for a fixed sample size, when α increases, β decreases. Power of a test:
if the difference between two means really exists, the power is the probability that the test detects it at a
given level α; it is denoted 1-β.
5. Do the exercises listed below:
(1) Many studies show that the mean biparietal diameter (BPD) of normal male neonates is 9.3 cm. A doctor
measured 12 normal male neonates from a mountainous area; their BPDs (cm) were recorded as follows: 9.95 9.33
9.49 9.00 10.09 9.15 9.52 9.33 9.16 9.37 9.11 9.27. Test whether the BPD of male neonates in the mountainous
area is larger than that of neonates in general.

(2) In a clinical trial to assess the value of a new tranquilliser for psychoneurotic patients, each patient
was given a week's treatment with the drug. The drug was considered effective if it lowered the anxiety score
after treatment. Test the efficacy of the drug using the following results.
Before treatment: 22 18 17 19 22 12 14 11 19 7
After treatment: 19 11 14 17 23 11 15 19 11 8
(3) The blood glucose level of pigeons is said to be higher than that of rabbits. Test this by applying a
proper statistical test to the following data.
No. Blood glucose level per 100 ml
Pigeons Rabbits
1 200 145
2 186 125
3 176 100
4 184 112
5 170 127
6 172 139
7 170 151
8 163 140
9 176 159
10 173 132

Chapter 10 Analysis of Variance

The teaching of theory (4 hours)

Objective
1.master the application of Analysis of Variance (ANOVA), or F test.
2.master the criteria for applying ANOVA.
3.know well the principle of analysis of variance.
4.master the process of analysis of variance.
5.know well the comparisons between any two means using the q test (Newman-Keuls method) or the Dunnett-t
test.
6.understand transformations of variables when analyzing data.

Emphasis:
1.master applications of Analysis of Variance(ANOVA).
2.master criteria for applying Analysis of Variance.
3.master the process of analysis of variance.
4.know well the principle of analysis of variance.

Difficulty:
Division of variance in data and the principle of Analysis of Variance

contents:
1. Applications of Analysis of Variance
(1) In general, we use the F-test to compare three or more means, to find whether the difference among them
is significant or not.
(2) To analyze the interaction between two or more factors.
(3) To test a regression equation.
(4) To carry out the variance ratio test (P151).
2. Criteria for applying Analysis of Variance.
(1) All the samples are independent;
(2) All the samples come from normally distributed populations;
(3) The population variances of the K samples are equal, i.e. $\sigma_1^2 = \sigma_2^2 = \dots = \sigma_K^2$

3. The principle of ANOVA:
The total variation is divided into two parts: between-class variation and within-class variation.
Total variation, denoted SStotal, is the sum of squared deviations of each x from the overall mean.
Between-class variation, denoted SSbetween, reflects the effect of the treatment factor; it is calculated as
the sum of squared deviations of each sample mean from the overall mean, weighted by the sample size.
Within-class variation, denoted SSwithin, reflects the size of random error (individual variation and
measurement error); it is the sum of squared deviations of each x from the mean of its own class:

$$SS_{total} = \sum (x_{ij} - \bar{x})^2, \quad SS_{between} = \sum n_i(\bar{x}_i - \bar{x})^2, \quad SS_{within} = \sum (x_{ij} - \bar{x}_i)^2$$

$$SS_{total} = SS_{between} + SS_{within}, \quad \nu_{total} = \nu_{between} + \nu_{within}, \quad \nu_{total} = N - 1, \quad \nu_{between} = K - 1, \quad \nu_{within} = N - K$$

To express the variation of each part more reasonably, allowing for its degrees of freedom, we use the mean
square and the F ratio:

$$MS = \frac{SS}{\nu}, \quad F = \frac{MS_{between}}{MS_{within}}$$

If the treatment has produced no effect, $MS_{between} \approx MS_{within}$, $F \approx 1$.

If the treatment factor did produce an effect, $MS_{between} > MS_{within}$, $F > 1$.

The larger the treatment effect, the larger the variation between classes and the more F exceeds 1:
$MS_{between} \gg MS_{within}$, $F \gg 1$. Beyond which limit is the difference statistically significant?
We can use the F table to draw the conclusion:

If F≥Fα(ν1,ν2), P≤α, we conclude that the treatment factor produced an effect;
if F<Fα(ν1,ν2), P>α, we cannot conclude that the treatment factor produced an effect.
4. The process of analysis of variance:
(1) Establish the hypotheses and the level of significance.
H0: all the population means are equal; H1: the population means are not all equal.
(2) Choose a suitable test and calculate the test statistic: to apply the F-test, first calculate the basic
quantities, such as $\sum x$, $\sum x^2$, $\bar{x}$, $s$, and so on; then calculate SS and MS for every part;
finally work out F,
judge the size of P and draw the corresponding conclusion.
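
The partition of variation and the F ratio described above can be sketched as follows. This is a minimal Python illustration using only built-in functions; the function name and the returned dictionary keys are hypothetical, and the P value still has to be judged from an F table.

```python
def one_way_anova(groups):
    """Return SS, degrees of freedom, MS and F for a one-way ANOVA.

    groups is a list of lists, one list of observations per treatment group.
    """
    all_values = [x for group in groups for x in group]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total

    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2 for group in groups
    )
    ss_within = sum(
        (x - sum(group) / len(group)) ** 2 for group in groups for x in group
    )

    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "F": ms_between / ms_within,
    }
```

With the made-up groups [1, 2, 3], [2, 3, 4] and [3, 4, 5], the sketch gives SStotal = 12 = SSbetween (6) + SSwithin (6), ν = 2 and 6, and F = 3.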


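5. Comparisons between any two means using the q test (Newman-Keuls method), or the Dunnett-t test for
comparing the means of several experimental groups with that of the control group.
$$q = \frac{\bar{X}_A - \bar{X}_B}{S_{\bar{X}_A - \bar{X}_B}} = \frac{\bar{X}_A - \bar{X}_B}{\sqrt{\dfrac{MS_{within}}{2}\left(\dfrac{1}{n_A} + \dfrac{1}{n_B}\right)}}$$
$$t_D = \frac{\bar{X}_T - \bar{X}_C}{S_{\bar{X}_T - \bar{X}_C}} = \frac{\bar{X}_T - \bar{X}_C}{\sqrt{MS_{within}\left(\dfrac{1}{n_T} + \dfrac{1}{n_C}\right)}}$$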
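
The two multiple-comparison statistics above differ only in their standard errors. The sketch below computes both from group means, group sizes and MSwithin; the function names are hypothetical, and the critical values still come from the q table and the Dunnett-t table.

```python
import math


def newman_keuls_q(mean_a, mean_b, n_a, n_b, ms_within):
    """q statistic for comparing any two group means after ANOVA."""
    standard_error = math.sqrt(ms_within / 2 * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error


def dunnett_t(mean_t, mean_c, n_t, n_c, ms_within):
    """Dunnett-t statistic for comparing a treatment group with the control group."""
    standard_error = math.sqrt(ms_within * (1 / n_t + 1 / n_c))
    return (mean_t - mean_c) / standard_error
```

For the same pair of groups, q is larger than the Dunnett-t value by a factor of √2, because of the divisor 2 under MSwithin in the q formula.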
6. Transformations of variables: when our data do not meet the requirements for applying the methods mentioned
above, we can consider a transformation of the original data, such as the logarithmic transformation,
square root transformation, arcsine transformation, etc.
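
As a small illustration of item 6, the three transformations can be written as below. This is a sketch under stated assumptions: the base-10 logarithm is used, and the arcsine transformation is taken to be arcsin(√p) in radians for a proportion p, which is a common form; the textbook may define these slightly differently.

```python
import math


def log_transform(x):
    """Logarithmic transformation (base 10), for positive right-skewed values."""
    return math.log10(x)


def sqrt_transform(x):
    """Square root transformation, often used for counts."""
    return math.sqrt(x)


def arcsine_transform(p):
    """Arcsine transformation of a proportion p in [0, 1], in radians."""
    return math.asin(math.sqrt(p))
```

After transformation, the usual checks (normality, equality of variances) are repeated on the transformed values before the t-test or ANOVA is applied.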

the teaching of practice (3 hours)


Emphasis:
1.master applications of Analysis of Variance (ANOVA).
2.master criteria for applying Analysis of Variance.
3.master the process of analysis of variance and the q test or Dunnett-t test between two means.

Contents:
1. Applications of Analysis of Variance
(1) In general, we use the F-test to compare three or more means, to find whether the difference among them
is significant or not.
(2) To analyze the interaction between two or more factors.
(3) To test a regression equation.
(4) To carry out the variance ratio test.
2. Criteria for applying analysis of variance.
(1) All the samples are independent;
(2) All the samples come from normally distributed populations;
(3) The population variances of the samples are equal.
3. The process of analysis of variance:
First calculate the basic quantities, such as $\sum x$, $\sum x^2$, $\bar{x}$, $s$; then calculate SS, then MS
and F according to the equations listed below, and judge the size of P. If P≤α, we conclude that the means
differ significantly, and we can then compare any two means using the Newman-Keuls q test or the Dunnett-t
test.
$$MS = \frac{SS}{\nu}, \quad F = \frac{MS_{between}}{MS_{within}}; \quad \text{if } F \ge F_{\alpha}(\nu_1, \nu_2), \text{ then } P \le \alpha$$

4. Do the exercise: mice were inoculated with typhoid vaccine or whooping cough (pertussis) vaccine after they
were infected with poliomyelitis, and their survival days were recorded. Do the vaccines affect the survival
days significantly?
Typhoid Whooping cough Control group
5 6 8
7 6 9
8 7 10
9 8 10
9 8 10
10 9 11
10 9 12
11 10 12
11 10 14
12 11 16
Chapter 11 Chi-square test (χ2 test)

The teaching of theory (6 hours)

Objective
1. know well the characteristics of the χ2 distribution.
2. master the applications of χ2 test and the principle of χ2 test.
3. master the χ2 test for completely random designed data of fourfold table and R×C table and the condition of
applying them.
4. master the χ2 test for paired designed data of fourfold table.
5. understand the method of exact probability and the method of χ2 division.

Emphasis:
1. master the applications of χ2 test and the principle of χ2 test.
2. master the χ2 test for completely random designed data of fourfold table and R×C table and the condition of
applying them.
3. master the χ2 test for paired designed data of fourfold table.

Difficulty:
the principle of χ2 test and the characters of χ2 distribution.

contents:
1. The χ2 distribution is a probability distribution of a continuous random variable. It originates from the
standard normal distribution: if Z follows the standard normal distribution, Z² follows the χ2 distribution
with 1 df; if Z1, Z2, Z3, ..., Zk are k independent standard normal variables, the sum Z1² + Z2² + ... + Zk²
follows the χ2 distribution with ν = k df, which gives a family of χ2 distribution curves indexed by ν.
① χ2 is never negative; its value varies from 0 to +∞; ② the shape of the χ2 curve depends on the degrees of
freedom: when ν is small the curve is positively skewed, and as ν becomes larger the curve tends to the normal
distribution; ③ when ν=1, χ2 is the square of a standard normal variable.
2. The applications of the χ2 test: ① to test the difference between or among proportions or rates from
independent groups; ② to test the association between two variables or attributes; ③ to test goodness of fit
to a given distribution.
3. The principle of the χ2 test: the theoretical frequencies (T) are calculated under the supposition that H0
is true. When the study factor produces an effect, the actual frequencies (A) should differ greatly from the
theoretical frequencies (T), so the χ2 value should be large; when it is larger than the critical value
χ2(α,ν), we can infer that the two rates or proportions differ significantly. Conversely, when the study factor
has not produced an effect, A should be very close to T and the χ2 value will be small; when it is less than
the critical value χ2(α,ν), we cannot infer that the two rates or proportions differ significantly.
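4. χ2 test for completely random designed data of fourfold table and R×C table and their conditions:
① Fourfold table data (N = a + b + c + d)
Basic formula and special formula:
$$\chi^2 = \sum \frac{(A - T)^2}{T}, \quad \chi^2 = \frac{(ad - bc)^2 N}{(a + b)(c + d)(a + c)(b + d)}$$
Adjusted (continuity-corrected) formulas:
$$\chi^2 = \sum \frac{(|A - T| - 0.5)^2}{T}, \quad \chi^2 = \frac{(|ad - bc| - N/2)^2 N}{(a + b)(c + d)(a + c)(b + d)}$$
Exact probability:
$$P = \frac{(a + b)!\,(c + d)!\,(a + c)!\,(b + d)!}{a!\,b!\,c!\,d!\,N!}$$
② R×C table data (A is the actual frequency of a cell, n_i and m_j are its row and column totals, n is the
grand total)
$$\chi^2 = n\left(\sum \frac{A^2}{n_i m_j} - 1\right), \quad \nu = (r - 1)(c - 1)$$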

Conditions for applying the formula above: ① no cell has T<1; ② cells with 1≤T<5 make up no more than 1/5 of
all cells.
After H0 is rejected, we can only say that the population rates are not all equal; for details about any
two rates, the method of χ2 division should be used further.
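
The formulas of item 4 can be checked against each other in code. The sketch below is a minimal Python illustration with hypothetical function names; it computes the fourfold-table χ2 (with or without the continuity correction), the exact probability of one observed table, and the R×C χ2. Note that the exact-probability function returns the probability of the given table only; the P value of the exact test is obtained by summing such probabilities over the observed table and all more extreme tables.

```python
from math import factorial


def fourfold_chi2(a, b, c, d, corrected=False):
    """Special formula for a fourfold table; corrected=True applies the N/2 adjustment."""
    n = a + b + c + d
    difference = abs(a * d - b * c)
    if corrected:
        difference -= n / 2
    return difference ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))


def exact_table_probability(a, b, c, d):
    """Probability of one fourfold table with fixed margins."""
    n = a + b + c + d
    numerator = (
        factorial(a + b) * factorial(c + d) * factorial(a + c) * factorial(b + d)
    )
    denominator = (
        factorial(a) * factorial(b) * factorial(c) * factorial(d) * factorial(n)
    )
    return numerator / denominator


def rxc_chi2(table):
    """chi2 = n * (sum(A^2 / (row total * column total)) - 1), nu = (r-1)(c-1)."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    n = sum(row_totals)
    total = sum(
        table[i][j] ** 2 / (row_totals[i] * column_totals[j])
        for i in range(len(row_totals))
        for j in range(len(column_totals))
    )
    degrees_of_freedom = (len(row_totals) - 1) * (len(column_totals) - 1)
    return n * (total - 1), degrees_of_freedom
```

For a 2×2 table the R×C formula and the uncorrected fourfold formula give the same χ2, which is a quick way to check a hand calculation.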
5. χ2 test for paired designed data of fourfold table: each subject is examined by two different methods, and
the final results can be arranged as a cross table.

B method    A method            total
            +         -
+           a         b         a+b
-           c         d         c+d
total       a+c       b+d       a+b+c+d

$$\chi^2 = \frac{(b - c)^2}{b + c} \ (b + c \ge 40); \quad \chi^2 = \frac{(|b - c| - 1)^2}{b + c} \ (b + c < 40)$$

For this kind of data, we can also use the former χ2 test formula (for the completely random designed fourfold
table) to test the correlation of the two variables or attributes.
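
The paired-design formula uses only the two discordant cells b and c. A minimal sketch in Python (the function name is hypothetical; ν = 1 and the critical value comes from the χ2 table):

```python
def paired_chi2(b, c):
    """chi2 for a paired fourfold table, using the discordant cells b and c.

    The continuity-corrected form is used when b + c < 40.
    """
    discordant = b + c
    if discordant >= 40:
        return (b - c) ** 2 / discordant
    return (abs(b - c) - 1) ** 2 / discordant
```

For example, b = 30 and c = 20 give χ2 = 100/50 = 2, while b = 10 and c = 5 fall under the corrected form and give χ2 = 16/15.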

the teaching of practice (4 hours)

Emphasis:
1.master the χ2 test for completely random designed data of fourfold table and R×C table and the condition of
applying them.
2. master the χ2 test for paired designed data of fourfold table.

Contents:
1. The applications of the χ2 test: ① to test the difference between or among proportions or rates from
independent groups; ② to test the association between two variables or attributes; ③ to test goodness of fit
to a given distribution.
2. χ2 test for completely random designed data of fourfold table and R×C table and their conditions:
① Fourfold table data (N = a + b + c + d)
$$\chi^2 = \sum \frac{(A - T)^2}{T}, \quad \chi^2 = \frac{(ad - bc)^2 N}{(a + b)(c + d)(a + c)(b + d)}$$
The conditions for applying the two formulas above: N≥40 and T≥5. When 1≤T<5, we should use the adjusted
formulas for the chi-square test:
$$\chi^2 = \sum \frac{(|A - T| - 0.5)^2}{T}, \quad \chi^2 = \frac{(|ad - bc| - N/2)^2 N}{(a + b)(c + d)(a + c)(b + d)}$$
When N<40 or T<1, we should calculate the exact probability.
② R×C table data
$$\chi^2 = n\left(\sum \frac{A^2}{n_i m_j} - 1\right), \quad \nu = (r - 1)(c - 1)$$
Conditions for applying the formula above: ① no cell has T<1; ② cells with 1≤T<5 make up no more than 1/5 of
all cells. After H0 is rejected, we can only say that the population rates are not all equal; for details
about any two rates, the method of χ2 division should be used further.
3. χ2 test for paired designed data of fourfold table: each subject is examined by two different methods, and
the final results can be arranged as a cross table.
$$\chi^2 = \frac{(b - c)^2}{b + c} \ (b + c \ge 40); \quad \chi^2 = \frac{(|b - c| - 1)^2}{b + c} \ (b + c < 40)$$
For this kind of data, we can also use the former χ2 test formula to test the correlation of the two variables
or attributes.
4. Do the exercises listed below:
(1) Some workers in a mineral powder plant have developed occupational dermatitis. To protect their health, a
new protective suit was made. To test its effectiveness, 15 workers were randomly selected to wear the new
protective suit, while the others still used the former suit. The data are shown below; please test whether
there is a difference between the two groups.
Occupational dermatitis prevalence with the two types of protective suit
Type of suit   Occupational dermatitis              Total   Prevalence rate (%)
               Positive number   Negative number
New            1                 14                 15      6.7
Former         10                18                 28      35.7
Total          11                32                 43      25.6

(2) Officers of the FDA want to examine aflatoxin pollution of peanuts from 3 areas. The results are given
below; test whether the aflatoxin pollution rates of the three areas differ.

Comparison of the aflatoxin pollution rates of the three areas
Area      Number of samples            Total   Pollution rate (%)
          Not polluted   Polluted
A area    6              23            29      79.3
B area    30             14            44      31.8
C area    8              3             11      27.3
Total     44             40            84      47.6
(3) Two methods were used to examine 120 patients with breast cancer. One method found 60% of all the
patients positive, the other method found 50% positive, and 35% were positive by both methods at
the same time. Test the association between the two methods. Which method is more effective?

Chapter 12 significance of difference in proportions of large samples

The teaching of theory (3 hours)

Objective
1.master application and calculation of standard error of proportion(SEP).
2.master the methods of hypothesis testing for rates or proportions from large samples.
3.know well standard error of difference between two proportions, SE(P1-P2).
4.understand the association between chi-square test and Z test for comparing rates or proportions from large
samples

Emphasis:
1.master application and calculation of standard error of proportion(SEP).
2.master the methods of hypothesis testing for rates or proportions from large samples

Difficulty:
The meaning and calculation of standard error of difference between two proportions.

contents:
1. The meaning and calculation of the standard error of proportion (SEP)
The standard error of proportion may be defined as a unit that measures the variation which occurs by chance in
the proportion of a character from sample to sample or from sample to population. It is calculated as:

$$SEP = \sqrt{\frac{pq}{n}}, \quad q = 1 - p$$
2. Applications of SEP:
(1) to find the confidence limits of the population proportion (P) when the sample proportion (p) is known (see
chapter 7); (2) to determine whether or not a sample is drawn from a known population when the population
proportion P is known; (3) to find the standard error of the difference between two proportions; (4) to find
the sample size.
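
The first application can be sketched in a few lines. The code below is a minimal Python illustration with hypothetical function names; the multiplier 1.96 corresponds to 95% confidence limits under the large-sample normal approximation and is supplied here as an assumption, since the syllabus refers to chapter 7 for the details.

```python
import math


def standard_error_of_proportion(p, n):
    """SEP = sqrt(p * q / n), with q = 1 - p."""
    return math.sqrt(p * (1 - p) / n)


def confidence_limits(p, n, z=1.96):
    """Large-sample confidence limits p - z*SEP and p + z*SEP (z = 1.96 for 95%)."""
    sep = standard_error_of_proportion(p, n)
    return p - z * sep, p + z * sep
```

For example, a sample proportion of 0.5 from n = 100 has SEP = 0.05 and 95% limits of about 0.402 to 0.598.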
3. The meaning and calculation of the standard error of the difference between two proportions, SE(p1-p2).
The differences between pairs of proportions or percentages of samples drawn from the same population are also
normally distributed, as was seen in the case of the difference between two means.

$$SE(p_1 - p_2) = \sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

P and Q are the combined percentages of positive and negative characteristics in both samples.
4. The methods of hypothesis testing for rates or proportions from large samples.
(1) Hypothesis test for a proportion from one large sample against a known population proportion:
$$Z = \frac{p - P}{SEP} = \frac{p - P}{\sqrt{\dfrac{pq}{n}}}$$
(2) Hypothesis test for two proportions from two large independent samples:
In actual practice, we do not know the value of the population proportion and we have only two samples. So we
have to substitute the value observed in the samples for P and compare one sample with the other. The
assumptions are: ① n1 and n2 are large; ② the samples are selected at random. The significance of the
difference is found by the normal deviate, i.e. the Z test:

$$Z = \frac{p_1 - p_2}{SE(p_1 - p_2)} = \frac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

5. The association between the chi-square test and the Z test for comparing rates or proportions from large
samples. When we compare two rates or proportions from large samples, the degree of freedom is 1 and the two
statistics are related by χ2 = Z²; this follows from the relationship between the chi-square distribution
with 1 df and the standard normal (Z) distribution.
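
The identity χ2 = Z² can be verified numerically. The sketch below assumes that the combined proportion P is the pooled proportion (x1 + x2)/(n1 + n2) and uses the uncorrected fourfold-table χ2; the function names and the counts in the usage note are made up for illustration.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Z test for two proportions, x1 positives out of n1 and x2 positives out of n2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled_p = (x1 + x2) / (n1 + n2)
    pooled_q = 1 - pooled_p
    standard_error = math.sqrt(pooled_p * pooled_q * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


def fourfold_chi2(a, b, c, d):
    """Uncorrected chi2 for the same data arranged as a fourfold table."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))
```

With 10 positives out of 30 in one sample and 30 positives out of 70 in the other, Z ≈ -0.891 and Z² ≈ 0.794, which equals the χ2 of the fourfold table with cells 10, 20, 30, 40.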

the teaching of practice (2 hours)

Emphasis:
1.master the calculation of standard error of proportion(SEP) and standard error of difference between two
proportions, SE(P1-P2).
2.master the methods of hypothesis testing for rates or proportions from large samples.

Contents:
1. Hypothesis test for a proportion from one large sample against a known population proportion:
$$Z = \frac{p - P}{SEP} = \frac{p - P}{\sqrt{\dfrac{pq}{n}}}$$
2. Hypothesis test for two proportions from two large independent samples:
In actual practice, we do not know the value of the population proportion and we have only two samples. So we
have to substitute the value observed in the samples for P and compare one sample with the other. The
assumptions are: ① n1 and n2 are large; ② the samples are selected at random. The significance of the
difference is found by the normal deviate, i.e. the Z test:

$$Z = \frac{p_1 - p_2}{SE(p_1 - p_2)} = \frac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

3. Do the exercises:
(1) In a locality with an unprotected population of 1000, 8 percent died of smallpox in a specified year. Of
the unprotected, 250 were vaccinated and only 12 of them died in the following year. The vaccinator claimed
that vaccination was responsible for reducing the mortality in the vaccinated population. Justify his claim.
(2) In an epidemiological study of diabetes in the urban and rural populations of Ahmedabad district, the
following data were obtained. Compute the prevalence in the two areas and determine if the results differ
statistically. You can apply the chi-square test and the Z test together, and verify the association between
the two test statistics.
Area Diabetes No diabetes Total
Rural 45 3450 3495
Urban 107 3409 3516
Total 152 6859 7011

Chapter 13 nonparametric statistics

The teaching of theory (6 hours)

Objective
1.master the conception and conditions of applying nonparametric methods in statistics.
2.master the methods of Rank Sum Test from different designed data.
3.know well the principle of Rank Sum Test.
4.understand the methods of comparison between any two groups

Emphasis:
1.master the conception and condition of applying nonparametric methods in statistics.
2.master the methods of Rank Sum Test from different designed data.

Difficulty:
the principle of Rank Sum Test.

contents:
1. Nonparametric tests: these methods do not rely on any assumption about the form of the population
distribution; we only infer whether the population distributions from which the samples come are the same or
significantly different.
Nonparametric tests are applied mainly under the conditions listed below:
(1) quantitative data that are not normally distributed.
(2) the distribution is not known.
(3) data from samples in ordinal categories.
(4) the population variances are not equal.
2. The principle of the rank sum test: when the two populations have the same distribution, the rank sums of
the two samples (allowing for their sample sizes) should be very close after all the values are ranked in
ascending order. Conversely, when the two populations are not the same, the rank sums of the two samples should
differ markedly, and we can infer that the two population distributions differ significantly.
3. The methods of the rank sum test for data from different designs.
(1) One-sample test:
calculate the difference between every value and the known population median, rank the differences in
ascending order of absolute value, and sum the ranks of the positive and negative differences separately.
Record them as T+ and T-, select the smaller one as the test statistic, then draw the conclusion using the
table of critical T values.
(2) Paired-sample test:
calculate the difference within every pair, rank the differences in ascending order of absolute value, and sum
the ranks of the positive and negative differences separately. Record them as T+ and T-, select the smaller one
as the test statistic, then draw the conclusion using the table of critical T values. Pay attention to
assigning the mean rank to values that are tied.
(3) Two independent samples' test:
rank all the values from both samples together in ascending order and sum the ranks of the two groups
separately. Record them as T1 and T2. If the two sample sizes are not the same, select the rank sum of the
smaller sample as the test statistic; if the two sample sizes are the same, select the smaller rank sum as the
test statistic. Finally draw the conclusion using the table of critical T values. Pay attention to assigning
the mean rank to tied values here too.
(4) Three or more independent samples' test:
rank all the values from the samples together in ascending order and sum the ranks of each group separately,
paying attention to assigning the mean rank to tied values. Record the rank sums as R1, R2, R3, ..., Rk.
Then calculate the test statistic H using the following equation, and finally draw the conclusion using the
table of critical H values.

$$H = \frac{12}{N(N + 1)} \sum \frac{R_i^2}{n_i} - 3(N + 1)$$
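
The ranking steps of item 3, including mean ranks for ties, can be sketched in code. This is a minimal Python illustration with hypothetical function names. Two points are assumptions rather than statements from the syllabus: zero differences are dropped before ranking in the signed rank test, and H is computed without a correction for ties, as in the equation above.

```python
def mean_ranks(values):
    """Rank values in ascending order, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = average
        start = end + 1
    return ranks


def signed_rank_t(differences):
    """Return (T+, T-) for the one-sample or paired signed rank test."""
    nonzero = [d for d in differences if d != 0]
    ranks = mean_ranks([abs(d) for d in nonzero])
    t_plus = sum(r for d, r in zip(nonzero, ranks) if d > 0)
    t_minus = sum(r for d, r in zip(nonzero, ranks) if d < 0)
    return t_plus, t_minus


def rank_sums(groups):
    """Rank all values together and return the rank sum of each group."""
    pooled = [x for group in groups for x in group]
    ranks = mean_ranks(pooled)
    sums, start = [], 0
    for group in groups:
        sums.append(sum(ranks[start:start + len(group)]))
        start += len(group)
    return sums


def kruskal_wallis_h(groups):
    """H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1), without tie correction."""
    n_total = sum(len(group) for group in groups)
    total = sum(r ** 2 / len(g) for r, g in zip(rank_sums(groups), groups))
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)
```

The test statistic T is then chosen from the returned rank sums as described in item 3, and T or H is compared with the corresponding table of critical values.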

4. Comparison between any two groups.
When H0 is rejected, this only shows that the population distributions are not all the same; for details about
any two populations, a method of comparison between any two samples should be used further:
$$Z_{ij} = \frac{\bar{R}_i - \bar{R}_j}{\sigma_{\bar{R}_i - \bar{R}_j}} = \frac{\bar{R}_i - \bar{R}_j}{\sqrt{\dfrac{N(N + 1)}{12}\left(\dfrac{1}{n_i} + \dfrac{1}{n_j}\right)}}$$
where $\bar{R}_i = R_i / n_i$ is the mean rank of group i.
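
The pairwise comparison can be sketched as below. The sketch assumes that the R terms in the formula are the mean ranks of the two groups (rank sum divided by group size), which is the reading under which the standard error shown is appropriate; the function name is hypothetical.

```python
import math


def pairwise_rank_z(mean_rank_i, mean_rank_j, n_i, n_j, n_total):
    """Z statistic comparing the mean ranks of groups i and j after the H test."""
    standard_error = math.sqrt(n_total * (n_total + 1) / 12 * (1 / n_i + 1 / n_j))
    return (mean_rank_i - mean_rank_j) / standard_error
```

For example, with N = 9, two groups of 3 and mean ranks 2 and 8, the standard error is √5 and Z ≈ -2.68.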

the teaching of practice (2 hours)

Emphasis:
1.master condition of applying nonparametric methods.
2.master the methods of Rank Sum Test from different designed data.

Contents:
1. The conditions for applying nonparametric tests: (1) quantitative data that are not normally distributed;
(2) the distribution is not known; (3) data from samples in ordinal categories; (4) the population variances
are not equal, etc.
2. The methods of the rank sum test for data from different designs: (1) one-sample test; (2) paired-sample
test; (3) two independent samples' test; (4) three or more independent samples' test. Go over the process of
each method, taking care to rank the values in ascending order and to assign the mean rank to values that are
tied.
3. Finish the following exercises:
(1) To study the effect of long-distance running on heart function, 15 male students were sampled randomly;
their pulse rates were measured before running and again after 5 months of long-distance running. The data are
given below. Does long-distance running affect the pulse rate significantly?
No.              1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
Before training  70 76 56 63 63 56 58 60 67 65 75 66 56 59 72
After training   48 54 60 64 48 55 54 45 50 48 56 48 62 49 50

(2) The following data were measured in two groups of workers. Is the Pb level in blood significantly
different between the two groups?
Workers in Pb environment: 0.82 0.37 0.97 1.21 1.64 2.08 2.13
Workers not in Pb environment: 0.24 0.24 0.29 0.33 0.44 0.58 0.63 0.72 0.87 1.01

Chapter 14 linear correlation and regression

The teaching of theory (6 hours)

Objective:
1.know well drawing the scatter diagram and the types of correlation.
2.master analysis of linear correlation and regression, and their process of analysis.
3.master the condition of linear correlation and regression.
4.know well the application of regression equation.
5.master the association between linear correlation and regression.
6.understand spearman’s rank order correlation and the types of linear correlation

Emphasis:
1.master analysis of linear correlation and regression, and their process of analysis.
2.master the condition of linear correlation and regression.
3.master the association between linear correlation and regression.

Difficulty:
the hypothesis of r and b.

contents:
1. The process of linear correlation analysis.
(1) Drawing a scatter diagram.
The types of correlation: perfect correlation; moderate correlation; absolutely no correlation.
(2) The calculation of the correlation coefficient r:

$$r = \frac{L_{XY}}{\sqrt{L_{XX} L_{YY}}} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$

(3) The hypothesis test of r: first, state the hypotheses, such as H0: ρ=0, H1: ρ≠0, α=0.05.
Then calculate the test statistic: either regard r itself as the statistic (ν = n-2) and compare it with the
critical r value, or calculate the t value (n≤50) or the z value (n>50):

$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}, \quad \nu = n - 2; \qquad z = r\sqrt{n - 1}$$

Finally, according to the critical value, judge the size of the P value and draw the statistical conclusion.
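
The calculation of r and of its t statistic can be sketched as follows. This is a minimal Python illustration; the function names and the five data pairs in the usage note are made up, and the critical value still comes from the r table or t table.

```python
import math


def correlation_r(x, y):
    """r = L_XY / sqrt(L_XX * L_YY)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    l_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    l_xx = sum((a - mean_x) ** 2 for a in x)
    l_yy = sum((b - mean_y) ** 2 for b in y)
    return l_xy / math.sqrt(l_xx * l_yy)


def t_for_r(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with nu = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2), n - 2
```

With X = 1, 2, 3, 4, 5 and Y = 2, 4, 5, 4, 5, LXY = 6, LXX = 10 and LYY = 6, so r ≈ 0.775 and t ≈ 2.121 with ν = 3.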
2. The process of linear regression analysis.
(1) According to the known data, draw a scatter diagram to display the relationship between the two sets of
results.

(2) Calculate b and a, and write out the regression equation $\hat{Y} = a + bX$:

$$b = \frac{L_{XY}}{L_{XX}} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{\sum XY - \dfrac{\sum X \sum Y}{n}}{\sum X^2 - \dfrac{(\sum X)^2}{n}}, \quad a = \bar{Y} - b\bar{X}$$
(3) The hypothesis test of the regression equation.
For the same data, the hypothesis test of the regression coefficient b is equivalent to that of the
correlation coefficient r, so we can substitute the test of r for the test of the regression equation.
(4) The conditions of correlation and regression:
for linear correlation: continuous (quantitative) data; both variables are normally distributed.
for linear regression: the variable Y must follow a normal distribution, but X may be a variable that is
measured precisely and controlled strictly.
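
The calculation of b and a by the computational formula can be sketched as below. This is a minimal Python illustration with hypothetical function names and made-up data in the usage note.

```python
def regression_coefficients(x, y):
    """Return (a, b) of the regression equation Y-hat = a + b*X."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    l_xy = sum(p * q for p, q in zip(x, y)) - sum_x * sum_y / n
    l_xx = sum(p * p for p in x) - sum_x ** 2 / n
    b = l_xy / l_xx
    a = sum_y / n - b * sum_x / n
    return a, b


def predict(a, b, x_value):
    """Estimated Y for a given X from the fitted regression line."""
    return a + b * x_value
```

With X = 1, 2, 3, 4, 5 and Y = 2, 4, 5, 4, 5, the sketch gives b = 0.6 and a = 2.2; the fitted line passes through the point of means (3, 4), which is a useful check on a hand calculation.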
3. The application of the regression equation.
(1) Describing the dependent relationship between two variables.
(2) Making use of the regression equation to make forecasts.
4. Association between linear correlation and regression.
(1) In terms of the type of data: for regression, the variable Y must follow a normal distribution, while X
may be measured precisely and controlled strictly. For correlation, both variables (X, Y) must follow a normal
distribution.
(2) In terms of application: regression describes the numerical relationship between the variables, whereas
correlation only expresses the degree and direction of the relationship.
5. Spearman's rank order correlation.
(1) The conditions for Spearman's rank order correlation.
The two variables do not follow a normal distribution, or one or both of the variables are measured as
ordinal data.
(2) The calculation of the coefficient of Spearman's rank order correlation (d is the difference between the
two ranks of each subject):
$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
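
Spearman's coefficient can be sketched directly from the formula. The Python sketch below assumes there are no tied values within either variable, so that each value has a unique rank; with ties, mean ranks would be needed. The function name is hypothetical.

```python
def spearman_rs(x, y):
    """r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), assuming no ties in x or in y."""
    n = len(x)
    rank_x = {value: rank for rank, value in enumerate(sorted(x), start=1)}
    rank_y = {value: rank for rank, value in enumerate(sorted(y), start=1)}
    sum_d_squared = sum((rank_x[a] - rank_y[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))
```

For example, X = 1, 2, 3, 4, 5 and Y = 2, 1, 4, 3, 5 give Σd² = 4 and rs = 0.8, while a perfectly reversed ordering gives rs = -1.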

the teaching of practice (3 hours)

Emphasis:
1.master the process of linear correlation analysis, especially the calculation of the correlation
coefficient r and its hypothesis testing.
2. master the process of linear regression analysis, especially the calculation of the regression
coefficients b and a, and the hypothesis testing of b.

Contents:
1. The process of linear correlation analysis: drawing the scatter diagram, calculating the correlation coefficient, and hypothesis testing of r.
2. The process of linear regression analysis: drawing the scatter diagram, calculating the regression coefficient, hypothesis testing of b, working out the regression equation and then drawing the regression line.
3. Introduction to using a calculator for linear correlation and regression.
The steps for using the calculator: ① select the calculation mode; ② clear the calculator’s memory; ③ enter the X and Y data as pairs; ④ recall the results, such as r, a, b, etc.
4. Do the exercise below:
The weight (X) and vital capacity (Y) of 12 female college students are given below. Carry out a correlation analysis.
Weight (kg)          42    42    46    46    46    50    50    50    52    52    58    58
Vital capacity (L)   2.55  2.20  2.75  2.40  2.80  2.81  3.41  3.10  3.46  2.85  3.50  3.00
(1) Draw a scatter diagram to display the relationship between the two variables.
(2) Calculate the correlation coefficient.
(3) Test the hypothesis about r.
(4) Derive the regression equation of Y on X.
(5) Test the hypothesis about b.
(6) Draw the regression line.
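Steps (2) to (5) of the exercise can be checked with a short Python sketch. It assumes the usual t tests with n - 2 degrees of freedom for r and for b; the variable and function names are illustrative only, and the scatter diagram and regression line are left to be drawn by hand or with plotting software.

```python
import math

# Data from the exercise: weight (kg) and vital capacity (L) of 12 students.
weight = [42, 42, 46, 46, 46, 50, 50, 50, 52, 52, 58, 58]
vital_capacity = [2.55, 2.20, 2.75, 2.40, 2.80, 2.81,
                  3.41, 3.10, 3.46, 2.85, 3.50, 3.00]


def sums_of_squares(x, y):
    """Return (L_XX, L_YY, L_XY), the corrected sums of squares and products."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    l_xx = sum((xi - mean_x) ** 2 for xi in x)
    l_yy = sum((yi - mean_y) ** 2 for yi in y)
    l_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return l_xx, l_yy, l_xy


n = len(weight)
mean_x = sum(weight) / n
mean_y = sum(vital_capacity) / n
l_xx, l_yy, l_xy = sums_of_squares(weight, vital_capacity)

r = l_xy / math.sqrt(l_xx * l_yy)   # correlation coefficient
b = l_xy / l_xx                     # regression coefficient (slope)
a = mean_y - b * mean_x             # intercept

# t statistic for H0: rho = 0, with n - 2 degrees of freedom.
t_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# t statistic for H0: beta = 0, built from the residual standard deviation.
residual_ss = l_yy - l_xy ** 2 / l_xx
s_yx = math.sqrt(residual_ss / (n - 2))
t_b = b / (s_yx / math.sqrt(l_xx))

print(f"r = {r:.4f}, t_r = {t_r:.3f}")
print(f"Y-hat = {a:.4f} + {b:.4f} X, t_b = {t_b:.3f}")
```

Both t statistics are compared with the tabulated two-sided critical value for 10 degrees of freedom (2.228 at the 0.05 level); the two values printed by the sketch should agree with each other.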

Chapter 15 Computer software for data analysis

The teaching of theory (4 hours)

objective
1. Know well the common software for data analysis, such as SAS (Statistical Analysis System), SPSS (Statistical Package for the Social Sciences), STATA (Statistics/Graphics/Data management), etc.
2. Master the programs for processing the data obtained from research work, locate the main results and draw conclusions.
3. Know well the criteria for choosing among the different methods of statistical hypothesis testing; students should find out for themselves, using SAS programs, which criteria the data satisfy.

Emphasis:
1. The SAS programs for analysing the data obtained from research work, locating the main results and drawing conclusions.
2. Under different criteria, different methods should be used to test the statistical hypothesis; under different criteria the conclusions usually differ.

Difficulty:
1. Writing a correct program and correcting a program when it is wrong.
2. Locating the main results and drawing the conclusion correctly.

contents:
1. Programs and main results for quantitative data
(1) Paired-sample test: ① normality test; ② t-test for paired-design data
(2) One-sample test: ① normality test; ② t-test for one-sample data
(3) Two-independent-samples test: ① normality test; ② test of equality of variances; ③ t-test for two independent samples
2. Programs and main results for qualitative data
(1) Fourfold table data: Tmin and the continuity-adjusted chi-square
(2) R×C table data: the criteria for applying the R×C χ² test
3. Programs and main results for linear correlation and regression
(1) Locate the correlation coefficient and the P value of its hypothesis test.
(2) Locate the regression coefficient and the P value of its hypothesis test.
(3) Write out the regression equation from the main results: $\hat{Y} = a + bX$
(4) Compare the hypothesis-test results for the correlation and regression coefficients: $t_r = t_b$
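The equality in item (4) follows from the standard formulas. A brief sketch is given here, assuming the usual notation, not all of which appears in the text: $L_{YY}$ is the corrected sum of squares of Y, $S_{Y \cdot X}$ is the residual standard deviation and $S_b$ is the standard error of b.

```latex
\begin{aligned}
t_b &= \frac{b}{S_b}, \qquad
S_b = \frac{S_{Y \cdot X}}{\sqrt{L_{XX}}}, \qquad
S_{Y \cdot X} = \sqrt{\frac{L_{YY} - L_{XY}^2 / L_{XX}}{n - 2}} \\
b &= \frac{L_{XY}}{L_{XX}} = r \sqrt{\frac{L_{YY}}{L_{XX}}}, \qquad
L_{YY} - \frac{L_{XY}^2}{L_{XX}} = L_{YY}\,(1 - r^2) \\
t_b &= \frac{r \sqrt{L_{YY} / L_{XX}} \, \sqrt{L_{XX}}}{\sqrt{L_{YY}\,(1 - r^2) / (n - 2)}}
     = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}} = t_r
\end{aligned}
```

Both statistics have n - 2 degrees of freedom, which is why the two tests printed by the software give the same P value for a single X and a single Y.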

The teaching of practice (6 hours)


Emphasis:
Students apply SAS programs to analyse data, find the main results and then draw conclusions by themselves.

Contents:
1. Programs and main results for quantitative data: (1) paired-sample test; (2) one-sample test; (3) two-independent-samples test.
2. Programs and main results for qualitative data: (1) fourfold table data; (2) R×C table data.
3. Programs and main results for linear correlation and regression: (1) locate the correlation and regression coefficients; (2) compare the hypothesis tests for the correlation and regression coefficients; (3) write out the regression equation: $\hat{Y} = a + bX$
4. Do the exercises:
(1) Many studies show that the mean biparietal diameter (BPD) of normal male neonates is 9.3 cm. A doctor examined 12 normal male neonates from a mountainous area and recorded their BPD (cm) as follows: 9.95 9.33 9.49 9.00 10.09 9.15 9.52 9.33 9.16 9.37 9.11 9.27. Test whether the BPD of male neonates in the mountainous area is greater than that of neonates in general.
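The course uses SAS for this exercise; as a language-neutral check, the one-sample t statistic can be sketched in Python with the standard library only. The sketch assumes the data pass the normality test (not carried out here) and uses a one-sided test because the question asks whether the mean is greater. The critical value is read from a standard t table.

```python
import math
import statistics

# BPD measurements (cm) of the 12 neonates from exercise (1).
bpd = [9.95, 9.33, 9.49, 9.00, 10.09, 9.15,
       9.52, 9.33, 9.16, 9.37, 9.11, 9.27]
mu0 = 9.3  # population mean quoted in the exercise

n = len(bpd)
mean = statistics.mean(bpd)
sd = statistics.stdev(bpd)              # sample SD, n - 1 in the denominator
t = (mean - mu0) / (sd / math.sqrt(n))  # one-sample t statistic
df = n - 1

# One-sided critical value t(0.05, 11) from a standard t table.
T_CRIT_ONE_SIDED_05_DF11 = 1.796
reject_h0 = t > T_CRIT_ONE_SIDED_05_DF11

print(f"mean = {mean:.4f}, sd = {sd:.4f}, t = {t:.3f}, df = {df}")
print("reject H0" if reject_h0 else "do not reject H0")
```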

(2) In a nutritional study, 13 children were given the usual diet plus vitamin A and D tablets, while a second comparable group of 12 children took the usual diet only. After one year, the gain in weight in pounds was recorded as given below. Can we say that vitamins A and D were responsible for the difference?
Children on usual diet: 1 3 2 4 2 1 3 4 3 4 3 2 2 3
Children on vitamins: 5 3 4 3 2 6 3 2 3 6 7 5 3
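A minimal Python sketch of the two-independent-samples procedure for this exercise follows. It assumes both groups pass the normality test (not carried out here) and shows only the variance ratio and the pooled-variance t statistic. Group sizes are taken from the values as listed, and the variable names are illustrative.

```python
import math
import statistics

# Weight gains (pounds) exactly as listed in exercise (2).
usual_diet = [1, 3, 2, 4, 2, 1, 3, 4, 3, 4, 3, 2, 2, 3]
vitamins = [5, 3, 4, 3, 2, 6, 3, 2, 3, 6, 7, 5, 3]

n1, n2 = len(vitamins), len(usual_diet)
mean1, mean2 = statistics.mean(vitamins), statistics.mean(usual_diet)
var1, var2 = statistics.variance(vitamins), statistics.variance(usual_diet)

# Variance ratio (larger over smaller) for the equality-of-variances check.
f_ratio = max(var1, var2) / min(var1, var2)

# Pooled-variance t statistic, used when the variances can be treated as equal.
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t = (mean1 - mean2) / se_diff
df = n1 + n2 - 2

print(f"F = {f_ratio:.3f}, t = {t:.3f}, df = {df}")
```

The F ratio is compared with the tabulated critical value for the corresponding degrees of freedom before the pooled t statistic is used; if the variances differ significantly, an unequal-variance version of the test would be the usual alternative.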

(3) Patients with lymphoma were randomly divided into two groups and treated with single or compound medication, respectively. The numbers of patients getting better are given below. Test whether the two rates are significantly different.
Number of lymphoma patients treated
Group                  Getting better   Not getting better
Single medication      2                10
Compound medication    14               14
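The role of Tmin and the continuity-adjusted chi-square for a fourfold table can be sketched in Python as follows. The decision rule coded here (use the adjusted statistic when n >= 40 and 1 <= Tmin < 5) is the common textbook criterion and is an assumption, since the text does not spell the rule out. Variable names are illustrative.

```python
# Fourfold table from exercise (3): rows = treatment, columns = (better, not better).
a, b = 2, 10     # single medication
c, d = 14, 14    # compound medication

n = a + b + c + d
row_totals = (a + b, c + d)
col_totals = (a + c, b + d)

# Smallest expected (theoretical) frequency, T_min.
t_min = min(row_totals) * min(col_totals) / n

# Pearson chi-square and its continuity-adjusted version for a 2x2 table.
denominator = row_totals[0] * row_totals[1] * col_totals[0] * col_totals[1]
chi2 = n * (a * d - b * c) ** 2 / denominator
chi2_adj = n * max(abs(a * d - b * c) - n / 2, 0) ** 2 / denominator

# Assumed textbook rule: adjusted statistic when n >= 40 and 1 <= T_min < 5.
use_adjusted = n >= 40 and 1 <= t_min < 5

CHI2_CRIT_05_DF1 = 3.84  # tabulated critical value, alpha = 0.05, 1 df
statistic = chi2_adj if use_adjusted else chi2
print(f"T_min = {t_min:.2f}, chi2 = {chi2:.3f}, adjusted chi2 = {chi2_adj:.3f}")
print("significant" if statistic > CHI2_CRIT_05_DF1 else "not significant")
```

For these data the unadjusted and adjusted statistics fall on opposite sides of the critical value, which illustrates why the choice of criterion matters.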

(4) Officers of the FDA want to examine aflatoxin pollution of peanuts from 3 areas. The results are given below. Compare whether the aflatoxin pollution rates of the three areas differ.
Comparison of aflatoxin pollution rates of the three areas
Area     Number of samples                  Pollution rate (%)
         Not polluted   Polluted   Total
A area   6              23         29       79.3
B area   30             14         44       31.8
C area   8              3          11       27.3
Total    44             40         84       47.6
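A minimal Python sketch of the R×C chi-square calculation for this table is given below. It also reports the smallest expected count, because the usual criterion for applying the test (assumed here, not stated in the text) is that no expected count is below 1 and no more than one fifth of the cells have expected counts below 5.

```python
# Observed counts from exercise (4): one row per area, two count columns as listed.
observed = [
    [6, 23],   # A area
    [30, 14],  # B area
    [8, 3],    # C area
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]
t_min = min(min(row) for row in expected)

# Pearson chi-square statistic and its degrees of freedom.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

CHI2_CRIT_05_DF2 = 5.99  # tabulated critical value, alpha = 0.05, 2 df
print(f"T_min = {t_min:.2f}, chi2 = {chi2:.2f}, df = {df}")
print("rates differ" if chi2 > CHI2_CRIT_05_DF2 else "no significant difference")
```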

(5) During a laboratory experiment, muscular contractions of a frog muscle were measured against different doses of a given drug. The height of the curve was taken as the response to the drug. The observations were as follows.

Serial number of experiment   1      2      3      4      5
Dose of drug                  0.3    0.4    0.6    0.8    0.9
Response to drug              54.0   59.0   60.0   65.0   70.0
a. Calculate the correlation coefficient and its significance.
b. Determine the regression coefficient ‘b’.
c. Determine the expected value of Y for the given values of X using the regression equation $\hat{Y} = a + bX$.
