Measures of Spread
Range: minimum to maximum values (Interquartile range: Q1 to Q3, i.e. the median of the bottom half to the median of the top half)
Variance (s²): average squared difference of each data value from the mean
$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$, where n = number of data values and $\bar{X}$ = mean
Standard Deviation (s): square root of variance, SD = √Variance
MOST commonly used measure of spread. Unlike the range, it does not systematically increase with sample size. If n is large and the data are symmetric / unimodal:
Then Mean ± 1s ≈ 68% of data
Mean ± 2s ≈ 95% of data
Mean ± 3s ≈ 99.7% of data
Standard Error: Describes the variability of a sample statistic, used primarily for
constructing confidence intervals. $SE = \frac{SD}{\sqrt{n}}$
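A minimal sketch of the variance, standard deviation and standard error formulas above. The data values are invented for illustration and are not from the study guide.

```python
import math
import statistics

# Hypothetical sample (values invented for illustration only)
data = [4.0, 7.0, 6.0, 3.0, 5.0]

n = len(data)
mean = statistics.mean(data)

# Sample variance: sum of squared deviations from the mean, divided by (n - 1)
variance = sum((x - mean) ** 2 for x in data) / (n - 1)

# Standard deviation is the square root of the variance
sd = math.sqrt(variance)

# Standard error of the mean: SD divided by the square root of n
se = sd / math.sqrt(n)

# The standard library computes the same sample variance and SD
assert math.isclose(variance, statistics.variance(data))
assert math.isclose(sd, statistics.stdev(data))

print(f"mean={mean}, s^2={variance}, s={sd:.3f}, SE={se:.3f}")
```

Note that SE shrinks as n grows, while SD estimates a fixed property of the population.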
Sources of Variability
Biologic: Inter-individual: different people vary
Intra-individual: same individual varies over time
Measurement: Inter-observer: 2 observers have different values (kappa)
Intra-observer: same observer varies over time
Analytical: mechanical or laboratory error
Study Guide – Biostatistics Page 2
Probability: for an event that can either occur or not occur. ∩ = “AND”, ∪ = “OR”
P(A) = probability that event A occurs
0 ≤ P(A) ≤ 1 for any event A
Independence: Events are independent if the occurrence of one has no effect on the other
P(A│B) = P(A) and P(B│A) = P(B)
Therefore;
P (A∩B) = P(B) * P(A│B) = P(B) P(A)
MULTIPLICATION RULE
P(A∩B∩…∩K) = P(A) × P(B) × … × P(K) (if independent events)
Complementary: Events are mutually exclusive and together contain all the outcomes in the sample space
If 2 events A and Ā are complementary, then P(A) = 1 – P(Ā)
Conditional Probability: The conditional probability that the event A occurs given that the
event B has occurred: P(A│B) = P(A∩B) / P(B), provided P(B) > 0
Bayes’ Theorem: P(A│B) = P(B│A) P(A) / [P(B│A) P(A) + P(B│Ā) P(Ā)]
EXAMPLE: Virtual colonoscopy has a sensitivity of 90% and a specificity of 96%. What is the
positive predictive value (PPV) given a prevalence of 5%?
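One way to work the example is to apply Bayes' theorem with A = "disease present" and B = "test positive", so that PPV = P(disease│positive) = sensitivity × prevalence / [sensitivity × prevalence + (1 − specificity) × (1 − prevalence)]. The function name below is hypothetical; the three inputs are the values given in the example.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV = P(disease | positive test), computed with Bayes' theorem."""
    # P(positive and diseased): true positives
    true_pos = sensitivity * prevalence
    # P(positive and not diseased): false positives
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


ppv = positive_predictive_value(sensitivity=0.90, specificity=0.96, prevalence=0.05)
print(f"PPV = {ppv:.3f}")  # 0.045 / (0.045 + 0.038)
```

With these inputs the PPV is about 54%: even with a fairly specific test, low prevalence means nearly half of the positive results are false positives.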
Changing mean → changes location, not shape of curve (i.e. moves left or right)
Changing standard deviation → changes shape, not location of curve (flat, tall)
Increasing standard deviation → curve flattens
SAMPLING DISTRIBUTIONS:
Sampling distribution of the mean: Normal population, Repeated sampling from a normal
distribution with mean μ and SD σ gives a normal distribution of sample means
SE (standard error) of the mean = σ / √n, which is the square root of the variance of the sample mean (σ²/n)
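Areas under the curve for the sampling distribution of the mean can be standardized
because it has a normal distribution:
$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$
Recall $\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$, and SE is its square root, $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$, so $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$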
Sampling distribution of the mean: Non-normal population: If sample size is large (more
than 30), the sampling distribution of the mean is still approximately normal due to the Central Limit Theorem
Central Limit Theorem: If a population has a finite mean (μ) and finite variance (σ²), then the
distribution of sample means derived from repeated sampling from this population approaches
the normal distribution with mean μ and variance σ²/n as the sample size increases (n > 30 is a common rule of thumb).
The MEANS from MULTIPLE samples of the same population will have a normal distribution (even if the data themselves do not)
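The Central Limit Theorem can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is exponential with mean 1 and SD 1 (a skewed, non-normal distribution chosen for illustration), each sample has n = 40 values, and the seed is fixed only to make the run repeatable.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

n = 40              # size of each sample (above the "more than 30" rule of thumb)
num_samples = 5000  # number of repeated samples

# Skewed, non-normal population: exponential with mean 1 and SD 1 (an assumption)
population_mean = 1.0
population_sd = 1.0

# Draw repeated samples and record the mean of each one
sample_means = [
    statistics.mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_se = population_sd / math.sqrt(n)  # sigma / sqrt(n)

print(f"mean of sample means = {mean_of_means:.3f} (population mean {population_mean})")
print(f"SD of sample means   = {sd_of_means:.3f} (sigma/sqrt(n) = {expected_se:.3f})")
```

The sample means cluster around μ with a spread close to σ/√n, even though the individual data values are strongly skewed.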
Standard Normal (Z) Distribution: Normal distribution with μ =0 and σ = 1. Area under
curve = 1, area under any portion of curve/distribution = probability of observing value there.
Z-Table allows calculation of area under curve between any 2 points on x-axis
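In place of a printed Z-table, the area between two points can be computed from the cumulative distribution function. A minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name `area_between` is hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)


def area_between(a, b):
    """Area under the standard normal curve between z = a and z = b."""
    return standard_normal.cdf(b) - standard_normal.cdf(a)


for limit in (1, 2, 3):
    print(f"P(-{limit} < Z < {limit}) = {area_between(-limit, limit):.4f}")

# The familiar 95% region uses z = 1.96 rather than exactly 2
print(f"P(-1.96 < Z < 1.96) = {area_between(-1.96, 1.96):.4f}")
```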
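[Figure: normal curve with a central confidence interval and area α/2 in each tail]
The wider the interval, the MORE confident we are that it contains the true mean.
The narrower the interval, the LESS confident we are,
but the MORE PRECISE the estimate (smaller range).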
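The trade-off between confidence and precision can be seen by computing intervals at several confidence levels for the same data. This sketch assumes a z-based interval (mean ± z × SD/√n) and uses invented summary values (mean 100, SD 15, n = 36); the function name `mean_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def mean_ci(sample_mean, sd, n, confidence):
    """z-based confidence interval for a mean, with alpha/2 in each tail."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z_crit * sd / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# Hypothetical summary statistics (invented for illustration)
widths = {}
for level in (0.90, 0.95, 0.99):
    lower, upper = mean_ci(sample_mean=100.0, sd=15.0, n=36, confidence=level)
    widths[level] = upper - lower
    print(f"{level:.0%} CI: {lower:.2f} to {upper:.2f} (width {widths[level]:.2f})")
```

Higher confidence levels give wider, less precise intervals for the same sample.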
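Confidence interval for a proportion: $p \pm z_{1-\alpha/2} \sqrt{\frac{p(1-p)}{n}}$
Binomial mean and variance: $\mu = np$, $\sigma^2 = npq$ (q = 1 − p)
Confidence Interval: 95% CI = $\hat{p} \pm z_{.975} \sqrt{\frac{\hat{p}\hat{q}}{n}}$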
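A minimal sketch of the 95% confidence interval for a proportion, using the normal approximation p̂ ± z × √(p̂q̂/n). The counts (40 events in 100 subjects) are invented for illustration, and the function name `proportion_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a binomial proportion."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    # z_{.975} for a 95% interval (about 1.96)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z_crit * math.sqrt(p_hat * q_hat / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical data: 40 events among 100 subjects
lower, upper = proportion_ci(successes=40, n=100)
print(f"p-hat = 0.40, 95% CI = {lower:.3f} to {upper:.3f}")
```

The approximation is usually considered reasonable only when both np and nq are not small.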
HYPOTHESIS TESTING
Ho = Null hypothesis
Must contain =, ≥, or ≤
Investigator usually wants to reject Ho (find Ho to be false)
Ha = Alternative hypothesis
Disagrees with Ho
∴ thought to be true by the investigator
Outcomes of hypothesis testing
Ho is actually true and it is accepted
Ho is actually false and it is rejected
Ho is actually true and it is rejected (Type I Error) (reject truth)
Ho is actually false and it is accepted (Type II Error) (accept false)
                     Ho (Null Hypothesis)
Decision         Actually True      Actually False
Accept Ho        Correct            Type II Error
Reject Ho        Type I Error       Correct
P-value:
Not a test statistic (it is a measure of probability)
Significance level at which the observed value of the test statistic would just be
significant
Probability of observing a difference from Ho at least as large as the one observed, by chance alone, when Ho is true
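As an illustration of the distinction between a test statistic and a p-value, the sketch below converts a z statistic into a two-sided p-value. It assumes the test statistic follows the standard normal distribution under Ho; the function name is hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(z_stat):
    """Probability, under Ho, of a z statistic at least as extreme as z_stat."""
    upper_tail = 1 - NormalDist().cdf(abs(z_stat))
    return 2 * upper_tail  # both tails count as "at least as extreme"


for z_stat in (0.5, 1.96, 2.58):
    print(f"z = {z_stat}: p = {two_sided_p_value(z_stat):.4f}")
```

A z statistic of 1.96 gives p of about 0.05, which is why 1.96 is the critical value for a two-sided test at the 5% significance level.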
To use Chi-Square, data must be discrete and mutually exclusive, and each cell count must be >5
Fisher’s Exact Test: used when sample sizes are small (cell counts in the table are less than 5)
Allows calculation of an exact p-value for 2x2 tables with small cell sizes
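A sketch of how the exact p-value can be obtained for a 2x2 table. It assumes the common two-sided definition: with the row and column totals held fixed, sum the hypergeometric probabilities of every table that is no more likely than the observed one. The cell labels a, b, c, d (top-left, top-right, bottom-left, bottom-right) and the example counts are assumptions for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def table_probability(x):
        # Hypergeometric probability that the top-left cell equals x,
        # given the fixed row and column totals
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = table_probability(a)
    smallest = max(0, col1 - row2)  # smallest feasible top-left count
    largest = min(row1, col1)       # largest feasible top-left count

    # Sum the probabilities of all tables as likely or less likely than the
    # observed one (small tolerance guards against floating-point ties)
    return sum(
        table_probability(x)
        for x in range(smallest, largest + 1)
        if table_probability(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical small table: 3 and 1 in the first row, 1 and 3 in the second
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"Fisher exact p = {p_value:.4f}")
```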
McNemar Chi Square = $\frac{(b - c)^2}{b + c}$
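A minimal sketch of the McNemar statistic, (b − c)² / (b + c). It assumes b and c are the two discordant cell counts of a paired 2x2 table (pairs in which the two members have different outcomes), which is the usual convention; the counts below are invented for illustration.

```python
def mcnemar_chi_square(b, c):
    """McNemar chi-square statistic from the two discordant pair counts."""
    if b + c == 0:
        raise ValueError("McNemar's test needs at least one discordant pair")
    return (b - c) ** 2 / (b + c)


# Hypothetical paired data: 15 pairs discordant one way, 5 the other way
statistic = mcnemar_chi_square(b=15, c=5)
print(f"McNemar chi-square = {statistic}")

# Compare against the chi-square critical value for 1 degree of freedom
CRITICAL_VALUE_05 = 3.84
print("significant at 0.05" if statistic > CRITICAL_VALUE_05 else "not significant at 0.05")
```

Only the discordant pairs enter the statistic; concordant pairs carry no information about a difference between the paired conditions.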
ANALYSIS OF VARIANCE (ANOVA): One continuous variable and one categorical variable
with at least 3 groups. Observations should be independent and normally distributed. Use F statistic to differentiate groups.
EXAMPLE: Does cholesterol vary by race? Study groups = black, white, Asian, other
Ho: μB = μW = μA = μO
Have to compare each of the populations to each other, which is tedious, but done by a
computer. Results can be used to find which populations are different from the others.
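The F statistic compares the variability between group means with the variability within groups. Below is a minimal sketch of a one-way ANOVA F calculation; the cholesterol values for the four groups are invented for illustration and the function name is hypothetical. It computes only the overall F statistic, not the pairwise comparisons or the p-value.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic for one-way ANOVA: mean square between / mean square within."""
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: group means versus the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: each value versus its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)        # k - 1 degrees of freedom
    ms_within = ss_within / (n_total - k)    # N - k degrees of freedom
    return ms_between / ms_within


# Hypothetical cholesterol values (mg/dL) for four study groups
cholesterol = {
    "black": [190, 200, 210],
    "white": [195, 205, 215],
    "Asian": [180, 190, 200],
    "other": [200, 210, 220],
}
f_statistic = one_way_anova_f(list(cholesterol.values()))
print(f"F = {f_statistic:.4f} with 3 and 8 degrees of freedom")
```

A large F suggests that at least one group mean differs; identifying which groups differ requires follow-up comparisons.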
Assumptions:
Linearity: data should follow a straight line
Homogeneity of variance: variability of y should be the same at each x
Normality: values of y at each x should be normally distributed
SURVIVAL ANALYSIS: Follows a group of people over time. Contribution of each subject
weighted by time. 4 person-years = one subject followed for 4 years, or any combination of subjects
whose follow-up totals 4 years (e.g. 4 subjects for 1 year each).
Survival analysis used vs. other statistics because patients followed for variable lengths of time
and information tends to be incomplete (people are lost to follow-up).
Kaplan-Meier Life Table: 6 columns, Last one = “cumulative survival rate at time t”
This can be made into a survival curve. These curves can be compared with various methods
Log Rank Test (Peto’s), Gehan’s Generalized Wilcoxon Test, Cox-Mantel Test,
Peto’s Generalized Wilcoxon Test, Cox’s F-Test, Mantel-Haenszel Chi Square Test
(So, it sucks Mantel Peto’s Cox to compare survival curves)
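The "cumulative survival rate at time t" column of a Kaplan-Meier table can be sketched as a running product: at each time with at least one event, multiply the previous survival by (number at risk − number of events) / number at risk. This is a minimal sketch, not a full life table; the follow-up times and event flags are invented, and subjects lost to follow-up are treated as censored (they leave the risk set without counting as events).

```python
def kaplan_meier(times, events):
    """Return [(time, cumulative survival)] at each time with at least one event.

    times  -- follow-up time for each subject
    events -- True if the event (e.g. death) occurred, False if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects whose follow-up ends at the same time t
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            survival *= (at_risk - deaths) / at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set after time t
        at_risk -= leaving
    return curve


# Hypothetical follow-up data: time in years, and whether the event occurred
follow_up_years = [2, 3, 3, 5, 8, 9]
event_occurred = [True, True, False, True, False, True]

km_curve = kaplan_meier(follow_up_years, event_occurred)
for t, s in km_curve:
    print(f"t = {t}: cumulative survival = {s:.3f}")
```

Plotting these points as a step function gives the survival curve that the tests listed above compare between groups.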
Cox Proportional Hazards Regression Model: Used to compare survival between
two groups that vary in age, sex, disease severity or other variables. Uses multiple logistic
regression techniques combined with survival analysis.
Repeated analysis of the data as they accrue over time and stopping the experiment or
trial when statistical significance is reached: leads to incorrect conclusions
Type of Data
Goal    Normal    Non-Normal (Rank, Score, or measure)    Binomial (2 possible outcomes)    Survival Time