
Confidence Intervals

Prof. Benjamin HK Yip


Division Family Medicine
School of Public Health and Primary Care

Overview
Outline
• Background
• Confidence intervals (CIs)
• Examples

Learning Objectives
• To understand CI construction
• To be able to name 3 factors that affect CIs
• To be able to interpret CIs found in the literature

Motivation
• Research/clinical question: Do HK young adults have a low BMD (L2-L4)?
• Example: In this particular class, the sample mean BMD (L2-L4) is 0.96 g/cm².
• Questions:
  ◦ How meaningful is this sample mean?
  ◦ Would you trust this estimate?

Statistical inference
• Methods for drawing conclusions about a population from sample data:
  ◦ Parameter estimation and confidence intervals
  ◦ Hypothesis testing (p-values)
• Question: What allows us to make valid inferences about a population based only on a sample?
  ◦ Probability (see previous lecture)
  ◦ A random process, i.e., randomization is the key

Why should I sample instead of using the entire population?
Reasons to avoid using the entire population:
• Cost ($) and time
• Impractical
• Inaccurate
  ◦ There is a lot of error to control and monitor
  ◦ Lists are rarely up to date
• Random sampling


Terminology
Population
• The group of individuals or objects that we would like to study
Sample
• A subset of a population. It should represent the characteristics of the population if it was drawn randomly.

Terminology
Population parameter
• A quantity that describes a population
Sample statistic
• An estimate of the population parameter, computed from the sample
Statistical inference
• The process of drawing conclusions about a population based on observations in a sample
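
Framework for statistical inference
[Figure: Population → Sample by sampling (random sampling, study design); Sample → Population by inference (statistical estimation, hypothesis testing). Population parameters (e.g. μ, σ²) correspond to the sample statistics x̄, s², r, p.]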

Example of population, sample and parameters
Arbitrary population:
• Objective: Smoking ⇒ cancer
• Population: ???
• The underlying true process, which holds universally for the population

Definitions
Confidence interval (CI)
• A range of values that probably contains the population value

Confidence limits
• The values that mark the boundaries of the confidence interval

Construction
Most CIs have the following form:

Sample statistic ± (critical value) × (SE of sample statistic)

The second term, (critical value) × (SE of sample statistic), is the margin of error.

Construction
• The sample statistic is a point estimate based on the sampled data (e.g., sample mean, sample proportion).
• The critical value represents the desired confidence level, based on distribution theory (normal, t, Poisson).
• The SE of the sample statistic measures the precision of the sample estimate. When the estimate is a mean (central tendency), it is also called the standard error of the mean (SEM). SE differs from SD, but they are related (see later slides).

Critical value
• Decide which distribution the desired CI is based on.
  ◦ Continuous: Normal or Student's t
  ◦ Count: Poisson
  ◦ Binary: Binomial
• In general, the normal (z-table) is the default distribution, provided the sample size is large enough (central limit theorem).
• Decide the type I error rate (α): the probability of incorrectly claiming a significant result (false positive). In general:
  ◦ α = 0.05

95% CI for the mean
• Sample statistic: sample mean
• α = 0.05
• Critical value: $z_{1-\alpha/2} = z_{1-0.05/2}$, i.e. the 100(1 − 0.05/2)th = 97.5th percentile of the standard normal distribution. From the standardized normal table (the so-called z-table), this value is 1.96.
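
The table lookup can also be reproduced numerically. The following is a minimal sketch using Python's standard library; the helper name `z_critical` is made up for illustration and is not part of the lecture.

```python
from statistics import NormalDist


def z_critical(alpha: float) -> float:
    """Two-sided critical value z_{1-alpha/2} of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)


z = z_critical(0.05)
print(round(z, 2))  # 1.96

# Probability that a standard normal value falls between -z and +z
coverage = NormalDist().cdf(z) - NormalDist().cdf(-z)
print(round(coverage, 2))  # 0.95
```

Changing `alpha` to 0.01 or 0.10 gives the critical values for 99% and 90% CIs in the same way.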

Recall
$\Pr(-1.96 < z < 1.96) = 0.95$

SE and SD
                     Population    Sample (unbiased estimator)
Mean                 m or μ        $\bar{x}$
Standard deviation   σ             $SD = \sqrt{\frac{1}{n-1}\sum_i (x_i - \bar{x})^2}$

$SE = \frac{\sigma}{\sqrt{N}} \approx \frac{SD}{\sqrt{N}}$

SE vs SD
• The standard deviation (SD) tells you the variability of your data.
• The standard error of the mean (SEM) tells you how good your estimate of the mean is (its precision). It is in general smaller than the SD, but don't let this be a reason for choosing it!
• Which one to use depends on the context: do you want to describe the variability of the data or the precision of your estimate of the mean?
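
To make the distinction concrete, here is a small sketch that computes both quantities for one sample. The BMD values are invented purely for illustration, and `standard_error` is a hypothetical helper name.

```python
import math
import statistics

# Made-up BMD measurements (g/cm^2), for illustration only
bmd = [0.91, 0.98, 1.02, 0.94, 0.95, 0.99, 0.93, 0.96]


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: SD divided by the square root of the sample size."""
    return sd / math.sqrt(n)


sd = statistics.stdev(bmd)          # sample SD, uses the n-1 denominator
se = standard_error(sd, len(bmd))   # precision of the sample mean

print(f"SD = {sd:.3f}, SE = {se:.3f}")

# With the same SD, a sample four times as large halves the SE,
# while the SD itself still describes the spread of the data.
se_35 = standard_error(19, 35)
se_140 = standard_error(19, 140)
```

The SD describes the data and does not shrink systematically with more observations; the SE describes the estimate and does.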
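
Construction of 95% CI
Here n and N both denote the sample size.

Mean (large sample):
$\bar{x} \pm z_{1-\alpha/2} \times SE \approx \bar{x} \pm 1.96 \times \frac{SD}{\sqrt{N}}$

Mean (small sample):
$\bar{x} \pm t_{n-1} \times \frac{SD}{\sqrt{N}}$

Proportion (large sample):
$p \pm z_{1-\alpha/2} \times SE \approx p \pm 1.96 \times \sqrt{\frac{p(1-p)}{n}}$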
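
As an illustration of the large-sample proportion formula, the sketch below computes a 95% CI for a hypothetical survey in which 40 of 200 respondents have the outcome. The counts and the function name `ci_proportion` are assumptions for the example. The small-sample t interval is left out because Python's standard library has no t-distribution quantile function.

```python
import math
from statistics import NormalDist


def ci_proportion(successes: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Large-sample (normal approximation) CI for a proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    se = math.sqrt(p * (1 - p) / n)           # SE of the sample proportion
    return p - z * se, p + z * se


# Hypothetical data: 40 of 200 respondents
low, high = ci_proportion(40, 200)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

The normal approximation is only reasonable when the sample is large and p is not too close to 0 or 1.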

Probability and Confidence Interval
• From the CLT we know that $\frac{\bar{x} - \mu}{SE} = z \sim N(0, 1)$
• From a N(0,1) table we have $\Pr\left(-1.96 < \frac{\bar{x} - \mu}{SE} < 1.96\right) = 0.95$
• Rearranging gives $\Pr(\bar{x} - 1.96 \times SE < \mu < \bar{x} + 1.96 \times SE) = 0.95$
• Thus, the interval $\bar{x} \pm 1.96 \times SE$ is a 95% CI for μ

November 08, 2012 Benjamin Yip

*Theory behind CI
• Constructing a CI is simple and needs only 3 components: the statistic (e.g., mean), the SE of the statistic, and the desired confidence level.
• However, the logic behind it is more complicated. It involves three types of standard deviation (SD): the SD of the population, the SD of the sample, and the SD of the sampling distribution.

"When we calculate the sample mean we are usually interested not in the mean of this particular sample, but in the mean for individuals of this type – in statistical terms, of the population from which the sample comes. We usually collect data in order to generalize from them and so use the sample mean as an estimate of the mean for the whole population. Now sample mean will vary from sample to sample; the way this variation occurs is described by the 'sampling distribution' of the mean. We can estimate how much sample means will vary from the standard deviation of this sampling distribution, which we call the standard error of the estimates of the mean. As the SE is a type of standard deviation, confusion is understandable."
Altman & Bland, BMJ 2005;331:903

[Figure: 95% CIs for the mean diastolic BP from 20 simulated studies, 50 subjects per simulation. A reference line indicates the true mean (75 mmHg). Only 1 CI missed the true mean.]
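
A simulation of this kind can be sketched as follows. The true mean of 75 mmHg and the 50 subjects per study come from the figure; the population SD of 10 mmHg, the normal population, the seed, and the function name `simulate_coverage` are assumptions made only for this sketch.

```python
import math
import random
import statistics
from statistics import NormalDist


def simulate_coverage(n_studies: int, n_subjects: int,
                      true_mean: float = 75.0, true_sd: float = 10.0,
                      seed: int = 0) -> float:
    """Fraction of simulated 95% CIs that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.975)
    hits = 0
    for _ in range(n_studies):
        # Draw one simulated study from an assumed normal population
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n_subjects)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n_subjects)
        if mean - z * se < true_mean < mean + z * se:
            hits += 1
    return hits / n_studies


# Many repetitions show the long-run behaviour: roughly 95% of CIs cover 75 mmHg
coverage = simulate_coverage(2000, 50)
print(f"Coverage over 2000 simulated studies: {coverage:.3f}")
```

With only 20 studies, as in the figure, the number of CIs that miss the true mean varies from run to run; about 1 in 20 is the expected long-run rate.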

An example
• Suppose that you would like to compare the effects of a newly developed drug (drug A) and a current drug (drug B) on systolic blood pressure (SBP).
• Say 35 patients were randomly assigned to receive drug A and another 35 to drug B. The mean SBP among drug A and drug B patients was 107 mmHg (SD = 19) and 125 mmHg (SD = 20), respectively.
• Construct a 95% CI for each group. Do the CIs overlap, and what is the interpretation?
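
95% CI for Drug A
$95\%\ CI = \text{mean} \pm z_{1-\alpha/2} \times SE = \text{mean} \pm z_{1-\alpha/2} \times \frac{SD}{\sqrt{N}}$

Mean = 107 mmHg
$z_{1-\alpha/2}$ = 1.96
SE = SD/sqrt(N) = 19/sqrt(35) = 3.21
95% CI = 107 ± 1.96 × 3.21, or (100.7, 113.3) mmHg

95% CI for Drug B
$95\%\ CI = \text{mean} \pm z_{1-\alpha/2} \times SE = \text{mean} \pm z_{1-\alpha/2} \times \frac{SD}{\sqrt{N}}$

Mean = 125 mmHg
$z_{1-\alpha/2}$ = 1.96
SE = SD/sqrt(N) = 20/sqrt(35) = 3.38
95% CI = 125 ± 1.96 × 3.38, or (118.4, 131.6) mmHg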
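
The two intervals can be checked with a few lines of code. This is a sketch using the summary statistics from the example (107 mmHg with SD 19, 125 mmHg with SD 20, 35 patients per group); the function name `ci_mean` is hypothetical.

```python
import math
from statistics import NormalDist


def ci_mean(mean: float, sd: float, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Large-sample CI for a mean: mean ± z_{1-alpha/2} × SD/sqrt(N)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se


drug_a = ci_mean(107, 19, 35)
drug_b = ci_mean(125, 20, 35)

print(f"Drug A: ({drug_a[0]:.1f}, {drug_a[1]:.1f}) mmHg")
print(f"Drug B: ({drug_b[0]:.1f}, {drug_b[1]:.1f}) mmHg")

# The intervals overlap only if the upper limit of A reaches the lower limit of B
overlap = drug_a[1] >= drug_b[0]
print("CIs overlap:", overlap)
```

With 35 patients per group, a t-based critical value would give slightly wider intervals; the z value is used here to match the slides.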

Graph the CIs
[Figure: the 95% CIs for Drug A and Drug B plotted on an SBP axis running from 100 to 140 mmHg]
Non-overlapping CIs indicate a statistically significant mean difference: drug A lowers SBP more effectively than drug B.
In general:
Source: http://www.measuringusability.com/blog/ci-10things.php

Factors that affect the width of a CI are:
• Targeted confidence level, 1 − α (higher confidence level, wider CI)
• Sample size, N (larger sample size, narrower CI)
• Variability or standard deviation, σ (or SD) (higher SD, wider CI)

$\text{mean} \pm z_{1-\alpha/2} \times \frac{SD}{\sqrt{N}}$

Factors that affect the width of a CI are:
Targeted confidence level, 1 − α
• Intuition: a higher confidence level without improving data quality means a larger margin of error.
• As the targeted confidence level increases, the CI width increases, given that all other quantities remain unchanged.

Factors that affect the width of a CI are:
Sample size, N
• Intuition: a larger sample size means more information, which implies better inference.
• As the sample size increases, the CI width decreases, given that all other quantities remain unchanged.

Factors that affect the width of a CI are:
Variability, SD
• Intuition: more variability or a larger spread makes it more difficult to estimate the population value without large amounts of data.
• As the variability increases, the CI width increases, given that all other quantities remain unchanged.
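
The three effects can be seen by varying one quantity at a time in the large-sample formula for a mean. This is a minimal sketch; the baseline values (SD = 20, N = 35, α = 0.05) and the function name `ci_width` are chosen only for illustration.

```python
import math
from statistics import NormalDist


def ci_width(sd: float, n: int, alpha: float = 0.05) -> float:
    """Full width of the large-sample CI for a mean: 2 × z_{1-alpha/2} × SD/sqrt(N)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * z * sd / math.sqrt(n)


base = ci_width(sd=20, n=35)                     # 95% CI, baseline
higher_conf = ci_width(sd=20, n=35, alpha=0.01)  # 99% CI: wider
larger_n = ci_width(sd=20, n=140)                # 4x the sample size: half the width
larger_sd = ci_width(sd=40, n=35)                # 2x the SD: twice the width

for label, width in [("baseline", base), ("99% level", higher_conf),
                     ("N = 140", larger_n), ("SD = 40", larger_sd)]:
    print(f"{label:10s} width = {width:.2f}")
```

Because the width depends on 1/sqrt(N), halving the width requires four times as many subjects.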

5 things to know about CI
1. A CI tells you the most likely range of the unknown population parameter (e.g., mean, proportion).
2. A CI provides both the location and precision of a measure.
3. Three things influence the width of a CI (α, N, SD).
4. A CI estimated from sample data may or may not contain the population average.
5. Overlap in CIs is a quick way to check for statistical significance. However, the term "significance" is more related to hypothesis testing.
