Advanced Epidemiology Gebremedhin Berhe, MPH, Ass't Prof

Advanced Epidemiology
Gebremedhin Berhe, MPH, Ass’t Prof

Department of Epidemiology, School of Public Health,
Mekelle University
1
Survival/Event History Analysis
2
Session objective
• At the end of the session, students will be familiar with
– Basic concepts of survival
– Life Table (Actuarial Table)
– Kaplan-Meier (Product Limit) Approach
– Hazard function
– Log rank test
– Cox regression
3
Brain storming questions
• Cohort ( Open vs Closed cohort) population
• Cumulative incidence vs incidence density
• Cohort vs experimental study design
• Longitudinal/panel data
4
Survival Analysis
• This lecture introduces quantitative methods for a "time to event" outcome variable.
• A time-to-event variable reflects the time until a participant has an event of interest (e.g., heart attack, cancer remission, death).
5
Survival Analysis…
• The method is called ‘time to event analysis’ or ‘survival analysis’.
• “Survival” = remaining free of a particular outcome over time
• Analyse durations, i.e. the length of time to reach an endpoint
• Data are usually censored
– The sample is not followed long enough for everyone to reach the endpoint (e.g. death)
6
Examples of time to event data
• Time to death
• Time to incidence of disease
• Unemployed – time until finding a job
• Time to birth of first child
• Smokers – time until quitting smoking
7
Survival analysis: key concepts
– States
– Events
– Risk period(time)
– Spell
– Censoring
8
States
• States are categories of the outcome variable of interest
• Each person occupies exactly one state at any moment in time
• Examples
– alive, dead
– single, married, divorced, widowed
– never smoker, smoker, ex-smoker
• The set of possible states is called the state space

9
Events(failure)
• A transition from one state to another
• Events need to be precisely defined.
• Example
– Death; disease (diagnosis, start of symptoms, relapse)
– Menopause
– Recurrence; response
10
Risk period /“Time”:
• The period of time during which someone is at risk of a particular event is called the risk period
• All subjects at risk of an event at a given point in time are called the risk set
• Example
– Can only experience divorce if married
– Randomization in clinical trial is the time origin
11
Time…
• Time zero, or the time origin, is the time at which participants are considered at risk for the outcome of interest.
• Time at risk is measured from the start of the study (i.e., at enrollment)
• Follow-up time is measured from time zero until the event occurs, the study ends or the participant is lost.
12
Time…
• Need to know when the clock starts
– Time from recruitment into the study
– Time from employment
– Time from diagnosis (prognostic studies)
– Time from infection
– Calendar time
– Age
• Event history analysis concerns the duration of the non-occurrence of an event, i.e. the length of time spent in the risk period before the event occurs
13
Spell
• Many events are repeatable, for example unemployment, childbirth, migration, infection
• For repeatable events, the interval between the start of exposure and the next occurrence of an event is called a spell. We can have birth spells (intervals), migration spells (trips)
• In event history analysis, individual spells are treated as distinct observations. We can have multiple observations for the same person.
14
Censoring
• An observation is censored if it has incomplete information
• Occurs when information about duration is incomplete
• Different types
– Right
– Left
– Interval
15
Right censoring
• It is the most common type of censoring
• Happens when the person did not have an event during the time
that they were studied
• Common reasons for right censoring
– the study ends
– the person drops out of the study
– loss to follow-up
– administrative censoring
16
Example: clinical trial
[Figure: follow-up timelines running from time 0 to study end]
17
Drop-out or LTFU
[Figure: follow-up timelines running from time 0 to study end, with drop-out or loss to follow-up (LTFU)]
19
Competing Risks: another type of censoring
• Occurs when a person leaves the study because of death from another cause (i.e., death is not the event of interest)
• Event types ‘compete’ with one another
• Examples of competing events:
– Death from lung cancer
– Death from heart disease
– Death from car accident
20
Left censoring
• The event has occurred prior to the start of the study
• We know the event occurred, but not when it occurred prior to observation
• In this kind of study, the exact time would be known if it occurred after the study started
• Example:
– Survey question: when did you first smoke?
– Alzheimer's disease: onset generally hard to determine
– HPV: infection time

21
Interval censoring
• Due to discrete observation times, the actual event times are not observed
• Example: progression-free survival
– Progression of cancer defined by change in tumor size
– Measured at 3-6 month intervals
– If an increase occurs, it is known to be within the interval, but not exactly when.
• Recorded times are biased towards longer values
• Challenging issue when intervals are long

22
Survival function
• A survival function gives the probability of surviving beyond a specific point in time (denoted t).
• S(t) = P (surviving from time = 0 to time = t)
= P (surviving during interval = [0, t])
• or, equivalently,
• S(t) = P (surviving beyond time t) = P (T > t).
• S(t) is a non-increasing function: it starts at S(0) = 1 and never rises.
• Survival curves are often plotted as step functions

23
X-year survival rate
• Many applications have ‘landmark’ times that are historically used to quantify survival
• Examples:
– Breast cancer: 5 year relapse-free survival
– Pancreatic cancer: 6 month survival
– Acute myeloid leukemia (AML): 12 month relapse-free survival
Survival function…
Example:
• Follow up of 6 patients (2 years)
– 3 Deaths
– 2 censored (lost) before 2 years
– 1 survived 2 years
• Question: what is the cumulative incidence (or the
Cumulative Survival) up to 2 years?
25
[Figure: follow-up of the six patients in calendar time, January 1999 to January 2001]
Person ID (months of follow-up): 1 (24), 2 (6), 3 (18), 4 (15), 5 (13), 6 (3)
Legend: death; censored observation (lost to follow-up, withdrawal); ( ) number of months of follow-up
Crude survival: 3/6 = 50%
26
Change time scale to “follow-up” time:
[Figure: the same six follow-up periods aligned on follow-up time, 0 to 2 years]
Person ID (months of follow-up): 1 (24), 2 (6), 3 (18), 4 (15), 5 (13), 6 (3)
27
Estimating the Survival Function
• To summarize the experiences of the participants, we can use
– Life Table (Actuarial Table)
– Kaplan-Meier (Product Limit) Approach
• A life table summarizes the experiences of participants over a pre-defined follow-up period in a cohort study or in a clinical trial until the time of the event of interest or the end of the study, whichever comes first.
28
Life table…
• A life table gives the cumulative probability of surviving to the end of each interval.
• We first have to organize the follow-up times into equally spaced intervals.
• The number of intervals chosen matters and typically influences the analytic results.
29
Life table
Nt  = number at risk during interval
Nt* = average number at risk during interval (effective number of persons at risk)
Dt  = number of deaths during interval
Ct  = lost to follow-up during interval
qt  = proportion dying during interval
pt  = among those at risk, proportion surviving interval
St  = survival probability

Interval in years   Nt   Nt*               Dt   Ct   qt             pt                St
0                   6    6                 0    0    0              1                 1
1st yr              6    6 - (1/2) = 5.5   1    1    1/5.5 = 0.18   1 - 0.18 = 0.82   1(0.82) = 0.82
2nd yr              4    4 - (1/2) = 3.5   2    1    2/3.5 = 0.57   1 - 0.57 = 0.43   (0.82)(0.43) = 0.35
30
Life table…
• Assume that each censored observation contributes one-half of a person to the number at risk in the denominator (i.e., censoring occurs uniformly throughout the follow-up interval).
• Based on this assumption, the number of persons alive at the beginning of the interval (Nt) is adjusted to Nt* = Nt - Ct/2, called the effective number of persons at risk.
• All individuals in the same interval are assumed to have the same probability of death.
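
The actuarial calculation can be sketched in a few lines of Python. This is a minimal illustration for the six-patient example, not a general-purpose routine: the function name `life_table` and the record layout are assumptions made here, a death is counted in the interval whose upper bound it does not exceed, and a subject followed to the exact end of an interval is treated as completing it rather than as lost.

```python
# Follow-up in months for the six patients; died=False means censored.
patients = [(24, False), (6, False), (18, True), (15, False), (13, True), (3, True)]

def life_table(records, width=12, n_intervals=2):
    """Actuarial life table with equally spaced intervals of `width` months."""
    rows = []
    surv = 1.0
    for k in range(n_intervals):
        start, end = k * width, (k + 1) * width
        # Subjects still under observation when the interval opens.
        n_at_risk = sum(1 for t, _ in records if t > start)
        if n_at_risk == 0:
            break
        deaths = sum(1 for t, died in records if died and start < t <= end)
        # Assumption: only subjects censored strictly inside the interval
        # count as lost; reaching the end of the interval alive does not.
        censored = sum(1 for t, died in records if not died and start < t < end)
        effective_n = n_at_risk - censored / 2   # Nt* = Nt - Ct/2
        q = deaths / effective_n                 # proportion dying
        surv *= 1 - q                            # cumulative survival St
        rows.append({"interval": (start, end), "Nt": n_at_risk,
                     "effective_n": effective_n, "Dt": deaths,
                     "Ct": censored, "qt": q, "pt": 1 - q, "St": surv})
    return rows

for row in life_table(patients):
    print(row["interval"], row["effective_n"], round(row["St"], 2))
```

With 12-month intervals this gives effective numbers at risk of 5.5 and 3.5 and cumulative survival of about 0.82 and 0.35, matching the life table above.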
31
Life table…
• An issue with the life table approach shown above is that the
survival probabilities can change depending on how the intervals
are organized, particularly with small samples.
32
Kaplan-Meier (Product Limit) Approach
• Kaplan-Meier is a popular approach which addresses this issue by re-estimating the survival probability each time an event occurs.
• Assumptions: censoring is independent of the likelihood of developing the event of interest, and survival probabilities are comparable in participants who are recruited early and later into the study.
33
Kaplan-Meier
• When comparing several groups, it is also important that these assumptions are satisfied in each comparison group and that, for example, censoring is not more likely in one group than another.
• With the actuarial life table approach we consider equally spaced intervals, while with the Kaplan-Meier approach we use observed event times and censoring times
• If N is large and/or if life-table intervals are small, results are similar
34
Kaplan-Meier…
Survival probability: St+1 = St*((Nt+1 - Dt+1)/Nt+1)

Time, months   Number at risk Nt   Number of deaths Dt   Number censored Ct   Survival probability
0              6                   0                     0                    1
3              6                   1                     0                    1*(6-1)/6 = 0.83
6              5                   0                     1                    0.83*(5-0)/5 = 0.83
13             4                   1                     0                    0.83*(4-1)/4 = 0.62
15             3                   0                     1                    0.62*(3-0)/3 = 0.62
18             2                   1                     0                    0.62*(2-1)/2 = 0.31
24             1                   0                     1                    0.31*(1-0)/1 = 0.31

Here the intervals are not required to be of equal length

35
Kaplan-Meier Method
Calculate the cumulative probability of event (and survival) based on

conditional probabilities at each event time
Step 1: Sort the survival times from shortest to longest
Person ID (months of follow-up), sorted: 6 (3), 2 (6), 5 (13), 4 (15), 3 (18), 1 (24)
[Figure: the sorted follow-up periods on a 0 to 2 year axis]
37
Step 2: For each time of occurrence of an event, compute the
conditional survival
At 3 months, one of the 6 at risk dies and 5 survive. Thus:

• Incidence of event at exact time 3 months: 1/6
• Probability of survival beyond 3 months: 5/6=(1-1/6)=0.83
38
At 13 months, one of the 4 still at risk dies; 3 survive

• Conditional probability of survival beyond 13 months: 3/4 = (1-1/4) = 0.75
39
At 18 months, one of the 2 still at risk dies and 1 survives.

• Conditional probability of survival beyond 18 months: 1/2 = 0.5
40
Kaplan-Meier Method…
Conditional Probability of an Event (Survival)
• The probability of an event (or of survival) at time t (for the
individuals at risk at time t), that is, conditioned on being at
risk at exact time t.
Step 3: For each time of occurrence of an event, compute the cumulative survival (survival function) by multiplying the conditional probabilities of survival.
3 months: S(3) = 5/6 = 0.833
13 months: S(13) = 5/6 × 3/4 = 0.625
18 months: S(18) = 5/6 × 3/4 × 1/2 = 0.3125
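
Steps 1 to 3 can be sketched in Python as follows. This is a minimal product-limit calculation for the six-patient example; the function name `kaplan_meier` and the `(months, died)` record layout are assumptions made for illustration, and no confidence intervals are computed.

```python
# (months of follow-up, died); died=False means the observation is censored.
patients = [(3, True), (6, False), (13, True), (15, False), (18, True), (24, False)]

def kaplan_meier(records):
    """Return [(event_time, S(t))] at each distinct death time."""
    surv = 1.0
    curve = []
    # Step 1: go through the distinct death times from shortest to longest.
    for t in sorted({u for u, died in records if died}):
        # Risk set: everyone whose follow-up lasts at least until t.
        n_at_risk = sum(1 for u, _ in records if u >= t)
        deaths = sum(1 for u, died in records if died and u == t)
        # Step 2: conditional survival at t; Step 3: multiply it in.
        surv *= (n_at_risk - deaths) / n_at_risk
        curve.append((t, surv))
    return curve

for t, s in kaplan_meier(patients):
    print(f"S({t}) = {s:.4f}")
```

Censored subjects stay in the risk set until their censoring time and then drop out without changing the estimate, which is why the curve only steps down at 3, 13 and 18 months.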
41
Plotting the survival function:
Time (mo)   Si
3           0.833
13          0.625
18          0.3125
[Figure: step plot of survival (y-axis, 0.20 to 1.00) against month of follow-up (x-axis, 0 to 25), stepping down from 1.00 to 0.833, 0.625 and 0.3125]
The cumulative incidence (up to 24 months): 1-0.3125 = 0.6875 (or 69%)

42
Hazard Function (Cumulative Incidence Curve)
• A little harder to conceptualize
• An instantaneous relative rate h(t) is usually called a hazard rate in human populations and a failure rate in other contexts.
• It is sometimes called the force of mortality or an instantaneous rate of death or, from physics, relative velocity.
• Some investigators prefer to generate cumulative incidence curves, which show the cumulative probabilities of experiencing the event of interest, as opposed to survival curves
43
Hazard function
• The failure probability (cumulative incidence) is computed as 1-St and can be obtained easily from the Kaplan-Meier table. It is a cumulative probability, not the instantaneous hazard rate h(t).

Time, months   Number at risk Nt   Number of deaths Dt   Number censored Ct   Survival probability St   Failure probability 1-St
3              6                   1                     0                    0.8                       0.2
6              5                   0                     1                    0.8                       0.2
13             4                   1                     0                    0.6                       0.4
15             3                   0                     1                    0.6                       0.4
18             2                   1                     0                    0.3                       0.7
24             1                   0                     1                    0.3                       0.7
44
Plotting the hazard function:
[Figure: two step curves against month of follow-up (x-axis, 0 to 25). Left axis "Cumulative Survival": 0.8, 0.6, 0.3. Right axis "Cumulative Hazard", plotted as 1-St: 0.2, 0.4, 0.7]
45
Some hazard shapes
• Increasing
– Onset of Alzheimer's, natural aging and wear
• Decreasing
– Survival after surgery, early failures due to device or transplant failures
• U-shaped
– Age specific mortality, populations followed from birth
• Constant
– Time till next email arrives
46
Comparing Survival Curves
• We are often interested in assessing whether there are differences in survival (or cumulative incidence of event) among different groups of participants
• For example, in a clinical trial with a survival outcome, we compare survival between participants receiving a new drug and those receiving a placebo (or standard therapy).
• In an observational study, we might be interested in comparing survival between men and women
47
The Log Rank Test
• The test compares the entire survival experience between groups and can be thought of as a test of whether the survival curves are identical (overlapping) or not.
• Survival curves are estimated for each group, considered separately, using the Kaplan-Meier method and compared statistically using the log rank test.
48
Comparing survival by group using Kaplan-Meier graphs
[Figure: Kaplan-Meier survival curves for sex = male and sex = female; y-axis: survival probability, 0.00 to 1.00; x-axis: analysis time, 0 to 15]

49
The log-rank test….
• A non-parametric test that assesses the null hypothesis that there are no differences in survival times between groups
• The log rank statistic is approximately distributed as a chi-square test statistic (with degrees of freedom equal to the number of groups minus 1).
• More practice (Assignment)
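
As a starting point for practice, the two-group log rank statistic can be sketched as below. This is a minimal illustration, not a validated routine: the function name `log_rank` and the `(time, event)` record layout are assumptions, and the p-value uses the chi-square distribution with 1 degree of freedom, which is only approximate in small samples.

```python
from math import erfc, sqrt

def log_rank(group_a, group_b):
    """Two-group log rank test on lists of (time, event) pairs.

    Returns (chi_square, p_value, observed_a, expected_a).
    """
    pooled = [(t, e, 0) for t, e in group_a] + [(t, e, 1) for t, e in group_b]
    obs_a = exp_a = var = 0.0
    for t in sorted({u for u, e, _ in pooled if e}):
        n = sum(1 for u, _, _ in pooled if u >= t)                # at risk, both groups
        n_a = sum(1 for u, _, g in pooled if u >= t and g == 0)   # at risk, group A
        d = sum(1 for u, e, _ in pooled if e and u == t)          # events at t
        d_a = sum(1 for u, e, g in pooled if e and u == t and g == 0)
        obs_a += d_a
        exp_a += d * n_a / n          # events expected in A under the null
        if n > 1:                     # hypergeometric variance term
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = (obs_a - exp_a) ** 2 / var if var > 0 else 0.0
    p_value = erfc(sqrt(chi2 / 2))    # upper tail of chi-square, 1 df
    return chi2, p_value, obs_a, exp_a

# Hypothetical toy data, invented purely to exercise the function.
early = [(1, True), (2, True)]
late = [(3, True), (4, True)]
print(log_rank(early, late))
```

At each event time the observed number of events in group A is compared with the number expected if both groups shared the same hazard; the squared total difference divided by its variance gives the chi-square statistic.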

50
Design of survival studies
• When the main outcome of interest is survival time, planning a study should include some special considerations.
• It is important to realize that the power of the test to compare survival in two or more groups is related not to the total sample size but to the number of events of interest, such as deaths.
• When the risk of the event of interest is small, a very large study may be needed.
51
Design of survival studies…
• One way to increase the power of a study is therefore to consider taking a more common event as the end point of the study.
• Other ways to increase power are to increase the total sample size and to extend the length of follow-up of each subject.
52
The Cox regression model
53
Cox regression
• The regression model introduced by Cox (1972) is widely used when it is desired to investigate several variables at the same time.
• It is also known as proportional hazards regression analysis
• Cox’s method is a semi-parametric approach: no particular type of distribution is assumed for the survival time, but a strong assumption is made that the effects of the different variables on survival are constant over time.
54
Cox regression…
• Example: Do men and women have different risks of developing lung cancer based on cigarette smoking?
• By fitting a Cox regression model with cigarette usage (cigarettes smoked per day) and gender entered as covariates, you can test hypotheses regarding the effects of gender and cigarette usage on the time to onset of lung cancer
55
Cox regression….
• No longer modelling the duration
• Modelling the hazard
• Hazard: the instantaneous rate at which the event occurs at time t, conditional on it not having occurred up until t
• Also known as the Cox proportional hazard model
56
Cox regression….
• Cox regression can model time-invariant and time-varying explanatory variables
• E.g. “current age”, rather than age at baseline, is a time-varying variable; current age must be recoded over follow-up to include it in the model.
• It is used to relate several risk factors or exposures, considered simultaneously, to survival time.
• The model produces hazard ratios, which are interpreted analogously to odds ratios (OR) in logistic regression
57
Cox regression….
• At any point in time, t, an individual, i, has an instantaneous risk of reaching the end point (known as the hazard, hi(t)), given that the individual has not reached it up to that point in time.
• We can use the Cox proportional hazards model to test the independent effect of a number of explanatory variables (factors) on the hazard.
58
Cox regression equation
$$h_i(t) = h_0(t)\,\exp(\beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_n x_{in})$$
$h_i(t)$ is the hazard function for individual $i$
$h_0(t)$ is the baseline hazard function and can take any form; it is estimated from the data (non-parametric)
$x_{i1}, x_{i2}, \dots, x_{in}$ are the covariates (predictors)
$\beta_1, \beta_2, \dots, \beta_n$ are the regression coefficients estimated from the data

Cox regression….
• The predicted hazard (i.e., h(t)), or the rate of suffering the event of interest in the next instant, is the product of the baseline hazard (h0(t)) and the exponential function of the linear combination of the predictors.
• Thus, the predictors have a multiplicative or proportional effect on the predicted hazard.
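
The multiplicative structure can be checked numerically with a short Python sketch. Everything numeric here is hypothetical: the coefficients (a log hazard ratio of ln 1.3 for men and 0.05 per cigarette per day) and the baseline hazard values are invented for illustration, and the helper name `cox_hazard` is an assumption, not part of any library.

```python
from math import exp, log

def cox_hazard(baseline_hazard, betas, covariates):
    """h_i(t) = h_0(t) * exp(beta_1*x_i1 + ... + beta_n*x_in)."""
    linear_predictor = sum(b * x for b, x in zip(betas, covariates))
    return baseline_hazard * exp(linear_predictor)

# Hypothetical coefficients for (male, cigarettes per day).
betas = [log(1.3), 0.05]

# The hazard ratio for men vs. women is the same whatever the baseline
# hazard is at that moment: the baseline cancels out of the ratio.
for h0 in (0.001, 0.01, 0.1):
    man = cox_hazard(h0, betas, [1, 10])
    woman = cox_hazard(h0, betas, [0, 10])
    print(h0, round(man / woman, 3))
```

Because h0(t) appears in both numerator and denominator, the ratio equals exp(beta) for a one-unit difference in the covariate at every time point, which is exactly the proportional hazards assumption.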
Hazard Ratio(HR)
• The hazard rate is the instantaneous risk of failure (i.e., of suffering the event of interest), given that the participant has survived up to a specific time; the hazard ratio, which compares two hazard rates, is the measure of effect.
• The hazard represents the expected number of events per person at risk per unit of time.
Hazard ratio…
• For example, if the hazard is 0.2 at time t and the time units
are months, then on average, 0.2 events are expected per
person at risk per month.
• Another interpretation is based on the reciprocal of the hazard. For example, 1/0.2 = 5, which is the expected event-free time (5 months) per person at risk, assuming the hazard stays constant over time.
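
The reciprocal interpretation follows from a short derivation. The sketch below assumes a constant hazard, written here as $\lambda$ (a symbol introduced for this illustration); with a time-varying hazard the reciprocal no longer gives the expected event-free time.

```latex
% Constant hazard h(t) = \lambda implies exponential survival:
S(t) = \exp(-\lambda t)
% The expected event-free time is the area under the survival curve:
E[T] = \int_0^\infty S(t)\,dt = \int_0^\infty e^{-\lambda t}\,dt = \frac{1}{\lambda}
% With \lambda = 0.2 per month:
E[T] = \frac{1}{0.2} = 5 \text{ months}
```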
Hazard ratio…
• The hazard ratio can be estimated from the quantities used in the log rank test.
• It is estimated as the ratio of observed to expected events in one group divided by the same ratio in the other group, HR ≈ (O1/E1)/(O2/E2), for two independent comparison groups.
Hazard ratio
For a predictor
• HR = 1: the predictor does not affect survival.
• HR < 1: the predictor is protective (i.e., associated with improved survival)
• HR > 1: the predictor is associated with increased risk (or decreased survival).
64
Interpreting HR from cox regression
• The hazard ratio is the ratio of hazards associated with a one-unit change in the covariate
– HR = 1.3 for men vs. women (ref); event: mortality
– The hazard of mortality is 30% higher for men than for women
• The hazard ratio is assumed constant over time
– At any time point, the hazard of mortality for a man is 1.3 times the hazard for a woman.
65
Assumptions in cox regression
• Assumption of proportional hazards
• Non-informative censoring (censoring is unrelated to the risk of the event)
• True starting time
• Plus assumptions for all modelling
– Sufficient sample size
– Independent observations
– No multi-collinearity
66
Summary
• In event history analysis we are interested not only in whether or not an event occurs, but also in the length of time until it occurs.
• The time until an event occurs is referred to as the waiting time, failure time or survival time.
• The log rank test is used to compare the survival experience of two or more groups; however, it cannot be used to explore the effects of several variables on survival.
67
Summary
• Life table, Kaplan-Meier survival analysis, and Cox regression are methods for modeling time to event data in the presence of censored cases.
• Cox regression allows you to include predictor variables in your model.
• In Cox’s method, the strong assumption is that the effects of different variables on survival are constant over time.
68
Assignment
• Log rank test
• HR calculation
• Testing the proportional hazards assumption
– A. graphically
– B. statistically
69
• Questions and Discussion
70
• Thank you
71
