
Advanced Epidemiology

Gebremedhin Berhe, MPH, Ass’t Prof


Department of Epidemiology, School of Public Health,

Mekelle University

1
Survival/Event History Analysis

2
Session objective

• By the end of the session, students will be familiar with:
– Basic concepts of survival
– Life Table (Actuarial Table)
– Kaplan-Meier (Product Limit) Approach
– Hazard function
– Log rank test
– Cox regression
3
Brainstorming questions

• Cohort population (open vs closed cohort)

• Cumulative incidence vs incidence density

• Cohort vs experimental study design

• Longitudinal/panel data

4
Survival Analysis

• This lecture introduces quantitative methods for analysing a "time to event" outcome variable.

• A time to event variable reflects the time until a participant has an event of interest (e.g., heart attack, cancer remission, death).

5
Survival Analysis…

• The method is called ‘time to event analysis’ or ‘survival analysis’.

• “Survival” = remaining free of a particular outcome over time

• Analyse durations, i.e. the length of time to reach an endpoint

• Data are usually censored

– The sample is not followed long enough for everyone to reach the endpoint (e.g. death)

6
Examples of time to event data

• Time to death
• Time to incidence of disease
• Unemployed: time until finding a job
• Time to birth of first child
• Smokers: time until quitting smoking

7
Survival analysis: key concepts
– States

– Events

– Risk period(time)

– Spell

– Censoring

8
States
• States are categories of the outcome variable of interest

• Each person occupies exactly one state at any moment in time

• Examples

– alive, dead

– single, married, divorced, widowed

– never smoker, smoker, ex-smoker

• Set of possible states called the state space


9
Events(failure)

• A transition from one state to another

• Needs to be precisely defined.

• Example
– Death; disease (diagnosis, start of symptoms, relapse)

– Menopause

– Recurrence; response

10
Risk period /“Time”:

• The period of time during which someone is at risk of a particular event is called the risk period

• All subjects at risk of an event at a given point in time are called the risk set

• Example

– Can only experience divorce if married

– Randomization in clinical trial is the time origin

11
Time…

• Time zero, or the time origin, is the time at which participants are considered at-risk for the outcome of interest.

• Time at risk is measured from the start of the study (i.e., at enrollment).

• Follow-up time is measured from time zero until the event occurs, the study ends or the participant is lost, whichever comes first.
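The definition of follow-up time can be sketched in Python. This is only an illustration: the function name `follow_up` and the dates are hypothetical, and time is measured in days from time zero.

```python
from datetime import date


def follow_up(time_zero, study_end, event_date=None, lost_date=None):
    """Return (follow-up time in days, event indicator) for one participant.

    The event indicator is 1 if the event was observed and 0 if the
    observation is censored (study ended or participant lost).
    """
    # Follow-up stops at whichever comes first: event, loss, or study end.
    candidates = [(study_end, 0)]
    if lost_date is not None:
        candidates.append((lost_date, 0))
    if event_date is not None:
        candidates.append((event_date, 1))
    # On a tie, prefer the event (indicator 1) over censoring.
    stop, event = min(candidates, key=lambda c: (c[0], -c[1]))
    return (stop - time_zero).days, event


# Hypothetical participants enrolled on 1 Jan 1999, study ending 1 Jan 2001.
start, end = date(1999, 1, 1), date(2001, 1, 1)
print(follow_up(start, end, event_date=date(1999, 4, 1)))  # event observed
print(follow_up(start, end, lost_date=date(1999, 7, 1)))   # lost to follow-up
print(follow_up(start, end))                               # followed to study end
```

The pair (time, indicator) is the usual input format for the survival methods that follow.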

12
Time…
• Need to know when the clock starts
– Time from recruitment into the study
– Time from employment
– Time from diagnosis (prognostic studies)
– Time from infection
– Calendar time
– Age
• Event history analysis is concerned with the duration of non-occurrence of an event, i.e. the length of time spent in the risk period
13
Spell
• Many events are repeatable, for example unemployment, childbirth, migration, infection

• For repeatable events, the interval between the start of exposure and the next occurrence of an event is called a spell. We can have birth spells (intervals) and migration spells (trips)

• In event history analysis, individual spells are treated as distinct observations. We can have multiple observations for the same person.

14
Censoring

• An observation is censored if it has incomplete information

• Occurs when information about duration is incomplete

• Different types

– Right

– Left

– Interval

15
Right censoring

• It is the most common type of censoring

• Happens when the person did not have an event during the time
that they were studied

• Common reasons for right censoring

– the study ends

– the person drops out of the study


– Loss to follow-up
– Administrative
16
Example: clinical trial

[Figure: participant follow-up lines running from time 0 to study end]

17

18
Drop-out or LTFU

[Figure: participant follow-up lines on an axis from time 0 to study end]

19
Competing Risks: another type of censoring

• Occurs when a person leaves the study because of an event other than the event of interest (e.g. death, if death is not the event of interest)

• Event types ‘compete’ with one another

• Examples of competing events:
– Death from lung cancer

– Death from heart disease

– Death from car accident

20
Left censoring

• The event has occurred prior to the start of the study

• We know the event occurred, but not when it occurred before observation began

• In this kind of study, the exact time would be known if the event occurred after the study started

• Example:

– Survey question: when did you first smoke?

– Alzheimer's disease: onset generally hard to determine

– HPV: infection time


21
Interval censoring
• Because observations are made at discrete times, the actual event times are not observed

• Example: progression-free survival

– Progression of cancer defined by change in tumor size

– Measured at 3-6 month intervals

– If an increase occurs, it is known to be within the interval, but not exactly when.

• Recorded times are biased towards longer values

• Challenging issue when intervals are long


22
Survival function

• A survival function gives the probability of surviving beyond a specific point in time (denoted t).

• S(t) = P (surviving from time = 0 to time = t)

= P (surviving during interval = [0, t])

• or, equivalently,

• S(t) = P (surviving beyond time t) = P (T > t), where T is the time to the event.

• S(t) is a non-increasing function: S(0) = 1, and the curve can only stay flat or fall over time.

• Survival curves are often plotted as step functions


23
X-year survival rate

• Many applications have ‘landmark’ times that have historically been used to quantify survival

• Examples:
– Breast cancer: 5 year relapse-free survival

– Pancreatic cancer: 6 month survival

– Acute myeloid leukemia (AML): 12 month relapse-free survival
Survival function…

Example:

• Follow up of 6 patients (2 years)

– 3 Deaths
– 2 censored (lost) before 2 years
– 1 survived 2 years
• Question: what is the cumulative incidence (or the
Cumulative Survival) up to 2 years?

25
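[Figure: follow-up lines for six patients on a calendar-time axis (Jan 1999, Jan 2000, Jan 2001). Person ID with number of months of follow-up in parentheses: 1 (24), 2 (6), 3 (18), 4 (15), 5 (13), 6 (3). Legend: death; censored observation (lost to follow-up, withdrawal).]

Crude survival: 3/6 = 50%

26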
Change time scale to “follow-up” time:

[Figure: the same six follow-up lines, person 1 (24), 2 (6), 3 (18), 4 (15), 5 (13), 6 (3), redrawn from a common origin; x-axis: follow-up time (years), 0 to 2]

27
Estimating the Survival Function

• To summarize the experiences of the participants, we can use
• Life Table (Actuarial Table)
• Kaplan-Meier (Product Limit) Approach

• A life table summarizes the experiences of participants over a pre-defined follow-up period in a cohort study or in a clinical trial until the time of the event of interest or the end of the study, whichever comes first.

28
Life table…

• A life table gives the cumulative probability of surviving to the end of each interval.

• We first have to organize the follow-up times into equally spaced intervals.

• The number of intervals chosen matters and typically influences the analytic results.

29
Life table
Columns:
  Nt  = number at risk during the interval
  Nt* = average number at risk during the interval (effective number of persons at risk)
  Dt  = number of deaths during the interval
  Ct  = lost to follow-up during the interval
  qt  = proportion dying during the interval
  pt  = among those at risk, proportion surviving the interval
  St  = survival probability

Interval in years   Nt   Nt*               Dt   Ct   qt             pt                St
0                   6    6                 0    0    0              1                 1
1st yr              6    6 - (1/2) = 5.5   1    1    1/5.5 = 0.18   1 - 0.18 = 0.82   1(0.82) = 0.82
2nd yr              4    4 - (1/2) = 3.5   2    1    2/3.5 = 0.57   1 - 0.57 = 0.43   (0.82)(0.43) = 0.35

30
Life table…
• Assume that censored observations occur uniformly throughout the follow-up interval, so that each person censored during the interval contributes, on average, one-half of a person at risk to the denominator.

• Based on this assumption, the number of persons alive at the beginning of the interval (Nt) is reduced by half the number censored (Ct/2); the result is called the effective number of persons at risk (Nt*).

• All individuals in the same interval are assumed to have the same probability of death.
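The actuarial calculation can be sketched in a few lines of Python. This is a minimal illustration, not a full life-table routine: the function name `life_table` and the tuple layout are assumptions made for the example, and the input counts are those of the six-patient example (6 at risk, 1 death and 1 lost in the first year; 4 at risk, 2 deaths and 1 lost in the second year).

```python
def life_table(intervals):
    """Actuarial life table.

    intervals: list of (label, number at risk at start, deaths, censored).
    Returns rows of (label, effective number at risk, q, p, cumulative survival).
    """
    rows = []
    survival = 1.0
    for label, n_at_risk, deaths, censored in intervals:
        # Censored persons count as half a person at risk (uniform censoring).
        n_effective = n_at_risk - censored / 2
        q = deaths / n_effective      # proportion dying during the interval
        p = 1 - q                     # proportion surviving the interval
        survival *= p                 # cumulative survival to end of interval
        rows.append((label, n_effective, q, p, survival))
    return rows


table = life_table([("1st yr", 6, 1, 1), ("2nd yr", 4, 2, 1)])
for label, n_eff, q, p, s in table:
    print(f"{label}: Nt*={n_eff:.1f} qt={q:.2f} pt={p:.2f} St={s:.2f}")
```

Rounded to two decimals, the cumulative survival is 0.82 after the first year and 0.35 after the second.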

31
Life table…

• An issue with the life table approach shown above is that the
survival probabilities can change depending on how the intervals
are organized, particularly with small samples.

32
Kaplan-Meier (Product Limit) Approach

• Kaplan-Meier is a popular approach which addresses this issue by re-estimating the survival probability each time an event occurs.

• Assumptions: censoring is independent of the likelihood of developing the event of interest, and survival probabilities are comparable in participants who are recruited early and later into the study.

33
Kaplan-Meier

• When comparing several groups, it is also important that these assumptions are satisfied in each comparison group and that, for example, censoring is not more likely in one group than another.

• With the actuarial life table approach we consider equally spaced intervals, while with the Kaplan-Meier approach we use observed event times and censoring times.

• If N is large and/or if life-table intervals are small, results are similar.

34
Kaplan-Meier…

Survival probability: St+1 = St*((Nt+1 - Dt+1)/Nt+1)

Time, months   Number at risk, Nt   Number of deaths, Dt   Number censored, Ct   Survival probability
0              6                    0                      0                     1
3              6                    1                      0                     1*(6-1)/6 = 0.83
6              5                    0                      1                     0.83*(5-0)/5 = 0.83
13             4                    1                      0                     0.83*(4-1)/4 = 0.62
15             3                    0                      1                     0.62*(3-0)/3 = 0.62
18             2                    1                      0                     0.62*(2-1)/2 = 0.31
24             1                    0                      1                     0.31*(1-0)/1 = 0.31

Here the intervals are defined by the observed times and need not be of equal length.


35
Kaplan-Meier Method
Calculate the cumulative probability of event (and survival) based on conditional probabilities at each event time
Step 1: Sort the survival times from shortest to longest
Person ID (months of follow-up), before sorting: 1 (24), 2 (6), 3 (18), 4 (15), 5 (13), 6 (3)
[Figure: follow-up lines; x-axis: follow-up time (years), 0 to 2]

36
Kaplan-Meier Method

Person ID (months of follow-up), after sorting: 6 (3), 2 (6), 5 (13), 4 (15), 3 (18), 1 (24)
[Figure: sorted follow-up lines; x-axis: follow-up time (years), 0 to 2]

37
Step 2: For each time of occurrence of an event, compute the
conditional survival
[Figure: sorted follow-up lines for persons 6 (3), 2 (6), 5 (13), 4 (15), 3 (18), 1 (24); x-axis: follow-up time (years), 0 to 2]

At 3 months, one person dies and 5 survive. Thus:

• Incidence of event at exact time 3 months: 1/6
• Probability of survival beyond 3 months: 5/6 = (1 - 1/6) = 0.83

38
[Figure: sorted follow-up lines for persons 6 (3), 2 (6), 5 (13), 4 (15), 3 (18), 1 (24); x-axis: follow-up time (years), 0 to 2]

At 13 months, 4 persons are still at risk; one of them dies and 3 survive.

• Incidence of event at exact time 13 months: 1/4

• Probability of survival beyond 13 months: 3/4 = (1 - 1/4) = 0.75

39
[Figure: sorted follow-up lines for persons 6 (3), 2 (6), 5 (13), 4 (15), 3 (18), 1 (24); x-axis: follow-up time (years), 0 to 2]

At 18 months, 2 persons are still at risk; one of them dies and 1 survives.

• Incidence of event at exact time 18 months: 1/2

• Probability of survival beyond 18 months: 1/2 = 0.5
40
Kaplan-Meier Method…
Conditional Probability of an Event (Survival)
• The probability of an event (or of survival) at time t for the individuals at risk at time t, that is, conditional on being at risk at exact time t.

Step 3: For each time of occurrence of an event, compute the cumulative survival (survival function) by multiplying the conditional probabilities of survival.
3 months: S(3) = 5/6 = 0.833
13 months: S(13) = 5/6 × 3/4 = 0.625
18 months: S(18) = 5/6 × 3/4 × 1/2 = 0.3125
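The three steps can be sketched in Python as follows. This is a minimal illustration rather than a full implementation: the function name `kaplan_meier` is made up for the example, and the event indicators are read off the worked example (deaths at 3, 13 and 18 months; all other follow-up times censored).

```python
def kaplan_meier(times, events):
    """Product-limit estimate.

    times: follow-up times; events: 1 = event observed, 0 = censored.
    Returns a list of (event time, cumulative survival just after that time).
    """
    data = sorted(zip(times, events))           # Step 1: sort by time
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends exactly at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Step 2: conditional survival; Step 3: multiply into the product.
            survival *= (n_at_risk - deaths) / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving                    # events and censored both leave
    return curve


# Persons 1 to 6 from the example: months of follow-up and event indicator.
months = [24, 6, 18, 15, 13, 3]
died = [0, 0, 1, 0, 1, 1]
curve = kaplan_meier(months, died)
for t, s in curve:
    print(f"S({t}) = {s:.4f}")
print("Cumulative incidence at end of follow-up:", round(1 - curve[-1][1], 4))
```

Censored persons do not change the survival estimate at their censoring time; they only reduce the number at risk for later event times.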
41
Plotting the survival function:
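Time (mo)   Si
3           0.833
13          0.625
18          0.3125

[Figure: Kaplan-Meier step plot of survival (y-axis, 0 to 1.00) against month of follow-up (x-axis, 0 to 25), stepping down to 0.833 at 3 months, 0.625 at 13 months and 0.3125 at 18 months, then flat]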

The cumulative incidence (up to 24 months): 1-0.3125 = 0.6875 (or 69%)


42
Hazard Function (Cumulative Incidence Curve)

• The hazard is a little harder to conceptualize than survival

• The instantaneous relative rate h(t) is usually called the hazard rate in human populations and the failure rate in other contexts.

• It is sometimes called the force of mortality or the instantaneous rate of death or, by analogy with physics, a relative velocity.

• Some investigators prefer to generate cumulative incidence curves, which show the cumulative probability of experiencing the event of interest, rather than survival curves.
43
Hazard function
• The cumulative incidence (failure probability) is computed as 1-St and can be obtained easily from the Kaplan-Meier table. Note that 1-St is a cumulative probability, not the instantaneous hazard rate h(t).

Time, months   Number at risk, Nt   Number of deaths, Dt   Number censored, Ct   Survival probability, St   Failure probability, 1-St
3              6                    1                      0                     0.8                        0.2
6              5                    0                      1                     0.8                        0.2
13             4                    1                      0                     0.6                        0.4
15             3                    0                      1                     0.6                        0.4
18             2                    1                      0                     0.3                        0.7
24             1                    0                      1                     0.3                        0.7
44
Plotting the hazard function:

[Figure: step plots against month of follow-up (x-axis, 0 to 25). Left axis: cumulative survival (0.8, 0.6, 0.3). Right axis, labelled "Cumulative Hazard" on the slide: failure probability 1-St (0.2, 0.4, 0.7).]

45
Some hazard shapes

• Increasing
– Onset of Alzheimer's, natural aging and wear
• Decreasing
– Survival after surgery, early failures due to device or
transplant failures
• U-shaped
– Age specific mortality, populations followed from birth
• Constant
– Time till next email arrives
46
Comparing Survival Curves

• We are often interested in assessing whether there are differences in survival (or cumulative incidence of event) among different groups of participants

• For example, in a clinical trial with a survival outcome, we compare survival between participants receiving a new drug and those receiving a placebo (or standard therapy).

• In an observational study, we might be interested in comparing survival between men and women
47
The Log Rank Test

• The test compares the entire survival experience between groups and can be thought of as a test of whether the survival curves are identical (overlapping) or not.

• Survival curves are estimated for each group, considered separately, using the Kaplan-Meier method and compared statistically using the log rank test.

48
Comparing survival by group using Kaplan-Meier graphs
[Figure: Kaplan-Meier survival curves for sex = male and sex = female; y-axis: survival probability, 0.00 to 1.00; x-axis: analysis time, 0 to 15]


49
The log-rank test….

• A non-parametric test that assesses the null hypothesis that there are no differences in survival times between groups

• The log rank statistic is approximately distributed as a chi-square statistic (with degrees of freedom equal to the number of groups minus one).

• More practice (assignment)
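For practice, a two-group log rank test can be sketched in plain Python. This is a minimal illustration under stated assumptions: the function name `log_rank` is made up, group A reuses the six-patient example, and group B is hypothetical data invented for the illustration. The p-value uses the chi-square distribution with 1 degree of freedom.

```python
import math


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log rank test; returns (chi-square statistic, p-value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Numbers still at risk in each group just before time t.
        n_a = sum(1 for u, _, g in data if g == 0 and u >= t)
        n_b = sum(1 for u, _, g in data if g == 1 and u >= t)
        # Events occurring exactly at time t in each group.
        d_a = sum(1 for u, e, g in data if g == 0 and u == t and e == 1)
        d_b = sum(1 for u, e, g in data if g == 1 and u == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n      # expected events in A under the null
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Group A: the six-patient example. Group B: hypothetical data.
group_a = ([3, 6, 13, 15, 18, 24], [1, 0, 1, 0, 1, 0])
group_b = ([2, 4, 5, 7, 9, 10], [1, 1, 1, 1, 0, 1])
chi2, p = log_rank(*group_a, *group_b)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```

With samples this small the chi-square approximation is rough; the sketch is meant to show the mechanics of observed versus expected events, not to support inference.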


50
Design of survival studies

• When the main outcome of interest is survival time, planning a study should include some special considerations.

• It is important to realize that the power of the test to compare survival in two or more groups is related not to the total sample size but to the number of events of interest, such as deaths.

• When the risk of the event of interest is small, a very large study may be needed.

51
Design of survival studies…

• One way to increase the power of a study is therefore to consider taking a more common event as the end point of the study.

• Other ways to increase power are to increase the total sample size and to extend the length of follow-up of each subject.

52
The Cox regression model

53
Cox regression

• The regression model introduced by Cox (1972) is widely used when it is desired to investigate several variables at the same time.

• It is also known as proportional hazards regression analysis

• Cox’s method is a semi-parametric approach: no particular type of distribution is assumed for the survival time, but a strong assumption is made that the effects of the different variables on survival are constant over time.
54
Cox regression…

• Example: Do men and women have different risks of developing lung cancer based on cigarette smoking?

• By fitting a Cox regression model with cigarette usage (cigarettes smoked per day) and gender entered as covariates, you can test hypotheses regarding the effects of gender and cigarette usage on the time to onset of lung cancer

55
Cox regression….

• No longer modelling the duration

• Modelling the hazard

• Hazard: measure of the probability that an event occurs at time t conditional on it not having occurred up until t

• Also known as the Cox proportional hazard model

56
Cox regression….

• Cox regression can model time-invariant and time-varying explanatory variables

• E.g. “current age” rather than age at baseline is a time-varying variable; recode current age to include it in the model.

• It is used to relate several risk factors or exposures, considered simultaneously, to survival time.

• The model produces hazard ratios, which are interpreted analogously to odds ratios (OR) in logistic regression
57
Cox regression….

• At any point in time, t, an individual, i, has an instantaneous risk of reaching the end point (often known as the hazard, or hi(t)), given that the individual has not reached it up to that point in time.

• We can use the Cox proportional hazards model to test the independent effect of a number of explanatory variables (factors) on the hazard.

58
Cox regression equation

$$h_i(t) = h_0(t)\,\exp(\beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_n x_{in})$$

$h_i(t)$ is the hazard function for individual $i$

$h_0(t)$ is the baseline hazard function and can take any form. It is estimated from the data (non-parametric)

$x_{i1}, x_{i2}, \dots, x_{in}$ are the covariates (predictors)

$\beta_1, \beta_2, \dots, \beta_n$ are the regression coefficients estimated from the data


Cox regression….

• The predicted hazard (i.e., h(t)), or the rate of suffering the event of interest in the next instant, is the product of the baseline hazard (h0(t)) and the exponential function of the linear combination of the predictors.

• Thus, the predictors have a multiplicative or proportional effect on the predicted hazard.
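The multiplicative effect can be illustrated with a short Python sketch. The coefficients below are hypothetical and chosen only for illustration (a log hazard ratio of log 1.3 for male sex and 0.05 per cigarette per day); the function names are likewise made up. Because the baseline hazard h0(t) is common to everyone, it cancels when two individuals are compared.

```python
import math


def relative_hazard(betas, x):
    """exp(b1*x1 + ... + bn*xn): the hazard relative to the baseline h0(t)."""
    return math.exp(sum(b * v for b, v in zip(betas, x)))


def hazard_ratio(betas, x_a, x_b):
    """Hazard ratio of individual A versus individual B; h0(t) cancels out."""
    return relative_hazard(betas, x_a) / relative_hazard(betas, x_b)


# Hypothetical coefficients: [sex (1 = male, 0 = female), cigarettes per day].
betas = [math.log(1.3), 0.05]

man = [1, 10]      # man smoking 10 cigarettes per day
woman = [0, 10]    # woman smoking 10 cigarettes per day
hr_sex = hazard_ratio(betas, man, woman)

# One extra cigarette per day, holding sex fixed.
hr_cigarette = hazard_ratio(betas, [0, 11], [0, 10])

print(f"HR men vs women: {hr_sex:.2f}")
print(f"HR per extra cigarette per day: {hr_cigarette:.3f}")
```

Under these assumed coefficients, the hazard ratio for a covariate equals the exponential of its coefficient, whatever the values of the other covariates and whatever the time t.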
Hazard Ratio(HR)

• The hazard ratio is a measure of effect that compares hazards between groups. The hazard itself is the instantaneous risk of failure (i.e., the risk of suffering the event of interest), given that the participant has survived up to a specific time.

• The hazard represents the expected number of events per one unit of time.
Hazard ratio…

• For example, if the hazard is 0.2 at time t and the time units
are months, then on average, 0.2 events are expected per
person at risk per month.

• Another interpretation is based on the reciprocal of the hazard. For example, 1/0.2 = 5, which is the expected event-free time (5 months) per person at risk.
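The reciprocal interpretation can be checked numerically. The sketch below assumes a hazard that stays constant at 0.2 per month over time (an exponential survival model); that assumption is made here for illustration and is not stated on the slide.

```python
import math

hazard = 0.2  # events per person-month, assumed constant over time


def survival(t, rate=hazard):
    """S(t) = exp(-rate * t) when the hazard is constant."""
    return math.exp(-rate * t)


mean_event_free_time = 1 / hazard       # reciprocal of the hazard, in months
median_survival = math.log(2) / hazard  # time at which S(t) falls to 0.5

print(f"Expected event-free time: {mean_event_free_time:.1f} months")
print(f"Median survival time: {median_survival:.2f} months")
print(f"S(5) = {survival(5):.3f}")
```

When the hazard changes over time, the reciprocal gives the expected event-free time only if the current hazard were to persist.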
Hazard ratio…

• The hazard ratio can be estimated from the quantities computed for the log rank test.

• It is approximated by comparing, between two independent comparison groups, the ratio of total observed to total expected events in each group: HR ≈ (O1/E1) / (O2/E2).
Hazard ratio

For a predictor

• HR=1: the predictor does not affect survival.

• HR <1: the predictor is protective (i.e., associated with improved survival)

• HR >1: the predictor is associated with increased risk (or decreased survival).

64
Interpreting HR from cox regression

• The hazard ratio is the ratio of the hazards for a one-unit change in the covariate

– HR = 1.3 for men vs. women (ref); event: mortality

– The hazard of mortality is 30% higher for men than for women

• The hazard ratio is assumed constant over time

– At any time point, the hazard of mortality for a man is 1.3 times the hazard for a woman.
65
Assumptions in cox regression
• Assumption of proportional hazards

• Non-informative censoring (no systematic pattern in who is censored)

• True starting time

• Plus assumptions for all modelling

– Sufficient sample size

– Independent observations

– No multi-collinearity

66
Summary

• In event history analysis we are interested not only in whether or not an event occurs, but also in the length of time until it occurs.

• The time until an event occurs is referred to as the waiting time, failure time or survival time.

• The log rank test is used to compare the survival experience of two or more groups; however, it cannot be used to explore the effects of several variables on survival.
67
Summary

• Life table, Kaplan-Meier survival analysis, and Cox regression are methods for modeling time to event data in the presence of censored cases.
• Cox regression allows you to include predictor variables in your model.
• In Cox’s method, the strong assumption is that the effects of different variables on survival are constant over time.

68
Assignment

• Log rank test

• HR calculation

• Testing the proportional hazards assumption

– A. graphically

– B. statistically

69
• Questions and Discussion

70
• Thank you

71
