
CDISC Italian User Group 2007

Analysis Data Model (ADaM)

Annamaria Muraro
Helsinn Healthcare

Data flow in clinical studies


Raw Datasets (=SDTM)
- Data from a clinical trial
- Source: CRF

Analysis Datasets (=ADaM data)
- Datasets used in the analysis, restructured and containing additional information (derived variables, flags, etc.)
- Source: raw datasets

Two sets of data, each with a specific purpose

FDA requirements

Analysis Data Model: General Considerations


Document: http://www.cdisc.org/models/adam/V2.0/index.html
Analysis Data Model Version 2.0 (November 2006)
- key principles for analysis datasets
- conventions for standard analysis variables
- provides a model for the subject-level analysis dataset
- Metadata for Analysis Datasets
  - Analysis dataset metadata
  - Analysis variable metadata
  - Analysis results metadata

Analysis datasets are discussed within the context of electronic submissions to the FDA, but the same principles and standards apply regardless of the purpose of the analysis datasets

Key Principles for Analysis Datasets Creation


Analysis datasets should:
- facilitate clear and unambiguous communication of the content, source and quality of the datasets supporting the statistical analyses
- be usable by currently available tools (SAS XPT)
- be Analysis-ready or One Statistical Procedure Away (redundancy may be acceptable)
- be well documented: metadata and other documentation should provide a clear description of the analytic results, including statistical method, transformations, assumptions, derivations and imputations performed
- include the optimum number of datasets
- guarantee traceability

SDTM and ADaM

| SDTM | ADaM |
|---|---|
| Source data | Derived data |
| Vertical | Structure may not necessarily be vertical |
| No redundancy | Redundancy is needed for easy analysis |
| Character variables | Numeric variables |
| Each domain is specific to itself | Combines variables across multiple domains |
| Dates are ISO8601 character strings | Dates are formatted as numeric (e.g. SAS dates) to allow manipulation |
| Two chars for dataset name | Dataset Name: ADXXXX |
| Data transfer | Analytic & graphical analysis |
| Interoperability | Clear communication of statistical analysis and related decisions |

BOTH ARE NEEDED FOR FDA REVIEW!
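
The date comparison above (ISO 8601 character strings in SDTM, numeric dates in ADaM) can be sketched as follows. This is only an illustration in Python, not the deck's own tooling (which refers to SAS). The helper name `iso8601_to_sas_date` is hypothetical; the one fact relied on is that a SAS date value counts days since 1 January 1960. Returning a missing value for partial dates is a simplifying assumption, not an ADaM rule.

```python
from datetime import date, datetime

SAS_EPOCH = date(1960, 1, 1)  # SAS date value 0 corresponds to 1960-01-01


def iso8601_to_sas_date(iso_text):
    """Convert an ISO 8601 string (e.g. an SDTM date/time value) to a SAS date number.

    Hypothetical helper: returns None when the date part is missing or partial.
    """
    if not iso_text or len(iso_text) < 10:
        return None  # assumption: partial dates are left missing, not imputed
    try:
        # Only the YYYY-MM-DD part is used; any time component is ignored.
        parsed = datetime.strptime(iso_text[:10], "%Y-%m-%d").date()
    except ValueError:
        return None
    return (parsed - SAS_EPOCH).days
```

Once the date is numeric, differences such as durations or study days become plain subtraction, which is the "manipulation" the comparison refers to.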



Analysis Dataset Variables


Analysis dataset variables should be compliant with
SDTM standards
Maintain SDTM variable attributes (if the identical variable also
exists in an SDTM dataset)
Follow naming conventions for datasets and variables consistent
with the SDTM conventions, where feasible

Analysis variables to be included

Identifiers
Analysis Population Indicators
Analysis Date Variables
Analysis Study Day Variables
Visit time Variables
Numeric Code Variables
Analysis Treatment Variables

Analysis Dataset Variables


Analysis Population

Analysis datasets should include analysis population flags at whatever level (e.g. subject, visit or measurement) is necessary to clearly describe the population set used for any analysis
- Variables used to identify specific populations: FULLSET, SAFETY, PPROT

Population flags may be required at visit level
- FULLV, SAFV

Population flags may be present in the SDTM (supplemental domain)

Analysis Dataset Variables


Numeric Code Variables
When a numeric version of a categorical variable is required for statistical purposes: append an N to the SDTM variable name
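
A minimal sketch of this naming rule, written in Python for illustration only. The function name `add_numeric_code` is hypothetical, and the race code list is the one shown in the vital signs example later in this deck (1=White, 2=Black, 3=Hispanic, 4=Asian, 9=Other); the upper-case spelling of the keys is an assumption.

```python
# Code list taken from the vital signs example in this deck; key casing is assumed.
RACE_CODES = {"WHITE": 1, "BLACK": 2, "HISPANIC": 3, "ASIAN": 4, "OTHER": 9}


def add_numeric_code(record, variable, codes):
    """Return a copy of `record` with a numeric companion variable named <variable>N."""
    out = dict(record)
    # Values not found in the code list yield None (missing) rather than a guess.
    out[variable + "N"] = codes.get(record.get(variable))
    return out


subject = add_numeric_code({"USUBJID": "0001-2", "RACE": "ASIAN"}, "RACE", RACE_CODES)
```

Keeping both RACE and RACEN in the dataset is an instance of the accepted redundancy: the character value stays readable, and the numeric code is ready for procedures that need it.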

Analysis Dataset Variables


Analysis Treatment Variables

Treatment variables are required to be present in all analysis datasets
- Planned Treatment (TRTP char, TRTPN numeric)
- Actual Treatment (TRTA char, TRTAN numeric)

If an analysis is performed on the actual treatment instead of the planned treatment, actual treatment variables are required in addition to the planned treatment variables

Subject-Level Dataset (ADSL)


One record per subject
All the variables for describing the analysis population

Demographic data (age, sex, race, other relevant factors)


Baseline characteristics
Disease factors
Treatment code/group
Factors that could affect response to therapy
Other relevant variables (smoking, alcohol intake, ....)
Population flags

Data included in the subject-level analysis dataset can be used as source for data used in other analysis datasets (derive variables only once!)
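
The "derive once, reuse everywhere" idea can be sketched as a lookup from ADSL into any other analysis dataset. This is an illustrative Python sketch, not the author's implementation. `add_adsl_variables` is a hypothetical helper, and the sample rows (including the AE term and the SAFETY values) are made up for the example.

```python
def add_adsl_variables(records, adsl, keep):
    """Copy the ADSL variables listed in `keep` onto each record, matched on USUBJID."""
    by_subject = {row["USUBJID"]: row for row in adsl}
    merged = []
    for record in records:
        # A KeyError here means the subject is missing from ADSL.
        subject = by_subject[record["USUBJID"]]
        merged.append({**record, **{name: subject[name] for name in keep}})
    return merged


# Illustrative data only.
adsl = [
    {"USUBJID": "0001-1", "AGE": 30, "TRTP": "DRUG A", "SAFETY": "Y"},
    {"USUBJID": "0001-2", "AGE": 38, "TRTP": "PLACEBO", "SAFETY": "Y"},
]
adae = [{"USUBJID": "0001-2", "AETERM": "HEADACHE"}]
adae_with_adsl = add_adsl_variables(adae, adsl, ["AGE", "TRTP", "SAFETY"])
```

Because every dataset copies the same ADSL values, a derivation such as an age group or a population flag cannot drift between datasets.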

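ADSL, Example

SAMPLE DATASET FOR ADSL

| Obs | STUDYID | USUBJID | SAFETY | ITT | PPROT | COMPLT | DSREAS | AGE | AGEGRP |
|---|---|---|---|---|---|---|---|---|---|
| 1 | XX0001 | 0001-1 | | | | | | 30 | 21-35 |
| 2 | XX0001 | 0001-2 | | | | | ADVERSE EVENT | 38 | 36-50 |

SAMPLE DATASET FOR ADSL (continued)

| Obs | AGEGRPN | SEX | RACE | RACEN | TRTP | TRTPN | HEIGHTBL | WEIGHTBL | BMIBL |
|---|---|---|---|---|---|---|---|---|---|
| 1 | | | WHITE | | DRUG A | | 170 | 63.5 | 21.97 |
| 2 | | | ASIAN | | PLACEBO | | 183 | 86.2 | 25.74 |

Slide annotations:
- Dataset named ADxxxxxx
- SDTM variable with no changes
- ADaM Treatment Variable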

Vital Signs Analysis Dataset: horizontal structure

Slide annotations: VS SDTM is the source of the vital signs variables; ADSL is the source of the demographic variables, treatment variables and analysis population flags.

| Variable Name | Variable Label | Type | Controlled Terms or Format | Source |
|---|---|---|---|---|
| STUDYID | Study Identifier | Char | $15. | VS.STUDYID |
| USUBJID | Unique Subject Identifier | Char | $30. | VS.USUBJID |
| SUBJID | Subject Identifier for the Study | Char | $5. | ADSL.SUBJID |
| SITEID | Study Site Identifier | Char | $5. | ADSL.SITEID |
| VSBLFL | Baseline Flag | Char | Y or Null | VS.VSBLFL (where VS.VSTESTCD in ('DIABP' 'SYSBP' 'HR')) |
| VISITNUM | Visit Number | Num | 3. | VS.VISITNUM |
| VISIT | Visit Name | Char | $100. | VS.VISIT |
| WGT_BASE | Body Weight Baseline Measurement | Num | 5.1 | VS.VSSTRESN (where VSTESTCD = 'WEIGHT' and VS.VSBLFL='Y') |
| WGT_VAL | Body Weight Visit Measurement | Num | 5.1 | VS.VSSTRESN (where VSTESTCD = 'WEIGHT') |
| WGT_CHG | Body Weight Change from Baseline | Num | 5.1 | ADVS.WGT_VAL - ADVS.WGT_BASE |
| HR_BASE | Heart Rate (beats/minute) Baseline | Num | 3. | VS.VSSTRESN (where VSTESTCD = 'HR' and VS.VSBLFL='Y') |
| HR_VAL | Heart Rate (beats/minute) Visit | Num | 3. | VS.VSSTRESN (where VSTESTCD = 'HR') |
| HR_CHG | Heart Rate (beats/minute) Change | Num | 3. | ADVS.HR_VAL - ADVS.HR_BASE |
| SBP_BASE | Systolic Blood Pressure (mmHg) Baseline | Num | 3. | VS.VSSTRESN (where VSTESTCD = 'SYSBP' and VS.VSBLFL='Y') |
| SBP_VAL | Systolic Blood Pressure (mmHg) Visit | Num | 3. | VS.VSSTRESN (where VSTESTCD = 'SYSBP') |
| SBP_CHG | Systolic Blood Pressure (mmHg) Change | Num | 3. | ADVS.SBP_VAL - ADVS.SBP_BASE |
| ....... | | | | |
| AGE | Age in AGEU at Reference Date/Time | Num | 3. | ADSL.AGE |
| AGEU | Age Units | Char | years | ADSL.AGEU |
| SEX | Sex | Char | F,M,U | ADSL.SEX |
| SEXN | Sex Numeric | Num | 1=Male, 2=Female | ADSL.SEXN |
| RACE | Race | Char | White, Black, Hispanic, Asian, Other | ADSL.RACE |
| RACEN | Race Numeric | Num | 1=White, 2=Black, 3=Hispanic, 4=Asian, 9=Other | ADSL.RACEN |
| ........... | | | | |
| TRTP | Planned Treatment Group | Char | | ADSL.TRTP |
| TRTPN | Planned Treatment Group Numeric Code | Num | | ADSL.TRTPN |
| TRTA | Actual Treatment Group | Char | | ADSL.TRTA |
| TRTAN | Actual Treatment Group Numeric Code | Num | | ADSL.TRTAN |
| SAFETY | Safety Set | Char | Y, N | ADSL.SAFETY |
| FULLSET | Full Analysis Set | Char | Y, N | ADSL.FULLSET |
| PPROT | Per-Protocol Set | Char | Y, N | ADSL.PPROT |
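
The WGT_BASE / WGT_VAL / WGT_CHG derivations in the vital signs example can be sketched as below. This is an illustrative Python sketch under stated assumptions, not the author's program: `weight_change_rows` is a hypothetical helper, the input is a list of VS-like records, and each subject is assumed to have at most one WEIGHT record flagged as baseline (VSBLFL = 'Y').

```python
def weight_change_rows(vs_records):
    """Build one row per subject and visit with WGT_BASE, WGT_VAL and WGT_CHG."""
    weights = [r for r in vs_records if r["VSTESTCD"] == "WEIGHT"]
    # Baseline per subject: the WEIGHT record flagged VSBLFL = 'Y'.
    # Assumption: at most one such record per subject (a later one would overwrite).
    baseline = {r["USUBJID"]: r["VSSTRESN"] for r in weights if r.get("VSBLFL") == "Y"}
    rows = []
    for r in weights:
        base = baseline.get(r["USUBJID"])
        value = r["VSSTRESN"]
        # Change is missing when either operand is missing.
        change = None if base is None or value is None else round(value - base, 1)
        rows.append({
            "USUBJID": r["USUBJID"],
            "VISITNUM": r["VISITNUM"],
            "WGT_BASE": base,
            "WGT_VAL": value,
            "WGT_CHG": change,
        })
    return rows


# Illustrative data only.
vs = [
    {"USUBJID": "0001-1", "VISITNUM": 1, "VSTESTCD": "WEIGHT", "VSSTRESN": 63.5, "VSBLFL": "Y"},
    {"USUBJID": "0001-1", "VISITNUM": 2, "VSTESTCD": "WEIGHT", "VSSTRESN": 65.0, "VSBLFL": None},
    {"USUBJID": "0001-1", "VISITNUM": 2, "VSTESTCD": "HR", "VSSTRESN": 72, "VSBLFL": None},
]
advs_weight = weight_change_rows(vs)
```

The same pattern would repeat for HR and SYSBP; in a horizontal structure each test contributes its own triple of columns on the visit row.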

Adverse Events Analysis Dataset

Keep variables from AE SDTM; add numeric variables

| Variable Name | Variable Label | Type |
|---|---|---|
| STUDYID | Study Identifier | Char |
| USUBJID | Unique Subject Identifier | Char |
| SUBJID | Subject Identifier for the Study | Char |
| SITEID | Study Site Identifier | Char |
| AESEQ | Sequence Number | Num |
| AETERM | Reported Term for the Adverse Event | Char |
| AEDECOD | Dictionary-Derived Term | Char |
| AEBODSYS | Body System or Organ Class | Char |
| AESEV | Severity/Intensity | Char |
| AESEVN | Severity/Intensity Numeric | Num |
| AESER | Serious Event | Char |
| AEACN | Action Taken with Study Treatment | Char |
| AEREL | Causality | Char |
| AERELN | Causality Numeric | Num |
| AEOUT | Outcome of Adverse Event | Char |
| AEOUTN | Outcome of Adverse Event Numeric | Num |
| ..... | | |

Add derived variables

| Variable Name | Variable Label | Type |
|---|---|---|
| AESTDT | Start Date of Adverse Event Numeric | Num |
| AESTDY | Study Day of Onset of Event | Num |
| AERELAT | Event Related to Study Drug | Char |
| AEDUR | Duration of Adverse Event (days) | Num |
| .... | | |

Add flags for Treatment Emergent AE

| Variable Name | Variable Label | Type |
|---|---|---|
| AEPRE | Pre-Treatment Adverse Event | Char |
| AETRTEM | Treatment Emergent Adverse Event | Char |
| AEPOST | Post-Treatment Adverse Event | Char |

Add demographic variables from ADSL

| Variable Name | Variable Label | Type |
|---|---|---|
| HEIGHTBL | Baseline Height (cm) | Num |
| WEIGHTBL | Baseline Body Weight (kg) | Num |
| AGE | Age in AGEU at Reference Date/Time | Num |
| AGEU | Age Units | Char |
| SEX | Sex | Char |
| SEXN | Sex Numeric | Num |
| RACE | Race | Char |
| RACEN | Race Numeric | Num |
| RACEOTH | Specify Other Race | Char |
| ..... | | |

Add treatment variables from ADSL

| Variable Name | Variable Label | Type |
|---|---|---|
| TRTP | Planned Treatment Group | Char |
| TRTPN | Planned Treatment Group Numeric Code | Num |
| TRTA | Actual Treatment Group | Char |
| TRTAN | Actual Treatment Group Numeric Code | Num |

Add population flag from ADSL

| Variable Name | Variable Label | Type |
|---|---|---|
| SAFETY | Safety Set | Char |
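
A possible sketch of the derived AE variables follows, in Python for illustration only. The deck does not define the rules, so every rule here is an assumption: the flags are set by comparing the AE start date with hypothetical first and last dose dates, AESTDY uses the usual study-day convention with no day 0, and AEDUR counts both the start and the end day. The function name `derive_ae_variables` is also hypothetical. Real studies define these rules in the SAP.

```python
from datetime import date


def derive_ae_variables(ae_start, ae_end, first_dose, last_dose):
    """Derive AEPRE/AETRTEM/AEPOST, AESTDY and AEDUR under the assumptions stated above."""
    flags = {
        "AEPRE": "Y" if ae_start < first_dose else "N",
        "AETRTEM": "Y" if first_dose <= ae_start <= last_dose else "N",
        "AEPOST": "Y" if ae_start > last_dose else "N",
    }
    # Study day relative to first dose: day 1 is the first dose date, there is no day 0.
    offset = (ae_start - first_dose).days
    study_day = offset + 1 if offset >= 0 else offset
    # Duration in days, counting both start and end day; missing for ongoing events.
    duration = None if ae_end is None else (ae_end - ae_start).days + 1
    return {**flags, "AESTDY": study_day, "AEDUR": duration}


# Illustrative dates only.
on_treatment = derive_ae_variables(
    date(2007, 3, 5), date(2007, 3, 7), date(2007, 3, 1), date(2007, 3, 28)
)
before_treatment = derive_ae_variables(
    date(2007, 2, 27), None, date(2007, 3, 1), date(2007, 3, 28)
)
```

Exactly one of the three flags is 'Y' for any start date, so the flags partition the events and can be used directly as selection criteria.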

Analysis Dataset Documentation


Provide the link between the general description of the analysis (as found in the study protocol and SAP) and the source data
The source of the analysis dataset should be clearly documented, allowing the reviewer to trace data items back to their source
Documentation includes:
- Analysis dataset metadata
- Analysis variable metadata
- Analysis results metadata
- Other (SAS programs and/or other written documentation)

Analysis Dataset Metadata


Should contain:
Dataset name, Dataset description, Structure, Purpose, Keys, Location, Documentation (link to detailed documentation)

Analysis Variable Metadata


ADSL (example from CDISC guideline) / 1
describes each variable in the analysis dataset
provides details about where the variable came from in
the source data or how the variable was derived

ADSL / 2

ADSL / 3

Analysis Results Metadata


Describes the major attributes of each important analysis result
- Analysis name: a unique identifier for the analysis
- Reason: reason for performing the analysis (pre-specified, exploratory, regulatory request)
- Dataset: name of the dataset / subset used in the analysis

| Analysis name | Description | Reason | Dataset | Documentation |
|---|---|---|---|---|
| Table 5.1: Demographic data - full analysis set | Summary of demographic data for full analysis set | Analysis pre-specified in SAP | ADSL, select records with FULLSET=Y | SAP Section XX |
| Table 5.2: Demographic data - per-protocol set | Summary of demographic data for per-protocol set | Analysis pre-specified in SAP | ADSL, select records with PPROT=Y | SAP Section XX |
| Table 5.3: Demographic data - safety set | Summary of demographic data for safety set | Analysis pre-specified in SAP | ADSL, select records with SAFETY=Y | SAP Section XX |
| Table 5.4: Demographic data by country - full analysis set | Summary of demographic data by country for full analysis set | Analysis pre-specified in SAP | ADSL, select records with FULLSET=Y | SAP Section XX |
| Table 5.5: Demographic data by gender - full analysis set | Summary of demographic data by gender for full analysis set | Analysis pre-specified in SAP | ADSL, select records with FULLSET=Y | SAP Section XX |
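
The link from an analysis to its dataset and selection criterion can be sketched as data plus one lookup. This Python sketch is illustrative only: the dictionary layout, the name `records_for_analysis` and the sample subjects are assumptions; only the table numbers, the ADSL dataset and the flag names come from the example above.

```python
# Dataset and population flag per analysis, following the example table.
RESULTS_METADATA = {
    "Table 5.1": {"dataset": "ADSL", "flag": "FULLSET"},
    "Table 5.2": {"dataset": "ADSL", "flag": "PPROT"},
    "Table 5.3": {"dataset": "ADSL", "flag": "SAFETY"},
}


def records_for_analysis(analysis_name, datasets, metadata=RESULTS_METADATA):
    """Return the records an analysis uses: its dataset, restricted to flag = 'Y'."""
    entry = metadata[analysis_name]
    return [row for row in datasets[entry["dataset"]] if row.get(entry["flag"]) == "Y"]


# Illustrative data only.
datasets = {
    "ADSL": [
        {"USUBJID": "0001-1", "FULLSET": "Y", "PPROT": "Y", "SAFETY": "Y"},
        {"USUBJID": "0001-2", "FULLSET": "Y", "PPROT": "N", "SAFETY": "Y"},
    ]
}
per_protocol = records_for_analysis("Table 5.2", datasets)
```

Keeping the selection in metadata rather than in each program means a reviewer can read off which subset feeds which table.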

Select a strategy for ADaM implementation


http://www.lexjansen.com/pharmasug/2005/fdacompliance/fc03.pdf

Parallel method: CDMS -> SDTM and CDMS -> ADaM

Linear method: CDMS -> SDTM -> ADaM

Hybrid method: CDMS -> Draft SDTM -> ADaM -> SDTM

Other approaches

Implementation issues, Helsinn experience

Key aspects discussed during implementation:

Datasets
- Vertical vs horizontal structure
- Analysis ready and redundancy
- Clear link between SDTM and ADaM (AE -> ADAE, VS -> ADVS etc.): traceability
- Subject level: fully compliant with CDISC ADaM
- Defined a generation sequence
- One analysis dataset for each SDTM dataset (ADAE, ADIE, ADMH, ADPE, ADEX, ADCM, ADLB etc.)
- More than one dataset when needed (example EG, ADEG for par and findings)
- Keep the vertical structure when possible (just add variables)
- Efficacy datasets: study specific, no specifications
- Additional datasets needed for the analysis may be created (example: to store totals/denominators to be used in the summaries)

Variables
- Variables in SDTM SUPPQUAL merged back to the original domain (e.g. Race, other)
- Common set of variables in each dataset (age, gender, race, stratification variables, treatment planned/actual)
- Analysis population flag: added to each dataset
- Numeric variables: added as needed for the analysis (dates, numeric version of categorical variables)
- Add dataset-specific variables (analysis day, TE, change from baseline etc.)

Benefits
(even if you are not working on a submission)

- Minimized programming effort
- Reduced risk of programming error
- Less validation effort
- Reuse of programs
- Reduced time needed for analysis dataset creation (we can spend more time on analysis)
- Integrated analysis made easier

ADaM Work in Progress

Develop Implementation Guide
- The ADaM team is working on an implementation guide that will build on the considerations discussed in the Analysis Data Model Version 2.0.
- This implementation guide will outline specific standards and recommendations for the structure and content of analysis datasets
- It will contain a library of examples of analysis datasets that would serve to support specific statistical methodology used within clinical trials, such as
  - Change from Baseline
  - Time to Event
  - Categorical Analysis
  - Adverse Events

Develop Training Course

Cross-team activities including:
- SDS/ADaM Pilot project
- DEFINE.XML and analysis data
- Trial Design Model for 2-3 frequently used trial designs
- Controlled terminology to be used for analysis data

Questions


Analysis Dataset Creation documentation (back-up)

Descriptions for each dataset:
- the source datasets
- processing steps
- scientific decisions pertaining to creation

Clearly distinguish:
- derivations & decision rules specified a priori
- decisions that were data-driven

Key issues:
- derived variables documentation: algorithms
- handling of missing data
- data item specific derivations, i.e. a change to a data value for a specific observation

Analysis dataset creation programs may be used as documentation

Standardized process for analysis datasets creation (back-up)

- ADSL should be created before other ADaM datasets
- Derivations should be performed only once (more efficient and reduces the risk of discrepancies)
- Define the dataset creation order (depending on existing relationships between ADaM datasets)
- Some SDTM variables may not be needed in ADaM
- The list of ADaM datasets may be shorter than the SDTM list (no SUPPQUAL datasets, efficacy data may be combined in one dataset)

There is still a lot of freedom in the possible set-up of the ADaM structure. Define a standard approach!
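
One way to make the creation order explicit is to record which dataset depends on which and let a topological sort produce the sequence. The sketch below is illustrative Python, not the process described by the author. It uses the standard library `graphlib` module (Python 3.9+), the dependency map is an assumption, and `ADEFF` is a hypothetical efficacy dataset.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each dataset maps to the datasets that must exist before it can be built.
DEPENDS_ON = {
    "ADSL": set(),
    "ADAE": {"ADSL"},
    "ADVS": {"ADSL"},
    "ADEFF": {"ADSL", "ADVS"},  # hypothetical efficacy dataset reusing ADVS
}


def creation_order(depends_on):
    """Return a build order in which every dataset follows its prerequisites."""
    return list(TopologicalSorter(depends_on).static_order())


order = creation_order(DEPENDS_ON)
```

ADSL always comes first because every other dataset depends on it; the relative order of independent datasets (here ADAE and ADVS) is not fixed. A circular dependency raises `graphlib.CycleError`, which is a useful check on the design itself.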

Analysis Results Metadata (back-up)

Describes the major attributes of each important analysis result

Links statistical results to:
- analysis datasets and programs used to generate the analysis
- metadata describing the analysis
- reason for performing the analysis

Should contain:
- ANALYSIS NAME: A unique identifier for this analysis. May include a table number or other sponsor-specific reference.
- DESCRIPTION: A text description documenting the analysis performed.
- REASON: The reason for performing this analysis. Examples may include Pre-specified, Exploratory, and Regulatory Request.
- DATASET: The name of the analysis dataset used for this analysis. The column may also include specific selection criteria (e.g. where SAFETY=Y).
- DOCUMENTATION: Information about how the analysis was performed (text description, link to another document or the analysis generation program).
