DESIGN AND ANALYSIS OF EXPERIMENTS
IN THE HEALTH SCIENCES

Gerald van Belle
School of Public Health
The University of Washington
Seattle, WA

Kathleen F. Kerr
School of Public Health
The University of Washington
Seattle, WA

Copyright © 2012 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey
Published simultaneously in Canada
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or
by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as
permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior
written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to
the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax
(978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should
be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ
07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in
preparing this book, they make no representations or warranties with respect to the accuracy or
completeness of the contents of this book and specifically disclaim any implied warranties of
merchantability or fitness for a particular purpose. No warranty may be created or extended by sales
representatives or written sales materials. The advice and strategies contained herein may not be suitable
for your situation. You should consult with a professional where appropriate. Neither the publisher nor
author shall be liable for any loss of profit or any other commercial damages, including but not limited to
special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our
Customer Care Department within the United States at (800) 762-2974, outside the United States at (317)
572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may
not be available in electronic formats. For more information about Wiley products, visit our web site at
www.wiley.com.
Library of Congress Cataloging-in-Publication Data:
Van Belle, Gerald.
Design and analysis of experiments in the health sciences / Gerald van
Belle, Kathleen F. Kerr. 1st ed.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-470-12727-8 (hardback)
1. Medical informatics. 2. Medical sciences--Statistical methods.
3. Experimental design. I. Kerr, Kathleen F., 1970- II. Title.
R858.V36 2012
610.72'7--dc23
2011044306
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1

GvB: For West African Vocational Schools (WAVS)


KK: For Alex and Eve

CONTENTS
Preface / xiii

1 The Basics / 1
  1.1 Four Basic Questions / 1
  1.2 Variation / 4
  1.3 Principles of Design and Analysis / 5
  1.4 Experiments and Observational Studies / 9
  1.5 Illustrative Applications of Principles / 11
  1.6 Experiments in the Health Sciences / 12
  1.7 Adaptive Allocation / 15
    1.7.1 Equidistribution / 15
    1.7.2 Adaptive Allocation Techniques / 16
  1.8 Sample Size Calculations / 18
  1.9 Statistical Models for the Data / 20
  1.10 Analysis and Presentation / 22
    1.10.1 Graph the Data in Several Ways / 22
    1.10.2 Assess Assumptions of the Statistical Model / 22
    1.10.3 Confirmatory and Exploratory Analysis / 23
    1.10.4 Missing Data Need Careful Accounting / 23
    1.10.5 Statistical Software / 24
  1.11 Notes / 24
    1.11.1 Characterization Studies / 24
    1.11.2 Additional Comments on Balance / 25
    1.11.3 Linear and Nonlinear Models / 25
    1.11.4 Analysis of Variance Versus Regression Analysis / 26
  1.12 Summary / 26
  1.13 Problems / 26

2 Completely Randomized Designs / 31
  2.1 Randomization / 31
  2.2 Hypotheses and Sample Size / 32
  2.3 Estimation and Analysis / 32
  2.4 Example / 34
  2.5 Discussion and Extensions / 36
    2.5.1 Preparing Data for Computer Analysis / 36
    2.5.2 Treatment Assignment in this Example / 37
    2.5.3 Check on Randomization / 37
    2.5.4 Partitioning the Treatment Sum of Squares / 37
    2.5.5 Alternative Endpoints / 38
    2.5.6 Dummy Variables / 38
    2.5.7 Contrasts / 39
  2.6 Randomization / 41
  2.7 Hypotheses and Sample Size / 41
  2.8 Estimation and Analysis / 41
  2.9 Example / 42
  2.10 Discussion and Extensions / 44
    2.10.1 Two Roles for ANCOVA / 44
    2.10.2 Partitioning of Sums of Squares / 45
    2.10.3 Assumption of Parallelism / 46
  2.11 Notes / 47
    2.11.1 Constrained Randomization / 47
    2.11.2 Assumptions of the Analysis of Variance and Covariance / 48
    2.11.3 When the Assumptions Don't Hold / 49
    2.11.4 Alternative Graphical Displays / 50
    2.11.5 Sample Sizes for More Than Two Levels / 51
    2.11.6 Limitations of Computer Output / 51
    2.11.7 Unequal Sample Sizes / 51
    2.11.8 Design Implications of the CRD / 51
    2.11.9 Power and Alternative Hypotheses / 52
    2.11.10 Regression or Analysis of Variance? / 52
    2.11.11 Bioassay / 52
  2.12 Summary / 53
  2.13 Problems / 53

3 Randomized Block Designs / 63
  3.1 Randomization / 64
  3.2 Hypotheses and Sample Size / 64
  3.3 Estimation and Analysis / 64
  3.4 Example / 65
  3.5 Discussion and Extensions / 67
    3.5.1 Evaluating Model Assumptions / 67
    3.5.2 Multiple Comparisons / 69
    3.5.3 Number of Treatments and Block Size / 71
    3.5.4 Missing Data / 71
    3.5.5 Does It Always Pay to Block? / 71
    3.5.6 Concomitant Variables / 72
    3.5.7 Imbalance / 74
  3.6 Randomization / 77
  3.7 Hypotheses and Sample Size / 77
  3.8 Estimation and Analysis / 77
  3.9 Example / 77
  3.10 Discussion and Extensions / 79
    3.10.1 Implications of the Model / 79
    3.10.2 Number of Latin Squares / 79
  3.11 Randomization / 80
  3.12 Hypotheses and Sample Size / 81
  3.13 Estimation and Analysis / 82
  3.14 Example / 82
  3.15 Discussion and Extensions / 85
    3.15.1 Partially Balanced Incomplete Block Designs / 85
  3.16 Notes / 86
    3.16.1 Analysis Follows Design / 86
    3.16.2 Relative Efficiency / 86
    3.16.3 Additivity of the Model / 87
  3.17 Summary / 88
  3.18 Problems / 88

4 Factorial Designs / 93
  4.1 Randomization / 95
  4.2 Hypotheses and Sample Size / 95
  4.3 Estimation and Analysis / 96
  4.4 Example 1 / 97
  4.5 Example 2 / 100
  4.6 Notes / 103
    4.6.1 Regression Analysis Approaches / 103
    4.6.2 Almost Factorial / 105
    4.6.3 Design Structure and Factor Structure / 105
    4.6.4 Effect and Interaction Tables / 105
    4.6.5 Balanced Design / 105
    4.6.6 Missing Data / 106
    4.6.7 Fixed, Random, and Mixed Effects Models / 106
    4.6.8 Fractional Factorials / 108
  4.7 Summary / 109
  4.8 Problems / 110

5 Multilevel Designs / 117
  5.1 Randomization / 118
  5.2 Hypotheses and Sample Size / 118
  5.3 Estimation and Analysis / 119
  5.4 Example / 121
  5.5 Discussion and Extensions / 127
    5.5.1 Whole-Plot and Split-Plot Variability / 127
    5.5.2 Getting the Computer to Do the Right Analysis / 128
  5.6 Notes / 129
    5.6.1 Fractional Factorials: Example / 129
    5.6.2 Missing Data / 129
  5.7 Summary / 130
  5.8 Problems / 130

6 Repeated Measures Designs / 135
  6.1 Randomization / 136
  6.2 Hypotheses and Sample Size / 136
  6.3 Estimation and Analysis / 137
  6.4 Example / 139
  6.5 Discussion and Extensions / 142
  6.6 Notes / 143
    6.6.1 RBD and RMD / 143
    6.6.2 Missing Data: The Fundamental Challenge in RMD / 143
    6.6.3 Correlation Structure / 144
    6.6.4 Derived Variable Analysis / 144
  6.7 Summary / 144
  6.8 Problems / 145

7 Randomized Clinical Trials / 149
  7.1 Endpoints / 151
  7.2 Randomization / 152
  7.3 Hypotheses and Sample Size / 153
  7.4 Follow-Up / 154
  7.5 Estimation and Analysis / 154
  7.6 Examples / 155
  7.7 Discussion and Extensions / 159
    7.7.1 Statistical Significance and Clinical Importance / 159
    7.7.2 Ethics / 161
    7.7.3 Reporting / 162
  7.8 Notes / 163
    7.8.1 Multicenter Trials / 163
    7.8.2 International Harmonization / 167
    7.8.3 Data Safety Monitoring / 167
    7.8.4 Ancillary Studies / 168
    7.8.5 Subgroup Analysis and Data Mining / 168
    7.8.6 Meta-Analysis / 169
    7.8.7 Authorship and Recognition / 169
    7.8.8 Communication / 169
    7.8.9 Data Sharing / 170
    7.8.10 N-of-1 Trials / 170
  7.9 Resources / 171
  7.10 Summary / 171
  7.11 Problems / 171

8 Microarrays / 179
  8.1 Introduction / 179
  8.2 Genes, Gene Expression, and Microarrays / 179
    8.2.1 Genes and Gene Expression / 179
    8.2.2 Gene Expression Microarrays / 180
  8.3 Examples of Microarray Studies / 186
  8.4 Replication and Sample Size / 188
  8.5 Blocking and Microarrays / 189
  8.6 Randomization and Microarrays / 190
  8.7 Microarray Data Analysis Issues / 191
    8.7.1 Image Analysis / 191
    8.7.2 Data Preprocessing / 193
    8.7.3 Identifying Differentially Expressed Genes / 196
    8.7.4 Multiple Testing / 196
    8.7.5 Gene Set Analysis / 198
    8.7.6 The Class Prediction Problem / 198
  8.8 Data Analysis Example / 200
  8.9 Notes / 202
    8.9.1 Sample Size / 202
    8.9.2 FDR Estimation / 202
    8.9.3 Evaluation of Data Preprocessing Methods / 203
  8.10 Summary / 203
  8.11 Problems / 203

Bibliography / 207

Author Index / 217

Subject Index / 223

PREFACE
Why another book on the design and analysis of experiments? There are many design
books with engineering or agricultural applications, but there are few books with a
focus on the health sciences. The focus of this book is laboratory, animal, and human
experiments and scientific investigations in the health sciences. More specifically, we
sought to incorporate some newer research areas such as microarrays into the broad
context of design. Finally, it is our opinion that clinical trials are a crucial topic to
cover in a design book for health scientists. Hence this book.
The principles of design and analysis have been enunciated for many years (Fisher,
1925, 1971). It is the application of these principles to research in the health sciences
that forms the content of this book. We illustrate the principles with examples from a
very diverse set of areas from within the health sciences. Most examples are studies
involving humans and animals.
There is a close linkage between design and analysis. Design drives the analysis, and
analysis reveals the design. However, the tie is not one-to-one. Alternative analyses
are available for a specific design and vice versa. Many books on design stress the
analysis. This book attempts to balance aspects of design and analysis.
This book presupposes an introduction to basic statistical concepts: the two laws
of probability (addition and multiplication), t-tests (for both independent and paired
data), simple linear regression analysis including a test for significance of the
regression coefficient, hypothesis testing, and estimation. It assumes that you have seen the
formula for sample size for comparing the means of two groups. Hence, you know
what is meant by a Type I error, Type II error, power, and one-sided versus two-sided
hypothesis.
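
As a reminder of the sample size formula referred to above, the following is a minimal sketch of the usual normal-approximation calculation for comparing two group means with equal group sizes and a common standard deviation. The function name `n_per_group` and its parameter names are illustrative assumptions and do not come from the book; a t-based calculation would give a slightly larger answer.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample
    comparison of means (normal approximation, equal group sizes).

    delta: difference in means to detect (hypothetical parameter name)
    sigma: common within-group standard deviation
    alpha: two-sided Type I error rate
    power: 1 minus the Type II error rate
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = std_normal.inv_cdf(power)           # quantile corresponding to the power
    # n = 2 * (z_alpha + z_beta)^2 * sigma^2 / delta^2, rounded up
    return ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


# Detecting a difference of half a standard deviation
print(n_per_group(delta=0.5, sigma=1.0))  # prints 63
```

With the defaults (two-sided Type I error of 0.05 and power of 0.80), detecting a difference of half a standard deviation requires about 63 subjects per group under this approximation.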
Chapter 1 discusses basic principles that provide a coherent structure for carrying
out experiments. A thorough understanding and conscientious application of these
principles will pay off in terms of validity of inferences, economy of study, and
generalizability. Chapters 2–6 discuss five types of designs, with simple extensions,
that form the basis for most experimental structures. We chose these types of designs
because they cover the majority of research designs in the health sciences. They consist of
completely randomized, randomized block (including Latin squares and incomplete
blocks), factorial, multilevel, and repeated measures designs.

Each of the designs in Chapters 2–6 is discussed under the following headings:

1. Randomization
2. Hypotheses and sample size
3. Estimation and analysis
4. Example
5. Discussion and extensions
6. Notes
7. Summary
8. Problems

Chapters 7 and 8 represent specific applications and illustrations of the above
designs to randomized clinical trials and microarrays.
You may notice that journals such as Science and Nature containing reports of
many experiments do not refer much to the concepts enunciated in this book. This is
somewhat unfortunate, because good scientists will follow the principles presented
in this book. However, the presentations in these journals are highly condensed. Only
key results are presented, with several years' work often summarized by one table or
one figure.
A bit of historical context is useful in understanding the design of experiments.
It is not just a collection of methods coming down from heaven like the statue of
Athena. In the 1930s, the computational effort was a real stumbling block and an
important research area was finding shortcut ways to the analysis. The computational
burden is no longer a problem today. There is an ongoing interplay between statistical
methodology, computational resources, and societal and scientific interests,
each helping to propel advances in the others. For example, the increasing emphasis
on clinical trials starting in the 1960s led to the development of survival analysis,
database management, and appropriate computational procedures.
The website for this book (vanbelle.org) is freely accessible and acts as a supplement.
It contains most of the data sets used in the text, frequently in a format that can
be easily imported into most statistical software. The publisher, John Wiley & Sons,
has graciously allowed us to post Chapter 7, Randomized Clinical Trials; this chapter
can be downloaded for free. The hope is, of course, that you'll be intrigued enough
by that one chapter to buy the book, and find the book a useful resource.
On vanbelle.org you can also find the web pages for the book Statistical Rules
of Thumb by Gerald van Belle. Chapter 2 of that book dealing with sample size
calculations can be downloaded (also with permission of our publisher).
All data analyses today involve computers, and hence, computer packages. These
packages are changed constantly (updated) and new packages are introduced. We
have decided to extract the essence of the computer analysis and present this in the
text. Almost all of the analyses were run in Stata or R. The website is intended to
be dynamic. For example, some of you will rerun an analysis using your preferred
statistical package. If you send it to the website, it will be posted under the appropriate
statistical package heading.
We are indebted to many colleagues: Larissa Stanberry for help with graphics,
LaTeX, and formatting; Corinna Mar for creative and graceful implementation of
graphics; Art Peterson for helpful discussions about clinical trials; Theo Bammler,
Dick Beyer, and Emily Hansen for feedback and ideas about the microarray chapter;
Sandra Coke for producing some of the graphics; and the many journals and authors
that allowed us to use data from their publications. Of course, we are responsible
for content, especially the errors. We are also indebted to our editor, Susanne
Steitz-Filler, for her patience as we extended our deadlines.
Books generate royalties. All the royalties from this book will be distributed to
charitable organizations as follows. Gerald van Belle's share is assigned to West
African Vocational Schools (WAVS) (wavschools.org), which works in Guinea-Bissau,
one of the poorest nations in the world. Kathleen Kerr's share is dedicated to Northwest
Harvest and the Seattle Public Library Foundation.
Gerald van Belle
Kathleen F. Kerr