Cancer Genomics: From Bench to Personalized Medicine
Ebook, 1,633 pages, 17 hours

About this ebook

Cancer Genomics addresses how recent technological advances in genomics are shaping how we diagnose and treat cancer. Built on the historical context of cancer genetics over the past 30 years, the book provides a snapshot of the current issues and state-of-the-art technologies used in cancer genomics. Subsequent chapters highlight how these approaches have informed our understanding of hereditary cancer syndromes and the diagnosis, treatment and outcome in a variety of adult and pediatric solid tumors and hematologic malignancies. The dramatic increase in cancer genomics research and ever-increasing availability of genomic testing are not without significant ethical issues, which are addressed in the context of the return of research results and the legal considerations underlying the commercialization of genomic discoveries. Finally, the book concludes with "Future Directions", examining the next great challenges to face the field of cancer genomics, namely the contribution of non-coding RNAs to disease pathogenesis and the interaction of the human genome with the environment.

  • Tools such as sidebars, key concept summaries, a glossary, and acronym and abbreviation definitions make this book highly accessible to researchers from several fields associated with cancer genomics.
  • Contributions from thought leaders provide valuable historical perspective to relate the advances in the field to current technologies and literature.

Language: English
Release date: Nov 21, 2013
ISBN: 9780123972743


    Book preview

    Cancer Genomics - Graham Dellaire

    1

    Introduction

    Outline

    Chapter 1 Historical Perspective and Current Challenges of Cancer Genomics

    Chapter 1

    Historical Perspective and Current Challenges of Cancer Genomics

    Graham Dellaire¹ and Robert J. Arceci²

    ¹Departments of Pathology and Biochemistry & Molecular Biology, Dalhousie University, Halifax, NS, Canada

    ²Children’s Center for Cancer and Blood Disorders, Hematology/Oncology and The Ron Matricaria Institute of Molecular Medicine, Phoenix Children’s Hospital, Department of Child Health, University of Arizona, College of Medicine, Phoenix, AZ, USA

    From the first descriptions of cancer in Egypt around 3000 BC to today's one-week whole-genome sequence, the history of integrating new ideas into the practice of medicine has been unrelenting, although not without its failures as well as its successes. This chapter presents a brief historical summary of some of the key success stories in our understanding of cancer that have led to our current age of cancer genomics. As the Chinese proverb states, "When you drink from the well, remember who dug it"; reflecting on this rich and varied history, we conclude the chapter with a discussion of current and future challenges to the application of our new and developing understanding of cancer genomes to patient therapy.

    Keywords

    chromothripsis; chromoplexy; DNA microarrays; DNA sequencing; epigenetics; kataegis; massively parallel sequencing; Philadelphia chromosome; RNA-sequencing

    Contents

    A Historical Perspective on the Development of Cancer Genomics

    Current and Future Challenges

    Glossary

    Abbreviations

    References

    Key Concepts

    • Advances in nucleic acid sequencing technology have had a profound impact on the field of cancer genomics and have enabled the interrogation of the genetic basis of cancer at the single nucleotide level

    • Cancer genomics has provided a detailed view of the complexity of the cancer genome, including the extraordinary ability to sustain and thrive on alterations of DNA

    • Translating cancer genomics into real-time, clinically actionable personalized medicine is beginning to be tested, although it will require new generations of technology and bioinformatic tools to deliver significant advances in the speed of providing molecular data and analysis, in the use of combinations of targeted agents, and in understanding clinical response or lack of response

    A Historical Perspective on the Development of Cancer Genomics

    Several hundred years BC, Hippocrates is credited with providing us with the term carcinoma, and thus cancer, possibly inspired by the finger-like extensions (veins) radiating from the main body of a breast tumor, which resembled the shape of a crab. Around 400 years later, the Roman physician Celsus translated the Greek (karkinos) into the Latin word for crab, which led to the term cancer [1,2]. A relative late-comer to this nomenclature narrative, in AD 168, Galen introduced the term oncos, meaning swelling, to describe tumors, leading to the term defining the field of oncology [3,4].

    The next 2000 years witnessed several key events that helped to refine further the main areas of cancer investigation and treatment that are still ongoing. Maimonides in AD 1190 appears to have been the first to document surgically removing tumors [3]. The recognition of cancer clustering in distinct populations was introduced in 1713 with Ramazzini’s observation of the low cervical but high breast cancer incidence in nuns [2]. The observation that environmental and occupational exposures can be associated with increased incidence of specific cancers also became evident. In the first half of the 1800s, Récamier appears to have reignited the flair for nomenclature by writing about metastasis in 1829 to describe the movement of some cancers to different parts of the body [3]. Müller’s notes on the cellular origin of cancer in 1838, and Paget’s subsequent seed and soil hypothesis over 50 years later in 1889, established the cognitive paradigm for the cell biological basis for cancer and the concept of microenvironmental niches [2,3]. The first half of the 20th century was ushered in by a set of remarkable observations in cancer biology that were made before the discovery of DNA. These included the theories of Rous regarding the potential viral origin of some cancers in 1910, derived from his work on avian sarcomas, and the concept of the somatic mutation theory of cancer by Boveri in 1914 that stemmed from his work on polyspermic development in invertebrates [5,6].

    While not necessarily providing a deeper understanding of the mechanisms of carcinogenesis, the first half of the 20th century in many ways broke open the gates of cancer treatment. After being commissioned by the US government to understand the physiological consequences of nitrogen mustard gas used in warfare, Louis Goodman and Alfred Gilman recognized the key bone marrow toxicity of this agent and subsequently introduced its intravenous use for the treatment of lymphoid malignancies in 1946 [7]. Soon afterward in 1948, the antimetabolite aminopterin was used to treat several children with acute lymphoblastic leukemia by Farber and colleagues, a treatment built on the work of the chemist Subbarao [8]. A decade later, in 1958, Hertz, Li and colleagues reported the first cure of a metastatic tumor, namely a gestational-related choriocarcinoma, with another antimetabolite, methotrexate [9].

    However, despite such encouraging forays into treating patients, few cures were achievable with surgery, radiation therapy and chemotherapy. In this regard, the extraordinary efforts of Ms Mary Lasker following her husband’s death from cancer should not go unmentioned. Through the 1960s, she and the Citizens Committee for the Conquest of Cancer challenged government, physicians and scientists to push forth with a War on Cancer [10–12]. This was in spite of a significant number of naysayers who had concluded in various publications that we knew enough to cure cancer and all that was needed was to translate the knowledge that was available at the time. Such a lack of vision was thankfully thwarted by those who propitiously concluded that only through scientific discovery and its ongoing application would improvements in cancer outcomes occur. In 1971, the US National Cancer Act was passed by Congress and then President Nixon signed it into law within 2 weeks, an astonishingly rapid accomplishment on the part of government and one that should inform current, often stalled efforts [10–13]. The consequences of the above investment, along with other efforts across the globe [14], led to an infusion of intellectual engagement and financial support for conquering cancer. The results included the establishment of clinical trial groups, comprehensive cancer centers, an explosion of new anticancer agents from the lab and from nature, and the beginning of work focused on the biological understanding of cancer. This latter work built of course on the model and profound implications of the seminal discovery of the structure of the DNA double helix by Watson and Crick in 1953 [15], a discovery that would earn them the Nobel Prize in Physiology or Medicine in 1962, an accolade they shared with colleague Maurice Wilkins. In many ways, the journey to our present age of high-throughput and genome-wide discoveries in cancer biology began with this fundamental description of the fabric of life and, as such, this discovery makes a suitable origin point from which to chronicle the key events in cancer genomics in the last 60 years (Figure 1.1).

    Figure 1.1 Historical milestones in cancer genomics. Key milestones in the field of cancer genomics are depicted starting with the elucidation of the structure of DNA by Watson and Crick in 1953. These milestones are depicted over a line graph of the total number of publications listed in the PubMed database of the National Center for Biotechnology Information (NCBI) with the keywords Cancer+(Genetics or Gene) (in blue), or Cancer+(Genomics or Genome) (in green) from 1945 to 2013.

    With the structure of the molecule of heredity in hand, the latter half of the 20th century saw a number of major contributions to our understanding of the biochemical and genetic underpinnings of cancer. These contributions included the identification of the Philadelphia chromosome as a genetic marker of chronic myelogenous leukemia (CML) by Nowell and Hungerford in 1960, and the subsequent identification of chromosomes 9 and 22 as the translocation partners underlying this anomalous chromosome by Rowley in 1973 [16,17]; the identification of the first cellular proto-oncogene, SRC, by Varmus and Bishop in 1976 [18], leading to the realization that cellular genes could become deregulated, resulting in tumorigenesis; and the identification of the p53 protein in 1979 as the primary molecular target underlying transformation by the DNA tumor virus, simian virus 40 (SV40) [18–20]. Empowered by the molecular tools developed for the study of tumor viruses, scientists studying cancer made many more seminal discoveries in the 1980s and early 1990s, including the identification of several tumor suppressor genes such as retinoblastoma (RB) [21,22], the gene encoding p53 (TP53) [18–20,23,24] and the adenomatous polyposis coli (APC) gene [25–27]. All these discoveries were foreshadowed by Knudson’s two hit hypothesis of tumorigenesis and his pioneering epidemiological studies of retinoblastoma in 1971 [28], which laid the conceptual framework for how loss of heterozygosity (LOH) of a tumor suppressor gene contributes to cancer development. It was during the 1980s that non-genetic mechanisms of oncogene regulation were first identified. One such epigenetic mechanism of gene regulation was the loss of cytosine nucleotide methylation in CpG doublets, known as hypomethylation, which was first demonstrated by Feinberg and Vogelstein in 1983 and later shown to regulate the expression of oncogenes such as HRAS [29]. Soon after, in 1986, Baylin et al. demonstrated, by studying the methylation pattern of the calcitonin gene, that increased CpG methylation, termed hypermethylation, also occurred in cancer cells [30], and, in 1989, Horsthemke and colleagues demonstrated hypermethylation of the RB tumor suppressor gene in retinoblastoma [31].

    Through the 1990s and into the twenty-first century, further groundbreaking work would identify key genes underlying hereditary susceptibilities to breast cancer (e.g. BRCA1) [32] and colon cancer (e.g. MSH2) [33,34], identify the telomerase gene and demonstrate its role in subverting senescence [35–38], and discover cancer stem cells in leukemias and solid tumors [39–41].

    The sum total of an organism’s genetic information was first referred to as a genome by Hans Winkler in 1920, who used the term genom to describe the haploid chromosomal set of an organism [42]. The term genomics would later be coined in 1986 by Thomas H. Roderick, a geneticist at the Jackson Laboratory in Bar Harbor, Maine [42] and refers to the study of an organism’s entire complement of genetic information. The history of genomics and the application of genomics to cancer biology is a story of technological advances and, arguably, DNA sequencing technologies have had the greatest impact and form the foundation of cancer genomics. These advances began in 1977 with the development of chemical sequencing of DNA by Maxam and Gilbert and the dideoxy nucleotide method of sequencing by Sanger and Coulson (reviewed in [43]). The Sanger technique in particular would go on to dominate DNA sequencing for nearly three decades before the development of massively parallel approaches to sequencing such as pyrosequencing (e.g. 454 sequencing), reversible termination (e.g. Illumina sequencing), sequencing by ligation (e.g. massively parallel signature sequencing (MPSS)), polony sequencing, and single molecule sequencing (reviewed in [44]). Another transformative technology was the development of DNA microarrays in 1995 [45]. Although DNA microarrays have been largely supplanted by next-generation sequencing and approaches such as RNA-sequencing (RNA-seq) developed in 2008 (reviewed in [46]), they were, and remain, a relatively cost-effective platform for the profiling of gene expression and genotyping. In addition, the development of DNA microarrays led to important advances in our understanding of copy number variation in cancer by dramatically increasing the resolution at which chromosomal changes could be observed when compared to predecessor technologies such as comparative genomic hybridization (CGH) [47,48]. The emergence of these technologies was leveraged to provide novel insights into the driver mechanisms by which normal cells are transformed into cancer cells.

    In the late 1990s, the Cancer Genome Anatomy Project (CGAP) [49] was initiated on the shoulders of The Human Genome Project, which had begun in the early 1990s and was completed in 2003, leading to the identification of a standard set of approximately 25 000 human genes [50]. In 2006, the National Cancer Institute (NCI) launched The Cancer Genome Atlas (TCGA), first as a pilot and then as a full project in 2009, which was rapidly followed by the formation of the International Cancer Genome Consortium (ICGC) in 2010 that set a goal of characterizing 25 000 cancer genomes in 50 different cancer types [51]. More recent efforts of NCI have led to the creation of the Cancer Genome Characterization Initiative (CGCI), the Cancer Target Discovery and Development (CTD²) Program, and the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) program, a pediatric cancer version of TCGA that links clinical treatment and outcomes with detailed genomic analysis (http://ocg.cancer.gov/).

    In the wake of the development of massively parallel sequencing technologies and the launch of large-scale cancer genome projects such as TCGA and the ICGC, there have been a number of important observations on the genetic basis of cancer development and progression that might never have been uncovered without high-throughput and integrated genomics approaches (reviewed by Garraway and Lander [52]). These observations include the discovery of distinctive chromosome shattering events termed chromothripsis, characterized first in chronic lymphocytic leukemia (CLL) but highly prevalent in bone cancer and occurring in up to 3% of all cancers [53]; the discovery of chromosomal chains referred to as chromoplexy in prostate cancer genomes [54,55]; the identification of a novel mechanism of highly localized hypermutation in a single chromosomal region, termed kataegis to describe the shower of mutations associated with DNA deaminase-induced breaks in breast cancer genomes [56–58]; and the demonstration of a high degree of previously unrecognized tumor heterogeneity both between cancers of the same tissue and among the clonal populations of cells within a single individual’s tumor [57,59–61]. The latter discovery may be the most profound in that it surely darkens the prospect of successful personalized therapy if strategies are not developed to tackle tumor heterogeneity.

    Current and Future Challenges

    Building on such discoveries, the 21st century has seen and will continue to see the exploitation of cancer genomic and transcriptomic sequencing efforts for the development of mutation-directed, targeted therapies, such as imatinib mesylate, a tyrosine kinase inhibitor initially developed to inhibit the growth of leukemia cells carrying the BCR/Abl fusion gene product encoded by the Philadelphia chromosome in chronic myelogenous leukemia [62]. Subsequently, a plethora of other targeted agents have been developed that would make Paul Ehrlich, the originator of the concept of Magic Bullets to treat cancer, a proud progenitor [63]. In addition, there has been an expansion in tumor-selective, immunologically based therapies using monoclonal antibodies, such as those targeting the HER2 receptor in breast cancer and the epidermal growth factor receptor (EGFR) in lung and other solid tumors [64], built on the work of Kohler and Milstein in the 1970s [65], Coley, Steinman [66,67] and others (reviewed by [66]) [67,68], who collectively built the scientific foundation from which the promise of immune cell-based therapies and cancer vaccines has been fostered [69].

    Sixty years since the discovery of the structure of DNA, all the progress we have described in our understanding of cancer biology now seems like just a prelude to the current era of cancer genomics and personalized medicine. The extraordinarily rapid advances in nucleic acid and protein sequencing have led us to the brink of generating more data than the human brain can usefully manage. This has in turn led to the development of increasingly sophisticated analytical tools. The added complexities of the role of epigenetic regulation, compensatory cell signaling pathways and the profoundly complex universe of the non-coding "Dark Matter" portion of the genome have both informed and humbled most investigators and clinicians [52].

    The purpose of this book is to provide a snapshot of where cancer omics and our understanding of the genetic basis of cancer stand today. But we should also reflect on what the key challenges are in translating cancer genomics into meaningful outcomes for cancer patients. While predicting the future is usually futile, there are several observations that seem on solid ground in terms of next steps to applying omics to the effective treatment of patients with cancer as well as providing for earlier detection and assessment of predisposition. Although DNA, RNA and protein sequencing are becoming more rapid and accurate, the ability to provide complete sequence coverage, integration for pathway analysis and functional assessment is not yet a reality in clinically actionable time frames for most patients. There are limited numbers of gene mutation and expression-based diagnostics that can certainly be turned around in short order. However, while such results can frequently be used to offer novel treatments to patients, the contextual basis to understand treatment responses and, importantly, lack of response, is missing with such limited approaches.

    The ability to provide, in real time, sufficiently detailed information for informed molecularly guided therapy still requires the development of more rapid, less expensive and integrated sequencing, functional and analytical assessment of a patient’s cancer (including the clonal complexity), as well as their own germline and its associated idiosyncrasies of drug metabolism and sensitivity [70–72]. This further emphasizes the need for physicians to be trained not only in the art of medicine, but also in the ability to act on integrated omic information and to interact with multidisciplinary teams that include not just other medical specialties, but also those skilled in the technical and bioinformatic aspects of disease analysis. In addition, the ability to understand contextual responses is likely to require new approaches to computer learning that can be applied to the cancer problem [73]. The promise of in silico testing of new treatments that has a high level of clinical predictability may eventually pre-empt the need for larger scale testing in humans, an expensive and often risky business at best.

    Our goals (and abilities) to (1) cure everyone, (2) not hurt them while doing it, and (3) do so without spending a disproportionate amount of money, should emerge as guiding principles from these beginning steps into applying genomics to cancer diagnosis and treatment. There is no lack of creativity and passion on the part of those committed to these goals. However, there are actual and potential obstacles that represent real enemies of success, as Cyrano de Bergerac’s last words make evident. There is the Falsehood that our knowledge of cancer is sufficient to cure this disease, which we must overcome by recognizing with humility that there is still much to be learned; there is Compromise that is made in choosing treatments for cancer, which must be balanced by striving for better therapies and better outcomes; there is Prejudice that limits our ability to think outside of our own boxes; and there is Treachery, whether intended or not, clothed in the form of overregulation, underfinancing, and overbearing risk aversion. We should not lapse into thinking that treatments have killed more people than cancer.

    While not nearly a completed story of our understanding of cancer, this text and the incredible work of investigators and the profound generosity of patients, who contribute to a goal beyond themselves by taking part in clinical trials, will hopefully continue to drive us forward towards the goal of eradicating the burden of cancer. In this regard, cancer genomics represents both the latest toolkit for the characterization of malignancies and the next step in the evolution of our understanding of the mechanisms of cancer development and progression which, ultimately, will be used to develop safer and more effective cancer therapies.

    Glossary

    Chromoplexy From the Greek plexy, meaning weave or braid, this term refers to large chains of chromosomal rearrangements that can affect multiple chromosomes.

    Chromothripsis From the Greek thripsis, meaning shattering, this term refers to chromosome shattering with subsequent multiple rearrangements.

    DNA methylation Addition of a methyl group to the cytosine base of DNA. The methylation status of a regulatory DNA sequence can silence or promote the expression of genes. DNA hypomethylation and hypermethylation refer to reduced or increased DNA methylation, respectively.

    DNA microarrays High-throughput gene expression quantification technology based on DNA probe-target hybridization (of RNA or DNA) and subsequent fluorescence detection. Also referred to as gene chips.

    Epigenetic A term used to describe the regulation of gene expression by mechanisms that modify DNA but do not change the sequence of the gene. These mechanisms include the methylation of DNA as well as the post-translational modification of histones by a host of events including phosphorylation, methylation, acetylation and ubiquitination. Although epigenetic changes are dynamic they are also heritable and can persist during cell division and be transmitted to offspring.

    Kataegis A Greek word meaning thunder or thunderstorm, this term refers to patterns of localized hypermutation.

    Massively parallel sequencing A term used to refer collectively to high-throughput DNA sequencing approaches that employ miniaturized and highly parallel platforms to sequence millions of short sequence reads of usually 50–400 nucleotides. These techniques are also referred to as next-generation sequencing (NGS) or second-generation sequencing approaches.

    Philadelphia chromosome A chromosomal abnormality created by the translocation of human chromosomes 9 and 22 (i.e. t(9;22)(q34;q11)) that is associated with chronic myelogenous leukemia (CML). The translocation fuses the BCR and ABL kinase genes resulting in the expression of an oncogenic fusion protein that can be targeted by the tyrosine kinase inhibitor imatinib mesylate.

    RNA-sequencing (RNA-seq) High-throughput (or next-generation) sequencing of a sample’s cDNA to characterize its transcriptome.

    Abbreviations

    CGAP Cancer Genome Anatomy Project

    CGCI Cancer Genome Characterization Initiative

    CGH Comparative genomic hybridization

    CLL Chronic lymphocytic leukemia

    CML Chronic myelogenous leukemia

    CTD² Cancer Target Discovery and Development program

    ICGC International Cancer Genome Consortium

    LOH Loss of heterozygosity

    MPSS Massively parallel signature sequencing

    TCGA The Cancer Genome Atlas

    TARGET Therapeutically Applicable Research to Generate Effective Treatments

    References

    1. Castiglioni A. A History of Medicine. New York: Alfred A. Knopf; 1941.

    2. American Cancer Society. The History of Cancer. Available from: <http://www.cancer.org/acs/groups/cid/documents/webcontent/002048-pdf.pdf>; 2012 [accessed June 2013].

    3. Morton LTM. A chronology of medicine and related sciences. Aldershot, England: Scholar Press; 1997.

    4. Gurunluoglu R, Gurunluoglu A. Paul of Aegina: landmark in surgical progress. World J Surg. 2003;27(1):18–25.

    5. Manchester KL. Theodor Boveri and the origin of malignant tumours. Trends Cell Biol. 1995;5:384–387.

    6. Rous P. A sarcoma of the fowl transmissible by an agent separable from the tumor cells. J Exp Med. 1911;13:397–411.

    7. Goodman LS, Wintrobe MM, Dameshek W, Goodman MJ, Gilman A, McLennan MT. Nitrogen mustard therapy; use of methyl-bis (beta-chloroethyl) amine hydrochloride and tris (beta-chloroethyl) amine hydrochloride for Hodgkin’s disease, lymphosarcoma, leukemia and certain allied and miscellaneous disorders. J Am Med Assoc. 1946;132:126–132.

    8. Farber S, Diamond LK. Temporary remissions in acute leukemia in children produced by folic acid antagonist, 4-aminopteroyl-glutamic acid. N Engl J Med. 1948;238:787–793.

    9. Hertz R, Bergenstal DM, Lipsett MB, Price EB, Hilbish TF. Chemotherapy of choriocarcinoma and related trophoblastic tumors in women. J Am Med Assoc. 1958;168:845–854.

    10. DeVita Jr. VT, Chu E. A history of cancer chemotherapy. Cancer Res. 2008;68:8643–8653.

    11. Chabner BA, Roberts Jr. TG. Timeline: chemotherapy and the war on cancer. Nat Rev Cancer. 2005;5:65–72.

    12. Gaudilliere JP. Essay review: cancer and science: the hundred years war. J Hist Biol. 1998;31:279–288.

    13. Cairns J. The evolution of cancer research. Cancer Cells. 1989;1:1–8.

    14. Pinell P, Brossat S. The birth of cancer policies in France. Sociol Health Illn. 1988;10:579–607.

    15. Watson JD, Crick FH. Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature. 1953;171:737–738.

    16. Nowell PC, Hungerford DA. Chromosome studies on normal and leukemic human leukocytes. J Natl Cancer Inst. 1960;25:85–109.

    17. Rowley JD. Letter: A new consistent chromosomal abnormality in chronic myelogenous leukaemia identified by quinacrine fluorescence and Giemsa staining. Nature. 1973;243:290–293.

    18. Stehelin D, Varmus HE, Bishop JM, Vogt PK. DNA related to the transforming gene(s) of avian sarcoma viruses is present in normal avian DNA. Nature. 1976;260:170–173.

    19. Linzer DI, Maltzman W, Levine AJ. The SV40 A gene product is required for the production of a 54,000 MW cellular tumor antigen. Virology. 1979;98:308–318.

    20. Lane DP, Crawford LV. T antigen is bound to a host protein in SV40-transformed cells. Nature. 1979;278:261–263.

    21. Cavenee WK, Dryja TP, Phillips RA, et al. Expression of recessive alleles by chromosomal mechanisms in retinoblastoma. Nature. 1983;305:779–784.

    22. Friend SH, Bernards R, Rogelj S, et al. A human DNA segment with properties of the gene that predisposes to retinoblastoma and osteosarcoma. Nature. 1986;323:643–646.

    23. Wolf D, Rotter V. Inactivation of p53 gene expression by an insertion of Moloney murine leukemia virus-like DNA sequences. Mol Cell Biol. 1984;4:1402–1410.

    24. Rotter V, Wolf D, Pravtcheva D, Ruddle FH. Chromosomal assignment of the murine gene encoding the transformation-related protein p53. Mol Cell Biol. 1984;4:383–385.

    25. Groden J, Thliveris A, Samowitz W, et al. Identification and characterization of the familial adenomatous polyposis coli gene. Cell. 1991;66:589–600.

    26. Kinzler KW, Nilbert MC, Su LK, et al. Identification of FAP locus genes from chromosome 5q21. Science. 1991;253:661–665.

    27. Nishisho I, Nakamura Y, Miyoshi Y, et al. Mutations of chromosome 5q21 genes in FAP and colorectal cancer patients. Science. 1991;253:665–669.

    28. Knudson Jr. AG. Mutation and cancer: statistical study of retinoblastoma. Proc Natl Acad Sci USA. 1971;68:820–823.

    29. Feinberg AP, Vogelstein B. Hypomethylation distinguishes genes of some human cancers from their normal counterparts. Nature. 1983;301:89–92.

    30. Baylin SB, Hoppener JW, de Bustros A, Steenbergh PH, Lips CJ, Nelkin BD. DNA methylation patterns of the calcitonin gene in human lung cancers and lymphomas. Cancer Res. 1986;46:2917–2922.

    31. Greger V, Passarge E, Hopping W, Messmer E, Horsthemke B. Epigenetic changes may contribute to the formation and spontaneous regression of retinoblastoma. Hum Genet. 1989;83:155–158.

    32. Hall JM, Lee MK, Newman B, et al. Linkage of early-onset familial breast cancer to chromosome 17q21. Science. 1990;250:1684–1689.

    33. Leach FS, Nicolaides NC, Papadopoulos N, et al. Mutations of a mutS homolog in hereditary nonpolyposis colorectal cancer. Cell. 1993;75:1215–1225.

    34. Fishel R, Lescoe MK, Rao MR, et al. The human mutator gene homolog MSH2 and its association with hereditary nonpolyposis colon cancer. Cell. 1993;75:1027–1038.

    35. Nakamura TM, Morin GB, Chapman KB, et al. Telomerase catalytic subunit homologs from fission yeast and human. Science. 1997;277:955–959.

    36. Meyerson M, Counter CM, Eaton EN, et al. hEST2, the putative human telomerase catalytic subunit gene, is up-regulated in tumor cells and during immortalization. Cell. 1997;90:785–795.

    37. Bodnar AG, Ouellette M, Frolkis M, et al. Extension of life-span by introduction of telomerase into normal human cells. Science. 1998;279:349–352.

    38. Vaziri H, Benchimol S. Reconstitution of telomerase activity in normal human cells leads to elongation of telomeres and extended replicative life span. Curr Biol. 1998;8:279–282.

    39. Al-Hajj M, Wicha MS, Benito-Hernandez A, Morrison SJ, Clarke MF. Prospective identification of tumorigenic breast cancer cells. Proc Natl Acad Sci USA. 2003;100:3983–3988.

    40. Bonnet D, Dick JE. Human acute myeloid leukemia is organized as a hierarchy that originates from a primitive hematopoietic cell. Nat Med. 1997;3:730–737.

    41. Bhatia M, Wang JC, Kapp U, Bonnet D, Dick JE. Purification of primitive human hematopoietic cells capable of repopulating immune-deficient mice. Proc Natl Acad Sci USA. 1997;94:5320–5325.

    42. Yadav SP. The wholeness in suffix -omics, -omes, and the word om. J Biomol Tech. 2007;18:277.

    43. Hutchison III CA. DNA sequencing: bench to bedside and beyond. Nucleic Acids Res. 2007;35:6227–6237.

    44. Pettersson E, Lundeberg J, Ahmadian A. Generations of sequencing technologies. Genomics. 2009;93:105–111.

    45. Schena M, Shalon D, Davis RW, Brown PO. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science. 1995;270:467–470.

    46. Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009;10:57–63.

    47. Pollack JR, Perou CM, Alizadeh AA, et al. Genome-wide analysis of DNA copy-number changes using cDNA microarrays. Nat Genet. 1999;23:41–46.

    48. Pinkel D, Segraves R, Sudar D, et al. High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat Genet. 1998;20:207–211.

    49. Strausberg RL. The Cancer Genome Anatomy Project: new resources for reading the molecular signatures of cancer. J Pathol. 2001;195:31–40.

    50. Timeline of genomics. Genomics Proteomics Bioinformatics. 2004;2:132–142.

    51. Hudson TJ, Anderson W, Artez A, et al. International network of cancer genome projects. Nature. 2010;464:993–998.

    52. Garraway LA, Lander ES. Lessons from the cancer genome. Cell. 2013;153:17–37.

    53. Stephens PJ, Greenman CD, Fu B, et al. Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell. 2011;144:27–40.

    54. Berger MF, Lawrence MS, Demichelis F, et al. The genomic complexity of primary human prostate cancer. Nature. 2011;470:214–220.

    55. Baca SC, Prandi D, Lawrence MS, et al. Punctuated evolution of prostate cancer genomes. Cell. 2013;153:666–677.

    56. Taylor BJ, Nik-Zainal S, Wu YL, et al. DNA deaminases induce break-associated mutation showers with implication of APOBEC3B and 3A in breast cancer kataegis. Elife. 2013;2:e00534.

    57. Nik-Zainal S, Alexandrov LB, Wedge DC, et al. Mutational processes molding the genomes of 21 breast cancers. Cell. 2012;149:979–993.

    58. Lada AG, Dhar A, Boissy RJ, et al. AID/APOBEC cytosine deaminase induces genome-wide kataegis. Biol Direct. 2012;7:47 [discussion].

    59. Gerlinger M, Rowan AJ, Horswell S, et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N Engl J Med. 2012;366:883–892.

    60. Ding L, Ley TJ, Larson DE, et al. Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature. 2012;481:506–510.

    61. Shah SP, Morin RD, Khattra J, et al. Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution. Nature. 2009;461:809–813.

    62. Goldman JM, Melo JV. Chronic myeloid leukemia – advances in biology and new approaches to treatment. N Engl J Med. 2003;349:1451–1464.

    63. Strebhardt K, Ullrich A. Paul Ehrlich’s magic bullet concept: 100 years of progress. Nat Rev Cancer. 2008;8:473–480.

    64. Dienstmann R, Markman B, Tabernero J. Application of monoclonal antibodies as cancer therapy in solid tumors. Curr Clin Pharmacol. 2012;7:137–145.

    65. Kohler G, Milstein C. Continuous cultures of fused cells secreting antibody of predefined specificity. Nature. 1975;256:495–497.

    66. Lugosi L. Theoretical and methodological aspects of BCG vaccine from the discovery of Calmette and Guérin to molecular biology. A review. Tuber Lung Dis. 1992;73:252–261.

    67. Steinman RM, Cohn ZA. Identification of a novel cell type in peripheral lymphoid organs of mice. II. Functional properties in vitro. J Exp Med. 1974;139:380–397.

    68. Steinman RM, Cohn ZA. Identification of a novel cell type in peripheral lymphoid organs of mice. I. Morphology, quantitation, tissue distribution. J Exp Med. 1973;137:1142–1162.

    69. Zigler M, Shir A, Levitzki A. Targeted cancer immunotherapy. Curr Opin Pharmacol. 2013.

    70. Horwitz RI, Cullen MR, Abell J, Christian JB. Medicine (De)personalized medicine. Science. 2013;339:1155–1156.

    71. Sandmann T, Boutros M. Screens, maps & networks: from genome sequences to personalized medicine. Curr Opin Genet Dev. 2012;22:36–44.

    72. Dammann M, Weber F. Personalized medicine: caught between hope, hype and the real world. Clinics (Sao Paulo). 2012;67(Suppl. 1):91–97.

    73. Welch BM, Kawamoto K. Clinical decision support for genetically guided personalized medicine: a systematic review. J Am Med Inform Assoc. 2013;20:388–400.

    Part 2

    Genomics Technologies, Concepts and Resources

    Outline

    Chapter 2 Second-Generation Sequencing for Cancer Genome Analysis

    Chapter 3 Cancer Transcriptome Sequencing and Analysis

    Chapter 4 The Significance of Transcriptome Sequencing in Personalized Cancer Medicine

    Chapter 5 Tissue Microarrays in Studying Gynecological Cancers

    Chapter 6 Cancer Pharmacogenomics in Children

    Chapter 7 Biomarker Discovery and Development through Genomics

    Chapter 8 Preclinical Animal Models for Cancer Genomics

    Chapter 9 Bioinformatics for Cancer Genomics

    Chapter 10 Genomic Resource Projects

    Chapter 2

    Second-Generation Sequencing for Cancer Genome Analysis

    Hye-Jung E. Chun¹,², Jaswinder Khattra²,³,⁴, Martin Krzywinski¹,², Samuel A. Aparicio²,³,⁴ and Marco A. Marra¹,²,⁵, ¹Canada’s Michael Smith Genome Sciences Centre, Vancouver, BC, Canada, ²British Columbia Cancer Agency, Vancouver, BC, Canada, ³Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada, ⁴Molecular Oncology, British Columbia Cancer Research Centre, Vancouver, BC, Canada, ⁵Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada

    Cancer results from accumulated mutations in the genome. Sequencing is an accurate method to detect mutations. Second-generation sequencing technology, commonly referred to as next-generation sequencing technology, enables rapid, efficient and affordable DNA sequencing, and is transforming the scale and scope of cancer research. The technology is sufficiently flexible and affordable to allow sequencing of many cancer genomes, and thus facilitates sequencing of samples both from large patient cohorts and from individual cancer patients during disease progression. The high depths of redundant sequence coverage that can be obtained using some second-generation sequencing technologies, along with sequencing reads amplified from single DNA molecules, facilitate detection of subclones of cells in tumors.

    Large-scale genome sequencing of hundreds or even thousands of cancer samples is being conducted by several groups that aim to identify and characterize cancer driver mutations. Goals of such work, previously infeasible with Sanger sequencing instruments, are to use this information to improve cancer prognosis, diagnosis and therapeutic decision-making. The speed of data analysis is rate limiting, and investigators are struggling to accommodate and interpret the data deluge produced by second-generation technologies. In this chapter, we discuss cancer properties that are revealed by sequencing and the implication of such properties in experimental design and data interpretation. We describe past, current and upcoming sequencing technologies and the application of second-generation sequencing technologies in cancer genomics. Finally, we discuss the impact of second-generation sequencing technology in shaping personalized medicine.

    Keywords

    cancer; cancer genomics; tumor heterogeneity; next-generation sequencing; second-generation sequencing; third-generation sequencing; mutation discovery; whole genome sequencing; single molecule sequencing; single cell sequencing; personalized medicine

    Contents

    Introduction

    Cancer Characterization Using Sequencing Approaches

    Cancer Is a Genetic Disease

    Cancer Properties Are Amenable to Second-Generation Sequence Analysis

    Impact of Tumor Heterogeneity on Clinical Diagnosis and Treatments

    Advances in Sequencing Technologies

    First- and Second-Generation Sequencing Technologies

    Progress in Cancer Genomics Research Using Sequencing Approaches

    Applications of Second-Generation Sequencing Technologies

    Discoveries Using Second-Generation Sequencing Technologies

    Novel Properties of Cancer

    Novel Biomarkers and Therapeutic Targets

    Development of Personalized Medicine

    Future of Cancer Genome Sequencing

    Third-Generation Sequencing Technologies

    Clinical Application of Second-Generation Sequencing

    Single Cell Genotyping of Bulk Tumors

    Conclusion

    Glossary

    Acronyms and Abbreviations

    References

    Key Concepts

    • Cancer is a disease driven by mutations, which can be comprehensively profiled using second-generation sequencing technologies. Understanding the heterogeneous genetics of tumors and their subclonal population structure is important for data analysis and interpretation

    • First-generation Sanger sequencing technology, which relies on chain termination and on in vivo amplification of DNA templates, enabled discoveries of well-known oncogenes and tumor suppressor genes

    • Second-generation sequencing technologies use in vitro amplification of DNA templates and innovative cycle sequencing chemistries and methods, which drastically increased sequencing capacity. These technologies escalated the scale and scope of cancer genomics research to an unprecedented level

    • Third-generation sequencing technologies bypass in vitro clonal amplification of the DNA templates and may be used to perform single molecule, single strand and single cell sequencing

    • Unbiased, genome-wide profiling of cancer genomes and transcriptomes enabled discoveries of mutations in genes that were previously not implicated in tumorigenesis or those with less well-understood roles in cancer. Novel oncogenic phenomena such as frequent chromosomal rearrangements in epithelial cancer types and spatial clusters of substitution mutations were observed through analysis of sequencing data. Detection of nucleic acids from microbes in tumor sequencing data also highlighted the linkage between infection and oncogenesis

    • Sequencing technologies are critical in enhancing cancer genome research which, in turn, offers much promise for molecular evidence-based cancer diagnosis, patient stratification and cancer therapeutics

    Introduction

    In 1866, Gregor Mendel reported on his 8-year-long plant hybridization experiment that studied the behavior of seven distinct observable traits of garden peas [1]. With this historic work, Mendel established the foundation of genetics by demonstrating that the genetic information of an organism is inherited by its offspring according to particular rules of heredity, in discrete and separable entities (later termed genes) that are responsible for the observable traits of an organism. Major milestones in genetics followed Mendel’s discovery, including the identification of the biochemical identity of genes [2], the elucidation of the DNA structure [3] and the discovery of the genetic code [4]. More recently, it has become feasible to determine the complete genetic information of an organism, referred to as the genome, which has spawned the field of genomics.

    The first steps towards the current era of genomics began in the late 1970s, when Frederick Sanger pioneered the development of a DNA sequencing technology that determined the linear nucleotide sequence of DNA in a genome. Propelled by automated Sanger sequencing, much effort ensued to decipher genetic information at a genome-wide scale, including the monumental achievement of the reference human genome sequencing at the turn of the 21st century [5,6]. The reference genome sequence served as the common comparator for aligning re-sequenced individual human genomes and annotating single nucleotide and structural variants in DNA sequences to infer their phenotypic consequences. Insights gained by sequencing genomes have greatly impacted all fields of biomedical research and, in particular, cancer biology.

    Cancer predisposition and progression are associated with DNA mutations. Hence, mutation discovery through sequencing can reveal cancer properties, including the molecular and genetic basis of cancer. Since the discovery of the first cancer-specific mutation in HRAS in the T24 bladder cancer cell line was made using a sequencing approach [7], sequencing technologies have been instrumental in identifying oncogenes that promote cancer development and tumor suppressor genes whose inactivation leads to oncogenesis. However, the relatively low sequencing efficiency of slab gel-based and capillary-based sequencing approaches restricted the scope of cancer characterization studies to small numbers of samples and to sub-genome scale analyses by evaluating expressed sequence tags (ESTs), serial analysis of gene expression (SAGE) and products of single or multiplex polymerase chain reaction (PCR) and the like.

    The arrival of second-generation sequencing technologies has elevated the scale and scope of genomics research to unprecedented and ever-increasing levels. The adoption of these technologies has led to a fundamental shift away from serial Sanger sequencing towards massively parallelized sequencing-by-synthesis approaches (reviewed in [8]). These approaches are relatively scalable, cheap and accessible, and are amenable to automation. While the new technologies have enhanced all types of sequencing analysis approaches, it is the access to entire genome sequences – and many of them – that has become particularly important in comprehensive mutation discovery in cancers. Multiple cancer genomes sequenced at high coverage revealed molecular heterogeneity even within the same bulk tumor, and provided evidence of subclonal populations whose structure changed as disease progressed [9]. Other studies showed that environmental influences, such as exposure to ultraviolet rays, can shape the mutation spectrum, and therefore the molecular heterogeneity in a cancer genome [10]. With continued advancement of sequencing and microfluidics technologies, cancer characterization may be achieved at the single cell level [11]. The impact of cancer genome sequencing projects is already being felt, and has facilitated insights into cancer development, progression and treatment resistance that are fundamentally changing the way researchers think about cancer biology.

    Cancer Characterization Using Sequencing Approaches

    Cancer Is a Genetic Disease

    More than a century ago, Theodor Boveri made prescient observations of abnormal chromosomes in malignant cells during mitosis [12]. The discovery of proto-oncogenes in the late 1970s provided evidence that cancers were caused by genetic aberrations that transformed normal cells to malignant ones [13]. Knudson’s two hit hypothesis introduced the concept of mutation accumulation in cancer from statistical analysis of 48 unilateral (one eye affected) and bilateral (both eyes affected) retinoblastomas whose frequency distribution could be explained by a series of two hits (i.e. mutations) that occurred at approximately equal rates: an inherited (i.e. germline) mutation followed by a somatic one in familial cases, or two somatic mutations in sporadic cases [14]. The two hit hypothesis provided a framework for understanding how a germline mutation could increase predisposition to cancer requiring fewer somatic mutations to reach malignancy. Loeb and colleagues [15] advanced the concept of a mutator phenotype that accelerated the accumulation of nucleotide variants that could serve as substrates for Darwinian selection, favoring the growth of certain mutated cells above the growth of others. The concept of accumulated mutations causing malignant progression was subsequently advanced using colorectal cancer as a model system [16]. Furthermore, as predicted by Boveri, the discovery of the Philadelphia chromosome in chronic myeloid leukemia (CML) showed the oncogenic effect of a specific chromosomal rearrangement [17].

    Presently, cancers are understood to develop from the accumulation of multiple genetic and epigenetic alterations that are predominantly of somatic origin (i.e. present in the cancer cells, but not in normal cells from the same person) and are therefore not heritable. Somatic alterations in cancers include substitution mutations, copy number alterations, insertions and deletions (collectively referred to as indels), structural rearrangements such as gene fusions, altered gene expression and the presence of nucleic acids of viral and other microbial origin, all of which can be detected using sequencing approaches. The multigenic nature of cancers lends itself well to comprehensive interrogation at single nucleotide resolution, enabled by the use of second-generation sequencing technologies. These technologies allow putative pathologic lesions in the DNA and RNA of an individual’s tumor to be mapped comprehensively and economically, offering an opportunity to decipher the functional basis of malignant phenotypes and aiding our understanding of biomolecular networks in cancer.

    Cancer Properties Are Amenable to Second-Generation Sequence Analysis

    Cancers are heterogeneous entities. At the sample level, a bulk tumor can contain an admixture of cancer cells, non-cancer normal cells, necrotic cells, and the immune and stromal cells that make up the tumor microenvironment. Cancer stem cells, which possess self-renewal capacity and may contribute to therapeutic resistance and relapsed disease, may also be present [18]. At the molecular level, cancer genomes have nucleotide sequences and/or chromosomal structures that differ not only from those of the corresponding normal tissue, but also within the same tumor mass, owing to multiple subclonal populations of cancer cells (Figure 2.1).

    Figure 2.1 Tumor heterogeneity at multiple levels. A tumor biopsy sample from a bulk tumor mass typically contains subpopulations of various cell types: cancer cells, non-cancer normal cells, and stromal cells including fibroblasts, immune cells and endothelial cells from blood vessels. Within a single biopsy sample, tumor subclonal populations with different mutational spectra have differential selective advantage as they arise over time.

    Cancers are also clonal in origin [19]. The term clone generally refers to a mass of cells with a common cell of origin. To define the clonal structure of a cell population, a stable biomolecular feature is required. One such feature derived from second-generation sequencing is the frequency of a DNA sequence variant, measured by its relative abundance in sequence reads. This mutant allele frequency can be a function of tumor cellularity and of the size of the subclonal population harboring that variant. Another molecular feature routinely identified using second-generation sequencing is chromosomal rearrangement. In a study by Campbell and colleagues [20], structural rearrangements in pancreatic cancer genomes were analyzed to infer clonal structures and phylogenetic relationships among metastases. Heritable genetic and epigenetic variations in cells in a bulk tumor confer unequal traits, such as cellular growth, that can be either positively or negatively selected during cancer evolution (reviewed in [21]). This selection pressure can result in the growth or purging of subclonal populations as a consequence of stresses such as the limited availability of nutrients or oxygen to the tumor, the immune response or anticancer drug treatment. The ability to track changes in the mutational spectrum of tumors as they respond to such stresses can provide insight into the mechanistic role of mutations in shaping tumor progression.
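
    The dependence of mutant allele frequency on tumor cellularity and subclone size can be illustrated with a minimal sketch. The function name, its parameters and the simple mixing model below are illustrative assumptions rather than a method taken from the cited studies: the model assumes that every tumor cell has the same copy number at the locus and that normal cells are diploid.

```python
def expected_mutant_allele_frequency(cellularity, subclone_fraction,
                                     mutant_copies=1, tumor_copy_number=2,
                                     normal_copy_number=2):
    """Expected fraction of reads that carry a somatic variant.

    Hypothetical, simplified model (all names are illustrative):
      cellularity        fraction of cells in the sample that are cancer cells
      subclone_fraction  fraction of cancer cells that harbor the variant
      mutant_copies      copies of the variant allele per variant-bearing cell
      tumor_copy_number  total copies of the locus per cancer cell
      normal_copy_number total copies of the locus per normal cell
    """
    if not 0.0 <= cellularity <= 1.0:
        raise ValueError("cellularity must be between 0 and 1")
    if not 0.0 <= subclone_fraction <= 1.0:
        raise ValueError("subclone_fraction must be between 0 and 1")
    # Alleles carrying the variant, per average cell in the sample.
    mutant_alleles = cellularity * subclone_fraction * mutant_copies
    # All alleles at the locus, per average cell in the sample.
    total_alleles = (cellularity * tumor_copy_number
                     + (1.0 - cellularity) * normal_copy_number)
    return mutant_alleles / total_alleles


# A heterozygous variant present in every cell of a pure tumor sample.
clonal = expected_mutant_allele_frequency(1.0, 1.0)
# The same variant in half of the cancer cells of a sample with 60% cellularity.
subclonal = expected_mutant_allele_frequency(0.6, 0.5)
```

    Under these assumptions, lower cellularity and a smaller subclone both reduce the expected frequency, which is one reason high redundant coverage is needed to detect subclonal variants.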

    Second-generation sequencing technologies have been instrumental in profiling somatic mutations and uncovering molecular heterogeneity in cancer genomes at single nucleotide resolution. For example, Shah and colleagues [22] reported a continuous distribution of somatic non-synonymous mutations across 104 breast cancer samples with the triple-negative subtype, irrespective of tumor cellularity or copy number variation. Other studies revealed yet another kind of heterogeneity, that of intratumoral spatial heterogeneity, and showed distinct mutational spectra in different regions of the same tumor tissue [23]. These studies suggested that the current paradigm of using single tumor biopsy samples for cancer genome characterization is flawed, allowing only a partial representation of the true mutational landscape in cancers.

    Impact of Tumor Heterogeneity on Clinical Diagnosis and Treatments

    Traditional cancer classification has relied on gross assessments based on anatomic and morphological features. Although semi-quantitative nuclear and histological grading allows some quantitative measurements of tumor heterogeneity, this clinical standard is often limited to assigning the tumor to one grade even if co-occurring pockets in the same tissue section exhibit another grade [24]. Conversely, tumors from different patients with apparently identical histopathological features can show differential progression and therapeutic response [25].

    Tumor heterogeneity has become an important clinical feature for diagnosis and therapeutic decision making. Several studies demonstrated the effect of histological and molecular heterogeneity on cancer progression and on different sensitivity levels to anticancer treatments [26]. In the case of breast cancers, heterogeneity of hormone receptor status predicted differential response to the estrogen receptor antagonist, Tamoxifen™, which became one of the first evidence-based molecular targeted therapies [27].

    Tumor heterogeneity is also a confounding factor in the identification of molecular profiles based on single tumor biopsy samples. Single biopsy samples are subject to random sampling error, which can reduce the accuracy of prognostic gene expression signatures, as shown in renal carcinoma [23], or can obscure important differences between primary and metastatic tumors [28]. Assessing molecular signatures from subclonal populations, and charting the three-dimensional architecture and spatial relationship among them, can be informative in addition to taking multiple biopsy samples. An overlay of clonal genotype and functional imaging assays can also enhance our understanding of genotype variants that confer malignant potential. Sampling error is even more of an issue in the analysis of the clinical relevance of micrometastases and rare circulating tumor cells, requiring much more investigation to determine the utility of such analyses for patient treatment and follow-up. Finally, cancers of an unknown primary origin may also benefit from genome analysis by allowing the identification of the most likely cell of origin based on gene expression, miRNA or epigenetic profiling [29–31].

    Advances in Sequencing Technologies

    The DNA sequencing technology developed in the late 1970s revolutionized the way in which scientists could understand biology by sequencing genes. Commonly referred to as Sanger or dideoxy sequencing, the chain-termination method developed by Sanger’s group [32] became the dominant sequencing platform for the next three decades.

    The second wave of sequencing technologies came in 2005 with the arrival of so-called next-generation sequencing approaches. These technologies generally shared an emphasis on massively parallel capacity with dramatically increased sensitivity and cost-effectiveness, allowing re-sequencing of individual genomes, and thus comprehensively characterizing whole genomes at an unprecedented scale. A rapid development of these next-generation DNA sequencing technologies resulted in various sequencing platforms that have different DNA template preparation and amplification, sequencing and detection strategies (reviewed in [8]). In this chapter, we use the term second-generation sequencing technologies for the commercially available next-generation sequencing technologies that rely on clonal amplification of single DNA molecules. The upcoming sequencing technologies that bypass DNA template amplification, thus enabling single molecule sequencing, are referred to as third-generation and are discussed later in this chapter.

    First- and Second-Generation Sequencing Technologies

    First-generation sequencing technology included the Sanger sequencing method described in 1977 [32]. The Sanger method had the advantage of requiring reduced quantities of toxic chemicals and radioactive isotopes, which made it the preferred sequencing platform among other methods. The method is based on DNA polymerase-dependent synthesis of a strand complementary to a DNA template using natural 2′-deoxynucleotides (dNTPs), and on termination of that synthesis by incorporation of 2′,3′-dideoxynucleotides (ddNTPs). The competitive incorporation of dNTPs or ddNTPs into a growing oligonucleotide chain results in stochastic termination and generates DNA strands of varying lengths. These strands are then separated according to their lengths using polyacrylamide gel electrophoresis, and the chain-terminating ddNTP moiety is revealed. Many technological innovations followed to increase sequencing throughput and efficiency, including the development of capillary-based polymer gel electrophoresis, which allowed faster DNA fragment separation at higher resolution than slab gel-based electrophoresis [33]. Automated high-throughput Sanger sequencing became the gold standard for accurate DNA sequencing [34]. Also, the long read length (approximately 800 bp) [8] is useful for de novo sequencing and assemblies. However, the method requires a laborious and lengthy process of in vivo amplification of DNA templates in bacterial hosts, which results in some loss of DNA fragments during the cloning process [35]. The expense associated with Sanger sequencing also limited its application in large population-based DNA sequencing experiments, such as those now typically done in cancer genomics studies. Furthermore, since the individual nucleotide identity is revealed by a fluorescence trace peak that is generated from multiple DNA molecules, the technology is limited in its ability to detect rare variant alleles or multiple variant alleles in a heterogeneous population such as the communities of cells comprising tumors (reviewed in [34]).
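
    The chain-termination logic described above can be sketched in a few lines. This is a deliberately idealized illustration, not a model of a real instrument: the function names are hypothetical, and the sketch assumes that exactly one terminated product exists for every position of the newly synthesized strand, so that ordering the products by length recovers the sequence.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def synthesized_strand(template):
    """Strand made by the polymerase: the reverse complement of the template."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


def chain_terminated_fragments(template):
    """Idealized set of terminated products as (length, terminal ddNTP) pairs.

    In a real reaction termination is stochastic; here we assume (hypothetically)
    that every possible termination point is represented exactly once.
    """
    new_strand = synthesized_strand(template)
    return [(i + 1, base) for i, base in enumerate(new_strand)]


def read_sequence(fragments):
    """Order fragments by length, as electrophoresis does, and read the terminal bases."""
    return "".join(base for _, base in sorted(fragments))


fragments = chain_terminated_fragments("GATTACA")
random.shuffle(fragments)            # products are mixed together in the reaction tube
sequence = read_sequence(fragments)  # size separation restores the base order
```

    The shuffle step is there only to make the point that the order of bases is recovered from fragment length alone, whatever order the products were generated in.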

    In 2005, Roche/454 Life Sciences developed the first commercially available second-generation sequencing technology [36]. A major innovation of 454 sequencing was the application of emulsion PCR for in vitro DNA template amplification and the development of pyrosequencing in picoliter-scale reactors that achieved parallelized sequencing capacity [37]. For emulsion PCR, sheared genomic DNA fragments are ligated to a common PCR adapter, and individual single-stranded fragments are then captured using streptavidin beads. Each bead with its bound fragment is then incorporated into an oil emulsion in which individual droplets, each theoretically containing one DNA fragment and all necessary PCR reagents, are suspended in oil. Thus, the emulsion droplet serves as an individual PCR reactor and produces millions of clonal copies of the template adhering to the streptavidin bead. After amplification, the emulsion droplets are broken open, and each streptavidin bead with amplified DNA sequences is sequenced in a picoliter-sized fiberoptic well using pyrosequencing. Pyrosequencing is a sequencing-by-synthesis approach that relies on the pyrophosphate (PPi) released after incorporation of a complementary dNTP and on the light emitted by the resulting ATP sulfurylase and luciferase reactions [38]. The light emission and the pattern of specific dNTP incorporation events detected at each sequencing cycle determine the template sequence. Since a single type of dNTP (corresponding to either A, G, T or C) is introduced at a time, the sequencing is asynchronous, resulting in varying lengths of reads depending on the sequence composition of the templates and the order of dNTP introduced at each sequencing cycle. A major drawback of the 454 sequencing method is the relatively inaccurate detection of homopolymers (i.e. a series of consecutive bases of the same nucleotide, e.g. AA or AAAAA) due to incorrect estimation of the number of PPi released using signal intensities. Thus, the majority of sequencing errors from the 454 technology are insertions and deletions. On the other hand, a key advantage of 454 sequencing is its long read lengths of up to 1000 bp, which are particularly useful for de novo sequencing, such as in ancient DNA sequencing and metagenomics (reviewed in [39]).
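
    The one-nucleotide-at-a-time flow scheme, and the reason homopolymers are difficult, can be made concrete with a small noise-free sketch. The function names and the repeating flow order used here are assumptions chosen for illustration; a real instrument reports noisy light intensities rather than exact integer counts.

```python
def ideal_flowgram(read, flow_order="TACG", max_flows=400):
    """Noise-free signal for each nucleotide flow.

    `read` is the strand being synthesized. In each flow a single dNTP type is
    offered; the ideal signal equals the number of bases incorporated, i.e. the
    length of the homopolymer run at the current position (0 if no match).
    The flow order and the flow limit are illustrative assumptions.
    """
    signals = []
    position = 0
    flow = 0
    while position < len(read) and flow < max_flows:
        nucleotide = flow_order[flow % len(flow_order)]
        run_length = 0
        while position < len(read) and read[position] == nucleotide:
            run_length += 1
            position += 1
        signals.append((nucleotide, run_length))
        flow += 1
    return signals


def call_bases(signals):
    """Convert (nucleotide, count) signals back into a base sequence."""
    return "".join(nucleotide * count for nucleotide, count in signals)


signals = ideal_flowgram("TTAGGGC")
called = call_bases(signals)
```

    In this idealized form the run of three G bases produces a single signal of height 3. Misjudging that height as 2 or 4 would yield a deletion or an insertion, which is consistent with the indel-dominated error profile described above.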

    The Illumina sequencing method amplifies DNA templates in vitro without emulsion PCR: templates are tethered onto a solid glass surface known as a flow cell and amplified using a process called bridge PCR [40]. The process is facilitated by the design of the flow cells, which are densely populated by forward and reverse PCR primer adapters. Upon introduction of an adapter-ligated DNA template molecule onto the flow cell surface, the molecule arches over and hybridizes to the complementary adapter, creating a bridge that serves as the substrate for amplification. Millions of templates are amplified over iterative cycles, each generating over a thousand clonal copies from a single DNA molecule. Like the 454 system, the Illumina method also uses a sequencing-by-synthesis approach, but a key difference is the inclusion of modified nucleotides with reversible terminators. This innovation allows the addition of only one base per sequencing cycle, thus enhancing the accuracy of sequencing through homopolymer runs. At each sequencing cycle, a mixture of four nucleotides labeled with chemically cleavable fluorescent dyes is added. The reversible terminators have a chemically cleavable moiety at the 3′ OH position, which allows for controlled incorporation of one nucleotide per sequencing cycle [41]. The identity of the incorporated nucleotide is revealed by fluorescence. While this sequencing method results in accurate sequencing of homopolymer repeats, the reliable read length is relatively short due to both fluorescence decay and dephasing over longer cycles, the latter resulting from incomplete cleavage of fluorescence tags or reversible terminating moieties within template clusters. This tends to yield substitution errors, and a high proportion of such errors occur in the base after guanine [42]. Short read lengths can limit the ability to align (or map) the sequence reads uniquely to the reference genome, especially in repeat regions of the genome. Obtaining such short reads from each end of the DNA fragments in paired-end sequencing, as opposed to sequencing only one end of template fragments, significantly improves the accuracy of read mapping and thus the assessment of redundant coverage of the genome [43]. Paired-end sequencing is particularly useful for identifying genomic rearrangement events, and does not depend on high sequence coverage to detect such events provided that they are properties of a dominant clone within the tumor sample.
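
    One way to see why paired-end reads help with rearrangement detection is to check whether the two mates of a pair map as expected for an intact fragment. The sketch below is a hypothetical simplification: the record layout, the category names and the insert-size thresholds are all assumptions made for illustration, and each mate is represented only by the outer coordinate of its alignment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    """Mapped positions of the two mates of one fragment (illustrative layout)."""
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair, min_span=200, max_span=600):
    """Label a pair as concordant or as a candidate rearrangement signal.

    The span thresholds are hypothetical; in practice they would be derived
    from the observed fragment-size distribution of the library.
    """
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"            # candidate translocation
    # Order the mates by genomic coordinate.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if (left_strand, right_strand) != ("+", "-"):
        return "unexpected_orientation"      # candidate inversion or duplication
    span = right_pos - left_pos
    if span > max_span:
        return "span_too_large"              # candidate deletion in the sample
    if span < min_span:
        return "span_too_small"              # candidate insertion in the sample
    return "concordant"


labels = [
    classify_pair(ReadPair("chr1", 1000, "+", "chr1", 1350, "-")),
    classify_pair(ReadPair("chr1", 1000, "+", "chr1", 9000, "-")),
    classify_pair(ReadPair("chr1", 1000, "+", "chr8", 500, "-")),
    classify_pair(ReadPair("chr1", 1000, "+", "chr1", 1350, "+")),
]
```

    A single discordant pair is weak evidence; a practical analysis would look for clusters of pairs supporting the same event.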

    Another second-generation sequencing technology is the supported oligonucleotide ligation and detection (SOLiD™) system [44]. Similar to the 454 sequencing method, single-stranded DNA templates are amplified in vitro using emulsion PCR. What distinguishes the SOLiD system is its sequencing-by-ligation approach, in which DNA ligase is used instead of DNA polymerase to add short oligonucleotides [45]. After the emulsion PCR stage, polonies (amplified polymerase colonies) are transferred to a glass slide for sequencing. At each sequencing cycle, fluorescently labeled octamers of degenerate bases are ligated and then chemically cleaved between the fifth and sixth base from the 3′ end of the nucleotide sequence. The cycle of octamer ligation, fluorescence detection and chemical cleavage of the octamer to remove fluorescent tags is repeated over multiple iterations. In each subsequent iteration, ligation of octamers is offset by one base. This strategy is designed to increase base calling accuracy through the double interrogation of each base of the templates. The sequence identity is deduced using a di-base color encoding system: because each base is interrogated twice, it is inferred from the resulting colors, each of which is associated with the nucleotides at specific positions in the octamer.
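
    The di-base color encoding can be illustrated with a small sketch. The text above does not give the color table, so the mapping used here is an assumption: the commonly described scheme in which each of four colors (numbered 0 to 3) represents a pair of adjacent bases and equals the XOR of two-bit base codes. The function names are hypothetical and chosen only for illustration.

```python
# Two-bit base codes. Under the assumed scheme, the color for a pair of
# adjacent bases is the XOR of their codes, giving four colors (0-3).
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def encode_colors(seq):
    """Return one color per pair of adjacent bases in seq."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]


def decode_colors(first_base, colors):
    """Rebuild the bases from a known first base and a list of colors."""
    bases = [first_base]
    for color in colors:
        previous = BASE_TO_BITS[bases[-1]]
        bases.append(BITS_TO_BASE[previous ^ color])
    return "".join(bases)


def differing_positions(a, b):
    """Indices at which two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


reference = "ATGGCA"
variant = "ATGACA"  # single-base change at index 3 (G to A)

ref_colors = encode_colors(reference)
var_colors = encode_colors(variant)

# A true single-base change alters the two adjacent colors that cover it.
snv_color_changes = differing_positions(ref_colors, var_colors)

# A single miscalled color, by contrast, changes every base decoded after it.
miscalled = list(ref_colors)
miscalled[2] = 2
decoded_with_error = decode_colors("A", miscalled)

print(ref_colors, var_colors, snv_color_changes, decoded_with_error)
```

    Under this assumed scheme, a genuine substitution shows up as two adjacent color changes, whereas an isolated color change is more likely a measurement error, which is one way to read the accuracy benefit of interrogating each base twice.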

    Ion Torrent sequencing technology also utilizes the emulsion PCR amplification method and the sequencing-by-synthesis approach. However, unlike other second-generation sequencing technologies, it does not depend on the use of light to detect signals [46]. Instead, it uses a semiconductor chip based on ion-sensitive field effect transistors (ISFETs) to measure the change in pH as hydrogen ions are released when DNA polymerase incorporates nucleotides during DNA synthesis [47]. pH changes are converted into voltage changes that are detected for sequence readout. The magnitude of the voltage change is proportional to the number of bases incorporated. Parallelized sequencing can be achieved by having multiple ISFETs on a chip. Like 454 sequencing, Ion Torrent generates reads with frequent insertion and deletion sequencing errors (approximately 1.5 indels per 100 bp), the majority of which are associated with homopolymer repeats [48].
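
    A toy base caller shows why a signal proportional to the number of incorporated bases makes homopolymers error-prone. This is a sketch under stated assumptions, not the instrument's algorithm: nucleotides are assumed to be flowed one at a time in a simple repeating order (`TACG` is a placeholder), signals are assumed to be already normalized so that 1.0 corresponds to one incorporated base, and `call_bases` is a hypothetical name.

```python
def call_bases(signals, flow_order="TACG"):
    """Convert normalized per-flow signals into a base sequence.

    Each signal is rounded to the nearest whole number, which is taken as
    the homopolymer length for the nucleotide flowed in that step.
    """
    called = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]
        # Round half up, and treat negative noise as zero incorporations.
        count = max(0, int(signal + 0.5))
        called.append(base * count)
    return "".join(called)


# Flows: T, A, C, G, T, A
clean_read = call_bases([1.02, 0.05, 0.97, 2.10, 0.0, 3.05])

# The same absolute noise matters more for long homopolymers: a true run
# of four A's measured as 4.55 is rounded up to five (a one-base insertion).
true_run = call_bases([0.0, 4.02])
noisy_run = call_bases([0.0, 4.55])

print(clean_read, true_run, noisy_run)
```

    Because the call depends on separating, say, 4 from 5 units of signal rather than 0 from 1, errors in this model naturally appear as insertions or deletions within homopolymer runs rather than as substitutions.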

    Although short reads generated using second-generation sequencing technologies (ranging between 100 and 200 bp, with the exception of 454 sequencing) are usually not suitable for de novo sequencing, short reads generated at high volume from re-sequenced genomes are highly economical and useful for variant discovery by aligning them to the reference genome to identify genetic alterations. Highly redundant coverage of a genome enables accurate detection of such alterations including rare ones. Thus, re-sequencing of individual tumor genomes for mutation discovery has been the most common application of the second-generation sequencing technologies in cancer genomics. The various sequencing technologies, their optimal use and the caveats associated with each technique are summarized in Table 2.1.

    Table 2.1

    Summary of First-, Second- and Third-Generation Sequencing Technologies

    Progress in Cancer Genomics Research Using Sequencing Approaches

    Prior to the emergence of second-generation sequencing technologies, many cancer genomics studies were carried out using targeted approaches that focused on profiling a small number of genes across large sample cohorts [49] or profiling protein-coding regions of the genome from small sample cohorts using a PCR-based targeted sequencing approach [50]. Other genomics projects such as the Cancer Genome Anatomy Project, the Human Cancer Genome Project and the Cancer Genome Project used Sanger sequencing to sequence partial transcripts (known as tags) to catalog gene expression in tumors (reviewed in [51]). These studies used methods such as expressed sequence tags (ESTs), serial analysis of gene expression (SAGE), cap-analysis of gene expression (CAGE) [52] and massively parallel signature sequencing (MPSS) [51]. Tag-based methodologies effectively quantified gene expression levels and identified transcription start sites, but only partial transcript coverage could be achieved, making the data less than optimal for discovering novel and alternatively spliced transcripts. Using these methods, more than 1% of protein-coding genes were found to have recurrent somatic mutations that were causally implicated in tumorigenesis (reviewed in [53]). These early technologies provided important glimpses into how entire genome sequences would be crucial to fully inform cancer biology.

    The second-generation sequencing technologies enabled genome-wide molecular profiling at single-nucleotide resolution, together with the simultaneous interrogation of multiple modalities of molecular alterations, including single nucleotide variations (SNVs), indels, copy number variations (CNVs) and structural rearrangements in chromosomal DNA such as inversions and translocations (Figure 2.2).

    Figure 2.2 Profiling molecular alterations using sequencing approaches. Second-generation sequencing is an effective method to profile simultaneously multiple modalities of genetic alterations in cancer genomes. Such alterations include SNVs, chromosomal copy number alterations and structural rearrangements. Sheared DNA fragments are sequenced as reads, which are then aligned against the reference human genome sequences. Mismatches in alignments, relatively different depths of sequence coverage and unusual alignment patterns (e.g. gaps in the middle of a read, reversed alignment orientation in read pairs from paired-end sequencing) are analyzed to identify putative genetic alterations.
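
    The "unusual alignment patterns" in read pairs can be sketched as a simple rule-based classifier. This is an illustrative simplification, not a published caller: it assumes that properly mapped pairs point toward each other on the same chromosome, the span thresholds are hypothetical placeholders for a library's expected fragment size, span is measured between the mates' start positions, and `ReadPair` and `classify_pair` are names invented for this example.

```python
from collections import namedtuple

# One aligned read pair: chromosome, start position and strand of each mate.
ReadPair = namedtuple("ReadPair", "chrom1 pos1 strand1 chrom2 pos2 strand2")


def classify_pair(pair, min_span=200, max_span=600):
    """Label a read pair by the kind of rearrangement it might suggest."""
    if pair.chrom1 != pair.chrom2:
        return "possible translocation"
    if pair.strand1 == pair.strand2:
        # Both mates on the same strand: reversed orientation of one mate.
        return "possible inversion"
    # Order the mates by position along the chromosome.
    left, right = sorted([(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)])
    if left[1] == "-":
        # Mates point away from each other instead of toward each other.
        return "possible tandem duplication"
    span = right[0] - left[0]
    if span > max_span:
        return "possible deletion"
    if span < min_span:
        return "possible insertion"
    return "concordant"


pairs = [
    ReadPair("chr1", 1000, "+", "chr1", 1400, "-"),
    ReadPair("chr1", 1000, "+", "chr1", 9000, "-"),
    ReadPair("chr1", 1000, "+", "chr1", 1400, "+"),
    ReadPair("chr9", 1000, "+", "chr22", 500, "-"),
]
labels = [classify_pair(p) for p in pairs]
print(labels)
```

    A single discordant pair is weak evidence on its own; in practice such labels would only be trusted when many independent pairs support the same breakpoint.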

    With affordable sequencing technology, it became possible to sequence cancer and matched normal genomes to identify somatic mutations in cancer. In the seminal study in which the first cancer genome was sequenced [54], Ley and colleagues identified somatic mutations by comparing the genomic DNA sequence of cytogenetically normal acute myeloid leukemia (AML) to the genomic DNA sequence generated from a matched normal sample. Since this work, burgeoning numbers of genomes from various cancer types have been sequenced. Typically, approximately 30-fold redundant sequence coverage (denoted as 30×) has been used for whole genome sequencing with second-generation sequencing technologies [55]. However, especially for detecting SNVs, 30× sequence coverage is often inadequate, and 60× or more was shown to be necessary for accurately determining the genotype of at least 95% of the genome [56]. Even higher sequence coverage would be recommended for sequencing tumor samples with low enrichment of tumor cells or for aneuploid cancer genomes. Identifying rare mutations requires higher coverage still, as demonstrated by Shah and colleagues, who used approximately 20 000× sequence coverage to identify rare mutations representing clonal subpopulations [22].
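
    The relationship between coverage, tumor content and the chance of observing a mutation can be made concrete with a back-of-the-envelope calculation. The sketch below rests on simplifying assumptions that are not from the text: reads are sampled independently, sequencing and mapping errors are ignored, depth is uniform, and a heterozygous mutation in a diploid region is expected in half of the reads derived from tumor cells. All function names are hypothetical.

```python
from math import comb


def mean_coverage(num_reads, read_length, genome_size):
    """Average fold coverage: total sequenced bases over genome size."""
    return num_reads * read_length / genome_size


def expected_variant_fraction(tumor_purity):
    """Expected fraction of variant reads for a heterozygous mutation in a
    diploid region, given the fraction of tumor cells in the sample."""
    return tumor_purity / 2


def prob_at_least(k, depth, fraction):
    """P(X >= k) for X ~ Binomial(depth, fraction): the chance of seeing
    at least k variant-supporting reads at a site with the given depth."""
    p_fewer = sum(
        comb(depth, i) * fraction**i * (1 - fraction) ** (depth - i)
        for i in range(k)
    )
    return 1.0 - p_fewer


# 900 million reads of 100 bp over a 3 Gb genome gives 30x on average.
coverage = mean_coverage(900_000_000, 100, 3_000_000_000)

# Chance of seeing at least 3 variant reads at 30x and at 60x when only
# 20% of the cells in the sample are tumor cells.
low_purity = expected_variant_fraction(0.20)
p_30x = prob_at_least(3, 30, low_purity)
p_60x = prob_at_least(3, 60, low_purity)

print(coverage, round(p_30x, 3), round(p_60x, 3))
```

    Under these assumptions the detection probability at 30× falls well short of certainty for a low-purity sample and improves substantially at 60×, which is consistent with the recommendation of higher coverage for samples with few tumor cells or for subclonal mutations.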

    With highly accessible routine re-sequencing of cancer genomes, cancer genomics studies have begun to characterize genomes of almost all major cancer types using thousands of patient samples. Such large-scale studies include The Cancer Genome Atlas (http://cancergenome.nih.gov), International Cancer Genome Consortium (http://icgc.org), and Molecular Taxonomy of Breast Cancer International Consortium [57].

    In contrast to relatively straightforward sequence data generation, data analysis and management have become a significant bottleneck. Analysis tools need to be developed for accurately distinguishing cancer-specific alterations from false positives. Cancer genomics studies are increasingly complex, involving whole genome, transcriptome and epigenome profiles. Hence, there is a particular need for computational tools for integrative data analysis, for statistical models that test the significance of combinatorial molecular alterations in cancer, and for visualization of such interactions for effective communication. Some of the biggest obstacles in data interpretation lie in the lack of analysis transparency in published literature, the difficulty in accessing appropriate high-performance computing infrastructure and the lack of established best practices for data analyses, similar to those established in the 1000 Genomes project (reviewed in [58]). These informatics challenges must be met to translate the progress made using second-generation sequencing
