Interplay of transcriptomics and proteomics


Priti S Hegde*, Ian R White† and Christine Debouck‡

Despite the obvious attractions of parallel profiling of transcripts and proteins on a global 'omic' scale, there are practical and biological differences involved in their application. Transcriptomics is now a robust, high-throughput, cost-effective technology capable of simultaneously quantifying tens of thousands of defined mRNA species in a miniaturized, automated format. Conversely, proteomic analysis is currently much more limited in breadth and depth of coverage owing to variations in protein abundance, hydrophobicity, stability, size and charge. Nevertheless, transcriptomic and proteomic data can be compared and contrasted provided the studies are carefully designed and interpreted. Differential splicing, post-translational modifications and data integration are among the future challenges to tackle.

Addresses
*Department of Transcriptome Analysis, GlaxoSmithKline Pharmaceutical Research & Development, 1250 South Collegeville Road, Collegeville, PA 19426, USA
†Department of Gene Cloning & Expression Proteomics, GlaxoSmithKline Pharmaceutical Research & Development, Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire, SG1 2NY, UK
‡Division of Genomic & Proteomic Sciences, Genetics Research, GlaxoSmithKline Pharmaceutical Research & Development, 1250 South Collegeville Road, Collegeville, PA 19426, USA
e-mail: christine_m_debouck@gsk.com

Current Opinion in Biotechnology 2003, 14:647-651
This review comes from a themed issue on Pharmaceutical biotechnology
Edited by Brian Metcalf and Rino Rappuoli
0958-1669/$ - see front matter © 2003 Elsevier Ltd. All rights reserved.
DOI 10.1016/j.copbio.2003.10.006

Introduction
A disease state is accompanied by significant or subtle changes in the expression of many genes and/or their protein products, some as the cause of the disease and others as the result. Human physiology with its immense complexity is an intricately coordinated system wherein a myriad of regulatory and feedback mechanisms allow response and adaptability to various internal and external stimuli. Understanding the molecular mechanisms of disease is fundamental to the rational development of safe and effective therapies, and the study of RNA and protein expression patterns has made, and continues to make, critical contributions to this understanding. The sequencing of genomes and dramatic advances in the nature and throughput of molecular technologies are making possible the global, genome-wide analysis of changes in DNA (genotyping), RNA expression (transcriptomics) and protein expression (proteomics), thus creating the opportunity to utilize a systems approach for studying biology and medicine. Indeed, one technology as a stand-alone does not suffice for gaining a comprehensive understanding of physiology and pathophysiologies. An approach that harmoniously integrates the various 'omic' platforms and their data is key to unravelling this complexity and to generating meaningful hypotheses from the overloaded web of data and information.

Integrating knowledge from genomic technologies is easier said than done. Early studies suggested that mRNA levels cannot be consistently relied upon to predict protein abundance [1,2]. This limited predictive value was recognized in later studies and is explained partly by fundamental biological differences between the transcription and translation processes, and partly by experimental challenges. On the biological front, differences can result from RNA splicing that is not detectable by the microarray platform in use, differential RNA and protein turnover, post-translational modifications, allosteric protein interactions and proteolytic processing events. On the experimental front, challenges in experimental design and data interpretation, as well as technological limitations, contribute to some of the differences observed. In addition to discussing these difficulties, this article also highlights some of the areas where transcriptomics and proteomics have successfully been implemented in parallel, and raises some of the issues that need to be addressed to maximize the value of data obtained from combining the technologies in focused studies. Hence, comparing transcriptomic and proteomic data leads to a 'glass half empty, glass half full' scenario, wherein one can either be chagrined about their discordance or energized by their synergies.

The glass half empty: transcriptomics and proteomics are not equivalent
The total complement of mRNA in a cell or tissue at any given moment constitutes its transcriptome. A transcriptome forms the template for protein synthesis, resulting in the corresponding protein complement or proteome. In eukaryotic systems, mRNAs bound to multiple ribosomes (polysomes) undergo active translation resulting in protein synthesis. By contrast, translationally inactive mRNAs are associated with single ribosomes called monosomes. There is a constant flux of mRNA molecules between these two states and this is key in regulating protein synthesis. Other factors come into play during mRNA translation, such as the rapid decay of mRNAs in response to various stimuli, and these can have a profound impact on the amount of protein synthesized [3,4]. In addition, post-transcriptional events, such as alternative mRNA splicing, increase the diversity of proteins that can be synthesized from a fixed number of genes. If the technique used for mRNA analysis does not permit the distinction between spliced species, then it precludes the accurate prediction of protein profiles from transcriptomic results.

Sucrose density gradients have been used to isolate polysomes and to determine the translational state of mRNA [5]. As proteins are synthesized from the polysome-bound fraction, identifying gene expression changes in messages from these fractions is one way to use microarrays to study mRNAs that are actively translated. Although this does help in focusing on translationally controlled genes, the ability to gauge protein abundance still remains a challenge with current technologies. In addition, although proteins may have a more stable half-life than mRNAs, protein turnover, specific proteolytic processing, and post-translational modifications can also have a large impact on the nature and level of protein expression [6].

Several recent studies have made attempts to cross-compare protein expression with mRNA expression [7,8-10]. For example, one of these studies targeted galactose utilization in the yeast Saccharomyces cerevisiae as a model [7]. In this study, the authors identified 997 genes with significantly altered mRNA levels upon one or more perturbations in the galactose utilization pathway. Using isotope-coded affinity tags and tandem mass spectrometry, they also generated protein identifications and abundances for 289 proteins and found a correlation of r = 0.6 between corresponding protein and mRNA levels. Fifteen mRNA messages exhibited no change in expression while the corresponding protein expression did change, probably as a result of post-transcriptional regulation. A correlation of r = 0.6 is quite high considering that post-transcriptional regulation contributes significantly to differential protein expression. Conversely, there have been reports of very poor correlations between protein and message expression levels. For example, Chen et al. [9] compared mRNA and protein expression for a cohort of genes in lung adenocarcinomas and observed a correlation of r = −0.025.

In addition to biological explanations, several technical issues contribute to the imperfect correlation between transcriptomic and proteomic expression data. For example, each technology has its own limitations, as illustrated by the correlation of r = 0.8 between microarrays and quantitative reverse transcriptase-polymerase chain reaction (RT-PCR; Taqman) despite the fact that both methods measure changes in mRNA. Furthermore, whereas microarrays can routinely measure small changes in mRNA expression, detection of subtle changes in protein expression, using platforms like two-dimensional gels (as used by Chen et al. [9]), is practically more challenging at this time. Also, most current transcriptomic analysis platforms are not set up to systematically capture changes in splice variants, whereas proteomics can typically detect the proteins encoded by these variants. Lastly, when mRNA abundances are compared with protein expression, the errors or idiosyncrasies of each platform technology add to the random noise, and this makes true correlations difficult to decipher.

Despite these drawbacks, there exist analytical methods that can bypass the issue of noise comparison. Importantly, integrating mRNA and protein expression data into a common framework, scaling and merging the data together, and analyzing changes in categories such as functional protein class, subcellular localization, secondary structure and so on gives a broader view of systematic changes than comparing individual genes at the transcript and protein levels [11].
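
As a rough illustration of the two views contrasted above, the following Python sketch computes a gene-level Pearson correlation between mRNA and protein changes and then repeats the comparison after averaging within functional categories. It is a minimal sketch under stated assumptions, not the method of any cited study: the gene names, categories and log2 fold changes are invented for illustration, and simple within-category averaging stands in for the more elaborate scaling and merging described in [11].

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def category_means(records):
    """Average the mRNA and protein changes within each category."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["category"]].append((rec["mrna"], rec["protein"]))
    return {
        cat: (
            sum(m for m, _ in pairs) / len(pairs),
            sum(p for _, p in pairs) / len(pairs),
        )
        for cat, pairs in grouped.items()
    }


# Hypothetical log2 fold changes (assumption: made-up values, not real data).
records = [
    {"gene": "g1", "category": "metabolism", "mrna": 2.0, "protein": 1.0},
    {"gene": "g2", "category": "metabolism", "mrna": 1.0, "protein": 2.0},
    {"gene": "g3", "category": "transport", "mrna": -1.0, "protein": -2.0},
    {"gene": "g4", "category": "transport", "mrna": -2.0, "protein": -1.0},
    {"gene": "g5", "category": "signaling", "mrna": 0.5, "protein": -0.5},
    {"gene": "g6", "category": "signaling", "mrna": -0.5, "protein": 0.5},
]

# Gene-by-gene comparison of transcript and protein changes.
gene_r = pearson([r["mrna"] for r in records], [r["protein"] for r in records])

# Category-level comparison: one (mRNA, protein) pair per category.
means = category_means(records)
cat_r = pearson([m for m, _ in means.values()], [p for _, p in means.values()])

print(f"gene-level r = {gene_r:.3f}")
print(f"category-level r = {cat_r:.3f}")
```

With these toy values the category averages agree more closely than the individual genes do, because opposite deviations within a category cancel out. Real data would need matched identifiers, normalization and many more genes per category before such a comparison carries any weight.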

The glass half full: transcriptomics and proteomics are complementary


While it is important to keep in mind the differences discussed above when comparing the transcriptomic and proteomic approaches, each technology does provide a unique perspective as well as opportunities for synergies towards discovering and interpreting new biology. For example, transcriptomics has a distinct advantage in its high throughput and moderate cost, but is not routinely set up to systematically detect changes in splice variants. This might be a key issue to address, as roughly 50% of human genes are believed to undergo alternative splicing [12]. Another drawback to the utility of transcriptomics is the limited availability of human target tissues, especially clinically annotated diseased tissues, for use in expression profiling. One notable exception where transcriptomics has proven to be a key player has been oncology, owing to the relative ease of collecting tumor samples [13,14]. For example, transcriptomics has been used successfully to identify patient populations that respond to cancer chemotherapy. Chang et al. [15] were one of the first groups to report clinical trials in which microarrays were used to determine drug response (response to docetaxel in breast carcinomas). Using gene expression profiling to identify fingerprints that are predictive of disease outcome is another area where transcriptomics has proven impactful [14].

Combinations of RNA and protein detection approaches have recently aided in the identification of biomarkers in cancer. Following transcriptomic studies, bioinformatic sequence analysis tools were used to predict secreted proteins, based on the presence of signal peptide cleavage sites, or to identify transmembrane domains in cell-surface proteins. This transcriptomic-directed approach allowed the investigators to focus the lower throughput, more time-consuming protein analysis on secreted or cell-surface proteins for their assessment as potential candidate genes of clinical significance for diagnostic, prognostic or therapeutic purposes [16-22]. Table 1 summarizes data from several studies that resulted in the identification of putative biomarkers followed by validation on various protein expression analysis platforms. These genes have the potential to be serum markers for cancer.

Table 1. Serum biomarkers identified through genomic technologies for early detection of cancer.

| Biomarker | Indication | Platform | Reference |
| --- | --- | --- | --- |
| TIMP-1, CA-19-9, CEA | Pancreatic cancer | DNA microarrays and immunohistochemistry | [16] |
| Prostatin | Ovarian cancer | DNA microarrays and immunohistochemistry | [17] |
| Osteopontin | Ovarian cancer | DNA microarrays and immunohistochemistry | [18] |
| YKL-40 | Glioblastoma multiforme | DNA microarrays and ELISA | [19] |
| Von Willebrand Factor, immunoglobulin M, α1-antichymotrypsin, villin, immunoglobulin G | Prostate cancer | Protein antibody arrays; detection in serum | [20] |
| MIC-1 | Human carcinomas | DNA microarrays and ELISA | [21] |
| Haptoglobin-α subunit | Ovarian cancer | SELDI-TOF | [26] |
| Kallikrein 10 | Ovarian cancer | DNA microarrays and western blots | [22] |

CA-19-9, carbohydrate antigen 19-9; CEA, carcinoembryonic antigen; ELISA, enzyme-linked immunosorbent assay; MIC-1, macrophage inhibitory cytokine-1; SELDI-TOF, surface-enhanced laser desorption/ionization time-of-flight; TIMP-1, tissue inhibitor of metalloproteinase 1; YKL-40, serum glycoprotein, chitinase 3-like protein 1.

Direct serum proteome analysis (independent of a transcriptomics-aided approach as described above) allows the characterization of proteins that are actively secreted or cleaved off from the cell surface, as well as those proteins that are released from necrosing, apoptotic and damaged cells. One proteomic technology that has facilitated the identification in serum of specific protein expression spectra associated with ovarian cancer [23], as well as prostate [24] and breast [25] cancers, is the so-called SELDI-TOF mass spectrometry technique (surface-enhanced laser desorption/ionization time-of-flight). Although the protein spectra produced by SELDI-TOF can discriminate cancer from non-cancer, they do not provide the identity of the protein markers involved. This requires additional work, such as that described in the recent report by Ye et al. [26], where the haptoglobin-α subunit was identified as a potential biomarker, using a similar SELDI-TOF approach followed by affinity chromatography and protein sequence determination. As early diagnosis of cancers is critical for effective disease intervention and management, such technology promises to play a key role in the clinic. However, it should be noted that although serum is proving a rich source of disease biomarkers, the varied abundance of plasma proteins (>9 orders of magnitude) necessitates prefractionation of the material if low-abundance proteins are to be profiled [27,28].
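
The transcriptomics-directed triage described earlier in this section (find overexpressed transcripts, then keep only those predicted to encode secreted or cell-surface proteins before committing to slower protein assays) could be sketched roughly as follows. This is an illustrative sketch only: the `Candidate` fields, the gene names, the fold-change threshold and the boolean predictions are all hypothetical, and in practice the signal peptide and transmembrane calls would come from dedicated sequence analysis tools rather than hand-entered flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    gene: str                       # hypothetical identifier
    log2_fold_change: float         # tumor vs normal, from the microarray
    has_signal_peptide: bool        # assumed output of a sequence-analysis tool
    has_transmembrane_domain: bool  # assumed output of a sequence-analysis tool


def shortlist(candidates, min_log2_fc=1.0):
    """Keep overexpressed genes predicted to be secreted or cell-surface.

    The threshold is an arbitrary assumption chosen for this example.
    Results are ordered from the largest to the smallest fold change.
    """
    kept = [
        c
        for c in candidates
        if c.log2_fold_change >= min_log2_fc
        and (c.has_signal_peptide or c.has_transmembrane_domain)
    ]
    return sorted(kept, key=lambda c: c.log2_fold_change, reverse=True)


# Invented candidates, for illustration only.
candidates = [
    Candidate("GENE_A", 3.2, True, False),   # overexpressed, predicted secreted
    Candidate("GENE_B", 2.5, False, False),  # overexpressed, no export signal
    Candidate("GENE_C", 0.4, True, False),   # secreted but not overexpressed
    Candidate("GENE_D", 1.8, False, True),   # overexpressed, membrane-spanning
]

for c in shortlist(candidates):
    print(c.gene, c.log2_fold_change)
```

Treating a signal peptide or a transmembrane domain as a simple yes/no flag is a deliberate simplification; the point is only that the high-throughput transcript data narrow the list before the lower-throughput protein validation begins.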

Profiling of mRNA expression enables a global picture of transcriptional activity to be mapped for a given system in the context of a specific disease state, whereas targeted proteomics enables the physical presence and location of clinically relevant proteins to be defined. The introduction of Gleevec and Herceptin [29], specific cancer therapies directed at cell-surface targets, underscores the key role plasma membrane proteins are playing in cancer etiology and therapy. As a result, several reports are emerging on the de novo characterization of plasma membrane proteins from cancer cells. Typical studies include those carried out by Shin et al. [30], who demonstrated the physical presence of chaperone proteins at the surface of cancer cells, or by Adam et al. [31], who identified novel proteins of unknown function at the cell surface, which they postulate might ultimately be of clinical or diagnostic benefit.

Conclusions
The amount of data being generated by today's armamentarium of genetic and genomic platform technologies far outstrips the current capacity and capabilities of statistical tools and informatic packages. It is urgently necessary to improve the way in which transcriptomic and proteomic results are combined with data generated from biochemical, genetic and metabonomic approaches, protein interaction studies, model organism biology, clinical analyses and so on [32]. In the case of oncology, for example, major cross-disciplinary initiatives will be necessary to fully interrogate and enhance our understanding of this complex group of diseases. The Danish Centre for Translational Breast Cancer Research (http://www.cancer.dk/bio/dctb1.asp) provides a good example of an integrated initiative that has been recently set up with the aim of improving diagnosis, treatment and quality of life of patients [33]. Ultimately, full and effective integration across genetic and genomic platform technologies is likely to be achieved via small incremental changes to the way in which data are compiled, processed and mined. Only when such effective systems are in place will it be possible to fully utilize these global approaches for furthering the understanding of human physiology and diseases.

Acknowledgements
We would like to thank our colleagues Steve Clark, Paul Cutler, Neil Jones and Hugh Olsen for their review and advice on the manuscript.

References and recommended reading
Papers of particular interest, published within the annual period of review, have been highlighted as of special interest or of outstanding interest.

1. Anderson L, Seilhamer J: A comparison of selected mRNA and protein abundances in human liver. Electrophoresis 1997, 18:533-537.
2. Gygi SP, Rochon Y, Franza BR, Aebersold R: Correlation between protein and mRNA abundance in yeast. Mol Cell Biol 1999, 19:1720-1730.
3. Ross J: mRNA stability in mammalian cells. Microbiol Rev 1995, 59:423-450.
4. Guhaniyogi J, Brewer G: Regulation of mRNA stability in mammalian cells. Gene 2001, 265:11-23.
5. Zong Q, Schummer M, Hood L, Morris DR: Messenger RNA translation state: the second dimension of high-throughput expression screening. Proc Natl Acad Sci USA 1999, 96:10632-10636.
6. Pratt JM, Petty J, Riba-Garcia I, Robertson DH, Gaskell SJ, Oliver SG, Beynon RJ: Dynamics of protein turnover, a missing dimension in proteomics. Mol Cell Proteomics 2002, 1:579-591.
7. Ideker T, Thorsson V, Ranish JA, Christmas R, Buhler J, Eng JK, Bumgarner R, Goodlett DR, Aebersold R, Hood L: Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 2001, 292:929-934.
An in-depth comparison of gene expression and protein expression to demonstrate the utilization of a systems approach to biology.
8. Griffin TJ, Gygi SP, Ideker T, Rist B, Eng J, Hood L, Aebersold R: Complementary profiling of gene expression at the transcriptome and proteome levels in Saccharomyces cerevisiae. Mol Cell Proteomics 2002, 1:323-333.
9. Chen G, Gharib TG, Huang CC, Taylor JM, Misek DE, Kardia SL, Giordano TJ, Iannettoni MD, Orringer MB, Hanash SM, Beer DG: Discordant protein and mRNA expression in lung adenocarcinomas. Mol Cell Proteomics 2002, 1:304-313.
10. Washburn MP, Koller A, Oshira G, Ulaszek RR, Plouffe D, Cediu C, Winzeler E, Yates JR III: Protein pathway and complex clustering of correlated mRNA and protein expression analyses in Saccharomyces cerevisiae. Proc Natl Acad Sci USA 2003, 100:3107-3112.
11. Greenbaum D, Jansen R, Gerstein M: Analysis of mRNA expression and protein abundance data: an approach for the comparison of enrichment of features in the cellular population of proteins and transcripts. Bioinformatics 2002, 18:585-596.
12. The Genome International Sequencing Consortium: Initial sequencing and analysis of the human genome. Nature 2001, 409:860-921.
This publication describes the sequencing of the human genome and new insights gleaned from the genome analysis.
13. van de Vijver MJ, He YD, van't Veer LJ, Dai H, Hart AA, Voskuil DW, Schreiber GJ, Peterse JL, Roberts C, Marton MJ et al.: A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med 2002, 347:1999-2009.
An independent validation of results obtained from a previous study that identified a gene expression fingerprint as a predictor of survival in breast cancer.
14. Yeoh EJ, Ross ME, Shurtleff SA, Williams WK, Patel D, Mahfouz R, Behm FG, Raimondi SC, Relling MV, Patel A et al.: Classification, subtype discovery and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell 2002, 1:133-143.
15. Chang JC, Wooten EC, Tsimelzon A, Hilsenbeck SG, Gutierrez MC, Elledge R, Mohsin S, Osborne CK, Chamness GC, Allred DC et al.: Gene expression profiling for the prediction of therapeutic response to docetaxel in patients with breast cancer. Lancet 2003, 362:362-369.
16. Zhou W, Sokoll LJ, Bruzek DJ, Zhang L, Velculescu VE, Goldin SB, Hruban RH, Kern SE, Hamilton SR, Chan DW et al.: Identifying markers for pancreatic cancer by gene expression analysis. Cancer Epidemiol Biomarkers Prev 1998, 7:109-112.
17. Mok SC, Chao J, Skates S, Wong K, Yiu GK, Muto MG, Berkowitz RS, Cramer DW: Prostatin, a potential serum marker for ovarian cancer: identification through microarray technology. J Natl Cancer Inst 2001, 93:1458-1464.
18. Kim JH, Skates SJ, Uede T, Wong KK, Schorge JO, Feltmate CM, Berkowitz RS, Cramer DW, Mok SC: Osteopontin as a potential diagnostic biomarker for ovarian cancer. JAMA 2002, 287:1671-1679.
19. Tanwar MK, Gilbert MR, Holland EC: Gene expression microarray analysis reveals YKL-40 to be a potential serum marker for malignant character in human glioma. Cancer Res 2002, 62:4364-4368.
20. Miller JC, Zhou H, Kwekel J, Cavallo R, Burke J, Butler EB, Teh BS, Haab BB: Antibody microarray profiling of human prostate cancer sera: antibody screening and identification of potential biomarkers. Proteomics 2003, 3:56-63.
21. Welsh JB, Sapinoso LM, Kern SG, Brown DA, Liu T, Bauskin AR, Ward RL, Hawkins NJ, Quinn DI, Russell PJ et al.: Large-scale delineation of secreted protein biomarkers overexpressed in cancer tissue and serum. Proc Natl Acad Sci USA 2003, 100:3410-3415.
22. Shvartsman HS, Lu KH, Lee J, Lillie J, Deavers MT, Clifford S, Wolf JK, Mills GB, Bast RC Jr, Gershenson DM et al.: Overexpression of kallikrein 10 in epithelial ovarian carcinomas. Gynecol Oncol 2003, 90:44-50.
23. Petricoin EF, Ardekani AM, Hitt BA, Levine PJ, Furaso VA, Steinberg SM, Milles GB, Simone C, Fishman DA, Kohn EC, Liotta LA: Use of proteomic patterns in serum to identify ovarian cancer. Lancet 2002, 359:572-577.
A key study showing the potential of SELDI-TOF for the generation of disease-specific diagnostic protein profile patterns.
24. Banez LL, Prasanna P, Sun L, Ali A, Zou Z, Adam BL, McLeod DG, Moul JW, Srivastava S: Diagnostic potential of serum proteomic patterns in prostate cancer. J Urol 2003, 170:442-446.
25. Li J, Zhang Z, Rosenweig J, Wang YY, Chan DW: Proteomics and bioinformatics approaches for identification of serum biomarkers to detect breast cancer. Clin Chem 2002, 48:1296-1304.
26. Ye B, Cramer DW, Skates SJ, Gygi SP, Pratomo V, Fu L, Horick NK, Licklider LJ, Schorge JO, Berkowitz RS, Mok SC: Haptoglobin-α subunit as potential serum biomarker in ovarian cancer: identification and characterization using proteomic profiling and mass spectrometry. Clin Cancer Res 2003, 9:2904-2911.
27. Pieper R, Gatlin CL, Makuski AJ, Russo PS, Schatz CR, Miller SS, Su Q, McGrath AM, Estock MA, Parmar PP et al.: The human serum proteome: display of nearly 3700 chromatographically separated protein spots on two-dimensional electrophoresis gels and identification of 325 distinct proteins. Proteomics 2003, 3:1345-1364.
An extensive study that exemplifies many of the challenges associated with the search for serum-borne biomarkers.
28. Adkins JN, Varnum SM, Auberry KJ, Moore RJ, Angell NH, Smith RD, Springer DL, Pounds JG: Towards a human blood serum proteome: analysis by multidimensional separation coupled with mass spectrometry. Mol Cell Proteomics 2002, 1:947-955.
29. Drevs J, Medinger M, Schmidt-Gersbach C, Weber R, Unger C: Receptor tyrosine kinases: the main targets for new anticancer therapy. Curr Drug Targets 2003, 4:113-121.
30. Shin BK, Wang K, Yim AM, Le Naour F, Brichory F, Jang JH, Zhao R, Puravs E, Tra J, Michael CW et al.: Global profiling of the cell surface proteome of cancer cells uncovers an abundance of proteins with chaperone function. J Biol Chem 2003, 278:7607-7616.
31. Adam PJ, Boyd R, Tyson KL, Fletcher GC, Stampls A, Hudson L, Poyser HR, Redpath N, Griffiths M, Steers G et al.: Comprehensive proteome analysis of breast cancer cell membranes reveals unique proteins with potential roles in clinical cancer. J Biol Chem 2003, 278:6482-6489.
32. Basik M, Mousses S, Trent J: Integration of genomic technologies for accelerated cancer drug development. Biotechniques 2003, 35:580-593.
33. Celis JE, Gromov P, Gromova I, Moreira JM, Cabezon T, Ambartsumian N, Grigorian M, Lukanidin E, Thor Straten P, Guldberg P et al.: Integrating proteomic and functional genomic technologies in discovery-driven translational breast cancer research. Mol Cell Proteomics 2003, 2:369-377.
