Annotating New Genes: From in Silico Screening to Experimental Validation
Ebook, 258 pages


About this ebook

In recent years, a number of academic and commercial software packages and databases have been developed for the analysis and screening of biological data; however, the usability of these data is compromised by so-called novel genes to which no biological function has been assigned. Annotating New Genes outlines an approach to the analysis of evolutionarily conserved, heart-enriched genes with unknown functions, offering a step-by-step description of the procedure from screening to validation. The book begins by offering an introduction to the databases and software available, before moving on to cover programming guidelines, including a specific case study on the use of C-It for in silico screening. The second half of the book offers a step-by-step guide to experimental validation concepts and procedures, as well as an overview of additional potential applications of this approach in the field of stem cells and tissue regeneration, before a concluding chapter summarises the concepts and theories presented.
  • Focuses not only on screening but also on biological validation
  • Provides details of databases and web-based software products for biologists with minimal computational skills
  • Offers a step-by-step outline of the procedure involved
Language: English
Release date: Aug 6, 2012
ISBN: 9781908818126
Author

Shizuka Uchida

Shizuka Uchida is a group leader at the Max Planck Institute for Heart and Lung Research in Bad Nauheim, Germany. He is a trained bioinformatician and developmental biologist with a specific focus on adult stem cell research, and has extensive experience in combining the power of computers and data mining methods with state-of-the-art experimental techniques.


    Book preview

    Annotating New Genes - Shizuka Uchida

    Shizuka.Uchida@mpi-bn.mpg.de

    1 Introduction

    Everyone, at some point, must have been told: ‘You look like your father (or mother).’ A natural answer is: ‘Of course, I am his/her son/daughter.’ Such conversations have probably existed since humans were first able to communicate. However close the resemblance, we are not the same. We differ significantly from our parents in many respects. These differences are even more evident if we compare ourselves to our friends, neighbors, teachers, professors, etc. So what creates these differences? Yes, genes.

    At the turn of the 21st century, the draft of the human genome was completed. Regarded as a blueprint of a human being, the information contained in our DNA was considered to provide the ultimate answer to who we are. Fueled by the popular media, optimistic views have persisted that, in the near future, we may be able to cure all the diseases that threaten our lives. However, when these DNA sequences were analyzed, it came as a surprise that the number of human genes is scarcely greater than that of lower organisms (e.g. Caenorhabditis elegans). According to release 8 of the GENCODE Project (http://www.gencodegenes.org/), the number of human coding loci is 21,494. The key words here are ‘coding loci’. Dr Jen Harrow, the Joint Head of Vertebrate Annotation at the Wellcome Trust Sanger Institute, avoids using the word ‘gene’ in this regard, because both protein-coding genes and non-coding genes are present in the human genome.
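
    As a concrete aside, a count like the GENCODE figure quoted above can be reproduced directly from the project's annotation files. The short Python sketch below is not from the book; the file name (gencode.v8.annotation.gtf) and the ‘gene_type’ attribute key are assumptions based on the published GENCODE GTF format.

        # Minimal sketch (assumptions noted above): count protein-coding
        # loci in a GENCODE GTF annotation file.
        coding_loci = set()
        with open("gencode.v8.annotation.gtf") as gtf:
            for line in gtf:
                if line.startswith("#"):        # skip header comments
                    continue
                fields = line.rstrip("\n").split("\t")
                if fields[2] != "gene":         # gene-level records only
                    continue
                attributes = fields[8]
                if 'gene_type "protein_coding"' in attributes:
                    # extract the gene_id so each locus is counted once
                    gene_id = attributes.split('gene_id "')[1].split('"', 1)[0]
                    coding_loci.add(gene_id)
        print("protein-coding loci:", len(coding_loci))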

    A series of articles from the ‘Functional Annotation of the Mammalian Genome’ (FANTOM) projects, the ‘ENCyclopedia Of DNA Elements’ (ENCODE) consortium and others has clearly indicated that a majority of the human genome is transcribed in the form of RNAs, yet only a few per cent of these transcripts fall under the category of protein-coding genes; the current estimate is that only ~1.2% of mammalian genomes encodes protein (Clark et al., 2011). Previously, non-coding RNAs (ncRNAs) were dismissed as transcriptional noise or experimental error. However, with the discovery of micro-RNAs (miRNAs) and other types of ncRNAs [e.g. long non-coding RNAs (lncRNAs)], it is now clear that only a minor proportion of RNAs is actually translated into protein. The concept of ncRNAs is not new: ribosomal RNAs (rRNAs) and transfer RNAs (tRNAs), for example, have long been known to be necessary for protein translation. The obvious question, then, is whether the definition of a ‘gene’ is correct. In other words, how do we define a gene? This turns out to be very difficult to answer. Several years ago, Nature and Science published articles discussing this point (Pearson, 2006; Pennisi, 2007). An interesting article was also published by members of the ENCODE consortium (Gerstein et al., 2007), in which the authors provided an extensive review of the definitions of a gene used over the past century. After careful consideration, they proposed a new definition: ‘A gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products.’ As stated in their article, an important aspect of this definition is that protein and RNA products (e.g. ncRNAs) must be functional. With this definition, the authors proposed to move the long-standing biological question of ‘what is a gene?’ to ‘what is a function?’. We therefore still have a long way to go to fully understand RNAs (both traditional genes and ncRNAs), which will require further biological experiments.

    With the emergence of high-throughput techniques, such as microarrays and next-generation (deep) sequencing, the amount of biological data accumulated has expanded tremendously in the last few decades. These data now number not in the terabytes but in the petabytes (Stewart et al., 2007). We clearly cannot interpret these data by hand, and numerous academic and commercial software products and databases are now available to facilitate their efficient analysis. However, the knowledge that such tools can extract from these data is still limited, because they depend on previously published biological results and on connecting those results in an understandable manner. This dependency overlooks the so-called ‘functionally unknown genes’: putative genes whose presence is predicted from computational sequence analyses, but whose biological functions remain unknown because no published results are available for them (Uchida et al., 2009).

    For example, when a researcher performs microarray experiments to compare cells or tissues in one condition with another (e.g. operated and non-operated hearts), he or she hopes to identify known pathways in addition to new candidate genes that may interact with such pathways. In most cases, however, one identifies several genes known to belong to those pathways along with a number of genes of unknown function. When such unknown genes pop up in the results, an immediate response might be that they possess functions related to the condition under study. However, it takes tremendous effort, money and time to elucidate the biological functions of such genes. To reduce the number of genes to be studied further, other experiments, such as real-time reverse transcriptase PCR, must be performed to validate the microarray results. Even then, the number of functionally unknown genes is so large that understanding their biological functions would take several years or decades, if this were possible at all.
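
    To make this filtering step concrete, the sketch below shows one way such a candidate list might be narrowed down before real-time RT-PCR validation. It is a hypothetical illustration: the input format (a tab-separated table with a log2 fold change, an adjusted p-value and a free-text annotation column) and the thresholds are assumptions, not the book's actual pipeline.

        # Hypothetical sketch (assumed input format and thresholds): flag
        # differentially expressed genes that lack any functional annotation
        # as candidates for real-time RT-PCR validation.
        import csv

        unknown_candidates = []
        with open("microarray_results.tsv") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                log2_fc = float(row["log2_fold_change"])
                adj_p = float(row["adj_p_value"])
                # keep probes changing at least two-fold (|log2 FC| >= 1)
                # at an adjusted p-value below 0.05
                if abs(log2_fc) < 1.0 or adj_p > 0.05:
                    continue
                # an empty annotation field marks a 'functionally unknown' gene
                if not row["annotation"].strip():
                    unknown_candidates.append(row["gene_symbol"])

        print(len(unknown_candidates), "unannotated candidates for validation")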
