Annotating New Genes: From in Silico Screening to Experimental Validation
About this ebook
- Focuses not only on screening but also on biological validations
- Provides details of databases and software (web interface) products for biologists with minimal computation skills
- Offers a step-by-step outline of the procedure involved
Shizuka Uchida
Shizuka Uchida is a group leader at the Max-Planck Institute for Heart and Lung Research in Bad Nauheim, Germany. He is a trained bioinformatician and developmental biologist with a specific focus on adult stem cell research, and has extensive experience in combining computational data mining methods with state-of-the-art experimental techniques.
Shizuka.Uchida@mpi-bn.mpg.de
1 Introduction
Everyone, at some point, must have been told: ‘You look like your father (or mother).’ A natural answer is: ‘Of course, I am his/her son/daughter.’ Such conversations have probably taken place ever since humans were first able to communicate. However close the resemblance, though, we are not the same. We differ significantly from our parents in various respects, and these differences are even more evident when we compare ourselves to our friends, neighbors, teachers, professors, etc. So what creates these differences? Yes, genes.
At the turn of the 21st century, the draft of the human genome was completed. Regarded as the blueprint of a human being, the information contained in our DNA was expected to provide the ultimate answer to who we are. Fueled by the popular media, optimistic views have persisted that in the near future we may be able to cure all diseases that threaten our lives. However, when these DNA sequences were analyzed, it came as a surprise that the number of human genes was smaller than that of lower organisms (e.g. Caenorhabditis elegans). According to release 8 of the GENCODE Project (http://www.gencodegenes.org/), the number of human coding loci is 21,494. The key words here are ‘coding loci’. Dr Jen Harrow, the Joint Head of Vertebrate Annotation at the Wellcome Trust Sanger Institute, avoids using the word ‘gene’ in this regard, because the human genome contains both protein-coding genes and non-coding genes.
A series of articles from the ‘Functional Annotation of the Mammalian Genome (FANTOM)’ projects, the ‘ENCyclopedia Of DNA Elements (ENCODE)’ consortium and others have clearly indicated that a majority of the human genome is transcribed into RNAs, yet only a few per cent of these fall under the category of protein-coding genes; the current estimate is that only ~1.2% of mammalian genomes codes for proteins (Clark et al., 2011). Previously, non-coding RNAs (ncRNAs) were dismissed as transcriptional noise or experimental error. However, with the discovery of micro-RNAs (miRNAs) and other types of ncRNAs [e.g. long non-coding RNAs (lncRNAs)], it is now clear that only a minor proportion of RNAs is actually translated into proteins. The concept of ncRNAs is not new: ribosomal RNAs (rRNAs) and transfer RNAs (tRNAs), for example, are necessary for protein translation. The obvious question is thus: is the definition of a ‘gene’ correct? In other words, how do we define a gene? This turns out to be very difficult to answer. Several years ago, Nature and Science published articles discussing this point (Pearson, 2006; Pennisi, 2007). An interesting article was also published by members of the ENCODE consortium (Gerstein et al., 2007), in which the authors provided an extensive review of the definitions of a gene used over the past century. After careful consideration, they proposed a new definition: ‘A gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products.’ As stated in their article, an important aspect of this definition is that the protein and RNA products (e.g. ncRNAs) must be functional. With this definition, the authors proposed shifting the long-standing biological question from ‘what is a gene?’ to ‘what is a function?’. We therefore still have a long way to go to fully understand RNAs (both traditional genes and ncRNAs), which will require further biological experiments.
With the emergence of high-throughput techniques, such as microarrays and next-generation (deep) sequencing, the amount of biological data accumulated has expanded tremendously in the last few decades. These data are now measured not in terabytes but in petabytes (Stewart et al., 2007). We clearly cannot interpret these data by hand, and numerous academic and commercial software products and databases are now available to facilitate their efficient analysis. However, the knowledge extracted from these data through such tools is still limited, because the tools depend on previous biological results and on connecting such results in an understandable manner. This dependence means that so-called ‘functionally unknown genes’ are overlooked. These are putative genes whose presence is predicted from computational sequence analyses, but whose biological functions remain unknown; no published results are available for such genes (Uchida et al., 2009).
For example, when a researcher performs microarray experiments to compare cells or tissues in one condition with another (e.g. operated and non-operated hearts), he or she hopes to identify known pathways in addition to new potential candidate genes that may interact with such pathways. In most cases, however, the result is several genes that are known to belong to the pathways and a number of genes with unknown function. When such unknown genes pop up in the results, the immediate assumption might be that these genes possess functions related to the condition under study. However, it takes tremendous effort, money and time to elucidate the biological functions of such genes. To reduce the number of such genes to be studied further, other experiments, such as real-time reverse transcriptase PCR, must be performed to validate the results of the microarray data. Even then, the number of such functionally unknown genes is so large that understanding their biological functions would take several years or decades, if this were possible at