Sanger sequencing and microarrays. Sanger sequencing was the first technology used for
transcriptomics, and it enabled methods such as SAGE (serial analysis of gene expression). SAGE
was one of the first attempts to quantify gene expression on a global basis. Almost simultaneously,
microarrays based on complementary probe hybridization emerged and came to dominate
transcriptome profiling for the next decade.
NGS. The advent of next-generation sequencing (NGS) technologies has enabled the sequencing approach
to surpass the microarray approach. In 2006, the first RNA-seq paper was published using 454/Roche
technology. The era of RNA-seq dominance began in 2008 as Illumina/Solexa technology matured. The
most popular platforms for RNA-seq have been the Illumina Genome Analyzer and HiSeq. The
Illumina/Solexa technology can generate gigabases of data per run (initially 1 Gb per run for the
Genome Analyzer in 2006 and 600 Gb per run for the HiSeq in 2012). Roche/454 technology generates
reads long enough for RNA-seq but is hampered by relatively low throughput and high cost.
Third-generation sequencing. Although NGS technologies remain the most widely used, third-generation
sequencing is increasingly being applied to RNA-seq. For example, HeliScope sequencing and single-molecule
real-time (SMRT) sequencing have already been applied in some RNA-seq studies. PacBio SMRT long-read
sequencing can cover a complete transcript from the 5' end to the 3' poly(A) tail without fragmentation,
yielding full-length cDNA sequences. This is useful for identifying new transcripts and new introns, and
therefore for accurately identifying isoforms, alternative splicing sites, fusion gene expression, and allelic
expression.
Table 1. The advantages of RNA-seq compared with other transcriptomics approaches (Wang et al. 2009).
Challenges of RNA-seq
⚫ Short reads. Illumina sequencing technology has steadily increased read length and throughput
since its introduction in 2007. Long, paired-end, strand-specific reads are commonly used to improve
mappability and de novo assembly of transcriptomes. Furthermore, third-generation
sequencing technology (such as PacBio SMRT) enables sequencing of full-length
transcripts.
⚫ PCR biases. Another concern is the impact of PCR amplification on the accuracy of gene
expression quantification by RNA-seq. Helicos and some third-generation sequencers use
amplification-free technology. There are also PCR-free methods for Illumina sequencing.
⚫ Library construction
Following sample collection, total RNA is usually isolated by organic extraction and/or
silica-membrane spin columns. The total RNA sample is then processed either by direct
selection of poly(A) RNA or by selective removal of rRNA, because the abundant rRNA is usually not the
research focus and greatly reduces the coverage of the transcripts of interest. Oligo(dT)-based mRNA
purification is widely used in eukaryotes. However, RNA transcripts that lack poly(A)
tails are missed. Compared with poly(A) RNA selection, the ribo-depletion approach is preferred because
it retains all non-ribosomal RNA species, including tRNA, ncRNAs, non-poly(A) mRNA, and
preprocessed RNA. The two most popular rRNA depletion methods are: (i) hybridization of rRNA with
biotin-labeled anti-rRNA probes, followed by removal with streptavidin-coated magnetic beads; and (ii)
selective degradation of rRNA by a 5’-3’ exonuclease that specifically recognizes rRNA with a 5’
phosphate.
Fragmentation is then carried out to reach the length required by the chosen NGS technology.
Some small RNAs, such as microRNAs, piwi-interacting RNAs, and short interfering RNAs, can be
sequenced directly without fragmentation. Larger RNA molecules need to be fragmented into smaller
pieces (200-500 bp) before deep sequencing, either by cDNA fragmentation (DNase I treatment
or sonication) or by RNA fragmentation (RNA hydrolysis or nebulization). However, each of these methods
introduces a different bias. For example, cDNA fragmentation is usually strongly biased
towards sequences from the 3’ ends of transcripts, while RNA fragmentation has
little bias over the transcript body but is depleted for transcript ends. Therefore, cDNA fragmentation
provides valuable information about the precise identity of these ends, and RNA fragmentation provides
access to the precise identity of the transcript body.
In the classic NGS protocols, adapters are ligated onto sheared double-stranded DNA fragments.
However, a major drawback of this approach is the loss of information on transcriptional direction.
Pre-treating the RNA samples with sodium bisulfite converts cytidine into uridine. The resulting
widespread C-to-T transitions thereby mark the coding strand of each transcript. Other methods that
maintain strand specificity have also been proposed, such as direct ligation of RNA adapters to the
RNA sample before reverse transcription.
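The bisulfite idea can be illustrated with a minimal sketch. It assumes a read that has already been aligned, without gaps, to the forward strand of a reference; the function name `infer_transcript_strand` and the simple majority rule are illustrative assumptions, not part of any published pipeline. A transcript from the forward strand shows C-to-T changes relative to the reference, while a transcript from the reverse strand shows up as G-to-A changes.

```python
def infer_transcript_strand(ref: str, read: str) -> str:
    """Guess the transcribed strand from bisulfite-style conversions.

    Assumes `ref` and `read` are equal-length, gap-free aligned sequences,
    both written on the forward reference strand (hypothetical input format).
    Returns "+", "-", or "unknown".
    """
    if len(ref) != len(read):
        raise ValueError("ref and read must have the same length")

    pairs = list(zip(ref.upper(), read.upper()))
    # C in the reference read as T: conversion on a forward-strand transcript.
    c_to_t = sum(1 for r, q in pairs if r == "C" and q == "T")
    # G in the reference read as A: the same conversion seen on the reverse strand.
    g_to_a = sum(1 for r, q in pairs if r == "G" and q == "A")

    if c_to_t > g_to_a:
        return "+"
    if g_to_a > c_to_t:
        return "-"
    return "unknown"  # no conversions, or a tie


if __name__ == "__main__":
    print(infer_transcript_strand("ACGTCCGA", "ATGTTTGA"))
    print(infer_transcript_strand("ACGTCCGA", "ACATCCAA"))
```

Real data would also need to separate conversions from genuine SNPs and sequencing errors, which this sketch ignores.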
⚫ Sequencing
RNA-seq is currently dominated by three platforms: Illumina (Genome Analyzer and
HiSeq), Ion Torrent PGM, and Roche 454 Life Sciences systems. Read lengths range from 200-600 bp
for Illumina, 400 bp for Ion Torrent PGM, and 400-700 bp for the 454 pyrosequencing system. 454-based
RNA-seq is particularly attractive for non-model organisms without reference genomes or
transcriptomes. Longer reads or paired-end short reads can reveal connectivity between multiple exons.
RNA-seq is a powerful method to study complex transcriptomes and reveal sequence variations in the
transcribed regions.
⚫ Bioinformatics
Figure 3. A typical analysis pipeline of RNA-seq data.
Quality assessment is the first step in the bioinformatics analysis of RNA-seq data. It helps ensure a
coherent final result by removing low-quality sequences, over-represented sequences, and adapter
sequences. Once all reads have been filtered and then mapped or assembled, gene expression levels can
be inferred, giving a genome-scale transcriptome map that is both qualitative and quantitative.
RNA-seq also allows detection of differential expression (DE) across treatments or conditions.
Normalization must be applied to adjust for differences between samples, such as library size, and for
gene-specific features. Furthermore, RNA-seq enables the identification of SNPs, fusion genes, and
post-transcriptional gene regulation, such as RNA editing, degradation, and translation.
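As a rough illustration of the normalization step, the sketch below computes two commonly used measures: counts per million (CPM), which adjusts for library size, and transcripts per million (TPM), which also adjusts for gene length as one example of a gene-specific feature. The text does not name a particular method, so the choice of CPM and TPM, the function names, and the toy counts are all assumptions made for illustration.

```python
from typing import Dict


def cpm(counts: Dict[str, int]) -> Dict[str, float]:
    """Counts per million: scale raw read counts by the library size."""
    library_size = sum(counts.values())
    if library_size == 0:
        raise ValueError("library has no mapped reads")
    return {gene: c * 1e6 / library_size for gene, c in counts.items()}


def tpm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Transcripts per million: adjust for gene length, then for library size."""
    # Reads per kilobase of transcript removes the gene-length effect.
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    total_rate = sum(rates.values())
    if total_rate == 0:
        raise ValueError("library has no mapped reads")
    # Rescale so that the values in one sample sum to one million.
    return {gene: r * 1e6 / total_rate for gene, r in rates.items()}


if __name__ == "__main__":
    # Toy sample (made-up numbers): geneB has three times the reads of geneA,
    # but it is also three times as long.
    counts = {"geneA": 100, "geneB": 300}
    lengths_bp = {"geneA": 1000, "geneB": 3000}
    print(cpm(counts))
    print(tpm(counts, lengths_bp))
```

In this toy sample the two genes differ threefold in CPM but come out equal in TPM, because the longer gene collects proportionally more reads. Dedicated differential expression tools use more elaborate normalization schemes than either measure.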
References:
1. Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics, 2009, 10(1): 57.
2. Qian X, Ba Y, Zhuang Q, et al. RNA-Seq technology and its application in fish transcriptomics. OMICS: A Journal of Integrative Biology, 2014, 18(2): 98-110.
3. Marguerat S, Bähler J. RNA-seq: from technology to biology. Cellular and Molecular Life Sciences, 2010, 67(4): 569-579.