The Technologies and Workflow of RNA-seq

Technical advances in RNA-seq
Sanger sequencing and microarrays. Sanger sequencing was the first sequencing technology applied to
transcriptomics, and it enabled methods such as SAGE (serial analysis of gene expression). SAGE
was one of the first attempts to quantify gene expression on a global basis. At almost the same time,
microarrays based on complementary probe hybridization emerged and came to dominate the field of
transcriptome profiling for the next decade.
NGS. The advent of next-generation sequencing (NGS) technologies has enabled the sequencing approach
to surpass the microarray approach. In 2006, the first RNA-seq paper was published using 454/Roche
technology. The era of RNA-seq dominance began in 2008 with the maturation of Illumina/Solexa
technology. The most popular platforms for RNA-seq have been the Illumina Genome Analyzer and HiSeq.
Illumina/Solexa technology can generate gigabases of data per run (initially 1 Gb per run for
the Genome Analyzer in 2006 and 600 Gb per run for the HiSeq in 2012), whereas Roche/454 technology
generates reads long enough for RNA-seq but is hampered by relatively low throughput and high
cost.
Third-generation sequencing. Although NGS technologies remain the most widely used, third-generation
sequencing is increasingly being applied to RNA-seq. For example, HeliScope sequencing and single-molecule
real-time (SMRT) sequencing have already been used in some RNA-seq studies. PacBio SMRT long-read
sequencing can cover a complete transcript from the 5' end to the 3' poly(A) tail without fragmentation,
yielding full-length cDNA sequences. This is useful for identifying new transcripts and new introns, and
therefore for accurately identifying isoforms, alternative splicing sites, fusion gene expression, and
allelic expression.
Table 1. The advantages of RNA-seq compared with other transcriptomics approaches (Wang et al. 2009).
| Technology | Tiling microarray | cDNA or EST sequencing | RNA-seq |
| --- | --- | --- | --- |
| Technology specifications | | | |
| Principle | Hybridization | Sanger sequencing | High-throughput sequencing |
| Resolution | From several to 100 bp | Single base | Single base |
| Throughput | High | Low | High |
| Reliance on genomic sequence | Yes | No | In some cases |
| Background noise | High | Low | Low |
| Application | | | |
| Simultaneously map transcribed regions and gene expression | Yes | Limited for gene expression | Yes |
| Dynamic range to quantify gene expression level | Up to a few-hundredfold | Not practical | >8,000-fold |
| Ability to distinguish different isoforms | Limited | Yes | Yes |
| Ability to distinguish allelic expression | Limited | Yes | Yes |
| Practical issues | | | |
| Required amount of RNA | High | High | Low |
| Cost for mapping transcriptomes of large genomes | High | High | Relatively low |
Challenges of RNA-seq
⚫ Short reads. Illumina sequencing technology has steadily increased read length and throughput
since its introduction in 2007. Long, paired-end, strand-specific reads are commonly used to improve
mappability and de novo assembly of transcriptomes. Furthermore, third-generation
sequencing technology (such as PacBio) enables sequencing of full-length transcripts.
⚫ PCR biases. Another concern is the effect of PCR amplification on the accuracy of gene
expression quantification by RNA-seq. Helicos and some other third-generation sequencers use
amplification-free technology. There are also PCR-free methods for Illumina sequencing.
Workflow of RNA-seq based on NGS
The workflow of RNA-seq using high-throughput sequencing technology is illustrated in Figure 1.

Briefly, long RNAs are first converted into a library of cDNA fragments through either RNA
fragmentation or DNA fragmentation. Sequencing adaptors are then attached to each cDNA fragment, and
sequence data are generated in a high-throughput manner from both ends (paired-end sequencing). The
resulting sequence reads are subsequently aligned to the reference genome or transcriptome and are
classified into three types: exonic reads, junction reads, and poly(A) end reads. These three types
of sequence reads are then used to generate a base-resolution expression profile.
Figure 1. A typical workflow of RNA-seq (Wang et al. 2009).
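The three read classes described above can be told apart from an alignment record. The following is a minimal sketch, assuming SAM-style CIGAR strings in which `N` marks a skipped region (an intron) and `S` marks soft-clipped bases. The function name `classify_read` and the `min_tail` threshold are illustrative choices, not part of the workflow of Wang et al. or of any particular aligner, and only forward-strand transcripts are handled.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def classify_read(cigar: str, seq: str, min_tail: int = 8) -> str:
    """Classify an aligned read as 'junction', 'polyA_end' or 'exonic'.

    Assumptions (hypothetical, for illustration only):
    - cigar is a SAM CIGAR string; an 'N' operation means the read spans an intron.
    - seq is the read sequence as reported on the forward reference strand.
    - a soft-clipped 3' tail of at least min_tail A's marks a poly(A) end read.
    """
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]

    # Any skipped region means the read crosses an exon-exon junction.
    if any(op == "N" for _, op in ops):
        return "junction"

    # A soft clip at the 3' end that consists only of A's suggests a poly(A) tail.
    if ops and ops[-1][1] == "S":
        clip_len = ops[-1][0]
        tail = seq[-clip_len:]
        if clip_len >= min_tail and set(tail) == {"A"}:
            return "polyA_end"

    # Everything else is treated as a read lying within an exon.
    return "exonic"
```

For example, `classify_read("40M1000N35M", read_seq)` reports a junction read regardless of the sequence, because the `N` operation alone is decisive in this sketch.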
⚫ Library construction
Figure 2. A typical library construction pipeline of RNA-seq.
Following sample collection, total RNA is usually isolated by organic extraction and/or
silica-membrane spin columns. The total RNA sample is subsequently processed either by direct
selection of poly(A) RNA or by selective removal of rRNA, because the abundant rRNA is usually not the
research focus and greatly reduces the coverage of useful transcripts. Oligo(dT)-based mRNA
purification is widely used in eukaryotes. However, RNA transcripts that lack poly(A)
tails are missed. Compared with poly(A) RNA selection, the ribo-depletion approach is preferred because
it enriches all nonribosomal RNA species, including tRNA, ncRNAs, non-poly(A) mRNA, and
preprocessed RNA. The two most popular rRNA depletion methods are: (i) hybridization of rRNA with
biotin-labeled anti-rRNA probes, followed by removal with streptavidin-coated magnetic beads; and (ii)
selective degradation of rRNA by a 5'-3' exonuclease that specifically recognizes rRNA by its 5'
phosphate.
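The effect of abundant rRNA on useful coverage is simple arithmetic, sketched below. The read count and the rRNA fractions are hypothetical values chosen only to show the calculation; they are not measurements from the text, and `informative_reads` is an illustrative name.

```python
def informative_reads(total_reads: int, rrna_fraction: float) -> int:
    """Number of reads left for non-ribosomal transcripts.

    rrna_fraction is the share of sequenced reads that derive from rRNA (0 to 1).
    """
    if not 0.0 <= rrna_fraction <= 1.0:
        raise ValueError("rrna_fraction must be between 0 and 1")
    return round(total_reads * (1.0 - rrna_fraction))


# Hypothetical numbers, for illustration only.
total = 20_000_000
for label, fraction in [("without enrichment", 0.85), ("after rRNA depletion", 0.05)]:
    print(f"{label}: {informative_reads(total, fraction):,} informative reads")
```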
Fragmentation is subsequently carried out to reach the length required by the chosen NGS technology.
Some small RNAs, such as microRNAs, piwi-interacting RNAs, and short interfering RNAs, can be
sequenced directly without fragmentation. Larger RNA molecules need to be fragmented into smaller
pieces (200-500 bp) before deep sequencing, either by cDNA fragmentation (DNase I treatment
or sonication) or by RNA fragmentation (RNA hydrolysis or nebulization). However, each of these methods
creates a different bias in the outcome. For example, cDNA fragmentation is usually strongly biased
towards sequences from the 3' ends of transcripts, while RNA fragmentation shows
little bias over the transcript body but is depleted for transcript ends. Therefore, cDNA fragmentation
provides valuable information about the precise identity of these ends, and RNA fragmentation provides
more precise information about the transcript body.
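One way to see which of these biases a library shows is to tabulate where coverage falls along a transcript. The sketch below assumes fragments are already expressed as (start, end) pairs in transcript coordinates; the function name `coverage_by_thirds` and the toy fragments are hypothetical and serve only to illustrate the idea.

```python
def coverage_by_thirds(fragments, transcript_len):
    """Share of covered bases in the 5' third, the body (middle third) and the 3' third.

    fragments: iterable of (start, end) pairs, 0-based and end-exclusive,
               in transcript coordinates running 5' to 3'.
    """
    if transcript_len < 3:
        raise ValueError("transcript_len must be at least 3")

    # Per-base depth along the transcript.
    depth = [0] * transcript_len
    for start, end in fragments:
        for pos in range(max(start, 0), min(end, transcript_len)):
            depth[pos] += 1

    total = sum(depth)
    if total == 0:
        return (0.0, 0.0, 0.0)

    first_cut = transcript_len // 3
    second_cut = 2 * transcript_len // 3
    return (
        sum(depth[:first_cut]) / total,
        sum(depth[first_cut:second_cut]) / total,
        sum(depth[second_cut:]) / total,
    )


# Toy fragments piled up near the 3' end of a 900-base transcript (made-up data).
five_prime, body, three_prime = coverage_by_thirds(
    [(600, 900), (650, 900), (300, 600)], 900
)
print(f"5' third: {five_prime:.2f}, body: {body:.2f}, 3' third: {three_prime:.2f}")
```

A library with a strong 3' bias would put most of its coverage in the last value, whereas a library depleted for transcript ends would concentrate it in the middle value.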
In the classic NGS protocols, adapters are ligated onto sheared double-stranded DNA fragments.
However, a major drawback of this approach is the loss of information on the direction of transcription.
Pretreating the RNA samples with sodium bisulfite converts cytidine into uridine. The resulting
widespread C-to-T transitions thereby mark the coding strand of each transcript. Other methods that
maintain strand specificity have also been proposed, such as direct ligation of RNA adaptors to the
RNA sample before reverse transcription.
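The logic of the bisulfite approach can be sketched in a few lines. This is a simplified model, assuming complete conversion, ungapped alignments, and sequences written in DNA letters; the function names are hypothetical. A read from a forward-strand transcript shows C-to-T mismatches against the forward reference, while a read from a reverse-strand transcript shows the same conversion as G-to-A mismatches.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def bisulfite_convert(transcript: str) -> str:
    """Model complete conversion: every C in the transcript is read as T."""
    return transcript.replace("C", "T")


def infer_strand(reference: str, read: str) -> str:
    """Guess the transcribed strand from mismatches against the forward reference.

    reference and read must be aligned, ungapped and of equal length.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must have the same length")
    c_to_t = sum(1 for r, q in zip(reference, read) if r == "C" and q == "T")
    g_to_a = sum(1 for r, q in zip(reference, read) if r == "G" and q == "A")
    if c_to_t > g_to_a:
        return "+"
    if g_to_a > c_to_t:
        return "-"
    return "undetermined"


reference = "ACGTCCGATGCA"  # made-up forward-strand sequence

# Transcript copied from the forward strand.
forward_read = bisulfite_convert(reference)

# Transcript copied from the reverse strand, then shown on the forward strand.
reverse_read = reverse_complement(bisulfite_convert(reverse_complement(reference)))

print(infer_strand(reference, forward_read), infer_strand(reference, reverse_read))
```

Real data would also contain sequencing errors, SNPs, and incompletely converted cytidines, so a practical caller would need thresholds rather than a simple majority.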
⚫ Sequencing
RNA-seq is currently dominated by three platforms: Illumina (Genome Analyzer and
HiSeq), Ion Torrent PGM, and the Roche 454 Life Sciences systems. Read lengths range from 200-600 bp
for Illumina, 400 bp for Ion Torrent PGM, and 400-700 bp for the 454 pyrosequencing system. 454-based
RNA-seq is particularly attractive for non-model organisms without reference genomes or
transcriptomes. Longer reads or paired-end short reads can reveal connectivity between multiple exons.
RNA-seq is a powerful method for studying complex transcriptomes and revealing sequence variations in
the transcribed regions.
⚫ Bioinformatics
Figure 3. A typical analysis pipeline of RNA-seq data.
Quality assessment is the first step in the bioinformatics analysis of RNA-seq data. It ensures a
coherent final result by removing low-quality sequences, over-represented sequences, and adapter
sequences. Once all reads have been filtered and mapped or assembled, gene expression levels can
be inferred, leading to a genome-scale transcriptome map in terms of both quality and quantity.
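The filtering step can be illustrated with a small sketch. It assumes Phred-scaled quality scores given as integers and an exact-match adapter search; the function name `trim_read`, the thresholds, and the example adapter string are illustrative choices. Dedicated trimming tools additionally handle partial and mismatched adapters.

```python
from typing import List, Optional, Tuple


def trim_read(
    seq: str,
    quals: List[int],
    adapter: str,
    min_q: int = 20,
    min_len: int = 25,
) -> Optional[Tuple[str, List[int]]]:
    """Clip the adapter, trim low-quality 3' bases, and drop reads that become too short.

    Returns the trimmed (sequence, qualities) pair, or None if the read is discarded.
    """
    if len(seq) != len(quals):
        raise ValueError("seq and quals must have the same length")

    # 1. Remove the adapter and everything after it (exact match only).
    idx = seq.find(adapter)
    if idx != -1:
        seq, quals = seq[:idx], quals[:idx]

    # 2. Trim bases from the 3' end while their quality is below min_q.
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    seq, quals = seq[:end], quals[:end]

    # 3. Discard reads that are too short to map reliably.
    if len(seq) < min_len:
        return None
    return seq, quals


adapter = "AGATCGGAAGAGC"  # example adapter prefix; substitute the one used in the library
read = "ACGT" * 10 + adapter + "TTTT"
print(trim_read(read, [40] * len(read), adapter))
```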
RNA-seq also allows detection of differential expression (DE) across treatments or conditions.
Normalization must be conducted to adjust for differences between samples, such as library size, and
for gene-specific features. Furthermore, RNA-seq enables the identification of SNPs, fusion genes, and
post-transcriptional gene regulation, such as RNA editing, degradation, and translation.
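Two common normalizations can be sketched as follows, assuming that the gene-specific feature being corrected is gene length. Counts per million (CPM) adjusts only for library size, while transcripts per million (TPM) first divides each count by gene length in kilobases and then rescales. The gene names, counts, and lengths are made-up illustrative values.

```python
def cpm(counts):
    """Counts per million: scale raw counts by library size."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: count / total * 1e6 for gene, count in counts.items()}


def tpm(counts, lengths_bp):
    """Transcripts per million: correct for gene length, then for library size."""
    # Reads per kilobase of gene length.
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    scale = sum(rates.values())
    if scale == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: rate / scale * 1e6 for gene, rate in rates.items()}


# Hypothetical sample: geneB has three times the reads but is three times as long.
counts = {"geneA": 100, "geneB": 300}
lengths_bp = {"geneA": 1000, "geneB": 3000}

print(cpm(counts))
print(tpm(counts, lengths_bp))
```

In this toy sample the two genes differ threefold in CPM but come out equal in TPM, which is the effect of the length correction. Differential expression tools typically apply their own between-sample normalization on top of raw counts, so this sketch is meant only to show the arithmetic.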
References:
1. Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nature reviews genetics,
2009, 10(1): 57-63.
2. Qian X, Ba Y, Zhuang Q, et al. RNA-Seq technology and its application in fish transcriptomics. Omics: a journal
of integrative biology, 2014, 18(2): 98-110.
3. Marguerat S, Bähler J. RNA-seq: from technology to biology. Cellular and molecular life sciences, 2010, 67(4):
569-579.
4. Wilhelm B T, Landry J R. RNA-Seq—quantitative measurement of expression through massively parallel
RNA-sequencing. Methods, 2009, 48(3): 249-257.
5. McGettigan P A. Transcriptomics in the RNA-seq era. Current opinion in chemical biology, 2013, 17(1): 4-11.
