Prat Thiru
Outline
Intro to RNA-Seq
  Biological Questions
  Comparison with Other Methods
  RNA-Seq Protocol
RNA-Seq Applications
  Annotation
  Quantification
  Other Applications
Expression Profiling Steps and Software
Running TopHat and Cufflinks (Commands)
Goals of Sequencing the Transcriptome
Annotation
  Identify genes, exons, splicing events, ncRNAs, etc.
  Novel genes or transcripts
Quantification
  Abundance of transcripts between different conditions
Transcriptome: RNA World
http://finchtalk.geospiza.com/2009/05/smallrnasgetsmaller.html
Transcriptome: Complexity
http://www.ncbi.nlm.nih.gov/books/NBK21128/
Comparison of Methods for Studying the Transcriptome
Technology | Tiling microarray | cDNA or EST sequencing | RNA-Seq
Technology specifications:
Principle | Hybridization | Sanger sequencing | High-throughput sequencing
Resolution | From several to 100 bp | Single base | Single base
Throughput | High | Low | High
Reliance on genomic sequence | Yes | No | In some cases
Background noise | High | Low | Low
Application:
Simultaneously map transcribed regions and gene expression | Yes | Limited for gene expression | Yes
Dynamic range to quantify gene expression level | Up to a few-hundredfold | Not practical | >8,000-fold
Ability to distinguish different isoforms | Limited | Yes | Yes
Ability to distinguish allelic expression | Limited | Yes | Yes
Practical issues:
Required amount of RNA | High | High | Low
Cost for mapping transcriptomes of large genomes | High | High | Relatively low
Wang, Z. et al. RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics (2009)
RNA-Seq Experiment
Wang, Z. et al. RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics (2009)
RNA-Seq Applications, Annotation:
Alternative Splicing Events
Ozsolak, F. and Milos, P. RNA sequencing: advances, challenges and opportunities. Nature Reviews Genetics (2011)
RNA-Seq Applications, Annotation:
Identify Known and Novel Transcripts
Known exons/gene
Guttman, M. et al. Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nature Biotechnology (2010)
Mapped reads: novel exon or gene?
Unmapped reads: novel splice junctions?
Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology (2010)
Assembly and Mapping RNA-Seq
Options:
  Align and then assemble
  Assemble and then align
Align to:
  genome
  transcriptome
Haas, B.J. and Zody, M.C. Advancing RNA-Seq analysis. Nature Biotechnology (2010)
RNA-Seq Applications, Quantification:
Expression Profiling
Mortazavi, A. et al. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods (2008)
Need for Normalization
More reads are mapped to a transcript if it is
  i) long
  ii) sequenced at a higher depth of coverage
Normalize such that
  i) features of different lengths
  ii) total sequence from different conditions
can be compared
Quantifying Expression: RPKM
RPKM: Reads Per Kilobase per Million mapped reads
RPKM = C / (L * N)
C: Number of mappable reads on a feature (e.g. transcript, exon, etc.)
L: Length of feature (in kb)
N: Total number of mappable reads (in millions)
Mortazavi, A. et al. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods (2008)
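RPKM Example
Gene A: 600 bases; Gene B: 1100 bases; Gene C: 1400 bases
Sample 1 (N = 6M)
  Gene A: C = 12, RPKM = 12/(0.6*6) = 3.33
  Gene B: C = 24, RPKM = 24/(1.1*6) = 3.64
  Gene C: C = 11, RPKM = 11/(1.4*6) = 1.31
Sample 2 (N = 8M)
  Gene A: C = 19, RPKM = 19/(0.6*8) = 3.96
  Gene B: C = 28, RPKM = 28/(1.1*8) = 3.18
  Gene C: C = 16, RPKM = 16/(1.4*8) = 1.43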
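The RPKM arithmetic can be sketched in a few lines of Python. This is a minimal illustration of RPKM = C / (L * N) using the gene lengths, read counts and totals from the example; the function and variable names are illustrative and do not come from any RNA-Seq tool.

```python
def rpkm(read_count, length_bases, total_mapped_reads):
    """RPKM = C / (L * N), with L in kilobases and N in millions of reads."""
    length_kb = length_bases / 1_000
    total_millions = total_mapped_reads / 1_000_000
    return read_count / (length_kb * total_millions)


# Gene lengths (bases) and per-sample read counts taken from the example.
gene_lengths = {"A": 600, "B": 1100, "C": 1400}
samples = {
    "Sample 1": {"total": 6_000_000, "counts": {"A": 12, "B": 24, "C": 11}},
    "Sample 2": {"total": 8_000_000, "counts": {"A": 19, "B": 28, "C": 16}},
}


def rpkm_table(samples, gene_lengths):
    """Return {sample: {gene: RPKM rounded to two decimals}}."""
    return {
        name: {
            gene: round(rpkm(count, gene_lengths[gene], sample["total"]), 2)
            for gene, count in sample["counts"].items()
        }
        for name, sample in samples.items()
    }


results = rpkm_table(samples, gene_lengths)
for name, values in results.items():
    print(name, values)
```

For Sample 2, Gene B evaluates to 28 / (1.1 * 8) = 3.18.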
Quantifying Expression: FPKM
FPKM: Fragments Per Kilobase of transcript per Million fragments mapped
Analogous to RPKM, but counts fragments rather than individual reads.
The relative abundances of transcripts are described in terms of the expected biological objects (fragments) observed from an RNA-Seq experiment, which in the future may not be represented by a single read.
Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology (2010)
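The difference between counting reads and counting fragments can be sketched as follows. This is only an illustration of the unit: Cufflinks estimates abundances with a statistical model rather than a simple tally, and the read records, fragment names and helper functions below are hypothetical.

```python
def count_fragments(aligned_reads):
    """Count distinct fragments; both mates of a pair share one fragment name."""
    return len({fragment_name for fragment_name, _mate in aligned_reads})


def fpkm(fragment_count, length_bases, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    length_kb = length_bases / 1_000
    total_millions = total_mapped_fragments / 1_000_000
    return fragment_count / (length_kb * total_millions)


# Hypothetical alignments on one transcript: (fragment name, mate number).
# frag1 and frag3 are paired-end fragments with both mates aligned;
# frag2 has only one mate aligned.
reads_on_transcript = [
    ("frag1", 1), ("frag1", 2),
    ("frag2", 1),
    ("frag3", 1), ("frag3", 2),
]

n_reads = len(reads_on_transcript)
n_fragments = count_fragments(reads_on_transcript)
print(f"{n_reads} reads represent {n_fragments} fragments")

# Hypothetical transcript length (1500 bases) and library size (2M fragments).
print(fpkm(n_fragments, 1500, 2_000_000))
```

Counting by fragment keeps a paired-end fragment from contributing twice to the abundance of its transcript.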
Quantifying Expression: Normalization Methods
Total count (e.g. RPKM)
Upper Quartile (e.g. 75th percentile): Similar to total count, but uses the per-lane upper quartile of counts for genes with reads in at least one lane.
Quantile: For each lane, the distribution of read counts is matched to a reference distribution defined in terms of median counts.
Bullard, J. et al. Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics (2010)
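The first two normalization methods can be sketched as per-lane scaling factors. This is a minimal sketch under stated assumptions: the count matrix is made up, the helper names are illustrative, and the percentile interpolation (the "inclusive" method of Python's statistics module) is a choice made here, not something the slide specifies. Quantile normalization is not shown.

```python
import statistics


def genes_with_reads(counts_by_lane):
    """Indices of genes that have at least one read in at least one lane."""
    n_genes = len(next(iter(counts_by_lane.values())))
    return [
        g for g in range(n_genes)
        if any(counts[g] > 0 for counts in counts_by_lane.values())
    ]


def total_count_factors(counts_by_lane):
    """Total count: scale each lane by its total number of reads."""
    return {lane: sum(counts) for lane, counts in counts_by_lane.items()}


def upper_quartile_factors(counts_by_lane):
    """Upper quartile: scale each lane by the 75th percentile of its counts,
    computed over genes with reads in at least one lane."""
    keep = genes_with_reads(counts_by_lane)
    factors = {}
    for lane, counts in counts_by_lane.items():
        kept = [counts[g] for g in keep]
        factors[lane] = statistics.quantiles(kept, n=4, method="inclusive")[2]
    return factors


def scale(counts_by_lane, factors):
    """Divide every count in a lane by that lane's scaling factor."""
    return {
        lane: [c / factors[lane] for c in counts]
        for lane, counts in counts_by_lane.items()
    }


# Hypothetical read counts for six genes in two lanes.
counts_by_lane = {
    "lane1": [10, 0, 20, 30, 40, 0],
    "lane2": [20, 0, 40, 60, 80, 4],
}

total_factors = total_count_factors(counts_by_lane)
uq_factors = upper_quartile_factors(counts_by_lane)
uq_scaled = scale(counts_by_lane, uq_factors)
print(total_factors)
print(uq_factors)
```

In this made-up data the second gene has no reads in either lane, so it is excluded before the upper quartile is taken.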
RNA-Seq Applications:
Gene Fusion
Ozsolak, F. and Milos, P. RNA sequencing: advances, challenges and opportunities. Nature Reviews Genetics (2011)
Expression Profiling Workflow
QC: Filter Short Reads
  FASTX Toolkit
  FastQC
  R: ShortRead
Align and Assemble, or Assemble and Align (see Hot Topics on Mapping NGS Reads)
  Align with TopHat, assemble with Cufflinks
Computational Analysis: Quantify Expression, or other applications
  Cuffcompare, Cuffdiff
  SAMtools, BEDtools
  R: edgeR, DESeq
Visualize Data
  IGV (see Hot Topics on IGV)
  UCSC Genome Browser
The Tuxedo Tools
http://mged12deepsequencinganalysis.wikispaces.com/file/view/Cole_MGED_tutorial_slides.pdf
TopHat Algorithm
Trapnell, C. et al. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics (2009)
Cufflinks Algorithm
Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology (2010)
Running TopHat: Align Reads
TopHat Manual: http://tophat.cbcb.umd.edu/manual.html
Running TopHat on Tak
Usage:
tophat [options] <bowtie_index> <reads1[,reads2,...,readsN]> [reads1[,reads2,...,readsN]]
e.g.
bsub tophat -p 2 --solexa1.3-quals --max-multihits 5 -o s_1_TopHat_Out /nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9 s_1_sequence.txt
Options (see the manual for all available options):
-o/--output-dir: Sets the name of the directory in which TopHat will write all of its output.
--solexa-quals: Use the Solexa scale for quality values in FASTQ files.
--solexa1.3-quals: As of the Illumina GA pipeline version 1.3, quality scores are encoded in Phred-scaled base-64. Use this option for FASTQ files from pipeline 1.3 or later.
-p/--num-threads: Use this many threads to align reads. The default is 1.
-g/--max-multihits: Instructs TopHat to allow up to this many alignments to the reference for a given read, and suppresses all alignments for reads with more than this many alignments. The default is 40.
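When several lanes need the same settings, the example command can be assembled programmatically. The sketch below only builds the argument list from the options described on this slide and does not run anything; the helper name build_tophat_command and its parameters are hypothetical, and submitting through bsub is assumed to work as in the example.

```python
import shlex


def build_tophat_command(bowtie_index, reads, output_dir, threads=1,
                         max_multihits=None, solexa13_quals=False):
    """Build a TopHat argument list (hypothetical helper, not part of TopHat)."""
    cmd = ["tophat", "-p", str(threads)]
    if solexa13_quals:
        # FASTQ files from Illumina GA pipeline 1.3 or later.
        cmd.append("--solexa1.3-quals")
    if max_multihits is not None:
        cmd += ["--max-multihits", str(max_multihits)]
    # Output directory, then the Bowtie index and a comma-separated read list.
    cmd += ["-o", output_dir, bowtie_index, ",".join(reads)]
    return cmd


tophat_cmd = build_tophat_command(
    bowtie_index="/nfs/genomes/mouse_gp_jul_07_no_random/bowtie/mm9",
    reads=["s_1_sequence.txt"],
    output_dir="s_1_TopHat_Out",
    threads=2,
    max_multihits=5,
    solexa13_quals=True,
)

# Prefix with bsub to submit to the cluster queue, as in the example.
print(shlex.join(["bsub"] + tophat_cmd))
```

Keeping the command as a list makes it straightforward to pass to subprocess.run once the paths have been checked.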
TopHat Output
Output of TopHat is a BAM file, the binary version of a Sequence Alignment/Map (SAM) file.
Use the Integrative Genomics Viewer (IGV) to view the BAM file, or use SAMtools to analyze it.
e.g. SAM file
WICMTSOLEXA:1:20:670:1533# 137 chr1 3240920 3 30M * 0 0 CTGGATCTGGACCTGGACCTGGATCTATAT ::::::::::::::::::::::::::::: NM:i:1 NH:i:2 CC:Z:chr6 CP:i:83893005
WICMTSOLEXA:1:69:135:1285# 89 chr1 3269437 1 30M * 0 0 TGCCTAAACTTATTAAGGCAGGCCATGGGC :((/+:::(+:+':/:+++&+//':++::: NM:i:2 NH:i:4 CC:Z:chr7 CP:i:20934843
WICMTSOLEXA:1:84:584:747# 153 chr1 3270083 0 30M * 0 0 AGCAAGTTTTTTNTTAGCCCTAGATTCCAG ::::::::::::%::::::::::::::::: NM:i:1 NH:i:5 CC:Z:= CP:i:136301734
WICMTSOLEXA:1:75:1357:1675# 163 chr1 3522128 255 30M = 3522287 0 GTGGCTTTGTGGTCTTCACCAACCTTTCTC :::::::::::::::::::::::::::::: NM:i:1 NH:i:1
WICMTSOLEXA:1:75:1357:1675# 83 chr1 3522287 255 30M = 3522128 0 CTGTAGGTGTAATCCTAAATTCTTATTACG :::::::::::::::::::::::::::::: NM:i:0 NH:i:1
WICMTSOLEXA:1:8:59:283# 153 chr1 3522536 3 30M * 0 0 TTTCTGCTTTGATTATGGTACTGATGTCTG :::::::::::4:::::::::::::::::: NM:i:2 NH:i:2 CC:Z:chr5 CP:i:134317691
WICMTSOLEXA:1:12:1161:945# 89 chr1 3523371 1 30M * 0 0 TCTACATAGCCCAAACTGGCTTTGGACTCT :::::::::::::::::::::::::::::: NM:i:0 NH:i:3 CC:Z:chr10 CP:i:117172515
WICMTSOLEXA:1:45:1469:1826# 73 chr1 3620888 3 30M * 0 0 CAAGTATTTAATGTTTTCATTAAATTGTTT ::::::::::::::::::::::::::4::: NM:i:0 NH:i:2 CC:Z:chr11 CP:i:22903295
WICMTSOLEXA:1:14:536:150# 73 chr1 3620943 3 30M * 0 0 CTGGAAGACAATGTCCAAAAACTCTGAATC :::::::::::::::::::::::::%::&: NM:i:1 NH:i:2 CC:Z:chr11 CP:i:22903240
WICMTSOLEXA:1:66:646:1188# 137 chr1 3662923 0 30M * 0 0 AAAAAAAAAACACCACCCCCAACAAAAAAA +00++0+0+''0++++:00::.&:::,:,: NM:i:2 NH:i:5 CC:Z:chr10 CP:i:94881279
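A SAM record is a tab-delimited line with eleven mandatory columns followed by optional TAG:TYPE:VALUE fields. The sketch below shows one way to read such a line in plain Python; it is a minimal illustration, not a replacement for SAMtools, and it does not handle header lines. The read name and quality string in the sample record are placeholders; the other values follow the first alignment in the example.

```python
SAM_COLUMNS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
               "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INTEGER_COLUMNS = ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN")


def parse_sam_line(line):
    """Split one alignment line into its mandatory columns and optional tags."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_COLUMNS, fields[:11]))
    for key in INTEGER_COLUMNS:
        record[key] = int(record[key])
    tags = {}
    for item in fields[11:]:
        tag, type_code, value = item.split(":", 2)
        # Type "i" is an integer; other types are kept as strings here.
        tags[tag] = int(value) if type_code == "i" else value
    record["tags"] = tags
    return record


def is_multi_mapped(record):
    """NH holds the number of reported alignments for the read."""
    return record["tags"].get("NH", 1) > 1


def is_reverse_strand(record):
    """Bit 0x10 of FLAG is set when the read aligns to the reverse strand."""
    return bool(record["FLAG"] & 0x10)


example_line = "\t".join([
    "read_1",                          # placeholder read name
    "137", "chr1", "3240920", "3", "30M", "*", "0", "0",
    "CTGGATCTGGACCTGGACCTGGATCTATAT",
    ":" * 30,                          # placeholder quality string
    "NM:i:1", "NH:i:2", "CC:Z:chr6", "CP:i:83893005",
])

record = parse_sam_line(example_line)
print(record["RNAME"], record["POS"], record["tags"]["NH"], is_multi_mapped(record))
```

Filtering on NH is a simple way to separate uniquely mapped reads from reads that align to several locations.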
Cufflinks: Assemble and Quantify Reads
Cufflinks Manual: http://cufflinks.cbcb.umd.edu/manual.html
Running Cufflinks on Tak
Optional: Supply annotation in GTF format with the -G option
Usage:
cufflinks [options] <hits.bam>
e.g.
bsub cufflinks -p 2 -o s_1_Cufflinks_Out s_1_TopHat_Out/accepted_hits.bam
e.g. Cufflinks will assemble and quantify using the known transcripts in the supplied GTF file
bsub cufflinks -p 2 -G transcripts.gtf accepted_hits.bam
Cufflinks Output
Output of Cufflinks is a GTF file with assembled isoforms
e.g.
chr1 Cufflinks transcript 36321447 36330270 1000 - . gene_id "Neurl3"; transcript_id "NM_153408"; FPKM "3.7155221121"; frac "1.000000"; conf_lo "0.000000"; conf_hi "7.570660"; cov "0.649922";
chr1 Cufflinks exon 36321447 36323398 1000 - . gene_id "Neurl3"; transcript_id "NM_153408"; exon_number "1"; FPKM "3.7155221121"; frac "1.000000"; conf_lo "0.000000"; conf_hi "7.570660"; cov "0.649922";
chr1 Cufflinks exon 36325501 36325554 1000 - . gene_id "Neurl3"; transcript_id "NM_153408"; exon_number "2"; FPKM "3.7155221121"; frac "1.000000"; conf_lo "0.000000"; conf_hi "7.570660"; cov "0.649922";
chr1 Cufflinks exon 36326058 36326546 1000 - . gene_id "Neurl3"; transcript_id "NM_153408"; exon_number "3"; FPKM "3.7155221121"; frac "1.000000"; conf_lo "0.000000"; conf_hi "7.570660"; cov "0.649922";
chr1 Cufflinks exon 36330183 36330270 1000 - . gene_id "Neurl3"; transcript_id "NM_153408"; exon_number "4"; FPKM "3.7155221121"; frac "1.000000"; conf_lo "0.000000"; conf_hi "7.570660"; cov "0.649922";
chr1 Cufflinks transcript 36364578 36380874 4 + . gene_id "Arid5a"; transcript_id "NM_145996"; FPKM "0.0015751054"; frac "0.002360"; conf_lo "0.000000"; conf_hi "0.081996"; cov "0.000263";
chr1 Cufflinks exon 36364578 36364681 4 + . gene_id "Arid5a"; transcript_id "NM_145996"; exon_number "1"; FPKM "0.0015751054"; frac "0.002360"; conf_lo "0.000000"; conf_hi "0.081996"; cov "0.000263";
chr1 Cufflinks exon 36373054 36373172 4 + . gene_id "Arid5a"; transcript_id "NM_145996"; exon_number "2"; FPKM "0.0015751054"; frac "0.002360"; conf_lo "0.000000"; conf_hi "0.081996"; cov "0.000263";
chr1 Cufflinks exon 36374929 36375026 4 + . gene_id "Arid5a"; transcript_id "NM_145996"; exon_number "3"; FPKM "0.0015751054"; frac "0.002360"; conf_lo "0.000000"; conf_hi "0.081996"; cov "0.000263";
chr1 Cufflinks exon 36375333 36375498 4 + . gene_id "Arid5a"; transcript_id "NM_145996"; exon_number "4"; FPKM "0.0015751054"; frac "0.002360"; conf_lo "0.000000"; conf_hi "0.081996"; cov "0.000263";
chr1 Cufflinks exon 36375837 36380874 4 + . gene_id "Arid5a"; transcript_id "NM_145996"; exon_number "5"; FPKM "0.0015751054"; frac "0.002360"; conf_lo "0.000000"; conf_hi "0.081996"; cov "0.000263";
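Per-transcript FPKM values can be pulled out of the GTF output with a short script. The sketch below assumes the file is tab-delimited with nine columns and that attributes follow the key "value"; pattern seen in the example; the function name and the sample line assembled here are for illustration only.

```python
import re

# Matches attribute pairs such as: gene_id "Arid5a";
ATTRIBUTE_PATTERN = re.compile(r'(\w+)\s*"([^"]*)"')


def parse_gtf_line(line):
    """Parse one GTF line into its fixed columns plus an attribute dict."""
    (seqname, source, feature, start, end,
     score, strand, frame, attributes) = line.rstrip("\n").split("\t", 8)
    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "frame": frame,
        "attributes": dict(ATTRIBUTE_PATTERN.findall(attributes)),
    }


# Transcript line for Arid5a, rebuilt with tab separators.
example_line = "\t".join([
    "chr1", "Cufflinks", "transcript", "36364578", "36380874", "4", "+", ".",
    'gene_id "Arid5a"; transcript_id "NM_145996"; FPKM "0.0015751054"; '
    'frac "0.002360"; conf_lo "0.000000"; conf_hi "0.081996"; cov "0.000263";',
])

entry = parse_gtf_line(example_line)
if entry["feature"] == "transcript":
    attrs = entry["attributes"]
    print(attrs["gene_id"], attrs["transcript_id"], float(attrs["FPKM"]))
```

Restricting to lines whose feature is transcript avoids counting the same FPKM once per exon.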
Local Resources
For a description of available files, see
/nfs/genomes/BaRC_Genomes_README.txt
Bowtie index
/nfs/genomes/<species>/bowtie
e.g.
/nfs/genomes/mouse_gp_jul_07_no_random/bowtie
GTF files
/nfs/genomes/<species>/gtf
e.g.
/nfs/genomes/mouse_gp_jul_07/gtf
Further Reading
RNA-Seq
Mortazavi, A. et al. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods 5(7):621-628 (2008)
Wang, Z. et al. RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics 10:57-63 (2009)
Ozsolak, F. and Milos, P.M. RNA sequencing: advances, challenges and opportunities. Nature Reviews Genetics 12:87-98 (2011)
TopHat
Trapnell, C. et al. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25(9):1105-1111 (2009)
Cufflinks
Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology 28(5):511-515 (2010)
Online Community
Forum and Discussion
http://seqanswers.com/