
Comparative Genomics Tutorial

Bacterial Comparative Genomics Tutorial




Accompanying the paper:

Beginners guide to comparative bacterial genome analysis using next-
generation sequence data
David J. Edwards, Kathryn E. Holt

BMC Microbial Informatics, 2013

Last updated March 2013

1. Genome assembly and annotation
1.1 Downloading E. coli sequences for assembly
1.2 Examining quality of reads (FastQC)
1.3 Velvet - assembling reads into contigs
1.3.1 Using VelvetOptimiser to optimise de novo assembly with Velvet
1.4 Ordering contigs against a reference using Mauve
1.4.1 Viewing the ordered contigs (Mauve)
1.4.2 Viewing the ordered contigs (ACT)
1.5 Mauve Assembly Metrics - Statistical View of the Contigs
1.6 Annotation with RAST
1.6.1 Alternatives to RAST
2. Comparative genome analysis
2.1 Downloading E. coli genome sequences for comparative analysis
2.2 Mauve - for multiple genome alignment
2.3 ACT - for detailed pairwise genome comparisons
2.3.1 Generating comparison files for ACT
2.3.2 Viewing genome comparisons in ACT
2.4 BRIG - Visualizing reference-based comparisons of multiple sequences
3. Typing and specialist tools
3.1 PHAST - for identification of phage sequences
3.2 ResFinder - for identification of resistance gene sequences
3.3 Multilocus sequence typing
3.4 PATRIC - online genome comparison tool

1. Genome assembly and annotation

1.1 Downloading E. coli sequences for assembly
In this part of the tutorial, we will create a draft-quality E. coli O104:H4 genome
assembly to use in the comparative genome analysis. To start with, we need
sequences to assemble. For the worked example we are using Illumina HiSeq
paired-end reads from E. coli O104:H4 strain TY-2482 (ENA accession
SRR292770), available here:
http://www.ebi.ac.uk/ena/data/view/SRR292770&display=html. Locate the
'Fastq files (ftp)' column and right-click on each of the two file links, choosing
'Save Link as...' to save them to your computer. These are in fastq format (see
http://en.wikipedia.org/wiki/FASTQ_format) and compressed using gzip (you
do not need to uncompress them).

Remember to download both forward and reverse reads (named
'SRR292770_1.fastq.gz' and 'SRR292770_2.fastq.gz'). Save these to a new folder
(directory) with a suitable name, e.g. 'comparison_tut'. This will be our working
directory for the tutorial.
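
Before moving on, it can be worth confirming that both downloads are intact and hold the same number of reads, since a truncated download would leave the forward and reverse files out of step. The following is a minimal sketch, assuming standard four-line fastq records and that gzip and awk are installed; the function name count_fastq_reads is our own and not part of any tool used in this tutorial.

```shell
# Count the reads in a gzip-compressed fastq file.
# Assumes every record occupies exactly four lines (the usual Illumina layout).
count_fastq_reads() {
    gzip -dc "$1" | awk 'END { printf "%d\n", NR / 4 }'
}
```

Running count_fastq_reads on 'SRR292770_1.fastq.gz' and then on 'SRR292770_2.fastq.gz' should print the same number twice. If gzip reports an error, or the counts differ, download the affected file again.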

1.2 Examining quality of reads (FastQC)

Prior to attempting to assemble a set of reads, it is good practice to examine the
reads to see if they are of good quality. A simple package to install and run to
examine reads is FastQC.

Website: Download and install FastQC from
http://www.bioinformatics.babraham.ac.uk/projects/fastqc.
The website also features examples of good and poor quality read sets for a
number of sequencing platforms.

Compatibility: Java based, available for Windows, Linux and Mac OS X.
This tutorial was created using FastQC 0.10.1 on Mac OS X. Some versions of Java
have been disabled on Mac OS X, and if you do not have a version of Java
higher than version 7u11 installed, you may need to follow the suggestions in the
FAQ for Java (http://www.java.com/en/download/faq/java_mac.xml).

Inputs: forward and reverse read sequence files (fastq format)

Instructions

Once FastQC has been installed, open the program to begin. Then:

1. To select the sequence file to check, use 'File > Open' in the FastQC
menu. Navigate to the folder where you put the TY-2482 reads and
select the 'SRR292770_1.fastq.gz' file. Make sure 'File Format' is set to
'Sequence Files', then hit the 'Open' button. FastQC will commence the
analysis.



2. When the analysis has finished, you will be presented with a series of
reports on the sequences. Select 'Per base sequence quality' to see a
graph of quality at each base position. It should look like this:



You can also examine the other reports.
Note that this sequence set passes most of the tests, though the sequence
duplication level is a little high (around 26%). The assembly could be improved
by first removing duplicates using a fastq quality control package
such as the command-line tools FASTX-Toolkit
(http://hannonlab.cshl.edu/fastx_toolkit) or Trimmomatic
(http://www.usadellab.org/cms/index.php?page=trimmomatic). However, as
the reads for the tutorial are of otherwise good quality, we shall leave the
important topic of quality control, and its pitfalls, for others to describe. The
websites for the two packages are a good place to start, along with the
supporting information for FastQC.
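
For a rough, independent feel for how much exact duplication a read file contains, the sketch below reports the percentage of reads whose sequence is an exact copy of another read in the same file. This is our own simplification and not FastQC's method, so the figure need not match the FastQC report; the function name duplicate_percent is hypothetical, and sorting a full HiSeq file this way can take a while.

```shell
# Percentage of reads that are exact duplicates of another read's sequence.
# Assumes four-line fastq records; line 2 of each record is the sequence.
duplicate_percent() {
    gzip -dc "$1" |
        awk 'NR % 4 == 2' |
        sort |
        uniq -c |
        awk '{ total += $1; distinct += 1 }
             END { if (total > 0) printf "%.1f\n", 100 * (total - distinct) / total }'
}
```

For example, a file of four reads in which three share the same sequence contains two distinct sequences, so the function reports 50.0.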

You can now close FastQC and continue with the rest of the tutorial. If you wish
to save the report beforehand, use 'File > Save report...' before closing.

1.3 Velvet - assembling reads into contigs
Website: Download and install Velvet and its manual (over 20 MB) from
http://www.ebi.ac.uk/~zerbino/velvet

Compatibility: Can be compiled for Windows, Mac OS X and Linux, though a
64-bit environment and a minimum of 4 GB of RAM are recommended.
This tutorial was created using Velvet 1.2.08 on Mac OS X.

Reference: Zerbino, D. R. and Birney, E., "Velvet: algorithms for de novo short
read assembly using de Bruijn graphs". Genome Res, 2008. gr.074492.107 [pii]
10.1101/gr.074492.107.

Instructional Reference: Zerbino, D. R., "Using the Velvet de novo assembler for
short-read sequencing technologies". Current Protocols in Bioinformatics / editorial
board, Andreas D. Baxevanis ... [et al.], 2010. 10.1002/0471250953.bi1105s31.

Inputs: forward and reverse read sequence files (fastq format)

Instructions
The de novo assembly program Velvet we used was installed with
'MAXKMERLENGTH' set at 101 bp (make MAXKMERLENGTH=101); see the
Velvet manual for more details. Note that a maximum k-mer of 41 will be
sufficient for this exercise, but longer k-mers are required when working with
longer HiSeq and MiSeq generated reads (which are now typically >100 bp).
Note you will also need to add the Velvet directory to your path, or use the full
path to the 'velvetg' and 'velveth' executables in the commands below.
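
A quick way to confirm that the Velvet executables can be found before starting is sketched below. The helper name require_tools is our own; it only relies on the standard shell built-in 'command -v'.

```shell
# Report any required program that cannot be found on the PATH.
# Returns non-zero if at least one program is missing.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found on PATH: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Calling 'require_tools velveth velvetg' prints nothing when both are available. If either is reported missing, add the Velvet directory to the path for the current session with export PATH="$PATH:/path/to/velvet", where /path/to/velvet is a placeholder for wherever you compiled Velvet.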
1. Open a terminal session and change the directory to that containing the
SRR292770 reads files:

cd comparison_tut

2. First we need to run velveth. Enter:

velveth out_data_35 35 -fastq.gz -shortPaired -separate \
    SRR292770_1.fastq.gz SRR292770_2.fastq.gz

This will take ~1-2 minutes and will produce a hash table of the reads
using the specified k-mer length (k=35), saving it to the folder
'out_data_35'. The -shortPaired and -separate tags tell Velvet we are
supplying short, paired-end reads with separate files for forward and
reverse reads. See the manual for other input options.

3. The next Velvet step to run is velvetg, to build the graph. Enter:

velvetg out_data_35 -clean yes -exp_cov 21 -cov_cutoff 2.81 \
    -min_contig_lgth 200

This will take a few minutes. Running this command will output a number of
files to the same folder as velveth, including the file containing our newly
assembled contigs, which is named 'contigs.fa'. Minimum contig
length is set to 200 bp as this is the shortest length allowed for GenBank
submission of draft genomes. The coverage cut-offs specified here are
ones we have pre-determined to be optimal for assembly of this read set.
See below for info on using VelvetOptimiser to set cut-offs for different
read sets.

4. Copy the contigs file from the Velvet output folder and rename it:

cp out_data_35/contigs.fa SRR292770_unordered.fasta

You can then delete the output folder 'out_data_35', though you may want
to either save or look at the statistics file, 'stats.txt', before doing so.

Whilst we provide 'optimal' values for the three options of Velvet (k-mer = 35,
expected coverage = 21, coverage cutoff = 2.81), these can be changed to
examine how each affects the contigs produced. Note: if you are varying only
the latter two and keeping the k-mer constant, you can rerun just the
velvetg command with new values, provided you keep the Velvet output folder
between runs of velvetg.
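
When choosing your own values, keep in mind that Velvet expresses coverage as k-mer coverage rather than per-base coverage. The Velvet manual gives the relation Ck = C * (L - k + 1) / L, where C is the nucleotide coverage, L the read length and k the k-mer length. The sketch below simply evaluates that relation; the function name kmer_coverage is our own, and the numbers in the usage note are illustrative, not measurements from this read set.

```shell
# Convert nucleotide coverage C to k-mer coverage Ck for read length L and
# k-mer length k, using Ck = C * (L - k + 1) / L.
# Arguments: C L k
kmer_coverage() {
    awk -v c="$1" -v l="$2" -v k="$3" 'BEGIN {
        if (l <= 0 || k > l) {
            print "invalid read length or k-mer length" > "/dev/stderr"
            exit 1
        }
        printf "%.2f\n", c * (l - k + 1) / l
    }'
}
```

For instance, 'kmer_coverage 30 100 35' evaluates 30 * 66 / 100 for a hypothetical 30x data set of 100 bp reads assembled at k=35.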

1.3.1 Using VelvetOptimiser to optimise de novo assembly with Velvet
To get the 'optimal' values used here, we made use of the Perl script
VelvetOptimiser (we used version 2.2.5), available for download at
http://bioinformatics.net.au/software.velvetoptimiser.shtml. Here, we provide
instructions for running VelvetOptimiser to demonstrate how these values were
obtained, and for those interested in doing the same; we include it as a further
exercise in making use of Velvet. Those interested in exploring both even further
should begin with the instructional paper by Zerbino (2010). (Those not yet
comfortable with Unix, Perl and the command line may wish to skip the
following.)

In order to run VelvetOptimiser, you will also need to download and install both
Perl (version 5.8 or later, http://www.perl.org) and BioPerl (version 1.4 or
later, http://www.bioperl.org/wiki/Main_Page). You also need Velvet,
as above.

1. Open a terminal session and change to the directory containing the reads
files.

2. To run VelvetOptimiser, enter:

VelvetOptimiser.pl -s 33 -e 41 \
    -f '-fastq.gz -shortPaired -separate SRR292770_1.fastq.gz SRR292770_2.fastq.gz' \
    -o '-min_contig_lgth 200' -p SRR292770

With these settings, VelvetOptimiser will set up a series of velveth runs using
odd-numbered k-mers between 33 and 41. It then runs velvetg for each, taking the
one with the best N50 as the seed for the final optimisation of the coverage
cutoff, where the number of bases in contigs longer than 100 bp is used as the
optimising statistic. The output is the same as for a regular Velvet run, though
the output folder will have the prefix 'SRR292770' to keep it separate from the
earlier Velvet run described above. The logfile for the run
(SRR292770_logfile.txt) contains details of the run, including the commands
used to run velveth and velvetg.
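
To compare assemblies yourself, the N50 can be computed directly from any contigs file. The sketch below uses the common definition (the length of the contig at which the running total, taken from longest to shortest, first reaches half of all assembled bases); it is our own helper and not VelvetOptimiser's code, so small differences from a tool's reported value are possible if that tool filters contigs first.

```shell
# Print the N50 of a (multi-)fasta file.
n50() {
    awk '/^>/ { if (seen) print len; len = 0; seen = 1; next }
         { gsub(/[ \t\r]/, ""); len += length($0) }
         END { if (seen) print len }' "$1" |
        sort -rn |
        awk '{ lens[NR] = $1; total += $1 }
             END {
                 running = 0
                 for (i = 1; i <= NR; i++) {
                     running += lens[i]
                     if (running * 2 >= total) { print lens[i]; exit }
                 }
             }'
}
```

For example, 'n50 SRR292770_unordered.fasta' prints a single number of base pairs for the Velvet assembly made above.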

For those interested in assembling Ion Torrent sequence reads, we recommend
you try MIRA (version 3, http://www.chevreux.org/projects_mira.html). This
assembler is also useful for those interested in assembling reads from different
sequencing technologies into the one assembly; MIRA is probably the best for
this kind of assembly project. Once you have assembled the reads into contigs
using MIRA, the rest of the analysis can make use of the tools and methods
described here for Illumina-based reads.
1.4 Ordering contigs against a reference using Mauve
Once the sequence reads have been assembled into contigs, it is useful to order
them against a suitable reference genome. One simple way to accomplish this is
to use the 'Move Contigs' option available in Mauve (which is also used below for
genome comparisons).

Website: http://asap.ahabs.wisc.edu/mauve (Includes download links,
installation instructions and user guide)

Compatibility: Java based, available for Windows, Mac OS X, and Linux.
This tutorial was created using Mauve 2.3.1 on Mac OS X.

Reference: Darling, A. E., Mau, B. and Perna, N. T., "progressiveMauve: multiple
genome alignment with gene gain, loss and rearrangement". PLoS One, 2010
5(6): e11147.

Inputs: These will be your newly assembled contigs and a reference genome.
Here we have chosen to use Ec55989 (NCBI accession NC_011748), a closely
related strain with a complete genome, available for download from NCBI. Go to
this link:
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_55989_uid59383
and download the sequence in fasta format, NC_011748.fna (right-click to save
to your computer).

If you don't want to run your own Velvet assembly, you can do the rest of the
exercises using pre-assembled E. coli O104:H4 contigs. Go to
http://www.ncbi.nlm.nih.gov/Traces/wgs/?val=AFVS01 and click the
'Download' tab, then right-click the fasta file to save it to your computer and unzip it.

Instructions
Once you have installed Mauve and located your reference genome and contigs,
we can order the contigs.

1. Launch the Mauve application.

2. From the Tools menu, select 'Move Contigs'.

3. A dialogue box should appear, with a box labelled 'Choose location to
keep output files and folders'. Navigate to the folder with the sequences
and the copied contigs, then click the 'Create New Folder' radio button.
Give this folder a suitable name, e.g. 'MauveOutput', and then hit 'OK'.

4. A message should appear telling you about the iterative process involved
in reordering the contigs. Take note of it, then hit 'OK' to dismiss it.

5. A dialogue box should appear, with a box labelled 'Align and Reorder
Contigs'. Click the 'Add Sequence...' button below the box and navigate to
the reference genome to align against, in this case 'NC_011748.fna'.

6. Click the 'Add Sequence...' button again and navigate to the fasta file of the
contigs you wish to align, 'SRR292770_unordered.fasta' from the
assembly exercise above. Check that you have put the reference genome
first, and the draft second, as expected by Mauve.

7. Click 'Start' to run the reordering. This might take half an hour or so in total.
A new window marked 'Mauve Console' should appear, where the
progress of the run will be displayed, including any error messages (see
below for an example). The reordering will take from four to seven
iterations (for Mac OS X; even up to 16 iterations and a bit more time on a
32-bit Windows OS). A new window of the visualization tool should
launch for each completed iteration, marked 'Mauve unknown -
alignmentX', where X is the iteration number.

If you encounter errors, check that you have specified the right files for
input; they should be fasta or multi-fasta sequence files.

8. Finally, a message telling you the reorder is completed should appear. Hit
'OK' and quit Mauve, though you can inspect the final alignment (and the
others) beforehand.

9. The final set of ordered and oriented contigs is in the fasta file located in
the last of the iterated alignments. To find it, look in the 'MauveOutput'
folder created above. For each iteration of the reordering there will be an
output folder, so the final output is the contig file located in the
subdirectory 'alignmentX' with the highest X, where X is the iteration
number. Rename 'SRR292770_unordered.fasta' in this subdirectory to
'SRR292770.fasta' and copy it to your main working directory (i.e. the one
with the original sequence files). Make sure you have changed the name of
the ordered contigs file first, as we will use the unordered contigs
('SRR292770_unordered.fasta') in a later exercise. You can then delete the
'alignmentX' folders.
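
Picking out the highest-numbered 'alignmentX' folder by eye is easy to get wrong once there are ten or more iterations, because an alphabetical listing puts 'alignment10' before 'alignment2'. The sketch below selects the folder numerically. The function name latest_alignment_dir is our own, and it assumes the folders are named exactly as described in step 9.

```shell
# Print the path of the alignmentX subfolder with the highest numeric X.
# $1: the Mauve output folder (e.g. MauveOutput)
latest_alignment_dir() {
    for d in "$1"/alignment*/; do
        [ -d "$d" ] || continue
        n=${d%/}
        n=${n##*alignment}
        case $n in
            '' | *[!0-9]*) continue ;;   # skip anything without a plain number
        esac
        printf '%s\t%s\n' "$n" "${d%/}"
    done | sort -n | tail -n 1 | cut -f 2
}
```

The rename-and-copy in step 9 can then be done in one line: cp "$(latest_alignment_dir MauveOutput)/SRR292770_unordered.fasta" SRR292770.fasta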

Those who are used to Unix and sequence analysis may prefer to use a
command-line based solution for ordering contigs. We recommend Abacas
(http://abacas.sourceforge.net/index.html), which requires installation of
MUMmer (http://mummer.sourceforge.net), Perl and BioPerl.

The command for ordering against a reference genome is (assuming you have
added Abacas [version 1.3.1] to your $PATH environment variable and renamed the
contigs file from the first exercise to 'SRR292770_unordered.fasta' first):

abacas.1.3.1.pl -r NC_011748.fasta -q SRR292770_unordered.fasta \
    -p nucmer -c -m -b -o SRR292770.fasta

Using either method, you should end up with a set of contigs ordered against the
reference strain in multi-fasta format in a file called 'SRR292770.fasta'. This is
the file to use for the following steps.
1.4.1 Viewing the ordered contigs (Mauve)
To examine the newly ordered contigs, we provide two GUI-based approaches.
For the first, both the program Mauve and instructions for the comparison
method are as detailed below, albeit with a few minor (but important) changes.

In this example, we will generate a multiple alignment of the ordered contigs
from the O104:H4 outbreak genome, the Ec55989 genome used as the reference
for ordering, and another assembly created using more read sets than our draft
genome, and a different assembler. This alternative assembly of strain TY-2482
(NCBI accession AFVR01) is available for download here
http://www.ncbi.nlm.nih.gov/Traces/wgs/?val=AFVR01 in gzipped fasta format via
the download tab. After downloading, unzip the file before continuing. Order
this alternative assembly to the Ec55989 reference genome first, using the
instructions provided above.

Instructions:

1. Launch the Mauve application.

2. From the File menu, select 'Align with progressiveMauve...'

3. A dialogue box should appear, with a box labelled 'Sequences to align:'.
Click the 'Add Sequence...' button below the box and navigate to your
ordered contigs file, 'SRR292770.fasta'.

4. Click the 'Add Sequence...' button again and navigate to the fasta file of a
genome you wish to align. In this case, we will start with the alternative
assembly, 'AFVR01.fasta'. If you provide a multi-fasta file containing
contigs, Mauve will concatenate these together before running the
alignment.

5. Repeat step 4 to add any other sequences of interest. In our example we
will just add the EAEC genome Ec55989.

6. Now we need to specify the output file. Click the button marked '...' to
select an output file. Navigate to the directory in which you want the
output to appear. Now specify a name for the output file (e.g.
'mauve_output'), and click 'Save'.

7. Click 'Align...' to run the alignment. This might take half an hour or so. A
new window marked 'Mauve Console' should appear, where the progress
of the run will be displayed, including any error messages (example
below).

If you encounter errors, check that you have specified the right files for
input; they should all be fasta or multi-fasta sequence files, and can
include up to one genome in GenBank format (to provide an annotation).

8. When the alignment is finished, the visualization tool will appear. To
simplify the image a little, select View -> Style -> uncheck 'LCB connecting
lines'. It should look like this:

Row 1 = O104 ordered contigs
Row 2 = alternative assembly
Row 3 = Ec55989 (EAEC) genome
Coloured blocks indicate regions of sequence with homology in the other
genomes.
Red lines indicate contig boundaries.

Notice the similarity in the orders of our Velvet assembly and the
alternative assembly. Both assemblies contain contigs that don't map to
the reference genome.

You can save a static image of what you are viewing by selecting Tools ->
Export -> Export image.
1.4.2 Viewing the ordered contigs (ACT)
We will now use ACT to compare the same three genomes: our O104:H4
assembly, the alternative assembly and the reference genome Ec55989. Note
both assemblies should have been ordered against Ec55989 as outlined above.

Details of downloading and using ACT are given below (2.3.1).

Inputs: ACT can display pairwise comparisons between genomes. To do this it
needs the genome sequences themselves (in fasta format or an annotated sequence
format such as GenBank or EMBL files) and a comparison file. Comparison files
can be created on your computer if you have BLAST installed, or using an online
tool like WebACT (http://www.webact.org) or DoubleACT
(http://www.hpa-bioinfotools.org.uk/pise/double_act.html); see steps 1-2 below.


Instructions
Use the instructions for using ACT below to:
1. First generate a single fasta file for the two assemblies (step 1, 2.3.1).
2. Generate a comparison file for our O104:H4 assembly against both
Ec55989 and the alternative assembly separately (step 2, 2.3.1).
3. View the comparison(s) in ACT.
a. Launch the ACT application.
b. Select File -> Open.
c. Initially, boxes for 2 sequence files and 1 comparison file will be
displayed. Click 'more files...' to create boxes for a second comparison
file and a third sequence file.
d. Click the 'Choose...' buttons to select each of your sequence files
and comparison files. Note that you can load in your multi-fasta
contigs files at this point for the E. coli O104:H4 and alternative
assembly. We want the E. coli O104:H4 assembly in the middle, with
the comparisons to Ec55989 and the alternative assembly above and
below it, like this:

e. The comparison between the three genomes will be displayed. See the
ACT manual for details of how to navigate around the viewer. Here, we
are comparing the new E. coli O104:H4 genome assembly (middle)
with Ec55989 (top) and the alternative assembly (bottom). We have
zoomed out by clicking the down arrow at the bottom right of the
window. Since our contigs were ordered against the Ec55989 genome,
all the E. coli O104:H4 contigs with no homology to Ec55989 (i.e. no
coloured bars linking them to Ec55989) appear at the end of the
sequence. Some of these contigs do map to the genome of the
alternative assembly. Also note that there is much higher homology
between our O104 assembly and the alternative one.




1.5 Mauve Assembly Metrics - Statistical View of the Contigs

Website:
http://code.google.com/p/ngopt/wiki/How_To_Score_Genome_Assemblies_with_Mauve

Compatibility: Available for Windows, Mac OS X, and Linux versions of Mauve,
but see the text for more details. It also requires the R statistical program to be
installed. See the above link for more details.

Reference: Darling, A. E., et al., "Mauve assembly metrics". Bioinformatics,
2011. 10.1093/bioinformatics/btr451.

Inputs: Ordered contigs file (multi-fasta format), reference genome (the more
closely related the better)

Installation note: The authors indicate that it is possible to use Mauve Assembly
Metrics via the Mauve GUI tool when only a single pairwise comparison is run,
but as they do not provide specific instructions, we can only describe how to do
so for Mac OS X. The following may also work for the Linux version, but has
not been tested by us. Unfortunately, we do not yet have a solution for installing
Mauve Assembly Metrics in the Windows-based version of the Mauve GUI.

The simplest way of installing Mauve Assembly Metrics into the GUI tool of
Mauve for Mac OS X is to use the instructions for installing Mauve Assembly
Metrics by script (see the above website for details) with one important change:
edit the target '.dmg' file to the most current update from the Mauve download
website. You may still have to install Mauve as an application by 'drag-and-drop'.

Mauve Assembly Metrics are only available as a GUI tool for single pairwise
comparisons, e.g. between the reference genome and the assembly as ordered
contigs. You will know the tool is installed successfully if a new button appears
after running such a comparison, as highlighted here with the red circle:



Instructions
In this example, we will generate Mauve Assembly Metrics for the assembly we
created, using as the reference a complete genome from the outbreak, E. coli
O104:H4 strain 2011C-3493 (NCBI accession NC_018658.1; download NC_018658.fna from
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_O104_H4_2011C_3493_uid176127)

1. Using the instructions above, reorder the SRR292770 unordered contigs
to the new reference, 2011C-3493.

2. When the alignment has finished, don't close Mauve but instead, on the
final alignment window, hit the Mauve Assembly Metrics button. This will
launch the report window, which should look like this:



Notice that along with the summary shown here, you can also generate a report
of SNPs and gaps in alignment, and save these reports. To understand the report
fully, read the reference for Mauve Assembly Metrics given above.

Some highlights from our assembly include an N50 in the mid-40 kb range, with a
largest contig of just over 141 kb and the smallest of 200 bp (as expected, as we
set that in Velvet). Notably, a few percent of bases of the reference genome have
been missed, though our assembly has an extra 2% of bases, making up an extra
116 (small) contigs that don't align. There are also more than 1,100 SNPs between
the two sequences, and interestingly, our assembly seemingly has fewer gaps (371)
than the reference 2011C-3493 (411).

Mauve Assembly Metrics can also be run as a command-line tool, with the
instructions for installing and running the metrics tool provided at the same link
as above. The advantage of the command-line version is that more than one
assembly can be tested for inclusion in the same report output. As we are dealing
with a single assembly in this tutorial, we leave this as an exercise for those so
interested.
1.6 Annotation using RAST

Website: http://rast.nmpdr.org (need to register to use the service)

Compatibility: Most web browsers.
This tutorial was created using RAST version 4 on the Firefox web browser
(version 17.0).

Reference: Aziz, R. K., et al., "The RAST Server: rapid annotations using
subsystems technology". BMC Genomics, 2008. 10.1186/1471-2164-9-75.

Inputs: Ordered contigs file (multi-fasta format)

Instructions
In this example, we will generate a GenBank annotation for the newly assembled
and ordered contigs of the E. coli O104:H4 strain in multi-fasta format (use the
contigs ordered against Ec55989, as we have done). You must first register
for a RAST user account.

1. Go to http://rast.nmpdr.org in a web browser and log into your account.

2. Under the 'Your Jobs' tab (top left corner) select 'Upload New Job'.

3. You should be taken to a page titled 'Upload your genome'. At the bottom
of the page there is a box labelled 'File Upload:'; click the button and
navigate to your ordered contigs file ('SRR292770.fasta'). Then hit the
'Use this data and go to step 2' button. This may take a little while as it is
uploading your sequence file over your internet connection.

4. Eventually the next page will open with the same heading as the last, with
the sub-heading 'Review genome data', and some contig statistics. You
will be asked to enter further details about the organism. In the first field,
labelled 'Taxonomy ID', enter the code for E. coli (562), and hit the 'Look
up taxonomy ID at NCBI' button. This will populate the rest of the fields
for you, except the last, the strain. Enter 'TY-2482' into the space for the
strain, and then hit the 'Use this data and go to step 3' button.

(Note if you were doing this with something other than E. coli, you can
find the right taxonomy ID at http://www.ncbi.nlm.nih.gov/taxonomy).

5. The next page should have the sub-heading 'Complete Upload'. You can enter
optional information (Sequencing Method = 'other', Coverage = '>8x',
Number of contigs = '101-500'), but this is not necessary to use RAST.
The other options for the RAST annotation pipeline should at least be
considered, though we will use the default options as shown when the
page first loads. Your final page should look like this:


If it does, hit the 'Finish the upload' button to start the job. Your job will join the
submission queue, and you will be sent an email (to the address you used
to register) when the job is completed. This could take half a day or even
much longer, depending on the number of jobs in the queue before you.

6. Once you receive the completion email from the Annotation Server, click
on the link in the email to return to the RAST server (if you have logged
out, you will have to log back in to continue). This time select 'Jobs
Overview' under the 'Your Jobs' tab.

7. This will open the Jobs Overview page, where you will see a list of your
jobs with a number of details and the status of the job. Click on the '[ view
details ]' link for the job (in the 'Annotation Progress' column, under the
green progress bars).

8. This opens the 'Job Details' page and will include the available downloads
if the job has completed. Select 'Genbank (EC numbers stripped)' and then
hit 'Download'. The file will be called '562.<job_no.>.ec-stripped.gbk';
change this to 'SRR292770.gbk' and move the file to your work folder
(where 'SRR292770.fasta' is located).
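
A quick check that the downloaded annotation actually contains gene calls is to count its CDS features. The sketch below relies only on the GenBank flat-file layout, in which feature keys start in column 6; the function name count_cds is our own.

```shell
# Count CDS features in a GenBank flat file.
# Feature keys begin in column 6, so a CDS line starts with five spaces + "CDS".
count_cds() {
    grep -c '^     CDS ' "$1"
}
```

Running 'count_cds SRR292770.gbk' should report several thousand features for an E. coli draft genome; a count of 0 suggests the wrong file was downloaded or the download was incomplete.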

1.6.1 Alternatives to RAST
A number of command-line tools are available for annotation on a local machine.
For fast de novo annotation we recommend trying Prokka
(http://www.vicbioinformatics.com/software.prokka.shtml), though Prokka in
turn relies on the installation of a long list of other programs (see the link for
details). For those interested in comparative annotation, you could try BG7
(http://bg7.ohnosequences.com). Otherwise, you now have an annotated draft
genome for E. coli O104:H4 strain TY-2482, and can move on to the comparative
genome analysis that follows.
2. Comparative genome analysis

2.1 Downloading E. coli genome sequences for comparative analysis
In this part of the tutorial, we will compare our E. coli O104:H4 genome assembly
to other E. coli using various software packages on our computer. You will need
to download the programs from the web using the links given in each section. In
addition you will need to download some E. coli data for comparison.

For the Mauve and ACT comparisons, we will use these:

(This one we have already used above)
EAEC str. Ec55989 (NC_011748) - download NC_011748.fna from
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_55989_uid59383

EHEC O157:H7 str. EDL933 (NC_002655) - download NC_002655.fna from
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_O157_H7_EDL933_uid57831

To compare the O104:H4 genome to enterohaemorrhagic E. coli (EHEC) and
enteropathogenic E. coli (EPEC), we will also use these genomes:

- EPEC O26:H11 str. 11368 - NC_013361.gbk from
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_O26_H11_11368_uid41021
- EPEC O127:H6 str. E2348/69 - NC_011601.gbk from
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_O127_H6_E2348_69_uid59343
- aEPEC O111:H9 str. E110019 - AAJW02.gbk from
http://www.ncbi.nlm.nih.gov/Traces/wgs/?val=AAJW02
- EHEC O157:H7 str. TW14359 - NC_013008.gbk from
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_O157_H7_TW14359_uid59235
- EHEC O157:H7 str. EC4115 - NC_011353.gbk from
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_O157_H7_EC4115_uid59091
- EHEC O157:H7 str. Sakai - NC_002695.gbk from
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_O157_H7_Sakai_uid57781
- Stx (shiga-toxin, or verotoxin) prophage VT2
http://www.ncbi.nlm.nih.gov/nuccore/5881592
- LEE pathogenicity island
http://www.ncbi.nlm.nih.gov/nuccore/2897961
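
Those comfortable with the command line may prefer to fetch the FTP-hosted files without a browser. The sketch below only assembles the download address from a strain folder name and a file name, both copied from the list above; the function name ncbi_genome_url is our own, and it assumes the NCBI FTP paths given in this tutorial are still being served.

```shell
# Build the download URL for one file in the NCBI genomes/Bacteria FTP area.
# $1: strain folder name, $2: file name (both taken from the list above).
ncbi_genome_url() {
    printf 'ftp://ftp.ncbi.nih.gov/genomes/Bacteria/%s/%s\n' "$1" "$2"
}
```

The resulting address can be passed to a downloader such as curl, for example: curl -O "$(ncbi_genome_url Escherichia_coli_O157_H7_EDL933_uid57831 NC_002655.fna)"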
Compaiative uenomics Tutoiial p21
2.2 Mauve - for multiple genome alignment

We have already introduced Mauve above, for ordering contigs and inspecting
assembly statistics.

In this example, we will generate a multiple alignment of the newly assembled
and annotated O104:H4 outbreak genome (GenBank format) with an EHEC
chromosome and the chromosome of EAEC strain Ec55989 (fasta). We will then
view the alignment and use it to inspect genes that are annotated in the outbreak
genome but missing from the other pathogen chromosomes.

Website: http://asap.ahabs.wisc.edu/mauve/ (Includes download links,
installation instructions and user guide)

Compatibility: Java based, available for Windows, Mac OS X, and Linux
This tutorial was created using Mauve 2.3.1 on Mac OS X.

Reference: Darling, A. E., Mau, B. and Perna, N. T., "progressiveMauve: multiple
genome alignment with gene gain, loss and rearrangement". PLoS One, 2010.
5(6): e11147

Inputs: Genome sequence files (fasta format) and up to one annotated genome
sequence (GenBank format).

Instructions

1. Launch the Mauve application

2. From the File menu, select 'Align with progressiveMauve...'

3. A dialogue box should appear, with a box labelled 'Sequences to align:'.
Click the 'Add Sequence...' button below the box and navigate to your
annotated genome (GenBank file generated by RAST).

4. Click the 'Add Sequence...' button again and navigate to the fasta file of a
genome you wish to align. In this case, we will start with the genome of
the EHEC O157:H7 strain EDL933 (NC_002655.fna). If you provide a
multi-fasta file containing contigs, Mauve will concatenate these together
before running the alignment.

5. Repeat step 4 to add any other sequences of interest. In our example we
will just add the EAEC genome Ec55989.

6. Now we need to specify the output file. Click the button marked '...' to
select an output file. Navigate to the directory in which you want the
output to appear. Now specify a name for the output file (e.g.
'mauve_output') and click 'Save'.

7. Click 'Align...' to run the alignment. This might take half an hour or so. A
new window should appear marked 'Mauve Console' where the progress
of the run will be displayed, including any error messages.

If you encounter errors, check that you have specified the right files for
input - they should all be fasta or multi-fasta sequence files, and can
include up to one genome in GenBank format (to provide an annotation).



8. When the alignment is finished, the visualization tool will appear. To
simplify the image a little, select View -> Style -> uncheck 'LCB connecting
lines'. It should look like this:



Row 1 = annotated O104 genome
Row 2 = EHEC genome
Row 3 = EAEC genome
Coloured blocks indicate regions of sequence with homology in the other
genomes.
Red lines indicate contig boundaries.

You can save a static image of what you are viewing by selecting Tools ->
Export -> Export image.


9. Notice the EHEC genome has more 'white space', i.e. sequences not in
homology blocks, meaning these sequences are missing from the new
O104:H4 genome and Ec55989. The other genomes have fewer white
blocks, as they share a lot of their genome sequence.

To see what the 'unique' sequences are in the O104:H4 assembly, zoom in
by clicking the '+' magnifying glass at the top of the window until you see
boxes appear under the O104 sequence; these are annotated genes.



Scroll around to a region that is not within a coloured block, and mouse
over a gene to see its annotation. In our example, we are looking at a
region of sequence in which IncI1 plasmid genes have been annotated. So,
we know the O104:H4 genome assembly contains an IncI1 plasmid.
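
Since Mauve concatenates the entries of a multi-fasta file before aligning, it can be useful to know how many entries and bases each input file holds before loading it. The following is a minimal sketch, not part of Mauve itself; the function name fasta_summary is our own invention, and it assumes plain (uncompressed) FASTA text on standard input.

```shell
# Print "<records> <bases>" for FASTA text read from standard input.
# fasta_summary is a hypothetical helper name, not a Mauve or BLAST command.
fasta_summary() {
  awk '
    /^>/ { records++; next }                       # header line: one more entry
    { gsub(/[ \t\r]/, ""); bases += length($0) }   # sequence line: count bases
    END { printf "%d %d\n", records, bases }
  '
}
```

For example, `fasta_summary < NC_002655.fna` should report a single record for a finished chromosome, whereas a contigs file from an assembler will report one record per contig.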

2.3 ACT - for detailed pairwise genome comparisons

Website: http://www.sanger.ac.uk/resources/software/act/ (To download,
click the 'Downloads' tab and look for the FTP download link for your operating
system.) Note that the download should contain Artemis as well as ACT.

Compatibility: Java based, available for Windows, Mac OS X, and Linux
This tutorial was created using ACT version 11.0.0 on Mac OS X.

Reference: Carver, T. J., Rutherford, K. M., Berriman, M., Rajandream, M. A.,
Barrell, B. G. and Parkhill, J., "ACT: the Artemis Comparison Tool". Bioinformatics,
2005. 21:3422-3. 10.1093/bioinformatics/bti553

Inputs: ACT can display pairwise comparisons between genomes. To do this it
needs the genome sequences themselves (in fasta format or an annotated sequence
format such as GenBank or EMBL files) and a comparison file. Comparison files
can be created on your computer if you have BLAST installed, or using an online
tool like WebACT (http://www.webact.org) or DoubleACT
(http://www.hpa-bioinfotools.org.uk/pise/double_act.html); see steps 1-2 below.

Instructions

In this example we will visualize a comparison of our newly assembled and
ordered E. coli O104:H4 TY-2482 contigs against enteroaggregative E. coli
Ec55989 (accession NC_011748) and the EHEC genome EDL933 (accession
NC_002655).

2.3.1 Generating comparison files for ACT
1. To generate the comparison file, you will need to have both of the genome
sequences in single-fasta format. Generation of the comparison will not
work with multi-fasta sequences such as those containing several contig
sequences, as output by Velvet or other assemblers. So, we first need to
change the multi-fasta contig sequences file into a fasta file with a single
entry, which includes all of our contig sequences concatenated together
into one big sequence. An easy way to do this is to open the contig file in
Artemis first.
a. Launch Artemis
b. Select File -> Open
c. Navigate to the location of your contig file in fasta format and click
'Open'. The contig sequences should be displayed, with the
boundaries of each contig marked up as a feature and coloured in
alternating orange and brown.



d. To write out the concatenated contig sequences to a single-entry
fasta file, select File -> Write -> All Bases -> FASTA Format and
name the new file something like 'genomeXX_single.fasta' so
you can easily identify this as a single-entry file.






2. Generate a comparison between your single-entry fasta files by one of the
following methods:

a. If you have BLAST installed locally on your computer, open up a
terminal and type:

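# Build a nucleotide BLAST database (assumes the reference is saved as NC_011748.fasta)
makeblastdb -in NC_011748.fasta -dbtype nucl

# Compare the single-entry assembly to the reference, writing tabular output
blastn -query SRR292770_single.fasta -db NC_011748.fasta \
    -evalue 1 -task megablast -outfmt 6 \
    > SRR292770_NC_011748.crunch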

b. If you prefer to use a web-based tool, go to the WebACT site
(http://www.webact.org) and click the 'Generate' tab at the top
of the page. Under 'Sequence 1' paste in the accession for your
reference genome, e.g. NC_011748. Under 'Sequence 2' click the
'Browse' button and navigate to your single-entry genome
sequence file to compare. Click 'Submit'. It may take a while to
upload your sequence (1-10 minutes), and a while longer for the
results to be returned (1-60 minutes).

When WebACT is finished, you will see a Results screen. Click
'Download files'. Enter a file name (a sensible choice is something
that includes the full identifier of both genomes being compared,
e.g. SRR292770_NC_011748.zip), download the file and unzip it.
Inside will be a set of files including the input sequences; the
comparison file is the one named ''. Rename it to something more
informative (e.g. SRR292770_NC_011748.crunch) and copy it to
the directory with your sequence files. (You can now delete the
rest of the WebACT output.)

c. You can also try the DoubleACT website
(http://www.hpa-bioinfotools.org.uk/pise/double_act.html). Click the
'Browse...' buttons to upload your single-entry genome sequence file and
the reference file for comparison, then click the 'Blastn' radio button,
enter your email address and click 'Run genome blast'.

When the comparison file is created, you will receive an email with
a link to download the results. The comparison file is the one
named 'genome_blast.result'. Right-click to save it to your
computer in the same directory as your sequence files, and name it
something more informative (e.g. SRR292770_NC_011748.crunch).

We generated comparison files for the E. coli O104 assembly vs Ec55989
(accession NC_011748), and for the E. coli O104 assembly vs EHEC
genome EDL933 (accession NC_002655).
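
If you would rather build the single-entry fasta file on the command line than through Artemis, the concatenation in step 1 can be sketched as below. This is an illustration under stated assumptions, not the tutorial's own method: concat_fasta and the entry name passed to it are hypothetical, the contigs are joined in file order with nothing inserted between them, and the contig boundaries are not recorded.

```shell
# Merge every entry of a multi-FASTA stream into one single-entry FASTA.
# $1 = name for the new single entry (a hypothetical label of your choosing).
concat_fasta() {
  awk -v name="$1" '
    BEGIN { print ">" name }
    /^>/ { next }                          # drop the per-contig header lines
    {
      gsub(/[ \t\r]/, "")                  # strip stray whitespace
      buf = buf $0
      while (length(buf) >= 60) {          # emit the sequence in 60-column lines
        print substr(buf, 1, 60)
        buf = substr(buf, 61)
      }
    }
    END { if (length(buf) > 0) print buf } # flush the final partial line
  '
}
```

A call such as `concat_fasta genomeXX < contigs.fa > genomeXX_single.fasta` (file names are placeholders) would then produce a file suitable as the blastn query in step 2a.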

2.3.2 Viewing genome comparisons in ACT

1. Launch the ACT application

2. Select File -> Open

3. Initially, boxes for 2 sequence files and 1 comparison file will be
displayed. Click 'more files...' to create boxes for a second comparison
file and a third sequence file.

4. Click the 'Choose...' buttons to select each of your sequence files
and comparison files. Note that you can load in your multi-fasta
contigs file at this point for the E. coli O104:H4 genome. We want the
E. coli O104:H4 assembly in the middle, with the comparisons to
Ec55989 and EHEC above and below it, like this:





5. The comparison between the two genomes will be displayed. See the
ACT manual for details of how to navigate around the viewer. Here, we
are comparing the new E. coli O104:H4 genome assembly (bottom)
with Ec55989 (top). We have zoomed out by clicking the down arrow
at the bottom right of the window. Since our contigs were ordered
against the Ec55989 genome, all the E. coli O104:H4 contigs with no
homology to Ec55989 (i.e. no coloured bars linking them to Ec55989)
appear at the end of the sequence.



Zoom into this region by clicking on one of the unmapped contigs
in this area and then clicking the up arrow to the side of the
O104:H4 sequence.
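
The regions without coloured bars can also be listed directly from the comparison file. The sketch below assumes a tabular BLAST comparison (-outfmt 6, where columns 7 and 8 hold the start and end of each hit on the query, here the O104:H4 assembly) with a single query sequence; the function name uncovered_regions is hypothetical, and you must supply the total query length yourself.

```shell
# List stretches of the query that no BLAST hit covers, as "start end" pairs.
# $1 = total query length in bases; tabular hits are read from standard input.
uncovered_regions() {
  sort -t "$(printf '\t')" -k7,7n -k8,8n | awk -v len="$1" '
    BEGIN { covered_to = 0 }
    {
      start = $7 + 0; end = $8 + 0
      if (start > covered_to + 1) print covered_to + 1, start - 1  # gap before hit
      if (end > covered_to) covered_to = end
    }
    END { if (covered_to < len + 0) print covered_to + 1, len + 0 } # trailing gap
  '
}
```

For example, `uncovered_regions 5400000 < SRR292770_NC_011748.crunch` (the length here is a placeholder) prints one line per stretch of the assembly with no match in Ec55989; long stretches near the end correspond to the unmapped contigs described above.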

2.4 BRIG - Visualizing reference-based comparisons of multiple sequences

Download from: http://brig.sourceforge.net/. The site contains download links,
installation instructions, a manual and a tutorial which you may find useful.

Compatibility: Java based, available for Windows, Mac OS X, and Linux
This tutorial was created using BRIG version 0.95 on Mac OS X.

Dependencies: BRIG also requires BLAST to be installed on your computer.
You can download BLAST+ from
ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/. Ensure you select
the file that matches your operating system, e.g.
'ncbi-blast-x.x.x+-universal-macosx.tar.gz' for Mac OS X or
'ncbi-blast-2.2.27+-win32.exe' for Windows.

Reference: Alikhan, N. F., Petty, N. K., Ben Zakour, N. L. and Beatson, S. A.,
"BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons",
BMC Genomics, 2011. 12:402. PMID: 21824423

Instructions
NOTE: If you have not used BRIG before, you will probably find it useful to work
through the BRIG tutorial available at
http://brig.sourceforge.net/brig-tutorial-1-whole-genome-comparisons/
before working through the rest of our example.
1. Select your reference sequence and the location of your query sequences.
In this analysis, we will use our de novo assembled E. coli O104:H4 contig
sequences as the reference, and the EHEC and EPEC genomes as queries
(see download links in 2.1). We will also include the sequences for the
Stx2 phage and the LEE pathogenicity island.


2. Click 'Next' to be taken to the 'Customize rings' window. This is where you
can specify which query sequences you want to be represented by rings,
and the order and colour in which they will be displayed.

3. Find the Ec55989 sequence in the 'data pool' box and click 'Add data'.
Click the coloured box and change the colour to red. In the box marked
'Legend text:' type in a name for this ring, e.g. '1: Ec55989'.

4. Click 'Add new ring'. Now find the EHEC genome in the data pool and
click 'Add data'. Set the colour to purple and change the legend text to '2:
EHEC str 1'.



5. Repeat step 4 with the remaining EHEC and EPEC genomes. We used the
4 EHEC genomes listed in this tutorial under 'Downloads' and coloured
them all purple, two EPEC genomes coloured blue, and one atypical EPEC
genome coloured green.






6. Go to the Preferences menu and select Image Options. Under the 'Global
settings' tab change the 'Width' field to 2500. This will make the image
canvas wide enough to display the legend text next to the ring image,
without obscuring the image itself. Click 'Save & close'.


7. Click 'Next' and enter a title for your image (this will be printed in the
middle of the circular diagram). Click 'Browse' and navigate to where you
want the output to be saved, then type in a name for the output file (this
will be a single image file) and select the format for the image (e.g. png).
Click 'Submit'.


8. While BRIG is running, it will print details of its progress on the console
within the same window where you just pressed 'Submit'. When it tells
you it has finished, go to where you asked for the output to be saved and
open the image file to view the result.


It is easy to see that, in terms of gene content, the novel O104:H4
outbreak strain is closest to EAEC strain Ec55989 (red), then the atypical
EPEC strain E110019 (green). There are several regions of the outbreak
strain's sequence that are missing from the EHEC and EPEC strains.

9. An alternative way to make the comparison is to use an EHEC genome as
the reference sequence, to see how many of the characteristic EHEC
sequences are present in the outbreak genome. Click the 'Prev' button to
get to the 'Customize rings' window, then click 'Prev' again to get to the
input data window. Change the reference sequence to EHEC strain
EDL933 and click 'Next'. Now change ring 2 to be the new O104:H4
genome: click on 'Ring 2' in the list of rings in the first box; click the old
EHEC strain in the 'data' box and click 'Remove data'; find the O104:H4
file in the data pool and click 'Add data'. Change the Legend text to '2:
O104:H4' and change the ring colour to black. We also added the Stx2
phage (orange) and the LEE pathogenicity island (purple), to make it easy
to see where these regions are. Click 'Next' and change the image title and
output filename to indicate that the reference sequence is an EHEC strain,
then click 'Submit' to generate the image.




10. Try changing the reference strain to the Ec55989, EPEC or atypical EPEC
genomes and see how the figures and the interpretations change.
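
The ring images give a visual impression of how much of the reference each query genome shares. A rough numeric counterpart can be sketched from tabular BLAST output, assuming you have run blastn yourself with -outfmt 6 and with the reference as the query (so that columns 7-8 are reference coordinates and column 3 is percent identity). This is not how BRIG itself reports results; the function name percent_covered and the identity cut-off are our own assumptions.

```shell
# Percentage of the reference covered by hits with identity >= a cut-off.
# $1 = reference length in bases, $2 = minimum percent identity (e.g. 90).
percent_covered() {
  awk -v min_id="$2" '$3 + 0 >= min_id + 0' |      # keep sufficiently similar hits
  sort -t "$(printf '\t')" -k7,7n -k8,8n |         # order hits along the reference
  awk -v len="$1" '
    {
      s = $7 + 0; e = $8 + 0
      if (s <= covered_to) s = covered_to + 1      # skip bases already counted
      if (e >= s) { covered += e - s + 1; covered_to = e }
    }
    END { printf "%.1f\n", 100 * covered / len }
  '
}
```

Running this once per query genome against the same reference gives one percentage per ring, which can help when comparing rings that look similar by eye.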

3. Typing and specialist tools
3.1 PHAST - for identification of phage sequences

Type: Web service
URL: http://phast.wishartlab.com
Reference: Zhou, Y., et al., "PHAST: a fast phage search tool". Nucleic Acids
Research, 2011. 10.1093/nar/gkr485.

Input: Contigs in FASTA format (single or multiple fasta)

Outputs:
- Summary Table (summarising the location of prophage sequences)
- Detailed Table (giving the locations of individual genes within the prophage)
- Circular genome map (showing the locations of prophages within the genome)
- Linear maps of each prophage (showing the individual genes)
See the PHAST website for more detailed documentation.
3.2 ResFinder - for identification of resistance gene sequences

Type: Web service
URL: http://www.cbs.dtu.dk/services/ResFinder/
Reference: Zankari, E., et al., "Identification of acquired antimicrobial resistance
genes". J Antimicrobial Chemotherapy, 2012. 10.1093/jac/dks261.

Input: Contigs in FASTA format or reads (will assemble first).

Outputs: List of resistance genes identified within the sequences
3.3 Multilocus sequence typing

Type: Web service
URL: http://cge.cbs.dtu.dk/services/MLST/
Reference: Larsen, M. V., et al., "Multilocus Sequence Typing of Total Genome
Sequenced Bacteria". J Clin Microbiol, 2012. 10.1128/JCM.06094-11.

Input: Contigs in FASTA format or reads (will assemble first).
Parameters: Select the MLST database to query.
Outputs: Top hitting alleles for each locus used in the MLST scheme, and the
sequence type (ST) assigned to that combination of alleles.
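
The final step, assigning an ST to a combination of alleles, amounts to a table lookup. The sketch below illustrates the idea only and is not the web service's implementation: lookup_st is a hypothetical name, and it assumes a tab-delimited profile table with a header row, the ST in the first column and one allele number per locus in the following columns.

```shell
# Look up the sequence type for a comma-separated list of allele numbers.
# $1 = profile table (tab-delimited: ST first, then one column per locus).
# $2 = allele numbers in the same locus order as the table, e.g. "1,2,3".
lookup_st() {
  awk -F '\t' -v alleles="$2" '
    BEGIN { n = split(alleles, want, ",") }
    NR == 1 { next }                         # skip the header row
    {
      for (i = 1; i <= n; i++)
        if ($(i + 1) != want[i]) next        # any mismatch: try the next profile
      print $1; found = 1; exit              # every locus matched
    }
    END { if (!found) print "unknown" }
  ' "$1"
}
```

A combination that is absent from the table is reported as "unknown", which in practice would suggest either a novel ST or an imperfect allele call.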
3.4 PATRIC - online genome comparison tool
For an introduction to what PATRIC can do, try looking at their analysis of the
E. coli O104 genome, posted at
http://enews.patricbrc.org/1172/e-coli-outbreak-new-comprehensive-comparisons/
