
Pre-Conference Workshop

DNA metabarcoding introduction

What is Next Generation Sequencing (NGS)?


Relatively new methods of sequencing in a
massively parallel fashion that enable rapid
sequencing of large numbers of DNA base pairs
The sample can contain a population of DNA
molecules (from multiple individuals)
Complete within hours

Compared to Sanger Sequencing?


Each sample must contain a single template
(single individual)
Complete within days to weeks (depending on the
number of base pairs you require, i.e. the number of
individuals sequenced)

Problems with Sanger sequencing of DNA barcode amplicons (PCR products):

Needs relatively high amplicon yield
Coamplification of nuclear mitochondrial
pseudogenes (numts)
Confusion with sequences from intracellular
endosymbiotic bacteria (e.g. Wolbachia)
Instances of intra-individual variability
(heteroplasmy)

NGS is able to:


Detect non-targeted species, bacterial sequences,
heteroplasmy
Simplify the lab protocol
Speed things up
Reduce cost per barcode
(excluding cost of sequencer machine)

Metabarcoding
= mass-trapping + mass-PCR-amplification + NGS +
bioinformatics
Parallel acquisition of DNA barcode sequences
(SHORT FRAGMENTS!!!) from hundreds of
specimens simultaneously
Standardised collection and lab techniques

Malaise trap metabarcoding


Workflow: insect collection, insect sorting, bulk DNA extract, bulk PCR, Illumina HTS

But NGS can also be used for DNA barcode reference library construction

Requires extensive use of MID tags (covered later)

NGS Machines
Conventional platform
Higher sequencing throughput
Illumina HiSeq, SOLiD

Benchtop platform
More suitable for targeted sequencing
Illumina MiSeq, Ion Torrent PGM, Roche 454, PacBio

NGS Workflow
Workflow is divided into 4 parts:
Library preparation
Collection of DNA fragments of interest, PCR,
attachment of adapters and MIDs

Template preparation
Preparing fragments for sequencing

Sequencing
By synthesis of complementary strand

Bioinformatics

Library Preparation

Template Preparation
Prepare many identical copies from a single
fragment to ensure sufficient signal is generated
during sequencing

Bridge amplification
(Illumina)
Cluster generation

Sequencing
By synthesis of the complementary strand
Mediated by polymerase/ligase
Cycle of sequencing flow:
Flow of nucleotides (nt)/reagents
Incorporate nt
Signal capture
Wash away excess nt/reagents
New cycle begins
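
The cycle above can be illustrated with a toy model. This is a minimal sketch that assumes a flow-based chemistry in which one nucleotide is flowed at a time in a fixed TCAG order and the captured signal is proportional to the number of bases incorporated in that flow. Illumina chemistry differs: it adds one reversibly terminated base per cycle. The function names and the example strand are hypothetical.

```python
def simulate_flows(synth_strand, flow_order="TCAG", max_cycles=100):
    """Toy flowgram: list of (nucleotide flowed, bases incorporated)."""
    pos = 0
    flowgram = []
    for _ in range(max_cycles):
        for nt in flow_order:
            # Incorporate nt: extend while the next base matches this flow.
            incorporated = 0
            while pos < len(synth_strand) and synth_strand[pos] == nt:
                incorporated += 1
                pos += 1
            # Signal capture, then the excess nt is washed away.
            flowgram.append((nt, incorporated))
            if pos == len(synth_strand):
                return flowgram
    raise ValueError("strand not fully synthesized; check for non-TCAG bases")


def call_bases(flowgram):
    """Rebuild the synthesized strand from the per-flow signals."""
    return "".join(nt * count for nt, count in flowgram)


flows = simulate_flows("TCAGGA")
print(flows)
print(call_bases(flows))
```

In this model a homopolymer such as GG produces a single, stronger signal in one flow rather than two separate signals.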

Detection by light emission/post-light

Fluorescence: Illumina
Chemiluminescence: Roche

MID Tags (also confusingly sometimes known as barcodes)


Multiplex Identifiers (MID)
= short oligonucleotides designed to facilitate library
multiplexing (mixing of multiple samples which
can then be separated bioinformatically)
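
As a rough illustration of separating samples bioinformatically, the sketch below bins reads by their MID. It assumes the MID sits at the very start of each read, that all MIDs have the same length, and that only exact matches are accepted. The MID sequences, sample names and reads are made up for the example.

```python
from collections import defaultdict


def demultiplex(reads, mid_to_sample):
    """Group reads by sample using an exact match on the leading MID."""
    mid_len = len(next(iter(mid_to_sample)))  # assumes equal-length MIDs
    bins = defaultdict(list)
    for read in reads:
        tag, insert = read[:mid_len], read[mid_len:]
        sample = mid_to_sample.get(tag)
        if sample is None:
            bins["unassigned"].append(read)  # keep the full read for inspection
        else:
            bins[sample].append(insert)      # MID trimmed off
    return dict(bins)


mids = {"CTCGT": "trap_A", "TCTAC": "trap_B"}  # hypothetical MIDs
reads = ["CTCGTAACCGG", "TCTACGGTTAA", "CTCGTTTGGCC", "GGGGGAAAA"]
print(demultiplex(reads, mids))
```

Real pipelines usually tolerate a small number of mismatches in the MID; exact matching is used here only to keep the example short.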

Methods

MID tagging

5 considerations when designing MIDs


a) Don't begin/end with the same nucleotide as
the adaptor sequence
b) Don't begin/end with the same nucleotide as
the PCR amplification primer
c) Don't allow homopolymers of greater than two
nucleotides
d) Do differ from one another by at least two
nucleotides
e) Avoid successive positive incorporation of two
nucleotides during the TCAG cycle
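
Considerations a) to d) can be checked automatically. The sketch below is one possible reading, not a definitive implementation: it assumes an adaptor-MID-primer layout and treats a) and b) as "the first and last MID base must differ from the adaptor's last base and the primer's first base". Consideration e) depends on the flow chemistry and is left out. All sequences are hypothetical.

```python
def has_long_homopolymer(seq, max_run=2):
    """True if any nucleotide repeats more than max_run times in a row."""
    run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_run:
            return True
    return False


def hamming(a, b):
    """Number of differing positions between two equal-length MIDs."""
    if len(a) != len(b):
        raise ValueError("MIDs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def filter_mids(candidates, adaptor, primer, min_distance=2):
    """Greedily keep candidates that satisfy considerations a) to d)."""
    neighbours = {adaptor[-1], primer[0]}  # assumed adaptor-MID-primer layout
    accepted = []
    for mid in candidates:
        if mid[0] in neighbours or mid[-1] in neighbours:   # a) and b)
            continue
        if has_long_homopolymer(mid):                       # c)
            continue
        if all(hamming(mid, kept) >= min_distance for kept in accepted):  # d)
            accepted.append(mid)
    return accepted


candidates = ["CTCGT", "CTTTC", "ATCGT", "CTCGC", "TCTAC", "CTCGG"]
print(filter_mids(candidates, adaptor="ACGTTGCA", primer="GGTCAACA"))
```

Because the filter is greedy, the order of the candidates affects which MIDs are kept.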

Design of NGS workflow


a) Sequencing throughput
Amount of sequencing data depends on
i. Number of sequencing reads
ii. Sequence read length (number of nucleotides per read)

b) Targeted region
Size of region (e.g. 200 bp)

c) Sample multiplexing
Number of different samples to be run on the same chip
MIDs added to each sample so they can be distinguished
and sorted during data analysis
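
The three design points above combine into simple arithmetic. The sketch below assumes samples are pooled evenly and that every read is usable. The run figures (one million reads, 200 nt reads, a 200 bp region, 96 samples) are hypothetical and do not describe any particular platform.

```python
def sequencing_throughput(num_reads, read_length):
    """Total nucleotides produced = number of reads x read length."""
    return num_reads * read_length


def reads_per_sample(num_reads, num_samples):
    """Average reads per MID-tagged sample, assuming even pooling."""
    return num_reads // num_samples


def mean_depth(num_reads, read_length, region_size, num_samples):
    """Average number of times each base of the target region is read per sample."""
    total = sequencing_throughput(num_reads, read_length)
    return total / (region_size * num_samples)


# Hypothetical run: 1,000,000 reads of 200 nt, 200 bp target, 96 samples.
print(sequencing_throughput(1_000_000, 200))
print(reads_per_sample(1_000_000, 96))
print(round(mean_depth(1_000_000, 200, 200, 96), 1))
```

Multiplexing more samples on the same chip lowers the reads available to each sample, so the number of MIDs is a trade-off against depth.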

Design of NGS workflow


General specifications of the most commonly used
NGS platforms as compared to Sanger sequencing

http://journals.cambridge.org/action/displayFulltext?type=6&fid=10028918&jid=BER&volumeId=105&issueId=06&aid=10028917&bodyId=&membershipNumber=&societyETOCSession=&fulltextType=RA&fileId=S0007485315000681#cjofig_fig02
