Comprehensive evaluation of non-hybrid genome assembly tools for third-generation PacBio long-read sequence data

Vasanthan Jayakumar, Yasubumi Sakakibara

Research output: Contribution to journal › Article

12 Citations (Scopus)

Abstract

Long reads obtained from third-generation sequencing platforms can help overcome the long-standing challenge of the de novo assembly of sequences for the genomic analysis of non-model eukaryotic organisms. Numerous long-read-aided de novo assemblies have been published recently, which exhibited superior quality of the assembled genomes in comparison with those achieved using earlier second-generation sequencing technologies. Evaluating assemblies is important in guiding the appropriate choice for specific research needs. In this study, we evaluated 10 long-read assemblers using a variety of metrics on Pacific Biosciences (PacBio) data sets from different taxonomic categories with considerable differences in genome size. The results allowed us to narrow down the list to a few assemblers that can be effectively applied to eukaryotic assembly projects. Moreover, we highlight how best to use limited genomic resources for effectively evaluating the genome assemblies of non-model organisms.

Original language: English
Article number: bbx147
Pages (from-to): 866-876
Number of pages: 11
Journal: Briefings in Bioinformatics
Volume: 20
Issue number: 3
DOIs: 10.1093/bib/bbx147
Publication status: Published - 2017 Nov 2

Fingerprint

Genes
Genome
Genome Size
Sequence Analysis
Technology
Research
Datasets

Keywords

  • assembly evaluation
  • de novo assembly
  • PacBio SMRT
  • single-molecule sequencing
  • third-generation sequencing

ASJC Scopus subject areas

  • Information Systems
  • Molecular Biology

Cite this

Comprehensive evaluation of non-hybrid genome assembly tools for third-generation PacBio long-read sequence data. / Jayakumar, Vasanthan; Sakakibara, Yasubumi.

In: Briefings in Bioinformatics, Vol. 20, No. 3, bbx147, 02.11.2017, p. 866-876.

@article{fa6378e71298420285df5e4ab381f012,
title = "Comprehensive evaluation of non-hybrid genome assembly tools for third-generation PacBio long-read sequence data",
abstract = "Long reads obtained from third-generation sequencing platforms can help overcome the long-standing challenge of the de novo assembly of sequences for the genomic analysis of non-model eukaryotic organisms. Numerous long-read-aided de novo assemblies have been published recently, which exhibited superior quality of the assembled genomes in comparison with those achieved using earlier second-generation sequencing technologies. Evaluating assemblies is important in guiding the appropriate choice for specific research needs. In this study, we evaluated 10 long-read assemblers using a variety of metrics on Pacific Biosciences (PacBio) data sets from different taxonomic categories with considerable differences in genome size. The results allowed us to narrow down the list to a few assemblers that can be effectively applied to eukaryotic assembly projects. Moreover, we highlight how best to use limited genomic resources for effectively evaluating the genome assemblies of non-model organisms.",
keywords = "assembly evaluation, de novo assembly, PacBio SMRT, single-molecule sequencing, third-generation sequencing",
author = "Vasanthan Jayakumar and Yasubumi Sakakibara",
year = "2017",
month = "11",
day = "2",
doi = "10.1093/bib/bbx147",
language = "English",
volume = "20",
pages = "866--876",
journal = "Briefings in Bioinformatics",
issn = "1467-5463",
publisher = "Oxford University Press",
number = "3",

}

TY - JOUR

T1 - Comprehensive evaluation of non-hybrid genome assembly tools for third-generation PacBio long-read sequence data

AU - Jayakumar, Vasanthan

AU - Sakakibara, Yasubumi

PY - 2017/11/2

Y1 - 2017/11/2

N2 - Long reads obtained from third-generation sequencing platforms can help overcome the long-standing challenge of the de novo assembly of sequences for the genomic analysis of non-model eukaryotic organisms. Numerous long-read-aided de novo assemblies have been published recently, which exhibited superior quality of the assembled genomes in comparison with those achieved using earlier second-generation sequencing technologies. Evaluating assemblies is important in guiding the appropriate choice for specific research needs. In this study, we evaluated 10 long-read assemblers using a variety of metrics on Pacific Biosciences (PacBio) data sets from different taxonomic categories with considerable differences in genome size. The results allowed us to narrow down the list to a few assemblers that can be effectively applied to eukaryotic assembly projects. Moreover, we highlight how best to use limited genomic resources for effectively evaluating the genome assemblies of non-model organisms.

AB - Long reads obtained from third-generation sequencing platforms can help overcome the long-standing challenge of the de novo assembly of sequences for the genomic analysis of non-model eukaryotic organisms. Numerous long-read-aided de novo assemblies have been published recently, which exhibited superior quality of the assembled genomes in comparison with those achieved using earlier second-generation sequencing technologies. Evaluating assemblies is important in guiding the appropriate choice for specific research needs. In this study, we evaluated 10 long-read assemblers using a variety of metrics on Pacific Biosciences (PacBio) data sets from different taxonomic categories with considerable differences in genome size. The results allowed us to narrow down the list to a few assemblers that can be effectively applied to eukaryotic assembly projects. Moreover, we highlight how best to use limited genomic resources for effectively evaluating the genome assemblies of non-model organisms.

KW - assembly evaluation

KW - de novo assembly

KW - PacBio SMRT

KW - single-molecule sequencing

KW - third-generation sequencing

UR - http://www.scopus.com/inward/record.url?scp=85056804052&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85056804052&partnerID=8YFLogxK

U2 - 10.1093/bib/bbx147

DO - 10.1093/bib/bbx147

M3 - Article

C2 - 29112696

AN - SCOPUS:85056804052

VL - 20

SP - 866

EP - 876

JO - Briefings in Bioinformatics

JF - Briefings in Bioinformatics

SN - 1467-5463

IS - 3

M1 - bbx147

ER -