MetaVelvet: An extension of Velvet assembler to de novo metagenome assembly from short sequence reads

Toshiaki Namiki, Tsuyoshi Hachiya, Hideaki Tanaka, Yasubumi Sakakibara

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

13 Citations (Scopus)

Abstract

Motivation: An important step of "metagenomics" analysis is the assembly of multiple genomes from mixed sequence reads of multiple species in a microbial community. Most conventional pipelines employ a single-genome assembler with carefully optimized parameters and post-process the resulting scaffolds to correct assembly errors. Using a single-genome assembler for de novo metagenome assembly has two limitations: highly conserved sequences shared between different species often cause chimeric contigs, and sequences of highly abundant species are likely to be misidentified as repeats in a single genome, resulting in a number of small fragmented scaffolds. The metagenome assembly problem becomes harder when assembling from very short sequence reads.

Method: We modified and extended "Velvet" [27], a single-genome, de Bruijn graph-based assembler for short reads, into a metagenome assembler called "MetaVelvet" for mixed short reads of multiple species. Our fundamental ideas are, first, to decompose the de Bruijn graph constructed from mixed short reads into individual sub-graphs and, second, to build scaffolds from each decomposed de Bruijn sub-graph as if it were the genome of an isolated species. We use two features, graph connectivity and coverage (abundance) difference, to decompose the de Bruijn graph.

Results: On simulated datasets, MetaVelvet generated higher N50 scores and smaller chimeric scaffolds than any of the compared single-genome assemblers, produced scaffolds of a quality comparable to those from separate Velvet assemblies of isolated species sequence reads, and reconstructed even relatively low-coverage genome sequences as scaffolds. On a real dataset of human gut microbial read data, MetaVelvet produced longer scaffolds, increased the number of predicted genes, and improved phylum-level taxonomic assignment, in the sense that the rate of predicted genes that cannot be assigned to any taxonomy is reduced.

Availability: The source code of MetaVelvet is freely available at http://metavelvet.dna.bio.keio.ac.jp under the GNU General Public License.
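
The two decomposition features named in the Method section, graph connectivity and coverage difference, can be illustrated with a toy sketch. This is not the MetaVelvet implementation. It assumes a single fixed coverage threshold (the abstract does not say how coverage groups are determined), ignores reverse complements and sequencing errors, and uses hypothetical function names and made-up reads.

```python
from collections import Counter, defaultdict


def kmer_coverage(reads, k):
    """Count how often each k-mer occurs across all reads (a crude coverage proxy)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_graph(kmers):
    """Link two k-mers when the (k-1)-suffix of one equals the (k-1)-prefix of the other.

    Edges are stored undirected because only connectivity matters here.
    """
    by_prefix = defaultdict(set)
    for kmer in kmers:
        by_prefix[kmer[:-1]].add(kmer)
    adjacency = {kmer: set() for kmer in kmers}
    for kmer in kmers:
        for successor in by_prefix.get(kmer[1:], ()):
            if successor != kmer:
                adjacency[kmer].add(successor)
                adjacency[successor].add(kmer)
    return adjacency


def connected_components(adjacency):
    """Return the connected components of the graph as a list of node sets."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def decompose(reads, k, threshold):
    """Split k-mers into high/low coverage groups, then into connected sub-graphs.

    ASSUMPTION: one fixed threshold separates the two abundance levels.
    """
    coverage = kmer_coverage(reads, k)
    high = {kmer for kmer, count in coverage.items() if count >= threshold}
    low = set(coverage) - high
    subgraphs = []
    for group in (high, low):
        subgraphs.extend(connected_components(build_graph(group)))
    return subgraphs


# Made-up reads: an abundant "species" sequenced 4 times and a rare one sequenced once.
# The last 4-mer of the first read overlaps the first 4-mer of the second, so the
# mixed graph is connected even though the two reads share no 4-mer.
reads = ["ACGTTGCA"] * 4 + ["GCATCCTA"]
mixed = connected_components(build_graph(set(kmer_coverage(reads, 4))))
separated = decompose(reads, k=4, threshold=2)
print(len(mixed), len(separated))
```

In this toy input, connectivity alone cannot separate the two sequences because a chance overlap joins them, while grouping by coverage first does. Real data would also contain k-mers shared between species, which this sketch does not handle.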

Original language: English
Title of host publication: 2011 ACM Conference on Bioinformatics, Computational Biology and Biomedicine, BCB 2011
Pages: 116-124
Number of pages: 9
DOI: https://doi.org/10.1145/2147805.2147818
Publication status: Published - 2011
Event: 2011 ACM Conference on Bioinformatics, Computational Biology and Biomedicine, ACM-BCB 2011 - Chicago, IL, United States
Duration: 2011 Aug 1 to 2011 Aug 3

Fingerprint

Metagenome
Genes
Genome
Scaffolds
Metagenomics
Conserved Sequence
Licensure
Taxonomies
Pipelines

Keywords

  • Assembly
  • Metagenome
  • Next generation sequencing

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics
  • Health Information Management

Cite this

Namiki, T., Hachiya, T., Tanaka, H., & Sakakibara, Y. (2011). MetaVelvet: An extension of Velvet assembler to de novo metagenome assembly from short sequence reads. In 2011 ACM Conference on Bioinformatics, Computational Biology and Biomedicine, BCB 2011 (pp. 116-124). https://doi.org/10.1145/2147805.2147818

@inproceedings{bf35e8bdc7bd414ab0911c728e8eba8b,
title = "MetaVelvet: An extension of Velvet assembler to de novo metagenome assembly from short sequence reads",
keywords = "Assembly, Metagenome, Next generation sequencing",
author = "Toshiaki Namiki and Tsuyoshi Hachiya and Hideaki Tanaka and Yasubumi Sakakibara",
year = "2011",
doi = "10.1145/2147805.2147818",
language = "English",
isbn = "9781450307963",
pages = "116--124",
booktitle = "2011 ACM Conference on Bioinformatics, Computational Biology and Biomedicine, BCB 2011",

}

TY - GEN

T1 - MetaVelvet

T2 - An extension of Velvet assembler to de novo metagenome assembly from short sequence reads

AU - Namiki, Toshiaki

AU - Hachiya, Tsuyoshi

AU - Tanaka, Hideaki

AU - Sakakibara, Yasubumi

PY - 2011

Y1 - 2011

KW - Assembly

KW - Metagenome

KW - Next generation sequencing

UR - http://www.scopus.com/inward/record.url?scp=84858992875&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84858992875&partnerID=8YFLogxK

U2 - 10.1145/2147805.2147818

DO - 10.1145/2147805.2147818

M3 - Conference contribution

AN - SCOPUS:84858992875

SN - 9781450307963

SP - 116

EP - 124

BT - 2011 ACM Conference on Bioinformatics, Computational Biology and Biomedicine, BCB 2011

ER -