TY - JOUR
T1 - MetaVelvet
T2 - An extension of Velvet assembler to de novo metagenome assembly from short sequence reads
AU - Namiki, Toshiaki
AU - Hachiya, Tsuyoshi
AU - Tanaka, Hideaki
AU - Sakakibara, Yasubumi
N1 - Funding Information: KAKENHI from the Ministry of Education, Culture, Sports, Science and Technology of Japan [Grant-in-Aid for Scientific Research on Innovative Areas No.221S0002]. A grant program for bioinformatics research and development of the Japan Science and Technology Agency (in part). Funding for open access charge: KAKENHI from the Ministry of Education, Culture, Sports, Science and Technology of Japan [Grant-in-Aid for Scientific Research on Innovative Areas No.221S0002].
PY - 2012/11
Y1 - 2012/11
N2 - An important step in 'metagenomics' analysis is the assembly of multiple genomes from mixed sequence reads of multiple species in a microbial community. Most conventional pipelines use a single-genome assembler with carefully optimized parameters. A limitation of a single-genome assembler for de novo metagenome assembly is that sequences of highly abundant species are likely to be misidentified as repeats in a single genome, resulting in a number of small fragmented scaffolds. We extended a single-genome assembler for short reads, known as 'Velvet', to metagenome assembly, which we called 'MetaVelvet', for mixed short reads of multiple species. Our fundamental concept was to first decompose a de Bruijn graph constructed from mixed short reads into individual sub-graphs, and second, to build scaffolds based on each decomposed de Bruijn sub-graph as an isolate species genome. We made use of two features, the coverage (abundance) difference and graph connectivity, for the decomposition of the de Bruijn graph. For simulated datasets, MetaVelvet succeeded in generating significantly higher N50 scores than any single-genome assembler. MetaVelvet also reconstructed relatively low-coverage genome sequences as scaffolds. On real datasets of human gut microbial read data, MetaVelvet produced longer scaffolds and increased the number of predicted genes.
UR - http://www.scopus.com/inward/record.url?scp=84867397631&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84867397631&partnerID=8YFLogxK
U2 - 10.1093/nar/gks678
DO - 10.1093/nar/gks678
M3 - Article
C2 - 22821567
AN - SCOPUS:84867397631
SN - 0305-1048
VL - 40
SP - e155
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - 20
ER -