Whole genome assembly of a natto production strain Bacillus subtilis natto from very short read data

Yukari Nishito, Yasunori Osana, Tsuyoshi Hachiya, Kris Popendorf, Atsushi Toyoda, Asao Fujiyama, Mitsuhiro Itaya, Yasubumi Sakakibara

Research output: Contribution to journal, Article

60 Citations (Scopus)

Abstract

Background: Bacillus subtilis natto is closely related to the laboratory standard strain B. subtilis Marburg 168 and serves as the starter for producing the traditional Japanese food "natto" from soybeans. Although re-sequencing of the whole genomes of several laboratory-domesticated B. subtilis 168 derivatives has already been attempted using short read sequencing data, assembling the whole genome sequence of a closely related strain, B. subtilis natto, from very short read data is more challenging, particularly given our aim of assembling one fully connected scaffold from short reads of around 35 bp in length.

Results: We applied a comparative genome assembly method, which combines de novo assembly and reference-guided assembly, to one of the B. subtilis natto strains. We successfully assembled 28 scaffolds and managed to avoid substantial fragmentation. Completion of the assembly through long PCR experiments resulted in one connected scaffold for B. subtilis natto. Based on the assembled genome sequence, our orthologous gene analysis between natto BEST195 and Marburg 168 revealed that, of the 4375 predicted genes in BEST195, 82.4% are one-to-one orthologous to genes in 168 (with two genes in-paralog), 3.2% are deleted in 168, and 14.3% are inserted in BEST195; 5.9% of the genes present in 168 are deleted in BEST195. The natto genome contains the same alleles in the promoter region of degQ and the coding region of swrAA as the wild strain RO-FF-1. These alleles are specific for γ-PGA production ability, which is related to natto production. Further, the B. subtilis natto strain completely lacks a polyketide synthesis operon, has a disrupted plipastatin production operon, and possesses previously unidentified transposases.

Conclusions: The determination of the whole genome sequence of Bacillus subtilis natto enabled detailed analyses of a set of genes related to natto production and revealed the number and locations of insertion sequences that B. subtilis natto harbors but B. subtilis 168 lacks. Multiple genome-level comparisons among five closely related Bacillus species were also carried out. The determined genome sequence of B. subtilis natto and gene annotations are available from the Natto genome browser http://natto-genome.org/.
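
As a rough check on the percentages reported in the Results, the following minimal sketch converts them into approximate gene counts. Only the total of 4375 predicted genes and the percentages come from the abstract; the rounding to whole genes and all variable and function names are assumptions made here for illustration, not figures or methods reported by the authors.

```python
# Rough arithmetic check of the BEST195 gene categories quoted in the abstract.
# Counts are derived by rounding and are NOT values reported by the authors.

TOTAL_BEST195_GENES = 4375  # predicted genes in BEST195 (from the abstract)

# Percentages of BEST195 predicted genes, as quoted in the abstract.
CATEGORY_PERCENT = {
    "one_to_one_ortholog": 82.4,
    "deleted_in_168": 3.2,
    "inserted_in_BEST195": 14.3,
}

# Reported as a count rather than a percentage.
IN_PARALOG_GENES = 2

# Note: the 5.9% "deleted in BEST195" figure is a fraction of the genes in
# strain 168, whose total is not given in the abstract, so it is left out.


def approximate_counts(total, percents):
    """Convert percentages of `total` into gene counts rounded to whole genes."""
    return {name: round(total * pct / 100) for name, pct in percents.items()}


counts = approximate_counts(TOTAL_BEST195_GENES, CATEGORY_PERCENT)
percent_sum = round(sum(CATEGORY_PERCENT.values()), 1)

if __name__ == "__main__":
    for name, n in counts.items():
        print(f"{name}: ~{n} genes")
    print(f"in_paralog: {IN_PARALOG_GENES} genes")
    print(f"percentages sum to {percent_sum}%")
```

The three percentages sum to slightly under 100%; the small remainder may reflect rounding together with the two in-paralog genes, which the abstract reports separately.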

Original language: English
Article number: 243
Journal: BMC Genomics
Volume: 11
Issue number: 1
DOI: https://doi.org/10.1186/1471-2164-11-243
Publication status: Published - 2010 Apr 16

Fingerprint

Soy Foods
Bacillus subtilis
Genome
Genes
Operon
Prostaglandins A
Transposases
Polyketides
Molecular Sequence Annotation
Insertional Mutagenesis
Soybeans
Genetic Promoter Regions
Bacillus
Alleles

ASJC Scopus subject areas

  • Biotechnology
  • Genetics

Cite this

Whole genome assembly of a natto production strain Bacillus subtilis natto from very short read data. / Nishito, Yukari; Osana, Yasunori; Hachiya, Tsuyoshi; Popendorf, Kris; Toyoda, Atsushi; Fujiyama, Asao; Itaya, Mitsuhiro; Sakakibara, Yasubumi.

In: BMC Genomics, Vol. 11, No. 1, 243, 16.04.2010.

Nishito, Y, Osana, Y, Hachiya, T, Popendorf, K, Toyoda, A, Fujiyama, A, Itaya, M & Sakakibara, Y 2010, 'Whole genome assembly of a natto production strain Bacillus subtilis natto from very short read data', BMC Genomics, vol. 11, no. 1, 243. https://doi.org/10.1186/1471-2164-11-243
Nishito, Yukari ; Osana, Yasunori ; Hachiya, Tsuyoshi ; Popendorf, Kris ; Toyoda, Atsushi ; Fujiyama, Asao ; Itaya, Mitsuhiro ; Sakakibara, Yasubumi. / Whole genome assembly of a natto production strain Bacillus subtilis natto from very short read data. In: BMC Genomics. 2010 ; Vol. 11, No. 1.
@article{14526110ab8c4e189dfc63dd9bc4ce44,
title = "Whole genome assembly of a natto production strain Bacillus subtilis natto from very short read data",
author = "Yukari Nishito and Yasunori Osana and Tsuyoshi Hachiya and Kris Popendorf and Atsushi Toyoda and Asao Fujiyama and Mitsuhiro Itaya and Yasubumi Sakakibara",
year = "2010",
month = "4",
day = "16",
doi = "10.1186/1471-2164-11-243",
language = "English",
volume = "11",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - Whole genome assembly of a natto production strain Bacillus subtilis natto from very short read data

AU - Nishito, Yukari

AU - Osana, Yasunori

AU - Hachiya, Tsuyoshi

AU - Popendorf, Kris

AU - Toyoda, Atsushi

AU - Fujiyama, Asao

AU - Itaya, Mitsuhiro

AU - Sakakibara, Yasubumi

PY - 2010/4/16

Y1 - 2010/4/16

UR - http://www.scopus.com/inward/record.url?scp=77950797411&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77950797411&partnerID=8YFLogxK

U2 - 10.1186/1471-2164-11-243

DO - 10.1186/1471-2164-11-243

M3 - Article

C2 - 20398357

AN - SCOPUS:77950797411

VL - 11

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

IS - 1

M1 - 243

ER -