Mass spectrum sequential subtraction speeds up searching large peptide MS/MS spectra datasets against large nucleotide databases for proteogenomics

Mohamed Helmy, Naoyuki Sugiyama, Masaru Tomita, Yasushi Ishihama

Research output: Contribution to journal (Article)

18 Citations (Scopus)

Abstract

We have developed a novel bioinformatics method called mass spectrum sequential subtraction (MSSS) to search large peptide spectra datasets produced by liquid chromatography/mass spectrometry (LC-MS/MS) against protein and large nucleotide sequence databases. The main principle of MSSS is to search the peptide spectra set against the protein database first, then remove the spectra corresponding to the identified peptides, leaving a smaller set of peptide spectra to search against the nucleotide sequence database. Reducing the number of spectra to be searched in this way limits the peptide search space. A comparison of MSSS with a conventional search approach, using a dataset of 27 LC-MS/MS runs of cultured rice cells, indicated that MSSS reduced the search queries to 50% and the search time to 75% on average. In addition, MSSS had no effect on the identification false-positive rate (FPR) or on the ability to identify novel peptide sequences. We used MSSS to analyze another dataset of 34 LC-MS/MS runs, which identified 74 additional novel peptides. Proteogenomic analysis with these additional peptides yielded 47 new genomic features in 24 rice genes plus 24 intergenic peptides. These results show the utility of MSSS in searching large databases with large MS/MS datasets for proteogenomics.
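
The search order described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the spectrum identifiers, and the toy lookup-table "search engines" are all hypothetical stand-ins, chosen only to show the subtraction step between the protein-database search and the nucleotide-database search.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types: a spectrum is referred to by an ID string, and a search
# engine is any callable returning {spectrum_id: peptide} for identified spectra.
SpectrumId = str
SearchFn = Callable[[Sequence[SpectrumId]], Dict[SpectrumId, str]]


def sequential_subtraction(
    spectra: Sequence[SpectrumId],
    search_protein_db: SearchFn,
    search_nucleotide_db: SearchFn,
) -> Tuple[Dict[SpectrumId, str], Dict[SpectrumId, str], List[SpectrumId]]:
    # Step 1: search the full spectra set against the protein database.
    protein_hits = search_protein_db(spectra)
    # Step 2: subtract the spectra that were identified in step 1.
    remaining = [s for s in spectra if s not in protein_hits]
    # Step 3: search only the remaining spectra against the nucleotide database.
    nucleotide_hits = search_nucleotide_db(remaining)
    return protein_hits, nucleotide_hits, remaining


def make_toy_search(known: Dict[SpectrumId, str]) -> SearchFn:
    # Toy stand-in for a real search engine (an assumption for illustration):
    # a spectrum counts as identified if it appears in a lookup table.
    def search(spectra: Sequence[SpectrumId]) -> Dict[SpectrumId, str]:
        return {s: known[s] for s in spectra if s in known}

    return search


spectra = ["s1", "s2", "s3", "s4"]
protein_search = make_toy_search({"s1": "PEPTIDEA", "s2": "PEPTIDEB"})
nucleotide_search = make_toy_search({"s1": "PEPTIDEA", "s3": "NOVELPEP"})

protein_hits, nucleotide_hits, remaining = sequential_subtraction(
    spectra, protein_search, nucleotide_search
)
print("identified in protein database:", protein_hits)
print("spectra sent to nucleotide search:", remaining)
print("identified in nucleotide database:", nucleotide_hits)
```

In this toy run only the two spectra left unidentified by the protein search reach the nucleotide search, which is where the reduction in search queries comes from. The numbers here are illustrative and are not the results reported in the paper.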

Original language: English
Pages (from-to): 633-644
Number of pages: 12
Journal: Genes to Cells
Volume: 17
Issue number: 8
DOI: 10.1111/j.1365-2443.2012.01615.x
Publication status: Published - 2012 Aug

Fingerprint

Nucleotides
Databases
Peptides
Protein Databases
Proteogenomics
Datasets
Computational Biology
Liquid Chromatography
Mass Spectrometry
Cell Culture Techniques
Genes
Proteins

ASJC Scopus subject areas

  • Genetics
  • Cell Biology

Cite this

Mass spectrum sequential subtraction speeds up searching large peptide MS/MS spectra datasets against large nucleotide databases for proteogenomics. / Helmy, Mohamed; Sugiyama, Naoyuki; Tomita, Masaru; Ishihama, Yasushi.

In: Genes to Cells, Vol. 17, No. 8, 08.2012, p. 633-644.


@article{b8378b0399fd4c2685d4f62256765c83,
title = "Mass spectrum sequential subtraction speeds up searching large peptide MS/MS spectra datasets against large nucleotide databases for proteogenomics",
abstract = "We have developed a novel bioinformatics method called mass spectrum sequential subtraction (MSSS) to search large peptide spectra datasets produced by liquid chromatography/mass spectrometry (LC-MS/MS) against protein and large-sized nucleotide sequence databases. The main principle in MSSS is to search the peptide spectra set against the protein database, followed by removal of the spectra corresponding to the identified peptides to create a smaller set of the remaining peptide spectra for searching against the nucleotide sequences database. Therefore, we reduce the number of spectra to be searched to limit the peptide search space. Comparing MSSS and conventional search approach using a dataset of 27 LC-MS/MS runs of rice culture cells indicated that MSSS reduced the search queries to 50{\%} and the search time to 75{\%} on average. In addition, MSSS had no effect on the identification false-positive rate (FPR) or the novel peptide sequences identification ability. We used MSSS to analyze another dataset of 34 LC-MS/MS runs, resulting in identifying additional 74 novel peptides. Proteogenomic analysis with these additional peptides yielded 47 new genomic features in 24 rice genes plus 24 intergenic peptides. These results show that the utility of MSSS in searching large databases with large MS/MS datasets for proteogenomics.",
author = "Mohamed Helmy and Naoyuki Sugiyama and Masaru Tomita and Yasushi Ishihama",
year = "2012",
month = "8",
doi = "10.1111/j.1365-2443.2012.01615.x",
language = "English",
volume = "17",
pages = "633--644",
journal = "Genes to Cells",
issn = "1356-9597",
publisher = "Wiley-Blackwell",
number = "8",
}

TY - JOUR
T1 - Mass spectrum sequential subtraction speeds up searching large peptide MS/MS spectra datasets against large nucleotide databases for proteogenomics
AU - Helmy, Mohamed
AU - Sugiyama, Naoyuki
AU - Tomita, Masaru
AU - Ishihama, Yasushi
PY - 2012/8
Y1 - 2012/8
N2 - We have developed a novel bioinformatics method called mass spectrum sequential subtraction (MSSS) to search large peptide spectra datasets produced by liquid chromatography/mass spectrometry (LC-MS/MS) against protein and large-sized nucleotide sequence databases. The main principle in MSSS is to search the peptide spectra set against the protein database, followed by removal of the spectra corresponding to the identified peptides to create a smaller set of the remaining peptide spectra for searching against the nucleotide sequences database. Therefore, we reduce the number of spectra to be searched to limit the peptide search space. Comparing MSSS and conventional search approach using a dataset of 27 LC-MS/MS runs of rice culture cells indicated that MSSS reduced the search queries to 50% and the search time to 75% on average. In addition, MSSS had no effect on the identification false-positive rate (FPR) or the novel peptide sequences identification ability. We used MSSS to analyze another dataset of 34 LC-MS/MS runs, resulting in identifying additional 74 novel peptides. Proteogenomic analysis with these additional peptides yielded 47 new genomic features in 24 rice genes plus 24 intergenic peptides. These results show that the utility of MSSS in searching large databases with large MS/MS datasets for proteogenomics.

UR - http://www.scopus.com/inward/record.url?scp=84864302486&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84864302486&partnerID=8YFLogxK
U2 - 10.1111/j.1365-2443.2012.01615.x
DO - 10.1111/j.1365-2443.2012.01615.x
M3 - Article
VL - 17
SP - 633
EP - 644
JO - Genes to Cells
JF - Genes to Cells
SN - 1356-9597
IS - 8
ER -