IMSindel: An accurate intermediate-size indel detection tool incorporating de novo assembly and gapped global-local alignment with split read analysis

Daichi Shigemizu, Fuyuki Miya, Shintaro Akiyama, Shujiro Okuda, Keith A. Boroevich, Akihiro Fujimoto, Hidewaki Nakagawa, Kouichi Ozaki, Shumpei Niida, Yonehiro Kanemura, Nobuhiko Okamoto, Shinji Saitoh, Mitsuhiro Kato, Mami Yamasaki, Tatsuo Matsunaga, Hideki Mutai, Kenjiro Kosaki, Tatsuhiko Tsunoda

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Insertions and deletions (indels) have been implicated in dozens of human diseases through the radical alteration of gene function by short frameshift indels as well as long indels. However, the accurate detection of these indels from next-generation sequencing data is still challenging. This is particularly true for intermediate-size indels (≥50 bp), due to the short DNA sequencing reads. Here, we developed a new method that predicts intermediate-size indels using BWA soft-clipped fragments (unmatched fragments in partially mapped reads) and unmapped reads. We report the performance comparison of our method, GATK, PINDEL and ScanIndel, using whole exome sequencing data from the same samples. False positive and false negative counts were determined through Sanger sequencing of all predicted indels across these four methods. The harmonic mean of the recall and precision, F-measure, was used to measure the performance of each method. Our method achieved the highest F-measure of 0.84 in one sample, compared to 0.56 for GATK, 0.52 for PINDEL and 0.46 for ScanIndel. Similar results were obtained in additional samples, demonstrating that our method was superior to the other methods for detecting intermediate-size indels. We believe that this methodology will contribute to the discovery of intermediate-size indels associated with human disease.
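The F-measure described in the abstract, the harmonic mean of recall and precision, can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function name `f_measure` and the example counts are assumptions made for demonstration, and the counts are not taken from the paper.

```python
def f_measure(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Return the harmonic mean of precision and recall.

    The counts are assumed to come from a validation step such as
    Sanger sequencing of predicted indels (hypothetical inputs).
    """
    if true_pos == 0:
        # With no true positives, precision and recall are both 0 (or undefined).
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only, not results from the paper.
score = f_measure(true_pos=8, false_pos=2, false_neg=2)
print(round(score, 2))  # prints 0.8
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a caller that reports many false positives or misses many true indels scores low even if the other quantity is high.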

Original language: English
Article number: 5608
Journal: Scientific Reports
Volume: 8
Issue number: 1
DOI: https://doi.org/10.1038/s41598-018-23978-z
Publication status: Published - 2018 Dec 1

Fingerprint

Exome
DNA Sequence Analysis
Genes

ASJC Scopus subject areas

  • General

Cite this

Shigemizu, D, Miya, F, Akiyama, S, Okuda, S, Boroevich, KA, Fujimoto, A, Nakagawa, H, Ozaki, K, Niida, S, Kanemura, Y, Okamoto, N, Saitoh, S, Kato, M, Yamasaki, M, Matsunaga, T, Mutai, H, Kosaki, K & Tsunoda, T 2018, 'IMSindel: An accurate intermediate-size indel detection tool incorporating de novo assembly and gapped global-local alignment with split read analysis', Scientific Reports, vol. 8, no. 1, 5608. https://doi.org/10.1038/s41598-018-23978-z
@article{4bd5dc617c104392a871164043c40644,
title = "IMSindel: An accurate intermediate-size indel detection tool incorporating de novo assembly and gapped global-local alignment with split read analysis",
author = "Daichi Shigemizu and Fuyuki Miya and Shintaro Akiyama and Shujiro Okuda and Boroevich, {Keith A.} and Akihiro Fujimoto and Hidewaki Nakagawa and Kouichi Ozaki and Shumpei Niida and Yonehiro Kanemura and Nobuhiko Okamoto and Shinji Saitoh and Mitsuhiro Kato and Mami Yamasaki and Tatsuo Matsunaga and Hideki Mutai and Kenjiro Kosaki and Tatsuhiko Tsunoda",
year = "2018",
month = "12",
day = "1",
doi = "10.1038/s41598-018-23978-z",
language = "English",
volume = "8",
journal = "Scientific Reports",
issn = "2045-2322",
publisher = "Nature Publishing Group",
number = "1",

}

TY - JOUR

T1 - IMSindel

T2 - An accurate intermediate-size indel detection tool incorporating de novo assembly and gapped global-local alignment with split read analysis

AU - Shigemizu, Daichi

AU - Miya, Fuyuki

AU - Akiyama, Shintaro

AU - Okuda, Shujiro

AU - Boroevich, Keith A.

AU - Fujimoto, Akihiro

AU - Nakagawa, Hidewaki

AU - Ozaki, Kouichi

AU - Niida, Shumpei

AU - Kanemura, Yonehiro

AU - Okamoto, Nobuhiko

AU - Saitoh, Shinji

AU - Kato, Mitsuhiro

AU - Yamasaki, Mami

AU - Matsunaga, Tatsuo

AU - Mutai, Hideki

AU - Kosaki, Kenjiro

AU - Tsunoda, Tatsuhiko

PY - 2018/12/1

Y1 - 2018/12/1

UR - http://www.scopus.com/inward/record.url?scp=85044986698&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85044986698&partnerID=8YFLogxK

U2 - 10.1038/s41598-018-23978-z

DO - 10.1038/s41598-018-23978-z

M3 - Article

AN - SCOPUS:85044986698

VL - 8

JO - Scientific Reports

JF - Scientific Reports

SN - 2045-2322

IS - 1

M1 - 5608

ER -