Fast and global detection of periodic sequence repeats in large genomic resources

Hideto Mori, Daniel Evans-Yamamoto, Soh Ishiguro, Masaru Tomita, Nozomu Yachie

Research output: Contribution to journal (Article)

Abstract

Periodically repeating DNA and protein elements are involved in various important biological events including genomic evolution, gene regulation, protein complex formation, and immunity. Notably, the currently used genome editing tools such as ZFNs, TALENs, and CRISPRs are also all associated with periodically repeating biomolecules of natural organisms. Despite the biological importance of periodically repeating sequences and the expectation that new genome editing modules could be discovered from such periodical repeats, no software that globally detects such structured elements in large genomic resources in a high-throughput and unsupervised manner has been developed. We developed new software, SPADE (Search for Patterned DNA Elements), that exhaustively explores periodic DNA and protein repeats from large-scale genomic datasets based on k-mer periodicity evaluation. With a simple constraint, sequence periodicity, SPADE captured reported genome-editing-associated sequences and other protein families involving repeating domains such as tetratricopeptide, ankyrin and WD40 repeats with better performance than the other software designed for limited sets of repetitive biomolecular sequences, suggesting the high potential of this software to contribute to the discovery of new biological events and new genome editing modules.
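The abstract names k-mer periodicity evaluation as the basis of SPADE but does not spell out the procedure. As a rough illustration of the general idea only, and not the authors' implementation, the sketch below records the distance between consecutive occurrences of each k-mer and reports the most frequent distance as a candidate repeat period. The function names, the default k = 4, and the tie-breaking rule are all assumptions made for this example.

```python
from collections import Counter
from typing import Optional, Tuple


def kmer_period_histogram(seq: str, k: int = 4) -> Counter:
    """Count distances between consecutive occurrences of each k-mer.

    Illustrative only: a simplified stand-in for periodicity scoring,
    not the scoring scheme used by SPADE.
    """
    last_seen = {}          # k-mer -> index of its most recent occurrence
    distances = Counter()   # distance -> number of times it was observed
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in last_seen:
            distances[i - last_seen[kmer]] += 1
        last_seen[kmer] = i
    return distances


def dominant_period(seq: str, k: int = 4) -> Optional[Tuple[int, int]]:
    """Return (period, support) for the most frequent k-mer distance.

    Returns None when no k-mer occurs twice. Ties are broken in favour
    of the shorter period (an arbitrary choice for this sketch).
    """
    distances = kmer_period_histogram(seq, k)
    if not distances:
        return None
    return max(distances.items(), key=lambda kv: (kv[1], -kv[0]))


if __name__ == "__main__":
    # A toy sequence: an 8-nt unit repeated five times.
    print(dominant_period("ACGTTGCA" * 5, k=4))
```

On a perfect tandem repeat the histogram collapses onto a single distance, which is why periodicity alone can act as a simple constraint. Real genomic repeats are degenerate and interspersed with spacers, so a practical tool needs windowing and tolerance to mismatches that this sketch leaves out.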

Original language: English
Pages (from-to): e8
Journal: Nucleic Acids Research
Volume: 47
Issue number: 2
DOI: 10.1093/nar/gky890
Publication status: Published - 2019 Jan 25

Fingerprint

Software
DNA
Periodicity
Proteins
Clustered Regularly Interspaced Short Palindromic Repeats
Ankyrin Repeat
Nucleic Acid Repetitive Sequences
Immunity
Gene Editing

ASJC Scopus subject areas

  • Genetics

Cite this

Fast and global detection of periodic sequence repeats in large genomic resources. / Mori, Hideto; Evans-Yamamoto, Daniel; Ishiguro, Soh; Tomita, Masaru; Yachie, Nozomu.

In: Nucleic Acids Research, Vol. 47, No. 2, 25.01.2019, p. e8.

@article{cc106df415f24129b760a945415a29ff,
title = "Fast and global detection of periodic sequence repeats in large genomic resources",
abstract = "Periodically repeating DNA and protein elements are involved in various important biological events including genomic evolution, gene regulation, protein complex formation, and immunity. Notably, the currently used genome editing tools such as ZFNs, TALENs, and CRISPRs are also all associated with periodically repeating biomolecules of natural organisms. Despite the biological importance of periodically repeating sequences and the expectation that new genome editing modules could be discovered from such periodical repeats, no software that globally detects such structured elements in large genomic resources in a high-throughput and unsupervised manner has been developed. We developed new software, SPADE (Search for Patterned DNA Elements), that exhaustively explores periodic DNA and protein repeats from large-scale genomic datasets based on k-mer periodicity evaluation. With a simple constraint, sequence periodicity, SPADE captured reported genome-editing-associated sequences and other protein families involving repeating domains such as tetratricopeptide, ankyrin and WD40 repeats with better performance than the other software designed for limited sets of repetitive biomolecular sequences, suggesting the high potential of this software to contribute to the discovery of new biological events and new genome editing modules.",
author = "Hideto Mori and Daniel Evans-Yamamoto and Soh Ishiguro and Masaru Tomita and Nozomu Yachie",
year = "2019",
month = "1",
day = "25",
doi = "10.1093/nar/gky890",
language = "English",
volume = "47",
pages = "e8",
journal = "Nucleic Acids Research",
issn = "0305-1048",
publisher = "Oxford University Press",
number = "2",
}

TY - JOUR
T1 - Fast and global detection of periodic sequence repeats in large genomic resources
AU - Mori, Hideto
AU - Evans-Yamamoto, Daniel
AU - Ishiguro, Soh
AU - Tomita, Masaru
AU - Yachie, Nozomu
PY - 2019/1/25
Y1 - 2019/1/25

AB - Periodically repeating DNA and protein elements are involved in various important biological events including genomic evolution, gene regulation, protein complex formation, and immunity. Notably, the currently used genome editing tools such as ZFNs, TALENs, and CRISPRs are also all associated with periodically repeating biomolecules of natural organisms. Despite the biological importance of periodically repeating sequences and the expectation that new genome editing modules could be discovered from such periodical repeats, no software that globally detects such structured elements in large genomic resources in a high-throughput and unsupervised manner has been developed. We developed new software, SPADE (Search for Patterned DNA Elements), that exhaustively explores periodic DNA and protein repeats from large-scale genomic datasets based on k-mer periodicity evaluation. With a simple constraint, sequence periodicity, SPADE captured reported genome-editing-associated sequences and other protein families involving repeating domains such as tetratricopeptide, ankyrin and WD40 repeats with better performance than the other software designed for limited sets of repetitive biomolecular sequences, suggesting the high potential of this software to contribute to the discovery of new biological events and new genome editing modules.

UR - http://www.scopus.com/inward/record.url?scp=85060651746&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85060651746&partnerID=8YFLogxK
U2 - 10.1093/nar/gky890
DO - 10.1093/nar/gky890
M3 - Article
VL - 47
SP - e8
JO - Nucleic Acids Research
JF - Nucleic Acids Research
SN - 0305-1048
IS - 2
ER -