TY - JOUR
T1 - Fast and global detection of periodic sequence repeats in large genomic resources
AU - Mori, Hideto
AU - Evans-Yamamoto, Daniel
AU - Ishiguro, Soh
AU - Tomita, Masaru
AU - Yachie, Nozomu
N1 - Funding Information:
New Energy and Industrial Technology Development Organization (NEDO) Genome Editing Program; Japan Society for the Promotion of Science (JSPS) 18K19777 (to N.Y.); Japan Science and Technology Agency (JST) PRESTO program 10814 (to N.Y.); Japan Agency for Medical Research and Development (AMED) PRIME program 17gm6110007 (to N.Y.); The Naito Foundation (to N.Y.); The Nakajima Foundation (to N.Y.); The Takeda Foundation (to N.Y.); SECOM Science and Technology Foundation (to N.Y.); TTCK fellowships (to H.M., D.E.-Y., S.I.); Mori Memorial Foundation (to H.M.); Yamagishi Student Project Support Program (to D.E.-Y.) of Keio University; JSPS DC1 Fellowship (to S.I.). Funding for open access charge: Research Budget. Conflict of interest statement. None declared.
Publisher Copyright:
© 2019 The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.
PY - 2019/1/25
Y1 - 2019/1/25
N2 - Periodically repeating DNA and protein elements are involved in various important biological events including genomic evolution, gene regulation, protein complex formation, and immunity. Notably, the currently used genome editing tools such as ZFNs, TALENs, and CRISPRs are also all associated with periodically repeating biomolecules of natural organisms. Despite the biological importance of periodically repeating sequences and the expectation that new genome editing modules could be discovered from such periodic repeats, no software has been developed that globally detects such structured elements in large genomic resources in a high-throughput and unsupervised manner. We developed new software, SPADE (Search for Patterned DNA Elements), that exhaustively explores periodic DNA and protein repeats from large-scale genomic datasets based on k-mer periodicity evaluation. With a simple constraint, sequence periodicity, SPADE captured reported genome-editing-associated sequences and other protein families involving repeating domains such as tetratricopeptide, ankyrin and WD40 repeats with better performance than other software designed for limited sets of repetitive biomolecular sequences, suggesting the high potential of this software to contribute to the discovery of new biological events and new genome editing modules.
UR - http://www.scopus.com/inward/record.url?scp=85060651746&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85060651746&partnerID=8YFLogxK
U2 - 10.1093/nar/gky890
DO - 10.1093/nar/gky890
M3 - Article
C2 - 30304510
AN - SCOPUS:85060651746
VL - 47
JO - Nucleic Acids Research
JF - Nucleic Acids Research
SN - 0305-1048
IS - 2
M1 - e8
ER -