Convolutional neural networks for classification of alignments of non-coding RNA sequences

Research output: Contribution to journal, Article

8 Citations (Scopus)

Abstract

Motivation: The convolutional neural network (CNN) has been applied to the classification problem of DNA sequences, with the additional purpose of motif discovery. The training of CNNs with distributed representations of four nucleotides has successfully derived position weight matrices on the learned kernels that corresponded to sequence motifs such as protein-binding sites.

Results: We propose a novel application of CNNs to classification of pairwise alignments of sequences for accurate clustering of sequences and show the benefits of the CNN method of inputting pairwise alignments for clustering of non-coding RNA (ncRNA) sequences and for motif discovery. Classification of a pairwise alignment of two sequences into positive and negative classes corresponds to the clustering of the input sequences. After we combined the distributed representation of RNA nucleotides with the secondary-structure information specific to ncRNAs and furthermore with mapping profiles of next-generation sequence reads, the training of CNNs for classification of alignments of RNA sequences yielded accurate clustering in terms of ncRNA families and outperformed the existing clustering methods for ncRNA sequences. Several interesting sequence motifs and secondary-structure motifs known for the snoRNA family and specific to microRNA and tRNA families were identified.
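The abstract states that classifying each pairwise alignment as positive or negative corresponds to clustering the input sequences, but it does not say how the pairwise decisions are merged into clusters. The sketch below shows one simple possibility, assumed here for illustration and not taken from the paper: treat every positive pair as an edge and report the connected components. The function name `cluster_from_pairs`, the sequence names, and `toy_classifier` (a stand-in for a trained CNN) are all hypothetical.

```python
from itertools import combinations


def cluster_from_pairs(names, is_same_family):
    """Group sequences into clusters from pairwise positive/negative decisions.

    Assumption: clusters are the connected components of the graph whose
    edges are the pairs classified as positive.
    """
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if is_same_family(a, b):
            parent[find(a)] = find(b)  # merge the two components

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical toy data: the "classifier" just compares known family labels.
toy_labels = {
    "mir-1": "miRNA",
    "mir-2": "miRNA",
    "tRNA-a": "tRNA",
    "tRNA-b": "tRNA",
    "sno-1": "snoRNA",
}


def toy_classifier(a, b):
    return toy_labels[a] == toy_labels[b]


clusters = cluster_from_pairs(list(toy_labels), toy_classifier)
print(clusters)
```

In a real setting the classifier would make some errors, and a single false positive is enough to merge two components, so a more conservative merging rule might be preferable.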

Original language: English
Pages (from-to): i237-i244
Journal: Bioinformatics
Volume: 34
Issue number: 13
DOI: 10.1093/bioinformatics/bty228
Publication status: Published - 2018 Jul 1

Fingerprint

Untranslated RNA
RNA
Cluster Analysis
Alignment
Neural Networks
Sequence Alignment
Nucleotides
Small Nucleolar RNA
Position-Specific Scoring Matrices
Clustering
DNA sequences
Motif Discovery
Pairwise
Transfer RNA
MicroRNAs
Nucleotide Motifs
Binding sites
Secondary Structure
Protein Binding

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Convolutional neural networks for classification of alignments of non-coding RNA sequences. / Aoki, Genta; Sakakibara, Yasubumi.

In: Bioinformatics, Vol. 34, No. 13, 01.07.2018, p. i237-i244.

@article{3ab4f6cf437344d0bf0ad3da2f23633b,
title = "Convolutional neural networks for classification of alignments of non-coding RNA sequences",
abstract = "Motivation: The convolutional neural network (CNN) has been applied to the classification problem of DNA sequences, with the additional purpose of motif discovery. The training of CNNs with distributed representations of four nucleotides has successfully derived position weight matrices on the learned kernels that corresponded to sequence motifs such as protein-binding sites. Results: We propose a novel application of CNNs to classification of pairwise alignments of sequences for accurate clustering of sequences and show the benefits of the CNN method of inputting pairwise alignments for clustering of non-coding RNA (ncRNA) sequences and for motif discovery. Classification of a pairwise alignment of two sequences into positive and negative classes corresponds to the clustering of the input sequences. After we combined the distributed representation of RNA nucleotides with the secondary-structure information specific to ncRNAs and furthermore with mapping profiles of next-generation sequence reads, the training of CNNs for classification of alignments of RNA sequences yielded accurate clustering in terms of ncRNA families and outperformed the existing clustering methods for ncRNA sequences. Several interesting sequence motifs and secondary-structure motifs known for the snoRNA family and specific to microRNA and tRNA families were identified.",
author = "Genta Aoki and Yasubumi Sakakibara",
year = "2018",
month = "7",
day = "1",
doi = "10.1093/bioinformatics/bty228",
language = "English",
volume = "34",
pages = "i237--i244",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "13",
}

TY - JOUR

T1 - Convolutional neural networks for classification of alignments of non-coding RNA sequences

AU - Aoki, Genta

AU - Sakakibara, Yasubumi

PY - 2018/7/1

Y1 - 2018/7/1

AB - Motivation: The convolutional neural network (CNN) has been applied to the classification problem of DNA sequences, with the additional purpose of motif discovery. The training of CNNs with distributed representations of four nucleotides has successfully derived position weight matrices on the learned kernels that corresponded to sequence motifs such as protein-binding sites. Results: We propose a novel application of CNNs to classification of pairwise alignments of sequences for accurate clustering of sequences and show the benefits of the CNN method of inputting pairwise alignments for clustering of non-coding RNA (ncRNA) sequences and for motif discovery. Classification of a pairwise alignment of two sequences into positive and negative classes corresponds to the clustering of the input sequences. After we combined the distributed representation of RNA nucleotides with the secondary-structure information specific to ncRNAs and furthermore with mapping profiles of next-generation sequence reads, the training of CNNs for classification of alignments of RNA sequences yielded accurate clustering in terms of ncRNA families and outperformed the existing clustering methods for ncRNA sequences. Several interesting sequence motifs and secondary-structure motifs known for the snoRNA family and specific to microRNA and tRNA families were identified.

UR - http://www.scopus.com/inward/record.url?scp=85050806688&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85050806688&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/bty228

DO - 10.1093/bioinformatics/bty228

M3 - Article

VL - 34

SP - i237

EP - i244

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 13

ER -