High speed error log control method in in-memory cluster computing platform

Ryuichi Saito, Shinichiro Haruyama

Research output: Contribution to journal › Article

Abstract

Since 2010, in-memory cluster computing platforms have been increasingly used in firms and research institutions to analyze large datasets within a short amount of time. On these platforms, unexpected errors can push the load on supporting infrastructure, such as a monitoring system, beyond what it was designed for, because jobs run multithreaded, partitioned datasets are assigned to multiple nodes, and the data is held in memory. In this research, we propose a method that notifies administrators of only the information needed to understand the situation within a short period. It eliminates duplicates among the numerous application error logs for that period and clusters the messages with k-means, an unsupervised learning method, on the in-memory cluster computing framework Apache Spark. By implementing this method, we demonstrate that duplicate error messages can be reduced by 93% on average compared with conventional methods. Further, significant messages can be extracted from the application error messages and delivered to administrators in an average of 4.2 min from the time the error occurs.
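
The pipeline the abstract outlines (deduplicate, vectorize, cluster, notify) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain Python rather than Apache Spark, and the function names, the digit-masking rule used to detect duplicates, the smoothed TF-IDF weighting, and the deterministic k-means initialization are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def normalize(msg):
    # Assumption: messages that differ only in numbers (ids, ports) are duplicates.
    return re.sub(r"\d+", "<n>", msg.lower()).strip()


def dedupe(messages):
    # Keep the first occurrence of each normalized message, with its count.
    counts, first = Counter(), {}
    for m in messages:
        key = normalize(m)
        counts[key] += 1
        first.setdefault(key, m)
    return [(first[k], counts[k]) for k in first]


def tfidf(docs):
    # Term frequency times a smoothed inverse document frequency.
    tokenized = [re.findall(r"[a-z_<>]+", normalize(d)) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        size = max(len(toks), 1)
        vectors.append(
            [tf[t] / size * (math.log((1 + n) / (1 + df[t])) + 1) for t in vocab]
        )
    return vectors


def dist2(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    # Lloyd's algorithm; the first k vectors seed the centroids (an assumption).
    k = min(k, len(vectors))
    centroids = [list(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(iterations):
        assign = [
            min(range(k), key=lambda c: dist2(v, centroids[c])) for v in vectors
        ]
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


def summarize(messages, k):
    # One representative (message, count) pair per non-empty cluster.
    unique = dedupe(messages)
    vectors = tfidf([m for m, _ in unique])
    assign, centroids = kmeans(vectors, k)
    reps = []
    for c in range(len(centroids)):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            best = min(members, key=lambda i: dist2(vectors[i], centroids[c]))
            reps.append(unique[best])
    return reps
```

Calling `summarize(log_lines, k)` on a batch of error lines collected over one period would return at most `k` representative messages, each paired with how many duplicates it stands for. In a Spark setting the same steps would be expressed with the framework's own feature extraction and clustering stages; the sketch only shows the shape of the computation.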

Original language: English
Pages (from-to): 310-319
Number of pages: 10
Journal: Journal of information processing
Volume: 28
Publication status: Published - 2020 May

Keywords

  • Distributed System
  • Error logs
  • K-means
  • Spark
  • TF-IDF

ASJC Scopus subject areas

  • Computer Science (all)
