Traffic Feature-Based Botnet Detection Scheme Emphasizing the Importance of Long Patterns

Yichen An, Shuichiro Haruta, Sanghun Choi, Iwao Sasase

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Botnet detection is imperative. Among existing detection schemes, a promising one uses communication sequences. Its main idea is that communication sequences exhibit distinctive features because they are controlled by programs. Each sequence is tokenized into truncated subsequences with n-grams, and the number of occurrences of each pattern is used as a feature vector. However, patterns with larger n occur fewer times than patterns with smaller n, yet the previous scheme normalizes every feature, regardless of n, by the total number of occurrences of all patterns. As a result, the normalized features of long patterns take very small values and are hidden by the others. To overcome this shortcoming, we propose a traffic feature-based botnet detection scheme emphasizing the importance of long patterns. We realize this emphasis with two ideas. The first is to normalize occurrences by the total number of occurrences for each n instead of the total number of occurrences of all patterns. In this way, the smaller occurrence counts for larger n are divided by a smaller total, so their features become larger and more balanced with those of shorter patterns. The second is to weight the normalized features according to their ranks. Weighting features by rank makes the features of longer patterns more prominent. Through computer simulation with a real dataset, we show the effectiveness of our scheme.
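The difference between the two normalizations can be illustrated with a small sketch. It assumes the communication sequence has already been converted into discrete tokens (the symbols "A", "B", "C" are hypothetical stand-ins) and that n ranges from 1 to 3; all function names are illustrative and do not come from the paper. The abstract does not give the rank-weighting formula, so `rank_weight` uses reciprocal rank within each n purely as a hypothetical placeholder.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]  # one n-gram of communication tokens


def ngram_counts(seq: Sequence[str], n_values: Sequence[int]) -> Dict[int, Counter]:
    """Tokenize `seq` into n-grams for each n and count each pattern's occurrences."""
    return {
        n: Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
        for n in n_values
    }


def normalize_global(counts: Dict[int, Counter]) -> Dict[Pattern, float]:
    """Previous scheme: divide by the total occurrences of all patterns of every n."""
    total = sum(sum(c.values()) for c in counts.values())
    return {p: k / total for c in counts.values() for p, k in c.items()}


def normalize_per_n(counts: Dict[int, Counter]) -> Dict[Pattern, float]:
    """Idea 1: divide each count by the total occurrences for its own n."""
    features: Dict[Pattern, float] = {}
    for c in counts.values():
        total_n = sum(c.values())
        for p, k in c.items():
            features[p] = k / total_n
    return features


def rank_weight(features: Dict[Pattern, float]) -> Dict[Pattern, float]:
    """Idea 2, HYPOTHETICAL weighting: within each n, rank patterns by
    normalized value (rank 1 = largest) and multiply each value by 1 / rank."""
    by_n: Dict[int, List[Tuple[Pattern, float]]] = {}
    for p, v in features.items():
        by_n.setdefault(len(p), []).append((p, v))
    weighted: Dict[Pattern, float] = {}
    for items in by_n.values():
        # Sort by descending value; ties are broken by the pattern itself.
        items.sort(key=lambda pv: (-pv[1], pv[0]))
        for rank, (p, v) in enumerate(items, start=1):
            weighted[p] = v / rank
    return weighted


if __name__ == "__main__":
    # Hypothetical token sequence, e.g. one token per packet or flow event.
    seq = ["A", "B", "A", "B", "A", "C"]
    counts = ngram_counts(seq, n_values=(1, 2, 3))
    glob = normalize_global(counts)
    local = normalize_per_n(counts)
    print(round(glob[("A", "B", "A")], 3), local[("A", "B", "A")])  # 0.133 0.5
```

On this toy sequence there are 6 + 5 + 4 = 15 n-gram occurrences in total, so the 3-gram ABA (2 occurrences) gets 2/15 ≈ 0.133 under global normalization but 2/4 = 0.5 under per-n normalization. That is the same value as the most frequent 1-gram A (3/6), which illustrates the balancing effect described in the abstract.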

Original language: English
Title of host publication: Image Processing and Communications - Techniques, Algorithms and Applications, IP and C 2019
Editors: Michal Choras, Ryszard S. Choras
Publisher: Springer Verlag
Pages: 181-188
Number of pages: 8
ISBN (Print): 9783030312534
DOI: 10.1007/978-3-030-31254-1_22
Publication status: Published - 2020 Jan 1
Event: International Conference on Image Processing and Communications, IP and C 2019 - Bydgoszcz, Poland
Duration: 2019 Sep 11 to 2019 Sep 13

Publication series

Name: Advances in Intelligent Systems and Computing
Volume: 1062
ISSN (Print): 2194-5357
ISSN (Electronic): 2194-5365

Conference

Conference: International Conference on Image Processing and Communications, IP and C 2019
Country: Poland
City: Bydgoszcz
Period: 19/9/11 to 19/9/13

Keywords

  • Botnet detection
  • Detection algorithms
  • Feature emphasizing

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Computer Science(all)

Cite this

An, Y., Haruta, S., Choi, S., & Sasase, I. (2020). Traffic Feature-Based Botnet Detection Scheme Emphasizing the Importance of Long Patterns. In M. Choras, & R. S. Choras (Eds.), Image Processing and Communications - Techniques, Algorithms and Applications, IP and C 2019 (pp. 181-188). (Advances in Intelligent Systems and Computing; Vol. 1062). Springer Verlag. https://doi.org/10.1007/978-3-030-31254-1_22

Traffic Feature-Based Botnet Detection Scheme Emphasizing the Importance of Long Patterns. / An, Yichen; Haruta, Shuichiro; Choi, Sanghun; Sasase, Iwao.

Image Processing and Communications - Techniques, Algorithms and Applications, IP and C 2019. ed. / Michal Choras; Ryszard S. Choras. Springer Verlag, 2020. p. 181-188 (Advances in Intelligent Systems and Computing; Vol. 1062).

An, Y, Haruta, S, Choi, S & Sasase, I 2020, Traffic Feature-Based Botnet Detection Scheme Emphasizing the Importance of Long Patterns. in M Choras & RS Choras (eds), Image Processing and Communications - Techniques, Algorithms and Applications, IP and C 2019. Advances in Intelligent Systems and Computing, vol. 1062, Springer Verlag, pp. 181-188, International Conference on Image Processing and Communications, IP and C 2019, Bydgoszcz, Poland, 19/9/11. https://doi.org/10.1007/978-3-030-31254-1_22
An Y, Haruta S, Choi S, Sasase I. Traffic Feature-Based Botnet Detection Scheme Emphasizing the Importance of Long Patterns. In Choras M, Choras RS, editors, Image Processing and Communications - Techniques, Algorithms and Applications, IP and C 2019. Springer Verlag. 2020. p. 181-188. (Advances in Intelligent Systems and Computing). https://doi.org/10.1007/978-3-030-31254-1_22
An, Yichen ; Haruta, Shuichiro ; Choi, Sanghun ; Sasase, Iwao. / Traffic Feature-Based Botnet Detection Scheme Emphasizing the Importance of Long Patterns. Image Processing and Communications - Techniques, Algorithms and Applications, IP and C 2019. editor / Michal Choras ; Ryszard S. Choras. Springer Verlag, 2020. pp. 181-188 (Advances in Intelligent Systems and Computing).
@inproceedings{9c252da6b66144e695491bad364faa38,
title = "Traffic Feature-Based Botnet Detection Scheme Emphasizing the Importance of Long Patterns",
abstract = "The botnet detection is imperative. Among several detection schemes, the promising one uses the communication sequences. The main idea of that scheme is that the communication sequences represent special feature since they are controlled by programs. That sequence is tokenized to truncated sequences by n-gram and the numbers of each pattern’s occurrence are used as a feature vector. However, although the features are normalized by the total number of all patterns’ occurrences, the number of occurrences in larger n are less than those of smaller n. That is, regardless of the value of n, the previous scheme normalizes it by the total number of all patterns’ occurrences. As a result, normalized long patterns’ features become very small value and are hidden by others. In order to overcome this shortcoming, in this paper, we propose a traffic feature-based botnet detection scheme emphasizing the importance of long patterns. We realize the emphasizing by two ideas. The first idea is normalizing occurrences by the total number of occurrences in each n instead of the total number of all patterns’ occurrences. By doing this, smaller occurrences in larger n are normalized by smaller values and the feature becomes more balanced with larger value. The second idea is giving weights to the normalized features by calculating ranks of the normalized feature. By weighting features according to the ranks, we can get more outstanding features of longer patterns. By the computer simulation with real dataset, we show the effectiveness of our scheme.",
keywords = "Botnet detection, Detection algorithms, Feature emphasizing",
author = "Yichen An and Shuichiro Haruta and Sanghun Choi and Iwao Sasase",
year = "2020",
month = "1",
day = "1",
doi = "10.1007/978-3-030-31254-1_22",
language = "English",
isbn = "9783030312534",
series = "Advances in Intelligent Systems and Computing",
publisher = "Springer Verlag",
pages = "181--188",
editor = "Michal Choras and Choras, {Ryszard S.}",
booktitle = "Image Processing and Communications - Techniques, Algorithms and Applications, IP and C 2019",
address = "Germany",

}

TY - GEN

T1 - Traffic Feature-Based Botnet Detection Scheme Emphasizing the Importance of Long Patterns

AU - An, Yichen

AU - Haruta, Shuichiro

AU - Choi, Sanghun

AU - Sasase, Iwao

PY - 2020/1/1

Y1 - 2020/1/1

N2 - The botnet detection is imperative. Among several detection schemes, the promising one uses the communication sequences. The main idea of that scheme is that the communication sequences represent special feature since they are controlled by programs. That sequence is tokenized to truncated sequences by n-gram and the numbers of each pattern’s occurrence are used as a feature vector. However, although the features are normalized by the total number of all patterns’ occurrences, the number of occurrences in larger n are less than those of smaller n. That is, regardless of the value of n, the previous scheme normalizes it by the total number of all patterns’ occurrences. As a result, normalized long patterns’ features become very small value and are hidden by others. In order to overcome this shortcoming, in this paper, we propose a traffic feature-based botnet detection scheme emphasizing the importance of long patterns. We realize the emphasizing by two ideas. The first idea is normalizing occurrences by the total number of occurrences in each n instead of the total number of all patterns’ occurrences. By doing this, smaller occurrences in larger n are normalized by smaller values and the feature becomes more balanced with larger value. The second idea is giving weights to the normalized features by calculating ranks of the normalized feature. By weighting features according to the ranks, we can get more outstanding features of longer patterns. By the computer simulation with real dataset, we show the effectiveness of our scheme.

AB - The botnet detection is imperative. Among several detection schemes, the promising one uses the communication sequences. The main idea of that scheme is that the communication sequences represent special feature since they are controlled by programs. That sequence is tokenized to truncated sequences by n-gram and the numbers of each pattern’s occurrence are used as a feature vector. However, although the features are normalized by the total number of all patterns’ occurrences, the number of occurrences in larger n are less than those of smaller n. That is, regardless of the value of n, the previous scheme normalizes it by the total number of all patterns’ occurrences. As a result, normalized long patterns’ features become very small value and are hidden by others. In order to overcome this shortcoming, in this paper, we propose a traffic feature-based botnet detection scheme emphasizing the importance of long patterns. We realize the emphasizing by two ideas. The first idea is normalizing occurrences by the total number of occurrences in each n instead of the total number of all patterns’ occurrences. By doing this, smaller occurrences in larger n are normalized by smaller values and the feature becomes more balanced with larger value. The second idea is giving weights to the normalized features by calculating ranks of the normalized feature. By weighting features according to the ranks, we can get more outstanding features of longer patterns. By the computer simulation with real dataset, we show the effectiveness of our scheme.

KW - Botnet detection

KW - Detection algorithms

KW - Feature emphasizing

UR - http://www.scopus.com/inward/record.url?scp=85072851734&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85072851734&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-31254-1_22

DO - 10.1007/978-3-030-31254-1_22

M3 - Conference contribution

AN - SCOPUS:85072851734

SN - 9783030312534

T3 - Advances in Intelligent Systems and Computing

SP - 181

EP - 188

BT - Image Processing and Communications - Techniques, Algorithms and Applications, IP and C 2019

A2 - Choras, Michal

A2 - Choras, Ryszard S.

PB - Springer Verlag

ER -