TY - GEN
T1 - Dictionary matching
T2 - 8th International Conference on Information Systems and Technologies, ICIST 2018
AU - Qiao, ZhanPeng
AU - Goto, Kento
AU - Ohshima, Takuya
AU - Tajima, Masahiro
AU - Toyama, Motomichi
N1 - Publisher Copyright: © 2018 Association for Computing Machinery.
PY - 2018/3/16
Y1 - 2018/3/16
N2 - Pattern-matching techniques have recently been applied to network security applications such as intrusion detection, virus protection, and spam filtering. The widely used Aho-Corasick algorithm can match multiple patterns simultaneously while providing a worst-case performance guarantee. However, the traditional Aho-Corasick algorithm has several problems with large-scale dictionary matching: it requires a large amount of memory, building the trie takes a long time, and updates are expensive because the trie must be rebuilt. In this paper, we summarize the original Aho-Corasick algorithm (AC) and several recent extensions of it: the parallel Aho-Corasick algorithm (PAC) and the parallel failure-less Aho-Corasick algorithm (PFAC). PAC uses multiple threads to process the input text by dividing it into several parts, but this introduces the boundary detection problem. PFAC, which is relatively popular, uses parallel processing on the GPU and discards the failure function to increase AC throughput. PFAC not only speeds up the matching process but also solves the boundary detection problem. In the original PFAC paper, only a pattern-matching test with a small dictionary was conducted. In this paper, we verify the feasibility of using PFAC with large-scale dictionaries and discuss the possibility of using the PFAC algorithm to match large-scale pattern dictionaries. Finally, we present our proposals for extending PFAC to large-scale dictionary pattern matching. One, called the pattern cut approach, cuts each pattern at the prefix it shares with other words. The other divides the large automaton according to word prefixes and constructs the sub-automata separately; when the input text is processed, each word is handled by the sub-automaton assigned to its prefix. In this way, we can improve parallel processing efficiency.
AB - Pattern-matching techniques have recently been applied to network security applications such as intrusion detection, virus protection, and spam filtering. The widely used Aho-Corasick algorithm can match multiple patterns simultaneously while providing a worst-case performance guarantee. However, the traditional Aho-Corasick algorithm has several problems with large-scale dictionary matching: it requires a large amount of memory, building the trie takes a long time, and updates are expensive because the trie must be rebuilt. In this paper, we summarize the original Aho-Corasick algorithm (AC) and several recent extensions of it: the parallel Aho-Corasick algorithm (PAC) and the parallel failure-less Aho-Corasick algorithm (PFAC). PAC uses multiple threads to process the input text by dividing it into several parts, but this introduces the boundary detection problem. PFAC, which is relatively popular, uses parallel processing on the GPU and discards the failure function to increase AC throughput. PFAC not only speeds up the matching process but also solves the boundary detection problem. In the original PFAC paper, only a pattern-matching test with a small dictionary was conducted. In this paper, we verify the feasibility of using PFAC with large-scale dictionaries and discuss the possibility of using the PFAC algorithm to match large-scale pattern dictionaries. Finally, we present our proposals for extending PFAC to large-scale dictionary pattern matching. One, called the pattern cut approach, cuts each pattern at the prefix it shares with other words. The other divides the large automaton according to word prefixes and constructs the sub-automata separately; when the input text is processed, each word is handled by the sub-automaton assigned to its prefix. In this way, we can improve parallel processing efficiency.
KW - Aho-Corasick algorithm
KW - Dictionary matching
KW - Parallel Aho-Corasick algorithm
KW - Parallel algorithm
KW - Parallel failure-less Aho-Corasick algorithm
KW - Pattern matching
KW - Trie
UR - http://www.scopus.com/inward/record.url?scp=85047997236&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85047997236&partnerID=8YFLogxK
U2 - 10.1145/3200842.3200850
DO - 10.1145/3200842.3200850
M3 - Conference contribution
AN - SCOPUS:85047997236
T3 - ACM International Conference Proceeding Series
BT - Proceedings of the 8th International Conference on Information Systems and Technologies, ICIST 2018
PB - Association for Computing Machinery
Y2 - 16 March 2018 through 18 March 2018
ER -