TY - JOUR
T1 - Revisiting the vector space model
T2 - Sparse weighted nearest-neighbor method for extreme multi-label classification
AU - Aoshima, Tatsuhiro
AU - Kobayashi, Kei
AU - Minami, Mihoko
N1 - Publisher Copyright:
Copyright © 2018, The Authors. All rights reserved.
Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2018/2/12
Y1 - 2018/2/12
N2 - Machine learning has played an important role in information retrieval (IR) in recent times. In search engines, for example, query keywords are accepted and documents are returned in order of relevance to the given query; this can be cast as a multi-label ranking problem in machine learning. Generally, the number of candidate documents is extremely large (from several thousand to several million); thus, the classifier must handle many labels. This problem is referred to as extreme multi-label classification (XMLC). In this paper, we propose a novel approach to XMLC termed the Sparse Weighted Nearest-Neighbor Method. This technique can be derived as a fast implementation of state-of-the-art (SOTA) one-versus-rest linear classifiers for very sparse datasets. In addition, we show that the classifier can be written as a sparse generalization of a representer theorem with a linear kernel. Furthermore, our method can be viewed as the vector space model used in IR. Finally, we show that the Sparse Weighted Nearest-Neighbor Method can process data points in real time on XMLC datasets with equivalent performance to SOTA models, with a single thread and smaller storage footprint. In particular, our method exhibits superior performance to the SOTA models on a dataset with 3 million labels.
AB - Machine learning has played an important role in information retrieval (IR) in recent times. In search engines, for example, query keywords are accepted and documents are returned in order of relevance to the given query; this can be cast as a multi-label ranking problem in machine learning. Generally, the number of candidate documents is extremely large (from several thousand to several million); thus, the classifier must handle many labels. This problem is referred to as extreme multi-label classification (XMLC). In this paper, we propose a novel approach to XMLC termed the Sparse Weighted Nearest-Neighbor Method. This technique can be derived as a fast implementation of state-of-the-art (SOTA) one-versus-rest linear classifiers for very sparse datasets. In addition, we show that the classifier can be written as a sparse generalization of a representer theorem with a linear kernel. Furthermore, our method can be viewed as the vector space model used in IR. Finally, we show that the Sparse Weighted Nearest-Neighbor Method can process data points in real time on XMLC datasets with equivalent performance to SOTA models, with a single thread and smaller storage footprint. In particular, our method exhibits superior performance to the SOTA models on a dataset with 3 million labels.
UR - http://www.scopus.com/inward/record.url?scp=85093720593&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85093720593&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85093720593
JO - Mathematical Social Sciences
JF - Mathematical Social Sciences
SN - 0165-4896
ER -