Abstract
It is widely believed that unlabeled data can improve prediction accuracy in classification problems. Although theoretical studies of when and how unlabeled data are beneficial exist, the actual improvement in prediction for finite samples has not been investigated systematically. We investigate the impact of unlabeled data in linear discriminant analysis and compare the error rates of classifiers estimated with and without unlabeled data. Our focus is the labeling mechanism, which characterizes the probabilistic structure by which labeled cases occur. The results imply that an extremely small proportion of unlabeled data has a large effect on the analysis results.
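The comparison described above can be illustrated with a small Monte Carlo sketch. This is not the authors' simulation design. It assumes a univariate two-class Gaussian model with a common variance. The labeled-only classifier is ordinary LDA. The classifier that uses unlabeled data is fitted by EM for a two-component Gaussian mixture in which the labeled cases keep their known classes. The labeling mechanism is a hypothetical logistic model, P(labeled | x) = sigmoid(a + b·x). Setting b = 0 makes labels missing completely at random, and b ≠ 0 makes labeling depend on x. All function and parameter names are illustrative.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def phi(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def discriminant(m0, m1, s2, p1):
    # 1-D LDA rule: predict class 1 when w * x + c > 0 (log posterior odds).
    w = (m1 - m0) / s2
    c = math.log(p1 / (1.0 - p1)) - (m1 ** 2 - m0 ** 2) / (2.0 * s2)
    return w, c


def true_error(w, c, mu, sigma, prior1):
    # Exact error rate of the rule (w, c) under the true Gaussian model.
    if w == 0:
        return (1.0 - prior1) if c > 0 else prior1
    t = -c / w  # decision threshold
    above0 = 1.0 - phi((t - mu[0]) / sigma)
    above1 = 1.0 - phi((t - mu[1]) / sigma)
    if w > 0:  # class 1 is predicted above t
        return (1.0 - prior1) * above0 + prior1 * (1.0 - above1)
    return (1.0 - prior1) * (1.0 - above0) + prior1 * above1


def fit_labeled(xs, ys):
    # Labeled-only LDA: class means, pooled unbiased variance, class proportion.
    n1 = sum(ys)
    n0 = len(ys) - n1
    m1 = sum(x for x, y in zip(xs, ys) if y == 1) / n1
    m0 = sum(x for x, y in zip(xs, ys) if y == 0) / n0
    s2 = sum((x - (m1 if y else m0)) ** 2 for x, y in zip(xs, ys)) / (len(ys) - 2)
    return m0, m1, s2, n1 / len(ys)


def fit_em(xs, ys, xu, iters=200):
    # EM for a two-component common-variance Gaussian mixture (ML estimates);
    # labeled responsibilities stay fixed at their 0/1 labels.
    m0, m1, s2, p1 = fit_labeled(xs, ys)
    allx = xs + xu
    n = len(allx)
    for _ in range(iters):
        w, c = discriminant(m0, m1, s2, p1)
        r = list(ys) + [sigmoid(w * x + c) for x in xu]  # E-step
        n1 = sum(r)  # M-step
        m1 = sum(ri * x for ri, x in zip(r, allx)) / n1
        m0 = sum((1 - ri) * x for ri, x in zip(r, allx)) / (n - n1)
        s2 = sum(ri * (x - m1) ** 2 + (1 - ri) * (x - m0) ** 2
                 for ri, x in zip(r, allx)) / n
        p1 = n1 / n
    return m0, m1, s2, p1


def simulate(n=50, mu=(0.0, 2.0), sigma=1.0, prior1=0.5, a=0.0, b=0.0,
             reps=100, seed=1):
    # Hypothetical labeling mechanism: P(labeled | x) = sigmoid(a + b * x).
    rng = random.Random(seed)
    tot_l = tot_em = 0.0
    for _ in range(reps):
        while True:  # redraw until each class has at least 2 labeled cases
            xs, ys, xu = [], [], []
            for _ in range(n):
                y = 1 if rng.random() < prior1 else 0
                x = rng.gauss(mu[y], sigma)
                if rng.random() < sigmoid(a + b * x):
                    xs.append(x)
                    ys.append(y)
                else:
                    xu.append(x)
            if min(sum(ys), len(ys) - sum(ys)) >= 2:
                break
        tot_l += true_error(*discriminant(*fit_labeled(xs, ys)), mu, sigma, prior1)
        tot_em += true_error(*discriminant(*fit_em(xs, ys, xu)), mu, sigma, prior1)
    bayes = true_error(*discriminant(mu[0], mu[1], sigma ** 2, prior1),
                       mu, sigma, prior1)
    return {"bayes": bayes, "labeled_only": tot_l / reps,
            "with_unlabeled": tot_em / reps}
```

For example, `simulate(b=0.0)` and `simulate(b=1.0)` contrast a random labeling mechanism with one in which larger x values are more likely to be labeled. Varying `a` changes the overall proportion of unlabeled cases. Because the error rates are computed exactly from the true model instead of from a test sample, the Monte Carlo noise comes only from the training draws.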
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 184-203 |
Number of pages | 20 |
Journal | Communications in Statistics: Simulation and Computation |
Volume | 46 |
Issue number | 1 |
DOIs | |
Publication status | Published - 2017 Jan 2 |
Externally published | Yes |
Keywords
- Classification error
- Missing data
- Monte Carlo simulation
- Relative efficiency
- Semi-supervised learning
ASJC Scopus subject areas
- Statistics and Probability
- Modelling and Simulation