Finite-sample analysis of impacts of unlabeled data and their labeling mechanisms in linear discriminant analysis

Kenichi Hayashi, Keiji Takai

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

It is widely believed that unlabeled data can improve prediction accuracy in classification problems. Although theoretical studies exist on when and how unlabeled data are beneficial, the actual improvement in prediction for finite samples has not been investigated systematically. We investigate the impact of unlabeled data in linear discriminant analysis and compare the error rates of classifiers estimated with and without unlabeled data. Our focus is the labeling mechanism, which characterizes the probabilistic structure governing which cases are labeled. The results imply that an extremely small proportion of unlabeled data has a large effect on the analysis results.
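The comparison described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' simulation design. It assumes a univariate two-class normal model with a common variance (the population values `PI1`, `MU0`, `MU1`, `SIGMA` are arbitrary choices), a labeling mechanism supplied as a hypothetical function `label_prob(x)` giving each case's probability of being labeled, and EM as the way unlabeled cases enter the estimates. It reports the average true error rate of the plug-in LDA rule fitted to the labeled cases only and fitted with the unlabeled cases as well.

```python
import math
import random
from statistics import NormalDist

# Assumed population: two univariate normal classes with a common variance.
PI1, MU0, MU1, SIGMA = 0.5, 0.0, 2.0, 1.0


def simulate(n, label_prob, rng):
    """Draw n cases; case i is labeled with probability label_prob(x_i)."""
    xs, ys, labeled = [], [], []
    for _ in range(n):
        y = 1 if rng.random() < PI1 else 0
        x = rng.gauss(MU1 if y else MU0, SIGMA)
        xs.append(x)
        ys.append(y)
        labeled.append(rng.random() < label_prob(x))
    return xs, ys, labeled


def m_step(xs, w):
    """Weighted ML estimates (pi1, mu0, mu1, pooled variance); w[i] = P(y_i = 1)."""
    n = len(xs)
    n1 = sum(w)
    mu1 = sum(wi * x for wi, x in zip(w, xs)) / n1
    mu0 = sum((1 - wi) * x for wi, x in zip(w, xs)) / (n - n1)
    var = sum(wi * (x - mu1) ** 2 + (1 - wi) * (x - mu0) ** 2
              for wi, x in zip(w, xs)) / n
    return n1 / n, mu0, mu1, var


def lda_error(pi1, mu0, mu1, var):
    """Exact error rate, under the true population, of the plug-in LDA rule."""
    # Predict class 1 when a*x + b > 0 (linear discriminant with common variance).
    a = (mu1 - mu0) / var
    b = -(mu1 ** 2 - mu0 ** 2) / (2 * var) + math.log(pi1 / (1 - pi1))
    d0, d1 = NormalDist(MU0, SIGMA), NormalDist(MU1, SIGMA)
    if a == 0:  # degenerate rule: every x gets the same class
        return (1 - PI1) if b > 0 else PI1
    t = -b / a
    if a > 0:  # predict class 1 when x > t
        return (1 - PI1) * (1 - d0.cdf(t)) + PI1 * d1.cdf(t)
    return (1 - PI1) * d0.cdf(t) + PI1 * (1 - d1.cdf(t))


def em_fit(xs, ys, labeled, iters=30):
    """Semi-supervised fit: labeled weights fixed at 0/1, unlabeled weights from the E-step.

    ys is only read for labeled cases; labels of unlabeled cases are never used.
    """
    lab = [i for i, obs in enumerate(labeled) if obs]
    # Start from the labeled-only (supervised) estimates.
    theta = m_step([xs[i] for i in lab], [float(ys[i]) for i in lab])
    for _ in range(iters):
        pi1, mu0, mu1, var = theta
        sd = math.sqrt(var)
        d0, d1 = NormalDist(mu0, sd), NormalDist(mu1, sd)
        w = []
        for x, y, obs in zip(xs, ys, labeled):
            if obs:
                w.append(float(y))
            else:  # posterior probability of class 1 under current estimates
                p1, p0 = pi1 * d1.pdf(x), (1 - pi1) * d0.pdf(x)
                w.append(p1 / (p0 + p1))
        theta = m_step(xs, w)
    return theta


def monte_carlo(reps=100, n=200, label_prob=lambda x: 0.2, seed=1):
    """Average true error of supervised vs. semi-supervised LDA over reps datasets.

    Assumes both classes appear among the labeled cases in every replicate.
    """
    rng = random.Random(seed)
    err_sup = err_semi = 0.0
    for _ in range(reps):
        xs, ys, labeled = simulate(n, label_prob, rng)
        lab = [i for i, obs in enumerate(labeled) if obs]
        sup = m_step([xs[i] for i in lab], [float(ys[i]) for i in lab])
        err_sup += lda_error(*sup)
        err_semi += lda_error(*em_fit(xs, ys, labeled))
    return err_sup / reps, err_semi / reps
```

The default `label_prob` labels every case with the same probability, independently of x and y. Passing, for example, `label_prob=lambda x: 0.1 if x < 1 else 0.4` makes labeling depend on x; the sketch makes no claim about which estimator does better under any particular mechanism.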

Original language: English
Pages (from-to): 184-203
Number of pages: 20
Journal: Communications in Statistics: Simulation and Computation
Volume: 46
Issue number: 1
Publication status: Published, 2 January 2017
Externally published: Yes

Keywords

  • Classification error
  • Missing data
  • Monte Carlo simulation
  • Relative efficiency
  • Semi-supervised learning

ASJC Scopus subject areas

  • Statistics and Probability
  • Modelling and Simulation
