Clustering of multivariate binary data with dimension reduction via L1-regularized likelihood maximization

Michio Yamamoto, Kenichi Hayashi

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

Clustering methods with dimension reduction have recently received considerable interest in statistics, and many methods that perform clustering and dimension reduction simultaneously have been proposed. This work presents a novel procedure for simultaneously determining the optimal cluster structure for multivariate binary data and the subspace that represents that cluster structure. The method is based on a finite mixture model of multivariate Bernoulli distributions, in which each component is assumed to have a low-dimensional representation of the cluster structure. The method can be considered an extension of traditional latent class analysis. Sparsity is introduced into the loadings that define the low-dimensional subspace, which enhances interpretability and makes the extraction of the subspace more stable. An EM-based algorithm is developed to solve the proposed optimization problem efficiently. We demonstrate the effectiveness of the proposed method through a simulation study and applications to real datasets.
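The abstract describes the method as an extension of traditional latent class analysis, that is, a finite mixture of multivariate Bernoulli distributions fitted by EM, with an L1 penalty producing sparse loadings. The following is a minimal sketch, assuming Python with NumPy, of only those two generic ingredients: the unpenalized EM for latent class analysis, and the soft-thresholding operator that is the standard way an L1 penalty sets small coefficients exactly to zero. The function names `lca_em` and `soft_threshold` are hypothetical. This is not the authors' algorithm; the low-dimensional parameterization of the components and the penalized M-step of the paper are not reproduced here.

```python
import numpy as np

# Hypothetical illustration of the baseline model; NOT the paper's penalized algorithm.


def soft_threshold(z, lam):
    """Proximal operator of lam * |z|: shrinks entries toward zero and
    sets entries with |z| <= lam exactly to zero (the source of L1 sparsity)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def lca_em(X, n_classes, n_iter=200, tol=1e-8, seed=0):
    """EM for a finite mixture of multivariate Bernoulli distributions
    (traditional latent class analysis; no dimension reduction, no penalty).

    X: (n, p) array of 0/1 values.
    Returns mixing weights, per-class item probabilities, posterior class
    memberships from the last E-step, and the log-likelihood at that step.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    weights = np.full(n_classes, 1.0 / n_classes)
    # Random start for the success probability of each item in each class.
    probs = rng.uniform(0.25, 0.75, size=(n_classes, p))
    eps = 1e-10
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log weight_k + sum_j log Bernoulli(x_ij | probs_kj), in log space.
        log_p = np.log(probs + eps)
        log_q = np.log(1.0 - probs + eps)
        log_joint = X @ log_p.T + (1 - X) @ log_q.T + np.log(weights + eps)
        log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
        resp = np.exp(log_joint - log_norm)
        ll = float(log_norm.sum())
        # M-step: closed-form updates, available because there is no penalty here.
        nk = resp.sum(axis=0)
        weights = nk / n
        probs = (resp.T @ X) / np.maximum(nk, eps)[:, None]
        # EM never decreases the log-likelihood, so stop once the gain is tiny.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, probs, resp, ll
```

In a penalized variant, a closed-form M-step like the one above would typically be replaced by an iterative update in which `soft_threshold` is applied to the loading parameters; how the paper structures that step is not stated in the abstract, so it is left out of the sketch.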

Original language: English
Article number: 5443
Pages (from-to): 3959-3968
Number of pages: 10
Journal: Pattern Recognition
Volume: 48
Issue number: 12
Publication status: Published, 2015 Dec 1
Externally published: Yes

Keywords

  • Binary data
  • Clustering
  • Dimension reduction
  • EM algorithm
  • Latent class analysis
  • Sparsity

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence

