Abstract
Clustered data are ubiquitous across scientific fields. In this article, we propose a flexible and interpretable modeling approach for clustered data, called grouped heterogeneous mixture modeling, which models cluster-wise conditional distributions as mixtures of latent conditional distributions common to all clusters. In the model, we assume that the clusters are divided into a finite number of groups and that the mixing proportions are shared within each group. We provide a simple generalized EM algorithm for computing the maximum likelihood estimator, and an information criterion for selecting the numbers of groups and latent distributions. We also propose structured grouping strategies that introduce penalties on the grouping parameters in the likelihood function. In the setting where both the number of clusters and the cluster sizes tend to infinity, we present asymptotic properties of the maximum likelihood estimator and the information criterion. We demonstrate the proposed method through simulation studies and an application to crime risk modeling in Tokyo.
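
As a reading aid, the model described in the abstract might be written as follows. The notation is an assumption introduced here for illustration, since the abstract itself fixes no symbols: $y_{ij}$ and $x_{ij}$ denote the response and covariates of unit $j$ in cluster $i$ (with $m$ clusters of sizes $n_i$), $g_i \in \{1,\dots,G\}$ is the group to which cluster $i$ belongs, $h_k$ is the $k$-th of $K$ latent conditional distributions with parameter $\phi_k$, and $\pi_{gk}$ is the mixing proportion of component $k$ in group $g$.

```latex
% Cluster-wise conditional distribution: a mixture of K latent
% conditional distributions shared by all clusters, with mixing
% proportions determined by the group g_i of cluster i (assumed notation).
f_i(y \mid x) = \sum_{k=1}^{K} \pi_{g_i k}\, h_k(y \mid x; \phi_k),
\qquad \pi_{gk} \ge 0, \quad \sum_{k=1}^{K} \pi_{gk} = 1 .

% Corresponding log-likelihood over all clusters and units, viewed as a
% function of the component parameters, mixing proportions, and grouping.
\ell(\phi, \pi, g) = \sum_{i=1}^{m} \sum_{j=1}^{n_i}
  \log \left\{ \sum_{k=1}^{K} \pi_{g_i k}\, h_k(y_{ij} \mid x_{ij}; \phi_k) \right\} .
```

Under this sketch, two clusters in the same group share the same vector of mixing proportions, so heterogeneity between clusters is captured by $G$ such vectors rather than one per cluster.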
Original language | English |
---|---|
Pages (from-to) | 999-1010 |
Number of pages | 12 |
Journal | Journal of the American Statistical Association |
Volume | 116 |
Issue number | 534 |
Publication status | Published - 2021 |
Externally published | Yes |
Keywords
- EM algorithm
- Finite mixture
- Maximum likelihood estimation
- Mixture of experts
- Unobserved heterogeneity
ASJC Scopus subject areas
- Statistics and Probability
- Statistics, Probability and Uncertainty