Semiparametric Bayesian multiple imputation for regression models with missing mixed continuous–discrete covariates

Research output: Contribution to journal › Article

Abstract

Missing data are a critical issue in observational and experimental research. For datasets with mixed continuous–discrete variables, multiple imputation by chained equations (MICE) has recently become widely used, although it may yield severely biased estimates. We propose a new semiparametric Bayesian multiple imputation approach that handles both continuous and discrete variables. It overcomes a key shortcoming of MICE: the conditional models used in MICE must satisfy strong conditions (known as compatibility) to guarantee that the resulting estimators are consistent. Our simulation studies show that the coverage probability of the 95% interval calculated using MICE can be less than 1%, while the MSE of the proposed method can be less than one-fiftieth of that obtained with MICE. We applied our method to the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset; the results are consistent with those of previous studies that used panel data other than the ADNI database, whereas existing methods such as MICE gave results inconsistent with those studies.
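
The two simulation metrics quoted in the abstract, coverage probability of a 95% interval and mean squared error (MSE), can be illustrated with a minimal sketch. The function name `coverage_and_mse`, the Wald-type interval (estimate ± 1.96 × standard error), and the toy normal estimators below are assumptions made for illustration only; they are not the authors' simulation design or imputation method.

```python
import random


def coverage_and_mse(estimates, std_errors, true_value, z=1.96):
    """Return (empirical coverage, MSE) over simulation replicates.

    Coverage is the share of Wald intervals est +/- z * se that contain
    true_value; MSE is the mean of (est - true_value) ** 2.
    """
    n = len(estimates)
    covered = 0
    squared_error = 0.0
    for est, se in zip(estimates, std_errors):
        lower, upper = est - z * se, est + z * se
        if lower <= true_value <= upper:
            covered += 1
        squared_error += (est - true_value) ** 2
    return covered / n, squared_error / n


# Toy illustration (hypothetical numbers): one estimator is unbiased,
# the other carries a bias of 0.5, five times its standard error.
rng = random.Random(0)
true_beta = 1.0
se = 0.1
n_rep = 2000

unbiased = [rng.gauss(true_beta, se) for _ in range(n_rep)]
biased = [rng.gauss(true_beta + 0.5, se) for _ in range(n_rep)]

cov_unbiased, mse_unbiased = coverage_and_mse(unbiased, [se] * n_rep, true_beta)
cov_biased, mse_biased = coverage_and_mse(biased, [se] * n_rep, true_beta)

print(f"unbiased: coverage={cov_unbiased:.3f}, MSE={mse_unbiased:.4f}")
print(f"biased:   coverage={cov_biased:.3f}, MSE={mse_biased:.4f}")
```

In this toy setting the biased estimator has both a far lower coverage and a far larger MSE, which is the kind of contrast the abstract reports between MICE and the proposed approach.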

Original language: English
Journal: Annals of the Institute of Statistical Mathematics
DOIs: 10.1007/s10463-019-00710-w
Publication status: Published - 2019 Jan 1

Keywords

  • Full conditional specification
  • Missing data
  • Multiple imputation
  • Probit stick-breaking process mixture
  • Semiparametric Bayes model

ASJC Scopus subject areas

  • Statistics and Probability

Cite this

@article{865ddcc16ed84314becca99aa6d439ce,
title = "Semiparametric Bayesian multiple imputation for regression models with missing mixed continuous–discrete covariates",
abstract = "Issues regarding missing data are critical in observational and experimental research. Recently, for datasets with mixed continuous–discrete variables, multiple imputation by chained equation (MICE) has been widely used, although MICE may yield severely biased estimates. We propose a new semiparametric Bayes multiple imputation approach that can deal with continuous and discrete variables. This enables us to overcome the shortcomings of MICE; they must satisfy strong conditions (known as compatibility) to guarantee obtained estimators are consistent. Our simulation studies show the coverage probability of 95{\%} interval calculated using MICE can be less than 1{\%}, while the MSE of the proposed can be less than one-fiftieth. We applied our method to the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, and the results are consistent with those of the previous works that used panel data other than ADNI database, whereas the existing methods, such as MICE, resulted in inconsistent results.",
keywords = "Full conditional specification, Missing data, Multiple imputation, Probit stick-breaking process mixture, Semiparametric Bayes model",
author = "Ryo Kato and Takahiro Hoshino",
year = "2019",
month = "1",
day = "1",
doi = "10.1007/s10463-019-00710-w",
language = "English",
journal = "Annals of the Institute of Statistical Mathematics",
issn = "0020-3157",
publisher = "Springer Netherlands",
}

TY - JOUR

T1 - Semiparametric Bayesian multiple imputation for regression models with missing mixed continuous–discrete covariates

AU - Kato, Ryo

AU - Hoshino, Takahiro

PY - 2019/1/1

Y1 - 2019/1/1

AB - Issues regarding missing data are critical in observational and experimental research. Recently, for datasets with mixed continuous–discrete variables, multiple imputation by chained equation (MICE) has been widely used, although MICE may yield severely biased estimates. We propose a new semiparametric Bayes multiple imputation approach that can deal with continuous and discrete variables. This enables us to overcome the shortcomings of MICE; they must satisfy strong conditions (known as compatibility) to guarantee obtained estimators are consistent. Our simulation studies show the coverage probability of 95% interval calculated using MICE can be less than 1%, while the MSE of the proposed can be less than one-fiftieth. We applied our method to the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset, and the results are consistent with those of the previous works that used panel data other than ADNI database, whereas the existing methods, such as MICE, resulted in inconsistent results.

KW - Full conditional specification

KW - Missing data

KW - Multiple imputation

KW - Probit stick-breaking process mixture

KW - Semiparametric Bayes model

UR - http://www.scopus.com/inward/record.url?scp=85062883998&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85062883998&partnerID=8YFLogxK

U2 - 10.1007/s10463-019-00710-w

DO - 10.1007/s10463-019-00710-w

M3 - Article

AN - SCOPUS:85062883998

JO - Annals of the Institute of Statistical Mathematics

JF - Annals of the Institute of Statistical Mathematics

SN - 0020-3157

ER -