Efficient testing and effect size estimation for set-based genetic association inference via semiparametric multilevel mixture modeling

Shonosuke Sugasawa, Hisashi Noma

Research output: Contribution to journal › Article › peer-review

Abstract

In genetic association studies, rare variants with extremely low allele frequencies play a crucial role in complex traits. Set-based testing methods that jointly assess the effects of groups of single nucleotide polymorphisms (SNPs) were therefore developed to increase the power of association tests. However, the power of these tests is still insufficient, and precise estimation of the effect sizes of individual SNPs is largely impossible. In this article, we provide an efficient set-based statistical inference framework that addresses both of these issues simultaneously using an empirical Bayes method with semiparametric multilevel mixture modeling. We propose a hierarchical model that incorporates variation in set-specific effects, and we apply the optimal discovery procedure (ODP), which achieves the largest overall power in multiple significance testing. In addition, we provide an optimal “set-based” estimator of the empirical distribution of effect sizes. The efficiency of the proposed methods is demonstrated through application to a genome-wide association study of coronary artery disease and through simulation studies. The results revealed numerous rare variants with large effect sizes for coronary artery disease, and the number of significant sets detected by the ODP was much greater than the number identified by existing methods.

Original language: English
Pages (from-to): 1142-1152
Number of pages: 11
Journal: Biometrical Journal
Volume: 64
Issue number: 6
Publication status: Published - 2022 Aug
Externally published: Yes

Keywords

  • effect size estimation
  • empirical Bayes
  • genome-wide association study
  • optimal discovery procedure

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
