TY - JOUR
T1 - Salivary metabolomics with alternative decision tree-based machine learning methods for breast cancer discrimination
AU - Murata, Takeshi
AU - Yanagisawa, Takako
AU - Kurihara, Toshiaki
AU - Kaneko, Miku
AU - Ota, Sana
AU - Enomoto, Ayame
AU - Tomita, Masaru
AU - Sugimoto, Masahiro
AU - Sunamura, Makoto
AU - Hayashida, Tetsu
AU - Kitagawa, Yuko
AU - Jinno, Hiromitsu
N1 - Funding Information:
This study was funded by JSPS KAKENHI Grant Numbers 16H05408 and 25461996, and research Grants from the Yamagata Prefectural Government and the City of Tsuruoka.
Publisher Copyright:
© 2019, Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2019/10/1
Y1 - 2019/10/1
N2 - Purpose: The aim of this study is to explore new salivary biomarkers to discriminate breast cancer patients from healthy controls. Methods: Saliva samples were collected after 9 h fasting and were immediately stored at −80 °C. Capillary electrophoresis and liquid chromatography with mass spectrometry were used to quantify hundreds of hydrophilic metabolites. Conventional statistical analyses and artificial intelligence-based methods were used to assess the discrimination abilities of the quantified metabolites. A multiple logistic regression (MLR) model and an alternative decision tree (ADTree)-based machine learning method were used. The generalization abilities of these mathematical models were validated in various computational tests, such as cross-validation and resampling methods. Results: One hundred sixty-six unstimulated saliva samples were collected from 101 patients with invasive carcinoma of the breast (IC), 23 patients with ductal carcinoma in situ (DCIS), and 42 healthy controls (C). Of the 260 quantified metabolites, polyamines were significantly elevated in the saliva of patients with breast cancer. Spermine showed the highest area under the receiver operating characteristic curves [0.766; 95% confidence interval (CI) 0.671–0.840, P < 0.0001] to discriminate IC from C. In addition to spermine, polyamines and their acetylated forms were elevated in IC only. Two hundred each of two-fold, five-fold, and ten-fold cross-validation using different random values were conducted and the MLR model had slightly better accuracy. The ADTree with an ensemble approach showed higher accuracy (0.912; 95% CI 0.838–0.961, P < 0.0001). These prediction models also included spermine as a predictive factor. Conclusions: These data indicated that combinations of salivary metabolomics with the ADTree-based machine learning methods show potential for non-invasive screening of breast cancer.
AB - Purpose: The aim of this study is to explore new salivary biomarkers to discriminate breast cancer patients from healthy controls. Methods: Saliva samples were collected after 9 h fasting and were immediately stored at −80 °C. Capillary electrophoresis and liquid chromatography with mass spectrometry were used to quantify hundreds of hydrophilic metabolites. Conventional statistical analyses and artificial intelligence-based methods were used to assess the discrimination abilities of the quantified metabolites. A multiple logistic regression (MLR) model and an alternative decision tree (ADTree)-based machine learning method were used. The generalization abilities of these mathematical models were validated in various computational tests, such as cross-validation and resampling methods. Results: One hundred sixty-six unstimulated saliva samples were collected from 101 patients with invasive carcinoma of the breast (IC), 23 patients with ductal carcinoma in situ (DCIS), and 42 healthy controls (C). Of the 260 quantified metabolites, polyamines were significantly elevated in the saliva of patients with breast cancer. Spermine showed the highest area under the receiver operating characteristic curves [0.766; 95% confidence interval (CI) 0.671–0.840, P < 0.0001] to discriminate IC from C. In addition to spermine, polyamines and their acetylated forms were elevated in IC only. Two hundred each of two-fold, five-fold, and ten-fold cross-validation using different random values were conducted and the MLR model had slightly better accuracy. The ADTree with an ensemble approach showed higher accuracy (0.912; 95% CI 0.838–0.961, P < 0.0001). These prediction models also included spermine as a predictive factor. Conclusions: These data indicated that combinations of salivary metabolomics with the ADTree-based machine learning methods show potential for non-invasive screening of breast cancer.
KW - Alternative decision tree
KW - Biomarker
KW - Breast cancer
KW - Metabolomics
KW - Polyamines
KW - Saliva
UR - http://www.scopus.com/inward/record.url?scp=85068836365&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85068836365&partnerID=8YFLogxK
U2 - 10.1007/s10549-019-05330-9
DO - 10.1007/s10549-019-05330-9
M3 - Article
C2 - 31286302
AN - SCOPUS:85068836365
SN - 0167-6806
VL - 177
SP - 591
EP - 601
JO - Breast Cancer Research and Treatment
JF - Breast Cancer Research and Treatment
IS - 3
ER -