TY - JOUR
T1 - An introduction to the predictive technique AdaBoost with a comparison to generalized additive models
AU - Kawakita, M.
AU - Minami, M.
AU - Eguchi, S.
AU - Lennert-Cody, C. E.
N1 - Funding Information:
We thank the IATTC for providing the shark bycatch data and two anonymous reviewers for providing useful comments that improved this manuscript. This study was in part supported by ISM Project Research 2004.
PY - 2005/12
Y1 - 2005/12
AB - The recently developed statistical learning method boosting is introduced for use with fisheries data. Boosting is a predictive technique for classification that has been shown to perform well with problematic data. The boosting algorithms AdaBoost and AsymBoost, both with decision stumps, are described in detail, and their use is demonstrated with shark bycatch data from the eastern Pacific Ocean tuna purse-seine fishery. In addition, results of AdaBoost are compared to those obtained from generalized additive models (GAM). Compared to the logistic GAM, the prediction performance of AdaBoost was more stable, even with correlated predictors. Standard deviations of the test error were often considerably smaller for AdaBoost than for the logistic GAM. AdaBoost score plots, which are graphical displays of the contribution of each predictor to the discriminant function, were also more stable than score plots of the logistic GAM, particularly in regions of sparse data. AsymBoost, a variant of AdaBoost developed for binary classification of a skewed response variable, was shown to be effective at reducing the false negative ratio without substantially increasing the overall test error. Boosting shows promise for applications to fisheries data, both as a predictive technique and as a tool for exploratory data analysis.
KW - AsymBoost
KW - Boosting
KW - Classification
KW - Decision stump
KW - Logistic regression
KW - Shark bycatch
UR - http://www.scopus.com/inward/record.url?scp=26844531336&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=26844531336&partnerID=8YFLogxK
U2 - 10.1016/j.fishres.2005.07.011
DO - 10.1016/j.fishres.2005.07.011
M3 - Article
AN - SCOPUS:26844531336
SN - 0165-7836
VL - 76
SP - 328
EP - 343
JO - Fisheries Research
JF - Fisheries Research
IS - 3
ER -