TY - JOUR
T1 - Tug-of-war model for the two-bandit problem
T2 - Nonlocally-correlated parallel exploration via resource conservation
AU - Kim, Song Ju
AU - Aono, Masashi
AU - Hara, Masahiko
PY - 2010/7/1
Y1 - 2010/7/1
N2 - We propose a model - the "tug-of-war (TOW) model" - to conduct unique parallel searches using many nonlocally-correlated search agents. The model is based on the property of a single-celled amoeba, the true slime mold Physarum, which maintains a constant intracellular resource volume while collecting environmental information by concurrently expanding and shrinking its branches. The conservation law entails a "nonlocal correlation" among the branches, i.e., volume increment in one branch is immediately compensated by volume decrement(s) in the other branch(es). This nonlocal correlation was shown to be useful for decision making in the case of a dilemma. The multi-armed bandit problem is to determine the optimal strategy for maximizing the total reward sum with incompatible demands, by either exploiting the rewards obtained using the already collected information or exploring new information for acquiring higher payoffs involving risks. Our model can efficiently manage the "exploration-exploitation dilemma" and exhibits good performance. The average accuracy rate of our model is higher than those of well-known algorithms such as the modified ε-greedy algorithm and modified softmax algorithm, especially for solving relatively difficult problems. Moreover, our model flexibly adapts to changing environments, a property essential for living organisms surviving in uncertain environments.
KW - Amoeba-based computing
KW - Bio-inspired computing
KW - Multi-armed bandit problem
KW - Reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=77953609815&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77953609815&partnerID=8YFLogxK
U2 - 10.1016/j.biosystems.2010.04.002
DO - 10.1016/j.biosystems.2010.04.002
M3 - Article
C2 - 20399248
AN - SCOPUS:77953609815
SN - 0303-2647
VL - 101
SP - 29
EP - 36
JO - BioSystems
JF - BioSystems
IS - 1
ER -