Amoeba-inspired Tug-of-War algorithms for exploration-exploitation dilemma in extended Bandit Problem

Masashi Aono, Song Ju Kim, Masahiko Hara, Toshinori Munakata

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

The true slime mold Physarum polycephalum, a single-celled amoeboid organism, is capable of efficiently allocating a constant amount of intracellular resource to its pseudopod-like branches that best fit the environment where dynamic light stimuli are applied. Inspired by this resource allocation process, the authors formulated a concurrent search algorithm, called the Tug-of-War (TOW) model, for maximizing the profit in the multi-armed Bandit Problem (BP). A player (gambler) of the BP should decide as quickly and accurately as possible which slot machine to invest in out of the N machines and faces an "exploration-exploitation dilemma." The dilemma is a trade-off between the speed and accuracy of decision making, which are conflicting objectives. The TOW model maintains a constant intracellular resource volume while collecting environmental information by concurrently expanding and shrinking its branches. The conservation law entails a nonlocal correlation among the branches, i.e., a volume increment in one branch is immediately compensated by volume decrements in the other branches. Owing to this nonlocal correlation, the TOW model can efficiently manage the dilemma. In this study, we extend the TOW model to apply it to a stretched variant of the BP, the Extended Bandit Problem (EBP), which is the problem of selecting the best M-tuple of the N machines. We demonstrate that the extended TOW model exhibits better performance on 2-tuple-3-machine and 2-tuple-4-machine instances of the EBP compared with the extended versions of well-known algorithms for the BP, the ε-Greedy and SoftMax algorithms, particularly in terms of its short-term decision-making capability that is essential for the survival of the amoeba in a hostile environment.
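
The abstract describes the mechanism only qualitatively: a fixed total of resource is shared among the branches, an increment in one branch is offset by decrements in the others, and the branches holding the most resource are the ones played. The Python sketch below is one minimal way to realize that conservation constraint for the M-tuple-of-N setting; the concrete update rule, the step size delta, and the Bernoulli reward model are illustrative assumptions and not the TOW equations published in the paper.

# Illustrative sketch only: a conserved-sum, tug-of-war-style player for the
# "best M-tuple of N machines" (Extended Bandit Problem) setting described in
# the abstract. The update rule and parameters are assumptions, not the
# authors' published TOW model.
import random

def play_tow(probs, M, steps, delta=1.0, seed=0):
    """Repeatedly play the M machines whose branches hold the most resource.

    probs : hidden reward probabilities of the N slot machines
    M     : number of machines played at each step (the M-tuple)
    delta : step size of the displacement update (assumed parameter)
    """
    rng = random.Random(seed)
    n = len(probs)
    x = [0.0] * n            # branch displacements; sum(x) is kept constant
    total_reward = 0

    for _ in range(steps):
        # Play the M branches with the largest displacements.
        chosen = sorted(range(n), key=lambda i: x[i], reverse=True)[:M]
        for i in chosen:
            reward = 1 if rng.random() < probs[i] else 0
            total_reward += reward
            # Reinforce or penalize the played branch ...
            change = delta if reward else -delta
            x[i] += change
            # ... and spread the opposite change over the other branches so
            # that the total stays constant -- the "nonlocal correlation"
            # entailed by the conservation law.
            for j in range(n):
                if j != i:
                    x[j] -= change / (n - 1)
    return total_reward

# Example: a 2-tuple-4-machine instance, as in the paper's experiments
# (the probabilities here are made up for the demo).
print(play_tow(probs=[0.2, 0.4, 0.6, 0.8], M=2, steps=1000))

In this sketch, a failing branch immediately hands resource to every other branch, so the player keeps switching machines without an explicit randomization parameter; that is the intuition behind the comparison against ε-Greedy and SoftMax reported in the paper.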

Original language: English
Pages (from-to): 1-9
Number of pages: 9
Journal: BioSystems
Volume: 117
Issue number: 1
DOI: 10.1016/j.biosystems.2013.12.007
Publication status: Published - 2014 Mar
Externally published: Yes

Keywords

  • Decision making
  • Multi-armed Bandit Problem
  • Natural computing
  • Physarum polycephalum
  • Resource allocation

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology (all)
  • Applied Mathematics
  • Modelling and Simulation
  • Statistics and Probability

Cite this

Amoeba-inspired Tug-of-War algorithms for exploration-exploitation dilemma in extended Bandit Problem. / Aono, Masashi; Kim, Song Ju; Hara, Masahiko; Munakata, Toshinori.

In: BioSystems, Vol. 117, No. 1, 03.2014, p. 1-9.

Aono, Masashi ; Kim, Song Ju ; Hara, Masahiko ; Munakata, Toshinori. / Amoeba-inspired Tug-of-War algorithms for exploration-exploitation dilemma in extended Bandit Problem. In: BioSystems. 2014 ; Vol. 117, No. 1. pp. 1-9.
@article{9b32264f4286479dae785ff6cf148053,
title = "Amoeba-inspired Tug-of-War algorithms for exploration-exploitation dilemma in extended Bandit Problem",
keywords = "Decision making, Multi-armed Bandit Problem, Natural computing, Physarum polycephalum, Resource allocation",
author = "Masashi Aono and Kim, {Song Ju} and Masahiko Hara and Toshinori Munakata",
year = "2014",
month = "3",
doi = "10.1016/j.biosystems.2013.12.007",
language = "English",
volume = "117",
pages = "1--9",
journal = "BioSystems",
issn = "0303-2647",
publisher = "Elsevier Ireland Ltd",
number = "1",

}

TY - JOUR

T1 - Amoeba-inspired Tug-of-War algorithms for exploration-exploitation dilemma in extended Bandit Problem

AU - Aono, Masashi

AU - Kim, Song Ju

AU - Hara, Masahiko

AU - Munakata, Toshinori

PY - 2014/3

Y1 - 2014/3

KW - Decision making

KW - Multi-armed Bandit Problem

KW - Natural computing

KW - Physarum polycephalum

KW - Resource allocation

UR - http://www.scopus.com/inward/record.url?scp=84892849955&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84892849955&partnerID=8YFLogxK

U2 - 10.1016/j.biosystems.2013.12.007

DO - 10.1016/j.biosystems.2013.12.007

M3 - Article

C2 - 24384066

AN - SCOPUS:84892849955

VL - 117

SP - 1

EP - 9

JO - BioSystems

JF - BioSystems

SN - 0303-2647

IS - 1

ER -