Tug-of-war model for multi-armed bandit problem

Song Ju Kim, Masashi Aono, Masahiko Hara

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

11 Citations (Scopus)

Abstract

We propose a model, the "tug-of-war (TOW) model," for conducting unique parallel searches using many nonlocally correlated search agents. The model is based on a property of a single-celled amoeba, the true slime mold Physarum, which maintains a constant intracellular resource volume while collecting environmental information by concurrently expanding and shrinking its branches. The conservation law entails a "nonlocal correlation" among the branches: a volume increment in one branch is immediately compensated by volume decrement(s) in the other branch(es). This nonlocal correlation has been shown to be useful for decision making in the face of a dilemma. The multi-armed bandit problem is to determine the optimal strategy for maximizing the total reward under incompatible demands: exploring unknown machines versus exploiting the machine that has paid off best so far. Our model manages this "exploration-exploitation dilemma" efficiently and exhibits good performance; its average accuracy rate is higher than those of well-known algorithms such as the modified ε-greedy algorithm and the modified softmax algorithm.
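
To make the idea above concrete, here is a minimal Python sketch that plays a two-armed Bernoulli bandit with a toy TOW-style agent alongside a plain ε-greedy baseline. It only illustrates the conservation idea described in the abstract and is not the authors' model: the zero-sum displacement rule, the omega penalty for unrewarded plays, the Gaussian noise term, and all names (TugOfWarAgent, EpsilonGreedyAgent, omega, noise) are assumptions introduced here for illustration.

import random


class TugOfWarAgent:
    """Toy tug-of-war (TOW) style bandit agent.

    A sketch only: each arm carries a 'displacement', and the displacements
    are kept zero-sum, so reinforcing one arm simultaneously pulls the others
    down; this mimics the 'nonlocal correlation' induced by the conservation
    law in the abstract. The update rule and parameters are illustrative
    assumptions, not the authors' exact formulation.
    """

    def __init__(self, n_arms, omega=0.5, noise=0.1):
        self.n_arms = n_arms
        self.omega = omega        # assumed penalty for an unrewarded play
        self.noise = noise        # small fluctuation standing in for branch oscillation
        self.q = [0.0] * n_arms   # accumulated evidence per arm

    def select(self):
        # Zero-sum displacements: subtracting the mean keeps the total conserved,
        # so a gain on one arm is compensated by losses on the others.
        mean_q = sum(self.q) / self.n_arms
        x = [qk - mean_q + random.gauss(0.0, self.noise) for qk in self.q]
        return max(range(self.n_arms), key=lambda k: x[k])

    def update(self, arm, rewarded):
        self.q[arm] += 1.0 if rewarded else -self.omega


class EpsilonGreedyAgent:
    """Plain epsilon-greedy baseline (standard textbook version, not the paper's modified variant)."""

    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms   # sample-mean reward estimate per arm

    def select(self):
        if random.random() < self.epsilon:
            return random.randrange(len(self.values))                      # explore
        return max(range(len(self.values)), key=lambda k: self.values[k])  # exploit

    def update(self, arm, rewarded):
        self.counts[arm] += 1
        self.values[arm] += (float(rewarded) - self.values[arm]) / self.counts[arm]


def accuracy(agent, probs, steps=2000):
    """Fraction of plays that pick the best arm of a Bernoulli bandit."""
    best = max(range(len(probs)), key=lambda k: probs[k])
    hits = 0
    for _ in range(steps):
        arm = agent.select()
        hits += (arm == best)
        agent.update(arm, random.random() < probs[arm])
    return hits / steps


if __name__ == "__main__":
    random.seed(0)
    probs = [0.3, 0.7]   # hidden reward probabilities of the two machines
    print("TOW-style agent:", accuracy(TugOfWarAgent(len(probs)), probs))
    print("epsilon-greedy :", accuracy(EpsilonGreedyAgent(len(probs)), probs))

The point mirrored from the abstract is the conservation constraint: because the displacements always sum to zero, evidence gathered on one arm immediately shifts the relative standing of every other arm, which is the nonlocal correlation the abstract describes.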

Original language: English
Title of host publication: Unconventional Computation - 9th International Conference, UC 2010, Proceedings
Pages: 69-80
Number of pages: 12
Volume: 6079 LNCS
DOIs: https://doi.org/10.1007/978-3-642-13523-1_10
Publication status: Published - 2010
Externally published: Yes
Event: 9th International Conference on Unconventional Computation, UC 2010 - Tokyo, Japan
Duration: 2010 Jun 21 - 2010 Jun 25

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 6079 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 9th International Conference on Unconventional Computation, UC 2010
Country: Japan
City: Tokyo
Period: 10/6/21 - 10/6/25

Keywords

  • Amoeba-based computing
  • Bio-inspired computation
  • Multi-armed bandit problem
  • Reinforcement learning

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

APA
Kim, S. J., Aono, M., & Hara, M. (2010). Tug-of-war model for multi-armed bandit problem. In Unconventional Computation - 9th International Conference, UC 2010, Proceedings (Vol. 6079 LNCS, pp. 69-80). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6079 LNCS). https://doi.org/10.1007/978-3-642-13523-1_10

Standard
Tug-of-war model for multi-armed bandit problem. / Kim, Song Ju; Aono, Masashi; Hara, Masahiko.
Unconventional Computation - 9th International Conference, UC 2010, Proceedings. Vol. 6079 LNCS 2010. p. 69-80 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6079 LNCS).
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Harvard
Kim, SJ, Aono, M & Hara, M 2010, Tug-of-war model for multi-armed bandit problem. in Unconventional Computation - 9th International Conference, UC 2010, Proceedings. vol. 6079 LNCS, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 6079 LNCS, pp. 69-80, 9th International Conference on Unconventional Computation, UC 2010, Tokyo, Japan, 10/6/21. https://doi.org/10.1007/978-3-642-13523-1_10

Vancouver
Kim SJ, Aono M, Hara M. Tug-of-war model for multi-armed bandit problem. In Unconventional Computation - 9th International Conference, UC 2010, Proceedings. Vol. 6079 LNCS. 2010. p. 69-80. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-642-13523-1_10

Author
Kim, Song Ju ; Aono, Masashi ; Hara, Masahiko. / Tug-of-war model for multi-armed bandit problem. Unconventional Computation - 9th International Conference, UC 2010, Proceedings. Vol. 6079 LNCS 2010. pp. 69-80 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

BibTeX
@inproceedings{8b83311936a345c9ac57c5a218b871c1,
title = "Tug-of-war model for multi-armed bandit problem",
abstract = "We propose a model - the {"}tug-of-war (TOW) model{"} - to conduct unique parallel searches using many nonlocally correlated search agents. The model is based on the property of a single-celled amoeba, the true slime mold Physarum, which maintains a constant intracellular resource volume while collecting environmental information by concurrently expanding and shrinking its branches. The conservation law entails a {"}nonlocal correlation{"} among the branches, i.e., volume increment in one branch is immediately compensated by volume decrement(s) in the other branch(es). This nonlocal correlation was shown to be useful for decision making in the case of a dilemma. The multi-armed bandit problem is to determine the optimal strategy for maximizing the total reward sum with incompatible demands. Our model can efficiently manage this {"}exploration-exploitation dilemma{"} and exhibits good performances. The average accuracy rate of our model is higher than those of well-known algorithms such as the modified ε-greedy algorithm and modified softmax algorithm.",
keywords = "Amoeba-based computing, Bio-inspired computation, Multi-armed bandit problem, Reinforcement learning",
author = "Kim, {Song Ju} and Masashi Aono and Masahiko Hara",
year = "2010",
doi = "10.1007/978-3-642-13523-1_10",
language = "English",
isbn = "3642135226",
volume = "6079 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "69--80",
booktitle = "Unconventional Computation - 9th International Conference, UC 2010, Proceedings",

}

RIS
TY - GEN
T1 - Tug-of-war model for multi-armed bandit problem
AU - Kim, Song Ju
AU - Aono, Masashi
AU - Hara, Masahiko
PY - 2010
Y1 - 2010
N2 - We propose a model - the "tug-of-war (TOW) model" - to conduct unique parallel searches using many nonlocally correlated search agents. The model is based on the property of a single-celled amoeba, the true slime mold Physarum, which maintains a constant intracellular resource volume while collecting environmental information by concurrently expanding and shrinking its branches. The conservation law entails a "nonlocal correlation" among the branches, i.e., volume increment in one branch is immediately compensated by volume decrement(s) in the other branch(es). This nonlocal correlation was shown to be useful for decision making in the case of a dilemma. The multi-armed bandit problem is to determine the optimal strategy for maximizing the total reward sum with incompatible demands. Our model can efficiently manage this "exploration-exploitation dilemma" and exhibits good performances. The average accuracy rate of our model is higher than those of well-known algorithms such as the modified ε-greedy algorithm and modified softmax algorithm.
AB - We propose a model - the "tug-of-war (TOW) model" - to conduct unique parallel searches using many nonlocally correlated search agents. The model is based on the property of a single-celled amoeba, the true slime mold Physarum, which maintains a constant intracellular resource volume while collecting environmental information by concurrently expanding and shrinking its branches. The conservation law entails a "nonlocal correlation" among the branches, i.e., volume increment in one branch is immediately compensated by volume decrement(s) in the other branch(es). This nonlocal correlation was shown to be useful for decision making in the case of a dilemma. The multi-armed bandit problem is to determine the optimal strategy for maximizing the total reward sum with incompatible demands. Our model can efficiently manage this "exploration-exploitation dilemma" and exhibits good performances. The average accuracy rate of our model is higher than those of well-known algorithms such as the modified ε-greedy algorithm and modified softmax algorithm.
KW - Amoeba-based computing
KW - Bio-inspired computation
KW - Multi-armed bandit problem
KW - Reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=79956331440&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79956331440&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-13523-1_10
DO - 10.1007/978-3-642-13523-1_10
M3 - Conference contribution
SN - 3642135226
SN - 9783642135224
VL - 6079 LNCS
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 69
EP - 80
BT - Unconventional Computation - 9th International Conference, UC 2010, Proceedings
ER -