Causal bandits with propagating inference

Akihiro Yabe, Daisuke Hatano, Hanna Sumita, Shinji Ito, Naonori Kakimura, Takuro Fukunaga, Ken Ichi Kawarabayashi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The bandit problem is a framework for designing sequential experiments, in which a learner selects an arm A ∈ 𝒜 and obtains an observation corresponding to A in each experiment. Theoretically, the tight regret lower bound for the general bandit problem is polynomial in the number of arms |𝒜|; to overcome this bound, bandit problems with side-information are often considered. Recently, a bandit framework over a causal graph was introduced, in which the structure of the causal graph is available as side-information and the arms are identified with interventions on the causal graph. Existing algorithms for the causal bandit problem overcame the Ω(√(|𝒜|/T)) simple-regret lower bound; however, they work only when the interventions 𝒜 are localized around a single node (i.e., an intervention propagates only to its neighbors). We propose a novel causal bandit algorithm for an arbitrary set of interventions, whose effects can propagate throughout the causal graph. We also show that it achieves an O(√(γ* log(|𝒜|T)/T)) regret bound, where γ* is determined by the structure of the causal graph. In particular, if the maximum in-degree of the causal graph is a constant, then γ* = O(N²), where N is the number of nodes.
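To make the setting concrete, the sketch below simulates a toy causal bandit instance: each arm is an intervention do(X_i = v) on a small causal chain, the effect of an intervention propagates to every downstream node, and a plain uniform-exploration baseline picks the empirically best intervention after T samples. This is an illustrative sketch of the problem setting only, not the algorithm from the paper; the toy graph and all names (simulate, uniform_exploration) are invented here.

import random

# Toy causal chain X0 -> X1 -> X2 -> Y: an intervention at an early node
# propagates through every downstream node before reaching the reward Y.
def simulate(intervention):
    """Sample one episode; `intervention` is (node, value) or None (observe only)."""
    x = [0, 0, 0]
    for i in range(3):
        if intervention is not None and intervention[0] == i:
            x[i] = intervention[1]                       # do(X_i = v)
        else:
            parent = x[i - 1] if i > 0 else 0
            # each node copies its parent with probability 0.9
            x[i] = parent if random.random() < 0.9 else 1 - parent
    # binary reward Y depends on the last node
    return 1 if (x[2] == 1 and random.random() < 0.8) else 0

# the arm set: no intervention, plus do(X_i = v) for every node/value pair
arms = [None] + [(i, v) for i in range(3) for v in (0, 1)]

def uniform_exploration(T=7000):
    """Pull every arm equally often, then return the arm with the best mean."""
    counts = {a: 0 for a in arms}
    sums = {a: 0.0 for a in arms}
    for t in range(T):
        a = arms[t % len(arms)]
        counts[a] += 1
        sums[a] += simulate(a)
    return max(arms, key=lambda a: sums[a] / counts[a])

print("empirically best intervention:", uniform_exploration())

In this toy instance do(X_2 = 1) is optimal (reward probability 0.8), and uniform exploration pulls all |𝒜| arms equally; the point of the paper's algorithm is to do better than this by exploiting the causal structure, even when interventions propagate through the whole graph.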

Original language: English
Title of host publication: 35th International Conference on Machine Learning, ICML 2018
Editors: Jennifer Dy, Andreas Krause
Publisher: International Machine Learning Society (IMLS)
Pages: 8761-8781
Number of pages: 21
Volume: 12
ISBN (Electronic): 9781510867963
Publication status: Published - 2018 Jan 1
Event: 35th International Conference on Machine Learning, ICML 2018 - Stockholm, Sweden
Duration: 2018 Jul 10 - 2018 Jul 15

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Human-Computer Interaction
  • Software

Cite this

Yabe, A., Hatano, D., Sumita, H., Ito, S., Kakimura, N., Fukunaga, T., & Kawarabayashi, K. I. (2018). Causal bandits with propagating inference. In J. Dy & A. Krause (Eds.), 35th International Conference on Machine Learning, ICML 2018 (Vol. 12, pp. 8761-8781). International Machine Learning Society (IMLS).
