### Abstract

Bandit is a framework for designing sequential experiments, where a learner selects an arm A ϵ A and obtains an observation corresponding to A in each experiment. Theoretically, the tight regret lower-bound for the general bandit is polynomial with respect to the number of arms |A|, and thus, to overcome this bound, the bandit problem with side-information is often considered. Recently, a bandit framework over a causal graph was introduced, where the structure of the causal graph is available as side-information and the arms are identified with interventions on the causal graph. Existing algorithms for causal bandit overcame the Ω(√\A\/T) simple-regret lower-bound; however, their algorithms work only when the interventions A are localized around a single node (i.e., an intervention propagates only to its neighbors). We then propose a novel causal bandit algorithm for an arbitrary set of interventions, which can propagate throughout the causal graph. We also show that it achieves O(√γ^{∗} log(|A|T)/T) regret bound, where γ^{∗} is determined by using a causal graph structure. In particular, if the maximum in-degree of the causal graph is a constant, then γ^{∗} = O(N^{2}), where N is the number of nodes.

Original language | English |
---|---|

Title of host publication | 35th International Conference on Machine Learning, ICML 2018 |

Editors | Jennifer Dy, Andreas Krause |

Publisher | International Machine Learning Society (IMLS) |

Pages | 8761-8781 |

Number of pages | 21 |

Volume | 12 |

ISBN (Electronic) | 9781510867963 |

Publication status | Published - 2018 Jan 1 |

Event | 35th International Conference on Machine Learning, ICML 2018 - Stockholm, Sweden Duration: 2018 Jul 10 → 2018 Jul 15 |

### Other

Other | 35th International Conference on Machine Learning, ICML 2018 |
---|---|

Country | Sweden |

City | Stockholm |

Period | 18/7/10 → 18/7/15 |

### Fingerprint

### ASJC Scopus subject areas

- Computational Theory and Mathematics
- Human-Computer Interaction
- Software

### Cite this

*35th International Conference on Machine Learning, ICML 2018*(Vol. 12, pp. 8761-8781). International Machine Learning Society (IMLS).

**Causal bandits with propagating inference.** / Yabe, Akihiro; Hatano, Daisuke; Sumita, Hanna; Ito, Shinji; Kakimura, Naonori; Fukunaga, Takuro; Kawarabayashi, Ken Ichi.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

*35th International Conference on Machine Learning, ICML 2018.*vol. 12, International Machine Learning Society (IMLS), pp. 8761-8781, 35th International Conference on Machine Learning, ICML 2018, Stockholm, Sweden, 18/7/10.

}

TY - GEN

T1 - Causal bandits with propagating inference

AU - Yabe, Akihiro

AU - Hatano, Daisuke

AU - Sumita, Hanna

AU - Ito, Shinji

AU - Kakimura, Naonori

AU - Fukunaga, Takuro

AU - Kawarabayashi, Ken Ichi

PY - 2018/1/1

Y1 - 2018/1/1

N2 - Bandit is a framework for designing sequential experiments, where a learner selects an arm A ϵ A and obtains an observation corresponding to A in each experiment. Theoretically, the tight regret lower-bound for the general bandit is polynomial with respect to the number of arms |A|, and thus, to overcome this bound, the bandit problem with side-information is often considered. Recently, a bandit framework over a causal graph was introduced, where the structure of the causal graph is available as side-information and the arms are identified with interventions on the causal graph. Existing algorithms for causal bandit overcame the Ω(√\A\/T) simple-regret lower-bound; however, their algorithms work only when the interventions A are localized around a single node (i.e., an intervention propagates only to its neighbors). We then propose a novel causal bandit algorithm for an arbitrary set of interventions, which can propagate throughout the causal graph. We also show that it achieves O(√γ∗ log(|A|T)/T) regret bound, where γ∗ is determined by using a causal graph structure. In particular, if the maximum in-degree of the causal graph is a constant, then γ∗ = O(N2), where N is the number of nodes.

AB - Bandit is a framework for designing sequential experiments, where a learner selects an arm A ϵ A and obtains an observation corresponding to A in each experiment. Theoretically, the tight regret lower-bound for the general bandit is polynomial with respect to the number of arms |A|, and thus, to overcome this bound, the bandit problem with side-information is often considered. Recently, a bandit framework over a causal graph was introduced, where the structure of the causal graph is available as side-information and the arms are identified with interventions on the causal graph. Existing algorithms for causal bandit overcame the Ω(√\A\/T) simple-regret lower-bound; however, their algorithms work only when the interventions A are localized around a single node (i.e., an intervention propagates only to its neighbors). We then propose a novel causal bandit algorithm for an arbitrary set of interventions, which can propagate throughout the causal graph. We also show that it achieves O(√γ∗ log(|A|T)/T) regret bound, where γ∗ is determined by using a causal graph structure. In particular, if the maximum in-degree of the causal graph is a constant, then γ∗ = O(N2), where N is the number of nodes.

UR - http://www.scopus.com/inward/record.url?scp=85057298452&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85057298452&partnerID=8YFLogxK

M3 - Conference contribution

VL - 12

SP - 8761

EP - 8781

BT - 35th International Conference on Machine Learning, ICML 2018

A2 - Dy, Jennifer

A2 - Krause, Andreas

PB - International Machine Learning Society (IMLS)

ER -