TY - JOUR
T1 - Oracle-efficient algorithms for online linear optimization with bandit feedback
AU - Ito, Shinji
AU - Hatano, Daisuke
AU - Sumita, Hanna
AU - Takemura, Kei
AU - Fukunaga, Takuro
AU - Kakimura, Naonori
AU - Kawarabayashi, Ken Ichi
N1 - Funding Information:
This work was supported by JST, ERATO, Grant Number JPMJER1201, Japan; by JST, ACT-I, Grant Number JPMJPR18U5, Japan; by JST, PRESTO, Grant Number JPMJPR1759, Japan; and by JSPS, KAKENHI, Grant Number JP18H05291, Japan.
Publisher Copyright:
© 2019 Neural Information Processing Systems Foundation. All rights reserved.
PY - 2019
Y1 - 2019
N2 - We propose computationally efficient algorithms for online linear optimization with bandit feedback, in which a player chooses an action vector from a given (possibly infinite) set A ⊆ ℝ^d, and then suffers a loss that can be expressed as a linear function of the action vector. Although existing algorithms achieve an optimal regret bound of Õ(√T) for T rounds (ignoring factors of poly(d, log T)), computationally efficient ways of implementing them have not yet been specified, in particular when |A| is not bounded by a polynomial in d. A standard way to pursue computational efficiency is to assume that we have an efficient algorithm, referred to as an oracle, that solves (offline) linear optimization problems over A. Under this assumption, the computational efficiency of a bandit algorithm can then be measured in terms of oracle complexity, i.e., the number of oracle calls. Our contribution is to propose algorithms that offer optimal regret bounds of Õ(√T) as well as low oracle complexity for both non-stochastic and stochastic settings. Our algorithm for non-stochastic settings has an oracle complexity of Õ(T) and is the first algorithm that achieves both a regret bound of Õ(√T) and an oracle complexity of Õ(poly(T)), given only linear optimization oracles. Our algorithm for stochastic settings calls the oracle only O(poly(d, log T)) times, which is smaller than the current best oracle complexity of O(T) if T is sufficiently large.
AB - We propose computationally efficient algorithms for online linear optimization with bandit feedback, in which a player chooses an action vector from a given (possibly infinite) set A ⊆ ℝ^d, and then suffers a loss that can be expressed as a linear function of the action vector. Although existing algorithms achieve an optimal regret bound of Õ(√T) for T rounds (ignoring factors of poly(d, log T)), computationally efficient ways of implementing them have not yet been specified, in particular when |A| is not bounded by a polynomial in d. A standard way to pursue computational efficiency is to assume that we have an efficient algorithm, referred to as an oracle, that solves (offline) linear optimization problems over A. Under this assumption, the computational efficiency of a bandit algorithm can then be measured in terms of oracle complexity, i.e., the number of oracle calls. Our contribution is to propose algorithms that offer optimal regret bounds of Õ(√T) as well as low oracle complexity for both non-stochastic and stochastic settings. Our algorithm for non-stochastic settings has an oracle complexity of Õ(T) and is the first algorithm that achieves both a regret bound of Õ(√T) and an oracle complexity of Õ(poly(T)), given only linear optimization oracles. Our algorithm for stochastic settings calls the oracle only O(poly(d, log T)) times, which is smaller than the current best oracle complexity of O(T) if T is sufficiently large.
UR - http://www.scopus.com/inward/record.url?scp=85090178223&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85090178223&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85090178223
SN - 1049-5258
VL - 32
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019
Y2 - 8 December 2019 through 14 December 2019
ER -