This paper addresses the problem of cell selection in dynamic open-access femtocell networks. We model this problem as a decentralized restless multi-armed bandit (MAB) with unknown dynamics and multiple players. Each channel is modeled as an arbitrary finite-state Markov chain with its own state space and statistics. Each user tries to learn the best channel, that is, the one that maximizes its capacity while reducing its number of handovers. This is a classic exploration/exploitation problem in which the reward of each channel is Markovian. In addition, the reward process is restless because the state of each Markov chain evolves independently of the user's actions. This leads to a decentralized restless bandit problem. To solve it, we first use the decentralized restless upper confidence bound (RUCB) algorithm, which achieves logarithmic regret over time for the MAB problem (proposal 1). We then extend this algorithm to cope with a dynamic environment by applying a change-point detection test based on the Page-Hinkley test (PHT) (proposal 2). However, this test wastes time whenever a detected change point is actually a false alarm. To address this, we extend the previous proposal with a meta-bandit algorithm (proposal 3) that resolves the dilemma between exploration and exploitation after a change point is detected. Simulation results show that our proposals come close to the performance of the opportunistic method in terms of capacity while requiring fewer handovers on average. The use of a change-point test and a meta-bandit algorithm yields better capacity than RUCB, particularly in a changing environment.
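As a rough illustration of the change-point step in proposal 2, the following is a minimal sketch of a one-sided Page-Hinkley test that raises an alarm when the mean of a reward stream drops. It is the textbook form of the test, not necessarily the variant used in the paper. The class name `PageHinkley`, the helper `first_alarm`, and the values of the tolerance `delta` and the alarm level `threshold` are assumptions chosen for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PageHinkley:
    """One-sided Page-Hinkley test for a drop in the mean (illustrative sketch)."""
    delta: float = 0.05      # assumed tolerance for fluctuations that are ignored
    threshold: float = 5.0   # assumed alarm level (often written as lambda)
    n: int = 0               # number of samples seen so far
    mean: float = 0.0        # running mean of the samples
    cum: float = 0.0         # cumulative deviation m_t
    cum_max: float = 0.0     # running maximum M_t of the cumulative deviation

    def update(self, x: float) -> bool:
        """Feed one sample; return True if a downward change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean + self.delta
        self.cum_max = max(self.cum_max, self.cum)
        # Alarm when the statistic has fallen far below its past maximum.
        return self.cum_max - self.cum > self.threshold


def first_alarm(samples: Iterable[float]) -> Optional[int]:
    """Return the 0-based index of the first alarm, or None if there is none."""
    detector = PageHinkley()
    for i, x in enumerate(samples):
        if detector.update(x):
            return i
    return None


if __name__ == "__main__":
    # Hypothetical reward stream: stable at 1.0, then dropping to 0.0.
    rewards = [1.0] * 50 + [0.0] * 20
    print(first_alarm(rewards))
```

A larger `threshold` lowers the false-alarm rate at the cost of slower detection, which is the trade-off that motivates the meta-bandit step of proposal 3.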