### Abstract

This paper addresses the problem of cell selection in dynamic open-access femtocell networks. We model this problem as a decentralized restless multi-armed bandit (MAB) problem with unknown dynamics and multiple players. Each channel is modeled as an arbitrary finite-state Markov chain with its own state space and statistics. Each user tries to learn the best channel, i.e., the one that maximizes its capacity while reducing its number of handovers. This is a classic exploration/exploitation problem in which the reward of each channel is Markovian. In addition, the reward process is restless because the state of each Markov chain evolves independently of the users' actions. This leads to a decentralized restless bandit problem. To solve it, we first apply the decentralized restless upper confidence bound (RUCB) algorithm, which achieves logarithmic regret over time for the MAB problem (proposal 1). We then extend this algorithm to cope with a dynamic environment by adding a change-point detection test based on the Page-Hinkley test (PHT) (proposal 2). However, this test can waste time if a detected change point turns out to be a false alarm. To address this, we further extend proposal 2 with a meta-bandit algorithm (proposal 3) that resolves the exploration/exploitation dilemma after a change point is detected. Simulation results show that our proposals come close to the capacity of the opportunistic method while requiring fewer handovers on average. Using a change-point test together with a meta-bandit algorithm yields higher capacity than RUCB, particularly in a changing environment.
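The change-point step of proposal 2 can be illustrated with the standard Page-Hinkley statistic for a drop in the mean: with rewards x_1, ..., x_t, running means x̄_i, and a tolerance δ, it tracks m_t = Σ_{i≤t} (x_i − x̄_i + δ) and M_t = max_{i≤t} m_i, and signals a change when M_t − m_t > λ. The sketch below is a minimal Python illustration under stated assumptions, not the authors' implementation: it assumes each user sees a scalar reward stream, monitors only for a decrease (the case that matters when the currently best channel degrades), and uses hypothetical parameter names and defaults `delta` and `lam`; the abstract does not give the values or exact variant used in the paper.

```python
from dataclasses import dataclass


@dataclass
class PageHinkley:
    """Page-Hinkley test for a drop in the mean of a scalar reward stream.

    delta: tolerated magnitude of change (hypothetical default).
    lam:   alarm threshold lambda (hypothetical default).
    """
    delta: float = 0.005
    lam: float = 1.0
    n: int = 0
    mean: float = 0.0    # running mean of x_1..x_t
    m: float = 0.0       # cumulative statistic m_t
    m_max: float = 0.0   # M_t = max over i <= t of m_i

    def update(self, x: float) -> bool:
        """Feed one reward; return True if a drop in the mean is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n      # incremental running mean
        self.m += x - self.mean + self.delta       # add (x_t - mean_t + delta)
        self.m_max = max(self.m_max, self.m)       # keep the running maximum
        return self.m_max - self.m > self.lam      # alarm when M_t - m_t > lambda

    def reset(self) -> None:
        """Clear all statistics, e.g. after an alarm has been acted on."""
        self.n, self.mean, self.m, self.m_max = 0, 0.0, 0.0, 0.0


if __name__ == "__main__":
    detector = PageHinkley()
    stream = [1.0] * 50 + [0.0] * 20   # mean reward drops after step 50
    alarms = [t for t, x in enumerate(stream, start=1) if detector.update(x)]
    print(alarms[0])  # first step at which the test fires
```

With 50 rewards of 1.0 followed by rewards of 0.0, the first alarm fires at step 52, two samples after the drop, while a stationary stream never fires. In a bandit setting, an alarm would typically trigger a reset of the learner's statistics; a larger `lam` reduces false alarms at the cost of slower detection, which is the trade-off that motivates the meta-bandit of proposal 3.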

Original language | English |
---|---|
Pages (from-to) | 42-52 |
Number of pages | 11 |
Journal | Physical Communication |
Volume | 13 |
Issue number | PB |
DOIs | https://doi.org/10.1016/j.phycom.2014.04.008 |
Publication status | Published - 2014 Dec 1 |

### Keywords

- Cell selection
- Femtocell networks
- Learning in dynamic environment
- Meta-bandit
- Multi-armed bandit (MAB)

### ASJC Scopus subject areas

- Physics and Astronomy(all)

### Cite this

**Cell selection for open-access femtocell networks: Learning in changing environment.** / Dhahri, Chaima; Ohtsuki, Tomoaki.

Research output: Contribution to journal › Article

*Physical Communication*, vol. 13, no. PB, pp. 42-52. https://doi.org/10.1016/j.phycom.2014.04.008

TY - JOUR

T1 - Cell selection for open-access femtocell networks

T2 - Learning in changing environment

AU - Dhahri, Chaima

AU - Ohtsuki, Tomoaki

PY - 2014/12/1

Y1 - 2014/12/1

KW - Cell selection

KW - Femtocell networks

KW - Learning in dynamic environment

KW - Meta-bandit

KW - Multi-armed bandit (MAB)

UR - http://www.scopus.com/inward/record.url?scp=84909944447&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84909944447&partnerID=8YFLogxK

U2 - 10.1016/j.phycom.2014.04.008

DO - 10.1016/j.phycom.2014.04.008

M3 - Article

AN - SCOPUS:84909944447

VL - 13

SP - 42

EP - 52

JO - Physical Communication

JF - Physical Communication

SN - 1874-4907

IS - PB

ER -