Cell selection for open-access femtocell networks: Learning in changing environment

Chaima Dhahri, Tomoaki Ohtsuki

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

This paper addresses the problem of cell selection in dynamic open-access femtocell networks. We model this problem as a decentralized restless multi-armed bandit (MAB) problem with unknown dynamics and multiple players. Each channel is modeled as an arbitrary finite-state Markov chain with its own state space and statistics. Each user tries to learn the best channel, that is, the one that maximizes its capacity while reducing its number of handovers. This is a classic exploration/exploitation problem in which the reward of each channel is Markovian. In addition, the reward process is restless because the state of each Markov chain evolves independently of the user's actions. This leads to a decentralized restless bandit problem. To solve it, we apply the decentralized restless upper confidence bound (RUCB) algorithm, which achieves logarithmic regret over time for the MAB problem (proposal 1). We then extend this algorithm to cope with a dynamic environment by applying a change-point detection test based on the Page-Hinkley test (PHT) (proposal 2). However, this test can waste time when a detected change is actually a false alarm. To address this problem, we extend the previous proposal with a meta-bandit algorithm (proposal 3) that resolves the exploration/exploitation dilemma after a change point is detected. Simulation results show that our proposals come close to the performance of the opportunistic method in terms of capacity while requiring a lower average number of handovers. The use of a change-point test and a meta-bandit algorithm yields better capacity than RUCB alone, particularly in a changing environment.
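
To make the algorithmic pieces of the abstract concrete, the following sketch pairs a plain UCB1 index policy with a one-sided Page-Hinkley change-point test: when the test flags a shift in the observed reward stream, the learner's statistics are reset so it can relearn the best cell. This is a minimal illustration under simplifying assumptions (i.i.d. rewards, a single user, an unconditional reset on alarm); it is not the paper's decentralized RUCB algorithm, which handles restless Markovian rewards and multiple users, nor its meta-bandit layer. The class names, parameter values (delta, lam), and the toy reward model are all hypothetical.

import math
import random


class PageHinkley:
    """One-sided Page-Hinkley test that flags a drop in the mean of a
    reward stream. delta is the drift tolerated before evidence
    accumulates; lam is the alarm threshold (larger -> fewer false
    alarms but slower detection). Values here are illustrative."""

    def __init__(self, delta=0.005, lam=20.0):
        self.delta, self.lam = delta, lam
        self.reset()

    def reset(self):
        self.n = 0        # samples since last reset
        self.mean = 0.0   # running mean of the stream
        self.cum = 0.0    # cumulative deviation m_t
        self.peak = 0.0   # running maximum M_t of m_t

    def update(self, x):
        """Feed one reward; return True if a change is detected."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean + self.delta
        self.peak = max(self.peak, self.cum)
        return self.peak - self.cum > self.lam


class UCB1Selector:
    """Plain UCB1 index policy over k candidate cells (a stand-in for
    the paper's RUCB; restless Markovian rewards need extra machinery)."""

    def __init__(self, k):
        self.k = k
        self.reset()

    def reset(self):
        self.t = 0
        self.counts = [0] * self.k   # plays per cell
        self.means = [0.0] * self.k  # empirical mean reward per cell

    def select(self):
        self.t += 1
        for a in range(self.k):      # play every cell once first
            if self.counts[a] == 0:
                return a
        # index = empirical mean + exploration bonus
        return max(range(self.k),
                   key=lambda a: self.means[a]
                   + math.sqrt(2 * math.log(self.t) / self.counts[a]))

    def update(self, a, reward):
        self.counts[a] += 1
        self.means[a] += (reward - self.means[a]) / self.counts[a]


if __name__ == "__main__":
    # Toy environment: 3 cells; the best cell's mean reward drops at
    # t = 500, emulating a change in the femtocell environment.
    random.seed(0)
    base = [0.3, 0.5, 0.8]
    bandit, pht = UCB1Selector(k=3), PageHinkley()
    for t in range(1000):
        if t == 500:
            base[2] = 0.2            # environment changes
        a = bandit.select()
        r = min(1.0, max(0.0, random.gauss(base[a], 0.1)))
        bandit.update(a, r)
        if pht.update(r):            # change detected: relearn
            print(f"t={t}: change detected, resetting learner")
            bandit.reset()
            pht.reset()

In the paper's third proposal, a meta-bandit arbitrates after an alarm between keeping the old statistics (treating the alarm as a false positive) and restarting from scratch; the unconditional reset above is the simplest stand-in for that decision.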

Original language: English
Pages (from-to): 42-52
Number of pages: 11
Journal: Physical Communication
Volume: 13
Issue number: PB
DOI: 10.1016/j.phycom.2014.04.008
Publication status: Published - 2014 Dec 1


Keywords

  • Cell selection
  • Femtocell networks
  • Learning in dynamic environment
  • Meta-bandit
  • Multi-armed bandit (MAB)

ASJC Scopus subject areas

  • Physics and Astronomy (all)

Cite this

Cell selection for open-access femtocell networks: Learning in changing environment. / Dhahri, Chaima; Ohtsuki, Tomoaki.

In: Physical Communication, Vol. 13, No. PB, 01.12.2014, p. 42-52.

Research output: Contribution to journal › Article

@article{a3afa8bfc76e42daa462a21899c8fe28,
title = "Cell selection for open-access femtocell networks: Learning in changing environment",
abstract = "This paper addresses the problem of cell selection in dynamic open-access femtocell networks. We model this problem as a decentralized restless multi-armed bandit (MAB) problem with unknown dynamics and multiple players. Each channel is modeled as an arbitrary finite-state Markov chain with its own state space and statistics. Each user tries to learn the best channel, that is, the one that maximizes its capacity while reducing its number of handovers. This is a classic exploration/exploitation problem in which the reward of each channel is Markovian. In addition, the reward process is restless because the state of each Markov chain evolves independently of the user's actions. This leads to a decentralized restless bandit problem. To solve it, we apply the decentralized restless upper confidence bound (RUCB) algorithm, which achieves logarithmic regret over time for the MAB problem (proposal 1). We then extend this algorithm to cope with a dynamic environment by applying a change-point detection test based on the Page-Hinkley test (PHT) (proposal 2). However, this test can waste time when a detected change is actually a false alarm. To address this problem, we extend the previous proposal with a meta-bandit algorithm (proposal 3) that resolves the exploration/exploitation dilemma after a change point is detected. Simulation results show that our proposals come close to the performance of the opportunistic method in terms of capacity while requiring a lower average number of handovers. The use of a change-point test and a meta-bandit algorithm yields better capacity than RUCB alone, particularly in a changing environment.",
keywords = "Cell selection, Femtocell networks, Learning in dynamic environment, Meta-bandit, Multi-armed bandit (MAB)",
author = "Chaima Dhahri and Tomoaki Ohtsuki",
year = "2014",
month = "12",
day = "1",
doi = "10.1016/j.phycom.2014.04.008",
language = "English",
volume = "13",
pages = "42--52",
journal = "Physical Communication",
issn = "1874-4907",
publisher = "Elsevier",
number = "PB",
}
