Device-to-Device (D2D) communication has emerged as a vital component of 5G cellular networks for improving spectrum utilization and enhancing system capacity. A critical issue for realizing these benefits in D2D-enabled networks is properly allocating radio resources while coordinating co-channel interference in a time-varying communication environment. In this paper, we propose a Stackelberg game (SG) guided multi-agent deep reinforcement learning (MADRL) approach, which allows D2D users to make smart power control and channel allocation decisions in a distributed manner. In particular, we define a crucial Stackelberg Q-value (ST-Q) to guide the learning direction, which is calculated from the equilibrium achieved in the Stackelberg game. With the guidance of the Stackelberg equilibrium, our approach converges in fewer iterations than the general MADRL method and thereby handles network dynamics better. After the initial training, each agent can infer timely D2D resource allocation strategies with distributed execution. Extensive simulations are conducted to validate the efficacy of our proposed scheme in developing timely resource allocation strategies. The results also show that our method outperforms the general MADRL-based approach in terms of average utility, channel capacity, and training time.
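The core idea of an equilibrium-guided Q-update can be illustrated with a minimal sketch. This is not the paper's actual formulation: the discrete two-player leader-follower game, the payoff matrices, and the use of the leader's equilibrium payoff as the ST-Q bootstrap target are all simplifying assumptions made here for illustration.

```python
import numpy as np

def stackelberg_value(leader_payoff, follower_payoff):
    """Solve a discrete leader-follower game by enumeration.

    leader_payoff[i, j] / follower_payoff[i, j]: payoffs when the
    leader plays action i and the follower plays action j.
    Returns the leader's equilibrium payoff and the action pair.
    """
    best_val, best_pair = -np.inf, None
    for i in range(leader_payoff.shape[0]):
        # Follower best-responds to the leader's committed action i.
        j = int(np.argmax(follower_payoff[i]))
        if leader_payoff[i, j] > best_val:
            best_val, best_pair = leader_payoff[i, j], (i, j)
    return best_val, best_pair

# Hypothetical 2x2 payoff matrices (rows: leader, cols: follower).
L = np.array([[3.0, 1.0],
              [4.0, 0.0]])
F = np.array([[2.0, 5.0],
              [6.0, 1.0]])
st_q_next, (i, j) = stackelberg_value(L, F)  # -> 4.0, (1, 0)

# Equilibrium-guided temporal-difference update: the equilibrium
# value at the next state replaces the usual max_a Q(s', a) target.
alpha, gamma = 0.1, 0.9
q, reward = 0.0, 1.0
q += alpha * (reward + gamma * st_q_next - q)
```

The point of the guidance is that the bootstrap target comes from a game-theoretic equilibrium rather than each agent's own greedy value estimate, which is what the abstract credits for faster convergence.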
ASJC Scopus subject areas
- Computer Networks and Communications