Transmission power allocation method based on user clustering and reinforcement learning
11647468 · 2023-05-09
Assignee
Inventors
Cpc classification
International classification
H04W52/36
ELECTRICITY
H04W52/26
ELECTRICITY
Abstract
Provided is a transmission power allocation method based on reinforcement learning with an efficient user clustering method. According to an embodiment of the present disclosure, a transmission power allocation method based on user clustering and reinforcement learning of a base station in a non-orthogonal multiple access (NOMA) system includes a sorting step of sorting channel gains of user equipments located in a coverage of the base station in a size order, a clustering step of allocating the user equipment to each cluster based on the size order, and a power allocation step of allocating power to each user equipment included in the cluster by using a quality function based on a state and an action.
Claims
1. A transmission power allocation method based on user clustering and reinforcement learning of a base station in a non-orthogonal multiple access (NOMA) system, comprising: a sorting step of sorting channel gains of user equipments located in a coverage of the base station in a size order; a clustering step of allocating the user equipment to each cluster based on the size order; and a power allocation step of allocating power to each user equipment included in the cluster by using a quality function based on a state and an action, wherein the state is an index of the user equipment having a minimum data rate in a current time slot, the action corresponds to a power level of the corresponding user equipment in the cluster, and the quality function is a function providing a discount expected reward for a combination of each state and the action.
2. The transmission power allocation method based on user clustering and reinforcement learning of claim 1, wherein the clustering step includes a step of allocating n+(z−1)*k-th user equipment among the user equipments sorted in the size order of the channel gains to an n-th cluster, wherein n represents an index of the cluster, z represents an order of the corresponding user equipment in the n-th cluster, and k represents the number of clusters.
3. The transmission power allocation method based on user clustering and reinforcement learning of claim 2, wherein the clustering step includes a step of determining the number of user equipments included in each cluster based on a modular operation of the number of clusters to the number of all user equipments in the coverage.
4. The transmission power allocation method based on user clustering and reinforcement learning of claim 1, wherein the power allocation step further includes an initialization step of allocating any action with respect to each user equipment in the coverage before allocating the power to the user equipment.
5. The transmission power allocation method based on user clustering and reinforcement learning of claim 1, wherein the power allocation step includes a step of acquiring an optimal action corresponding to the action and state combination providing a maximum discount expected reward in the quality function.
6. The transmission power allocation method based on user clustering and reinforcement learning of claim 5, wherein the power allocation step includes a step of allocating a value obtained by multiplying a power budget per cluster in the acquired optimal action.
7. The transmission power allocation method based on user clustering and reinforcement learning of claim 1, wherein the power allocation step further includes a data rate acquisition step of acquiring a data rate of each user equipment in the cluster.
8. The transmission power allocation method based on user clustering and reinforcement learning of claim 7, wherein the power allocation step further includes a quality function update step of updating the quality function based on the data rate of each user equipment in the cluster.
9. The transmission power allocation method based on user clustering and reinforcement learning of claim 8, wherein the quality function update step includes a step of setting a sum data rate of user equipments located in the coverage as a reward when a minimum data rate is larger than a minimum data rate requirement in the cluster; a step of setting 0 as the reward when the minimum data rate is smaller than or equal to the minimum data rate requirement in the cluster; and a step of updating the quality function using the set reward.
10. A NOMA system comprising: a base station configured to perform the operations according to claim 1; and user equipments served by the base station.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) The above and other aspects, features and other advantages of the present invention will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
(10) Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so as to easily implement those with ordinary skill in the art to which the present disclosure pertains. The present disclosure may be implemented in various different forms and is not limited to embodiments described herein.
(11) A part irrelevant to the description will be omitted to clearly describe the present disclosure, and like or similar components will be designated by like reference numerals throughout the specification.
(12) In various embodiments, components having the same configuration are described using the same reference numerals only in a representative embodiment, and in other embodiments, only configurations different from the representative embodiment will be described.
(13) Further, throughout the specification, when it is described that a certain part is “connected (or coupled)” with the other part, it means that the certain part may be “directly connected (or coupled)” with the other part and may be “indirectly connected (or coupled)” with another member therebetween. In the present specification, when a certain part “comprises” a certain component, unless explicitly described to the contrary, it will be understood to imply the inclusion of stated elements but not the exclusion of any other elements.
(14) Unless contrarily defined, all terms used herein including technological or scientific terms have the same meanings as those generally understood by those skilled in the art to which the present disclosure pertains. Terms which are defined in a generally used dictionary should be interpreted to have the same meaning as the meaning in the context of the related art, and are not interpreted as an ideal meaning or excessively formal meanings unless otherwise defined in the present application.
(15) Non-Orthogonal Multiple Access (NOMA)
(16) In a NOMA system, a plurality of users may be served at different power levels using a single resource block, and successive interference cancellation (SIC) is performed by a receiver side to decode an allocated user's signal.
(17) It is assumed that the NOMA system is configured by m users with different channel gains. A base station BS with one transmitter transmits a non-orthogonally signal using the same radio resource block (RRB) (that is, frequency, time, and code). The non-orthogonal means that a plurality of signals having different power levels are superimposed to each other to form a single signal. Since the base station BS uses the same RRB, all users receive the same signal and signals of other users become interference. In order to acquire a desired signal, the respective users initially decode the largest interference signal using SIC and removes the largest interference signal from the original signal. After decoding and removing all interference signals, the user acquires a desired signal. In order to perform the SIC process, the intensities of the interference signals need to be much larger than that of the desired signal. Therefore, the selecting of the power level for each user becomes a core of the NOMA system.
(18) The power level for each user depends on a channel gain of the corresponding user. A larger channel gain means that the user is close to the base station BS and low power is required when the signal is transmitted to the corresponding user. A low channel gain implies that the corresponding user is far from the base station BS and high power is required to transmit the signal to the corresponding user. Therefore, the user with the high channel gain may receive large interference due to a high power signal of the user with the low channel gain, and easily suppress all interferences using the SIC. On the other hand, the user with the low channel gain may receive low interference due to a low power signal of the user with the high channel gain, and may not sufficiently suppress the interference.
(19) In
(20)
(21) In Equation 1, P.sub.i represents transmission power for a user i, h.sub.i represents a channel gain of the user i, and n.sub.0 represents a noise power spectral density.
(22) System Model
(23) It is considered that a macro base station BS serves distributed M user equipments UEs. The base station BS and the user equipments UEs are configured by one antenna, respectively. A total available bandwidth (BW) is divided into a plurality of resource blocks that are orthogonal to each other. The number of users served by each NOMA cluster is represented by m, wherein m has a range of 2<=M<=M. Thus, the total number of clusters is k, wherein k has a range of 1<=K<=M/2. The maximum transmission power per NOMA cluster is P.sub.t, and a channel gain for an i-th user is h.sub.i, which depends on a distance between the base station BS and the user equipment UE. The users are sorted in a size order (ascending order) of the channel gain, such as h.sub.1<h.sub.2<h.sub.3< . . . <h.sub.M.
(24) In this specification, a sum data rate for verifying the performance of the NOMA system is used. In this system, a sum data rate R.sub.S is defined as Equation 2 below.
(25)
(26) In Equation 2, P.sub.i represents transmission power for a user i, h.sub.i represents a channel gain of the user i, n.sub.0 represents a noise power spectral density, m represents the number of users served by each NOMA cluster, and k represents the total number of clusters.
(27) A total of power allocated to all users of any cluster needs to be smaller than or equal to P.sub.t, which is expressed as Equation 3 below.
(28)
(29) In Equation 3, P.sub.i represents transmission power for a user m represents the number of users served by each NOMA cluster, and P.sub.t represents a total of power allocated in a cluster.
(30) The condition of the data rate Ri for ensuring a minimum data rate requirement of the i-th user is expressed as Equation 4 below.
(31)
(32) In Equation 4, P.sub.i represents transmission power for a user i, h.sub.i represents a channel gain of the user i, n.sub.0 represents a noise power spectral density, m represents the number of users served by each NOMA cluster, and k represents the total number of clusters.
(33) One user equipment UE may be served by at most one cluster
(34) Hereinafter, in order to maximize the sum data rate, an efficient and intuitive user clustering method to which the power allocation method based on reinforcement learning is applied will be described.
(35) User Clustering
(36) It is assumed that m users are served from one resource block using a power domain (PD) NOMA method. With respect to the system, an available throughput of each user may be calculated as Equation 1 for i=1, 2, 3, . . . , m. The largest factor that affects the sum data rate of the cluster is a channel gain of the user. The user with the high channel gain will contribute significantly to the increase of the sum data rate, but the sum data rate of the user with the low channel gain mostly depends on the allocated power. Therefore, when the user with the low channel gain is paired with a user with a significantly high channel gain, the sum data rate will be maximized.
(37) In order to meet the demand, a coverage area of the base station BS is divided into m circles shown in
(38) Referring to
(39) Then, in step 3, the user grouping is performed. For example, the user equipments of an n-th cluster includes user equipments corresponding to h.sub.n, h.sub.n+k, h.sub.n+2*k, . . . , and h.sub.n+(z−1)*k. Here, z represents a position (order) of the user equipment in the corresponding cluster.
(40) Then, in step 4, a size of the cluster is determined. As illustrated in
(41)
(42) Power Allocation Using Reinforcement Learning
(43) In various reinforcement learning methods, a Q-learning algorithm may be used to allocate power in the NOMA system. The Q-learning may obtain a suitable strategy with a maximum probability using a Markov decision process. (Reference: E. R. Gomes and R. Kowalczyk, Dynamic analysis of multiagent Qlearning with ε-greedy exploration, in Proceedings ACM Annual International Conference on Machine Learning, Montreal, QC, Canada, June 2009, pp. 369-376). The Q-learning searches for other states that occur every time a different action is taken, and utilizes experiences that provide a maximum sum data rate of the base station BS.
(44) The power allocation method according to the present disclosure depends on a quality function (Q-function), which provides a discount expected reward for each state-action pair. Here, a state S.sub.t represents an index of a user having a minimum data rate in a time slot t, and an action θ is responsible for a power level in the cluster. During a learning process, the exchange between the search and utilization affects the performance of the algorithm.
(45) Thus, the algorithm acquires the action θ using a ε-greedy policy, which is shown as Equation 5.
(46)
(47) Initially, any search is made in starting of standard Q-learning due to all null values in a Q-table. Therefore, a hot-booting method is used to acquire pre-training data in a scale. (Reference: L. Xiao, Y. Li, C. Dai, H. Dai and H. Poor, “Reinforcement Learning-Based NOMA Power Allocation in the Presence of Smart Jamming” IEEE Transactions on Vehicular Technology, vol. 67, no. 4, pp. 3377-3389, 2018). After hot-booting, as illustrated in
(48) Referring to
(49) Then, the user is selected through the algorithm of
(50) In an algorithm of
Q(S.sub.t,θ)=(1−α)×Q(S.sub.t,θ)+α(r+δ(max Q(S.sub.t,θ))) [Equation 6]
(51) In Equation 6, α∈(0,1] represents a learning rate of an algorithm that reflects a weight of the current experience, r represents a reward obtained for the action, and δ represents a discount factor that is selected according to the uncertainty of a future gain in the range of (0,1].
(52) The transmission power allocation method based on user clustering and reinforcement learning of the base station in the downlink NOMA system described above may be as illustrated in
(53) According to the embodiment of the present disclosure, the clustering step (S815) includes a step of allocating n+(z−1)*k-th user equipment among the user equipments sorted in the size order of the channel gains to an n-th cluster, wherein n represents an index of the cluster, z represents an order of the corresponding user equipment UE in the n-th cluster, and k represents the number of clusters. For example, a user for each cluster may be allocated as shown in step 3 of
(54) According to the embodiment of the present disclosure, the clustering step (S815) may include a step of determining the number of user equipments included in each cluster based on a modular operation (M mod k) of the number k of clusters to the number M of all user equipments in the coverage. For example, in step 4 of
(55) According the embodiment of the present disclosure, the power allocation step (S820) may further include a step of initializing any action with respect to each user equipment in the coverage before allocating the power to the user equipment UE. For example, in the algorithm of
(56) According to the embodiment of the present disclosure, the power allocation step (S820) may include a step of acquiring an optimal action (θ=argmax Q(S.sub.t, θ)) corresponding to the action and state combination providing the maximum discount expected reward in the quality function Q(S.sub.t, θ). Further, the power allocation step (S820) may include a step of allocating a value θ.sub.mP.sub.t obtained by multiplying a power budget P.sub.t per cluster in the acquired optimal action θ.sub.m.
(57) According to the embodiment of the present disclosure, the power allocation step (S820) may further include a data rate acquisition step of acquiring a data rate R.sub.i of each user equipment UE in the cluster. For example, a process corresponding to a 22-th line of
(58) According to the embodiment of the present disclosure, the power allocation step (S820) may further include a quality function update step of updating the quality function Q(S.sub.t, θ) based on the data rate of each user equipment UE in the cluster.
(59) According to the embodiment of the present disclosure, the quality function update step may include a step of setting a sum data rate R.sub.S of user equipments located in the coverage as a reward r when a minimum data rate min(R.sub.i) is larger than a minimum data rate requirement R.sub.0 (min(R.sub.i)>R.sub.0) in the cluster, a step of setting 0 as the reward r when the minimum data rate min(R.sub.i) is smaller than or equal to the minimum data rate requirement R.sub.0 (min(R.sub.i)<=R.sub.0) in the cluster, and a step of updating the quality function Q(S.sub.t, θ) using the set reward. In the updating of the quality function Q(S.sub.t, θ), Equation 6 may be used.
(60) The NOMA system according to the embodiment of the present disclosure may include the base station BS performing the transmission power allocation method based on user clustering and reinforcement learning described above and user equipments UEs served by the base station BS.
(61) Evaluation of Performance
(62) The performances of NOMA systems to which a user clustering algorithm is applied according to a Q-learning based power allocation algorithm, only the Q-learning based power allocation algorithm, and user clustering with uniform power allocation were compared with each other. In order to evaluate the performance, parameters given in Table 1 below are used.
(63) TABLE-US-00001 TABLE 1 Parameter Value Bandwidth of a resource block, BW 1 MHz Power Budget, P.sub.t 20 W Number of users, M 12 Number of users per cluster, m 2, 3, 4, 6 Number of antennas at BS and UE 1 Learning rate, α 0.2 Discount rate, δ 0.7 Exploration rate, ϵ 1 Number of episodes, E 200 Minimum data rate requirement, R.sub.0 1 bps/Hz
(64) Initially, a distance between the base station BS and the users is optionally selected within the coverage of the base station BS. Thereafter, a channel gain h.sub.i is calculated using a Rayleigh fading model, wherein a path loss index η is 4 (η=4).
(65)
(66)
(67) As described above, the Q-learning based power allocation algorithm to which a simple and efficient user clustering method was applied in the NOMA system was introduced and analyzed. In addition, like a NOMA system to which only the Q-learning based power allocation algorithm is applied and a NOMA system to which only user clustering with uniform power distribution is applied, other scenarios have been reviewed together. It was confirmed that the power allocation algorithm to which the user clustering is applied derives optimal performance as compared to other scenarios. Furthermore, a plurality of NOMA constraints, such as a transmission power budget and a user's data rate minimal requirement, are incorporated into the Q-learning algorithm to be overcome. The proposed user clustering method supports the downlink and uplink NOMA systems to achieve a maximum throughput.
(68) The drawings accompanied in the embodiment and the specification just clearly represent part of the technical idea included in the present disclosure, and it will be apparent that modifications and specific embodiments that can be easily derived by those skilled in the art within the scope of the technical idea contained in the specification and drawings of the present disclosure are all included in the scope of the present disclosure.
(69) Therefore, the spirit of the present disclosure should not be defined only by the described exemplary embodiments, and it should be appreciated that claims to be described below and all which are equivalent to the claims or equivalently modified to the claims are included in the scope of the spirit of the present disclosure.