PDP estimation for bundle-based channel estimation via learning approach
11277284 · 2022-03-15
Abstract
A method of channel estimation for a precoded channel includes generating an initial frequency autocorrelation of the precoded channel for a current bundle of a received data transmission, generating an expanded frequency autocorrelation based on the initial frequency autocorrelation of the precoded channel, providing the expanded frequency autocorrelation to a neural network, generating, by the neural network, an estimated frequency autocorrelation of an unprecoded channel based on the expanded frequency autocorrelation, and generating an estimated power distribution profile of the unprecoded channel based on the estimated frequency autocorrelation.
Claims
1. A method of channel estimation for a precoded channel, the method comprising: generating an initial frequency autocorrelation of the precoded channel for a current bundle of a received data transmission; generating an expanded frequency autocorrelation based on the initial frequency autocorrelation of the precoded channel; providing the expanded frequency autocorrelation to a neural network; generating, by the neural network, an estimated frequency autocorrelation of an unprecoded channel based on the expanded frequency autocorrelation; and generating an estimated power distribution profile of the unprecoded channel based on the estimated frequency autocorrelation.
2. The method of claim 1, wherein the current bundle comprises a plurality of resource blocks, each one of the resource blocks comprising a plurality of subcarriers.
3. The method of claim 1, wherein the unprecoded channel is an estimate of the precoded channel absent precoding.
4. The method of claim 1, wherein the generating the expanded frequency autocorrelation comprises: performing edge expansion on the initial frequency autocorrelation to expand a size of the initial frequency autocorrelation to a fast Fourier transform (FFT) size, wherein the FFT size is an input size of the neural network.
5. The method of claim 4, wherein the edge expansion comprises a linear interpolation of values of the initial frequency autocorrelation via an expansion matrix.
6. The method of claim 1, wherein the providing the expanded frequency autocorrelation to the neural network comprises: providing a first half of values of the expanded frequency autocorrelation to the neural network, wherein a second half of values of the expanded frequency autocorrelation are complex conjugates of the first half of values of the expanded frequency autocorrelation.
7. The method of claim 1, wherein the generating the estimated frequency autocorrelation by the neural network comprises: generating, by the neural network, at least some of values of the estimated frequency autocorrelation of the unprecoded channel based on the expanded frequency autocorrelation.
8. The method of claim 1, wherein the generating the estimated power distribution profile comprises: filtering the estimated frequency autocorrelation output by the neural network via a low pass filter to generate a refined autocorrelation of the unprecoded channel; and performing an inverse FFT (IFFT) operation on the refined autocorrelation to generate the estimated power distribution profile.
9. The method of claim 8, wherein the low pass filter is a moving average filter.
10. The method of claim 1, wherein the generating the initial frequency autocorrelation of the precoded channel for the current bundle comprises: generating a time autocorrelation for a previous bundle of the received data transmission; generating a previous frequency autocorrelation for the previous bundle based on a previous estimated power distribution profile; generating an estimated channel impulse response based on the time autocorrelation and the previous frequency autocorrelation; and generating the initial frequency autocorrelation of the precoded channel for the current bundle based on the estimated channel impulse response.
11. The method of claim 1, further comprising: generating a truncated estimated power distribution profile by truncating a size of the estimated power distribution profile to match a size of the initial frequency autocorrelation of the precoded channel.
12. The method of claim 11, further comprising: normalizing the truncated estimated power distribution profile to a unit power to generate a normalized estimated power distribution profile.
13. The method of claim 11, wherein the truncated estimated power distribution profile has a length of a maximum delay spread of the precoded channel.
14. A system for channel estimation of a precoded channel, the system comprising: a processor; and memory storing instructions that, when executed on the processor, cause the processor to perform: generating an initial frequency autocorrelation of the precoded channel for a current bundle of a received data transmission; generating an expanded frequency autocorrelation based on the initial frequency autocorrelation of the precoded channel; providing the expanded frequency autocorrelation to a neural network; generating, by the neural network, an estimated frequency autocorrelation of an unprecoded channel based on the expanded frequency autocorrelation; and generating an estimated power distribution profile of the unprecoded channel based on the estimated frequency autocorrelation.
15. A method of channel estimation for a precoded channel, the method comprising: generating an initial frequency autocorrelation of the precoded channel for a current bundle of a received data transmission; providing the initial frequency autocorrelation to a policy network; generating, by the policy network, an estimated frequency autocorrelation of an unprecoded channel based on the initial frequency autocorrelation; determining, by a value network, an instantaneous reward based on the estimated frequency autocorrelation; determining an advantage based on the instantaneous reward and a predicted total reward of forward propagation at the value network; and updating a policy of the policy network based on the advantage via back propagation to reduce a block error rate.
16. The method of claim 15, wherein the updating the policy of the policy network comprises: determining a policy gradient based on the advantage; and updating coefficients of the policy network based on the policy gradient.
17. The method of claim 15, wherein the policy network and the value network are multi-layer perceptrons.
18. The method of claim 15, further comprising: adding Gaussian noise to the estimated frequency autocorrelation to convert a discrete action space of the policy network to a continuous action space.
19. The method of claim 15, further comprising: generating an expanded frequency autocorrelation based on the initial frequency autocorrelation of the precoded channel, wherein the providing the initial frequency autocorrelation to the policy network comprises: providing the expanded frequency autocorrelation to the policy network, and wherein the generating the estimated frequency autocorrelation of the unprecoded channel is based on the expanded frequency autocorrelation.
20. The method of claim 15, further comprising: filtering the estimated frequency autocorrelation via a low pass filter to generate a refined autocorrelation of the unprecoded channel; and performing an inverse FFT (IFFT) operation on the refined autocorrelation to generate an estimated power distribution profile.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
(1) These and other features of some example embodiments of the present disclosure will be appreciated and understood with reference to the specification, claims, and appended drawings.
DETAILED DESCRIPTION
(13) The detailed description set forth below in connection with the appended drawings is intended as a description of some example embodiments of a system and a method for channel estimation provided in accordance with the present disclosure and is not intended to represent the only forms in which the present disclosure may be constructed or utilized. The description sets forth the features of the present disclosure in connection with the illustrated embodiments. It is to be understood, however, that the same or equivalent functions and structures may be accomplished by different embodiments that are also intended to be encompassed within the scope of the disclosure. As denoted elsewhere herein, like element numbers are intended to indicate like elements or features.
(14) Recent generations of technology standards for communication networks support bundle-based configurations in which each bundle may be precoded with its own selection from the precoding matrix. To facilitate communication in such a system, the user equipment (UE) has to estimate channels (e.g., estimate the power delay profile (PDP) of the channel) in the frequency domain. One channel estimation technique, minimum mean squared error (MMSE), utilizes a channel's second-order statistics, consisting of frequency and time correlations (e.g., frequency and time autocorrelations). The time correlation (e.g., time autocorrelation) may be determined based on known techniques. However, the frequency correlation (e.g., frequency autocorrelation) involves accurate estimation of the PDP information. Assuming a uniform PDP when deriving the frequency correlation may result in performance degradation, especially when the channel delay is relatively long.
(15) Thus, the channel estimator, according to some embodiments, utilizes a neural network that receives frequency correlation of a precoded channel from a preceding slot/bundle and outputs the PDP information of a current slot/bundle. In some embodiments, the channel estimator further performs filtering, truncation, and normalization to refine the output of the neural network, which is utilized to estimate the channel for the current bundle/slot.
(17) The communication system 1 may include a transmitter 10, a communication channel (e.g., a wireless multi-path channel) 20, and a receiver 30. The transmitter 10 may include a source 12 of input data, a channel encoder 14 configured to encode the input data to enable error correction at the receiver 30, a modulator 16 configured to generate a transmit signal based on the encoded input data, and a precoder 18 for precoding one or more bundles of data prior to transmission through the communication channel 20.
(18) The receiver 30 includes a receiver filter 32 for filtering out noise that may have been added to the transmitted signal in the multi-path channel 20, a detector 34 configured to reconstruct the encoded data from the received signal, and a channel decoder 36 configured to decode the reconstructed data to retrieve the input data generated by the source 12.
(19) The transmitter 10 may be a radio node, and the receiver 30 may be part of the user equipment, which may be mobile. The communication channel 20 may not be constant and may change over time, for example, as a result of the transmitter 10 and/or the receiver 30 being in motion. Mobile wireless communication may be adversely affected by the multi-path interference resulting from reflections from surroundings, such as hills, buildings, and other obstacles. Having an accurate estimate of the time-varying channel is key to providing reliability and high data rates at the receiver 30. Thus, according to some embodiments, the receiver 30 further includes a channel estimator 100 that utilizes a neural network to estimate the channel, that is, the channel impulse response (CIR), for each bundle of transmitted signal and provides the CIR to the detector 34.
(20) The signal y received by the receiver 30 may be expressed as:
y=p+n (Eq. 1)
(21) where p is a reference signal (RS) channel vector of demodulation reference signals (DMRS) and n is background noise with zero mean and covariance σ²I (where I is an identity matrix). The estimate of the channel impulse response ĥ may be expressed as:
ĥ = R_hp (R_pp + σ²I)⁻¹ y (Eq. 2)
(22) where R_hp represents the correlation matrix between h and p, and R_pp denotes the autocorrelation matrix of p. The autocorrelation R_pp may be solely a function of p, which is a DMRS channel vector known to the receiver 30.
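As a sketch of how Equation 2 could be evaluated numerically, the following NumPy snippet solves the regularized linear system rather than forming the matrix inverse. The function name and array shapes are illustrative assumptions, not part of the disclosure.

```python
import numpy as np

def mmse_channel_estimate(y, R_hp, R_pp, noise_var):
    """MMSE estimate h_hat = R_hp @ (R_pp + sigma^2 * I)^(-1) @ y  (Eq. 2)."""
    n = R_pp.shape[0]
    # Solve the linear system instead of explicitly inverting (R_pp + sigma^2 I).
    return R_hp @ np.linalg.solve(R_pp + noise_var * np.eye(n), y)
```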
(23) Here, it is assumed that the channel distribution follows a wide-sense stationary uncorrelated scattering (WSSUS) model. In other words, the second-order moment of the channel is stationary and only depends on the amount of either time or frequency difference, instead of each instantaneous value. Under the WSSUS model, the channel autocorrelation can be decomposed into a frequency domain part and a time domain part as:
R_{h(i,j),h(k,l)} = E[h_{i,j} h*_{k,l}] = r_f(i − k) · r_t(j − l) (Eq. 3)
(24) where h_{i,j} is the complex channel gain at the ith subcarrier of the jth symbol, and r_f(·) and r_t(·) are the frequency and time autocorrelation functions, respectively. Appropriate selection of the subcarrier indices i and k and the symbol indices j and l allows R_hp and R_pp to be calculated based on R_h.
(25) The time autocorrelation function r_t(·) may be calculated in a number of ways. For example, the time autocorrelation function may rely on linear interpolation to obtain the correlation value between two symbols, which is given by
(26) [Equation 4: linear interpolation of tabulated correlation values TC(·); not reproduced]
(27) where TC(x) is the correlation value of interval x, and T_s represents the symbol duration. In other examples, Jakes' model may be used to yield:
r_t(l) = J_0(2πT_s f_D l) (Eq. 5)
(28) where J_0 is the zeroth-order Bessel function of the first kind and f_D represents the Doppler spread corresponding to the largest Doppler shift.
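Equation 5 is straightforward to evaluate with SciPy's Bessel function; the helper below is an illustrative sketch, and the parameter names are assumptions.

```python
import numpy as np
from scipy.special import j0  # zeroth-order Bessel function of the first kind

def jakes_time_autocorrelation(l, t_s, f_d):
    """Time autocorrelation under Jakes' model: r_t(l) = J0(2*pi*T_s*f_D*l)  (Eq. 5)."""
    return j0(2.0 * np.pi * t_s * f_d * l)
```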
(29) Given the power delay profile (PDP) of the channel 20, the frequency autocorrelation function can be expressed with the fast Fourier transform (FFT) of the channel tap powers as
r_f(k) = Σ_{i=0}^{L−1} P_i e^{−j2πkΔfτ_i} (Eq. 6)
(30) where L is the number of channel taps (also referred to as the maximum delay spread) in the time domain and Δf is the subcarrier spacing. P_i and τ_i are the power and delay of the ith channel tap, respectively. The maximum delay spread L may be measured with a quasi-co-located (QCL) reference signal. Here, the total amount of power in the profile is normalized to a unit power, i.e.,
Σ_{i=0}^{L−1} P_i = 1 (Eq. 7)
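Equations 6 and 7 can be sketched together as follows; the function and parameter names are illustrative assumptions.

```python
import numpy as np

def freq_autocorrelation(powers, delays_s, k, delta_f_hz):
    """r_f(k) = sum_i P_i * exp(-j*2*pi*k*delta_f*tau_i)  (Eq. 6).

    `powers` is the PDP, assumed normalized to unit total power (Eq. 7),
    and `delays_s` holds the tap delays tau_i in seconds.
    """
    powers = np.asarray(powers, dtype=float)
    delays_s = np.asarray(delays_s, dtype=float)
    return np.sum(powers * np.exp(-1j * 2.0 * np.pi * k * delta_f_hz * delays_s))
```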
(31) According to some embodiments, the channel estimator 100 estimates the values P_i (e.g., as close to ideal as possible) by utilizing a neural network. The PDP values may be used to determine the frequency autocorrelation function r_f(·) using Equation 6. Using the frequency autocorrelation function r_f(·) together with the time autocorrelation function r_t(·) (as, e.g., determined by Equation 4 or 5), the receiver 30 may determine the channel correlation matrix R_h,h via Equation 3, from which the correlations R_hp and R_pp can be calculated. The receiver 30 may then determine the estimated channel impulse response ĥ via Equation 2. In some embodiments, the receiver 30 individually estimates the channel response for each bundle of a transmission.
(35) TABLE 1. Maximum number of resource blocks per channel bandwidth (CHBW) and subcarrier spacing:

    CHBW (MHz):   5    10   15   20   25   40   50   60   80   100
    15 kHz:       25   52   79   106  133  216  270  -    -    -
    30 kHz:       11   24   38   51   65   106  133  162  217  273
    60 kHz:       -    11   18   24   31   51   65   79   107  135
(37) In order to determine the PDP (which is a time domain characteristic) for each bundle of a transmission, according to some embodiments, the frequency correlation of the physical downlink shared channel (PDSCH; see Equation 20 below) combined with the DMRS in the previous slot is provided to a neural network. The output frequency autocorrelation is post-processed to estimate the PDP for the current slot. Accordingly, the channel estimator improves the block error rate (BLER) as compared to the related art, which assumes a uniform PDP for the channel.
(39) According to some embodiments, the channel estimator 100 includes an edge expander 110, a neural network 120, a post-processor 125, and a narrowband channel estimator (NBCE) 160. In some embodiments, the post-processor 125 includes a filter 130, an inverse fast Fourier transform (IFFT) converter 140, and a truncation and normalization block 150.
(41) The input size of the neural network 120 is fixed to be the same as the fast Fourier transform (FFT) size. Here, the FFT size may represent the number of bins in the analysis window of the frequency spectrum. This allows a single network to cover all resource block configurations allocated for the PDSCH and DMRS, for example, up to 273 resource blocks (as in the example of Table 1). Sizing the neural network input to be the same as the FFT size avoids the need to design multiple networks, each corresponding to a single resource block size. This may be particularly desirable because the channel estimator 100 may not be aware of the frequency resource allocation at the transmitter 10, and is thus sized to accommodate different frequency resource allocations at the transmitter 10.
(42) Thus, according to some embodiments, in order to maintain the same size of input features, the edge expander 110 expands the measured autocorrelation to the FFT size by using edge expansion. In some embodiments, the edge expander 110 interpolates signals (e.g., via linear interpolation) with an expansion matrix A,
(43) [expansion matrix A: not reproduced]
(44) where N_f is the FFT size and N_d is the size of the measured/calculated autocorrelation (also referred to as the initial frequency autocorrelation). The expanded frequency autocorrelation is then
r̃_f,i(k) = A·r_f(k)
(45) where r̃_f,i(k) denotes the expanded frequency autocorrelation provided as the neural network input.
(46) However, embodiments of the present invention are not limited to the above interpolation, and any suitable expansion/interpolation technique may be employed to arrive at the expanded frequency autocorrelation based on the measured autocorrelation.
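Because the expansion matrix itself is not reproduced above, the following sketch stands in for the edge expansion with a plain linear interpolation from the N_d measured points to the N_f-point FFT grid. The uniform index grid and the function name are assumptions, not the disclosure's exact construction.

```python
import numpy as np

def edge_expand(r_measured, fft_size):
    """Expand an N_d-point measured autocorrelation to the FFT size N_f by
    linear interpolation (a stand-in for multiplication by the expansion
    matrix A, whose exact form is not reproduced in the text)."""
    r = np.asarray(r_measured)
    x_old = np.linspace(0.0, 1.0, len(r))
    x_new = np.linspace(0.0, 1.0, fft_size)
    # Interpolate real and imaginary parts separately (np.interp is real-valued).
    return np.interp(x_new, x_old, r.real) + 1j * np.interp(x_new, x_old, r.imag)
```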
(47) According to some embodiments, the channel estimator 100 utilizes the symmetric property of autocorrelation to remove duplicated information from the input to the neural network 120. Thus, in some embodiments, the channel estimator 100 provides half of the expanded frequency autocorrelation values r̃_f(k) to the neural network 120, since
r̃_f,o(k) = r̃_f,o(−k)* (Eq. 17)
(48) That is, one half of the estimated channel autocorrelation may be calculated as the complex conjugate of the other half. Accordingly, the output of the neural network 120 may be restored to the full FFT size from the half-size input of the neural network 120. Performing inference on half of the frequency autocorrelation values significantly reduces the computational load on the neural network 120 and improves inference performance.
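The conjugate-symmetry restoration of Equation 17 can be sketched as follows, assuming the standard FFT ordering in which negative lags wrap to the end of the array; this indexing convention is an assumption, not stated in the text.

```python
import numpy as np

def restore_full_autocorrelation(half):
    """Rebuild a full Hermitian-symmetric autocorrelation of length 2m-2 from
    its first m lags, using r(-k) = r(k)* (the Eq. 17 symmetry)."""
    half = np.asarray(half)
    # Mirror lags 1..m-2 as complex conjugates into the negative-lag positions.
    return np.concatenate([half, np.conj(half[-2:0:-1])])
```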
(49) According to some embodiments, the filter 130, the IFFT converter 140, and the truncation and normalization block 150 apply post-processing to the output of the neural network 120 for further stabilization. In some embodiments, the filter 130 applies a low-pass filter to the neural network output r̃_f(k), which is the estimated autocorrelation of the unprecoded channel, to generate a refined frequency autocorrelation. The low-pass filter may be a moving average over frequency expressed as
(50) r̄_f[k] = (1 / (2n + 1)) Σ_{m=−n}^{n} r̃_f(k + m)
(51) where 2n + 1 is the order of the moving average and r̄_f[k] denotes the refined frequency autocorrelation.
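A minimal sketch of the moving-average low-pass filter follows; the zero-padded edge handling is an implementation assumption, as the text does not specify edge behavior.

```python
import numpy as np

def moving_average(r, n):
    """Order-(2n+1) moving average over frequency, used as the low-pass
    filter that refines the neural network output (edges are zero-padded
    here, an assumption not stated in the text)."""
    kernel = np.ones(2 * n + 1) / (2 * n + 1)
    # mode="same" keeps the output the same length as the input.
    return np.convolve(r, kernel, mode="same")
```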
(53) According to some embodiments, the IFFT converter 140 converts the refined frequency autocorrelation into the estimated PDP (i.e., the estimated P_i values in Equation 6) by performing an IFFT operation. The truncation and normalization block 150 further refines the estimated PDP in the time domain. In some embodiments, the truncation and normalization block 150 truncates/prunes the estimated PDP to the length of the maximum delay spread L and normalizes the estimated PDP to a unit power to satisfy the condition of Equation 7. The PDP estimate is then given by
(54) P̂_k = P̃_k / Σ_{i=0}^{L−1} P̃_i, for k = 0, …, L − 1
(55) where P̃_k is the power value at each tap k derived from the output of the neural network 120. As such, the truncation and normalization block 150 stabilizes the PDP estimation.
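The post-processing chain of the IFFT converter 140 and the truncation and normalization block 150 can be sketched as follows; flooring small negative IFFT artifacts at zero is an added assumption, not stated in the text.

```python
import numpy as np

def estimate_pdp(refined_autocorr, max_delay_spread):
    """IFFT the refined frequency autocorrelation, truncate to L taps, and
    normalize to unit total power (Eq. 7)."""
    pdp = np.fft.ifft(refined_autocorr).real          # time-domain power profile
    pdp = np.clip(pdp[:max_delay_spread], 0.0, None)  # keep first L taps, floor at 0
    return pdp / pdp.sum()                            # enforce unit total power
```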
(57) In some embodiments, the NBCE 160 generates the frequency autocorrelation function r.sub.f( ) by performing an FFT operation on the refined PDP estimate according to Equation 6.
(58) According to some embodiments, the NBCE 160 further uses the frequency autocorrelation function r_f(·) generated for the current bundle/time slot to generate the neural network input for the subsequent/next bundle/time slot. In some embodiments, the NBCE 160 uses the calculated frequency autocorrelation function r_f(·) together with the time autocorrelation function r_t(·) (as, e.g., determined by Equation 4 or 5) to determine the channel autocorrelation R_h,h via Equation 3, which is used to compute the correlations R_hp and R_pp. The NBCE 160 then calculates the estimated channel impulse response ĥ using Equation 2.
(59) According to some embodiments, the NBCE 160 then proceeds to calculate the frequency autocorrelation for the subsequent slot/bundle by using
(60) [Equations 20-22: frequency autocorrelation computed from the estimated channels combined with precoding; not reproduced]
(61) where s is a symbol index within the slot, r is the antenna index of the receiver 30, l is the layer index assigned to the PDSCH and DMRS ports, and n is the resource element (RE) index.
(62) According to some embodiments, the channel estimator 100 (e.g., the NBCE 160) uses a uniform PDP in calculating the frequency autocorrelation r_f(·) for the very first slot analyzed by the neural network 120 in a data transmission.
(63) In some embodiments, rather than relying on the frequency autocorrelation r_f(·) from only the previous slot, the channel estimator 100 computes the autocorrelation over multiple past slots and averages them before feeding the averaged autocorrelation into the neural network 120.
(64) According to some embodiments, the neural network 120 utilizes a model that correlates a plurality of frequency autocorrelations of the precoded channel across bundles with a plurality of frequency autocorrelations without precoding. By utilizing the model and a supervised machine learning algorithm, such as one of various known regression or back-propagation algorithms, the neural network 120 estimates the autocorrelation r̃_f, which is the estimated frequency autocorrelation of an unprecoded channel for a given bundle. Here, the unprecoded channel refers to an estimate of the precoded channel absent (e.g., stripped of or without) precoding.
(65) According to some embodiments, the neural network 120 (e.g., a deep neural network) may be a specialized AI or a general AI and is trained using training data (e.g., precoded and unprecoded frequency autocorrelations) and an algorithm, such as a back-propagation algorithm.
(66) The neural network 120 may include a set of weights for each of the parameters of a linear regression model, or the neural network 120 may include a set of weights for connections between the neurons of a trained neural network. In some embodiments, frequency autocorrelation functions r_f(·) of a precoded channel across bundles are supplied as values to the input layer of the neural network 120, and the values (or a set of intermediate values) are forward propagated through the neural network 120 to generate outputs, where the outputs are the estimated autocorrelations r̃_f of the channel without precoding.
(67) In an example of training, three different types of precoding may be applied with a bundle of two resource blocks: bypass (i.e., identity), random, and PMI (precoding matrix indicator)-based precoding. Under the configuration specified to generate samples, a genie PDP per channel may also be used to calculate R_pp and R_hp within a bundle. Thus, pairs of samples (i.e., frequency autocorrelations of precoded channels) and labels (i.e., frequency autocorrelations of unprecoded channels reversely computed from the genie PDP) may be collected via simulation.
(68) The maximum number of resource blocks (RBs) per subcarrier spacing may be allocated to generate data samples, rather than all possible numbers of RBs per subcarrier spacing. As stated, using the edge expansion, the input to the neural network 120 is maintained at the FFT size. For example, when a subcarrier spacing of 15 kHz is used at a channel bandwidth of 20 MHz, the maximum configurable number of RBs is 106. Similarly, with a subcarrier spacing of 30 kHz, 273 RBs can be allocated over a channel bandwidth of 100 MHz, as specified in Table 1.
(70) As described above, the channel estimator 100 estimates the power delay profile (PDP), that is, the P_i values in Equation 6, by using the frequency autocorrelation of the PDSCH combined with the DMRS in the previous slot via neural networks, thereby reducing the channel estimation error at the current slot.
(71) As provided above, the channel estimator 100 according to some embodiments aims to estimate PDP as close to ideal as possible. However, embodiments of the present invention are not limited thereto.
(72) Given the assumption that the channel distribution follows a wide-sense stationary uncorrelated scattering (WSSUS) model, and due to the imperfections of the NBCE (e.g., due to estimation error and background noise), the ideal PDP values may not guarantee the optimization (e.g., minimization) of the block error rate (BLER). As a result, according to some embodiments, the channel estimation is performed in such a way as to minimize the mean square error (MSE) of channel estimation, which may lead to the reduction (e.g., minimization) of the BLER.
(73) According to some examples, the NBCE PDP estimation is formulated as a one-step Markov decision process (MDP). That is, the action at the ith time slot does not impact the state at the (i+1)th slot. The action is the receiver's PDP estimation per slot, and the state is solely associated with the channels. Therefore, the one-step MDP is modeled to terminate a trajectory after a single time step, with a reward.
(74) The MDP framework includes states, actions, and rewards. According to some embodiments, a state denotes the frequency autocorrelation of channels, each of which may be precoded per bundle. As the precoding matrices used by the transmitter 10 are transparent to the receiver 30, the frequency autocorrelation at each slot is computed by using the estimated channels combined with the precoding at the previous slot, as per Equations 20-22.
(76) According to some embodiments, the edge expander 110, the post-processor 125, and the narrowband channel estimator (NBCE) 160 of the channel estimator 200 are the same as the corresponding components of the channel estimator 100 described above.
(77) In some embodiments, the channel estimator 200 includes a Gaussian noise generator 170 for adding Gaussian noise to the output of the policy network 122, and a value network 180 for evaluating the output of the policy network 122 and correcting the coefficients or neural weights of the policy network 122 to reduce (e.g., minimize) the overall BLER of the receiver 30. In some examples, the Gaussian noise may have a mean of zero and a preset variance (e.g., a small fixed variance), and may convert the discrete action space of the policy network to a continuous action space. The policy network 122 takes both the real and imaginary elements of r̃_f,i(k) to produce an action, with Gaussian noise induced from the Gaussian noise generator 170, which is the frequency autocorrelation r̃_f(k) of the estimated unprecoded channels.
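The exploration step can be sketched as adding zero-mean Gaussian noise to the policy output; the standard deviation below is a hypothetical value, since the text specifies only a small preset variance.

```python
import numpy as np

def exploratory_action(policy_output, noise_std=0.01, rng=None):
    """Add zero-mean Gaussian noise to the policy output so the discrete
    action space becomes continuous (noise_std is a hypothetical small,
    fixed standard deviation)."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, noise_std, size=np.shape(policy_output))
    return np.asarray(policy_output) + noise
```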
(78) In some embodiments, the value network 180 receives the state (i.e., the estimated frequency autocorrelation) and determines an instantaneous reward based on the estimated frequency autocorrelation.
(79) Here, the policy network 122 is referred to as the actor, and the value network 180 as the critic that measures how good or bad the actions taken by the actor are.
(80) In some examples, pairs of states and rewards are sampled for training value networks with multiple random seeds (e.g., 20 random seeds). The network with the lowest loss function is selected as the value network 180. The value network 180 may be a multi-layer perceptron. According to some examples, the value network 180 has a single hidden layer with 128 nodes. The sigmoid function may be used at the activation layer of the value network 180, and the output layer of the value network 180 may be bypassed without a specific activation function. The loss function may be designed to reduce (e.g., minimize) the mean square error (MSE).
(81) In some embodiments, after the value network 180 calculates the reward, the channel estimator 200 computes the advantage, which may be expressed as:
Â^π(s_i, a_i) = r(s_i, a_i) − V̂_φ^π(s_i) (Eq. 23)
(82) where r(s_i, a_i) is the instantaneous reward caused by the action a_i at state s_i, V̂_φ^π(s_i) is the predicted total reward of forward propagation at the output of the value network 180, and i is the slot index. The advantage indicates the improvement in expected reward (relative to the average at that state) if action a_i is taken by the policy network 122 at state s_i. In other words, if the advantage is positive, the gradient is moved in that direction, and if negative, the gradient is moved in the opposite direction. The channel estimator 200 then calculates the objective gradient
∇_θ J(θ) = Σ_{t=0}^{T−1} ∇_θ log π_θ(a_t | s_t) Â^π(s_t, a_t) (Eq. 24)
(83) where ∇_θ J(θ) is the gradient of the objective J(θ), θ represents the coefficients of the policy network 122, t is a time index increasing from 0 to T − 1, T represents the number of time steps, and π_θ(a_t | s_t) represents the probability function of the policy network 122 for determining the action a_t given a state s_t. The probability function π_θ(a_t | s_t) of the policy network 122 may be trained through supervised learning. In some examples, the one-step MDP is modeled to terminate a trajectory after a single time step, that is, T = 1.
(84) According to some embodiments, the channel estimator 200 then updates the policy (e.g., updates the coefficients of the policy network 122) via back propagation using the policy gradient ∇_θ J(θ), substituting the policy coefficients (or network coefficients) θ with θ + α∇_θ J(θ), where α is a small step size, which may be 0.05, for example.
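Equations 23 and 24 with T = 1 reduce to a single gradient-ascent step, which can be sketched as follows. The sketch takes a precomputed gradient of log π_θ(a|s) as an input placeholder, since producing it from an actual policy network is outside the scope of this illustration.

```python
import numpy as np

def a2c_policy_update(theta, grad_log_pi, reward, value_estimate, alpha=0.05):
    """One-step advantage actor-critic update (Eqs. 23-24 with T = 1).
    `grad_log_pi` stands in for the backpropagated gradient of
    log pi_theta(a|s) with respect to theta."""
    advantage = reward - value_estimate   # Eq. 23: A = r(s, a) - V(s)
    grad_j = grad_log_pi * advantage      # Eq. 24 with a single time step
    return theta + alpha * grad_j         # theta <- theta + alpha * grad J(theta)
```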
(85) According to some embodiments, during the training phase of the policy network 122, the channel estimator repeatedly performs the steps of determining an action by the policy network 122 given a state, determining a reward for the action and state by the value network 180, evaluating the advantage via one-step reinforcement learning, calculating the objective gradient, and updating the policy coefficients via back propagation. This loop may continue until the improvement converges or until a predetermined threshold is met.
(86) According to some embodiments, the value network 180 may be a specialized AI or a general AI and is trained using training data and an algorithm, such as a back-propagation algorithm.
(87) The value network 180 may include a set of weights for each of the parameters of a linear regression model, or the value network 180 may include a set of weights for connections between the neurons of a trained neural network. In some embodiments, frequency autocorrelation functions of an unprecoded channel across bundles are supplied as values to the input layer of the value network 180, and the values (or a set of intermediate values) are forward propagated through the value network 180 to generate outputs, where the outputs are the instantaneous rewards caused by the actions taken by the policy network 122.
(88) According to some embodiments, while the value network 180 is present in the channel estimator 200 for the purpose of training the policy network 122, the value network 180 may be omitted from the channel estimator 200 during the inference phase, when the channel estimator 200 is being used to perform channel estimation for incoming signals.
(89) In some examples, the receiver 30 may be equipped with 2 or 4 receive antennas and the transmitter 10 transmits a signal with the same rank as the number of receive antennas. Here, the rank refers to the matrix rank (i.e., the number of columns in the matrix) of the channel impulse response ĥ. The number of resource blocks may be set to 106 over a channel bandwidth of 20 MHz.
(90) In some examples, training may be performed with samples from all of the extended pedestrian A model (EPA), extended vehicular A model (EVA), and extended typical urban model (ETU) channels, and each of the value and policy networks covers all channels. The policy network 122 may be initially trained through supervised learning in which all precoding options are sampled, such as identity, random, and PMI-based precoding. The neural network may enable batch normalization for its training so that the input to the hidden layer is normalized with zero mean and unit variance.
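The batch normalization mentioned above can be sketched as follows. This shows only the training-time normalization to zero mean and unit variance; the learned scale/shift parameters and running statistics of a full batch-normalization layer are omitted, and the batch shape is illustrative.

```python
import numpy as np

def batch_norm(batch, eps=1e-5):
    """Normalize each feature of a mini-batch to zero mean and unit
    variance, as applied to the hidden-layer input during training.
    eps guards against division by zero for constant features."""
    mean = batch.mean(axis=0)
    var = batch.var(axis=0)
    return (batch - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(2)
batch = rng.normal(loc=3.0, scale=2.0, size=(64, 8))  # 64 samples, 8 features
normalized = batch_norm(batch)
# After normalization, each feature has mean ~0 and variance ~1.
```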
(92) The EPA, EVA, and ETU are multipath fading channel model delay profiles that represent a low, medium, and high delay spread environment, respectively. Given that an EPA channel has a relatively short delay spread (e.g., only up to 410 ns), there may be little room to improve PDP estimation as compared to the uniform or ideal PDP assumptions. However, as the maximum delay spread increases in EVA and ETU channels, the PDP can vary significantly over the delay spread, and performance is more strongly affected by the PDP estimate. Hence, depending on the accuracy of the PDP estimation, performance can be further improved or, conversely, degraded.
(93) As illustrated in
(94) Table 2 provides the performance gain of channel estimation using A2C against the uniform PDP assumption for NBCE. Channel estimation using A2C may outperform channel estimation using supervised learning and may also outperform the scheme using ideal PDP. In other words, under the assumption of WSSUS, ideal PDP may not be optimal for NBCE.
(95) TABLE-US-00002

TABLE 2
         EPA       ETU        EVA
Rank 2   0.5 dB    >2.0 dB    0.6 dB
Rank 4   0.3 dB    >1.7 dB    0.5 dB
(96) As described above, according to some embodiments, the channel estimator 200 uses A2C to improve NBCE performance. While supervised learning is effective in estimating a PDP that is close to ideal, even an ideal PDP may not guarantee a low block error rate in NBCE under the WSSUS model. Accordingly, the channel estimator 200 uses A2C to train a policy network by criticizing the policy's actions against results from a value network. As such, the channel estimator using A2C reduces (e.g., minimizes) the MSE of channel estimation, which may lead to performance enhancement in terms of BLER.
(97) The operations performed by the constituent components of the transmitter 10 and the receiver 30 (e.g., by the channel estimator 200) may be performed by a “processing circuit” that may include any combination of hardware, firmware, and software, employed to process data or digital signals. Processing circuit hardware may include, for example, application specific integrated circuits (ASICs), general purpose or special purpose central processing units (CPUs), digital signal processors (DSPs), graphics processing units (GPUs), and programmable logic devices such as field programmable gate arrays (FPGAs). In a processing circuit, as used herein, each function is performed either by hardware configured, i.e., hard-wired, to perform that function, or by more general-purpose hardware, such as a CPU, configured to execute instructions stored in a non-transitory storage medium. A processing circuit may be fabricated on a single printed wiring board (PWB) or distributed over several interconnected PWBs. A processing circuit may contain other processing circuits; for example, a processing circuit may include two processing circuits, an FPGA and a CPU, interconnected on a PWB.
(98) As used herein, the singular forms “a” and “an” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising”, when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. Expressions such as “at least one of,” when preceding a list of elements, modify the entire list of elements and do not modify the individual elements of the list. Further, the use of “may” when describing embodiments of the inventive concept refers to “one or more embodiments of the present disclosure”. Also, the term “exemplary” is intended to refer to an example or illustration. As used herein, the terms “use,” “using,” and “used” may be considered synonymous with the terms “utilize,” “utilizing,” and “utilized,” respectively.
(99) For the purposes of this disclosure, “at least one of X, Y, and Z” and “at least one selected from the group consisting of X, Y, and Z” may be construed as X only, Y only, Z only, or any combination of two or more of X, Y, and Z, such as, for instance, XYZ, XYY, YZ, and ZZ.
(101) While the present invention has been illustrated and described with reference to the embodiments thereof, it will be apparent to those of ordinary skill in the art that various suitable changes in form and detail may be made thereto without departing from the spirit and scope of the present invention, as defined by the following claims and equivalents thereof.