PRODUCT AUTOENCODER FOR ERROR-CORRECTING VIA SUB-STAGE PROCESSING
20230104143 · 2023-04-06
Inventors
- Mohammad Vahid Jamali (Ann Arbor, MI, US)
- Hamid SABER (San Diego, CA, US)
- Homayoon HATAMI (San Diego, CA, US)
- Jung Hyun BAE (San Diego, CA, US)
CPC classification
H03M13/09
ELECTRICITY
Abstract
A processing circuit implements: an encoder configured to: supply k symbols of original data to a neural product encoder including M neural encoder stages, a j-th neural encoder stage including a j-th neural network configured by j-th parameters to implement an (n.sub.j,k.sub.j) error correction code (ECC), where n.sub.j is a factor of n and k.sub.j is a factor of k; and output n symbols representing the k symbols of original data encoded by an error correcting code; or a decoder configured to supply n symbols of a received message to a neural product decoder including neural decoder stages grouped into l pipeline stages, an i-th pipeline stage of the neural product decoder including M neural decoder stages, a j-th neural decoder stage comprising a j-th neural network configured by j-th parameters to implement an (n.sub.j,k.sub.j) ECC; and output k symbols decoded from the n symbols of the received message.
Claims
1. A processing circuit implementing an encoder for an (n,k) error correction code, the encoder being configured to: receive k symbols of original data; supply the k symbols of original data to a neural product encoder comprising a plurality of M neural encoder stages, a j-th neural encoder stage of the plurality of M neural encoder stages comprising a j-th neural network configured by a j-th plurality of parameters to implement an (n.sub.j,k.sub.j) error correction code, where n.sub.j is a factor of n and k.sub.j is a factor of k; and output n symbols of encoded data representing the k symbols of original data encoded by an error correcting code.
2. The processing circuit of claim 1, wherein the j-th neural network comprises a fully connected neural network, and wherein the j-th plurality of parameters comprise a plurality of weights of connections between neurons of the fully connected neural network.
3. The processing circuit of claim 1, wherein the encoder is further configured to reshape the k symbols of original data into an M-dimensional original data, and wherein the j-th neural encoder stage is configured to encode a j-th dimension of k.sub.j symbols of the M-dimensional original data.
4. The processing circuit of claim 1, wherein the j-th neural network is configured to output a real-valued vector having length n.sub.j.
5. The processing circuit of claim 1, wherein the processing circuit is integrated into a mobile device, and wherein the processing circuit is configured to encode the original data for transmission in accordance with a cellular communication protocol.
6. A processing circuit implementing a decoder for an (n,k) error correction code, the decoder being configured to: receive n symbols of a received message; supply the n symbols of the received message to a neural product decoder comprising a plurality of neural decoder stages grouped into a plurality of l pipeline stages, an i-th pipeline stage of the neural product decoder comprising a plurality of M neural decoder stages, a j-th neural decoder stage of the plurality of M neural decoder stages comprising a j-th neural network configured by a j-th plurality of parameters to implement an (n.sub.j,k.sub.j) error correction code, where n.sub.j is a factor of n and k.sub.j is a factor of k; and output k symbols of estimated original data decoded from the n symbols of the received message.
7. The processing circuit of claim 6, wherein the j-th neural network comprises a fully connected neural network, and wherein the j-th plurality of parameters comprise a plurality of weights of connections between neurons of the fully connected neural network.
8. The processing circuit of claim 6, wherein the decoder is further configured to reshape the n symbols of the received message into an M-dimensional received data, and wherein the j-th neural decoder stage is configured to decode a j-th dimension of n.sub.j symbols of the M-dimensional received data.
9. The processing circuit of claim 6, wherein the j-th neural network is configured to output a real-valued vector having length n.sub.j.
10. The processing circuit of claim 6, wherein the j-th neural network is configured to output a real-valued vector having length Fn.sub.j, where F is an integer greater than 1.
11. The processing circuit of claim 6, wherein the decoder is configured to supply the n symbols of the received message to at least two of the plurality of neural decoder stages of the neural product decoder.
12. The processing circuit of claim 6, wherein the processing circuit is integrated into a mobile device, and wherein the processing circuit is configured to decode the received message, where the received message is encoded in accordance with a cellular communication protocol.
13. The processing circuit of claim 6, wherein l is greater than 1.
14. A method for jointly training a neural product coding system, comprising: initializing a plurality of parameters of a plurality of neural encoder stages of a neural product encoder and a plurality of parameters of a plurality of neural decoder stages of a neural product decoder; iteratively alternating between: training the parameters of the neural decoder stages while keeping the plurality of parameters of the neural encoder stages fixed; training the parameters of the neural encoder stages while keeping the plurality of parameters of the neural decoder stages fixed; and outputting trained parameters of the plurality of neural encoder stages of the neural product encoder and trained parameters of the plurality of neural decoder stages of the neural product decoder.
15. The method of claim 14, wherein an iteration of training the parameters of the neural encoder stages comprises: sending a batch of training sequences to the plurality of neural encoder stages configured with the parameters of the neural encoder stages to compute real-valued codewords; modifying the real-valued codewords based on channel characteristics to compute received codewords; decoding the received codewords using the neural decoder stages configured with the parameters of the neural decoder stages to compute estimated sequences; and updating the parameters of the neural encoder stages based on loss values computed based on the training sequences and the estimated sequences.
16. The method of claim 15, wherein a j-th neural encoder stage of the neural encoder stages comprises a neural network, and wherein a j-th plurality of parameters of the parameters of the neural encoder stages comprise a plurality of weights of connections between neurons of the neural network.
17. The method of claim 14, wherein an iteration of training the parameters of the neural decoder stages comprises: sending a batch of training sequences to the plurality of neural encoder stages configured with the parameters of the neural encoder stages to compute real-valued codewords; modifying the real-valued codewords based on channel characteristics to compute received codewords; decoding the received codewords using the neural decoder stages configured with the parameters of the neural decoder stages to compute estimated sequences; and updating the parameters of the neural decoder stages based on loss values computed based on the training sequences and the estimated sequences.
18. The method of claim 17, wherein the neural decoder stages are grouped into a plurality of l pipeline stages, an i-th pipeline stage of the neural product decoder comprising a plurality of M neural decoder stages, a j-th neural decoder stage of the plurality of M neural decoder stages comprises a neural network, and wherein a j-th plurality of parameters of the parameters of the neural decoder stages comprise a plurality of weights of connections between neurons of the neural network.
19. The method of claim 17, wherein the modifying the real-valued codewords based on channel characteristics comprises: applying additive white Gaussian noise to the real-valued codewords at a range of different signal to noise ratio (SNR) values to compute the received codewords.
20. The method of claim 17, wherein the received codewords are supplied to a plurality of the neural decoder stages.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
[0026] The accompanying drawings, together with the specification, illustrate example embodiments of the present invention, and, together with the description, serve to explain the principles of the present invention.
DETAILED DESCRIPTION
[0038] In the following detailed description, only certain example embodiments of the present invention are shown and described, by way of illustration. As those skilled in the art would recognize, the invention may be embodied in many different forms and should not be construed as being limited to the embodiments set forth herein. Like reference numerals designate like elements throughout the specification.
[0039] Aspects of embodiments of the present invention are directed to implementing channel encoders and channel decoders using neural networks, including neural network architectures for implementing channel encoders, neural network architectures for implementing channel decoders, hardware implementations of channel encoders and channel decoders using such trained neural networks, and systems and methods for applying machine learning techniques for training neural networks to implement channel encoders and channel decoders.
[0040] Generally, a channel encoder ε maps an input sequence of information bits u having length k to a length-n sequence of coded bits c (c=ε(u)) by adding redundancy to protect the transmission of the information bits across a noisy communication channel. Here, k and n are called the code dimension and block-length, respectively, and the resulting code is denoted an (n, k) code, where n>k. The code rate R=k/n provides an indication of the efficiency of the code. In the case of an additive white Gaussian noise (AWGN) channel, the received length-n sequence is y=c+n, where n is the channel noise vector whose components are Gaussian random variables of mean zero and variance σ.sup.2. The ratio of the average energy E.sub.s per coded symbol to the noise variance is called the signal-to-noise ratio (SNR): SNR=E.sub.s/σ.sup.2. A channel decoder D exploits the redundancy added by the encoder to compute an estimated message û of the original message u based on the noisy codewords y (û=D(y)) while trying to minimize the number of errors caused by the channel noise on the messages.
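As an illustrative (non-limiting) sketch of this channel model, the following Python snippet adds AWGN at a given linear SNR to a BPSK-modulated codeword. The function and variable names are our own and are not taken from the disclosure.

```python
import numpy as np

def awgn_channel(c, snr, rng):
    """Return y = c + n, where the noise variance is sigma^2 = E_s / SNR (linear)."""
    es = np.mean(c ** 2)                      # average energy per coded symbol
    sigma = np.sqrt(es / snr)                 # noise standard deviation
    return c + sigma * rng.standard_normal(c.shape)

rng = np.random.default_rng(0)
u_bits = rng.integers(0, 2, size=1024)
c = 1.0 - 2.0 * u_bits                        # BPSK mapping: 0 -> +1, 1 -> -1
y = awgn_channel(c, snr=10.0, rng=rng)        # linear SNR of 10 (i.e., 10 dB)
```

At this SNR, hard decisions sign(y) recover nearly all transmitted symbols, which is the starting point for the decoder described next.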
[0041] Depending on the amount of channel noise n (e.g., depending on the SNR of the channel) and depending on the characteristics of the code (e.g., the number of errors that can be corrected by the code), the decoder may or may not succeed in correcting all of the errors introduced by the channel noise n. The error rate after decoding may be characterized as a bit error rate (BER) indicating the fraction of bits in the block of size k that are erroneous:

BER=(1/k)Σ.sub.i=1.sup.k Pr(û.sub.i≠u.sub.i)

and also as a block error rate (BLER) indicating the probability that the block has any erroneous bits at all:

BLER=Pr(û≠u)
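The two error metrics can be computed directly from their definitions; the following sketch (our own helper names) evaluates them on a small batch of decoded blocks.

```python
import numpy as np

def bit_error_rate(u, u_hat):
    """BER: fraction of erroneous bits across all blocks of size k."""
    return float(np.mean(u != u_hat))

def block_error_rate(u, u_hat):
    """BLER: fraction of blocks containing at least one erroneous bit."""
    return float(np.mean(np.any(u != u_hat, axis=1)))

u     = np.array([[0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1]])  # transmitted
u_hat = np.array([[0, 1, 1, 0], [1, 0, 0, 0], [0, 0, 1, 1]])  # decoded
# 2 erroneous bits out of 12, and 2 of the 3 blocks contain an error.
```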
[0042] One technical challenge in deep learning-based design of encoders and decoders is a dimensionality issue that arises in the context of channel coding due to huge code spaces (there are 2.sup.k distinct codewords for a message length, or binary linear code dimension, of k). This is problematic because only a small portion of all possible codewords will be seen by the machine learning model (e.g., a neural network) during training (e.g., presenting all 2.sup.k possible codewords during training would be intractable for practical binary linear code dimensions k). Therefore, it was commonly believed that trained machine learning models for the encoders and decoders would fail to generalize to codewords that were not seen during training and, as noted above, these unseen codewords constitute the vast majority of possible codewords for large values of k (e.g., practical values of k that would be used in modern communication systems). Additionally, it was commonly believed that huge networks with an excessively large number of learnable parameters would be needed in order to account for larger code dimensions (e.g., values of k larger than 100 information bits) and that it was therefore prohibitively complex, if not impossible, to train neural networks to encode and decode relatively large channel codes. Furthermore, it was commonly believed that jointly training an encoder neural network and a decoder neural network would cause these trained neural networks to settle in unfavorable local optima due to non-convex loss functions, and therefore some approaches relate to only training a decoder neural network for decoding messages that were encoded using an existing, classical error correction code such as Reed-Solomon codes, Turbo codes, low-density parity-check (LDPC) codes, and polar codes.
[0043] Aspects of embodiments of the present disclosure demonstrate that it is possible to train neural networks to perform channel encoding and channel decoding for large values of k (e.g., larger than 100 information bits) and that the neural encoder and the neural decoder can be jointly trained.
[0044] In more detail, some aspects of embodiments of the present disclosure relate to a product autoencoder architecture for an encoder neural network (or neural encoder) and a decoder neural network (or neural decoder), where the product autoencoder architecture constructs large neural codes using smaller code components or stages. In some embodiments, multiple smaller encoder and decoder neural network components are trained and connected in M stages (where M is a positive integer), the stages having parameters (n.sub.1, k.sub.1), (n.sub.2, k.sub.2), . . . , (n.sub.M, k.sub.M) such that n.sub.1n.sub.2 . . . n.sub.M=n and k.sub.1k.sub.2 . . . k.sub.M=k.
[0045] Aspects of embodiments of the present disclosure further relate to systems and methods for automatically developing novel error correction codes using a machine learning process (instead of through manual theoretical analysis). These machine learning process include applying a deep learning process to the joint training of an encoder neural network and a decoder neural network. The trained encoder neural network and decoder neural network implement an error correcting code that encodes the information supplied as input along with additional redundant information and that can robustly decode the encoded messages in the presence of noise. In some embodiments, these jointly trained novel error correction codes (or neural codes or neural error correction codes) outperform state-of-the-art classical error correction codes (e.g., Turbo Autoencoders, polar codes, and LDPC codes) along performance metrics such as bit error rate.
[0046]
[0047] According to various embodiments of the present disclosure, the encoder 110 and the decoder 120 may, respectively, be referred to as an encoder circuit or encoder processing circuit and a decoder circuit or decoder processing circuit and may be implemented using various processing circuits such as a central processing unit (CPU), an application processor (AP) or application processing unit (APU), a field programmable gate array (FPGA), an application specific integrated circuit (ASIC) such as a display driver integrated circuit (DDIC), and/or a graphics processing unit (GPU) of one or more computing systems. For example, the encoder 110 and the decoder 120 may be components of the same computer system (e.g., integrated within a single enclosure, such as in the case of a smartphone or other mobile device, tablet computer, or laptop computer), may be separate components of a computer system (e.g., a desktop computer in communication with an external monitor), or may be separate computer systems (e.g., two independent computer systems communicating over the communication channel 150), or variations thereof (e.g., implemented within special purpose processing circuits such as microcontrollers configured to communicate over the communication channel 150, where the microcontrollers are peripherals within a computer system). As would be understood by one of skill in the art, the encoder circuit may be implemented using a different type of processing circuit than the decoder circuit. In addition, as would be understood to one of skill in the art, the various processing circuits may be components of a same integrated circuit (e.g., as being components of a same system on a chip or SoC) or may be components of different integrated circuits that may be connected through pins and lines on a printed circuit board.
[0048] As a concrete example, the communication link may be a wireless communication link such as a cellular connection or a local wireless network connection (e.g., Wi-Fi connection) between a client mobile device (e.g., smartphone, laptop, or other user equipment (UE)) and a base station (e.g., gNodeB or gNB in the case of a 5G-NR base station or a Wi-Fi access point or Wi-Fi router), where the devices may transmit and receive data over the communication channel 150 using processing circuits according to the present disclosure integrated into the respective devices, where the data is formatted or encoded and decoded in accordance with a communication protocol (e.g., a cellular communication protocol such as 6G wireless or a wireless networking communication protocol such as a protocol in the IEEE 802.11 family of protocols).
[0049]
[0050] A product code structure or architecture allows a large error correction code or channel code to be constructed from smaller components or stages.
[0051] As noted above, k=k.sub.1k.sub.2 and therefore the k-bit input codeword u can be reshaped into a k.sub.2×k.sub.1 matrix. Because each row of the matrix has k.sub.1 symbols, the first encoder stage 211 applies the (n.sub.1, k.sub.1) code C.sub.1 independently to each of the k.sub.2 rows of the input to generate a k.sub.2×n.sub.1 matrix u.sup.(1). This first intermediate encoded message u.sup.(1) is supplied to the second encoder stage 212. Here, because each column of the first intermediate encoded message u.sup.(1) has k.sub.2 symbols, the second encoder stage 212 applies the (n.sub.2, k.sub.2) code C.sub.2 independently to each column to generate an n.sub.2×n.sub.1 output matrix u.sup.(2). Because this is a two-dimensional product code, there are no further stages in this pipeline of encoder stages, and this n.sub.2×n.sub.1 output matrix u.sup.(2) is the length n coded message to be transmitted on a channel 250.
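The two-stage (row-then-column) encoding above can be sketched concretely with a classical linear code; here a (3, 2) single-parity-check code serves as both C.sub.1 and C.sub.2, which is our choice of component code for illustration rather than anything specified by the disclosure.

```python
import numpy as np

# Generator matrix of a (3, 2) single-parity-check code.
G = np.array([[1, 0, 1],
              [0, 1, 1]])

def product_encode(u_bits):
    u = np.reshape(u_bits, (2, 2))   # reshape k = k_2 x k_1 = 2 x 2
    u1 = (u @ G) % 2                 # apply C_1 to each row:    2 x 3 matrix
    u2 = (G.T @ u1) % 2              # apply C_2 to each column: 3 x 3 matrix
    return u2

cw = product_encode(np.array([1, 0, 1, 1]))
# Every row and every column of cw satisfies its parity check.
```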
[0052] Noise and other interference in the channel 250 can modify the data in the message such that the product decoder 220 receives a message y that may differ from the message output by the product encoder 210.
[0053] As shown in
[0054] In the example shown in
[0055] In addition, in some circumstances, decoding performance can be improved by applying a soft-input soft-output (SISO) decoder and also applying several iterations, where the output of the product decoder (e.g., the output of the last decoder stage, in this case the first decoder stage 221 as shown in
[0056] While the above discussion of
[0057] Block-length: n=Π.sub.l=1.sup.Mn.sub.l
[0058] Dimension: k=Π.sub.l=1.sup.Mk.sub.l
[0059] Rate: R=Π.sub.l=1.sup.MR.sub.l
[0060] Minimum distance: d=Π.sub.l=1.sup.Md.sub.l
[0061] Generator matrix: G=G.sub.1⊗G.sub.2⊗ . . . ⊗G.sub.M
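The composite parameters listed above can be checked on a small example: squaring a (3, 2) single-parity-check code (which has minimum distance d=2) should give a (9, 4) product code with minimum distance d=2·2=4. The component code choice is ours, for illustration only.

```python
import numpy as np
from itertools import product

G = np.array([[1, 0, 1],
              [0, 1, 1]])              # (3, 2) single-parity-check code, d = 2

def encode2d(u_bits):
    u = np.reshape(u_bits, (2, 2))
    return (G.T @ ((u @ G) % 2)) % 2   # rows via C_1, then columns via C_2

# Exhaustively enumerate all 2^k = 16 messages of the (9, 4) product code
# and record the Hamming weights of the nonzero codewords.
nonzero_weights = [
    int(encode2d(np.array(bits)).sum())
    for bits in product([0, 1], repeat=4)
    if any(bits)
]
d_min = min(nonzero_weights)           # minimum distance of the product code
```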
[0062] Aspects of embodiments of the present disclosure build upon product codes by implementing each stage of a product encoder and each stage of a product decoder as a separate neural network. These neural networks may be referred to as neural encoder stages and neural decoder stages, respectively. In addition, in some embodiments of the present disclosure, the I iterations performed by the product decoder are unrolled into a decoder pipeline of M×I separate neural decoder stages (e.g., M×I separate neural networks) grouped into I groups or pipeline stages or sub-pipelines of M neural decoder stages.
[0063] Some examples of the present disclosure will be described in more detail below in the context of two-dimensional product codes (where M=2). While embodiments of the present disclosure are not limited to cases where M=2 and can also include embodiments where M>2, in some practical use cases, M=2 represents a good tradeoff between complexity and performance.
[0064] The process of training a neural encoder and neural decoder pair generally includes two main steps: (1) a decoder training schedule; and (2) an encoder training schedule. More specifically, during each training epoch, the neural decoder stages are trained (e.g., end-to-end) several times while keeping the neural encoder stages fixed, and then the neural encoder stages are trained multiple times while keeping the neural decoder stages unchanged. In some alternative embodiments, the training process starts with the neural encoder stages, followed by training the neural decoder stages. In the following, the encoder architecture and its training schedule will be described first, followed by the decoder architecture and its training schedule.
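The alternating schedule can be summarized with the following skeleton; the Stage class and its update counter are stand-ins for the actual FCNN stages and optimizer steps (described later), and the epoch and repetition counts are arbitrary example values.

```python
# Illustrative skeleton of the alternating training schedule.

class Stage:
    def __init__(self):
        self.updates = 0

    def update(self):
        self.updates += 1   # placeholder for one gradient/optimizer step

def train(encoder, decoder, epochs, T_enc, T_dec):
    for _ in range(epochs):
        for _ in range(T_dec):   # train decoder stages, encoder weights fixed
            decoder.update()
        for _ in range(T_enc):   # train encoder stages, decoder weights fixed
            encoder.update()
    return encoder, decoder

enc, dec = train(Stage(), Stage(), epochs=3, T_enc=100, T_dec=500)
```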
[0065]
[0066] In various embodiments, a neural product encoder and a neural product decoder may be implemented by processing circuits of a communication device. Examples of processing circuits include, but are not limited to, a general-purpose processor core (e.g., included within application processors, system-on-chip processors, and the like), a field programmable gate array (FPGA which may include a general-purpose processor core), an application specific integrated circuit (ASIC), a digital signal processor (DSP), a neural accelerator or neural processing unit, and combinations thereof (e.g., controlling an overall encoding or decoding process using a general-purpose processor core that controls a neural accelerator to perform neural network operations such as vector multiplications and accumulations and to apply non-linear activation functions). The neural product encoder and the neural product decoder may be defined in accordance with an architecture and a plurality of parameters such as weights and biases of connections between neurons of different layers of the neural networks of various neural stages of the neural product encoder and the neural product decoder. In some embodiments of the present disclosure, these parameters may be stored in memory and accessed by the processing circuits during runtime to perform computations implementing the neural stages. In some embodiments of the present disclosure, the processing circuit is configured with these parameters (e.g., fixed in a lookup table or as constant values in a special-purpose DSP, ASIC, FPGA, or neural processing unit).
[0067] The arrangement of components shown in
[0068]
[0069] In a manner similar to that described in
[0070] At 373, the neural product encoder 310 applies a pipeline of M neural encoder stages along each of the corresponding M dimensions of the original data to generate encoded data c having a length of n symbols. In the example shown in
[0071] The first neural encoder stage 311 is configured to take an input of k.sub.1 symbols (e.g., k.sub.1 bits) and to produce an output of n.sub.1 symbols (e.g., n.sub.1 bits). Likewise, the second neural encoder stage is configured to take an input of k.sub.2 symbols (e.g., k.sub.2 bits) and to produce an output of n.sub.2 symbols (e.g., n.sub.2 bits). Accordingly, the first neural encoder stage 311 can operate on the input on a row-by-row basis to generate a k.sub.2×n.sub.1 matrix u.sup.(1), and this first intermediate encoded message u.sup.(1) is supplied to the second neural encoder stage 312. Here, because each column of the first intermediate encoded message u.sup.(1) has k.sub.2 symbols, the second neural encoder stage 312 operates on each column independently to generate an n.sub.2×n.sub.1 output matrix u.sup.(2). Because this is a two-dimensional product code, there are no further stages, and this n.sub.2×n.sub.1 output matrix u.sup.(2) represents the length n coded message c to be transmitted on a channel 350 (e.g., as a data packet). For example, in some embodiments, the real-valued n.sub.2×n.sub.1 output matrix u.sup.(2) is reshaped into a length n coded message c and binarized (e.g., converted into binary values) or otherwise discretized into discrete values. In a practical coding system, the length n coded message c is typically also modulated, by a transmitter device, in accordance with a modulation method associated with a physical layer protocol (e.g., various protocols for wireless communication, wired electrical communication, wired optical communication, etc.).
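The shape flow through the two neural encoder stages can be sketched as follows. The single random linear layer with a tanh non-linearity is an untrained stand-in for a trained FCNN stage, and the dimensions k.sub.1=k.sub.2=10, n.sub.1=n.sub.2=15 are example values consistent with the (15,10).sup.2 code discussed later.

```python
import numpy as np

rng = np.random.default_rng(0)
k1, k2, n1, n2 = 10, 10, 15, 15

# Untrained stand-ins for the two trained neural encoder stages.
W1 = rng.standard_normal((k1, n1))    # first stage:  k_1 -> n_1, applied per row
W2 = rng.standard_normal((k2, n2))    # second stage: k_2 -> n_2, applied per column

u = rng.integers(0, 2, size=(k2, k1)).astype(float)  # reshaped message bits
u1 = np.tanh(u @ W1)        # k_2 x n_1 first intermediate encoded message
u2 = np.tanh(W2.T @ u1)     # n_2 x n_1 real-valued coded message
```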
[0072] The channel 350 may distort the length n coded message c, such that the neural product decoder 320 receives a length n received coded message y that may differ from the length n coded message c (e.g., where differences between the received message y and the transmitted coded message c may be referred to as errors or erasures).
[0073] As shown in
[0074]
[0075] Each decoder pair includes a first neural decoder stage and a second neural decoder stage (e.g., one neural decoder stage for each of the M dimensions of the product code), where each decoder stage is configured to operate along a different dimension of the input data (e.g., where the second neural decoder stage is configured to operate along a column dimension and the first neural decoder stage is configured to operate along a row dimension). For example, the first neural decoder pair shown in
[0076] The output of the last neural decoder pair (the I-th decoder pair) is supplied to an extraction circuit 360 to extract the k message symbols at 385 therefrom to produce the output length k estimated message û 307.
[0077] As noted above,
[0078]
[0079] At 391, the training system initializes parameters of neural encoder stages and neural decoder stages. As noted above, the neural encoder stages and the neural decoder stages include neural networks whose behaviors are specified by parameters (e.g., weights and biases associated with connections between the neurons in different layers of the neural network). In some embodiments of the present disclosure, each of the neural encoder stages and each of the neural decoder stages is a fully connected neural network (FCNN). In some embodiments, the fully connected neural network is a deep neural network having more than one hidden layer (L.sub.enc>1 hidden layers). Accordingly, in some embodiments of the present disclosure, the training system initializes the parameters of the neural networks of the neural encoder stages and the neural decoder stages. In some embodiments, these parameters may be initialized to random values (e.g., set by a pseudorandom number generator). In some embodiments, the neural encoder stages and the neural decoder stages are initialized using parameter values from previously trained neural networks (e.g., previously trained based on some set of training data).
[0080] At 393 and 395, the training system alternates between training the neural encoder and training the neural decoder. In the particular example shown in
[0081] At 397, the training system determines if training is complete. In some embodiments the training system tests the performance of the neural encoder and the neural decoder, in accordance with the newly updated parameters. Performance may be measured by supplying input test codewords to the trained network (e.g., in accordance with the method 370 described above with respect to
[0082]
[0083] As shown in
[0084] In some embodiments, in order to ensure that the average power per coded bit is equal to one, and thus that the average SNR is equal to a given SNR, the length-n real-valued vector c=(c.sub.1, c.sub.2, . . . , c.sub.n) of the coded sequence at the output of the encoder is normalized as follows:

c′=√n·c/∥c∥.sub.2

Therefore, ∥c′∥.sub.2.sup.2=n, and thus the average power per coded symbol is equal to one.
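This power normalization is a one-line operation; the sketch below (our own helper name) rescales a real-valued codeword so that its squared 2-norm equals n, i.e., unit average power per coded symbol.

```python
import numpy as np

def normalize_power(c):
    """Scale c so that ||c'||_2^2 = n, giving unit average symbol power."""
    n = c.size
    return np.sqrt(n) * c / np.linalg.norm(c)

rng = np.random.default_rng(0)
c = rng.standard_normal(225)          # e.g., a flattened 15 x 15 codeword
c_prime = normalize_power(c)
```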
[0085] After computing the loss L(U, Û) between the training sequences (the transmitted sequences) U and the estimated sequences Û (using loss calculator 461) and backpropagating the loss to compute its gradients (through backpropagator 463), the encoder optimizer takes a step to update the weights of the NN encoders (using optimizer 465) to compute updated encoder weights ϕ.sub.1 and ϕ.sub.2. These updated encoder weights ϕ.sub.1 and ϕ.sub.2 are used to reconfigure the first neural encoder stage 411 and the second neural encoder stage 412. This procedure is repeated T.sub.enc times, updating the encoder weights ϕ.sub.1 and ϕ.sub.2 during each iteration while keeping the decoder weights Θ fixed (for a fixed neural decoder 420).
[0086]
[0087] As shown above in
[0088]
[0089] At 571, the training system loads a batch of B binary information sequences of length k (e.g., where k=k.sub.1k.sub.2 in the case where M=2), shaped as a B×k matrix U. After computing the loss L(U, Û) based on differences between the training sequences U and the estimated sequences Û, along with its gradients, the decoder optimizer then computes, at 579, updated decoder parameters Θ for the neural decoder stages. This procedure is repeated T.sub.dec times, each time updating only the decoder weights Θ while keeping the encoder parameters Φ of the neural encoder 510 fixed.
[0090] In some embodiments, the neural encoder is trained with the channel being modeled as applying AWGN with a single SNR of γ dB and where the decoder is trained with a range of training SNR values where, during each decoding training batch, B random values are chosen uniformly from within the range of training SNR values and assigned to corresponding training codewords. In some embodiments, the decoder is trained with a range of SNR values from γ−2.5 dB to γ+1 dB. In some embodiments, the training is performed with γ=3 dB.
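The per-batch SNR sampling for decoder training described in [0090] can be sketched as follows: B values are drawn uniformly from [γ−2.5, γ+1] dB and converted to noise standard deviations, assuming unit energy per coded symbol (as ensured by the normalization of [0084]). The helper name is ours.

```python
import numpy as np

def sample_noise_sigmas(B, gamma_db, rng):
    """Draw B training SNRs uniformly in [gamma-2.5, gamma+1] dB and
    convert each to a noise standard deviation, assuming E_s = 1."""
    snr_db = rng.uniform(gamma_db - 2.5, gamma_db + 1.0, size=B)
    snr_linear = 10.0 ** (snr_db / 10.0)   # SNR = E_s / sigma^2 with E_s = 1
    return 1.0 / np.sqrt(snr_linear)

rng = np.random.default_rng(0)
sigmas = sample_noise_sigmas(B=500, gamma_db=3.0, rng=rng)
```

With γ=3 dB, the sampled sigmas lie between 10^(−4/20)≈0.631 (at 4 dB) and 10^(−0.5/20)≈0.944 (at 0.5 dB).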
[0091] The embodiment depicted in
[0092] In more detail, in some embodiments, the first NN decoder D.sub.2.sup.(1), which is the decoder of code C.sub.2 in the first iteration, only takes the channel output y of size n.sub.1×n.sub.2. It then outputs n.sub.1 length-n.sub.2 vectors formed in a matrix {tilde over (y)}.sub.2.sup.(1). All the next 2I-2 decoders D.sub.1.sup.(1), D.sub.2.sup.(2), . . . , and D.sub.2.sup.(I) take the channel output y in addition to the output of the previous decoder. The last decoder D.sub.1.sup.(I) only takes the output of the previous decoder as the input. Some embodiments further include an additional neural decoder stage (as another FCNN) that maps each of n.sub.1 length-n.sub.2 vectors of the channel output to a length-k.sub.2 vector such that they can be concatenated with the k.sub.2 length-n.sub.1 vectors of the previous NN decoder to form the k.sub.2 length-2n.sub.1 vectors at the input of the last decoder.
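The input wiring of the unrolled pipeline in [0092] can be sketched structurally: the first stage sees only the channel output y, the next 2I−2 stages see y concatenated with the previous stage's soft output, and the last stage sees only the previous output. For brevity this sketch ignores the row/column alternation between stages and uses a single untrained random layer as a stand-in for each trained FCNN.

```python
import numpy as np

rng = np.random.default_rng(0)
n1 = n2 = 15
I = 2                                   # decoding iterations -> 2*I stages

def stage(x, out_len):
    # Untrained stand-in for one neural decoder stage (illustration only).
    W = rng.standard_normal((x.shape[-1], out_len))
    return np.tanh(x @ W)

y = rng.standard_normal((n1, n2))       # channel output
out = stage(y, n2)                      # first stage: takes y only
for _ in range(2 * I - 2):              # middle stages: take [y, prev output]
    out = stage(np.concatenate([y, out], axis=-1), n2)
out = stage(out, n2)                    # last stage: takes prev output only
```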
[0093] In some embodiments, only the difference of the channel output and the input of the decoder is given to the next decoder. In these embodiments, the first decoder D.sub.2.sup.(1) only takes the channel output, the second decoder D.sub.1.sup.(1) takes the output of the previous decoder D.sub.2.sup.(1), and the next 2I-3 decoders take the difference of the soft information (e.g., the differences between the real-valued outputs of previous decoders). The last decoder D.sub.1.sup.(I) takes the output of the previous decoder D.sub.2.sup.(I), given the dimensionality issue in subtraction.
[0094] In some embodiments, each neural decoder (except the last) outputs F vectors instead of one vector, as the soft information (e.g., as the real-valued output of the stage, which may represent probabilities or confidences that particular bits have a value of 1). For example, in some embodiments, a j-th neural decoder stage generates Fn.sub.j outputs, such as by configuring the neural network to produce Fn.sub.j outputs (e.g., by having an output layer or final layer with Fn.sub.j neurons, in the case of a FCNN). These Fn.sub.j outputs can be thought of as F vectors each having length n.sub.j (e.g., an F×n.sub.j matrix).
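The Fn.sub.j soft output of [0094] is simply a matter of output-layer sizing and reshaping, as the following fragment illustrates (F=3 and n.sub.j=15 are example values; the arange stand-in replaces an actual FCNN output layer).

```python
import numpy as np

F, n_j = 3, 15                            # example values; F > 1 per claim 10
flat = np.arange(F * n_j, dtype=float)    # stand-in for an output layer with F*n_j neurons
soft = flat.reshape(F, n_j)               # viewed as F soft-information vectors of length n_j
```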
[0095] While some specific embodiments of the present disclosure are described above in which the neural encoders and the neural decoders are implemented using fully connected neural networks (FCNNs), embodiments of the present disclosure are not limited thereto, and the neural networks included in the neural encoder stages and neural decoder stages may have different neural architectures, such as a convolutional neural network (CNN) architecture, a recurrent neural network (RNN) architecture, or a transformer neural network architecture.
[0096] In some embodiments of the present disclosure, the optimizer 465 and optimizer 565 are implemented using the Adam optimizer (see, e.g., Kingma, Diederik P., and Jimmy Ba. “Adam: A method for stochastic optimization.” arXiv preprint arXiv:1412.6980 (2014)). In some embodiments, the scaled exponential linear unit (SELU) activation function is used as the activation function for the hidden layers of the FCNNs. In some embodiments, the loss function implemented by the loss computers 461 and 561 uses the BCE with Logits Loss function, which combines a sigmoid layer with a binary cross-entropy loss. However, embodiments of the present disclosure are not limited to training neural encoders and neural decoders using these approaches, and other optimizers, other activation functions, and other loss functions may be used in the training schedules of the neural encoder and the neural decoder.
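As a concrete, non-limiting reference, the two functions named in this paragraph can be written in a few lines; the SELU constants are those of the self-normalizing networks formulation by Klambauer et al., and the loss matches the numerically stable form of PyTorch's BCEWithLogitsLoss for a single logit:

```python
import math

# SELU constants from Klambauer et al., "Self-Normalizing Neural Networks".
_SELU_LAMBDA = 1.0507009873554805
_SELU_ALPHA = 1.6732632423543772

def selu(x):
    """Scaled exponential linear unit for hidden layers (paragraph [0096])."""
    return _SELU_LAMBDA * (x if x > 0.0 else _SELU_ALPHA * (math.exp(x) - 1.0))

def bce_with_logits(z, y):
    """Sigmoid layer combined with binary cross-entropy, computed in the
    numerically stable form: max(z, 0) - z*y + log(1 + exp(-|z|))."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
```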
[0097] Accordingly, aspects of embodiments of the present disclosure relate to a neural product encoder and a neural product decoder and methods for jointly training the neural product encoder and the neural product decoder to train a neural product coding system.
[0098] In experimental testing, embodiments of the present disclosure implementing a (15,10).sup.2 neural code (e.g., a (225,100) neural code) outperformed the BER of the equivalent polar code (obtained by puncturing 31 coded bits of a (256,100) polar code at random) by a large margin over all ranges of SNR, while maintaining approximately the same BLER as the polar code.
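The random puncturing used to build the comparison baseline can be sketched as follows (the function name and the seeded RNG are illustrative; the disclosure specifies only that 31 coded bits of the (256,100) polar code are dropped at random to reach length 225):

```python
import random

def puncture(codeword, target_len, seed=0):
    """Randomly puncture a codeword down to target_len symbols by
    dropping positions chosen uniformly at random (paragraph [0098]:
    256 coded bits punctured to 225)."""
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(codeword)), target_len))
    return [codeword[i] for i in keep]
```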
[0099] In additional experimental tests, a neural product autoencoder implementing a (21,14).sup.2 neural code (e.g., a (441,196) code) according to one embodiment of the present disclosure outperformed a polar code with the same length under SC decoding. The polar code was obtained by puncturing 71 coded bits of a (512,196) polar code at random. The moderate-length (21,14).sup.2 neural code outperformed the BER of the equivalent polar code by a large margin over all ranges of SNR, while maintaining approximately the same BLER performance.
[0100] In some experimental tests, embodiments of the present disclosure were compared against a polar code under CRC-aided list successive cancellation (CRC-List-SC) decoding, an LDPC code, and a tail-biting convolutional code (TBCC). All three of these classical codes have parameters (300,100). A (15,10).sup.2 neural code according to the present disclosure outperformed the BER of the TBCC by a good margin over all ranges of E.sub.b/N.sub.0, and that of the polar and LDPC codes for BER values larger than 10.sup.−4. A fair comparison requires reducing the blocklength of the considered classical codes to 225 bits, e.g., through puncturing 75 bits. In addition, a (21,14).sup.2 neural code according to embodiments of the present disclosure, with less than 1.5 times the blocklength but almost twice the code dimension, is able to outperform the state-of-the-art classical codes by a good margin over all BER ranges of interest.
[0101] It should be understood that the sequence of steps of the processes described herein with regard to various methods and with respect to various flowcharts is not fixed, but can be modified, changed in order, performed differently, performed sequentially, concurrently, or simultaneously, or altered into any desired order consistent with dependencies between steps of the processes, as recognized by a person of skill in the art. Further, as used herein and in the claims, the phrase “at least one of element A, element B, or element C” is intended to convey any of: element A; element B; element C; elements A and B; elements A and C; elements B and C; and elements A, B, and C.
[0102] Embodiments of the present invention can be implemented in a variety of ways as would be appreciated by a person of ordinary skill in the art, and the term “processor” as used herein may refer to any computing device capable of performing the described operations, such as a programmed general purpose processor (e.g., an ARM processor) with instructions stored in memory connected to the general purpose processor, a field programmable gate array (FPGA), and a custom application specific integrated circuit (ASIC). Embodiments of the present invention can be integrated into a serial communications controller (e.g., a universal serial bus or USB controller), a graphics processing unit (GPU), an intra-panel interface, and other hardware or software systems configured to transmit and receive digital data.
[0103] While the present invention has been described in connection with certain example embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.